# Stage 3 weird stuff
## Reg 0.01 on swish layer


train mae = 27.308105
validation mae = 29.559326
Strangely, the predictions are capped at 400.
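The notes don't specify which regularizer "reg 0.01" refers to. The sketch below assumes an L2 weight penalty with factor 0.01, which is what Keras applies with `regularizers.l2(0.01)` (factor times the sum of squared weights). It shows what that penalty adds to the training loss of the swish layer. The penalty pushes the layer's weights toward zero. That shrinkage is one candidate explanation for the cap at 400, but we have not verified it. The kernel values are hypothetical.

```python
import math

# Assumption: "reg 0.01" is an L2 weight penalty with factor 0.01,
# like the one Keras applies with regularizers.l2(0.01).
REG_FACTOR = 0.01


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # avoids overflow in exp(-x) for very negative x
    return e / (1.0 + e)


def swish(x: float) -> float:
    """Swish activation: x * sigmoid(x)."""
    return x * sigmoid(x)


def l2_penalty(kernel: list[list[float]], factor: float = REG_FACTOR) -> float:
    """L2 penalty on a weight matrix: factor * sum of squared weights."""
    return factor * sum(w * w for row in kernel for w in row)


def regularized_loss(data_loss: float, kernel: list[list[float]]) -> float:
    """Loss the optimizer minimizes: data loss (e.g. MAE) plus the L2 penalty."""
    return data_loss + l2_penalty(kernel)


# Hypothetical 2x2 kernel of the swish layer.
kernel = [[0.5, -1.0], [2.0, 0.0]]
print(l2_penalty(kernel))  # 0.01 * (0.25 + 1.0 + 4.0 + 0.0), i.e. about 0.0525
print(swish(1.0))
```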
## No reg on swish layer


train mae = 27.329678
validation mae = 29.81079
The model is able to reach the peaks (no more cap at 400) but overfits more in general.
## No dense layer in the beginning


train mae = 27.710047
validation mae = 30.06289
Validation MAE no longer goes below 30. Perhaps the model can't build complex feature combinations as well without the dense layer?
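The sketch below illustrates what the dense layer at the start of the model contributes. It assumes the layer is applied per timestep before the GRU. A Keras `Dense` layer behaves this way on a 3-D `(batch, timesteps, features)` input, because it contracts only the last axis. We assume, without confirmation, that the model is built in Keras. Every timestep's features are mixed by the same weights, so the GRU receives learned combinations of features instead of the raw ones. The features, shapes, and weights below are hypothetical, and the activation is omitted for simplicity.

```python
# Hypothetical shapes: a window of T timesteps, each with F input features,
# projected to U units by one dense layer shared across all timesteps.

def dense_per_timestep(window, kernel, bias):
    """Apply the same linear dense layer to every timestep of a window.

    window: T rows of F features
    kernel: F rows of U weights
    bias:   U values
    Returns T rows of U outputs. Each output is a weighted combination of
    that timestep's features (no activation, for simplicity).
    """
    outputs = []
    for step in window:
        row = []
        for j, b in enumerate(bias):
            row.append(b + sum(x * kernel[i][j] for i, x in enumerate(step)))
        outputs.append(row)
    return outputs


# Two hypothetical features per timestep, one output unit computing
# "feature 0 minus feature 1" at each step.
window = [[1.0, 2.0], [3.0, 4.0], [5.0, 1.0]]
kernel = [[1.0], [-1.0]]
bias = [0.0]
print(dense_per_timestep(window, kernel, bias))  # [[-1.0], [-1.0], [4.0]]
```

Widening the layer (for example to 32 units) only adds more of these per-timestep combinations. It does not let the layer look across timesteps, which remains the GRU's job.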
## Dense layer of 32 in the beginning


train mae = 27.703184
validation mae = 29.692074
No performance increase, and no more or less overfitting. Perhaps the model simply can't do anything with the extra combinations: there may be no sensible feature combinations left to learn.
## Window size increased to one week


train mae = 27.52961
validation mae = 29.551836
Similar performance to a one-day window, but it overfits a bit less. Maybe the model is able to remember on which days pollution is generally higher?
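The window size sets how many past steps the GRU sees for each prediction. The sketch below assumes hourly samples, so one day is 24 steps and one week is 168. It also assumes the target is the next value after the window (`horizon=1`). The sampling rate and horizon are not stated in these notes. `make_windows` is a hypothetical helper, not part of any library.

```python
# Assumption: one sample per hour, so window sizes in steps are:
STEPS_PER_DAY = 24
WINDOW_SIZES = {
    "1 day": 1 * STEPS_PER_DAY,
    "3 days": 3 * STEPS_PER_DAY,
    "1 week": 7 * STEPS_PER_DAY,
    "2 weeks": 14 * STEPS_PER_DAY,
}


def make_windows(series, window_size, horizon=1):
    """Split a series into (input window, target) pairs.

    Each input holds `window_size` consecutive values; the target is the
    value `horizon` steps after the last value of the window.
    """
    pairs = []
    for start in range(len(series) - window_size - horizon + 1):
        end = start + window_size
        pairs.append((series[start:end], series[end + horizon - 1]))
    return pairs


hourly = list(range(1000))  # stand-in for a pollution series
for name, size in WINDOW_SIZES.items():
    print(name, size, len(make_windows(hourly, size)))
```

At hourly resolution, a one-week window spans a full weekly cycle, while a 3-day window covers only part of it. Longer windows also yield slightly fewer training pairs from the same series.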
## Window size increased to two weeks


train mae = 27.841536
validation mae = 29.77808
Learns a lot slower. We think the large window pollutes the memory with too much noisy data, which is why it is not really able to learn much more.
## Window size of 3 days


train mae = 27.845932
validation mae = 29.91356
Funnily enough, 3 days is really not good. One week: fine. One day: fine. 3 days: bad. Why?
## Adding sin and cos


train mae = 27.645061
validation mae = 29.514944
With the normal setup, we noticed no improvement. When we widened the dense layer before the GRU, the model improved. We think the wider layer was able to combine the larger trends with the sin and cos features, which improved the model a bit. Thanks to these combinations, the data became less noisy from the model's point of view, and the model overfitted less.
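These notes don't say which cycle the sin and cos features encode. The sketch below assumes hour of day (period 24) and shows the usual reason for using both functions together. The pair places each value on the unit circle, so the end of a cycle sits next to its start. A single raw hour feature cannot express that. The period and the helper names are hypothetical.

```python
import math


def cyclical_encode(value, period):
    """Map a cyclical quantity (e.g. hour of day) to a point on the unit circle.

    Returns (sin, cos) so that values at the end of a cycle (hour 23)
    land next to values at the start (hour 0).
    """
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def circle_distance(a, b, period):
    """Euclidean distance between two encoded values."""
    sa, ca = cyclical_encode(a, period)
    sb, cb = cyclical_encode(b, period)
    return math.hypot(sa - sb, ca - cb)


# Hypothetical choice: hour of day, period 24.
print(circle_distance(23, 0, 24))  # small: one hour apart on the circle
print(circle_distance(12, 0, 24))  # largest possible: opposite sides
```

Sin alone would map hours 3 and 9 to the same value. Cos disambiguates them, which is why the two are added as a pair.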