# GA3: Iterative improvement of the GRU model

## Baseline

![](https://i.imgur.com/zr7d6s8.png)

loss: 33.6489 - mae: 33.6489 - mse: 3142.6521 - val_loss: 34.4024 - val_mae: 34.4024 - val_mse: 3118.5088

## Learning rate /10

![](https://i.imgur.com/7ELxWTw.png)

loss: 35.8021 - mae: 35.8021 - mse: 3852.7041 - val_loss: 38.0383 - val_mae: 38.0383 - val_mse: 4034.3892

## Epochs to 40

![](https://i.imgur.com/jytgbNt.png)

loss: 30.4760 - mae: 30.4760 - mse: 2664.6270 - val_loss: 31.7697 - val_mae: 31.7697 - val_mse: 2747.3564

## Make the model more complex, since it is not very powerful while the curves are clean

## Hidden RNN units to 16

![](https://i.imgur.com/jRynGV5.png)

loss: 30.2478 - mae: 30.2478 - mse: 2621.0774 - val_loss: 31.6304 - val_mae: 31.6304 - val_mse: 2710.1279

## Add Dense(16) layer

![](https://i.imgur.com/TT2YlLh.png)

loss: 30.1622 - mae: 30.1622 - mse: 2611.0139 - val_loss: 31.5466 - val_mae: 31.5466 - val_mse: 2672.4226

## Increase window size to 24 (one day)

![](https://i.imgur.com/xFddCwE.png)

loss: 30.0298 - mae: 30.0298 - mse: 2583.3704 - val_loss: 31.7806 - val_mae: 31.7806 - val_mse: 2685.6548

## Window size back to 4, change activation function to 'swish'

![](https://i.imgur.com/7XIxo0C.png)

loss: 30.3862 - mae: 30.3862 - mse: 2635.8213 - val_loss: 32.7821 - val_mae: 32.7821 - val_mse: 2852.8528

## Activation on dense layer back to relu

![](https://i.imgur.com/i1l46Bv.png)

loss: 30.1691 - mae: 30.1691 - mse: 2625.6216 - val_loss: 31.5403 - val_mae: 31.5403 - val_mse: 2700.7539

## Add Dense(16) in front (feature expansion)

![](https://i.imgur.com/AQisXNt.png)

loss: 31.0327 - mae: 31.0327 - mse: 2717.8989 - val_loss: 32.1549 - val_mae: 32.1549 - val_mse: 2676.1357

## Change to feature reduction (8) with increased window size (24)

![](https://i.imgur.com/ivpOH6y.png)

loss: 31.5027 - mae: 31.5027 - mse: 2780.7532 - val_loss: 32.5979 - val_mae: 32.5979 - val_mse: 2816.5869

## RNN units to 8

![](https://i.imgur.com/wAkDCTc.png)

loss: 30.6311 - mae: 30.6311 - mse: 2683.5879 - val_loss: 31.9729 - val_mae: 31.9729 - val_mse: 2842.

## Remove first dense layer, RNN back to 16

## Activation relu

## Activation tanh

Very slow learner.

## Increase learning rate

tanh is still poor 🤣

## Window size 24, play with RNN size

- RNN = 4 -> not bad, but not good (≈ 32)
- RNN = 16 -> 31.8
- RNN = 32 -> 31.72
- RNN = 64 -> 31.47 😍 winner winner chicken dinner
- RNN = 128 -> 32

## Try out swish again

- swish -> 31.65
- tanh -> 32.81...
- tanh only after the GRU layer -> ≈ 31.80

## Re-run of RNN = 64

We re-executed RNN = 64 and got 32.5 as the last value, so maybe we are not reading the results consistently (run-to-run variance).

## Remove last dense layer

31.45

## Increase batch size (256)

Actually looks decent.

## L2 regularization

L2 can literally go up to 1000 without the results falling apart. 🥶🥵🥶 Regularization does essentially nothing here: no big changes from 0.01 to 1000.
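
## Sketch of the final configuration

For reference, here is a minimal Keras sketch of the best configuration found above (window size 24, GRU with 64 units, no extra dense layers, batch size 256, MAE loss). The feature count, optimizer, and data pipeline are assumptions for illustration and were not specified in the log.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

WINDOW_SIZE = 24   # one day of hourly steps (best window found above)
N_FEATURES = 1     # assumption: univariate input; set to the real feature count
BATCH_SIZE = 256   # the larger batch that "actually looks decent"

def build_model(l2: float = 0.0) -> tf.keras.Model:
    """GRU(64) straight into a linear output, the winning setup from the log."""
    reg = regularizers.l2(l2) if l2 > 0 else None
    model = tf.keras.Sequential([
        layers.Input(shape=(WINDOW_SIZE, N_FEATURES)),
        layers.GRU(64, kernel_regularizer=reg),
        layers.Dense(1),  # extra hidden Dense layers were removed in the final steps
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(),  # assumption: Adam with default learning rate
        loss="mae",                            # loss equals MAE in the logged metrics
        metrics=["mae", "mse"],
    )
    return model

# Hypothetical windowing of a 1-D array `series` into (24-step window, next value) pairs:
# train_ds = tf.keras.utils.timeseries_dataset_from_array(
#     data=series[:-WINDOW_SIZE],
#     targets=series[WINDOW_SIZE:],
#     sequence_length=WINDOW_SIZE,
#     batch_size=BATCH_SIZE,
# )
# history = build_model().fit(train_ds, validation_data=val_ds, epochs=40)
```

The `l2` argument reproduces the last experiment: passing anything from 0.01 up to 1000 should, per the observations above, barely change the validation MAE.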