# First Attempt

- Without any preprocessing
- Parameters like these:

```bash
"task": "MS",
"lstm_layer": 1,
"dropout": 0.05,
"train_epochs": 1,
"batch_size": 64,
"patience": 3,
"lr": 1e-4,
"lr_adjust": "type1",
```

### Result

Score: -55.30953

---

# Problems

- Slow training: it takes about 6 hours to train all the turbines, and the evaluation set does not really reflect the final score
- Low accuracy: because the first attempt was tuned for speed only, we need to improve the accuracy

---

# 70% Faster … sort of

We split the training work into 4 partitions on 4 different Colab machines (manually). Accounting for the manual operation, it is roughly 70% faster now.

---

# Grid search

We used grid search to try every combination on turbines 0–10:

```python
lstm_layer = [2, 3, 4]
drop_out = [0, 0.05, 0.1, 0.3]
batch_size = [16, 32, 64]
lr = [1e-3, 1e-4, 1e-5]
```

We found that the baseline code's defaults yield the best result.

### Result

-42.3428 (big improvement)

---

# GRU → LSTM

- The mechanisms of LSTM and GRU are similar (GRU is more like a variant of LSTM)
- GRU has two gates while LSTM has three gates to control the flow of information in and out of the cell
- Since LSTM has more gates, we assumed LSTM would be more accurate
- We tested it without changing any parameters: the accuracy was about the same, but LSTM spent more time on training
- GRU is more efficient

---

# Increase Train size

Train : Test : Validation = 214 : 15 : 31 → 235 : 10 : 0

---

# Pre-processing Experiments

- Fill each unqualified data point with the average of the previous and next qualified data points from the same turbine
  - Result: -42.18417 (new best)
- Fill with the average of the data from the last normal day and the next normal day at the same time stamp
  - Result: -42.37965

---

# Add more features to Training

- We tried adding the time stamp and day as features, using the second preprocessing method
  - Result: -42.3459

---

# What we have learned

- Splitting the training data makes training faster than before; we now know how to deal with large-scale datasets
- An appropriate model is better than a cool model
- Pre-processing experiments are very important: "garbage in, garbage out"
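The "split across 4 Colab machines" step above was done by hand, but the turbine assignment itself can be sketched as a simple round-robin partition. This is a minimal sketch, not the project's actual code; `partition_turbines` and its signature are hypothetical:

```python
def partition_turbines(turbine_ids, n_machines=4):
    """Split the turbine list into n_machines roughly equal chunks,
    one chunk per Colab session (each run started manually)."""
    chunks = [[] for _ in range(n_machines)]
    for idx, tid in enumerate(turbine_ids):
        # Round-robin keeps chunk sizes within one turbine of each other
        chunks[idx % n_machines].append(tid)
    return chunks
```

With 10 turbines and 4 machines this yields chunks of sizes 3, 3, 2, 2, so no single machine becomes a bottleneck.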
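The grid search over turbines 0–10 can be sketched with `itertools.product`. The training/evaluation routine is stood in for by a hypothetical `train_and_score` callable (higher score is better, matching the leaderboard where -42 beats -55):

```python
import itertools

# Hyperparameter grid from the experiment above
lstm_layer = [2, 3, 4]
drop_out = [0, 0.05, 0.1, 0.3]
batch_size = [16, 32, 64]
lr = [1e-3, 1e-4, 1e-5]

def grid_search(train_and_score):
    """Try every hyperparameter combination; return (best_score, best_params).

    `train_and_score` is assumed to take a dict of hyperparameters and
    return a validation score where higher is better.
    """
    best_score, best_params = float("-inf"), None
    for layers, dropout, bs, learning_rate in itertools.product(
        lstm_layer, drop_out, batch_size, lr
    ):
        params = {
            "lstm_layer": layers,
            "dropout": dropout,
            "batch_size": bs,
            "lr": learning_rate,
        }
        score = train_and_score(params)
        if score > best_score:
            best_score, best_params = score, params
    return best_score, best_params
```

The grid has 3 × 4 × 3 × 3 = 108 combinations, which is why restricting the search to turbines 0–10 (rather than the full fleet) keeps it tractable.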
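The GRU-vs-LSTM efficiency gap noted above is visible from parameter counts alone: an LSTM cell has four weight blocks (three gates plus the candidate state) versus GRU's three (two gates plus the candidate), so LSTM carries roughly a third more parameters and compute per step. A back-of-the-envelope sketch (the helper and the example sizes are illustrative, not from the project):

```python
def rnn_param_count(input_size: int, hidden_size: int, n_blocks: int) -> int:
    """Parameters of one recurrent cell with `n_blocks` weight blocks.

    Each block has an input-to-hidden matrix, a hidden-to-hidden matrix,
    and one bias vector.
    """
    per_block = hidden_size * input_size + hidden_size * hidden_size + hidden_size
    return n_blocks * per_block

# LSTM: 3 gates + candidate = 4 blocks; GRU: 2 gates + candidate = 3 blocks
lstm_params = rnn_param_count(input_size=10, hidden_size=64, n_blocks=4)
gru_params = rnn_param_count(input_size=10, hidden_size=64, n_blocks=3)
```

The 4:3 ratio holds at any size, which matches the observation that LSTM trained slower without improving accuracy here.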
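The first preprocessing strategy (fill each unqualified reading with the average of the nearest qualified readings before and after it, within the same turbine's series) can be sketched as follows. The function name and the list-plus-flags representation are assumptions for illustration:

```python
def fill_unqualified(values, qualified):
    """Replace each unqualified reading with the mean of the nearest
    qualified readings before and after it in the same turbine's series.

    `values` is a list of floats; `qualified` is a parallel list of bools.
    At the edges of the series, fall back to the single available neighbor.
    """
    filled = list(values)
    for i, ok in enumerate(qualified):
        if ok:
            continue
        # Scan outward for the nearest qualified neighbor on each side
        prev_val = next((values[j] for j in range(i - 1, -1, -1) if qualified[j]), None)
        next_val = next((values[j] for j in range(i + 1, len(values)) if qualified[j]), None)
        if prev_val is not None and next_val is not None:
            filled[i] = (prev_val + next_val) / 2
        elif prev_val is not None:
            filled[i] = prev_val
        elif next_val is not None:
            filled[i] = next_val
    return filled
```

Note that a run of consecutive unqualified points all receive the same bracketing mean under this rule, which is one reason the day-based second strategy was worth trying even though it scored slightly worse.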