# LEDGAR/News20 results

## hyperparameter selection

The search covered the following grids:

- learning_rate: 0.1, 0.03, 0.01, 0.003, 0.001
- num_filter_per_size: 128, 256, 384, 512, 1024
- embed_dropout: 0, 0.2, 0.4
- encoder_dropout: 0, 0.2, 0.4

In the sample config below, only the dropout grids are written inline via `grid_search`; `learning_rate` and `num_filter_per_size` are shown at single values.

```
# data
training_file: data/news20/train.txt
val_file: data/news20/val.txt
test_file: data/news20/test.txt
data_name: news20
min_vocab_freq: 1
max_seq_length: 512
include_test_labels: false
remove_no_label_data: false
add_special_tokens: false

# train
seed: 1337
epochs: 200
batch_size: 16
optimizer: adam
learning_rate: 0.001
momentum: 0
weight_decay: 0
patience: 10
early_stopping_metric: RP@5
shuffle: true
# Please update the repo to the latest LibMultiLabel (needed for lr_scheduler)
lr_scheduler: ReduceLROnPlateau
scheduler_config:
  factor: 0.9
  patience: 9
  min_lr: 0.0001

# eval
eval_batch_size: 16
monitor_metrics: ['Macro-F1', 'Micro-F1']
val_metric: Micro-F1

# model
model_name: KimCNN
loss_function: cross_entropy
init_weight: kaiming_uniform
network_config:
  activation: relu
  embed_dropout: ['grid_search', [0, 0.2, 0.4]]
  encoder_dropout: ['grid_search', [0, 0.2, 0.4]]
  filter_sizes: [2, 4, 8]
  num_filter_per_size: 128

# pretrained vocab / embeddings
vocab_file: null
embed_file: glove.6B.300d
normalize_embed: false

# hyperparameter search
search_alg: basic_variant
embed_cache_dir: .vector_cache
num_samples: 1
scheduler: null
# Uncomment the following lines to enable the ASHAScheduler.
# See the documentation here: https://docs.ray.io/en/latest/tune/api_docs/schedulers.html#asha-tune-schedulers-ashascheduler
#scheduler:
#  time_attr: training_iteration
#  max_t: 50            # maximum epochs to run for each config (parameter R in the ASHA paper)
#  grace_period: 10     # minimum epochs to run for each config (parameter r in the ASHA paper)
#  reduction_factor: 3  # reduce the number of configurations by a factor of reduction_factor at each round of successive halving (a round is called a rung in the ASHA paper)
#  brackets: 1          # number of brackets. A smaller bracket index (parameter s in the ASHA paper) means earlier stopping (i.e., less total resources used)

# other parameters specified in main.py::get_args
checkpoint_path: null
cpu: false
data_workers: 4
eval: false
label_file: null
limit_train_batches: 1.0
limit_val_batches: 1.0
limit_test_batches: 1.0
metric_threshold: 0.5
result_dir: runs
save_k_predictions: 0
silent: true
val_size: 0.2
#lr_scheduler: null       # commented out: already set to ReduceLROnPlateau in the train section above
#scheduler_config: null   # commented out: already set above
```
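The `['grid_search', ...]` entries appear to follow Ray Tune's grid-search syntax, and `search_alg: basic_variant` simply enumerates the Cartesian product of the listed values. As a rough illustration (plain Python, not LibMultiLabel's actual search code), crossing all four grids from the selection list above yields 225 candidate configurations per optimizer; the tables below show one row per `num_filter_per_size` value.

```python
# Minimal sketch of how a basic (non-adaptive) search expands the grids
# listed in the hyperparameter selection section. Not LibMultiLabel code;
# the grid values are copied from the selection list above.
import itertools

search_space = {
    "learning_rate": [0.1, 0.03, 0.01, 0.003, 0.001],
    "num_filter_per_size": [128, 256, 384, 512, 1024],
    "embed_dropout": [0, 0.2, 0.4],
    "encoder_dropout": [0, 0.2, 0.4],
}

# Cartesian product: 5 * 5 * 3 * 3 = 225 configurations per optimizer.
trials = [dict(zip(search_space, combo))
          for combo in itertools.product(*search_space.values())]
print(len(trials))  # 225
```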
## news20 results

| Model + Optimizer/Loss | test_Macro-F1 | test_Micro-F1 |
| ----------------------- | ---------------- | --------------- |
| **KimCNN + SGD/MSE loss** | 0.7962 | 0.8035 |
| **KimCNN + Adam/Cross Entropy loss** | **0.8381** | **0.8435** |
| **BERT (tuned) + Adam/Cross Entropy loss** | **0.849** | **0.856** |
| **Linear** | 0.846 | 0.853 |

### hyperparameter search

| optimizer | learning_rate | num_filter_per_size | embed_dropout | encoder_dropout | test_Macro-F1 | test_Micro-F1 |
| - | - | - | - | - | - | - |
| **adam** | 0.001 | 128 | 0.2 | 0 | 0.8253 | 0.8286 |
| **adam** | **0.001** | **256** | **0** | **0** | **0.8381** | **0.8435** |
| **adam** | 0.001 | 384 | 0.4 | 0.2 | 0.8142 | 0.8214 |
| **adam** | 0.001 | 512 | 0 | 0 | 0.8334 | 0.8386 |
| **adam** | 0.001 | 1024 | 0.2 | 0.2 | 0.7931 | 0.7955 |
| **sgd** | 0.001 | 128 | 0.2 | 0 | 0.7860 | 0.7917 |
| **sgd** | 0.001 | 256 | 0.4 | 0 | 0.7913 | 0.7950 |
| **sgd** | 0.001 | 384 | 0 | 0.4 | 0.7908 | 0.7973 |
| **sgd** | 0.001 | 512 | 0.2 | 0 | 0.7912 | 0.7963 |
| **sgd** | **0.001** | **1024** | **0.2** | **0** | **0.7962** | **0.8035** |

## LEDGAR results

| Model + Optimizer/Loss | Macro-F1 | Micro-F1 |
| ----------------------- | ---------------- | --------------- |
| **KimCNN + SGD/MSE loss** | **0.8128** | **0.8702** |
| **KimCNN + Adam/Cross Entropy loss** | 0.7705 | 0.8409 |
| **BERT (tuned) + Adam/Cross Entropy loss** | **0.807** | **0.870** |
| **Linear** | 0.800 | 0.864 |

### hyperparameter search

| optimizer | learning_rate | num_filter_per_size | embed_dropout | encoder_dropout | test_Macro-F1 | test_Micro-F1 |
| - | - | - | - | - | - | - |
| **adam** | **0.001** | **128** | **0.2** | **0.2** | **0.7705** | **0.8409** |
| **adam** | 0.001 | 256 | 0.2 | 0.4 | 0.7550 | 0.8317 |
| **adam** | 0.001 | 384 | 0.4 | 0 | 0.7563 | 0.8314 |
| **adam** | 0.001 | 512 | 0.2 | 0 | 0.7561 | 0.8323 |
| **adam** | 0.001 | 1024 | 0.4 | 0 | 0.7656 | 0.8361 |
| **sgd** | 0.001 | 128 | 0 | 0.2 | 0.7788 | 0.8572 |
| **sgd** | 0.001 | 256 | 0 | 0.2 | 0.7966 | 0.8615 |
| **sgd** | 0.001 | 384 | 0 | 0 | 0.8078 | 0.8677 |
| **sgd** | 0.001 | 512 | 0 | 0.2 | 0.7994 | 0.8650 |
| **sgd** | **0.001** | **1024** | **0** | **0** | **0.8128** | **0.8702** |
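For reference on what `monitor_metrics: ['Macro-F1', 'Micro-F1']` reports: Macro-F1 averages per-label F1 scores with equal weight, while Micro-F1 pools true/false positives and negatives across all labels, so frequent labels dominate it. That is consistent with the larger Macro/Micro gap in the LEDGAR tables above compared with news20. A minimal sketch using scikit-learn (an illustration with hypothetical labels, not LibMultiLabel's own evaluation code):

```python
# Sketch of the two averaging modes reported in the tables above,
# using scikit-learn; y_true and y_pred are hypothetical labels.
from sklearn.metrics import f1_score

y_true = [0, 1, 2, 2, 1, 0]  # hypothetical gold labels
y_pred = [0, 1, 2, 0, 1, 1]  # hypothetical predictions

print("Macro-F1:", f1_score(y_true, y_pred, average="macro"))  # mean of per-class F1
print("Micro-F1:", f1_score(y_true, y_pred, average="micro"))  # F1 over pooled counts
```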