# Master Thesis: To-Do's

* [**GitHub, visrepProb**](https://github.com/desastrae/visrepProb)
* [**Results, plotted**](https://hackmd.io/@600DW6b8QBqBb1Gv8lZYwg/r1n2R5Yo5)
* [**OneDrive, dataset**](https://unisaarlandde-my.sharepoint.com/:f:/g/personal/s8anbori_uni-saarland_de/Eqvk-BGL4sxAjsp700yG4kYBzvakxaZd__LqI4bwrCfdww?e=CwGHmd)
* [**OneDrive, PowerPoints**](https://unisaarlandde-my.sharepoint.com/:f:/g/personal/s8anbori_uni-saarland_de/EkMIK5b2tUpMk70ivsOKF08B2BO75t_MMm9jRv6ssW8n3A?e=QSxNwA)

## date

- [ ] 2023-01-26
- [ ] 2022-12-15
- [ ] 2022-12-01
- [ ] 2022-10-20 (Proposal/Summary)
- [x] 2022-10-06
- [ ] 2022-09-29
- [x] 2022-09-22
- [x] 2022-09-01
- [x] 2022-08-25
- [ ] 2022-08-18 (later)
- [ ] 2022-08-11 (later)
- [ ] 2022-08-04 (later)
- [ ] 2022-07-28 (literature)
- [ ] 2022-07-21 (literature)
- [x] 2022-07-07
- [x] 2022-06-30
- [x] 2022-06-16
- [ ] 2022-06-09 (later)
- [x] 2022-06-02
- [x] 2022-05-30
- [x] 2022-05-05 + 05-12
- [x] 2022-04-28
- [x] 2022-04-21

## 2023-01-26

- [x] **PLOTS Noise**
  - noise types: stay with cam, l33t, swap
  - [ ] fix error for subjnum **(WIP)**
  - [ ] create plots for bishift & tense
  - [ ] create a step-by-step doc
- [ ] **BLEU Score**
  - [ ] evaluate the translations with the BLEU score
  - [ ] find all scripts which are necessary
    - [hackmd; BLEU score](https://hackmd.io/avL3nS1uRJiZQrUBiI0Zow)
  - [ ] data: raw; noise: 10, 20, 40, 80 % (later in 10 % steps?)
- [ ] **PLOTS generally**
  - [ ] crowded/messy, clean up!
- [ ] **Proposal**
  - [ ] *Related Work*: turn into *Background & Related Work*
    - describe other works briefly in your own words, not too detailed
    - focus on essential knowledge

## ...

...

## 2022-12-15

[Structure MA Proposal](https://unisaarlandde-my.sharepoint.com/:w:/g/personal/s8anbori_uni-saarland_de/EUmwt2_ZgNtOt4mPQGDDko4Bbzow4CglOd7Y6SfLtbsr6A?e=YVuvbr)

## 2022-12-01

- [x] **Noise**
  - [x] PLOTS
    - [x] obtain missing plots
    - [x] create line plots for the difference between clean and noisy data
- [ ] **PROPOSAL**
  - [ ] **write proposal/summary!!**

## 2022-10-20

- [x] **Evaluation**
  - [x] check keys in the SUBJ task for bugs
  - [x] create new plots which are more visually pleasing
- [x] **Noise**
  - [x] create a Python script for noise
    - use the script from [charNMT-noise](https://github.com/ybisk/charNMT-noise/blob/master/scrambler.py) as inspiration
  - [x] apply noise to the test set
    - [x] **percentage:** 10 %, 20 %, 40 %, 80 %
    - [x] **type:** swap, cam
  - [x] create encodings
  - [x] run the new noise data through the classifier
- [ ] **Written Summary (*Sentence-Level Evaluation*)**
  - [ ] create a summary of the work & experiments so far; it should be half a page long
    - Intro
    - Experiments
    - Results
    - Analysis

## 2022-10-06

- [x] **Evaluation**
  - [x] Plots
    - [x] change the order of the subplots
    - [x] add the **distribution** of the evaluation set over the classifier train and test sets to the **subplot titles**

## 2022-09-29

- [x] **Noisy Data**
  - [x] ~~create noisy data: 10%, 20%, 40%, 80%~~
  - [x] noise type (*swap, cam, real, "key"?*):
    - first one type of noise
    - if time is available, try more noise types
  - [x] understand the *"key"* noise
  - [x] evaluate the trained classifier on noise data
    - [x] noise type: **swap**
  - [x] plots:
    - [x] create one plot? create four plots?
  - [x] **Pipeline Noise** (see the sketch below):
    1. ***Create noisy test data***
       - [x] adapt *scrambler.py* -> *scrambler_changed.py*
         - handle *.npy* files
    2. Is the amount of noise applied per sentence or per dataset?
    3. Create encodings
    4. Load & classify encodings
    5. Save the results for each noise amount in one CSV file
    6. Create plot(s)
- [x] **Classifier**
  - [x] create pie charts for the OBJ & SUBJ task (train and test data separately)
    - [x] o-Plural
    - [x] o-Singular
    - [x] s-Plural
    - [x] s-Singular
  - [x] spaCy
    - [x] identify errors in the test sentences
- [x] **General**
  - [x] remove the percentage label on the bar charts
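A minimal sketch of the **swap** noise type from the pipeline above (illustrative only, not the actual *scrambler_changed.py*; `apply_swap_noise` and `noise_frac` are made-up names). Like the charNMT-noise scrambler, it keeps the first and last character of each word in place and swaps one internal adjacent pair:

```python
import random

def swap_word(word: str, rng: random.Random) -> str:
    """Swap two adjacent interior characters (charNMT-noise-style 'swap')."""
    if len(word) < 4:                        # too short to swap without touching first/last char
        return word
    i = rng.randrange(1, len(word) - 2)      # keep the first and last character fixed
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def apply_swap_noise(sentence: str, noise_frac: float, seed: int = 42) -> str:
    """Apply swap noise to roughly `noise_frac` of the words in one sentence."""
    rng = random.Random(seed)
    words = sentence.split()
    n_noisy = round(noise_frac * len(words))
    for idx in rng.sample(range(len(words)), min(n_noisy, len(words))):
        words[idx] = swap_word(words[idx], rng)
    return " ".join(words)

if __name__ == "__main__":
    sent = "Die Katze frisst die Mäuse"
    for frac in (0.1, 0.2, 0.4, 0.8):        # the noise percentages used above
        print(frac, apply_swap_noise(sent, frac))
```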
## 2022-09-22

- [x] **Noisy Data**
  - [x] get informed about the available resources
    - Paper: [Synthetic and Natural Noise Both Break Neural Machine Translation](https://arxiv.org/abs/1711.02173)
    - Code: [github, charNMT-noise](https://github.com/ybisk/charNMT-noise)
  - [x] Try-Out
  - [x] How much noise do we want in the train and test set?
  - [x] noise in specific words:
    - [x] SUBJ-/OBJ_NUMBER: induce noise in the subject/object?
- [x] **Classifier**
  - [x] create plots for all 4 scenarios for the SUBJ & OBJ task
  - [x] plot the old results behind the new results

## 2022-09-01

- rediscussed the next steps and how best to approach them (pipeline, noise, ...)

## 2022-08-25

- [x] **Classifier**
  - [x] save & load the model in code
    - [x] for model size and layer (**new test set**)
  - [x] test with 20 sentences (**results should be random**):
    - ~~10 sentences with label *Sing*, 10 sentences with label *Plur*~~
    - 20 sentences of each case for the SUBJ & OBJ task
      - [x] sb - *Sing*, o - *Plur*
      - [x] sb - *Plur*, o - *Sing*
      - [x] sb - *Sing*, o - *Sing*
      - [x] sb - *Plur*, o - *Plur*
- [x] **Noisy Data**
  - [x] ..
- [x] **"Pipeline"** (for each layer; see the sketch below)
  1. Load the trained probe classifier
  2. Load the CSV with the test sentences
  3. Read in the raw sentences for testing & create their encodings / embeddings
  4. Feed the encodings to the probe classifier
  5. Save the classifier results to a **CSV file**
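A minimal sketch of the five pipeline steps above, assuming the probe was saved with `joblib` and the test-set encodings were precomputed with the VisRep encoder and stored as *.npy* (all file names here are hypothetical):

```python
import joblib
import numpy as np
import pandas as pd

# 1. Load the trained probe classifier (assumes it was saved with joblib.dump).
clf = joblib.load("probe_subj_layer6.joblib")

# 2. Load the CSV with the test sentences (a 'label' column is assumed).
df = pd.read_csv("test_sentences.csv")

# 3. Load the encodings created for these sentences
#    (precomputed with the encoder, one row per sentence).
X = np.load("test_encodings_layer6.npy")     # shape: (n_sentences, embed_dim)

# 4. Feed the encodings to the probe classifier.
preds = clf.predict(X)

# 5. Save the classifier results to a CSV file.
df.assign(prediction=preds, correct=preds == df["label"]).to_csv(
    "probe_results_layer6.csv", index=False
)
```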
## 2022-08-18

- [x] **Classifier**
  - [x] (**later**) create all encodings for all layers; make a directory for every class to later draw a distribution

## 2022-08-11

- [x] **Classifier**
  - [x] create a balanced df with pandas (no sorted data needed)
    - [x] use the example given by Marius (**don't stratify!**)
  - [x] **!! separate train and test data !!**
- [x] **New Probing Tasks** (later)
  - [Ideas](https://hackmd.io/@600DW6b8QBqBb1Gv8lZYwg/SkST7rGAc)
- [x] **Evaluate Probing Tasks Data**
  - e.g. *"Die Katze[sing.] frisst die Mäuse[plur.]"* ("The cat[sing.] eats the mice[plur.]")
  - [x] ~~**Control..**~~
    - ~~Is *"Katze"* in the train **and** the test set?~~
    - ~~Is *"Katze"* used as subject **and** object in the dataset?~~
    - ~~Is *"Katze"* used as singular **and** plural in the dataset?~~
  - [x] use spaCy
  - [x] plot ~~4 bars~~ a pie chart:

    |          | bar 1 | bar 2 | bar 3 | bar 4 |
    | -------- | ----- | ----- | ----- | ----- |
    | **SUBJ** | sing  | sing  | plu   | plu   |
    | **OBJ**  | sing  | plu   | sing  | plu   |
- [x] **Noisy Data**
  - [x] focus on character permutation (swap & cam)

    |           | set 1 | set 2 | set 3 | set 4 |
    | --------- | ----- | ----- | ----- | ----- |
    | **Train** | clean | clean | noisy | noisy |
    | **Test**  | clean | noisy | clean | noisy |
- [x] **Next Steps**
  - [x] 1) fix the **Classifier**
  - [x] 2) **Evaluate Probing Tasks Data**
  - [x] 3) create **Noisy Data**

## 2022-08-04

- [x] **Classifier**
  - [x] create a balanced df with pandas (no sorted data needed)
    - [x] use the example given by Badr
  - [x] use balanced data in the training and test set
    - **sklearn: stratify=labels_shuffled**
      `train_test_split(features_shuffled, labels_shuffled, random_state=42, stratify=labels_shuffled)`
- [ ] **Probing tasks**
  - [x] **!!** think of more tasks, but specifically for **German**
    - [Deutsch_im_Sprachvergleich_2012](https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=&cad=rja&uact=8&ved=2ahUKEwjH--Lbu7r5AhUFhP0HHRqFBIMQFnoECBQQAQ&url=https%3A%2F%2Fids-pub.bsz-bw.de%2Ffiles%2F4925%2FIDS_Jahrbuch_2011_Deutsch_im_Sprachvergleich_2012.pdf&usg=AOvVaw23VTxw6HnkOlj9Xpb0FSU9)
    - [German NLP](https://github.com/adbar/German-NLP)
- [ ] **Language families** (later)
  - test the robustness of the models on the probing tasks with a different language of the same family
    - **e.g.** fr-en translation model: **train** the classifier on French encodings, **test** on Spanish encodings

## 2022-07-28

- [x] **Classifier // Probing tasks**
  - [x] Draw a balanced distribution for the probing tasks? We only consider 10k out of 120k sentences.
- [ ] **!! Literature**
  - [ ] **create a tabular overview and read!!**
- [x] ~~**Probing**~~
  - [x] ~~think about specific tasks for German, with an example~~

## 2022-07-21

- [x] ~~**Classifier**~~
  - [x] ~~text model~~
    - ~~not able to extract layer_norm~~
    - **there is no normalization layer in the text model**
- [x] **Clean up dataset**
  - [x] Subj_number: the dataset makes trouble, fix this
    - [x] remove problematic sentences
      - line 547: *" Ein Lied geht um die Welt ( Die Joseph-Schmidt-Story ) Sing*
      - line 6751: *" , 1957 ) und Michel Legrand tätig war . Sing*
      - **found the problem: lines starting with > " <**
    - [x] use preprocessing in Python to identify and remove the whitespace
    - [x] rewrite the sentences into a new csv/tsv file, encoded in UTF-8
- [x] **Probing tasks**
  - [x] compute the performance of a majority classifier (see the sketch below)
    - see https://scikit-learn.org/stable/modules/generated/sklearn.dummy.DummyClassifier.html
  - [x] look at other probing tasks in the literature
  - [x] probing tasks for the German model (subj gender prediction? obj gender prediction? obj case prediction (dative vs. accusative)?)
- [ ] **Situate your research within the literature**
  - [ ] How does the performance compare to other models for each task?
  - [ ] Are there differences?
  - [ ] How do you explain the differences between the visual vs. the text encoder?
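A minimal sketch of the majority baseline with `sklearn.dummy.DummyClassifier`, run on synthetic stand-in data (`X` would really be the layer encodings and `y` the probing labels):

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-in data: random "encodings" and deliberately imbalanced labels.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 512))
y = rng.choice(["Sing", "Plur"], size=1000, p=[0.7, 0.3])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=42, stratify=y
)

# Majority baseline: always predicts the most frequent label of the train set.
majority = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
print("majority baseline:", majority.score(X_test, y_test))   # ~0.7 here

# The actual probe, for comparison (on random features it should not
# beat the majority baseline).
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("probe accuracy:   ", probe.score(X_test, y_test))
```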
## 2022-07-07

- [x] **Classifier**
  - linear-regression plot for the task past_pres, visual model, 118,840 sentences
    ![](https://i.imgur.com/NFGRvFc.png)
  - [x] check if there is a bug while creating the average over the sentence tokens (see the sketch below)
    - this seems to be fine, I was just confused during the meeting
    - maybe the collection of tensors in the dict was broken (fixed)
  - [x] **sanity check for the encodings**
    - [x] get the first token for all sentences & layers => classify (on-going...)
  - [x] try out 3 other probing tasks with the encodings
    - classify on **10k** and **1k** sentences, since ~120k might be too much
    - [x] **past_pres** (see the plots [here](https://hackmd.io/BVxsI6PTR7yjukkHwcZh7Q))
      - [x] classify **10k** sentences
        - [x] visual
        - [x] text
        - [x] first token
      - [x] classify **1k** sentences
        - [x] visual
        - [x] text
        - [x] first token
    - [x] **obj_num**
      - [x] classify **10k** sentences
        - [x] visual
        - [x] text
        - [x] first token
      - [x] classify **1k** sentences
        - [x] visual
        - [x] text
        - [x] first token
    - [x] **subj_num**
      - [x] classify **10k** sentences
        - [x] visual
        - [x] text
        - [x] first token
      - [x] classify **1k** sentences
        - [x] visual
        - [x] text
        - [x] first token
    - [x] **bigram_shift**
      - [x] classify **10k** sentences
        - [x] visual
        - [x] text
        - [x] first token
      - [x] classify **1k** sentences
        - [x] visual
        - [x] text
        - [x] first token
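The token-averaging step from 06-16/07-07, as a minimal sketch (the `(T, C)` input shape assumes the batch dimension of fairseq's `T x B x C` tensors has already been indexed away):

```python
import numpy as np

def sentence_embedding(enc: np.ndarray) -> np.ndarray:
    """Average over the token axis to get one fixed-size sentence vector.

    enc: encoder output for one sentence and one layer, shape (T, C)
    (T tokens, C embedding dimensions).
    """
    return enc.mean(axis=0)                      # shape (C,)

# Toy example: 8 tokens, 512 dimensions.
enc = np.random.default_rng(0).normal(size=(8, 512))
print(sentence_embedding(enc).shape)             # (512,)
```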
## 2022-06-30

- [x] **Encodings**
  - [x] get the hidden representations
    - understood where to obtain x from all layers
    - **obtained encodings from all layers**
    - understood the difference between `enc_outs` and `new_outs`:
      - `enc_outs`: `[[1,1], [2,2], [3,3], [4,4]]`
      - `new_outs`: `[[[1,1],[1,1],[1,1],[1,1],[1,1]], [[2,2],[2,2],[2,2],[2,2],[2,2]], [[3,3],[3,3],[3,3],[3,3],[3,3]], [[4,4],[4,4],[4,4],[4,4],[4,4]]]`
    - [x] use `enc_outs` or `new_outs` for the classifier?
      - use `enc_outs`
  - [x] collect encodings from all layers and classify by layer
    - [x] collect encodings from the visual model
    - [x] collect encodings from the text model
  - [x] **talk/ask about self.layer_norm**
    - debugger extractions: [extracted encodings in-between layers](https://hackmd.io/1XVDgiftS0GwHBmrOHYRug)
    - **save as an extra layer, e.g. 'l6_ln'**
- [x] **Classifier**
  - [x] classify the encodings
    - [x] from the visual model
    - [x] from the text model
  - [x] map the results from the classified encodings
    - [x] from the visual model
    - [x] from the text model

## 2022-06-16

- [x] **Probing**
  - [x] Squib paper
  - [x] watch YouTube videos ([Probing Classifiers: A Gentle Intro](https://www.youtube.com/watch?v=HJn-OTNLnoE) and [Inspecting Neural Networks with CCA](https://www.youtube.com/watch?v=u7Dvb_a1D-0))
  - [x] try out **sentence probing**: Task 9) tense prediction
    - take the average of all token vectors for one layer
- [x] **Encodings**
  - [x] ~~get hidden representation:~~ should allow getting the encodings from **all** layers, not just the last
    - set `return_all_hiddens=False` to `True`
      - didn't work
- [x] **Classifier**
  - [x] **example [notebook](https://colab.research.google.com/github/jalammar/jalammar.github.io/blob/master/notebooks/bert/A_Visual_Notebook_to_Using_BERT_for_the_First_Time.ipynb)**
    - [x] understand the notebook
    - [x] make it run
    - [x] apply its ideas to the encodings from VisRep

## 2022-06-09

- [ ] (**Later:** get mappings from visual / text-based encodings back to words)
  - [ ] visual:
    - [ ] make statistics for the character pictures
    - [ ] identify spaces in the pictures (heuristic for whitespace)
    - [ ] print the np.matrix; it will make sense when you look at it
  - [ ] text:
    - [ ] extract the tokens of a sentence
      - are there special tokens, like beginning / end of sentence?
- [x] **SentEval (Baseline Ideas)**
  - [x] paper
  - [x] ~~the tasks from SentEval are set up for EN; think about adaptations/extensions for DE (e.g. the gender system)~~
- [x] **~~XProbe~~ / Probing**
  - [x] XProbe paper
  - [x] get encodings for the German data (try 10 sentences)
  - [x] ~~set up XProbe~~
  - [x] ~~try out **sentence probing**: Task 9) tense prediction (see 06-16)~~
- [x] **Encodings**
  - [x] get & save the encodings (as np tensors) **from every layer**
  - [x] research what **T x B x C** means (e.g. *"My name is Anastasia"*: [8, 5, 512])
    - **T**: 8, tokens
    - **B**: 5, layers?!? (in the fairseq convention, T x B x C is tokens x batch x channels; cf. 2022-04-28, where B = sentences)
    - **C**: 512, embed. dim.
  - [x] save the `enc_outs` representations (see the sketch below):
    - [x] save each sentence as an np tensor in a separate file
- [x] **Classifier**
  - [x] use sklearn
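A minimal sketch of saving each sentence's encoder output as its own *.npy* file (the tensors here are random stand-ins for the collected `enc_outs`; file and directory names are made up):

```python
from pathlib import Path

import numpy as np
import torch

# Stand-in for the collected encoder outputs: one (T, B, C) tensor per sentence
# (T tokens, B batch size, C embedding dim in the fairseq convention).
enc_outs = [torch.randn(8, 1, 512), torch.randn(12, 1, 512)]

out_dir = Path("encodings")
out_dir.mkdir(parents=True, exist_ok=True)

for i, enc in enumerate(enc_outs):
    # Drop the batch dimension (one sentence at a time) and save as .npy.
    np.save(out_dir / f"sent_{i:06d}.npy", enc.squeeze(1).cpu().numpy())

# Load a single sentence back, e.g. for the probing classifier:
arr = np.load(out_dir / "sent_000000.npy")   # shape (T, 512)
```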
## 2022-06-02

- [x] **mail to Liz**
  - [x] compose an email about
    1. the text-baseline models
    2. the mapping of encodings
  - [x] send the draft to Badr & Marius
  - [x] send it to Liz
- [x] try out the solutions from Liz's email
  - [x] make the baseline text models run
  - [x] run Liz's visualisations notebook
    - [x] adapt the notebook to your needs
- [x] **SentEval**
  - [x] read the paper
  - [x] questions?
  - [x] research a German SentEval
    - not existent
- [x] prepare a **document/presentation** with the relevant info / things to talk about

## 2022-05-30

- [x] align the vector encoding with the text fragment
  - [x] to-do: image vector back to text
- [x] make the text-based representation models run
  - [x] where is the bpecodes file for each text model?
    - [x] the pre-trained model from fairseq works (contains a bpecodes file)
- [x] ~~save the `enc_outs` representations:~~
  - [x] ~~save in HDF5~~
    - [x] ~~how to use that? which design?~~
    - [x] ~~(think about sorting)~~

## 2022-05-05 + 05-12

- [x] find `enc_outs`
  - [x] add print statements
- [x] ~~save the `enc_outs` representations:~~
  - [x] convert to numpy
- [x] send the code/representations to Marius
- [x] write a mail to Marius about the meeting on Thursday

## 2022-04-28

- [x] get the pre-trained text-based representation models
- [x] find some collection points for the representations from the last layer
  - [x] **Encoder**
    - [x] get the representation for one or two sentences
      - ~~use pt.save~~
  - [x] **Decoder**
    - *models/transformer.py*: save encoder_out, encoder_states in numpy
    - **script:** takes a set of sentences and saves their embeddings
- [x] ~~research what **T x B x C** means (e.g. [27, 3, 512])~~
  - ~~**T**: 27, tokens~~
  - ~~**B**: 3, sentences~~
  - ~~**C**: 512, embed. dim.~~

## 2022-04-21

- [x] find the text-based fairseq model from the paper
  - not found / not provided
- [x] **email address of Liz!**
  - [x] write an email to Liz
- [x] get the transformations/tensors for all layers (same in encoder and decoder)
  - Is this the right place to look for the tensors?
    - *fairseq/models/fairseq_encoder.py -> def upgrade_state_dict_named(self, state_dict, name):*
    - *fairseq/modules/transformer_layer.py -> Class TransformerEncoder / -Decoder*
  - modify the code for the representations (reverse engineer: extract the representations; an alternative hook-based sketch is at the end of this document):
    - *fairseq/modules/transformer_layer.py -> Class TransformerEncoder // def forward*
  - [x] understand how to modify the transformer encoder and decoder
  - [x] we care about the **last** layer ([Layers of VisRep](https://hackmd.io/fDP_MkM3Qt-f0cvRLxsPbA))
  - [x] ~~write an own encoder~~
    - [x] i. ~~init with existing weights~~
    - [x] ii. ~~look up the encoder: *fairseq/modules/transformer_layer.py -> Class TransformerEncoder*~~
  - [x] input individual sentences and get the representations

## 2022-03-14

### papers to read

- [x] [**Understanding intermediate layers using linear classifier probes**](https://arxiv.org/abs/1610.01644)
  - [x] read
  - [x] summary
- [x] [**Evaluating Layers of Representation in Neural Machine Translation on Part-of-Speech and Semantic Tagging Tasks**](https://aclanthology.org/I17-1001)
  - [x] read
  - [x] summary
- [x] [**Under the Hood: Using Diagnostic Classifiers to Investigate and Improve how Language Models Track Agreement Information**](https://arxiv.org/abs/1808.08079)
  - [x] read
  - [x] summary
  - [x] read again
- [ ] [**What do you learn from context? Probing for sentence structure in contextualized word representations**](https://arxiv.org/abs/1905.06316)
  - [x] read
  - [ ] summary
- [ ] [**What you can cram into a single vector: Probing sentence embeddings for linguistic properties**](https://arxiv.org/abs/1805.01070)
  - [x] read
  - [ ] summary
- [ ] [**Probing Multilingual Sentence Representations With X-PROBE**](https://arxiv.org/pdf/1906.05061.pdf)
  - [x] read
  - [ ] summary
- [ ] [**Squib: Probing Classifiers: Promises, Shortcomings, and Advances**](https://arxiv.org/pdf/2102.12452.pdf)
  - [x] read
  - [ ] summary

### set-up

- [x] try out setting up fairseq
- [x] questions
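The 2022-04-21 notes modify *fairseq/modules/transformer_layer.py* directly; an alternative that leaves the fairseq source untouched is a PyTorch forward hook on every encoder layer. A minimal sketch, demonstrated on a plain `torch.nn.TransformerEncoder` as a stand-in (for VisRep, the hooks would be registered on `model.encoder.layers` of the loaded fairseq model instead):

```python
import torch
import torch.nn as nn

# Stand-in for the fairseq encoder (6 layers, d_model=512, inputs are T x B x C).
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=512, nhead=8), num_layers=6
)

captured = {}  # layer index -> output tensor

def make_hook(layer_idx):
    def hook(module, inputs, output):
        # Store the layer output without keeping it in the autograd graph.
        captured[layer_idx] = output.detach().cpu()
    return hook

handles = [
    layer.register_forward_hook(make_hook(i))
    for i, layer in enumerate(encoder.layers)
]

with torch.no_grad():
    encoder(torch.randn(8, 1, 512))   # (T, B, C): 8 tokens, batch 1, dim 512

for h in handles:                     # remove the hooks once collected
    h.remove()

print({i: t.shape for i, t in captured.items()})
```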