# Master Thesis: To-Do's
* [**GitHub, visrepProb**](https://github.com/desastrae/visrepProb)
* [**Results, plotted**](https://hackmd.io/@600DW6b8QBqBb1Gv8lZYwg/r1n2R5Yo5)
* [**OneDrive, dataset**](https://unisaarlandde-my.sharepoint.com/:f:/g/personal/s8anbori_uni-saarland_de/Eqvk-BGL4sxAjsp700yG4kYBzvakxaZd__LqI4bwrCfdww?e=CwGHmd)
* [**OneDrive, Powerpoints**](https://unisaarlandde-my.sharepoint.com/:f:/g/personal/s8anbori_uni-saarland_de/EkMIK5b2tUpMk70ivsOKF08B2BO75t_MMm9jRv6ssW8n3A?e=QSxNwA)
## Dates
- [ ] 2023-01-26
- [ ] 2022-12-15
- [ ] 2022-12-01
- [ ] 2022-10-20 (Proposal/Summary)
- [x] 2022-10-06
- [ ] 2022-09-29
- [x] 2022-09-22
- [x] 2022-09-01
- [x] 2022-08-25
- [ ] 2022-08-18 (later)
- [ ] 2022-08-11 (later)
- [ ] 2022-08-04 (later)
- [ ] 2022-07-28 (literature)
- [ ] 2022-07-21 (literature)
- [x] 2022-07-07
- [x] 2022-06-30
- [x] 2022-06-16
- [ ] 2022-06-09 (later)
- [x] 2022-06-02
- [x] 2022-05-30
- [x] 2022-05-05 + 05-12
- [x] 2022-04-28
- [x] 2022-04-21
## 2023-01-26
- [x] **PLOTS Noise**
- noise-types: stay with cam, l33t, swap
- [ ] fix error for subjnum **(WIP)**
- [ ] create plots for bishift & tense
- [ ] Create step-by-step doc
- [ ] **BLEU Score**
    - [ ] Evaluate translations with BLEU score (see the sacrebleu sketch at the end of this section)
- [ ] find all scripts which are necessary
- [hackmd; BLEU score](https://hackmd.io/avL3nS1uRJiZQrUBiI0Zow)
- [ ] data: raw, Noise: 10, 20, 40, 80 % (Later in 10% steps?)
- [ ] **PLOTS generally**
- [ ] crowded/messy, clean up!
- [ ] **Proposal**
- [ ] *Related Work*: Make *Background & Related Work*
        - write briefly in your own words, not too detailed; focus on the essential knowledge
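
Sketch of the BLEU step with [sacrebleu](https://github.com/mjpost/sacrebleu), a stand-in until the actual scripts are found; the file names are hypothetical, one detokenized sentence per line:

```python
# Hedged sketch: corpus-level BLEU with sacrebleu. File names are
# hypothetical placeholders for system output and references.
import sacrebleu

with open("hyp.noise-swap-10.en") as f:   # translations of the noisy test set
    hyps = [line.strip() for line in f]
with open("ref.test.en") as f:            # clean reference translations
    refs = [line.strip() for line in f]

bleu = sacrebleu.corpus_bleu(hyps, [refs])  # second arg: list of ref streams
print(f"BLEU = {bleu.score:.2f}")
```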
## ...
...
## 2022-12-15
[Structure MA Proposal](https://unisaarlandde-my.sharepoint.com/:w:/g/personal/s8anbori_uni-saarland_de/EUmwt2_ZgNtOt4mPQGDDko4Bbzow4CglOd7Y6SfLtbsr6A?e=YVuvbr)
## 2022-12-01
- [x] **Noise**
- [x] PLOTS
- [x] obtain missing plots
- [x] create line-plots for difference between clean and noisy data
- [ ] **PROPOSAL**
- [ ] **write proposal/summary !!**
## 2022-10-20
- [x] **Evaluation**
- [x] check keys in SUBJ task for bugs
    - [x] create new plots that are more visually pleasing
- [x] **Noise**
    - [x] create python script for noise (see the noise sketch at the end of this section)
- use script by [charNMT-noise](https://github.com/ybisk/charNMT-noise/blob/master/scrambler.py) as inspiration
- [x] apply noise to test set
- [x] **percentage:** 10%, 20%, 40%, 80%
- [x] **type:** swap, cam
- [x] create encodings
- [x] run new noise data on classifier
- [ ] **Written Summary (*Sentence-Level Evaluation*)**
    - [ ] create a summary of the work & experiments so far, about half a page long
- Intro
- Experiments
- Results
- Analysis
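
Sketch of the two word-level noise types as I read them from the charNMT-noise scrambler; the actual noise script may differ in details (`cam` here shuffles inner characters, `swap` exchanges two adjacent inner ones):

```python
# Hedged sketch of two word-level noise types: "swap" exchanges two
# adjacent inner characters, "cam" shuffles all inner characters
# (first and last stay fixed). The actual script may differ in details.
import random

def swap(word):
    if len(word) < 4:
        return word
    i = random.randint(1, len(word) - 3)   # never touch first/last char
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def cam(word):
    if len(word) < 4:
        return word
    inner = list(word[1:-1])
    random.shuffle(inner)
    return word[0] + "".join(inner) + word[-1]

def add_noise(sentence, pct, noise_fn):
    """Perturb roughly `pct` percent of the words in `sentence`."""
    words = sentence.split()
    n = round(len(words) * pct / 100)
    for i in random.sample(range(len(words)), n):
        words[i] = noise_fn(words[i])
    return " ".join(words)

print(add_noise("Die Katze frisst die Mäuse", 40, swap))
```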
## 2022-10-06
- [x] **Evaluation**
- [x] Plots
- [x] change order of subplots
- [x] add **distribution** of evaluation set in classifier train- and test-set in **title of subplots**
## 2022-09-29
- [x] **Noisy Data**
- [x] ~~create noisy data: 10%, 20%, 40%, 80%~~
- [x] noise type (*swap, cam, real, "key"?*):
- first one type of noise
- if time available, try more noise types
- [x] understand *"key"* noise
- [x] evaluate trained classifier on noise data
- [x] noise type: **swap**
- [x] plots:
- [x] create one plot? create four plots?
- [x] **Pipeline Noise:**
1. ***Create noisy test data***
- [x] Adapt *scrambler.py* -> *scrambler_changed.py*
- handle *.npy* files
    2. Is the amount of noise applied per sentence or per dataset?
    3. Create encodings
    4. Load & classify encodings
    5. Save the results for each noise amount in one CSV file
    6. Create plot(s)
- [x] **Classifier**
    - [x] create pie-charts for OBJ & SUBJ task (train- and test-data separately; see the plotting sketch at the end of this section)
- [x] o-Plural
- [x] o-Singular
- [x] s-Plural
- [x] s-Singular
- [x] spaCy
- [x] identify errors in test sentences
- [x] **General**
- [x] remove percentage label on bar-charts
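
Sketch of one such pie chart in matplotlib; the counts below are made-up placeholders, the real numbers come from the spaCy analysis:

```python
# Hedged sketch: pie chart of the four SUBJ/OBJ number combinations.
# The counts are made-up placeholders, not real dataset statistics.
import matplotlib.pyplot as plt

counts = {"sb-Sing / o-Sing": 400, "sb-Sing / o-Plur": 250,
          "sb-Plur / o-Sing": 200, "sb-Plur / o-Plur": 150}
plt.pie(list(counts.values()), labels=list(counts.keys()), autopct="%1.1f%%")
plt.title("SUBJ/OBJ number combinations (train set)")
plt.savefig("pie_subj_obj_train.png", bbox_inches="tight")
```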
## 2022-09-22
- [x] **Noisy Data**
    - [x] research available resources
- Paper: [Synthetic and Natural Noise Both Break Neural Machine Translation](https://arxiv.org/abs/1711.02173)
- Code: [github, charNMT-noise](https://github.com/ybisk/charNMT-noise)
- [x] Try-Out
- [x] How much noise in train and test set do we want?
- [x] Noise in specific words:
- [x] SUBJ-/OBJ_NUMBER: induce noise in subject/object?
- [x] **Classifier**
- [x] create plots for all 4 scenarios for SUBJ & OBJ task
- [x] plot old results behind new results
## 2022-09-01
- re-discussed the next steps and how best to approach them (pipeline, noise, ...)
## 2022-08-25
- [x] **Classifier**
- [x] save & load model in code
- [x] for model size and layer (**new test set**)
- [x] test with 20 sentences (**results should be random**):
        - ~~10 sentences with label *Sing*, 10 sentences with label *Plur*~~
        - 20 sentences of each case for the SUBJ & OBJ tasks
- [x] sb - *Sing*, o - *Plur*
- [x] sb - *Plur*, o - *Sing*
- [x] sb - *Sing*, o - *Sing*
- [x] sb - *Plur*, o - *Plur*
- [x] **Noisy Data**
- [x] ..
- [x] **"Pipeline"** (for each layer)
1. Load trained probe classifier
2. Load csv with test sentences
3. Read in raw sentences for testing & create their encodings / embeddings
    4. Feed encodings to probe classifier
5. Save results of classifier to a **csv file**
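
Sketch for steps 1 and 5, persisting the probe with joblib and writing results as CSV; `probe`, `X_test`, `y_test` come from the training/encoding steps and are assumed here, all file names are hypothetical:

```python
# Hedged sketch: save/load a trained probe (step 1) and write its
# results to a CSV file (step 5). `probe`, `X_test`, `y_test` are
# assumed to come from the training and encoding steps.
import joblib
import pandas as pd

joblib.dump(probe, "probe_subj_num_l6.joblib")    # once, after training
probe = joblib.load("probe_subj_num_l6.joblib")   # step 1 of the pipeline

rows = [{"task": "subj_num", "layer": 6, "case": "sb-Sing_o-Plur",
         "accuracy": probe.score(X_test, y_test)}]
pd.DataFrame(rows).to_csv("results_subj_num.csv", index=False)
```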
## 2022-08-18
- [x] **Classifier**
- [x] (**later**) Create all encodings for all layers, make directory for every class to later draw a distribution
## 2022-08-11
- [x] **Classifier**
- [x] create balanced df with pandas (no sorted data needed)
- [x] use example given by Marius (**don't stratify!**)
- [x] **!! separate train and test data !!**
- [x] **New Probing tasks** (later)
- [Ideas](https://hackmd.io/@600DW6b8QBqBb1Gv8lZYwg/SkST7rGAc)
- [x] **Evaluate Probing Tasks Data**
- e.g. *"Die Katze[sing.] frisst die Mäuse[plur.]"*
- [x] ~~**Control..**~~
- ~~Is *"Katze"* in Train **and** Test set?~~
- ~~Is *"Katze"* used as Subject **and** Object in the dataset?~~
- ~~Is *"Katze"* used as Singular **and** Plural in the dataset?~~
    - [x] Use spaCy (see the spaCy sketch at the end of this section)
- [x] plot ~~4 bars~~ pie chart:
| | bar 1| bar 2 | bar 3| bar 4 |
| --------- | ---- | ----- | -----| ----- |
| **SUBJ** | sing | sing | plu | plu |
| **OBJ** | sing | plu | sing | plu |
- [x] **Noisy Data**
- [x] focus on character permutation (swap & cam)
| | set 1 | set 2 | set 3 | set 4 |
| -----------| ----- | ------ | ------| ------|
| **Train** | clean | clean | noisy | noisy |
| **Test** | clean | noisy | clean | noisy |
- [x] **Next Steps**
- [x] 1) Fix **Classifier**
- [x] 2) **Evaluate Probing Tasks Data**
- [x] 3) Create **Noisy Data**
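
Sketch of the spaCy check, using the same `sb`/`o` shorthand as the cases above; assumes `python -m spacy download de_core_news_sm` has been run, and the TIGER labels `sb`/`oa` for subject and accusative object are my assumption:

```python
# Hedged sketch: read subject/object number with spaCy's German pipeline.
import spacy

nlp = spacy.load("de_core_news_sm")
doc = nlp("Die Katze frisst die Mäuse")
for tok in doc:
    # TIGER dependency labels: "sb" = subject, "oa" = accusative object
    if tok.dep_ in ("sb", "oa"):
        print(tok.text, tok.dep_, tok.morph.get("Number"))
# expected: Katze sb ['Sing'] / Mäuse oa ['Plur']
```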
## 2022-08-04
- [x] **Classifier**
    - [x] create balanced df with pandas (no sorted data needed; see the sampling sketch at the end of this section)
- [x] use example given by Badr
- [x] use balanced data in training and test set
        - **sklearn:** use `stratify=labels_shuffled`:
          ```python
          from sklearn.model_selection import train_test_split

          X_train, X_test, y_train, y_test = train_test_split(
              features_shuffled, labels_shuffled,
              random_state=42, stratify=labels_shuffled)
          ```
- [ ] **Probing tasks**
- [x] **!!** think of more tasks but specifically for **German**
        - [Deutsch im Sprachvergleich (IDS Jahrbuch 2011, PDF)](https://ids-pub.bsz-bw.de/files/4925/IDS_Jahrbuch_2011_Deutsch_im_Sprachvergleich_2012.pdf)
- [German NLP](https://github.com/adbar/German-NLP)
- [ ] **Language families** (later)
- test robustness of models for probing tasks with different language of same family
        - **e.g.** fr-en translation model:
          **train** classifier on French encodings, **test** on Spanish encodings
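
Sketch of the balanced DataFrame via downsampling; the toy df stands in for the task data, the idea is to sample every label down to the rarest class:

```python
# Hedged sketch: balance a pandas DataFrame by downsampling each label
# to the size of the rarest class. The toy df stands in for the task data.
import pandas as pd

df = pd.DataFrame({"sentence": ["s1", "s2", "s3", "s4", "s5"],
                   "label": ["Sing", "Sing", "Sing", "Plur", "Plur"]})
n = df["label"].value_counts().min()
balanced = (df.groupby("label", group_keys=False)
              .apply(lambda g: g.sample(n=n, random_state=42))
              .reset_index(drop=True))
print(balanced["label"].value_counts())   # -> 2 per label
```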
## 2022-07-28
- [x] **Classifier // Probing tasks**
- [x] Draw a balanced distribution for the probing tasks? We only consider 10k out of 120k sentences.
- [ ] **!! Literature**
- [ ] **create tabular overview and read!!**
- [x] ~~**Probing**~~
- [x] ~~think about specific tasks for German, with example~~
## 2022-07-21
- [x] ~~**Classifier**~~
- [x] ~~text model~~
- ~~not able to extract layer_norm~~
- **no normalization layer in text model**
- [x] **Clean up dataset**
    - [x] Subj_number: dataset causes trouble, fix this
- [x] Remove problematic sentences
- line 547: *" Ein Lied geht um die Welt ( Die Joseph-Schmidt-Story ) Sing*
- line 6751: *" , 1957 ) und Michel Legrand tätig war . Sing*
        - **Found the problem: lines starting with a `"` character**
- [x] Use preprocessing in python to identify and remove white space
- [x] rewrite them in a new csv/tsv file, encode in utf-8
- [x] **Probing tasks**
    - [x] Compute the performance of a majority classifier (see the baseline sketch at the end of this section)
        - see [sklearn.dummy.DummyClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.dummy.DummyClassifier.html)
    - [x] Look at other probing tasks in the literature
    - [x] Probing tasks for the German model (subj gender prediction? obj gender prediction? obj case prediction: dative vs. accusative)
- [ ] **Situate your research within the literature**
    - [ ] How does the performance compare to other models for each task?
- [ ] Are there differences?
- [ ] How do you explain the differences between visual vs text encoder?
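
The majority baseline in a few lines (sketch with toy stand-ins; in the real runs, X/y are the encoding splits of the probing task):

```python
# Hedged sketch: majority-class baseline with sklearn's DummyClassifier.
import numpy as np
from sklearn.dummy import DummyClassifier

X = np.random.rand(100, 512)                   # toy stand-in encodings
y = np.array(["Sing"] * 70 + ["Plur"] * 30)    # toy labels, 70/30 split

dummy = DummyClassifier(strategy="most_frequent").fit(X, y)
print("majority baseline:", dummy.score(X, y))  # ~0.70 here
```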
## 2022-07-07
- [x] **Classifier**
- linear regression plot for task past_pres, visual model, 118840 sentences 
- [x] check if there's a bug while creating average for sentence tokens
- this seems to be fine, I was just confused during the meeting
- maybe collection of tensors in dict was broken (fixed)
- [x] **sanity check for encodings**
- [x] get first token for all sentences & layers => classify (on-going...)
- [x] try-out 3 other probing tasks with encodings
- classify on **10k** and **1k** since ~120k might be too much
- [x] **past_pres** (see plots [here](https://hackmd.io/BVxsI6PTR7yjukkHwcZh7Q))
- [x] classify **10k** sentences
- [x] visual
- [x] text
- [x] first token
- [x] classify **1k** sentences
- [x] visual
- [x] text
- [x] first token
- [x] **obj_num**
- [x] classify **10k** sentences
- [x] visual
- [x] text
- [x] first token
- [x] classify **1k** sentences
- [x] visual
- [x] text
- [x] first token
- [x] **subj_num**
- [x] classify **10k** sentences
- [x] visual
- [x] text
- [x] first token
- [x] classify **1k** sentences
- [x] visual
- [x] text
- [x] first token
- [x] **bigram_shift**
- [x] classify **10k** sentences
- [x] visual
- [x] text
- [x] first token
- [x] classify **1k** sentences
- [x] visual
- [x] text
- [x] first token
## 2022-06-30
- [x] **Encodings**
- [x] get hidden representation
- understood where to obtain x from all layers
- **obtained encodings from all layers**
        - understood the difference between `enc_outs` and `new_outs`:
          ```
          enc_outs:
          [[1,1],
           [2,2],
           [3,3],
           [4,4]]

          new_outs:
          [[[1,1],[1,1],[1,1],[1,1],[1,1]],
           [[2,2],[2,2],[2,2],[2,2],[2,2]],
           [[3,3],[3,3],[3,3],[3,3],[3,3]],
           [[4,4],[4,4],[4,4],[4,4],[4,4]]]
          ```
- [x] use `enc_outs` or `new_outs` for classifier?
- use `enc_outs`
    - [x] collect encodings from all layers and classify by layer (see the extraction sketch at the end of this section)
- [x] collect encodings from visual model
- [x] collect encodings from text model
- [x] **talk/ask about self.layer_norm**
        - debugger extraction: [extracted encodings in-between layers](https://hackmd.io/1XVDgiftS0GwHBmrOHYRug)
- **save as extra layer, e.g. 'l6_ln'**
- [x] **Classifier**
- [x] classify encodings
- [x] from visual model
- [x] from text model
- [x] map results from classified encodings
- [x] from visual model
- [x] from text model
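
Sketch of the per-layer extraction for a plain fairseq TransformerModel; the paths are hypothetical, the VisRep visual encoder may expose a slightly different interface, and the encoder's return type changed across fairseq versions:

```python
# Hedged sketch: collect encoder states from all layers of a fairseq
# translation model and save them per layer. Paths are hypothetical.
import numpy as np
import torch
from fairseq.models.transformer import TransformerModel

model = TransformerModel.from_pretrained(
    "checkpoints/", checkpoint_file="checkpoint_best.pt",
    data_name_or_path="data-bin/")
model.eval()

tokens = model.encode("My name is Anastasia").unsqueeze(0)   # 1 x T
lengths = torch.tensor([tokens.size(1)])
with torch.no_grad():
    out = model.models[0].encoder(tokens, lengths, return_all_hiddens=True)

# Older fairseq returns an EncoderOut namedtuple, newer versions a dict.
states = out["encoder_states"] if isinstance(out, dict) else out.encoder_states
for i, layer in enumerate(states):                 # each: T x B x C
    np.save(f"enc_l{i}.npy", layer.squeeze(1).cpu().numpy())
```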
## 2022-06-16
- [x] **Probing**
- [x] Squib Paper
- [x] watch YouTube videos ([Probing Classifiers: A Gentle Intro](https://www.youtube.com/watch?v=HJn-OTNLnoE) and [Inspecting Neural Networks with CCA](https://www.youtube.com/watch?v=u7Dvb_a1D-0))
    - [x] try-out **sentence probing**: Task 9) Tense prediction
        - take the average of all token vectors for one layer (see the mean-pooling sketch at the end of this section)
- [x] **Encodings**
    - [x] ~~get hidden representation:~~ should allow getting encodings from **all** layers, not just the last
        - set `return_all_hiddens=False` to `True`
        - didn't work
- [x] **Classifier**
- [x] **Example, [Notebook](https://colab.research.google.com/github/jalammar/jalammar.github.io/blob/master/notebooks/bert/A_Visual_Notebook_to_Using_BERT_for_the_First_Time.ipynb)**
- [x] understand notebook
- [x] make it run
- [x] apply ideas to encodings from VisRep
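
The averaging step itself is one line of numpy (sketch; `enc_l6.npy` is a hypothetical saved T x C layer encoding of one sentence):

```python
# Hedged sketch: mean-pool the token vectors of one layer into a single
# sentence vector for the probe.
import numpy as np

enc = np.load("enc_l6.npy")     # T x C token vectors (hypothetical file)
sent_vec = enc.mean(axis=0)     # C-dimensional sentence embedding
```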
## 2022-06-09
- [ ] (**Later:** get mappings for visual / text-based encodings for words)
- [ ] visual:
        - [ ] compute statistics for the character images
        - [ ] identify spaces in the images (heuristic for whitespace)
        - [ ] print the np matrix; it will make sense when you look at it
- [ ] text:
- [ ] extract tokens of sentence
            - are there special tokens like beginning/end of sentence?
- [x] **SentEval (Baseline Ideas)**
- [x] Paper
- [x] ~~tasks from SentEval are set-up for EN, think about adaptations/extensions for DE (e.g. gender system)~~
- [x] **~~XProbe~~ / Probing**
- [x] Xprobe Paper
- [x] get encodings from German Data (try 10 sentences)
- [x] ~~set-up Xprobe~~
- [x] ~~try-out **sentence probing**: Task 9) Tense prediction (see 06-16)~~
- [x] **Encodings**
- [x] get & save encodings (as np tensors) **from every layer**
    - [x] research what **T x B x C** means (e.g. *"My name is Anastasia"*: [8, 5, 512])
- **T**: 8, Tokens
- **B**: 5, Layers ?!?
- **C**: 512, Embed. Dim.
- [x] save `enc_outs` representations:
- [x] save each sentence as np tensor in extra file
- [x] **Classifier**
    - [x] use sklearn (see the probe sketch below)
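
Sketch of the sklearn probe over the saved per-sentence tensors; the directory layout and label CSV are hypothetical placeholders:

```python
# Hedged sketch: logistic-regression probe on saved sentence encodings.
# Directory layout and label CSV are hypothetical placeholders.
import glob
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

files = sorted(glob.glob("encodings/l6/*.npy"))           # one T x C per sentence
X = np.stack([np.load(f).mean(axis=0) for f in files])    # mean-pool tokens
y = pd.read_csv("labels_tense.csv")["label"].to_numpy()   # aligned labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=42, stratify=y)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("layer 6 accuracy:", probe.score(X_test, y_test))
```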
## 2022-06-02
- [x] **mail to Liz**
- [x] compose email about
1. text-baseline models
2. mapping of encodings
- [x] send draft to Badr & Marius
- [x] send to Liz
    - [x] try-out solutions from Liz's email
        - [x] make baseline text models run
        - [x] run Liz's visualisation notebook
            - [x] adapt the notebook to your needs
- [x] **SentEval**
- [x] read paper
- [x] questions?
- [x] research German SentEval
        - does not exist
- [x] prepare **Document/Presentation** with relevant info/points to talk about
## 2022-05-30
- [x] align the vector encoding to the text fragment
- [x] To-Do: image vector back to text
- [x] make text-based representation models run
- [x] where is the bpecodes-file for each text model?
- [x] pre-trained model from Fairseq works (contains bpecodes-file).
- [x] ~~save `enc_outs` representations:~~
- [x] ~~save in HDF5~~
- [x] ~~how to use that? which design?~~
- [x] ~~(think about sorting)~~
## 2022-05-05 + 05-12
- [x] find `enc_outs`
- [x] make print-statements
- [x] ~~save `enc_outs` representations:~~
- [x] convert to numpy
- [x] send code/representations to Marius
- [x] write mail to Marius about the meeting on Thursday
## 2022-04-28
- [x] get pre-trained text-based representation models
- [x] find the places in the code where last-layer representations are collected
- [x] **Encoder**
- [x] get representation from one, two sentences
- ~~use pt.save~~
- [x] **Decoder**
    - *models/transformer.py*: save encoder_out, encoder_states as numpy
    - **script:** takes a set of sentences and saves their embeddings
- [x] ~~research what **T x B x C** means (e.g. [27, 3, 512])~~
- ~~**T**: 27, Tokens~~
- ~~**B**: 3, Sentences~~
- ~~**C**: 512, Embed. Dim.~~
## 2022-04-21
- [x] find the text-based fairseq model from the paper
- not found / not provided
- [x] **email address of Liz!**
    - [x] write email to Liz
- [x] get transformation/tensors for all layers (same in encoder and decoder)
- is this the right place to look for the tensors?
- *fairseq/models/fairseq_encoder.py
-> def upgrade_state_dict_named(self, state_dict, name):*
- *fairseq/modules/transformer_layer.py
-> Class TransformerEncoder / -Decoder*
    - modify the code to extract the representations (reverse engineering)
- *fairseq/modules/transformer_layer.py
-> Class TransformerEncoder // def forward*
- [x] understand how to modify trans enc., trans dec.
- [x] we care about the **last** layer ([Layers of VisRep](https://hackmd.io/fDP_MkM3Qt-f0cvRLxsPbA))
- [x] ~~write own encoder~~
    - [x] i. ~~init with existing weights~~
- [x] ii. ~~look-up encoder: *fairseq/modules/transformer_layer.py
-> Class TransformerEncoder*~~
- [x] input individual sentences and get representations
## 2022-03-14
### papers to read
- [x] [**Understanding intermediate layers using linear classifier probes**](https://arxiv.org/abs/1610.01644)
- [x] read
- [x] summary
- [x] [**Evaluating Layers of Representation in Neural Machine Translation on Part-of-Speech and Semantic Tagging Tasks**](https://aclanthology.org/I17-1001)
- [x] read
- [x] summary
- [x] [**Under the Hood: Using Diagnostic Classifiers to Investigate and Improve how Language Models Track Agreement Information**](https://arxiv.org/abs/1808.08079)
- [x] read
- [x] summary
- [x] read again
- [ ] [**What do you learn from context? Probing for sentence structure in contextualized word representations**](https://arxiv.org/abs/1905.06316)
- [x] read
- [ ] summary
- [ ] [**What you can cram into a single vector: Probing sentence embeddings for linguistic properties**](https://arxiv.org/abs/1805.01070)
- [x] read
- [ ] summary
- [ ] [**Probing Multilingual Sentence Representations With X-PROBE**](https://arxiv.org/pdf/1906.05061.pdf)
- [x] read
- [ ] summary
- [ ] [**Squib: Probing Classifiers: Promises, Shortcomings, and Advances**](https://arxiv.org/pdf/2102.12452.pdf)
- [x] read
- [ ] summary
### set-up
- [x] try-out setting up fairseq
- [x] questions