# Paper content
* Abstract
* Introduction
* what is definition modeling?
* why is generating definitions important?
* To probe word embeddings and check their quality
* To generate definitions of new words
* our work
* leverage the pre-trained T5 model (no multiple components like LSTM pipelines; subword tokenization avoids OOV)
* show that BERTScore is a better evaluation metric (compared with BLEU)
* Related Work
* word embedding analysis task (Hill)
* definition modeling (DM) task
* LSTM
* Ni (local context)
* Noraset (global information)
* Gadetsky (local + global)
* Ishiwatari (local + global)
* Transformer-based
* Mark My Word
* incorporating Chinese sememes
* Definition modeling as a seq2seq task
* why definition modeling should be treated as seq2seq
* Ishiwatari's view: simulate human behavior, looking first at the local and then the global context
* Mark My Word's view: the distributional hypothesis
* approaches that treat the task as seq2seq perform better
* the Oxford dataset contains noise
* the noise has no effect (observation from attention weights)
* pre-trained sentence embeddings
* BERT, ELMo, etc. are trained on huge corpora
* all other papers used word2vec and GloVe
* Model
* T5 model ([example explanation](https://arxiv.org/pdf/2002.08910.pdf))
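Since T5 casts every task as text-to-text, definition modeling reduces to formatting (word, context) pairs as source strings and the gloss as the target. A minimal sketch, assuming a hypothetical `define: ... context: ...` input template (not necessarily the exact prefix used in the paper):

```python
def make_t5_example(word, context, definition):
    """Build one text-to-text training pair for definition modeling.

    The "define: ... context: ..." template is an illustrative
    assumption, not the exact format used in the paper.
    """
    source = f"define: {word} context: {context}"
    target = definition
    return source, target

src, tgt = make_t5_example(
    "benchmark",
    "We evaluate on a standard benchmark.",
    "a standard point of reference for comparison",
)
```

Because both encoder and decoder operate on subword pieces, rare headwords are split rather than mapped to an OOV token.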
* Experiment and Result
* Datasets (WordNet, Oxford, Slang, Wiki)
* Result
* BLEU increased by ? points
* bertscore
* confirmed (give some examples)
* BLEU penalizes definitions written in a different dictionary style; in reality we care about the correctness of a definition, not its style
* show examples of generated definition
* for Slang, the LSTM generated many `<unk>` tokens (too many rare words)
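The point that BLEU punishes a different dictionary style can be shown with a toy clipped n-gram precision, the core of BLEU (brevity penalty and geometric averaging over n omitted; the example strings are invented for illustration):

```python
from collections import Counter

def ngram_precision(candidate, reference, n=1):
    """Clipped n-gram precision, the core component of BLEU."""
    cand = candidate.split()
    ref = reference.split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    total = sum(cand_ngrams.values())
    if total == 0:
        return 0.0
    clipped = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
    return clipped / total

reference = "the quality of being honest"
same_style = "the quality of being truthful"   # matches the reference's phrasing
diff_style = "honesty or truthfulness"         # same meaning, different style

p_same = ngram_precision(same_style, reference)
p_diff = ngram_precision(diff_style, reference)
# diff_style shares no tokens with the reference, so its precision is 0
# even though a human would judge both definitions acceptable
```

This is exactly the failure mode when the model's output follows one dictionary's style and the reference comes from another.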
* Discussion
* the LSTM tends to generate specific patterns (for "-ness" words, 80% of definitions start with "the quality of being ...")
* how does context information help the model? (context length) **from Ishiwatari**
* able to generate definitions without context (show the performance of global embeddings)
* similar meaning but low BLEU score (motivates using BERTScore)
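BERTScore rescues such cases by matching tokens in embedding space instead of by surface overlap. A toy sketch of its greedy-matching F1, using hand-made 2-d vectors in place of real contextual embeddings (the words attached to the vectors are illustrative assumptions):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def greedy_match_f1(cand_vecs, ref_vecs):
    """BERTScore-style F1: each candidate token greedily matches its most
    similar reference token (precision), and vice versa (recall)."""
    precision = sum(max(cosine(c, r) for r in ref_vecs)
                    for c in cand_vecs) / len(cand_vecs)
    recall = sum(max(cosine(c, r) for c in cand_vecs)
                 for r in ref_vecs) / len(ref_vecs)
    return 2 * precision * recall / (precision + recall)

# hand-made 2-d "embeddings": near-synonyms point in similar directions
ref_vecs = [(1.0, 0.0), (0.9, 0.1)]   # e.g. "honest", "truthful"
cand_vecs = [(0.95, 0.05)]            # e.g. "sincere"
score = greedy_match_f1(cand_vecs, ref_vecs)
# the synonym scores high despite sharing no surface tokens with the reference
```

With real contextual embeddings the same mechanism gives a paraphrased definition a high score where BLEU gives near zero.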
* Conclusion
* Reference
## [analysis from others](/g1Oy8QlDTqCVU8VRuX-cvg)