# Paper content

* Abstract
* Introduction
  * What is definition modeling?
  * Why is generating definitions important?
    * To probe word embeddings and check their quality
    * To generate definitions of new words
  * Our work
    * Leverage the pre-trained T5 model (no multiple components like LSTMs; avoids OOV issues)
    * Show that BERTScore is a better evaluation metric than BLEU
* Related Work
  * Word embedding analysis task (Hill)
  * Definition modeling task
    * LSTM-based
      * Ni (local context)
      * Noraset (global information)
      * Gadetsky (local + global)
      * Ishiwatari (local + global)
    * Transformer-based
      * Mark My Word
      * Incorporating Chinese sememes
* Definition modeling as a seq2seq task
  * Why definition modeling should be treated as seq2seq
    * Ishiwatari's view: simulate human behavior by first looking at the local context, then the global context
    * Mark My Word's view: the distributional hypothesis
    * Studies that treated the task as seq2seq performed better
  * The Oxford dataset contains noise
    * It has no effect (observation from attention)
  * Pre-trained sentence embeddings
    * BERT, ELMo, etc. are trained on huge datasets
    * All other papers used word2vec and GloVe
* Model
  * T5 model ([example explanation](https://arxiv.org/pdf/2002.08910.pdf))
* Experiment and Result
  * Datasets (WordNet, Oxford, Slang, Wiki)
  * Results
    * BLEU increases by ? points
    * BERTScore
      * Confirmed (give some examples)
      * BLEU penalizes a different dictionary style; in reality we only care about the correctness of a definition, not its style
    * Show examples of generated definitions
      * For Slang, the LSTM generated many unk tokens (too many rare words)
* Discussion
  * LSTMs tend to generate specific patterns (for "-ness" words, 80% start with "the quality of being ...")
  * How does context information help the model? (context length) **from Ishiwatari**
  * Able to generate definitions without context (show performance of global embeddings)
  * Similar meaning but low BLEU score (so use BERTScore)
* Conclusion
* Reference

## [analysis from others](/g1Oy8QlDTqCVU8VRuX-cvg)
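The outline's point that two definitions can share a meaning yet receive a low BLEU score can be seen with a small hand computation. The sketch below implements BLEU's modified n-gram precision (the core of the metric; brevity penalty and geometric averaging omitted) in pure Python. The two definition strings are hypothetical examples, not drawn from the paper's datasets:

```python
# Minimal sketch of BLEU's modified n-gram precision.
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, reference, n):
    # Count each candidate n-gram at most as often as it occurs
    # in the reference (clipping), then normalize by candidate length.
    cand = Counter(ngrams(candidate, n))
    ref = Counter(ngrams(reference, n))
    overlap = sum(min(c, ref[g]) for g, c in cand.items())
    return overlap / max(sum(cand.values()), 1)

reference = "the state of being happy".split()
candidate = "a feeling of happiness".split()  # same meaning, different style

p1 = modified_precision(candidate, reference, 1)  # 0.25 (only "of" matches)
p2 = modified_precision(candidate, reference, 2)  # 0.0  (no shared bigram)
```

Both strings are reasonable glosses of "happiness", yet the bigram precision is zero, so BLEU would be near zero. An embedding-based metric such as BERTScore matches tokens by contextual similarity rather than surface overlap, and would score such a stylistically different but correct definition much higher.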
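As a minimal illustration of framing definition modeling as a single seq2seq problem, the snippet below packs the headword and its local context into one flat source string, as one might do before fine-tuning a pre-trained encoder-decoder such as T5. The `word:`/`context:` prompt template and the example sentence are assumptions for illustration, not the paper's actual input format:

```python
def build_source(word: str, context: str) -> str:
    # One flat input string gives the model both the headword and its
    # local context, so a single pre-trained seq2seq model replaces the
    # separate local/global encoders of the LSTM-based systems.
    # NOTE: this "word: ... context: ..." template is a hypothetical
    # example, not the format used in the paper.
    return f"word: {word} context: {context}"

# The target side is simply the reference definition, so training data
# becomes plain (source, definition) string pairs.
src = build_source("serendipity", "meeting her there was pure serendipity")
```

A subword-tokenized model like T5 also sidesteps the OOV problem the outline mentions: rare words such as slang terms are split into known subword pieces instead of being mapped to unk tokens.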