---
tags: deeplearning
---
# CS224N (2019) Lecture 16 Further Readings
## Constituency Parsing

([Source](https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1194/slides/cs224n-2019-lecture05-dep-parsing.pdf))

## Neural Coref Models
### More Details about Input Layer

([Source](https://arxiv.org/pdf/1606.01323.pdf))
- *Embedding Features*: Word embeddings of the head word, dependency parent, first word, last word, two preceding words, and two following words of the mention. Averaged word embeddings of the five preceding words, five following words, all words in the mention, all words in the mention’s sentence, and all words in the mention’s document.
- *Additional Mention Features*: The type of the mention (pronoun, nominal, proper, or list), the mention’s position (index of the mention divided by the number of mentions in the document), whether the mention is contained in another mention, and the length of the mention in words.
- *Document Genre*: The genre of the mention’s document (broadcast news, newswire, web data, etc.).
- *Distance Features*: The distance between the mentions in sentences, the number of intervening mentions between them, and whether the mentions overlap.
- *Speaker Features*: Whether the mentions have the same speaker and whether one mention is the other mention’s speaker, as determined by string matching rules from Raghunathan et al. (2010).
- *String Matching Features*: Head match, exact string match, and partial string match.
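
A minimal sketch, in Python/NumPy, of how the embedding features above might be assembled into a single mention vector. Everything here is hypothetical scaffolding (the function name, the token-list arguments, and the `word_vecs` dictionary); only the feature inventory and the 50-dimensional pretrained embeddings come from the paper.

```python
import numpy as np

def mention_embedding_features(mention_words, head_word, dep_parent,
                               prev_words, next_words,
                               sent_words, doc_words,
                               word_vecs, dim=50):
    """Concatenate the embedding features of one mention.

    Hypothetical helper: the token lists are plain strings and `word_vecs`
    maps a word to its pretrained 50-dim vector. Assumes at least two
    preceding and following words so the output size stays fixed.
    """
    def embed(w):
        # Zero vector for out-of-vocabulary words.
        return word_vecs.get(w, np.zeros(dim))

    def avg(words):
        return np.mean([embed(w) for w in words], axis=0) if words else np.zeros(dim)

    return np.concatenate([
        embed(head_word),                      # head word
        embed(dep_parent),                     # dependency parent
        embed(mention_words[0]),               # first word
        embed(mention_words[-1]),              # last word
        *(embed(w) for w in prev_words[-2:]),  # two preceding words
        *(embed(w) for w in next_words[:2]),   # two following words
        avg(prev_words[-5:]),                  # averaged five preceding words
        avg(next_words[:5]),                   # averaged five following words
        avg(mention_words),                    # averaged mention words
        avg(sent_words),                       # averaged sentence words
        avg(doc_words),                        # averaged document words
    ])
```

The remaining categorical features (mention type, position, genre, binned distances, speaker and string-matching indicators) would be encoded separately and concatenated onto this vector before the input layer.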
## Clustering-Based Models
[Clark, Kevin, and Christopher D. Manning. "Improving coreference resolution by learning entity-level distributed representations." arXiv preprint arXiv:1606.01323 (2016).](https://arxiv.org/pdf/1606.01323.pdf)
### More Details about Pruning
Pruning is performed during both training and evaluation:
- Only consider spans with up to $L$ words and compute their unary mention scores $s_{m}(i)$
- Keep only up to $\lambda T$ spans with the highest mention scores, where $T$ is the number of words in the document ($\lambda = 0.4$ works well), and consider only up to $K$ antecedents for each
- Enforce non-crossing bracketing structures with a simple suppression scheme
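
The three steps above can be written as one short greedy pass. The sketch below is illustrative: it assumes a `Span(start, end)` tuple over word indices and a `mention_score` callable standing in for $s_{m}(i)$, with the defaults ($L = 10$, $\lambda = 0.4$) taken from the bullet points; none of the names come from the authors' implementation.

```python
from collections import namedtuple

Span = namedtuple("Span", ["start", "end"])  # inclusive word indices

def crosses(a, b):
    """True if spans a and b partially overlap (crossing brackets)."""
    return (a.start < b.start <= a.end < b.end) or (b.start < a.start <= b.end < a.end)

def prune_spans(spans, mention_score, num_words, L=10, lam=0.4):
    """Greedy span pruning: keep the highest-scoring short spans while
    enforcing a non-crossing bracketing structure."""
    # 1. Only consider spans with up to L words.
    candidates = [s for s in spans if s.end - s.start + 1 <= L]
    # 2. Rank by unary mention score s_m(i); keep at most lam * T spans,
    #    where T = num_words is the document length in words.
    candidates.sort(key=mention_score, reverse=True)
    budget = int(lam * num_words)
    kept = []
    for s in candidates:
        if len(kept) >= budget:
            break
        # 3. Suppression scheme: skip any span that crosses an already
        #    accepted span.
        if not any(crosses(s, t) for t in kept):
            kept.append(s)
    return sorted(kept)  # restore document order
```

Limiting each kept span to at most $K$ of its preceding spans as candidate antecedents happens downstream of this function, when the antecedent scores are computed.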
## End-to-end Models
[Lee, Kenton, et al. "End-to-end neural coreference resolution." arXiv preprint arXiv:1707.07045 (2017).](https://arxiv.org/pdf/1707.07045.pdf)
([PyTorch Implementation](https://github.com/shayneobrien/coreference-resolution))
## Demo
[Neural Coreference – Hugging Face](https://huggingface.co/coref/)
## Gentle Introduction
- [State-of-the-art neural coreference resolution for chatbots](https://medium.com/huggingface/state-of-the-art-neural-coreference-resolution-for-chatbots-3302365dcf30)
- [Coreference Resolution for Chatbots](https://chatbotslife.com/coreference-resolution-for-chatbots-f49a720c428b)