# Multilingual NLP, papers
## Tokenisation and morphology
* Morita et al., Morphological Analysis for Unsegmented Languages using Recurrent Neural Network Language Model, https://aclanthology.org/D15-1276.pdf (tm1)
* Hofmann et al., Superbizarre Is Not Superb: Derivational Morphology Improves BERT's Interpretation of Complex Words, https://arxiv.org/pdf/2101.00403.pdf (tm2)
* Mager et al., BPE vs. Morphological Segmentation: A Case Study on Machine Translation of Four Polysynthetic Languages, https://arxiv.org/pdf/2203.08954 (tm3)
* Hiraoka et al., Stochastic Tokenization with a Language Model for Neural Text Classification, https://aclanthology.org/P19-1158.pdf (tm4)
* Chen et al., CLOWER: A Pre-trained Language Model with Contrastive Learning over Word and Character Representations, https://arxiv.org/pdf/2208.10844.pdf (tm5)
* Sandhan et al., TransLIST: A Transformer-Based Linguistically Informed Sanskrit Tokenizer, https://arxiv.org/abs/2210.11753 (tm6)
* Imamura & Sumita, Extending the Subwording Model of Multilingual Pretrained Models for New Languages, https://arxiv.org/abs/2211.15965 (tm7)
## SigTyp papers
### Typological-feature prediction
* Gutkin & Sproat, NEMO: Frequentist Inference Approach to Constrained Linguistic Typology Feature Prediction in SIGTYP 2020 Shared Task, https://aclanthology.org/2020.sigtyp-1.3.pdf (tfp1)
* Jäger, Imputing typological values via phylogenetic inference, https://aclanthology.org/2020.sigtyp-1.5.pdf (tfp2)
### Other topics
* Marjou, OTEANN: Estimating the Transparency of Orthographies with an Artificial Neural Network, https://aclanthology.org/2021.sigtyp-1.1.pdf (ot1)
* Inglese & Brigada Villa, Inferring morphological complexity from syntactic dependency networks: a test, https://aclanthology.org/2021.sigtyp-1.2.pdf (ot2)
* Hammarström, Measuring Prefixation and Suffixation in the Languages of the World, https://aclanthology.org/2021.sigtyp-1.8.pdf (ot3)
* Mikhailov et al., Morph Call: Probing Morphosyntactic Content of Multilingual Transformers, https://aclanthology.org/2021.sigtyp-1.10.pdf (ot4)
## SigMorphon papers
### Morphological reinflection: Generalisations across languages and related papers
* Elsner, What transfers in morphological inflection? Experiments with analogical models, https://aclanthology.org/2021.sigmorphon-1.18.pdf (mr1)
* Jayanthi & Pratapa, A Study of Morphological Robustness of Neural Machine Translation, https://aclanthology.org/2021.sigmorphon-1.6.pdf (mr2)
* Silfverberg et al., Data Augmentation for Morphological Reinflection, https://aclanthology.org/K17-2010.pdf (mr3)
* Goldman & Tsarfaty, Morphology Without Borders: Clause-Level Morphological Annotation, https://arxiv.org/pdf/2202.12832.pdf (mr4)
### Multilingual grapheme-to-phoneme conversion
* Lo & Nicolai, Linguistic Knowledge in Multilingual Grapheme-to-Phoneme Conversion, https://aclanthology.org/2021.sigmorphon-1.15.pdf (gtp1)
* Ryskina et al., Comparative Error Analysis in Neural and Finite-state Models for Unsupervised Character-level Transduction, https://aclanthology.org/2021.sigmorphon-1.22.pdf (gtp2)
* Sharma et al., Improved pronunciation prediction accuracy using morphology, https://aclanthology.org/2021.sigmorphon-1.24.pdf (gtp3)
## Multilingual syntax
* Deng & Xue, Translation Divergences in Chinese–English Machine Translation: An Empirical Investigation, https://aclanthology.org/J17-3002.pdf (mls1)
* Nikolaev et al., Fine-Grained Analysis of Cross-Linguistic Syntactic Divergences, https://aclanthology.org/2020.acl-main.109.pdf (mls2)
## Analysis of multilingual models
* Chi et al., Finding Universal Grammatical Relations in Multilingual BERT, https://aclanthology.org/2020.acl-main.493.pdf (mlm1)
* Choenni & Shutova, Investigating Language Relationships in Multilingual Sentence Encoders Through the Lens of Linguistic Typology, https://direct.mit.edu/coli/article/48/3/635/110573 (mlm2)
## NLP meets linguistic typology
### Language typology as predictor of model performance
* Cotterell et al., Are All Languages Equally Hard to Language-Model?, https://aclanthology.org/N18-2085v1.pdf (perf1)
* Mielke et al., What Kind of Language Is Hard to Language-Model?, https://aclanthology.org/P19-1491/ (perf2)
* Park et al., Morphology Matters: A Multilingual Language Modeling Analysis, https://aclanthology.org/2021.tacl-1.16/ (perf3)
* Ravfogel et al., Studying the Inductive Biases of RNNs with Synthetic Variations of Natural Languages, https://arxiv.org/pdf/1903.06400.pdf (perf4)
* Gerz et al., On the Relation between Linguistic Typology and (Limitations of) Multilingual Language Modeling, https://aclanthology.org/D18-1029.pdf (perf5)
### Predicting/using typological features
* Gonzalez-Dominguez et al., Automatic Language Identification using Long Short-Term Memory Recurrent Neural Networks, https://storage.googleapis.com/pub-tools-public-publication-data/pdf/42540.pdf + Gutkin et al., Predicting the Features of World Atlas of Language Structures from Speech, https://storage.googleapis.com/pub-tools-public-publication-data/pdf/7c28e5e5a21718a6c692c7d999671dab947aedc1.pdf (nltp1)
* Tsvetkov et al., Polyglot Neural Language Models: A Case Study in Cross-Lingual Phonetic Representation Learning, https://arxiv.org/pdf/1605.03832.pdf (nltp2)
* Qian et al., Investigating Language Universal and Specific Properties in Word Embeddings, https://aclanthology.org/P16-1140.pdf (nltp3)
### Politeness systems
* Srinivasan & Choi, TyDiP: A Dataset for Politeness Classification in Nine Typologically Diverse Languages, https://arxiv.org/abs/2211.16496 (nltp4)
## Language embeddings
* Östling & Tiedemann, Continuous multilinguality with language vectors, https://aclanthology.org/E17-2102.pdf + Malaviya et al., Learning Language Representations for Typology Prediction, https://aclanthology.org/D17-1268.pdf (le1)
* Bjerva & Augenstein, From Phonology to Syntax: Unsupervised Linguistic Typology at Different Levels with Language Embeddings, https://aclanthology.org/N18-1083.pdf (le2)
## Linguistic complexity
* Cotterell et al., On the Complexity and Typology of Inflectional Morphological Systems, https://aclanthology.org/Q19-1021.pdf (lc1)
* Pimentel et al., Phonotactic Complexity and Its Trade-offs, https://aclanthology.org/2020.tacl-1.1/ (lc2)
## Computational linguistic typology
* Daumé III, Non-Parametric Bayesian Areal Linguistics, https://aclanthology.org/N09-1067.pdf (clt1)
* Georgi et al., Comparing Language Similarity across Genetic and Typologically-Based Groupings, https://aclanthology.org/C10-1044.pdf (clt2)
* Cotterell & Eisner, Probabilistic Typology: Deep Generative Models of Vowel Inventories, https://arxiv.org/pdf/1705.01684.pdf (clt3)
* Bjerva et al., A Probabilistic Generative Model of Linguistic Typology, https://arxiv.org/pdf/1903.10950.pdf (clt4)
* Abdou et al., Can Language Models Encode Perceptual Structure Without Grounding? A Case Study in Color, https://arxiv.org/pdf/2109.06129.pdf (clt5)
## Computational historical-comparative linguistics
* Meloni et al., Ab Antiquo: Neural Proto-language Reconstruction, https://aclanthology.org/2021.naacl-main.353/ (hc1)