# Multilingual NLP, papers

## Tokenisation and morphology

* Morita et al., Morphological Analysis for Unsegmented Languages using Recurrent Neural Network Language Model, https://aclanthology.org/D15-1276.pdf (tm1)
* Hofmann et al., Superbizarre Is Not Superb: Derivational Morphology Improves BERT's Interpretation of Complex Words, https://arxiv.org/pdf/2101.00403.pdf (tm2)
* Mager et al., BPE vs. Morphological Segmentation: A Case Study on Machine Translation of Four Polysynthetic Languages, https://arxiv.org/pdf/2203.08954 (tm3)
* Hiraoka et al., Stochastic Tokenization with a Language Model for Neural Text Classification, https://aclanthology.org/P19-1158.pdf (tm4)
* Chen et al., CLOWER: A Pre-trained Language Model with Contrastive Learning over Word and Character Representations, https://arxiv.org/pdf/2208.10844.pdf (tm5)
* Sandhan et al., TransLIST: A Transformer-Based Linguistically Informed Sanskrit Tokenizer, https://arxiv.org/abs/2210.11753 (tm6)
* Imamura & Sumita, Extending the Subwording Model of Multilingual Pretrained Models for New Languages, https://arxiv.org/abs/2211.15965 (tm7)

## SigTyp papers

### Typological-feature prediction

* Gutkin & Sproat, NEMO: Frequentist Inference Approach to Constrained Linguistic Typology Feature Prediction in SIGTYP 2020 Shared Task, https://aclanthology.org/2020.sigtyp-1.3.pdf (tfp1)
* Jäger, Imputing typological values via phylogenetic inference, https://aclanthology.org/2020.sigtyp-1.5.pdf (tfp2)

### Other topics

* Marjou, OTEANN: Estimating the Transparency of Orthographies with an Artificial Neural Network, https://aclanthology.org/2021.sigtyp-1.1.pdf (ot1)
* Villa & Inglese, Inferring morphological complexity from syntactic dependency networks: a test, https://aclanthology.org/2021.sigtyp-1.2.pdf (ot2)
* Hammarström, Measuring Prefixation and Suffixation in the Languages of the World, https://aclanthology.org/2021.sigtyp-1.8.pdf (ot3)
* Mikhailov et al., Morph Call: Probing Morphosyntactic Content of Multilingual Transformers, https://aclanthology.org/2021.sigtyp-1.10.pdf (ot4)

## SigMorphon papers

### Morphological reinflection: Generalisations across languages and related papers

* Elsner, What transfers in morphological inflection? Experiments with analogical models, https://aclanthology.org/2021.sigmorphon-1.18.pdf (mr1)
* Jayanthi & Pratapa, A Study of Morphological Robustness of Neural Machine Translation, https://aclanthology.org/2021.sigmorphon-1.6.pdf (mr2)
* Silfverberg et al., Data Augmentation for Morphological Reinflection, https://aclanthology.org/K17-2010.pdf (mr3)
* Goldman & Tsarfaty, Morphology Without Borders: Clause-Level Morphological Annotation, https://arxiv.org/pdf/2202.12832.pdf (mr4)

### Multilingual grapheme-to-phoneme conversion

* Lo & Nicolai, Linguistic Knowledge in Multilingual Grapheme-to-Phoneme Conversion, https://aclanthology.org/2021.sigmorphon-1.15.pdf (gtp1)
* Ryskina et al., Comparative Error Analysis in Neural and Finite-state Models for Unsupervised Character-level Transduction, https://aclanthology.org/2021.sigmorphon-1.22.pdf (gtp2)
* Sharma et al., Improved pronunciation prediction accuracy using morphology, https://aclanthology.org/2021.sigmorphon-1.24.pdf (gtp3)

## Multilingual syntax

* Deng & Xue, Translation Divergences in Chinese–English Machine Translation: An Empirical Investigation, https://aclanthology.org/J17-3002.pdf (mls1)
* Nikolaev et al., Fine-Grained Analysis of Cross-Linguistic Syntactic Divergences, https://aclanthology.org/2020.acl-main.109.pdf (mls2)

## Analysis of multilingual models

* Chi et al., Finding Universal Grammatical Relations in Multilingual BERT, https://aclanthology.org/2020.acl-main.493.pdf (mlm1)
* Choenni & Shutova, Investigating Language Relationships in Multilingual Sentence Encoders Through the Lens of Linguistic Typology, https://direct.mit.edu/coli/article/48/3/635/110573 (mlm2)

## NLP meets linguistic typology

### Language typology as predictor of model performance

* Cotterell et al., Are All Languages Equally Hard to Language-Model?, https://aclanthology.org/N18-2085v1.pdf (perf1)
* Mielke et al., What Kind of Language Is Hard to Language-Model?, https://aclanthology.org/P19-1491/ (perf2)
* Park et al., Morphology Matters: A Multilingual Language Modeling Analysis, https://aclanthology.org/2021.tacl-1.16/ (perf3)
* Ravfogel et al., Studying the Inductive Biases of RNNs with Synthetic Variations of Natural Languages, https://arxiv.org/pdf/1903.06400.pdf (perf4)
* Gerz et al., On the Relation between Linguistic Typology and (Limitations of) Multilingual Language Modeling, https://aclanthology.org/D18-1029.pdf (perf5)

### Predicting/using typological features

* Gonzalez-Dominguez et al., Automatic Language Identification using Long Short-Term Memory Recurrent Neural Networks, https://storage.googleapis.com/pub-tools-public-publication-data/pdf/42540.pdf + Gutkin et al., Predicting the Features of World Atlas of Language Structures from Speech, https://storage.googleapis.com/pub-tools-public-publication-data/pdf/7c28e5e5a21718a6c692c7d999671dab947aedc1.pdf (nltp1)
* Tsvetkov et al., Polyglot Neural Language Models: A Case Study in Cross-Lingual Phonetic Representation Learning, https://arxiv.org/pdf/1605.03832.pdf (nltp2)
* Qian et al., Investigating Language Universal and Specific Properties in Word Embeddings, https://aclanthology.org/P16-1140.pdf (nltp3)

### Politeness systems

* Srinivasan & Choi, TyDiP: A Dataset for Politeness Classification in Nine Typologically Diverse Languages, https://arxiv.org/abs/2211.16496 (nltp4)

## Language embeddings

* Östling & Tiedemann, Continuous multilinguality with language vectors, https://aclanthology.org/E17-2102.pdf + Malaviya et al., Learning Language Representations for Typology Prediction, https://aclanthology.org/D17-1268.pdf (le1)
* Bjerva & Augenstein, From Phonology to Syntax: Unsupervised Linguistic Typology at Different Levels with Language Embeddings, https://aclanthology.org/N18-1083.pdf (le2)

## Linguistic complexity

* Cotterell et al., On the Complexity and Typology of Inflectional Morphological Systems, https://aclanthology.org/Q19-1021.pdf (lc1)
* Pimentel et al., Phonotactic Complexity and Its Trade-offs, https://aclanthology.org/2020.tacl-1.1/ (lc2)

## Computational linguistic typology

* Daumé III, Non-Parametric Bayesian Areal Linguistics, https://aclanthology.org/N09-1067.pdf (clt1)
* Georgi et al., Comparing Language Similarity across Genetic and Typologically-Based Groupings, https://aclanthology.org/C10-1044.pdf (clt2)
* Cotterell & Eisner, Probabilistic Typology: Deep Generative Models of Vowel Inventories, https://arxiv.org/pdf/1705.01684.pdf (clt3)
* Bjerva et al., A Probabilistic Generative Model of Linguistic Typology, https://arxiv.org/pdf/1903.10950.pdf (clt4)
* Abdou et al., Can Language Models Encode Perceptual Structure Without Grounding? A Case Study in Color, https://arxiv.org/pdf/2109.06129.pdf (clt5)

## Computational historical-comparative linguistics

* Meloni et al., Ab Antiquo: Neural Proto-language Reconstruction, https://aclanthology.org/2021.naacl-main.353/ (hc1)