A list of papers I found interesting while exploring machine translation in low-resource settings, in descending order of publication year. [Google Slides]
2021
A Comparison of Different NMT Approaches to Low-Resource Dutch-Albanian Machine Translation [arXiv]
Adapting High-resource NMT Models to Translate Low-resource Related Languages without Parallel Data [arXiv][Notes]
Samanantar: The Largest Publicly Available Parallel Corpora Collection for 11 Indic Languages [arXiv]
IndicBART: A Pre-trained Model for Natural Language Generation of Indic Languages [arXiv][Notes]
YANMTT: Yet Another Neural Machine Translation Toolkit [arXiv]
Itihāsa: A large-scale corpus for Sanskrit to English translation [arXiv][Notes]
AugVic: Exploiting BiText Vicinity for Low-Resource NMT [arXiv]
Unsupervised Translation of German–Lower Sorbian: Exploring Training and Novel Transfer Methods on a Low-Resource Language [arXiv]
Optimal Word Segmentation for Neural Machine Translation into Dravidian Languages [aclweb][Notes]
Improving Low-Resource NMT through Relevance Based Linguistic Features Incorporation [aclweb]
MuRIL: Multilingual Representations for Indian Languages [arXiv]
2020
Leveraging Monolingual Data with Self-Supervision for Multilingual Neural Machine Translation, Siddhant et al. [arXiv]
AI4Bharat-IndicNLP Corpus: Monolingual Corpora and Word Embeddings for Indic Languages [arXiv]
IndicNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages [aclweb]
Semi-Supervised Low-Resource Style Transfer of Indonesian Informal to Formal Language with Iterative Forward-Translation [arXiv]
2019
The Missing Ingredient in Zero-Shot Neural Machine Translation, Arivazhagan et al. [arXiv]
Sanskrit Sandhi Splitting using seq2(seq)² [arXiv][Notes]