
Let's try Neural Machine Translation

Recently, Neural Machine Translation (NMT) has improved steadily and has caught the attention of professional translators.

DeepL is a well-known service that uses NMT with large-scale training. Academic research is also making tremendous progress in this area.

Fortunately, there are several open source NMT engines that we can try.

Open source NMT framework projects

OpenNMT-py

An NMT engine based on PyTorch, a machine learning library with GPU support. OpenNMT was started in 2016 by the Harvard NLP group and SYSTRAN.

https://opennmt.net/

ModernMT

ModernMT is a context-aware, incremental and distributed general-purpose Neural Machine Translation technology based on the Fairseq Transformer model.

Open source inference engine

Argos Translate

Argos Translate is an open source neural machine translation package that provides a Python library, a web app and a desktop app. It also supports training and packaging custom language models.

Argos Translate uses OpenNMT for translations, SentencePiece for tokenization, Stanza for sentence boundary detection, and PyQt for the GUI. LibreTranslate is an API and web app built on top of Argos Translate.

Argos Translate provides pretrained packages for many language pairs.

Install library

$ pip install argostranslate
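
Once the library is installed, it can be driven from Python. The following is a minimal sketch, assuming the package index is reachable over the network and that an English to Japanese package is published there (the language pair is only an example). The helper names find_package and install_and_translate are hypothetical and are not part of Argos Translate.

```python
def find_package(packages, from_code, to_code):
    """Return the first package translating from_code -> to_code, or None."""
    for pkg in packages:
        if pkg.from_code == from_code and pkg.to_code == to_code:
            return pkg
    return None


def install_and_translate(text, from_code="en", to_code="ja"):
    # Imported lazily so this file can be loaded without argostranslate installed.
    import argostranslate.package
    import argostranslate.translate

    # Refresh the remote package index and look up the requested pair.
    argostranslate.package.update_package_index()
    available = argostranslate.package.get_available_packages()
    pkg = find_package(available, from_code, to_code)
    if pkg is None:
        raise ValueError(f"no package for {from_code} -> {to_code}")

    # Download the model package and install it locally.
    argostranslate.package.install_from_path(pkg.download())
    return argostranslate.translate.translate(text, from_code, to_code)


if __name__ == "__main__":
    # Assumption: an en -> ja package exists in the index.
    print(install_and_translate("Hello, world!"))
```

The first run downloads the model, so it may take a while. Later calls can skip the install step and call argostranslate.translate.translate directly.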

Install GUI

$ pip install argostranslategui


CTranslate2

CTranslate2 is a C++ and Python library for efficient inference with Transformer models.

We can train a model ourselves with OpenNMT or download a pretrained one. We then convert the model to the CTranslate2 format and run inference (translation) with CTranslate2.

  • Supports CPU and GPU
  • Lightweight and high performance
  • Uses the MKL library for CPU execution
  • Uses cuBLAS and Thrust for GPU execution
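
The convert-then-translate workflow can be sketched in Python as follows. This is only an illustration, assuming an OpenNMT-py checkpoint has already been converted (for example with the ct2-opennmt-py-converter tool that ships with CTranslate2) into a directory called ende_ctranslate2/. That directory name, the sample tokens and the helper translate_tokens are all hypothetical.

```python
def translate_tokens(translator, tokens):
    """Translate one pre-tokenized sentence and return the best hypothesis."""
    results = translator.translate_batch([tokens])
    return results[0].hypotheses[0]


def main():
    # Imported lazily so the helper above can be used without CTranslate2 installed.
    import ctranslate2

    # Assumption: "ende_ctranslate2/" holds a model produced by the converter.
    translator = ctranslate2.Translator("ende_ctranslate2/", device="cpu")

    # Input must be tokenized the same way as the training data
    # (for example with the model's SentencePiece vocabulary).
    print(translate_tokens(translator, ["▁Hello", "▁world", "!"]))


if __name__ == "__main__":
    main()
```

CTranslate2 works on tokens rather than raw text, so tokenization and detokenization remain the caller's responsibility. Passing device="cuda" instead of "cpu" selects the GPU.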

Open source translation web server software

LibreTranslate

LibreTranslate is a Python project that provides a translation web service based on Argos Translate.

It is very easy to install and run LibreTranslate.

$ pip install libretranslate
$ libretranslate [<options>]

Then you can access the web user interface at http://localhost:5000/
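
Besides the web user interface, the same server exposes an HTTP API. Below is a minimal sketch of calling its /translate endpoint using only the Python standard library. It assumes the server is running locally on port 5000 without an API key and that the English and Japanese models are loaded. The helper names build_translate_request and translate are hypothetical.

```python
import json
import urllib.request


def build_translate_request(text, source, target, base_url="http://localhost:5000"):
    """Build a POST request for LibreTranslate's /translate endpoint."""
    payload = {"q": text, "source": source, "target": target, "format": "text"}
    return urllib.request.Request(
        base_url.rstrip("/") + "/translate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate(text, source="en", target="ja"):
    # Assumption: a local LibreTranslate server is running with both languages loaded.
    with urllib.request.urlopen(build_translate_request(text, source, target)) as resp:
        return json.load(resp)["translatedText"]


if __name__ == "__main__":
    print(translate("Hello, world!"))
```

Keeping request construction separate from the network call makes it easy to point the same code at another server by changing base_url.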

LibreTranslate is licensed under AGPL-3.0. The project is hosted on GitHub at https://github.com/LibreTranslate/LibreTranslate

Web services

ModernMT.com

Translated provides a web service based on the ModernMT library and their own trained models.

Trados and memoQ support the API.

Here is the memoQ description: https://www.memoq.com/integrations/machine-translation/modernmt

LibreTranslate.com

LibreTranslate.com is a web service based on LibreTranslate OSS by Argos Open Technologies, LLC.


Data set

The Pile Dataset

https://pile.eleuther.ai/

The Pile is an 825 GiB diverse, open source language modelling data set that consists of 22 smaller, high-quality datasets combined together.

An open source parallel corpus (OPUS)

https://opus.nlpl.eu/

OPUS is a growing collection of translated texts from the web.

OPUS-MT

OPUS is not only a data set; it also has API and tool projects. OPUS-MT and OPUS-CAT are projects led by OPUS.

OPUS-CAT

https://github.com/Helsinki-NLP/OPUS-CAT

An OmegaT plugin is available for the development version of the MT engine. It communicates with OPUS-MT running on the same machine. OPUS-CAT is licensed under the MIT license, but the plugin is GPLv3: because the plugin uses OmegaT source code, the GPL takes precedence.

KFTT

http://www.phontron.com/kftt/index-ja.html

  • Kyoto Free Translation Task
  • CC-BY-SA-3.0

Reference

NMT

Transformers