# OmegaT note: dictionary

## Summary

1. Dictionaries are still important for translators
2. There are several known dictionary file formats
3. OmegaT supports some popular dictionary formats
4. There are a few online sources of dictionary data
5. There are some online services for looking up words
6. OmegaT has a dictionary lookup feature, but there is room for improvement

## Background

### Dictionary for translation professionals

When you work as a professional translator, you may use several types of dictionary every day.

- Hardcover/paperback dictionary books
- Electronic handheld translators
- Translation software
- Smartphone applications (iPhone, Android)

A dictionary is as important to a translator as a law library is to a lawyer or a medical reference is to a doctor.

This note discusses the current state of dictionary data, how OmegaT currently handles dictionaries, the issues involved, and possible solutions.

### Does a good translator need a dictionary?

The answer is certainly yes, which is why OmegaT has a feature that looks up dictionary terms and displays them automatically.
Even though current AI translators produce better candidate translations than before, a professional translator post-edits them using deep knowledge and good references.

reference: https://www.tomedes.com/translator-hub/does-good-translator-need-dictionary.php

### Data format for dictionary

There are many known dictionary formats. Many of them are proprietary data formats.
Some are open data formats, meaning the format is defined openly and its specification is freely accessible.

EPWING is a Japanese national standard dictionary/ebook format.
DictZip is a compression specification rather than a dictionary standard, but it is an Internet standard.

Other formats are proprietary but have openly published specifications.
XDXF is an open specification intended for exchanging dictionary data.

OmegaT can use several dictionary data formats.

Standard support

- ABBYY Lingvo DSL format (.dsl)
- StarDict format (.ifo + .dict + .idx)
- DictZip compression for the above two formats (.dsl.dz, .dict.dz)

Supported by plugins

- EPWING and compressed EPWING format
- MDict format (*.mdx)
- PDIC/Unicode format (*.dic)

Known but not supported

- Babylon BGL (*.bgl)
- Apple dictionary
- XDXF (.xdxf)
- Lingoes (.ld2)

### Online dictionary resources

#### LingvoDSL dictionary

Dictionary Specification Language (DSL) is a widely used format that was introduced by ABBYY Lingvo.
The DSL format is open, but not well documented.
A DSL file is a plain text file that you can write in the Notepad application on MS Windows and save with Unicode (UTF-16LE) encoding.
This is why OmegaT supports DSL as a standard format.

#### StarDict dictionary

The StarDict project web site shares a link to dictionary resources.

http://stardict.net/stardict/

StarDict provides stardict-tools, which converts data in other formats into the StarDict format.

#### MDict

The MDict web site provides a link to dictionary resources.

https://www.mdict.cn/wp/?page_id=5325&lang=en

There are several free dictionaries, such as EDICT and WordNet.

### Online dictionaries service

There are many online services for looking up words.

- Longman Dictionary of Contemporary English Online
  https://www.ldoceonline.com/
- ABBYY Lingvo Live - Lingvo and Collins dictionaries
  https://www.lingvolive.com/en-us
- Merriam-Webster dictionary
  https://www.merriam-webster.com/
- MULTITRAN service
  https://www.multitran.com/

### Web API for dictionary

Some well-known dictionary publishers provide a Web API.

- Oxford Dictionaries API
  https://developer.oxforddictionaries.com/
- Merriam-Webster dictionary API
  https://dictionaryapi.com/

There is an effort to support these APIs in OmegaT, but it has not been merged yet.

## OmegaT dictionary feature

OmegaT supports some dictionary types as standard formats, and several more formats are supported by plugins.
It has an option to automatically look up every word in the source text segment, which is on by default.
The GUI has a dictionary pane that displays every term found and its articles.
It also has an option to look up words with stemming and predictive search, which is off by default.

## OmegaT performance to query and display terms

The number of terms can grow tremendously when multiple professional dictionary files are in use.
If a segment has 40 words and you use a monolingual dictionary and a multilingual dictionary, each of which has 3 entries per word,
then 40 x 2 x 3 = 240 articles are displayed.
When each article has over 100 words, OmegaT will display more than 24,000 words.
This causes a performance issue in the GUI component that displays the formatted articles.
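
The estimate above is a plain multiplication, and it can be sketched in a few lines of Python to try other segment lengths or dictionary counts. The function name `estimate_display_load` and its parameter names are hypothetical and chosen only for illustration; they are not part of OmegaT. Working the example through gives 240 articles and 24,000 words.

```python
def estimate_display_load(words_in_segment, dictionaries,
                          entries_per_word, words_per_article):
    """Rough upper bound on what the dictionary pane has to render.

    Hypothetical helper for illustration only; it assumes every word in
    the segment is found in every dictionary with the same entry count.
    """
    # One article per (word, dictionary, entry) combination.
    articles = words_in_segment * dictionaries * entries_per_word
    # Total words of article text to format and display.
    displayed_words = articles * words_per_article
    return articles, displayed_words


# The example from the text: 40 words, 2 dictionaries, 3 entries each,
# and 100 words per article.
articles, displayed_words = estimate_display_load(40, 2, 3, 100)
print(articles, displayed_words)
```

Because the factors multiply, adding one more dictionary with the same coverage raises the total by half again in this example, which is why the load grows quickly with professional dictionary sets.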