
OmegaT note: dictionary

Summary

  1. Dictionaries are still important for translators
  2. There are several known dictionary file formats
  3. OmegaT supports some popular dictionary formats
  4. There are a few online sources of dictionary data
  5. There are some online services to look up words
  6. OmegaT has a dictionary lookup feature, but there is room for improvement

Background

Dictionary for translation professionals

As a professional translator, you may use several types of dictionaries daily:

  • Hardcover and paperback dictionary books
  • Electronic handheld translators
  • Translation software
  • Smartphone applications (iPhone, Android)

A dictionary is as important to a translator as a law library is to a lawyer or medical references are to a doctor.
In this note, I discuss the current state of dictionary data and how OmegaT handles dictionaries, along with its issues and possible solutions.

Does a good translator need a dictionary?

The answer is certainly yes, and that is why OmegaT has a feature
to look up dictionary terms and display them automatically.

Even though current AI translators produce better candidate translations,
professional translators still post-edit them using deep knowledge and
good references.

reference:

https://www.tomedes.com/translator-hub/does-good-translator-need-dictionary.php

Data format for dictionary

There are many known dictionary formats. Many of them are proprietary data formats. Some are open data formats, whose specifications are published and freely accessible.

EPWING is a Japanese national standard dictionary/ebook format.
DictZip is a compression format, not a dictionary format, but it is
openly specified: it extends the gzip format (RFC 1952) with an index for random access.

Other formats are proprietary but have openly published specifications.

XDXF is an openly specified format intended for exchanging dictionary data.

OmegaT can use dictionary data in several formats.

Standard support

  • ABBYY Lingvo DSL format (.dsl)
  • StarDict format (.ifo + .idx + .dict)
  • DictZip compression for the above two formats (.dsl.dz, .dict.dz)
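Because a DictZip file is also a valid gzip file, a tool that does not implement DictZip's random-access extension can still read it sequentially. The Python sketch below relies on exactly that. It uses the standard `gzip` module and decompresses everything up to the requested offset, so it gives up the speed benefit of DictZip's chunk index. The function name `read_dz_slice` is hypothetical and is not part of OmegaT.

```python
import gzip


def read_dz_slice(source, offset: int, size: int) -> bytes:
    """Return `size` uncompressed bytes starting at `offset` in a .dz file.

    `source` may be a path or a binary file object. A DictZip file is also a
    valid gzip file, so the standard gzip module can read it. This sketch
    ignores DictZip's random-access chunk index: GzipFile.seek() decompresses
    everything before `offset`, which is slow for large dictionaries.
    """
    with gzip.open(source, "rb") as f:
        f.seek(offset)
        return f.read(size)
```

For a StarDict dictionary, `offset` and `size` would come from the `.idx` entry of a headword, for example `read_dz_slice("example.dict.dz", offset, size)` (the file name is only an example).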

Supported by plugins

  • EPWING and compressed EPWING formats
  • MDict format (*.mdx)
  • PDIC/Unicode format (*.dic)

Known but not supported

  • Babylon BGL (*.bgl)
  • Apple dictionary
  • XDXF (.xdxf)
  • Lingoes (.ld2)

Online dictionary resources

Lingvo DSL dictionary

Dictionary Specification Language (DSL) is a widely used format introduced by ABBYY Lingvo. The DSL format is open, but it is not well documented.

A DSL file is a plain text file that you can write in the Notepad application on MS Windows and save with Unicode (UTF-16LE) encoding. This is why OmegaT supports DSL as a standard format.
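A small parser can show how a DSL file is laid out. The Python sketch below is a simplified reading of commonly seen DSL files, not OmegaT's implementation. It assumes the following: header lines start with `#` (such as `#NAME`), headword lines start at the first column, article lines are indented with a tab or space, and markup appears in square brackets (such as `[m1]` or `[trn]`). It ignores escaped brackets, `{{comments}}` and other DSL features. `parse_dsl` and `load_dsl` are hypothetical names.

```python
import re
from typing import Dict, List, Tuple

# DSL markup tags such as [m1], [trn], [/trn]; escaped brackets are not handled.
TAG_RE = re.compile(r"\[[^\[\]]*\]")


def parse_dsl(text: str) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Split DSL text into header fields and {headword: article lines}.

    Simplified: consecutive headword lines share one article, and no
    headword is assumed to start with '#'.
    """
    headers: Dict[str, str] = {}
    articles: Dict[str, List[str]] = {}
    pending: List[str] = []  # headwords waiting for their article body
    in_body = False
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines separate entries
        if line.startswith("#"):
            # Header line, e.g. #NAME "My Dictionary"
            key, _, value = line[1:].partition(" ")
            headers[key] = value.strip().strip('"')
        elif line[0] in " \t":
            # Indented line: article text for the pending headwords.
            body = TAG_RE.sub("", line).strip()
            for word in pending:
                articles.setdefault(word, []).append(body)
            in_body = True
        else:
            # Line starting at column 0: a headword.
            if in_body:
                pending = []  # previous article is finished
                in_body = False
            pending.append(line.strip())
    return headers, articles


def load_dsl(path: str) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    # The "utf-16" codec reads the byte-order mark Notepad writes and drops it.
    with open(path, encoding="utf-16") as f:
        return parse_dsl(f.read())
```

For example, the headword line `cat` followed by the indented line `[m1][trn]gato[/trn][/m1]` yields the article text `gato`.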

StarDict dictionary

The StarDict project website shares links to dictionary resources.

http://stardict.net/stardict/

StarDict provides stardict-tools, which convert data in other formats into the StarDict format.
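The following minimal Python sketch shows how the three StarDict files fit together. It is based on the published StarDict file format description:

  • The `.ifo` file is a text header made of `key=value` lines.
  • Each `.idx` entry is a NUL-terminated UTF-8 headword followed by a big-endian offset and size.
  • The `.dict` file holds the article data at that offset.

The sketch assumes 32-bit offsets by default and a `sametypesequence` such as `m` (plain text), so articles carry no per-field type markers. A `.dict.dz` file would need to be decompressed first. All function names are hypothetical.

```python
import struct
from typing import Dict, Optional, Tuple

Index = Dict[str, Tuple[int, int]]


def parse_ifo(text: str) -> Dict[str, str]:
    """Parse the .ifo header: a magic first line, then key=value lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "StarDict's dict ifo file":
        raise ValueError("not a StarDict .ifo file")
    info: Dict[str, str] = {}
    for line in lines[1:]:
        key, sep, value = line.partition("=")
        if sep:
            info[key.strip()] = value.strip()
    return info


def parse_idx(data: bytes, offset_bits: int = 32) -> Index:
    """Map each headword to (offset, size) in the .dict file.

    Each entry: NUL-terminated UTF-8 word, big-endian offset (32-bit, or
    64-bit when idxoffsetbits=64), then a big-endian 32-bit size.
    Duplicate headwords keep only the last entry in this sketch.
    """
    fmt = ">QI" if offset_bits == 64 else ">II"
    entry_size = struct.calcsize(fmt)
    index: Index = {}
    pos = 0
    while pos < len(data):
        end = data.index(b"\0", pos)
        word = data[pos:end].decode("utf-8")
        index[word] = struct.unpack_from(fmt, data, end + 1)
        pos = end + 1 + entry_size
    return index


def lookup(word: str, index: Index, dict_data: bytes) -> Optional[str]:
    """Return the article text, assuming sametypesequence=m (plain UTF-8)."""
    entry = index.get(word)
    if entry is None:
        return None
    offset, size = entry
    return dict_data[offset:offset + size].decode("utf-8")


def load_stardict(base: str) -> Tuple[Dict[str, str], Index, bytes]:
    """Read base.ifo, base.idx and an uncompressed base.dict."""
    with open(base + ".ifo", encoding="utf-8") as f:
        info = parse_ifo(f.read())
    with open(base + ".idx", "rb") as f:
        index = parse_idx(f.read(), int(info.get("idxoffsetbits", "32")))
    with open(base + ".dict", "rb") as f:
        dict_data = f.read()
    return info, index, dict_data
```

Because `.idx` entries are sorted, a real implementation could binary-search the index instead of loading it into a dictionary. The dictionary is used here only to keep the sketch short.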

MDict

The MDict website provides links to dictionary resources.

https://www.mdict.cn/wp/?page_id=5325&lang=en

There are several free dictionaries, such as EDICT and WordNet.

Online dictionary services

There are many online services for looking up words.

Web APIs for dictionaries

Some well-known dictionary publishers provide Web APIs.

There is ongoing work to support them in OmegaT, but it has not been merged yet.

OmegaT dictionary feature

OmegaT supports some dictionary formats as standard, and several more through plugins.
It has an option, enabled by default, to automatically look up every word in the source text segment. The GUI has a Dictionary pane that displays every matched term and its article.

It also has an option, disabled by default, to look up words with stemming and predictive search.
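The automatic lookup described above can be illustrated with a toy model. In this Python sketch, every word of the source segment is looked up. With `stemming` enabled, it also tries a crude suffix-stripped form of each word. With `predictive` enabled, it also returns headwords that start with the word. The suffix list, tokenizer and function names are illustrative assumptions, not the stemmer or search code that OmegaT actually uses.

```python
import bisect
import re
from typing import Dict, List

WORD_RE = re.compile(r"\w+")  # crude tokenizer (assumption)
TOY_SUFFIXES = ("ing", "ed", "es", "s")  # toy stemmer, illustration only


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in TOY_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lookup_segment(segment: str, dictionary: Dict[str, str],
                   stemming: bool = False,
                   predictive: bool = False) -> Dict[str, List[str]]:
    """Look up every word of a segment; return {word: matching headwords}.

    `dictionary` keys are assumed to be lowercase headwords.
    """
    headwords = sorted(dictionary)
    results: Dict[str, List[str]] = {}
    # dict.fromkeys removes duplicate words while keeping their order.
    for token in dict.fromkeys(w.lower() for w in WORD_RE.findall(segment)):
        candidates = {token}
        if stemming:
            candidates.add(naive_stem(token))
        found = set()
        for cand in candidates:
            if cand in dictionary:
                found.add(cand)  # exact match
            if predictive:
                # Headwords sharing the prefix `cand` are contiguous when sorted.
                i = bisect.bisect_left(headwords, cand)
                while i < len(headwords) and headwords[i].startswith(cand):
                    found.add(headwords[i])
                    i += 1
        if found:
            results[token] = sorted(found)
    return results
```

With `stemming=True`, "Translators" matches the headword "translator". However, "dictionaries" becomes "dictionari" and still misses "dictionary". Real stemmers handle such cases, so this toy version only illustrates the idea.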

OmegaT performance when querying and displaying terms

The number of terms can grow tremendously when using multiple professional dictionary files.

If a segment has 40 words and you use a monolingual dictionary and a multilingual dictionary, each with 3 entries per word, then 40 x 2 x 3 = 240 articles are displayed. If each article has over 100 words, OmegaT displays over 24,000 words. This causes performance issues in the GUI component that displays the formatted articles.
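The estimate in the paragraph above can be written as a small formula:

  • articles = words in segment × dictionaries × entries per word
  • displayed words = articles × words per article

This Python sketch just encodes that arithmetic. Like the example in the text, it assumes every word has the same number of entries in every dictionary. The function name is hypothetical.

```python
from typing import Tuple


def estimate_pane_load(segment_words: int, dictionaries: int,
                       entries_per_word: int,
                       words_per_article: int) -> Tuple[int, int]:
    """Return (articles, displayed words) for one segment.

    Assumes every word has `entries_per_word` entries in every dictionary,
    as in the example in the text.
    """
    articles = segment_words * dictionaries * entries_per_word
    return articles, articles * words_per_article
```

`estimate_pane_load(40, 2, 3, 100)` returns `(240, 24000)`. Using five dictionaries of the same kind raises this to `(600, 60000)`, which shows how quickly the load grows with each added dictionary.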