NLP Progress - Documenting SMC's efforts
===
Swathanthra Malayalam Computing (SMC) is a group of people working on accessibility technologies, including language computing.
NLP-Progress.com is a community-driven website that showcases NLP technologies in different languages, including datasets, code, and links to peer-reviewed research papers.
Update this document to reflect the Malayalam language computing community's efforts, so that they can be showcased on NLP-Progress.com.
## About Malayalam and its complexities
Malayalam is a language spoken in India, predominantly in the state of Kerala, with about 38 million speakers. Malayalam is a heavily agglutinative and inflected language [1]. Words are formed by morphological processes involving:

1. Inflection, where suffixes are attached to a word in a lexical category, generating a new word in the same category.
2. Derivation, where a word belonging to one category becomes another category by attaching a suffix.
3. Compounding, where a new word is formed by combining two or more nouns, a noun and an adjective, an adjective and a noun, a verb and a noun, or an adverb and a verb.
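The three processes differ mainly in what happens to the lexical category. The Python sketch below models only that difference, under simplifying assumptions: a word is just a surface form plus a category, morphemes are joined with a `+` boundary marker, and the sandhi (sound changes at morpheme boundaries) that a real analyser has to handle is ignored. The names `Word`, `inflect`, `derive` and `compound`, the placeholder morphemes such as `VERB_STEM` and `NMLZ`, and the choice that a compound takes the category of its last part are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    form: str      # surface form; '+' marks morpheme boundaries (no sandhi applied)
    category: str  # lexical category, e.g. "noun", "verb", "adjective"


def inflect(word: Word, suffix: str) -> Word:
    """Inflection: attach a suffix; the lexical category is unchanged."""
    return Word(f"{word.form}+{suffix}", word.category)


def derive(word: Word, suffix: str, new_category: str) -> Word:
    """Derivation: attach a suffix that moves the word into another category."""
    return Word(f"{word.form}+{suffix}", new_category)


def compound(*parts: Word) -> Word:
    """Compounding: join two or more words into a single word.

    Assumption: the compound takes the category of its last (rightmost) part.
    """
    if len(parts) < 2:
        raise ValueError("a compound needs at least two words")
    return Word("+".join(p.form for p in parts), parts[-1].category)


if __name__ == "__main__":
    # Plural suffix -കൾ on പൂച്ച ("cat"): the result is still a noun.
    print(inflect(Word("പൂച്ച", "noun"), "കൾ"))
    # Hypothetical placeholder morphemes (not real Malayalam) for the other processes.
    verb = Word("VERB_STEM", "verb")
    noun = derive(verb, "NMLZ", "noun")  # verb -> noun
    print(noun)
    print(inflect(noun, "PL"))           # still a noun
    print(compound(Word("ADJ_STEM", "adjective"), noun))
```

Because each function returns a new `Word`, the processes can be stacked, which mirrors how agglutination chains suffixes onto a stem; a real analyser must also undo the boundary sound changes that this sketch skips.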
### mlmorph
The Malayalam morphological analyser mlmorph is implemented using the Stuttgart Finite State Transducer (SFST) formalism, with Helsinki Finite-State Technology (HFST) as the toolkit. Evaluations show that it is fast and effectively handles the morphological and phonological nature of Malayalam. Applications such as a spellchecker, named entity recognition, and a number spellout parser and generator are also built on top of mlmorph.
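mlmorph's actual SFST rules cannot be reproduced here, but the core idea of a finite-state analyser can be sketched in Python: a transducer reads the surface word one character at a time, each arc emits some output, and reaching a final state appends morphological tags. The class `ToyTransducer`, the two-noun lexicon, and the tag strings `<n>` and `<pl>` are hypothetical and do not reflect mlmorph's lexicon or tag set. Sandhi is ignored, so the lexicon only uses stems that take the plural suffix -കൾ without any change at the boundary.

```python
from typing import Dict, Optional, Tuple


class ToyTransducer:
    """A toy deterministic finite-state transducer for morphological analysis."""

    START = 0

    def __init__(self) -> None:
        # (state, input character) -> (next state, output string)
        self.arcs: Dict[Tuple[int, str], Tuple[int, str]] = {}
        # final state -> tag string appended when the input ends in that state
        self.finals: Dict[int, str] = {}
        self.n_states = 1  # state 0 is the start state

    def new_state(self, final_tags: Optional[str] = None) -> int:
        state = self.n_states
        self.n_states += 1
        if final_tags is not None:
            self.finals[state] = final_tags
        return state

    def add_path(self, src: int, chars: str, dst: int, echo: bool) -> None:
        """Add arcs reading `chars` from `src` and ending in `dst`.

        echo=True copies each input character to the output (used for stems);
        echo=False emits nothing (used for suffixes, which are reported as tags).
        Shared prefixes reuse existing arcs, so the machine stays deterministic.
        """
        state = src
        for i, ch in enumerate(chars):
            is_last = i == len(chars) - 1
            if (state, ch) in self.arcs:
                next_state = self.arcs[(state, ch)][0]
                if is_last and next_state != dst:
                    raise ValueError(f"conflicting path for {chars!r}")
                state = next_state
                continue
            target = dst if is_last else self.new_state()
            self.arcs[(state, ch)] = (target, ch if echo else "")
            state = target

    def analyse(self, word: str) -> Optional[str]:
        """Return the analysis of `word`, or None if the word is not accepted."""
        state, output = self.START, []
        for ch in word:
            arc = self.arcs.get((state, ch))
            if arc is None:
                return None  # no arc for this character: reject
            state, emitted = arc
            output.append(emitted)
        if state not in self.finals:
            return None  # input ended in a non-final state: reject
        return "".join(output) + self.finals[state]


def build_toy_analyser() -> ToyTransducer:
    fst = ToyTransducer()
    noun_end = fst.new_state(final_tags="<n>")        # bare noun stem
    plural_end = fst.new_state(final_tags="<n><pl>")  # stem + plural suffix
    for stem in ["പൂച്ച", "കുട്ടി"]:  # "cat", "child"
        fst.add_path(ToyTransducer.START, stem, noun_end, echo=True)
    fst.add_path(noun_end, "കൾ", plural_end, echo=False)
    return fst


if __name__ == "__main__":
    fst = build_toy_analyser()
    for word in ["പൂച്ച", "പൂച്ച" + "കൾ", "കുട്ടി" + "കൾ", "മല"]:
        print(word, fst.analyse(word))
```

Analysing the plural form of പൂച്ച yields the stem followed by `<n><pl>`, while a word outside the lexicon is rejected. A real analyser additionally models sandhi and returns every possible analysis of an ambiguous word, which toolkits such as HFST support but this deterministic toy does not.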
### Spell checker
### Malayalam Named Entity Recognition
Named entity recognition (NER) is the task of tagging entities in text with their corresponding type. Approaches typically use BIO notation, which differentiates the beginning (B) and the inside (I) of entities. O is used for non-entity tokens.
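To make the BIO scheme concrete, the Python sketch below converts entity spans over a tokenised sentence into BIO tags and back. The span format `(start, end, type)` with an exclusive end, the entity types `LOC` and `PER`, and the example tokens are assumptions for illustration; no particular tag set for Malayalam NER is implied.

```python
from typing import List, Optional, Tuple

# (start token index, end token index exclusive, entity type)
Span = Tuple[int, int, str]


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode non-overlapping entity spans as one BIO tag per token."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError("overlapping spans cannot be encoded in BIO")
        tags[start] = f"B-{label}"  # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"  # remaining tokens of the entity
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode BIO tags back into (start, end, type) spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final entity
        # An open entity ends at "O", at a new "B-", or at an "I-" of another type.
        if start is not None and (tag == "O" or tag.startswith("B-") or tag[2:] != label):
            spans.append((start, i, label))
            start, label = None, None
        # "B-" starts an entity; a stray "I-" with nothing open is treated the same way.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return spans


if __name__ == "__main__":
    # "Thiruvananthapuram is the capital of Kerala"
    tokens = ["തിരുവനന്തപുരം", "കേരളത്തിന്റെ", "തലസ്ഥാനമാണ്"]
    tags = spans_to_bio(len(tokens), [(0, 1, "LOC"), (1, 2, "LOC")])
    for token, tag in zip(tokens, tags):
        print(token, tag)
    print(bio_to_spans(tags))
```

Because the two adjacent locations each begin with `B-LOC`, the decoder recovers them as separate entities rather than one merged span. The second entity is the genitive form കേരളത്തിന്റെ ("of Kerala") rather than the lemma കേരളം, a practical consequence of Malayalam's agglutinative morphology for NER.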
### Corpora
### Number spellout
### OCR
### TTS
Dhvani is a text-to-speech system designed for Indian languages. The aim of the project is to ensure that neither literacy nor knowledge of English is required to use a computer.
## References

[1] Asher, Ronald E. 2013. *Malayalam*. Routledge.