# A (very) brief introduction to Natural Language Processing

**NLP:** computational reading, deciphering and "understanding" of human languages through machine learning

> Natural Language Processing (NLP) covers a broad range of techniques that apply computational analytical methods to textual content, which provide means of categorizing and quantifying text. ([Saldaña 2018](https://programminghistorian.org/en/lessons/sentiment-analysis))

**Tokenization:** splitting text into meaningful elements (Arnold and Tilton 2015)

**Lemmatization:** "using the 'dictionary form' of words, versus whatever inflected form is actually present in the text" (Quinn Dombrowski, https://github.com/multilingual-dh/nlp-resources)

**Named entity recognition (NER):** determining parts in a text or a corpus that can be associated with and categorized into predefined groups (e.g. people, places, organizations)

**Sentiment analysis:** using natural language processing, text analysis and computational linguistics to identify, extract, quantify, and study affective states and subjective information.

### What are NLP packages and applications useful for?

* Keyword extraction
* Named entity recognition
* Clustering documents in a corpus
* Comparing document(s) to a reference corpus
* Topic modelling
* Sentiment / opinion analysis

### NLP packages and applications

#### Stanford NLP / Named Entity Recognizer

https://nlp.stanford.edu/software/CRF-NER.html

![](https://i.imgur.com/vzhm6cT.png)

**Task:** try the Stanford Named Entity Recognizer online: http://corenlp.run/

* Open the "Once_Upon_a_Time_in_America" plain text file from the "Wikipedia Movie Summaries" dataset in a text editor
* Copy the text and paste it into NER's text box
* Click "Submit"
* Scroll down in your browser to identify parts of speech and named entities (e.g. people, places). Does it work?

#### Wordseer

Wordseer is a Java-based text analysis environment that combines NLP and visualization.

https://wordseer.berkeley.edu/

![](https://i.imgur.com/lP2TNrG.png)

#### Other NLP packages

* The Classical Language Toolkit: http://cltk.org/
* OpeNER, Open Source Named Entity Recognition: https://www.opener-project.eu/

### Multilingual NLP

* Multilingual NLP Resources by Quinn Dombrowski: https://github.com/multilingual-dh/nlp-resources
* NLP Resources for Nigerian Languages: https://orikiwa.wordpress.com/nlp-resources-for-nigerian-languages/
* A Vietnamese NLP toolkit: https://github.com/vncorenlp/VnCoreNLP
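
### Trying tokenization and keyword extraction in Python

To make two of the terms defined at the top of this page more concrete, here is a minimal sketch of tokenization and frequency-based keyword extraction using only the Python standard library. The function names (`tokenize`, `top_keywords`), the tiny stop-word list and the sample sentence are assumptions made up for illustration and are not taken from any of the packages listed on this page. The regex tokenizer is deliberately naive: it splits contractions such as "don't" into two tokens and does no lemmatization, so dedicated NLP packages will usually give better results.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def tokenize(text):
    """Split text into lowercase word tokens with a naive regex."""
    # \w+ matches runs of letters, digits and underscores (Unicode-aware).
    return re.findall(r"\w+", text.lower())


def top_keywords(text, n=3):
    """Return the n most frequent tokens that are not stop words."""
    tokens = [t for t in tokenize(text) if t not in STOP_WORDS]
    return Counter(tokens).most_common(n)


sample = "The cat sat on the mat. The cat is happy."
print(tokenize(sample))
print(top_keywords(sample, n=1))
```

To experiment, replace `sample` with the text of a plain text file (for example one of the movie summaries used in the task above) and compare the most frequent words with and without the stop-word filter.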