# A (very) brief introduction to Natural Language Processing
**NLP**
Computational reading, deciphering and "understanding" of human languages through machine learning
> Natural Language Processing (NLP) covers a broad range of techniques that apply computational analytical methods to textual content, which provide means of categorizing and quantifying text.
([Saldaña 2018](https://programminghistorian.org/en/lessons/sentiment-analysis))
**Tokenization:**
splitting text into meaningful elements
(Arnold and Tilton 2015)
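
As a rough illustration of tokenization, the sketch below splits a sentence into lowercase word tokens with a regular expression. The function name `tokenize` and the sample sentence are made up for this example; dedicated NLP packages use far more careful, language-specific rules.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens with a simple regular expression."""
    # \w+ matches a run of letters, digits or underscores;
    # the optional group keeps contractions such as "doesn't" together.
    return re.findall(r"\w+(?:'\w+)*", text.lower())

tokens = tokenize("The children were reading books; the teacher doesn't mind.")
print(tokens)
```

Punctuation is dropped and every word becomes a separate element that later steps (counting, lemmatizing, tagging) can work with.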
**Lemmatization:**
> using the "dictionary form" of words, versus whatever inflected form is actually present in the text
([Quinn Dombrowski](https://github.com/multilingual-dh/nlp-resources))
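
A minimal sketch of the idea, assuming a tiny hand-made lookup table (`LEMMAS` is hypothetical and only covers the sample words). Real lemmatizers rely on large dictionaries and on the grammatical context of each word.

```python
# Hypothetical, hand-made lookup table: inflected form -> dictionary form
LEMMAS = {
    "is": "be", "was": "be", "were": "be",
    "children": "child", "books": "book",
    "reading": "read", "ran": "run",
}

def lemmatize(tokens):
    """Replace each token with its dictionary form when known; keep it unchanged otherwise."""
    return [LEMMAS.get(token, token) for token in tokens]

print(lemmatize(["the", "children", "were", "reading", "books"]))
# ['the', 'child', 'be', 'read', 'book']
```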
**Named entity recognition (NER):**
identifying the parts of a text or corpus that can be assigned to predefined categories (e.g. people, places, organizations)
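
The simplest way to picture NER is a lookup against a list of known names (a "gazetteer"). The sketch below does only that; the `GAZETTEER` entries and the sample sentence are invented for illustration. Tools such as the Stanford Named Entity Recognizer use trained statistical models instead, so they can also recognize names they have never seen.

```python
import re

# Hypothetical gazetteer: known names mapped to a predefined category
GAZETTEER = {
    "Ada Lovelace": "PERSON",
    "London": "PLACE",
    "Royal Society": "ORGANIZATION",
}

def find_entities(text):
    """Return (name, category, start offset) for each gazetteer entry found in text."""
    found = []
    for name, category in GAZETTEER.items():
        # \b ensures whole-word matches; re.escape protects special characters
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((name, category, match.start()))
    # Sort by position so entities appear in reading order
    return sorted(found, key=lambda item: item[2])

# Made-up sample sentence
sample = "A letter from Ada Lovelace in London mentions the Royal Society."
for name, category, start in find_entities(sample):
    print(f"{start:3d}  {category:<12}  {name}")
```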
**Sentiment analysis:**
using natural language processing, text analysis, and computational linguistics to identify, extract, quantify, and study affective states and subjective information.
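
One common family of approaches scores a text against a lexicon of words with known polarity. The following is a minimal sketch under that assumption; the `LEXICON` values are invented for illustration, and real sentiment lexicons contain thousands of scored entries and also handle negation and intensifiers.

```python
import re

# Hypothetical mini-lexicon: word -> polarity score (illustrative values only)
LEXICON = {
    "good": 1, "great": 2, "wonderful": 2,
    "bad": -1, "boring": -1, "terrible": -2,
}

def sentiment_score(text):
    """Sum the polarity scores of all lexicon words that occur in text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(token, 0) for token in tokens)

def label(score):
    """Turn a numeric score into a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for review in ["A great film with a wonderful score.",
               "The plot was boring and the ending terrible."]:
    score = sentiment_score(review)
    print(score, label(score), review)
```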
### What are NLP packages and applications useful for?
* Keyword extraction
* Named entity recognition
* Clustering documents in a corpus
* Comparing document(s) to a reference corpus
* Topic modelling
* Sentiment / opinion analysis
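
To make two of these uses concrete (keyword extraction and comparing a document to a reference corpus), here is a minimal TF-IDF sketch in plain Python. The three toy "documents" and the function name `tfidf_keywords` are assumptions made for this example; TF-IDF is only one of several possible weighting schemes.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep alphabetic word tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def tfidf_keywords(documents, index, top_n=3):
    """Rank the words of documents[index] by TF-IDF against the whole collection."""
    tokenized = [tokenize(doc) for doc in documents]
    # Document frequency: in how many documents does each word occur?
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    counts = Counter(tokenized[index])
    total = len(tokenized[index])
    # Term frequency times inverse document frequency
    scores = {
        word: (count / total) * math.log(len(documents) / doc_freq[word])
        for word, count in counts.items()
    }
    # Highest score first; ties are broken alphabetically
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]

# Toy corpus, invented for illustration
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]
print(tfidf_keywords(docs, 0))
```

Words that appear in every document (here "the") score zero, while words specific to one document rise to the top.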
### NLP packages and applications
#### Stanford NLP / Named Entity Recognizer
https://nlp.stanford.edu/software/CRF-NER.html

**Task:** try the Stanford Named Entity Recognizer online:
http://corenlp.run/
* Open the "Once_Upon_a_Time_in_America" plain text file from the "Wikipedia Movie Summaries" dataset in a text editor
* Copy the text and paste it into the text box
* Click "Submit"
* Scroll down in your browser to see which parts of speech and named entities (e.g. people, places) were identified. Does it work?
#### Wordseer
Wordseer is a Java-based text analysis environment that combines NLP and visualization.
https://wordseer.berkeley.edu/

#### Other NLP packages
* The Classical Language Toolkit
http://cltk.org/
* OpeNER (Open Polarity Enhanced Named Entity Recognition):
https://www.opener-project.eu/
### Multilingual NLP
* Multilingual NLP Resources by Quinn Dombrowski:
https://github.com/multilingual-dh/nlp-resources
* NLP Resources for Nigerian Languages
https://orikiwa.wordpress.com/nlp-resources-for-nigerian-languages/
* A Vietnamese NLP toolkit:
https://github.com/vncorenlp/VnCoreNLP