NLP
Computational reading, deciphering and "understanding" of human languages through machine learning
Natural Language Processing (NLP) covers a broad range of techniques that apply computational methods to textual content, providing ways to categorize and quantify text.
Tokenization:
splitting text into meaningful elements
(Arnold and Tilton 2015)
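To make the idea of tokenization concrete, here is a minimal sketch using only Python's standard `re` module. The function name `tokenize` and the regular expression are illustrative assumptions, not the method of any particular NLP library; real tokenizers handle many more cases (URLs, hyphenation, languages without spaces between words).

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens and single punctuation marks.

    Hypothetical helper for illustration only.
    """
    # \w+(?:'\w+)* keeps contractions such as "don't" in one piece;
    # [^\w\s] emits every other non-space character as its own token.
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", text.lower())

print(tokenize("Don't panic: the text is split into tokens."))
```

Which elements count as "meaningful" depends on the research question: the same text could just as well be split into sentences, characters, or word pairs.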
Lemmatization:
"using the 'dictionary form' of words, versus whatever inflected form is actually present in the text"
Quinn Dombrowski
https://github.com/multilingual-dh/nlp-resources
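As a rough illustration of lemmatization, the sketch below maps inflected forms to dictionary forms with a tiny hand-written lookup table. The table `LEMMAS` and the function `lemmatize` are made up for this example; real lemmatizers rely on large lexicons or trained models and usually take part of speech into account.

```python
# Toy lookup table from inflected form to dictionary form (illustrative only).
LEMMAS = {
    "is": "be", "was": "be", "were": "be",
    "mice": "mouse",
    "ran": "run", "running": "run",
}

def lemmatize(tokens, lemmas=LEMMAS):
    """Replace each token with its dictionary form when the table knows it."""
    result = []
    for token in tokens:
        lowered = token.lower()
        # Unknown words are passed through unchanged (apart from lowercasing).
        result.append(lemmas.get(lowered, lowered))
    return result

print(lemmatize(["The", "mice", "were", "running"]))
```

After this step, counts for "was", "were" and "is" are all grouped under "be", which is often what a word-frequency comparison needs.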
Named entity recognition (NER):
identifying the parts of a text or corpus that can be assigned to predefined categories (e.g. people, places, organizations)
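The simplest way to see what NER produces is a gazetteer lookup, sketched below: every name in a hand-made list is searched for in the text and labelled with its category. This is a deliberately simplified stand-in, not how statistical recognizers work; those learn from annotated examples and can find names that appear in no list. The names `GAZETTEER` and `find_entities` and the example entries are assumptions for illustration.

```python
# Hand-made name lists per category (illustrative only).
GAZETTEER = {
    "PERSON": ["Ada Lovelace", "Charles Babbage"],
    "LOCATION": ["London"],
    "ORGANIZATION": ["Royal Society"],
}

def find_entities(text, gazetteer=GAZETTEER):
    """Return (start, end, surface form, label) tuples sorted by position."""
    found = []
    for label, names in gazetteer.items():
        for name in names:
            start = text.find(name)
            # Keep searching so repeated mentions are all reported.
            while start != -1:
                found.append((start, start + len(name), name, label))
                start = text.find(name, start + 1)
    return sorted(found)

sentence = "Ada Lovelace met Charles Babbage in London."
for start, end, surface, label in find_entities(sentence):
    print(f"{surface} [{label}] at {start}-{end}")
```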
Sentiment analysis:
using natural language processing, text analysis, and computational linguistics to identify, extract, quantify, and study affective states and subjective information.
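One common entry point to sentiment analysis is a word-list (lexicon) approach, sketched below under strong simplifying assumptions: the two tiny word sets are invented for this example, and negation ("not good"), irony, and context are ignored entirely. The function name `sentiment_score` is hypothetical.

```python
import re

# Tiny invented sentiment lexicons (illustrative only).
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "poor", "sad"}

def sentiment_score(text):
    """Return a score between -1.0 (all negative) and 1.0 (all positive).

    Texts without any lexicon word get 0.0.
    """
    words = re.findall(r"[a-z']+", text.lower())
    positive = sum(word in POSITIVE for word in words)
    negative = sum(word in NEGATIVE for word in words)
    total = positive + negative
    if total == 0:
        return 0.0
    return (positive - negative) / total

print(sentiment_score("I love this great book"))
print(sentiment_score("A terrible, sad ending"))
```

Because the score depends entirely on the word lists, lexicons built for modern product reviews often perform poorly on historical or literary texts.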
• Keyword extraction
• Named entity recognition
• Clustering documents in a corpus
• Comparing document(s) to a reference corpus
• Topic modelling
• Sentiment / opinion analysis
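Two of the tasks listed above, keyword extraction and comparison against a reference corpus, can be combined in a small sketch: words count as keywords when they are relatively more frequent in the document than in the reference text. The function names, the add-one smoothing, and the example strings are assumptions chosen for illustration; established tools typically use measures such as log-likelihood or TF-IDF.

```python
import re
from collections import Counter

def word_counts(text):
    """Count lowercase alphabetic words in a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def keywords(document, reference, top_n=3):
    """Rank the document's words by how over-represented they are
    compared with the reference text."""
    doc, ref = word_counts(document), word_counts(reference)
    doc_total, ref_total = sum(doc.values()), sum(ref.values())
    vocab = len(set(doc) | set(ref))

    def ratio(word):
        # Add-one smoothing avoids division by zero for words
        # that never occur in the reference text.
        p_doc = (doc[word] + 1) / (doc_total + vocab)
        p_ref = (ref[word] + 1) / (ref_total + vocab)
        return p_doc / p_ref

    # Highest ratio first; ties are broken alphabetically.
    return sorted(doc, key=lambda w: (-ratio(w), w))[:top_n]

document = "the whale the whale the sea"
reference = "the cat sat on the mat the dog"
print(keywords(document, reference, top_n=2))
```

Common words such as "the" are frequent in both texts, so they rank low without needing an explicit stop-word list.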
https://nlp.stanford.edu/software/CRF-NER.html
**Task:** try the Stanford Named Entity Recognizer online:
http://corenlp.run/
WordSeer is a Java-based text analysis environment that combines NLP and visualization.
https://wordseer.berkeley.edu/