A (very) brief introduction to Natural Language Processing

NLP
Computational reading, deciphering and "understanding" of human languages through machine learning

Natural Language Processing (NLP) covers a broad range of techniques that apply computational analytical methods to textual content, which provide means of categorizing and quantifying text.

(Saldaña 2018)

Tokenization:
splitting text into meaningful elements
(Arnold and Tilton 2015)
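
As a rough illustration of tokenization, the sketch below splits a sentence into lowercase word tokens with a regular expression. The function name `tokenize` and the pattern are assumptions made for this example, not the method of any particular NLP package; real tokenizers handle punctuation, hyphens and language-specific rules far more carefully.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens.

    Illustrative pattern only: runs of letters/digits, optionally
    followed by an apostrophe and more letters (e.g. "doesn't").
    """
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())

tokens = tokenize("He returns to New York; he doesn't stay long.")
print(tokens)
```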

Lemmatization:
"using the 'dictionary form' of words, versus whatever inflected form is actually present in the text"
Quinn Dombrowski
https://github.com/multilingual-dh/nlp-resources
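
A minimal sketch of the idea, assuming a tiny hand-made lookup table (`LEMMAS` is hypothetical and covers only a few English forms). Actual lemmatizers rely on large lexicons and usually on part-of-speech information as well.

```python
# Hypothetical mini-dictionary mapping inflected forms to dictionary forms.
LEMMAS = {
    "was": "be",
    "were": "be",
    "ran": "run",
    "running": "run",
    "mice": "mouse",
}

def lemmatize(tokens):
    """Replace each token with its dictionary form when one is known.

    Unknown tokens are returned unchanged.
    """
    return [LEMMAS.get(token, token) for token in tokens]

print(lemmatize(["the", "mice", "were", "running"]))
```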

Named entity recognition (NER):
determining parts in a text or a corpus that can be associated with and categorized into predefined groups (e.g. people, places, organizations)
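
To make the "predefined groups" idea concrete, here is a deliberately simple sketch that looks up names from a small list (a gazetteer) and reports their category. The `GAZETTEER` entries and the function `find_entities` are invented for illustration. This is a plain lookup, not the statistical approach used by tools such as the Stanford Named Entity Recognizer.

```python
import re

# Hypothetical gazetteer: known names and the group each belongs to.
GAZETTEER = {
    "Ada Lovelace": "PERSON",
    "London": "PLACE",
    "Royal Society": "ORGANIZATION",
}

def find_entities(text):
    """Return (name, label) pairs for gazetteer names found in text,
    ordered by where they appear."""
    found = []
    for name, label in GAZETTEER.items():
        for match in re.finditer(re.escape(name), text):
            found.append((match.start(), name, label))
    return [(name, label) for _, name, label in sorted(found)]

print(find_entities("Ada Lovelace lived in London."))
```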

Sentiment analysis:
using natural language processing, text analysis, and computational linguistics to identify, extract, quantify, and study affective states and subjective information.
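
One of the simplest ways to quantify sentiment is to count words from positive and negative word lists. The sketch below assumes two very small, made-up lists (`POSITIVE`, `NEGATIVE`); real sentiment tools use much larger lexicons or trained models and deal with negation, irony and context.

```python
import re

# Hypothetical mini-lexicons, for illustration only.
POSITIVE = {"good", "great", "excellent", "love", "wonderful"}
NEGATIVE = {"bad", "terrible", "awful", "hate", "boring"}

def sentiment_score(text):
    """Return (number of positive words) minus (number of negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives

print(sentiment_score("A wonderful film with a great cast"))
print(sentiment_score("Boring and terrible"))
```

A score above zero suggests a mostly positive text, below zero a mostly negative one, and zero either a neutral text or one the word lists do not cover.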

What are NLP packages and applications useful for?

• Keyword extraction
• Named entity recognition
• Clustering documents in a corpus
• Comparing document(s) to a reference corpus
• Topic modelling
• Sentiment / opinion analysis
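
As an example of the first item, keyword extraction can be approximated by counting the most frequent words once very common words are set aside. This is only a sketch: the `STOPWORDS` list is a short, hypothetical sample, and NLP packages typically use more refined weighting than raw counts.

```python
import re
from collections import Counter

# Hypothetical, very short stopword list.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "that"}

def keywords(text, n=3):
    """Return the n most frequent words in text, ignoring stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(word for word in words if word not in STOPWORDS)
    return [word for word, _ in counts.most_common(n)]

sample = "The gang and the city: the gang rules the city, and the gang falls."
print(keywords(sample, n=2))
```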

NLP packages and applications

Stanford NLP / Named Entity Recognizer

https://nlp.stanford.edu/software/CRF-NER.html

**Task:** try the Stanford Named Entity Recognizer online:
http://corenlp.run/

1. Open the "Once_Upon_a_Time_in_America" plain text file from the "Wikipedia Movie Summaries" dataset in a text editor.
2. Copy the text and paste it into the NER text box.
3. Click "Submit".
4. Scroll down in your browser to see the parts of speech and named entities (e.g. people, places) that were identified. Does it work?

Wordseer

Wordseer is a Java-based text analysis environment that combines NLP and visualization.
https://wordseer.berkeley.edu/

Other NLP packages

The Classical Language Toolkit:
http://cltk.org/
OpeNER, Open Source Named Entity Recognition:
https://www.opener-project.eu/

Multilingual NLP

Multilingual NLP Resources by Quinn Dombrowski:
https://github.com/multilingual-dh/nlp-resources
NLP Resources for Nigerian Languages:
https://orikiwa.wordpress.com/nlp-resources-for-nigerian-languages/
A Vietnamese NLP toolkit:
https://github.com/vncorenlp/VnCoreNLP
