NLP CPSC-298 Planning
(Dan Haub and Alexander Kurz … previous notes and links)
Dan:
- Rehearsal Schedule (busy during these times)
- 1/13 Thursday
- 1/14 Friday
- 1/
- Free all of 1/3 - 1/7
- Free all of 1/27 - 1/31
aim: keep it hands-on, teach the basics of how to do NLP, and connect it with programming languages
format: 3x3 mini-projects (intro, start project, finish it), 1h per week in class, 2h per week private study plus 1x6 final project.
one source of ideas: https://web.stanford.edu/class/cs224n/
Ideas for Projects
- Introductory class 2/2
- Rule-Based Models (Assigned 2/9, Due 2/23?)
- Grammatical Framework Project: Multi-Language Math Translator
- General idea is to build a domain-specific translation tool in Grammatical Framework that allows translation between mathematical notation, English, and another language of your choosing.
- Word2Vec (Assigned 3/2, Due 3/16?)
- Compositional Distributional Semantics?
- Transformer (Assigned 3/30, Due 4/13?)
- Introduction to general purpose NLP
- Group project + presentation (Assigned 4/13, Due 5/18?)
- (For large projects, having incremental deliverables is a great way to keep teams accountable. For my database management class we had a project proposal, then a more in-depth plan, and then the final version of the project, each due a few weeks apart.)
(Possible other topics we talked about: part-of-speech tagging, dependency parsing, BERT model)
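To make the Multi-Language Math Translator idea concrete for ourselves, here is a minimal Python sketch of the underlying pattern: one shared abstract tree, plus one table of rendering rules per language. This is only an illustration, not GF code; in GF the tree would be an abstract syntax and each table a concrete syntax. The names `Num`, `Add`, `Mul`, `RULES`, and `linearize` are hypothetical and chosen just for this sketch.

```python
from dataclasses import dataclass
from typing import Union


# Abstract syntax: a language-independent tree for arithmetic expressions.
@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Mul:
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Add, Mul]

# One table of rendering rules per language (hypothetical, for illustration).
# A student's third language would be one more entry here.
RULES = {
    "math": {"add": "({} + {})", "mul": "({} * {})"},
    "eng": {"add": "the sum of {} and {}", "mul": "the product of {} and {}"},
}


def linearize(expr: Expr, lang: str) -> str:
    """Render the abstract tree as a string in the given language."""
    rules = RULES[lang]
    if isinstance(expr, Num):
        return str(expr.value)
    if isinstance(expr, Add):
        return rules["add"].format(
            linearize(expr.left, lang), linearize(expr.right, lang)
        )
    if isinstance(expr, Mul):
        return rules["mul"].format(
            linearize(expr.left, lang), linearize(expr.right, lang)
        )
    raise TypeError(f"unknown expression: {expr!r}")


tree = Add(Num(2), Mul(Num(3), Num(4)))
print(linearize(tree, "math"))
print(linearize(tree, "eng"))
```

The sketch only goes from tree to text. Part of the appeal of GF for the project is that the same grammar can also parse text back into the tree, which plain string templates like these cannot do.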
Lesson Plan Outline
- Part 1: Rule-Based Translation Systems (2/2 - 2/16)
- 2/2/22: Syllabus Day/Course Introduction
- Review syllabus/course outline
- Introduce challenges of machine translation
- Give a short lecture on rule-based translation systems
- Maybe instead of asking students to install stuff, we can just use the GF Cloud grammar maker
- Homework
- Install Grammatical Framework in whichever environment works best.
- (I'll provide a Docker container and instructions for mounting it in VS Code, as well as instructions for installing on local machines)
- 2/9/22: Grammatical Framework Lab
- If needed, troubleshoot installation issues (Hopefully students will be able to ask questions and troubleshoot outside of class time in between the first two classes)
- Go through simple "Hello World" Grammar to get students familiar with GF
- Start with English and German, then move to languages spoken by students, to make things interactive.
- Play around with grammar in GF Shell
- See GF notes below for lab outline
- Introduce Project 1: Mathematics to Language Translation?
- 2/16/22: Project 1 Work/Question session?
- Give students time to work on their grammar in class and ask questions that may come up
- Part 2: Word2Vec (2/23 - 3/9)
- 2/23: Introduce Statistical NLP techniques, then focus in on Word Vector Encodings
- 10 mins. intro
- 10-20 mins. corpora/web scraping
- 10 mins. intro to Jupyter
- 20 mins. word embeddings/notebook
- Homework: Look into text corpora to encode later
- Homework: Set up Jupyter notebooks
- 3/2: Lab on word vectors (pared-down Stanford lab); assign Project 2
- Homework: Run vector encoding algorithm on chosen corpus
- 3/9: Finish word vector lab, explore uses for word vectors
- Part 3: Neural Translation Systems (3/16 - 4/6)
- 3/16: Project 2 Due
- (Spring Break)
- 3/30: Assign Project 3
- 4/6: Tell students to form groups.
- Part 4: Final Project/Seminars (4/13 - 5/18)
- 4/13: Project 3 Due, Seminar 1
- 4/20: Seminar 2; project proposal due the following Friday
- 4/27: Seminar 3 (open to students)
- 5/4: Seminar 4 (open to students)
- 5/11: Video Presentations (one per final project group)
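For the Part 2 lab, a minimal sketch of what "word vectors" could look like before we bring in any library. Note that this is a simple count-based co-occurrence model, not Word2Vec itself; it is only meant to show the idea that words appearing in similar contexts end up with similar vectors. The toy corpus and the helper names `cooccurrence_vectors` and `cosine` are made up for this sketch.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a sparse vector counting its neighbors within `window`."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus, invented for illustration only.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the dog",
]

vecs = cooccurrence_vectors(corpus)
print("cat vs dog: ", cosine(vecs["cat"], vecs["dog"]))
print("cat vs milk:", cosine(vecs["cat"], vecs["milk"]))
```

On this toy corpus "cat" comes out closer to "dog" than to "milk", because the two animals share more context words. Students could swap in their own scraped corpus for `corpus` and then compare against real Word2Vec embeddings in the lab.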
Possible Sources and Materials
Previous notes and links available here.
Stanford's Natural Language Processing with Deep Learning.
Steven Bird, Ewan Klein, and Edward Loper: Natural Language Processing with Python, Chapter 2, 2019.
Dive into Deep Learning
Appendix: Course Description
This course offers hands-on experience with natural language processing (NLP). It is built around three 3-week mini-projects and one final project. First, we will learn how to use rule-based natural language processing, such as that provided by Grammatical Framework. Second, we will learn how word embeddings such as Word2Vec or GloVe are used to overcome limitations of the symbolic approach. Third, we will learn how to build transformer-based language models such as BERT. Finally, students will apply some of these techniques in a final project.
Appendix: Preliminary Description of the Course
Sections of CPSC298 are taught as a one-credit class, one hour per week. They aren't meant to be super in-depth, but they can serve as a good way to test the waters in a certain subject or to experiment with unique styles of classes (one that I took was just a series of interviews between Andrew Lion and some industry professionals). Just off the top of my head, I could see a logical flow for the course structure that might be really helpful for students who take the course.
The first 5 weeks could be focused on structured models and feature-engineered systems of the kind that were common before neural NLP. During this time we could introduce Grammatical Framework and have one of the projects be developing a simple domain-specific grammar that could be translated between a few languages. The GF tutorial could serve as inspiration for a project like this. We could also talk about controlled natural languages (CNLs) during this time.
The next 10 weeks could then move into neural models, maybe starting out with the general theory and the state of the art, followed by specific problems and interesting solutions that have been proposed over the last few years. During this time we could talk about part-of-speech tagging, dependency parsing, machine translation, seq2seq, transformers, etc.
In my experience, getting all the students to read whatever academic papers are recommended/required is a challenge, so I'm not sure about having the class be paper-based. We'll have to discuss more in depth what will be required of students later.
Appendix: General Course Ideas
- Potentially having students get into defined groups where at least one student has already taken programming languages.
- Private Slack channel that opens up to the whole school when the final project starts
Notes on GF Lab
moved to here
GF Calculator
Notes and assignment