Week 9
In this lesson we learn how to preprocess text-based data and train deep learning models on that data.
Objectives
After completing this week, you should be able to:
Readings
Weekly Resources
Assignment 9
9.1
In the first part of the assignment, you will implement basic text-preprocessing functions in Python. These functions do not need to scale to large text documents and will only need to handle small inputs.
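A minimal sketch of the three functions this part asks for (`tokenize`, `ngram`, and `one_hot_encode`) might look like the following. It makes a few assumptions the assignment leaves open: tokens are lowercased, "basic punctuation" means the ASCII characters in Python's `string.punctuation`, n-grams are returned as tuples, and `one_hot_encode` marks which words are present against a vocabulary built by `build_vocabulary`, a hypothetical helper added here.

```python
import string

# Assumption: "basic punctuation" = the ASCII characters in string.punctuation.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def tokenize(sentence):
    """Lowercase a sentence, delete ASCII punctuation, and split it on whitespace."""
    return sentence.lower().translate(_PUNCT_TABLE).split()


def ngram(tokens, n):
    """Return every run of n consecutive tokens as a tuple (empty if len(tokens) < n)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_vocabulary(tokens):
    """Hypothetical helper: map each distinct token to an index, in first-seen order."""
    vocabulary = {}
    for token in tokens:
        vocabulary.setdefault(token, len(vocabulary))
    return vocabulary


def one_hot_encode(tokens, vocabulary):
    """Return a 0/1 vector of length len(vocabulary) marking which words occur.

    Assumes vocabulary maps words to the indices 0..len(vocabulary)-1, as
    build_vocabulary does. Tokens missing from the vocabulary are ignored.
    """
    vector = [0] * len(vocabulary)
    for token in tokens:
        index = vocabulary.get(token)
        if index is not None:
            vector[index] = 1
    return vector


tokens = tokenize("The cat sat. The dog sat!")
print(tokens)  # ['the', 'cat', 'sat', 'the', 'dog', 'sat']
print(ngram(tokens, 2))  # [('the', 'cat'), ('cat', 'sat'), ('sat', 'the'), ('the', 'dog'), ('dog', 'sat')]
vocab = build_vocabulary(tokens)  # {'the': 0, 'cat': 1, 'sat': 2, 'dog': 3}
print(one_hot_encode(["dog", "bird", "the"], vocab))  # [1, 0, 0, 1]
```

Encoding against a fixed vocabulary keeps every sentence's vector the same length, which is what a model's input layer expects. A per-token one-hot matrix (one row per token) is an equally reasonable reading of the task.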
a. Create a `tokenize` function that splits a sentence into words. Ensure that your tokenizer removes basic punctuation.
b. Implement an `ngram` function that splits tokens into N-grams.
c. Implement a `one_hot_encode` function that creates a numerical vector from a list of tokens.
9.2
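Parts 9.2 through 9.4 each ask for the model's performance metrics and its training and validation accuracy curves to be saved under a `results/model_N` directory. One way to do that, sketched below with the standard library only, is to write the per-epoch values from the dict Keras keeps in `History.history` to a CSV file and the final test metrics to JSON. `save_results` and the file names `history.csv` and `metrics.json` are hypothetical choices, not requirements of the assignment.

```python
import csv
import json
import os


def save_results(history, test_metrics, out_dir):
    """Hypothetical helper: save per-epoch curves and final metrics to out_dir.

    history: dict of metric name -> list of per-epoch values, shaped like the
        `history` attribute of the object Keras `model.fit` returns,
        e.g. {"acc": [...], "val_acc": [...], "loss": [...], "val_loss": [...]}.
    test_metrics: dict of name -> number, e.g. built from `model.evaluate` output.
    Returns the paths of the two files written.
    """
    os.makedirs(out_dir, exist_ok=True)

    # One row per epoch, one column per tracked metric (training and validation).
    names = sorted(history)
    epochs = len(history[names[0]]) if names else 0
    curves_path = os.path.join(out_dir, "history.csv")
    with open(curves_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["epoch"] + names)
        for epoch in range(epochs):
            writer.writerow([epoch + 1] + [float(history[name][epoch]) for name in names])

    # float() turns NumPy scalars into plain floats so json can serialize them.
    metrics_path = os.path.join(out_dir, "metrics.json")
    with open(metrics_path, "w") as f:
        json.dump({k: float(v) for k, v in test_metrics.items()}, f, indent=2)

    return curves_path, metrics_path
```

After training, a call such as `save_results(history.history, {"test_loss": loss, "test_acc": acc}, out_dir)`, with `loss, acc = model.evaluate(x_test, y_test)` for a model compiled with a single accuracy metric, would fill in the directory. The accuracy curves can then be plotted from `history.csv` (for example with matplotlib) and saved alongside it.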
Using listings 6.16, 6.17, and 6.18 in Deep Learning with Python as a guide, train a sequential model with embeddings on the IMDB data found in `data/external/imdb/`. Save the model performance metrics and training and validation accuracy curves in the `dsc650/assignments/assignment09/results/model_1` directory.
9.3
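This part swaps in an LSTM layer (listing 6.27). To show what such a layer computes, here is a minimal NumPy sketch of a standard LSTM forward pass over one sequence: at every timestep, gates decide how much of the cell state to forget, how much new information to write, and how much to expose as output. The function name, weight shapes, and gate packing order are assumptions for illustration; a Keras `LSTM` layer learns these weights during training, whereas this sketch only applies weights it is given.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_forward(inputs, W, U, b):
    """Run one LSTM layer over `inputs` (timesteps, input_dim); return the last output.

    W: (input_dim, 4 * units) input weights, U: (units, 4 * units) recurrent
    weights, b: (4 * units,) bias. The four gate blocks are assumed to be packed
    in the order input, forget, candidate, output. Like a Keras LSTM layer with
    its default settings, only the final timestep's output is returned.
    """
    units = U.shape[0]
    h = np.zeros(units)  # output (hidden state) carried between timesteps
    c = np.zeros(units)  # cell state: the long-term "carry"
    for x_t in inputs:
        z = x_t @ W + h @ U + b
        i = sigmoid(z[:units])               # input gate: how much new info to write
        f = sigmoid(z[units:2 * units])      # forget gate: how much old state to keep
        g = np.tanh(z[2 * units:3 * units])  # candidate values to write
        o = sigmoid(z[3 * units:])           # output gate: how much state to expose
        c = f * c + i * g
        h = o * np.tanh(c)
    return h


rng = np.random.default_rng(0)
timesteps, input_dim, units = 5, 8, 4
h = lstm_forward(
    rng.normal(size=(timesteps, input_dim)),  # e.g. 5 embedded words, 8 dims each
    rng.normal(size=(input_dim, 4 * units)),
    rng.normal(size=(units, 4 * units)),
    np.zeros(4 * units),
)
print(h.shape)  # (4,)
```

Because the output gate lies in (0, 1) and `tanh` lies in (-1, 1), every output value stays strictly between -1 and 1 no matter how long the review is.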
Using listing 6.27 in Deep Learning with Python as a guide, fit the same data with an LSTM layer. Save the model performance metrics and training and validation accuracy curves in the `dsc650/assignments/assignment09/results/model_2` directory.
9.4
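A 1D convnet for text slides small filters along the sequence of word embeddings and pools the strongest responses. The sketch below implements in NumPy the operations such a model typically stacks (convolution with ReLU, max pooling, and global max pooling, comparable to Keras's `Conv1D`, `MaxPooling1D`, and `GlobalMaxPooling1D` layers), assuming unpadded convolution with stride 1 and non-overlapping pooling. The function names are hypothetical, and the toy input stands in for an embedded review.

```python
import numpy as np


def conv1d(x, kernel, bias):
    """Valid (unpadded), stride-1 1D convolution of x with shape (steps, in_channels).

    kernel: (window, in_channels, filters); bias: (filters,). As in deep learning
    libraries, the kernel is not flipped (strictly, this is cross-correlation).
    """
    window = kernel.shape[0]
    steps = x.shape[0] - window + 1
    if steps < 1:
        raise ValueError("sequence is shorter than the convolution window")
    out = np.empty((steps, kernel.shape[2]))
    for t in range(steps):
        # Compare `window` consecutive vectors with every filter at once.
        out[t] = np.tensordot(x[t:t + window], kernel, axes=([0, 1], [0, 1])) + bias
    return out


def relu(x):
    return np.maximum(x, 0.0)


def max_pool1d(x, pool_size):
    """Non-overlapping max pooling over time; steps that don't fill a pool are dropped."""
    steps = x.shape[0] // pool_size
    return x[:steps * pool_size].reshape(steps, pool_size, x.shape[1]).max(axis=1)


def global_max_pool1d(x):
    """Keep each filter's strongest response anywhere in the sequence."""
    return x.max(axis=0)


# Toy sequence: 10 timesteps with 1 channel; one filter that sums 3 neighbours.
x = np.arange(10, dtype=float).reshape(10, 1)
features = relu(conv1d(x, np.ones((3, 1, 1)), np.zeros(1)))
print(features[:, 0].tolist())                 # [3.0, 6.0, 9.0, 12.0, 15.0, 18.0, 21.0, 24.0]
print(max_pool1d(features, 2)[:, 0].tolist())  # [6.0, 12.0, 18.0, 24.0]
print(global_max_pool1d(features).tolist())    # [24.0]
```

Global max pooling collapses each filter to a single number regardless of review length, which is why it is a common last step before the dense classifier.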
Using listing 6.46 in Deep Learning with Python as a guide, fit the same data with a simple 1D convnet. Save the model performance metrics and training and validation accuracy curves in the `dsc650/assignments/assignment09/results/model_3` directory.
Submission Instructions
For this assignment, you will submit a zip archive containing the contents of the `dsc650/assignments/assignment09/` directory. Use the naming convention of `assignment09_LastnameFirstname.zip` for the zip archive. You can create this archive in Bash (or a similar Unix shell) using the following commands.
Likewise, you can create a zip archive using Windows PowerShell with the following command.
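If neither shell is convenient, Python's standard library can build an equivalent archive. The sketch below uses `shutil.make_archive`; `make_submission_archive` is a hypothetical helper, and it assumes `assignments_dir` points at `dsc650/assignments` so that every entry in the zip starts with `assignment09/`.

```python
import os
import shutil


def make_submission_archive(assignments_dir, last_name, first_name, dest_dir="."):
    """Hypothetical helper: zip <assignments_dir>/assignment09 as
    assignment09_<Lastname><Firstname>.zip inside dest_dir and return its path."""
    base_name = os.path.join(os.path.abspath(dest_dir), f"assignment09_{last_name}{first_name}")
    # root_dir: the directory archive paths are relative to.
    # base_dir: the folder (relative to root_dir) that goes into the archive,
    # so every entry in the zip starts with "assignment09/".
    return shutil.make_archive(base_name, "zip", root_dir=assignments_dir, base_dir="assignment09")
```

For example, `make_submission_archive("dsc650/assignments", "Doe", "Jane")` would write `assignment09_DoeJane.zip` to the current directory.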
Discussion Board
For this discussion, pick one of the following topics and write a 250- to 750-word discussion board post. Use the DSC 650 Slack channel for discussion and replies. For grading purposes, copy and paste your initial post and at least two replies to the Blackboard discussion board.
Topic 1
Compare and contrast using MapReduce, Spark, and deep learning frameworks (e.g., TensorFlow) for performing text preprocessing and building text-based models. Are there use cases where it makes sense to use one over another?
Topic 2
How might you combine stream processing such as Spark's stream processing framework with deep learning models? Provide use cases that are relevant to your professional or personal interests.