dimensionality reduction: in text analysis we usually want to reduce the number of features.
how we choose the rows (documents) and columns (features) determines everything.
check whether words discriminate between texts or are just noise: if the latter, remove them
most studies use unigrams (single words)
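a minimal sketch of the reduction idea, assuming the quanteda package is installed. the toy texts and object names (txt, toks, dfm_all, dfm_small) are made up for illustration, and the choice of stopword removal plus a document-frequency cutoff is just one possible way to drop noise words:

```R
library(quanteda)

# Toy texts, invented for illustration only
txt <- c(d1 = "The cat sat on the mat.",
         d2 = "The dog sat on the log.",
         d3 = "A cat and a dog met.")

# Unigram tokens, punctuation dropped
toks <- tokens(txt, remove_punct = TRUE)

# Document-feature matrix: rows = documents, columns = features
dfm_all <- dfm(toks)

# Remove English stopwords, which rarely discriminate between texts
toks_small <- tokens_remove(toks, stopwords("en"))
dfm_small <- dfm(toks_small)

# Assumed cutoff: keep only features that occur in at least 2 documents
dfm_small <- dfm_trim(dfm_small, min_docfreq = 2)

# Compare the number of columns before and after
nfeat(dfm_all)
nfeat(dfm_small)
```

the cutoff of 2 documents is arbitrary here; with a real corpus it depends on the research question.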
1. get texts and import to R
2. convert them into a corpus
3. get info and analyze
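the three steps above can be sketched roughly as follows, assuming quanteda is installed. the texts are hard-coded stand-ins for real files, and the "source" docvar is hypothetical:

```R
library(quanteda)

# Step 1: get texts into R. A hard-coded character vector stands in for real
# files here; with real data one option is readtext::readtext().
txt <- c(doc1 = "Text analysis starts with raw text.",
         doc2 = "A corpus stores the text and its metadata.")

# Step 2: convert the texts into a corpus and attach document-level metadata
corp <- corpus(txt)
docvars(corp, "source") <- c("notes", "notes")  # hypothetical metadata field

# Step 3: get info and analyze
summary(corp)                               # per-document overview
toks <- tokens(corp, remove_punct = TRUE)   # split into word tokens
toks <- tokens_tolower(toks)                # so "Text" and "text" count as one type
ntoken(toks)                                # tokens: every word occurrence
ntype(toks)                                 # types: unique words
```

lowercasing before counting is a design choice: without it, capitalized and lowercase forms are counted as different types.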
types: unique words (tokens: every occurrence of a word)
```R
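# keep only the documents that satisfy a logical condition (`key`) on the docvars
corpus_subset(name, key)
# split each document into tokens; `function` stands for the tokenizing options, e.g. what = "word"
tokens(name, function)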