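Dimensionality reduction: in text analysis we usually want to reduce the number of dimensions. How we choose the rows (documents) and columns (features) determines everything downstream. The key question is whether given words discriminate between texts or are just noise; if the latter, remove them. Most studies use unigrams (single words) as features.

1. Get the texts and import them into R.
2. Convert them into a corpus.
3. Get information and analyze types (unique words).

```R
library(quanteda)
# example data: quanteda's built-in inaugural speeches stand in for your own texts
corp <- corpus_subset(data_corpus_inaugural, Year > 1990)  # keep documents whose docvars meet a condition
toks <- tokens(corp, remove_punct = TRUE)  # split documents into word tokens
ntype(toks)  # number of types (unique words) per document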
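# Sketch of the dimensionality-reduction idea above, assuming the quanteda
# package is installed. data_corpus_inaugural is quanteda's built-in example
# corpus; the trimming thresholds (min_termfreq = 5, min_docfreq = 2) are
# illustrative assumptions, not recommended values.
library(quanteda)
toks_all <- tokens(data_corpus_inaugural, remove_punct = TRUE, remove_numbers = TRUE)
dfmat <- dfm(toks_all)  # document-feature matrix: rows = documents, columns = unigram features
# drop stopwords: words common to almost every text rarely discriminate between documents
dfmat_nostop <- dfm_remove(dfmat, pattern = stopwords("en"))
# drop rare words that occur too seldom, or in too few documents, to help compare texts
dfmat_small <- dfm_trim(dfmat_nostop, min_termfreq = 5, min_docfreq = 2)
c(before = nfeat(dfmat), after = nfeat(dfmat_small))  # number of features before vs. after reduction
# Whether the removed words really are noise depends on the research question, so
# inspect what was dropped, e.g. setdiff(featnames(dfmat), featnames(dfmat_small)).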