ML project note
Instruction
- no ML package

- code + report
- instructions on how to run code
- submit system's output (same format as training set)
Summary
main goal:
- sequence labelling model for informal texts using Hidden Markov Model (HMM)
- build two sentiment analysis systems from scratch for different languages, using our own annotations (?)
- also using annotations from others (?)
En.zip contains:
- train: labelled training set
- dev.in: unlabelled development set
- dev.out: dev.in but with labels (reading sketch below, after the label list)
labels:
- O: outside of any entity
- B-{sentiment}, I-{sentiment}: Beginning and Inside of sentiment entities
-> sentiment can be "positive", "negative" and "neutral"
-> what is "B-NP" then?
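A minimal sketch for reading the labelled data, assuming each line is "token tag" and sentences are separated by blank lines; the format and the "EN/train" path are my assumptions, not spelled out in the note:

```python
# Read a labelled file into a list of sentences, where each sentence is a
# list of (token, tag) pairs.  Assumes "token<space>tag" per line and blank
# lines as sentence boundaries (not confirmed by the note).
def read_labelled(path):
    sentences, current = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:                           # blank line -> sentence boundary
                if current:
                    sentences.append(current)
                    current = []
            else:
                token, tag = line.rsplit(" ", 1)   # tag is the last field
                current.append((token, tag))
    if current:
        sentences.append(current)
    return sentences

# hypothetical usage:
# train = read_labelled("EN/train")
# print(train[0])   # e.g. [("good", "B-positive"), ("movie", "I-positive"), (".", "O")]
```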
Refs
HMM
a stochastic process is a collection of random variables indexed by a mathematical set
e.g.
states S = {hot, cold}
a series of states over time ->
weather for 4 days can be a sequence -> {z1=hot, z2=cold, ...}
assumptions
- limited horizon assumption
probability of the state at time t only depends on the state at time t-1, i.e. P(z_t | z_1, ..., z_{t-1}) = P(z_t | z_{t-1})
- Stationary Process assumption
conditional prob does not change over time, i.e. P(z_t = s' | z_{t-1} = s) is the same for every t (see the small chain sketch below)
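A tiny sketch of the limited horizon assumption with the hot/cold example; all probabilities below are made up for illustration:

```python
# P(z1..zT) factorises as P(z1) * prod_t P(z_t | z_{t-1}) under the
# limited horizon assumption.  Numbers are invented for the example.
init = {"hot": 0.6, "cold": 0.4}
trans = {                                   # trans[prev][next] = P(next | prev)
    "hot":  {"hot": 0.7, "cold": 0.3},
    "cold": {"hot": 0.4, "cold": 0.6},
}

def seq_prob(states):
    p = init[states[0]]
    for prev, nxt in zip(states, states[1:]):
        p *= trans[prev][nxt]               # only the previous state matters
    return p

print(seq_prob(["hot", "cold", "cold", "hot"]))   # 0.6 * 0.3 * 0.6 * 0.4 = 0.0432
```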
Maximum Likelihood Estimation
Theory
MLE is a method to estimate the parameters of a distribution based on observed samples
first define the problem, we have:
- a distribution P_θ parameterized by θ
- samples S = (x_1, ..., x_n) drawn from it
- parameter space Θ: range of possible values for θ
- Bernoulli: θ ∈ (0, 1)
- Gaussian: μ ∈ ℝ, σ² > 0
We do not know the actual θ, so we want to estimate it using S
the likelihood is defined as L(θ; S) = ∏_i P_θ(x_i)
For Bernoulli, it is defined as: L(θ) = ∏_i θ^(x_i) (1-θ)^(1-x_i) = θ^S(x) (1-θ)^S(1-x), where S(x) = Σ_i x_i and S(1-x) = Σ_i (1-x_i)
For Bernoulli, the log likelihood is: log L(θ) = S(x) log θ + S(1-x) log(1-θ)
derivation
d/dθ log L(θ) = S(x)/θ - S(1-x)/(1-θ) = 0 -> (1-θ) S(x) - θ S(1-x) = 0 -> θ = S(x) / (S(x) + S(1-x)) = S(x)/n
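A quick numeric check of this result, with a made-up toy sample, confirming the MLE is the sample mean S(x)/n:

```python
import math

samples = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]     # toy data: 7 ones out of 10
s_x = sum(samples)                           # S(x)   = number of 1s
s_1mx = sum(1 - x for x in samples)          # S(1-x) = number of 0s

theta_hat = s_x / (s_x + s_1mx)              # solves (1-θ) S(x) - θ S(1-x) = 0
print(theta_hat)                             # 0.7

# Sanity check: the same value maximises the log likelihood over a grid of θ.
def log_lik(theta):
    return s_x * math.log(theta) + s_1mx * math.log(1 - theta)

grid = [i / 100 for i in range(1, 100)]
print(max(grid, key=log_lik))                # 0.7 again
```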
For HMMs, we can use Expectation Maximization (EM), which uses an iterative process to perform MLE in statistical models with latent variables
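EM itself is not HMM-specific; a minimal sketch of the E-step / M-step loop on a simpler latent-variable model (a mixture of two biased coins, with made-up data) just to show the iterative idea:

```python
# Each observation is the number of heads in 10 flips of one of two coins,
# but which coin was used is hidden.  EM alternates between:
#   E-step: posterior probability that each run came from coin A (equal prior)
#   M-step: responsibility-weighted MLE of the two biases
def binom_lik(heads, flips, p):
    return (p ** heads) * ((1 - p) ** (flips - heads))   # constant factor dropped

flips = 10
heads = [5, 9, 8, 4, 7]              # made-up observations
p_a, p_b = 0.6, 0.5                  # initial guesses for the two biases

for _ in range(20):
    # E-step
    resp_a = []
    for h in heads:
        la, lb = binom_lik(h, flips, p_a), binom_lik(h, flips, p_b)
        resp_a.append(la / (la + lb))
    # M-step
    p_a = sum(r * h for r, h in zip(resp_a, heads)) / sum(r * flips for r in resp_a)
    p_b = sum((1 - r) * h for r, h in zip(resp_a, heads)) / sum((1 - r) * flips for r in resp_a)

print(round(p_a, 3), round(p_b, 3))  # converges to two distinct bias estimates
```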