# ML project note
## Instructions
- no ML package :cry:
- code + report
- instructions on how to run code
- submit system's output (same format as training set)
## Summary
main goal:
- sequence labelling model for informal texts using Hidden Markov Model (HMM)
- build two sentiment analysis systems from scratch for a different language, using our own annotations (?)
- also using annotations from others (?)
En.zip contains:
- `train`: labelled training set
```
Municipal B-NP
bonds I-NP
are B-VP
generally B-ADVP
a B-ADJP
bit I-ADJP
```
- `dev.in`: unlabelled development set
- `dev.out`: `dev.in` but with labels
```
HBO B-NP
has B-VP
close B-NP
to I-NP
24 I-NP
million I-NP
subscribers I-NP
```
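Both files use the same line-per-token layout, so a single reader covers them. Below is a minimal sketch, assuming one `word tag` pair per line separated by a single space and blank lines between sentences (the separator and helper name are my assumptions, not part of the handout):
```python
def read_labelled(path):
    """Read a labelled file (e.g. train or dev.out): one 'word tag' pair
    per line, sentences separated by blank lines.
    Returns a list of sentences, each a list of (word, tag) tuples."""
    sentences, current = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:                     # blank line ends a sentence
                if current:
                    sentences.append(current)
                    current = []
                continue
            word, tag = line.rsplit(" ", 1)  # tag is the last field
            current.append((word, tag))
    if current:                              # handle a missing trailing blank line
        sentences.append(current)
    return sentences
```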
::: info
labels:
- O: outside of any entity
- B-{sentiment}, I-{sentiment}: beginning and inside of a sentiment entity
--> sentiment can be "positive", "negative", or "neutral"
--> what is "B-NP" then?
:::
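To make the B-/I- scheme concrete, here is a small sketch (my own helper, not from the handout) that groups a tag sequence into labelled spans; it also shows how the chunking tags in the example above decode:
```python
def bio_spans(tags):
    """Collect (label, start, end) spans from a B-/I-/O tag sequence.
    'B-x' opens a span of type x, 'I-x' extends it, 'O' closes any open span."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if label is not None:
                spans.append((label, start, i))
            label, start = tag[2:], i
        elif tag.startswith("I-") and label == tag[2:]:
            continue                          # extend the current span
        else:                                 # 'O' or an inconsistent I- tag
            if label is not None:
                spans.append((label, start, i))
            label, start = None, None
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans

# e.g. the dev.out snippet above:
# bio_spans(["B-NP", "B-VP", "B-NP", "I-NP", "I-NP", "I-NP", "I-NP"])
# -> [('NP', 0, 1), ('VP', 1, 2), ('NP', 2, 7)]
```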
## Refs
### HMM
A stochastic process is a collection of random variables indexed by a mathematical set.
e.g.
states $S = \{\text{hot}, \text{cold}\}$
a state sequence over $T$ time steps --> $z \in S^T$
weather over 4 days can be a sequence --> $\{z_1=\text{hot}, z_2=\text{cold}, \dots\}$
#### Assumptions
1. Limited horizon assumption
The probability of the state at time $t$ depends only on the state at time $t-1$:
$$
P(z_t|z_{t-1},z_{t-2},...)=P(z_t|z_{t-1})
$$
2. Stationary process assumption
The conditional probability does not change over time, i.e.
$$
P(z_t|z_{t-1})=P(z_2|z_1),\quad t\in\{2,\dots,T\}
$$
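Together, the two assumptions let the joint probability of a whole state sequence factorize into an initial term and pairwise transition terms:
$$
P(z_1, z_2, \dots, z_T) = P(z_1)\prod_{t=2}^{T}P(z_t|z_{t-1})
$$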
#### Maximum Likelihood Estimation
> [Theory](https://towardsdatascience.com/the-path-from-maximum-likelihood-estimation-to-hidden-markov-models-61aba5ba901c)
MLE is a method to estimate the parameters of a distribution based on observed samples.
First, define the problem. We have:
- distribution $D_\theta$
- samples $S = (x_1, \dots, x_n)$
- parameter space: range of possible values for the parameter $\theta$ of $D_\theta$
  - Bernoulli: $(0, 1)$
  - Gaussian: $\mathbb{R} \times \mathbb{R}_{>0}$ (mean and variance)
We do not know the actual $\theta$, so we want to estimate it from $S$.
the likelihood is defined as
$$
L(\theta; S) = \prod_{i=1}^{n} P_\theta[X=x_i]
$$
For Bernoulli, it is defined as:
$$
\prod_{i=1}^{n} \theta^{x_i}(1-\theta)^{1-x_i}
$$
For Bernoulli, compute the log likelihood:
> [derivation](https://towardsdatascience.com/the-path-from-maximum-likelihood-estimation-to-hidden-markov-models-61aba5ba901c)
$$
\ell(\theta; x) = \sum_{i=1}^{n} \log\left(\theta^{x_i}(1-\theta)^{1-x_i}\right) = \sum_{i=1}^{n}\left[x_i\log\theta + (1-x_i)\log(1-\theta)\right]
$$
Setting the derivative to zero:
$$
\frac{1}{\theta}\sum_{i=1}^{n}x_i - \frac{1}{1-\theta}\sum_{i=1}^{n}(1-x_i) = 0
\iff
(1-\theta)\sum_{i=1}^{n}x_i = \theta\sum_{i=1}^{n}(1-x_i)
$$
which solves to $\hat{\theta} = \frac{1}{n}\sum_{i=1}^{n}x_i$, the sample mean.
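As a quick numerical sanity check of this result (a minimal sketch; the coin-flip samples are made up for illustration):
```python
import math

samples = [1, 0, 1, 1, 0, 1, 1, 0]            # hypothetical Bernoulli draws

def log_likelihood(theta, xs):
    """l(theta; x) = sum_i [x_i*log(theta) + (1 - x_i)*log(1 - theta)]"""
    return sum(x * math.log(theta) + (1 - x) * math.log(1 - theta) for x in xs)

# closed-form MLE from the derivation above: the sample mean
theta_hat = sum(samples) / len(samples)       # 0.625

# no theta on a fine grid should beat theta_hat
grid = [i / 1000 for i in range(1, 1000)]
best = max(grid, key=lambda t: log_likelihood(t, samples))
print(theta_hat, best)                        # 0.625 0.625
```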
For HMMs, we can use Expectation Maximization (EM), which uses an iterative process to perform MLE in statistical models with latent variables.
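Note that when the training set is fully labelled, as in this project, the states are observed and EM isn't needed: MLE for the HMM parameters reduces to normalized counts. A minimal sketch under that assumption (the function name and the `START`/`STOP` markers are mine, not from the handout):
```python
from collections import defaultdict

def estimate_hmm(sentences):
    """MLE for a first-order HMM from labelled sentences, i.e. lists of
    (word, tag) pairs as returned by read_labelled above.
    Emission:   P(word|tag)  = count(tag -> word)  / count(tag)
    Transition: P(tag|prev)  = count(prev -> tag)  / count(prev)"""
    emit_counts = defaultdict(lambda: defaultdict(int))
    trans_counts = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        prev = "START"                        # artificial start-of-sentence state
        for word, tag in sentence:
            emit_counts[tag][word] += 1
            trans_counts[prev][tag] += 1
            prev = tag
        trans_counts[prev]["STOP"] += 1       # artificial end-of-sentence state
    emission, transition = {}, {}
    for tag, words in emit_counts.items():
        total = sum(words.values())
        emission[tag] = {w: c / total for w, c in words.items()}
    for prev, tags in trans_counts.items():
        total = sum(tags.values())
        transition[prev] = {t: c / total for t, c in tags.items()}
    return emission, transition
```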