DSP HW3

Author: B04705003 林子雋

In the following section, W denotes a random variable for a word, while q denotes a state.
Moreover, W_{1:t} means the tuple (W_{1}, ..., W_{t}). If you have ever written Python or MATLAB, this slice notation should look familiar. Note that, as in MATLAB, both endpoints are included.
For example, W_{1} denotes the first word, and W_{1} = q_{1} means the first word is q_{1}, where each state stands for a word (e.g. q_{1} = speech, q_{2} = hello, …).
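A quick illustration of the off-by-one pitfall, using a hypothetical three-word sentence. W_{1:t} here is 1-based and includes W_{t}, whereas a Python slice is 0-based and excludes its end index, so the same prefix is `words[:t]`.

```python
# W_{1:t} in these notes is 1-based and includes W_t (like MATLAB's W(1:t)).
# Python lists are 0-based and a slice excludes its end index,
# so the same prefix is words[:t].
words = ["hello", "speech", "processing"]  # hypothetical W_1, W_2, W_3

t = 2
prefix = tuple(words[:t])  # W_{1:2} = (W_1, W_2)
first = words[0]           # W_1

print(prefix, first)  # → ('hello', 'speech') hello
```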

To derive the bigram version of the Viterbi algorithm, define δ_{t}(q_{i}) as the probability of the best length-t word sequence that ends with q_{i}:

\delta_{t}(q_{i}) = \max_{W_{1:t-1}} P(W_{1}, \dots, W_{t-1}, W_{t} = q_{i})

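Under the bigram assumption, W_{t} depends only on W_{t-1} (below, P(q_{i} | q_{j}) is shorthand for P(W_{t} = q_{i} | W_{t-1} = q_{j})), so δ satisfies the recursion:

\begin{aligned}
\delta_{t}(q_{i}) &= \max_{W_{1:t-1}} P(q_{i} \mid W_{t-1})\, P(W_{1}, \dots, W_{t-1}) \\
&= \max_{q_{j}} P(q_{i} \mid q_{j}) \max_{W_{1:t-2}} P(W_{1}, \dots, W_{t-2}, W_{t-1} = q_{j}) \\
&= \max_{q_{j}} P(q_{i} \mid q_{j})\, \delta_{t-1}(q_{j})
\end{aligned}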

For t = 1, initialize the first time step as:

\delta_{1}(q_{i}) = P(W_{1} = q_{i})
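The bigram recursion and its initialization can be sketched in Python as follows. This is a minimal sketch, not the author's implementation: the dictionary layout of the probability tables (`p1`, `p2`), the helper `log_p`, and the toy numbers are all hypothetical. The search runs in log space to avoid underflow on long sentences, and a backpointer records the maximizing q_{j} at each step so the best sequence can be recovered.

```python
import math

def log_p(p):
    """Natural log of a probability; 0 maps to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf

def viterbi_bigram(vocab, p1, p2, T):
    """Most probable length-T word sequence under a bigram model (T >= 1).

    Hypothetical table layout:
      p1[w]         = P(W_1 = w)
      p2[(prev, w)] = P(W_t = w | W_{t-1} = prev); missing entries count as 0.
    """
    # t = 1: log delta_1(q_i) = log P(W_1 = q_i)
    delta = {qi: log_p(p1.get(qi, 0.0)) for qi in vocab}
    back = []  # back[t-2][q_i] = best q_j (the word at time t-1)
    for _ in range(2, T + 1):
        new, ptr = {}, {}
        for qi in vocab:
            # log delta_t(q_i) = max_{q_j} log P(q_i | q_j) + log delta_{t-1}(q_j)
            scores = {qj: log_p(p2.get((qj, qi), 0.0)) + delta[qj] for qj in vocab}
            qj = max(scores, key=scores.get)
            new[qi], ptr[qi] = scores[qj], qj
        delta = new
        back.append(ptr)
    # Backtrack from the best final word.
    last = max(delta, key=delta.get)
    score = delta[last]
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score

# Hypothetical toy model over the two example words.
vocab = ["hello", "speech"]
p1 = {"hello": 0.6, "speech": 0.4}
p2 = {("hello", "hello"): 0.3, ("hello", "speech"): 0.7,
      ("speech", "hello"): 0.2, ("speech", "speech"): 0.8}
path, score = viterbi_bigram(vocab, p1, p2, T=3)
print(path, round(math.exp(score), 3))  # → ['hello', 'speech', 'speech'] 0.336
```

For a toy model this small, the result can be checked by enumerating all 2^3 sequences; Viterbi reaches the same answer with |V|^2 work per time step instead of |V|^T overall.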

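To derive the trigram version, define δ_{t}(q_{i}, q_{j}) as the probability of the best length-t sequence whose last two words are q_{j} followed by q_{i}:

\delta_{t}(q_{i}, q_{j}) = \max_{W_{1:t-2}} P(W_{1}, \dots, W_{t-2}, W_{t-1} = q_{j}, W_{t} = q_{i})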

Then, by the same method, using the trigram assumption that W_{t} depends only on W_{t-1} and W_{t-2} (below, P(q_{i} | q_{j}, q_{k}) is shorthand for P(W_{t} = q_{i} | W_{t-1} = q_{j}, W_{t-2} = q_{k})), we get:

\begin{aligned}
\delta_{t}(q_{i}, q_{j}) &= \max_{W_{1:t-2}} P(q_{i} \mid q_{j}, W_{t-2})\, P(W_{1}, \dots, W_{t-2}, W_{t-1} = q_{j}) && \text{(chain rule and trigram assumption)} \\
&= \max_{q_{k}} \max_{W_{1:t-3}} P(q_{i} \mid q_{j}, q_{k})\, P(W_{1}, \dots, W_{t-3}, W_{t-2} = q_{k}, W_{t-1} = q_{j}) && \text{(maximize over } W_{t-2} = q_{k} \text{ first)} \\
&= \max_{q_{k}} P(q_{i} \mid q_{j}, q_{k}) \max_{W_{1:t-3}} P(W_{1}, \dots, W_{t-3}, W_{t-2} = q_{k}, W_{t-1} = q_{j}) && \text{(take out the factor independent of } W_{1:t-3}\text{)} \\
&= \max_{q_{k}} P(q_{i} \mid q_{j}, q_{k})\, \delta_{t-1}(q_{j}, q_{k})
\end{aligned}
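To sanity-check the derivation numerically, the sketch below builds a small random trigram model (hypothetical three-word vocabulary, fixed seed, made-up table names `p1`, `p2`, `p3`) and compares δ_{t}(q_{i}, q_{j}) computed by brute force from its definition with the one-step recursion max over q_{k} of P(q_{i} | q_{j}, q_{k}) δ_{t-1}(q_{j}, q_{k}).

```python
import itertools
import math
import random

random.seed(0)
V = ["a", "b", "c"]  # hypothetical vocabulary

def normalized(n):
    """n random non-negative numbers that sum to 1."""
    xs = [random.random() for _ in range(n)]
    s = sum(xs)
    return [x / s for x in xs]

# Hypothetical model tables:
#   p1[w]            = P(W_1 = w)
#   p2[(prev, w)]    = P(W_2 = w | W_1 = prev)
#   p3[(prev, p2, w)] = P(W_t = w | W_{t-1} = prev, W_{t-2} = p2)
p1 = dict(zip(V, normalized(len(V))))
p2 = {(a, w): p for a in V for w, p in zip(V, normalized(len(V)))}
p3 = {(a, b, w): p for a in V for b in V for w, p in zip(V, normalized(len(V)))}

def joint(seq):
    """P(W_1, ..., W_T) by the chain rule under the trigram assumption."""
    p = p1[seq[0]]
    if len(seq) > 1:
        p *= p2[(seq[0], seq[1])]
    for t in range(2, len(seq)):
        p *= p3[(seq[t - 1], seq[t - 2], seq[t])]
    return p

def delta_def(t, qi, qj):
    """delta_t(q_i, q_j) from the definition: brute-force max over W_{1:t-2}."""
    return max(joint(prefix + (qj, qi)) for prefix in itertools.product(V, repeat=t - 2))

def delta_rec(t, qi, qj):
    """delta_t(q_i, q_j) via max_{q_k} P(q_i | q_j, q_k) delta_{t-1}(q_j, q_k), t >= 3."""
    return max(p3[(qj, qk, qi)] * delta_def(t - 1, qj, qk) for qk in V)

ok = all(
    math.isclose(delta_def(t, qi, qj), delta_rec(t, qi, qj))
    for t in (3, 4, 5)
    for qi in V
    for qj in V
)
print(ok)  # → True
```

Both sides maximize the same set of products, so they agree up to floating-point rounding, which is why the comparison uses `math.isclose` rather than `==`.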

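For t = 1 and t = 2, initialize the first two time steps as:

\delta_{1}(q_{i}, q_{j}) = P(W_{1} = q_{i})

\delta_{2}(q_{i}, q_{j}) = \max_{q_{k}} P(q_{i} \mid q_{j})\, \delta_{1}(q_{j}, q_{k}) = P(q_{i} \mid q_{j})\, P(W_{1} = q_{j})

Since there is no word before W_{1}, the second argument of δ_{1} is only a placeholder: δ_{1}(q_{i}, q_{j}) takes the same value for every q_{j}, so the maximization over q_{k} in δ_{2} is trivial.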
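Putting the trigram recursion and its initialization together gives a decoder whose state is a word pair (q_{i}, q_{j}). This is a hedged sketch under assumed table layouts: `p1`, `p2`, `p3`, `log_p` and the toy values are hypothetical, not taken from the assignment. It starts directly from the collapsed form of δ_{2}, works in log space, and keeps one backpointer (the maximizing q_{k}) per pair.

```python
import math

def log_p(p):
    """Natural log of a probability; 0 maps to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf

def viterbi_trigram(vocab, p1, p2, p3, T):
    """Most probable length-T sequence under a trigram model (T >= 2).

    Hypothetical table layout (missing entries count as probability 0):
      p1[w] = P(W_1 = w);  p2[(prev, w)] = P(W_2 = w | W_1 = prev)
      p3[(prev, prev2, w)] = P(W_t = w | W_{t-1} = prev, W_{t-2} = prev2)
    """
    if T < 2:
        raise ValueError("T must be at least 2")
    # delta[(q_i, q_j)] = log delta_t(q_i, q_j): q_i at time t, q_j at time t-1.
    # t = 2: delta_2(q_i, q_j) = P(q_i | q_j) P(W_1 = q_j)
    delta = {(qi, qj): log_p(p2.get((qj, qi), 0.0)) + log_p(p1.get(qj, 0.0))
             for qi in vocab for qj in vocab}
    back = []  # back[t-3][(q_i, q_j)] = best q_k (the word at time t-2)
    for _ in range(3, T + 1):
        new, ptr = {}, {}
        for qi in vocab:
            for qj in vocab:
                # max over q_k of log P(q_i | q_j, q_k) + log delta_{t-1}(q_j, q_k)
                scores = {qk: log_p(p3.get((qj, qk, qi), 0.0)) + delta[(qj, qk)]
                          for qk in vocab}
                qk = max(scores, key=scores.get)
                new[(qi, qj)], ptr[(qi, qj)] = scores[qk], qk
        delta = new
        back.append(ptr)
    # Backtrack: best final pair (W_T, W_{T-1}), then follow pointers to W_{T-2}, ...
    best = max(delta, key=delta.get)
    score = delta[best]
    qi, qj = best
    path = [qi, qj]
    for ptr in reversed(back):
        qk = ptr[(qi, qj)]
        path.append(qk)
        qi, qj = qj, qk
    path.reverse()
    return path, score

# Hypothetical toy model: uniform trigram table except P(hello | speech, hello) = 0.9.
vocab = ["hello", "speech"]
p1 = {"hello": 0.6, "speech": 0.4}
p2 = {("hello", "hello"): 0.3, ("hello", "speech"): 0.7,
      ("speech", "hello"): 0.2, ("speech", "speech"): 0.8}
p3 = {(a, b, w): 0.5 for a in vocab for b in vocab for w in vocab}
p3[("speech", "hello", "hello")] = 0.9
p3[("speech", "hello", "speech")] = 0.1
path, score = viterbi_trigram(vocab, p1, p2, p3, T=3)
print(path, round(math.exp(score), 3))  # → ['hello', 'speech', 'hello'] 0.378
```

Each step loops over |V|^3 combinations of (q_{i}, q_{j}, q_{k}), so in practice it helps to restrict each position to a small candidate set of words.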