DSP HW3

Author: B04705003 林子雋

Notation

In the following section,

W denotes a random variable of a word, while
q
denotes a state.
Moverover,
W1:t
means a tuple of
(W1,...,Wt)
. If you have ever write python or matlab, you would be familar to this notation.
For example,
W1
denotes the first word and
W1=q1
means the first word is
q1
this word(ex.
q1=speech
,
q2=hello
)

Bigram Derivation of Viterbi

To derive bigram part of Viterbi algorithm, define:

δt(qi)=maxW1:t1P(W1,...,Wt1,Wt=qi)
where:
δt(qi)=maxW1:t1P(qi|Wt1)P(W1:t1)=maxqjP(qi|qj)δt1(qj)

For

t=1, initialize first timestep like:
δ1(qi)=P(W1=qi)

Trigram Derivation of Viterbi

To derive trigram, define:

δt(qi,qj)=maxW1:t2P(W1,...,Wt1=qj,Wt=qi)

Then, by same method, we can get:

δt(qi,qj)=maxW1:t2P(qi|qj,Wt2)P(W1,...,Wt2,Wt1=qj) (by conditional probability decomposition)=maxqkmaxW1:t3P(qi|qj,qk)P(W1,...,Wt3,Wt2=qk,Wt1=qj) (maximize one variable first)=maxqkP(qi|qj,qk)maxW1:t3P(W1,...,Wt3,Wt2=qk,Wt1=qj) (take the independent term out)=maxqkP(qi|qj,qk)δt1(qj,qk)

For

t=1 and
t=2
, initialize first timestep like:
δ1(qi,qj)=P(W1=qi)δ2(qi,qj)=maxqkP(qi|qj)δ1(qj,qk)