Forward Algorithm implement in Python

--- tags: HMM, python, Bioinformatics --- # Forward Algorithm implement in Python >本篇只會介紹程式碼，原理與更多解釋請參考[這裡](https://hackmd.io/@XibGbVrGSw6szTtbuQpNBw/HJdmQtqmj) ```python= def Forward(O, a, b, pi, isBW=False): # init parameters # scaled alpha (hat alpha) scaled_alpha = np.zeros((O.shape[0], a.shape[0])) alpha = np.zeros(a.shape[0]) # alpha C = np.zeros(O.shape[0]) # scaling factor C[0] = np.sum(pi * b[:, O[0]]) scaled_alpha[0, :] = (pi * b[:, O[0]]) / C[0] # init scaled alpha P = 0 # probability of the observation # compute alpha, t: time t for t in range(1, O.shape[0]): for j in range(a.shape[0]): alpha[j] = scaled_alpha[t - 1] @ a[:, j] * b[j, O[t]] C[t] += alpha[j] C[t] = 1 / C[t] P += math.log2(C[t]) scaled_alpha[t, :] = alpha * C[t] # update alpha P *= -1 ## if is called by Baum-Welch, return scaled_alpha ## if is called by other function, return P if isBW: return scaled_alpha else: return P ``` 讓我們一行一行來分析: ## 變數宣告與初始化首先是宣告各種需要的參數並執行初始化。 ```python= scaled_alpha = np.zeros((O.shape[0], a.shape[0])) ``` 我們要計算的終極目標，為了解決underflow而產生的scaled_alpha，大小是row:狀態數量、column:序列長度的矩陣，簡單來說看成我們介紹alpha時畫的表格就可以了。 >但為了後續計算方便我們把alpha的表格翻轉，他現在是直的。 ```python= alpha = np.zeros(a.shape[0]) # alpha ``` 之後要用來計算的scaled_alpha的alpha，因為每次用完就丟掉所以不用跟scaled_alpha一樣弄一個二維矩陣來存。 ```python= C = np.zeros(O.shape[0]) # scaling factor C[0] = np.sum(pi * b[:, O[0]]) ``` scaling factor，大小和O的長度一樣，初始值就和公式相同(初始alpha值總和)。 ```python= scaled_alpha[0, :] = (pi * b[:, O[0]]) / C[0] # init scaled alpha P = 0 # probability of the observation ``` 最後用初始值的C初始化scaled_alpha；P是用存scaled_alpha總和的變數。 ## 迴圈計算alpha ```python= for t in range(1, O.shape[0]): for j in range(a.shape[0]): alpha[j] = scaled_alpha[t - 1] @ a[:, j] * b[j, O[t]] C[t] += alpha[j] C[t] = 1 / C[t] P += math.log2(C[t]) scaled_alpha[t, :] = alpha * C[t] # update alpha ``` 迴圈長度就是alpha表格大小。套用公式先計算alpah再算C，求出全部的C後再計算新的scaled_alpha。(@代表矩陣內積) ## 計算結果 ```python= P *= -1 ``` 把剛剛在迴圈裡偷加的P再乘上一個負號就是我們要的答案。 ```python= if isBW: return scaled_alpha else: return P ``` 如果是Baum-Welch函式呼叫的就回傳scaled_alpha表格，因為後續計算要用；其他方式呼叫就直接回傳P就好。