# Week 6 :::info :::spoiler Click to Open TOC [TOC] ::: ## Question :::info :::spoiler Click to Open Question ![](https://i.imgur.com/JTRZ9nY.png) ![](https://i.imgur.com/81ld6Qh.png) ### Check - [x] Q1 - [x] Q2 - [ ] Q3 - [ ] Q4 - [ ] Q5 - [ ] Q6 - [ ] Q7 ::: ## Answer ### Q1 > - [name=Mark] ![](https://i.imgur.com/pTxfxuf.png) #### Example | Length i | 1 | 2 | 3 | 4 | |:-----:|:----:|:----:|:----:|:----:| | Price Pi | 15 | 20 | 33 | 36 | | Limit Li | 2 | 1 | 1 | 1 | Example : length = 4 4 => 36 1,3 => 48 2,1,1 => 50 best 與上周的做法比較: 1,1,1,1 => 60 但此作法會超過$Li$的限制,需改良 ### Q2 > - [name=xian] ![](https://i.imgur.com/MSOEz8G.png) | i | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | |:-----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:| | $p_i$ | | 0.04 | 0.06 | 0.08 | 0.02 | 0.10 | 0.12 | 0.14 | | $q_i$ | 0.06 | 0.06 | 0.06 | 0.06 | 0.05 | 0.05 | 0.05 | 0.05 | #### 【解法】 Let `e[i,j]` = the value of optimal solution for $k_i,\cdots ,k_j$,而每一個最佳解的root會有左右子樹,左右子樹也分別有它的最佳解,可以利用DP二維矩陣找出最佳解,並利用例外一個二維矩陣`r[i,j]`來記錄root為多少時有最佳解。 $\begin{align} e[i,j] &= \begin{cases} min_{i\le r\le j}\{e[i,r-1]+e[r+1, j]+w_{i,j}\}\\ q_{i-1} \quad if\ j=i-1 \end{cases}\\ r[i,j]&=\text{a value of r that gives the minimum}\\ w_{i,j}&=\Sigma_{i\le k \le j}\ p_k+\Sigma_{i-1\le k \le j}\ q_k=\begin{cases} w_{i,j-1}+p_j+q_j \\ q_i\ \text{if }\ j = i-1 \end{cases} \end{align}$ #### 【$w_{i,j}$】 ![](https://i.imgur.com/KsfR4eo.png =430x200) #### 【$e[i,j]$】 ![](https://i.imgur.com/UdYoK5k.png =430x200) #### 【$r[i,j]$】 ![](https://i.imgur.com/6vDtgYk.png =430x200) #### 【ANS】 By table of e[i,j], the cost of an optimal binary search tree is **3.12**. By table of r[i,j], the structure of an optimal binary search tree is **the image below**. ![](https://i.imgur.com/YYNfqPB.jpg =500x400) ### Q3 > - [name=LXS] ![](https://i.imgur.com/bDOzbeZ.png) #### **Edit distance** Given two sequences $x[1..m]$ and $y[1..n]$ and set of transformation operation costs, the edit distance from $x$ to $y$ is the cost of the least expensive operation sequence that transforms $x$ to $y$. Describe a dynamic-programming algorithm that finds the edit distance from $x[1..m]$ to $y[1..n]$ and prints an optimal operation sequence. Analyze the running time and space requirements of your algorithm. **Operations:** dist[i][j] 表示 $x[1..i]$ 轉換至 $y[1..j]$ 的 Edit distance 題目目標為 dist[m][n] 即 $x[1..m]$ 轉換至 $y[1..n]$ 的 Edit distance 1. **Insert** a character into ith of x `dist[i][j] = dist[i][j-1]+cost[INSERT]` 1. **Delete** ith character of x `dist[i][j] = dist[i-1][j]+cost[DELETE]` 1. **Replace** a character from x by another character `dist[i][j]= min(dist[i][j-1], dist[i-1][j], dist[i-1][j-1])+cost[REPLACE]` 1. **Twiddle** two adjacent characters from x `dist[i][j] = min(dist[i][j], dist[i-2][j-2]+cost[TWIDDLE]);` Twiddle 不一定是最小 Distance 因此兩者取 min **Generalize:** $$ dist[i][j]=\min \begin{cases} dist[i-1][j-1]+\text{cost(replace)}\\ dist[i-1][j]+\text{cost(delete)}\\ dist[i][j-1]+\text{cost(insert)}\\ dist[i-2][j-2]+\text{cost(twiddle))} & \text{if }x[1..i-1] = y[1..j] \text{ and } x[1..i] = y[1..j-1]\\ dist[i-1, j-1] & \text{if }x[1..i] = y[1..j]\\ \end{cases} $$ **Boundary:** 當 `i==0` 或 `j==0` 即其中一者為空字串,最小 Edit distance 為 Insert/Delete 的次數。 ```python= for i in range(1, n+1): dist[0][i] = dist[0][i-1] + cost['INSERT'] for i in range(1, m+1): dist[i][0] = dist[i-1][0] + cost['DELETE'] ``` **Time complexity:** $O(m+n)$ **Find Distance:** ```python= dist = [[0 for i in range(n+1)] for j in range(m+1)] setBoundary(dist) for i in range(1, m+1): for j in range(1, n+1): minCost = cost['INFINITY'] if x[i] == y[j]: dist[i][j] = dist[i-1][j-1] operations[i][j] = 'NONE' else: if i > 1 and j > 1 and (x[i-1], y[j]) == (x[i], y[j-1]) and dist[i-2][i-2] + cost['TWIDDLE'] < minCost: dist[i][j] = minCost = dist[i-2][i-2] + cost['TWIDDLE'] operations[i][j] = 'TWIDDLE' if dist[i-1][j] + cost['DELETE'] < minCost: dist[i][j] = minCost = dist[i-1][j] + cost['DELETE'] operations[i][j] = 'DELETE' if dist[i][j-1] + cost['INSERT'] < minCost: dist[i][j] = minCost = dist[i][j-1] + cost['INSERT'] operations[i][j] = 'INSERT' if dist[i-1][j-1] + cost['REPLACE'] < minCost: dist[i][j] = minCost = dist[i-1][j-1] + cost['REPLACE'] operations[i][j] = 'REPLACE' ``` **Time complexity:** $O(mn)$ **Space complexity:** $O(mn)$ **Backtracking:** ```python= i, j = m, n while dist[i][j] > 0: match operations[i][j]: case 'NONE': i, j = i-1, j-1 case 'INSERT': buf.append(f'Insert {y[j]} to x.') j -= 1 case 'DELETE': buf.append(f'Delete {x[i]} from x.') i -= 1 case 'REPLACE': buf.append(f'Replace {x[i]} with {y[j]}') i, j = i-1, j-1 case 'TWIDDLE': buf.append(f'Twiddle {x[i]} and {x[i-1]}') i, j = i-2, j-2 print(i) for i in reversed(buf) ``` **Time complexity:** $O(m+n)$ **Space complexity:** $O(mn)$ <!-- 演習課的Code 如果cost(twiddle)=4 cost(replace)=2且replace的if寫在twiddle上方 如果發生twiddle 會被視作兩次replace -> 非最佳解 --> ### Q4 > - [name=songchiu] ![](https://i.imgur.com/0JylpxT.png) 透過$c[i,j]$來存放目前為止,字串$x$、字串$y$共有幾個相同的數 【式子】 $\begin{aligned} c[i,j] &=0 \quad\text{if i=0, or j=0} \\ &=c[i-1, j-1]+1 \quad\text{if i, j>0 and }x_i=y_i \\ &=max(c[i,j-1], c[i-1, j]) \quad\text{if i, j>0 and }x_i \neq y_i \\ \end{aligned}$ | c[] | B | A | D | B | C | B | A | |:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| |**A**| 0 |<font color="red">**1**</font>| 1$\leftarrow$ | 1$\leftarrow$ | 1$\leftarrow$ | 1$\leftarrow$ |<font color="red">**1**</font>| |**B**|<font color="green">**1$_\#$**</font>| 1$\leftarrow\uparrow$ | 1$\leftarrow\uparrow$ |<font color="red">**2**$\nwarrow$</font>| 2$\leftarrow$ |<font color="red">**2**$\nwarrow$</font>| 2$\leftarrow$ | |**A**| 1$\uparrow$ |<font color="green">**2**$\nwarrow$</font>| 2$\leftarrow$ | 2$\leftarrow\uparrow$ | 2$\leftarrow\uparrow$ | 2$\leftarrow\uparrow$ |<font color="red">**3**$\nwarrow$</font>| |**C**| 1$\uparrow$ | <font color="green">**2**$\uparrow$</font> | 2$\leftarrow\uparrow$ | 2$\leftarrow\uparrow$ |<font color="red">**3**$\nwarrow$</font>| 3$\leftarrow$ | 3$\leftarrow\uparrow$ | |**D**| 1$\uparrow$ | 2$\uparrow$ |<font color="green">**3**$\nwarrow$</font>| 3$\leftarrow$ | 3$\leftarrow\uparrow$ | 3$\leftarrow\uparrow$ | 3$\leftarrow\uparrow$ | |**B**|<font color="red">**1**</font>| 2$\uparrow$ | 3$\uparrow$ |<font color="green">**4**$\nwarrow$</font>| 4$\leftarrow$ |<font color="red">**4**$\nwarrow$</font>| 4$\leftarrow$ | |**C**| 1$\uparrow$ | 2$\uparrow$ | 3$\uparrow$ | 4$\uparrow$ |<font color="green">**5**$\nwarrow$</font>| <font color="green">**5**$\leftarrow$</font> | <font color="green">**5**$\leftarrow$</font> | 由上表可知,LCS為《$\text{BADBC}$》,長度為$5$ (順序請看<font color="green">**綠色的字**</font>) ### Q5 > - [name=chung] ![](https://i.imgur.com/e5PB6ZV.png) 【題目】 An algorithm to solve the LCS problem of two strings X and Y has space complexity O(|X||Y|) typically. #### (a) Design an algorithm to find the length of LCS of two string X and Y just using only 2⋅min(|X|,|Y|) cells for working space. #### 【解題思路】 LCS在計算某一格的解$c[i][j]$僅需要使用到其上方$c[i-1][j]$、左方$c[i][j-1]$或左上方$c[i-1][j-1]$格子的值,因此僅需要利用兩行的陣列(上一行與目前此行)空間即可算出答案,並將較短的字串設為index j,即可讓空間複雜度為2⋅min(|X|,|Y|) #### 【蘇都扣】 ```cpp= lcs-length(X, Y){ m <- length[X] n <- length[Y] for j <- 0 to m row_0[j] <- 0 row_1[0] <- 0 for i <- 1 to n for j <- 1 to m if (x_i == y_j) row_1[j] = row_0[j-1]+1 else if (row_0[j] >= row_1[j-1]) row_1[j] <- row_0[j] else row_1[j] <- row_1[j-1] for k <- 1 to m row_0[k] = row_1[k] return row_1[n] } ``` #### (b) Design an algorithm to find the length of LCS of two string X and Y just using only 1+min(|X|,|Y|) cells for working space. #### 【解題思路】 計算$c[i][j]$需要使用到$c[i-1][j]$(上)、$c[i][j-1]$(左)或$c[i-1][j-1]$(左上)的值,LCS的計算順序是由左到右、由上到下,因此可將陣列精簡成一行,並使用額外常數空間暫存左上方$c[i-1][j-1]$的值,因此可讓空間複雜度減少為1+min(|X|,|Y|) ![](https://i.imgur.com/4dhDxbu.jpg) #### 【蘇都扣】 ```cpp= lcs-length(X, Y){ m <- length[X] n <- length[Y] for j <- 0 to m row[j] <- 0 for i <- 1 to n s <- 0 for j <- 1 to m t <- row[j] if (x_i == y_j) row[j] <- s + 1 else if (row[j] < row[j-1]) row[j] <- row[j-1] s <- t return row[n] } ``` ### Q6 > - [name=yee] ![](https://i.imgur.com/xSeto5q.png) :::spoiler 【題目】 Let $σ$ be an alphabet set, $β$ denote the blank character in $σ$, and a measure function $F: σ×σ → R$. Where $F$ is defined as followings, for any $x$ and $y$ in $σ$, $F(x, y)<0$ if $x≠y$ and $F(x, y)>0$ if $x=y$; whereas $F(β,β)=-∞$. Given $X$ and $Y$ be two strings of $σ*$, let $X’$ and $Y’$ denote two new strings made by inserting some $β$ into $X$ and $Y$ respectively. The similarity of $X$ and $Y$ is defined by measuring the maximal value of $∑ 𝐹(𝑎_𝑖̇, 𝑏_𝑖)$ $𝑎_𝑖𝜖𝑋,𝑏_𝑖𝜖𝑌$ among all possible $X’$ and $Y’$. ::: #### 【解題思路】 與LCS很像,只是會插入空格來增加相似度,作法差不多,只是要考慮空格能插在最前面,因此建表時多加一列跟一行表示一開始空格再來建表即可 #### a. Design an algorithm to find the similarity of X and Y. ##### 【遞迴式】 $score[0,0] = 0\\ score[i,j]_{1 \leq i \leq m , j = 0} = score[i-1][j] + F(Y[i],β)\\ score[i,j]_{1 \leq j \leq n , i = 0} = score[i][j-1] + F(β,X[j])\\ score[i,j]_{1 \leq i \leq m \ , \ 1 \leq j \leq n} = max\begin{cases} score[i-1, j-1]+𝐹(Y[i], X[j])\\ score[i, j-1]+𝐹(β,X[j])\\ score[i-1, j]+𝐹(Y[i],β) \end{cases}$ ##### 【蘇都扣】 ```cpp= int score[Y_length + 1][X_length + 1]; score[0][0] = 0; for i = 1 to Y_length score[i][0] = score[i - 1][0] + F(Y[i],' '); for j = 1 to X_length score[0][j] = score[0][j - 1] + F(' ',X[j]); for i = 1 to Y_length for j = 1 to X_length v1 = score[i - 1][j - 1] + F(Y[i],X[j]); v2 = score[i - 1][j] + F(Y[i],' '); v3 = score[i][j - 1] + F(' ',X[j]); score[i][j] = max(v1,v2,v3); //F為題目所給計分的function //F(x, y) < 0 if x != y, F(x, y) > 0 if x == y, F(β,β) = -∞ ``` ##### 【複雜度】 Time : $\Theta(mn)$ Space : $\Theta(mn)$ #### b. Design an algorithm that describe where the blank characters are inserted to get the similarity. ##### 【想法】 透過建好的表(score)由最右下角回推比對左上左邊及上方的分數即可知道要於何處插入空格 ##### 【蘇都扣】 ```cpp= //score 為上方填好的表 traceBack(i, j) if i or j == 0 return if X[j] == Y[i] traceBack(i - 1, j - 1) else if(score[i - 1][j] < score[i][j - 1]) print("Insert the blank after the %d in Y.\n",j); traceBack(i, j - 1) else printf("Insert the blank after the %d in X.\n",i); traceBack(i - 1, j) ``` ##### 【複雜度】 最壞可能就是全部都空格(m+n),因此時間複雜度為$O(m + n)$ 而空間複雜度因為要建表仍然是$\Theta(mn)$ :::spoiler 【C++實作】 ```cpp= #include<iostream> using namespace std; int F(char a, char b); int main(){ string X,Y; cin >> X >> Y; X.insert(0,"1"); Y.insert(0,"1"); int X_length = X.length() ; int Y_length = Y.length() ; int score[Y_length][X_length]; score[0][0] = 0; for( int i = 1;i < Y_length; i++) score[i][0] = score[i - 1][0] + F(Y[i],' '); for(int j = 1;j < X_length; j++) score[0][j] = score[0][j - 1] + F(' ',X[j]); for( int i = 1;i < Y_length; i++) for(int j = 1;j < X_length; j++){ int v1 = score[i - 1][j - 1] + F(Y[i],X[j]); int v2 = score[i - 1][j] + F(Y[i],' '); int v3 = score[i][j - 1] + F(' ',X[j]); score[i][j] = max(max(v1,v2),v3); } int i = Y_length -1 , j = X_length - 1; string x , y; while(1){ if(i == 0 && j == 0) break; else if (i == 0 && j == 1){ x += X[j]; y += '-'; //printf("Insert the blank after the %d in Y.\n",i); break; } else if (i == 1 && j == 0){ y += Y[i]; x += '-'; //printf("Insert the blank after the %d in X.\n",j); break; } if(Y[i] == X[j] || score[i - 1][j - 1] >= max(score[i - 1][j],score[i][j - 1])){ x += X[j--]; y += Y[i--]; } else{ if(score[i - 1][j] < score[i][j - 1]){ x += X[j]; y += '-'; //printf("Insert the blank after the %d in Y.\n",j--);印出何處空白 j--; } else{ y += Y[i]; x += '-'; //printf("Insert the blank after the %d in X.\n",i--);印出何處空白 i--; } } } for(int i = x.length(); i >= 0;i--) printf("%c ",x[i]); printf("\n"); for(int i = y.length(); i >= 0;i--) printf("%c ",y[i]); //印出表的結果 // for( int i = 0;i < Y_length; i++){ // for(int j = 0;j < X_length; j++) // printf("%d ",score[i][j]); // printf("\n"); // } } int F(char a, char b){ if (a == ' ' || b == ' ') return -1; else{ if(a == b) return 1; else return -1; } } //TGTTACGG GGTTGACTA //ATCC GACG ``` ::: ### Q7 > - [name=xian] ![](https://i.imgur.com/WNZXVx3.png) #### 【解題思路】 因為在計算LCS的c table時,如果$X[i]=Y[j]$,則$c[i][j]=c[i-1][j-1]+1(左上角)$,$c[i-1][j] \ge c[i][j-1]$,則$c[i][j]=c[i-1][j]$ (上),$c[i][j-1] > c[i-1][j]$,則$c[i][j]=c[i][j-1]$ (左),由上面規則可知( from課本p.394 b table的code ),我們只需要考慮的點有`左上、上、左`這三個點,因此我們重建LCS時只需考慮這三點,即可回推所需要的解。 #### 【蘇都扣】 ```cpp= let i = m, j = n; print_ans(c,X,Y,i,j){ if (c[i][j] or i or j) is 0 return; if X[i] == Y[i] //往左上角前進 print_ans(c,X,Y,i-1,j-1); print X[i]; else c[i-1][j] >= c[i][j+1] print_ans(c,X,Y,i-1,j); else //c[i-1][j] < c[i][j+1] print_ans(c,X,Y,i,j-1); } ``` #### 【時間複雜度】 因為演算法一次只會向上一格,或是向左一格,所以在考慮最差的情形下,我們會從最右下角找的最左上角(m+n步),因此時間複雜度為$O(m+n)$。