* # 生物序列分析演算法 期中考Hit
考古題來源:https://www.ptt.cc/man/NTU-Exam/DE0A/D245/D298/DE44/D64F/D48/index.html
## •Huffman code construction (The occurrence probabilities of nucleotides A (adenine), C (cytosine), G (guanine), T (thymine) are given.)
>https://docs.google.com/presentation/d/1fSvSm5l2-X-Mtaem0BBTS5q7nQ0D_Hg0vL9OpWmtcLA/edit#slide=id.g163f743414f_0_584
>
- 99.1:
Suppose we are given a very long DNA sequence where the occurrence probabilities of nucleotides A (adenine), C (cytosine), G (guanine), T (thymine) are 0.1, 0.3, 0.4, and 0.2, respectively.
1. (10%): Construct a Huffman code for them. You should work out the binary tree constructionas well as the code assignment.
>
2. (5%): By the above Huffman coding scheme, what is the binary string for a 10-nucleotide DNA sequence “GGGCTTCACG.”
>A=11,C=100,G=0,T=101,
>So the DNA sequence “GGGCTTCACG” =0 0 0 100 101 101 100 11 100 0
- 99.2:
Suppose we are given a very long DNA sequence where the occurrence probabilities of nucleotides A, C, G, T are 0.3, 0.1, 0.2. and 0.4 respectively. Construct a Huffman code for them. You should work out the binary three construction as well as the code assignment.
>
## •Longest increasing subsequence (Show how the $O(n \log n)$-time algorithm works with a given test example.)
Dynamic Programming
- 99.1:
In class, we introduced an O(n log n)-time algorithm for finding a longest increasing subsequence. Use h8; 2; 6; 4; 5; 7; 3; 1; 12; 9; 10i to explain how the algorithm works.
>
- 99.2:
In class, we introduced an O(nlog n)-time algorithm fore finding a longest increasing subsequence of an input sequence of length n. Use <5,2,4,1,6,8,9,3,7> to explain how the algorithm works.
> - step1: initialize, A[1]=5, creat and input 5
> 5
> - step2: A[2]=2, copy, extend and discard
> 5 discarded
> 2
> - step3: A[3]=4, copy and extend
> 2
> 2, 4
> - step4: A[4]=1, copy, extend and discard
> 1
> 2 discarded
> 2, 4
> - step5: A[5]=6, copy, extend and discard
> 1
> 6 discarded
> 1, 6 discarded
> 2, 4
> 2, 4, 6
> - step6: A[6]=8, copy, extend and discard
> 1
> 8 discarded
> 1, 8 discarded
> 2, 4
> 2, 4, 8 discarded
> 2, 4, 6
> 2, 4, 6, 8
> - step7: A[7]=9, copy, extend and discard
> 1
> 9 discarded
> 1, 9 discarded
> 2, 4
> 2, 4, 9 discarded
> 2, 4, 6
> 2, 4, 6, 9 discarded
> 2, 4, 6, 8
> 2, 4, 6, 8, 9
> - step8: A[8]=3, copy, extend and discard
> 1
> 3 discarded
> 1, 3
> 2, 4 discarded
> 2, 4, 6
> 2, 4, 6, 8
> 2, 4, 6, 8, 9
> - step9: A[9]=7, copy, extend and discard
> 1
> 7 discarded
> 1, 7 discarded
> 1, 3
> 1, 3, 7 discarded
> 2, 4, 6
> 2, 4, 6, 7
> 2, 4, 6, 8 discarded
> 2, 4, 6, 8, 9 <-- LIS List
## •Two algorithms for the maximum-sum segment problem (Show how they work with a given test example.)
- 99.1:
Problem 3 (10%): Given a sequence of real numbers A = ha1; a2; : : : ; ani, the maximum-sum segment problem is to find a consecutive subsequence, i.e., a substring or segment, in A with the maximum sum. Let prefix sum P[i] = Pi j=1 aj be the sum of the first i elements. Explain how to use the prefix sum to deliver the maximum-sum segment in O(n) time.
In the following, we are given two sequences A = ha1; a2; : : : ; ami and B = hb1; b2; : : : ; bni. An alignment of A and B is obtained by introducing dashes into the two sequences such that the lengths of the two resulting sequences are identical and no column contains two dashes. Let § denote the input symbol alphabet. A score ¾(a; b) is defined for each (a; b) 2 § £ §. The score of an alignment is the sum of ¾ scores of all columns with no dashes minus the penalties of the gaps.
>
- 99.2:
We assume the following simple scoring scheme. A match is given a bonus score 8, a mismatch is penalized by assigning score -5, and the gap penalty for each gap symbol is -3.
> 
1. Write down the dynamic-programming matrix for computing an optimal global alignment of the sequences ATGGC and AGGTC. Print the global alignment and its score
> 
> A - G G T C
> A T G G - C
> 8-3+8+8-3+8=26
2. Write down the dynamic-programming matrix for computing an optimal local alignment of the sequences GATGGCT and TAGGTCG. Print the local alignment and its score.
> 
> A T G G - C
> A - G G T C
> 8-3+8+8-3+8=26
> 
> T - G G - C
> T A G G T C
> 8-3+8+8-3+8=26
> 
> T - G G - C
> T A G G T C
> 8-3+8+8-3+8=26
> 
> A T G G - C
> A - G G T C
> 8-3+8+8-3+8=26
3. Assume further that opening up a gap is charged an additional cost -8. Write down the dynamic-programming matrix for computing an optimal global alignment of the sequences ATGGC and AGGTC. Print the global alignment and its score.
> A - G G T C
> A T G G - C
> ^ ^
> 8-3+8+8-3+8=26
> Alignment score = 26-8-8 =10
## •Pairwise alignment (Show how to compute $S^{-}[i, j]$, $S^{+}[i, j]$, local alignment, and affine gap penalties for two given test sequences.)

> 
>
> https://www.csie.ntu.edu.tw/~kmchao/bioinformatics14spr/various_schemes.pdf
## •Some details about Hirschberg's linear-space
> http://www.cs.nthu.edu.tw/~wkhon/algo09/tutorials/tutorial-hirschberg.pdf
> https://www.cs.cmu.edu/~ckingsf/class/02-714/Lec07-linspace.pdf
> https://blog.csdn.net/weixin_43934036/article/details/105437343
>
## •Algorithms for the problem of computing all $\Delta$-points
> 
> 
> Same with global alignment
> 
> 
> 綠色部分總和皆為14
> 
> 
## •Scoring schemes for multiple sequence alignment (gap-start, gap-end, and quasi-gaps)

> quasi-gap應該是T型的gap(猜的)
生醫資訊導論 期中考 評分標準說明
> https://www.csie.ntu.edu.tw/~kmchao/bioinformatics14spr/mid1_grading_criteria.pdf