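This is a free online course offered by MIT OpenCourseWare; click here for the course link!
In 1926, Erwin Schrödinger wrote down the following equation to describe how particles behave in the microscopic world.
Schrödinger equation: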
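A common way to write it, assuming the time-independent form is the one meant here:

$$\hat{H}\psi = E\psi$$

In this notation (an assumption, since the note does not define its symbols), $\hat{H}$ is the Hamiltonian operator, $\psi$ is the wavefunction, and $E$ is the binding energy of the electron.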
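The goal of solving the Schrödinger equation is:
For the hydrogen atom, adding the electron's kinetic energy (K.E.) and potential energy (P.E.) gives the following formula: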
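Assuming the standard textbook result for hydrogen is the one meant here, the allowed energies can be written as:

$$E_n = -\frac{R_H}{n^2}, \qquad n = 1, 2, 3, \dots$$

Here $R_H \approx 2.18 \times 10^{-18}\ \mathrm{J}$ is the Rydberg constant expressed as an energy. As a quick check, $n = 2$ gives $E_2 = -R_H/4 \approx -5.45 \times 10^{-19}\ \mathrm{J}$.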
For why n can only take integer values, see this video. This is also why the field is called quantum mechanics (n is the principal quantum number).
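All binding energies are negative; the more negative the value, the more stable the electron. When n =
n = 1, called the ground state, is the most stable and also
The ionization energy represents taking the electron located at
Any one-electron atom or ion can be described by the following formula: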
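A minimal sketch of the one-electron energy levels, assuming the formula meant here is the standard $E_n = -Z^2 R_H / n^2$. The function names and the rounded constant are illustrative choices, not taken from the course.

```python
# Rydberg constant expressed as an energy, in joules (rounded value).
R_H = 2.18e-18


def binding_energy(n, Z=1):
    """Binding energy (J) of the electron in level n of a one-electron species.

    Assumes E_n = -Z**2 * R_H / n**2, with Z the nuclear charge.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return -(Z ** 2) * R_H / n ** 2


def ionization_energy(n, Z=1):
    """Energy (J) needed to remove the electron from level n.

    This is the binding energy with the sign flipped.
    """
    return -binding_energy(n, Z)


if __name__ == "__main__":
    for level in (1, 2, 3):
        print(level, binding_energy(level))
    # He+ (Z = 2) holds its ground-state electron four times as tightly as H.
    print(ionization_energy(1, Z=2) / ionization_energy(1, Z=1))
```

The energies approach zero from below as n grows, which matches the statement that every binding energy is negative.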
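When an electron in an excited state drops to a lower energy state, it emits a photon. The energy carried by the photon equals the energy difference between the excited state and the lower energy state.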
where
Since energy can be converted using the following formula:
When the energy difference
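The emission process can be sketched numerically. This assumes the conversion meant here is the Planck relation $E = h\nu$ together with $c = \lambda\nu$, and that the levels follow $E_n = -Z^2 R_H / n^2$. The function name and the rounded constants are illustrative.

```python
# Rounded physical constants (SI units).
PLANCK_H = 6.626e-34   # Planck constant, J*s
LIGHT_C = 2.998e8      # speed of light, m/s
R_H = 2.18e-18         # Rydberg constant as an energy, J


def emitted_photon(n_initial, n_final, Z=1):
    """Return (energy in J, frequency in Hz, wavelength in m) of the photon
    emitted when the electron drops from n_initial to n_final."""
    if n_final < 1 or n_initial <= n_final:
        raise ValueError("emission requires n_initial > n_final >= 1")
    # Energy difference between the two levels (positive for emission).
    delta_e = (Z ** 2) * R_H * (1 / n_final ** 2 - 1 / n_initial ** 2)
    frequency = delta_e / PLANCK_H      # E = h * nu
    wavelength = LIGHT_C / frequency    # c = lambda * nu
    return delta_e, frequency, wavelength


if __name__ == "__main__":
    energy, freq, lam = emitted_photon(3, 2)
    print(energy, freq, lam * 1e9)  # wavelength printed in nanometres
```

For hydrogen, the n = 3 to n = 2 transition comes out in the red part of the visible spectrum, near 656 nm.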
lectures and hands on tutorial
Apr 24, 2022
RNA-Seq stands for RNA sequencing. It is a technology for studying RNA species. In a cell, the majority of RNA is ribosomal RNA (rRNA), which accounts for about 90% of total RNA. Conversely, messenger RNA (mRNA) accounts for only 1-2% of total RNA. The central dogma of biology tells us that DNA is transcribed into mRNA, and mRNA is then translated into protein to carry out biological functions. mRNA is the only RNA species known to encode protein sequences. Measuring mRNA levels lets us understand gene expression and infer protein expression. Therefore, most of the time we are only interested in mRNA levels and want to deplete rRNA from the total RNA.

Preprocessing RNA samples before sequencing includes total RNA extraction, mRNA enrichment, fragmentation, first-strand cDNA synthesis, RNA strand digestion, second-strand cDNA synthesis, 3' end repair, adenylation, and adapter ligation. In the total RNA extraction step, the extracted RNA should show no signs of degradation. In mRNA enrichment, poly-T beads or poly-T columns are used to separate mRNA from other RNA species, since only mRNA carries poly-A tails. The mRNAs are fragmented chemically or by sonication into a size range that fits the capacity of the sequencer. The mRNA fragments are primed with random hexamers and reverse-transcribed into first-strand cDNA, followed by RNA strand digestion and second-strand cDNA synthesis. To make the library strand-specific, dTTP is replaced with dUTP in the second-strand synthesis step, so that the incorporated uracils mark the second strand. Because of how the polymerase synthesizes, the double-stranded cDNA product is missing several bases at the 3' end of each strand. The 3' ends are repaired by a DNA end-repair enzyme, and a single adenine is then added. Y-shaped adapters with a 3'-T overhang are ligated to the 3'-A overhang of the double-stranded cDNA. To reduce complexity in short-read assembly, the dUTP-marked cDNA strands are digested using uracil-N-glycosylase (UNG).
The single-stranded cDNAs are then subjected to sequencing. In the sequencing step, a primer targeting the adapter sequence anneals to one end of the cDNA. Fluorescence-labelled nucleotides (dATP, dCTP, dGTP, and dTTP) are incorporated one base per cycle, and each cycle is photographed. The stack of images is analyzed by computer to determine which nucleotide was added to the growing strand in each cycle. For paired-end sequencing, a second primer targeting the adapter on the other side is added and the fragment is sequenced again. The result is 2 separate files, one from the forward strand and the other from the reverse strand.

Raw reads obtained from the sequencer should undergo quality control before downstream analysis. Raw reads are normally reported in FASTQ format, which stores 4 lines of information for each read. The first line is the read name, the second line is the sequence, the third line is a "+" sign, and the fourth line is a string of ASCII characters encoding the quality scores. For Illumina sequencers, the quality score is reported in the Phred33 scheme. To obtain the numerical quality score, convert each ASCII character to its character code and subtract 33. The quality score represents how confident we can be that a base was called correctly. A quality score of 40 means a 1 in 10000 chance that the base was called incorrectly. Thus, a higher quality score indicates a higher chance that the base was called correctly. Usually, we want to trim off bases with a quality score lower than 20. Low-quality base calls usually occur at the 3' end of reads, since the polymerase binds the template less firmly as the run proceeds and becomes prone to adding the wrong nucleotide.

FastQC is a commonly used tool that gives an overview of read quality. It reports per base quality, per sequence quality, per base nucleotide content, per sequence GC content, number of duplicated sequences, overrepresented sequences, and adapter content metrics. The per base quality metric shows the distribution of quality scores as a box plot for each base position.
The upper boundary of the yellow box is the third quartile of the quality scores, and the lower boundary of the box is the first quartile. The red line in the middle of the box is the median quality score.

The per sequence quality metric averages the base qualities of each read and shows the distribution of per-read quality. For good-quality reads, the distribution should skew toward the high-quality side.

The per base nucleotide content metric shows how often each nucleotide appears at each base position. In principle, the RNA has been randomly fragmented, so the frequency of each nucleotide should be even across the whole read length. However, RNA-Seq data is an exception: research has found that priming with random hexamers in the cDNA synthesis step appears to select for certain fragment sequences, which results in non-random nucleotide content at the 5' end of reads. These bases are still real bases from mRNAs, not artifacts, so it is fine to include them in short-read assembly.

The per sequence GC content metric calculates the GC content of each read and reports the distribution of per-read GC content. GC content is like a fingerprint of a species: it has a roughly constant value for a given species. This also applies to the GC content of reads. The distribution of per-read GC content should peak at the value equal to that species' GC content. If the peak shifts away from the expected GC content, this can stem from contaminating reads from foreign species in the dataset.

The number of duplicated sequences metric shows how many reads are exact matches to each other. In RNA-Seq datasets, highly expressed transcripts are usually sequenced repeatedly, resulting in many duplicated reads. The overrepresented sequences metric reports reads that appear repeatedly and account for more than 0.1% of total reads. Adapters are the sequences most likely to be captured by this metric. The adapter content metric aligns reads against known public adapter sequences and reports the alignment hits.
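The Phred33 decoding, the 3' quality trim at Q20, and the per-read GC content described above can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, and real pipelines use dedicated trimming tools rather than code like this.

```python
def phred33_scores(quality_line):
    """Decode a FASTQ quality string: character code minus 33."""
    return [ord(ch) - 33 for ch in quality_line]


def error_probability(q):
    """Probability that a base with quality score q was called incorrectly."""
    return 10 ** (-q / 10)


def trim_3prime(sequence, quality_line, min_q=20):
    """Drop bases from the 3' end until one has quality >= min_q."""
    scores = phred33_scores(quality_line)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return sequence[:end], quality_line[:end]


def gc_content(sequence):
    """Fraction of G and C bases in a read (0.0 for an empty read)."""
    if not sequence:
        return 0.0
    gc = sum(1 for base in sequence.upper() if base in "GC")
    return gc / len(sequence)


if __name__ == "__main__":
    seq, qual = "ACGTAC", "IIII##"   # 'I' encodes Q40, '#' encodes Q2
    print(phred33_scores(qual))
    print(trim_3prime(seq, qual))
    print(error_probability(40), gc_content(seq))
```

Only trailing low-quality bases are removed here; a low-quality base in the middle of a read is left in place, which is a deliberate simplification.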
Apr 15, 2022
Fundamental Trigonometric Identities
Power Reducing Formulas
Fundamental Trigonometric Identities
Reciprocal identities
$\sec\theta=\frac{1}{\cos\theta}$ $\csc\theta=\frac{1}{\sin\theta}$ $\cot\theta=\frac{1}{\tan\theta}$ $\cos\theta=\frac{1}{\sec\theta}$
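Oct 11, 2020
The FASTQ format is a data storage format for nucleic acid sequences and their corresponding base qualities.
Sequence quality
Sequence quality is a value used to assess how reliable a next-generation sequencing (NGS) result is. It is defined as:
$$Q = -10\log_{10}P$$
where
Q: the sequence (base) quality
P: the probability that the base call is wrong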