# Biocoding 2021 - Virtual

---

## Shared URLs

- Course website: [link](https://jasonjwilliamsny.github.io/biocoding-2021-dnalc/)
- Learn more markdown: [link](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet)
- Human genome: [link](https://www.ncbi.nlm.nih.gov/projects/genome/guide/human/index.shtml)
- SNPedia: [link](https://www.snpedia.com/index.php/SNPedia)
- Project Jupyter: [link](https://jupyter.org/)
- Interesting Jupyter notebooks: [link](https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks)
- Try Linux terminal: [link](https://cocalc.com/doc/terminal.html)
- mybinder.org: [link](https://mybinder.org/)
- Notebooks: https://github.com/JasonJWilliamsNY/biocoding-2021-notebooks
- Zoom link: [link](https://cshl-dnalc.zoom.us/j/5163675186)
- JupyterHub: [http://128.196.142.17:8000](http://128.196.142.17:8000)
  - (Password = lastname.123)
- Setup page for Jupyter: [link](https://jasonjwilliamsny.github.io/biocoding-2021-dnalc/setup.html)

---

**Day One Sign In**

- Jason
- Ana
- Kiana
- Benji
- Hanyu
- Lexi
- Bryan
- Max

----

## Day one notes

- pwd = print working directory
- ls = lists files and folders
- mkdir "name" = makes a folder
- cd = change directory
- head = shows the first 10 lines of a file

When you have an assignment operator, everything on the right-hand side is evaluated first and the result is stored in the variable on the left-hand side.

----

**Day Two Sign In**

- Ana
- Kiana
- Hanyu
- Lexi
- Bryan
- Max

----

## Day two notes

- A string can be empty (zero characters) and has no fixed maximum length.
- All strings must be surrounded by quotation marks.
- Python slices and ranges exclude the end number: `alpha_id[0:3]` returns the characters at indexes 0, 1 and 2.

*What variable names would you use to describe...*

- Average weight of a mouse group?
  - avWeight
  - weight
  - alpm, betm, gamm
- Number of mice in a group?
  - numMice
  - n
  - alpn, betn, gamn

*Print the alpha_id character by character in reverse*

```
print(alpha_id[7],alpha_id[6],alpha_id[5],alpha_id[4],alpha_id[3],alpha_id[2],alpha_id[1],alpha_id[0])
```

```
print(alpha_id[7])
print(alpha_id[6])
print(alpha_id[5])
print(alpha_id[4])
print(alpha_id[3])
print(alpha_id[2])
print(alpha_id[1])
print(alpha_id[0])
```

*Create new variables that contain the initials of the experimenter for each mouse group; print the value of these new variables*

*Create new variables that contain the ID of the experimenter for each mouse group; print the value of these new variables*

```
# Join the characters with + so each result is a string rather than a tuple
AlphaInitial = alpha_id[0] + alpha_id[1] + alpha_id[2]
print(AlphaInitial)
BetaInitial = beta_id[0] + beta_id[1] + beta_id[2]
print(BetaInitial)
GammaInitial = gamma_id[0] + gamma_id[1] + gamma_id[2]
print(GammaInitial)
```

```
initialAlpha = alpha_id[0:3]
initialBeta = beta_id[0:3]
initialGamma = gamma_id[0:3]
print(initialAlpha)
print(initialBeta)
print(initialGamma)
```

```
alpha_init = alpha_id[0:3]
print(alpha_init)
beta_init = beta_id[0:3]
print(beta_init)
gamma_init = gamma_id[0:3]
print(gamma_init)
```

```
# print() returns None, so assign the slice first and print it afterwards
alpha_id_initials = alpha_id[0:3]
beta_id_initials = beta_id[0:3]
gamma_id_initials = gamma_id[0:3]
print(alpha_id_initials)
print(beta_id_initials)
print(gamma_id_initials)
alpha_id_number = alpha_id[3:9]
beta_id_number = beta_id[3:9]
gamma_id_number = gamma_id[3:9]
print(alpha_id_number)
print(beta_id_number)
print(gamma_id_number)
```

```
alpha_initials = alpha_id[0:3]
beta_initials = beta_id[0:3]
gamma_initials = gamma_id[0:3]
print(alpha_initials, beta_initials, gamma_initials)
```

Second code:

```
IDAlpha = alpha_id[3:]
IDBeta = beta_id[3:]
IDGamma = gamma_id[3:]
print(IDAlpha)
print(IDBeta)
print(IDGamma)
```

**Challenge**

*Let's create a simple sequence in Python that will do the following*:

1. Have a variable which will hold the name of the sequence
2. Have a variable which will hold the sequence string
3. Print the name and sequence in proper FASTA format

```
Name01 = ">sequence 001"
Sequence01 = "ATTCGAGGATCGATTTCGATCGATGCTTAGCTTTAGCTTTTTTAGATCTCCCA"
print(Name01 + "\n" + Sequence01)
```

```
sequence_name = 'dna1'
sequence_string = 'acgcagatcgctagagcatcggttc'
# Use + instead of commas so print() does not insert spaces into the FASTA record
print('>' + sequence_name + '\n' + sequence_string)
```

```
dna = 'actgagcgcgagtcagcaactgatcgatacg'
sequence = ">sequence1"
# The newline escape is "\n" (backslash), not "/n"
print(sequence + "\n" + dna)
```

```
dnaseq1Name = 'sequence 001'
dnaseq1 = 'ATCGTACGTAGCATCGTTCGATCGATGAT'
print(">" + dnaseq1Name + "\n" + dnaseq1)
```

**Day Three Sign In**

- Ana
- Max
- Lexi
- Hanyu
- Kiana
- Bryan

----

## Day three notes

n = i + 1 -> when counting in Python, remember that the index number (i, starting at 0) is different from the natural number (n, starting at 1).

*HIV Genome parsing challenge notebook 2*

- Determine and print the length of the HIV genome

```
print(len(hiv_genome))
```

---

```
length_hiv_genome = len(hiv_genome)
print(length_hiv_genome)
```

- Create variables for and print the sequences for the following HIV genes

```
# Gene coordinates count from 1 but Python indexes from 0,
# so each slice starts at (start - 1)
# gag
gag = hiv_genome[789:2292]
print("gag"+"\n"+gag)
# pol
pol = hiv_genome[2084:5096]
print("pol"+"\n"+pol)
# vif
vif = hiv_genome[5040:5619]
print("vif"+"\n"+vif)
# vpr
vpr = hiv_genome[5558:5850]
print("vpr"+"\n"+vpr)
# env
env = hiv_genome[6224:8795]
print("env"+"\n"+env)
```

```
seq_gag = hiv_genome[789:2292]
print(seq_gag)
seq_pol = hiv_genome[2084:5096]
print(seq_pol)
seq_vif = hiv_genome[5040:5619]
print(seq_vif)
seq_vpr = hiv_genome[5558:5850]
print(seq_vpr)
seq_env = hiv_genome[6224:8795]
print(seq_env)
```

```
sequence1 = "gag sequence"
sequence2 = "pol sequence"
sequence3 = "vif sequence"
sequence4 = "vpr sequence"
sequence5 = "env sequence"
# gag
gag = hiv_genome[789:2292]
# pol
pol = hiv_genome[2084:5096]
# vif
vif = hiv_genome[5040:5619]
# vpr
vpr = hiv_genome[5558:5850]
# env
env = hiv_genome[6224:8795]
# sep="" stops print() from adding a space at the end of each sequence line
print(">"+sequence1+"\n"+gag, "\n"+">"+sequence2+"\n"+pol, "\n"+">"+sequence3+"\n"+vif, "\n"+">"+sequence4+"\n"+vpr, "\n"+">"+sequence5+"\n"+env, sep="")
```

- Generate a sum for each of the nucleotides (# of 'A', # of 'U', # of 'G', # of 'C')

```
RNA_hiv_genome = hiv_genome.replace('t','u')
print("A:", RNA_hiv_genome.count('a'))
print("U:", RNA_hiv_genome.count('u'))
print("G:", RNA_hiv_genome.count('g'))
print("C:", RNA_hiv_genome.count('c'))
```

```
# RNA_hiv has to be defined before it is counted
RNA_hiv = hiv_genome.replace('t','u')
print("A:",hiv_genome.count('a'))
print("U:",RNA_hiv.count('u'))
print("G:",hiv_genome.count('g'))
print("C:",hiv_genome.count('c'))
```

```
sumA = hiv_genome.count('a')
sumG = hiv_genome.count('g')
sumU = hiv_genome.count('t')  # every 't' in the DNA becomes a 'u' in RNA
sumC = hiv_genome.count('c')
print(sumA)
print(sumG)
print(sumU)
print(sumC)
```

- Calculate the GC content for each of the genes

```
gagGCcont = gag.count('g') + gag.count('c')
print("gag GC content:", gagGCcont)
polGCcont = pol.count('g') + pol.count('c')
print("pol GC content:", polGCcont)
vifGCcont = vif.count('g') + vif.count('c')
print("vif GC content:", vifGCcont)
vprGCcont = vpr.count('g') + vpr.count('c')
print("vpr GC content:", vprGCcont)
envGCcont = env.count('g') + env.count('c')
print("env GC content:", envGCcont)
```

```
GCgag = seq_gag.count('g') + seq_gag.count('c')
GCpol = seq_pol.count('g') + seq_pol.count('c')
GCvif = seq_vif.count('g') + seq_vif.count('c')
GCvpr = seq_vpr.count('g') + seq_vpr.count('c')
GCenv = seq_env.count('g') + seq_env.count('c')
print(GCgag)
print(GCpol)
print(GCvif)
print(GCvpr)
print(GCenv)
```

```
print("vpr GC:", vpr_gene.count('g') + vpr_gene.count('c'))
print("gag GC:", gag_gene.count('g') + gag_gene.count('c'))
print("pol GC:", pol_gene.count('g') + pol_gene.count('c'))
print("vif GC:", vif_gene.count('g') + vif_gene.count('c'))
print("env GC:", env_gene.count('g') + env_gene.count('c'))
```

```
gag_GC=gag.count('g')+gag.count('c')
print(gag_GC)
pol_GC=pol.count('g')+pol.count('c')
print(pol_GC)
vif_GC=vif.count('g')+vif.count('c')
print(vif_GC)
vpr_GC=vpr.count('g')+vpr.count('c')
print(vpr_GC)
env_GC=env.count('g')+env.count('c')
print(env_GC)
```

*Print the list of these HIV genes in order given the list below*

- The correct order is: gag, pol, vif, vpr, vpu, env, nef

```
print(hiv_gene_names[1], hiv_gene_names[3], hiv_gene_names[2], hiv_gene_names[4], hiv_gene_names[5], hiv_gene_names[0], hiv_gene_names[6])
```

**Day Four Sign In**

- Ana
- Kiana
- Max
- Lexi
- Bryan
- Hanyu

----

## Day four notes

### HIV Simulation

1. Write a simulation which determines if in one round of replication HIV will mutate or not

```
from numpy import random

HIVMutState = ["mutation", "no_mutation"]
HIVMutStateProb = [0.000044, 0.999956]
HIVMutation = random.choice(HIVMutState, p = HIVMutStateProb)
if HIVMutation == "mutation":
    print("One round of HIV will mutate.")
else:
    print("One round of HIV will not mutate.")
```

---

```
from numpy import random

mutation_states = ['a Mutation', 'no Mutation']
mutation_probability = [.37,.63]
coin_flip = random.choice(mutation_states,p = mutation_probability)
print("There is %s" %coin_flip)
```

---

```
from numpy import random

mutation_state = ['mutation','no mutation']
mutation_state_probabilities = [0.044,0.956]
mutation_flip = random.choice(mutation_state,p = mutation_state_probabilities)
print("%s" %mutation_flip)
```

---

```
from numpy import random

mutation_state = ['mutation', 'no_mutation']
mutation_probability = [0.044, 0.956]
mutation_output = random.choice(mutation_state, p = mutation_probability)
print(mutation_output)
```

---

```
from numpy import random

mutation_state = ['mutation', 'no mutation']
mutation_probability = [0.000044, 0.999956]
mutation_occurs = random.choice(mutation_state, p = mutation_probability)
print("In 1 round of HIV replication, there is:", mutation_occurs)
```

2. Determine how often HIV would mutate in 20 rounds of replication

---

```
# Uses mutation_state and mutation_probability as defined in part 1
mutation_results = []
for simulation in range(1, 21):
    mutation = random.choice(mutation_state, p = mutation_probability)
    mutation_results.append(mutation)
print(mutation_results)
```

---

```
from numpy import random

mutation_state = ['mutation','no mutation']
mutation_state_probabilities = [0.044,0.956]
for flip in range(1,21):
    mutation_flip = random.choice(mutation_state,p = mutation_state_probabilities)
    print(mutation_flip)
```

---

```
from numpy import random

HIVMutState = ["mutation", "no_mutation"]
HIVMutStateProb = [0.000044, 0.999956]
MutResults = []
for mutation in range(1,21):
    HIVMutation = random.choice(HIVMutState, p = HIVMutStateProb)
    MutResults.append(HIVMutation)
print(MutResults)
```

---

```
from numpy import random

mutated_hiv_genome = ''
nucleotideOutput = ['a','c','g','t']
mutation_state = ['mutation', 'no_mutation']
mutation_probability = [0.044, 0.956]
# Walk through the genome and decide base by base whether it mutates
for i in range(0, len(hiv_genome)):
    mutated_not_mutated = random.choice(mutation_state, p = mutation_probability)
    if(mutated_not_mutated == 'no_mutation'):
        mutated_hiv_genome += hiv_genome[i]
    else:
        # Probabilities are listed in the order a, c, g, t
        if(hiv_genome[i] == 'a'):
            nucleotideProb = [0, 1/33, 29/33, 3/33]
        elif(hiv_genome[i] == 'c'):
            nucleotideProb = [14/95, 0, 0, 81/95]
        elif(hiv_genome[i] == 'g'):
            nucleotideProb = [146/152, 2/152, 0, 4/152]
        else:
            nucleotideProb = [20/44, 18/44, 6/44, 0]
        mutatedNucleotide = random.choice(nucleotideOutput, p = nucleotideProb)
        mutated_hiv_genome += mutatedNucleotide
print(mutated_hiv_genome)
```

---

```
from numpy import random

# Pick one random position in the genome
find_nucleotide = random.randint(len(hiv_genome))
nucleotide = hiv_genome[find_nucleotide]
print('nucleotide number:', find_nucleotide)
print('nucleotide:', nucleotide)

hiv_mutation_states = ['Transition','Transversion']
hiv_mutation_probabilities = [.85, .15]
hiv_mutation_results = []
for hiv_mutation in range(1):
    hiv_mutation_flip = random.choice(hiv_mutation_states,p = hiv_mutation_probabilities)
    hiv_mutation_results.append(hiv_mutation_flip)

for results in hiv_mutation_results:
    if results == 'Transition':
        if nucleotide == 'a':
            print('new mutation:', nucleotide.replace('a','g'))
        elif nucleotide == 'c':
            print('new mutation:', nucleotide.replace('c','t'))
        elif nucleotide == 'g':
            print('new mutation:', nucleotide.replace('g','a'))
        elif nucleotide == 't':
            print('new mutation:', nucleotide.replace('t','c'))
    elif results == 'Transversion':
        # An expression like 'c' or 't' always evaluates to 'c',
        # so choose one of the two possible bases at random instead
        if nucleotide == 'a':
            print('new mutation:', random.choice(['c','t']))
        elif nucleotide == 'c':
            print('new mutation:', random.choice(['a','g']))
        elif nucleotide == 'g':
            print('new mutation:', random.choice(['t','c']))
        elif nucleotide == 't':
            print('new mutation:', random.choice(['a','g']))
```

*Challenge: Your dictionary should contain the key:value pair 'beta_id':'SJW99399'. Using only that value from the my_mouse_exp dictionary, create a new entry in my_mouse_exp, experimenter, which has the value SJW extracted from the 'beta_id':'SJW99399' dictionary entry.*

```
# The initials are the first three characters, read from the dictionary itself
my_mouse_exp['beta_experimenter'] = my_mouse_exp['beta_id'][0:3]
print(my_mouse_exp)
```

```
my_mouse_exp['beta_experimenter'] = my_mouse_exp['beta_id'][0:3]
print(my_mouse_exp['beta_experimenter'])
```

```
my_mouse_exp['experimenter'] = my_mouse_exp['beta_id'][0:3]
```

```
my_mouse_exp['experimenter'] = my_mouse_exp['beta_id'][0:3]
print(my_mouse_exp['experimenter'])
```

## RNA to Protein

----

**Day Five Sign In**

- Max
- Lexi
- Bryan
- Kiana
- Ana
- Hanyu

----

## Day Five notes

*Write a function that calculates the GC content of a DNA string*

```
dna = 'gatgcattatcgtgagc'

def GCcontent(dna):
    # Count the g and c bases one position at a time
    gc_count = 0
    for i in range(0, len(dna)):
        if(dna[i] == 'c' or dna[i] == 'g'):
            gc_count += 1
    print(gc_count)

GCcontent(dna)
```

---

```
dna = 'acttgtaccttgagattcag'

def gc_content(dna):
    # GC content as a fraction of the sequence length
    bob = (dna.count('c') + dna.count('g'))/len(dna)
    print(bob)

gc_content(dna)
```

---

```
dna = "tgcatcgatcatcgatcgtagctagctagctactacg"

def GCcontent(dna):
    # GC content as a percentage of the sequence length
    gc_percent = ((dna.count('g')+dna.count('c'))/len(dna))*100
    print("The GC content is", gc_percent)

GCcontent(dna)
```

*Write a function that generates a random string of DNA of random length*

```
def generate_random_dna():
    from numpy import random
    nucleotides = ['a', 't', 'c', 'g']
    probabilities = [0.25, 0.25, 0.25, 0.25]
    dna_length = random.randint(1,100)
    initial_length = 0
    dna = ''
    while initial_length < dna_length:
        dna = dna + random.choice(nucleotides, p = probabilities)
        initial_length = initial_length + 1
    print(dna)

generate_random_dna()
```

*Write a function that generates a random string of DNA of a random length: use optional parameters to set the length of the strings and the probabilities of the nucleotides.*

```
def generateDNA(dna_length = None, nucleotideprob = None):
    from numpy import random
    # Fall back to a random length and equal probabilities when no arguments are given
    if dna_length is None:
        dna_length = random.randint(1,100)
    if nucleotideprob is None:
        nucleotideprob = [.25, .25, .25, .25]
    nucleotides = ('a','c','g','t')
    initial_length = 0
    dna = ''
    while initial_length < dna_length:
        dna = dna + random.choice(nucleotides, p=nucleotideprob)
        initial_length = initial_length + 1
    print(dna)

generateDNA(98,[.3,.5,.1,.1])
```

---

```
from numpy import random

def generate_random_dna(dna_length = None, probabilities = [.25, .25, .25, .25]):
    # A default written in the signature is evaluated only once,
    # so the random length is drawn inside the function instead
    if dna_length is None:
        dna_length = random.randint(1,100)
    nucleotides = ['a', 't', 'c', 'g']
    initial_length = 0
    dna = ''
    while initial_length < dna_length:
        dna = dna + random.choice(nucleotides, p = probabilities)
        initial_length = initial_length + 1
    print(dna)

generate_random_dna(80, [.9, .03, .03, .04])
```

```
from numpy import random

# Named random_dna so the function does not replace numpy's random module
def random_dna(DNAlength = None, nucProb = [.25, .25, .25, .25]):
    if DNAlength is None:
        DNAlength = random.randint(0, 41)
    nucleotide = ['a','c','g','t']
    newDNA = ''
    for i in range(0, DNAlength):
        newDNA += random.choice(nucleotide, p = nucProb)
    print(newDNA)

random_dna(10, [.1, .2, .5, .2])
```
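
The two Day Five exercises (GC content and random DNA with optional parameters) can be combined in one short sketch. This is an illustration, not the course solution. It assumes Python's standard-library `random` module instead of numpy, it returns values rather than printing them, and the names `gc_content`, `random_dna`, `NUCLEOTIDES` and the `rng` parameter are hypothetical choices made for this example.

```python
import random

NUCLEOTIDES = "acgt"  # order used for the probabilities below (assumed for this sketch)


def gc_content(dna):
    """Return the fraction of bases in dna that are g or c."""
    if not dna:
        return 0.0  # avoid dividing by zero for an empty string
    dna = dna.lower()
    return (dna.count("g") + dna.count("c")) / len(dna)


def random_dna(length=None, probabilities=None, rng=None):
    """Return a random DNA string.

    length        -- number of bases; a random value from 1 to 100 if omitted
    probabilities -- weights for a, c, g, t; equal weights if omitted
    rng           -- optional random.Random instance, useful for repeatable runs
    """
    if rng is None:
        rng = random.Random()
    if length is None:
        length = rng.randint(1, 100)  # stdlib randint includes both end points
    if probabilities is None:
        probabilities = [0.25, 0.25, 0.25, 0.25]
    # choices() draws `length` bases according to the given weights
    return "".join(rng.choices(NUCLEOTIDES, weights=probabilities, k=length))


# Example: a GC-rich 20-base sequence and its GC fraction
example = random_dna(20, [0.1, 0.4, 0.4, 0.1], rng=random.Random(0))
print(example, gc_content(example))
```

Using `None` as the default means a new random length is drawn on every call. A default such as `random.randint(1, 100)` written in the signature is evaluated only once, when the function is defined, so every call without arguments would reuse the same length.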
