# Biocoding 2021 - Virtual
---
## Shared URLs
- Course website: [link](https://jasonjwilliamsny.github.io/biocoding-2021-dnalc/)
- Learn more markdown: [link](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet)
- Human genome: [link](https://www.ncbi.nlm.nih.gov/projects/genome/guide/human/index.shtml)
- SNPedia: [link](https://www.snpedia.com/index.php/SNPedia)
- Project Jupyter: [link](https://jupyter.org/)
- Interesting Jupyter notebooks: [link](https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks)
- Try Linux terminal: [link](https://cocalc.com/doc/terminal.html)
- mybinder.org: [link](https://mybinder.org/)
- Notebooks: [link](https://github.com/JasonJWilliamsNY/biocoding-2021-notebooks)
- Zoom link: [link](https://cshl-dnalc.zoom.us/j/5163675186)
- JupyterHub: [http://128.196.142.17:8000](http://128.196.142.17:8000)
- (Password = lastname.123)
- Setup page for Jupyter: [link](https://jasonjwilliamsny.github.io/biocoding-2021-dnalc/setup.html)
---
**Day One Sign In**
- Jason
- Ana
- Kiana
- Benji
- Hanyu
- Lexi
- Bryan
- Max
----
## Day one notes
- `pwd` = print working directory
- `ls` = list files and folders
- `mkdir name` = make a folder called `name`
- `cd` = change directory
- `head` = show the first 10 lines of a file
- With an assignment operator (`=`), everything on the right side is evaluated first, and the result is stored in the variable on the left side.
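
A small sketch of the assignment rule in Python. The variable name `mouse_count` is made up for illustration.

```python
# The right-hand side is evaluated first, then the result is stored on the left.
mouse_count = 4 + 6            # 4 + 6 is evaluated to 10, then stored in mouse_count
mouse_count = mouse_count + 1  # the old value (10) is read, 1 is added, 11 is stored
print(mouse_count)
```

The second line only makes sense because of this order: the old value is read before the new one replaces it.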
----
**Day Two Sign In**
- Ana
- Kiana
- Hanyu
- Lexi
- Bryan
- Max
----
## Day two notes
- A string can be empty (zero characters) and has no fixed maximum length.
- Every string must be surrounded by quotation marks (single or double).
- Python slices and `range()` exclude the end value: `alpha_id[0:3]` returns the characters at indexes 0, 1, and 2.
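
A quick sketch of these three points. The value of `alpha_id` here is a hypothetical 8-character ID, not the one from the notebook.

```python
alpha_id = "CBH19821"        # hypothetical ID: 3 initials followed by 5 digits
empty = ""                   # a valid string with zero characters

print(len(empty))            # length of the empty string
print(alpha_id[0:3])         # indexes 0, 1, 2 (index 3 is excluded)
print(alpha_id[3:])          # index 3 through the end of the string
print(list(range(0, 3)))     # range also stops before the end value
```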
*What variable names would you use to describe...*
- Average weight of a mouse group?
avWeight
weight
alpm, betm, gamm
- Number of mice in a group?
numMice
n
alpn, betn, gamn
*print the alpha_id character by character in reverse*
print(alpha_id[7],alpha_id[6],alpha_id[5],alpha_id[4],alpha_id[3],alpha_id[2],alpha_id[1],alpha_id[0])
print(alpha_id[7])
print(alpha_id[6])
print(alpha_id[5])
print(alpha_id[4])
print(alpha_id[3])
print(alpha_id[2])
print(alpha_id[1])
print(alpha_id[0])
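
The answers above type out every index by hand. As a possible shortcut, a slice with a step of -1 or a loop that counts down does the same job. The ID value below is a hypothetical stand-in.

```python
alpha_id = "CBH19821"   # hypothetical 8-character ID

# A slice with step -1 walks the string from the last character to the first.
reversed_id = alpha_id[::-1]
print(reversed_id)

# The same idea with a loop: start at the last index, stop before -1, step by -1.
for i in range(len(alpha_id) - 1, -1, -1):
    print(alpha_id[i])
```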
*Create new variables that contain the initials of the experimenter for each mouse group; print the value of these new variables*
*Create new variables that contain the ID of the experimenter for each mouse group; print the value of these new variables*
AlphaInitial = alpha_id[0] + alpha_id[1] + alpha_id[2]
print(AlphaInitial)
BetaInitial = beta_id[0] + beta_id[1] + beta_id[2]
print(BetaInitial)
GammaInitial = gamma_id[0] + gamma_id[1] + gamma_id[2]
print(GammaInitial)
initialAlpha = alpha_id[0:3]
initialBeta = beta_id[0:3]
initialGamma = gamma_id[0:3]
print(initialAlpha)
print(initialBeta)
print(initialGamma)
alpha_init = alpha_id[0:3]
print(alpha_init)
beta_init = beta_id[0:3]
print(beta_init)
gamma_init = gamma_id[0:3]
print(gamma_init)
alpha_id_initials = alpha_id[0:3]
beta_id_initials = beta_id[0:3]
gamma_id_initials = gamma_id[0:3]
print(alpha_id_initials, beta_id_initials, gamma_id_initials)
alpha_id_number = alpha_id[3:9]
beta_id_number = beta_id[3:9]
gamma_id_number = gamma_id[3:9]
print(alpha_id_number, beta_id_number, gamma_id_number)
alpha_initials = alpha_id[0:3]
beta_initials = beta_id[0:3]
gamma_initials = gamma_id[0:3]
print(alpha_initials, beta_initials, gamma_initials)
second code
IDAlpha = alpha_id[3:]
IDBeta = beta_id[3:]
IDGamma = gamma_id[3:]
print(IDAlpha)
print(IDBeta)
print(IDGamma)
**Challenge**
*Let's create a simple sequence in Python that will do the following*:
1. Have a variable which will hold the name of the sequence
2. Have a variable which will hold the sequence string
3. Print the name and sequence in proper fasta format
Name01 = ">sequence 001"
Sequence01 = "ATTCGAGGATCGATTTCGATCGATGCTTAGCTTTAGCTTTTTTAGATCTCCCA"
print(Name01 + "\n" + Sequence01)
sequence_name = 'dna1'
sequence_string = 'acgcagatcgctagagcatcggttc'
print('>' + sequence_name + '\n' + sequence_string)
dna = 'actgagcgcgagtcagcaactgatcgatacg'
sequence = ">sequence1"
print(sequence + "\n" + dna)
dnaseq1Name = 'sequence 001'
dnaseq1 = 'ATCGTACGTAGCATCGTTCGATCGATGAT'
print(">"+ dnaseq1Name + "\n" + dnaseq1)
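
One way to avoid retyping the `">" + name + "\n" + sequence` pattern is to wrap it in a helper. This is only a sketch: `format_fasta` and its `width` parameter are hypothetical names, and wrapping long sequences onto several lines is a common FASTA convention rather than something the challenge asked for.

```python
def format_fasta(name, sequence, width=60):
    """Return a FASTA record: a '>' header line, then the sequence split into lines."""
    lines = [">" + name]
    # Step through the sequence in chunks of `width` characters.
    for start in range(0, len(sequence), width):
        lines.append(sequence[start:start + width])
    return "\n".join(lines)

example_sequence = "ATTCGAGGATCGATTTCGATCGATGCTTAGCTTTAGCTTTTTTAGATCTCCCA"
record = format_fasta("sequence_001", example_sequence, width=20)
print(record)
```

Building the string with `+` or `join` (rather than commas inside `print`) keeps extra spaces out of the header and sequence lines.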
**Day Three Sign In**
- Ana
- Max
- Lexi
- Hanyu
- Kiana
- Bryan
----
## Day three notes
Python indexes start at 0, so the index `i` of a character is one less than its natural (1-based) position `n`: `n = i + 1`.
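
A minimal sketch of how this rule turns 1-based, inclusive coordinates (like gene positions) into a Python slice. The helper name `slice_one_based` and the toy sequence are hypothetical.

```python
def slice_one_based(sequence, start, end):
    """Return the bases from position `start` to `end`, both 1-based and inclusive."""
    # Subtract 1 from the start (index = position - 1). The end needs no change
    # because a Python slice already stops before the end index.
    return sequence[start - 1:end]

toy_genome = "acgtacgtac"   # hypothetical 10-base sequence
print(slice_one_based(toy_genome, 3, 6))
```

A handy check: the result should have `end - start + 1` characters.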
*HIV Genome parsing challenge notebook 2*
- Determine and print the length of the HIV genome
print(len(hiv_genome))
---
length_hiv_genome = len(hiv_genome)
print(length_hiv_genome)
- Create variables for and print the sequences for the following HIV genes
Gene coordinates are 1-based and inclusive, so each slice starts at the gene start minus 1.
# gag
gag = hiv_genome[789:2292]
print("gag"+"\n"+gag)
seq_gag = hiv_genome[789:2292]
print(seq_gag)
# pol
pol = hiv_genome[2084:5096]
print("pol"+"\n"+pol)
seq_pol = hiv_genome[2084:5096]
print(seq_pol)
# vif
vif = hiv_genome[5040:5619]
print("vif"+"\n"+vif)
seq_vif = hiv_genome[5040:5619]
print(seq_vif)
# vpr
vpr = hiv_genome[5558:5850]
print("vpr"+"\n"+vpr)
seq_vpr = hiv_genome[5558:5850]
print(seq_vpr)
# env
env = hiv_genome[6224:8795]
print("env"+"\n"+env)
seq_env = hiv_genome[6224:8795]
print(seq_env)
sequence1 = "gag sequence"
sequence2 = "pol sequence"
sequence3 = "vif sequence"
sequence4 = "vpr sequence"
sequence5 = "env sequence"
# gag
gag = hiv_genome[789:2292]
# pol
pol = hiv_genome[2084:5096]
# vif
vif = hiv_genome[5040:5619]
# vpr
vpr = hiv_genome[5558:5850]
# env
env = hiv_genome[6224:8795]
print(">"+sequence1+"\n"+gag)
print(">"+sequence2+"\n"+pol)
print(">"+sequence3+"\n"+vif)
print(">"+sequence4+"\n"+vpr)
print(">"+sequence5+"\n"+env)
- Generate a sum for each of the nucleotides (# of 'A', # of 'U', # of 'G', # of 'C')
RNA_hiv_genome = hiv_genome.replace('t','u')
print("A:", RNA_hiv_genome.count('a'))
print("U:", RNA_hiv_genome.count('u'))
print("G:", RNA_hiv_genome.count('g'))
print("C:", RNA_hiv_genome.count('c'))
print("A:",hiv_genome.count('a'))
print("U:",hiv_genome.replace('t','u').count('u'))
print("G:",hiv_genome.count('g'))
print("C:",hiv_genome.count('c'))
sumA = hiv_genome.count('a')
sumG = hiv_genome.count('g')
sumU = hiv_genome.count('t')
sumC = hiv_genome.count('c')
print(sumA)
print(sumG)
print(sumU)
print(sumC)
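
An alternative sketch that tallies all four bases in one pass using `collections.Counter` from the standard library. The short `toy_genome` string is a hypothetical stand-in for `hiv_genome`.

```python
from collections import Counter

toy_genome = "ggtctctctggttagaccagatctgagcc"   # hypothetical stand-in for hiv_genome
rna = toy_genome.replace("t", "u")             # transcribe: every t becomes u
counts = Counter(rna)                          # counts every character in one pass

for base in "augc":
    print(base.upper() + ":", counts[base])
```

The four counts should add up to the length of the sequence, which is an easy sanity check.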
- Calculate the GC content for each of the genes
gagGCcont = gag.count('g') + gag.count('c')
print("gag GC content:", gagGCcont)
polGCcont = pol.count('g') + pol.count('c')
print("pol GC content:", polGCcont)
vifGCcont = vif.count('g') + vif.count('c')
print("vif GC content:", vifGCcont)
vprGCcont = vpr.count('g') + vpr.count('c')
print("vpr GC content:", vprGCcont)
envGCcont = env.count('g') + env.count('c')
print("env GC content:", envGCcont)
GCgag = seq_gag.count('g') + seq_gag.count('c')
GCpol = seq_pol.count('g') + seq_pol.count('c')
GCvif = seq_vif.count('g') + seq_vif.count('c')
GCvpr = seq_vpr.count('g') + seq_vpr.count('c')
GCenv = seq_env.count('g') + seq_env.count('c')
print(GCgag)
print(GCpol)
print(GCvif)
print(GCvpr)
print(GCenv)
print("vpr GC:", vpr_gene.count('g') + vpr_gene.count('c'))
print("gag GC:", gag_gene.count('g') + gag_gene.count('c'))
print("pol GC:", pol_gene.count('g') + pol_gene.count('c'))
print("vif GC:", vif_gene.count('g') + vif_gene.count('c'))
print("env GC:", env_gene.count('g') + env_gene.count('c'))
gag_GC=gag.count('g')+gag.count('c')
print(gag_GC)
pol_GC=pol.count('g')+pol.count('c')
print(pol_GC)
vif_GC=vif.count('g')+vif.count('c')
print(vif_GC)
vpr_GC=vpr.count('g')+vpr.count('c')
print(vpr_GC)
env_GC=env.count('g')+env.count('c')
print(env_GC)
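
The answers above print the number of G and C bases. GC content is usually reported as a fraction or percentage of the sequence length, so a possible next step is to divide by `len()`. The gene names and sequences below are hypothetical toy data, not the real HIV genes.

```python
# Hypothetical toy sequences standing in for the sliced genes.
toy_genes = {"geneA": "ggccaatt", "geneB": "atatatgc"}

gc_percent = {}
for name, seq in toy_genes.items():
    gc_count = seq.count("g") + seq.count("c")      # number of g and c bases
    gc_percent[name] = 100 * gc_count / len(seq)    # share of the whole sequence
    print(name, "GC content:", gc_percent[name], "%")
```

Dividing by the length makes genes of different sizes comparable, which raw counts are not.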
*Print the list of these HIV genes in order given the list below*
- The correct order is: gag, pol, vif, vpr, vpu, env, nef
print(hiv_gene_names[1], hiv_gene_names[3], hiv_gene_names[2], hiv_gene_names[4], hiv_gene_names[5], hiv_gene_names[0], hiv_gene_names[6])
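
A sketch of the same reordering with a list of indexes and a loop. The starting order of `hiv_gene_names` is not shown in these notes; the list below is inferred from the indexes used in the answer above, so treat it as an assumption.

```python
# Assumed starting order, inferred from the indexes in the answer above.
hiv_gene_names = ["env", "gag", "vif", "pol", "vpr", "vpu", "nef"]
correct_order = [1, 3, 2, 4, 5, 0, 6]   # index of each gene in the order we want

ordered_genes = []
for i in correct_order:
    ordered_genes.append(hiv_gene_names[i])

print(", ".join(ordered_genes))
```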
**Day Four Sign In**
- Ana
- Kiana
- Max
- Lexi
- Bryan
- Hanyu
----
## Day four notes
### HIV Simulation
1. Write a simulation which determines if in one round of replication HIV will mutate or not
from numpy import random
HIVMutState = ["mutation", "no_mutation"]
HIVMutStateProb = [0.000044, 0.999956]
HIVMutation = random.choice(HIVMutState, p = HIVMutStateProb)
if HIVMutation == "mutation":
    print("One round of HIV will mutate.")
else:
    print("One round of HIV will not mutate.")
---
mutation_states = ['a Mutation', 'no Mutation']
mutation_probability = [.37,.63]
coin_flip = random.choice(mutation_states,p = mutation_probability)
print("There is %s" %coin_flip)
---
mutation_state = ['mutation','no mutation']
from numpy import random
mutation_state_probabilities = [0.044,0.956]
mutation_flip = random.choice(mutation_state,p = mutation_state_probabilities)
print("%s" %mutation_flip)
---
mutation_state = ['mutation', 'no_mutation']
mutation_probability = [0.044, 0.956]
mutation_output = random.choice(mutation_state, p = mutation_probability)
print(mutation_output)
---
```
mutation_state = ['mutation', 'no mutation']
mutation_probability = [0.000044, 0.999956]
mutation_occurs = random.choice(mutation_state, p = mutation_probability)
print("In 1 round of HIV replication, there is:", mutation_occurs)
```
2. Determine how often HIV would mutate in 20 rounds of replication
---
mutation_results = []
for simulation in range(1, 21):
    mutation = random.choice(mutation_state, p = mutation_probability)
    mutation_results.append(mutation)
print(mutation_results)
---
from numpy import random
mutation_state = ['mutation','no mutation']
mutation_state_probabilities = [0.044,0.956]
for flip in range(1,21):
    mutation_flip = random.choice(mutation_state,p = mutation_state_probabilities)
    print(mutation_flip)
---
from numpy import random
HIVMutState = ["mutation", "no_mutation"]
HIVMutStateProb = [0.000044, 0.999956]
MutResults = []
for mutation in range(1,21):
    HIVMutation = random.choice(HIVMutState, p = HIVMutStateProb)
    MutResults.append(HIVMutation)
print(MutResults)
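
The answers above print the list of 20 outcomes. To answer "how often", one option is to count the `"mutation"` entries. This sketch uses the standard library `random` module instead of `numpy` so it runs without extra installs, and it borrows one of the probability pairs from the answers above.

```python
import random

random.seed(1)   # fixed seed so the run is repeatable; remove it for a new result each time

mutation_state = ["mutation", "no_mutation"]
mutation_probability = [0.044, 0.956]   # one of the probability pairs used above
rounds = 20

# random.choices draws `rounds` outcomes using the given weights.
results = random.choices(mutation_state, weights=mutation_probability, k=rounds)
mutation_count = results.count("mutation")
print(mutation_count, "mutation(s) in", rounds, "rounds")
```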
---
mutated_hiv_genome = ''
nucleotideOutput = ['a','c','g','t']
mutation_state = ['mutation', 'no_mutation']
mutation_probability = [0.044, 0.956]
for i in range(0, len(hiv_genome)):
    mutated_not_mutated = random.choice(mutation_state, p = mutation_probability)
    if mutated_not_mutated == 'no_mutation':
        mutated_hiv_genome += hiv_genome[i]
    else:
        if hiv_genome[i] == 'a':
            nucleotideProb = [0, 1/33, 29/33, 3/33]
            mutatedNucleotide = random.choice(nucleotideOutput, p = nucleotideProb)
            mutated_hiv_genome += mutatedNucleotide
        elif hiv_genome[i] == 'c':
            nucleotideProb = [14/95, 0, 0, 81/95]
            mutatedNucleotide = random.choice(nucleotideOutput, p = nucleotideProb)
            mutated_hiv_genome += mutatedNucleotide
        elif hiv_genome[i] == 'g':
            nucleotideProb = [146/152, 2/152, 0, 4/152]
            mutatedNucleotide = random.choice(nucleotideOutput, p = nucleotideProb)
            mutated_hiv_genome += mutatedNucleotide
        else:
            nucleotideProb = [20/44, 18/44, 6/44, 0]
            mutatedNucleotide = random.choice(nucleotideOutput, p = nucleotideProb)
            mutated_hiv_genome += mutatedNucleotide
print(mutated_hiv_genome)
---
from numpy import random
find_nucleotide = random.randint(len(hiv_genome))
nucleotide = hiv_genome[find_nucleotide]
print('nucleotide number:', find_nucleotide)
print('nucleotide:', nucleotide)
hiv_mutation_states = ['Transition','Transversion']
hiv_mutation_probabilities = [.85, .15]
hiv_mutation_flip = random.choice(hiv_mutation_states, p = hiv_mutation_probabilities)
if hiv_mutation_flip == 'Transition':
    if nucleotide == 'a':
        print('new mutation:', 'g')
    elif nucleotide == 'c':
        print('new mutation:', 't')
    elif nucleotide == 'g':
        print('new mutation:', 'a')
    elif nucleotide == 't':
        print('new mutation:', 'c')
elif hiv_mutation_flip == 'Transversion':
    # Note: 'c' or 't' always evaluates to 'c', so pick one of the two options at random instead
    if nucleotide == 'a':
        print('new mutation:', random.choice(['c', 't']))
    elif nucleotide == 'c':
        print('new mutation:', random.choice(['a', 'g']))
    elif nucleotide == 'g':
        print('new mutation:', random.choice(['c', 't']))
    elif nucleotide == 't':
        print('new mutation:', random.choice(['a', 'g']))
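
The long `if`/`elif` chain can also be written with two lookup dictionaries. This is a sketch, not the notebook's solution: `mutate_base`, `TRANSITIONS`, and `TRANSVERSIONS` are hypothetical names, the two transversion targets are assumed to be equally likely, and the standard library `random` module is used in place of `numpy`.

```python
import random

# Transition: purine <-> purine (a, g) or pyrimidine <-> pyrimidine (c, t).
TRANSITIONS = {"a": "g", "g": "a", "c": "t", "t": "c"}
# Transversion: purine <-> pyrimidine, so each base has two possible targets.
TRANSVERSIONS = {"a": "ct", "g": "ct", "c": "ag", "t": "ag"}

def mutate_base(base, transition_probability=0.85):
    """Return a mutated base: a transition with the given probability, else a transversion."""
    if random.random() < transition_probability:
        return TRANSITIONS[base]
    # Assumption: the two transversion targets are equally likely.
    return random.choice(TRANSVERSIONS[base])

for base in "acgt":
    print(base, "->", mutate_base(base))
```

Note that `'c' or 't'` in Python is not a random pick: it always evaluates to `'c'`, which is why `random.choice` is used here.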
*Challenge: Your dictionary should contain the key:value pair 'beta_id':'SJW99399'. Using only that value from the my_mouse_exp dictionary, create a new entry in my_mouse_exp , experimenter which has the value SJW extracted from the 'beta_id':'SJW99399' dictionary entry.*
my_mouse_exp['experimenter'] = my_mouse_exp['beta_id'][0:3]
print(my_mouse_exp)
print (my_mouse_exp['experimenter'])
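
A self-contained version of the challenge for reference. Only the `'beta_id'` entry comes from the challenge text; the extra `'id_number'` key is a hypothetical addition to show that the rest of the ID can be pulled out the same way.

```python
my_mouse_exp = {"beta_id": "SJW99399"}   # only the entry named in the challenge

# Look up the value by key, then slice it like any other string.
my_mouse_exp["experimenter"] = my_mouse_exp["beta_id"][0:3]
my_mouse_exp["id_number"] = my_mouse_exp["beta_id"][3:]   # hypothetical extra key

print(my_mouse_exp)
```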
## RNA to Protein
----
**Day Five Sign In**
- Max
- Lexi
- Bryan
- Kiana
- Ana
- Hanyu
----
## Day Five notes
*Write a function that calculates the GC content of a DNA string*
dna = 'gatgcattatcgtgagc'
def GCcontent(dna):
    gc_count = 0
    for i in range(0, len(dna)):
        if dna[i] == 'c' or dna[i] == 'g':
            gc_count += 1
    print(gc_count / len(dna))
GCcontent(dna)
---
dna = 'acttgtaccttgagattcag'
def gc_content():
    bob = (dna.count('c') + dna.count('g'))/len(dna)
    print(bob)
gc_content()
---
dna = "tgcatcgatcatcgatcgtagctagctagctactacg"
def GCcontent():
    gc_percent = ((dna.count('g')+dna.count('c'))/len(dna))*100
    print("The GC content is", gc_percent, "%")
GCcontent()
*Write a function that generates a random string of DNA of random length*
```
def generate_random_dna():
    from numpy import random
    nucleotides = ['a', 't', 'c', 'g']
    probabilities = [0.25, 0.25, 0.25, 0.25]
    dna_length = random.randint(1,100)
    initial_length = 0
    dna = ''
    while initial_length < dna_length:
        dna = dna + random.choice(nucleotides, p = probabilities)
        initial_length = initial_length + 1
    print(dna)
generate_random_dna()
```
*Write a function that generates a random string of DNA of a random length: use optional parameters to set the length of the string and the probabilities of the nucleotides.*
def generateDNA(dna_length = None, nucleotideprob = None):
    from numpy import random
    nucleotides = ('a','c','g','t')
    if dna_length is None:
        dna_length = random.randint(1, 100)
    if nucleotideprob is None:
        nucleotideprob = [.25, .25, .25, .25]
    initial_length = 0
    dna = ''
    while initial_length < dna_length:
        dna = dna + random.choice(nucleotides, p=nucleotideprob)
        initial_length = initial_length + 1
    print(dna)
generateDNA(98,[.3,.5,.1,.1])
---
```
from numpy import random
def generate_random_dna(dna_length = None, probabilities = [.25, .25, .25, .25]):
    nucleotides = ['a', 't', 'c', 'g']
    if dna_length is None:
        dna_length = random.randint(1,100)
    initial_length = 0
    dna = ''
    while initial_length < dna_length:
        dna = dna + random.choice(nucleotides, p = probabilities)
        initial_length = initial_length + 1
    print(dna)
generate_random_dna(80, [.9, .03, .03, .04])
```
from numpy import random
def randomDNA(DNAlength = None, nucProb = [.25, .25, .25, .25]):
    if DNAlength is None:
        DNAlength = random.randint(0, 41)
    nucleotide = ['a','c','g','t']
    newDNA = ''
    for i in range(0, DNAlength):
        newDNA += random.choice(nucleotide, p = nucProb)
    print(newDNA)
randomDNA(10, [.1, .2, .5, .2])
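
One detail worth knowing about optional parameters: Python evaluates a default value once, when the `def` line runs, not on every call. A default like `random.randint(1, 100)` would therefore give the same "random" length every time. A common workaround, sketched below, is to default to `None` and pick the value inside the function. This sketch uses the standard library `random` module rather than `numpy`, and it returns the string instead of printing it so the result can be reused.

```python
import random

def generate_random_dna(dna_length=None, probabilities=None):
    """Return a random DNA string; both parameters are optional."""
    nucleotides = ["a", "t", "c", "g"]
    if dna_length is None:
        # Chosen inside the function, so every call gets a fresh random length.
        dna_length = random.randint(1, 100)   # the stdlib randint includes both ends
    if probabilities is None:
        probabilities = [0.25, 0.25, 0.25, 0.25]
    # random.choices draws dna_length bases using the given weights.
    return "".join(random.choices(nucleotides, weights=probabilities, k=dna_length))

print(generate_random_dna())                          # random length, equal probabilities
print(generate_random_dna(10, [0.1, 0.2, 0.5, 0.2]))  # fixed length, custom probabilities
```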