# Welcome to Biocoding 2024!
## Table of Contents
[TOC]
Add notes to the HackMD during the class so we can collaborate :)
## Learning resources
- Genomics data carpentry: https://datacarpentry.org/lessons/#genomics-workshop
**General Coding**
- CodeAcademy: [link](https://www.codecademy.com/)
- Hour of code (also in languages other than English): [link](https://code.org/learn)
**Bioinformatics**
- Learn bioinformatics in 100 hours: [link](https://www.biostarhandbook.com/edu/course/1/)
- Rosalind bioinformatics: [link](http://rosalind.info/about/)
- Bioinformatics coursera: [link](https://www.coursera.org/learn/bioinformatics)
- Bioinformatics careers: [link](https://www.iscb.org/bioinformatics-resources-for-high-schools/careers-in-bioinformatics)
**Help**
- General software help: [link](https://stackoverflow.com/)
- Bioinformatics-specific software help: [link](https://www.biostars.org/)
## Setting up your first use of Jupyter Notebooks
Go to http://149.165.154.101:8000/
Sign in using these credentials, replacing `<your last name>` with your actual last name. For Dr. F, it would be `feitzinger` and `feitzinger.123`:
Username: `<your last name>`
Password: `<your last name>.123`
The JupyterHub server runs Ubuntu, the operating system we use throughout the class.
## Github
In the class, we will refer to pre-made ["Jupyter notebooks"](https://en.wikipedia.org/wiki/Project_Jupyter). These will be downloaded using [git] from the [biocoding notebooks] link. Instructions to download using [git] are provided below.
In Jupyter, click `New` -> `Terminal`. In the terminal, type the command shown below:
`git clone https://github.com/MasayukiNagai/BioCoding2024.git`
Press `↵ Enter` on your keyboard to run the command and wait for it to finish. Go back to the Jupyter home and you will see the lessons for the rest of the week.
## Commands
- _pwd_ : print working directory
- _touch_ : create an empty file (or update the timestamp of an existing one)
- _grep_ : look for lines in a file with a pattern/regex
- _cd_ : change directory
- _chmod_ : change permissions on a file or directory
- _mkdir_ : create a new directory
- _wget_ : download a file from the internet
- _vim_: text editor for plain texts and programs (emacs is better (jk neovim is better))
- _cut_: remove sections from each line of files
## Cool Commands
- `grep "HOX" dmel_human_orthologs_disease_fb_2022_03.tsv | cut -f1-6`: greps for "HOX" genes and then formats it with cut
- (DO NOT DO THIS LOL) `sudo rm -rf --no-preserve-root /`: deletes every single file starting from the root directory and working recursively, without giving any warnings or errors.
- `tint`: play tetris in the terminal!
- `porechop`: trims adapters from Oxford Nanopore sequencing reads
## DNA Barcoding 101 DNA extraction
1. <redacted because this is a public hackpad lol />
|Left |Center|Right |
|------|-----|------|
| meow | nya | meow |
[biocoding notebooks]: https://github.com/AnnaFeitzinger/BioCoding2022
[git]: https://en.wikipedia.org/wiki/Git
## our work:
### math concentration
```python=
initial_volume = (final_concentration*final_volume)/initial_concentration
```
```python=
# Dilution formula (C1 * V1 = C2 * V2), solved for the starting volume V1:
# initial_volume = (final_concentration * final_volume) / initial_concentration
# Molar
initial_concentration = 5
final_concentration = 2
# Liter
final_volume = 1.5
### Solve for starting volume using variables ###
initial_volume = (final_concentration * final_volume) / initial_concentration
# Print the answer
print("You need", initial_volume, "liters of NaCl")
#Diya
# Molar
initial_concentration = float(input("enter initial concentration"))
final_concentration = float(input("enter final concentration"))
# Liter
final_volume = 1.5
### Solve for starting volume using variables ###
initial_volume = (final_volume*final_concentration)/initial_concentration
# Print the answer
print("You need", initial_volume, "liters of NaCl")
#also diya
```
```python=
### Assign values to the given variables ###
a = 5
b = 1.5
c = 2
# Molar
initial_concentration = a
final_concentration = c
# Liter
final_volume = b
### Solve for starting volume using variables ###
initial_volume = (b*c)/a
# Print the answer
print(f"You need {initial_volume} liters of NaCl")
```
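All of the versions above rearrange the dilution equation C1 * V1 = C2 * V2. Here is a minimal sketch that wraps it in a reusable function. The name `starting_volume` and the two input checks are illustrative assumptions, not part of the notebook:

```python
def starting_volume(initial_concentration, final_concentration, final_volume):
    """Return the stock volume V1 such that C1 * V1 = C2 * V2."""
    if initial_concentration <= 0:
        raise ValueError("initial concentration must be positive")
    if final_concentration > initial_concentration:
        raise ValueError("cannot dilute to a higher concentration")
    return (final_concentration * final_volume) / initial_concentration

# Same numbers as above: 5 M stock, 2 M target, 1.5 L final volume
volume = starting_volume(5, 2, 1.5)
print(f"You need {volume} liters of NaCl stock")
```

Putting the formula in one function means the input checks only have to be written once, whichever way the numbers are entered.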
https://www.w3schools.com/python/python_ref_string.asp
### playing with strings :D
```python=
# ---------------- replace() ----------------
# Alfred:
my_string = "ha "
x = my_string.replace("ha", "la")
print(x * 10000)
# -------------------------------------------
```
```python=
my_string = "Hello World"
x = my_string.swapcase()
print(x)
# returns "hELLO wORLD"
```
```python=
# ---------
my_string = "HELLO WORLD"
x = my_string.lower()
print(x)
# returns "hello world"
# ---------
```
```python=
text = "hello world, hello world, hello world, hello world, hello world"
x = text.count("hello")
print(x)
# returns 5
my_string = "HELLO FROM the OTHER SIDE"
x = my_string.lower()
print(x)
# returns "hello from the other side"
#Diya
#adding a sentence
text = ("Hello my name is Diya")
#defining a variable for the index value
index_value = text.rfind("Diya")
print(index_value)
#diya again
input_text = input("Enter sentence here: ")
word = input("Enter word you want to index: ")
index_value_text = input_text.rfind(word)
print(index_value_text)
```
```python=
# Maya:
my_string = "MAYA"
x = my_string.lower()
print(x)
# returns "maya"
# ----------------------------------
# Elona:
first_string = "frog"
upper_string = first_string.upper()
print(upper_string)
# returns "FROG"
```
### fasta parsing :D
```python=
seq_1_name = "sequence 001"
seq_1_string = "ATTCGAGGATCGATTTCGATCGATTTAGCTTTAGCTTTTTTAGATCTCCCA"
print(seq_1_name)
print(seq_1_string)
print(seq_1_name + seq_1_string)
```
```python=
fasta_seq = {
    "id": "sequence 001",
    "sequence": "ATTCGAGGATCGATTTCGATCGATGCTTAGCTTTAGCTTTTTTAGATCTCCCA"
}
print(fasta_seq)
```
```python=
### Write your code here ###
fasta_sequence = """
>sequence 001
ATTCGAGGATCGATTTCGATCGATGCTTAGCTTTAGCTTTTTTAGATCTCCCA
>sequence 002
AAGCTGACGGGGAGCTAGTCTTAGTCGTACGTTCGAT
"""
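from io import StringIO
from Bio import SeqIO
# SeqIO.parse needs a file name or file handle, so wrap the string in StringIO;
# strip() removes the blank line before the first ">" header
fasta_sequences = SeqIO.parse(StringIO(fasta_sequence.strip()), 'fasta')
for fasta in fasta_sequences:
    print("fasta id {}".format(fasta.id))
    print("fasta seq {}".format(str(fasta.seq)))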
```
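FASTA is simple enough to parse by hand, which shows what a parser has to do: lines starting with `>` are headers, and everything else belongs to the most recent header. This is a rough sketch in plain Python. The function name `parse_fasta` is made up for illustration, and the sketch does not replace Biopython:

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {header: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading ">"
            records[header] = ""
        elif header is not None:
            records[header] += line  # a sequence may span several lines
    return records

fasta_text = """
>sequence 001
ATTCGAGGATCGATTTCGATCGATGCTTAGCTTTAGCTTTTTTAGATCTCCCA
>sequence 002
AAGCTGACGGGGAGCTAGTCTTAGTCGTACGTTCGAT
"""
records = parse_fasta(fasta_text)
for name, seq in records.items():
    print(name, len(seq))
```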
### dna => rna transcription
```python=
DNA = 'ATGAATCGT'
RNA = DNA.replace('T', 'U')
mutated_RNA = RNA[:4] + 'G' + RNA[5:]
print(mutated_RNA)
```
```python=
DNA = 'ATGAATCGT'
RNA = DNA.replace('T', 'U')
mutated_RNA = RNA.replace('G', 'U')
print(mutated_RNA)
```
```python=
DNA = 'ATGAATCGT'
RNA = DNA.replace('T', 'U')
# mutate the RNA (not the DNA) at index 2
mutated_RNA = RNA[:2] + 'U' + RNA[3:]
print(mutated_RNA)
```
```python=
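# -----------------
DNA = 'ATGAATCGT'
RNA = DNA.replace('T','U')
# replace('RNA[2]', 'U') would look for the literal text "RNA[2]" and change nothing,
# so rebuild the string around index 2 instead
mutated_RNA = RNA[:2] + 'U' + RNA[3:]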
print(mutated_RNA)
```
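Python strings cannot be changed in place, and `replace()` changes every matching letter, so a single-position mutation is usually built with slicing. A small sketch follows; the helper name `mutate` is hypothetical:

```python
def mutate(sequence, position, new_base):
    """Return a copy of sequence with the base at position (0-based) replaced."""
    if not 0 <= position < len(sequence):
        raise IndexError("position is outside the sequence")
    return sequence[:position] + new_base + sequence[position + 1:]

DNA = 'ATGAATCGT'
RNA = DNA.replace('T', 'U')        # transcription: every T becomes U
mutated_RNA = mutate(RNA, 4, 'G')  # change only the base at index 4
print(mutated_RNA)
```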
### final hiv logic
```python=
# ethan :33333
# hiv_genome is the lowercase genome string loaded earlier in the notebook
gag = hiv_genome[789:2293]
pol = hiv_genome[2084:5097]
vif = hiv_genome[5040:5620]
vpr = hiv_genome[5558:5851]
env = hiv_genome[6044:8796]

def transcribe(dna: str):
    return dna.replace('t','u')

gag_rna = transcribe(gag)
pol_rna = transcribe(pol)
vif_rna = transcribe(vif)
vpr_rna = transcribe(vpr)
env_rna = transcribe(env)

def counter(rna: str):
    # return a dictionary of base counts plus the overall length
    return {
        "a": rna.count("a"),
        "u": rna.count("u"),
        "g": rna.count("g"),
        "c": rna.count("c"),
        "overall": len(rna),
    }

# count on the transcribed RNA so that "u" is actually present
gag_count = counter(gag_rna)
pol_count = counter(pol_rna)
vif_count = counter(vif_rna)
vpr_count = counter(vpr_rna)
env_count = counter(env_rna)

def count_gc(rna_count):
    return (rna_count["g"] + rna_count["c"])/rna_count["overall"]

gag_gc = count_gc(gag_count)
pol_gc = count_gc(pol_count)
vif_gc = count_gc(vif_count)
vpr_gc = count_gc(vpr_count)
env_gc = count_gc(env_count)
```
```python=
#1
# print() returns None, so store the length first and print it afterwards
length_hiv_genome = len(hiv_genome)
print(length_hiv_genome)
#2
gag_gene = hiv_genome[789:2292]
pol_gene = hiv_genome[2084:5096]
vif_gene = hiv_genome[5040:5617]
vpr_gene = hiv_genome[5558:5970]
env_gene = hiv_genome[6224:8795]
print("Gag gene:")
print(gag_gene)
print("Pol gene:")
print(pol_gene)
print("Vif gene:")
print(vif_gene)
print("Vpr gene:")
print(vpr_gene)
print("Env gene:")
print(env_gene)
#3
RNA_gag_gene = gag_gene.replace('t','u')
RNA_pol_gene = pol_gene.replace('t','u')
RNA_vif_gene = vif_gene.replace('t','u')
RNA_vpr_gene = vpr_gene.replace('t','u')
RNA_env_gene = env_gene.replace('t','u')
print("Gag gene:")
print(RNA_gag_gene)
print("Pol gene:")
print(RNA_pol_gene)
print("Vif gene:")
print(RNA_vif_gene)
print("Vpr gene:")
print(RNA_vpr_gene)
print("Env gene:")
print(RNA_env_gene)
#4
gag_gene_A_count= RNA_gag_gene.count("a")
gag_gene_U_count= RNA_gag_gene.count("u")
gag_gene_C_count= RNA_gag_gene.count("c")
gag_gene_G_count= RNA_gag_gene.count("g")
pol_gene_A_count= RNA_pol_gene.count("a")
pol_gene_U_count= RNA_pol_gene.count("u")
pol_gene_C_count= RNA_pol_gene.count("c")
pol_gene_G_count= RNA_pol_gene.count("g")
vif_gene_A_count= RNA_vif_gene.count("a")
vif_gene_U_count= RNA_vif_gene.count("u")
vif_gene_C_count= RNA_vif_gene.count("c")
vif_gene_G_count= RNA_vif_gene.count("g")
vpr_gene_A_count= RNA_vpr_gene.count("a")
vpr_gene_U_count= RNA_vpr_gene.count("u")
vpr_gene_C_count= RNA_vpr_gene.count("c")
vpr_gene_G_count= RNA_vpr_gene.count("g")
env_gene_A_count= RNA_env_gene.count("a")
env_gene_U_count= RNA_env_gene.count("u")
env_gene_C_count= RNA_env_gene.count("c")
env_gene_G_count= RNA_env_gene.count("g")
print("Gag gene A,U,C,G counts:")
print(gag_gene_A_count)
print(gag_gene_U_count)
print(gag_gene_C_count)
print(gag_gene_G_count)
print("Pol gene A,U,C,G counts:")
print(pol_gene_A_count)
print(pol_gene_U_count)
print(pol_gene_C_count)
print(pol_gene_G_count)
print("Vif gene A,U,C,G counts:")
print(vif_gene_A_count)
print(vif_gene_U_count)
print(vif_gene_C_count)
print(vif_gene_G_count)
print("Vpr gene A,U,C,G counts:")
print(vpr_gene_A_count)
print(vpr_gene_U_count)
print(vpr_gene_C_count)
print(vpr_gene_G_count)
print("Env gene A,U,C,G counts:")
print(env_gene_A_count)
print(env_gene_U_count)
print(env_gene_C_count)
print(env_gene_G_count)
#5
Gag_GC= (gag_gene_G_count + gag_gene_C_count)/(gag_gene_G_count + gag_gene_C_count + gag_gene_A_count + gag_gene_U_count)*(100)
Pol_GC= (pol_gene_G_count + pol_gene_C_count)/(pol_gene_G_count + pol_gene_C_count + pol_gene_A_count + pol_gene_U_count)*(100)
Vif_GC= (vif_gene_G_count + vif_gene_C_count)/(vif_gene_G_count + vif_gene_C_count + vif_gene_A_count + vif_gene_U_count)*(100)
Vpr_GC= (vpr_gene_G_count + vpr_gene_C_count)/(vpr_gene_G_count + vpr_gene_C_count + vpr_gene_A_count + vpr_gene_U_count)*(100)
Env_GC= (env_gene_G_count + env_gene_C_count)/(env_gene_G_count + env_gene_C_count + env_gene_A_count + env_gene_U_count)*(100)
print("Gag GC% content is", Gag_GC, "%")
print("Pol GC% content is", Pol_GC, "%")
print("Vif GC% content is", Vif_GC, "%")
print("Vpr GC% content is", Vpr_GC, "%")
print("Env GC% content is", Env_GC, "%")
```
```python=
#matilda (i accidentally did the DNA instead of RNA for the later parts)
gag = hiv_genome[789:2292]
pol = hiv_genome[2084:5096]
vif = hiv_genome[5040:5619]
vpr = hiv_genome[5558:5850]
env = hiv_genome[6044:8795]
# ---------------------
gag_RNA = gag.replace('t','u')
pol_RNA = pol.replace('t','u')
vif_RNA = vif.replace('t','u')
vpr_RNA = vpr.replace('t','u')
env_RNA = env.replace('t','u')
# ---------------------
gag_num_As = gag.count('a')
print(gag_num_As)
gag_num_Us = gag.count('t')
print(gag_num_Us)
gag_num_Gs = gag.count('g')
print(gag_num_Gs)
gag_num_Cs = gag.count('c')
print(gag_num_Cs)
pol_num_As = pol.count('a')
print(pol_num_As)
pol_num_Us = pol.count('t')
print(pol_num_Us)
pol_num_Gs = pol.count('g')
print(pol_num_Gs)
pol_num_Cs = pol.count('c')
print(pol_num_Cs)
vif_num_As = vif.count('a')
print(vif_num_As)
vif_num_Us = vif.count('t')
print(vif_num_Us)
vif_num_Gs = vif.count('g')
print(vif_num_Gs)
vif_num_Cs = vif.count('c')
print(vif_num_Cs)
vpr_num_As = vpr.count('a')
print(vpr_num_As)
vpr_num_Us = vpr.count('t')
print(vpr_num_Us)
vpr_num_Gs = vpr.count('g')
print(vpr_num_Gs)
vpr_num_Cs = vpr.count('c')
print(vpr_num_Cs)
env_num_As = env.count('a')
print(env_num_As)
env_num_Us = env.count('t')
print(env_num_Us)
env_num_Gs = env.count('g')
print(env_num_Gs)
env_num_Cs = env.count('c')
print(env_num_Cs)
# -------------------
# parentheses are needed so the G and C counts are added before dividing
gag_GC_content = (gag_num_Gs + gag_num_Cs) / len(gag)
print(gag_GC_content)
pol_GC_content = (pol_num_Gs + pol_num_Cs) / len(pol)
print(pol_GC_content)
vif_GC_content = (vif_num_Gs + vif_num_Cs) / len(vif)
print(vif_GC_content)
vpr_GC_content = (vpr_num_Gs + vpr_num_Cs) / len(vpr)
print(vpr_GC_content)
env_GC_content = (env_num_Gs + env_num_Cs) / len(env)
print(env_GC_content)
```
```python=
# -------------------------------------
gag = hiv_genome[789:2292]
pol = hiv_genome[2084:5096]
vif = hiv_genome[5040:5619]
vpr = hiv_genome[5558:5850]
env = hiv_genome[6044:8795]
print(gag)
print(pol)
print(vif)
print(vpr)
print(env)
print(gag.replace("t", "u"))
print(pol.replace("t", "u"))
print(vif.replace("t", "u"))
print(vpr.replace("t", "u"))
print(env.replace("t", "u"))
gagr = (gag.replace("t", "u"))
polr = (pol.replace("t", "u"))
vifr = (vif.replace("t", "u"))
vprr = (vpr.replace("t", "u"))
envr = (env.replace("t", "u"))
print("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~")
# print() with commas accepts numbers; "text" + number raises a TypeError
gag_a = (gagr.count('a'))
print("a's", gag_a)
gag_u = (gagr.count('u'))
print("u's", gag_u)
gag_g = (gagr.count('g'))
print("g's", gag_g)
gag_c = (gagr.count('c'))
print("c's", gag_c)
print("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~")
pol_a = (polr.count('a'))
print("a's", pol_a)
pol_u = (polr.count('u'))
print("u's", pol_u)
pol_g = (polr.count('g'))
print("g's", pol_g)
pol_c = (polr.count('c'))
print("c's", pol_c)
print("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~")
vif_a = (vifr.count('a'))
print("a's", vif_a)
vif_u = (vifr.count('u'))
print("u's", vif_u)
vif_g = (vifr.count('g'))
print("g's", vif_g)
vif_c = (vifr.count('c'))
print("c's", vif_c)
print("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~")
vpr_a = (vprr.count('a'))
print("a's", vpr_a)
vpr_u = (vprr.count('u'))
print("u's", vpr_u)
vpr_g = (vprr.count('g'))
print("g's", vpr_g)
vpr_c = (vprr.count('c'))
print("c's", vpr_c)
print("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~")
env_a = (envr.count('a'))
print("a's", env_a)
env_u = (envr.count('u'))
print("u's", env_u)
env_g = (envr.count('g'))
print("g's", env_g)
env_c = (envr.count('c'))
print("c's", env_c)
print("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~")
# multiply by 100 so the printed value really is a percentage
gag_gc = (gag_c + gag_g) / len(gag) * 100
print(f"GC content is: {gag_gc}%")
pol_gc = (pol_c + pol_g) / len(pol) * 100
print(f"GC content is: {pol_gc}%")
vif_gc = (vif_c + vif_g) / len(vif) * 100
print(f"GC content is: {vif_gc}%")
vpr_gc = (vpr_c + vpr_g) / len(vpr) * 100
print(f"GC content is: {vpr_gc}%")
env_gc = (env_c + env_g) / len(env) * 100
print(f"GC content is: {env_gc}%")
```
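The solutions above repeat the same slice, transcribe, count, and GC steps for each of the five genes. One way to cut the repetition is to keep the coordinates in a dictionary and loop over it. This sketch uses a made-up `toy_genome` and made-up coordinates as stand-ins; the real `hiv_genome` and gene positions come from the notebook:

```python
def gc_percent(sequence):
    """Return the GC content of a lowercase DNA or RNA string as a percentage."""
    if not sequence:
        return 0.0
    return (sequence.count("g") + sequence.count("c")) / len(sequence) * 100

# Toy stand-in for hiv_genome (assumption, not real data)
toy_genome = "atgcgtacgttagcatgccgtaatgcatgc"
# Hypothetical (start, end) slice coordinates for two pretend genes
gene_coordinates = {"geneA": (0, 12), "geneB": (10, 30)}

results = {}
for name, (start, end) in gene_coordinates.items():
    dna = toy_genome[start:end]
    rna = dna.replace("t", "u")
    results[name] = {
        "rna": rna,
        "counts": {base: rna.count(base) for base in "augc"},
        "gc_percent": gc_percent(rna),
    }
    print(name, results[name]["counts"], f"{results[name]['gc_percent']:.1f}%")
```

Adding a sixth gene then only takes one more dictionary entry.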
### coin flip
```python=
from numpy import random

# ranf() returns a random float in the half-open interval [0.0, 1.0)
coin = random.ranf()
if coin > 0.5:
    print('heads')
else:
    print('tails')
```
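A single flip does not say much, so here is a sketch that repeats the flip many times and counts the results. It uses Python's built-in `random` module rather than NumPy, and the seed value and number of flips are arbitrary choices for illustration:

```python
import random

random.seed(42)  # fixed seed so the run is repeatable; any integer works

n_flips = 1000
heads = 0
for _ in range(n_flips):
    # random.random() returns a float in [0.0, 1.0)
    if random.random() > 0.5:
        heads += 1
tails = n_flips - heads
print(f"heads: {heads}, tails: {tails}")
```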
### hiv mutations :D
#### Additional instructions
1. Create a list of mutation positions
2. Sort this list and print it (call `pos_list.sort()` first, then `print(pos_list)`; `print(pos_list.sort())` prints `None` because `sort()` sorts in place)
3. Create a variable that holds the total number of mutations and print it
#### Extra exercise
1. Create a "sanity check"
2. Create a loop that goes through the original `hiv_genome` list and the `new_hiv_genome` list and prints the position of each mutation along with the original and new nucleotide
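
The steps above can be sketched on a tiny example. The lists `original` and `mutated` are toy stand-ins for the two genome lists from the notebook:

```python
original = list("atgcatgc")  # toy stand-in for hiv_genome
mutated = list("atgaatcc")   # toy stand-in for the mutated copy

# Sanity check: both lists must have the same length before comparing
assert len(original) == len(mutated)

# 1. Collect the positions where the two lists differ
pos_list = [n for n in range(len(original)) if original[n] != mutated[n]]

# 2. list.sort() sorts in place and returns None, so sort first, then print
pos_list.sort()
print(pos_list)

# 3. Total number of mutations
total_mutations = len(pos_list)
print(total_mutations)

# Extra: report position, original base, and new base for each mutation
for n in pos_list:
    print(f"{n}: {original[n]} => {mutated[n]}")
```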
```python=
muts = 0 # the number of mutations :3333
for n in range(len(hiv_genome)):
    if hiv_genome[n] != hiv_genome_new[n]:
        print(f'{hiv_genome[n]} => {hiv_genome_new[n]} @ {n}')
        muts += 1
print(muts)

def put_the_strlist_together(ls):
    return ''.join(ls)

print(put_the_strlist_together(hiv_genome_new))
```
### rna -> aa
```python=
rna = '...'
protein_sequence = ''
for n in range(0, len(rna), 3):
    codon = rna[n:n+3]
    AA = codon_to_AA[codon]
    if AA == '_': break
    protein_sequence += AA
print(protein_sequence)
```
```python=
rna = 'AUGCAAGACAGGGAUCUAUUUACGAUCAGGCAUCGAUCGAUCGAUGCUAGCUAGCGGGAUCGCACGAUACUAGCCCGAUGCUAGCUUUUAUGCUCGUAGCUGCCCGUACGUUAUUUAGCCUGCUGUGCGAAUGCAGCGGCUAGCAGACUGACUGUUAUGCUGGGAUCGUGCCGCUAG'
protein_sequence = ''
# the sequence is stored in rna above, so loop over rna (rna_sequence is never defined here)
for i in range(0, len(rna), 3):
    codon = rna[i:i+3]
    protein_sequence += (codon_to_AA[codon])
    print(codon_to_AA[codon])
    if codon_to_AA[codon] == "_":
        print("Breaking the loop!!")
        break
```
```python=
#matilda and diya
rna = 'AUGCAAGACAGGGAUCUAUUUACGAUCAGGCAUCGAUCGAUCGAUGCUAGCUAGCGGGAUCGCACGAUACUAGCCCGAUGCUAGCUUUUAUGCUCGUAGCUGCCCGUACGUUAUUUAGCCUGCUGUGCGAAUGCAGCGGCUAGCAGACUGACUGUUAUGCUGGGAUCGUGCCGCUAG'
protein_sequence = ''
for i in range(0, len(rna), 3):
    codon = rna[i:i+3]
    protein_sequence += codon_to_AA[codon]
    if codon_to_AA[codon] == "_":
        break
print(protein_sequence)
```
```python=
# ----------------
# assumes rna and codon_to_AA are defined as in the blocks above
protein_sequence = ''
for i in range(0, len(rna), 3):
    codon = rna[i:i+3]
    AA = codon_to_AA[codon]
    print(f'{codon} encodes {AA}')
    protein_sequence += AA
    if AA == '_':
        print('Stop Codon')
        break
# ---------------
```
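The loops above all rely on the notebook's `codon_to_AA` dictionary and raise a `KeyError` if the sequence length is not a multiple of three. This self-contained sketch shows one way to guard against that. The small `codon_table` is a partial table for illustration only, and the `"X"` placeholder for unknown codons is an assumption:

```python
# Partial codon table for illustration; the notebook's codon_to_AA covers all 64 codons
codon_table = {
    "AUG": "M", "CAA": "Q", "GAC": "D", "AGG": "R",
    "UAA": "_", "UAG": "_", "UGA": "_",
}

def translate(rna):
    """Translate rna codon by codon, stopping at the first stop codon."""
    protein = ""
    # stop before any trailing partial codon
    for n in range(0, len(rna) - len(rna) % 3, 3):
        codon = rna[n:n + 3]
        amino_acid = codon_table.get(codon, "X")  # "X" marks a codon missing from the table
        if amino_acid == "_":
            break
        protein += amino_acid
    return protein

print(translate("AUGCAAGACAGGUAGAUG"))
```

Checking for the stop codon before appending keeps the `_` marker out of the protein string.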
## final challenge pt.4
```python=
from numpy import random

# codon_to_AA is the codon dictionary defined earlier in the notebook

def calculate_GC(dna: str = ""):
    return (dna.count('G') + dna.count('C'))/len(dna)

def generate_DNA(length: int = 10):
    my_dna = ''
    for n in range(length):
        my_dna += random.choice(['G', 'C', 'A', 'T'], p = [1/4, 1/4, 1/4, 1/4])
    return my_dna

def transcribe_DNAtoRNA(dna: str = ""):
    return dna.replace('T', 'U')

def translate_RNAtoProtein(rna: str = ""):
    translated = ''
    for n in range(0, len(rna), 3):
        codon = rna[n:n+3]
        AA = codon_to_AA[codon]
        translated += AA
    return translated

# 999 is a multiple of 3, so every codon is complete
dna = generate_DNA(999)
rna = transcribe_DNAtoRNA(dna)
protein = translate_RNAtoProtein(rna)
print(f'DNA sequence: {dna}\n')
print(f'gc: {calculate_GC(dna)}')
print('Protein sequence encoded in the dna sequence:')
print(protein)
```
```python=
dna = input("Enter DNA string here in uppercase: ")

def calculate_GC(dna):
    a_count = dna.count("A")
    t_count = dna.count("T")
    c_count = dna.count("C")
    g_count = dna.count("G")
    GC_count = g_count + c_count
    total_dna = a_count + t_count + c_count + g_count
    GC_percentage = (GC_count/total_dna) * 100
    return GC_percentage

print(calculate_GC(dna), "% GC content")
```
### meow
\^_\^
\>\~\<
\>\-\<
\>.<
:3
```python=
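# Counts per tube, in the order Blue, Brown, Green, Orange, Red, Yellow
Tube_1 = [6, 10, 11, 6, 5, 1]
Tube_2 = [7, 8, 25, 3, 1, 1]
Tube_3 = [10, 8, 3, 9, 4, 9]
Tube_4 = [6, 10, 15, 8, 5, 4]
Tube_5 = [10, 7, 7, 11, 2, 8]
Tube_6 = [5, 14, 5, 12, 7, 3]
Tube_7 = [9, 11, 7, 9, 6, 5]
Tube_8 = [28, 4, 5, 1, 4, 2]
Tube_9 = [9, 8, 4, 8, 10, 6]
Tube_10 = [10, 9, 5, 9, 5, 8]
# the blocks below loop over tube_list, so collect the tubes into one list
tube_list = [Tube_1, Tube_2, Tube_3, Tube_4, Tube_5, Tube_6, Tube_7, Tube_8, Tube_9, Tube_10]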
```
```python=
import numpy as np

blue_amt = []
for n in tube_list:
    blue_amt.append(n[0])
print(np.mean(blue_amt))
```
```python=
total = 0  # renamed from sum so the built-in sum() is not shadowed
number = 0
length = 0
mean = 0
for tube in tube_list:
    number = (tube[0])
    total += number
    length += 1
mean = total / length
print(mean)
```
```python=
import numpy as np
import matplotlib.pyplot as plt

means = {
    "Blue": [n[0] for n in tube_list],
    "Brown": [n[1] for n in tube_list],
    "Green": [n[2] for n in tube_list],
    "Orange": [n[3] for n in tube_list],
    "Red": [n[4] for n in tube_list],
    "Yellow": [n[5] for n in tube_list],
}
# plot for all means of all colors
colors = list(means.keys())
values = list(means.values())
final_values = [np.mean(n) for n in values]
plt.bar(np.arange(len(means)), final_values, color=colors, tick_label=colors, align='center')
# plt.show() displays the current figure; it does not take the plot as an argument
plt.show()
```
```python=
import numpy as np

# the outer "for tube in tube_list" loop was dropped: it only repeated the same work ten times
blue_amt = []
brown_amt = []
green_amt = []
orange_amt = []
red_amt = []
yellow_amt = []
for n in tube_list:
    blue_amt.append(n[0])
print(np.mean(blue_amt))
for n in tube_list:
    brown_amt.append(n[1])
print(np.mean(brown_amt))
for n in tube_list:
    green_amt.append(n[2])
print(np.mean(green_amt))
for n in tube_list:
    orange_amt.append(n[3])
print(np.mean(orange_amt))
for n in tube_list:
    red_amt.append(n[4])
print(np.mean(red_amt))
for n in tube_list:
    yellow_amt.append(n[5])
print(np.mean(yellow_amt))
```
```python=
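sum_of_all_tubes = []
# loop over the six color positions in a tube, not over the ten tubes
for color_index in range(len(tube_list[0])):
    sum_col = 0
    for tube in tube_list:
        sum_col += tube[color_index]
    sum_of_all_tubes.append(sum_col)
    print(sum_col)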
```
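The per-color loops above can also be written with `zip(*tube_list)`, which regroups the rows (tubes) into columns (colors). This is a sketch in plain Python using the tube counts recorded above. The names `column_sums` and `column_means` are illustrative:

```python
tube_list = [
    [6, 10, 11, 6, 5, 1],
    [7, 8, 25, 3, 1, 1],
    [10, 8, 3, 9, 4, 9],
    [6, 10, 15, 8, 5, 4],
    [10, 7, 7, 11, 2, 8],
    [5, 14, 5, 12, 7, 3],
    [9, 11, 7, 9, 6, 5],
    [28, 4, 5, 1, 4, 2],
    [9, 8, 4, 8, 10, 6],
    [10, 9, 5, 9, 5, 8],
]
colors = ["Blue", "Brown", "Green", "Orange", "Red", "Yellow"]

# zip(*tube_list) yields one tuple per color position across all tubes
column_sums = [sum(column) for column in zip(*tube_list)]
column_means = [total / len(tube_list) for total in column_sums]

for color, mean in zip(colors, column_means):
    print(color, mean)
```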
```python=
import pandas as pd
import scipy.stats.mstats as mst

# observed_mnm is the dictionary of observed counts per color from the notebook
observed_mnm['Brown'] = float(observed_mnm['Brown'])
observed_mnm['Blue'] = float(observed_mnm['Blue'])
observed_mnm['Red'] = float(observed_mnm['Red'])
observed_mnm['Orange'] = float(observed_mnm['Orange'])
observed_mnm['Yellow'] = float(observed_mnm['Yellow'])
observed_mnm['Green'] = float(observed_mnm['Green'])
# turn the observed_mnm dictionary into a dataframe so we can do math
data = pd.DataFrame.from_dict(observed_mnm, orient='index')
# add the name 'observed' to the dataframe
data.columns = ['observed']
# sum up the observations
observations = data.observed.sum()
# start with a numeric column and fill it with .loc (chained assignment is unreliable)
data['expected'] = 0.0
data.loc['Blue', 'expected'] = 0.24 * observations
data.loc['Brown', 'expected'] = 0.13 * observations
data.loc['Green', 'expected'] = 0.16 * observations
data.loc['Yellow', 'expected'] = 0.14 * observations
data.loc['Red', 'expected'] = 0.13 * observations
data.loc['Orange', 'expected'] = 0.20 * observations
print(data)
result = mst.chisquare(data.observed, data.expected)
print("Chi-squared statistic is %f" % result[0])
print("p-value is: %f" % result[1])
# the p-value is how likely data this extreme would be if the null hypothesis were true
print("p-value as a percentage: %f%%" % (float(result[1]) * 100))
if (float(result[1]) * 100) > 5:
    print("You fail to reject the null hypothesis!")
else:
    print("You should reject the null hypothesis!")
```