# Welcome to Biocoding 2024!

## Table of Contents

[TOC]

Add notes to the HackMD during the class so we can collaborate :)

## Learning resources

- Genomics data carpentry: https://datacarpentry.org/lessons/#genomics-workshop

**General Coding**
- Codecademy: [link](https://www.codecademy.com/)
- Hour of code (also in languages other than English): [link](https://code.org/learn)

**Bioinformatics**
- Learn bioinformatics in 100 hours: [link](https://www.biostarhandbook.com/edu/course/1/)
- Rosalind bioinformatics: [link](http://rosalind.info/about/)
- Bioinformatics coursera: [link](https://www.coursera.org/learn/bioinformatics)
- Bioinformatics careers: [link](https://www.iscb.org/bioinformatics-resources-for-high-schools/careers-in-bioinformatics)

**Help**
- General software help: [link](https://stackoverflow.com/)
- Bioinformatics-specific software help: [link](https://www.biostars.org/)

## Setting up your first use of Jupyter Notebooks

Go to http://149.165.154.101:8000/

Sign in using these credentials, replacing `<your last name>` with your actual last name. For Dr. F, it would be `feitzinger` and `feitzinger.123`:

Username: `<your last name>`

Password: `<your last name>.123`

The JupyterHub runs on Ubuntu, which is the operating system we use in class.

## Github

In the class, we will refer to pre-made ["Jupyter notebooks"](https://en.wikipedia.org/wiki/Project_Jupyter). These will be downloaded using [git] from the [biocoding notebooks] link. Instructions to download using [git] are provided below.

In Jupyter, click `New` -> `Terminal`. In the terminal, type the command shown below:

`git clone https://github.com/MasayukiNagai/BioCoding2024.git`

Press `↵ Enter` on your keyboard to run the command and wait for it to finish.

Go back to the Jupyter home and you will see the lessons for the rest of the week.
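
If you want to double-check from a notebook cell that the clone worked, one option is to list the notebooks in the new folder. This is only a minimal sketch: it assumes the clone landed in a folder called `BioCoding2024` inside your home directory (git names the folder after the repository by default), and `list_notebooks` is a made-up helper name, not part of the course materials.

```python
from pathlib import Path


def list_notebooks(folder):
    """Return the sorted file names of all .ipynb files found under `folder`."""
    folder = Path(folder)
    if not folder.is_dir():
        return []
    return sorted(
        p.name
        for p in folder.rglob("*.ipynb")
        # skip Jupyter's automatic checkpoint copies
        if ".ipynb_checkpoints" not in p.parts
    )


# Assumption: the terminal was opened in your home directory when you ran git clone
clone_dir = Path.home() / "BioCoding2024"
notebooks = list_notebooks(clone_dir)
if notebooks:
    print(f"Found {len(notebooks)} notebook(s) in {clone_dir}")
else:
    print(f"No notebooks found in {clone_dir}; check that git clone finished")
```

If the folder is somewhere else, change `clone_dir` to match where the terminal was when you ran the command (`pwd` tells you).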
## Commands
- _pwd_ : print working directory
- _touch_ : create an empty file (or update the timestamp of an existing one)
- _grep_ : find lines in a file that match a pattern/regex
- _cd_ : change directory
- _chmod_ : change permissions on a file or directory
- _mkdir_ : create a new directory
- _wget_ : download a file from the internet
- _vim_ : text editor for plain text and programs (emacs is better (jk neovim is better))
- _cut_ : extract selected sections (such as columns) from each line of a file

## Cool Commands
- `grep "HOX" dmel_human_orthologs_disease_fb_2022_03.tsv | cut -f1-6`: finds the lines containing "HOX" and then uses cut to keep only the first six tab-separated columns
- (DO NOT DO THIS LOL) `sudo rm -rf --no-preserve-root /`: deletes every single file starting from the root directory and working recursively, without giving any warnings.
- `tint`: play tetris in the terminal!
- `porechop`: finds and removes adapters from nanopore sequencing reads

## DNA Barcoding 101
DNA extraction
1. <redacted because this is a public hackpad lol />

| Left | Center | Right |
|------|--------|-------|
| meow | nya    | meow  |

[biocoding notebooks]: https://github.com/AnnaFeitzinger/BioCoding2022
[git]: https://en.wikipedia.org/wiki/Git

## our work:
### math concentration
```python=
initial_volume = (final_concentration*final_volume)/initial_concentration
```
```python=
# C1 * V1 = C2 * V2, so:
# initial_volume = (final_concentration * final_volume) / initial_concentration

# Molar
initial_concentration = 5
final_concentration = 2
# Liter
final_volume = 1.5
### Solve for starting volume using variables ###
initial_volume = (final_concentration * final_volume) / initial_concentration
# Print the answer
print("You need", initial_volume, "liters of NaCl")

#Diya
# Molar
initial_concentration = float(input("enter initial concentration"))
final_concentration = float(input("enter final concentration"))
# Liter
final_volume = 1.5
### Solve for starting volume using variables ###
initial_volume = (final_volume*final_concentration)/initial_concentration
# Print the answer
print("You need", initial_volume, "liters of NaCl")
#also diya
```
```python=
### Assign values to the given variables ###
a = 5
b = 1.5
c = 2
# Molar
initial_concentration = a
final_concentration = c
# Liter
final_volume = b
### Solve for starting volume using variables ###
initial_volume = (b*c)/a
# Print the answer
print(f"You need {initial_volume} liters of NaCl")
```

meoewmoewmeomwoemweowmewewmeowmewoewme meow meow emweo

https://www.w3schools.com/python/python_ref_string.asp

### playing with strings :D
```python=
# ---------------- replace() ----------------
# Alfred:
my_string = "ha "
x = my_string.replace("ha", "la")
print(x * 10000)
```
```python=
my_string = "Hello World"
x = my_string.swapcase()
print(x) # prints "hELLO wORLD"
```
```python=
my_string = "HELLO WORLD"
x = my_string.lower()
print(x) # prints "hello world"
```
```python=
text = "hello world, hello world, hello world, hello world, hellow world"
x = text.count("hello")
print(x) # prints 5 ("hellow" also contains "hello")

my_string = "HELLO FROM the OTHER SIDE"
x = my_string.lower()
print(x) # prints "hello from the other side"

#Diya
#adding a sentence
text = "Hello my name is Diya"
#defining a variable for the index value
index_value = text.rfind("Diya")
print(index_value)

#diya again
input_text = input("Enter sentence here")
word = input("Enter word you want to index")
index_value_text = input_text.rfind(word)
print(index_value_text)
```
```python=
# Maya:
my_string = "MAYA"
x = my_string.lower()
print(x) # prints "maya"
# ----------------------------------
# Elona:
first_string = "frog"
upper_string = first_string.upper()
print(upper_string) # prints "FROG"
```
### fasta parsing :D
```python=
seq_1_name = "sequence 001"
seq_1_string = "ATTCGAGGATCGATTTCGATCGATTTAGCTTTAGCTTTTTTAGATCTCCCA"
print(seq_1_name)
print(seq_1_string)
print(seq_1_name + seq_1_string)
```
```python=
fasta_seq = {
    "id": "sequence 001",
    "sequence": "ATTCGAGGATCGATTTCGATCGATGCTTAGCTTTAGCTTTTTTAGATCTCCCA"
}
print(fasta_seq)
```
```python=
### Write your code here ###
from io import StringIO
from Bio import SeqIO

fasta_sequence = """>sequence 001
ATTCGAGGATCGATTTCGATCGATGCTTAGCTTTAGCTTTTTTAGATCTCCCA
>sequence 002
AAGCTGACGGGGAGCTAGTCTTAGTCGTACGTTCGAT
"""
# SeqIO.parse expects a file name or a file handle, so wrap the string in StringIO
fasta_sequences = SeqIO.parse(StringIO(fasta_sequence), 'fasta')
for fasta in fasta_sequences:
    # fasta.id is only the first word of the header; fasta.description is the whole header
    print(f"fasta id {fasta.id}")
    print(f"fasta seq {fasta.seq}")
```
### dna => rna transcription
```python=
DNA = 'ATGAATCGT'
RNA = DNA.replace('T', 'U')
mutated_RNA = RNA[:4] + 'G' + RNA[5:]
print(mutated_RNA)
```
```python=
DNA = 'ATGAATCGT'
RNA = DNA.replace('T', 'U')
# note: this changes every G to U, not just one position
mutated_RNA = RNA.replace('G', 'U')
print(mutated_RNA)
```
```python=
DNA = 'ATGAATCGT'
RNA = DNA.replace('T', 'U')
# mutate the RNA (not the DNA) at position 2
mutated_RNA = RNA[:2] + 'U' + RNA[3:]
print(mutated_RNA)
```
```python=
# -----------------
DNA = 'ATGAATCGT'
RNA = DNA.replace('T','U')
# replace('RNA[2]', 'U') looks for the literal text "RNA[2]", so slice instead
mutated_RNA = RNA[:2] + 'U' + RNA[3:]
print(mutated_RNA)
```
### final hiv logic
```python=
# ethan :33333
gag = hiv_genome[789:2293]
pol = hiv_genome[2084:5097]
vif = hiv_genome[5040:5620]
vpr = hiv_genome[5558:5851]
env = hiv_genome[6044:8796]

def transcribe(dna: str):
    return dna.replace('t','u')

gag_rna = transcribe(gag)
pol_rna = transcribe(pol)
vif_rna = transcribe(vif)
vpr_rna = transcribe(vpr)
env_rna = transcribe(env)

def counter(rna: str):
    # a dictionary needs curly braces, not parentheses
    return {
        "a": rna.count("a"),
        "u": rna.count("u"),
        "g": rna.count("g"),
        "c": rna.count("c"),
        "overall": len(rna),
    }

# count on the RNA so that the "u" count is not always 0
gag_count = counter(gag_rna)
pol_count = counter(pol_rna)
vif_count = counter(vif_rna)
vpr_count = counter(vpr_rna)
env_count = counter(env_rna)

def count_gc(rna_count):
    return (rna_count["g"] + rna_count["c"])/rna_count["overall"]

gag_gc = count_gc(gag_count)
pol_gc = count_gc(pol_count)
vif_gc = count_gc(vif_count)
vpr_gc = count_gc(vpr_count)
env_gc = count_gc(env_count)
```
```python=
#1
length_hiv_genome = len(hiv_genome)
print(length_hiv_genome)
#2
gag_gene = hiv_genome[789:2292]
pol_gene = hiv_genome[2084:5096]
vif_gene = hiv_genome[5040:5617]
vpr_gene = hiv_genome[5558:5970]
env_gene = hiv_genome[6224:8795]
print("Gag gene:")
print(gag_gene)
print("Pol gene:")
print(pol_gene)
print("Vif gene:")
print(vif_gene)
print("Vpr gene:")
print(vpr_gene)
print("Env gene:")
print(env_gene)
#3
RNA_gag_gene = gag_gene.replace('t','u')
RNA_pol_gene = pol_gene.replace('t','u')
RNA_vif_gene = vif_gene.replace('t','u')
RNA_vpr_gene = vpr_gene.replace('t','u')
RNA_env_gene = env_gene.replace('t','u')
print("Gag gene:")
print(RNA_gag_gene)
print("Pol gene:")
print(RNA_pol_gene)
print("Vif gene:")
print(RNA_vif_gene)
print("Vpr gene:")
print(RNA_vpr_gene)
print("Env gene:")
print(RNA_env_gene)
#4
gag_gene_A_count= RNA_gag_gene.count("a")
gag_gene_U_count= RNA_gag_gene.count("u")
gag_gene_C_count= RNA_gag_gene.count("c")
gag_gene_G_count= RNA_gag_gene.count("g")
pol_gene_A_count= RNA_pol_gene.count("a")
pol_gene_U_count= RNA_pol_gene.count("u")
pol_gene_C_count= RNA_pol_gene.count("c")
pol_gene_G_count= RNA_pol_gene.count("g")
vif_gene_A_count= RNA_vif_gene.count("a")
vif_gene_U_count= RNA_vif_gene.count("u")
vif_gene_C_count= RNA_vif_gene.count("c")
vif_gene_G_count= RNA_vif_gene.count("g")
vpr_gene_A_count= RNA_vpr_gene.count("a")
vpr_gene_U_count= RNA_vpr_gene.count("u")
vpr_gene_C_count= RNA_vpr_gene.count("c")
vpr_gene_G_count= RNA_vpr_gene.count("g")
env_gene_A_count= RNA_env_gene.count("a")
env_gene_U_count= RNA_env_gene.count("u")
env_gene_C_count= RNA_env_gene.count("c")
env_gene_G_count= RNA_env_gene.count("g")
print("Gag gene A,U,C,G counts:")
print(gag_gene_A_count)
print(gag_gene_U_count)
print(gag_gene_C_count)
print(gag_gene_G_count)
print("Pol gene A,U,C,G counts:")
print(pol_gene_A_count)
print(pol_gene_U_count)
print(pol_gene_C_count)
print(pol_gene_G_count)
print("Vif gene A,U,C,G counts:")
print(vif_gene_A_count)
print(vif_gene_U_count)
print(vif_gene_C_count)
print(vif_gene_G_count)
print("Vpr gene A,U,C,G counts:")
print(vpr_gene_A_count)
print(vpr_gene_U_count)
print(vpr_gene_C_count)
print(vpr_gene_G_count)
print("Env gene A,U,C,G counts:")
print(env_gene_A_count)
print(env_gene_U_count)
print(env_gene_C_count)
print(env_gene_G_count)
#5
Gag_GC = (gag_gene_G_count + gag_gene_C_count)/(gag_gene_G_count + gag_gene_C_count + gag_gene_A_count + gag_gene_U_count) * 100
Pol_GC = (pol_gene_G_count + pol_gene_C_count)/(pol_gene_G_count + pol_gene_C_count + pol_gene_A_count + pol_gene_U_count) * 100
Vif_GC = (vif_gene_G_count + vif_gene_C_count)/(vif_gene_G_count + vif_gene_C_count + vif_gene_A_count + vif_gene_U_count) * 100
Vpr_GC = (vpr_gene_G_count + vpr_gene_C_count)/(vpr_gene_G_count + vpr_gene_C_count + vpr_gene_A_count + vpr_gene_U_count) * 100
Env_GC = (env_gene_G_count + env_gene_C_count)/(env_gene_G_count + env_gene_C_count + env_gene_A_count + env_gene_U_count) * 100
print("Gag GC% content is", Gag_GC, "%")
print("Pol GC% content is", Pol_GC, "%")
print("Vif GC% content is", Vif_GC, "%")
print("Vpr GC% content is", Vpr_GC, "%")
print("Env GC% content is", Env_GC, "%")
```
```python=
#matilda (i accidentally did the DNA instead of RNA for the later parts)
gag = hiv_genome[789:2292]
pol = hiv_genome[2084:5096]
vif = hiv_genome[5040:5619]
vpr = hiv_genome[5558:5850]
env = hiv_genome[6044:8795]
# ---------------------
gag_RNA = gag.replace('t','u')
pol_RNA = pol.replace('t','u')
vif_RNA = vif.replace('t','u')
vpr_RNA = vpr.replace('t','u')
env_RNA = env.replace('t','u')
# ---------------------
gag_num_As = gag.count('a')
print(gag_num_As)
gag_num_Us = gag.count('t')
print(gag_num_Us)
gag_num_Gs = gag.count('g')
print(gag_num_Gs)
gag_num_Cs = gag.count('c')
print(gag_num_Cs)
pol_num_As = pol.count('a')
print(pol_num_As)
pol_num_Us = pol.count('t')
print(pol_num_Us)
pol_num_Gs = pol.count('g')
print(pol_num_Gs)
pol_num_Cs = pol.count('c')
print(pol_num_Cs)
vif_num_As = vif.count('a')
print(vif_num_As)
vif_num_Us = vif.count('t')
print(vif_num_Us)
vif_num_Gs = vif.count('g')
print(vif_num_Gs)
vif_num_Cs = vif.count('c')
print(vif_num_Cs)
vpr_num_As = vpr.count('a')
print(vpr_num_As)
vpr_num_Us = vpr.count('t')
print(vpr_num_Us)
vpr_num_Gs = vpr.count('g')
print(vpr_num_Gs)
vpr_num_Cs = vpr.count('c')
print(vpr_num_Cs)
env_num_As = env.count('a')
print(env_num_As)
env_num_Us = env.count('t')
print(env_num_Us)
env_num_Gs = env.count('g')
print(env_num_Gs)
env_num_Cs = env.count('c')
print(env_num_Cs)
# -------------------
# parentheses matter: without them only the C count gets divided by the length
gag_GC_content = (gag_num_Gs + gag_num_Cs) / len(gag)
print(gag_GC_content)
pol_GC_content = (pol_num_Gs + pol_num_Cs) / len(pol)
print(pol_GC_content)
vif_GC_content = (vif_num_Gs + vif_num_Cs) / len(vif)
print(vif_GC_content)
vpr_GC_content = (vpr_num_Gs + vpr_num_Cs) / len(vpr)
print(vpr_GC_content)
env_GC_content = (env_num_Gs + env_num_Cs) / len(env)
print(env_GC_content)
```
```python=
# -------------------------------------
gag = hiv_genome[789:2292]
pol = hiv_genome[2084:5096]
vif = hiv_genome[5040:5619]
vpr = hiv_genome[5558:5850]
env = hiv_genome[6044:8795]
print(gag)
print(pol)
print(vif)
print(vpr)
print(env)
print(gag.replace("t", "u"))
print(pol.replace("t", "u"))
print(vif.replace("t", "u"))
print(vpr.replace("t", "u"))
print(env.replace("t", "u"))
gagr = gag.replace("t", "u")
polr = pol.replace("t", "u")
vifr = vif.replace("t", "u")
vprr = vpr.replace("t", "u")
envr = env.replace("t", "u")
# print("a's" + " " + count) fails because you cannot add a string and a number,
# so pass the count to print() as a separate argument instead
print("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~")
gag_a = gagr.count('a')
print("a's", gag_a)
gag_u = gagr.count('u')
print("u's", gag_u)
gag_g = gagr.count('g')
print("g's", gag_g)
gag_c = gagr.count('c')
print("c's", gag_c)
print("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~")
pol_a = polr.count('a')
print("a's", pol_a)
pol_u = polr.count('u')
print("u's", pol_u)
pol_g = polr.count('g')
print("g's", pol_g)
pol_c = polr.count('c')
print("c's", pol_c)
print("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~")
vif_a = vifr.count('a')
print("a's", vif_a)
vif_u = vifr.count('u')
print("u's", vif_u)
vif_g = vifr.count('g')
print("g's", vif_g)
vif_c = vifr.count('c')
print("c's", vif_c)
print("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~")
vpr_a = vprr.count('a')
print("a's", vpr_a)
vpr_u = vprr.count('u')
print("u's", vpr_u)
vpr_g = vprr.count('g')
print("g's", vpr_g)
vpr_c = vprr.count('c')
print("c's", vpr_c)
print("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~")
env_a = envr.count('a')
print("a's", env_a)
env_u = envr.count('u')
print("u's", env_u)
env_g = envr.count('g')
print("g's", env_g)
env_c = envr.count('c')
print("c's", env_c)
print("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~")
# multiply by 100 so the number matches the "%" sign
gag_gc = (gag_c + gag_g) / len(gag) * 100
print(f"GC content is: {gag_gc}%")
pol_gc = (pol_c + pol_g) / len(pol) * 100
print(f"GC content is: {pol_gc}%")
vif_gc = (vif_c + vif_g) / len(vif) * 100
print(f"GC content is: {vif_gc}%")
vpr_gc = (vpr_c + vpr_g) / len(vpr) * 100
print(f"GC content is: {vpr_gc}%")
env_gc = (env_c + env_g) / len(env) * 100
print(f"GC content is: {env_gc}%")
```
### coin flip
```python=
from numpy import random

coin = random.random()  # random number between 0 and 1
if coin > 0.5:
    print('heads')
else:
    print('tails')
```
### hiv mutations :D
#### Additional instructions
1. Create a list of mutation positions
2. Sort this list and print it (`pos_list.sort()` and then `print(pos_list)`; `print(pos_list.sort())` prints `None` because `sort()` sorts the list in place)
3. Create a variable that holds the total number of mutations and print it
#### Extra exercise
1. Create a "sanity check"
2. Create a loop that goes through the original `hiv_genome` list and the `new_hiv_genome` list and prints out the position of each mutation along with the original and the new nucleotide

```python=
muts = 0 # the number of mutations :3333
for n in range(len(hiv_genome)):
    if hiv_genome[n] != hiv_genome_new[n]:
        print(f'{hiv_genome[n]} => {hiv_genome_new[n]} @ {n}')
        muts += 1
print(muts)

def put_the_strlist_together(ls):
    return ''.join(ls)

print(put_the_strlist_together(hiv_genome_new))
```
### rna -> aa
```python=
rna = '...'
protein_sequence = ''
for n in range(0, len(rna), 3):
    codon = rna[n:n+3]
    AA = codon_to_AA[codon]
    if AA == '_':
        break
    protein_sequence += AA
print(protein_sequence)
```
```python=
rna = 'AUGCAAGACAGGGAUCUAUUUACGAUCAGGCAUCGAUCGAUCGAUGCUAGCUAGCGGGAUCGCACGAUACUAGCCCGAUGCUAGCUUUUAUGCUCGUAGCUGCCCGUACGUUAUUUAGCCUGCUGUGCGAAUGCAGCGGCUAGCAGACUGACUGUUAUGCUGGGAUCGUGCCGCUAG'
protein_sequence = ''
# the variable is called rna (rna_sequence was never defined)
for i in range(0, len(rna), 3):
    codon = rna[i:i+3]
    protein_sequence += codon_to_AA[codon]
    print(codon_to_AA[codon])
    if codon_to_AA[codon] == "_":
        print("Breaking the loop!!")
        break
```
```python=
#matilda and diya
rna = 'AUGCAAGACAGGGAUCUAUUUACGAUCAGGCAUCGAUCGAUCGAUGCUAGCUAGCGGGAUCGCACGAUACUAGCCCGAUGCUAGCUUUUAUGCUCGUAGCUGCCCGUACGUUAUUUAGCCUGCUGUGCGAAUGCAGCGGCUAGCAGACUGACUGUUAUGCUGGGAUCGUGCCGCUAG'
protein_sequence = ''
for i in range(0, len(rna), 3):
    codon = rna[i:i+3]
    protein_sequence += codon_to_AA[codon]
    if codon_to_AA[codon] == "_":
        break
print(protein_sequence)
```
```python=
# ----------------
protein_sequence = ''
for i in range(0, len(rna), 3):
    codon = rna[i:i+3]
    AA = codon_to_AA[codon]
    print(f'{codon} encodes {AA}')
    protein_sequence += AA
    if AA == '_':
        print('Stop Codon')
        break
# ---------------
```
## final challenge pt.4
```python=
from numpy import random

def calculate_GC(dna: str = ""):
    return (dna.count('G') + dna.count('C'))/len(dna)

def generate_DNA(length: int = 10):
    my_dna = ''
    for n in range(length):
        my_dna += random.choice(['G', 'C', 'A', 'T'], p = [1/4, 1/4, 1/4, 1/4])
    return my_dna

def transcribe_DNAtoRNA(dna: str = ""):
    return dna.replace('T', 'U')

def translate_RNAtoProtein(rna: str = ""):
    # uses the codon_to_AA dictionary, which has to be defined already
    translated = ''
    for n in range(0, len(rna), 3):
        codon = rna[n:n+3]
        AA = codon_to_AA[codon]
        translated += AA
    return translated

dna = generate_DNA(999)
rna = transcribe_DNAtoRNA(dna)
protein = translate_RNAtoProtein(rna)
print(f'DNA sequence: {dna}\n')
print(f'gc: {calculate_GC(dna)}')
print('Protein sequence encoded in the dna sequence:')
print(protein)
```
```python=
# HIIII :333333333333 :DDDDDDD :3333 :DDDD :)))))))) whats up????
dna = input("Enter DNA string here in uppercase: ")

def calculate_GC(dna):
    a_count = dna.count("A")
    t_count = dna.count("T")
    c_count = dna.count("C")
    g_count = dna.count("G")
    GC_count = g_count + c_count
    total_dna = a_count + t_count + c_count + g_count
    GC_percentage = (GC_count/total_dna) * 100
    return GC_percentage

print(calculate_GC(dna), "% GC content")
```
### meow \^_\^ \>\~\< \>\-\< \>.< :3
```python=
Tube_1 = [6, 10, 11, 6, 5, 1]
Tube_2 = [7, 8, 25, 3, 1, 1]
Tube_3 = [10, 8, 3, 9, 4, 9]
Tube_4 = [6, 10, 15, 8, 5, 4]
Tube_5 = [10, 7, 7, 11, 2, 8]
Tube_6 = [5, 14, 5, 12, 7, 3]
Tube_7 = [9, 11, 7, 9, 6, 5]
Tube_8 = [28, 4, 5, 1, 4, 2]
Tube_9 = [9, 8, 4, 8, 10, 6]
Tube_10 = [10, 9, 5, 9, 5, 8]
# all the tubes in one list; the snippets that follow loop over tube_list
tube_list = [Tube_1, Tube_2, Tube_3, Tube_4, Tube_5, Tube_6, Tube_7, Tube_8, Tube_9, Tube_10]
```
```python=
import numpy as np

blue_amt = []
for n in tube_list:
    blue_amt.append(n[0])
print(np.mean(blue_amt))
```
```python=
# "total" instead of "sum" so Python's built-in sum() is not overwritten
total = 0
length = 0
for tube in tube_list:
    number = tube[0]
    total += number
    length += 1
mean = total / length
print(mean)
```
```python=
import numpy as np
import matplotlib.pyplot as plt

means = {
    "Blue": [n[0] for n in tube_list],
    "Brown": [n[1] for n in tube_list],
    "Green": [n[2] for n in tube_list],
    "Orange": [n[3] for n in tube_list],
    "Red": [n[4] for n in tube_list],
    "Yellow": [n[5] for n in tube_list],
}
# plot for all means of all colors
colors = list(means.keys())
values = list(means.values())
final_values = [np.mean(n) for n in values]
plt.bar(np.arange(len(means)), final_values, color=[c.lower() for c in colors], tick_label=colors, align='center')
plt.show()  # plt.show() does not take the plot as an argument
```
```python=
blue_amt = []
brown_amt = []
green_amt = []
orange_amt = []
red_amt = []
yellow_amt = []
for n in tube_list:
    blue_amt.append(n[0])
print(np.mean(blue_amt))
for n in tube_list:
    brown_amt.append(n[1])
print(np.mean(brown_amt))
for n in tube_list:
    green_amt.append(n[2])
print(np.mean(green_amt))
for n in tube_list:
    orange_amt.append(n[3])
print(np.mean(orange_amt))
for n in tube_list:
    red_amt.append(n[4])
print(np.mean(red_amt))
for n in tube_list:
    yellow_amt.append(n[5])
print(np.mean(yellow_amt))
```
```python=
sum_of_all_tubes = []
# loop over the 6 colors (the positions inside a tube), not over the 10 tubes
for color_num in range(len(tube_list[0])):
    sum_col = 0
    for tube in tube_list:
        sum_col += tube[color_num]
    sum_of_all_tubes.append(sum_col)
    print(sum_col)
```
```python=
import pandas as pd
import scipy.stats.mstats as mst

# make sure every count is a float
for color in ['Brown', 'Blue', 'Red', 'Orange', 'Yellow', 'Green']:
    observed_mnm[color] = float(observed_mnm[color])

#turn the observed_mnm dictionary into a dataframe so we can do math
data = pd.DataFrame.from_dict(observed_mnm, orient='index')
# add the name 'observed' to the dataframe
data.columns = ['observed']
# sum up the observations
observations = data.observed.sum()
# expected fraction of each color; pandas matches the rows up by color name
expected_fraction = {'Blue': 0.24, 'Brown': 0.13, 'Green': 0.16, 'Yellow': 0.14, 'Red': 0.13, 'Orange': 0.20}
data['expected'] = pd.Series(expected_fraction) * observations
print(data)
result = mst.chisquare(data.observed, data.expected)
print("Chi-squared statistic is %f" % result[0])
print("p-value is: %f" % result[1])
# the p-value is not the probability that the null hypothesis is true
print("p-value as a percentage: %f%%" % (float(result[1])*100))
if (float(result[1])*100) > 5:
    print("You cannot reject the null hypothesis!")
else:
    print("You should reject the null hypothesis!")
```
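
To see what the chi-squared statistic in the code above actually measures, it can also be worked out by hand as the sum of (observed - expected)² / expected over the six colors. The sketch below is only an illustration under a few assumptions: it treats the summed tube counts from these notes as the observed data (the notes do not say how `observed_mnm` was built in class), it assumes the color order Blue, Brown, Green, Orange, Red, Yellow used in the plotting snippet, and `column_totals` and `chi_squared_statistic` are made-up helper names. It does not compute a p-value; for that you still need SciPy.

```python
# Assumed color order inside each tube, taken from the plotting snippet in these notes
COLORS = ["Blue", "Brown", "Green", "Orange", "Red", "Yellow"]

# Tube counts copied from the notes
tube_list = [
    [6, 10, 11, 6, 5, 1],
    [7, 8, 25, 3, 1, 1],
    [10, 8, 3, 9, 4, 9],
    [6, 10, 15, 8, 5, 4],
    [10, 7, 7, 11, 2, 8],
    [5, 14, 5, 12, 7, 3],
    [9, 11, 7, 9, 6, 5],
    [28, 4, 5, 1, 4, 2],
    [9, 8, 4, 8, 10, 6],
    [10, 9, 5, 9, 5, 8],
]

# Expected fraction of each color, copied from the notes
expected_fraction = {
    "Blue": 0.24, "Brown": 0.13, "Green": 0.16,
    "Orange": 0.20, "Red": 0.13, "Yellow": 0.14,
}


def column_totals(tubes):
    """Add up each color across all tubes (zip(*tubes) turns rows into columns)."""
    return [sum(column) for column in zip(*tubes)]


def chi_squared_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = column_totals(tube_list)
total = sum(observed)
expected = [expected_fraction[color] * total for color in COLORS]
statistic = chi_squared_statistic(observed, expected)

for color, o, e in zip(COLORS, observed, expected):
    print(f"{color}: observed {o}, expected {e:.1f}")
print(f"Chi-squared statistic: {statistic:.2f} (degrees of freedom: {len(COLORS) - 1})")
```

A bigger statistic means the observed counts are further from the expected ones; with 6 colors there are 5 degrees of freedom.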