# Biocoding 2021
---
## Shared URLs
- Course website: [link](https://jasonjwilliamsny.github.io/biocoding-2021-dnalc/)
- Learn more markdown: [link](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet)
- Human genome: [link](https://www.ncbi.nlm.nih.gov/projects/genome/guide/human/index.shtml)
- SNPedia: [link](https://www.snpedia.com/index.php/SNPedia)
- Project Jupyter: [link](https://jupyter.org/)
- Interesting Jupyter notebooks: [link](https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks)
- Try Linux terminal: [link](https://cocalc.com/doc/terminal.html)
- Rapid DNA extraction protocol: [link](https://dnabarcoding101.org/lab/protocol-2.html#standard)
- mybinder.org: [link](https://mybinder.org/)
- Notebooks: [link](https://github.com/JasonJWilliamsNY/biocoding-2021-notebooks)
- Zoom link: [link](https://cshl-dnalc.zoom.us/j/5163675186)
- JupyterHub: [http://128.196.142.15:8000](http://128.196.142.15:8000)
- Setup page for Jupyter: [link](https://jasonjwilliamsny.github.io/biocoding-2021-dnalc/setup.html)
---
Sign in below:
- Pierce
- Jason
- Anna
- George
- Evelyn
- Peter :)
- Harry
- Zabelle
- Ethan
- Samantha
- Andres
- Benjamin
---
## Shared notes
**Linux commands**
- `pwd` = print working directory
- `ls` = list the contents of a directory
- `cd` = change directory
- [linux explainer](https://explainshell.com)
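For comparison, the same three operations have Python equivalents in the standard-library `os` module, which can be handy inside a Jupyter notebook. This is a minimal sketch; the temporary directory and the file name `example.txt` are only there so the example does not touch your own files.

```python
import os
import tempfile

# pwd: print the working directory
print(os.getcwd())

with tempfile.TemporaryDirectory() as workdir:
    start_dir = os.getcwd()

    # cd: change into a throwaway temporary directory
    os.chdir(workdir)
    print(os.getcwd())

    # ls: list the contents of the current directory
    open("example.txt", "w").close()  # create an empty file so there is something to list
    print(sorted(os.listdir(".")))  # prints ['example.txt']

    # cd back before the temporary directory is deleted
    os.chdir(start_dir)
```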
---
# Day II - Variables and strings
**Question**
Discuss with your partner: what variable names would you use to describe the following?
### Average weight of a mouse group?
- a_avgweight, b_avgweight, g_avgweight
- avgWeightAlpha, avgWeightBeta
- a_mass_avg, b_mass_avg, g_mass_avg
- alpha_avgMass
- avgMassA
- avgMceWght
- avgweight
- *alpha_Avg_mass*
- alpha_avgMass
- avg_mass
- avg_mass
### Number of mice in a group?
- num_mice_a, num_mice_b, num_mice_g
- numMceGrp
- a_mice_amt
- *alpha_num_mice*
- numMiceAlpha, numMiceBeta
- num_mice
- numbermice
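Python's style guide (PEP 8) recommends lowercase `snake_case` for variable names, so suggestions such as `a_mass_avg` and `alpha_num_mice` follow the convention most closely. A minimal sketch of how such names read in use; the weights below are made-up values for illustration only.

```python
# Hypothetical mouse weights (grams) for group alpha; illustration only
alpha_mouse_weights_g = [20.0, 22.0, 21.0, 23.0]

# Descriptive snake_case names, as recommended by PEP 8
alpha_num_mice = len(alpha_mouse_weights_g)
alpha_avg_mass_g = sum(alpha_mouse_weights_g) / alpha_num_mice

print(alpha_num_mice)    # prints 4
print(alpha_avg_mass_g)  # prints 21.5
```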
## FASTA printer example
```python
sequence_name = "SEQ 001"
dna_sequence = "gactgatcgatcgcgatcgcgatcgatcgactcgccccgtgtgtg"
print(">" + sequence_name + "\n" + dna_sequence + "\n")
```
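FASTA files often wrap long sequences across several lines. A minimal sketch of a reusable FASTA formatter; the function name `format_fasta` and the 60-character default line width are assumptions, not part of the original exercise.

```python
def format_fasta(sequence_name, dna_sequence, width=60):
    """Return a FASTA record: a '>' header line, then the sequence wrapped every `width` characters."""
    lines = [">" + sequence_name]
    # Slice the sequence into chunks of at most `width` characters
    for start in range(0, len(dna_sequence), width):
        lines.append(dna_sequence[start:start + width])
    return "\n".join(lines) + "\n"


record = format_fasta("SEQ 001", "gactgatcgatcgcgatcgcgatcgatcgactcgccccgtgtgtg", width=20)
print(record, end="")
```

With `width=20`, the 45-base example sequence is split into lines of 20, 20, and 5 bases.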
# Question
What is the slice (index range) of the *gag* gene in `hiv_genome`?
- gag = hiv_genome[791:2293]
- gag_gene = HIV_genome[789:2292]
- gag = hiv_genome[790:2293]
- gag=(hiv_genome[789:2292])
- gag = hiv_genome[790:2293]
- gag = HIV_gene[789:2292]
- gag = hiv_genome[790:2292]
- gag = hiv_genome[790:2292]
- gag = hiv_genome[790:2293]
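The answers above differ by one because genome annotations usually give gene positions as 1-based, inclusive coordinates (start..end), while Python slices are 0-based and exclude the end index. A minimal sketch of the conversion on a made-up toy sequence; the helper name `extract_feature` is hypothetical, and the real *gag* coordinates should be taken from the genome annotation.

```python
def extract_feature(genome, start, end):
    """Return the bases at 1-based, inclusive positions start..end of `genome`."""
    # Subtract one from the start for 0-based indexing; the end stays the same
    # because a Python slice stops just before its end index.
    return genome[start - 1:end]


toy_genome = "ATGCCCTAA"
print(extract_feature(toy_genome, 4, 6))  # prints CCC
```

By the same rule, a feature annotated as 790..2292 would be sliced as `[789:2292]`.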
# Mutation simulation
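A minimal sketch of one way to simulate a mutation, assuming the exercise introduces a single random point substitution; the function name `point_mutate` is hypothetical, and the sequence is reused from the FASTA example.

```python
import random


def point_mutate(dna_sequence):
    """Return a copy of a lowercase DNA string with one random base replaced by a different base."""
    position = random.randrange(len(dna_sequence))
    original_base = dna_sequence[position]
    # Choose only from the other three nucleotides so the base really changes
    new_base = random.choice([b for b in "acgt" if b != original_base])
    return dna_sequence[:position] + new_base + dna_sequence[position + 1:]


random.seed(1)  # fixed seed so the run is repeatable
original = "gactgatcgatcgcgatcgcgatcgatcgactcgccccgtgtgtg"
mutated = point_mutate(original)
print(original)
print(mutated)
```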
## Room 1
Pierce - I have done this:
williams@cshl.edu
http://128.196.142.15:8000/user/hsu/tree/biocoding-2021-notebooks/notebooks
## Room 2
## Room 3
# Thursday
```python
# build a random DNA sequence...
from numpy import random

final_sequence_length = 80
initial_sequence_length = 0
dna_sequence = ''
my_nucleotides = ['a','t','g','c']
my_nucleotide_probs = [0.25,0.25,0.25,0.25]

while initial_sequence_length < final_sequence_length:
    nucleotide = random.choice(my_nucleotides,p=my_nucleotide_probs)
    dna_sequence = dna_sequence + nucleotide
    initial_sequence_length = initial_sequence_length + 1

print('>random_sequence (length:%d)\n%s' % (len(dna_sequence), dna_sequence))
```
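The `while` loop above adds one nucleotide per pass. A more compact sketch uses the standard library's `random.choices`, which draws `k` items with replacement according to the given weights; it builds the same kind of sequence, though not the same random draws as the NumPy version.

```python
import random

final_sequence_length = 80
my_nucleotides = ['a', 't', 'g', 'c']
my_nucleotide_probs = [0.25, 0.25, 0.25, 0.25]

# Draw all nucleotides at once (with replacement) and join them into one string
dna_sequence = ''.join(random.choices(my_nucleotides,
                                      weights=my_nucleotide_probs,
                                      k=final_sequence_length))
print('>random_sequence (length:%d)\n%s' % (len(dna_sequence), dna_sequence))
```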
## Write the appropriate code to translate an RNA string to a protein sequence:
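One possible approach, as a minimal sketch: build the standard genetic code (NCBI translation table 1) from a compact 64-letter string, then translate three bases at a time. The function name `translate_rna` and the choice to stop at the first stop codon are assumptions.

```python
RNA_BASES = "UCAG"
# One amino acid per codon, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG;
# '*' marks a stop codon. This is the standard genetic code (NCBI table 1).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in RNA_BASES for b in RNA_BASES for c in RNA_BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate_rna(rna_sequence):
    """Translate an RNA string from its first base, stopping at the first stop codon."""
    rna_sequence = rna_sequence.upper()
    protein = []
    # Step through complete codons only; leftover bases at the end are ignored
    for i in range(0, len(rna_sequence) - 2, 3):
        amino_acid = CODON_TABLE[rna_sequence[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate_rna("AUGGCCUAA"))  # prints MA
```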
### Room 1
### Room 2
### Room 3
## M&M Data
Order: Blue, Brown, Green, Orange, Red, Yellow
```python
tube_0 = [22, 13, 21, 18, 8, 7]
tube_1 = [8, 8, 13, 12, 6, 1]
tube_2 = [9, 16, 2, 7, 9, 3]
tube_3 = [14, 9, 14, 9, 1, 1]
tube_4 = [6, 7, 5, 11, 12, 5]
tube_5 = [12, 8, 4, 12, 4, 8]
tube_6 = [7, 2, 6, 10, 4, 18]
tube_7 = [10, 8, 11, 8, 8, 3]
tube_8 = [18, 4, 9, 6, 4, 6]
tube_9 = [11, 4, 5, 12, 14, 2]
tube_10 = [23, 2, 3, 8, 4, 9]
tube_11 = [17, 3, 6, 10, 7, 4]
```
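To pool all the tubes, the counts can be summed color by color. A minimal sketch using the tube counts listed above (same color order); the names `tubes`, `color_totals`, and `color_fractions` are illustrative.

```python
colors = ['blue', 'brown', 'green', 'orange', 'red', 'yellow']

# Counts from tube_0 ... tube_11 above, one list per tube
tubes = [
    [22, 13, 21, 18, 8, 7],
    [8, 8, 13, 12, 6, 1],
    [9, 16, 2, 7, 9, 3],
    [14, 9, 14, 9, 1, 1],
    [6, 7, 5, 11, 12, 5],
    [12, 8, 4, 12, 4, 8],
    [7, 2, 6, 10, 4, 18],
    [10, 8, 11, 8, 8, 3],
    [18, 4, 9, 6, 4, 6],
    [11, 4, 5, 12, 14, 2],
    [23, 2, 3, 8, 4, 9],
    [17, 3, 6, 10, 7, 4],
]

# zip(*tubes) regroups the counts by color: all blue counts, then all brown counts, ...
color_totals = {color: sum(counts) for color, counts in zip(colors, zip(*tubes))}
total_mms = sum(color_totals.values())
color_fractions = {color: count / total_mms for color, count in color_totals.items()}

for color in colors:
    print(f"{color:7s} {color_totals[color]:4d} {color_fractions[color]:.3f}")
```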
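Sample code for plotting every tube (one bar chart per tube):
```python
import numpy as np
import matplotlib.pyplot as plot

def plot_mm(tube):
    """Draw a bar chart of the M&M counts in one tube (colors in the order listed above)."""
    observations = tube
    n = len(observations)
    index = np.arange(n)
    colors = ['blue', 'brown', 'green', 'orange', 'red', 'yellow']
    plot.bar(index,
             observations,
             color=colors,
             tick_label=colors,
             align='center')
    plot.show()  # show() takes no bar-container argument

# All tubes from the M&M data above
data = [tube_0, tube_1, tube_2, tube_3, tube_4, tube_5,
        tube_6, tube_7, tube_8, tube_9, tube_10, tube_11]

# plot_mm() needs a tube, so call it once for each tube in data
for tube in data:
    plot_mm(tube)
```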
## State names
*read in state names*
```python
import pandas as pd

# Year and Gender are kept as text (object); Count is read as an integer
state = pd.read_csv("data/StateNames.csv", sep=',', index_col=0,
                    dtype={"Name": str, "Year": object, "Gender": object,
                           "State": str, "Count": int})
```
*subset by male condition*
```python
# Assumes a DataFrame named `national` (national name counts) was loaded earlier
condition_male = national['Gender'] == 'M'
male = national[condition_male]
male.head()
```
*most popular boy's name*
```python
# Total the counts for each name across all years, then list the top 50
male_byname = male.groupby('Name')[['Count']].sum()
male_byname_sorted = male_byname.sort_values(by='Count', ascending=False)
male_byname_sorted[0:50]
```
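*most popular names in NY in 2014, by sex*
```python
condition_state = state['State'] == "NY"
ny = state[condition_state]
# Year was read as text, so compare it with the string '2014'
condition_year = ny['Year'] == '2014'
ny2014 = ny[condition_year]

# Sort by Count so head() shows the most popular names first
condition_f = ny2014['Gender'] == 'F'
f2014 = ny2014[condition_f]
f2014.sort_values(by='Count', ascending=False).head()

condition_m = ny2014['Gender'] == 'M'
m2014 = ny2014[condition_m]
m2014.sort_values(by='Count', ascending=False).head()
```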
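The NY/2014 filtering steps above can also be written as a single boolean mask combined with `&`, followed by a sort on `Count`. A minimal sketch; the helper name `top_names` is hypothetical, and the three-row DataFrame is made-up toy data used only to show the call, not real counts.

```python
import pandas as pd


def top_names(state_df, state, year, gender, n=5):
    """Return the n most common names for one state, year, and gender, highest Count first."""
    # Each comparison gives a True/False Series; & keeps rows where all three are True.
    # Year is compared as a string because it is stored as text (object).
    mask = (
        (state_df['State'] == state)
        & (state_df['Year'] == year)
        & (state_df['Gender'] == gender)
    )
    return state_df[mask].sort_values(by='Count', ascending=False).head(n)


# Made-up toy rows for illustration only
toy = pd.DataFrame({
    'Name': ['Ann', 'Bea', 'Cal'],
    'Year': ['2014', '2014', '2014'],
    'Gender': ['F', 'F', 'M'],
    'State': ['NY', 'NY', 'NY'],
    'Count': [10, 30, 20],
})
print(top_names(toy, 'NY', '2014', 'F'))
```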
*Sequencing results path:* `../sequencing_results`