Biocoding 2021

# Biocoding 2021

---

## Shared URLs

- Course website: [link](https://jasonjwilliamsny.github.io/biocoding-2021-dnalc/)
- Learn more markdown: [link](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet)
- Human genome: [link](https://www.ncbi.nlm.nih.gov/projects/genome/guide/human/index.shtml)
- SNPedia: [link](https://www.snpedia.com/index.php/SNPedia)
- Project Jupyter: [link](https://jupyter.org/)
- Interesting Jupyter notebooks: [link](https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks)
- Try Linux terminal: [link](https://cocalc.com/doc/terminal.html)
- Rapid DNA extraction protocol: [link](https://dnabarcoding101.org/lab/protocol-2.html#standard)
- mybinder.org: [link](https://mybinder.org/)
- Notebooks: https://github.com/JasonJWilliamsNY/biocoding-2021-notebooks
- Zoom link: [link](https://cshl-dnalc.zoom.us/j/5163675186)
- JupyterHub: [http://128.196.142.15:8000](http://128.196.142.15:8000)
- Setup page for Jupyter: [link](https://jasonjwilliamsny.github.io/biocoding-2021-dnalc/setup.html)

---

Sign in below:

- Pierce
- Jason
- Anna
- George
- Evelyn
- Peter :)
- Harry
- Zabelle
- Ethan
- Samantha
- Andres
- Benjamin

---

## Shared notes

**Linux commands**

- pwd = print working directory
- ls = list
- cd = change directory
- [linux explainer](https://explainshell.com)

---

# Day II - Variables and strings

**Question**

Discuss with your partner: what variable names would you use to describe

### Average weight of a mouse group?

- a_avgweight, b_avgweight, g_avgweight
- avgWeightAlpha, avgWeightBeta
- a_mass_avg, b_mass_avg, g_mass_avg
- alpha_avgMass
- avgMassA
- avgMceWght
- avgweight
- *alpha_Avg_mass*
- alpha_avgMass
- avg_mass
- avg_mass

### Number of mice in a group?
- num_mice_a, num_mice_b, num_mice_g
- numMceGrp
- a_mice_amt
- *alpha_num_mice*
- numMiceAlpha, numMiceBeta
- num_mice
- numbermice

## Fasta printer example

    sequence_name = "SEQ 001"
    dna_sequence = "gactgatcgatcgcgatcgcgatcgatcgactcgccccgtgtgtg"
    print(">" + sequence_name + "\n" + dna_sequence + "\n")

# Question

What is the index of the gag gene?

- gag = hiv_genome[791:2293]
- gag_gene = HIV_genome[789:2292]
- gag = hiv_genome[790:2293]
- gag=(hiv_genome[789:2292])
- gag = hiv_genome[790:2293]
- gag = HIV_gene[789:2292]
- gag = hiv_genome[790:2292]
- gag = hiv_genome[790:2292]
- gag = hiv_genome[790:2293]

# Mutation simulation

## Room 1

pierce - i have done this: williams@cshl.edu http://128.196.142.15:8000/user/hsu/tree/biocoding-2021-notebooks/notebooks

## Room 2

## Room 3

# Thursday

    # build a random dna sequence...
    from numpy import random

    final_sequence_length = 80
    initial_sequence_length = 0
    dna_sequence = ''
    my_nucleotides = ['a', 't', 'g', 'c']
    my_nucleotide_probs = [0.25, 0.25, 0.25, 0.25]

    # add one randomly chosen nucleotide per pass until the target length is reached
    while initial_sequence_length < final_sequence_length:
        nucleotide = random.choice(my_nucleotides, p=my_nucleotide_probs)
        dna_sequence = dna_sequence + nucleotide
        initial_sequence_length = initial_sequence_length + 1

    print('>random_sequence (length:%d)\n%s' % (len(dna_sequence), dna_sequence))

## Write the appropriate code to translate an RNA string to a protein sequence:

### Room 1

### Room 2

### Room 3

## M&M Data

Order: Blue Brown Green Orange Red Yellow

    tube_0 = [22, 13, 21, 18, 8, 7]
    tube_1 = [8, 8, 13, 12, 6, 1]
    tube_2 = [9, 16, 2, 7, 9, 3]
    tube_3 = [14, 9, 14, 9, 1, 1]
    tube_4 = [6, 7, 5, 11, 12, 5]
    tube_5 = [12, 8, 4, 12, 4, 8]
    tube_6 = [7, 2, 6, 10, 4, 18]
    tube_7 = [10, 8, 11, 8, 8, 3]
    tube_8 = [18, 4, 9, 6, 4, 6]
    tube_9 = [11, 4, 5, 12, 14, 2]
    tube_10 = [23, 2, 3, 8, 4, 9]
    tube_11 = [17, 3, 6, 10, 7, 4]

Sample code for multi plot

    import numpy as np
    import matplotlib.pyplot as plot

    def plot_mm(tube):
        # one bar per color, in the order listed above
        observations = tube
        n = len(observations)
        index = np.arange(n)
        colors = ['blue', 'brown', 'green', 'orange', 'red', 'yellow']
        plot.bar(index, observations, color=colors, tick_label=colors, align='center')
        plot.show()

    data = [tube_1, tube_2, tube_3, tube_4, tube_5, tube_6, tube_7, tube_8, tube_9, tube_10, tube_11]

    # draw one plot per tube
    for tube in data:
        plot_mm(tube)

## State names

*read in state names*

    import pandas as pd

    state = pd.read_csv("data/StateNames.csv", sep=',', index_col=0, dtype={"Name": str, "Year": object, "Gender": object, "Count": int})
    state = pd.read_csv("data/StateNames.csv", sep=',', index_col=0, dtype={"Name": str, "Year": object, "Gender": object, "State": str, "Count": int})

*subset by male condition*

    # `national` is not defined in these notes; it is assumed to have been loaded earlier in the notebook
    condition_male = national['Gender'] == 'M'
    male = national[condition_male]
    male.head()
    #
    condition_m = national['Gender'] == 'M'
    male = national[condition_m]
    male.head()

*most popular boy's name*

    male_byname = male.groupby('Name').sum()
    male_byname_sorted = male_byname.sort_values(by='Count', ascending=False)
    male_byname_sorted[0:50]

*most popular name in NY, by sex*

    condition_state = state['State'] == "NY"
    ny = state[condition_state]
    # Year was read as object (text), so compare against the string '2014'
    condition_year = ny['Year'] == '2014'
    ny2014 = ny[condition_year]
    condition_f = ny2014['Gender'] == 'F'
    f2014 = ny2014[condition_f]
    f2014.head()
    condition_m = ny2014['Gender'] == 'M'
    m2014 = ny2014[condition_m]
    m2014.head()

*Sequencing results path*

../sequencing_results
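For the RNA-to-protein exercise earlier in these notes (the room sections were left empty), here is one possible approach, offered as a minimal sketch rather than the course's official answer. It assumes the standard genetic code, reads codons from the first base, and stops at the first stop codon. The names `CODON_TABLE` and `translate` are made up for this illustration.

```python
from itertools import product

# Standard genetic code, written compactly: codons are generated in the order
# UUU, UUC, UUA, UUG, UCU, ... and paired with one-letter amino acids.
# '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna):
    """Translate an RNA string into a protein string (hypothetical helper).

    Reads three bases at a time from position 0, ignores any incomplete
    codon at the end, and stops at the first stop codon.
    """
    rna = rna.upper()
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("augGCCuaa"))
```

A codon containing anything other than A, C, G, or U (for example `N`) raises a `KeyError` in this sketch; handling such cases is left open.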

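The answers to the gag question earlier in these notes differ by one at each end of the slice. That is the usual off-by-one between annotation coordinates, which are typically 1-based and include the end position, and Python slices, which are 0-based and exclude the end. The sketch below assumes the annotation lists gag at positions 790 to 2292; those numbers are inferred from the answers and are not confirmed in the notes. `slice_feature` is a hypothetical helper, and the toy string stands in for the genome.

```python
def slice_feature(sequence, start, end):
    """Return the feature at 1-based, inclusive positions start..end."""
    # Subtract 1 from the start (0-based indexing); leave the end unchanged
    # because a Python slice already excludes its end index.
    return sequence[start - 1:end]


toy_sequence = "abcdefghij"
print(slice_feature(toy_sequence, 3, 5))  # letters 3 through 5 of the toy string

# Under the assumed coordinates, the feature length is end - start + 1
gag_length = 2292 - 790 + 1
print(gag_length, gag_length % 3)
```

Under that assumption the equivalent slice would be `hiv_genome[789:2292]`, and checking that the length is a multiple of 3 is a quick sanity test for a protein-coding region.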