**Computational Basics for Plant Biology**

----

**Jason Williams, CSHL**
(e) williams@cshl.edu
(t) @JasonWilliamsNY

**Zoom** (for screensharing): [link](https://cshl-dnalc.zoom.us/j/97683182809?pwd=RWJySmV3Y09DZS9UN1d2ZVdwM2g1QT09)

----

[toc]

## Resources

### Data/knowledge management

- FOSS [Data Management intro](https://cyverse-foss.readthedocs-hosted.com/en/latest/03_managing_data.html#)
- Markdown: [Cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet)
- ReadTheDocs/mk-docs: [Documentation](https://readthedocs.com/), [https://mkdocs.readthedocs.io/en/stable/](https://mkdocs.readthedocs.io/en/stable/)
- FAIRsharing: [FAIR sharing](https://fairsharing.org/)
- [FAIR Principles paper](https://www.nature.com/articles/sdata201618)
- Data Stewardship: [Wizard](https://ds-wizard.org/)
- Example [DMPs](https://www.lib.ncsu.edu/do/data-management/elements-of-a-dmp)

### Reproducibility

- GitHub: [github.com](https://github.com/)
- Conda: [Conda](https://docs.conda.io/en/latest/)
- Bioconda: [Bioconda](https://bioconda.github.io/)
- Docker: [Docker](https://www.docker.com/)
- Ten Simple Rules on [Dockerfiles](https://journals.plos.org/ploscompbiol/article/comments?id=10.1371/journal.pcbi.1008316)
- Biocontainers: [Biocontainers](https://biocontainers.pro/)
- Jupyter: [jupyter.org](https://jupyter.org/)
- Snakemake: [snakemake](https://snakemake.readthedocs.io/en/stable/)

### Software and coding skills

- Software Carpentry: [SWC](https://software-carpentry.org/)
- Data Carpentry: [DC](https://datacarpentry.org/)
- Bioconductor: [bioconductor.org](http://bioconductor.org/)

### Computational resources

- CyVerse: [https://cyverse.org/](https://cyverse.org/)
- JetStream2: [https://jetstream-cloud.org/](https://jetstream-cloud.org/)
- Galaxy: [https://usegalaxy.org/](https://usegalaxy.org/)

### Community resources

- Biostars: [https://www.biostars.org/](https://www.biostars.org/)
- StackOverflow: [https://stackoverflow.com/](https://stackoverflow.com/)
- Twitter Bioinformatics: [Bioinformatics Community](https://twitter.com/i/communities/1506791236987879425)
- LifeSciTrainers: [LifeSciTrainers.org](http://lifescitrainers.org)

---

## Survey Results

**What are your current computational challenges?**

- Sristi: In the future I will be working with RNA-seq and proteomics datasets, so I want to learn about them.
- Deeksha: I have no idea where to start, or how to use the command line for data representation and organisation and for transcriptome/single-cell transcriptomics data analysis.
- Jason:
- Dang: I don't know what the standards are for data organization. I kind of just make Google Sheets or Excel sheets and throw them in the group drive.
- Clair: I am working with many large scRNA datasets. While I am comfortable with the data analysis, I need to create a way for others in the lab and collaborators to access these datasets and edit them.
- Sydney: I know next to nothing about any of these computer programs, but will have a ton of genetic data to sort through.
- Uzezi: How to create a website using GitHub. I need to learn how to commit my code directly to GitHub from my terminal. I also need to learn the basics of ML.
- Kyle: Adapting published ML algorithms for my own applications in enzyme engineering. Also, better understanding the code and architecture of said algorithms so I can alter them for different applications.
- Elena: qPCR/RNA-seq data analysis, graph production and statistical analyses.
- Abe: I don't have any at the moment, but I suppose some day I might be working with some RNA-seq data. Dealing with all of that will definitely require some computational effort.
- Joe: I need to add features to a protein design machine learning algorithm, where 'features' might be tweaking the training regimen to encompass different types of protein properties, but I don't have a huge ML background.
- Rachel: Analyzing large RNA-seq datasets in R, given I have very little R experience. Also presenting these data on an accessible platform, and I don't really know where to start there.
- Ziv: Visualization of scRNA-seq datasets.
- Jade: Data visualization, learning how to make plots efficiently. Not familiar with coding. Need to learn how to use software like PyMOL or other protein structure visualization tools.
- Kevin: As I begin to navigate more complex data, I anticipate having to learn better GitHub usage and management skills, code efficiency, and higher-level commands. I also anticipate needing to learn more bash coding to work with our cluster.
- Michelle: Using GitHub and Markdown to save/organize my code for genomic analyses I have done.
- Kaotar: I would like to learn how to organise and manage my data, as well as how to use R to analyse sequencing data.

---

## Notes

### Objectives

- Prepare you with the basic Python needed later in this course
- Understand basic data management skills and resources
- Learn about learning resources and how to focus your future learning

### Plotting and Programming in Python

- Software Carpentry [Plotting and Programming in Python](http://swcarpentry.github.io/python-novice-gapminder/)
- Google Colab [Colab](https://colab.research.google.com/)
- [New Google account instructions](https://support.google.com/accounts/answer/27441?hl=en)

### Set up commands

Run these in a Colab cell (the leading `!` passes the command to the shell):

    # Get the dataset
    !wget http://swcarpentry.github.io/python-novice-gapminder/files/python-novice-gapminder-data.zip
    # Check the download
    !ls
    # Unzip the dataset
    !unzip python-novice-gapminder-data.zip

#### [Variables and assignment](http://swcarpentry.github.io/python-novice-gapminder/02-variables.html)

#### [Built in and types](http://swcarpentry.github.io/python-novice-gapminder/03-types-conversion.html)

#### [Libraries](http://swcarpentry.github.io/python-novice-gapminder/06-libraries.html)

#### [Tabular data](http://swcarpentry.github.io/python-novice-gapminder/07-reading-tabular.html)

### Help code

    # pandas must be imported before pd.read_csv can be used
    import pandas as pd

    # Read the Oceania table, using the 'country' column as the row index
    data_oceania_country = pd.read_csv('data/gapminder_gdp_oceania.csv', index_col='country')
    print(data_oceania_country)

#### [Pandas](http://swcarpentry.github.io/python-novice-gapminder/08-data-frames.html)

#### [Plotting](http://swcarpentry.github.io/python-novice-gapminder/09-plotting.html)
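
As a rough sketch of where the plotting episode is heading, the snippet below turns the Oceania table into a line plot of GDP per capita over time. It assumes the dataset from the set up commands has been unzipped into `data/`, and that the CSV columns are named like `gdpPercap_1952` (an assumption about the file layout). The helper names `years_as_int` and `plot_gdp_over_time` are made up for illustration and are not part of the lesson.

```python
import os

import pandas as pd


def years_as_int(df):
    """Return a copy of df whose column labels are integer years.

    Assumes each label ends in '_<year>', e.g. 'gdpPercap_1952' -> 1952.
    """
    out = df.copy()
    out.columns = [int(str(col).rsplit('_', 1)[-1]) for col in out.columns]
    return out


def plot_gdp_over_time(df):
    """Draw one line per country with years on the x-axis; return the Axes."""
    # Imported here so the helper above works even without matplotlib installed
    import matplotlib.pyplot as plt  # noqa: F401 (pandas plotting needs matplotlib)

    # Transpose so that rows are years and columns are countries
    ax = years_as_int(df).T.plot(kind='line')
    ax.set_xlabel('Year')
    ax.set_ylabel('GDP per capita')
    return ax


# Hypothetical usage: only runs if the dataset has been downloaded and unzipped
csv_path = 'data/gapminder_gdp_oceania.csv'
if os.path.exists(csv_path):
    data = pd.read_csv(csv_path, index_col='country')
    plot_gdp_over_time(data)
```

In Colab the figure should appear below the cell. Converting the column labels to integers first lets the x-axis be treated as numeric years rather than text labels.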