---
tags: workshop, notepad
---

# Welcome to Data Carpentry Genomics Workshop UNR 2019

January 15-16th 2019
MIKC-107, UNR

Workshop lessons: https://unr-omics.readthedocs.io/en/latest/

---

Post-workshop survey: https://www.surveymonkey.com/r/dcpostworkshopassessment?workshop_id=2019-01-15-reno

---

## We will use this HackMD to share links and snippets of code, take notes, ask and answer questions, and whatever else comes to mind.

### Modes

The page displays a screen with three major parts:

* <i class="fa fa-edit fa-fw"></i> Edit: see only the editor.
* <i class="fa fa-eye fa-fw"></i> View: see only the result.
* <i class="fa fa-columns fa-fw"></i> Both: see both in split view.

## Announcements

Bathrooms: gender-neutral bathrooms are down the hallway.
Code of conduct: https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html

### Instructors:

* Elias Ozolor
* Sateesh Peri

### TA's:

* Richard Tillett
* Gurlaz Kaur
* Jeremiah Reyes
* Ning Chang
* Kyle Wang

### Introductions:

Name, Institution, Email (optional), Twitter (optional). What is the last thing that you made?

Yue Wang, UNR, ywang2@nevada.unr.edu, yue97734575
Marina MacLean, UNR, mmaclean@unr.edu, zucchini chicken enchiladas
Vanessa Gutierrez, UNR, igutierrez@med.unr.edu, garlic-lemon mahi-mahi
Stephanie Otto, UNR, sotto@nevada.unr.edu
Jessica Reimche, UNR, reimchej@gmail.com, omelette
Salome Manska, UNR, salomemanska@gmail.com, made my flight!
Tara Radniecki, UNR, tradniecki@unr.edu, cross-stitch of Avengers
Isadora Batalha, UNR, isadora.batalha@nevada.unr.edu, eggs
Anson Call, UNR, anson@nevada.unr.edu
Samantha Romanick, UNR, sromanick@nevada.unr.edu, yogurt
Chrystle Weigand, UNR, chrystleweigand@gmail.com, chow mein
Andrew Hagemann, UNR, andrew.hagemann@nevada.unr.edu, eggs
Chandra Sarkar, UNR, csarkar@nevada.unr.edu, https://twitter.com/genes_o_me, instant noodles
Joshua Hallas, UNR, jhallas@nevada.unr.edu
Mustafa Solmaz, UNR, msolmaz@nevada.unr.edu, tea!
Jess Danger, UNR Micro Dept, JDanger@med.unr.edu, last thing I made: knit dog sweaters
Bruna Alves, UNR, brunaalves@cabnr.unr.edu, tea
Avery Grant, UNR, averygrant@nevada.unr.edu, tea
Chanchanok (Saw) Sudta, UNR, csudta@nevada.unr.edu
Erica Shebs, UNR, eshebs@cabnr.unr.edu, chocolate cake!
Hayden McSwiggin, UNR, hmcswiggin@nevada.inr.edu
Lauryn Eggleston, UNR, leggleston@nevada.unr.edu, blueberry waffles!!!
Hedy Wang, UNR, hetanwa@163.com
Lana Sheta, UNR, lsheta@nevada.unr.edu
Jennifer Schoener, UNR, jschoener@nevada.unr.edu, coffee

General notes
==

* Sometimes logging in to Atmosphere can lag and take a moment. Sateesh advises letting the page try to load for a bit, and if it fails, trying again.
* **TODO: Add/link the bioconda lesson in the TOC.** Until then, it can be found at https://unr-omics.readthedocs.io/en/latest/bioconda-config.html

Questions for instructors/helpers?
===========

Q) _What size of instance on Atmosphere?_
A) **Medium2**

Q) What other types of data can you work with in CyVerse? For example, can you process neuroimaging (MRI) data using CyVerse if it's command-line based?
A) CyVerse has more capabilities than just Atmosphere; I will introduce CyVerse's Discovery Environment, which does not require command-line expertise.
A) But yes, if things are command-line based, you can process them in CyVerse.

Q) How do you decide how many CPUs and how much RAM?
A) It probably depends on your data and on the storage and hard-drive space required by the tools you will run on it.
A2) Beyond reading the papers, documentation, and forums related to a bioinformatics tool, it can help to simply test its performance on a single sample or another subset of your data. Often you can extrapolate from that to your entire set. -- RLT

Q) What are k-mers?
A)

Q) I understand the purpose of log-transforming RNA expression data, but why do we do a log2 transformation and not log10? Does it make any difference which base is used?

Log of shell commands
==

If you want more information on any command, type `man` before the command name and press enter (this takes you to the manual entry for that command).

### How to take notes:

Sateesh Peri
==

This is a [link](http://angus.readthedocs.io/en/2018/)

* point one
* point two
* point three

## Logging onto Atmosphere Cloud

Genomics data is too large to be handled by local resources, so there is a need for high-performance computing clusters, e.g. Pronghorn, UNR's HPC. But you cannot easily install software on HPCs, and you have to wait in a queue for your jobs to finish. An alternative is to use virtual machines in the cloud, such as CyVerse, which we are using for this workshop. You can get your own allocation for your lab for free by requesting it through a form from CyVerse. You can choose the number of CPUs, hard drive, storage, etc. on demand and delete the instance once you finish analysing your data.

We will start by setting up our CyVerse instances. Go to https://atmo.cyverse.org/application/images and log in. The website might take a minute to load. Go to the Projects tab and create a new project; you can name it anything you like, or DC_Genomics. Once that is done, go to the New tab and select Instance. There is already an image with all the software/tools needed for this workshop installed: go to the Show All tab and select the UNR-RNAseq image. Now we shop for the specifications of our instance (sounds fun!). Select the medium2 instance size, which is big enough to run all the lessons of this workshop; 16 GB of RAM is required for assembling the transcriptomes. Once you launch your instance, it takes quite some time for it to become active.

This virtual machine we set up is remote, so how do we keep our data safe? SSH is a secure way to do that. Secure Shell (SSH) is a cryptographic network protocol for operating network services securely over an unsecured network. Typical applications include remote command-line login and remote command execution, but any network service can be secured with SSH (Wikipedia definition).

### Create the RSA Key Pair

Now we need to set up a key to log in to our instances from the terminal on our local machine (our laptop). For that we will follow the commands from the lesson. Windows people should have MobaXterm running by now so that they can run command lines to talk to their operating system. Mac people can open up the Terminal and follow the lesson commands to generate a random key, which is a secure and easier way to log in to our instances.
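The lesson has the exact commands; as a minimal sketch of the key-generation step just described (using ssh-keygen's default file locations):

```bash
# generate an RSA key pair; accept the default location (~/.ssh/id_rsa)
# and optionally set a passphrase when prompted
ssh-keygen -t rsa

# print the PUBLIC key so it can be copied into Atmosphere's SSH Configuration
cat ~/.ssh/id_rsa.pub
```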
Once the key is generated we need to copy it into Atmosphere: below your user name go to Settings, click Show More, and paste your key into the SSH Configuration section. We now have a key on our local machine, and the matching public key is deposited in our instance, so we can log in to the instance from our laptops. The generated key is random and hard to crack, so your data should be safe.

Now we go back to the terminal on our laptop and type `ssh` followed by your CyVerse userid@<IP address of your instance>, then press enter. It will ask a few yes/no questions. Windows may ask for a password; this is your CyVerse password. Login is successful when you see a big "Atmosphere" banner in your terminal window; you are now logged in to your instance from your laptop.

For Windows: if you did not type the password correctly, or just followed the lesson and pressed enter instead of entering a password, you might get a "port 22: connection timed out" error. To solve this, delete your instance on Atmosphere and create a new instance with a new IP address. You don't need to generate keys again; the saved key stays the same. Just change the IP address after your userid in the terminal and try signing in again. A reminder for Windows users: the password is your CyVerse password.

## Using the command line

We follow this lesson in our terminal. We type a command and press enter to execute it.

* `ls` list items in the directory
* `ls -lh` gives a human-readable long list of the items with their sizes, the type of each item (directory, text file, etc.), and whether we can read or write them
* `man ls` opens the manual for the command typed after `man`. Type `q` to quit the manual page. Run `man ls` and try three of its `-` options.
* `ls -a` list all items, even the hidden ones (those with `.` in front of the file name)
* `ls -l` use the long listing format (show everything)
* `ls -d` list directories themselves, not their contents
* `ls -r` reverse the sort order while listing
* `ls -S` sort by file size, largest first
* `ls -t` sort by time, newest first
* `ls -ltu` this is `-u` with `-lt`: sort by, and show, access time; with `-l`: show access time and sort by name; otherwise: sort by access time, newest first
* `wc` word count
* `wc -l` count the number of lines
* `wc -c` count the bytes
* `wc -m` count the characters

We can use pipes `|` to push one command's output into another command.

* `ls | wc -l` counts the number of items in the current directory
* `pwd` prints the path of the current directory
* `cd` change directory
* `cd` followed by the path of a directory takes you to that directory
* `ctrl+c` kills any command that did not work

The **tab** key on your keyboard is very handy: it autocompletes the names of files/directories inside the current directory.

* `cd ..` takes you to the parent directory (one level up from the current directory)
* `cd ~` takes you to the home directory from anywhere
* `cd` on its own (just press enter) does the same thing
* `cd` followed by a full path can be used to jump between directories
* `ctrl+shift+C` can be used to copy in the terminal

An absolute path is the path you get from `pwd`: the complete address of a directory starting from the root directory. The command line is case sensitive, so a command typed in the wrong case will not work. Always be careful when deleting anything, because it does not go to any trash folder; it is deleted forever.
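Putting the navigation commands together, a small illustration of absolute versus relative paths (the directory names are only examples):

```bash
pwd                    # prints something like /home/username/work/data
cd /home/username      # absolute path: starts from the root directory /
cd work/data           # relative path: starts from the current directory
cd ../..               # go up two levels, back to /home/username
cd ~                   # jump straight to the home directory from anywhere
```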
* `mkdir` makes a new directory
* `touch filename` makes a new file, e.g. `touch simple.txt` creates a text file named simple.txt
* `rmdir directoryname` deletes the directory if it is empty
* `rm` means remove, but it can delete only files; on a directory you will get an error saying it is a directory and cannot be removed
* `rm -rf directoryname` force-deletes the named directory
* `*` is a wildcard that means "everything", so `rm -rf *` will delete everything in the current directory
* `nano document.txt` nano is a text editor that can be used to write inside the text file we just created, or any named text file. Write something; when you exit with `ctrl+x` it will ask whether you want to save the changes, so choose yes.
* `cat document.txt` spits out the contents of the file in the terminal window
* `cat document.txt | head` by piping into head, only the beginning of the file is shown, not all of it. Imagine you have a really big text file, say a huge book: `cat` would spit words onto the window forever, so `head` comes in very handy.
* `cat document.txt | head -n 3` lists only the first three lines of the document
* `cat document.txt | tail -n 3` prints only the last three lines of document.txt
* `rm document.txt` deletes the named text file
* `curl -O [weblink]` downloads the file into the current directory. curl literally means "copy URL".
* `mv` means move; it can be used to move files from one directory to another. If you use it within the same directory, it effectively renames the file, e.g. `mv oldfilename newfilename`
* `zcat` reads any gzipped file
* `zcat filename | grep "^>"` reads the file and prints only the lines that begin with `>`
* `zcat filename | grep "^>" | head` prints only the first few lines of the file that begin with `>`
* `grep` searches for PATTERNS in each FILE. PATTERNS is one or more patterns separated by newline characters, and grep prints each line that matches a pattern.
* `grep -c` the `-c` option counts how often the pattern occurs
* `zcat filename | grep "^>" | wc -l` counts the number of lines that start with `>`
* `zcat filename | tr "_" "." | grep "^>" | head` reads the file, pipes it to `tr` (which translates `_` to `.`), greps the lines starting with `>`, and shows only the head in the window
* `tr` can be used to change one particular character into another throughout a file, such as replacing `,` with `.`, or `,` with `\t` to change commas into tabs
* `zcat filename | grep -v "^>" | wc -m` `grep -v` means anything other than the pattern that follows (here `>`), so this command counts the characters in every line that does not start with `>`
* `zcat filename | grep -v "^>" | head` to check that it is doing what we want, i.e. excluding the header lines of the FASTA file
* `less` is an alternative to cat and is great for glancing at/through a file
* `ctrl+l` clears all the contents of the terminal window
* `ctrl+c` kills the running function or command

You can use the **history** command to view the commands you used before, e.g. to search for the commands containing your username:

* `history | grep "username"`

In order to do other jobs while you are running a program, you can use **screen** or **tmux**. That will give you another workspace in which to execute other jobs.

* `tmux` launches a "terminal multiplexer" and intercepts HUP (hangup) signals that might otherwise interrupt work when you close your laptop or lose internet
* `tmux list-sessions` (or `tmux ls` in super-duper shorthand) to check whether you have any tmux sessions running on the server
* `tmux attach -t 0` (or `tmux a -t 0`) to reconnect to a tmux session (in this case the session name is 0)

tmux cheat sheet: https://tmuxcheatsheet.com

* `screen` screen cheat sheet: https://gist.github.com/jctosta/af918e1618682638aa82

**sed** is the command for replacing or substituting strings, e.g. to replace 'TRINITY' with 'NEMA' in a file:

* `cat <filename> | sed 's/TRINITY/NEMA/g' > <new file name>` _<> is a placeholder for your required file names_
* `cat filename | grep "^>" | awk '{OFS=" "}{print $2}' | sed 's/len\=//g' | sort -rn | head` we cat the file and grab only the header lines, then use awk (treating the columns as space-separated) to print only column 2, which holds our transcript lengths. After that we use sed to replace the text "len=" with nothing, leaving only the transcript lengths, and then we sort numerically so we can see the sizes of our longest transcripts in decreasing order.

More information on sed, tr and other commands: https://astrobiomike.github.io/bash/six_commands#tr

## Bioconda

A useful link: Dr. Tillett's notes on conda: https://github.com/rltillett/conda_notes
Lesson link: https://unr-omics.readthedocs.io/en/latest/bioconda-config.html

Conda installations make life much, much easier: there is no need to install dependencies separately or to move tools into your PATH so that the operating system can find them. Bioconda is a channel for the conda package manager, specializing in bioinformatics software.

After installing bioconda (it's already installed on this instance), we need to let the instance know the path where conda lives:

`echo export PATH=$PATH:/opt/miniconda3/bin >> ~/.bashrc`

* the `>` sign is used for redirecting output to a file:
  * `>>` appends to a file
  * `>` overwrites the file

Run the **source** command to execute the contents of .bashrc:

`source ~/.bashrc`

Adding channels:

```
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda
```

You can use the **conda search** command to search for packages and display their information, e.g. search for a specific package named 'sourmash':

`conda search sourmash`

To install a package, you can use the **conda install** command, e.g.:

`conda install -y checkm-genome`

To list installed packages, use:

`conda list`

"Environments" are multiple different collections of installed software. Let's create a new environment named pony by using the **conda create** command:

`conda create -n pony`

To activate/deactivate the environment, use:

```
source activate pony
source deactivate pony
```

To list the environments, type:

`conda env list`

You can also save the list of software installed in this particular environment by typing:

`conda list --export > packages.txt`

## RNA-Seq Workflow

* Biological samples/library preparation -- technical replicates are not necessary; biological replicates are more important
* Sequence reads
* Quality control
* Map to the reference genome
* Count the reads
* Statistical analysis to identify differentially expressed genes

## Short read quality and trimming

Now, let's log in to your Atmosphere computer. Make sure you've added conda to your PATH and activated it with the `source` command.
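A typical start-of-session checklist, pulling together the steps covered above (the username, IP address, and session number are placeholders):

```bash
# log back in to your instance (CyVerse username and the instance IP from Atmosphere)
ssh username@<instance-ip>
# start or re-attach a tmux session so work survives a dropped connection
tmux          # or: tmux attach -t 0
# make sure conda is on your PATH for this session
source ~/.bashrc
```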
Create the directories and subdirectories:

`cd ~/`
`mkdir -p work work/data`
`cd work/data`

The `-p` option of the mkdir command creates the specified intermediate directories for a new directory if they do not already exist.

Download a subset of the data by using the curl command:

`curl -O http://where.is.my/FILENAME.zip`

Unzip the file by using the unzip command:

`unzip [FILENAME.zip]`

Define your $PROJECT variable:

`export PROJECT=~/work`

Then you can check your data by typing:

`ls $PROJECT/data`

Use the **less** command to view your fastq files: https://en.wikipedia.org/wiki/FASTQ_format

Print out how many data files are in the PROJECT location by using the **printf** command:

`set -u`
`printf "\nMy raw data is in $PROJECT/data/, and consists of $(ls -1 ${PROJECT}/data/*.fastq | wc -l) files\n\n"`
`set +u`

Link your data into your working directory by using the **ln** command with the **-s** option, which avoids having to make a copy of the files that would take up storage space, e.g.:

`ln -s ../data/*.fastq .`

The dot at the end (`.`) means "right here" (the current directory).

Run the **FastQC** program on the files that end with .fastq:

`fastqc *.fastq`

The **scp** (secure copy) command lets you copy files from the remote server to your personal laptop. The first argument after scp is your username@<Atmosphere IP address>: followed by the full path of the files. The second argument is the path where the files should be placed on your own computer.

### Trim the sample:

Create a directory named trim, and navigate to that directory.

```
cd ..
mkdir trim
cd trim
```

Link the fastq data right here (`.`):

`ln -s ../data/*.fastq .`

Create a .fa file containing the adapter sequence information:

`cat /opt/miniconda3/share/trimmomatic*/adapters/* > combined.fa`

Use a **for** loop to run the Trimmomatic program (a concrete sketch follows below). The basic concept of a for loop is to do the same thing for each _thing_ in a _list of things_:

```
for thing in [a list of things]
do
  trimmomatic
done
```

Here is the info for Trimmomatic: http://www.usadellab.org/cms/?page=trimmomatic
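As a concrete, hypothetical version of that loop: the real trimming parameters and file names come from the lesson page, so the ones below (paired-end files named like `sample_R1.fastq`/`sample_R2.fastq`, plus the `combined.fa` adapter file created above) are illustrative only.

```bash
# loop over the R1 files and derive the matching R2 name for each sample
for r1 in *_R1.fastq
do
  r2=${r1/_R1/_R2}                  # matching R2 file
  base=$(basename "$r1" _R1.fastq)  # sample name without the suffix
  trimmomatic PE "$r1" "$r2" \
    "${base}_R1.trim.fastq" "${base}_R1.unpaired.fastq" \
    "${base}_R2.trim.fastq" "${base}_R2.unpaired.fastq" \
    ILLUMINACLIP:combined.fa:2:30:10 \
    SLIDINGWINDOW:4:5 LEADING:3 TRAILING:3 MINLEN:25
done
```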
## GitHub: How not to lose your entire analysis!

GitHub allows version control of all your analysis: if you make a change and want to go back to an older script without losing the earlier steps, saving older versions of your scripts with git lets you choose and work with different versions.

First of all we need a GitHub account. Here is the link for GitHub: https://github.com
Link for a git commands cheat sheet: https://services.github.com/on-demand/downloads/github-git-cheat-sheet.pdf

Git is a program that can initiate your repositories.

* `git init` initializes an empty repository for git
* `ls -lah` shows the created repository
* `git config --global user.name "your human name goes here"` **tells git who you are**
* `git config --global user.email "your email goes here"` **tells git where it can find you**
* `git status`
* `git add` followed by a directory or file name
* `git commit -m "Trimmed and quality control files for UNR workshop"` commits all staged changes with the comment/message inside the quotes. Commit messages are usually required, so use `-m` or git will try to force you to add one (it will even launch the tricky `vi` to make you write one, so use `-m`!)
* `git log` gives you the unique identifier that tells who made which commit

Now we go to our GitHub web account and create a new repository. You can name it anything, or UNR-workshop. Then create this public repository.

* `git remote add origin` followed by the URL of the repository that we created above on the website
* `git push -u origin master` pushes all the files that you committed into UNR-workshop. You can go to the website and check the uploaded files. Good job!

## Day 2 Workshop

Please put down your name below to mark your attendance:

Sateesh Peri
Elias Ozolor
Ning Chang
Vanessa Gutierrez
Salome Manska
Andrew Hagemann
Richard Tillett
Chandra Sarkar
Mustafa Solmaz
Jessica Reimche
Marina MacLean
Chanchanok Sudta
Erica Shebs
Jess Danger
Edgar Torres
Lauryn Eggleston
Hayden McSwiggin
Kyle Wang
Lana Sheta
Jennifer Schoener

## De novo transcriptome assembly

How do you assemble your transcriptome? When you build a transcriptome, you should generally include all the possible variation in your samples (e.g. different life stages, different tissues, etc.). There are two ways to do transcriptome assembly: splice-aware alignment to a reference genome, and de novo transcriptome assembly.

Trinity is one assembler. Trinity works in k-mer space (k=25); different assemblers start with different k-mers. How do you decide among different k-mers?

Trinity has four stages:

* Jellyfish extracts and counts k-mers from the reads
* Inchworm
* Chrysalis
* Butterfly

Let's try to assemble a transcriptome now. We will follow the lesson notes from here on: https://unr-omics.readthedocs.io/en/latest/transcriptome-assembly.html

**Contig Nx values**: rank the contigs by length; Nx is the contig length at which the cumulative length reaches x% of the total assembly.

**Annotation** https://angus.readthedocs.io/en/2018/dammit_annotation.html
**Evaluation** https://dibsi-rnaseq.readthedocs.io/en/latest/evaluation.html

## Read Quantification

Once we have the aligned reads, the next step is to count them. Indexing is the step that extracts the information from your transcripts (or genome).

Kallisto-Sleuth tutorial: https://sateeshperi.shinyapps.io/kallisto-sleuth/

## Differential expression analysis with DESeq2

Before doing DE analysis, the read counts should first be normalized. After normalization, unsupervised clustering analysis should be performed for quality control.

Let's run RStudio through its server interface in your browser, using the link printed by:

`$ echo http://$(hostname):8787/`

What is gene dispersion? (Answered by Michael Love, author of DESeq, and Gordon Smyth, author of edgeR.) https://support.bioconductor.org/p/75260/

### Working in Rstudio:

Here is the link for R and RStudio: https://datacarpentry.org/R-ecology-lesson/00-before-we-start.html
Here is the link for DESeq2 in RStudio: https://unr-omics.readthedocs.io/en/latest/DE.html#working-in-rstudio

* `library()` load a package
* `setwd()` set the working directory
* `<-` or `=` to assign a value
* `?` In R, you can always use ? to get help on a function.
* `list.files()` print out a vector of the names of files
* `file.path()` a way to build the path to a file
* `read.csv()` a function that lets you read in data (creating a data frame)
* `colnames()` view the column names of the data
* `head()` view the first part of the contents
* `dim()` view the dimensions of an object or data frame
* `tximport()` import transcript-level abundances and counts
* `DESeqDataSetFromTximport()`
* `DESeq()`
* `plotDispEsts()` plot the dispersion estimates
* `rlog` log-transformed values
* `plotPCA` to tease the data apart based on their grouping/treatments

PCA (Principal Components Analysis) is another way to visualize sample-to-sample distances.
It is part of the quality control of your RNA-Seq data.

* `results()` extract the results from a DESeq analysis

### Other links on R for further learning:

https://dss.princeton.edu/training/RStudio101.pdf
https://www.rstudio.com/online-learning/
http://web.cs.ucla.edu/~gulzar/rstudio/basic-tutorial.html
https://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf
http://datacarpentry.org/semester-biology/lectures/

### Useful link for DESeq2

https://bioconductor.org/packages/release/bioc/manuals/DESeq2/man/DESeq2.pdf

Fancy volcano plots: https://bioconductor.org/packages/release/bioc/vignettes/EnhancedVolcano/inst/doc/EnhancedVolcano.html

https://www.datacamp.com/courses/rna-seq-differential-expression-analysis

Jessica Reimche
===

How do you decide how many CPUs and how much RAM? (It probably depends on your data and on the storage and hard-drive space required by the tools you will run on it.)

Chandra Sarkar
===

CyVerse is FREE!!! Yeah!

#document/text editor nano::
$ nano document_name
#reads the first 10 lines (default) of the document::
$ cat document.txt | head
#reads the first 3 lines of the document::
$ cat document.txt | head -n 3
#reads the last 3 lines of the document::
$ cat document.txt | tail -n 3
#curl = copy (any) URL
#zcat = reads any zipped document
#visualizing the header lines::
$ zcat fhet.tr.fna.gz | grep "^>" | head
$ zcat fhet.tr.fna.gz | grep "^>" | wc -l
#translating one character into another::
$ zcat fhet.tr.fna.gz | tr "_" "." | grep "^>" | head
$ zcat fhet.tr.fna.gz | grep -v "^>" | wc -m
#for double-checking the previous command::
$ zcat fhet.tr.fna.gz | grep -v "^>" | head
#one redirect symbol (>) replaces the contents of the entire file
#two redirect symbols (>>) do not replace the contents of the file but append to it instead
#set -u makes a bash script terminate the first time it hits an undefined variable; set +u turns that check back off
#IMP = for viewing the files as HTML after secure copy::
firefox ~/Desktop/nema_fastqc/0Hour_ATCACG_L002_R1_001_fastqc.html
#IMP = every time you are kicked out of the ssh connection, you need to define your variables again, OR you can modify the bash profile once and declare them there.

DAY 2
- - - - - - -

For de novo assembly, it is better to use a smaller number of organisms so that, during assembly, the assembler does not confuse original bases with polymorphic bases. Trinity is just one assembler; most people use multiple assemblers and combine the results to derive the final assembly.

#declaring the variable again after being logged out of the workspace
export PROJECT=~/work
#why use the time command with Trinity? It reports how long the program took to run, which gives an idea of what to expect when you run it again on a dataset of a different size.
#to save your work from getting broken if you lose the connection
$ tmux list-sessions
$ tmux attach -t 0   ### because the session number is 0
$ exit               ### exit
$ tmux
#The number of transcripts will ALWAYS be different. Always use the latest version, and use CONDA.
#if the parent directory does not exist, use -p with mkdir
$ mkdir -p test1/test2
#The index command extracts a file that is easier to scan and does not slow down the computer. Because everything is pulled into memory, the index file speeds up the whole mapping process.
#In the for loop for salmon it is not strictly necessary to extract the basename and do the other steps, BUT it is used for step checks and to confirm that each file is accessed; it's basically QC.
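To make that last point concrete, here is a hypothetical sketch of a salmon quantification loop that uses `basename` only to build readable output names and to echo which sample is being processed; the index name, file naming, and options are assumptions, so follow the lesson for the real command.

```bash
# assumes an index built beforehand, e.g.: salmon index -t trinity_assembly.fa -i transcripts_index
for r1 in *_R1.trim.fastq
do
  sample=$(basename "$r1" _R1.trim.fastq)   # step check: which sample is this?
  echo "quantifying ${sample}"
  salmon quant -i transcripts_index -l A \
    -1 "$r1" -2 "${sample}_R2.trim.fastq" \
    -o "${sample}_quant"
done
```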
Stephanie Otto
===

What other types of data can you work with in CyVerse? For example, can you process neuroimaging (MRI) data using CyVerse if it's command-line based?

Anthony Harrington
===

zcat fhet.tr.fna.gz | tr "_" "." | grep "^>" | head
#convert to tab delimited?
tr "," "\t"
#to get fasta size
zcat fhet.tr.fna.gz | grep -v "^>" | wc -m
cat nema-transcriptome-assembly.fa | sed 's/TRINITY/NEMA/g'

# Resources

Shell novice guide: https://unr-dcg19.slack.com/archives/CDZ9692LA/p1547588745016300
https://www.tldp.org/LDP/Bash-Beginners-Guide/html/
https://linuxconfig.org/bash-scripting-tutorial-for-beginners

6 unix commands worth knowing: https://astrobiomike.github.io/bash/six_commands#tr

Datacamp RNAseq analysis PDFs:
https://unr-dcg19.slack.com/files/UDYGWD2RE/FFEPGL885/datacamp_rnaseq1.pdf
https://unr-dcg19.slack.com/files/UDYGWD2RE/FFDV6R8TB/datacamp_rnaseq2.pdf
https://unr-dcg19.slack.com/files/UDYGWD2RE/FFDR9SMA4/datacamp_rnaseq3.pdf
https://unr-dcg19.slack.com/files/UDYGWD2RE/FFDR9USHW/datacamp_rnaseq4.pdf

Datacamp course on RNAseq DE analysis: https://www.datacamp.com/courses/rna-seq-differential-expression-analysis
