# Compulsory Assignment 1 - BINF200 H20

This is a shared Q&A document for work on Compulsory Assignment 1 of BINF200 at the University of Bergen in autumn 2020.

__Ask questions at the bottom__ of the document. You can ask anonymously or add your name.

## Links and contact

* [Information at MittUiB](https://mitt.uib.no/courses/23753/discussion_topics/182722)
* Assignment instructions (in [PDF](https://mitt.uib.no/files/2713025/download?download_frd=1) or [DOCX](https://mitt.uib.no/files/2713024/download?download_frd=1))
* [Galaxy server at UiB](https://galaxy-uib.bioinfo.no/)
* Interactive sessions on Zoom, Wed 9 Sep 08:30-12:00 and Thu 10 Sep 08:30-10:00, where you can work on the assignment and get immediate answers to your questions (using Zoom breakout rooms and chat). [See connection details at MittUiB](https://mitt.uib.no/courses/23753/discussion_topics/182722) (different Zoom link than lectures!)
* [Recorded video introduction](https://mitt.uib.no/courses/23753/files/2714332/download?wrap=1) from the 1st Zoom session (19 min, 160 MB).
* This HackMD document for shared Q&A, which will be available until "forever". [name=Matus] will keep answering your questions here, and feel free to "+1" others' questions or answer them yourself if you have a good answer.
* If you want to ask privately, you can do so during the Zoom sessions (in your breakout room or private chat), or anytime afterwards by messaging [name=Matus Kalas] on MittUiB or emailing matus.kalas@uib.no
* If you do not have good working conditions at home, or need a bigger screen or a desktop computer, you can use one of the computer rooms in Realfagbygget ([1003](http://use.mazemap.com/?v=1&campuses=uib&sharepoitype=identifier&sharepoi=308:1B6D) is booked for us during the Zoom session, [1001](http://use.mazemap.com/?v=1&campuses=uib&sharepoitype=identifier&sharepoi=308:1C7A) and [1002](http://use.mazemap.com/?v=1&campuses=uib&sharepoitype=identifier&sharepoi=308:1B7A) are empty most of the time). Of course, only as long as you're not quarantined or isolated :( Note that the university premises are now closed at night, between midnight and 6am.

## How to edit in HackMD

If you see this page in **view mode**, which is probably what you get after opening it, click on *Edit* at the top. Later you can always do either of the following to enable editing:

1. **Edit mode**: Ctrl+Alt+E, or click the button with a pencil on the top left to switch to edit mode
2. **Both mode**: Ctrl+Alt+B, or click the button with a square with a vertical line in the middle (among the 3 buttons on the top left). Nice with a big-enough screen.

## Immediate feedback

Add any feedback here, for immediate improvements.

* [name=Matus]: Sorry for not watching the MittUiB messages for the first 2 hours of the first Zoom session :/
*
*
*

## Questions and answers

Please **ask new questions at the bottom** of this document. You can ask anonymously, or write your name.

* writing e.g. `[name=Lisa]` will show up as [name=Lisa]

-------------

* A sample question: How to write a question using **markdown** in HackMD?
  * You can just add a new line or a couple, and you can add a bullet point (`*`), use **bold** or *italics*, [links](https://mitt.uib.no), etc.
    * You can nest and indent bullet points under each other like this (using tab or at least 2 spaces)
* [name=Matus] Another question: What to ask first?
  * ...
    * ..
      * blahblah
* [name=Henrik Ø. Søgaard] I have logged in to NeLS but need access to the assignment data.
  * You have it now, please refresh. If the 'Send to Galaxy' button isn't there underneath where you're selecting your 2 data files, then please start from [Galaxy](https://galaxy-uib.bioinfo.no/) again, with *Get files from NeLS storage*
* [name=Jane] Hi, I have a question. I am not quite sure how to choose the Input FASTQ file (R1/first of pair) and R2. Should I use the common shared file SRR5412556 as R1 and the file passed to me as R2?
  * This sounds like you might be using the option for "Paired-end". Please use the "Single-end" option instead
  * Yes, but in Assignment 1 there is a choice to use paired: either select your 2 data sets one at a time (‘Single dataset’ switch), or select them both at once in the second,
    * I see. You can select both of your datasets (Trimmomatic results) in the "Multiple datasets" option.
    * But in any case, please select "Single-end" in the question 'Is this a single or paired library'
  * OK, so I have to try it again and just choose one of them?
    * For simplicity, you can just use 'Single dataset' and 'Single-end'. THESE ARE 2 DIFFERENT THINGS. Another option would be using 'Multiple datasets' but still 'Single-end'
  * I read the paper yesterday. It is still not so clear to me. Should I now build 2 libraries from these 2 datasets, is that right?
    * No no, you don't need to create any libraries. We are just re-analysing previously sequenced RNA-Seq reads, in a rather simplified way.
* [name=Jane] I have another problem: can you send us a link to HISAT2? I have tried to find the NGS homepage in Google and to search for RNA analysis, but I cannot find any page where I can search for HISAT2
  * A general website with info about HISAT2 is https://daehwankimlab.github.io/hisat2/
  * In Galaxy-UiB, you find it under 'NGS: RNA analysis'
* [name=Jane] Another question is about how to get the information about SRR5412556. I go to the SRA homepage and search for the ID on the Run Browser page, is that right? I read on the SRA page that SRR is an ID for the Run Browser, though I do not quite understand what that means. Anyway, after I went to the page, I got a link to BLAST; is that the link we should open to get more information about SRR5412556? I have to say, those terms are so unfamiliar to me that it is very difficult to figure out what information to get and where.
  * Oh, I understand that those websites with a lot of terminology and jargon can feel very confusing for everyone who isn't used to them.
  * However, you don't need to go too deep into browsing through all the complicated stuff.
  * Simplest suggestion: When you are at the SRA website, there is a SEARCH bar at the top. There you can paste your ID, e.g. SRR5412556, and see what it returns.
  * I did that already yesterday, and it shows an experiment, and something else. If I don't quite understand the meaning of the ID, how can I figure out what the relationship is between the experiment and my ID?
    * The search result on SRA will show you what this study is about (in 14 words), what organism the RNA comes from, how long all the sequence reads are together (number of bases), and how big the data set is. That's all you need to know for successfully finishing the 1st assignment.
  * OK, thanks a lot for now, I will go to those pages and see what I can do today.
    * Alright, all the best and feel free to ask more questions if you need ;) You're welcome!
* [name=Matus] TO EVERYONE: It looks like there is a bug in the current version of Galaxy-UiB when using the tool __Line/Word/Character count of a dataset (Galaxy Version 1.0.0)__: it does not work on the original **compressed** raw reads (_fastqsanger.gz_ format). But it works fine on uncompressed reads such as the results of Trimmomatic. (For the compressed raw reads, you can try uncompressing them with some tool in Galaxy, or converting them to FASTA format)
* [name=Jane] Where can I find the number of sequences after trimming? I can only see the size of the data.
  * You can count the number of lines in the file (search e.g. for 'count' in Galaxy), and divide by how many lines per sequence there are (look at the file). Feel free to ask for more clues if this didn't help
  * OK, thanks!
* [name=Håvard]
*
*
*
*
*
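
If you have downloaded a reads file to your own computer, the "count the lines and divide" hint above can also be sketched outside Galaxy. This is only a minimal illustration, not part of the assignment instructions. It assumes the standard unwrapped FASTQ layout with 4 lines per read (header, sequence, `+`, qualities), so please check that against your own file. The function name `count_fastq_reads` and the file name in the usage comment are made up for this example.

```python
import gzip

# Assumption: standard unwrapped FASTQ, i.e. each read takes exactly 4 lines
# (header starting with '@', sequence, '+' separator, quality string).
LINES_PER_READ = 4


def count_fastq_reads(path):
    """Return the number of reads in a FASTQ file (plain or .gz)."""
    # Pick the opener by file extension; gzip.open in text mode reads
    # compressed files line by line without unpacking them on disk.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        # Count non-empty lines only, so a trailing blank line is ignored.
        n_lines = sum(1 for line in handle if line.strip())
    if n_lines % LINES_PER_READ != 0:
        raise ValueError(
            f"{n_lines} lines is not a multiple of {LINES_PER_READ}; "
            "the file may be truncated or not in 4-line FASTQ format"
        )
    return n_lines // LINES_PER_READ


# Usage (hypothetical file name):
# print(count_fastq_reads("trimmed_reads.fastq"))
```

Reading the compressed file through `gzip.open` sidesteps the problem that the Galaxy count tool has with _fastqsanger.gz_ input, but only for files you have locally.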