# Revision session - Day 3

## Questions

## Feedback

### Rooms

Poll results: 25% No, 25% Yes, 50% Undecided.

Trial: We will create four rooms; join a room according to your preference based on the table below:

|Room number|Preference|
|---|---|
|1|Individual work|
|2|Some interaction|
|3|More interaction|
|4|Collaborative work|

### Project discussion

We repurposed the `# introductions` channel in Slack for discussing projects. Other suggestions would be to have:

1. elevator pitches: one slide, 3 minutes, to present your project
2. a get-together open discussion session
3. finding common interests and having a few get-together sessions on said topics, with Bastian and myself as a panel and participants discussing issues related to that topic

## Assessment

1. Why is it important to process the forward (`_1`) and reverse (`_2`) reads simultaneously when rRNA filtering or trimming paired-end data?

   Let's think that through for a minute. With paired-end data you have two files, one containing all the forward reads and the other all the reverse reads. For a given sequenced fragment, how do we know which forward read corresponds to which reverse read in the two separate files?

   * the read ID
   * the physical location in the file

   Of these two, which is the easiest to look up? Empirically, the second. Now, filtering out rRNA or trimming the data might remove a read from either file, _de facto_ breaking the ordering in the files. Because all downstream tools, whether assemblers or (pseudo-)aligners, rely on that ordering, it is crucial to keep the mate reads (the forward and reverse reads coming from the same fragment) at an identical physical position in both files.

2. Describe the principle of de novo transcriptome assembly in one sentence.

   An incomplete jigsaw puzzle :-D that aims at reconstructing the transcriptome _in silico_ from sequenced fragments.

3. What is the advantage of the digital normalisation of the data?

   The advantages of digital normalisation are numerous:

   * reduced computational resources
   * reduced assembly fragmentation due to sequencing errors
   * increased representation and accuracy of lowly expressed transcripts in the final assembly

4. Should digital normalisation be applied when doing expression quantification (not assembly!)? True/**FALSE**

   NO. Digital normalisation cuts away the dynamic range of the data, and while this is advantageous for assembly, it would be detrimental to expression profiling. Every transcript expressed above the selected coverage cutoff would be brought down to the cutoff.

5. Is it necessary to sequence as deep as possible for the purpose of de novo transcriptome assembly?

   1. Of course!
   2. **It depends, but probably too little risks an incomplete assembly while too much risks a fragmented assembly**
   3. No, shallow is enough.
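
To make the answer to assessment question 1 concrete, here is a minimal sketch of what goes wrong when the two files are filtered separately, and of the fix. Everything in it is an assumption for illustration: the reads are toy `(ID, sequence)` tuples rather than real FASTQ records, the length filter stands in for any trimming or rRNA filtering step, and the function names are hypothetical, not taken from any tool.

```python
from typing import Callable, List, Tuple

Read = Tuple[str, str]  # (read ID, sequence); qualities omitted for brevity


def filter_independently(reads: List[Read], keep: Callable[[str], bool]) -> List[Read]:
    """Filter one file on its own, ignoring the mate file."""
    return [read for read in reads if keep(read[1])]


def filter_pairs(
    forward: List[Read],
    reverse: List[Read],
    keep: Callable[[str], bool],
) -> Tuple[List[Read], List[Read]]:
    """Keep a pair only if both mates pass `keep`, so positions stay aligned."""
    if len(forward) != len(reverse):
        raise ValueError("forward and reverse inputs must hold the same number of reads")
    kept_fwd, kept_rev = [], []
    for fwd, rev in zip(forward, reverse):
        if keep(fwd[1]) and keep(rev[1]):
            kept_fwd.append(fwd)
            kept_rev.append(rev)
    return kept_fwd, kept_rev


# Toy data: frag2 has a short forward read, frag3 a short reverse read.
forward = [("frag1", "ACGTACGT"), ("frag2", "ACG"), ("frag3", "TTGACCAA")]
reverse = [("frag1", "TGCATGCA"), ("frag2", "GGCATTAC"), ("frag3", "CCA")]


def long_enough(seq: str) -> bool:
    """Hypothetical filter: keep reads of at least 5 bases."""
    return len(seq) >= 5


# Filtering each file separately: position 2 now holds frag3 (forward)
# next to frag2 (reverse), so the mates no longer line up.
naive_fwd = filter_independently(forward, long_enough)
naive_rev = filter_independently(reverse, long_enough)
print([r[0] for r in naive_fwd], [r[0] for r in naive_rev])

# Filtering both files together: only frag1 survives, in both files.
paired_fwd, paired_rev = filter_pairs(forward, reverse, long_enough)
print([r[0] for r in paired_fwd], [r[0] for r in paired_rev])
```

This sketch simply drops a pair when either mate fails. In practice the filtering and trimming tools take care of this themselves, provided they are given both files in the same run.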
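
The answer to assessment question 4 can be illustrated with a toy calculation. This is a sketch of the *effect* on coverage only, not of how digital normalisation is actually implemented: the coverage values, the cutoff of 20 and the function name are all made up for illustration.

```python
from typing import Dict


def cap_coverage(coverage: Dict[str, int], cutoff: int) -> Dict[str, int]:
    """Toy model: any coverage above the cutoff is brought down to the cutoff."""
    return {name: min(depth, cutoff) for name, depth in coverage.items()}


# Hypothetical per-transcript coverage before normalisation.
coverage = {"low": 5, "medium": 40, "high": 2000}
capped = cap_coverage(coverage, cutoff=20)
print(capped)  # {'low': 5, 'medium': 20, 'high': 20}

# The 50-fold difference between "high" and "medium" disappears.
fold_before = coverage["high"] / coverage["medium"]  # 50.0
fold_after = capped["high"] / capped["medium"]  # 1.0
print(fold_before, fold_after)
```

The assembler still sees every transcript, which is all it needs, but the expression differences above the cutoff are gone. That is why quantification should be run on the non-normalised reads.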