Meeting Notes 2020/05/08
===
###### tags: `Meeting`

:::info
**Agenda**
1. Plan with HackMD self-hosted instance @mb706?
2. Tasks from last meeting `15min` [name=All]
3. [Pad for collection of ideas](https://hackmd.io/XM4QM-yLTY2QWpuFVTjGEw?both) `10min` [name=Philipp]
4. [Benchmark paper](https://hackmd.io/@genomenet/BJJfhLZ5I) `10min` [name=Philipp]
5. Weekend: [Don't Starve Together](https://www.klei.com/games/dont-starve-together)? `5min` [name=Philipp]

**Participants**
- Julia (JM)
- Martin (MB)
- René (RM)
- Philipp (PM)
:::

:closed_book: Tasks from last meeting
--
- [x] fix the OOM error caused by the TensorBoard ggplot confusion matrix and remove this part from *deepG* / RM
- [x] explore _stateful_ LSTM / RM
- [x] move the new negative dataset for CRISPR detection to luna and remove overlap / PM
- [x] Stateful LSTM
    - [x] sensible values for batch size and step size
    - [x] add callback to reset graph?
- [x] Wavenet
    - [x] hyperparameter tuning
    - [x] stop runs that are not getting better? [TB grid](http://bioinf026:8097/#)

:open_book: New tasks
--
- [ ] post example code that works on the CRISPR binary task for training and inference and send it to JM / RM

## Notes
<!-- Other important details discussed during the meeting can be entered here. -->

### Neg. dataset for CRISPR prediction

```
$ bowtie2 -p 50 -x all_hmp1-II_reads -f reads_neg.fasta -S alignment.sam --un reads_neg_filtered.fasta
68818787 reads; of these:
  68818787 (100.00%) were unpaired; of these:
    67992983 (98.80%) aligned 0 times
    189394 (0.28%) aligned exactly 1 time
    636410 (0.92%) aligned >1 times
1.20% overall alignment rate
```

The filtered dataset is at `/net/scratch/nobackup/pmuench/reads_neg_filtered.fasta` and should contain the 67,992,983 sequences that aligned 0 times.

### _stateful_ RNN

- should `step.size` be `length(sequence) - 1` or `length(sequence)`?
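The practical difference between the two candidate values is whether consecutive windows are disjoint or overlap by one character. A plain-Python sketch of both options (illustrative only; `windows` is a made-up helper, not deepG's API):

```python
# Illustrative only: sliding windows over a toy sequence with two
# candidate strides. This is NOT deepG code.

def windows(seq, size, stride):
    """Return consecutive windows of `size` characters, `stride` apart."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, stride)]

seq = "abkajlfjioajfeijaflijfladkjfdx"  # 30-char toy sequence from the notes

# stride == window size: disjoint chunks, as in the batch diagrams below
print(windows(seq, 10, 10))  # ['abkajlfjio', 'ajfeijafli', 'jfladkjfdx']

# stride == window size - 1: consecutive windows overlap by one character,
# so the last input character of one chunk is the first of the next --
# relevant when the target is the character following each window
print(windows(seq, 10, 9))
```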
#### "Batch size 1" ``` +------------------------------+ |abkajlfjioajfeijaflijfladkjfdx| +------------------------------+ ``` ``` Batch 1 Batch 2 Batch 3 +----------+----------+----------+ |abkajlfjio|ajfeijafli|jfladkjfdx| +----------+----------+----------+ ``` #### "Batch size 2" ``` seq 1 +------------------------------+ |abkajlfjioajfeijaflijfladkjfdx| +------------------------------+ seq 2 +------------------------------+ |ewjio;jaijfkaj;fkejak;jdfdjfds| +------------------------------+ ``` ``` Batch 1 Batch 2 Batch 3 +----------+----------+----------+ |abkajlfjio|ajfeijafli|jfladkjfdx| + + + + |ewjio;jaij|fkaj;fkeja|k;jdfdjfds| +----------+----------+----------+ ``` #### "Batch size 2" alternative? ``` +------------------------------+ |abkajlfjioajfeijaflijfladkjfdx| +------------------------------+ ``` ``` Batch 1 Batch 2 Batch 3 +----------+----------+----------+ |abkajlfjio|ajfeijafli|jfladkjfdx| + + + + |ajlfjioajf|eijaflijfl|adkjfdx...| # shift by x +----------+----------+----------+ ``` (I think this is not a good idea) - use small batch size (1-2) and big network sizes > add callback to reset graph? - best solution would be to reset as soon as a new file will be used - maybe just reset after x batches? - or not reseat it at all ### Wavenet - try out networks with more parameters (e.g. 10M) - in TB wave_bifo26_* - most influenze `initial_kernel` size / `initial_filters`