# Giant weta assembly Updates

## Running fmlrc on Mahuika

**Script for jobid 14270030**

```bash
if [ ! -f ${asmdir}/weta_msbwt.npy ]; then
    gunzip -c ${illuminadir}/*.fastq.gz | awk "NR % 4 == 2" | sort -T $TMPDIR | tr NT TN | ropebwt2 -LR | tr NT TN | fmlrc-convert ${amsdir}/weta_msbwt.npy
fi

# run fmlrc
NUM_PROCS=24
fmlrc -p $NUM_PROCS -e 400 ${asmdir}/weta_msbwt.npy ${datadir}/weta_?.fasta ${asmdir}/corrected_final.fa
```

**Chris' code that worked**

```bash
awk 'NR % 4 == 2' Trimmed_Hericium_all.fastq | sort | tr NT TN | ropebwt2 -LR | tr NT TN | fmlrc-convert all.npy
```

**fmlrc command line execute**

```bash
fmlrc -p 20 all.npy HKGP-3.6.1-alldata.fasta HKGP-3.6.1-alldata-FMLRC-corrected.fasta
```

**End of Chris' code that worked**

* **Script as it currently stands**

```bash
if [ ! -f ${asmdir}/weta_msbwt.npy ]; then
    gunzip -c ${illuminadir}/*.fastq.gz | awk "NR % 4 == 2" | sort -T $TMPDIR | tr NT TN | ropebwt2 -LR | tr NT TN | fmlrc-convert ${amsdir}/weta_msbwt.npy
fi
```

**Annabel: Split FMLRC**

```bash
/home/awhi701/nobackup_02613/MYNA_SRA/TrimGalore_out/13099_extra.R2_val_2.fq.gz | awk 'NR % 4 == 2' | sort | gzip > FMLRC_13099_extra_trimmed_R2.sorted.txt.gz
```

**MD's attempt at split fmlrc**

```bash
if [ ! -f ${asmdir}/weta_msbwt.npy ]; then
    awk "NR % 4 == 2" ${illuminadir}/H07456-L1_S1_L001_R1_001.fastq | sort | gzip > H07456-L1_S1_L001_R1_001_sorted.txt.gz
fi

tr NT TN H07456-L1_S1_L001_R1_001_sorted.txt.gz | ropebwt2 -LR | tr NT TN | fmlrc-convert ${amsdir}/weta_msbwt.npy
```

**conda location**

/nesi/project/landcare00070/mahuika_project/modules

```
vi ~/.condarc
```

```
channels:
  - bioconda
  - conda-forge
  - defaults
  - etetoolkit
pkgs_dirs:
  - /nesi/project/landcare00070/mahuika_project/modules/pkgs
create_default_packages:
  - setuptools
envs_dirs:
  - /nesi/project/landcare00070/mahuika_project/modules
```

***

**Updates since 18 August 2020**

**FMLRC**

* Job running, waiting for output
* 14270673  manpreet  ga03048  fmlrc-correct  R  None  2020-08-18T15:07:15  1:42:36  2-22:17:24  1  24
* This job failed with the following error:

```
tr: extra operand ‘H07456-L1_S1_L001_R1_001_sorted.txt.gz’
Try 'tr --help' for more information.
[M::main_ropebwt2] inserted 1 symbols in 0.001 sec, 0.000 CPU sec
[M::main_ropebwt2] constructed FM-index in 0.002 sec, 0.000 CPU sec
[M::main_ropebwt2] symbol counts: ($, A, C, G, T, N) = (1, 0, 0, 0, 0, 0)
[M::main] Version: r187
[M::main] CMD: ropebwt2 -LR
[M::main] Real time: 0.002 sec; CPU: 0.009 sec
/var/spool/slurm/job14270673/slurm_script: line 37: 54178 Exit 1                 tr NT TN H07456-L1_S1_L001_R1_001_sorted.txt.gz
     54179 Done                    | ropebwt2 -LR
     54180 Done                    | tr NT TN
     54181 Segmentation fault      (core dumped) | fmlrc-convert ${amsdir}/weta_msbwt.npy
```

**Updates 19 Aug 2020**

* Looks like the first half of the script worked. Output > sorted.txt.gz
* But the second step failed: `tr` failed due to a path error.
* Error fixed and this job has been resubmitted.
* New error:

```
tr: extra operand ‘H07456-L1_S1_L001_R1_001_sorted.txt.gz’
Try 'tr --help' for more information.
[M::main_ropebwt2] inserted 1 symbols in 0.001 sec, 0.000 CPU sec
[M::main_ropebwt2] constructed FM-index in 0.001 sec, 0.000 CPU sec
[M::main_ropebwt2] symbol counts: ($, A, C, G, T, N) = (1, 0, 0, 0, 0, 0)
[M::main] Version: r187
[M::main] CMD: ropebwt2 -LR
[M::main] Real time: 0.001 sec; CPU: 0.003 sec
[fmlrc-convert] Reading from stdin
[fmlrc-convert] symbol counts ($, A, C, G, N, T) = (1, 0, 0, 0, 0, 0)
[fmlrc-convert] RLE-BWT byte length: 1
[fmlrc-convert] RLE-BWT conversion complete.
ERROR: Fasta file does not exist
```

* Error fixed, resubmitted.
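The `tr: extra operand` failures above are consistent with `tr` being handed the gzipped file as an argument: `tr` only accepts the two translation sets and reads its data from stdin, so ropebwt2 ends up indexing an essentially empty stream (1 symbol). A minimal sketch of the convert step with the file fed through the pipe instead (same file and variable names as above; this is not the exact script that was resubmitted):

```bash
# Stream the sorted, gzipped reads into tr via stdin rather than as an argument
gunzip -c H07456-L1_S1_L001_R1_001_sorted.txt.gz \
    | tr NT TN \
    | ropebwt2 -LR \
    | tr NT TN \
    | fmlrc-convert ${asmdir}/weta_msbwt.npy
```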
* New issue: the script was run in three successive steps:

1. Illumina reads -> sorted.txt.gz

```bash
awk "NR % 4 == 2" ${illuminadir}/H07456-L1_S1_L001_R1_001.fastq | sort | gzip > H07456-L1_S1_L001_R1_001_sorted.txt.gz
```

2. tr step -> msbwt.npy

```bash
tr NT TN H07456-L1_S1_L001_R1_001_sorted.txt.gz | ropebwt2 -LR | tr NT TN | fmlrc-convert ${asmdir}/weta_msbwt.npy
```

3. Correction via fmlrc

```bash
fmlrc -p $NUM_PROCS -e 400 ${asmdir}/weta_msbwt.npy ${datadir}/*.fasta ${asmdir}/corrected_final.fa
```

stdout:
`loaded bwt with 1 compressed values`
`Finished processing reads [0, 400)`

This did not produce a corrected_final.fa; instead one of the input long-read fasta files is now a fraction of its original size. WHAT? (Possibly because the `${datadir}/*.fasta` glob expands to several files, so one of the input fasta paths ends up in the output-file position and gets overwritten.)

***Update on 24 August 2020***

Job 14270030 with the full dataset was killed as below.

```bash
[M::main_ropebwt2] inserted 10415295816 symbols in 889.926 sec, 2454.212 CPU sec
[M::main_ropebwt2] inserted 10415295816 symbols in 908.420 sec, 2480.353 CPU sec
[M::main_ropebwt2] inserted 10415295816 symbols in 934.089 sec, 2484.000 CPU sec
/var/spool/slurm/job14270030/slurm_script: line 35: 70496 Done                    gunzip -c ${illuminadir}/*.fastq.gz
     70497                         | awk "NR % 4 == 2"
     70498 Broken pipe             | sort -T $TMPDIR
     70499 Broken pipe             | tr NT TN
     70500 Killed                  | ropebwt2 -LR
     70501                         | tr NT TN
     70502 Segmentation fault      (core dumped) | fmlrc-convert ${amsdir}/weta_msbwt.npy
slurmstepd: error: Detected 1 oom-kill event(s) in step 14270030.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
(END)
```

## Raven

* Job running with mecat-corrected data
* 14271250  manpreet  landcar  pb-assemble-ra  R  None  2020-08-18T16:47:52  1:59  1-23:58:01  1  48
* End of slurm out: `[raven::Graph::Polish] reached checkpoint 10.279901s` `[raven::] 7610.063556s`
* Output = mecat-raven-asm1.fasta
* sacct = 02:07:05  2-18:48:54  48  4620535+  COMPLETED

*Updates 19 Aug 2020*

* Running a quast report for this assembly
* Assembly length: 310817757 (0.3 Gb)
* Resubmitted assembly job with the full (uncorrected) dataset: 14285453

***Updates from 24-08-2020***

* Job exited with time out. Resubmitted (14335221) with the '--resume' flag.

```bash
[manpreet.dhami@mahuika02 PacBIO]$ less slurm-14335221.out
[raven::] loaded previous run 5.006878s
[raven::] 5.038449s
```

* Job failed almost immediately after this message. sacct says the following:

```bash
JobID           JobName          Elapsed      TotalCPU     Alloc   MaxRSS      State
--------------  --------------  -----------  ------------  -----  --------  ----------
14335221        pb-assemble-r+   00:00:13     00:07.165     48               COMPLETED
14335221.batch  batch            00:00:13     00:07.163     48        875K   COMPLETED
14335221.exte+  extern           00:00:13     00:00.001     48        771K   COMPLETED
```

* Job resubmitted with a different output file name, in case it's a write issue.
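For context, the resubmitted Raven job presumably amounts to something like the sketch below. Raven writes the assembly FASTA to stdout; the input file name, thread count, and output name here are placeholders, not the exact submitted command:

```bash
# Hypothetical form of the resubmitted Raven job: resume from the on-disk
# checkpoint and send the assembly (stdout) to a new file name.
# weta_pb_all.fasta.gz is a placeholder for the combined PacBio read file.
raven --threads 48 --resume weta_pb_all.fasta.gz > pb-raven-asm2.fasta
```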
```bash
[raven::] loaded previous run 5.015174s
[raven::] 5.047051s
```

* Job resumed when run without the --resume flag: 14335243

*Updates from 25 Aug 2020*

* Completed

```bash
for 250463 / 267160 windows [===============>] 1263.725767s
[racon::Polisher::Polish] called consensus for 267160 / 267160 windows [================] 1274.289830s
[raven::Graph::Polish] reached checkpoint 8.304683s
[raven::] 6357.015190s

JobID           JobName          Elapsed      TotalCPU     Alloc   MaxRSS      State
--------------  --------------  -----------  ------------  -----  --------  ----------
14335243        pb-assemble-r+   00:47:10     1-03:54:05    48               COMPLETED
14335243.batch  batch            00:47:10     1-03:54:05    48    4503528+   COMPLETED
14335243.exte+  extern           00:47:11     00:00.001     48        291K   COMPLETED
```

* Another job submitted with the same uncorrected pb dataset, but without the --weaken flag: 14335500
* COMPLETED

```bash
[racon::Polisher::Polish] called consensus for 11832 / 13522 windows [==============> ] 1234.783347s
[racon::Polisher::Polish] called consensus for 12677 / 13522 windows [===============>] 1287.126407s
[racon::Polisher::Polish] called consensus for 13522 / 13522 windows [================] 1295.214860s
[raven::Graph::Polish] reached checkpoint 7.215402s
[raven::] 2806.383468s

JobID           JobName          Elapsed      TotalCPU     Alloc   MaxRSS      State
--------------  --------------  -----------  ------------  -----  --------  ----------
14335500        pb-assemble-r+   01:46:20     2-08:45:39    48               COMPLETED
14335500.batch  batch            01:46:20     2-08:45:39    48    6288458+   COMPLETED
14335500.exte+  extern           01:46:21     00:00.002     48        825K   COMPLETED
```

* Assembly size (without --weaken flag, full pb dataset) = 0.13 GB

## Mecat correction + assembly

* Resubmitted with coverage = 10
* 14272160  manpreet  ga03048  mecat-correct  R  None  2020-08-18T17:23:36  0:05  23:59:55  1  16
* Job timed out.

*Updates 19 Aug 2020*

* Resubmitted with 72 hours: 14285230
* Job completed in 68 h. But assembly is 0.02 GB. Poor. Mecat ruled out.

*****

## Ratatosk

* Submitted trial job with 1x pb readfile, 1 lane of illumina, 16 cpus, 40G mem, and 24 hours: 14285852
* Job killed with Bus error

```bash
CompactedDBG::filter(): Number of blocks in Bloom filter is 36724071
/var/spool/slurm/job14285852/slurm_script: line 31: 145625 Bus error
```

* Resubmitted with 80G mem x 24 cores: 14286903

*Updates from 24 Aug 2020*

* Exited with stupid mkdir error. Resubmitted with correction: 14335227
* Job exited with a bus error as follows:

```bash
CompactedDBG::construct(): Joining unitigs
CompactedDBG::construct(): After join: 201427129 unitigs
CompactedDBG::construct(): Joined 144829996 unitigs
Ratatosk::Ratatosk(): Adding coverage to vertices and edges (1/2).
Ratatosk::addCoverage(): Anchoring reads on graph.
/var/spool/slurm/job14335227/slurm_script: line 31: 57513 Bus error (core dumped) $Ratatosk -v -c 24 -s ${scriptdir}/illumina_readlist_1lane.txt -l ${scriptdir}/pb_readlist_1file.txt -o pb-ratatosk-1.fa
```
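For reference, the `-s` and `-l` read lists passed to Ratatosk above are plain-text files with one read-file path per line. A minimal sketch of how they could have been generated; the specific paths are placeholders based on file names appearing elsewhere in these notes, not the exact files used:

```bash
# One path per line: short (Illumina) reads for -s, long (PacBio) reads for -l
ls ${illuminadir}/H07456-L1_S1_L001_R?_001.fastq.gz > illumina_readlist_1lane.txt
ls ${datadir}/weta_m54214_200421_180821.subreads.fasta.gz > pb_readlist_1file.txt
```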
*****

## FALCON PIPELINE

Have Hi-C data along with PB; potentially this is worth testing out.

https://github.com/audreystott/dunnart

Falcon Nextflow pipeline installed & tested by Joseph; ran into a permissions issue.

[03/02/2021]

Install dir:

```bash=
/nesi/project/landcare00070/mahuika_project/scripts/genome_assembly/PacBIO/josephtest
nextflow falcon/main.nf
```

Manpreet tested the nextflow pipeline as below: [18 Feb 2021]

```bash=
module load Nextflow/21.02.0
module load Miniconda3
nextflow falcon/main.nf
```

Log output (nextflow.log):

```bash=
N E X T F L O W  ~  version 21.02.0-edge
Launching `falcon/main.nf` [sick_mclean] - revision: 57cb2d38cf
[-        ] process > fc_run   -
[-        ] process > fc_unzip -
[-        ] process > fc_phase -
executor >  slurm (1)
[b7/2ba2a0] process > fc_run   [  0%] 0 of 1
[-        ] process > fc_unzip -
[-        ] process > fc_phase -
executor >  slurm (1)
[b7/2ba2a0] process > fc_run   [  0%] 0 of 1
[-        ] process > fc_unzip -
[-        ] process > fc_phase -
Error executing process > 'fc_run'

Caused by:
  Process `fc_run` terminated with an error exit status (1)

Command executed:

  sed -i "s/outs.write('/#outs.write('/" /scale_wlg_persistent/filesets/project/landcare00070/mahuika_project/scripts/genome_assembly/PacBIO/josephtest/work/conda/*/lib/python3.7/site-packages/falcon_kit/mains/ovlp_filter.py
  fc_run /scale_wlg_persistent/filesets/project/landcare00070/mahuika_project/scripts/genome_assembly/PacBIO/josephtest/fc_run.cfg

Command exit status:
  1

Command output:
  (empty)

Command error:
      "job.step.dust": {},
      "job.step.la": {
          "MB": "32768",
          "NPROC": "4",
          "njobs": "240"
      },
      "job.step.pda": {
          "MB": "32768",
          "NPROC": "4",
          "njobs": "240"
      },
      "job.step.pla": {
          "MB": "32768",
          "NPROC": "4",
          "njobs": "240"
      }
  }
  [INFO]In simple_pwatcher_bridge, pwatcher_impl=<module 'pwatcher.blocking' from '/home/manpreet.dhami/.local/lib/python3.7/site-packages/pypeflow-2.1.1+git.d63b0e79f5a7b2d370b7de84a890f88271afa476-py3.7.egg/pwatcher/blocking.py'>
  [INFO]job_type='slurm', (default)job_defaults={'job_type': 'slurm', 'pwatcher_type': 'blocking', 'JOB_QUEUE': 'default', 'MB': '102400', 'NPROC': '6', 'njobs': '32', 'submit': 'srun \\\n-J ${JOB_NAME} \\\n--mem=${MB}M \\\n--cpus-per-task=${NPROC} \\\n"${JOB_SCRIPT}"', 'use_tmpdir': False}, use_tmpdir=False, squash=False, job_name_style=0
  [INFO]Setting max_jobs to 32; was None
  [INFO]Num unsatisfied: 2, graph: 2
  [INFO]About to submit: Node(0-rawreads/build)
  [INFO]Popen: 'srun \
  -J P445a2258be0a69 \
  --mem=4000M \
  --cpus-per-task=1 \
  "/home/manpreet.dhami/.local/lib/python3.7/site-packages/pypeflow-2.1.1+git.d63b0e79f5a7b2d370b7de84a890f88271afa476-py3.7.egg/pwatcher/mains/job_start.sh"'
  [INFO](slept for another 0.0s -- another 1 loop iterations)
  srun: error: Unable to create step for job 18100314: Memory required by task is not available
  [ERROR]Task Node(0-rawreads/build) failed with exit-code=1
  [ERROR]Some tasks are recently_done but not satisfied: {Node(0-rawreads/build)}
  [ERROR]ready: set()
        submitted: set()
  [ERROR]Noop. We cannot kill blocked threads. Hopefully, everything will die on SIGTERM.
  Traceback (most recent call last):
    File "/scale_wlg_persistent/filesets/project/landcare00070/mahuika_project/scripts/genome_assembly/PacBIO/josephtest/work/conda/pb-assembly-71bfdfbd43464806dce898a62108a83c/bin/fc_run", line 11, in <module>
      load_entry_point('falcon-kit==1.8.1', 'console_scripts', 'fc_run')()
    File "/scale_wlg_persistent/filesets/project/landcare00070/mahuika_project/scripts/genome_assembly/PacBIO/josephtest/work/conda/pb-assembly-71bfdfbd43464806dce898a62108a83c/lib/python3.7/site-packages/falcon_kit/mains/run1.py", line 706, in main
      main1(argv[0], args.config, args.logger)
    File "/scale_wlg_persistent/filesets/project/landcare00070/mahuika_project/scripts/genome_assembly/PacBIO/josephtest/work/conda/pb-assembly-71bfdfbd43464806dce898a62108a83c/lib/python3.7/site-packages/falcon_kit/mains/run1.py", line 73, in main1
      input_fofn_fn=input_fofn_fn,
    File "/scale_wlg_persistent/filesets/project/landcare00070/mahuika_project/scripts/genome_assembly/PacBIO/josephtest/work/conda/pb-assembly-71bfdfbd43464806dce898a62108a83c/lib/python3.7/site-packages/falcon_kit/mains/run1.py", line 235, in run
      dist=Dist(NPROC=4, MB=4000, job_dict=config['job.step.da']),
    File "/scale_wlg_persistent/filesets/project/landcare00070/mahuika_project/scripts/genome_assembly/PacBIO/josephtest/work/conda/pb-assembly-71bfdfbd43464806dce898a62108a83c/lib/python3.7/site-packages/falcon_kit/pype.py", line 106, in gen_parallel_tasks
      wf.refreshTargets()
    File "/home/manpreet.dhami/.local/lib/python3.7/site-packages/pypeflow-2.1.1+git.d63b0e79f5a7b2d370b7de84a890f88271afa476-py3.7.egg/pypeflow/simple_pwatcher_bridge.py", line 278, in refreshTargets
      self._refreshTargets(updateFreq, exitOnFailure)
    File "/home/manpreet.dhami/.local/lib/python3.7/site-packages/pypeflow-2.1.1+git.d63b0e79f5a7b2d370b7de84a890f88271afa476-py3.7.egg/pypeflow/simple_pwatcher_bridge.py", line 362, in _refreshTargets
      raise Exception(msg)
  Exception: Some tasks are recently_done but not satisfied: {Node(0-rawreads/build)}

Work dir:
  /scale_wlg_persistent/filesets/project/landcare00070/mahuika_project/scripts/genome_assembly/PacBIO/josephtest/work/96/6ec0f232064acd68fe7ee260e92bb0

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
```

There seems to be an issue with memory allocation. Looking at fc_run.cfg, at every step the requested memory allocation is greater than 512 MB, but the actual allocation seems to be stuck at 2 cpus x 512 MB mem, which is the default slurm allocation when it has no allocation info. So possibly an issue with the srun blocks?

[19/02/2021] Troubleshooting with Joseph & Dini

It seems that slurm is not able to understand and allocate the correct memory. We have tried adding account=landcare00070, but still get the same error. Perhaps there is an environment variable being set up by Pypeflow that's messing things up for slurm. Trying to get Falcon to submit jobs locally rather than back to slurm. New error message location:

```bash=
/scale_wlg_persistent/filesets/project/landcare00070/mahuika_project/scripts/genome_assembly/PacBIO/josephtest/work/fe/e2bcae55212683e03290e0ee5d6c53/0-rawreads/build/run-P837d7e819ddee0.bash.stderr
```

This made falcon work, but all allocations are the same for each step. To avoid a forced kill due to not enough time, srun has been given a hard allocation of 7 days' run time for each step.
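For reference, a sketch of what the adjusted submit template in fc_run.cfg's `[job.defaults]` section might look like, based on the srun template shown in the log above; the exact placement of the account and time flags here is an assumption, not the configuration that was actually committed:

```bash=
# Hypothetical [job.defaults] submit template for fc_run.cfg: the base srun
# template from the log, with an explicit account and a 7-day wall time added
# per the troubleshooting notes above
submit = srun \
    -J ${JOB_NAME} \
    --account=landcare00070 \
    --time=7-00:00:00 \
    --mem=${MB}M \
    --cpus-per-task=${NPROC} \
    "${JOB_SCRIPT}"
```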
Have also moved the falcon run directory and cfg files to a new location:

```bash=
/nesi/nobackup/ga03048/assemblies/falcon/
```

And running nextflow >> falcon from there. Nobackup shouldn't run out of space.

The run process has started and the logs for the first step (0-rawreads) are here:

```bash=
/nesi/nobackup/ga03048/assemblies/falcon/work/91/4de6d4162742c3235cb93b0d5116c3/0-rawreads/build
```

First step of 0-rawreads, converting fasta2DB, is currently running!!! Haven't seen this since June last year :)

UPDATE 26-05-2021
----

Error: Not enough reads for desired coverage.

```
Adding 'weta_m54214_200421_180821.subreads.fasta.gz.fasta' ...
INFO:root:        #7 count=         91,479
INFO:root:       #12 count=        179,589
INFO:root:       #25 count=        341,593
INFO:root:       #60 count=        670,460
INFO:root:      #109 count=      1,312,000
INFO:root:      #211 count=      2,560,965
INFO:root:      #435 count=      5,068,225
INFO:root:      #849 count=     10,060,264
INFO:root:    #1,735 count=     20,040,510
INFO:root:    #3,484 count=     39,984,665
INFO:root:    #6,827 count=     79,859,225
INFO:root:   #13,472 count=    159,609,520
INFO:root:   #26,969 count=    319,115,009
INFO:root:   #54,247 count=    638,117,229
INFO:root:  #109,371 count=  1,276,127,202
INFO:root:  #219,004 count=  2,552,122,329
INFO:root:  #436,041 count=  5,104,112,108
INFO:root:  #871,997 count= 10,208,061,125
+ read fn
#cat fc.fofn | xargs rm -f
DBdust raw_reads
+ DBdust raw_reads
DBsplit -f -x500 -s400 -a raw_reads
+ DBsplit -f -x500 -s400 -a raw_reads
#LB=$(cat raw_reads.db | LD_LIBRARY_PATH= awk '$1 == "blocks" {print $3}')
#echo -n $LB >| db_block_count
CUTOFF=$(python3 -m falcon_kit.mains.calc_cutoff --coverage 10.0 6500000000 <(DBstats -b1 raw_reads))
++ python3 -m falcon_kit.mains.calc_cutoff --coverage 10.0 6500000000 /dev/fd/63
+++ DBstats -b1 raw_reads
falcon-kit 1.8.1 (pip thinks "falcon-kit 1.8.1")
pypeflow 2.1.1+git.d63b0e79f5a7b2d370b7de84a890f88271afa476
Traceback (most recent call last):
  File "/scale_wlg_nobackup/filesets/nobackup/ga03048/assemblies/falcon/work/conda/pb-assembly-71bfdfbd43464806dce898a62108a83c/lib/python3.7/site-packages/falcon_kit/mains/calc_:
+ CUTOFF=
[WARNING]Call 'bash -vex build_db.sh' returned 256.
Traceback (most recent call last):
  File "/scale_wlg_nobackup/filesets/nobackup/ga03048/assemblies/falcon/work/conda/pb-assembly-71bfdfbd43464806dce898a62108a83c/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/scale_wlg_nobackup/filesets/nobackup/ga03048/assemblies/falcon/work/conda/pb-assembly-71bfdfbd43464806dce898a62108a83c/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/scale_wlg_nobackup/filesets/nobackup/ga03048/assemblies/falcon/work/conda/pb-assembly-71bfdfbd43464806dce898a62108a83c/lib/python3.7/site-packages/falcon_kit/mains/dazzler.py", line 1532, in <module>
    main()
  File "/scale_wlg_nobackup/filesets/nobackup/ga03048/assemblies/falcon/work/conda/pb-assembly-71bfdfbd43464806dce898a62108a83c/lib/python3.7/site-packages/falcon_kit/mains/dazzler.py", line 1528, in main
    args.func(args)
  File "/scale_wlg_nobackup/filesets/nobackup/ga03048/assemblies/falcon/work/conda/pb-assembly-71bfdfbd43464806dce898a62108a83c/lib/python3.7/site-packages/falcon_kit/mains/dazzler.py", line 1050, in cmd_build
    build_db(ours, args.input_fofn_fn, args.db_fn, args.length_cutoff_fn)
  File "/scale_wlg_nobackup/filesets/nobackup/ga03048/assemblies/falcon/work/conda/pb-assembly-71bfdfbd43464806dce898a62108a83c/lib/python3.7/site-packages/falcon_kit/mains/dazzler.py", line 169, in build_db
    io.syscall('bash -vex {}'.format(script_fn))
  File "/home/manpreet.dhami/.local/lib/python3.7/site-packages/pypeflow-2.1.1+git.d63b0e79f5a7b2d370b7de84a890f88271afa476-py3.7.egg/pypeflow/io.py", line 29, in syscall
    raise Exception(msg)
Exception: Call 'bash -vex build_db.sh' returned 256.
2021-02-19 14:21:40,190 - root - WARNING - Call '/bin/bash user_script.sh' returned 256.
2021-02-19 14:21:40,234 - root - WARNING - CD: '/scale_wlg_nobackup/filesets/nobackup/ga03048/assemblies/falcon/work/91/4de6d4162742c3235cb93b0d5116c3/0-rawreads/build' -> '/scale_wlg_nobackup/filesets/nobackup/ga03048/assemblies/falcon/work/91/4de6d4162742c3235cb93b0d5116c3/0-rawreads/build'
2021-02-19 14:21:40,235 - root - WARNING - CD: '/scale_wlg_nobackup/filesets/nobackup/ga03048/assemblies/falcon/work/91/4de6d4162742c3235cb93b0d5116c3/0-rawreads/build' -> '/scale_wlg_nobackup/filesets/nobackup/ga03048/assemblies/falcon/work/91/4de6d4162742c3235cb93b0d5116c3/0-rawreads/build'
2021-02-19 14:21:40,264 - root - CRITICAL - Error in /home/manpreet.dhami/.local/lib/python3.7/site-packages/pypeflow-2.1.1+git.d63b0e79f5a7b2d370b7de84a890f88271afa476-py3.7.egg/pypeflow/do_task.py with args="{'json_fn': '/scale_wlg_nobackup/filesets/nobackup/ga03048/assemblies/falcon/work/91/4de6d4162742c3235cb93b0d5116c3/0-rawreads/build/task.json',\n 'timeout': 30,\n 'tmpdir': None}"
Traceback (most recent call last):
  File "/scale_wlg_nobackup/filesets/nobackup/ga03048/assemblies/falcon/work/conda/pb-assembly-71bfdfbd43464806dce898a62108a83c/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/scale_wlg_nobackup/filesets/nobackup/ga03048/assemblies/falcon/work/conda/pb-assembly-71bfdfbd43464806dce898a62108a83c/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/manpreet.dhami/.local/lib/python3.7/site-packages/pypeflow-2.1.1+git.d63b0e79f5a7b2d370b7de84a890f88271afa476-py3.7.egg/pypeflow/do_task.py", line 267, in <module>
    main()
  File "/home/manpreet.dhami/.local/lib/python3.7/site-packages/pypeflow-2.1.1+git.d63b0e79f5a7b2d370b7de84a890f88271afa476-py3.7.egg/pypeflow/do_task.py", line 259, in main
    run(**vars(parsed_args))
"/home/manpreet.dhami/.local/lib/python3.7/site-packages/pypeflow-2.1.1+git.d63b0e79f5a7b2d370b7de84a890f88271afa476-py3.7.egg/pypeflow/do_task.py", line 253, in run run_cfg_in_tmpdir(cfg, tmpdir, '.') File "/home/manpreet.dhami/.local/lib/python3.7/site-packages/pypeflow-2.1.1+git.d63b0e79f5a7b2d370b7de84a890f88271afa476-py3.7.egg/pypeflow/do_task.py", line 228, in run_cfg_in_tmpdir run_bash(bash_template, myinputs, myoutputs, parameters) File "/home/manpreet.dhami/.local/lib/python3.7/site-packages/pypeflow-2.1.1+git.d63b0e79f5a7b2d370b7de84a890f88271afa476-py3.7.egg/pypeflow/do_task.py", line 187, in run_bash util.system(cmd) File "/home/manpreet.dhami/.local/lib/python3.7/site-packages/pypeflow-2.1.1+git.d63b0e79f5a7b2d370b7de84a890f88271afa476-py3.7.egg/pypeflow/io.py", line 29, in syscall raise Exception(msg) Exception: Call '/bin/bash user_script.sh' returned 256. +++ pwd ++ echo 'FAILURE. Running top in /scale_wlg_nobackup/filesets/nobackup/ga03048/assemblies/falcon/work/91/4de6d4162742c3235cb93b0d5116c3/0-rawreads/build (If you see -terminal database is inaccessible- you are using the python bin-wrapper, so you will not get diagnostic info. No big deal. This process is crashing anyway.)' ++ rm -f top.txt ++ which python ++ which top ++ env -u LD_LIBRARY_PATH top -b -n 1 ++ env -u LD_LIBRARY_PATH top -b -n 1 ++ pstree -apl real 105m27.818s user 97m12.563s sys 2m33.992s + finish + echo 'finish code: 1' ``` fc_run.config seed_coverage option changed to 5 Job resubmitted job id: 20177507 ERROR: ``` fasta2DB: raw_reads.db is corrupted, read failed ``` restart job. Job Finished with the following sacct output: ``` JobID JobName Alloc Elapsed TotalCPU ReqMem MaxRSS State --------------- ---------------- ----- ----------- ------------ ------- -------- ---------- 20188290 Pd5b516a622eb5e 2 01:43:46 01:43:14 4000Mc FAILED 20188290.extern extern 2 01:43:46 00:00.001 4000Mc 0 COMPLETED 20188290.0 Pd5b516a622eb5e 2 01:43:44 01:43:14 4000Mc 239252K COMPLETED ``` Location of .sdterr output: ``` //nesi/nobackup/ga03048/assemblies/falcon/work/e4/7440b9ac7ba3660659ca9ee0fcdc24/0-rawreads/build ``` Looks like one of the input files may have an issue? /nesi/project/ga03048/weta/weta_m54219_190817_073801.subreads.fasta.gz File removed from subreads.fasta.fofn ``` /nesi/project/ga03048/weta/weta_m54219_190817_073801.subreads.fasta.gz ``` Job resubmitted. - still having issues with "readsDB corrupted" but the reads keep being added and the jobs moves past this error. - New error this time: ``` slurmstepd: error: *** STEP 20191425.0 ON wbl001 CANCELLED AT 2021-05-27T09:33:46 *** ``` Not sure what this is about? location of error file: ``` /nesi/nobackup/ga03048/assemblies/falcon/work/17/8441484c815d57fcfaf2ffea685587/0-rawreads/build ``` Another file was giving same read_db corrupted error so removed it as well. Re-running now. - same issue. A bit of googling suggests that this may be the issue: - https://github.com/PacificBiosciences/pbbioconda/issues/111 - multiple jobs are being submitted concurrently leading to the read_db corrupted error rather than actual issue with the read files. ***** ## Canu correction ```bash= CRASH: Last 50 lines of the relevant log file (correction/weta-asm4.ovlStore.BUILDING/scripts/1-bucketize.jobSubmit-01.out): CRASH: CRASH: sbatch: error: Please specify one of your project codes as the Slurm account for this job. 
CRASH: sbatch: error: AssocMaxSubmitJobLimit
CRASH: sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
CRASH: srun: error: wbh001: task 0: Exited with exit code 1
```

### Potential solution

Added the grid option to include the account.
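A sketch of how that could look on the canu command line, using canu's `gridOptions` parameter (which is appended to every job canu submits). The read-type flag, read paths, genome size, and time limit below are illustrative assumptions, not the exact options used:

```bash=
# Hypothetical canu correction rerun: pass an explicit Slurm account (and a
# wall-time cap) to every sbatch call canu makes via gridOptions
canu -correct \
    -p weta-asm4 -d correction \
    genomeSize=6.5g \
    gridOptions="--account=landcare00070 --time=72:00:00" \
    -pacbio-raw /nesi/project/ga03048/weta/weta_*.subreads.fasta.gz
```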