# OLD F. cylindrus assembly

# Initial graph and workspace

Graph at k=63, tipclipped at 125bp x3. Both GFAs dumped:

```python
%cd /Users/clavijob/fcyl_pe2/sdgdbg/
import SDGpython as SDG
from graphcleaning import *
K=63
MIN_CVG=2
NUM_BATCHES=8
ws=SDG.WorkSpace()
peds=ws.add_paired_reads_datastore('fcyl_pe2.prseq')
SDG.GraphMaker(ws.sdg).new_graph_from_paired_datastore(peds,K,MIN_CVG,NUM_BATCHES)
ws.sdg.write_to_gfa1("fcyl_pe2_k63c2.gfa")
simple_structures_stats(ws.sdg)
c=SDG.GraphContigger(ws)
c.clip_tips(125,3)
ws.sdg.write_to_gfa1("fcyl_pe2_k63c2_tc125x3.gfa")
simple_structures_stats(ws.sdg)
```

Graph reloaded, KCI and paired read mapping:

```python
%cd /Users/clavijob/fcyl_pe2/sdgdbg/
import SDGpython as SDG
from graphcleaning import *
from graphuntangling import *
from graphstriding import *
ws=SDG.WorkSpace()
ws.sdg.load_from_gfa("fcyl_pe2_k63c2_tc125x3.gfa")
ws.sdg.write_to_gfa1("fcyl_pe2_initial.gfa")
peds=ws.add_paired_reads_datastore('fcyl_pe2.prseq')
lords=ws.add_long_reads_datastore('fcyl_nano.loseq')
kc=ws.add_kmer_counter("main")
kc.add_count("pe",peds)
kc.set_kci_peak(61)
kc.compute_all_kcis()
print(ws.sdg.stats_by_kci())
ws.dump("fcyl_pe2_initial.sdgws")
peds.mapper.path_reads()
peds.mapper.dump_readpaths("fcyl_pe2_initial_readpaths.dump")
```

| KCI | Total bp | Nodes | Tips | N25 | N50 | N75 |
|-------|--------------:|---------:|--------:|-----------:|-----------:|-----------:|
| None | 30272273 | 407162 | 9961 | 87 | 72 | 64 |
| < 0.5 | 29221328 | 210177 | 9995 | 125 | 125 | 125 |
| ~ 1 | 41232715 | 175140 | 1354 | 552 | 257 | 149 |
| ~ 2 | 35931827 | 169937 | 367 | 445 | 220 | 127 |
| ~ 3 | 16008585 | 82284 | 52 | 396 | 191 | 125 |
| > 3.5 | 2554306 | 15334 | 89 | 276 | 147 | 125 |
| **All** | 155221034 | 1060034 | 21818 | 318 | 127 | 124 |

## Unused read coverage (idea).

Use 63-mers to start with. Only 63-mers of "unused" reads contribute to the coverage of a 63-mer (either on a KC or over the nodes, I am not sure).
Once a read is "consumed" (i.e. it is fully assigned to a node), all the coverage of its constituent 63-mers is removed from the problem. Alternatively, a new DBG could be constructed using only the reads that are still unused.
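
A minimal sketch of the bookkeeping behind this idea, in plain Python and independent of SDG. Everything here is hypothetical: `kmers`, `unused_kmer_coverage` and `consume_read` are illustration names, not SDGpython calls, and k-mers are taken from the forward strand only (a real counter would normally use canonical k-mers). It shows the two options above: subtracting a consumed read's k-mers from an existing count table, or recounting from the unused reads only.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every k-mer of seq (forward strand only, an assumption of this sketch)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def unused_kmer_coverage(reads, consumed, k=63):
    """Count k-mers using only the reads whose index is not in `consumed`."""
    counts = Counter()
    for idx, read in enumerate(reads):
        if idx not in consumed:
            counts.update(kmers(read, k))
    return counts


def consume_read(counts, read, k=63):
    """Remove one read's k-mer contribution from an existing count table, in place."""
    counts.subtract(kmers(read, k))
    # Drop k-mers that no unused read supports any more.
    for km in set(kmers(read, k)):
        if counts[km] <= 0:
            del counts[km]


# Toy example with k=4 so the numbers are easy to follow.
reads = ["ACGTACGT", "CGTACGTA", "TTTTACGT"]
counts = unused_kmer_coverage(reads, consumed=set(), k=4)
before = counts["ACGT"]
consume_read(counts, reads[0], k=4)  # read 0 is now fully assigned to a node
after = counts["ACGT"]
print(before, after)
```

Subtracting in place and recounting with `consumed={0}` give the same table, so the choice between them is only about cost: incremental subtraction is cheap per read, while a full recount corresponds to building a new DBG from the unused reads.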