# GFA2bin tool
1. Below is "our" pipeline.

2. The gfa2bin tool was developed in the meantime: https://github.com/MoinSebi/gfa2bin
This tool does not start from our input and output; it works directly on the matrix of reads per node. As we have done in the past, it creates a matrix of genotypes from the matrix of reads per node (using different normalization steps).
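The core idea can be shown with a small conceptual sketch (not gfa2bin's actual code): each row of the reads-per-node matrix is one sample, each column one graph node, and a genotype of 1 is assigned wherever coverage reaches a cutoff. The values and the cutoff below are made up.
```python
# Conceptual sketch only, not gfa2bin's implementation: rows are samples,
# columns are graph nodes, and a node is genotyped as present (1) once its
# coverage in that sample reaches the cutoff. Data and cutoff are made up.
import numpy as np

coverage = np.array([
    [0, 3, 12, 7],   # sample 1: reads per node
    [5, 0,  9, 2],   # sample 2
])
cutoff = 1           # illustrative threshold
genotypes = (coverage >= cutoff).astype(np.uint8)
print(genotypes)
# [[0 1 1 1]
#  [1 0 1 1]]
```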

Below is how it works:
`gfa2bin cov --p file.pack --output <output>`

Normalizations:
```
-a, --absolute-threshold <absolute-threshold> Set a absolute threshold
--method <method> Normalization method (mean|median|percentile)
```
The absolute threshold is a hard cutoff value for coverage.
When I tried it with the example tests, it didn't work.
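To make the two options above concrete, here is a hypothetical sketch (not code taken from gfa2bin): the per-sample cutoff is either a fixed absolute value (as with `--absolute-threshold`) or derived from a summary statistic of the sample's coverages (as with `--method`). The function name, defaults, and data are assumptions for illustration only.
```python
# Hypothetical sketch of the two thresholding options (not from gfa2bin).
import numpy as np

def pick_threshold(coverages, absolute=None, method="median", percentile=90):
    """Return the coverage cutoff for one sample."""
    if absolute is not None:               # hard cutoff
        return absolute
    if method == "mean":
        return coverages.mean()
    if method == "median":
        return np.median(coverages)
    if method == "percentile":
        return np.percentile(coverages, percentile)
    raise ValueError(f"unknown normalization method: {method}")

cov = np.array([0, 3, 12, 7, 5, 0, 9, 2])
print(pick_threshold(cov, absolute=5))       # 5
print(pick_threshold(cov, method="mean"))    # 4.75
print((cov >= pick_threshold(cov, method="mean")).astype(int))
```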
Main differences:
- Compared to our pipeline, it uses `gaf2pack` to produce the matrix of reads per node, which gives this output:
With our tool, we instead obtained only node IDs and coverage values.

- It uses an rGFA file: https://github.com/lh3/gfatools/blob/master/doc/rGFA.md
- In addition, it is possible to compress the pack file (possibly interesting for human data): a plain-text coverage file is compressed to "pack compressed", mainly to reduce the storage size of the coverage file. The maximum coverage in these files is 65,535; higher coverages are truncated.
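A sketch of what the truncation means, under the assumption that coverage is stored as an unsigned 16-bit integer (so the maximum representable value is 65,535 and anything larger is clamped). This is not the tool's actual code, and the values are made up.
```python
# Hedged sketch: clamp coverage into the u16 range, as "pack compressed" does.
import numpy as np

U16_MAX = np.iinfo(np.uint16).max            # 65535

def compress_coverage(values):
    """Clamp plain-text coverage values into the u16 range before packing."""
    arr = np.asarray(values, dtype=np.int64)
    truncated = int((arr > U16_MAX).sum())
    if truncated:
        print(f"{truncated} entries have been truncated "
              f"(have a coverage above {U16_MAX:,}).")
    return np.clip(arr, 0, U16_MAX).astype(np.uint16)

print(compress_coverage([10, 70000, 123456, 65535]))
# 2 entries have been truncated (have a coverage above 65,535).
# [   10 65535 65535 65535]
```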
#### One sample
- Use `gaf2pack`
`gaf2pack --gfa D_C_mm10.fa.gz.bf3285f.eb0f3d3.867196c.smooth.final.gfa --alignment BXD32_BDref_inject.gaf --output BXD32.gaf2pack.txt`
- Use `gfa2bin`
`gfa2bin cov --output /scratch/BXD32.geno --packlist /lizardfs/flaviav/mouse/panQTL_result/BXD32/path.txt`
Default output:
```
4740 entries have been truncated (have a coverage above 65,535).
4740 entries have been truncated (have a coverage above 65,535).
```
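The `--packlist` argument in the command above points to a `path.txt` file. My assumption is that this is simply a plain-text list of coverage/pack files, one path per line; the file names in the sketch below are hypothetical examples.
```python
# Hedged sketch: write a pack list with one coverage file path per line.
# File names are made-up examples, not real data.
from pathlib import Path

pack_files = [
    "/scratch/BXD32.gaf2pack.txt",
    "/scratch/BXD44.gaf2pack.txt",
]
Path("path.txt").write_text("\n".join(pack_files) + "\n")
print(Path("path.txt").read_text())
```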