# Pangeblocks: Maximal blocks and positional strings
>[Data: MSAs for genic and intergenic regions used by Pandora](https://figshare.com/articles/dataset/PanX_Piggy_MSAs/14781732/1)
MSAs for gene clusters curated with PanX from 350 RefSeq assemblies, and for intergenic region clusters based on 228 E. coli ST131 genome sequences generated with Piggy, used to build the E. coli PanRG (see Colquhoun et al., "Nucleotide-resolution bacterial pan-genomics with reference graphs").
Genic regions for different bacterial species can be downloaded from [PanX](https://pangenome.org/#downloads).
___
### Summary
For a randomly sampled dataset of 50 MSAs from genic and intergenic regions:
- Intergenic regions are shorter than genic regions
- Maximum number of columns (genic): 4098
- Maximum number of columns (intergenic): 289
- 50% of MSAs have fewer than 4 sequences
- Time to compute maximal blocks in *intergenic regions* is less than 1 minute in 75% of cases. The maximum is 5 minutes. For *genic regions* the time is higher (maximum is 14 hours).
- The number of maximal blocks seems to increase linearly with the number of sequences (for the number of columns, it is hard to draw conclusions from these results)
- The number of maximal blocks grows linearly with the number of maximal blocks that have at least one overlap
___
#### Dataset
Experiments considering 50 MSAs (randomly sampled) from E. coli:
- 36 genic regions,
- 14 intergenic regions
#### Stats by region
Variables:
- `n_seqs`: number of sequences in the MSA
- `n_unique_seqs`: number of unique sequences in the MSA
- `n_cols`: number of columns in the MSA
- `n_max_blocks`: number of maximal blocks in that MSA
- `n_max_blocks_`: binned `n_max_blocks`
- `t [min]`: time `t` to compute the maximal blocks, in minutes
- `t_[min]`: binned `t [min]`
- `size_msa`: elements in the MSA (only unique sequences), `n_cols x n_unique_seqs`
- `max_blocks/size_msa`: proportion [%] of maximal blocks w.r.t the size of the MSA
- `region`: either Genic or Intergenic
- `blocks_with_overlap`: number of blocks that overlap at least once with another block
- `inter_between_blocks`: number of intersections between pairs of blocks
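
As a rough illustration of how the size and overlap variables above relate to each other, here is a minimal Python sketch. It is not the code used for these experiments. The `Block` representation (a set of row indices plus an inclusive column interval), the helper names `msa_stats` and `overlap_stats`, and the rule that two blocks overlap when they share at least one cell (a common row and a common column) are all assumptions made for this example.

```python
from itertools import combinations
from typing import NamedTuple


class Block(NamedTuple):
    """Hypothetical block representation (assumption, not the project's class)."""
    rows: frozenset  # indices of the sequences covered by the block
    start: int       # first column, inclusive
    end: int         # last column, inclusive


def msa_stats(seqs, n_max_blocks):
    """Compute the size-related variables for one MSA given as aligned strings."""
    n_seqs = len(seqs)
    n_unique_seqs = len(set(seqs))
    n_cols = len(seqs[0]) if seqs else 0
    size_msa = n_cols * n_unique_seqs
    return {
        "n_seqs": n_seqs,
        "n_unique_seqs": n_unique_seqs,
        "n_cols": n_cols,
        "n_max_blocks": n_max_blocks,
        "size_msa": size_msa,
        # percentage of maximal blocks w.r.t. the size of the MSA
        "max_blocks/size_msa": 100 * n_max_blocks / size_msa if size_msa else 0.0,
    }


def overlap_stats(blocks):
    """Count overlapping pairs and the blocks involved in at least one overlap.

    Assumption: two blocks overlap when they share a row and a column.
    """
    overlapping = set()
    n_pairs = 0
    for (i, a), (j, b) in combinations(enumerate(blocks), 2):
        share_rows = bool(a.rows & b.rows)
        share_cols = a.start <= b.end and b.start <= a.end
        if share_rows and share_cols:
            n_pairs += 1
            overlapping.update((i, j))
    return {
        "blocks_with_overlap": len(overlapping),
        "inter_between_blocks": n_pairs,
    }


if __name__ == "__main__":
    # Toy MSA and toy blocks, for illustration only (not from the dataset).
    toy_msa = ["ACGT", "ACGT", "AC-T"]
    toy_blocks = [
        Block(frozenset({0, 1}), 0, 3),
        Block(frozenset({0, 1, 2}), 0, 1),
        Block(frozenset({2}), 2, 3),
    ]
    print(msa_stats(toy_msa, n_max_blocks=len(toy_blocks)))
    print(overlap_stats(toy_blocks))
```

The pairwise loop is quadratic in the number of blocks, which is fine for a sketch but would likely need an interval-based index for MSAs with many maximal blocks.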




#### Plots


