# How to make a circos plot
###### tags: `visualization`
Gautier Richard gave me his script and pipeline to make a circos plot.
## Tool installation
Install everything with conda, each tool in its own environment:
- [Deeptools](https://deeptools.readthedocs.io/en/develop/content/installation.html)
- [DeepStats](https://github.com/gtrichard/deepStats)
- [Circos](https://anaconda.org/bioconda/circos)
## Data prep
Create a directory (circos_species) from which all the scripts will be launched. Inside it, create a folder named `data` and one named `etc`. We are going to populate them with the help of a `prep.circos.sh` script (I broke the script down into several parts for this tutorial, but you can combine them into a single script). Once everything is in place, just run the command `circos` from the circos_species folder and the circos plot will be made!
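The layout described above can be set up as follows. This is a minimal sketch: the directory names come from the text, while creating the two empty config files up front is only a convenience and not part of the original pipeline.

```bash
# Create the project directory with the two sub-folders used in this tutorial
mkdir -p circos_species/data circos_species/etc
# Empty placeholders for the two config files described in the next section
touch circos_species/etc/circos.conf circos_species/etc/ticks.conf
# Show the resulting layout
ls -R circos_species
```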
### Inside the "etc" folder, add the following files:
The circos.conf file tells Circos how to draw the plot. Here is an example with CpG, CHH and CHG bigwig files. Modify as needed.
circos.conf
```txt
karyotype = data/karyotype.txt
chromosomes_units = 1000000
<ideogram>
<spacing>
default = 0.005r
<pairwise NW_022147617.1 NW_022145681.1>
spacing = 15r
</pairwise>
</spacing>
radius = 0.88r
thickness = 15p
fill = yes
show_label = yes
label_font = default
label_radius = 1.09r
label_center = yes
label_size = 20p
label_with_tag = yes
label_parallel = yes
label_case = upper
fill_color = black
stroke_thickness = 2
stroke_color = black
</ideogram>
<backgrounds>
show = data
<background>
color = vvlgrey
</background>
</backgrounds>
<plots>
type = histogram
thickness = 2
stroke_type = bin
extend_bin = yes
<plot>
max_gap = 1u
file = data/mean_CpG.txt
color = dred
fill_color = dred
r1 = 0.98r
r0 = 0.88r
min = 0
max = 65
</plot>
<plot>
max_gap = 1u
file = data/mean_CHG.txt
color = red
fill_color = red
r1 = 0.87r
r0 = 0.77r
min = 0
max = 65
extend_bin = yes
</plot>
<plot>
max_gap = 1u
file = data/mean_CHH.txt
color = lred
fill_color = lred
r1 = 0.76r
r0 = 0.66r
min = 0
max = 65
extend_bin = yes
</plot>
<plot>
max_gap = 1u
file = data/gene_density.bedGraph
color = vdblue
fill_color = vdblue
r1 = 0.65r
r0 = 0.55r
min = 0
max = 25
extend_bin = yes
</plot>
<plot>
max_gap = 1u
file = data/repeats_density.bedGraph
color = vdgreen
fill_color = vdgreen
r1 = 0.54r
r0 = 0.44r
min = 0
max = 25
extend_bin = yes
</plot>
<plot>
max_gap = 1u
file = data/gc_content.bedGraph
color = vdgrey
fill_color = vdgrey
r1 = 0.43r
r0 = 0.33r
min = 0
max = 1
extend_bin = yes
</plot>
</plots>
<image>
angle_offset* = -79.5
<<include etc/image.conf>>
</image>
<<include etc/colors_fonts_patterns.conf>>
<<include etc/housekeeping.conf>>
<<include etc/ticks.conf>>
```
ticks.conf
```txt
show_ticks = yes
show_tick_labels = yes
<ticks>
<tick>
chromosomes_display_default = yes
radius = dims(ideogram,radius_outer)
orientation = out
color = black
label_multiplier = 1e-6
spacing = 5u
size = 15p
thickness = 4p
show_label = yes
label_size = 35p
label_color = black
label_offset = 5p
format = %d
</tick>
<tick>
chromosomes_display_default = yes
radius = dims(ideogram,radius_outer)
orientation = out
color = dgrey
spacing = 1u
size = 8p
thickness = 3p
show_label = no
format = %d
</tick>
</ticks>
```
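The `karyotype =` and `file =` entries in circos.conf point to files that are only generated in the following sections, so it can help to check that they all exist before launching `circos`. The function below is a hypothetical helper (`check_circos_files` is not part of Circos) and assumes that paths contain no spaces and are relative to the folder it is run from.

```bash
# Hypothetical pre-flight check (not part of Circos): report every path given
# on a "file = ..." or "karyotype = ..." line of a config that does not exist.
check_circos_files() {
    conf="$1"
    missing=0
    # Keep only "file = path" / "karyotype = path" lines and extract the path
    for f in $(awk -F'=' '$1 ~ /^[ \t]*(file|karyotype)[ \t]*$/ {gsub(/[ \t]/, "", $2); print $2}' "$conf"); do
        if [ ! -e "$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    # Exit status 0 if everything was found, 1 otherwise
    return "$missing"
}
```

From the circos_species folder, this could be used as `check_circos_files etc/circos.conf && circos`, so the plot is only attempted when all tracks are present.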
### Preparing the "genome structure data"
Let's prepare the karyotype, gene, TE and (if needed) GC content data to show on the circos plot. The procedure works with GFF/BED files produced by any tool. Run these commands from the circos_species folder.
```bash
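#karyotype with all chromosomes/scaffolds: one line per sequence, in the Circos format "chr - ID LABEL START END COLOR"
#(chr22 is only the colour name given to every sequence; sequences whose name contains "Scaffold" are dropped)
awk '$0 ~ ">" {print c; c=0; printf "chr - " substr($0,2,100) " " substr($0,2,100) " 0 "} $0 !~ ">" {c+=length($0)} END {print c}' PATH/genome.fa | sed 1d | awk '{print $0" chr22"}' | sed '/Scaffold/d' > data/karyotype_full.txt
#keep only the 24 longest scaffolds/chromosomes (column 6 is the length), then sort them by name
sort -k6,6rn data/karyotype_full.txt | head -n 24 | sort -k3,3 > data/karyotype.txt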
#get chromosome sizes (needs the genome.fa.fai index, created with "samtools faidx PATH/genome.fa")
awk '{print $1"\t"$2}' PATH/genome.fa.fai > chromsize
#move to the data folder
cd ./data
#compute GC coverage and gene/TE density with deepStats. -w is the bin size (in bp) over which the data are averaged; adjust it to the size of the genome.
conda activate deepStats
dsComputeGCCoverage -i PATH/genome.fa -w 6000 -o gc_content
dsComputeBEDDensity --input PATH/genes.bed -c PATH/genome.fa.fai -w 6000 -o gene_density
dsComputeBEDDensity --input reps.bed -c PATH/genome.fa.fai -w 6000 -o repeats_density
conda deactivate
```
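To see what the karyotype command above produces, here is the same awk logic applied to a toy FASTA file. The sequence names and sequences are made up for illustration, and the `Scaffold` filter is left out because the toy names do not need it.

```bash
# Toy FASTA with two sequences (names and sequences are made up for illustration)
printf '>chrA\nACGTACGT\nACGT\n>chrB\nACGTA\n' > toy.fa

# Same logic as the karyotype command: print the running length when a new
# header is met, start a new "chr - ID LABEL 0 " line, and sum sequence lengths
awk '$0 ~ ">" {print c; c=0; printf "chr - " substr($0,2,100) " " substr($0,2,100) " 0 "}
     $0 !~ ">" {c+=length($0)}
     END {print c}' toy.fa | sed 1d | awk '{print $0" chr22"}' > toy_karyotype.txt

cat toy_karyotype.txt
# chr - chrA chrA 0 12 chr22
# chr - chrB chrB 0 5 chr22
```

Each line follows the Circos karyotype layout `chr - ID LABEL START END COLOR`, so column 6 is the sequence length used to pick the longest scaffolds.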
#### If you have GFFs instead of bed files
I used two different tools: [gencode_regions](https://github.com/saketkc/gencode_regions) to get genes.bed, and also promoters, exons, etc., because I needed them, and [agat](https://github.com/NBISweden/AGAT) to convert GFF to GTF or to BED.
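If only a gene-level BED file is needed for the density track, a plain awk one-liner can serve as a rough stand-in for these tools. This is a sketch, not what the original pipeline used. It assumes a tab-separated GFF3 with `gene` in the feature column, and the toy file below is made up.

```bash
# Toy GFF3 with one gene and one exon (coordinates are made up)
printf 'chr1\tsrc\tgene\t101\t500\t.\t+\t.\tID=gene1\nchr1\tsrc\texon\t101\t200\t.\t+\t.\tParent=gene1\n' > toy.gff

# Keep gene features only. GFF is 1-based inclusive and BED is 0-based
# half-open, so the start becomes start-1 and the end is unchanged.
awk -F'\t' -v OFS='\t' '!/^#/ && $3 == "gene" {print $1, $4 - 1, $5, $9, ".", $7}' toy.gff > toy_genes.bed

cat toy_genes.bed
```

The attribute column is copied as-is into the BED name field. For cleaner names or other feature types, the dedicated tools above are the safer route.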
## Create coverage tracks
To get coverage tracks, you can run either multiBamSummary (if you have BAM files) or multiBigwigSummary (if you have bigwigs or bedGraphs). Use the same bin size as the -w value used in the deepStats commands (so here 6000). The multiXXSummary command writes a text file (`--outRawCounts`) with the average score per bin for each input file. Then you just need to cut that file down to chr, start, end and score, and name it after the track file referenced in circos.conf. Here is an example of what I did when making the CpG/CHH/CHG tracks:
```bash
conda activate deeptools
#get the coverage text file with multiBigwigSummary
multiBigwigSummary bins -p max -b *.bw -o bw_summary.npz --outRawCounts bw_summary.txt -bs 6000
#for each bigwig given to multiBigwigSummary, keep chr, start, end and its score column
#(columns follow the order of the *.bw files, here CHG, CHH, CpG); skip the header line and bins without data (nan)
awk 'NR > 1 && $4 != "nan" {print $1"\t"$2"\t"$3"\t"$4}' bw_summary.txt > mean_CHG.txt
awk 'NR > 1 && $5 != "nan" {print $1"\t"$2"\t"$3"\t"$5}' bw_summary.txt > mean_CHH.txt
awk 'NR > 1 && $6 != "nan" {print $1"\t"$2"\t"$3"\t"$6}' bw_summary.txt > mean_CpG.txt
conda deactivate
```
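The column splitting above depends on the order in which the bigwig files were passed. A possible alternative is to read the track names from the header line instead. The sketch below assumes the header has the form `#'chr' 'start' 'end' 'file1.bw' ...` (tab-separated, with each label in single quotes), so check it against your own bw_summary.txt first. The toy table and its values are made up.

```bash
# Toy summary table mimicking the assumed --outRawCounts layout (values are made up)
printf "#'chr'\t'start'\t'end'\t'CHG.bw'\t'CpG.bw'\n" > toy_summary.txt
printf 'chr1\t0\t6000\t12.5\t60.1\nchr1\t6000\t12000\tnan\t55.0\n' >> toy_summary.txt

# Write one mean_<label>.txt per score column, skipping bins without data (nan)
awk -F'\t' -v OFS='\t' -v q="'" '
    NR == 1 {
        # Build the output file name of each score column from its header label
        for (i = 4; i <= NF; i++) {
            name = $i
            gsub(q, "", name)        # strip the single quotes around the label
            sub(/\.bw$/, "", name)   # drop the .bw extension
            out[i] = "mean_" name ".txt"
        }
        next
    }
    {
        for (i = 4; i <= NF; i++)
            if ($i != "nan") print $1, $2, $3, $i > out[i]
    }' toy_summary.txt

cat mean_CHG.txt mean_CpG.txt
```

With files named CpG.bw, CHG.bw and CHH.bw this would give the same mean_CpG.txt, mean_CHG.txt and mean_CHH.txt names that circos.conf expects.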
## What does it look like
