# How to make a circos plot

###### tags: `visualization`

Gautier Richard gave me his script and pipeline to make a circos plot.

## Tool installation

Install everything with conda, each tool in its own environment:

- [Deeptools](https://deeptools.readthedocs.io/en/develop/content/installation.html)
- [DeepStats](https://github.com/gtrichard/deepStats)
- [Circos](https://anaconda.org/bioconda/circos)

## Data prep

Create a directory (`circos_species`) from which all the scripts will be launched. Inside it, create a folder named `data` and one named `etc`. We are going to populate this directory with a `prep.circos.sh` script (I broke the script down into several parts for this tutorial, but you can combine them into a single script). Once everything is done, just run the command `circos` in the `circos_species` folder and the circos plot will be made!

### Inside the "etc" folder, add the following files:

The `circos.conf` file tells Circos how to draw the plot. Here is an example with CpG, CHH and CHG bigwig files. Modify as needed.
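The folder layout described under Data prep can be set up as follows. This is a minimal sketch that assumes the project lives in the current directory and that the two config files of this section start out as empty placeholders; the `project` variable is only an illustrative name.

```sh
# Assumption: the project folder is created inside the current directory.
project=circos_species
mkdir -p "$project/data" "$project/etc"
# Empty placeholders, to be filled with the two configs shown in this section.
touch "$project/etc/circos.conf" "$project/etc/ticks.conf"
# Show the resulting layout.
find "$project" | sort
```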
circos.conf

```txt
karyotype = data/karyotype.txt
chromosomes_units = 1000000

<ideogram>
  <spacing>
    default = 0.005r
    <pairwise NW_022147617.1 NW_022145681.1>
      spacing = 15r
    </pairwise>
  </spacing>
  radius = 0.88r
  thickness = 15p
  fill = yes
  show_label = yes
  label_font = default
  label_radius = 1.09r
  label_center = yes
  label_size = 20p
  label_with_tag = yes
  label_parallel = yes
  label_case = upper
  fill_color = black
  stroke_thickness = 2
  stroke_color = black
</ideogram>

<backgrounds>
  show = data
  <background>
    color = vvlgrey
  </background>
</backgrounds>

<plots>
  type = histogram
  thickness = 2
  stroke_type = bin
  extend_bin = yes

  <plot>
    max_gap = 1u
    file = data/mean_CpG.txt
    color = dred
    fill_color = dred
    r1 = 0.98r
    r0 = 0.88r
    min = 0
    max = 65
  </plot>

  <plot>
    max_gap = 1u
    file = data/mean_CHG.txt
    color = red
    fill_color = red
    r1 = 0.87r
    r0 = 0.77r
    min = 0
    max = 65
    extend_bin = yes
  </plot>

  <plot>
    max_gap = 1u
    file = data/mean_CHH.txt
    color = lred
    fill_color = lred
    r1 = 0.76r
    r0 = 0.66r
    min = 0
    max = 65
    extend_bin = yes
  </plot>

  <plot>
    max_gap = 1u
    file = data/gene_density.bedGraph
    color = vdblue
    fill_color = vdblue
    r1 = 0.65r
    r0 = 0.55r
    min = 0
    max = 25
    extend_bin = yes
  </plot>

  <plot>
    max_gap = 1u
    file = data/repeats_density.bedGraph
    color = vdgreen
    fill_color = vdgreen
    r1 = 0.54r
    r0 = 0.44r
    min = 0
    max = 25
    extend_bin = yes
  </plot>

  <plot>
    max_gap = 1u
    file = data/gc_content.bedGraph
    color = vdgrey
    fill_color = vdgrey
    r1 = 0.43r
    r0 = 0.33r
    min = 0
    max = 1
    extend_bin = yes
  </plot>
</plots>

<image>
  angle_offset* = -79.5
  <<include etc/image.conf>>
</image>

<<include etc/colors_fonts_patterns.conf>>
<<include etc/housekeeping.conf>>
<<include etc/ticks.conf>>
```

ticks.conf

```txt
show_ticks = yes
show_tick_labels = yes

<ticks>
  <tick>
    chromosomes_display_default = yes
    radius = dims(ideogram,radius_outer)
    orientation = out
    color = black
    label_multiplier = 1e-6
    spacing = 5u
    size = 15p
    thickness = 4p
    show_label = yes
    label_size = 35p
    label_color = black
    label_offset = 5p
    format = %d
  </tick>

  <tick>
    chromosomes_display_default = yes
    radius = dims(ideogram,radius_outer)
    orientation = out
    color = dgrey
    spacing = 1u
    size = 8p
    thickness = 3p
    show_label = no
    format = %d
  </tick>
</ticks>
```

### Prepping of "genome structure data"

Let's prepare the karyotype, gene, TE and GC content (if necessary) data to show on the circos plot. The procedure here works with any GFF/BED tool. Run these command lines from the `circos_species` folder.

```sh
#karyotype with all chromosomes/scaffolds (every entry gets the chr22 colour)
#the ID is the first word of each FASTA header, so it matches the names in genome.fa.fai
#sequences whose name contains "Scaffold" are dropped
awk '/^>/ { if (id != "") print "chr -", id, id, 0, len, "chr22"; id = substr($1, 2); len = 0; next }
     { len += length($0) }
     END { if (id != "") print "chr -", id, id, 0, len, "chr22" }' PATH/genome.fa | sed '/Scaffold/d' > data/karyotype_full.txt

#keep only the 24 longest scaffolds/chr (column 6 is the length), then sort them by name
sort -k6,6rn data/karyotype_full.txt | head -n 24 | sort -k3,3 > data/karyotype.txt

#get chromosome sizes
awk '{print $1"\t"$2}' PATH/genome.fa.fai > chromsize

#move to the data folder
cd ./data

#compute GC coverage and gene/TE density with deepStats
#-w is the bin size used to average the data; adjust it to the size of the genome (try smaller or larger values)
#reps.bed has no PATH prefix, so it is read from the data folder
conda activate deepStats
dsComputeGCCoverage -i PATH/genome.fa -w 6000 -o gc_content
dsComputeBEDDensity --input PATH/genes.bed -c PATH/genome.fa.fai -w 6000 -o gene_density
dsComputeBEDDensity --input reps.bed -c PATH/genome.fa.fai -w 6000 -o repeats_density
conda deactivate
```

#### If you have GFFs instead of bed files

I used two different tools: [gencode_regions](https://github.com/saketkc/gencode_regions) to get gene.bed, and also promoters, exons, etc., because I needed them, and [agat](https://github.com/NBISweden/AGAT) to convert GFF to GTF or to BED.

## Create coverage tracks

To get coverage tracks, you can either run multiBamSummary (if you have BAM files) or multiBigwigSummary (if you have bigwigs or bedGraphs).
Use the same bin size (`-bs`) as the `-w` value used in the deepStats commands (so 6000 here). Once you have run multiBamSummary or multiBigwigSummary, you get a text file (`--outRawCounts`) with the average score of each input file in each bin. Then you just need to cut that file into chr, start, end and score columns, and name each output after the track file referenced in circos.conf. Here is an example of what I did when making the CpG/CHH/CHG tracks:

```sh
conda activate deeptools

#get the coverage text file with multiBigwigSummary
multiBigwigSummary bins -p max -b *.bw -o bw_summary.npz --outRawCounts bw_summary.txt -bs 6000

#cut the chr, start, end and score columns for each bigwig given to multiBigwigSummary
#score columns follow the order of the -b files (assumed here to be CHG, CHH, CpG; check the header line of bw_summary.txt)
#the header line (NR == 1) and bins with a nan score are skipped
awk -v OFS='\t' 'NR > 1 && $4 != "nan" {print $1, $2, $3, $4}' bw_summary.txt > mean_CHG.txt
awk -v OFS='\t' 'NR > 1 && $5 != "nan" {print $1, $2, $3, $5}' bw_summary.txt > mean_CHH.txt
awk -v OFS='\t' 'NR > 1 && $6 != "nan" {print $1, $2, $3, $6}' bw_summary.txt > mean_CpG.txt

conda deactivate
```

## What does it look like

![](https://i.imgur.com/IX0dbzd.png)
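Before launching `circos` from the `circos_species` folder to produce a plot like this one, it can help to check that every data file named in `etc/circos.conf` actually exists. The sketch below does that; `missing_tracks` is a hypothetical helper (not part of Circos), and it assumes one `key = value` entry per line with spaces around `=`.

```sh
# Hypothetical helper (not part of Circos): print every path referenced by a
# "file = ..." or "karyotype = ..." entry of a Circos config that is missing.
# Assumes one "key = value" entry per line, with spaces around "=".
missing_tracks() {
    awk '$1 == "file" || $1 == "karyotype" { print $3 }' "$1" |
        while read -r path; do
            [ -e "$path" ] || printf '%s\n' "$path"
        done
}

# Usage, from the circos_species folder:
#   missing_tracks etc/circos.conf
```

If the helper prints nothing, all the referenced files are in place; any path it prints still needs to be generated in the `data` folder.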