---
title: Usage of New Genome Browser
tags: OMM
description: Personal hackmd notes
image: https://partechshaker.com/wp-content/uploads/2018/10/logo_square.png
robots: noindex, nofollow
GA: UA-165598729-1
---

Usage of New Genome Browser
===

:::info
This document describes the usage of the NGB instance at http://3.123.8.123/catgenome/
:::

[TOC]

Costs
---

This server is hosted on an Amazon instance, and the running costs are currently covered by money we get from Amazon from another grant. If you want to keep such an instance running long term (>3 months), I would offer to write a short grant proposal for [this](https://aws.amazon.com/de/research-credits/). The computing cost is currently [~4 USD a day](https://aws.amazon.com/de/ec2/pricing/on-demand/); it might also work on a smaller server, which would be cheaper. Data is hosted on [S3](https://aws.amazon.com/de/s3/), which costs $0.023/GB per month; an average-sized `bam` file storing alignment information is 1.5 GB.

Usage
---

### Open the _New Genome Browser_ (NGB)

Point your browser to [http://3.123.8.123/catgenome/](http://3.123.8.123/catgenome/); you should see this page:

![](https://i.imgur.com/ZgH9nbU.png)

### Select a dataset

On the right you should see the available datasets (here just the dataset _1423_S34_). Click on the checkbox to enable this dataset. It should then look like this:

![](https://i.imgur.com/OESWfX3.png)

The main panel now shows some basic statistics. You can select a genome at the top, where "CHR:NONE" is written:

![](https://i.imgur.com/NHCMfUt.png)

Here, we select _A. muciniphila_:

![](https://i.imgur.com/fmyh3Rm.png)

Now you should see different tracks:

- On top there is the _position_ track, where you can zoom in by clicking on the location bar (where "2.7Mbp" is written).
- The heatmap shows the GC content and, if you zoom in, the nucleotides of the reference genome.
  ![](https://i.imgur.com/CwaVBDD.png)
- The next track shows where variants are, sadly only if you zoom in.
  ![](https://i.imgur.com/zlgZ3N4.png)
- The next track shows the reads, also only if you zoom in to a position.
  ![](https://i.imgur.com/dUKDWX1.png)
- The last track shows the coverage for the full chromosome.
  ![](https://i.imgur.com/XBdNTjF.png)

If you want to zoom in to a location, e.g. one with high coverage, you can double-click at this position a few times or select the region on the first track. Zooming in shows you the reads like here:

![](https://i.imgur.com/j0n48Jg.png)

The track where the reads are shown is longer than what you see here, so there is a scrollbar on the right of the track. To see more reads at once you can change the view under <kbd>Reads: view:</kbd> > <kbd>Collapsed</kbd>.

![](https://i.imgur.com/16fOjPA.png)

### Jump to variants

On the right side there is the "Variants" tab. Clicking on a variant makes the browser jump to its position.

![](https://i.imgur.com/tF6950z.png)

E.g. for the first variant we have

![](https://i.imgur.com/IYAFWG7.png)

When changing the view back to <kbd>Expanded</kbd> we also see which nucleotide is affected.

![](https://i.imgur.com/ohltC1L.png)

In this case there is an `a` on the reference and a `G` found on some reads in the alignment. If you want to save this position for later or share it with someone else, you can copy the URL from the browser; it should lead to this dataset/location:

![](https://i.imgur.com/jw9HKvI.png)
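Such a link can also be taken apart on the command line, e.g. to note the region in a lab book. The following is only a sketch: the layout `#/<reference>/<chromosome>/<start>/<end>?tracks=<json>` is inferred from links produced by this instance, not taken from the NGB documentation. The function name `parse_ngb_url` is hypothetical, and the input is shortened for illustration (`tracks=%5B%5D` is an empty track list).

```bash
#!/usr/bin/env bash
# Split an NGB share link into reference, chromosome, start and end.
# Assumed layout (inferred from this instance, not from the NGB docs):
#   <server>/#/<reference>/<chromosome>/<start>/<end>?tracks=<json>
parse_ngb_url() {
  local url=$1
  local fragment=${url#*#/}     # drop everything up to and including "#/"
  fragment=${fragment%%\?*}     # drop the query string (?tracks=...)
  local reference chromosome start end
  IFS=/ read -r reference chromosome start end <<< "$fragment"
  printf 'reference=%s chromosome=%s start=%s end=%s\n' \
    "$reference" "$chromosome" "$start" "$end"
}

# Shortened example link (hypothetical track list):
parse_ngb_url 'http://3.123.8.123/catgenome/#/omm-new/B_caecimuris/2099436/2122472?tracks=%5B%5D'
```

The `tracks` parameter itself is URL-encoded JSON describing the visible tracks and their settings; it is left untouched here.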
In this case the URL is quite long: [http://3.123.8.123/catgenome/#/omm/Akkermansia_muciniphila/2358443/...](http://3.123.8.123/catgenome/#/omm-new/B_caecimuris/2099436/2122472?tracks=%5B%7B%22b%22:%22omm-new%22,%22p%22:%221423_S34%22,%22f%22:%22REFERENCE%22,%22h%22:100,%22s%22:%7B%22rt%22:false,%22rsfs%22:true,%22rsrs%22:false%7D%7D,%7B%22b%22:%221423_S34_bwa_lofreq.vcf%22,%22p%22:%221423_S34%22,%22s%22:%7B%22v%22:%22Expanded%22%7D,%22h%22:85,%22n%22:%221423_S34_bwa_lofreq.vcf%22,%22f%22:%22VCF%22%7D,%7B%22b%22:%221423_S34_bwa_mapped_omm_sorted.bam%22,%22p%22:%221423_S34%22,%22s%22:%7B%22a%22:true,%22aa%22:true,%22c%22:%22noColor%22,%22c1%22:true,%22d%22:true,%22g1%22:%22default%22,%22i%22:true,%22m%22:true,%22r%22:0,%22s1%22:false,%22s2%22:true,%22s3%22:false,%22v1%22:false,%22cls%22:false,%22csm%22:%22default%22%7D,%22h%22:340,%22n%22:%221423_S34_bwa_mapped_omm_sorted.bam%22,%22f%22:%22BAM%22%7D,%7B%22b%22:%221423_S34_bwa_mapped_omm_sorted.coverage.bw%22,%22p%22:%221423_S34%22,%22s%22:%7B%22cls%22:false,%22csm%22:%22default%22%7D,%22h%22:56,%22n%22:%221423_S34_bwa_mapped_omm_sorted.coverage.bw%22,%22f%22:%22WIG%22%7D,%7B%22b%22:%22PROKKA_04282020.gff%22,%22p%22:%22%22,%22l%22:true,%22f%22:%22GENE%22,%22h%22:100,%22s%22:%7B%22g%22:%22collapsed%22%7D%7D%5D)

### Some interesting examples

==TODO==

Maintenance
---

This section describes how to connect to the AWS instance, e.g. for updates or modifications. To make it work you need the private key file (`pmuench.pem`); please write to `philipp.muench@helmholtz-hzi.de` to get it.

### Open an SSH connection

```bash!
ssh -i ~/Documents/pmuench.pem ubuntu@ec2-3-123-8-123.eu-central-1.compute.amazonaws.com
```

### Stop/Start server

Stop the server

```bash!
# list all running docker containers; the first column of the table is the ID needed to stop the instance
sudo docker ps
sudo docker stop $docker_id
```

Start the server

```bash!
# variants of the start command; pick the one matching the desired host port and image
sudo docker run -p 80:8080 -v /home/ubuntu/data:/ngs -d lifescience/ngb:latest
sudo docker run -p 8080:8080 -d -v /home/ubuntu/data:/ngs lifescience/ngb:latest
sudo docker run -p 8080:8080 -d -v /home/ubuntu/data:/ngs ngb:latest
```

### Add reference

```bash!
sudo docker exec -it c61729e998ab /bin/bash
# register reference
ngb reg_ref /ngs/genomes/joined_reference_curated.fasta -n omm-new -g /ngs/PROKKA_04282020.gff
```

### Add dataset

```bash!
# omm-new is already a registered reference
ngb reg_dataset omm-new s3
```

### Add samples

See the [command line reference](https://github.com/epam/NGB/blob/master/docs/md/cli/command-reference.md).

```bash!
# add bam file
# requires that a .bai (index) file is present, here at /ngs/1423_S34_bwa_mapped_omm_sorted.bam.bai
ngb add_dataset 1423_S34 /ngs/1423_S34_bwa_mapped_omm_sorted.bam
# add coverage
ngb add_dataset 1423_S34 /ngs/1423_S34_bwa_mapped_omm_sorted.coverage.bw
# add variants
ngb add_dataset 1423_S34 /ngs/1423_S34_bwa_lofreq.vcf
```

### Hosting datasets in S3

Instead of transferring the datasets to the AWS EC2 instance, it is cheaper to host them on [S3](https://aws.amazon.com/de/s3/) or even Backblaze B2. I am not sure how well Backblaze is supported (and the AWS grant money does not cover it), therefore I placed all `.bam` files and other large files in the S3 bucket at ==TODO==. These files can then be registered in a similar way:

```bash!
ngb add_dataset 1423_S34 s3:path-to.file.bam
```

#### Install AWS CLI tools and add files

On the machine where the datasets are currently located we use the [AWS CLI tools](https://aws.amazon.com/de/cli/) for uploading to the S3 bucket. See also [Configure CLI tools](https://docs.aws.amazon.com/de_de/cli/latest/userguide/cli-chap-configure.html).

```bash!
cd /home/pmuench/aws
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
./aws/install --install-dir /home/pmuench/local/bin # ignore errors
export PATH=/home/pmuench/local/bin/v2/2.0.12/bin/:$PATH # put it to the ~/.bashrc
aws --version
```

- Access key ID: `AKIARMMRAZQ45IQBLAOS`
- Secret access key: <kbd>Lastpass</kbd> > <kbd>AWS IAM pmuench</kbd>

```bash!
aws configure # enter credentials; region: eu-central-1
cd /net/sgi/oligomm_ab/OligoMM-report/processed
# note: the reg_file step in the next section refers to s3://ngs-oligomm/I_cc_S46/...; make sure bucket name and prefix match
aws s3 sync I_cc_S46 s3://ngs-oligom
```

### Register S3 files in NGB

Log in to the instance and the Docker container:

```bash!
ssh -i ~/Documents/pmuench.pem ubuntu@ec2-3-123-8-123.eu-central-1.compute.amazonaws.com
sudo docker ps
sudo docker exec -it e4c3cb1fc890 /bin/bash
```

Make the changes (S3 credentials) described [here](http://ngb.opensource.epam.com/distr/dev/latest/docs/installation/standalone.html):

```bash
export AWS_ACCESS_KEY_ID=AKIARMMRAZQ45IQBLAOS
export AWS_SECRET_ACCESS_KEY=xy
export AWS_DEFAULT_REGION=eu-central-1
```

See the [Documentation](http://ngb.opensource.epam.com/distr/dev/latest/docs/cli/command-reference.html).

```bash!
ngb set_srv http://3.123.8.123/catgenome/
ngb reg_dataset omm-new I_cc_S46
ngb reg_file omm-new s3://ngs-oligomm/I_cc_S46/I_cc_S46_bowtie2_lofreq.vcf
```

### Create an AWS instance and install NGB

#### Installation based on Docker

1. Create an EC2 instance with the key pair `pmuench.pem`; the permissions of the key file must be set to `600` with `chmod` (DNS: `ec2-3-123-8-123.eu-central-1.compute.amazonaws.com`).
2. Change the *Security Group* to allow incoming connections on port 80.
3. Install dependencies:
   ```bash
   sudo apt update
   sudo apt install openjdk-8-jdk
   sudo update-alternatives --config java # and select java 8
   ```
4. Install NGB as described [here](https://github.com/epam/NGB).
5. Log in via SSH as described above, using the public DNS of your AWS instance, which is shown on the EC2 panel in AWS.

Test http://3.123.8.123/catgenome/

```bash
ngb set_srv http://3.123.8.123:8080/catgenome/
```

Links
---

- **[NGB Github page](https://github.com/epam/NGB)**
- **[NGB further Documentation](http://ngb.opensource.epam.com/distr/latest/docs/user-guide/tracks-bam/index.html)**

Open points
---

- [ ] Use `.gtf` files as reference, since they seem to provide [more information](http://ngb.opensource.epam.com/distr/latest/docs/user-guide/annotations/index.html)?
- [ ] Automated upload of all `.bam` files to S3 from the grid using [AWS CLI tools](https://aws.amazon.com/de/cli/)
- [ ] Automate the `ngb add_dataset` steps for S3 files?
- [ ] Check how we can add custom tracks, e.g. to visualize ref AF and the location of phages. Should work like `.bw` files, e.g. somehow transform them to a binary object?

Issue that NGB is not in line with other genome browsers (fixed)
---

It seemed that the alignment was not shown correctly in NGB: e.g. at the start of the _A. muciniphila_ genome there are 17k nt without any alignment in NGB, but there _is_ an alignment when inspecting the bam file with `samtools tview -p "10:1000" 1423_S34_bwa_mapped_omm_sorted.bam joined_reference_curated.fasta -d c`

![](https://i.imgur.com/1TeSgUP.png)

and the same holds for IGV:

![](https://i.imgur.com/mBfzUxh.png)

Solution: positions get filtered by NGB. After disabling <kbd>Show soft-clipped bases</kbd> it works, and all viewers show the same alignment.

![](https://i.imgur.com/9VZrYxB.png)
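To get a feeling for how many reads in such a region carry soft clips at all, the CIGAR strings can be counted directly. This is a minimal sketch, not part of the original troubleshooting: the helper name `count_softclipped` is hypothetical, and the region `10:1-17000` in the usage comment is only an illustration derived from the contig name in the `tview` call and the 17k nt mentioned above.

```bash
#!/usr/bin/env bash
# Count reads whose CIGAR string (SAM column 6) contains a soft clip ("S").
# Reads SAM records on stdin; header lines starting with "@" are skipped.
count_softclipped() {
  awk -F'\t' '
    /^@/     { next }        # skip SAM header lines
             { total++ }     # every remaining line is one alignment record
    $6 ~ /S/ { clipped++ }   # CIGAR contains a soft-clip operation
    END      { printf "%d of %d reads soft-clipped\n", clipped, total }
  '
}

# Usage with samtools (needs the .bai index; region is illustrative):
# samtools view 1423_S34_bwa_mapped_omm_sorted.bam "10:1-17000" | count_softclipped
```

A high share of soft-clipped reads in a region would be consistent with the display difference described above, but the count alone does not prove that this setting is the cause.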