SRA Toolkit

tags: bioinformatics

Sequence Read Archive (SRA) data, available through multiple cloud providers and NCBI servers, is the largest publicly available repository of high-throughput sequencing data.

Using the SRA Toolkit to convert .sra files into other formats

Installation

cd ~/tools
wget --output-document sratoolkit.tar.gz http://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/current/sratoolkit.current-ubuntu64.tar.gz
tar -zxvf sratoolkit.tar.gz 
rm sratoolkit.tar.gz
cd sratoolkit.*-ubuntu64/bin

env

echo 'export PATH=$PATH:/home/hunglin/tools/sratoolkit.2.10.5-ubuntu64/bin' >> ~/.bashrc
source ~/.bashrc
which fastq-dump
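
To check that the PATH change took effect for every tool used in these notes, not just fastq-dump, something like the sketch below could be used. The helper name missing_tools is made up for this note and is not part of the SRA Toolkit; it only relies on the shell's own command -v.

```shell
# missing_tools is a hypothetical helper, not part of the SRA Toolkit.
# It prints the name of each given command that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Commands used in these notes; no output means all of them were found.
missing_tools prefetch fastq-dump fasterq-dump vdb-dump vdb-config
```

If a name is printed, the bin directory is probably not on PATH in the current shell yet.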

test

vdb-config --interactive # under Cache, raise the RAM setting by 1 MB
fastq-dump --stdout SRR8839822 | head -n 10

Download SRA

prefetch [SRA accession] [SRA 2]

Multiple SRA files can also be downloaded with a single command.
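
When the list of runs gets long, it may be easier to keep the accessions in a text file and hand them all to one prefetch call. The sketch below is one way to do that. The helper name prefetch_list, the DRY_RUN switch, and the accession SRR0000000 are all made up for illustration; only the "prefetch acc1 acc2 ..." form comes from the notes above.

```shell
# prefetch_list is a hypothetical helper, not part of the SRA Toolkit.
# It reads accessions from a text file (one per line; blank lines and
# '#' comment lines are skipped) and passes them to a single prefetch call.
# With DRY_RUN=1 it only prints the command instead of running it.
prefetch_list() {
    # Word-split the filtered file contents into positional parameters.
    set -- $(grep -Ev '^[[:space:]]*(#|$)' "$1")
    if [ "$#" -eq 0 ]; then
        echo "no accessions found" >&2
        return 1
    fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo prefetch "$@"
    else
        prefetch "$@"
    fi
}

# Demo with a throwaway list; SRR0000000 is a placeholder, not a real run.
list=$(mktemp)
printf '%s\n' '# runs to fetch' SRR8839822 '' SRR0000000 > "$list"
DRY_RUN=1 prefetch_list "$list"   # prints: prefetch SRR8839822 SRR0000000
rm -f "$list"
```

Dropping DRY_RUN=1 runs the real download, which needs prefetch on PATH and network access.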

SRA to FASTQ

fasterq-dump SRR8839822 -o FSIS11811834 -t /dev/shm/ -e 6 -p 

join   :|-------------------------------------------------- 100%   
concat :|-------------------------------------------------- 100%   
spots read      : 565,950
reads read      : 1,131,900
reads written   : 1,131,900
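
In the summary above, reads written (1,131,900) is exactly twice spots read (565,950), which is consistent with two reads per spot. The counts can be cross-checked against the FASTQ files on disk. The sketch below assumes plain four-line FASTQ records; the helper name count_fastq_reads and the demo file are made up for illustration, and the real output file names depend on the -o value and on how the reads are split.

```shell
# count_fastq_reads is a hypothetical helper, not part of the SRA Toolkit.
# It assumes four-line FASTQ records (no wrapped sequence lines) and
# prints the number of lines divided by 4.
count_fastq_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Demo on a tiny made-up FASTQ file with two records.
fq=$(mktemp)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' > "$fq"
count_fastq_reads "$fq"   # prints: 2
rm -f "$fq"
```

Summed over all output files of a run, the result should match the "reads written" line.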

-t scratch (temporary file) location

It helps speed if the output path and the scratch path are on different file systems. For instance, it is a good idea to point the temporary directory to an SSD if one is available, or to a RAM disk like /dev/shm if enough RAM is available.

-o output file name
-e number of threads (cores) to use
-p show a progress bar
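
The advice about the scratch location can be turned into a small check before calling fasterq-dump. The sketch below picks /dev/shm only when it exists, is writable, and has at least a given amount of free space, and otherwise falls back to TMPDIR or /tmp. The helper name pick_scratch and the 1 GiB threshold are assumptions for illustration; how much scratch space a run really needs depends on its size.

```shell
# pick_scratch is a hypothetical helper, not part of the SRA Toolkit.
# $1 = minimum free space in KB required on /dev/shm (default 0).
# Prints /dev/shm if it is usable, otherwise ${TMPDIR:-/tmp}.
pick_scratch() {
    need_kb=${1:-0}
    if [ -d /dev/shm ] && [ -w /dev/shm ]; then
        # Column 4 of POSIX-format df output is the available space in KB.
        free_kb=$(df -Pk /dev/shm | awk 'NR==2 {print $4}')
        if [ "${free_kb:-0}" -ge "$need_kb" ]; then
            echo /dev/shm
            return 0
        fi
    fi
    echo "${TMPDIR:-/tmp}"
}

# Print (not run) the command from above with the chosen scratch path.
# 1048576 KB (1 GiB) is an arbitrary example threshold.
scratch=$(pick_scratch 1048576)
echo "fasterq-dump SRR8839822 -o FSIS11811834 -t $scratch -e 6 -p"
```

The command is only echoed here so the choice can be inspected first; removing the echo and the quotes would run it.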

Look up run info

vdb-dump --info SRR8839822
acc    : SRR8839822
type   : Database
platf  : SRA_PLATFORM_ILLUMINA
SEQ    : 565,950
SCHEMA : NCBI:align:db:alignment_sorted#1.3
TIME   : 0x000000005ca3f081 (04/03/2019 07:30)
FMT    : FASTQ
FMTVER : 2.9.1
LDR    : latf-load.2.9.1
LDRVER : 2.9.1
LDRDATE: Jun 15 2018 (6/15/2018 0:0)
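
For scripting, a single field can be pulled out of this "KEY : value" listing. The sketch below is one way to do it with awk; the helper name info_field is made up for this note, and it assumes the output keeps the layout shown above (key, optional spaces, colon, value).

```shell
# info_field is a hypothetical helper, not part of the SRA Toolkit.
# It reads "KEY : value" lines on stdin and prints the value of key $1.
info_field() {
    awk -v key="$1" '
        {
            k = $0
            sub(/[ \t]*:.*/, "", k)      # text before the first colon, trimmed
        }
        k == key {
            sub(/^[^:]*:[ \t]*/, "")     # drop "KEY : " and keep the value
            print
            exit
        }
    '
}

# Demo on two lines copied from the listing above; commas are stripped
# so the spot count can be used as a number.
printf '%s\n' 'acc    : SRR8839822' 'SEQ    : 565,950' | info_field SEQ | tr -d ','   # prints: 565950

# With the toolkit installed, the same filter would be used as:
#   vdb-dump --info SRR8839822 | info_field SEQ
```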

HowTo: fasterq dump
How to use NCBI SRA Toolkit effectively?