The root RefSeq release location is here: https://ftp.ncbi.nih.gov/refseq/release/
This page covers downloading the genomic fasta files from the RefSeq "complete" release directory.
The script used can be downloaded with the following (and is presented below for quick reference):
curl -L -o refseq-complete-genome-dl.sh https://figshare.com/ndownloader/files/35063551
The script gets the filenames ending in "genomic.fna.gz" available at this html page: https://ftp.ncbi.nih.gov/refseq/release/complete/
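It then downloads them in parallel with xargs (10 at a time by default, set with the -P option). Generated files will have today's date in their names (formatted by `date +%d-%B-%Y`, such as 05-March-2022), as will the directory holding the genomes at the end.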
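The filename-parsing step can be tried offline before running the full download. The sketch below is only an illustration: the three listing lines are a made-up, trimmed-down stand-in for the real directory page (the actual HTML may be laid out differently), and `sample_html` and `parsed_filenames` are names introduced here. It uses the same grep and cut calls as the script.

```shell
# Hypothetical stand-in for the directory-listing HTML (not the real page contents)
sample_html=$(mktemp)
cat > "${sample_html}" << 'EOF'
<a href="complete.1.1.genomic.fna.gz">complete.1.1.genomic.fna.gz</a>  2024-01-01 00:00  100M
<a href="complete.1.genomic.gbff.gz">complete.1.genomic.gbff.gz</a>  2024-01-01 00:00  200M
<a href="complete.2.1.genomic.fna.gz">complete.2.1.genomic.fna.gz</a>  2024-01-01 00:00  100M
EOF

# keep only lines mentioning genomic.fna.gz, then take the text
# between the first pair of double quotes (the href value)
parsed_filenames=$(grep "genomic.fna.gz" "${sample_html}" | cut -f 2 -d '"')

printf "%s\n" "${parsed_filenames}"
rm "${sample_html}"
```

With this sample input, only the two names ending in genomic.fna.gz are printed; the gbff line is filtered out by grep.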
Run it with no arguments (passing any argument prints the help text and exits):
bash refseq-complete-genome-dl.sh
#!/usr/bin/env bash
# Contact: Mike Lee (Mike.Lee@nasa.gov; github.com/AstrobioMike)
if [ "$#" != 0 ]; then
    printf "\n Helper script to download all refseq complete genomes as of whatever today is.\n"
    printf " See script for details. There are currently no guardrails or safety nets if a\n"
    printf " download fails. So check the starting file count vs the total downloaded at end.\n\n"
    printf " \tUsage:\n\t bash refseq-complete-genome-dl.sh\n\n"
    printf " \tContact:\n\t Mike Lee (Mike.Lee@nasa.gov; github.com/AstrobioMike)\n\n"
    exit
fi
# we can pull through https or ftp
# protocol="ftp"
protocol="https"
refseq_complete_base_link="${protocol}://ftp.ncbi.nlm.nih.gov/refseq/release/complete"
curr_date_marker=$(date +%d-%B-%Y)
refseq_html_file="refseq-${curr_date_marker}.html"
refseq_filenames_file="refseq-${curr_date_marker}-genome-files.txt"
genomes_dir="refseq-${curr_date_marker}-complete-genomes"
mkdir -p ${genomes_dir}
# downloading html page (using this to get all the files we want to download)
curl -L -s -o ${refseq_html_file} ${refseq_complete_base_link}
# parsing out genomic.fna.gz filenames (which are also their link suffixes)
grep "genomic.fna.gz" ${refseq_html_file} | cut -f 2 -d '"' > ${refseq_filenames_file}
# this is messy so that it works on darwin (mac) too
num_files=$(wc -l ${refseq_filenames_file} | sed 's/^ *//' | tr -s " " "\t" | cut -f 1)
printf "\n We are beginning the download of ${num_files} files now...\n"
printf " See you in a bit :)\n\n"
# downloading in parallel with xargs (num run in parallel is set with -P option)
xargs -I % -P 10 curl -L -s -O "${refseq_complete_base_link}/%" < ${refseq_filenames_file}
mv *genomic.fna.gz ${genomes_dir}
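Since the script has no safety nets if a download fails, it can help to compare the filenames file against what actually landed in the genomes directory. The following is a minimal sketch and not part of the original script: `check_downloads` is a hypothetical helper name, and the `gzip -t` integrity test is an extra check added here to catch truncated files.

```shell
# Hypothetical helper: reports listed files that are missing, empty,
# or fail a gzip integrity test. Returns non-zero if any problem is found.
check_downloads() {
    local filenames_file="$1"   # e.g. refseq-<date>-genome-files.txt
    local genomes_dir="$2"      # e.g. refseq-<date>-complete-genomes
    local total=0
    local problems=0
    local filename

    while IFS= read -r filename; do
        # skip blank lines in the list
        [ -z "${filename}" ] && continue
        total=$((total + 1))
        # -s: file exists and is not empty; gzip -t: archive is intact
        if [ ! -s "${genomes_dir}/${filename}" ] || ! gzip -t "${genomes_dir}/${filename}" 2> /dev/null; then
            printf "problem with: %s\n" "${filename}"
            problems=$((problems + 1))
        fi
    done < "${filenames_file}"

    printf "%s of %s file(s) missing, empty, or failing gzip -t\n" "${problems}" "${total}"
    [ "${problems}" -eq 0 ]
}
```

It would be called with the two date-stamped names the script generated, for example `check_downloads refseq-05-March-2022-genome-files.txt refseq-05-March-2022-complete-genomes` (the date shown is illustrative). Any filenames it reports can then be fetched again individually with curl.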