Using genelab-utils to download GLDS data
This page demonstrates using programs in the genelab-utils package to programmatically download specific files from a specific OSD or GLDS ID.
Contact Mike Lee at Mike.Lee@nasa.gov if you have trouble.
NOTE
The "OSD-170-file-info.tsv" file produced by the OSD-170 example on this page holds URLs to all files in the dataset.
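To pull just the URLs out of that file, a minimal sketch follows. It assumes only that the URLs appear as plain http(s) strings somewhere in the TSV; the column layout is not described on this page, so the helper `list_urls` (a hypothetical name, not part of genelab-utils) scans every field rather than selecting a column.

```shell
# list_urls is a hypothetical helper, not part of genelab-utils.
# It prints every http(s) URL found in the given file, one per line,
# without assuming which column holds them.
list_urls() {
    grep -oE 'https?://[^[:space:]]+' "$1"
}

# Only run against the file if it already exists in the working directory
if [ -f "OSD-170-file-info.tsv" ]; then
    list_urls "OSD-170-file-info.tsv"
fi
```

The output can be piped to other tools, e.g. `list_urls OSD-170-file-info.tsv | wc -l` to count the URLs found.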
Installing genelab-utils if needed
The genelab-utils package should be installed with conda/mamba. If you are not familiar with conda or mamba, it is worth reading a short introduction to each first. It is definitely worth using mamba if you use conda at all.
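A typical installation might look like the sketch below. The environment name and the channel list (in particular the `astrobiomike` channel) are assumptions that are not stated on this page, so check them against the package's current installation instructions before relying on them.

```shell
# Assumed environment name and channels; verify against the current install instructions
env_name="genelab-utils"
channels="-c conda-forge -c bioconda -c defaults -c astrobiomike"

if command -v mamba >/dev/null 2>&1; then
    # ${channels} is intentionally unquoted so it splits into separate arguments
    mamba create -n "${env_name}" ${channels} genelab-utils
else
    echo "mamba not found; install it first, or substitute conda for mamba" >&2
fi
```

Once the environment exists, it would be activated with `conda activate genelab-utils` before running any of the programs.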
For example, suppose we wanted the raw fastq files from OSD-170.
First, we can add the --print-only flag to list the files that would be downloaded.
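A command along these lines would list the matching files. The `-g`, `-p`, and `--print-only` flags all appear on this page, but the pattern `raw.fastq` is an assumed value chosen for illustration; adjust it to whatever substring the raw fastq filenames in the dataset actually share.

```shell
accession="OSD-170"
pattern="raw.fastq"   # assumed pattern, not confirmed against the dataset's filenames

if command -v GL-download-GLDS-data >/dev/null 2>&1; then
    # List the files that match, without downloading anything
    GL-download-GLDS-data -g "${accession}" -p "${pattern}" --print-only
else
    echo "GL-download-GLDS-data not found; activate the genelab-utils environment first" >&2
fi
```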
Running it without the --print-only flag asks us to confirm that we want to download them.
We can add the -f flag to skip the confirmation prompt, which allows the program to be used non-interactively.
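For unattended use, such as inside a batch job, the call might look like the following sketch. As before, the pattern `raw.fastq` is an assumption for illustration; only the flags themselves come from this page.

```shell
accession="OSD-170"
pattern="raw.fastq"   # assumed pattern; confirm it with --print-only first

if command -v GL-download-GLDS-data >/dev/null 2>&1; then
    # -f skips the confirmation prompt, so the download starts immediately
    GL-download-GLDS-data -g "${accession}" -p "${pattern}" -f
else
    echo "GL-download-GLDS-data not found; activate the genelab-utils environment first" >&2
fi
```

Because nothing asks for confirmation here, it is worth checking the file list with --print-only before adding -f.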
Multiple patterns can be given to the -p argument, separated by commas. See the help menu for a note on the 'additive' vs 'exclusive' flags for this argument.
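A comma-separated pattern list could be passed as in this sketch. Both patterns are hypothetical values; whether a file must match all of them or any of them depends on the additive/exclusive setting described in the help menu, which is not reproduced here.

```shell
accession="OSD-170"
patterns="raw.fastq,multiqc"   # hypothetical patterns, comma-separated with no spaces

if command -v GL-download-GLDS-data >/dev/null 2>&1; then
    # --print-only keeps this a dry run while experimenting with patterns
    GL-download-GLDS-data -g "${accession}" -p "${patterns}" --print-only
else
    echo "GL-download-GLDS-data not found; activate the genelab-utils environment first" >&2
fi
```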
See the help menu with GL-download-GLDS-data -h for information on things like controlling how many downloads run in parallel and whether to use additive or exclusive filtering when providing multiple patterns to search for in filenames. The help menu referred to on this page is from genelab-utils version 1.2.11.
As noted at the end of the help menu, some confusion may arise due to recent changes in the OSD/GeneLab repository. A GLDS ID and an OSD ID may not match up; e.g., 'OSD-561' (https://osdr.nasa.gov/bio/repo/data/studies/OSD-561) holds 'GLDS-556' (shown at the very top of that page, just under the image next to the title).
Moving forward, it is recommended to search by OSD ID (which you can look up from a given GLDS ID here: https://osdr.nasa.gov/bio/repo/search), as searching by OSD will find all the associated GLDS files no matter what their GLDS IDs are. E.g., GL-download-GLDS-data -g OSD-561 --print-only