---
tags: GeneLab
title: Initial GL reference page template
---

# GeneLab-hosted reference databases

This page hosts pre-built reference databases that are utilized by [GeneLab's data processing](https://github.com/nasa/GeneLab_Data_Processing).

With any questions or issues, please contact: Michael D. Lee (Mike.Lee@nasa.gov)

---

## Organism-specific functional annotations

Files holding the tab-delimited annotation tables end with the extension `*.tsv`. Build information is included in the corresponding `*.txt` files.

- *Arabidopsis thaliana*
    - [Arabidopsis_thaliana.TAIR10.48-GL-annotations.tsv](https://figshare.com/ndownloader/files/35939648)
    - [Arabidopsis_thaliana.TAIR10.48-GL-build-info.txt](https://figshare.com/ndownloader/files/35939654)
- *Drosophila melanogaster*
    - [Drosophila_melanogaster.BDGP6.28.101-GL-annotations.tsv](https://figshare.com/ndownloader/files/35939663)
    - [Drosophila_melanogaster.BDGP6.28.101-GL-build-info.txt](https://figshare.com/ndownloader/files/35939660)
- *Homo sapiens*
    - [Homo_sapiens.GRCh38.101-GL-annotations.tsv](https://figshare.com/ndownloader/files/35939645)
    - [Homo_sapiens.GRCh38.101-GL-build-info.txt](https://figshare.com/ndownloader/files/35939651)
- *Mus musculus*
    - [Mus_musculus.GRCm38.101-GL-annotations.tsv](https://figshare.com/ndownloader/files/35939642)
    - [Mus_musculus.GRCm38.101-GL-build-info.txt](https://figshare.com/ndownloader/files/35939657)

---

## Pre-built Kraken2 databases

### Removing human reads

All metagenomics datasets need to be screened against a reference human genome prior to being released on GeneLab. Our workflow for doing that with short reads is available [here](https://github.com/nasa/GeneLab_Data_Processing/tree/master/Metagenomics/Remove_human_reads_from_raw_data), with information on building the database [here](https://github.com/nasa/GeneLab_Data_Processing/blob/master/Metagenomics/Remove_human_reads_from_raw_data/reference-database-info.md).

- [kraken2-human-db.tar.gz](https://ndownloader.figshare.com/files/25627058)

### Estimating host reads

When there is a known host organism for metagenomic data, we screen the reads against a kraken2 reference database for that organism just to get an estimate of the proportion of reads potentially originating from the host (these are not removed from what is released on GeneLab). Our workflow for that with short reads is available [here](https://github.com/nasa/GeneLab_Data_Processing/tree/master/Metagenomics/Estimate_host_reads_in_raw_data), with information on building the available databases [here](https://github.com/nasa/GeneLab_Data_Processing/blob/master/Metagenomics/Estimate_host_reads_in_raw_data/reference-database-info.md).

- [kraken2-mouse-db.tar.gz](https://figshare.com/ndownloader/files/33900572)
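
As a minimal sketch of the host-read estimate described above: assuming reads were classified with Kraken2 against one of these single-organism databases and a standard six-column report was written (Kraken2's `--report` option), the proportion of reads potentially originating from the host can be read from the `unclassified` and `root` rows of that report. The function name `estimate_classified_fraction` and the example numbers are illustrative assumptions and are not taken from GeneLab's workflow.

```python
import csv


def estimate_classified_fraction(report_lines):
    """Summarize a standard six-column Kraken2 report.

    Columns assumed: percent, clade read count, direct read count,
    rank code, NCBI taxid, name. Returns a tuple of
    (classified_reads, unclassified_reads, classified_fraction).
    """
    classified = 0
    unclassified = 0
    for row in csv.reader(report_lines, delimiter="\t"):
        if len(row) < 6:
            continue  # skip blank or malformed lines
        clade_reads = int(row[1])
        rank_code = row[3].strip()
        taxid = row[4].strip()
        if rank_code == "U" and taxid == "0":
            # the "unclassified" row holds reads with no database hit
            unclassified = clade_reads
        elif taxid == "1":
            # the "root" row's clade count covers every classified read
            classified = clade_reads
    total = classified + unclassified
    fraction = classified / total if total else 0.0
    return classified, unclassified, fraction


# Made-up, simplified report for illustration only (real reports also
# list the intermediate taxonomic ranks between root and species).
EXAMPLE_REPORT = "\n".join([
    " 25.00\t250\t250\tU\t0\tunclassified",
    " 75.00\t750\t0\tR\t1\troot",
    " 75.00\t750\t750\tS\t10090\t  Mus musculus",
])

classified, unclassified, fraction = estimate_classified_fraction(
    EXAMPLE_REPORT.splitlines()
)
print(f"{classified} of {classified + unclassified} reads classified ({fraction:.1%})")
# prints "750 of 1000 reads classified (75.0%)"
```

Because each database here is built from a single organism, every classified read counts toward that organism, so the classified fraction serves as the estimate. Reports written with Kraken2's `--report-minimizer-data` option carry extra columns and would need different column indices.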