---
tags: GeneLab
title: Generating fasta for simulating data for human-read removal
---
# Generating fasta for simulating data for human-read removal
[toc]
## Download results
This fasta holds 3 bugs (*Pseudomonas aeruginosa*, *Staphylococcus epidermidis*, and *Micrococcus luteus*) plus about 5 MB of the human genome. It was created as shown below, and the resulting fasta can be downloaded with the following:
```bash
curl -L -o mock-bug-and-human.fa.gz https://figshare.com/ndownloader/files/31454380
```
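To sanity-check the downloaded file, a small sketch like the following can total the sequence length per source genome. It assumes the renamed headers take the form `<prefix>_<number>` (e.g., `human-GCF_000001405.39_1`), which is an assumption about `bit-rename-fasta-headers` output, and `summarize_sources` is a hypothetical helper name.

```bash
# Hypothetical helper: sum sequence length per source genome in a fasta.
# Assumption: headers look like ">bug-Pseudomonas-aeruginosa-GCF_000006765.1_1",
# i.e., a source label followed by "_<number>"; the "_<number>" part is stripped.
summarize_sources() {
  local fasta="$1"
  # gzip -dcf decompresses .gz input and passes uncompressed input through unchanged
  gzip -dcf "$fasta" | awk '
    /^>/ {
      src = substr($1, 2)        # drop the leading ">"
      sub(/_[0-9]+$/, "", src)   # drop the trailing "_<number>" suffix
      next
    }
    { len[src] += length($0) }   # add this sequence line to its source total
    END { for (s in len) printf "%s\t%d\n", s, len[s] }
  ' | sort
}
```

For example, `summarize_sources mock-bug-and-human.fa.gz` should print one tab-separated line per genome, with the human total near the ~5 MB noted above.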
## Making
### Env
The `bit` package is used for downloading the genomes by accession and for renaming the fasta headers:
```bash
conda create -n bit -c conda-forge -c bioconda -c defaults -c astrobiomike bit
conda activate bit
```
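Before moving on, it can help to confirm that the two `bit` programs used below are on the `PATH` of the active environment. This is a minimal sketch; `check_tools` is a hypothetical helper that only relies on the shell builtin `command -v`.

```bash
# Hypothetical helper: report any named program not found on PATH.
# Returns non-zero if at least one is missing.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" > /dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `check_tools bit-dl-ncbi-assemblies bit-rename-fasta-headers` prints nothing and returns 0 when both are available.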
### Downloading genomes
Making a list of the target accessions (3 randomly picked microbes plus the human reference genome):
```bash
printf "GCF_000006765.1\nGCF_006094375.1\nGCF_019890915.1\nGCF_000001405.39\n" > target-accs.txt
```
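A typo in an accession only surfaces once the download fails, so a quick format check on `target-accs.txt` can catch it earlier. This sketch assumes every entry is a RefSeq or GenBank assembly accession (`GCF_`/`GCA_`, nine digits, a dot, and a version number); `validate_accessions` is a hypothetical name.

```bash
# Hypothetical helper: flag lines that don't look like assembly accessions
# of the form GCF_/GCA_ + 9 digits + "." + version (blank lines are flagged too).
validate_accessions() {
  local file="$1" bad
  # "|| true" keeps a no-match grep (the good case) from tripping "set -e"
  bad=$(grep -Ev '^GC[AF]_[0-9]{9}\.[0-9]+$' "$file" || true)
  if [ -n "$bad" ]; then
    printf 'unexpected accession format:\n%s\n' "$bad" >&2
    return 1
  fi
}
```

Usage: `validate_accessions target-accs.txt` returns 0 for the four accessions above.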
Downloading:
```bash
bit-dl-ncbi-assemblies -w target-accs.txt -f fasta
```
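The later steps expect one `<accession>.fa.gz` per entry in the current directory, which matches the file names they use. A sketch like this can confirm that each expected file exists and is non-empty; the naming pattern is an assumption taken from the commands below, and `check_downloads` is a hypothetical helper.

```bash
# Hypothetical helper: confirm "<accession>.fa.gz" exists and is non-empty
# for every accession listed in the given file (assumed output naming).
check_downloads() {
  local acc missing=0
  while IFS= read -r acc; do
    if [ -z "$acc" ]; then continue; fi   # ignore blank lines
    if [ ! -s "${acc}.fa.gz" ]; then
      echo "not found or empty: ${acc}.fa.gz" >&2
      missing=1
    fi
  done < "$1"
  return "$missing"
}
```

Usage: `check_downloads target-accs.txt` returns 0 when all four genomes came down.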
### Subsetting human genome
Taking just over 5 MB (the first 70,000 lines) and removing every line made up entirely of Ns, such as the run at the very start of the sequence:
```bash
zcat GCF_000001405.39.fa.gz | head -n 70000 | grep -v "^N*$" > GCF_000001405.39-subset.fa
rm GCF_000001405.39.fa.gz
```
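`head -n 70000` sets the subset size by line count, including the all-N lines that are filtered out afterwards. If a target number of retained characters is preferred, an `awk` sketch like the one below is a hypothetical alternative (not what was used here): it keeps header lines, skips all-N and blank lines just as `grep -v "^N*$"` does, and stops once the kept sequence lines reach the requested length.

```bash
# Hypothetical alternative to "head -n 70000 | grep -v '^N*$'":
# keep sequence lines until at least $1 characters have been kept,
# skipping lines that are entirely N (or empty). Headers pass through.
subset_fasta_bases() {
  local target="$1"
  awk -v target="$target" '
    /^>/   { print; next }         # keep header lines
    /^N*$/ { next }                # skip all-N and empty lines
    {
      print
      kept += length($0)
      if (kept >= target) exit     # stop once the budget is met
    }
  '
}
```

Usage: `zcat GCF_000001405.39.fa.gz | subset_fasta_bases 5000000 > GCF_000001405.39-subset.fa`. As with `head`, stopping early can make the upstream `zcat` report a broken pipe, which is harmless here.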
### Renaming fasta headers
```bash
gunzip *.gz
bit-rename-fasta-headers -i GCF_000006765.1.fa -w bug-Pseudomonas-aeruginosa-GCF_000006765.1 -o bug-GCF_000006765.1-for-cat.fa
bit-rename-fasta-headers -i GCF_006094375.1.fa -w bug-Staphylococcus-epidermidis-GCF_006094375.1 -o bug-GCF_006094375.1-for-cat.fa
bit-rename-fasta-headers -i GCF_019890915.1.fa -w bug-Micrococcus-luteus-GCF_019890915.1 -o bug-GCF_019890915.1-for-cat.fa
bit-rename-fasta-headers -i GCF_000001405.39-subset.fa -w human-GCF_000001405.39 -o human-GCF_000001405.39-for-cat.fa
```
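The renaming gives every sequence a `bug-` or `human-` prefix so its source stays identifiable after the files are concatenated. The `awk` sketch below illustrates the assumed behavior, with each header replaced by `<prefix>_<n>` where `n` counts records from 1. It is a hypothetical stand-in, not `bit`'s implementation, so check `bit-rename-fasta-headers -h` for the exact header format.

```bash
# Hypothetical stand-in for bit-rename-fasta-headers, assuming each header
# becomes "<prefix>_<n>" with n counting records from 1.
# Sequence lines pass through unchanged.
rename_headers() {
  local prefix="$1"
  awk -v prefix="$prefix" '
    /^>/ { n++; print ">" prefix "_" n; next }
    { print }
  '
}
```

Usage: `rename_headers human-GCF_000001405.39 < GCF_000001405.39-subset.fa > human-GCF_000001405.39-for-cat.fa`.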
### Combining
```bash
cat *-for-cat.fa > mock-bug-and-human.fa
gzip mock-bug-and-human.fa
```
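Because the final file is a plain concatenation, a header that appears twice would make sequences from different sources hard to tell apart downstream. A minimal check, using only `gzip`, `grep`, `sort`, and `uniq` (`duplicate_headers` is a hypothetical helper name), prints any header that occurs more than once:

```bash
# Hypothetical helper: print headers that occur more than once
# in a gzipped fasta (prints nothing if all headers are unique).
duplicate_headers() {
  gzip -dc "$1" | grep '^>' | sort | uniq -d
}
```

Usage: `duplicate_headers mock-bug-and-human.fa.gz` should print nothing.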