---
tags: GeneLab
title: Govind-wasp-annotations-2022
---

# Govind wasp annotations 2022

[toc]

## Looking for GAJC IDs in NCBI files for *Leptopilina heterotoma*

Getting RefSeq files from here: https://www.ncbi.nlm.nih.gov/genome/17698?genome_assembly_id=1491693

### Getting gff file

```bash
# getting and unzipping gff file
curl -LO https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/015/476/425/GCF_015476425.1_ASM1547642v1/GCF_015476425.1_ASM1547642v1_genomic.gff.gz
gunzip GCF_015476425.1_ASM1547642v1_genomic.gff.gz
```

### Investigating lines with a GAJC ID

```bash
# how many lines total in file
wc -l GCF_015476425.1_ASM1547642v1_genomic.gff
# 444152 GCF_015476425.1_ASM1547642v1_genomic.gff

# how many lines with a GAJC
grep -c "GAJC" GCF_015476425.1_ASM1547642v1_genomic.gff
# 17

# how many unique GAJC IDs
grep "GAJC" GCF_015476425.1_ASM1547642v1_genomic.gff | sed 's/GAJC/\|GAJC/g' | cut -f 2 -d "|" | cut -f 1 -d ";" | sort -u
# GAJC01001300.1
# GAJC01028679.1
# GAJC01029969.1
```
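
The `sed`/`cut` pipeline above keeps only the first GAJC ID on each matching line, so a line carrying two IDs would contribute just one. A possible cross-check, sketched below, pulls out every match with `grep -o` and tallies how often each ID appears. The function name `count_gajc_ids` is made up for this note, and the regex assumes IDs always look like `GAJC` + digits + `.` + version (as the three found above do).

```bash
# Hypothetical helper: tally every GAJC accession in a file.
# Assumption: accessions match GAJC<digits>.<version>, e.g. GAJC01001300.1
count_gajc_ids() {
    # -o prints each match on its own line, so several IDs on one line are all counted
    grep -oE 'GAJC[0-9]+\.[0-9]+' "$1" | sort | uniq -c | sort -k1,1nr
}

# run on the gff file if it is in the current directory
gff=GCF_015476425.1_ASM1547642v1_genomic.gff
if [ -f "$gff" ]; then
    count_gajc_ids "$gff"
fi
```

If the per-ID counts add up to more than the number of matching lines reported by `grep -c`, at least one line holds more than one GAJC ID and may be worth looking at directly.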