---
tags: N-exo
title: Looking into hao and hcp annotations
---
[toc]
**Working on local here: ~/Documents/NASA/Nitrogen-project/initial-manuscript-with-paula/poking-at-hcp-and-hao**
**Working on oberyn server here: /data3/Data_Processing/mlee/N-exo/poking-at-hcp-and-hao**
# Looking into *hao* and *hcp* annotations
- *hao*: K10535
- *hcp*: K05601
**Some quick confirmation/checks**
- Ran them through GhostKOALA; all came back with the same KO annotations.
- Ran them through BlastKOALA; all the same.
## Looking at BLASTp hits of highest covered genes
### *hao* (K10535; hydroxylamine dehydrogenase)
Highest-coverage genes at each of the 4 depths (numbered by depth):
1. 194837, 328
2. 778733, 237805
3. 778733, 237805
4. 778733, 237805
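A minimal sketch for pulling only these top-coverage sequences out of the combined protein FASTA before BLASTing them. It assumes headers of the form `>K10535_194837` (as in the records below); `extract_seqs` is a hypothetical helper, not part of the original workflow, and the demo sequences are made up.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the original workflow): print only the FASTA
# records whose ID (first word of the header, minus ">") is in a comma-separated list.
extract_seqs() {
  local ids="$1" fasta="$2"   # fasta may be "-" to read from stdin
  awk -v ids="$ids" '
    BEGIN {
      n = split(ids, a, ",")
      for (i = 1; i <= n; i++) want[a[i]] = 1
    }
    /^>/ {
      name = substr($1, 2)    # drop the leading ">"
      keep = (name in want)   # start or stop printing at each new header
    }
    keep                      # print header and sequence lines of wanted records
  ' "$fasta"
}

# Toy demo (made-up short sequences, not real data)
demo_fasta='>K10535_1
MKKV
>K10535_2
MIRA
AAAA
>K10535_3
EHPG'
printf '%s\n' "$demo_fasta" | extract_seqs "K10535_2,K10535_3" -
```

On the real data this might look like `extract_seqs "K10535_194837,K10535_328,K10535_778733,K10535_237805" K05601-and-K10535-genes.faa > top-hao-genes.faa` (output filename hypothetical). The notes don't say whether the RefSeq searches were run on the NCBI web page or locally; if using BLAST+, a roughly equivalent call would be `blastp -query top-hao-genes.faa -db refseq_protein -remote -outfmt "6 qseqid sseqid pident evalue bitscore stitle" -max_target_seqs 10`.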
194837 has RefSeq BLASTp hits to hypothetical proteins, cytochrome c552s, and hydroxylamine oxidoreductases (highest percent identities are 50-53%).
```
>K10535_194837
MKKVRGLFYGGLTGILLLATWAGVAGAEDKPELLAEPEKETEPTWGHPAGEDCLKCHRENSPLLVAQWEDSPHAEIGVNCMDCHQAGQDDPDAITHHGQTVSTLVSPLDCGRCHEQEYNQHRGSTHAKAARRSKAWSEVLVHRLSGEIMQDIGCERCHGGEVKVLENQGGLDLNTWPDHGIGRLNPDDSRGNCSACHARHRFSKVQARAPETCAKCHGATDAPNWGIYISSSHGRHFQLFREHLKLSGEEWEPGRHYIEAPSCATCHMGGAGSLRPTHDVGMRNAWNLHAPISEQQYLVVLESGDKYNLPVSRKPPRKGDPVTKPDGGKGLVKAVATPERRRQAMIQVCRQCHGERTAQRYMEEFDQAVELYNSKFAQPARDMMQALYATKKLTPAPFDEPLEFTYWKLWHDAGIRARQGAAMSSPQYAWWWGMNQVAELFYGHFLPQAKNLAGEAFIRQHLKDLDAHQWLGKPEQAHAILGETLHPPREAPKEEKEEKEKPEQKKPAKSE
```
328 has RefSeq BLASTp hits to hypothetical proteins, cytochrome c3 family proteins, cytochrome c552, and hydroxylamine oxidoreductases (highest percent identities are 60-65%).
```
>K10535_328
MIRACLIALCGLAFLASGATAQLPDEASVAGKCMTCHKEKHPGLYQQWYESAHGIHNVTCISCHKADEDDPDAYEHYDATIATLVTPKDCGRCHEKEAEQVSNSYHAKAGQILDSADAYLAHVGAGHPAAIQGCESCHGAKMEIDPESPNKLSKTSWPNSGIGRINPDGSLGSCNACHTRHSFDIAQSRQPQSCGKCHLGPDHPQMEIYEESKHGNTYYTNREKMNLDAQEWVVGEDYNVAPTCATCHMSATRTQDFTHDVGERIAWTNRPVISKHKENHVQKRTAMQDVCLACHGDTFADGHFYQYDASVKLYNEKFAKPAQQIMNKIKEKDLLETPAAFSNDVEWAFWELWHHEGRRARHGASMMGPDYTWWHGFYDVLHNFYFDFLPKAREYGDSEVNAMIDDLLQNDPMHQWVMTDTETLKRRINEGTMQEVFVNMFDAYIEAEAREDEEGTE
```
778733 has RefSeq BLASTp hits to hydroxylamine oxidoreductases, cytochrome c3 family proteins, and hypothetical proteins (highest percent identities are 80-83%).
```
>K10535_778733
EHPGLFADWAQSRHANANITCFDCHKADETDPDVSQEHFKQYQRVDQPYGTSEYKIPIAAVVTPKDCSRCHPDEAKQYSQSKHANTMEIIWKIDPWLNKGMNSDFERASGCYHCHGTVLKMKDGKLDPLTWPNVGVGRINLDGSKGSCTSCHTRHLFSVMEARKPEACGQCHLGPDHPQIEIYMESKHGDIYTAHGDNYNWTAAPGTWSAGTDYRGPTCATCHISGAGTTLTTHDVTERLSWEIQAPLTIRPSEFKPFPAKTNWRTERDKMKAVCTQCHGKTWVDDHYVKLDKVVEEYNEVYFKPAKKMLDDLYDKGLLDKTKFFDERLEVEYYELWHHEGRRARMGAAMMAPDYAWWHGFYECKKRYSNFMEEARHLIDNNQKAYVAEDFPNATGDTT
```
237805 has RefSeq BLASTp hits to hydroxylamine oxidases (80% ID) and cytochrome c554 (53% ID).
```
>K10535_237805
MRHANLFVSVVLIVLLTASSVIGDDAPVSDATQECLDCHAAIHPGVVSDWRKSRHATVTPQAAMAVDELQRRMSGRKTPEPLLATSVGCAECHTLRGDAHADTFEHNGYDIHIVVSPDDCATCHTIEREQYAQNIMAMAEKNLSANSLYEDLERHISGTPEIKPDRLHFSPPNDMTQADACYYCHGTRLQVTGSEVRDTVAGELEFPVIANWPNQGVGRVNLDGSRGACSACHTRHRFSIEMARKPYTCKECHVGPDVPAFKVYSASKHGNIFASMNHEWEFNTVPWVIGEDFGAPTCATCHISLTVNTDGEVINRRSHQVSDRLGWRIFGLIYAHPQPKSPDTSIIRNKSGLPLPTNLDGSFAADHLISKDEVASRRAAMQRTCLACHDTSWVRGYFARLDNTITTSNQSVKVATQLMQSVWDKGYAQGLAGGGSIFDEYIERRWSDTWLLYANNIRFVSAMAGGGDYGVFEDGRYHLSKAIMDMHDWQQTRDLIMKKP
```
### *hcp* (K05601; hydroxylamine reductase)
Highest-coverage genes at each of the 4 depths (numbered by depth):
1. 76668, 79801
2. 76668, 557114
3. 76668, 557114
4. 557114, 76668
76668 has RefSeq BLASTp hits to hydroxylamine reductases (80% ID).
```
>K05601_76668
MSMYCSQCEQTAGGQGCHQWGACGKSPEVDALQDLLLYCLRGLGQVAIRARQLGIDTDEADGFTGETLFATLTNVNFDPDDFINYIHRAIALREDLKQRIDAISPNADWSPVARFQPSRDFDELVESGREVEYRFISQSSRDVDIFSLKLTIIYGLKGIAAYAFHARELGQTDDRVDAFFHEILADLDRQDLSLEDWVNRALKVGEINLLAMELLDAGNTGTYGHPVPTPVPLNPKPGKAILVSGHDLKLLEALLQQTVGKDIAIYTHGEMLPAHGYPGLKQTYPHLYGHYGTAWQNQTREFAEFPGPIVMTTNCLMPPRNSYKDRVFTLGPVGWPGLTHLDNDDFTAVIDMALAMPGFETEAEPRQVMTGFARNAVLGVADEVIAAVKRGDIRHFFLVGGCDGAKPTRSYYSEFVEKVPQDCVVLTLACGKFRFFDKQLGDIGGIPRLMDVGQCNDAYSAIQIALALANAFETDVNQLPLSMVLSWYEQKAVGILLTLLYLGIQNIRLGPTLPAFISPRVFELLSQKYHLKAIATPDEDLAACLANG
```
79801 has RefSeq BLASTp hits to hydroxylamine reductases (98% ID).
```
>K05601_79801
MFCEQCEQTASGNGCHQWGACGKSPQLNAVQDLLVYCLRGLAPVVIKARQLGISTHDEDVFTCESLFATMTNVNFDRKRFTKYVRECIEKRDNLKAKVKEASPEPVKWTRVSGYHPDFNESLVEQGQDLALGFISESARDVDIFSLKLTVLYGIKGAASYTFHAQELGQEDEQVYHFIQEALDALNHQDLTLDDWVNLALKVGEMNLRAMELLDAGHTNTYGHPTPTPVPLNPRKGKALLVSGHDIRQLEAILKQTADTGITVYTHGELLPAHGYPLLKQNYPHLYGHYGTAWQNQTKEFGKFPGAIIVTTNCLMPPHETYDEKLFTIGPVGFSGINYIPAEEGNIPDFSPAIQKSLDMPGFVEDQPPRQVMAGFARNAVLNVADQVIDGVKQGKIRHFFLVGGCDGAKPERNYYTEFVEKVPEDCIVLTLACGKFRFFDQQLGEIGSLPRLMDVGQCNDAYSAIQIALGLAKAFDMDVNQLPLSMILSWYEQKAVAVLLTLLYLGIKDIRLGPTLPAFISPNVFKLLSEKYQLKAITTPDEDLAACLG
```
557114 has RefSeq BLASTp hits to hydroxylamine reductases (63% ID).
```
>K05601_557114
MSMFCYQCQETARNEGCTVRGVCGKTEDVSHLQDLLIWLLKGMSYWDAKARKLDVDDPAMGLFVAEGLFLTITNVNFDPDSYVQWIREAVEKRDALRARVEAQGEAPADLPEMATWTPETYTVEALESKGRSVGVMADPDLDPDVRSLRELLTYGLKGLAAYTDHAYVLEHTNDDLLAFIEEALVATADDTLGAEEMVDWVLRCGEMGVEGMRLLDEANTGTYGHPEPTQANIGVRPGPAILISGHDLRDLDELLQQTEGTGVNVYTHGEMLPANAYPAFKKYDHFVGNYGGSWWHQKQEFESFGGAILLTTNCLVPPKESYKDRLFTTGLVGWPDVPHIPDREPGQQKDFSAVIAKAKASDEPEPLEEGAIPIGFARHTVMSVADKVIEAVQAGAIRRFVVMAGCDGRFKSREYYTEVAKALPKDHVIMTAGCAKYRYNKLDLGEIDGIPRILDAGQCNDSYSLAYIALQLKDA
```
## Annotating with Pfam
```bash
wget ftp://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam34.0/Pfam-A.hmm.gz
gunzip Pfam-A.hmm.gz
# grabbing info file too
wget ftp://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam34.0/database_files/pfamA.txt.gz
```
Running search:
```bash
conda activate hmmer-3.3
hmmsearch --cut_ga --cpu 5 --tblout pfam-hits.tab /data3/Data_Processing/mlee/ref-dbs/pfam/Pfam-A.hmm K05601-and-K10535-genes.faa > /dev/null
```
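`--tblout` writes one line per sequence-model pair, with the target (gene) in column 1, the Pfam model name and accession in columns 3-4, and the full-sequence bit score in column 6. A minimal sketch for pulling the top Pfam hit per gene out of that table, assuming the standard HMMER3 tblout column order; `best_pfam_hit` is a hypothetical helper and the demo rows are made up.

```bash
#!/usr/bin/env bash
# Hypothetical helper: top Pfam hit per gene from an hmmsearch --tblout table (read on stdin).
# tblout columns used: 1 = target (gene), 3 = query (Pfam name), 4 = query accession,
# 6 = full-sequence bit score.
best_pfam_hit() {
  grep -v '^#' \
    | LC_ALL=C sort -k1,1 -k6,6gr \
    | awk '!seen[$1]++ { print $1, $3, $4, $6 }'   # first line per gene = highest score
}

# Toy demo (made-up rows, truncated after the bias column)
demo_tbl='# target name  accession  query name  accession  E-value  score  bias
geneA  -  Paired_CXXCH_1   PF09699.12  3e-10   40.2   1.1
geneA  -  Multi-haem_cyto  PF13447.8   1e-80   270.5  5.0
geneB  -  Prismane         PF03063.22  2e-200  660.0  0.3'
printf '%s\n' "$demo_tbl" | best_pfam_hit
```

On the real output this would be `best_pfam_hit < pfam-hits.tab`. Sorting on the bit score rather than the E-value avoids parsing very small exponents, and since scores are per model it still ranks a gene's hits sensibly.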
### K10535/hao-annotated genes Pfam results
Online results (should be available until ~15 April): https://www.ebi.ac.uk/Tools/hmmer/results/68F87BE6-98F2-11EB-8EBF-381EE976C163/score
The top hit for all of them was:
* [PF13447.8](https://pfam.xfam.org/family/PF13447)
    * Multi-haem_cyto; multihaem cytochrome
    * average seq coverage by domain: 42%
    * most abundant seqs in model are annotated as hydroxylamine oxidoreductase, then just multi-haem cytochromes
All had multiple Pfam hits (because Pfam models are domain-based). Almost all had the following:
* [PF09699.12](https://pfam.xfam.org/family/PF09699)
    * Paired_CXXCH_1; c-type cytochrome
    * average seq coverage by domain: 18%
    * pretty much all seqs in model are annotated as cytochrome
* [PF13435.8](https://pfam.xfam.org/family/PF13435) (just a short domain)
    * Cytochrome_C554; tetra-haem cytochrome involved in the oxidation of ammonia
    * average seq coverage by domain: 15%
    * pretty much all seqs in model are annotated as cytochrome C554
* [PF14522.8](https://pfam.xfam.org/family/PF14522)
    * Cytochrome_C7; c7-type cytochromes
    * average seq covered by domain: 27%
    * pretty much all seqs in model are annotated as cytochrome C7
* [PF14537.8](https://pfam.xfam.org/family/PF14537)
    * Cytochrome c family
    * average seq covered by domain: 28%
    * pretty much all just cytochrome c3
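To put numbers on "almost all", one option is to count how many distinct genes hit each model in the tblout table. A sketch under the same HMMER3 tblout-column assumptions (1 = gene, 3 = Pfam name, 4 = Pfam accession); `genes_per_model` is a hypothetical helper and the demo rows are made up.

```bash
#!/usr/bin/env bash
# Hypothetical helper: number of distinct genes hitting each Pfam model,
# from an hmmsearch --tblout table on stdin. Output: count, accession, model name.
genes_per_model() {
  awk '
    /^#/ { next }                                # skip tblout comment/header lines
    !seen[$1 SUBSEP $4]++ { n[$4 " " $3]++ }     # count each gene once per model
    END { for (k in n) print n[k], k }
  ' | LC_ALL=C sort -k2,2
}

# Toy demo (made-up rows, truncated after the score column)
demo_tbl='# target name  accession  query name  accession  E-value  score
geneA  -  Multi-haem_cyto  PF13447.8   1e-80  270.5
geneA  -  Paired_CXXCH_1   PF09699.12  3e-10  40.2
geneB  -  Multi-haem_cyto  PF13447.8   1e-60  200.1'
printf '%s\n' "$demo_tbl" | genes_per_model
```

For just the hao genes, something like `grep '^K10535_' pfam-hits.tab | genes_per_model` should work, assuming the gene IDs carry the `K10535_` prefix as in the FASTA headers above.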
Taking some representative seqs of each of those with different annotations to make a tree with ours:
```
cat K10535-annotated-gene-PFam-hit-rep-seqs.faa
```
```
>PF09699_W6K9T9_MAGSQ-c-type-cytochrome
MMGASARHLVAWGVAIALFAGFSSALAKEAKPLTPEELQNIRSIQSACLK
CHSETGIAKLEKKEFEYEQMQGLFVDQTVYEASSHGLVDCLACHVTGYKE
YPHFEQGKSIINNCDECHTREFLYIEEQYMESVHHKQMTTKIACDSCHDP
HVFLKASKFKTVGEAVAQDNAMCAQCHGITSSLATLGRGRQEDLEKNHEW
LPNLKAHWTKVRCVDCHTAPWKKQLLSHNIMGAKGALRDCVACHTVDSAL
RLRQYAHLVGQEREKLGFINSVILNEAYVVGATRNRYLDLAFGGITLLVV
AGIGVHTLARIIAYRRRSRRHD
>PF13435_X5DD27_9BACT-cytochrome-c554
MKLKNLIILLGLAVLFSTAATAQTYKYIGAAKCKMCHNKPDKGEQFKVWE
AGPHAKAMEALQGDEKNDPTCLKCHSTAGSVDSGLLAGLKADEGVSCESC
HGPGSHYKSAAIMKNKKWPFQKE
>PF13447_Q60AB1_METCA-hydroxylamine-oxidoreductase
MTSFGFVPILCTDNDQDGLPKEHTPMIKMTTTWLLAALLMFVQLTASATA
GETDFSGLKDKYEKDHPGKGKFSQYWEPIPIQKYWNPRNFYQPPTAVSGE
VSRDQCVACHQSLTPGAFHAWENSTHAKLDAIRNLSNGQDARFYKKEKLA
EIERNLVKQGVLKEGEPLKEVGCIDCHGKVGAQSIRHDKDLVMPDRTQCG
SCHLQEFAEAESEKDQQWPQGQWGKGHPSHAVDWEANVETAIWAGMAERE
IAQGCDMCHYQQNKCDGCHTRHSFSAAEARQPEACATCHNGVDHNEWENY
TLSKHGTVYQTHKSTWNFDVPLKDALTKGGYTAPTCQYCHFEFNGEFSHN
LVRKVRWGFNPTPAIADNLKHPWFEGRKENWNTTCAHCHSPSFARSYLEA
ADKGTLAGLKVEQEAKQVVEGLFRDGLLPGQKTNRPAPPAPEKDAPGGFF
QLFWAKGNNPSHVERVHADMWEHDLIKLYKGLVHGNPGGFTYTEGWSELM
RDYAVIMDENTRLREKSGNAPGAAAANPPAGKDDSNVRNVLGGLALLAGI
AVLLYRRKH
>PF13447_S7TP77_9DELT-hydroxylamine-oxidoreductase
MRFHLLPSLALFSLLAVPCLCGAANPDASGTPEAKTSVPKPPISDATTTC
LSCHEMVTPGVVAGWRASRHAQITPAEALKVPAASRRVSSSDIPEELSGT
VVGCAECHVTRSAEHGKGAFEHNGFTIHVDVSPDDCAQCHAVERKQYDGN
IMAHAYGNLAHNTLYMQAVNAIDGAAAMNAHGRVEIRPASPEAMAGTCFH
CHGTKLRVTGTKSRDTALGPMEFPVIAGWPNNGVGRVNLDGSMGSCAACH
TRHQFAVKTARQPYTCEECHNGPDVPAYKIYAASKHGGIFSTMKSKWDWN
AMPWTVGRDFTAPTCAVCHISQVVTPDGTVLATRTHTMNDRLPWRIFGLP
FAHPEPKSPDTSIIRNAEGLPLPTSLNGQVASSFLIDAKEMAVRKARLQK
VCSGCHGSAWVEGHWARFEKAIAETDQTVLAGNRVMQKAWKLGLADEKDM
FDEYPEKLWGSTWLFYANSSRFTSAMAGGGDYGVFAEGRYQLMQTVLQLQ
DWVDARRGKKK
>PF13447_Q1Q5N7_KUEST-hydroxylamine-oxidoreductase
MKFHRMRAVLIAIPFLMCVFFSNVKEVKAEANLKELQKKATSYYEILYPN
EPLPMWDWLGANVGLEKGAGPWMELYKPVPLQMYWFPGRHYVKPDGTYYD
QLLERFKPTDCVTCHEEVTPGFVNDWKDSTHANPKKSPQFAEKTQQIEKL
IGRELKEVTCSDCHGKDHKELHMPTPAHCGECHPKEVTEFMGERERGRPN
HIDGLMANVVPPWYPEMFRRGYPAAQFGCDLCHATDRCNICHSRHKFAAA
EGRRPEACMSCHMGFDHPDAETYSESKMGYIYHMEGEHWDWEKPLAEVVP
GKDYRTPTCQFCHFDQGNGTFAHNPVTKGVWRMGTIPPKGIEYKSSLKDY
PYGINLPPMNYKLDVYSPENKKRSEQWVNVCSKCHSPRFARLYRENLDEF
MFEMWRLQDRAQGILDEIVALDAFEVPIKKRDIFPLGDILADALGPGLLG
EAVYSAFKTTGGKVPVIGPILGLYGVFMTGKNNPSQIELEYANMWFGDKA
HAYKGVAHGQQDIAWWYGAAKVYQGINKLESQAEQLKKLKKLDELLESKK
RKGLAGILGSIIGGVAVLMISAAIWKKRRDASQQ
>PF14522_W5WXJ3_BDEBC-cytochrome-c3
MHRVFKRLAGNAFAFAGALVVSALLMGCKFQPGFGYNKGYAPEQPIPFEH
SLHVGTHNIQCQYCHNQVERTKHANIPSLQTCMNCHLQVATDKPSIQKMR
ELYDNGGSVEWVRVHMLPDFVHFNHNAHVSKGVNCQTCHGQIETMKTVEQ
FSDLSMGWCVNCHRQPENKAPLNCSTCHY
>PF14537_U9VQD9_9CYAN-cytochrome-c3
MIGRQLTGWLRSLLVIGLSVLLMFIVTPMASATTQQELDDITELWQTSAH
ALNDINCASCHQNNETKEFVAVPDHETCRSCHEQAVDTFLLSKHGIRLLE
GDSPLTPAMARLPMQHEAMQKQMNCNACHNVHSANTTEASVDACLTCHND
NHSLNYQNSRHAELFAASQELPRPGPGAVSCSTCHLPRVVHGQGDNAVVK
VNHNNTYNLKPQDRMVGDVCMSCHGIEYSYNSIFDPELVEANFDRPPNLE
METFELMKAAEARRTGNTSE
```
#### hao tree
**Based on [the tree](https://itol.embl.de/tree/13822922232347011617947800), I'm inclined to think we do mostly have hydroxylamine oxidoreductases.**
<a href="https://i.imgur.com/WfkBD8B.png"><img src="https://i.imgur.com/WfkBD8B.png"></a>
The top clade, near PF09699 (c-type cytochrome), holds these of our genes (gene_ID : CPM coverage at each of the 4 depths):
```
595128 : 262.91 613.471 0 0
921360 : 0 106.568 272.379 169.45
296856 : 411.275 507.073 644.473 0
94026 : 0 0 599.278 282.665
```
Summed across these genes, that's on the order of 1,000 CPM or less per depth, compared to the ~10,000 in L1 and >40,000 in each of L2, L3, and L4. So I'd say these aren't too consequential for what we're seeing either.
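A quick way to check those per-depth sums, assuming the `gene_ID : CPM_1 CPM_2 CPM_3 CPM_4` layout above (one column per depth). The values are copied from the block above; `sum_by_depth` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: per-depth (column-wise) CPM sums of "gene_ID : CPM CPM CPM CPM" lines on stdin.
sum_by_depth() {
  awk -F' : ' '
    BEGIN { ncol = 0 }
    NF == 2 {
      n = split($2, cpm, " ")            # CPM values, one per depth
      for (i = 1; i <= n; i++) total[i] += cpm[i]
      if (n > ncol) ncol = n
    }
    END { for (i = 1; i <= ncol; i++) printf "depth%d\t%.1f\n", i, total[i] }
  '
}

# Coverages of our genes in the top clade near PF09699 (copied from the block above)
top_clade_cov='595128 : 262.91 613.471 0 0
921360 : 0 106.568 272.379 169.45
296856 : 411.275 507.073 644.473 0
94026 : 0 0 599.278 282.665'
printf '%s\n' "$top_clade_cov" | sum_by_depth
```

For these four genes this gives about 674, 1,227, 1,516, and 452 CPM for depths 1-4.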
### K05601/hcp-annotated genes Pfam results
Every K05601 gene's only hit was to [PF03063.22 (prismane)](https://pfam.xfam.org/family/Prismane). From the family description there:
> *This family includes both hybrid-cluster proteins and the beta chain of carbon monoxide dehydrogenase. The hybrid-cluster proteins contain two Fe/S centres - a [4Fe-4S] cubane cluster, and a hybrid [4Fe-2S-2O] cluster. The physiological role of this protein is as yet unknown, although a role in nitrate/nitrite respiration has been suggested [1]. The prismane protein from Escherichia coli was shown to contain hydroxylamine reductase activity (NH2OH + 2e + 2 H+ -> NH3 + H2O). This activity is rather low. Hydroxylamine reductase activity was also found in CO-dehydrogenase in which the active site Ni was replaced by Fe [2]. The CO dehydrogenase contains a Ni-3Fe-2S-3O centre.*
>
> *Literature references:*
> 1. *van den Berg WA, Hagen WR, van Dongen WM. Eur J Biochem 2000;267:666-676. The hybrid-cluster protein ('prismane protein') from Escherichia coli. Characterization of the hybrid-cluster protein, redox properties of the [2Fe-2S] and [4Fe-2S-2O] clusters and identification of an associated NADH oxidoreductase containing FAD and [2F PUBMED:10651802 EPMC:10651802*
> 2. *Wolfe MT, Heo J, Garavelli JS, Ludden PW. J Bacteriol 2002;184:5898-5902. Hydroxylamine reductase activity of the hybrid cluster protein from Escherichia coli. PUBMED:12374823 EPMC:12374823*
The domain used for the model covers almost the whole protein (https://pfam.xfam.org/family/Prismane#tabview=tab6), so it's not easy to distinguish between these protein types based on sequence alone.
From the domain organization page (https://pfam.xfam.org/family/Prismane#tabview=tab1), taking some representative seqs of those with different annotations to make a tree with ours:
* X5DGC1_9BACT [Draconibacterium orientale] Hydroxylamine reductase {ECO:0000256|HAMAP-Rule:MF_00069} (550 residues)
* ACDA2_METAC [Methanosarcina acetivorans (strain ATCC 35395 / DSM 2834 / JCM 12185 / C2A)] Acetyl-CoA decarbonylase/synthase complex subunit alpha 2 {ECO:0000255|HAMAP-Rule:MF_01137} (805 residues)
* A0A1Y1XTB6_9FUNG [Basidiobolus meristosporus CBS 931.73] Hybrid cluster protein {ECO:0000313|EMBL:ORX89009.1} (1197 residues)
* R6GB04_9FIRM [Eubacterium hallii CAG:12] Carbon-monoxide dehydrogenase catalytic subunit {ECO:0000313|EMBL:CDB18794.1} (573 residues)
```
cat PF03063-rep-seqs.faa
```
```
>PFAM_X5DGC1_9BACT-hydroxylamine-reductase
MSMFCFQCQEAAKGTGCTIAGVCGKTSDVANLQDTLLYVLKGICWYNEKL
RAVGANPKKVDKIVFDGLFSTITNANFDAAVFTKRIIKALQLRNELHVLS
KEAGVALPAELPAIATWTGNTTEEFEAKAEEVGVLSTENEDIRSLRELII
YGVKGLAAYAEHAYNLGSQKDEIFAFMQRALVATTEDLSIDELVALTLET
GKFGVDVMALLDAANTGSYGNPEATKVNIGVRNNPAILISGHDMKDMEEL
LKQTEGTGVDVYTHSEMLPAHYYPAFKKYDHLVGNYGNAWWKQNTEFESF
NGPILFTTNCIVPPKESATYTDRIYTTGASGLEGAIHIPDRENGKMKDFS
AIIEHAKKCAAPQEIETGEIVGGFAHAQVFALADKIVDAVKSGAIKKFFV
MAGCDGRMKSRDYYTEFAEQLPQDTVILTAGCAKYRYNKLPLGDIGGIPR
VIDAGQCNDSYSLAVIALKLKEVFELNDINELPIAYNIAWYEQKAVIVLL
ALLSLGVKNIHLGPTLPAFLSPNVANVLVENFGIGGITEVEKDLEMFMSA
>ACDA2_METAC-Acetyl-CoA-decarbonylase-synthase-subunit
MSKLTTGSFSIEDLESVQITINNIVGAAKEAAEEKAKELGPMGPTAMAGL
ASYRSWNLLLLDRYEPVLTPMCDQCCYCTYGPCDLSGNKRGACGIDMAGQ
TGREFFLRVITGTACHAAHGRHLLDHVIEVFGEDLPLNLGESNVLTPNVT
ICTGLSPKTLGECRAPMEYVEEQLTQLLATIHAGQESAEIDYDSKALFSG
SLDHVGMEVSDIAQVSAYDFPKADPEAPLIEIGMGSIDKSKPLIVAIGHN
VAGVTYIMDYMEENNLTDKMEIAGLCCTAFDMTRYKEADRRAPYAKIVGS
LAKELKVIRSGMPDVIVVDEQCVRGDVLSESQKLKIPVIASNEKIMMGLP
DRTDADVDSIVEEIKSGAIPGCVMLDYDKLGELIPKIAEVMAPIRDAEGI
TAIPTDEEFKVYIDKCVKCGECMLACPEELDIPEALEYAAKGSYEYLEAL
HDVCIGCRRCEQVCKKEIPILNVLEKAAQKSISEEKGWVRSGRGQASDAE
IRKEGLNLVMGTTPGIIAIIGCPNYPAGTKDVYLIAEEFLKRNYLLAVSG
CSAMDIGMFKDEDGKTLYEKYPGTFAGGGLLNTGSCVSNAHISGAAEKVA
GIFAQRTLAGNLAEIADYTLNRVGACGLAWGAYSQKAASIGTGCNIYGIP
AVLGPHSSKYRRALIAKNYDESKWKVYDGRDGSEMTIPPAPEFLLTTAET
WQEAIPMMAKACIRPSDNNMGRSIKLTHWMELSKKYLGVEPEDWWKFVRN
EADLPLAKREELLKRLEAEHGWEIDWKRKKIISGPKIKFDVSAQPTNLKR
LCKEA
>PFAM_A0A1Y1XTB6_9FUNG-hybrid-cluster-protein
MSTEKPRMLILYGTQTGTTEGYAKVVQTFARIRSFDVKLCRMDEIDHATL
PSEPLIVFLTCTFYNGEFPDSAVPLWTYLKRQDHSPDLFRKTRYAVFGFG
NRTLQENFNKAAKSLDERMSQLGGFNILPVGLGDEYDGNGHETAFRPWLK
ALWTKLTGSDVKMTLPVSVKIQKSDRAAPEVTHEGYTRVPVSSNKRLTAP
DYERTGSLITFDISQTSLEYDVAGHIQVVPENPDELVARAARLLNADLDT
VVEIQPTDDSVSLPSLATVRQLLKNYLDISSIPSRALIEGFSCLATDAHE
QEALESLASDMLAGNMYMKLSNSTVFSIVDVLERYPSVKISLEQFISNVP
KLTSRYYSIASSPLVAKDKIDIVFFVEEWKTEDGGKFQGLASTYLSNKSP
NCADPYVHMKIHSGLVQLPERLDTPILGVALGSGIGVFRSILQHREVLLE
QGHELSRIRLYYGMRYYEHEFLFQDELDHFTRKGLVEVIDAASRDHQKNC
AVRMLDFPEKVADYLDNNGTYLYCGLGGLIPGAMEITVGECLQANKQISY
EESLEIIANLKKENRWQVEAYAKSVDEENALKSIILKRGGHADGEGVPTA
TLYEDAKMFCYQCEQTYQGRGCTTIGVCGKTPEVAALQDLLITCLKRLSW
FAYNLRQLHQEHPNQIQESEVEYPEVNHYTLKATFSTLTNVNFDNSRFLE
FHQECRNFTKRLSVQYQSLCKRVGTRSKKCPIPESVSDILDSTPGSVGDI
EDMLVSKGKEVGILSRMRATKNDALVGLQEMIVYGLKGLCAYADHALVLA
HEDRRIYEYVHRAFHFLTTKDSKDMEKVLGYLMELGQVNLICMDVLHNAN
NTFGAQSPHTVSLKPRPGKCVLVSGHDFKFLDALLRQTEGMGINVYTHGE
MLPAHGYPKLRKYKHLAGHYGVAWQRQSIEFPHFPGAIVMTTNCLTAPKD
DYQGRLFTVGVVGWPNIAHIGDDLDFSAVIKVALESPGFTEDTPEFKYPP
SSFTPVTDNYQVGFSSETVIGVAPTVLKAIETGDISRVFVIGGCDGYEGE
RSYYTDLAKMLPESAVVLTAGCGKFRINSLQWKTIGDSGIPRLLDMGQCN
DSYAAIQIASALAEALNCTVHDLPLSIVLSWFEQKAVVVLLSLLSLGLQN
IRVGPQLPAFLRPSAVKILSDKFGLKLIGDPKLDLEEIYGGQIPASA
>PFAM_R6GB04_9FIRM-CO-dehydrogenase-catalytic-subunit
MADKICSSADKVLEVFLENAPMDTSHHRMVKQQNKCGYGLQGVCCRLCSN
GPCRLSPSKPKGVCGADADTIACRNFLRQVAAGSGCYTHVVENTARRLKE
LAQELQAEGKKPKYKDSVAKLAKILQINCCGNCGPDCHNSCAKTAEMIAD
AVLADIRKPYDEKMTLMKNIALPKRYELWEKLGILPGGAKDEIFNAVVKT
STNLNSDPMDMLLQCLRLGISTGNYGLILTNLMNDIIMGPPQISMDPVGF
RIIDPEYINIMITGHQQSMFADLEEKLESEIVQKSAELVGDIIMGPPQIS
MDPVGFRIIDPEYINIMITGHQQSMFADLEEKLESEIVQKSAQLVGAKGI
RIVGCTCVGQDYQARSGCYKDVYCGHAGNKGIRIVGCTCVGQDYQARSGC
YKDVYCGHAGNNYTSEAVLMTGCVDLVVSEFNCTIPGIEPICEQLDIKML
CLDDVAKKANAELLPYTAEEKEKITSQIIADALCGFKNRKEKLYGTAPAK
GEKRVNVMAQHGFDKSQKKKKRLQVRLSQMLYVDLKTEKKNYMEPLRQKE
KSVSMSWHSMVLTNLLPVYQKIL
```
#### hcp tree
**Looking at [the tree](https://itol.embl.de/tree/13822922232180131617925262), we definitely don't have the CO dehydrogenase one mixed in. All of ours are in the hydroxylamine reductase grouping, closely related to the hybrid-cluster protein, which is the "prismane" protein that was only detected under anaerobic conditions when nitrate/nitrite was added (see end of abstract [here](https://doi.org/10.1046/j.1432-1327.2000.01032.x)).**
<a href="https://i.imgur.com/gyvooGF.png"><img src="https://i.imgur.com/gyvooGF.png"></a>
**So we may just be stuck saying something general like this:**
**"This is what it looks like based on these annotations (K05601 and PF03063), in keeping with the dominant annotation of hydroxylamine reductase from KEGG. But the prismane-type proteins that both of these annotation models (KOFamScan and Pfam) capture have been linked to hydroxylamine reductase activity and possibly nitrate/nitrite respiration. Their high relative coverage here is of interest; perhaps these are physiologically involved with nitrate and/or nitrite. Molecular biology work is required to further investigate... bla bla... real work needed..."**
## Checking out some that were annotated but didn't pass the most stringent score thresholds
**gene_ID : CPM coverage at each of the 4 depths (pulled from the Shiny app)**
**K05601 (hcp)**
```
130727 : 0 323.865 0 975.625
194074 : 0 0 270.352 680.045
463481 : 0 412.326 0 250.424
569665 : 0 279.314 522.722 298.099
586817 : 0 268.195 282.664 124.037
807092 : 0 0 868.844 285.206
823538 : 0 107.475 0 303.97
828527 : 0 0 0 324.658
884545 : 0 0 712.932 264.212
888396 : 0 0 194.418 159.404
```
> **The sums of these for depths 2, 3, and 4 are in the range of 1,000-4,000 CPM, while the cumulative coverage for the KO is in the range of 90,000. So these slightly less stringent annotations are not what's causing the high coverage. That's good.**
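A sketch for checking those per-depth totals, assuming each line is `gene_ID : CPM_depth1 CPM_depth2 CPM_depth3 CPM_depth4` as in the K05601 block above (values copied from there); `depth_totals` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: per-depth CPM totals for lines of the form
# "gene_ID : CPM_depth1 CPM_depth2 CPM_depth3 CPM_depth4" (depth d is whitespace field d + 2).
depth_totals() {
  awk '
    BEGIN { nd = 0 }
    $2 == ":" {                                   # only "gene_ID : ..." lines
      for (f = 3; f <= NF; f++) total[f - 2] += $f
      if (NF - 2 > nd) nd = NF - 2
    }
    END { for (d = 1; d <= nd; d++) printf "depth%d\t%.1f\n", d, total[d] }
  '
}

# Sub-threshold K05601 (hcp) genes, copied from the block above
hcp_lowscore='130727 : 0 323.865 0 975.625
194074 : 0 0 270.352 680.045
463481 : 0 412.326 0 250.424
569665 : 0 279.314 522.722 298.099
586817 : 0 268.195 282.664 124.037
807092 : 0 0 868.844 285.206
823538 : 0 107.475 0 303.97
828527 : 0 0 0 324.658
884545 : 0 0 712.932 264.212
888396 : 0 0 194.418 159.404'
printf '%s\n' "$hcp_lowscore" | depth_totals
```

For these ten genes this gives about 0, 1,391, 2,852, and 3,666 CPM for depths 1-4, consistent with the 1,000-4,000 CPM range quoted for depths 2-4.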
**K10535 (hao)**
None