---
tags: N-exo
title: Looking into hao and hcp annotations
---

[toc]

**Working on local here: ~/Documents/NASA/Nitrogen-project/initial-manuscript-with-paula/poking-at-hcp-and-hao**

**Working on oberyn server here: /data3/Data_Processing/mlee/N-exo/poking-at-hcp-and-hao**

# Looking into *hao* and *hcp* annotations

hao: K10535

hcp: K05601

**Some quick confirmation/checks**
- Ran through GhostKoala, all came back the same.
- Ran through BlastKoala, all the same.

## Looking at BLASTp hits of highest covered genes

### *hao* (K10535; hydroxylamine dehydrogenase)

Highest coverage genes by depth were:
1. 194837, 328
2. 778733, 237805
3. 778733, 237805
4. 778733, 237805

194837 has refseq blastp hits to hypotheticals, cytochrome C552's, and hydroxylamine oxidoreductases (highest percent IDs are 50-53)

```
>K10535_194837
MKKVRGLFYGGLTGILLLATWAGVAGAEDKPELLAEPEKETEPTWGHPAGEDCLKCHRENSPLLVAQWEDSPHAEIGVNCMDCHQAGQDDPDAITHHGQTVSTLVSPLDCGRCHEQEYNQHRGSTHAKAARRSKAWSEVLVHRLSGEIMQDIGCERCHGGEVKVLENQGGLDLNTWPDHGIGRLNPDDSRGNCSACHARHRFSKVQARAPETCAKCHGATDAPNWGIYISSSHGRHFQLFREHLKLSGEEWEPGRHYIEAPSCATCHMGGAGSLRPTHDVGMRNAWNLHAPISEQQYLVVLESGDKYNLPVSRKPPRKGDPVTKPDGGKGLVKAVATPERRRQAMIQVCRQCHGERTAQRYMEEFDQAVELYNSKFAQPARDMMQALYATKKLTPAPFDEPLEFTYWKLWHDAGIRARQGAAMSSPQYAWWWGMNQVAELFYGHFLPQAKNLAGEAFIRQHLKDLDAHQWLGKPEQAHAILGETLHPPREAPKEEKEEKEKPEQKKPAKSE
```

328 has refseq blastp hits to hypotheticals, cytochrome c3 family protein, cytochrome C552, hydroxylamine oxidoreductase (highest percent IDs are 60-65)

```
>K10535_328
MIRACLIALCGLAFLASGATAQLPDEASVAGKCMTCHKEKHPGLYQQWYESAHGIHNVTCISCHKADEDDPDAYEHYDATIATLVTPKDCGRCHEKEAEQVSNSYHAKAGQILDSADAYLAHVGAGHPAAIQGCESCHGAKMEIDPESPNKLSKTSWPNSGIGRINPDGSLGSCNACHTRHSFDIAQSRQPQSCGKCHLGPDHPQMEIYEESKHGNTYYTNREKMNLDAQEWVVGEDYNVAPTCATCHMSATRTQDFTHDVGERIAWTNRPVISKHKENHVQKRTAMQDVCLACHGDTFADGHFYQYDASVKLYNEKFAKPAQQIMNKIKEKDLLETPAAFSNDVEWAFWELWHHEGRRARHGASMMGPDYTWWHGFYDVLHNFYFDFLPKAREYGDSEVNAMIDDLLQNDPMHQWVMTDTETLKRRINEGTMQEVFVNMFDAYIEAEAREDEEGTE
```

778733 has refseq blastp
hits to hydroxylamine oxidoreductases, cytochrome c3 family protein, hypotheticals (highest percent IDs are 80-83)

```
>K10535_778733
EHPGLFADWAQSRHANANITCFDCHKADETDPDVSQEHFKQYQRVDQPYGTSEYKIPIAAVVTPKDCSRCHPDEAKQYSQSKHANTMEIIWKIDPWLNKGMNSDFERASGCYHCHGTVLKMKDGKLDPLTWPNVGVGRINLDGSKGSCTSCHTRHLFSVMEARKPEACGQCHLGPDHPQIEIYMESKHGDIYTAHGDNYNWTAAPGTWSAGTDYRGPTCATCHISGAGTTLTTHDVTERLSWEIQAPLTIRPSEFKPFPAKTNWRTERDKMKAVCTQCHGKTWVDDHYVKLDKVVEEYNEVYFKPAKKMLDDLYDKGLLDKTKFFDERLEVEYYELWHHEGRRARMGAAMMAPDYAWWHGFYECKKRYSNFMEEARHLIDNNQKAYVAEDFPNATGDTT
```

237805 has refseq blastp hits to hydroxylamine oxidases (80% ID), cytochrome c554 (53% ID)

```
>K10535_237805
MRHANLFVSVVLIVLLTASSVIGDDAPVSDATQECLDCHAAIHPGVVSDWRKSRHATVTPQAAMAVDELQRRMSGRKTPEPLLATSVGCAECHTLRGDAHADTFEHNGYDIHIVVSPDDCATCHTIEREQYAQNIMAMAEKNLSANSLYEDLERHISGTPEIKPDRLHFSPPNDMTQADACYYCHGTRLQVTGSEVRDTVAGELEFPVIANWPNQGVGRVNLDGSRGACSACHTRHRFSIEMARKPYTCKECHVGPDVPAFKVYSASKHGNIFASMNHEWEFNTVPWVIGEDFGAPTCATCHISLTVNTDGEVINRRSHQVSDRLGWRIFGLIYAHPQPKSPDTSIIRNKSGLPLPTNLDGSFAADHLISKDEVASRRAAMQRTCLACHDTSWVRGYFARLDNTITTSNQSVKVATQLMQSVWDKGYAQGLAGGGSIFDEYIERRWSDTWLLYANNIRFVSAMAGGGDYGVFEDGRYHLSKAIMDMHDWQQTRDLIMKKP
```

### *hcp* (K05601; hydroxylamine reductase)

Highest coverage genes by depth were:
1. 76668, 79801
2. 76668, 557114
3. 76668, 557114
4. 557114, 76668

76668 has refseq blastp hits to hydroxylamine reductases (80% ID)

```
>K05601_76668
MSMYCSQCEQTAGGQGCHQWGACGKSPEVDALQDLLLYCLRGLGQVAIRARQLGIDTDEADGFTGETLFATLTNVNFDPDDFINYIHRAIALREDLKQRIDAISPNADWSPVARFQPSRDFDELVESGREVEYRFISQSSRDVDIFSLKLTIIYGLKGIAAYAFHARELGQTDDRVDAFFHEILADLDRQDLSLEDWVNRALKVGEINLLAMELLDAGNTGTYGHPVPTPVPLNPKPGKAILVSGHDLKLLEALLQQTVGKDIAIYTHGEMLPAHGYPGLKQTYPHLYGHYGTAWQNQTREFAEFPGPIVMTTNCLMPPRNSYKDRVFTLGPVGWPGLTHLDNDDFTAVIDMALAMPGFETEAEPRQVMTGFARNAVLGVADEVIAAVKRGDIRHFFLVGGCDGAKPTRSYYSEFVEKVPQDCVVLTLACGKFRFFDKQLGDIGGIPRLMDVGQCNDAYSAIQIALALANAFETDVNQLPLSMVLSWYEQKAVGILLTLLYLGIQNIRLGPTLPAFISPRVFELLSQKYHLKAIATPDEDLAACLANG
```

79801 has refseq blastp hits to hydroxylamine reductases (98% ID)

```
>K05601_79801
MFCEQCEQTASGNGCHQWGACGKSPQLNAVQDLLVYCLRGLAPVVIKARQLGISTHDEDVFTCESLFATMTNVNFDRKRFTKYVRECIEKRDNLKAKVKEASPEPVKWTRVSGYHPDFNESLVEQGQDLALGFISESARDVDIFSLKLTVLYGIKGAASYTFHAQELGQEDEQVYHFIQEALDALNHQDLTLDDWVNLALKVGEMNLRAMELLDAGHTNTYGHPTPTPVPLNPRKGKALLVSGHDIRQLEAILKQTADTGITVYTHGELLPAHGYPLLKQNYPHLYGHYGTAWQNQTKEFGKFPGAIIVTTNCLMPPHETYDEKLFTIGPVGFSGINYIPAEEGNIPDFSPAIQKSLDMPGFVEDQPPRQVMAGFARNAVLNVADQVIDGVKQGKIRHFFLVGGCDGAKPERNYYTEFVEKVPEDCIVLTLACGKFRFFDQQLGEIGSLPRLMDVGQCNDAYSAIQIALGLAKAFDMDVNQLPLSMILSWYEQKAVAVLLTLLYLGIKDIRLGPTLPAFISPNVFKLLSEKYQLKAITTPDEDLAACLG
```

557114 has refseq blastp hits to hydroxylamine reductases (63% ID)

```
>K05601_557114
MSMFCYQCQETARNEGCTVRGVCGKTEDVSHLQDLLIWLLKGMSYWDAKARKLDVDDPAMGLFVAEGLFLTITNVNFDPDSYVQWIREAVEKRDALRARVEAQGEAPADLPEMATWTPETYTVEALESKGRSVGVMADPDLDPDVRSLRELLTYGLKGLAAYTDHAYVLEHTNDDLLAFIEEALVATADDTLGAEEMVDWVLRCGEMGVEGMRLLDEANTGTYGHPEPTQANIGVRPGPAILISGHDLRDLDELLQQTEGTGVNVYTHGEMLPANAYPAFKKYDHFVGNYGGSWWHQKQEFESFGGAILLTTNCLVPPKESYKDRLFTTGLVGWPDVPHIPDREPGQQKDFSAVIAKAKASDEPEPLEEGAIPIGFARHTVMSVADKVIEAVQAGAIRRFVVMAGCDGRFKSREYYTEVAKALPKDHVIMTAGCAKYRYNKLDLGEIDGIPRILDAGQCNDSYSLAYIALQLKDA
```

## Annotating with Pfam

```bash
wget ftp://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam34.0/Pfam-A.hmm.gz
gunzip Pfam-A.hmm.gz

# grabbing info file too
wget ftp://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam34.0/database_files/pfamA.txt.gz
```

Running search:

```bash
conda activate hmmer-3.3

# --cut_ga applies each Pfam model's gathering threshold; per-sequence hits go to pfam-hits.tab
hmmsearch --cut_ga --cpu 5 --tblout pfam-hits.tab /data3/Data_Processing/mlee/ref-dbs/pfam/Pfam-A.hmm K05601-and-K10535-genes.faa > /dev/null
```

### K10535/hao-annotated genes Pfam results

Online results should be up until 15-April-ish: https://www.ebi.ac.uk/Tools/hmmer/results/68F87BE6-98F2-11EB-8EBF-381EE976C163/score

Top hit for all was this one:

* [PF13447.8](https://pfam.xfam.org/family/PF13447)
    * Multi-haem_cyto; multihaem cytochrome
    * average seq coverage by domain: 42%
    * most abundant seqs in model are annotated as hydroxylamine oxidoreductase, then multi-haem cytos only

All had multiple Pfam hits (because it's based on domains). Almost all had the following:

* [PF09699.12](https://pfam.xfam.org/family/PF09699)
    * Paired_CXXCH_1; c-type cytochrome
    * average seq coverage by domain: 18%
    * pretty much all seqs in model are annotated as cytochrome
* [PF13435.8](https://pfam.xfam.org/family/PF13435) (just a short domain)
    * Cytochrome_C554; tetra-haem cytochrome involved in the oxidation of ammonia
    * average seq coverage by domain: 15%
    * pretty much all seqs in model are annotated as cytochrome C554
* [PF14522.8](https://pfam.xfam.org/family/PF14522)
    * Cytochrome_C7; c7-type cytochromes
    * average seq coverage by domain: 27%
    * pretty much all seqs in model are annotated as cytochrome C7
* [PF14537.8](https://pfam.xfam.org/family/PF14537)
    * Cytochrome c family
    * average seq coverage by domain: 28%
    * pretty much all just cytochrome c3

Taking some representative seqs of each of those with different annotations to make a tree with ours:

```
cat K10535-annotated-gene-PFam-hit-rep-seqs.faa
```

```
>PF09699_W6K9T9_MAGSQ-c-type-cytochrome
MMGASARHLVAWGVAIALFAGFSSALAKEAKPLTPEELQNIRSIQSACLK
CHSETGIAKLEKKEFEYEQMQGLFVDQTVYEASSHGLVDCLACHVTGYKE
YPHFEQGKSIINNCDECHTREFLYIEEQYMESVHHKQMTTKIACDSCHDP
HVFLKASKFKTVGEAVAQDNAMCAQCHGITSSLATLGRGRQEDLEKNHEW
LPNLKAHWTKVRCVDCHTAPWKKQLLSHNIMGAKGALRDCVACHTVDSAL
RLRQYAHLVGQEREKLGFINSVILNEAYVVGATRNRYLDLAFGGITLLVV
AGIGVHTLARIIAYRRRSRRHD
>PF13435_X5DD27_9BACT-cytochrome-c554
MKLKNLIILLGLAVLFSTAATAQTYKYIGAAKCKMCHNKPDKGEQFKVWE
AGPHAKAMEALQGDEKNDPTCLKCHSTAGSVDSGLLAGLKADEGVSCESC
HGPGSHYKSAAIMKNKKWPFQKE
>PF13447_Q60AB1_METCA-hydroxylamine-oxidoreductase
MTSFGFVPILCTDNDQDGLPKEHTPMIKMTTTWLLAALLMFVQLTASATA
GETDFSGLKDKYEKDHPGKGKFSQYWEPIPIQKYWNPRNFYQPPTAVSGE
VSRDQCVACHQSLTPGAFHAWENSTHAKLDAIRNLSNGQDARFYKKEKLA
EIERNLVKQGVLKEGEPLKEVGCIDCHGKVGAQSIRHDKDLVMPDRTQCG
SCHLQEFAEAESEKDQQWPQGQWGKGHPSHAVDWEANVETAIWAGMAERE
IAQGCDMCHYQQNKCDGCHTRHSFSAAEARQPEACATCHNGVDHNEWENY
TLSKHGTVYQTHKSTWNFDVPLKDALTKGGYTAPTCQYCHFEFNGEFSHN
LVRKVRWGFNPTPAIADNLKHPWFEGRKENWNTTCAHCHSPSFARSYLEA
ADKGTLAGLKVEQEAKQVVEGLFRDGLLPGQKTNRPAPPAPEKDAPGGFF
QLFWAKGNNPSHVERVHADMWEHDLIKLYKGLVHGNPGGFTYTEGWSELM
RDYAVIMDENTRLREKSGNAPGAAAANPPAGKDDSNVRNVLGGLALLAGI
AVLLYRRKH
>PF13447_S7TP77_9DELT-hydroxylamine-oxidoreductase
MRFHLLPSLALFSLLAVPCLCGAANPDASGTPEAKTSVPKPPISDATTTC
LSCHEMVTPGVVAGWRASRHAQITPAEALKVPAASRRVSSSDIPEELSGT
VVGCAECHVTRSAEHGKGAFEHNGFTIHVDVSPDDCAQCHAVERKQYDGN
IMAHAYGNLAHNTLYMQAVNAIDGAAAMNAHGRVEIRPASPEAMAGTCFH
CHGTKLRVTGTKSRDTALGPMEFPVIAGWPNNGVGRVNLDGSMGSCAACH
TRHQFAVKTARQPYTCEECHNGPDVPAYKIYAASKHGGIFSTMKSKWDWN
AMPWTVGRDFTAPTCAVCHISQVVTPDGTVLATRTHTMNDRLPWRIFGLP
FAHPEPKSPDTSIIRNAEGLPLPTSLNGQVASSFLIDAKEMAVRKARLQK
VCSGCHGSAWVEGHWARFEKAIAETDQTVLAGNRVMQKAWKLGLADEKDM
FDEYPEKLWGSTWLFYANSSRFTSAMAGGGDYGVFAEGRYQLMQTVLQLQ
DWVDARRGKKK
>PF13447_Q1Q5N7_KUEST-hydroxylamine-oxidoreductase
MKFHRMRAVLIAIPFLMCVFFSNVKEVKAEANLKELQKKATSYYEILYPN
EPLPMWDWLGANVGLEKGAGPWMELYKPVPLQMYWFPGRHYVKPDGTYYD
QLLERFKPTDCVTCHEEVTPGFVNDWKDSTHANPKKSPQFAEKTQQIEKL
IGRELKEVTCSDCHGKDHKELHMPTPAHCGECHPKEVTEFMGERERGRPN
HIDGLMANVVPPWYPEMFRRGYPAAQFGCDLCHATDRCNICHSRHKFAAA
EGRRPEACMSCHMGFDHPDAETYSESKMGYIYHMEGEHWDWEKPLAEVVP
GKDYRTPTCQFCHFDQGNGTFAHNPVTKGVWRMGTIPPKGIEYKSSLKDY
PYGINLPPMNYKLDVYSPENKKRSEQWVNVCSKCHSPRFARLYRENLDEF
MFEMWRLQDRAQGILDEIVALDAFEVPIKKRDIFPLGDILADALGPGLLG
EAVYSAFKTTGGKVPVIGPILGLYGVFMTGKNNPSQIELEYANMWFGDKA
HAYKGVAHGQQDIAWWYGAAKVYQGINKLESQAEQLKKLKKLDELLESKK
RKGLAGILGSIIGGVAVLMISAAIWKKRRDASQQ
>PF14522_W5WXJ3_BDEBC-cytochrome-c3
MHRVFKRLAGNAFAFAGALVVSALLMGCKFQPGFGYNKGYAPEQPIPFEH
SLHVGTHNIQCQYCHNQVERTKHANIPSLQTCMNCHLQVATDKPSIQKMR
ELYDNGGSVEWVRVHMLPDFVHFNHNAHVSKGVNCQTCHGQIETMKTVEQ
FSDLSMGWCVNCHRQPENKAPLNCSTCHY
>PF14537_U9VQD9_9CYAN-cytochrome-c3
MIGRQLTGWLRSLLVIGLSVLLMFIVTPMASATTQQELDDITELWQTSAH
ALNDINCASCHQNNETKEFVAVPDHETCRSCHEQAVDTFLLSKHGIRLLE
GDSPLTPAMARLPMQHEAMQKQMNCNACHNVHSANTTEASVDACLTCHND
NHSLNYQNSRHAELFAASQELPRPGPGAVSCSTCHLPRVVHGQGDNAVVK
VNHNNTYNLKPQDRMVGDVCMSCHGIEYSYNSIFDPELVEANFDRPPNLE
METFELMKAAEARRTGNTSE
```

#### hao tree

**Based on [the tree](https://itol.embl.de/tree/13822922232347011617947800), I'm inclined to think we do mostly have hydroxylamine oxidoreductases.**

<a href="https://i.imgur.com/WfkBD8B.png"><img src="https://i.imgur.com/WfkBD8B.png"></a>

The top clade near PF09699, c-type-cytochrome, are our gene IDs with these coverages:

```
595128 : 262.91 613.471 0 0
921360 : 0 106.568 272.379 169.45
296856 : 411.275 507.073 644.473 0
94026 : 0 0 599.278 282.665
```

With summed coverages lower than/around 1,000, compared to the 10,000 in L1 and then > 40,000 in L2, L3, and L4, I'd say this isn't too consequential for what we're seeing either.

### K05601/hcp-annotated genes Pfam results

Every K05601 gene came back with its only hit being to [PF03063.22 (prismane)](https://pfam.xfam.org/family/Prismane). The description there says:

> *This family includes both hybrid-cluster proteins and the beta chain of carbon monoxide dehydrogenase. The hybrid-cluster proteins contain two Fe/S centres - a [4Fe-4S] cubane cluster, and a hybrid [4Fe-2S-2O] cluster. The physiological role of this protein is as yet unknown, although a role in nitrate/nitrite respiration has been suggested [1].
The prismane protein from Escherichia coli was shown to contain hydroxylamine reductase activity (NH2OH + 2e + 2 H+ -> NH3 + H2O). This activity is rather low. Hydroxylamine reductase activity was also found in CO-dehydrogenase in which the active site Ni was replaced by Fe [2]. The CO dehydrogenase contains a Ni-3Fe-2S-3O centre.*
>
> *Literature references*
> *1. van den Berg WA, Hagen WR, van Dongen WM; , Eur J Biochem 2000;267:666-676.: The hybrid-cluster protein ('prismane protein') from Escherichia coli. Characterization of the hybrid-cluster protein, redox properties of the [2Fe-2S] and [4Fe-2S-2O] clusters and identification of an associated NADH oxidoreductase containing FAD and [2F PUBMED:10651802 EPMC:10651802
> 2. Wolfe MT, Heo J, Garavelli JS, Ludden PW; , J Bacteriol 2002;184:5898-5902.: Hydroxylamine reductase activity of the hybrid cluster protein from Escherichia coli. PUBMED:12374823 EPMC:12374823*

The domain used for the model covers almost the whole protein (https://pfam.xfam.org/family/Prismane#tabview=tab6), so the different proteins in this family aren't easy to distinguish based on sequence.
From the domain organization page (https://pfam.xfam.org/family/Prismane#tabview=tab1), taking some representative seqs of those with different annotations to make a tree with ours:

- X5DGC1_9BACT [Draconibacterium orientale] Hydroxylamine reductase {ECO:0000256|HAMAP-Rule:MF_00069} (550 residues)
- ACDA2_METAC [Methanosarcina acetivorans (strain ATCC 35395 / DSM 2834 / JCM 12185 / C2A)] Acetyl-CoA decarbonylase/synthase complex subunit alpha 2 {ECO:0000255|HAMAP-Rule:MF_01137} (805 residues)
- A0A1Y1XTB6_9FUNG [Basidiobolus meristosporus CBS 931.73] Hybrid cluster protein {ECO:0000313|EMBL:ORX89009.1} (1197 residues)
- R6GB04_9FIRM [Eubacterium hallii CAG:12] Carbon-monoxide dehydrogenase catalytic subunit {ECO:0000313|EMBL:CDB18794.1} (573 residues)

```
cat PF03063-rep-seqs.faa
```

```
>PFAM_X5DGC1_9BACT-hydroxylamine-reductase
MSMFCFQCQEAAKGTGCTIAGVCGKTSDVANLQDTLLYVLKGICWYNEKL
RAVGANPKKVDKIVFDGLFSTITNANFDAAVFTKRIIKALQLRNELHVLS
KEAGVALPAELPAIATWTGNTTEEFEAKAEEVGVLSTENEDIRSLRELII
YGVKGLAAYAEHAYNLGSQKDEIFAFMQRALVATTEDLSIDELVALTLET
GKFGVDVMALLDAANTGSYGNPEATKVNIGVRNNPAILISGHDMKDMEEL
LKQTEGTGVDVYTHSEMLPAHYYPAFKKYDHLVGNYGNAWWKQNTEFESF
NGPILFTTNCIVPPKESATYTDRIYTTGASGLEGAIHIPDRENGKMKDFS
AIIEHAKKCAAPQEIETGEIVGGFAHAQVFALADKIVDAVKSGAIKKFFV
MAGCDGRMKSRDYYTEFAEQLPQDTVILTAGCAKYRYNKLPLGDIGGIPR
VIDAGQCNDSYSLAVIALKLKEVFELNDINELPIAYNIAWYEQKAVIVLL
ALLSLGVKNIHLGPTLPAFLSPNVANVLVENFGIGGITEVEKDLEMFMSA
>ACDA2_METAC-Acetyl-CoA-decarbonylase-synthase-subunit
MSKLTTGSFSIEDLESVQITINNIVGAAKEAAEEKAKELGPMGPTAMAGL
ASYRSWNLLLLDRYEPVLTPMCDQCCYCTYGPCDLSGNKRGACGIDMAGQ
TGREFFLRVITGTACHAAHGRHLLDHVIEVFGEDLPLNLGESNVLTPNVT
ICTGLSPKTLGECRAPMEYVEEQLTQLLATIHAGQESAEIDYDSKALFSG
SLDHVGMEVSDIAQVSAYDFPKADPEAPLIEIGMGSIDKSKPLIVAIGHN
VAGVTYIMDYMEENNLTDKMEIAGLCCTAFDMTRYKEADRRAPYAKIVGS
LAKELKVIRSGMPDVIVVDEQCVRGDVLSESQKLKIPVIASNEKIMMGLP
DRTDADVDSIVEEIKSGAIPGCVMLDYDKLGELIPKIAEVMAPIRDAEGI
TAIPTDEEFKVYIDKCVKCGECMLACPEELDIPEALEYAAKGSYEYLEAL
HDVCIGCRRCEQVCKKEIPILNVLEKAAQKSISEEKGWVRSGRGQASDAE
IRKEGLNLVMGTTPGIIAIIGCPNYPAGTKDVYLIAEEFLKRNYLLAVSG
CSAMDIGMFKDEDGKTLYEKYPGTFAGGGLLNTGSCVSNAHISGAAEKVA
GIFAQRTLAGNLAEIADYTLNRVGACGLAWGAYSQKAASIGTGCNIYGIP
AVLGPHSSKYRRALIAKNYDESKWKVYDGRDGSEMTIPPAPEFLLTTAET
WQEAIPMMAKACIRPSDNNMGRSIKLTHWMELSKKYLGVEPEDWWKFVRN
EADLPLAKREELLKRLEAEHGWEIDWKRKKIISGPKIKFDVSAQPTNLKR
LCKEA
>PFAM_A0A1Y1XTB6_9FUNG-hybrid-cluster-protein
MSTEKPRMLILYGTQTGTTEGYAKVVQTFARIRSFDVKLCRMDEIDHATL
PSEPLIVFLTCTFYNGEFPDSAVPLWTYLKRQDHSPDLFRKTRYAVFGFG
NRTLQENFNKAAKSLDERMSQLGGFNILPVGLGDEYDGNGHETAFRPWLK
ALWTKLTGSDVKMTLPVSVKIQKSDRAAPEVTHEGYTRVPVSSNKRLTAP
DYERTGSLITFDISQTSLEYDVAGHIQVVPENPDELVARAARLLNADLDT
VVEIQPTDDSVSLPSLATVRQLLKNYLDISSIPSRALIEGFSCLATDAHE
QEALESLASDMLAGNMYMKLSNSTVFSIVDVLERYPSVKISLEQFISNVP
KLTSRYYSIASSPLVAKDKIDIVFFVEEWKTEDGGKFQGLASTYLSNKSP
NCADPYVHMKIHSGLVQLPERLDTPILGVALGSGIGVFRSILQHREVLLE
QGHELSRIRLYYGMRYYEHEFLFQDELDHFTRKGLVEVIDAASRDHQKNC
AVRMLDFPEKVADYLDNNGTYLYCGLGGLIPGAMEITVGECLQANKQISY
EESLEIIANLKKENRWQVEAYAKSVDEENALKSIILKRGGHADGEGVPTA
TLYEDAKMFCYQCEQTYQGRGCTTIGVCGKTPEVAALQDLLITCLKRLSW
FAYNLRQLHQEHPNQIQESEVEYPEVNHYTLKATFSTLTNVNFDNSRFLE
FHQECRNFTKRLSVQYQSLCKRVGTRSKKCPIPESVSDILDSTPGSVGDI
EDMLVSKGKEVGILSRMRATKNDALVGLQEMIVYGLKGLCAYADHALVLA
HEDRRIYEYVHRAFHFLTTKDSKDMEKVLGYLMELGQVNLICMDVLHNAN
NTFGAQSPHTVSLKPRPGKCVLVSGHDFKFLDALLRQTEGMGINVYTHGE
MLPAHGYPKLRKYKHLAGHYGVAWQRQSIEFPHFPGAIVMTTNCLTAPKD
DYQGRLFTVGVVGWPNIAHIGDDLDFSAVIKVALESPGFTEDTPEFKYPP
SSFTPVTDNYQVGFSSETVIGVAPTVLKAIETGDISRVFVIGGCDGYEGE
RSYYTDLAKMLPESAVVLTAGCGKFRINSLQWKTIGDSGIPRLLDMGQCN
DSYAAIQIASALAEALNCTVHDLPLSIVLSWFEQKAVVVLLSLLSLGLQN
IRVGPQLPAFLRPSAVKILSDKFGLKLIGDPKLDLEEIYGGQIPASA
>PFAM_R6GB04_9FIRM-CO-dehydrogenase-catalytic-subunit
MADKICSSADKVLEVFLENAPMDTSHHRMVKQQNKCGYGLQGVCCRLCSN
GPCRLSPSKPKGVCGADADTIACRNFLRQVAAGSGCYTHVVENTARRLKE
LAQELQAEGKKPKYKDSVAKLAKILQINCCGNCGPDCHNSCAKTAEMIAD
AVLADIRKPYDEKMTLMKNIALPKRYELWEKLGILPGGAKDEIFNAVVKT
STNLNSDPMDMLLQCLRLGISTGNYGLILTNLMNDIIMGPPQISMDPVGF
RIIDPEYINIMITGHQQSMFADLEEKLESEIVQKSAELVGDIIMGPPQIS
MDPVGFRIIDPEYINIMITGHQQSMFADLEEKLESEIVQKSAQLVGAKGI
RIVGCTCVGQDYQARSGCYKDVYCGHAGNKGIRIVGCTCVGQDYQARSGC
YKDVYCGHAGNNYTSEAVLMTGCVDLVVSEFNCTIPGIEPICEQLDIKML
CLDDVAKKANAELLPYTAEEKEKITSQIIADALCGFKNRKEKLYGTAPAK
GEKRVNVMAQHGFDKSQKKKKRLQVRLSQMLYVDLKTEKKNYMEPLRQKE
KSVSMSWHSMVLTNLLPVYQKIL
```

#### hcp tree

**Looking at [the tree](https://itol.embl.de/tree/13822922232180131617925262), we definitely don't have the CO one mixed in. All of ours are in the hydroxylamine reductase grouping, closely related to the hybrid-cluster protein, which is the "prismane" protein that was only detected under anaerobic conditions when nitrate/nitrite were added (see end of abstract [here](https://doi.org/10.1046/j.1432-1327.2000.01032.x)).**

<a href="https://i.imgur.com/gyvooGF.png"><img src="https://i.imgur.com/gyvooGF.png"></a>

**So we may just be stuck saying generally something like this:**

**"This is what it looks like based on these annotations (K05601 and PF03063) and keeping with the dominant annotation of hydroxylamine reductase from KEGG. But the type of prismane proteins both of these annotation models (KOFamScan and Pfam) capture include hydroxylamine reductase activity and possibly nitrate/nitrite respiration. Their high relative coverage here is of interest; perhaps these are physiologically involved with nitrate and/or nitrite. Molecular biology work is required to further investigate... bla bla... real work needed..."**

## Checking out some that were annotated but didn't pass the most stringent score thresholds

**gene_ID : CPM coverages in 4 depths pulled from shiny app**

**K05601 (hcp)**

```
130727 : 0 323.865 0 975.625
194074 : 0 0 270.352 680.045
463481 : 0 412.326 0 250.424
569665 : 0 279.314 522.722 298.099
586817 : 0 268.195 282.664 124.037
807092 : 0 0 868.844 285.206
823538 : 0 107.475 0 303.97
828527 : 0 0 0 324.658
884545 : 0 0 712.932 264.212
888396 : 0 0 194.418 159.404
```

> **The sums of these for depths 2, 3, and 4 are in the range of 1,000-4,000 CPM. The cumulative coverage for the KO is in the range of 90,000.
So these slightly less stringent annotations are not what's causing the high coverage. That's good.**

**K10535 (hao)**

None
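For the record, the per-depth sums quoted for the sub-threshold K05601 genes can be reproduced with a few lines of awk. A minimal sketch, assuming the `gene_ID : d1 d2 d3 d4` layout used in the listing above; the function name `sum_depth_coverage` is just an illustrative helper, not something from the original workflow.

```bash
#!/usr/bin/env bash
# Sketch: sum CPM coverage per depth from lines formatted "gene_ID : d1 d2 d3 d4".
sum_depth_coverage() {
    awk '$2 == ":" && NF == 6 {
             # fields 3-6 hold the coverages for depths 1-4
             for (i = 3; i <= 6; i++) total[i - 2] += $i
         }
         END { printf "%.3f\t%.3f\t%.3f\t%.3f\n", total[1], total[2], total[3], total[4] }'
}

# The sub-threshold K05601 (hcp) genes listed above.
sum_depth_coverage <<'EOF'
130727 : 0 323.865 0 975.625
194074 : 0 0 270.352 680.045
463481 : 0 412.326 0 250.424
569665 : 0 279.314 522.722 298.099
586817 : 0 268.195 282.664 124.037
807092 : 0 0 868.844 285.206
823538 : 0 107.475 0 303.97
828527 : 0 0 0 324.658
884545 : 0 0 712.932 264.212
888396 : 0 0 194.418 159.404
EOF
```

This gives 0 for depth 1 and about 1,391, 2,852, and 3,666 CPM for depths 2, 3, and 4, which is small next to the ~90,000 cumulative coverage for the KO.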