---
title: 'mRhiSin1 investigation'
disqus: hackmd
---
Investigating VGP's mRhiSin1 assembly
===
By Tanya Lama with contributions from Giulio Formenti, Ariadna Morales and Hannah Frank
## Summary:
We suspect that VGP's mRhiSin1 genome assembly is not derived from *Rhinolophus sinicus* genetic material but from another bat species, as the result of either a species misidentification or a data swap.
## Table of Contents
[TOC]
### Investigating the source of error
First, we confirmed that our local copies of the mRhiSin1 assembly match the versions [archived on NCBI](https://www.ncbi.nlm.nih.gov/bioproject/776471) and in the VGP GenomeArk AWS bucket. We concluded that the error did not originate from the Hiller, Davalos, Ray, Teeling or Hughes labs.
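One way to check that a local copy matches an archived copy is to compare checksums. The sketch below is only an illustration, not necessarily how the comparison was done here; it assumes both files have already been downloaded, and the function names are hypothetical.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True when the two files are byte-for-byte identical (same SHA-256 digest)."""
    return file_sha256(path_a) == file_sha256(path_b)
```

Note that two gzip files can differ byte-for-byte while holding the same sequence, so compressed assemblies are best compared after decompression.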
The Vertebrate Genomes Lab (VGL) generated the assembly, but did not collect the samples or generate the data for mRhiSin1. We concluded that the error is unlikely to have originated at VGL. Although data swaps have occurred on occasion (e.g., see fAntMac1 and fEsoLuc1, or mLynCan4), such errors are easily identified before curation using Hi-C heat maps.
The samples were collected by bat taxonomy expert Dr. Zhang and submitted to VGL by Dr. Jianguo Lu (SYSU). Dr. Zhang's sampling trip did not target any species other than *Rhinolophus sinicus*. A picture of the individual bat from which the samples were collected has not yet been provided. Neither a species misidentification in the field nor a data swap at the sequencing center has been ruled out as a source of error.
### Phylogenetic evidence
In January 2022, a phylogeny estimated from whole-genome alignments for the Bat1K longevity project placed mRhiSin1 (*Rhinolophus sinicus*) in an unexpected position next to *Hipposideros larvatus*, with a branch length near 0.

**Figure**: Please note that VGP's mRhiSin1 genome is denoted as HLRhiSin1.
To rule out methodological bias, Ariadna Morales used a more robust gene/species tree method to estimate the phylogeny 1) with [VGP's mRhiSin1 assembly](https://www.ncbi.nlm.nih.gov/bioproject/776471) and 2) with [a 2017 *Rhinolophus sinicus* assembly from NCBI](https://www.ncbi.nlm.nih.gov/assembly/GCA_001888835.1/).
#### Tree with VGP's mRhiSin1 assembly

**Figure**: Note that VGP's mRhiSin1 assembly reproduces the same error, placing *Rhinolophus sinicus* within *Hipposideros* bats.
#### Tree with SKLEC & IECR's Rhinolophus sinicus assembly

**Figure**: Note that SKLEC & IECR's assembly places *Rhinolophus sinicus* in the expected phylogenetic position, as sister to *Rhinolophus affinis*.
#### Tree based on *Rhinolophus sinicus* ACE2 sequences
Hannah Frank generated a simple neighbor-joining tree using all available bat ACE2 receptor sequences. Her tree places VGP's mRhiSin1 sequence (denoted here as RhiSin) as an outgroup to all other *Rhinolophus sinicus* sequences.

### % Identity using BLAST
I ran BLAST searches on a set of randomly chosen sequences from VGP's mRhiSin1 assembly and found that *Hipposideros armiger* and *Rhinolophus ferrumequinum* were often the hits with the highest % identity for any given sequence. I found this unusual, considering that *Rhinolophus sinicus* is available in the BLAST database yet never comes up as the top hit.
Ariadna Morales took a small set of highly conserved genes linked to SARS-CoV-2 from VGP's mRhiSin1 assembly and ran BLAST searches on them as well. Each result showed >99% identity for *Hipposideros armiger* and <94% for *Rhinolophus sinicus* (see below).
This leads us to believe that mRhiSin1 may be another species, likely a Hipposiderid. My understanding is that VGP has plans to sequence *Hipposideros armiger*. A data swap between *Rhinolophus sinicus* and *Hipposideros armiger* at VGL has not been ruled out.
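For readers unfamiliar with the metric, BLAST's % identity is the number of identical aligned positions divided by the alignment length, gaps included. The function below is a minimal sketch of that calculation for two already-aligned strings; the function name is hypothetical, and it does not perform the alignment or reproduce any of the searches described above.

```python
def percent_identity(aligned_query, aligned_subject):
    """Percent identity over a pairwise alignment.

    Both inputs must be the same length, with '-' marking gaps.
    Identical non-gap columns count as matches; the denominator is the
    full alignment length, so gap columns lower the score.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    query = aligned_query.upper()
    subject = aligned_subject.upper()
    matches = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return 100.0 * matches / len(query)
```

Under this definition, a 99% hit over a conserved gene means roughly one mismatch or gap per 100 aligned columns, which is why the gap between >99% and <94% reported above is informative.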
### Whole genome similarity based on MASH sketch
Mash sketches were used to compare sequence similarity between mRhiSin1 and three other bat genomes: Bat1K's *Hipposideros larvatus*, SKLEC & IECR's *Rhinolophus sinicus*, and VGP's *Rhinolophus ferrumequinum*.

**Figure**: The result shows striking similarity between VGP's mRhiSin1 and Bat1K's mHipLar1. At this point, the most plausible explanation seems to be a sample swap. It is also likely that mRhiSin1 is actually a Hipposiderid, potentially *Hipposideros larvatus*.
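To illustrate what a Mash comparison measures, the sketch below implements the underlying idea in plain Python: each sequence is reduced to the smallest hashes of its canonical k-mers (a MinHash sketch), the Jaccard index `j` is estimated from the two sketches, and the Mash distance is computed as `-(1/k) * ln(2j / (1 + j))`. This is a toy re-implementation under stated assumptions, not the Mash tool itself: it uses SHA-1 from `hashlib` rather than Mash's own hash function, the function names are hypothetical, and the parameter values are illustrative.

```python
import hashlib
import math

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k):
    """Yield each k-mer or its reverse complement, whichever sorts first."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) - set("ACGT"):
            continue  # skip k-mers containing N or other ambiguity codes
        yield min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def sketch(seq, k=21, size=1000):
    """Bottom-`size` MinHash sketch: the smallest distinct 64-bit k-mer hashes."""
    hashes = {
        int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")
        for kmer in canonical_kmers(seq, k)
    }
    return sorted(hashes)[:size]


def jaccard_estimate(sketch_a, sketch_b, size):
    """Estimate Jaccard similarity from the bottom-`size` hashes of the union."""
    set_a, set_b = set(sketch_a), set(sketch_b)
    merged = sorted(set_a | set_b)[:size]
    if not merged:
        return 0.0
    shared = sum(1 for h in merged if h in set_a and h in set_b)
    return shared / len(merged)


def mash_distance(seq_a, seq_b, k=21, size=1000):
    """Mash distance: 0.0 for identical k-mer content, 1.0 when nothing is shared."""
    j = jaccard_estimate(sketch(seq_a, k, size), sketch(seq_b, k, size), size)
    if j == 0.0:
        return 1.0
    distance = -math.log(2.0 * j / (1.0 + j)) / k
    return min(1.0, max(0.0, distance))
```

A distance near 0 between two assemblies labelled as different genera, as in the figure above, is the kind of signal that points to a shared species of origin rather than true divergence.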
### % Identity based on mitocontigs
Lastly, we queried the BOLD barcoding system with the mitocontigs present in VGP's mRhiSin1 assembly.
*Hipposideros* cf. *larvatus* matches the mitocontigs with 100% identity.

This provides further evidence that the error originated from field collection or raw data generation.
The assembly will be flagged on NCBI until collaborators can ascertain what species this assembly originated from.
###### tags: `VGP`