Tanya Lama

@tlama

As a molecular ecologist I use population genomic data to explore the ecology of species of conservation concern.

Joined on Apr 1, 2019

  • Uploading raw reads to NCBI SRA database By Tanya Lama Note that you need to preload files if you are uploading files larger than 10 GB or more than 300 files into a single archive. Files must be in a single folder
  • Downloading VGP genomes using awscli By Tanya Lama Install: install or module load anaconda and awscli/1.16.144. I have installed awscli on Unity following the instructions here: https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html module load awscli-v2/2.15.53
  • Configuring Rclone for Unity By Tanya Lama Objective: Configure rclone on Unity to transfer files to and from cloud storage. Note that there may be a file size limit on uploads, depending on which platform you use for storage (e.g., the Box limit is 15 GB). Any larger file will return an error. Google Drive does not appear to have an upper limit.
  • By Tanya Lama These scripts were initially run for my dissertation along with other pop gen metrics, but the analysis was eventually abandoned because I didn't have the time or support to interpret the results. This analysis has become much more common among popgen studies, so there are likely better resources out there, but if you need to get scripts running in a pinch this should do it for you. Essentially, step one is to take the fasta Canada lynx genome file, align it to a reference (in this case we have Canada lynx but previously had also used Felis catus) and use the resulting bam file for the PSMC workflow. Only have one genome (e.g., just the reference)? No problem! Just align the raw reads from your assembly to the assembly itself, and use that bam file for the PSMC. I believe Bentley et al. did this for this paper in PNAS, and I bet the scripts are decent too: here #This may be easier than converting our .maf to a bam?
  • By Tanya Lama Sign up for a Unity account here. You need a PI to endorse your account. https://unity.rc.umass.edu/ Linux/Mac Users: Most distributions of Linux come with the OpenSSH client, which you can use to connect to the cluster. On your local terminal, navigate to /Users/username/.ssh. If this folder does not exist, create it using mkdir /Users/tanyalama/.ssh (substitute your own username). If the file ~/.ssh/config doesn't exist, create it using nano config.
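The folder and config checks described in the Unity login note can be sketched as a small shell function. This is only an illustration, not the author's setup: the `SSH_DIR` variable, the `unity` host alias, the `HostName` value (guessed from the sign-up URL), and the `User` placeholder are all assumptions that should be replaced with the values from your own account.

```shell
# Sketch: create the .ssh folder and a config stub only if they are missing.
# SSH_DIR, the "unity" alias, HostName, and User are assumptions for illustration.
SSH_DIR="${SSH_DIR:-$HOME/.ssh}"

setup_ssh_config() {
    # mkdir -p does nothing if the folder already exists
    mkdir -p "$SSH_DIR"
    chmod 700 "$SSH_DIR"
    # Write a stub only when no config file is present, so an existing one is kept
    if [ ! -f "$SSH_DIR/config" ]; then
        printf 'Host unity\n    HostName unity.rc.umass.edu\n    User your_unity_username\n' > "$SSH_DIR/config"
        chmod 600 "$SSH_DIR/config"
    fi
}
```

Calling `setup_ssh_config` once is enough; running it again leaves an existing config untouched, after which the file can be edited with `nano "$SSH_DIR/config"`.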
  • SNeP: trends in recent effective population size trajectories using genome-wide SNP data By Tanya Lama Read more about SNeP from Mario Barbato, here: https://www.frontiersin.org/articles/10.3389/fgene.2015.00109/full Objective We're interested in looking at effective population size trends in "contemporary" demographic history (most recent ~1000 generations) that would be complementary to PSMC and likely more of interest to our state agency folks than PSMC. SNeP and IBDNe are two methods to do this, but SNeP has proven easier to use because IBDNe requires a full pedigree and linkage map which we don't have at this time. Background
  • Investigating VGP's mRhiSin1 assembly By Tanya Lama with contributions from Giulio Formenti, Ariadna Morales and Hannah Frank Summary: We suspect that VGP's mRhiSin1 genome assembly is not derived from Rhinolophus sinicus genetic material, but rather from another bat species, as the result of a species misidentification or data swap. Investigating the source of error
  • By Tanya Lama Open terminal and log in using your username and password. Note that the cursor will not appear as you type your password in. Press Enter $ ssh tl50a@ghpcc06.umassrc.org tl50a@ghpcc06.umassrc.org's password: m*otism*otis If you have successfully logged in, the console will read:
  • See my new academic website here As a molecular ecologist, I use population genomic data to explore the ecology of species of conservation concern in natural and human-altered ecosystems. Using a combination of field, laboratory and quantitative approaches, my work is centered on understanding the evolutionary mechanisms, such as gene flow and selection, that shape population health, spatial patterns in genetic variation, distributions, and adaptation. My research blends basic and applied science by addressing fundamental biological questions that inform effective conservation strategies. In 2020 I began a three-year NSF Postdoctoral Research Fellowship in Biology. My proposal, titled "Connecting comparative, genomic, and evolutionary ecology factors to extreme mammalian longevity," is a collaborative effort with mentors and co-PIs, Drs. Liliana M. Davalos (SUNY Stony Brook) and Emma Teeling (University College Dublin). We are currently hiring two full-time NSF REU undergraduate interns for the summer of 2021. Please see here for more information on the positions (deadline to apply March 14th, 2021). In May 2021 I will join Dr. Jose Antonio Godoy at the Estación Biológica de Doñana in Sevilla, Spain as a Fulbright U.S. Scholar. Our proposed research focuses on the development of non-invasive genomics for population-level monitoring. Curriculum vitae (abridged) Selected Publications
  • Heterozygosity Methods: ROHan, ANGSD, VCFTools and GenomeScope Summary of heterozygosity results for Warren:
     GenomeScope (not listed here): 0.19 #this is based on raw reads from our reference lynx
     ANGSD (below): 0.23 #this method uses whole genomes (bams aligned to our mLynCan4 reference)
     ROHan: (running) #same as ANGSD
     PLINK: 0.25, 321 SNP/Mb #this is a SNP-based method
     Re: Inbreeding Coefficient, there is a pretty large span. These were calculated using VCFTools (below) but I would like to validate this estimate with at least one more method, because some of them are REALLY high. I should note that these relationships didn't show up clearly in my two relatedness analyses, so I am a little suspicious. > min(plink$F)
  • SNP Outlier Analysis By Tanya Lama Goal: We are looking for hallmarks of local adaptation by searching for outliers among our mLynCan4_v1.p SNP set (generated via GATK, then further filtered and LD-pruned using R; see HackMD or ask Tanya for these scripts). Objectives: (1) Estimate locus-by-locus FST and heterozygosity at each SNP in the ~53k SNP dataset "mLynCan4_v1.p.vcf.gz" using PLINK. (2) Use the ProcessFST_STACKS.R script to evaluate FST outliers based on number of standard deviations from the mean
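The idea of flagging FST outliers by their distance from the mean, in standard deviations, can be illustrated with a short shell and awk sketch. This is not the ProcessFST_STACKS.R script; it is a stand-in under stated assumptions: the input is a hypothetical two-column, tab-separated file (SNP ID, FST) with no header, the standard deviation is the population form, and the default threshold of 3 is arbitrary.

```shell
# fst_outliers FILE [K]
# Print SNPs whose FST lies more than K standard deviations from the mean.
# Assumed (hypothetical) input: tab-separated lines "snp_id<TAB>fst", no header.
fst_outliers() {
    local file="$1" k="${2:-3}"
    # The file is read twice: pass 1 accumulates sums, pass 2 scores each SNP
    awk -v k="$k" '
        NR == FNR { sum += $2; sumsq += $2 * $2; n++; next }
        FNR == 1  { mean = sum / n
                    var = sumsq / n - mean * mean
                    sd = (var > 0) ? sqrt(var) : 0 }
        sd > 0 {
            z = ($2 - mean) / sd
            if (z > k || z < -k) printf "%s\t%s\t%.2f\n", $1, $2, z
        }
    ' "$file" "$file"
}
```

For example, `fst_outliers per_snp_fst.tsv 3` (file name hypothetical) would print the ID, FST, and z-score of every SNP beyond three standard deviations in either direction.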
  • Interpreting ROH NOTE TO WARREN: I re-ran this analysis with a more informed parameterization and the results told a completely different story, mainly that the Newfoundland (island population) samples have a higher count of ROH when shorter ROH (<1000 bp) are included. The results below suggested that at least one individual (LIC46) is highly inbred and has a high proportion of the genome (0.09) in ROH. So, I think some additional thought needs to go into the parameterization of this analysis. Please see the updated results here: and read the general description below. Thank you! A brief overview of ROH and what we're doing here: {add more here but basically...} Shorter ROH reflect more ancient inbreeding, while longer ROH reflect more recent inbreeding [24]. I've used the detectRUNS (v0.9.6) package in R to detect runs of homozygosity and summarize the results. A full description of the analyses can be found here:
  • By Tanya Lama Project space usage: du -h --max-depth=1 /project/uma_lisa_komoroske/Tanya/ | sort -h -r Backing up to Box: module load rclone/1.51.0 rclone copy hello.txt "remote_box:box_backups"
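A note on sorting the project space usage report: `du -h` prints human-readable sizes (e.g., 512M, 2.0G), and a plain numeric sort ranks those by the leading number only. GNU `sort` has an `-h` flag that takes the size suffix into account. A minimal sketch, assuming GNU coreutils; the sizes and folder names are made up for illustration, and `largest_first` is a hypothetical helper name.

```shell
# Made-up du -h style output: size, tab, folder name (illustrative only)
sample_sizes() {
    printf '512M\tbams\n2.0G\tvcfs\n10K\tscripts\n'
}

# Numeric sort compares only the leading number, so 512M outranks 2.0G
sample_sizes | sort -n -r

# Human-numeric sort takes the K/M/G suffix into account
sample_sizes | sort -h -r

# The same idea applied to a real folder (GNU du and sort assumed)
largest_first() {
    du -h --max-depth=1 "$1" | sort -h -r
}
```

With the helper defined, `largest_first /project/uma_lisa_komoroske/Tanya/` lists the largest subfolders first.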
  • By Tanya Lama Objectives Requirements module load anaconda2/4.4.0 module load GATK/3.5 module load java/1.8.0_77 module load R/3.6.0 module load gcc/8.1.0
  • By Tanya Lama Browse SRA database using data access (public) source (DNA) type (genome) and taxon "mammals"[orgn:__txid40674])) NOT "Homo sapiens"[orgn:__txid9606] We've found 10 runs of low-coverage (2X) whole genome data for alpaca (Vicugna pacos) (paired-end WGS library on Illumina) https://www.ncbi.nlm.nih.gov/sra/SRX172662[accn] To access the first run of this data (SRR530974) we will need to download the SRA toolkit. For Mac OS: visit https://www.ncbi.nlm.nih.gov/Traces/sra/?view=software Double-click on the .tar.gz file and the Archive Utility will unpack it. Open a terminal prompt and change directory ("cd") into the folder containing the toolkit executables
  • By Tanya Lama ############################################ Select variants by excluding nonvariants ############################################ java -Xmx$MEM -jar ${GATK} -R ${REFERENCE} -T SelectVariants
  • By Tanya Lama GenomeScope is used as part of the VGP 1.6 assembly pipeline to estimate the overall characteristics of a genome, including genome size, heterozygosity rate and repeat content, from unprocessed short reads. Giulio and I compared the outputs of jellyfish and meryl for fAntMac1 (warty frogfish) as follows: Each plot is coverage of the kmer (x) by kmer counts (y). You can interpret these profiles similarly to KAT plots, where you expect a diploid peak and a haploid peak. When we run GenomeScope we set kmer length = 31. In other words, we are looking for copies (kmer counts) of unique motifs that are 31 bp long. We expect these unique motifs to be present at approximately the sequencing depth of our raw data. 1. jellyfish: genome size estimate: 521,013,374 bp. jellyfish has a coverage cutoff of 250x, which you can see on the log scale plot
  • ssh tl50a@ghpcc06.umassrc.org module load anaconda2/4.4.0 module load java/1.8.0_77 module load gcc/8.1.0 #required for meryl source activate vgp The low signal from our original HiC heatmap was my error -- I needed to manually cat the forward and reverse read files. However, I re-ran the hic heatmap, and the signal is still incredibly low. fAntMac1
  • By Tanya Lama HiC align plots are part of scaffold4_salsa quality checks in the VGP 1.6 assembly pipeline ssh tl50a@ghpcc06.umassrc.org {password} module load anaconda2/4.4.0 source activate vgp cd /project/uma_lisa_komoroske/Tanya/download/vgp/fAntMac1
  • By: Tanya Lama 12/24/2019 This is a preliminary analysis of the first ~26 lynx genomes from Maine. The total dataset is ~70 genomes encompassing all of Maine and several eastern Canadian provinces, and including one bobcat/lynx hybrid and several bobcats. The bioinformatic pipeline used to generate the SNP dataset used in these analyses can be viewed on HackMD or GitHub (see NGSPipeline). Briefly, our goal here is to 1) Explore the lynx SNP dataset (e.g. depth of coverage of a joint .vcf -- what else can this package do in terms of data exploration?) 1a) What is our average sequence depth? It ranges from ~5x to ~14x