This is a bioinformatics pipeline used to extract and curate raw data obtained from the JUNO Fluidigm platform for downstream analyses.
Prior to running the assay you must design your sample and primer layouts. See our templates in */templates_iberian_lynx_fluidigm.
We highly recommend this setup method because it keeps the sample layout well organized. The final CSV file then holds a large amount of information about the run that is useful for post-run analysis: well location, sample name, sample concentration and sample type.
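As a rough illustration of such a layout file, the following Python sketch assigns samples to wells and writes the four fields listed above. The header spelling, the `A01`-style well labels, the row-major fill order and the sample names are assumptions made for this example; they are not taken from the templates.

```python
import csv
import io

# Header names follow the fields listed above; the exact spelling is an assumption.
FIELDS = ["Well location", "Sample Name", "Sample concentration", "Sample type"]


def plate_wells(rows="ABCDEFGH", n_cols=12):
    """Return the 96 well labels of a standard plate in row-major order (A01 ... H12)."""
    return [f"{r}{c:02d}" for r in rows for c in range(1, n_cols + 1)]


def build_sample_setup(samples):
    """Pair each sample with the next free well; unused wells are simply omitted."""
    wells = plate_wells()
    if len(samples) > len(wells):
        raise ValueError("more samples than wells on the plate")
    return [
        {
            "Well location": well,
            "Sample Name": s["name"],
            "Sample concentration": s["conc"],
            "Sample type": s["type"],
        }
        for well, s in zip(wells, samples)
    ]


def to_csv_text(table):
    """Serialize the layout as comma-separated text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(table)
    return buf.getvalue()


# Hypothetical samples, for illustration only.
samples = [
    {"name": "lynx_001", "conc": 60, "type": "Unknown"},
    {"name": "NTC_1", "conc": 0, "type": "NTC"},
]
setup = build_sample_setup(samples)
csv_text = to_csv_text(setup)
```

Keeping the layout in one table means the same file can feed both the copy-paste step into the software and any later join against the exported calls.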
Once you have created the sample setup you can easily import it into the Fluidigm SNP Genotyping Software: click Sample Setup > New > (Window: Are you sure that you want to erase the current sample setup and create a new one?) click yes > (Window: Sample Plate Setup Wizard) click ok. Then copy the cells with the sample names from your .csv file and paste them into the sample setup window; this gives the software the following information. In Data item you can select:
You need to provide a proper "sampling Map" *.dsp (dispense mapping file). In our case for a Juno 96.96 genotyping IFC run, we used "Juno96x96-Sample-SBS96.dsp" map file.
Once you have created the assay setup you can import it into the Fluidigm SNP Genotyping Software in the same way as the sample setup: copy the allele X and Y names and paste them into the software plate, then do the same for the assay names.
Data is extracted directly from the desktop computer in LEM3. <working directory is /pcr. In particular we need input files *.bml < describe contents of PCR folder>
See the SNP Genotyping User Guide (68000098 Rev.18) in the GitHub repository for instructions. Launch the Biomark & EP1 Software and open the run results using File>Open>*/ChipRun.bml
You can scroll through each SNP to review discriminant analysis plots (clusters) for all heterozygous, homozygous, homozygous alternate, NO CALL and INVALID CALL calls at each sample x snp.
We can adjust the data normalization method using the window on the left "Analysis settings".
SNPtype normalization: the default option for chip runs using Fluidigm SNP Type Assays. This option determines a global NTC setting from the NTCs across the chip run. In cases where there are no NTCs defined, this option will estimate the location of the NTC.
Plot view - hiding invalids
Plot view - showing invalids
NTC normalization: This option makes viewing assays on the plotted graphs easier, because it normalizes the position of the no template control cells. The no template control cells are aligned to the x = 0.1 and y = 0.1 location on the plotted graph. It also normalizes the intensities of the assays so that they are roughly plotted in a square.
Plot view - Hiding invalids
Plot view - showing invalids
With this method the NTC samples cluster more tightly together. Furthermore, the allele clusters are also better defined.
With both methods our run showed many invalid chambers (the meaning of "invalid" is explained below). These invalids are associated with a group of failed assays (SNPs; see the picture below).
The orange vertical lines mark the SNPs whose amplification failed.
Using File>Export we can export data from Fluidigm in one of four formats:
Export the raw data in *.csv and save as "All Files".
Rather than opening the file directly, open a blank Excel sheet and go to Data>Get External Data>from Text to load the Fluidigm data into Excel with columns delimited by ";". Start the import on line 16.
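The same import can be sketched without Excel. The Python example below skips the first 15 lines and splits the remaining rows on ";", mirroring the settings above. The toy export text and its column names are invented for the example, and the real file may contain further blocks that need separate handling.

```python
import csv


def read_fluidigm_export(text, first_data_line=16, delimiter=";"):
    """Drop the header block and parse the remaining lines as delimited rows.

    first_data_line is 1-based, matching "Start the import on line 16".
    Blank rows are discarded.
    """
    lines = text.splitlines()[first_data_line - 1:]
    reader = csv.reader(lines, delimiter=delimiter)
    return [row for row in reader if any(cell.strip() for cell in row)]


# Toy stand-in for an export: 15 header lines, then a small table.
# Column names and values are hypothetical.
fake_export = "\n".join(
    [f"header line {i}" for i in range(1, 16)]
    + ["ID;Assay;Final", "S01-A01;snp1;A:G", ""]
)
rows = read_fluidigm_export(fake_export)
```

For a real file, read it with `open(path, newline="")` and pass the contents to `read_fluidigm_export`.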
Tanya created a custom R script called csv_to_genepop_format.Rmd for this. See it in the GitHub repository.
Allele correspondences between the nucleotides A, G, C, T and the three-digit GENEPOP codes are shown in the table below. No Call and Invalid Call are both treated as missing (000).
A | G | C | T | Missing |
---|---|---|---|---|
001 | 002 | 003 | 004 | 000 |
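
The recoding in the table can be sketched in a few lines of Python. This is not the csv_to_genepop_format.Rmd script, only an illustration of the mapping. It assumes calls are written as two bases separated by a colon (for example `A:G`); anything else, including No Call and Invalid Call, is treated as missing.

```python
# Codes taken from the table above.
GENEPOP_CODE = {"A": "001", "G": "002", "C": "003", "T": "004"}
MISSING = "000"


def call_to_genepop(call, sep=":"):
    """Convert a call such as "A:G" into a six-digit GENEPOP genotype.

    The "X:Y" spelling of calls is an assumption about the export format.
    Any call that is not two recognized bases is coded as missing.
    """
    parts = call.strip().upper().split(sep)
    if len(parts) != 2 or any(p not in GENEPOP_CODE for p in parts):
        return MISSING + MISSING
    return GENEPOP_CODE[parts[0]] + GENEPOP_CODE[parts[1]]


examples = {c: call_to_genepop(c) for c in ["A:G", "T:T", "No Call", "Invalid"]}
```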
Download the normal (non-beta) version of Gimlet and unzip it with WinZip. Choose a directory (change to another directory), then click the little computer icon (there is no other button, it's a little weird).
Generating consensus genotypes is the process that allows researchers to determine the genotype of a marker locus (in our case SNPs) from non-invasive, low-quality, replicated samples, for instance scats, hair, … (Cite).
Gimlet provides two different methods to create consensus genotypes from a set of PCR reactions for each sample: the threshold method (our case) and the probability-based method (not used).
This was the method used in our run. It is defined as a "decision-based" method and relies on the number of times each allele appears. The genotype of each locus is constructed as the most likely genotype across the x replicates. The user can adjust the threshold (see figure), which is the number of times an allele must be observed at a locus for it to be retained and taken into account.
At each locus, when no allele can be retained (i.e. all allele scores are below the threshold) or when more than two alleles are retained, the genotype is considered missing data.
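The rule above can be illustrated with a minimal Python sketch. It is a simplified reading of the threshold method and not Gimlet's actual code. One assumption is made loudly here: an allele is counted once per replicate in which it appears, so a homozygous replicate contributes one observation, not two.

```python
from collections import Counter

MISSING = ("000", "000")


def consensus_genotype(replicates, threshold=1):
    """Build a consensus genotype for one sample at one locus.

    replicates: list of (allele1, allele2) tuples using GENEPOP codes.
    An allele is retained when it is seen in at least `threshold` replicates.
    """
    counts = Counter()
    for a1, a2 in replicates:
        if "000" in (a1, a2):
            continue  # a missing replicate contributes no observations
        for allele in {a1, a2}:  # assumption: count once per replicate
            counts[allele] += 1
    retained = sorted(a for a, n in counts.items() if n >= threshold)
    if len(retained) == 0 or len(retained) > 2:
        return MISSING  # nothing retained, or more than two alleles retained
    if len(retained) == 1:
        return (retained[0], retained[0])  # homozygote
    return (retained[0], retained[1])  # heterozygote


reps = [("001", "002"), ("001", "001"), ("001", "002")]
het = consensus_genotype(reps, threshold=2)
hom = consensus_genotype(reps, threshold=3)
single = consensus_genotype([("001", "002")], threshold=2)
```

Under this counting assumption a sample with a single replicate can never reach a threshold of two, so every locus comes out as missing.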
Threshold Based Method Protocol
Gimlet requires a GenePop format input file (Step 4). Once you have created the input file, open Gimlet and click Calculator > Consensus genotypes > Input file. Select your file; you can now adjust the "Determination of the consensus genotypes" parameters. Make sure that you select the "Threshold method". To calculate the consensus genotypes with a threshold of one, just use the default parameters. To calculate the consensus with a threshold greater than one, set it in "Consensus threshold or Probability ratio". Then click on "Output file and Go!" and save the output file in your chosen folder. The output file is also a GenePop format file with a list of the consensus genotypes for each sample (See the example in GitHub?).
For our run we had two different sets of samples. One group was replicated (2-4 replicates) and was expected to be of lower quality than the other group, which was not replicated in the assay. For the non-replicated samples we used threshold one, because Gimlet would have typed every locus of these samples as missing with a threshold of two or greater.
On the other hand, for the replicated samples we chose threshold two to build more accurate consensus genotypes from these poor-quality samples.
We also used Gimlet to estimate a set of genotyping errors, such as:
Allelic dropout (ADO): when a heterozygote (from the consensus genotype) is typed as a homozygote (from repeated genotypes).
False allele (FA): when a homozygote (from the consensus genotype) is typed as a heterozygote (from repeated genotypes).
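To make the two definitions concrete, here is a small Python sketch that compares each replicate with its consensus genotype and tallies ADO and FA. The denominators (replicates with a heterozygous consensus for ADO, replicates with a homozygous consensus for FA) are a simplifying assumption for this example and may differ from the formulas Gimlet uses.

```python
def is_missing(genotype):
    return "000" in genotype


def is_het(genotype):
    return genotype[0] != genotype[1]


def classify_replicate(consensus, replicate):
    """Return 'ADO', 'FA', 'match' or 'other'; None if either genotype is missing."""
    if is_missing(consensus) or is_missing(replicate):
        return None
    if sorted(consensus) == sorted(replicate):
        return "match"
    if is_het(consensus) and not is_het(replicate) and replicate[0] in consensus:
        return "ADO"  # heterozygous consensus typed as a homozygote
    if not is_het(consensus) and is_het(replicate):
        return "FA"  # homozygous consensus typed as a heterozygote
    return "other"


def error_rates(pairs):
    """pairs: iterable of (consensus, replicate) genotype tuples.

    Assumed denominators: ADO over replicates with a heterozygous consensus,
    FA over replicates with a homozygous consensus.
    """
    ado = fa = het_total = hom_total = 0
    for consensus, replicate in pairs:
        kind = classify_replicate(consensus, replicate)
        if kind is None:
            continue
        if is_het(consensus):
            het_total += 1
            ado += kind == "ADO"
        else:
            hom_total += 1
            fa += kind == "FA"
    return {
        "ADO": ado / het_total if het_total else None,
        "FA": fa / hom_total if hom_total else None,
    }


pairs = [
    (("001", "002"), ("001", "001")),  # ADO
    (("001", "002"), ("001", "002")),  # match
    (("001", "001"), ("001", "002")),  # FA
    (("001", "001"), ("001", "001")),  # match
    (("001", "002"), ("000", "000")),  # skipped: missing replicate
]
rates = error_rates(pairs)
```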
Five types of errors (true genotype in the first line; erroneous genotype in the bottom line):
Protocol
Open Gimlet. Click on Calculator > Estimate error rates > Input file. Select the same GenePop format file you used to construct genotypes with all the replicated samples. Select the "Construct consensus genotypes" method to estimate the error rates without a reference genotype for each sample.
The output file is an extended (*.txt) file with information about all the error types mentioned above for each locus and sample. (See the example in GitHub?)
### Using this pipeline we were able to create a set of 62 consensus genotypes from a run of 96 samples, with a mean of ≈13 missing alleles
### For the replicated samples we got lower allelic dropout with threshold-two consensus genotypes than with threshold one (from 0.2557 down to 0.0355)
### Godoy unexpected matches
See step 2, genotype consensus threshold 1.5: in step 2 we use a custom R script to address allelic dropout while rescuing some of the data lost by Gimlet under "strict" parameters.