# Exomio Fragmissions daily notes

###### tags: `Exomio- Adriana`

# Fragmission Ideas

* Edit and create a VCF file that encodes the fragmissions; think of it as a form of genetic code poetry
* Maybe create a fragmented sequence chromatogram for the variants
* Also look into the FASTQ sequence format and think about how to link up with the quality matrix
* Consider sending the altered fragmented files as part of the amateur radio fragmissions, perhaps using a protocol such as PSK31 or packet radio

# General notes

## Sequencing

* For sequence quality, "high quality" has traditionally been set at a Phred quality score of 20 or more (an estimated base-call error rate of 1% or less)

# Collected information

## Day 001

### Notes

* VCF format: needs header metadata such as the INFO and FILTER datatype declarations, and also the tab-delimited table columns CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO
* Interesting words and phrases: frameshift, missense, nonsense, indels, codons, exomes, exons, introns

### Variants possibly found in study and in gnomAD database

Seemingly good variants!

* CTNNA2, <https://gnomad.broadinstitute.org/variant/2-80136894-C-T?dataset=gnomad_r2_1>
* DIAPH2, <https://gnomad.broadinstitute.org/variant/X-96171440-G-T?dataset=gnomad_r2_1>
* INHBE, <https://gnomad.broadinstitute.org/variant/12-57850008-C-T?dataset=gnomad_r2_1>
* KIAA1109, <https://gnomad.broadinstitute.org/variant/4-123245675-C-T?dataset=gnomad_r2_1>

These variants are the complement of what is listed in the article. Is that due to the way these variants are labeled?

* CTC1, <https://gnomad.broadinstitute.org/variant/17-8137838-G-A?dataset=gnomad_r2_1>
* FAM186A, <https://gnomad.broadinstitute.org/variant/12-50744311-C-A?dataset=gnomad_r2_1>
* GSS, <https://gnomad.broadinstitute.org/variant/20-33519849-G-A?dataset=gnomad_r2_1>
* NUP210L, <https://gnomad.broadinstitute.org/variant/1-154067536-G-A?dataset=gnomad_r2_1>
* OR1E2, <https://gnomad.broadinstitute.org/variant/17-3336880-G-A?dataset=gnomad_r2_1>

### Links

* <http://samtools.github.io/hts-specs/VCFv4.2.pdf>
* <https://gnomad.broadinstitute.org/>
* <https://www.youtube.com/watch?v=Bh8AKkI-DhY>
* <https://cadd.gs.washington.edu/>
* <https://www.ncbi.nlm.nih.gov/snp/>
* <https://en.wikipedia.org/wiki/Phred_quality_score>
* <https://bioinformatics.stackexchange.com/questions/14/what-is-the-difference-between-fasta-fastq-and-sam-file-formats>
* <https://en.wikipedia.org/wiki/FASTA_format>
* <https://en.wikipedia.org/wiki/FASTQ_format>
* <https://en.wikipedia.org/wiki/FASTQ_format#Illumina_sequence_identifiers>
* <http://rice.plantbiology.msu.edu/training/Genome_Sequence_Quality.pdf>
* <https://www.snpedia.com/index.php/GRCh37>
* <https://medlineplus.gov/genetics/gene/mecp2/#conditions>

## Day 002

### Notes

* In VCF format, strings can't contain whitespace, commas, or semicolons

### Conversation with Dr. Wang

* Statistical variants
* Different types of studies:
    * Genome-wide association study
    * Phenotype association study
* His own research about specific variants in the genetic code that are associated with people from Taiwan versus people from the mainland
* Changes in sequencing technology lead to different "versions" of the reference human genome
* There are error rates in sequencing, sometimes of more than 2-3%
* The Y chromosome is quite difficult to sequence, and maybe more than 40% of it is wrong
* Many repeats of sequences lead to issues in sequencing
* Less than 1% of the total genome is the exome, or the coding regions
* Concept of **molecular anthropology**
* Contrast between association studies vs causation studies
* Studies of genetic variations cannot tell you about causation
* 10-20% of our phenotypes are due to genotypes
* There is no real genetic determinism!
* **Metagenomics** ([list of papers on ScienceDirect](https://www.sciencedirect.com/topics/biochemistry-genetics-and-molecular-biology/metagenomics)): genomic analysis of a combination of species within a particular environment, including the microbiome or virome
* Beautiful thought:
    * **More focus on gene histories, on the environmental conditions that triggered these changes in the genetic code: what could have been different in these societies (maybe they were more open, more equal, allowed for more types of gender and sexual identities and activities), and then, how might those differences have been preserved in the genetic sequences**

### Variants possibly found in study and in gnomAD database

![CTNNA2](day002/images/CTNNA2.png)
![DIAPH2](day002/images/DIAPH2.png)
![INHBE](day002/images/INHBE.png)
![KIAA1109](day002/images/KIAA1109.png)

## Day 003

### Brainstorming about encoding possibilities

I would like something that can convey something about the deep-time potentials of the genetic code to encode the experiences of gender minorities in the past, when they may not have been
minorities. Insofar as the environmental conditions are possibly related to changes in the genetic code, these changes may be expressed in some of these variants. To explore this scientifically would be in the realm of a kind of "molecular anthropology". And so, I want to convey the affect of this within the constraints of how these variants are encoded today, in order to fork these formats into new forms, and to swerve them, let's say, into something else.

The limitations of the VCF format are quite immense.

* Could play around with the file data to explore something about deep time.
* Think about how to modify sources and references.
* What kinds of INFO fields would we like to have? These could suggest new types of INFOs that are more suited to the types of studies we would want to see, or the types of data that are more in tune with the directions of this project.
* The FILTER field could also be important for talking about how certain types of information are filtered out in these studies when they should be included. What sorts of filters would we need to create so as to filter particular kinds of data in?
* In terms of the actual variants, we can then correlate the new types of fields we create to particular variants.
* We can maybe invent new kinds of tranxxeno variants that would be more in line with our forking and our desires.

Do we need to also create something in FASTA or FASTQ format to convey some of this? Perhaps.

Maybe we want to create some kind of typology of variants here, like the ones we were finding in some of the videos about variants. Create a table and some examples of it, including syntactical representations.

Data translation of genetic code information, genetic variation information, forked genetic variants, the translation of this into radiophonic space.
So: what does it mean, what are the implications, of translating something about the genetic code into a form that then potentially envelops and infuses the radiophonic space of the earth? Exploration of this translation as a means of also exploring making ourselves tranxxeno.

### Sexually Dimorphic variants found in study and also found in gnomAD

* CDK12: 1, <https://gnomad.broadinstitute.org/variant/17-37618361-G-C?dataset=gnomad_r2_1>
* AKR1C3: 1, <https://gnomad.broadinstitute.org/variant/10-5144369-A-G?dataset=gnomad_r2_1>
* PPARGC1B: 2, <https://gnomad.broadinstitute.org/variant/5-149213068-C-T?dataset=gnomad_r2_1>
* SPHK1: 1, <https://gnomad.broadinstitute.org/variant/17-74383224-C-T?dataset=gnomad_r2_1>
* CTNNA2: 1, <https://gnomad.broadinstitute.org/variant/2-80136894-C-T?dataset=gnomad_r2_1>
* SYNPO: 1, <https://gnomad.broadinstitute.org/variant/5-150028774-G-A?dataset=gnomad_r2_1>
* TNN: 1, <https://gnomad.broadinstitute.org/variant/1-175046693-A-G?dataset=gnomad_r2_1>

What does this mean? To me it means that these variants might be found in people who are not transgender; but it is also possible that these people *are* transgender and that this information is simply not available to us.

### Sexually Dimorphic variants found in study and NOT found in gnomAD

* DNER
* PIK3CA
* CDH8
* DSCAML1
* EGF
* EFHD2
* RIMS3
* RIMS4
* GRIN1
* MAP4K3
* BOK
* KCNK3

What does this mean? It means that the study really identified some rare variants that are only found in a transgender cohort, or at least this is how I would understand things.

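
The gnomAD links in these notes end in an identifier of the form chromosome-position-reference-alternate (for example `10-5144369-A-G`), which lines up with the CHROM, POS, REF and ALT columns noted on Day 001. Here is a minimal sketch of turning such links into a small VCF. The function names and the `GENE` INFO key are hypothetical, made up for illustration, and ID, QUAL and FILTER are left as `.` (missing) because the notes record no values for them.

```python
from urllib.parse import urlparse

# The eight fixed VCF columns listed in the Day 001 notes.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_gnomad_url(url):
    """Split a gnomAD variant URL into (chrom, pos, ref, alt)."""
    # The path looks like /variant/10-5144369-A-G; the query string is ignored.
    variant_id = urlparse(url).path.rstrip("/").split("/")[-1]
    chrom, pos, ref, alt = variant_id.split("-")
    return chrom, int(pos), ref, alt


def vcf_record(gene, url):
    """Build one tab-delimited VCF data line for a gene and its gnomAD link."""
    chrom, pos, ref, alt = parse_gnomad_url(url)
    # "." marks a missing value; GENE is a hypothetical INFO key.
    fields = [chrom, str(pos), ".", ref, alt, ".", ".", f"GENE={gene}"]
    return "\t".join(fields)


def build_vcf(variants):
    """Return a minimal VCF document: header lines, column line, records."""
    lines = [
        "##fileformat=VCFv4.2",
        '##INFO=<ID=GENE,Number=1,Type=String,Description="Gene symbol from the notes">',
        "#" + "\t".join(VCF_COLUMNS),
    ]
    lines += [vcf_record(gene, url) for gene, url in variants]
    return "\n".join(lines) + "\n"


variants = [
    ("AKR1C3", "https://gnomad.broadinstitute.org/variant/10-5144369-A-G?dataset=gnomad_r2_1"),
    ("SPHK1", "https://gnomad.broadinstitute.org/variant/17-74383224-C-T?dataset=gnomad_r2_1"),
]
print(build_vcf(variants), end="")
```

Any new INFO or FILTER field invented for the fragmissions would need its own `##INFO` or `##FILTER` header line in the same style as the `GENE` declaration here.
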
### Links

* [What is Epigenetics?](https://www.cdc.gov/genomics/disease/epigenetics.htm)
* [The Search for Genetic Variants and Epigenetics Related to Asthma](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3178821/)
* [The role of epigenetics in genetic and environmental epidemiology](https://www.futuremedicine.com/doi/full/10.2217/epi.15.102)
* [Genetics meets epigenetics: Genetic variants that modulate noncoding RNA in cardiovascular diseases](https://www.sciencedirect.com/science/article/pii/S0022282815300997?casa_token=Ce3vQXxFnCQAAAAA:rIUU2HEPBOeTH2vC5tJc7JKDsGaNpQ9c8sDwmrDmUjZmWWLl520loJi4_5tm0eLYXlGafqyrPR0)
* [My forked version of OpenWebRX](https://github.com/zeitkunst/openwebrx)
* [Introduction to Variants and Nomenclature](https://www.youtube.com/watch?v=t0_DQvwfKwM)
* [Lab Value Changes in Transgender Females](https://labmedicineblog.com/2018/12/10/lab-value-changes-in-transgender-females/)

## Day 004

Edited and updated images for the radio interface.

### Sequences

* CDK12: AGAAGGA C GGGAGTG
* AKR1C3: AAAAACC G AGGTATA (TODO: unclear if this is correct?)
* PPARGC1B: CGTGCGG T GTTCTCG
* SPHK1: TGCACAC T TTGTGCC
* CTNNA2: GGCGCTC T AGGACCT
* SYNPO: CCCCTTG A GGAGCTT
* TNN: CACCTAC G AGATCGA

### FASTQ example

CDK12 variant:

```
@SEQ-ID
AGAAGGACGGGAGTG
+
111111111111111
```

AKR1C3 variant:

```
@SEQ-ID
AAAAACCGAGGTATA
+
111111111111111
```

PPARGC1B variant:

```
@SEQ-ID
CGTGCGGTGTTCTCG
+
111111111111111
```

### exomio_fragmissions.fastq

To be edited later

```
@CDK12
AGAAGGACGGGAGTG
+
111111111111111
@AKR1C3
AAAAACCGAGGTATA
+
111111111111111
@PPARGC1B
CGTGCGGTGTTCTCG
+
111111111111111
@SPHK1
TGCACACTTTGTGCC
+
111111111111111
@CTNNA2
GGCGCTCTAGGACCT
+
111111111111111
@SYNPO
CCCCTTGAGGAGCTT
+
111111111111111
@TNN
CACCTACGAGATCGA
+
111111111111111
```

### Today's links

* [Genome Data Viewer](https://www.ncbi.nlm.nih.gov/genome/gdv)
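
Each FASTQ record above has four lines: an `@` identifier, the sequence, a `+` separator, and a quality string that must be exactly as long as the sequence. Here is a minimal sketch of generating such records from the spaced sequences in the Sequences list. It assumes the common Phred+33 encoding (quality character = ASCII code of score + 33), and the function names are hypothetical. Under that assumption the placeholder `1` used above corresponds to a score of 16, while the "high quality" threshold of 20 would be written as `5`.

```python
def phred_to_char(score, offset=33):
    """Encode a Phred quality score as one FASTQ character (assumes Phred+33)."""
    return chr(score + offset)


def char_to_phred(char, offset=33):
    """Decode one FASTQ quality character back to a Phred score."""
    return ord(char) - offset


def fastq_record(name, sequence, quality=20):
    """Build a four-line FASTQ record with one uniform quality score."""
    # The notes write the variant base set off by spaces; remove them.
    seq = sequence.replace(" ", "")
    # The quality string must match the sequence length exactly.
    qual = phred_to_char(quality) * len(seq)
    return f"@{name}\n{seq}\n+\n{qual}\n"


records = [
    ("CDK12", "AGAAGGA C GGGAGTG"),
    ("SPHK1", "TGCACAC T TTGTGCC"),
]
fastq = "".join(fastq_record(name, seq) for name, seq in records)
print(fastq, end="")

# Decode the placeholder quality character used in the notes.
print(char_to_phred("1"))
```

Deriving the quality string from the sequence length avoids hand-counting characters, and a per-base list of scores could replace the single `quality` value if the quality line is meant to carry its own pattern.
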

Adriana Knouf -- Exomio Fragmissions
