# Rice GBSS project
# JOB
https://www.nature.com/naturecareers/jobs/search?location=&text=Postdoctoral%20Fellow&facets[jobType]=Postdoctoral
https://arabidopsis.org/news/jobs.jsp
https://jobs.plantae.org/jobs
https://jobs.sciencecareers.org/jobs/
https://recruiting.paylocity.com/recruiting/jobs/All/6b69bee1-3ec6-4d49-9564-72dfee0922ef/Boyce-Thompson-Institute-for-Plant-Research
https://www.danforthcenter.org/careers/
https://buell-lab.github.io/jobs.html
## Aim
1. Identify sequence divergence at the Wx locus using the GFM algorithm.
2. Find uncovered genetic material in the genome.
3. Provide a better understanding of amylose content in rice.
## TO DO
1. Build a comprehensive, extended version of the rice reference genome for SNP and indel alignment.
2. Create haplotype genome sequence data for the 3K project.
3. Identify the sequence variation in the 3K rice accessions.
## Manuscript Link
https://docs.google.com/document/d/1iKYjKW4NdHkI64Zw0Bas3-jhjq7KugJ_OKnEe8u3w1A/edit?usp=sharing
## VG tube map
There are more SNPs and indels than expected.




## GBSS1 locus (wx locus) in Phytozome
https://phytozome-next.jgi.doe.gov/jbrowse/index.html?data=genomes%2FOsativa_v7_0&loc=Chr6%3A1765307..1767797&tracks=MSU_Rice_TE%2CTranscripts%2CAlt_Transcripts%2CPASA_assembly%2CBlastx_protein%2CBlatx_Grass%2CRepeatMasker%2CBlatx_BasalEmbryophyte%2CBlatx_BasalMonocot&highlight=
## Gene ID
LOC_Os06g04200
## Repeat ID
ORSgTEMT01100006
ORSiTEMT01100003
## MSU database
http://rice.uga.edu/pub/data/Eukaryotic_Projects/o_sativa/annotation_dbs/pseudomolecules/version_7.0/chr06.dir/
## Box folder with the Dodam and Hwayeong FASTQ files
https://ucdavis.box.com/s/cctzv4q7u50iac9drqtr1yxvqk8saomt
## Reference
- Hisat-genotype
https://www.nature.com/articles/s41587-019-0201-4
https://daehwankimlab.github.io/hisat-genotype/
- Shim et al., Amylose content
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9389286/
- GBSS1 paper
https://www.frontiersin.org/articles/10.3389/fpls.2021.707237/full
- Wx locus from 3k snps
https://onlinelibrary.wiley.com/doi/10.1111/jipb.13011
- RiceVarMap2
https://ricevarmap.ncpgr.cn/
- 3K project
https://gigascience.biomedcentral.com/articles/10.1186/2047-217X-3-7
https://github.com/awslabs/open-data-docs/tree/main/docs/3kricegenome
- Rice pan-genome browser
https://academic.oup.com/nar/article/45/2/597/2333876
- Transcript accumulation and utilization of alternate and non-consensus splice sites in rice granule-bound starch synthase are temperature-sensitive and controlled by a single-nucleotide polymorphism
https://link.springer.com/article/10.1023/A:1006298608408
- The amylose content in rice endosperm is related to the post-transcriptional regulation of the waxy gene
https://onlinelibrary.wiley.com/doi/abs/10.1046/j.1365-313X.1995.7040613.x
## TASSEL5
```bash
git clone https://bitbucket.org/tasseladmin/tassel-5-standalone.git
cd tassel-5-standalone/
ls -l
git pull
# Copy the HapMap genotype file into the TASSEL directory
cp ~/Downloads/248Entries_40840SNPs_inorder_21May2015_v2.txt .
# Convert the HapMap file to VCF
perl run_pipeline.pl -Xmx5g -fork1 -h 248Entries_40840SNPs_inorder_21May2015_v2.txt -export -exportType VCF -runfork1
# Inspect the input file
vi 248Entries_40840SNPs_inorder_21May2015_v2.txt
```
## VCF and HAPMAP manual
https://statgen-esalq.github.io/Hapmap-and-VCF-formats-and-its-integration-with-onemap/
## Plink2 conversion
````bash
plink2 --bfile Nipponbare_indel --recode vcf --out indel --ref-from-fa Osativa_323_v7.0.fa
plink2 --bfile NB_final_snp --recode vcf --out IRRI_snps --ref-from-fa Osativa_323_v7.0.fa
````
## bcftools vcf handling
```bash
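# bcftools concat needs the same sample columns, in the same order, in every input
bcftools concat --allow-overlaps --remove-duplicates indel.vcf IRRI_snps.vcf
# Compare the sample lists of the two files
bcftools query -l indel.vcf
bcftools query -l IRRI_snps.vcf
# Subset the indel VCF to the samples present in the SNP VCF
bcftools query -l IRRI_snps.vcf > samples.txt
bcftools view -S samples.txt -o filtered_indel.vcf indel.vcf
# Retry the merge (--allow-overlaps expects bgzip-compressed, indexed input)
bcftools concat --allow-overlaps --remove-duplicates filtered_indel.vcf IRRI_snps.vcf
# Count the samples, then apply the same sample list to the SNP VCF on the cluster
bcftools query -l IRRI_snps.vcf | wc -l
sbatch -c 24 --mem=180g --wrap="bcftools view -S samples.txt -o IRRI_snps_filtered.vcf --threads 24 IRRI_snps.vcf"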
```
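Both `bcftools view -S` and `bcftools concat` depend on the two files agreeing on sample names, so it can help to check the overlap before subsetting. The sketch below is one way to do that with standard tools only. `shared_samples` is a hypothetical helper name, and its inputs are assumed to be plain sample lists such as those written by `bcftools query -l`.

```bash
# shared_samples A B: print the sample names found in both lists,
# one per line, in the order they appear in B.
# A and B are text files with one sample name per line.
shared_samples() {
    awk '
        NR == FNR { seen[$0] = 1; next }      # first file: remember every name
        ($0 in seen) && !printed[$0]++        # second file: print shared names once
    ' "$1" "$2"
}

# Example (file names are placeholders):
#   bcftools query -l indel.vcf     > indel_samples.txt
#   bcftools query -l IRRI_snps.vcf > snp_samples.txt
#   shared_samples indel_samples.txt snp_samples.txt > samples.txt
```

Keeping the order of the second list means the resulting `samples.txt` can be used to put both files into the same sample order.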
### vcf compression
```bash
# Compress with bgzip (24 threads); each .vcf is replaced by a .vcf.gz
bgzip -@ 24 filtered_indel.vcf
bgzip -@ 24 IRRI_snps_filtered.vcf
```
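To look only at variants inside the GBSS1 window from the Phytozome link (Chr6:1765307..1767797), a plain-text VCF can be filtered by position. This is a minimal sketch with `awk`. `vcf_region` is a hypothetical helper, and it assumes the chromosome is named `Chr6` in the VCF, which may differ (for example `6` or `chr06`) depending on how the file was produced.

```bash
# vcf_region CHROM START END FILE: print the header lines plus every record
# whose POS lies in [START, END] on CHROM. FILE is an uncompressed VCF.
vcf_region() {
    awk -F '\t' -v chrom="$1" -v start="$2" -v end="$3" '
        /^#/ { print; next }                  # keep all header lines
        $1 == chrom && $2 + 0 >= start + 0 && $2 + 0 <= end + 0
    ' "$4"
}

# Example (output name is a placeholder; run before the file is compressed):
#   vcf_region Chr6 1765307 1767797 IRRI_snps_filtered.vcf > wx_region.vcf
```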
## hisat-genotype
```bash
git clone https://github.com/DaehwanKimLab/hisat-genotype
cd hisat-genotype
git submodule init
git submodule update
bash setup.sh
```
## GBSS1 sequences (LOC_Os06g04200)
>LOC_Os06g04200 genomic|starch synthase, putative, expressed
ACCATTCCTTCAGTTCTTTGTCTATCTCAAGACACAAATAACTGCAGTCTCTCTCTCTCT
CTCTCTCTCTCTCTCTCTCTCTCTGCTTCACTTCTCTGCTTGTGTTGTTCTGTTGTTCAT
CAGGAAGAACATCTGCAAGTTATACATATATGTTTATAATTCTTTGTTTCCCCTCTTATT
CAGATCGATCACATGCATCTTTCATTGCTCGTTTTTCCTTACAAGTAGTCTCATACATGC
TAATTTCTGTAAGGTGTTGGGCTGGAAATTAATTAATTAATTAATTGACTTGCCAAGATC
CATATATATGTCCTGATATTAAATCTTCGTTCGTTATGTTTGGTTAGGCTGATCAATGTT
ATTCTAGAGTCTAGAGAAACACACCCAGGGGTTTTCCAACTAGCTCCACAAGATGGTGGG
CTAGCTGACCTAGATTTGAAGTCTCACTCCTTATAATTATTTTATATTAGATCATTTTCT
AATATTCGTGTCTTTTTTTATTCTAGAGTCTAGATCTTGTGTTCAACTCTCGTTAAATCA
TGTCTCTCGCCACTGGAGAAACAGATCAGGAGGGTTTATTTTGGGTATAGGTCAAAGCTA
AGATTGAAATTCACAAATAGTAAAATCAGAATCCAACCAATTTTAGTAGCCGAGTTGGTC
AAAGGAAAATGTATATAGCTAGATTTATTGTTTTGGCAAAAAAAAATCTGAATATGCAAA
ATACTTGTATATCTTTGTATTAAGAAGATGAAAATAAGTAGCAGAAAATTAAAAAATGGA
TTATATTTCCTGGGCTAAAAGAATTGTTGATTTGGCACAATTAAATTCAGTGTCAAGGTT
TTGTGCAAGAATTCAGTGTGAAGGAATAGATTCTCTTCAAAACAATTTAATCATTCATCT
GATCTGCTCAAAGCTCTGTGCATCTCCGGGTGCAACGGCCAGGATATTTATTGTGCAGTA
AAAAAATGTCATATCCCCTAGCCACCCAAGAAACTGCTCCTTAAGTCCTTATAAGCACAT
ATGGCATTGTAATATATATGTTTGAGTTTTAGCGACAATTTTTTTAAAAACTTTTGGTCC
TTTTTATGAACGTTTTAAGTTTCACTGTCTTTTTTTTTCGAATTTTAAATGTAGCTTCAA
ATTCTAATCCCCAATCCAAATTGTAATAAACTTCAATTCTCCTAATTAACATCTTAATTC
ATTTATTTGAAAACCAGTTCAAATTCTTTTAGGCTCACCAAACCTTAAACAATTCAATTC
AGTGCAGAGATCTTCCACAGCAACAGCTAGACAACCACCATGTCGGCTCTCACCACGTCC
CAGCTCGCCACCTCGGCCACCGGCTTCGGCATCGCCGACAGGTCGGCGCCGTCGTCGCTG
CTCCGCCACGGGTTCCAGGGCCTCAAGCCCCGCAGCCCCGCCGGCGGCGACGCGACGTCG
CTCAGCGTGACGACCAGCGCGCGCGCGACGCCCAAGCAGCAGCGGTCGGTGCAGCGTGGC
AGCCGGAGGTTCCCCTCCGTCGTCGTGTACGCCACCGGCGCCGGCATGAACGTCGTGTTC
GTCGGCGCCGAGATGGCCCCCTGGAGCAAGACCGGCGGCCTCGGTGACGTCCTCGGTGGC
CTCCCCCCTGCCATGGCTGTAAGCACACACAAACTTCGATCGCTCGTCGTCGCTGACCGT
CGTCGTCTTCAACTGTTCTTGATCATCGCATTGGATGGATGTGTAATGTTGTGTTCTTGT
GTTCTTTGCAGGCGAATGGCCACAGGGTCATGGTGATCTCTCCTCGGTACGACCAGTACA
AGGACGCTTGGGATACCAGCGTTGTGGCTGAGGTAGGAGCATATGCGTGATCAGATCATC
ACAAGATCGATTAGCTTTAGATGATTTGTTACATTTCGCAAGATTTTAACCCAAGTTTTT
GTGGTGCAATTCATTGCAGATCAAGGTTGCAGACAGGTACGAGAGGGTGAGGTTTTTCCA
TTGCTACAAGCGTGGAGTCGACCGTGTGTTCATCGACCATCCGTCATTCCTGGAGAAGGT
GGAGTCATCATTAGTTTACCTTTTTTGTTTTTACTGAATTATTAACAGTGCATTTAGCAG
TTGGACTGAGCTTAGCTTCCACTGGTGATTTCAGGTTTGGGGAAAGACCGGTGAGAAGAT
CTACGGACCTGACACTGGAGTTGATTACAAAGACAACCAGATGCGTTTCAGCCTTCTTTG
CCAGGTCAGTGATTACTTCTATCTGATGATGGTTGGAAGCATCACGAGTTTACCATAGTA
TGTATGGATTCATAACTAATTCGTGTATTGATGCTACCTGCAGGCAGCACTCGAGGCTCC
TAGGATCCTAAACCTCAACAACAACCCATACTTCAAAGGAACTTATGGTGAGTTACAATT
GATCTCAAGATCTTATAACTTTCTTCGAAGGAATCCATGATGATCAGACTAATTCCTTCC
GGTTTGTTACTGACAACAGGTGAGGATGTTGTGTTCGTCTGCAACGACTGGCACACTGGC
CCACTGGCGAGCTACCTGAAGAACAACTACCAGCCCAATGGCATCTACAGGAATGCAAAG
GTCTATGCTTGTTCTTGCCATACCAACTCAAATCTGCATGCACACTGCATTCTGTTCAGA
AACTGACTGTCTGAATCTTTTTCACTGCAGGTTGCTTTCTGCATCCACAACATCTCCTAC
CAGGGCCGTTTCGCTTTCGAGGATTACCCTGAGCTGAACCTCTCCGAGAGGTTCAGGTCA
TCCTTCGATTTCATCGACGGGTATGAGTAAGATTCTAAGAGTAACTTACTGTCAATTCGC
CATATATCGATTCAATCCAAGATCCTTTTGAGCTGACAACCCTGCACTACTGTCCATCGT
TCAAATCCGGTTAAATTTCAGGTATGACACGCCGGTGGAGGGCAGGAAGATCAACTGGAT
GAAGGCCGGAATCCTGGAAGCCGACAGGGTGCTCACCGTGAGCCCGTACTACGCCGAGGA
GCTCATCTCCGGCATCGCCAGGGGATGCGAGCTCGACAACATCATGCGGCTCACCGGCAT
CACCGGCATCGTCAACGGCATGGACGTCAGCGAGTGGGATCCTAGCAAGGACAAGTACAT
CACCGCCAAGTACGACGCAACCACGGTAAGAACGAATGCATTCTTCACAAGATATGCAAT
CTGAATTTTCTTTGAAAAAGAAATTATCATCTGTCACTTCTTGATTGATTCTGACAAGGC
AAGAATGAGTGACAAATTTCAGGCAATCGAGGCGAAGGCGCTGAACAAGGAGGCGTTGCA
GGCGGAGGCGGGTCTTCCGGTCGACAGGAAAATCCCACTGATCGCGTTCATCGGCAGGCT
GGAGGAACAGAAGGGCCCTGACGTCATGGCCGCCGCCATCCCGGAGCTCATGCAGGAGGA
CGTCCAGATCGTTCTTCTGGTATAATATAATACACTACAAGACACACTTGCACGATATGC
CAAAAATTCAGAACAAATTCAGTGGCAAAAAAAAAACTCGAATATTAGGGAAGGACCTAA
TAATATCAAATAATTAGAAGGGGTGAGGCTTTGAACCCAGATCGTCTAGTCCACCACCTT
GTGGAGTTAGCCGGAAGACCTCTGAGCATTTCTCAATTCAGTGGCAAATGATGTGTATAA
TTTTGATCCGTGTGTGTTTCAGGGTACTGGAAAGAAGAAGTTCGAGAAGCTGCTCAAGAG
CATGGAGGAGAAGTATCCGGGCAAGGTGAGGGCCGTGGTGAAGTTCAACGCGCCGCTTGC
TCATCTCATCATGGCCGGAGCCGACGTGCTCGCCGTCCCCAGCCGCTTCGAGCCCTGTGG
ACTCATCCAGCTGCAGGGGATGAGATACGGAACGGTATACAATTTCCATCTATCAATTCG
ATTGTTCGATTTCATCTTTGTGCAATGCAATGCAATTGCAAATGCAAATGCATGATGATT
TTCCTTGTTGATTTCTCCAGCCCTGTGCTTGCGCGTCCACCGGTGGGCTCGTGGACACGG
TCATCGAAGGCAAGACTGGTTTCCACATGGGCCGTCTCAGCGTCGACGTAAGCCTATACA
TTTACATAACAATCAGATATGACACATCCTAATACCGATAAGTCGGTACACTACTACACA
TTTACATGGTTGCTGGTTATATGGTTTTTTTGGCAGTGCAAGGTGGTGGAGCCAAGCGAC
GTGAAGAAGGTGGCGGCCACCCTGAAGCGCGCCATCAAGGTCGTCGGCACGCCGGCGTAC
GAGGAGATGGTCAGGAACTGCATGAACCAGGACCTCTCCTGGAAGGTATAAATTACGAAA
CAAATTTAACCCAAACATATACTATATACTCCCTCCGCTTCTAAATATTCAACGCCGTTG
TCTTTTTTAAATATGTTTGACCATTCGTCTTATTAAAAAAATTAAATAATTATAAATTCT
TTTCCTATCATTTGATTCATTGTTAAATATACTTATATGTATACATATAGTTTTACATAT
TTCATAAAATTTTTTGAACAAGACGAACGGTCAAACATGTGCTAAAAAGTTAACGGTGTC
GAATATTCAGAAACGGAGGGAGTATAAACGTCTTGTTCAGAAGTTCAGAGATTCACCTGT
CTGATGCTGATGATGATTAATTGTTTGCAACATGGATTTCAGGGGCCTGCGAAGAACTGG
GAGAATGTGCTCCTGGGCCTGGGCGTCGCCGGCAGCGCGCCGGGGATCGAAGGCGACGAG
ATCGCGCCGCTCGCCAAGGAGAACGTGGCTGCTCCTTGAAGAGCCTGAGATCTACATATG
GAGTGATTAATTAATATAGCAGTATATGGATGAGAGACGAATGAACCAGTGGTTTGTTTG
TTGTAGTGAATTTGTAGCTATAGCCAATTATATAGGCTAATAAGTTTGATGTTGTACTCT
TCTGGGTGTGCTTAAGTATCTTATCGGACCCTGAATTTATGTGTGTGGCTTATTGCCAAT
AATATTAAGTAATAAAGGGTTTATTATATTATTATATATGTTATATTATACTTCC
>LOC_Os06g04200.1 protein|starch synthase, putative, expressed
MSALTTSQLATSATGFGIADRSAPSSLLRHGFQGLKPRSPAGGDATSLSVTTSARATPKQ
QRSVQRGSRRFPSVVVYATGAGMNVVFVGAEMAPWSKTGGLGDVLGGLPPAMAANGHRVM
VISPRYDQYKDAWDTSVVAEIKVADRYERVRFFHCYKRGVDRVFIDHPSFLEKVWGKTGE
KIYGPDTGVDYKDNQMRFSLLCQAALEAPRILNLNNNPYFKGTYGEDVVFVCNDWHTGPL
ASYLKNNYQPNGIYRNAKVAFCIHNISYQGRFAFEDYPELNLSERFRSSFDFIDGYDTPV
EGRKINWMKAGILEADRVLTVSPYYAEELISGIARGCELDNIMRLTGITGIVNGMDVSEW
DPSKDKYITAKYDATTAIEAKALNKEALQAEAGLPVDRKIPLIAFIGRLEEQKGPDVMAA
AIPELMQEDVQIVLLGTGKKKFEKLLKSMEEKYPGKVRAVVKFNAPLAHLIMAGADVLAV
PSRFEPCGLIQLQGMRYGTPCACASTGGLVDTVIEGKTGFHMGRLSVDCKVVEPSDVKKV
AATLKRAIKVVGTPAYEEMVRNCMNQDLSWKGPAKNWENVLLGLGVAGSAPGIEGDEIAP
LAKENVAAP*
>LOC_Os06g04200.2 protein|starch synthase, putative, expressed
MSALTTSQLATSATGFGIADRSAPSSLLRHGFQGLKPRSPAGGDATSLSVTTSARATPKQ
QRSVQRGSRRFPSVVVYATGAGMNVVFVGAEMAPWSKTGGLGDVLGGLPPAMAANGHRVM
VISPRYDQYKDAWDTSVVAEIKVADRYERVRFFHCYKRGVDRVFIDHPSFLEKVWGKTGE
KIYGPDTGVDYKDNQMRFSLLCQAALEAPRILNLNNNPYFKGTYGEDVVFVCNDWHTGPL
ASYLKNNYQPNGIYRNAKVAFCIHNISYQGRFAFEDYPELNLSERFRSSFDFIDGYDTPV
EGRKINWMKAGILEADRVLTVSPYYAEELISGIARGCELDNIMRLTGITGIVNGMDVSEW
DPSKDKYITAKYDATTAIEAKALNKEALQAEAGLPVDRKIPLIAFIGRLEEQKGPDVMAA
AIPELMQEDVQIVLLGTGKKKFEKLLKSMEEKYPGKVRAVVKFNAPLAHLIMAGADVLAV
PSRFEPCGLIQLQGMRYGTPCACASTGGLVDTVIEGKTGFHMGRLSVDCKVVEPSDVKKV
AATLKRAIKVVGTPAYEEMVRNCMNQDLSWKGPAKNWENVLLGLGVAGSAPGIEGDEIAP
LAKENVAAP*
>LOC_Os06g04200.4 protein|starch synthase, putative, expressed
MSALTTSQLATSATGFGIADRSAPSSLLRHGFQGLKPRSPAGGDATSLSVTTSARATPKQ
QRSVQRGSRRFPSVVVYATGAGMNVVFVGAEMAPWSKTGGLGDVLGGLPPAMAANGHRVM
VISPRYDQYKDAWDTSVVAEIKVADRYERVRFFHCYKRGVDRVFIDHPSFLEKVWGKTGE
KIYGPDTGVDYKDNQMRFSLLCQAALEAPRILNLNNNPYFKGTYGEDVVFVCNDWHTGPL
ASYLKNNYQPNGIYRNAKVAFCIHNISYQGRFAFEDYPELNLSERFRSSFDFIDGYDTPV
EGRKINWMKAGILEADRVLTVSPYYAEELISGIARGCELDNIMRLTGITGIVNGMDVSEW
DPSKDKYITAKYDATTAIEAKALNKEALQAEAGLPVDRKIPLIAFIGRLEEQKGPDVMAA
AIPELMQEDVQIVLLGTGKKKFEKLLKSMEEKYPGKVRAVVKFNAPLAHLIMAGADVLAV
PSRFEPCGLIQLQGMRYGTPCACASTGGLVDTVIEGKTGFHMGRLSVDCKVVEPSDVKKV
AATLKRAIKVVGTPAYEEMVRNCMNQDLSWKGPAKNWENVLLGLGVAGSAPGIEGDEIAP
LAKENVAAP*
>LOC_Os06g04200.3 protein|starch synthase, putative, expressed
MSALTTSQLATSATGFGIADRSAPSSLLRHGFQGLKPRSPAGGDATSLSVTTSARATPKQ
QRSVQRGSRRFPSVVVYATGAGMNVVFVGAEMAPWSKTGGLGDVLGGLPPAMAANGHRVM
VISPRYDQYKDAWDTSVVAEIKVADRYERVRFFHCYKRGVDRVFIDHPSFLEKVWGKTGE
KIYGPDTGVDYKDNQMRFSLLCQAALEAPRILNLNNNPYFKGTYGEDVVFVCNDWHTGPL
ASYLKNNYQPNGIYRNAKVAFCIHNISYQGRFAFEDYPELNLSERFRSSFDFIDGYDTPV
EGRKINWMKAGILEADRVLTVSPYYAEELISGIARGCELDNIMRLTGITGIVNGMDVSEW
DPSKDKYITAKYDATTAIEAKALNKEALQAEAGLPVDRKIPLIAFIGRLEEQKGPDVMAA
AIPELMQEDVQIVLLGTGKKKFEKLLKSMEEKYPGKVRAVVKFNAPLAHLIMAGADVLAV
PSRFEPCGLIQLQGMRYGTPCACASTGGLVDTVIEGKTGFHMGRLSVDCKVVEPSDVKKV
AATLKRAIKVVGTPAYEEMVRNCMNQDLSWKGPAKNWENVLLGLGVAGSAPGIEGDEIAP
LAKENVAAP*
>FGENESH
MSALTTSQLATSATGFGIADRSAPSSLLRHGFQGLKPRSPAGGDATSLSVTTSARATPKQ
QRSVQRGSRRFPSVVVYATGAGMNVVFVGAEMAPWSKTGGLGDVLGGLPPAMAANGHRVM
VISPRYDQYKDAWDTSVVAEIKVADRYERVRFFHCYKRGVDRVFIDHPSFLEKVWGKTGE
KIYGPDTGVDYKDNQMRFSLLCQAALEAPRILNLNNNPYFKGTYGEDVVFVCNDWHTGPL
ASYLKNNYQPNGIYRNAKVAFCIHNISYQGRFAFEDYPELNLSERFRSSFDFIDGYDTPV
EGRKINWMKAGILEADRVLTVSPYYAEELISGIARGCELDNIMRLTGITGIVNGMDVSEW
DPSKDKYITAKYDATTARMSDKFQAIEAKALNKEALQAEAGLPVDRKIPLIAFIGRLEEQ
KGPDVMAAAIPELMQEDVQIVLLGTGKKKFEKLLKSMEEKYPGKVRAVVKFNAPLAHLIM
AGADVLAVPSRFEPCGLIQLQGMRYGTPCACASTGGLVDTVIEGKTGFHMGRLSVDCKVV
EPSDVKKVAATLKRAIKVVGTPAYEEMVRNCMNQDLSWKGPAKNWENVLLGLGVAGSAPG
IEGDEIAPLAKENVAAP
https://phytozome-next.jgi.doe.gov/jbrowse/index.html?data=genomes%2FOsativa_v7_0&loc=Chr6%3A1765267..1767757&tracks=MSU_Rice_TE%2CTranscripts%2CAlt_Transcripts%2CPASA_assembly%2CBlastx_protein%2CBlatx_Grass%2CRepeatMasker&highlight=
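The FGENESH prediction and the MSU gene models listed above can be compared quickly by sequence length. The sketch below prints the residue count for each record of a FASTA file. `fasta_lengths` is a hypothetical helper; it assumes the records have been saved to a file and it ignores a trailing `*` stop symbol.

```bash
# fasta_lengths FILE: print "<record id><TAB><length>" for each FASTA record.
fasta_lengths() {
    awk '
        /^>/ {                                # a header line starts a new record
            if (id != "") print id "\t" len
            id = substr($1, 2); len = 0; next
        }
        {
            gsub(/[ \t\r*]/, "")              # drop whitespace and the stop symbol
            len += length($0)
        }
        END { if (id != "") print id "\t" len }
    ' "$1"
}

# Example (gbss1_proteins.fa is a placeholder name):
#   fasta_lengths gbss1_proteins.fa
```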
## KC
### 2022.12.21
- Wxb (japonica) transcripts include a spliced form and an unspliced form (retaining intron 1), whereas Wxa (indica) transcripts have only the spliced form.

- Various splicing variants were observed in Wx.


- Repeat elements detected from RepeatMasker
https://phytozome-next.jgi.doe.gov/jbrowse/index.html?data=genomes%2FOsativa_v7_0&loc=Chr6%3A1765958..1766450&tracks=MSU_Rice_TE%2CPASA_assembly%2CBlastx_protein%2CBlatx_Grass%2CTranscripts%2CAlt_Transcripts%2CRepeatMasker&highlight=

-

### 2022.12.28
A literature search turned up three characteristics of the rice Waxy (Wx) gene.
1. Canonical splicing
- A T/G SNP at position 1,765,761 on chromosome 6 regulates alternative splicing of Wx. Japonica rice, which has low amylose content, carries the T allele, and T-type varieties produce two Wx transcripts, of 3.3 kb and 2.3 kb. The 2.3 kb transcript is the mature transcript without introns, while the 3.3 kb transcript retains intron 1. The waxy mutant, which has no amylose in the endosperm, also produces a transcript of about 3.3 kb, but no Wx protein was detected, indicating that the 3.3 kb transcript may not be translated. The G allele, on the other hand, is found mostly in indica rice and is associated with high amylose content. G-type varieties produce only the mature 2.3 kb transcript.
- In addition, aberrant splicing of intron 1 was identified, and more than six splicing patterns were observed (Cai et al. 1998; Larkin and Park 1999).
2. The Wx gene is temperature sensitive, and this is controlled by the T/G SNP
- T-type varieties showed decreased expression of the 2.3 kb and 3.3 kb transcripts at high temperature, leading to decreased amylose content. G-type varieties, however, showed consistent expression levels at low and high temperatures.
3. Repeat sequences in intron 1
- Two SINE elements were identified in the 5’ terminal region of the Wx gene (ORSiTEMT01100003 and ORSgTEMT01100006). The RNA secondary structure predicted for the 5’ terminal region (350 bp) of the Wx gene forms six cruciforms (Cai et al. 1998).
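
One way to apply the T/G classification above to genotype data is to translate each `GT` entry at the SNP into allele letters. The following is a rough sketch, not the method of the cited papers. `snp_alleles` is a hypothetical helper, and it assumes an uncompressed VCF with the chromosome named `Chr6`, `GT` as the first subfield of each sample column, and a biallelic record at position 1765761.

```bash
# snp_alleles CHROM POS FILE: for the record at CHROM:POS, print
# "<sample><TAB><allele>/<allele>" based on the REF and ALT columns.
snp_alleles() {
    awk -F '\t' -v chrom="$1" -v pos="$2" '
        /^#CHROM/ { for (i = 10; i <= NF; i++) name[i] = $i; next }
        /^#/ { next }
        $1 == chrom && $2 + 0 == pos + 0 {
            allele["0"] = $4; allele["1"] = $5; allele["."] = "N"
            for (i = 10; i <= NF; i++) {
                split($i, field, ":")              # GT is the first subfield
                n = split(field[1], gt, "[/|]")    # unphased "/" or phased "|"
                out = ""
                for (j = 1; j <= n; j++) out = out (j > 1 ? "/" : "") allele[gt[j]]
                print name[i] "\t" out
            }
        }
    ' "$3"
}

# Example (file name taken from the bcftools section; run on the uncompressed VCF):
#   snp_alleles Chr6 1765761 IRRI_snps_filtered.vcf
```

Under the description above, samples reported as T/T would be expected to show both the 2.3 kb and 3.3 kb transcripts, and G/G samples only the 2.3 kb form.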
Q&A from last discussion
1. Discussion of the first-intron repeat.
- Many studies have examined the intron 1 repeats of the rice Wx gene. Wang et al. (1994) reported that two types of transposon-like elements (RTL-1 and RTL-2) are located in introns 1, 10, and 13 of the rice Wx gene. In addition, Wang et al. (1995) predicted the RNA secondary structure of the 5’ terminal region (350 bp) of the Wx gene.
2. Amylose content data in the Rice 3K database.
- The Rice 3K database includes amylose content data, and ENDO (endosperm type) is scored as 1 (Non-waxy), 2 (Waxy), 3 (Indeterminate), or 999 (Mixture).
- The IRRI group published the paper "Haplotype analysis of key genes governing grain yield and quality traits across 3K RG panel reveals scope for the development of tailor-made rice with enhanced genetic gains" (Plant Biotechnology Journal, doi: 10.1111/pbi.13087), in which they measured amylose content. However, the amylose content data are not included in the supplementary files.
3. Do transposable elements have structure?
- Yes. RCSB PDB has LINE and SINE structures, and RNA hairpin (stem-loop) structures have been observed experimentally. AlphaFold2 also predicts structure models of retrotransposon proteins.
4. Does the Rice 3K database have RNA-seq data?
- I think the Rice 3K database does not have RNA-seq data. The rice pan-genome browser was built from the Rice 3K data together with public RNA-seq data. The sample IDs are listed in its Supplementary Table 1.
5. GBSS1 has four transcript forms. Do they differ in expression level, and is the SINE associated with expression level?
- The expression levels of LOC_Os06g04200.1-4 were not examined in previous studies. The SINE in intron 1 is possibly associated with Wx expression level.
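
The ENDO codes listed above can be turned into labels when summarising the phenotype table. This is a small sketch that assumes the codes appear as plain integers. `endo_label` is a hypothetical helper name, and mapping unknown values to `NA` is a choice made here, not something defined by the database.

```bash
# endo_label CODE: print the endosperm-type label for a Rice 3K ENDO code.
endo_label() {
    case "$1" in
        1)   echo "Non-waxy" ;;
        2)   echo "Waxy" ;;
        3)   echo "Indeterminate" ;;
        999) echo "Mixture" ;;
        *)   echo "NA" ;;          # anything else, including missing values
    esac
}

# Example: tally labels from a file with one ENDO code per line
# (endo_codes.txt is a placeholder name)
#   while read -r code; do endo_label "$code"; done < endo_codes.txt | sort | uniq -c
```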
### 2023.01.18
Please check the manuscript below:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7663775/
The main questions are:
1. Does isoform variation affect starch content?
- The Wxb type expresses two major transcripts, of 2.3 kb and 3.3 kb. The 2.3 kb form is the fully mature transcript, while the 3.3 kb form is an immature transcript that retains the 1 kb intron 1 and does not function normally.
2. The manuscript above includes aus, indica, and japonica. Do they have different splicing forms of GBSS1?
- The manuscript released transcriptome data only for N22 (a variety name). Three transcript forms were identified, and one of them differs from the reference isoforms.
(I found their transcriptome data under NCBI GSE153030, https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE153030&format=file&file=GSE153030%5Ffinal%5FdeNovo%2EB%5FN22%2Efasta%2Egz)
3. Can we interpret the splicing forms from their work?
-
4. Do they have divergent starch content?
- They did not measure amylose content, but the varieties differ in amylose content because they belong to different ecotypes.
5. Do they have DNA-seq data in the 3K project?
- Only one variety (Dular) is included in the Rice 3K project.
- However, the other varieties are popular, so we may be able to find sequencing data in NCBI.
https://github.com/pangenome/pggb
### 2023.03.07
- Table. Rice 3K DB endosperm type

- Manhattan plot generated from Rice 3K DB using endosperm type phenotype

https://pubmed.ncbi.nlm.nih.gov/34051138/