{%hackmd @themes/dracula %}

# Sequence Fragment Geometry Description Language

The sequence fragment geometry description language (FGDL) is a basic grammar designed to describe the layout of information encoded in sequenced fragments. Specifically, it was initially designed to support parsing the different sequencing chemistries that are common in single-cell transcriptomics. It is capable of processing both "simple" and "complex" geometries, where the definitions of these terms are outlined below.

## Examples of some geometries in this format

Below are examples of some common geometries (chemistries) and how they are translated into FGDL.

* 10x Chromium v3 3' : `1{b[16]u[12]}2{r:}`
* 10x Chromium v2 3' : `1{b[16]u[10]}2{r:}`
* SciSeq3 : `1{b[9-10]f[CAGAGC]u[8]b[10]}2{r:}`

## Preventing improper parsing

By virtue of the grammar, some things are not allowed. For example, because the result would be ambiguous, a variable-length segment cannot be followed by another variable-length or unbounded (i.e. `:`) segment. More precisely, under the grammar below, a variable-length or unbounded segment must either end the read or be immediately followed by a fixed sequence segment (`f[...]`). The following description, for example, will (_properly_) fail to parse.

* `1{u[15]b[9-10]u:}2{r:}`

## Formal grammar

Below is the formal grammar for descriptions accepted by the FGDL. This is the actual grammar, written in the syntax of the [pest](http://www.pest.rs) library.

```
// hidden tokens
bopen = _{ "[" }
bclose = _{ "]" }
rsep = _{ "-" }
usep = _{ ":" }
dopen = _{ "{" }
dclose = _{ "}" }

read_num = { "1" | "2" }
single_len = { ASCII_DIGIT+ }
len_range = ${ single_len ~ rsep ~ single_len }
nucstr = { ("A" | "C" | "G" | "T")+ }

fixed_barcode_segment = { "b" ~ bopen ~ single_len ~ bclose }
fixed_umi_segment = { "u" ~ bopen ~ single_len ~ bclose }
fixed_seq_segment = { "f" ~ bopen ~ nucstr ~ bclose }
fixed_read_segment = { "r" ~ bopen ~ single_len ~ bclose }
fixed_discard_segment = { "x" ~ bopen ~ single_len ~ bclose }

ranged_barcode_segment = { "b" ~ bopen ~ len_range ~ bclose }
ranged_umi_segment = { "u" ~ bopen ~ len_range ~ bclose }
ranged_read_segment = { "r" ~ bopen ~ len_range ~ bclose }
ranged_discard_segment = { "x" ~ bopen ~ len_range ~ bclose }

unbounded_barcode_segment = { "b" ~ usep }
unbounded_umi_segment = { "u" ~ usep }
unbounded_read_segment = { "r" ~ usep }
unbounded_discard_segment = { "x" ~ usep }

fixed_segment = { (fixed_umi_segment | fixed_read_segment | fixed_barcode_segment | fixed_seq_segment | fixed_discard_segment) }
ranged_segment = { (ranged_umi_segment | ranged_read_segment | ranged_barcode_segment | ranged_discard_segment) }
bounded_segment = _{ (fixed_segment | (ranged_segment ~ fixed_seq_segment) | (unbounded_segment ~ fixed_seq_segment)) }
unbounded_segment = { (unbounded_umi_segment | unbounded_read_segment | unbounded_barcode_segment | unbounded_discard_segment) }

read_desc = { dopen ~ ((bounded_segment)+ ~ (ranged_segment | unbounded_segment)? | unbounded_segment | ranged_segment) ~ dclose }
read_1_desc = { "1" ~ read_desc }
read_2_desc = { "2" ~ read_desc }

frag_desc = _{ SOI ~ read_1_desc ~ read_2_desc ~ EOI}
```
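
To illustrate how a description breaks down into segments, and how the `bounded_segment` and `read_desc` rules reject ambiguous layouts, here is a minimal Python sketch. It is not the pest-based parser: the regular expressions and the names `parse_read` and `parse_fragment` are hypothetical, chosen only for this example. It assumes the rule can be restated as "a ranged or unbounded segment must be the last segment of its read or be immediately followed by an `f[...]` segment". Like the grammar, it does not check that the lower bound of a range is below the upper bound.

```python
import re

# One segment: a type letter followed by [len], [lo-hi] or ':',
# or a fixed anchor sequence f[ACGT...].
SEGMENT = re.compile(
    r"(?P<fixed>[burx])\[(?P<len>\d+)\]"
    r"|(?P<ranged>[burx])\[(?P<lo>\d+)-(?P<hi>\d+)\]"
    r"|f\[(?P<seq>[ACGT]+)\]"
    r"|(?P<unbounded>[burx]):"
)
# Whole description: read 1 followed by read 2, each wrapped in braces.
FRAGMENT = re.compile(r"1\{(?P<r1>[^{}]*)\}2\{(?P<r2>[^{}]*)\}")


def parse_read(body):
    """Split one read body into (letter, kind, value) tuples and validate it."""
    segments = []
    pos = 0
    while pos < len(body):
        m = SEGMENT.match(body, pos)
        if m is None:
            raise ValueError(f"unrecognized segment at offset {pos}: {body[pos:]!r}")
        if m.group("fixed"):
            segments.append((m.group("fixed"), "fixed", int(m.group("len"))))
        elif m.group("ranged"):
            bounds = (int(m.group("lo")), int(m.group("hi")))
            segments.append((m.group("ranged"), "ranged", bounds))
        elif m.group("seq"):
            segments.append(("f", "fixed", m.group("seq")))
        else:
            segments.append((m.group("unbounded"), "unbounded", None))
        pos = m.end()
    if not segments:
        raise ValueError("a read description needs at least one segment")
    # Assumed restatement of the grammar: a ranged or unbounded segment that
    # is not last must be anchored by an f[...] segment right after it.
    for (letter, kind, _), nxt in zip(segments, segments[1:]):
        if kind != "fixed" and nxt[0] != "f":
            raise ValueError(f"{kind} segment '{letter}' must be last or followed by f[...]")
    return segments


def parse_fragment(desc):
    """Parse a full description of the form 1{...}2{...}."""
    m = FRAGMENT.fullmatch(desc)
    if m is None:
        raise ValueError("expected a description of the form 1{...}2{...}")
    return {1: parse_read(m.group("r1")), 2: parse_read(m.group("r2"))}
```

Under these assumptions, `parse_fragment("1{b[9-10]f[CAGAGC]u[8]b[10]}2{r:}")` succeeds because the ranged barcode is anchored by the fixed sequence, while `parse_fragment("1{u[15]b[9-10]u:}2{r:}")` raises a `ValueError` because the ranged barcode is followed directly by an unbounded UMI.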