{%hackmd @themes/dracula %}
# Sequence Fragment Geometry Description Language
The sequence fragment geometry description language (FGDL) is a basic grammar designed to describe the layout of information encoded in sequenced fragments. It was initially designed to support parsing the different sequencing chemistries that are common in single-cell transcriptomics. It can process both "simple" and "complex" geometries; the definitions of these terms are outlined below.
## Examples of geometries in this format
Below are examples of some common geometries (chemistries) and how they are translated into FGDL.
* 10x Chromium v3 3' : `1{b[16]u[12]}2{r:}`
* 10x Chromium v2 3' : `1{b[16]u[10]}2{r:}`
* SciSeq3 : `1{b[9-10]f[CAGAGC]u[8]b[10]}2{r:}`
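
To make the notation concrete, here is a minimal Python sketch that splits a description into its two reads and their segments. It is not the reference parser (the pest grammar in the "Formal grammar" section is). The names `tokenize`, `tokenize_read`, and the kind labels (`fixed`, `ranged`, `fixed_seq`, `unbounded`) are assumptions made for illustration. The sketch only recognizes individual segments; it does not enforce the ordering restrictions on variable-length segments.

```python
import re

# One segment: a type tag, then [N-M], [N], [ACGT...] or ':'.
SEGMENT_RE = re.compile(
    r"(?P<tag>[burxf])"
    r"(?:\[(?P<lo>[0-9]+)-(?P<hi>[0-9]+)\]"
    r"|\[(?P<len>[0-9]+)\]"
    r"|\[(?P<seq>[ACGT]+)\]"
    r"|(?P<unbounded>:))"
)

def tokenize_read(body):
    """Turn the text between '{' and '}' into a list of (tag, kind, value)."""
    segments = []
    pos = 0
    while pos < len(body):
        m = SEGMENT_RE.match(body, pos)
        if m is None:
            raise ValueError(f"cannot read a segment at offset {pos} of {body!r}")
        tag = m.group("tag")
        # 'f' must carry a nucleotide string, and only 'f' may carry one.
        if (tag == "f") != (m.group("seq") is not None):
            raise ValueError(f"invalid segment {m.group(0)!r}")
        if m.group("seq") is not None:
            segments.append((tag, "fixed_seq", m.group("seq")))
        elif m.group("len") is not None:
            segments.append((tag, "fixed", int(m.group("len"))))
        elif m.group("lo") is not None:
            segments.append((tag, "ranged", (int(m.group("lo")), int(m.group("hi")))))
        else:
            segments.append((tag, "unbounded", None))
        pos = m.end()
    return segments

def tokenize(desc):
    """Split a full description into {1: [...], 2: [...]}."""
    m = re.fullmatch(r"1\{([^{}]+)\}2\{([^{}]+)\}", desc)
    if m is None:
        raise ValueError(f"expected 1{{...}}2{{...}}, got {desc!r}")
    return {1: tokenize_read(m.group(1)), 2: tokenize_read(m.group(2))}
```

With this sketch, `tokenize("1{b[16]u[12]}2{r:}")` yields a 16-base barcode and a 12-base UMI for read 1, and a single unbounded read segment for read 2.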
## Preventing improper parsing
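By construction, the grammar rejects some descriptions. For example, because the result would be ambiguous, a variable-length segment cannot be followed by another variable-length or unbounded (i.e. `:`) segment. More precisely, in the grammar given in this document, a ranged or unbounded segment must either be the last segment of its read or be followed immediately by a fixed sequence segment (`f[...]`). The following description, for example, will (_properly_) fail to parse.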
* `1{u[15]b[9-10]u:}2{r:}`
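
The restriction can be checked mechanically. The sketch below is a hedged Python approximation of that one rule, not the pest parser itself. It classifies each segment of a single read body (the text between `{` and `}`) and verifies that every variable-length segment is either last or followed by an `f[...]` anchor. The function names and the labels `fixed`, `variable`, and `anchor` are assumptions introduced here.

```python
import re

# Fixed-length, anchor sequence, ranged, or unbounded segment.
SEGMENT_RE = re.compile(
    r"[burx]\[[0-9]+\]|f\[[ACGT]+\]|[burx]\[[0-9]+-[0-9]+\]|[burx]:"
)

def segment_kinds(body):
    """Label each segment of a read body as 'fixed', 'variable' or 'anchor'."""
    kinds = []
    pos = 0
    while pos < len(body):
        m = SEGMENT_RE.match(body, pos)
        if m is None:
            raise ValueError(f"cannot read a segment at offset {pos} of {body!r}")
        text = m.group(0)
        if text[0] == "f":
            kinds.append("anchor")
        elif text.endswith(":") or "-" in text:
            kinds.append("variable")  # ranged or unbounded
        else:
            kinds.append("fixed")
        pos = m.end()
    return kinds

def is_unambiguous(body):
    """True if every variable-length segment is last or followed by an anchor."""
    kinds = segment_kinds(body)
    if not kinds:
        return False  # a read must contain at least one segment
    for current, following in zip(kinds, kinds[1:]):
        if current == "variable" and following != "anchor":
            return False
    return True
```

Here `is_unambiguous("u[15]b[9-10]u:")` is `False`, matching the failing example above, while the SciSeq3 read 1 body `b[9-10]f[CAGAGC]u[8]b[10]` passes because its ranged barcode is followed by a fixed anchor sequence.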
## Formal grammar
Below is the formal grammar for descriptions accepted by the FGDL. This is the actual grammar, written in the syntax of the [pest](http://www.pest.rs) library.
```
// hidden tokens
bopen = _{ "[" }
bclose = _{ "]" }
rsep = _{ "-" }
usep = _{ ":" }
dopen = _{ "{" }
dclose = _{ "}" }
read_num = { "1" | "2" }
single_len = { ASCII_DIGIT+ }
len_range = ${ single_len ~ rsep ~ single_len }
nucstr = { ("A" | "C" | "G" | "T")+ }
fixed_barcode_segment = { "b" ~ bopen ~ single_len ~ bclose }
fixed_umi_segment = { "u" ~ bopen ~ single_len ~ bclose }
fixed_seq_segment = { "f" ~ bopen ~ nucstr ~ bclose }
fixed_read_segment = { "r" ~ bopen ~ single_len ~ bclose }
fixed_discard_segment = { "x" ~ bopen ~ single_len ~ bclose }
ranged_barcode_segment = { "b" ~ bopen ~ len_range ~ bclose }
ranged_umi_segment = { "u" ~ bopen ~ len_range ~ bclose }
ranged_read_segment = { "r" ~ bopen ~ len_range ~ bclose }
ranged_discard_segment = { "x" ~ bopen ~ len_range ~ bclose }
unbounded_barcode_segment = { "b" ~ usep }
unbounded_umi_segment = { "u" ~ usep }
unbounded_read_segment = { "r" ~ usep }
unbounded_discard_segment = { "x" ~ usep }
fixed_segment = {
    (fixed_umi_segment | fixed_read_segment | fixed_barcode_segment | fixed_seq_segment | fixed_discard_segment)
}
ranged_segment = {
    (ranged_umi_segment | ranged_read_segment | ranged_barcode_segment | ranged_discard_segment)
}
bounded_segment = _{
    (fixed_segment | (ranged_segment ~ fixed_seq_segment) | (unbounded_segment ~ fixed_seq_segment))
}
unbounded_segment = {
    (unbounded_umi_segment | unbounded_read_segment | unbounded_barcode_segment | unbounded_discard_segment)
}
read_desc = {
    dopen ~ ((bounded_segment)+ ~ (ranged_segment | unbounded_segment)? | unbounded_segment | ranged_segment) ~ dclose
}
read_1_desc = { "1" ~ read_desc }
read_2_desc = { "2" ~ read_desc }
frag_desc = _{ SOI ~ read_1_desc ~ read_2_desc ~ EOI}
```
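
Once a description has been accepted by the grammar, the simplest geometries (only fixed-length `b`, `u`, `r`, and `x` segments) map directly onto slices of a read. The Python sketch below illustrates that mapping for a single read body. It is an illustration under stated assumptions rather than how any particular tool implements it: the function name `slice_fixed_read` is hypothetical, and ignoring bases beyond the described length is a choice made here.

```python
import re

FIXED_RE = re.compile(r"([burx])\[([0-9]+)\]")

def slice_fixed_read(body, read_seq):
    """Cut read_seq into (tag, bases) pieces for a body of fixed-length segments.

    Assumption of this sketch: bases past the described length are ignored.
    """
    pieces = []
    pos = 0      # position in the description body
    offset = 0   # position in the read sequence
    while pos < len(body):
        m = FIXED_RE.match(body, pos)
        if m is None:
            raise ValueError("this sketch only handles fixed-length b/u/r/x segments")
        tag, length = m.group(1), int(m.group(2))
        if offset + length > len(read_seq):
            raise ValueError("read is shorter than the geometry requires")
        pieces.append((tag, read_seq[offset:offset + length]))
        offset += length
        pos = m.end()
    return pieces

# Made-up read for illustration: 16 barcode bases, 12 UMI bases, 4 extra bases.
example_read = "A" * 16 + "C" * 12 + "GGGG"
pieces = slice_fixed_read("b[16]u[12]", example_read)
```

For the read 1 body of 10x Chromium v3 (`b[16]u[12]`), `pieces` holds the first 16 bases tagged `b` and the next 12 tagged `u`. Ranged and unbounded segments need the anchor-based logic that the grammar is designed to keep unambiguous, and are out of scope for this sketch.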

NOTE: This is a working specification, and is subject to modifications and revisions. However, to the extent possible, this specification aims to agree with and conform to the RAD files currently being produced by the tools using this format (the otherwise de facto specification).
