# Sketches of a grammar for brms-like models
## Building blocks
- Variable
  - Primarily a real number, but possibly with bounds and possibly of some other, more complex type (see below)
  - Optionally a data.frame context it can be predicted over
- Arrays of variables
  - Optionally a data.frame context of the same length
- “Clamp” a variable to a value
- Latent function
  - Probably limited to evaluation only at data-derived values
- Linear predictors
  - Only need to predict real numbers (but have to respect bounds)
  - Link functions
  - Subsetting
- Blocks
  - Parametrized by variables, latent functions and possibly other blocks
  - Native vs. expressed in the grammar
  - Block types/constraints/concepts
## Linear predictors
Explicit formulas (akin to non-linear formulas in brms) are the basis.
R-style formulas are just syntactic sugar.
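
To illustrate the "syntactic sugar" point, here is a minimal base-R sketch of how an R-style formula expands into an explicit formula over named coefficients. The data frame `d`, the coefficient names and their values are made up for illustration; the grammar itself might desugar differently.

```r
# Hypothetical toy data; column names are illustrative only
d <- data.frame(x = c(0.5, 1.5, 2.5), g = factor(c("a", "b", "a")))

# R-style formula (the sugar): expands to a design matrix
X <- model.matrix(~ x + g, data = d)

# One named coefficient per design-matrix column (arbitrary values)
b <- c("(Intercept)" = 1, x = 2, gb = -1)
mu_sugar <- as.vector(X %*% b[colnames(X)])

# The same predictor written out explicitly, coefficient by coefficient
mu_explicit <- b[["(Intercept)"]] + b[["x"]] * d$x + b[["gb"]] * (d$g == "b")

stopifnot(isTRUE(all.equal(mu_sugar, mu_explicit)))
```

Both routes give the same predictor, which is why only the explicit form would need to be supported natively.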
## Blocks
Blocks are basically just containers that bind together their parameters.
Native blocks then define code generation outputs.
Maybe some inspiration from modular Stan would help for sub-blocks (e.g. the transition model for an HMM), i.e. blocks have members that can be accessed...
## HMM
```
s <- data.frame(state_id = c("A", "B", "C"))
# By default, the reference transition A -> A is always present
t <- data.frame(from = c("A", "B"), to = c("B", "C"))
t$trans_id <- paste0(t$from, "-", t$to)
d <- tribble(~serie_id, ~time, ~y, …)
```
```
hmm(
  states = s,
  series_data = d,
  obs = binary_hmm_obs(
    y /* Ref to series_data */,
    :theta /* Name for new var, linked to crossing(series_data, states) */),
  trans = categorical_hmm_transition(t,
    :rho /* Name for new var, linked to crossing(series_data, t) */),
  init = known_hmm_init("A")
)
rho ~ from * to + condition
theta ~ mo(state_id)
```
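
The comments in the sketch link `theta` to `crossing(series_data, states)` and `rho` to `crossing(series_data, t)`. A minimal illustration of what those contexts could look like, using base R's `merge(..., by = NULL)` as a stand-in for `crossing`. The contents of `d_toy` are invented here, since the real series data are not specified.

```r
s <- data.frame(state_id = c("A", "B", "C"))
t <- data.frame(from = c("A", "B"), to = c("B", "C"))
t$trans_id <- paste0(t$from, "-", t$to)

# Hypothetical stand-in for the series data (real columns/values are not given)
d_toy <- data.frame(serie_id = c(1, 1, 2), time = c(1, 2, 1), y = c(0, 1, 1))

# merge() with by = NULL returns the Cartesian product of the two data frames
theta_context <- merge(d_toy, s, by = NULL)  # one row per (observation, state)
rho_context   <- merge(d_toy, t, by = NULL)  # one row per (observation, transition)

nrow(theta_context)  # 3 observations x 3 states
nrow(rho_context)    # 3 observations x 2 transitions
```

Under this reading, a formula such as `theta ~ mo(state_id)` can use any column of the crossed data frame as a predictor.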
Subsetting allows for nice grouping of predictors, e.g. to have a separate predictor for each transition I'd do:
```
rho[trans_id == "A-B"] ~ something
```
or, to share information only between transitions from the same source state, I'd have:
```
rho[from == "A"] ~ (1 + condition | to)
```
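
One way to read the subsetting syntax is as a row filter on the variable's context: each `rho[...] ~ ...` formula would apply only to the rows where the condition holds. A sketch in base R, assuming a hypothetical `condition` column with made-up levels in the context of `rho`:

```r
t <- data.frame(from = c("A", "B"), to = c("B", "C"))
t$trans_id <- paste0(t$from, "-", t$to)

# Hypothetical context for rho: every transition under two conditions
rho_context <- merge(data.frame(condition = c("ctrl", "treat")), t, by = NULL)

# A formula subset by trans_id would touch only the rows of that transition
rows_A_B <- which(rho_context$trans_id == "A-B")

# A formula subset by source state would touch all transitions leaving A
rows_from_A <- which(rho_context$from == "A")

# Splitting by trans_id gives one group of rows per transition
rows_by_transition <- split(seq_len(nrow(rho_context)), rho_context$trans_id)
```

With only the two transitions defined above, the source-state subset for A coincides with the single transition leaving A; with more transitions the two subsets would differ.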
## Joint model
```
# Define a latent function
latent <- function(t, patient)
  bs(t) + (1 + t | patient)
variable mu_marker(context = longitudinal)
lognormal(data = longitudinal /* data.frame context */,
          y = marker /* outcome, reference to longitudinal */,
          mu = mu_marker /* new var name linked to longitudinal */)
mu_marker ~ latent(time, patient)
cox(data = survival,
    event = ev,
    time = time,
    log_hr = :h)
h ~ 1 * latent(time, patient)
```
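
The key idea is that one latent function is evaluated at data-derived values in two different contexts. A much simplified base-R sketch: the spline term `bs(t)` is dropped, and the per-patient intercepts and slopes in `re` are fixed, made-up numbers standing in for what `(1 + t | patient)` would estimate. The data frames are hypothetical as well.

```r
# Made-up per-patient effects (would be estimated in a real model)
re <- data.frame(patient = c("p1", "p2"),
                 intercept = c(0.2, -0.1),
                 slope = c(0.05, 0.1))

# Simplified latent function: patient-specific line, no spline term
latent <- function(t, patient) {
  i <- match(patient, re$patient)
  re$intercept[i] + re$slope[i] * t
}

# Two hypothetical data.frame contexts sharing the same patients
longitudinal <- data.frame(patient = c("p1", "p1", "p2"), time = c(0, 1, 2))
survival     <- data.frame(patient = c("p1", "p2"), time = c(3, 4))

# The same function feeds both sub-models, evaluated at each context's values
mu_marker <- latent(longitudinal$time, longitudinal$patient)
h         <- 1 * latent(survival$time, survival$patient)
```

Because both predictors call the same `latent`, the patient-level effects are shared between the longitudinal and the survival part.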
## Missing data
This would probably be built in, but we can implement it from more basic building blocks:
```
# Define a new array of variables, tied to the "df" data.frame (same length)
array(name = missing_x, context = df)
# Outcome model
normal(data = df,
       y = resp,
       mu ~ x1 + x2 + missing_x)
# Missing data model
normal(data = df,
       y = missing_x,
       mu = :missing_mu
)
missing_mu ~ x1 + x2
# Set the elements of missing_x where x3 is observed to the observed values
clamp(missing_x[!is.na(x3)], x3[!is.na(x3)])
```
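
The effect of `clamp` can be pictured as follows: the elements of `missing_x` at observed rows are fixed to the data, and only the remaining elements stay free parameters. A minimal base-R sketch; `df`, its values and the helper `assemble_missing_x` are invented for illustration, and the free values stand in for whatever a sampler would propose.

```r
# Hypothetical data with two missing values in x3
df <- data.frame(x3 = c(1.2, NA, 0.7, NA, 2.1))

observed <- !is.na(df$x3)
n_free   <- sum(!observed)  # number of elements left as parameters

# Assemble the full array from clamped values and free parameters
assemble_missing_x <- function(free_values) {
  stopifnot(length(free_values) == n_free)
  missing_x <- numeric(nrow(df))
  missing_x[observed]  <- df$x3[observed]  # clamped to the data
  missing_x[!observed] <- free_values      # left for the model to estimate
  missing_x
}

filled <- assemble_missing_x(c(-0.3, 0.9))
```

The missing data model then acts as a likelihood for the clamped elements and as a prior for the free ones.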
## Some extra complexities
Ordinal models require thresholds and multivariate models require a correlation matrix, so non-real variable types are needed. Those don't need to allow predictors.
Maybe a `varies_by(var, grouping)` operation that duplicates the values (and keeps the prior) and works for all variable types when a data.frame context is present (i.e. for a real unbounded variable, `varies_by(var, group)` is functionally identical to `var ~ 0 + group`).
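
The equivalence for a real unbounded variable can be checked numerically: with `var ~ 0 + group` the design matrix is a set of group indicators, so the linear predictor just looks up one value per group. A sketch with made-up data, where plain indexing by group stands in for the hypothetical `varies_by`:

```r
group  <- factor(c("g1", "g2", "g1", "g3"))
values <- c(g1 = 0.5, g2 = -1, g3 = 2)  # arbitrary values, one per group level

# var ~ 0 + group: indicator design matrix times per-group coefficients
X <- model.matrix(~ 0 + group)
via_formula <- as.vector(X %*% values)

# varies_by(var, group): duplicate the value for each row of its group
via_lookup <- unname(values[as.character(group)])

stopifnot(isTRUE(all.equal(via_formula, via_lookup)))
```

The lookup form does not rely on the values being real numbers, which is what would let it extend to thresholds or correlation matrices.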
Categorical and MVN models require multiple predictors, hence the need for arrays of variables.