Basic syntax traits:
- `#` starts a comment that runs to the end of the line.
- `\` at the end of a line continues it onto the next line.

Program: ( (Rule 1|2|3)? (`#` any)? )*

The grammar can be loosely described as follows: A rule generally has the following structure (if it does not end with `{`):
Rule 1: ( `[~?!@$]`? RuleName | `<` TagName `>` ) ( `(` Args `)` )? `:` (anything else…)
The `(` Args `)` part is only meaningful for funcs.
Args: arg ( ( `+` | `,` ) arg )* `,`?
If the rule ends with `{`, the parsing rule is different:
Rule 2: `[~?!@$]`? RuleNameAndArgs `{`
where RuleNameAndArgs is anything in between, optionally having an argument list:
RuleName ( `(` Args `)` )?
It may open a block function, which is the only recursive structure the format currently supports. The block function is closed by a single `}` on a line by itself.
Rule 3: `}`
- `#` wrapped in `"` does not count: […], not some identifier token
- `<a">` would be a tag token, but `a"` is illegal as a tag name (spoiler: it is valid in HTML5)
- `<<xxx>>`; it says `Invalid function name: replace_tag(<<xxx>)>`.
- `<` is not a valid tag name, but it is included.
- […] `{`, not just those starting in `@`:
- […] `:` to be the delimiter, skipping any strings. But if the rule ends with `{`, parse everything before it as { name + args }.
- `@debug:::` -> `XPath error: Invalid expression in query ::`
- `@debug:::{` -> `Invalid function name: debug:::`
- `@debug:::{{` -> `Invalid function name: debug:::{`
- `@debug::{:` -> `Invalid expression in query :{:`
- […] `?true:{` and `?true{`.
- `(` is only meaningful when paired with `)`:
  - `if(`: `Invalid property name: if(`
  - `if()`: unexpected `(`
- `@if{` -> fine
- `@debug{` -> unexpected `{`; weird!
- […] `?true` block, but the error telling that a condition cannot appear in a block func would prevail.
- […] `}` and jump there".

Based on fuzzing done to guess at the inner workings of the official IV engine, we simplify parsing by preprocessing the line stream before tokenization takes place. The steps and postconditions are as follows:
- […] `/\s*\\\s*$/` glue to the next line, also eliminating any leading spaces for the next rule.
- […] `#` char MUST be inside some string literal.

Also from fuzzing, the engine interprets the template line-by-line, i.e., it does not produce IR. This makes it easier to implement shortcut behaviors but makes it harder to discover syntax errors early.
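As a rough illustration of the preprocessing described above, here is a minimal Python sketch. It rests on several assumptions that the notes do not confirm: that continuation gluing happens before comment stripping, that the step guaranteeing "any remaining `#` is inside a string literal" is plain comment removal, and that both `"` and `'` literals shield a `#`. The helper names (`preprocess`, `strip_comment`, `split_rule`) and the `kind` labels are hypothetical. `split_rule` applies the "first `:` outside strings, unless the rule ends with `{`" heuristic from the fuzzing notes.

```python
import re

# Trailing backslash with optional surrounding whitespace (pattern from the notes).
CONTINUATION = re.compile(r"\s*\\\s*$")
# A closed string literal in either quote style (assumption: both styles count).
_STRING = r'"(?:\\.|[^"\\])*"|\'(?:\\.|[^\'\\])*\''
STRING_OR_HASH = re.compile(_STRING + r"|#")
STRING_OR_COLON = re.compile(_STRING + r"|:")


def strip_comment(line):
    """Cut the line at the first '#' that is not inside a closed string literal."""
    for m in STRING_OR_HASH.finditer(line):
        if m.group() == "#":
            return line[:m.start()].rstrip()
    return line


def preprocess(text):
    """Glue continued lines, strip comments, and drop lines left empty."""
    rules, pending = [], None
    for raw in text.splitlines():
        # A continued line loses its leading spaces before being glued on.
        line = raw if pending is None else pending + raw.lstrip()
        if CONTINUATION.search(line):
            pending = CONTINUATION.sub("", line)
            continue
        pending = None
        line = strip_comment(line)
        if line.strip():
            rules.append(line)
    if pending is not None and strip_comment(pending).strip():
        rules.append(strip_comment(pending))  # dangling continuation at EOF
    return rules


def split_rule(rule):
    """Split a preprocessed rule into (kind, head, rest); kinds are hypothetical labels."""
    rule = rule.strip()
    if rule.endswith("{"):
        # Block opener: everything before the final '{' is name + args.
        return ("block", rule[:-1].rstrip(), "")
    for m in STRING_OR_COLON.finditer(rule):
        if m.group() == ":":
            return ("rule", rule[:m.start()].rstrip(), rule[m.end():].strip())
    return ("bare", rule, "")
```

With this sketch, `split_rule("@debug:::")` yields the head `@debug` and the value `::`, while `split_rule("@debug:::{")` yields the head `@debug:::`, which lines up with the error messages observed from the official engine. Because an unclosed literal never matches the string pattern, a `#` after an open quote is treated as a comment start.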
In most cases all tokens are effectively strings, parsed according to the patterns they match, so it is impossible to "escape" special strings.
Some properties of string literals:
- […] `"` (or `'`, but not where strict JSON strings are required)
- […] `\n`, `\"`, `\u1234`
- […] `#` char can appear in. But it does not count if the string literal is left open
- […] `"\."`)
- […] `"(?:\\.|[^"\\])*"` and pass it to `JSON.parse`
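The match-then-decode approach for double-quoted literals can be sketched in Python as follows. Here `json.loads` stands in for `JSON.parse`, which is a substitution on my part, and `read_string_literal` is a hypothetical helper name. Single-quoted literals are deliberately left out, since the notes say they are not accepted where strict JSON strings are required.

```python
import json
import re

# The double-quoted literal pattern quoted in the notes.
STRING_LITERAL = re.compile(r'"(?:\\.|[^"\\])*"')


def read_string_literal(text, pos=0):
    """Return (value, end) for a literal starting at pos, or None if it is left open.

    Decoding is delegated to the JSON parser, so an escape that JSON does not
    define (for example \\.) raises ValueError instead of being passed through.
    """
    m = STRING_LITERAL.match(text, pos)
    if m is None:
        return None
    return json.loads(m.group()), m.end()
```

The regex accepts any backslash escape when finding the extent of the literal, and the JSON decoder is then the part that rejects escapes such as `"\."`. That split keeps the scanner simple while still reporting bad escapes.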
Specialized types:
- […] `$var`.
- […] `[a-zA-Z]\w*` from code highlighting.
- […] `$$` and `$@`.
- […] `@attr` with `attr` non-empty. Rarely used, e.g. by `@append_to`.
- […] `ims` modifiers.
- […] `i` flag
- `$context/query` […] `$` prefix normally for variables; context can refer to properties as a fallback
- `(query)[n]`
- […] `self::*`.
- `has-class("class")` -> `contains(concat(" ", normalize-space(@class), " "), " class ")`
- `ends-with("haystack", "needle")` -> `(substring("haystack", string-length("haystack") - string-length("needle") + 1) = "needle")`
- `prev-sibling` -> `preceding-sibling::*[1]/self`
- `next-sibling` -> `following-sibling::*[1]/self`
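The XPath rewrites listed above are plain text substitutions, so they can be sketched as small string builders. The function names below are hypothetical, and two points are assumptions of the sketch rather than observed engine behavior: the axis shorthands are expanded only when followed by `::`, and string literals inside the query are not skipped.

```python
import re


def has_class(name):
    """Expand has-class("name") into the whitespace-padded class test."""
    return 'contains(concat(" ", normalize-space(@class), " "), " %s ")' % name


def ends_with(haystack, needle):
    """Expand ends-with(h, n); both arguments are XPath expressions given as text."""
    return "(substring({h}, string-length({h}) - string-length({n}) + 1) = {n})".format(
        h=haystack, n=needle
    )


_AXES = {
    "prev-sibling": "preceding-sibling::*[1]/self",
    "next-sibling": "following-sibling::*[1]/self",
}
# Assumption: a shorthand only counts as an axis when followed by '::'.
_AXIS = re.compile(r"\b(prev-sibling|next-sibling)(?=::)")


def expand_axes(query):
    """Replace the sibling shorthands with their XPath 1.0 equivalents."""
    return _AXIS.sub(lambda m: _AXES[m.group(1)], query)
```

For example, `expand_axes("//p/prev-sibling::div")` gives `//p/preceding-sibling::*[1]/self::div`, which selects the immediately preceding sibling only if it is a `div`.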
- `null` might just happen to be a valid XPath query that always returns an empty list (the only valid words under the default context node (root) are `head` and `body` for any valid HTML document). However, specializing that value to enforce its semantics might be a good idea.
- […] `>`, skipping over any string literals
- […] `[a-zA-Z_][-\w.]*` PLUS some Unicode categories (not tested exhaustively).

    $foo: "foo"
    $foo: null
    # `$foo` is now null
    @debug: $foo # does not emit this error
@function lpar PropList rpar lbrac
Rule*
rbrac
err: version should be a (quoted) string
invalid version
quirk? "1" or "1." is interpreted as "1.0"
TODO: "2." -> ? ("2.10000" causes an internal error)
err: Version should be defined once
version not placed as the first rule:
Version 1.0 is outdated. Please update your template to the last version 2.1
[medium.com:12] ~version: "2.1"
Version should be set at the beginning of template
string quote rules?
Conditions are not allowed inside block functions: ?true inside @if
Parse from `(` to the nearest `)` (i.e., no nesting):
- If `"` is met, find the extent (`"\b`) and read the whole as a string literal. Otherwise, read until the first space or `,` and interpret the content as a string.
- […] `,` if there is one.

Note that the trailing comma matters. `@x(1,2,)` should be parsed as if the last argument is `""`, but `@x(1,2 )` should not. This is more easily tested through `@append(<tag>, ...)` since it requires an odd number of arguments in this form.

TODO: how to test `@x` vs. `@x()` vs. `@x(,)`?
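One way to read the argument rules above is the following Python sketch. It is a guess at the behavior, not a description of the official engine: `parse_args` is a hypothetical name, `json.loads` stands in for the string-literal decoding, and the nearest `)` is taken literally even if it sits inside a quoted argument. What it returns for `@x()` and `@x(,)` follows from the loop as written and has not been checked against the engine, so those cases stay open as in the TODO.

```python
import json
import re

STRING_LITERAL = re.compile(r'"(?:\\.|[^"\\])*"')


def parse_args(rule):
    """Parse the argument list between the first '(' and the nearest ')'."""
    start = rule.index("(")
    end = rule.index(")", start)  # nearest ')': no nesting
    body = rule[start + 1:end]
    args, i, after_comma = [], 0, False
    while True:
        while i < len(body) and body[i].isspace():
            i += 1
        if i >= len(body):
            if after_comma:
                args.append("")  # a trailing comma yields an empty last argument
            return args
        m = STRING_LITERAL.match(body, i) if body[i] == '"' else None
        if m:
            # Quoted argument: decode the whole literal.
            args.append(json.loads(m.group()))
            i = m.end()
        else:
            # Bare argument: read up to the first whitespace or comma.
            j = i
            while j < len(body) and body[j] != "," and not body[j].isspace():
                j += 1
            args.append(body[i:j])
            i = j
        while i < len(body) and body[i].isspace():
            i += 1
        # Consume the separating comma, if there is one.
        after_comma = i < len(body) and body[i] == ","
        if after_comma:
            i += 1
```

Under these assumptions `parse_args("@x(1,2,)")` returns three arguments with an empty last one, while `parse_args("@x(1,2 )")` returns two.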