A mostly Turing-incomplete language for graph selection with lisp syntax, duck typing and wasm extensions.
Killer features (design goals):
Tokens are accumulated until broken, at which point a new token is made.
If a break happens but the current token is empty, no token is emitted.
These codepoints (ASCII whitespaces) break tokens: `0x20` and `0x9` to `0xc` (inclusive).
Code is a stream of unicode codepoints. Which unicode encoding to use is left as an implementation detail (/ must be managed out of band).
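To make the break rules concrete, here is a minimal sketch in Go. It only covers the whitespace rules above (strings, nodes and decorators, described later, also break tokens and are not handled here), and all names are illustrative rather than part of the spec.

```go
package lexer

// isBreak reports whether a codepoint breaks the current token:
// 0x20 (space) and 0x9 to 0xc (tab, LF, VT, FF), inclusive.
func isBreak(r rune) bool {
	return r == 0x20 || (r >= 0x9 && r <= 0xc)
}

// splitTokens accumulates runes until a break rune is seen.
// Empty tokens are never emitted.
func splitTokens(code []rune) []string {
	var tokens []string
	var current []rune
	for _, r := range code {
		if isBreak(r) {
			if len(current) > 0 {
				tokens = append(tokens, string(current))
				current = current[:0]
			}
			continue
		}
		current = append(current, r)
	}
	if len(current) > 0 {
		tokens = append(tokens, string(current))
	}
	return tokens
}
```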
If a token starts with `0` to `9` it is a literal and the resulting value is of type number.
If the token is made of a single `0`, then it is the value zero and number parsing stops here.
Else if it starts with a digit from `1` to `9`, the whole token is considered as the number body and the base is 10.
For all bases the alphabet is: `0123456789abcdefghijklmnopqrstuvwxyz` (case insensitive).
If it starts with a `0`, alternative base mode is activated; the second rune selects the base:
- `x` or `X`: everything past the second rune is the number body and the base is 16; if you find nothing this is a syntax error.
- `o` or `O`: everything past the second rune is the number body and the base is 8; if you find nothing this is a syntax error.
- `0` to `9`: accumulate runes until you reach a `b` or `B` boundary rune. Parse everything accumulated so far as base 10; this is the base that will be used and it must be 2 <= base <= 36, else that is a syntax error. Everything past the boundary rune is the number body; if you find nothing this is a syntax error.

In number bodies the `_` character is skipped by the parser (so `12__37` must be read as the number 1237).
The number body is parsed as an RTL number in whatever base was specified.
| Literal | Value |
|---|---|
| `1234` | 1234 |
| `0xfF` | 255 |
| `0O10` | 8 |
| `03b20` | 6 |
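A compact sketch of these number rules in Go follows. The helper names are hypothetical, it assumes 64-bit unsigned values (the spec does not fix a width), and it leans on `strconv.ParseUint` for the per-base digit alphabet.

```go
package lexer

import (
	"fmt"
	"strconv"
	"strings"
)

// parseNumber parses a number literal token following the rules above.
func parseNumber(tok string) (uint64, error) {
	if tok == "0" {
		return 0, nil
	}
	if tok[0] != '0' {
		// Starts with 1-9: the whole token is the body, base 10.
		return parseBody(tok, 10)
	}
	// Alternative base mode: inspect the second rune.
	rest := tok[1:]
	if rest == "" {
		return 0, fmt.Errorf("missing number body in %q", tok)
	}
	switch rest[0] {
	case 'x', 'X':
		return parseBody(rest[1:], 16)
	case 'o', 'O':
		return parseBody(rest[1:], 8)
	default:
		// 0<base>b<body>: decimal base prefix up to a b/B boundary rune.
		i := strings.IndexAny(rest, "bB")
		if i < 0 {
			return 0, fmt.Errorf("missing base boundary rune in %q", tok)
		}
		base, err := strconv.Atoi(rest[:i])
		if err != nil || base < 2 || base > 36 {
			return 0, fmt.Errorf("invalid base in %q", tok)
		}
		return parseBody(rest[i+1:], base)
	}
}

// parseBody parses a number body in the given base, skipping '_' runes.
func parseBody(body string, base int) (uint64, error) {
	body = strings.ReplaceAll(body, "_", "")
	if body == "" {
		return 0, fmt.Errorf("empty number body")
	}
	return strconv.ParseUint(body, base, 64)
}
```

With this sketch, `parseNumber("0xfF")` yields 255, `parseNumber("0O10")` yields 8, and `parseNumber("03b20")` yields 6, matching the table above.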
The "
character starts a string, while parsing a string other parsing rules are ignored.
Strings follow the interpreted_string_lit
golang spec. The parser is free to parse the string as whatever unicode encoding it uses, however the emited string value must be utf8 encoded. If the string is not valid utf8 then this is a syntax error.
If the string does not match the interpreted_string_lit
spec
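Since the rules reference Go's `interpreted_string_lit` production, a runner written in Go can lean on `strconv.Unquote` for the heavy lifting. This is only a sketch and the function name is illustrative.

```go
package lexer

import (
	"fmt"
	"strconv"
	"unicode/utf8"
)

// parseString parses a string literal token, quotes included, using Go's
// interpreted string literal rules, then enforces the utf8 requirement.
func parseString(lit string) (string, error) {
	// strconv.Unquote also accepts raw (backquoted) and rune literals,
	// which IPSL does not, so require the double quotes explicitly.
	if len(lit) < 2 || lit[0] != '"' || lit[len(lit)-1] != '"' {
		return "", fmt.Errorf("not a double-quoted string: %s", lit)
	}
	s, err := strconv.Unquote(lit)
	if err != nil {
		return "", fmt.Errorf("invalid string literal %s: %w", lit, err)
	}
	if !utf8.ValidString(s) {
		return "", fmt.Errorf("string literal %s is not valid utf8", lit)
	}
	return s, nil
}
```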
If a token begins with a `$`, everything past the first rune is parsed following the Human Readable CIDs spec and the resulting value is of type CID.
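As an illustration only: assuming the Human Readable CID form is accepted by `go-cid`'s string decoder (which the spec does not guarantee), a Go runner might do something like this; the helper name is hypothetical.

```go
package lexer

import (
	"strings"

	"github.com/ipfs/go-cid"
)

// parseCID parses a token beginning with '$' into a CID value.
// Assumption: the Human Readable CID body is decodable by cid.Decode.
func parseCID(tok string) (cid.Cid, error) {
	return cid.Decode(strings.TrimPrefix(tok, "$"))
}
```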
Nodes build a tree-like structure; each node can be a collection of tokens and child nodes.
Parsed tokens and nodes are added in order to the node that is currently on top of the node stack.
When parsing is finished the node stack must be empty, else that is a syntax error (it means there are unmatched nodes).
`()` Value Nodes
The `(` character starts a value node, pushing a new value node on the node stack; it also breaks tokens before and after.
A `)` pops a node from the node stack and closes it; if the popped node isn't a value node the syntax is invalid.
`[]` Scope Nodes
The `[` character starts a scope node, pushing a new scope node on the node stack; it also breaks tokens before and after.
A `]` pops a node from the node stack and closes it; if the popped node isn't a scope node the syntax is invalid.
When closed, this scope node is emitted into the parent.
`{}` Comment Nodes
The `{` character starts a comment node, pushing a new comment node on the node stack; it also breaks tokens before and after.
A `}` pops a node from the node stack and closes it.
While inside a comment node the parser ignores other parsing rules until it finds a `}` rune; a comment node does not emit anything into the parent node.
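The node stack described above can be sketched as follows in Go; the types and names are made up for illustration, and only the push/pop bookkeeping is shown (token handling, decorators and most syntax errors are elided).

```go
package parser

import "fmt"

// NodeKind distinguishes the three bracket pairs.
type NodeKind int

const (
	ValueNode   NodeKind = iota // ( ... )
	ScopeNode                   // [ ... ]
	CommentNode                 // { ... }
)

// Node is a collection of tokens (string) and child nodes (*Node), in order.
type Node struct {
	Kind     NodeKind
	Children []any
}

// Parser holds the node stack and the finished top-level nodes.
type Parser struct {
	stack []*Node
	roots []*Node
}

// open pushes a new node of the given kind onto the node stack.
func (p *Parser) open(kind NodeKind) {
	p.stack = append(p.stack, &Node{Kind: kind})
}

// close pops the top node, checks its kind and emits it into its parent;
// comment nodes emit nothing.
func (p *Parser) close(kind NodeKind) error {
	if len(p.stack) == 0 {
		return fmt.Errorf("unmatched closing bracket")
	}
	n := p.stack[len(p.stack)-1]
	p.stack = p.stack[:len(p.stack)-1]
	if n.Kind != kind {
		return fmt.Errorf("closing bracket does not match node kind")
	}
	if kind == CommentNode {
		return nil
	}
	if len(p.stack) == 0 {
		p.roots = append(p.roots, n)
		return nil
	}
	parent := p.stack[len(p.stack)-1]
	parent.Children = append(parent.Children, n)
	return nil
}
```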
Decorators are tokens that modify some flag of the following node; if they are followed by a Generic Token or a comment node, this is a syntax error.
The node decorators are `!?§%@#` and can be used by the type system.
Decorators break tokens.
If a remaining token contains runes outside of this list: `abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_-` this is a syntax error. Else they are added as tokens (they will be used later by the type system).
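A tiny Go check for this rule, purely illustrative:

```go
package lexer

// isGenericTokenRune reports whether a rune is allowed in a remaining (generic) token.
func isGenericTokenRune(r rune) bool {
	return (r >= 'a' && r <= 'z') ||
		(r >= 'A' && r <= 'Z') ||
		(r >= '0' && r <= '9') ||
		r == '_' || r == '-'
}

// validGenericToken reports whether every rune of the token is in the allowed list.
func validGenericToken(tok string) bool {
	for _, r := range tok {
		if !isGenericTokenRune(r) {
			return false
		}
	}
	return true
}
```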
None is a value some types can have; it indicates that this feature is not supported.
`§` match
`§` can decorate literals and will transform them into an equivalent `(match literal)` node.
`§` can also be used to decorate conditional nodes; it then transforms them into a matcher node that will apply the condition to the matching result.
Other decorators are reserved for future usage; using them now is a type error.
Builtins MUST be supported by all runners that implement IPSL.
`pick`
`[T] <- (...[T])`
`pick` is a choice node; it accepts a variable number of arguments which all have the same type. The runner then picks one of them that is not none. If all are none then it returns none.
Implementations may lazily omit type checks of nodes that they know they are not going to pick, so for example in this case:
`(pick 123 "testing")`
If the runner decides to pick `123`, it can return `123` even though `"testing"` should be a type error.
The runner is free to implement whatever picking strategy it wants.
The simplest strategy would be to always pick the first non-none element; however, the runner is free to implement ones it likes more.
For example, if tasked to pick a scope, scopes that do not require downloading extra data (such as a builtin scope, or a wasm scope that is already cached) are probably preferable.
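A minimal sketch of the first-non-none strategy in Go; `Value` and `IsNone` are stand-ins for whatever value representation the runner uses, not spec names.

```go
package runner

// Value is a stand-in for an IPSL runtime value.
type Value interface {
	IsNone() bool
}

// noneValue represents a none result.
type noneValue struct{}

func (noneValue) IsNone() bool { return true }

// pick returns the first argument that is not none, or none if all are.
// A smarter runner could instead prefer, say, scopes that are already cached.
func pick(args ...Value) Value {
	for _, v := range args {
		if !v.IsNone() {
			return v
		}
	}
	return noneValue{}
}
```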
`load-builtin-scope`
`scope <- (matcher[string])`
`load-builtin-scope` will execute the matcher on all available builtin scopes and return one of them that matches. If none match, it returns a none scope.
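For illustration, a sketch of how a runner could back this builtin; the types, the registry and the example scope name are all invented here, not part of the spec.

```go
package runner

// Scope and Matcher are stand-ins for the runner's real types.
type Scope struct {
	Name string
}

type Matcher func(name string) bool

// builtinScopes is the registry of scopes compiled into this runner
// (the entry below is a made-up example).
var builtinScopes = []Scope{
	{Name: "/unixfs/v1.0.0"},
}

// loadBuiltinScope runs the matcher over all available builtin scopes and
// returns the first one that matches, or nil (a none scope) if none match.
func loadBuiltinScope(match Matcher) *Scope {
	for i := range builtinScopes {
		if match(builtinScopes[i].Name) {
			return &builtinScopes[i]
		}
	}
	return nil
}
```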
[unixfs (pick (load-builtin-scope (wildcard "/unixfs/v1.*")) (load-wasm-scope $Qmfoo {Qmfoo is https://example.com/Foo/rust-unixfs-ipsl-scope}))
(depth-limit 10 (all
(unixfs.name (wildcard "file1*") (unixfs.everything))
(unixfs.name §"file2" (unixfs.file-range 0xFF1024 0x2FF0555 (unixfs.everything)))))]
traversal := ipsl.NewScope(unixfs.Scope, ipsl.NewDepthLimit(10, ipsl.NewAll(
	unixfs.Name(match.Wildcard("file1*"), unixfs.Everything),
	unixfs.Name(match.Eq("file2"), unixfs.Range(0xFF1024, 0x2FF0555, unixfs.Everything)))))
(unixfs folder $Qmbar)
+ -> file1*
| + -> *
+ -> file2
  + -> 0xFF1024 / 0x2FF0555
    + -> *