# CloudEvents Expression Language

The goal of this specification is to define an expression language which can be used to express predicates on CloudEvents instances.

## Overview

The CloudEvents Expression Language is a _[total pure functional programming language][total-programming-language-wiki]_, which guarantees that the evaluation of an expression always terminates. The language is not constrained to a particular execution environment, which means it might run in a source, in a producer, or in an intermediary, and it can be implemented using any tech stack.

The CloudEvents Expression Language assumes the input always includes, but is not limited to, a single valid and type-checked CloudEvent instance. An expression cannot mutate the value of the input CloudEvent instance, nor of any of the other input values. The evaluation of an expression must observe the concept of [referential transparency][referential-transparency].

The CloudEvents Expression Language doesn't support the handling of the data field of CloudEvent instances, due to its polymorphic nature and complexity. We strongly encourage users who need this functionality to use other, more appropriate tools.

The CloudEvents Expression Language features:

* A type system based on the [CloudEvents specification][ce-spec-type-system]
* The concept of operators, with a set of built-in operators
* The concept of functions, with a set of built-in functions

### Examples

_CloudEvent including a subject_

```
is_present(subject)
```

_CloudEvent including the extension 'firstname' with value 'Francesco'_

```
firstname == "Francesco"
```

_CloudEvent including the extension 'firstname' with value 'Francesco' or the subject with value 'Francesco'_

```
firstname == "Francesco" || subject == "Francesco"
```

_CloudEvent including the extension 'firstname' with value 'Francesco' and extension 'lastname' with value 'Guardiani', or the subject with value 'Francesco Guardiani'_

```
(firstname == "Francesco" && lastname == "Guardiani") || subject == "Francesco Guardiani"
```

_CloudEvent including the subject with value 'Francesco Guardiani' (using a case insensitive match)_

```
case_insensitive_equals(subject, "Francesco Guardiani")
```

_CloudEvent including the time attribute with year value '2020'_

```
time.year() == 2020
```

_CloudEvent including the time attribute with year value '2020' (comparing with CET timezone)_

```
time.to_timezone("CET").year() == 2020
```

_CloudEvent including the extension 'sequence' with numeric value 10_

```
sequence == 10
```

_CloudEvent including the extension 'sequence' with a valid numeric value_

```
sequence.is_integer()
```

### Relation to the Subscription Spec

## Language syntax

The grammar of the language is defined using the EBNF notation from the [W3C XML specification][ebnf-xml-spec].

### Expressions

```ebnf
expression ::= value-identifier | literal | unary-operation | binary-operation | function-invocation | method-invocation | ( "(" expression ")" )
```

### Identifiers and literals

```ebnf
digit ::= [0-9]
number-literal ::= digit+
boolean-literal ::= "true" | "false"
string-literal ::= ( "'" ( [^'] | "\'" )* "'" ) | ( '"' ( [^"] | '\"' )* '"')
literal ::= number-literal | boolean-literal | string-literal
```

String literals can be delimited by either `''` or `""`. In the first case, the `'` has to be escaped, while in the second the `"` has to be escaped.

```ebnf
lowercase-char ::= [a-z]
value-identifier ::= lowercase-char ( lowercase-char | digit )*
```

A value identifier cannot be longer than 20 characters.

### Operators

```ebnf
op ::= "=" | "==" | ">=" | "<=" | "=<" | "=>" | "<" | ">" | "!" | "-" | "&&" | "||" | "^||"
unary-operation ::= op expression
binary-operation ::= expression op expression
```

### Functions invocation

```ebnf
parameter ::= expression
function-identifier ::= lowercase-char ( "_" | lowercase-char )*
function-invocation ::= ( function-identifier "(" parameter* ")" ) | ( parameter "." function-identifier "(" parameter* ")" )
```

In the CloudEvents Expression Language the user can invoke a particular function through the usual "function" notation or, if the function has an arity equal to or greater than 1, using what is often referred to as the "method" notation.

## Language semantics

### Type System

The type system adopted by the CloudEvents Expression Language maps to the [types defined in the CloudEvents specification][ce-spec-type-system], extending it with another special type called _untype_.

For each type, we define a set of casting operations from one type to another. Each cast operation is total and closed over the type system defined in this paragraph.

Since the CloudEvents Expression Language doesn't have the concept of a null pointer or an undefined value, every type, except `untype`, defines a **default** value.

<!-- TODO the allowed values are already defined by the CE spec, i just copy pasted here... Could we refer to the spec directly? -->

#### Boolean

* Allowed values: `true` or `false`
* Default value: `false`

#### Integer

* Allowed values: Any signed int32 integer
* Default value: `0`

#### String

* Allowed values: Unicode characters
* Default value: empty string

#### Binary

<!-- TODO Should we support binary? at what level? Introducing arrays just for binaries might add unnecessary complexity! -->

#### URI

* Default value: http://example.com

#### URI-reference

* Default value: http://example.com

#### Timestamp

* Default value: beginning of Epoch

#### Untype

The _untype_ is a special type used in the CloudEvents Expression Language to represent fields that don't have a known type at compile time. In this spec we often refer to "concrete" types as the set of all the types, excluding _untype_.

We define both built-in casting functions and inference rules to handle `untype` fields.

### CloudEvent context identifiers

Each CloudEvent context attribute and extension is addressable from an expression using its identifier, as defined by the spec.

This expression will return the `id` of the input CloudEvent:

```
id
```

This expression will return the `time` of the input `CloudEvent` if present, otherwise it will return the default value of the `Timestamp` type:

```
time
```

To address the extensions, users can use the extension names. This will return the extension `partitionkey` of the input `CloudEvent` if present:

```
partitionkey
```

While the type of the specific CloudEvents attributes is always defined, the protocol bindings and event formats don't define a mechanism to propagate the type information of extensions. The type checker MUST represent the extensions as `untype` and it MUST, when possible, infer the expected type from the context. When an extension is not present in the input `CloudEvent`, the value returned MUST always be the default value of the inferred type.

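The lookup behavior described above can be sketched in Python. This is only an illustration under stated assumptions: the event is modelled as a plain `dict`, and the names `DEFAULTS` and `resolve` are hypothetical helpers, not part of this specification. The default values mirror the ones listed in the Type System section.

```python
from datetime import datetime, timezone

# Default values per concrete type, as listed in the Type System section.
# (Only a few types are shown; the dict itself is an assumption of this sketch.)
DEFAULTS = {
    "Boolean": False,
    "Integer": 0,
    "String": "",
    "Timestamp": datetime.fromtimestamp(0, tz=timezone.utc),
}


def resolve(event, identifier, inferred_type):
    """Return the attribute or extension value, or the type default if absent."""
    if identifier in event:
        return event[identifier]
    # The identifier is missing: fall back to the default of the inferred type.
    return DEFAULTS[inferred_type]


event = {"id": "A234-1234-1234", "partitionkey": "abc-1"}
print(resolve(event, "id", "String"))
print(resolve(event, "sequence", "Integer"))
```

In this sketch the caller supplies the inferred type, which stands in for the work a real type checker would do before evaluation.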
### Errors

Although the evaluation flow of an expression is defined statically and cannot be modified by expected or unexpected errors (because every operator and built-in function is total), an expression collects a list of evaluation errors while it is evaluated. This list, referred to in this spec as the _error list_, can be used by the invoker of the expression engine to perform error handling.

### Functions/methods invocation

A 0-arity function can be defined to retrieve a specific value from the executor (e.g. the `now()` function might be defined to return the current timestamp).

For all functions of arity 1 or greater, these two expressions are semantically equivalent:

```
case_insensitive_equals(subject, "Francesco Guardiani")
```

```
subject.case_insensitive_equals("Francesco Guardiani")
```

### Built-in unary operators

| Definition | Semantics |
| -------- | -------- |
| `!x: Boolean -> Boolean` | Returns the logical negation of `x` |
| `-x: Integer -> Integer` | Returns the arithmetic negation of `x` |

### Built-in binary operators

The operators in this table are ordered by precedence.

| Definition | Semantics |
| ------ | ------------------------------------ |
| `x && y: Boolean x Boolean -> Boolean` | Returns the logical and of `x` and `y` |
| `x || y: Boolean x Boolean -> Boolean` | Returns the logical or of `x` and `y` |
| `x ^|| y: Boolean x Boolean -> Boolean` | Returns the logical xor of `x` and `y` |
| `x == y: Boolean x Boolean -> Boolean` | Returns `true` if the values of `x` and `y` are equal |
| `x == y: Integer x Integer -> Boolean` | Returns `true` if the values of `x` and `y` are equal |
| `x < y: Integer x Integer -> Boolean` | Returns `true` if `x` is strictly less than `y` |
| `x <= y: Integer x Integer -> Boolean` | Returns `true` if `x` is less than or equal to `y` |
| `x > y: Integer x Integer -> Boolean` | Returns `true` if `x` is strictly greater than `y` |
| `x >= y: Integer x Integer -> Boolean` | Returns `true` if `x` is greater than or equal to `y` |
| `x == y: String x String -> Boolean` | Returns `true` if the values of `x` and `y` are equal |
| `x == y: Uri x Uri -> Boolean` | Returns `true` if the values of `x` and `y` are equal |
| `x == y: Uri-reference x Uri-reference -> Boolean` | Returns `true` if the values of `x` and `y` are equal |
| `x == y: Timestamp x Timestamp -> Boolean` | Returns `true` if the values of `x` and `y` are equal |
| `x < y: Timestamp x Timestamp -> Boolean` | Returns `true` if `x` is strictly less than `y` |
| `x <= y: Timestamp x Timestamp -> Boolean` | Returns `true` if `x` is less than or equal to `y` |
| `x > y: Timestamp x Timestamp -> Boolean` | Returns `true` if `x` is strictly greater than `y` |
| `x >= y: Timestamp x Timestamp -> Boolean` | Returns `true` if `x` is greater than or equal to `y` |

Note: for each binary operator using `==` as symbol, using the symbol `=` is equivalent.

Note: for each binary operator using `<=` as symbol, using `=<` is equivalent.

Note: for each binary operator using `>=` as symbol, using `=>` is equivalent.

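One way an implementation might honor the three equivalence notes above is to normalize the alias symbols before dispatching to the comparison. The following Python sketch shows that idea; `ALIASES`, `OPERATORS`, `canonical_operator`, and `apply_comparison` are hypothetical names chosen for this example, and only the comparison operators are covered.

```python
import operator

# Alias symbols and the canonical operator each one stands for.
ALIASES = {"=": "==", "=<": "<=", "=>": ">="}

# Canonical comparison operators mapped to Python's equivalents.
OPERATORS = {
    "==": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def canonical_operator(symbol):
    """Map an alias to its canonical symbol; other symbols pass through."""
    return ALIASES.get(symbol, symbol)


def apply_comparison(symbol, x, y):
    """Evaluate `x symbol y` after normalizing the symbol."""
    return OPERATORS[canonical_operator(symbol)](x, y)


print(apply_comparison("=<", 1, 2))
print(apply_comparison("=", 3, 3))
```

Normalizing once, at parse time, keeps the evaluator free of duplicate cases for each alias.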
### Built-in functions

To simplify these tables, we sometimes use `Any` to refer to any of the types defined in the type system, including `Untype`.

#### General built-ins:

| Definition | Semantics |
| -------- | -------- |
| `is_present(x): Any -> Boolean` | Returns `true` if the identifier is available in the value identifiers list (CloudEvents context identifiers plus the built-in identifiers), `false` otherwise |

#### Type conversions and assertions built-ins:

We define the following semantics to perform type conversion:

| Input type | Output type | Semantics | Can fail |
| -------- | -------- | -------- | -------- |
| `Boolean` | `String` | returns the String `"true"` if the Boolean value is true, `"false"` otherwise | false |
| `Boolean` | `Integer` | returns `1` if the Boolean value is true, `0` otherwise | false |
| `Integer` | `String` | returns the integer as a String | false |
| `Integer` | `Boolean` | returns `false` if the value is `0`, `true` otherwise | false |
| `String` | `Boolean` | returns `true` if the value is the String `"true"`, returns `false` if the value is the String `"false"`, fails otherwise | true |
| `String` | `Integer` | returns the Integer if the value can be parsed as Integer, fails otherwise | true |
| `String` | `Uri` | returns the Uri if the value can be parsed as Uri, fails otherwise | true |
| `String` | `Uri-reference` | returns the Uri-reference if the value can be parsed as Uri reference, fails otherwise | true |
| `String` | `Timestamp` | returns the Timestamp if the value can be parsed as a valid RFC3339 Timestamp, fails otherwise | true |
| `Uri` | `String` | returns the uri as a String | false |
| `Uri-reference` | `String` | returns the uri-reference as a String | false |
| `Timestamp` | `String` | returns the timestamp as an RFC3339 String | false |
| `Untype` | `Boolean` | explicit cast of `Untype` to `Boolean` | true |
| `Untype` | `Integer` | explicit cast of `Untype` to `Integer` | true |
| `Untype` | `String` | explicit cast of `Untype` to `String` | true |
| `Untype` | `Uri` | explicit cast of `Untype` to `Uri` | true |
| `Untype` | `Uri-reference` | explicit cast of `Untype` to `Uri-reference` | true |
| `Untype` | `Timestamp` | explicit cast of `Untype` to `Timestamp` | true |

For each type tuple `(I, O)` in this table, there is a corresponding built-in conversion function in the CloudEvents Expression Language, defined as follows:

```
to_O(x): I -> O
```

This function applies the semantics of the conversion as described in the table. If the conversion fails, the default value of type `O` is returned instead and the conversion error is collected in the error list.

For each type `T` in the type system we define `to_T(x): T -> T`, which is the identity function.

For each type `O` we define functions `is_O(x): Any -> Boolean` where, given `I` the concrete type of `x`:

* If there is a defined conversion `(I, O)` and this conversion doesn't fail on the input `x`, then returns `true`
* Returns `false` otherwise

Note: The function identifier name is always lowercase (for example, `to_string` or `is_integer`).

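The failure handling described above (return the default value of `O` and record the error in the error list) can be sketched for the `String` to `Integer` conversion. This is a minimal illustration, not a reference implementation: the Python functions are named after `to_integer` and `is_integer`, but the explicit `errors` parameter, the accepted digit syntax, and the error message text are assumptions made for this example.

```python
import re

# Integer is defined as a signed int32 in the Type System section.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def to_integer(value, errors):
    """String -> Integer. On failure, record an error and return the default 0."""
    # Assumed syntax: an optional minus sign followed by ASCII digits.
    if re.fullmatch(r"-?[0-9]+", value):
        number = int(value)
        if INT32_MIN <= number <= INT32_MAX:
            return number
    errors.append(f"cannot convert {value!r} to Integer")
    return 0  # default value of the Integer type


def is_integer(value):
    """Return True if the String -> Integer conversion would not fail."""
    scratch = []  # throwaway error list, so the assertion collects no errors
    to_integer(value, scratch)
    return not scratch


errors = []
print(to_integer("10", errors), errors)
print(to_integer("abc", errors), len(errors))
```

Because the conversion always returns a value, evaluation can continue after a failure, and the invoker decides afterwards what to do with the collected errors.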
#### String built-ins:

| Definition | Semantics |
| -------- | -------- |
| `match(x, regex): String x String -> Boolean` | Returns `true` if `x` matches the regular expression `regex`, `false` otherwise |
| `case_insensitive_equals(x, y): String x String -> Boolean` | Returns `true` if `x` is equal to `y` using a case insensitive match, `false` otherwise |
| `trim(x): String -> String` | Returns `x` with leading and trailing whitespace removed |
| `length(x): String -> Integer` | Returns the length of `x` |
| `contains(x, y): String x String -> Boolean` | Returns `true` if `x` contains `y` |
| `has_prefix(x, prefix): String x String -> Boolean` | Returns `true` if `x` starts with `prefix` |
| `has_suffix(x, suffix): String x String -> Boolean` | Returns `true` if `x` ends with `suffix` |

#### URI built-ins:

| Definition | Semantics |
| -------- | -------- |
| `scheme(x): Uri -> String` | Returns the `scheme` component of `x` |
| `is_absolute(x): Uri -> Boolean` | Returns `true` if `x` is an absolute URI |

#### Timestamp built-ins:

| Definition | Semantics |
| -------- | -------- |
| `year(x): Timestamp -> Integer` | Returns the `year` component of `x` |
| `month(x): Timestamp -> Integer` | Returns the `month` component of `x` |
| `day(x): Timestamp -> Integer` | Returns the `day` component of `x` |
| `hour(x): Timestamp -> Integer` | Returns the `hour` component of `x` |
| `minute(x): Timestamp -> Integer` | Returns the `minute` component of `x` |
| `second(x): Timestamp -> Integer` | Returns the `second` component of `x` |
| `to_timezone(x, y): Timestamp x String -> Timestamp` | Returns `x` converted to the timezone specified in `y` |

<!-- Durations without defining a duration type? -->

### Compilation and execution

A CloudEvents Expression Language expression, when evaluated, MUST return a value of a concrete type included in the specified type system.

#### Untyped type checking and runtime handling

As discussed above, the language defines explicit casting built-ins to manipulate `untype` fields and it also defines some inference rules to infer the "concrete" type:

* If the untyped field is used in a unary operator and the operator argument type is defined unambiguously, then the inferred type is the concrete type of the operator argument
* If the untyped field is used in a binary operator as left argument and the right argument is a concrete type, then the inferred type is the concrete type of the right operator argument
* If the untyped field is used in a binary operator as right argument and the left argument is a concrete type, then the inferred type is the concrete type of the left operator argument
* If the untyped field is used as an argument of a built-in function, and the argument type is unambiguously defined, then the inferred type is the concrete type of the function argument

If none of the casting built-ins are used and none of these inference rules match, the expression is invalid and MUST be rejected by the type checker.

For example, in this expression, the type checker should infer the type `String` for the field `partitionkey`:

```
partitionkey.has_prefix("abc")
```

As a counterexample, this expression should be rejected by the type checker because both operands are extensions, so both are `untype` and the type checker cannot infer their types:

```
partitionkey == sequencetype
```

To fix the above expression, at least one explicit cast must be used:

```
partitionkey.to_string() == sequencetype
```

Or:

```
partitionkey == sequencetype.to_string()
```

Or:

```
partitionkey.to_string() == sequencetype.to_string()
```

When the type checker performs an implicit cast, the evaluation semantics should be the same as those of an explicit cast; that is, the evaluation of the three expressions above is equivalent.

#### TODO

<!-- TODO should we allow to return any type of the type system? or do we want to restrict to boolean? -->

<!-- TODO talk about the evaluation order/parenthesis -->

[total-programming-language-wiki]: https://en.wikipedia.org/wiki/Total_functional_programming
[referential-transparency]: https://en.wikipedia.org/wiki/Referential_transparency
[ce-spec-type-system]: ./spec.md#type-system
[ebnf-xml-spec]: https://www.w3.org/TR/REC-xml/#sec-notation