# CloudEvents SQL Expression Language
The goal of this specification is to define an expression language SQL-like which can be used to express predicates on CloudEvents instances.
## Overview
CloudEvents SQL expressions (also known as CESQL) allow matching of CloudEvent attributes against complex expressions that lean on the syntax of Structured Query Language (SQL) `WHERE` clauses. Using SQL-derived expressions for message filtering is in widespread implementation use because the Java Message Service (JMS) message selector syntax also leans on SQL. Note that neither the SQL standard (ISO 9075) nor the JMS standard nor any other SQL dialect are used as a normative foundation or constrain the expression syntax defined in this specification, but the syntax is informed by them.
CESQL is a _[Total pure functional programming language](total-programming-language-wiki)_ in order to guarantee the termination of the evaluation of the expression.
The language is not constrained to a particular execution environment, which means it might run in a source, in a producer, in an intermediary, and it can be implemented using any tech stack.
The CloudEvents Expression Language assumes the input always includes, but it's not limited to, a single valid and type checked CloudEvent instance. An expression cannot mutate the value of the input CloudEvent instance, nor of any of the other input values. The evaluation of an expression must observe the concept of [referential transparency](referential-transparency).
The CloudEvents Expression Language doesn't support the handling of the data field of the CloudEvent instances, due to its polymorphic nature and complexity. We strongly encourage users that needs this functionality to use other more appropriate tools.
### Examples
_CloudEvent including a subject_
```
EXISTS subject
```
_CloudEvent including the extension 'firstname' with value 'Francesco'_
```
firstname = 'Francesco'
```
_CloudEvent including the extension 'firstname' with value 'Francesco' or the subject with value 'Francesco'_
```
firstname = 'Francesco' OR subject = 'Francesco'
```
_CloudEvent including the extension 'firstname' with value 'Francesco' and extension 'lastname' with value 'Guardiani', or the subject with value 'Francesco Guardiani'_
```
(firstname = 'Francesco' AND lastname = 'Guardiani') OR subject = 'Francesco Guardiani'
```
_CloudEvent including the extension 'sequence' with numeric value 10_
```
INT(sequence) = 10
```
_CloudEvent including the extension 'hop' and 'ttl', where 'hop' is smaller than 'ttl'_
```
INT(hop) < INT(ttl)
```
### Relation to the Subscription Spec
<!-- TODO -->
## Language syntax
The grammar of the language is defined using the EBNF Notation from [W3C XML specification](ebnf-xml-spec).
Although in the EBNFs keywords, operators and function identifiers are defined using uppercase characters, they are case-insensitive. For example:
```
int(hop) < int(ttl) and int(hop) < 1000
```
Is the same expression as:
```
INT(hop) < INT(ttl) AND INT(hop) < 1000
```
### Expression
The root of the expression is the `expression` rule:
```ebnf
expression ::= value-identifier | literal | unary-operation | binary-operation | function-invocation | ( "(" expression ")" )
```
Nested expressions must be correctly parenthesized.
### Value identifiers and literals
Value identifiers in CESQL have the same restrictions of the [Attribute Naming Convention](ce-attribute-naming-convention) from the CloudEvents spec.
```ebnf
lowercase-char ::= [a-z]
value-identifier ::= lowercase-char ( lowercase-char | digit )*
```
A value identifier cannot be greater than 20 characters in length.
CESQL defines 3 kind of type literals: number, boolean and string.
```ebnf
digit ::= [0-9]
number-literal ::= digit?
boolean-literal ::= "true" | "false"
string-literal ::= ( "'" ( [^'] | "\'" )* "'" ) | ( '"' ( [^"] | '\"' )* '"')
literal ::= number-literal | boolean-literal | string-literal
```
String literals can be either `''` or `""` delimited. In one case, the `"` has to be escaped, while in the other the `"` has to be escaped.
### Operators
```ebnf
not-operator = "NOT"
unary-numeric-operator = "-"
unary-logical-operator = not-operator
binary-comparison-operator = "=" | ">=" | "<=" | "<" | ">"
binary-logic-operator = "AND" | "OR" | "XOR"
binary-numeric-operator = "+" | "-" | "*" | "/" | "%"
like-operator = "LIKE"
exists-operator = "EXISTS"
in-operator = "IN"
unary-operation ::= (unary-numeric-operator | unary-logical-operator) expression
binary-operation ::= expression (binary-comparison-operator | binary-logic-operator | binary-numeric-operator) expression
like-operation ::= expression not-operator? like-operator string-literal
exists-operation ::= exists-operator value-identifier
set-expression ::= "(" expression ("," expression)* ")"
in-operation ::= expression not-operator? in-operator set-expression
```
### Functions invocation
```ebnf
uppercase-char ::= [A-Z]
parameter ::= expression
function-identifier ::= uppercase-char ( "_" | uppercase-char )*
parameter-list ::= parameter ("," parameter)*
function-invocation ::= function-identifier "(" parameter-list? ")"
```
## Language semantics
### Type System
The type system is composed by 3 types:
* _String_: Sequence of allowable Unicode characters
* _Integer_: A signed 32-bit integer
* _Boolean_: A boolean value of "true" or "false"
<!-- TBD timestamp? -->
The [types defined in the CloudEvents specification](ce-spec-type-system) URI, URI Reference and String are represented as _String_.
### CloudEvent context identifiers and types
Each CloudEvent context attribute and extension can be addressable from an expression using its identifier, as defined by the spec. For example, using `id` in an expression will address to the CloudEvent [id attribute](ce-id-attribute).
Unless otherwise specified, every attribute and extension is initially represented by the _String_ type. Through explicit and implicit casting, the user can convert the addressed value instances to _Integer_ and _Boolean_.
### Operators and functions
#### Errors
Because every operator and function is total, an expression evalution flow is defined statically and cannot be modified by expected or unexpected errors. Neverthless, CESQL includes the concept of errors: when an expression is evaluated, in case an error arise, the evaluator collects a list of errors, referred in this spec as _error list_, which is then returned to the user, together with the evaluated value of the CESQL expression.
Whenever possible, some errors checks should be done at compile time by the expression evaluator, in order to prevent runtime errors.
#### Unary operators
Corresponds to the syntactic rule `unary-operation`:
| Definition | Semantics |
| ------ | ------------------------------------ |
| `NOT x: Boolean -> Boolean` | Returns the negate value of `x` |
| `-x: Integer -> Integer` | Returns the minus value of `x` |
#### Binary operators
Operators in this table appears in precedence order. Corresponds to the syntactic rule `binary-operation`:
| Definition | Semantics |
| ------ | ------------------------------------ |
| `x = y: Boolean x Boolean -> Boolean` | Returns `true` if the values of `x` and `y` are equal |
| `x AND y: Boolean x Boolean -> Boolean` | Returns the logical and of `x` and `y` |
| `x OR y: Boolean x Boolean -> Boolean` | Returns the logical or of `x` and `y` |
| `x XOR y: Boolean x Boolean -> Boolean` | Returns the logical xor of `x` and `y` |
| `x = y: Integer x Integer -> Boolean` | Returns `true` if the values of `x` and `y` are equal |
| `x < y: Integer x Integer -> Boolean` | Returns `true` if `x` is strictly lower than `y` |
| `x <= y: Integer x Integer -> Boolean` | Returns `true` if `x` is lower or equal to `y` |
| `x > y: Integer x Integer -> Boolean` | Returns `true` if `x` is strictly greater than `y` |
| `x >= y: Integer x Integer -> Boolean` | Returns `true` if `x` is greater or equal to `y` |
| `x + y: Integer x Integer -> Integer` | Returns the sum of `x` and `y` |
| `x - y: Integer x Integer -> Integer` | Returns the difference of `x` and `y` |
| `x * y: Integer x Integer -> Integer` | Returns the product of `x` and `y` |
| `x / y: Integer x Integer -> Integer` | Returns the rounded division of `x` and `y`. Returns `0` if `y = 0` and raise an error |
| `x % y: Integer x Integer -> Integer` | Returns the remainder of the division of `x` and `y`. Returns `0` if `y = 0` and raise an error |
| `x = y: String x String -> Boolean` | Returns `true` if the values of `x` and `y` are equal |
#### Like operator
| Definition | Semantics |
| ------ | ------------------------------------ |
| `x LIKE pattern: String x String -> Boolean` | Returns `true` if the value x matches the `pattern` |
| `x NOT LIKE pattern: String x String -> Boolean` | Same as `NOT (x LIKE PATTERN)` |
The pattern of the `LIKE` operator can contain:
* `%` represents zero, one, or multiple characters
* `_` represents a single character
For example, the pattern `_b*` will accept values `ab`, `abc`, `abcd1` but won't accept values `b` or `acd`.
Both `%` and `_` can be escaped with `\`, in order to be matched literally. For example, the pattern `abc\%` will match `abc%` but won't match `abcd`.
#### Exists operator
| Definition | Semantics |
| ------ | ------------------------------------ |
| `EXISTS identifier: Any -> Boolean` | Returns `true` if the attribute `identifier` exists in the input CloudEvent |
Note: `EXISTS` must always return `true` for the required context attributes because the input CloudEvent is always assumed valid, e.g. `EXISTS id` must always return `true`.
#### In operator
| Definition | Semantics |
| ------ | ------------------------------------ |
| `x IN (y1, y2, ...): String x String^n -> Boolean` | Returns `true` if `x` is an element included in the set of `yN` elements |
| `x NOT IN (y1, y2, ...): String x String^n -> Boolean` | Same as `NOT (x IN set)` |
The set of values `yN` can be of arbitrary length and the matching is done using the same semantics of the equal `=` operator.
### Functions
#### Casting
| Definition | Semantics |
| -------- | -------- |
| `INT(x): String -> Integer` | Returns `x` converted to _Integer_, if possible. Otherwise, returns `0` and raise an error |
| `BOOL(x): String -> Boolean` | Returns `true` if the String value is `"true"`, `false` if the String value is `"false"`, otherwise `false` and returns an error |
| `STRING(x): Integer -> String` | Returns `x` converted to _String_ |
| `STRING(x): Boolean -> String` | Returns `x` converted to _String_ |
#### Built-in String manipulation
| Definition | Semantics |
| -------- | -------- |
| `LENGTH(x): String -> Integer` | Returns the character length of the String `x` |
| `CONCAT(x1, x2, ...): String^n -> String` | Returns the concatenation of `x1` up to `xN` |
| `LOWER(x): String -> String` | Returns `x` in lowercase |
| `UPPER(x): String -> String` | Returns `x` in uppercase |
| `TRIM(x): String -> String` | Returns `x` with leading and trailing trimmed whitespaces |
| `LEFT(x, y): String x Integer -> String` | Returns a new string with the first `y` characters of `x`, or returns `x` if `LENGTH(x) <= y`. Returns `x` if `y < 0` and raise an error |
| `RIGHT(x, y): String x Integer -> String` | Returns a new string with the last `y` characters of `x` or returns `x` if `LENGTH(x) <= y`. Returns `x` if `y < 0` and raise an error |
| `SUBSTRING(x, b, e): String x Integer x Integer -> String` | Returns the substring of `x` starting from index `b` (included) and ending with index `e` (excluded). Returns `SUBSTRING(x, b, LENGTH(x))` if `LENGTH(x) < e`. Returns `x` and raise an error if `b < 0 OR e < 0 OR e < b` |
## Execution
A CESQL expression, when evaluated, MUST always return a Boolean `true` if the evaluation based on predicate described in the expression matches the input, otherwise `false`.
[total-programming-language-wiki]: https://en.wikipedia.org/wiki/Total_functional_programming
[referential-transparency]: https://en.wikipedia.org/wiki/Referential_transparency
[ce-attribute-naming-convention]: ./spec.md#attribute-naming-convention
[ce-spec-type-system]: ./spec.md#type-system
[ce-id-attribute]: ./spec.md#id
[ebnf-xml-spec]: https://www.w3.org/TR/REC-xml/#sec-notation