owned this note
owned this note
Published
Linked with GitHub
# SQL Embedding / Extensions
Wasm offers a powerful way to define various "user objects" commonly used in databases like User-Defined Functions and User Defined Types. This extensibility will enable users to push more logic down into their database and reduce data movement. User's will want to create extensions that bundle multiple related user objects into a single component that can be imported into the database in one command.
Embeddings for SQL Extensions will need a way to
1. identify the exported user objects,
* determine what kind of user object they are (e.g. UDF, UDAF, UDT),
* determine what their names are,
4. identify each part of the user object,
5. and determine what SQL types of the arguments/returns are.
## Identifying User Objects
The proposed kinds of user objects are
* User-Defined Functions
* User-Defined Aggregate Functions
* User-Defined Table-Valued Functions
* User-Defined Transform Functions
* User-Defined Types
Each of these kinds of user objects can be defined as an exported interface with a URL indicating the kind.
e.g.
```wit
export interface my-func "wasi:sql-extension/udf" {
apply: func(i: i32) -> i32;
}
```
The name of the interface can be interpreted as the name of the user object, so in the example above the UDF is named "my-func" or its SQL-normalization (e.g. screaming snake case).
## Identifying the Parts of User Objects
Each part of a user object (e.g. the functions that make up a UDAF) can be contained within the interface and have standard names. Ideally, this "shape" of a user object interface will be expressible as single `.wit`.
e.g.
udaf-resource, udaf-iimf
```wit
export interface sketch "wasi:sql-extension/udaf/defaults" {
type state = ...;
type row = ...;
type output = ...;
initialize: func() -> state;
iterate: func(s: state, input: row) -> state;
merge: func(s1: state, s2: state) -> state;
finalize: func(s: state) -> output;
}
```
This example used what are essentially "associated types", which is a nice convention for identifying the types for this UDAF but may not be generalizable into a single`.wit` for all UDAFs.
## Determining the SQL Types of User Objects
In addition to knowing what user objects are defined, what they're called, and which exports and definitions make them up, the embedding needs to determine what SQL types the `.wit` types should map to.
### Fixed Type Mapping
The simplest way for this to work is to have a table for each major SQL driver that maps each `.wit` type to a single SQL type. Then whenever a type is used in e.g. a UDF, the embedding will map this to the specified type.
That table might look something like this for MySQL:
| `.wit` | SQL |
| - | - |
| `i32` | INT |
| `i64` | BIGINT |
| `float32` | FLOAT |
| `float64` | DOUBLE |
| `list<u8>` | BLOB |
| ... | ... |
### Newtypes
We could also define a collection of `.wit` record types that represent SQL types and wrap the `.wit` type they convert to/from. If you use a `TEXT` type in your definition, the embedding knows that you want the SQL type to be `TEXT` and not `VARCHAR`
**wasi:sql-embed.wit**
```wit
// Simple wrappers
record TEXT {
content: string
}
record VARCHAR {
content: string
}
// Wrappers offering a choice of inner type
record BLOB-stream {
content: stream<u8>
}
record BLOB-list {
content: list<u8>
}
// Potentially represent this as a variant
varient BLOB {
stream(stream<u8>),
list(list<u8>)
}
```
**codec.wit**
```wit
use { TEXT, VARCHAR, BLOB-stream } from 'wasi:sql-embed';
export interface encode "wasi:sql-extension/udf" {
apply: func(bytes: TEXT, encoding: VARCHAR) -> BLOB-stream
}
export interface decode "wasi:sql-extension/udf" {
apply: func(bytes: BLOB-stream, encoding: VARCHAR) -> TEXT
}
```
### Supporting Different Variations of Each User Object
At the moment, there is one main way to define each kind of User Object,
but in the future there may be different ways to configure/represent them.
For example, we may want a way to define an aggregate function whose state doesn't have a "zero" value which might look something like this.
```wit
export interface foobar "wasi:sql-extension/udaf/semigroup" {
from: func(row) -> state;
merge: func(s1: state, s2: state) -> state;
finalize: func(s: state) -> output;
}
```
This format, while different from the conventional UDAF API lets you more intuitively represent
some algorithms and have less partial initialization / nullability like the following
so we may at some point want to support both.
> (value) -> (value, 1)
> (s1, s2) -> (s1.value + s2.value, s1.count + s2.count)
> (s) -> s.value / s.count
This raises the question "How do we distinguish between different UDAF configurations?".
The intuitive option would be to create different interfaces that, by definition, have different names that can be distinguished.
# Hypothetical World File
moved to [a separate file](https://hackmd.io/yCjw2MazRiuMs-45ZADmtw)