SQL Embedding / Extensions

# SQL Embedding / Extensions Wasm offers a powerful way to define various "user objects" commonly used in databases like User-Defined Functions and User Defined Types. This extensibility will enable users to push more logic down into their database and reduce data movement. User's will want to create extensions that bundle multiple related user objects into a single component that can be imported into the database in one command. Embeddings for SQL Extensions will need a way to 1. identify the exported user objects, * determine what kind of user object they are (e.g. UDF, UDAF, UDT), * determine what their names are, 4. identify each part of the user object, 5. and determine what SQL types of the arguments/returns are. ## Identifying User Objects The proposed kinds of user objects are * User-Defined Functions * User-Defined Aggregate Functions * User-Defined Table-Valued Functions * User-Defined Transform Functions * User-Defined Types Each of these kinds of user objects can be defined as an exported interface with a URL indicating the kind. e.g. ```wit export interface my-func "wasi:sql-extension/udf" { apply: func(i: i32) -> i32; } ``` The name of the interface can be interpreted as the name of the user object, so in the example above the UDF is named "my-func" or its SQL-normalization (e.g. screaming snake case). ## Identifying the Parts of User Objects Each part of a user object (e.g. the functions that make up a UDAF) can be contained within the interface and have standard names. Ideally, this "shape" of a user object interface will be expressible as single `.wit`. e.g. udaf-resource, udaf-iimf ```wit export interface sketch "wasi:sql-extension/udaf/defaults" { type state = ...; type row = ...; type output = ...; initialize: func() -> state; iterate: func(s: state, input: row) -> state; merge: func(s1: state, s2: state) -> state; finalize: func(s: state) -> output; } ``` This example used what are essentially "associated types", which is a nice convention for identifying the types for this UDAF but may not be generalizable into a single`.wit` for all UDAFs. ## Determining the SQL Types of User Objects In addition to knowing what user objects are defined, what they're called, and which exports and definitions make them up, the embedding needs to determine what SQL types the `.wit` types should map to. ### Fixed Type Mapping The simplest way for this to work is to have a table for each major SQL driver that maps each `.wit` type to a single SQL type. Then whenever a type is used in e.g. a UDF, the embedding will map this to the specified type. That table might look something like this for MySQL: | `.wit` | SQL | | - | - | | `i32` | INT | | `i64` | BIGINT | | `float32` | FLOAT | | `float64` | DOUBLE | | `list<u8>` | BLOB | | ... | ... | ### Newtypes We could also define a collection of `.wit` record types that represent SQL types and wrap the `.wit` type they convert to/from. If you use a `TEXT` type in your definition, the embedding knows that you want the SQL type to be `TEXT` and not `VARCHAR` **wasi:sql-embed.wit** ```wit // Simple wrappers record TEXT { content: string } record VARCHAR { content: string } // Wrappers offering a choice of inner type record BLOB-stream { content: stream<u8> } record BLOB-list { content: list<u8> } // Potentially represent this as a variant varient BLOB { stream(stream<u8>), list(list<u8>) } ``` **codec.wit** ```wit use { TEXT, VARCHAR, BLOB-stream } from 'wasi:sql-embed'; export interface encode "wasi:sql-extension/udf" { apply: func(bytes: TEXT, encoding: VARCHAR) -> BLOB-stream } export interface decode "wasi:sql-extension/udf" { apply: func(bytes: BLOB-stream, encoding: VARCHAR) -> TEXT } ``` ### Supporting Different Variations of Each User Object At the moment, there is one main way to define each kind of User Object, but in the future there may be different ways to configure/represent them. For example, we may want a way to define an aggregate function whose state doesn't have a "zero" value which might look something like this. ```wit export interface foobar "wasi:sql-extension/udaf/semigroup" { from: func(row) -> state; merge: func(s1: state, s2: state) -> state; finalize: func(s: state) -> output; } ``` This format, while different from the conventional UDAF API lets you more intuitively represent some algorithms and have less partial initialization / nullability like the following so we may at some point want to support both. > (value) -> (value, 1) > (s1, s2) -> (s1.value + s2.value, s1.count + s2.count) > (s) -> s.value / s.count This raises the question "How do we distinguish between different UDAF configurations?". The intuitive option would be to create different interfaces that, by definition, have different names that can be distinguished. # Hypothetical World File moved to [a separate file](https://hackmd.io/yCjw2MazRiuMs-45ZADmtw)

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.