It's really just a play on JSON with the T meaning "table", but if you like, it can also be backronymed as Table Serialization Object Notation.
This is a work-in-progress specification.
Tag | Name | Summary |
---|---|---|
0 | None | Empty value. 0-byte representation. |
1 | Integer | Variable length signed integer. |
2 | Float32 | 32-bit floating point number. |
3 | Float64 | 64-bit floating point number. |
4 | String | UTF-8 encoded text. Length is VarUInt. |
5 | FixedIntArray | An array of fixed-size integers. |
6 | List | Single element type. |
7 | Tuple | A group of disjoint element types, of fixed length. |
8 | Record | Fixed name-value mapping (e.g. a struct). |
9 | Dictionary | Key to value mapping. Keys are not limited to strings. |
10 | Union | Allows an alternation between multiple types. |
Files must start with these two bytes:
magic | version |
---|---|
0x72 | 0x00 |
After this comes the document schema, and after that comes the payload. To parse a document, the schema first needs to be read into a tree-based description, and then the payload is interpreted by applying rules based on the schema.
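As a minimal sketch of the first parsing step, the two-byte header can be validated before the schema is read. The names `MAGIC`, `VERSION`, and `check_header` are illustrative and not part of the specification.

```python
MAGIC = 0x72    # first byte of every file, per the table above
VERSION = 0x00  # second byte: format version


def check_header(data: bytes) -> int:
    """Validate the two-byte header and return the offset where the schema starts."""
    if len(data) < 2:
        raise ValueError("file is too short to contain a header")
    if data[0] != MAGIC:
        raise ValueError(f"bad magic byte: 0x{data[0]:02x}")
    if data[1] != VERSION:
        raise ValueError(f"unsupported version: 0x{data[1]:02x}")
    return 2
```

A reader would call `check_header` once, then parse the schema starting at the returned offset.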
The schema describes, from the root downwards, the structure of the document that will follow. This starts with the type description of the root tag.
varuint
For compact encoding, integers are stored in a variable length representation. The most-significant bit (0x80) is used as a continuation marker - if it is set to 1 then there is another byte following. The remaining 7 bits are parts of the integer. The payload is stored in big-endian order.
It follows naturally that numbers in the range 0 to 127 (inclusive) are represented using 1 byte.
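The encoding described above can be sketched as follows. The function names are illustrative; the only behavior assumed is what the text states: 7 payload bits per byte, most significant group first, and the 0x80 bit set on every byte except the last.

```python
def encode_varuint(n: int) -> bytes:
    """Encode a non-negative integer as big-endian 7-bit groups."""
    if n < 0:
        raise ValueError("varuint cannot encode negative numbers")
    groups = [n & 0x7F]  # final byte: continuation bit clear
    n >>= 7
    while n:
        groups.append((n & 0x7F) | 0x80)  # earlier bytes: continuation bit set
        n >>= 7
    return bytes(reversed(groups))


def decode_varuint(data: bytes, pos: int = 0):
    """Return (value, next_pos) for the varuint starting at data[pos]."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos
```

For example, 300 is `2 * 128 + 44`, so it encodes as the two bytes `0x82 0x2C`.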
varsint
Signed integers in the variable length encoding scheme.
These are stored the same way as `varuint`, but the least significant bit of the final integer further stores the sign bit.
The sign bit is two's complement format, which means there's no negative zero. A value of 0 indicates a positive number, while 1 indicates negative.
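The text leaves some room for interpretation, so the following is only one possible reading, stated as an assumption: the sign occupies the least significant bit, and a negative number `n` stores its bitwise complement `~n` in the remaining bits, which is what rules out a negative zero. The function names are illustrative. The resulting unsigned value would then be written as a `varuint`.

```python
def varsint_to_unsigned(n: int) -> int:
    """Map a signed integer to the unsigned value that is stored as a varuint.

    Assumed layout: bit 0 is the sign (0 = positive, 1 = negative); the
    remaining bits hold n for positive numbers and ~n for negative ones.
    """
    if n < 0:
        return ((~n) << 1) | 1
    return n << 1


def unsigned_to_varsint(u: int) -> int:
    """Inverse of varsint_to_unsigned."""
    magnitude = u >> 1
    if u & 1:
        return ~magnitude
    return magnitude
```

Under this reading, every value from -64 to 63 maps to an unsigned value below 128 and so fits in a single byte.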
Strings consist of a length value followed by the payload.
len | str_i |
---|---|
varuint | byte |
Strings must be valid UTF-8 text. Invalid strings should be stored as a byte array tagged with a usage hint instead.
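A minimal sketch of string encoding and decoding, assuming the length prefix counts UTF-8 bytes rather than code points (the table lists the payload element as `byte`). All function names are illustrative.

```python
def encode_varuint(n: int) -> bytes:
    """Big-endian 7-bit groups; 0x80 marks that another byte follows."""
    groups = [n & 0x7F]
    n >>= 7
    while n:
        groups.append((n & 0x7F) | 0x80)
        n >>= 7
    return bytes(reversed(groups))


def decode_varuint(data: bytes, pos: int = 0):
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos


def encode_string(text: str) -> bytes:
    """Length prefix (assumed to be a byte count) followed by the UTF-8 payload."""
    payload = text.encode("utf-8")
    return encode_varuint(len(payload)) + payload


def decode_string(data: bytes, pos: int = 0):
    """Return (text, next_pos); raises UnicodeDecodeError on invalid UTF-8."""
    length, pos = decode_varuint(data, pos)
    raw = data[pos:pos + length]
    if len(raw) != length:
        raise ValueError("truncated string payload")
    return raw.decode("utf-8"), pos + length
```

Because decoding is strict, a payload that is not valid UTF-8 is rejected, which matches the requirement that such data be stored as a byte array instead.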
Type descriptions start with a 1-byte integer which contains the type tag, and end with a string for the usage hint.
tag | content | usage |
---|---|---|
byte | … | string |
The contents of the descriptor depend on the tag:
Primitives (None, String, Integer, Float32, Float64) have no content (zero-length).
Fixed int arrays have a length marker, followed by an integer type byte.
tag | len | prim | usage |
---|---|---|---|
0x05 | varuint | byte | string |
The high bit of the `prim` byte is 1 if the type is signed, 0 if not. The lower 7 bits encode a power of two representing the length of the integer in bits. This exponent must not be higher than 7 (128 bits). The most common case, a byte array, would set `prim` = 3.
Arrays for 1, 2, and 4-bit integers are padded to the nearest byte boundary at the end of the array.
# | size in bits |
---|---|
0 | 1 |
1 | 2 |
2 | 4 (nibble) |
3 | 8 (byte/octet) |
4 | 16 |
5 | 32 |
6 | 64 |
7 | 128 |
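The `prim` byte and the padding rule can be sketched like this. The function names are illustrative, and rounding up to a whole number of bytes is how "padded to the nearest byte boundary" is interpreted here.

```python
def decode_prim(prim: int):
    """Split a prim byte into (signed, size_in_bits)."""
    signed = bool(prim & 0x80)  # high bit: 1 = signed, 0 = unsigned
    exponent = prim & 0x7F      # lower 7 bits: power of two
    if exponent > 7:
        raise ValueError("integer size must not exceed 128 bits")
    return signed, 1 << exponent


def padded_byte_length(count: int, bits: int) -> int:
    """Bytes occupied by `count` integers of `bits` bits, padded at the end."""
    return (count * bits + 7) // 8
```

For instance, `decode_prim(3)` describes an unsigned byte array, and ten 1-bit integers occupy `padded_byte_length(10, 1)`, which is 2 bytes.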
Lists have a length marker and then a type descriptor. The element type must not be None.
tag | len | elt | usage |
---|---|---|---|
0x06 | varuint | type desc | string |
Tuples have a length value N followed by N type descriptors. Implementations should omit fields that have a value of None.
tag | len | elt_i | usage |
---|---|---|---|
0x07 | varuint | type desc | string |
Records have a length value N followed by N field descriptions. Implementations should omit fields that have a value of None.
tag | len | name_i | type_i | usage |
---|---|---|---|---|
0x08 | varuint | string | type desc | string |
Dictionaries have two type descriptors for the key and value types. The key must not be None. The value may be None, which should be interpreted as a set rather than a dictionary.
tag | key | value | usage |
---|---|---|---|
0x09 | type desc | type desc | string |
Unions have a length value N followed by N variants.
tag | len | name_i | type_i | usage |
---|---|---|---|---|
0x0a | varuint | string | type desc | string |
Unions that only have 1 variant should not be emitted. Unions that have 0 variants must not be emitted. Implementations should avoid generating union variants that are not used in the payload that follows.
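The descriptor layouts above lend themselves to a small recursive reader. The sketch below builds a tree of plain dictionaries; that in-memory shape and every function name are assumptions made for illustration, not something the specification prescribes.

```python
def read_varuint(data, pos):
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos


def read_string(data, pos):
    length, pos = read_varuint(data, pos)
    return data[pos:pos + length].decode("utf-8"), pos + length


def read_type(data, pos=0):
    """Parse one type descriptor; return (node, next_pos)."""
    tag = data[pos]
    pos += 1
    node = {"tag": tag}
    if tag <= 4:                      # None, Integer, Float32, Float64, String
        pass                          # primitives have no content
    elif tag == 5:                    # FixedIntArray: len, prim
        node["len"], pos = read_varuint(data, pos)
        node["prim"] = data[pos]
        pos += 1
    elif tag == 6:                    # List: len, element type
        node["len"], pos = read_varuint(data, pos)
        node["elt"], pos = read_type(data, pos)
    elif tag == 7:                    # Tuple: N, then N element types
        count, pos = read_varuint(data, pos)
        node["elts"] = []
        for _ in range(count):
            elt, pos = read_type(data, pos)
            node["elts"].append(elt)
    elif tag in (8, 10):              # Record / Union: N, then N (name, type)
        count, pos = read_varuint(data, pos)
        node["fields"] = []
        for _ in range(count):
            name, pos = read_string(data, pos)
            typ, pos = read_type(data, pos)
            node["fields"].append((name, typ))
    elif tag == 9:                    # Dictionary: key type, value type
        node["key"], pos = read_type(data, pos)
        node["value"], pos = read_type(data, pos)
    else:
        raise ValueError(f"unknown type tag {tag}")
    node["usage"], pos = read_string(data, pos)  # every descriptor ends with a usage hint
    return node, pos
```

As a usage example, the bytes `06 00 04 00 00` parse as a List with length marker 0 whose elements are Strings, with empty usage hints on both.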
The root of the document (a value) is described using the root of the schema (a type description).
How a particular value is decoded depends on its type description.
For primitives:
For arrays, the behavior is different depending on the length marker. For arrays with a length marker of 0, read a varuint N and then read N elements of the element type.
len | element_i |
---|---|
varuint | element type |
For arrays with a non-zero length marker M, read M elements of the element type.
element_i |
---|
element type |
For tuples, read each element type sequentially.
element_i |
---|
element_i type |
For records, read the value types of each field in the order they were written in the schema.
value_i |
---|
value type |
For dictionaries, read a length value N and then read N key/value pairs.
len | key_i | value_i |
---|---|---|
varuint | key type | value type |
For unions, read a varuint index N, and then read a value of the type described by the Nth variant.
index | value |
---|---|
varuint | variant_index |
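The payload rules for the composite types can be sketched as a recursive reader driven by the schema tree. Several things here are assumptions for illustration: the dictionary-based node layout (`tag`, `len`, `elt`, `elts`, `fields`, `key`, `value`), the function names, and a zero-based union index. Integer, float, and FixedIntArray values are left out because their payload encoding is not spelled out in this section.

```python
def read_varuint(data, pos):
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos


def read_string(data, pos):
    length, pos = read_varuint(data, pos)
    return data[pos:pos + length].decode("utf-8"), pos + length


def read_value(node, data, pos=0):
    """Decode one value described by `node`; return (value, next_pos)."""
    tag = node["tag"]
    if tag == 0:                       # None: zero-byte representation
        return None, pos
    if tag == 4:                       # String
        return read_string(data, pos)
    if tag == 6:                       # List
        count = node["len"]
        if count == 0:                 # length marker 0: count is in the payload
            count, pos = read_varuint(data, pos)
        items = []
        for _ in range(count):
            item, pos = read_value(node["elt"], data, pos)
            items.append(item)
        return items, pos
    if tag == 7:                       # Tuple: each element type in order
        items = []
        for elt in node["elts"]:
            item, pos = read_value(elt, data, pos)
            items.append(item)
        return tuple(items), pos
    if tag == 8:                       # Record: values in schema field order
        record = {}
        for name, typ in node["fields"]:
            record[name], pos = read_value(typ, data, pos)
        return record, pos
    if tag == 9:                       # Dictionary: N, then N key/value pairs
        count, pos = read_varuint(data, pos)
        pairs = []
        for _ in range(count):
            key, pos = read_value(node["key"], data, pos)
            value, pos = read_value(node["value"], data, pos)
            pairs.append((key, value))
        return pairs, pos              # a list, since keys need not be hashable
    if tag == 10:                      # Union: index (assumed zero-based), then value
        index, pos = read_varuint(data, pos)
        name, typ = node["fields"][index]
        value, pos = read_value(typ, data, pos)
        return (name, value), pos
    raise NotImplementedError(f"tag {tag} is not covered by this sketch")
```

Returning dictionary entries as a list of pairs is a deliberate choice in this sketch: keys are not limited to strings, so they may decode to values that a native hash map cannot hold.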
Usage hints are designed as a way for implementations to pass through native data types and allow for 1:1 encode and decode. A few common usage hints are pre-defined to maximize compatibility between languages and universality of tooling.
Unlike raw data types, new usage hints can be added without existing implementations needing to be adapted to support them.
Implementations should define their own namespace when adding new formats not specified here. This includes sub-formats like those of `datetime`. For example, one might define a `json:null` type to distinguish between `null` and lack-of-existence in JSON.
URN | Base Type | Summary |
---|---|---|
`tson:bool` | Numeric[1] | 0 for false, 1 for true. |
`tson:uuid` | FixedIntArray<u8> | UUID stored as big endian bytes. |
`tson:uuid` | String | UUID stored as a hex string.[2] |
`tson:display/hex` | Integer, FixedIntArray | Display as hexadecimal (base 16). |
`tson:display/octal` | Integer, FixedIntArray | Display as octal (base 8). |
`tson:display/binary` | Integer, FixedIntArray | Display as binary (base 2). |
`tson:datetime/unix` | Integer | Seconds since Jan 1, 1970 UTC.[3] |
`tson:datetime/iso8601` | String | ISO 8601 formatted string. |
`tson:datetime/http` | String | RFC 7231 formatted string. |
`tson:string/utf16` | FixedIntArray<u16> | UTF-16 text with BOM.[4] |
`tson:unit/bytes` | Numeric[1] | Represents a quantity in bytes.[3] |
`tson:unit/bits` | Numeric[1] | Represents a quantity in bits.[3] |
`tson:unit/seconds` | Numeric[1] | Represents a quantity in seconds.[3] |
Scalable units may have query parameters `mul` and `div`. These two combine to form a ratio that the number should be scaled by to achieve the base unit. Both numbers must be integers, but may be written using scientific notation.

- `tson:unit/bytes?mul=1000`: Kilobytes (KB).
- `tson:unit/bytes?mul=1048576`: Mebibytes (MiB).
- `tson:unit/seconds?div=1e9`: Nanoseconds (ns).
- `tson:unit/seconds?div=60`: 60Hz timer ticks.
- `tson:datetime/unix?div=1e3`: Milliseconds since epoch.

[2] This format should generally be avoided in favor of the array representation. UUIDs stored as a string should be assumed to potentially have wrapping curly braces `{}` and included dashes `-`.

[4] `tson:string/utf16/be` and `tson:string/utf16/le` may be used in order to omit the BOM. This format should generally be assumed to potentially contain orphaned surrogates.
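The `mul` and `div` parameters of scalable units could be handled along these lines. This is a sketch under stated assumptions: parameters are joined with `&` when both are present, scientific notation uses an integer mantissa with a non-negative exponent, and the names `parse_int` and `parse_scale` are illustrative.

```python
from fractions import Fraction


def parse_int(text: str) -> int:
    """Parse "1000" or scientific notation such as "1e9" into an exact integer."""
    mantissa, _, exponent = text.lower().partition("e")
    power = int(exponent or 0)
    if power < 0:
        raise ValueError("scale factors must be integers")
    return int(mantissa) * 10 ** power


def parse_scale(hint: str):
    """Split a usage hint into (base_urn, ratio_to_base_unit)."""
    base, _, query = hint.partition("?")
    mul, div = 1, 1
    for pair in filter(None, query.split("&")):  # "&" separator is an assumption
        key, _, value = pair.partition("=")
        if key == "mul":
            mul = parse_int(value)
        elif key == "div":
            div = parse_int(value)
    return base, Fraction(mul, div)
```

Multiplying a stored number by the returned ratio yields the quantity in the base unit. Using `Fraction` keeps the ratio exact, which avoids rounding when `div` is large.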