# Stardust Database Schema Proposal
## Milestone Range ID
The `ms_range_id` used in some of these tables is effectively a `partition_id` based on the `milestone_index`. It is simply the equal to `milestone_index / chunk_size` where `chunk_size` can be any predefined value for the table. This `chunk_size` may be configurable, but any changes to it require a migration of all database data that uses it, so it may be best to settle on appropriate values and disallow them to be modified.
## Messages
```sql=
CREATE TABLE IF NOT EXISTS test.messages (
message_id text,
version tinyint,
message blob,
est_milestone_index int,
milestone_index int,
inclusion_state tinyint,
conflict_reason tinyint,
proof blob,
PRIMARY KEY (message_id, version, message)
);
CREATE INDEX messages_by_milestone ON test.messages (milestone_index);
```
### Selects
```sql
SELECT message
FROM test.messages
WHERE message_id = ?;
SELECT message_id, variant, parents, is_solid, milestone_index, inclusion_state, should_promote, should_reattach
FROM test.messages
WHERE message_id = ?;
SELECT *
FROM test.messages
WHERE message_id = ?;
```
## Transactions
```sql=
CREATE TABLE IF NOT EXISTS test.transactions (
transaction_id text,
idx smallint,
variant text,
message_id text,
data blob,
est_milestone_index int,
milestone_index int,
inclusion_state tinyint,
PRIMARY KEY (transaction_id, idx, variant, message_id, data)
);
```
### Selects
```sql
SELECT message_id, data, inclusion_state
FROM test.transactions
WHERE transaction_id = ?
AND idx = ?
AND variant IN ('output', 'unlock');
SELECT message_id, data, idx, inclusion_state, milestone_index
FROM test.transactions
WHERE transaction_id = ?;
```
### Materialized view
This view is used specifically to improve the performance of the `included-message` API.
```sql=
CREATE MATERIALIZED VIEW test.transactions_by_state AS
SELECT transaction_id, idx, variant, message_id, inclusion_state
FROM test.transactions
WHERE transaction_id IS NOT NULL
AND idx IS NOT NULL
AND variant IS NOT NULL
AND message_id IS NOT NULL
AND inclusion_state IS NOT NULL
AND data IS NOT NULL
PRIMARY KEY (transaction_id, inclusion_state, idx, variant, message_id);
```
```sql
SELECT message_id
FROM test.transactions_by_state
WHERE transaction_id = ?
AND inclusion_state = ?
LIMIT 1;
```
## Milestones
```sql=
CREATE TABLE IF NOT EXISTS test.milestones (
milestone_index int,
message_id text,
timestamp bigint,
payload blob,
PRIMARY KEY (milestone_index, message_id)
);
```
### Selects
```sql
SELECT message_id, timestamp
FROM test.milestones
WHERE milestone_index = ?;
```
## Partition Hints
These tables enable selecting only from partitions with data. A row should be inserted into to these tables whenever indexed data is inserted.
### Tag Hints
```sql=
CREATE TABLE IF NOT EXISTS test.tag_hints (
tag text,
table_kind text,
ms_range_id int,
PRIMARY KEY (tag, ms_range_id)
) WITH CLUSTERING ORDER BY (ms_range_id DESC);
```
```sql
SELECT ms_range_id
FROM test.tag_hints
WHERE tag = 'spam'
SELECT ms_range_id
FROM test.tag_hints
WHERE tag = 'my_output_special_tag'
AND table_kind = 'ext_output'
INSERT INTO test.tag_hints (tag, ms_range_id)
VALUES ('spam', ?);
INSERT INTO test.tag_hints (tag, ms_range_id)
VALUES ('not_a_spam', ?);
```
### Addresses Hints
```sql=
CREATE TABLE IF NOT EXISTS test.addresses_hints (
address text,
output_kind text,
variant text,
ms_range_id int,
PRIMARY KEY (address, output_kind, variant, ms_range_id)
) WITH CLUSTERING ORDER BY (ms_range_id DESC);
```
```sql
SELECT ms_range_id
FROM test.addresses_hints
WHERE address = 'some_address'
AND output_kind in ('extended', 'legacy', 'nft', 'foundry', 'alias')
AND variant in ('address', 'sender', 'issuer', ..);
INSERT INTO test.addresses_hints (address, output_kind, variant ms_range_id)
VALUES ('address_a', ?, ?);
INSERT INTO test.addresses_hints (address, output_kind, variant, ms_range_id)
VALUES ('address_b', ?, ?);
```
## Outputs
Outputs are represented by optional indexation tables. They can also be looked up in the `transactions` table by `output_id`, but the below tables and views enable an extended API.
All of the following tables are structured similarly, containing indexation columns which define MVs to query with. Thus, all queries are approximately identical. Below are some examples.
#### Selects
```sql
SELECT *
FROM test.basic_outputs_by_address
WHERE address = ?
AND ms_range_id = ?;
SELECT *
FROM test.basic_outputs_by_address
WHERE address = ?
AND ms_range_id = ?
AND ms_timestamp >= ?
AND ms_timestamp < ?;
```
#### Inserts
```sql
INSERT INTO test.basic_outputs (
output_id, ms_range_id, ms_timestamp, milestone_index)
VALUES (?, 2147483647, ?, ?)
USING TTL 86400;
INSERT INTO test.basic_outputs (
output_id, ms_range_id, ms_timestamp, milestone_index, amount,
inclusion_state, address, tag, sender, sender_type)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?);
```
### Basic Outputs
```sql=
CREATE TABLE IF NOT EXISTS test.basic_outputs (
output_id text,
ms_range_id int,
milestone_index int,
ms_timestamp timestamp,
inclusion_state tinyint,
address text,
sender text,
tag text,
data blob,
PRIMARY KEY (output_id, ms_range_id, milestone_index, ms_timestamp)
) WITH CLUSTERING ORDER BY (ms_range_id DESC, milestone_index DESC, ms_timestamp DESC);
CREATE MATERIALIZED VIEW test.basic_outputs_by_address AS
SELECT * from test.basic_outputs
WHERE output_id IS NOT NULL
AND address IS NOT NULL
AND ms_range_id IS NOT NULL
AND ms_timestamp IS NOT NULL
AND milestone_index IS NOT NULL
PRIMARY KEY ((address, ms_range_id), ms_timestamp, milestone_index, output_id)
WITH CLUSTERING ORDER BY (ms_timestamp DESC, milestone_index DESC);
CREATE MATERIALIZED VIEW test.basic_outputs_by_tag AS
SELECT * from test.basic_outputs
WHERE output_id IS NOT NULL
AND tag IS NOT NULL
AND ms_range_id IS NOT NULL
AND ms_timestamp IS NOT NULL
AND milestone_index IS NOT NULL
PRIMARY KEY ((tag, ms_range_id), ms_timestamp, milestone_index, output_id)
WITH CLUSTERING ORDER BY (ms_timestamp DESC, milestone_index DESC);
CREATE MATERIALIZED VIEW test.basic_outputs_by_sender AS
SELECT * from test.basic_outputs
WHERE output_id IS NOT NULL
AND sender IS NOT NULL
AND ms_range_id IS NOT NULL
AND ms_timestamp IS NOT NULL
AND milestone_index IS NOT NULL
PRIMARY KEY ((sender, ms_range_id), ms_timestamp, milestone_index, output_id)
WITH CLUSTERING ORDER BY (ms_timestamp DESC, milestone_index DESC);
```
### Alias Outputs
```sql=
CREATE TABLE IF NOT EXISTS test.alias_outputs (
alias_id text,
ms_range_id int,
milestone_index int,
ms_timestamp timestamp,
inclusion_state tinyint,
sender text,
issuer text,
state_controller text,
governor text,
data blob,
PRIMARY KEY (alias_id, ms_range_id, milestone_index, ms_timestamp)
) WITH CLUSTERING ORDER BY (ms_range_id DESC, milestone_index DESC, ms_timestamp DESC);
CREATE MATERIALIZED VIEW test.alias_outputs_by_sender AS
SELECT * from test.alias_outputs
WHERE alias_id IS NOT NULL
AND sender IS NOT NULL
AND ms_range_id IS NOT NULL
AND ms_timestamp IS NOT NULL
AND milestone_index IS NOT NULL
PRIMARY KEY ((sender, ms_range_id), ms_timestamp, milestone_index, alias_id)
WITH CLUSTERING ORDER BY (ms_timestamp DESC, milestone_index DESC);
CREATE MATERIALIZED VIEW test.alias_outputs_by_issuer AS
SELECT * from test.alias_outputs
WHERE alias_id IS NOT NULL
AND issuer IS NOT NULL
AND ms_range_id IS NOT NULL
AND ms_timestamp IS NOT NULL
AND milestone_index IS NOT NULL
PRIMARY KEY ((issuer, ms_range_id), ms_timestamp, milestone_index, alias_id)
WITH CLUSTERING ORDER BY (ms_timestamp DESC, milestone_index DESC);
CREATE MATERIALIZED VIEW test.alias_outputs_by_state_controller AS
SELECT * from test.alias_outputs
WHERE alias_id IS NOT NULL
AND state_controller IS NOT NULL
AND ms_range_id IS NOT NULL
AND ms_timestamp IS NOT NULL
AND milestone_index IS NOT NULL
PRIMARY KEY ((state_controller, ms_range_id), ms_timestamp, milestone_index, alias_id)
WITH CLUSTERING ORDER BY (ms_timestamp DESC, milestone_index DESC);
CREATE MATERIALIZED VIEW test.alias_outputs_by_governor AS
SELECT * from test.alias_outputs
WHERE alias_id IS NOT NULL
AND governor IS NOT NULL
AND ms_range_id IS NOT NULL
AND ms_timestamp IS NOT NULL
AND milestone_index IS NOT NULL
PRIMARY KEY ((governor, ms_range_id), ms_timestamp, milestone_index, alias_id)
WITH CLUSTERING ORDER BY (ms_timestamp DESC, milestone_index DESC);
```
### Foundry Outputs
```sql=
CREATE TABLE IF NOT EXISTS test.foundry_outputs (
foundry_id text,
ms_range_id int,
milestone_index int,
ms_timestamp timestamp,
inclusion_state tinyint,
address text,
data blob,
PRIMARY KEY (foundry_id, ms_range_id, milestone_index, ms_timestamp)
) WITH CLUSTERING ORDER BY (ms_range_id DESC, milestone_index DESC, ms_timestamp DESC);
CREATE MATERIALIZED VIEW test.foundry_outputs_by_address AS
SELECT * from test.foundry_outputs
WHERE foundry_id IS NOT NULL
AND address IS NOT NULL
AND ms_range_id IS NOT NULL
AND ms_timestamp IS NOT NULL
AND milestone_index IS NOT NULL
PRIMARY KEY ((address, ms_range_id), ms_timestamp, milestone_index, foundry_id)
WITH CLUSTERING ORDER BY (ms_timestamp DESC, milestone_index DESC);
```
### NFT Outputs
```sql=
CREATE TABLE IF NOT EXISTS test.nft_outputs (
nft_id text,
ms_range_id int,
milestone_index int,
ms_timestamp timestamp,
inclusion_state tinyint,
address text,
dust_return_address text,
sender text,
issuer text,
tag text,
data blob,
PRIMARY KEY (nft_id, ms_range_id, milestone_index, ms_timestamp)
) WITH CLUSTERING ORDER BY (ms_range_id DESC, milestone_index DESC, ms_timestamp DESC);
CREATE MATERIALIZED VIEW test.nft_outputs_by_address AS
SELECT * from test.nft_outputs
WHERE nft_id IS NOT NULL
AND address IS NOT NULL
AND ms_range_id IS NOT NULL
AND ms_timestamp IS NOT NULL
AND milestone_index IS NOT NULL
PRIMARY KEY ((address, ms_range_id), ms_timestamp, milestone_index, nft_id)
WITH CLUSTERING ORDER BY (ms_timestamp DESC, milestone_index DESC);
CREATE MATERIALIZED VIEW test.nft_outputs_by_dust_return_address AS
SELECT * from test.nft_outputs
WHERE nft_id IS NOT NULL
AND dust_return_address IS NOT NULL
AND ms_range_id IS NOT NULL
AND ms_timestamp IS NOT NULL
AND milestone_index IS NOT NULL
PRIMARY KEY ((dust_return_address, ms_range_id), ms_timestamp, milestone_index, nft_id)
WITH CLUSTERING ORDER BY (ms_timestamp DESC, milestone_index DESC);
CREATE MATERIALIZED VIEW test.nft_outputs_by_sender AS
SELECT * from test.nft_outputs
WHERE nft_id IS NOT NULL
AND sender IS NOT NULL
AND ms_range_id IS NOT NULL
AND ms_timestamp IS NOT NULL
AND milestone_index IS NOT NULL
PRIMARY KEY ((sender, ms_range_id), ms_timestamp, milestone_index, nft_id)
WITH CLUSTERING ORDER BY (ms_timestamp DESC, milestone_index DESC);
CREATE MATERIALIZED VIEW test.nft_outputs_by_issuer AS
SELECT * from test.nft_outputs
WHERE nft_id IS NOT NULL
AND issuer IS NOT NULL
AND ms_range_id IS NOT NULL
AND ms_timestamp IS NOT NULL
AND milestone_index IS NOT NULL
PRIMARY KEY ((issuer, ms_range_id), ms_timestamp, milestone_index, nft_id)
WITH CLUSTERING ORDER BY (ms_timestamp DESC, milestone_index DESC);
CREATE MATERIALIZED VIEW test.nft_outputs_by_tag AS
SELECT * from test.nft_outputs
WHERE nft_id IS NOT NULL
AND tag IS NOT NULL
AND ms_range_id IS NOT NULL
AND ms_timestamp IS NOT NULL
AND milestone_index IS NOT NULL
PRIMARY KEY ((tag, ms_range_id), ms_timestamp, milestone_index, nft_id)
WITH CLUSTERING ORDER BY (ms_timestamp DESC, milestone_index DESC);
```
### Legacy Outputs
This table will hold outputs from before Stardust.
```sql=
CREATE TABLE IF NOT EXISTS test.legacy_outputs (
output_id text,
output_type tinyint,
is_used boolean,
ms_range_id int,
milestone_index int,
ms_timestamp timestamp,
inclusion_state tinyint,
message_id text,
amount bigint,
address text,
data blob,
PRIMARY KEY (output_id, output_type, is_used, ms_range_id, milestone_index, ms_timestamp)
) WITH CLUSTERING ORDER BY (output_type ASC, is_used ASC, ms_range_id DESC, milestone_index DESC, ms_timestamp DESC);
CREATE MATERIALIZED VIEW test.legacy_outputs_by_address AS
SELECT * from test.legacy_outputs
WHERE output_id IS NOT NULL
AND output_type IS NOT NULL
AND is_used IS NOT NULL
AND address IS NOT NULL
AND ms_range_id IS NOT NULL
AND ms_timestamp IS NOT NULL
AND milestone_index IS NOT NULL
PRIMARY KEY ((address, ms_range_id), output_type, is_used, ms_timestamp, milestone_index, output_id)
WITH CLUSTERING ORDER BY (output_type ASC, is_used ASC, ms_timestamp DESC, milestone_index DESC);
```
## Tags
```sql=
CREATE TABLE IF NOT EXISTS test.tags (
tag text,
ms_range_id int,
milestone_index int,
ms_timestamp timestamp,
message_id text,
inclusion_state blob,
PRIMARY KEY ((tag, ms_range_id), ms_timestamp, milestone_index)
) WITH CLUSTERING ORDER BY (ms_timestamp DESC, milestone_index DESC);
```
### Selects
```sql
SELECT milestone_index, message_id, inclusion_state
FROM test.tags
WHERE tag = ?
AND ms_range_id = ?;
SELECT milestone_index, message_id, inclusion_state
FROM test.tags
WHERE tag = ?
AND ms_range_id = ?
AND ms_timestamp >= ?
AND ms_timestamp < ?;
```
## Parents
```sql=
CREATE TABLE IF NOT EXISTS test.parents (
parent_id text,
message_id text,
est_milestone_index int,
milestone_index int,
ms_timestamp timestamp,
inclusion_state tinyint,
PRIMARY KEY (parent_id, message_id)
);
CREATE MATERIALIZED VIEW test.parents_by_ms AS
SELECT * from test.parents
WHERE parent_id IS NOT NULL
AND message_id IS NOT NULL
AND milestone_index IS NOT NULL
AND ms_timestamp IS NOT NULL
PRIMARY KEY (parent_id, ms_timestamp, message_id)
WITH CLUSTERING ORDER BY (ms_timestamp DESC);
```
### Selects
```sql
SELECT milestone_index, message_id, inclusion_state
FROM test.parents_by_ms
WHERE parent_id = ?;
SELECT milestone_index, message_id, inclusion_state
FROM test.parents_by_ms
WHERE parent_id = ?
AND ms_timestamp >= ?
AND ms_timestamp < ?;
```
### Deletes
```sql
DELETE FROM test.parents
WHERE parent_id = ?;
```
### Inserts
The first insert is for unreferenced messages, with an Estimated Milestone.
```sql
INSERT INTO test.parents (parent_id, message_id, milestone_index, ms_timestamp)
VALUES (?, ?, ?, ?);
INSERT INTO test.parents (parent_id, message_id, milestone_index, ms_timestamp, inclusion_state)
VALUES (?, ?, ?, ?, ?);
```
## Sync
```sql=
CREATE TABLE IF NOT EXISTS test.sync (
ms_range_id int,
milestone_index int,
synced_by tinyint,
logged_by tinyint,
PRIMARY KEY (ms_range_id, milestone_index)
) WITH CLUSTERING ORDER BY (milestone_index DESC);
```
### Selects
```sql
SELECT milestone_index, synced_by, logged_by
FROM test.sync;
SELECT milestone_index, synced_by, logged_by
FROM test.sync
WHERE ms_range_id = ?
AND milestone_index >= ?
AND milestone_index < ?;
```
## Milestone-based Analytics
```sql=
CREATE TABLE IF NOT EXISTS test.ms_analytics (
ms_range_id int,
milestone_index int,
message_count int,
transaction_count int,
transferred_tokens bigint,
PRIMARY KEY (ms_range_id, milestone_index)
) WITH CLUSTERING ORDER BY (milestone_index DESC);
```
### Selects
```sql
SELECT milestone_index, message_count, transaction_count, transferred_tokens
FROM test.analytics
WHERE ms_range_id = ?
AND milestone_index >= ?
AND milestone_index < ?;
```
## Daily Analytics
```sql=
CREATE TABLE IF NOT EXISTS test.daily_analytics (
year int,
date date,
total_addresses int,
send_addresses int,
recv_addresses int,
PRIMARY KEY (year, date)
) WITH CLUSTERING ORDER BY (date DESC);
```
```sql
SELECT date, total_addresses, send_addresses, recv_addresses
FROM test.daily_analytics
WHERE year = ?
AND date >= ?
AND date <= ?;
```
## Address Analytics
```sql=
CREATE TABLE IF NOT EXISTS test.address_analytics (
address text,
milestone_index int,
sent_tokens bigint,
recv_tokens bigint,
PRIMARY KEY (address, milestone_index)
) WITH CLUSTERING ORDER BY (milestone_index DESC);
```
### Selects
```sql
SELECT milestone_index, sent_tokens, recv_tokens
FROM test.address_analytics
WHERE address = ?
AND milestone_index >= ?
AND milestone_index < ?;
```
## Date Cache
This table is used to track the start and end milestones of each date.
```sql=
CREATE TABLE IF NOT EXISTS test.date_cache (
date date,
start_ms int,
end_ms int,
PRIMARY KEY (date)
);
```
### Selects
```sql
SELECT start_ms, end_ms
FROM test.date_cache
WHERE date = ?;
```
## Metric Cache
This table is used to store daily metrics which can be counted for the `daily_analytics` table.
```sql=
CREATE TABLE IF NOT EXISTS test.metrics_cache (
date date,
variant text,
metric text,
value text,
metric_value blob,
PRIMARY KEY (date, variant, metric, value)
);
```
### Selects
```sql
SELECT count(1)
FROM test.metrics_cache
WHERE date = ?
AND variant = 'address'
AND metric = 'sent';
SELECT count(1)
FROM test.metrics_cache
WHERE date = ?
AND variant = 'address'
AND metric = 'received';
SELECT count(1)
FROM test.metrics_cache
WHERE date = ?
AND variant = 'address'
AND metric = 'sent_or_received';
SELECT count(1)
FROM test.metrics_cache
WHERE date = ?
AND variant = 'index'
AND metric = 'used';
```
### Inserts
```sql
INSERT INTO test.metrics_cache (date, variant, value, metric)
VALUES (?, 'address', ?, 'sent');
INSERT INTO test.metrics_cache (date, variant, value, metric)
VALUES (?, 'address', ?, 'received');
INSERT INTO test.metrics_cache (date, variant, value, metric)
VALUES (?, 'address', ?, 'sent_or_received');
INSERT INTO test.metrics_cache (date, variant, value, metric)
VALUES (?, 'index', ?, 'used');
```
### Deletes
```sql
DELETE FROM test.metrics_cache
WHERE date = ?;
```
## Indexation Plugin
Allowing indexation into these tables is necessary to support some features of the protocol. The following routes are specified by hornet and can be used to index unspent outputs. Chronicle may support some of these endpoints for all outputs, and some for only unspent outputs.
### Outputs
- `/outputs`
##### Optional Parameters
- `address`
- `requiresDustReturn`
- `sender`
- `tag`
- `/outputs/<output_id>`
### Alias Outputs
- `/aliases`
##### Optional Parameters
- `stateController`
- `governer`
- `issuer`
- `sender`
- `/aliases/<alias_id>`
### NFT Outputs
- `/nft`
##### Optional Parameters
- `address`
- `requiresDustReturn`
- `issuer`
- `sender`
- `tag`
- `/nft/<nft_id>`
### Fountry Outputs
- `/foundries`
##### Optional Parameters
- `address`
- `/foundries/<foundry_id>`
## Expanding Current API
The current API could potentially be expanded to included less restrictive queries over data that is specifically unspent because of the relatively small size of this data. This will require that the schema is specifically designed to support indexing with additional query parameters. Likely the best way to support this is to store this data separately from the rest where predefined indexes will not affect the majority of data. This implies that API queries must be **either** over **all outputs** or **unspent outputs**, and not both.