Stardust Database Schema Proposal

# Stardust Database Schema Proposal ## Milestone Range ID The `ms_range_id` used in some of these tables is effectively a `partition_id` based on the `milestone_index`. It is simply the equal to `milestone_index / chunk_size` where `chunk_size` can be any predefined value for the table. This `chunk_size` may be configurable, but any changes to it require a migration of all database data that uses it, so it may be best to settle on appropriate values and disallow them to be modified. ## Messages ```sql= CREATE TABLE IF NOT EXISTS test.messages ( message_id text, version tinyint, message blob, est_milestone_index int, milestone_index int, inclusion_state tinyint, conflict_reason tinyint, proof blob, PRIMARY KEY (message_id, version, message) ); CREATE INDEX messages_by_milestone ON test.messages (milestone_index); ``` ### Selects ```sql SELECT message FROM test.messages WHERE message_id = ?; SELECT message_id, variant, parents, is_solid, milestone_index, inclusion_state, should_promote, should_reattach FROM test.messages WHERE message_id = ?; SELECT * FROM test.messages WHERE message_id = ?; ``` ## Transactions ```sql= CREATE TABLE IF NOT EXISTS test.transactions ( transaction_id text, idx smallint, variant text, message_id text, data blob, est_milestone_index int, milestone_index int, inclusion_state tinyint, PRIMARY KEY (transaction_id, idx, variant, message_id, data) ); ``` ### Selects ```sql SELECT message_id, data, inclusion_state FROM test.transactions WHERE transaction_id = ? AND idx = ? AND variant IN ('output', 'unlock'); SELECT message_id, data, idx, inclusion_state, milestone_index FROM test.transactions WHERE transaction_id = ?; ``` ### Materialized view This view is used specifically to improve the performance of the `included-message` API. ```sql= CREATE MATERIALIZED VIEW test.transactions_by_state AS SELECT transaction_id, idx, variant, message_id, inclusion_state FROM test.transactions WHERE transaction_id IS NOT NULL AND idx IS NOT NULL AND variant IS NOT NULL AND message_id IS NOT NULL AND inclusion_state IS NOT NULL AND data IS NOT NULL PRIMARY KEY (transaction_id, inclusion_state, idx, variant, message_id); ``` ```sql SELECT message_id FROM test.transactions_by_state WHERE transaction_id = ? AND inclusion_state = ? LIMIT 1; ``` ## Milestones ```sql= CREATE TABLE IF NOT EXISTS test.milestones ( milestone_index int, message_id text, timestamp bigint, payload blob, PRIMARY KEY (milestone_index, message_id) ); ``` ### Selects ```sql SELECT message_id, timestamp FROM test.milestones WHERE milestone_index = ?; ``` ## Partition Hints These tables enable selecting only from partitions with data. A row should be inserted into to these tables whenever indexed data is inserted. ### Tag Hints ```sql= CREATE TABLE IF NOT EXISTS test.tag_hints ( tag text, table_kind text, ms_range_id int, PRIMARY KEY (tag, ms_range_id) ) WITH CLUSTERING ORDER BY (ms_range_id DESC); ``` ```sql SELECT ms_range_id FROM test.tag_hints WHERE tag = 'spam' SELECT ms_range_id FROM test.tag_hints WHERE tag = 'my_output_special_tag' AND table_kind = 'ext_output' INSERT INTO test.tag_hints (tag, ms_range_id) VALUES ('spam', ?); INSERT INTO test.tag_hints (tag, ms_range_id) VALUES ('not_a_spam', ?); ``` ### Addresses Hints ```sql= CREATE TABLE IF NOT EXISTS test.addresses_hints ( address text, output_kind text, variant text, ms_range_id int, PRIMARY KEY (address, output_kind, variant, ms_range_id) ) WITH CLUSTERING ORDER BY (ms_range_id DESC); ``` ```sql SELECT ms_range_id FROM test.addresses_hints WHERE address = 'some_address' AND output_kind in ('extended', 'legacy', 'nft', 'foundry', 'alias') AND variant in ('address', 'sender', 'issuer', ..); INSERT INTO test.addresses_hints (address, output_kind, variant ms_range_id) VALUES ('address_a', ?, ?); INSERT INTO test.addresses_hints (address, output_kind, variant, ms_range_id) VALUES ('address_b', ?, ?); ``` ## Outputs Outputs are represented by optional indexation tables. They can also be looked up in the `transactions` table by `output_id`, but the below tables and views enable an extended API. All of the following tables are structured similarly, containing indexation columns which define MVs to query with. Thus, all queries are approximately identical. Below are some examples. #### Selects ```sql SELECT * FROM test.basic_outputs_by_address WHERE address = ? AND ms_range_id = ?; SELECT * FROM test.basic_outputs_by_address WHERE address = ? AND ms_range_id = ? AND ms_timestamp >= ? AND ms_timestamp < ?; ``` #### Inserts ```sql INSERT INTO test.basic_outputs ( output_id, ms_range_id, ms_timestamp, milestone_index) VALUES (?, 2147483647, ?, ?) USING TTL 86400; INSERT INTO test.basic_outputs ( output_id, ms_range_id, ms_timestamp, milestone_index, amount, inclusion_state, address, tag, sender, sender_type) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?); ``` ### Basic Outputs ```sql= CREATE TABLE IF NOT EXISTS test.basic_outputs ( output_id text, ms_range_id int, milestone_index int, ms_timestamp timestamp, inclusion_state tinyint, address text, sender text, tag text, data blob, PRIMARY KEY (output_id, ms_range_id, milestone_index, ms_timestamp) ) WITH CLUSTERING ORDER BY (ms_range_id DESC, milestone_index DESC, ms_timestamp DESC); CREATE MATERIALIZED VIEW test.basic_outputs_by_address AS SELECT * from test.basic_outputs WHERE output_id IS NOT NULL AND address IS NOT NULL AND ms_range_id IS NOT NULL AND ms_timestamp IS NOT NULL AND milestone_index IS NOT NULL PRIMARY KEY ((address, ms_range_id), ms_timestamp, milestone_index, output_id) WITH CLUSTERING ORDER BY (ms_timestamp DESC, milestone_index DESC); CREATE MATERIALIZED VIEW test.basic_outputs_by_tag AS SELECT * from test.basic_outputs WHERE output_id IS NOT NULL AND tag IS NOT NULL AND ms_range_id IS NOT NULL AND ms_timestamp IS NOT NULL AND milestone_index IS NOT NULL PRIMARY KEY ((tag, ms_range_id), ms_timestamp, milestone_index, output_id) WITH CLUSTERING ORDER BY (ms_timestamp DESC, milestone_index DESC); CREATE MATERIALIZED VIEW test.basic_outputs_by_sender AS SELECT * from test.basic_outputs WHERE output_id IS NOT NULL AND sender IS NOT NULL AND ms_range_id IS NOT NULL AND ms_timestamp IS NOT NULL AND milestone_index IS NOT NULL PRIMARY KEY ((sender, ms_range_id), ms_timestamp, milestone_index, output_id) WITH CLUSTERING ORDER BY (ms_timestamp DESC, milestone_index DESC); ``` ### Alias Outputs ```sql= CREATE TABLE IF NOT EXISTS test.alias_outputs ( alias_id text, ms_range_id int, milestone_index int, ms_timestamp timestamp, inclusion_state tinyint, sender text, issuer text, state_controller text, governor text, data blob, PRIMARY KEY (alias_id, ms_range_id, milestone_index, ms_timestamp) ) WITH CLUSTERING ORDER BY (ms_range_id DESC, milestone_index DESC, ms_timestamp DESC); CREATE MATERIALIZED VIEW test.alias_outputs_by_sender AS SELECT * from test.alias_outputs WHERE alias_id IS NOT NULL AND sender IS NOT NULL AND ms_range_id IS NOT NULL AND ms_timestamp IS NOT NULL AND milestone_index IS NOT NULL PRIMARY KEY ((sender, ms_range_id), ms_timestamp, milestone_index, alias_id) WITH CLUSTERING ORDER BY (ms_timestamp DESC, milestone_index DESC); CREATE MATERIALIZED VIEW test.alias_outputs_by_issuer AS SELECT * from test.alias_outputs WHERE alias_id IS NOT NULL AND issuer IS NOT NULL AND ms_range_id IS NOT NULL AND ms_timestamp IS NOT NULL AND milestone_index IS NOT NULL PRIMARY KEY ((issuer, ms_range_id), ms_timestamp, milestone_index, alias_id) WITH CLUSTERING ORDER BY (ms_timestamp DESC, milestone_index DESC); CREATE MATERIALIZED VIEW test.alias_outputs_by_state_controller AS SELECT * from test.alias_outputs WHERE alias_id IS NOT NULL AND state_controller IS NOT NULL AND ms_range_id IS NOT NULL AND ms_timestamp IS NOT NULL AND milestone_index IS NOT NULL PRIMARY KEY ((state_controller, ms_range_id), ms_timestamp, milestone_index, alias_id) WITH CLUSTERING ORDER BY (ms_timestamp DESC, milestone_index DESC); CREATE MATERIALIZED VIEW test.alias_outputs_by_governor AS SELECT * from test.alias_outputs WHERE alias_id IS NOT NULL AND governor IS NOT NULL AND ms_range_id IS NOT NULL AND ms_timestamp IS NOT NULL AND milestone_index IS NOT NULL PRIMARY KEY ((governor, ms_range_id), ms_timestamp, milestone_index, alias_id) WITH CLUSTERING ORDER BY (ms_timestamp DESC, milestone_index DESC); ``` ### Foundry Outputs ```sql= CREATE TABLE IF NOT EXISTS test.foundry_outputs ( foundry_id text, ms_range_id int, milestone_index int, ms_timestamp timestamp, inclusion_state tinyint, address text, data blob, PRIMARY KEY (foundry_id, ms_range_id, milestone_index, ms_timestamp) ) WITH CLUSTERING ORDER BY (ms_range_id DESC, milestone_index DESC, ms_timestamp DESC); CREATE MATERIALIZED VIEW test.foundry_outputs_by_address AS SELECT * from test.foundry_outputs WHERE foundry_id IS NOT NULL AND address IS NOT NULL AND ms_range_id IS NOT NULL AND ms_timestamp IS NOT NULL AND milestone_index IS NOT NULL PRIMARY KEY ((address, ms_range_id), ms_timestamp, milestone_index, foundry_id) WITH CLUSTERING ORDER BY (ms_timestamp DESC, milestone_index DESC); ``` ### NFT Outputs ```sql= CREATE TABLE IF NOT EXISTS test.nft_outputs ( nft_id text, ms_range_id int, milestone_index int, ms_timestamp timestamp, inclusion_state tinyint, address text, dust_return_address text, sender text, issuer text, tag text, data blob, PRIMARY KEY (nft_id, ms_range_id, milestone_index, ms_timestamp) ) WITH CLUSTERING ORDER BY (ms_range_id DESC, milestone_index DESC, ms_timestamp DESC); CREATE MATERIALIZED VIEW test.nft_outputs_by_address AS SELECT * from test.nft_outputs WHERE nft_id IS NOT NULL AND address IS NOT NULL AND ms_range_id IS NOT NULL AND ms_timestamp IS NOT NULL AND milestone_index IS NOT NULL PRIMARY KEY ((address, ms_range_id), ms_timestamp, milestone_index, nft_id) WITH CLUSTERING ORDER BY (ms_timestamp DESC, milestone_index DESC); CREATE MATERIALIZED VIEW test.nft_outputs_by_dust_return_address AS SELECT * from test.nft_outputs WHERE nft_id IS NOT NULL AND dust_return_address IS NOT NULL AND ms_range_id IS NOT NULL AND ms_timestamp IS NOT NULL AND milestone_index IS NOT NULL PRIMARY KEY ((dust_return_address, ms_range_id), ms_timestamp, milestone_index, nft_id) WITH CLUSTERING ORDER BY (ms_timestamp DESC, milestone_index DESC); CREATE MATERIALIZED VIEW test.nft_outputs_by_sender AS SELECT * from test.nft_outputs WHERE nft_id IS NOT NULL AND sender IS NOT NULL AND ms_range_id IS NOT NULL AND ms_timestamp IS NOT NULL AND milestone_index IS NOT NULL PRIMARY KEY ((sender, ms_range_id), ms_timestamp, milestone_index, nft_id) WITH CLUSTERING ORDER BY (ms_timestamp DESC, milestone_index DESC); CREATE MATERIALIZED VIEW test.nft_outputs_by_issuer AS SELECT * from test.nft_outputs WHERE nft_id IS NOT NULL AND issuer IS NOT NULL AND ms_range_id IS NOT NULL AND ms_timestamp IS NOT NULL AND milestone_index IS NOT NULL PRIMARY KEY ((issuer, ms_range_id), ms_timestamp, milestone_index, nft_id) WITH CLUSTERING ORDER BY (ms_timestamp DESC, milestone_index DESC); CREATE MATERIALIZED VIEW test.nft_outputs_by_tag AS SELECT * from test.nft_outputs WHERE nft_id IS NOT NULL AND tag IS NOT NULL AND ms_range_id IS NOT NULL AND ms_timestamp IS NOT NULL AND milestone_index IS NOT NULL PRIMARY KEY ((tag, ms_range_id), ms_timestamp, milestone_index, nft_id) WITH CLUSTERING ORDER BY (ms_timestamp DESC, milestone_index DESC); ``` ### Legacy Outputs This table will hold outputs from before Stardust. ```sql= CREATE TABLE IF NOT EXISTS test.legacy_outputs ( output_id text, output_type tinyint, is_used boolean, ms_range_id int, milestone_index int, ms_timestamp timestamp, inclusion_state tinyint, message_id text, amount bigint, address text, data blob, PRIMARY KEY (output_id, output_type, is_used, ms_range_id, milestone_index, ms_timestamp) ) WITH CLUSTERING ORDER BY (output_type ASC, is_used ASC, ms_range_id DESC, milestone_index DESC, ms_timestamp DESC); CREATE MATERIALIZED VIEW test.legacy_outputs_by_address AS SELECT * from test.legacy_outputs WHERE output_id IS NOT NULL AND output_type IS NOT NULL AND is_used IS NOT NULL AND address IS NOT NULL AND ms_range_id IS NOT NULL AND ms_timestamp IS NOT NULL AND milestone_index IS NOT NULL PRIMARY KEY ((address, ms_range_id), output_type, is_used, ms_timestamp, milestone_index, output_id) WITH CLUSTERING ORDER BY (output_type ASC, is_used ASC, ms_timestamp DESC, milestone_index DESC); ``` ## Tags ```sql= CREATE TABLE IF NOT EXISTS test.tags ( tag text, ms_range_id int, milestone_index int, ms_timestamp timestamp, message_id text, inclusion_state blob, PRIMARY KEY ((tag, ms_range_id), ms_timestamp, milestone_index) ) WITH CLUSTERING ORDER BY (ms_timestamp DESC, milestone_index DESC); ``` ### Selects ```sql SELECT milestone_index, message_id, inclusion_state FROM test.tags WHERE tag = ? AND ms_range_id = ?; SELECT milestone_index, message_id, inclusion_state FROM test.tags WHERE tag = ? AND ms_range_id = ? AND ms_timestamp >= ? AND ms_timestamp < ?; ``` ## Parents ```sql= CREATE TABLE IF NOT EXISTS test.parents ( parent_id text, message_id text, est_milestone_index int, milestone_index int, ms_timestamp timestamp, inclusion_state tinyint, PRIMARY KEY (parent_id, message_id) ); CREATE MATERIALIZED VIEW test.parents_by_ms AS SELECT * from test.parents WHERE parent_id IS NOT NULL AND message_id IS NOT NULL AND milestone_index IS NOT NULL AND ms_timestamp IS NOT NULL PRIMARY KEY (parent_id, ms_timestamp, message_id) WITH CLUSTERING ORDER BY (ms_timestamp DESC); ``` ### Selects ```sql SELECT milestone_index, message_id, inclusion_state FROM test.parents_by_ms WHERE parent_id = ?; SELECT milestone_index, message_id, inclusion_state FROM test.parents_by_ms WHERE parent_id = ? AND ms_timestamp >= ? AND ms_timestamp < ?; ``` ### Deletes ```sql DELETE FROM test.parents WHERE parent_id = ?; ``` ### Inserts The first insert is for unreferenced messages, with an Estimated Milestone. ```sql INSERT INTO test.parents (parent_id, message_id, milestone_index, ms_timestamp) VALUES (?, ?, ?, ?); INSERT INTO test.parents (parent_id, message_id, milestone_index, ms_timestamp, inclusion_state) VALUES (?, ?, ?, ?, ?); ``` ## Sync ```sql= CREATE TABLE IF NOT EXISTS test.sync ( ms_range_id int, milestone_index int, synced_by tinyint, logged_by tinyint, PRIMARY KEY (ms_range_id, milestone_index) ) WITH CLUSTERING ORDER BY (milestone_index DESC); ``` ### Selects ```sql SELECT milestone_index, synced_by, logged_by FROM test.sync; SELECT milestone_index, synced_by, logged_by FROM test.sync WHERE ms_range_id = ? AND milestone_index >= ? AND milestone_index < ?; ``` ## Milestone-based Analytics ```sql= CREATE TABLE IF NOT EXISTS test.ms_analytics ( ms_range_id int, milestone_index int, message_count int, transaction_count int, transferred_tokens bigint, PRIMARY KEY (ms_range_id, milestone_index) ) WITH CLUSTERING ORDER BY (milestone_index DESC); ``` ### Selects ```sql SELECT milestone_index, message_count, transaction_count, transferred_tokens FROM test.analytics WHERE ms_range_id = ? AND milestone_index >= ? AND milestone_index < ?; ``` ## Daily Analytics ```sql= CREATE TABLE IF NOT EXISTS test.daily_analytics ( year int, date date, total_addresses int, send_addresses int, recv_addresses int, PRIMARY KEY (year, date) ) WITH CLUSTERING ORDER BY (date DESC); ``` ```sql SELECT date, total_addresses, send_addresses, recv_addresses FROM test.daily_analytics WHERE year = ? AND date >= ? AND date <= ?; ``` ## Address Analytics ```sql= CREATE TABLE IF NOT EXISTS test.address_analytics ( address text, milestone_index int, sent_tokens bigint, recv_tokens bigint, PRIMARY KEY (address, milestone_index) ) WITH CLUSTERING ORDER BY (milestone_index DESC); ``` ### Selects ```sql SELECT milestone_index, sent_tokens, recv_tokens FROM test.address_analytics WHERE address = ? AND milestone_index >= ? AND milestone_index < ?; ``` ## Date Cache This table is used to track the start and end milestones of each date. ```sql= CREATE TABLE IF NOT EXISTS test.date_cache ( date date, start_ms int, end_ms int, PRIMARY KEY (date) ); ``` ### Selects ```sql SELECT start_ms, end_ms FROM test.date_cache WHERE date = ?; ``` ## Metric Cache This table is used to store daily metrics which can be counted for the `daily_analytics` table. ```sql= CREATE TABLE IF NOT EXISTS test.metrics_cache ( date date, variant text, metric text, value text, metric_value blob, PRIMARY KEY (date, variant, metric, value) ); ``` ### Selects ```sql SELECT count(1) FROM test.metrics_cache WHERE date = ? AND variant = 'address' AND metric = 'sent'; SELECT count(1) FROM test.metrics_cache WHERE date = ? AND variant = 'address' AND metric = 'received'; SELECT count(1) FROM test.metrics_cache WHERE date = ? AND variant = 'address' AND metric = 'sent_or_received'; SELECT count(1) FROM test.metrics_cache WHERE date = ? AND variant = 'index' AND metric = 'used'; ``` ### Inserts ```sql INSERT INTO test.metrics_cache (date, variant, value, metric) VALUES (?, 'address', ?, 'sent'); INSERT INTO test.metrics_cache (date, variant, value, metric) VALUES (?, 'address', ?, 'received'); INSERT INTO test.metrics_cache (date, variant, value, metric) VALUES (?, 'address', ?, 'sent_or_received'); INSERT INTO test.metrics_cache (date, variant, value, metric) VALUES (?, 'index', ?, 'used'); ``` ### Deletes ```sql DELETE FROM test.metrics_cache WHERE date = ?; ``` ## Indexation Plugin Allowing indexation into these tables is necessary to support some features of the protocol. The following routes are specified by hornet and can be used to index unspent outputs. Chronicle may support some of these endpoints for all outputs, and some for only unspent outputs. ### Outputs - `/outputs` ##### Optional Parameters - `address` - `requiresDustReturn` - `sender` - `tag` - `/outputs/<output_id>` ### Alias Outputs - `/aliases` ##### Optional Parameters - `stateController` - `governer` - `issuer` - `sender` - `/aliases/<alias_id>` ### NFT Outputs - `/nft` ##### Optional Parameters - `address` - `requiresDustReturn` - `issuer` - `sender` - `tag` - `/nft/<nft_id>` ### Fountry Outputs - `/foundries` ##### Optional Parameters - `address` - `/foundries/<foundry_id>` ## Expanding Current API The current API could potentially be expanded to included less restrictive queries over data that is specifically unspent because of the relatively small size of this data. This will require that the schema is specifically designed to support indexing with additional query parameters. Likely the best way to support this is to store this data separately from the rest where predefined indexes will not affect the majority of data. This implies that API queries must be **either** over **all outputs** or **unspent outputs**, and not both.