# Slashing protection database
A review of https://github.com/sigp/lighthouse/blob/stable/validator_client/slashing_protection/src/slashing_database.rs.
This note examines the performance of its queries.
```sql
.timer on
.eqp on
```
I focus only on `signed_attestations` because the other tables are not nearly as bloated.
```
> SELECT SUM("pgsize") FROM "dbstat" where name='signed_attestations';
2371887104
```
That is, `signed_attestations` alone had grown to ~2.2 GiB on one of our validators.
## Table signed_attestations
```
CREATE TABLE signed_attestations (
validator_id INTEGER,
source_epoch INTEGER NOT NULL,
target_epoch INTEGER NOT NULL,
signing_root BLOB NOT NULL,
FOREIGN KEY(validator_id) REFERENCES validators(id),
UNIQUE (validator_id, target_epoch)
);
```
The unique constraint on `(validator_id, target_epoch)` implicitly creates a unique btree index on the same columns.
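As a sanity check, the implicit index can be observed in `sqlite_master`. A minimal sketch using Python's built-in `sqlite3` module (the `validators` table is reduced to the single column this note needs):

```python
import sqlite3

# Recreate the schema in memory; the UNIQUE constraint below is what
# produces the implicit sqlite_autoindex_* index.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE validators (id INTEGER PRIMARY KEY);
CREATE TABLE signed_attestations (
    validator_id INTEGER,
    source_epoch INTEGER NOT NULL,
    target_epoch INTEGER NOT NULL,
    signing_root BLOB NOT NULL,
    FOREIGN KEY(validator_id) REFERENCES validators(id),
    UNIQUE (validator_id, target_epoch)
);
""")

# Implicit indexes are listed in sqlite_master like any other index.
rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'").fetchall()
print(rows)  # [('sqlite_autoindex_signed_attestations_1',)]
```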
### same_target_att
==index==
Checks for a double vote: an existing attestation with the same target epoch but a different signing root.
```sql
SELECT source_epoch, target_epoch, signing_root
FROM signed_attestations
WHERE validator_id = ?1 AND target_epoch = ?2
```
This query is fast because it uses the index:
```sql
sqlite> SELECT validator_id, source_epoch, target_epoch FROM signed_attestations where validator_id = 19 and target_epoch = 20474;
QUERY PLAN
`--SEARCH TABLE signed_attestations USING INDEX sqlite_autoindex_signed_attestations_1 (validator_id=? AND target_epoch=?)
19|20473|20474
Run Time: real 0.007 user 0.000166 sys 0.000418
```
### surrounding_attestation
==index== ==good selectivity==
Checks that no previous vote is surrounding `attestation`.
If there is a surrounding attestation, we only return the most recent one.
```sql
SELECT source_epoch, target_epoch, signing_root
FROM signed_attestations
WHERE validator_id = ?1 AND source_epoch < ?2 AND target_epoch > ?3
ORDER BY target_epoch DESC
LIMIT 1
```
It uses the index, and the condition `target_epoch > ?3` provides good selectivity.
```sql
sqlite> SELECT source_epoch, target_epoch FROM signed_attestations where validator_id = 19 and source_epoch < 20473 and target_epoch > 20474 ORDER BY target_epoch DESC LIMIT 1;
QUERY PLAN
`--SEARCH TABLE signed_attestations USING INDEX sqlite_autoindex_signed_attestations_1 (validator_id=? AND target_epoch>?)
Run Time: real 0.009 user 0.000448 sys 0.001521
```
### surrounded_attestation
==index== ==bad selectivity==
Checks that no previous vote is surrounded by `attestation`.
If there is a surrounded attestation, we only return the most recent one.
```sql
SELECT source_epoch, target_epoch, signing_root
FROM signed_attestations
WHERE validator_id = ?1 AND source_epoch > ?2 AND target_epoch < ?3
ORDER BY target_epoch DESC
LIMIT 1
```
Uses the same index as `surrounding_attestation`, but with bad selectivity in this case: the condition `target_epoch < ?3` filters out almost nothing, so SQLite has to check `source_epoch > ?2` against every remaining record without any index support, i.e. a full scan.
```sql
sqlite> SELECT source_epoch, target_epoch FROM signed_attestations WHERE validator_id = 19 AND source_epoch > 20473 AND target_epoch < 20474 ORDER BY target_epoch DESC LIMIT 1;
QUERY PLAN
`--SEARCH TABLE signed_attestations USING INDEX sqlite_autoindex_signed_attestations_1 (validator_id=? AND target_epoch<?)
Run Time: real 1.426 user 0.061270 sys 0.260175
```
You can see that `target_epoch < 20474` matches essentially every record for this validator:
```sql
sqlite> SELECT count(*) FROM signed_attestations WHERE validator_id = 19 AND target_epoch < 20474;
20474
sqlite> SELECT count(*) FROM signed_attestations WHERE validator_id = 19;
20476
```
That means SQLite has to load all 20,474 records and filter them by `source_epoch` one by one, without any index support for that condition.
Full plan:
```
addr opcode p1 p2 p3 p4 p5 comment
---- ------------- ---- ---- ---- ------------- -- -------------
0 Init 0 19 0 0 Start at 19
1 Noop 1 4 0 0
2 Integer 1 1 0 0 r[1]=1; LIMIT counter
3 OpenRead 0 6 0 3 0 root=6 iDb=0; signed_attestations
4 OpenRead 2 7 0 k(3,,,) 0 root=7 iDb=0; sqlite_autoindex_signed_attestations_1
5 Explain 5 0 0 SEARCH TABLE signed_attestations USING INDEX sqlite_autoindex_signed_attestations_1 (validator_id=? AND target_epoch<?) 0
6 Integer 19 2 0 0 r[2]=19
7 Integer 20474 3 0 0 r[3]=20474
8 SeekLT 2 18 2 2 0 key=r[2..3]
9 IdxLT 2 18 2 1 0 key=r[2]
10 DeferredSeek 2 0 0 0 Move 0 to 2.rowid if needed
11 Column 0 1 4 0 r[4]=signed_attestations.source_epoch
12 Le 5 17 4 BINARY-8 84 if r[4]<=r[5] goto 17
13 Column 0 1 6 0 r[6]=signed_attestations.source_epoch
14 Column 2 1 7 0 r[7]=signed_attestations.target_epoch
15 ResultRow 6 2 0 0 output=r[6..7]
16 DecrJumpZero 1 18 0 0 if (--r[1])==0 goto 18
17 Prev 2 9 0 0
18 Halt 0 0 0 0
19 Transaction 0 0 3 0 1 usesStmtJournal=0
20 Integer 20473 5 0 0 r[5]=20473
21 Goto 0 1 0 0
```
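The plan shape is easy to reproduce on synthetic data with Python's built-in `sqlite3` module (the rows are made up; only the plan matters):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE signed_attestations (
    validator_id INTEGER,
    source_epoch INTEGER NOT NULL,
    target_epoch INTEGER NOT NULL,
    signing_root BLOB NOT NULL,
    UNIQUE (validator_id, target_epoch)
)""")
# One synthetic attestation per epoch for a single validator.
conn.executemany(
    "INSERT INTO signed_attestations VALUES (19, ?, ?, x'00')",
    [(e - 1, e) for e in range(1, 20001)],
)

# Only validator_id=? and target_epoch<? are bound to the index;
# source_epoch>? has to be checked row by row.
detail = conn.execute("""
EXPLAIN QUERY PLAN
SELECT source_epoch, target_epoch FROM signed_attestations
WHERE validator_id = ? AND source_epoch > ? AND target_epoch < ?
ORDER BY target_epoch DESC LIMIT 1
""", (19, 20473, 20474)).fetchall()[0][3]
print(detail)
```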
### min_source
==no index== ==full scan==
Checks lower bounds: ensure that source is greater than or equal to min source, and target is greater than min target. This allows pruning, and compatibility with the interchange format.
```sql
SELECT MIN(source_epoch) FROM signed_attestations WHERE validator_id = ?1
```
The same story as with `surrounded_attestation`: the index only narrows the search down to a single validator, and `MIN(source_epoch)` still has to scan all of its records.
```sql
sqlite> SELECT MIN(source_epoch) FROM signed_attestations WHERE validator_id = 19;
QUERY PLAN
`--SEARCH TABLE signed_attestations USING INDEX sqlite_autoindex_signed_attestations_1 (validator_id=?)
0
Run Time: real 1.352 user 0.062109 sys 0.253005
```
Basically, it goes over all 20476 records and finds the minimum.
Full plan:
```
addr opcode p1 p2 p3 p4 p5 comment
---- ------------- ---- ---- ---- ------------- -- -------------
0 Init 0 17 0 0 Start at 17
1 Null 0 1 2 0 r[1..2]=NULL
2 OpenRead 0 6 0 2 0 root=6 iDb=0; signed_attestations
3 OpenRead 1 7 0 k(3,,,) 0 root=7 iDb=0; sqlite_autoindex_signed_attestations_1
4 Explain 4 0 0 SEARCH TABLE signed_attestations USING INDEX sqlite_autoindex_signed_attestations_1 (validator_id=?) 0
5 Integer 19 3 0 0 r[3]=19
6 SeekGE 1 13 3 1 0 key=r[3]
7 IdxGT 1 13 3 1 0 key=r[3]
8 DeferredSeek 1 0 0 0 Move 0 to 1.rowid if needed
9 Column 0 1 4 0 r[4]=signed_attestations.source_epoch
10 CollSeq 0 0 0 BINARY-8 0
11 AggStep 0 4 1 min(1) 1 accum=r[1] step(r[4])
12 Next 1 7 0 0
13 AggFinal 1 1 0 min(1) 0 accum=r[1] N=1
14 Copy 1 5 0 0 r[5]=r[1]
15 ResultRow 5 1 0 0 output=r[5]
16 Halt 0 0 0 0
17 Transaction 0 0 3 0 1 usesStmtJournal=0
18 Goto 0 1 0 0
```
### min_target
==index== ==good selectivity==
Checks lower bounds: ensure that source is greater than or equal to min source, and target is greater than min target. This allows pruning, and compatibility with the interchange format.
```sql
SELECT MIN(target_epoch) FROM signed_attestations WHERE validator_id = ?1
```
A btree is an n-ary tree, so finding its minimum or maximum is essentially taking the leftmost or the rightmost leaf (https://www.sqlite.org/optoverview.html#the_min_max_optimization).
That makes this query super fast: it effectively loads a single record from the index.
```sql
sqlite> SELECT MIN(target_epoch) FROM signed_attestations WHERE validator_id = 19;
QUERY PLAN
`--SEARCH TABLE signed_attestations USING COVERING INDEX sqlite_autoindex_signed_attestations_1 (validator_id=?)
0
Run Time: real 0.006 user 0.000148 sys 0.000224
```
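The contrast with `min_source` is visible in the query plans alone; a sketch with Python's built-in `sqlite3` module on an empty synthetic table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE signed_attestations (
    validator_id INTEGER,
    source_epoch INTEGER NOT NULL,
    target_epoch INTEGER NOT NULL,
    UNIQUE (validator_id, target_epoch)
)""")

details = []
for col in ("target_epoch", "source_epoch"):
    plan = conn.execute(
        f"EXPLAIN QUERY PLAN SELECT MIN({col}) "
        "FROM signed_attestations WHERE validator_id = 19"
    ).fetchall()
    details.append(plan[0][3])

# MIN(target_epoch) is answered from the index alone (COVERING INDEX,
# the min/max optimization); MIN(source_epoch) has to visit every row.
print(details[0])
print(details[1])
```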
## Recommendations
### Cleaning
Remove old records, or recreate the slashing protection database from scratch; Lighthouse keeps everything since genesis.
```sql
DELETE FROM signed_attestations
WHERE source_epoch < "epoch-one-week-ago" AND
target_epoch < "epoch-one-week-ago";
```
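A sketch of how such pruning could be wired up with Python's built-in `sqlite3` module; `prune` and `cutoff_epoch` are hypothetical names (one week is roughly 604800 / (32 × 12) ≈ 1575 epochs):

```python
import sqlite3

def prune(conn: sqlite3.Connection, cutoff_epoch: int) -> int:
    """Delete attestations whose source and target are both below the cutoff."""
    cur = conn.execute(
        "DELETE FROM signed_attestations "
        "WHERE source_epoch < ? AND target_epoch < ?",
        (cutoff_epoch, cutoff_epoch),
    )
    conn.commit()
    return cur.rowcount

# Demo on synthetic data: one attestation per epoch for one validator.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE signed_attestations (
    validator_id INTEGER,
    source_epoch INTEGER NOT NULL,
    target_epoch INTEGER NOT NULL,
    signing_root BLOB NOT NULL,
    UNIQUE (validator_id, target_epoch))""")
conn.executemany(
    "INSERT INTO signed_attestations VALUES (19, ?, ?, x'00')",
    [(e - 1, e) for e in range(1, 20001)],
)
deleted = prune(conn, 20000 - 1575)
print(deleted)  # 18424
```

Note that a plain `DELETE` does not shrink the database file by itself; running `VACUUM` afterwards reclaims the space.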
### Indexing
Add an index on `(validator_id, source_epoch, target_epoch)`. This improves selectivity for the slow queries above.
```sql
CREATE INDEX signed_attestations_idx_1
ON signed_attestations (validator_id, source_epoch, target_epoch);
```
Creating it took about a minute on my database.
**But,** SQLite's query planner is not sophisticated enough to pick this new, better index for the slow queries on its own.
The queries should be modified with `INDEXED BY`, e.g.:
```sql
-- With INDEXED BY
sqlite> SELECT MIN(source_epoch) FROM signed_attestations indexed by signed_attestations_idx_1 WHERE validator_id = 54;
QUERY PLAN
`--SEARCH TABLE signed_attestations USING COVERING INDEX signed_attestations_idx_1
133
Run Time: real 0.008 user 0.006468 sys 0.000842
-- Without INDEXED BY
sqlite> SELECT MIN(source_epoch) FROM signed_attestations WHERE validator_id = 54;
QUERY PLAN
`--SEARCH TABLE signed_attestations USING INDEX sqlite_autoindex_signed_attestations_1 (validator_id=?)
133
Run Time: real 0.858 user 0.038519 sys 0.163293
```
**But,** if we fork Lighthouse, I would rework the whole database, because SQLite is a poor man's relational database (despite its long history of improvements). Even moving to PostgreSQL would be a big improvement.
### Locking
Lighthouse uses exclusive locking for transactions, i.e. write and read queries block each other. That is, while a single attestation is being checked (which, as shown above, may take a couple of seconds), all other queries have to wait. Instead, Lighthouse should permit concurrent read access.
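As a sketch of the SQLite mechanics only (not of Lighthouse's actual code): in WAL journal mode, readers proceed against a consistent snapshot while a writer holds the write lock. Whether WAL satisfies the slashing-protection consistency requirements is a separate question; the file name and reduced schema below are made up:

```python
import os
import sqlite3
import tempfile

# WAL mode needs an on-disk database, so use a temporary file.
path = os.path.join(tempfile.mkdtemp(), "protection.sqlite")

writer = sqlite3.connect(path, isolation_level=None)  # manual transaction control
writer.execute("PRAGMA journal_mode=WAL")
writer.execute("CREATE TABLE signed_attestations (validator_id INTEGER, target_epoch INTEGER)")

# Take the write lock, as a long slashing check would.
writer.execute("BEGIN IMMEDIATE")
writer.execute("INSERT INTO signed_attestations VALUES (19, 20474)")

# The reader neither blocks nor sees the uncommitted row.
reader = sqlite3.connect(path)
count = reader.execute("SELECT COUNT(*) FROM signed_attestations").fetchone()[0]
print(count)  # 0

writer.execute("COMMIT")
```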
### Caching
https://www.sqlite.org/pragma.html#pragma_cache_size
Thankfully, SQLite has a page cache. Unfortunately, it is small by default: 2,048,000 bytes, while the table in question takes 2,371,887,104 bytes (`SELECT SUM("pgsize") FROM "dbstat" WHERE name='signed_attestations';`).
To increase the cache size, we would have to modify Lighthouse, because PRAGMAs are per-connection and not persistent.
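For illustration, with Python's built-in `sqlite3` module (the 1 GiB figure is a made-up example; a negative `cache_size` is interpreted as KiB):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Default build: -2000, i.e. 2000 KiB = 2,048,000 bytes.
print(conn.execute("PRAGMA cache_size").fetchone()[0])

# Raise the page cache to 1 GiB for this connection only;
# the setting is not persisted in the database file.
conn.execute("PRAGMA cache_size = -1048576")
new_size = conn.execute("PRAGMA cache_size").fetchone()[0]
print(new_size)  # -1048576
```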