Performance Improvements: go-bond

# Performance Improvements: go-bond ## 1. Bloom filters The **pebble** database does not have *Exist* function that makes every instance when we want to check for existence of a key very expensive as it needs to go to the file system and get the entry in question. The idea is to add configurable bloom filters to **go-bond**, so we eliminate this short comming of **pebble**. ### 1.1 Use cases #### 1.1.1 Exist function At the moment in order to check if element exists we need to execute *Get* on **pebble**. We can replace this call with bloom filter and execute *Get* only when bloom filter indicates that key exists within database. ##### Exist call steps - check bloom filter - if key exists - get key from db - return if exist - if key does not exist - return that key does not exist #### 1.1.2 Get function The *Get* function immidietly reaches to database. It does not matter if key exists or not. In order to increase it's efficiency we can first check bloom filter in order to see if element has been added to the database. If it doesn't we can immidietly return error. ##### Get call steps - check bloom filter - if key exist - get from db - if exists - return data - if does not exist - return error - if does not exist - return error #### 1.1.3 Set function The *Set* function is not implemented at the moment, however we can implement it in order to use it when we do not care if element already exist we just want to write it to the database. The bloom filter can be useful here in order to reduce number of times that we need to go to database to check if there is some old data and indexes need to be deleted. ##### Set call steps - check bloom filter - if key exist - get from db - if exists - compute index keys to delete from old version - insert into db - if does not exist - insert into db - if key does not exist - insert into db #### 1.1.4 Insert function The *Insert* function require us to check if element exists before we proceed with insert data into database. In this case bloom filters should reduce database access in order to confirm that record does not exist by ~90%. Only for false-positives we will make query to confirm or in case someone makes coding error and tries to add the same record twice. ##### Insert call steps - check bloom filter - if key exist - get from db - if exist - return error - if does not exist - insert into db - if key does not exist (90% of rows should end up here) - insert into db #### 1.1.5 Upsert function The *Upsert* function requires us to check if element already exist in the database in order to perform *OnConflict* operation and build keys that should be removed from the indexes. The bloom filter can be useful to reduce number of the times that we need to read database in order to get old versions of the records that were already in the database. ##### Upsert call steps - check bloom filter - if key exist - get from db - if exist - run *OnConflict* function - insert into db - if does not exist - insert into db - if key does not exist - insert into db ### 1.2 Proposed Library github.com/bits-and-blooms/bloom ### 1.2.1 Performance The tests has been made for bloom filter which max capacity is **500 mln** of keys and false-positive ratio is **10%**. - In memory size: **300 MB** - Write bloom filter into buffer with zstd compression - Number of keys in the bloom filter: **1 mln** - Size of the buffer: **10 MB** - Time to write and compress: **781.420382 ms** - Test if key exists with existing keys - Number of keys tested: **1 mln** - Time it took to test: **410.006182 ms** - Time it took to test 1 key: **410 ns** - Test if key exists with random keys - Number of keys tested: **1 mln** - Time it took to test: **459.939952 ms** - Time it took to test 1 key: **459 ns** ## 1.3 Implementation Plan - The bloom filters will be added to bond.Options. - In case bloom filter has not been defined bloom filters will not be used. - The bloom filters will only work on *data keys* as this are the only keys that are tested for existance of the *key* within the set. - The bloom filters will be updated in the memory and will be presisted at some yet to be defined intervals - The operations that produced updates to the bloom filter will be added to *bloom filter change list* thats saved on the same batch to the db. - The *bloom filter change list* will be cleared when bloom filter is presisted to db - On start-up of the database the bloom filter will be loaded from db and *bloom filter change list* will be applied and cleared. ### 1.3.1 Open questions - Should we offer separate bloom filters at DB level and Table level? - At what point we should regenerate bloom filter? After how many deletes? - Should bloom filter take into account index keys too? How would that impact it's size? - How often bloom filter should be updated in the database? ## 2. Indexer abstraction At the moment **go-bond** has only one Indexer implementation. The database Indexer abstraction could allow us and tird parties to implement different indexing methods build for the purpose. The indexing could be done async or sync depending on the user need.