---
tags: operating systems
---

# 2020 OS Fall HW3: Key-Value Storage

contributed by < `Hao-Jung Chen (haojungc)` >

* [GitHub repo](https://github.com/haojungc/key-value-database)
* [2020_OS_Fall_HW3: Key-Value Storage (Chinese)](https://hackmd.io/@dsclab/rJDIP8YYv)

## Overview

This project is a simple key-value database that reads an input file line by line and writes the query results to an output file. Operations are separated by newline characters, and there are three kinds of operations, as follows:

### PUT

The **PUT** operation inserts or updates a key-value entry in the database, where the key is a 64-bit positive integer ($0$ ~ $2^{63} - 1$) and the value is a string consisting of 128 alphanumeric characters.

```
PUT <key> <value>
```

### GET

The **GET** operation queries the database with a key.

```
GET <key>
```

### SCAN

The **SCAN** operation performs a range query in the database with two keys, starting from key1 and ending at key2, inclusive.

```
SCAN <key1> <key2>
```

## Environment

* OS: Ubuntu 20.04.1 LTS
* Arch: x86-64
* CPU: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
* Memory: 8 GB
* Programming Language: C

## Database Architecture

### Data Types

* The following data types are defined in `definition.h`.
* `metadata_t`: stores the information of a data file
    ```c
    typedef struct {
        size_t file_number;
        uint64_t start_key;
        uint64_t end_key;
        size_t total_keys;
    } metadata_t;
    ```
* `data_t`: stores a key-value pair
    ```c
    typedef struct data {
        uint64_t key;
        char *value;
    } data_t;
    ```

### Metatable

* The metatable is an array of type `struct metadata` (or `metadata_t`, see [Data Types](#Data-Types)). It stores the information of existing storage files and is declared as:
    ```c
    #define MAX_METADATA 200
    metadata_t metatable[MAX_METADATA];
    ```

### Bloom Filter

* A [Bloom filter](https://en.wikipedia.org/wiki/Bloom_filter) is used to check whether a particular key might be in the database (it can report false positives, but never false negatives). The data type is defined as follows.
* in `bloomfilter.h`:
    ```c=
    #include <stdint.h>

    typedef struct bloomfilter {
        /* Used for loading/saving the bloom filter */
        char *state_file;

        /* Loads the bloom filter from filepath. */
        void (*load)(const char *filepath);

        /* Saves the current bloom filter. */
        void (*save)();

        /* Frees the memory allocated for the bloom filter. */
        void (*free)();

        /* Sets k entries in the bloom filter to 1.
         * k: the number of hash functions */
        void (*add)(const uint64_t key);

        /* Checks whether a given key might be in the database by looking up
         * the bloom filter.
         * Returns 0 if the key may be in the database (false positives are
         * possible), otherwise returns -1. */
        int_fast8_t (*lookup)(const uint64_t key);
    } bloomfilter_t;
    ```
* In order to reduce false positives of the Bloom filter, we need to know the maximum possible number of records in the test files.
    * We already know that the size of the test files is larger than the size of RAM.
    * Assume the size of each test file is not larger than $16 GB$ and the size of each record is $(8 + 128) = 136$ Bytes ($8$ for the key and $128$ for the value); then the number of records is $16 GB/136 B \approx 118 M \approx 10^8$.
* According to the formula, the false-positive probability is
$$
\varepsilon \approx (1 - e^{-kn/m})^k
$$
    * where $k$ is the number of hash functions, $n$ is the number of input elements, and $m$ is the filter size in bits.
    * $n$ is already assumed to be $10^8$; we also set the number of hash functions $k$ to $3$.
* If we want to reduce the false-positive rate to less than $0.005$, $m = 2^{31}$ is a good choice; the false-positive rate under this assumption is
$$
\varepsilon \approx (1 - e^{-3\times10^8/2^{31}})^3 = 0.002216 \approx 0.0022
$$
* Note: Although the optimal number of hash functions would be $k = (m/n) \times \ln 2 = (2^{31}/10^8) \times \ln 2 \approx 14.89$, that is far larger than the $k = 3$ we assumed before, and each extra hash function adds computation to every operation.
* As a result,
    * $n = 10^8$
    * $m = 2^{31}$
    * $k = 3$

### B+ tree

* A [B+ tree](https://en.wikipedia.org/wiki/B%2B_tree) is a self-balancing tree data structure that is well suited for storage systems that read and write relatively large blocks of data. The data type is defined as follows.
* in `bptree.h`:
    ```c=
    #include "definition.h"
    #include <stdbool.h>
    #include <stdint.h>

    typedef struct bptree {
        /* Loads records from file. */
        void (*load)(const char *filepath, const uint64_t total_keys);

        /* Saves the B+ tree to file and updates metadata. */
        void (*save)(metadata_t *metadata, const char *filepath);

        /* Splits the B+ tree into two parts and saves one of them to file,
         * according to key. */
        void (*split_and_save_one)(metadata_t *metadata, const char *filepath,
                                   const uint64_t key);

        /* Frees the memory allocated for the B+ tree. */
        void (*free_memory)();

        /* Inserts a record into the B+ tree. */
        void (*insert)(const uint64_t key, char *value);

        /* Searches key in the B+ tree. */
        const char *(*search)(const uint64_t key);

        /* Scans from start key to end key and assigns the address of the
         * value (if found) to the pointer array. Note that the pointer array
         * is initialized to NULL before passing it to this function. */
        void (*scan)(char *ptrs[], const uint64_t start_key,
                     const uint64_t end_key);

        /* Returns a non-zero value if the tree is empty, and 0 otherwise. */
        int_fast8_t (*is_empty)();

        /* Returns a non-zero value if the tree is full, and 0 otherwise. */
        int_fast8_t (*is_full)();

        /* Returns the minimum key in the tree. */
        uint64_t (*get_min_key)();

        /* Returns the maximum key in the tree. */
        uint64_t (*get_max_key)();
    } bptree_t;
    ```
* where `node_t` is defined as:
    ```c=
    #include <stdbool.h>
    #include <stdint.h>

    typedef struct node {
        bool is_leaf;
        uint64_t *keys;
        /* is_leaf == true: ptrs[i] points to the ith data;
         * is_leaf == false (internal node): ptrs[i] points to the ith child node */
        void **ptrs;
        int_fast8_t key_count;
        struct node *parent;
        /* is_leaf == true: next points to the next leaf;
         * is_leaf == false (internal node): next is a null pointer */
        struct node *next;
    } node_t;
    ```
* The order of the B+ tree (the maximum number of child nodes) is set to $5$.

### Database Initialization

1. Initialize the Bloom filter and load the previous one if available.
2. Load the previous metatable if available.
3. Initialize the B+ tree.
4. Allocate memory for the PUT buffer.

### Database Closure

1. Save the Bloom filter and free its memory.
2. Flush the buffer.
    * If the B+ tree is not empty, write it to a file and update the metatable.
    * Free the memory allocated for the B+ tree.
3. Save the metatable.
4. Free the memory allocated for the buffer.
5. Close the output file.

## Operations

* Read operation commands from the input file.
* The three possible operations are handled as follows.

### PUT

1. Add the key to the Bloom filter.
2. Store the key-value in a buffer until
    * the buffer is full, or
    * `EOF` is met, or
    * one of the **GET**/**SCAN** operations is called.
3. Sort the buffer in ascending order by key.
    * The sorting algorithm must be stable (e.g. [merge sort](https://en.wikipedia.org/wiki/Merge_sort)).
    * If the buffer is not sorted before inserting key-values into the B+ tree, the system has to do a lot of file swapping, which sorting the buffer in advance prevents.
4. Check whether the B+ tree is full.
    * If so, find the position of the key in the B+ tree (scan until a key greater than or equal to the target key is found).
    * Split the B+ tree into two parts, save one to file, and rebuild the tree with the other.
        * If the key belongs to the first quarter of the B+ tree, write the second half of the tree to file and rebuild a new B+ tree with the first part.
        * Otherwise, write the key-values whose keys are less than the target key to file and rebuild a new B+ tree with the remaining part.
    * Update the metatable.
5. Flush the buffer by inserting its entries into the B+ tree.
    * Before inserting each key-value into the B+ tree, check whether the key belongs to another existing file.
        * If so, save the current B+ tree to a file and load the B+ tree that the current key-value belongs to.
    * Update the metatable if files are swapped.

### GET

1. Look up the Bloom filter to check whether the key exists in the database.
    * If not, write `EMPTY` to the output file and read the next command.
2. If the buffer is not empty, sort the buffer and flush it (see the [PUT](#PUT) section).
3. Look up the metatable and read the corresponding file **without** swapping (the B+ tree is not rebuilt).
4. Search for the key in the file and return its corresponding value (`EMPTY` if not found).

### SCAN

1. If the buffer is not empty, sort the buffer and flush it (see the [PUT](#PUT) section).
2. Create a pointer array and set its entries to `NULL`.
3. Check whether the current key is between `min_key` and `max_key` of the B+ tree.
    * Swap files if the key belongs to a file that is not currently loaded.
4. Set the start key and end key, then call `bptree.scan()` to fill the pointer array.
5. Write values to the output file according to the pointer array:
    * `EMPTY` if the pointer is a null pointer;
    * otherwise, the value the pointer points to.

## Usage

```bash
# Compile
# note: all code is compiled with -O0
make

# Run
# filename format: *.input
./main <file>

# Remove executable files
make clean

# Remove database
rm -rf storage/
```

## Development Process

### Problem 1: binary search is too slow

#### Details

* In the original version of the PUT operation, I used a buffer to store the key-values (sorted in ascending order).
* When a key-value is PUT into the buffer, I use binary search to find the position where it should be inserted.
* After the position is found, the subsequent key-values in the buffer are shifted one index to the right, and the new key-value is stored in the freed slot.
    * `insert_after`: inserts a key-value at a certain index
    * `overwrite_buffer_entry`: overwrites the key-value at a certain index
    ```c=
    #include <string.h>

    static data_t buf[MAX_KEY]; /* value buffers are pre-allocated */
    static int key_count;

    static void overwrite_buffer_entry(const uint64_t key, const char *value,
                                       const int idx);

    static void insert_after(const uint64_t key, const char *value,
                             const int idx)
    {
        /* Recycle the value buffer of the slot just past the end */
        char *tmp = buf[key_count].value;
        /* very time-consuming */
        for (int i = key_count - 1; i > idx; i--) {
            buf[i + 1].key = buf[i].key;
            buf[i + 1].value = buf[i].value;
        }
        buf[idx + 1].value = tmp;
        overwrite_buffer_entry(key, value, idx + 1);
        key_count++;
    }

    static void overwrite_buffer_entry(const uint64_t key, const char *value,
                                       const int idx)
    {
        buf[idx].key = key;
        strncpy(buf[idx].value, value, VALUE_LENGTH);
    }
    ```
* PUT 10K lines:
    ![](https://i.imgur.com/YzlLYbm.png)
    ![](https://i.imgur.com/dudYdRi.png)
* PUT 100K lines:
    ![](https://i.imgur.com/MWQg9IX.png)
    ![](https://i.imgur.com/J4ZMHue.png)
* PUT 200K lines:
    ![](https://i.imgur.com/eqKtFqy.png)
    ![](https://i.imgur.com/pQkP05o.png)
* PUT 500K lines:
    ![](https://i.imgur.com/5Sc1hpK.png)
    ![](https://i.imgur.com/ijgN58F.png)
* As you can see, the CPU time is very long, especially in the function `insert_after`.
* I suspect it results from a considerable amount of data movement, which happens when key-values are PUT near the front of the buffer.

#### Solution

* Implement a B+ tree to avoid most of the data movement.
* The result is far faster than the binary-search buffer.
![](https://i.imgur.com/KAHiO1C.png)

### Problem 2: generating too many small files

#### Details

* Originally, key-values were inserted into the B+ tree whenever their keys did not belong to any existing file. This is undesirable: once a key is found to belong to a certain existing file, the previously inserted key-values are written out to a new file, which generates too many small files.
* This problem occurs if
    * no file is loaded, and
    * there is more than one existing file, and
    * the keys to be inserted are not in the current valid range (`key < min_key` or `key > max_key`) before loading an existing file.
* in `flush_put_buffer` (original):
    ```c=
    if (key < min_key || key > max_key) {
        /* Looks up the metatable */
        bool found = false;
        for (int i = 0; i < meta_count; i++) {
            found = (loaded_file == -1 && meta_count == 1) ||
                    (key >= metatable[i].start_key &&
                     key <= metatable[i].end_key &&
                     metatable[i].file_number != loaded_file);
            if (found) {
                swap_files(&metatable[i]);
                min_key = metatable[i].start_key;
                max_key = metatable[i].end_key;
                break;
            }
        }
        if (!found) {
            min_key = MIN(key, min_key);
            max_key = MAX(key, max_key);
        }
    }
    bptree.insert(key, value);
    ```
* result:
    ![](https://i.imgur.com/iSXBYz0.png)
    ![](https://i.imgur.com/4W6i9fj.png)
* Only one key-value is written to storage/3. Bad!

#### Solution

* Load the file whose start key or end key is nearest to the key.
* in `flush_put_buffer` (new):
    ```c=
    if (key < min_key || key > max_key) {
        /* Looks up the metatable */
        bool found = false;
        uint64_t min_diff = (key < min_key) ?
                                min_key - key : key - max_key;
        int32_t min_diff_idx = -1;
        for (int i = 0; i < meta_count; i++) {
            bool loaded = (metatable[i].file_number == loaded_file);
            found = (!loaded && key >= metatable[i].start_key &&
                     key <= metatable[i].end_key);
            if (found) {
                swap_files(&metatable[i]);
                min_key = metatable[i].start_key;
                max_key = metatable[i].end_key;
                break;
            }
            /* Finds the file whose start key is larger than and nearest
             * to the key */
            if (key < metatable[i].start_key) {
                uint64_t diff = metatable[i].start_key - key;
                if (diff < min_diff) {
                    min_diff = diff;
                    min_diff_idx = i;
                }
            } else if (key > metatable[i].end_key) {
                /* Finds the file whose end key is less than and nearest to
                 * the key */
                uint64_t diff = key - metatable[i].end_key;
                if (diff < min_diff) {
                    min_diff = diff;
                    min_diff_idx = i;
                }
            }
        }
        if (!found) {
            /* Swaps to the file whose start key or end key is nearest to
             * the key */
            bool need_to_swap =
                (min_diff_idx != -1 &&
                 metatable[min_diff_idx].file_number != loaded_file);
            if (need_to_swap) {
                swap_files(&metatable[min_diff_idx]);
                min_key = MIN(key, metatable[min_diff_idx].start_key);
                max_key = MAX(key, metatable[min_diff_idx].end_key);
            } else {
                /* Already loaded the file whose start key or end key is
                 * nearest to the key */
                min_key = MIN(key, min_key);
                max_key = MAX(key, max_key);
            }
        }
    }
    bptree.insert(key, value);
    ```

### Problem 3: memory leaks

#### Details

* By using [Valgrind](https://valgrind.org/), I found some memory leaks in the B+ tree.
* memory leak check: ```bash valgrind --leak-check=full \ --show-leak-kinds=all \ --track-origins=yes \ --verbose \ ./main hw3example.input ``` * output: ``` ==12296== HEAP SUMMARY: ==12296== in use at exit: 1,344 bytes in 42 blocks ==12296== total heap usage: 4,000,343 allocs, 4,000,301 frees, 816,475,976 bytes allocated ==12296== ==12296== Searching for pointers to 42 not-freed blocks ==12296== Checked 16,077,384 bytes ==12296== ==12296== 32 bytes in 1 blocks are definitely lost in loss record 1 of 3 ==12296== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==12296== by 0x1094C6: safe_malloc (utils.c:15) ==12296== by 0x10A7A4: create_leaf (bptree.c:376) ==12296== by 0x10A2EB: insert (bptree.c:194) ==12296== by 0x10C7F1: flush_put_buffer (database.c:358) ==12296== by 0x10BA20: get (database.c:118) ==12296== by 0x10CD54: manage_database (main.c:70) ==12296== by 0x10CB85: main (main.c:28) ==12296== ==12296== 32 bytes in 1 blocks are definitely lost in loss record 2 of 3 ==12296== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==12296== by 0x1094C6: safe_malloc (utils.c:15) ==12296== by 0x10A7A4: create_leaf (bptree.c:376) ==12296== by 0x10A829: split_leaf (bptree.c:389) ==12296== by 0x10ADAE: insert_into_leaf (bptree.c:473) ==12296== by 0x10A35B: insert (bptree.c:203) ==12296== by 0x10C7F1: flush_put_buffer (database.c:358) ==12296== by 0x10BC58: scan (database.c:160) ==12296== by 0x10CDCC: manage_database (main.c:77) ==12296== by 0x10CB85: main (main.c:28) ==12296== ==12296== 1,280 bytes in 40 blocks are definitely lost in loss record 3 of 3 ==12296== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==12296== by 0x1094C6: safe_malloc (utils.c:15) ==12296== by 0x10A7A4: create_leaf (bptree.c:376) ==12296== by 0x10A829: split_leaf (bptree.c:389) ==12296== by 0x10ADAE: insert_into_leaf (bptree.c:473) ==12296== by 0x10A35B: insert 
(bptree.c:203) ==12296== by 0x10C7F1: flush_put_buffer (database.c:358) ==12296== by 0x10BA20: get (database.c:118) ==12296== by 0x10CD54: manage_database (main.c:70) ==12296== by 0x10CB85: main (main.c:28) ==12296== ==12296== LEAK SUMMARY: ==12296== definitely lost: 1,344 bytes in 42 blocks ==12296== indirectly lost: 0 bytes in 0 blocks ==12296== possibly lost: 0 bytes in 0 blocks ==12296== still reachable: 0 bytes in 0 blocks ==12296== suppressed: 0 bytes in 0 blocks ``` * After viewing the source code, I found that pointer arrays (`ptrs`) in leaf nodes won't be freed when the tree is being freed. * `create_leaf`: allocates memory for leaf node and returns the pointer to it ```c= static node_t *create_leaf() { node_t *node = safe_calloc(1, sizeof(node_t)); node->is_leaf = true; node->keys = safe_malloc(MAX_KEY * sizeof(uint64_t)); node->ptrs = safe_malloc(MAX_KEY * sizeof(char *)); return (node_t *)node; } ``` * `free_tree`: frees the memory allocated for the B+ tree ```c= static void free_tree(node_t *node) { if (node->is_leaf) { free(node->keys); free(node); return; } /* Postorder Traversal */ for (int i = 0; i < node->key_count + 1; i++) { free_tree(node->ptrs[i]); } free(node->keys); free(node->ptrs); free(node); } ``` * `ptrs` in leaf nodes are not freed. #### Solution * Free `ptrs` before freeing the current leaf node. 
* `free_tree` (new):
    ```c=
    static void free_tree(node_t *node)
    {
        if (node->is_leaf) {
            free_node(node);
            return;
        }

        /* Postorder Traversal */
        for (int i = 0; i < node->key_count + 1; i++) {
            free_tree(node->ptrs[i]);
        }
        free_node(node);
    }
    ```
* `free_node`: frees the memory allocated for the node, including the memory allocated for the node's member variables
    ```c=
    static void free_node(node_t *node)
    {
        free(node->keys);
        free(node->ptrs);
        free(node);
    }
    ```
* memory leak check:
    ```
    ==13103== HEAP SUMMARY:
    ==13103==     in use at exit: 0 bytes in 0 blocks
    ==13103==   total heap usage: 4,000,343 allocs, 4,000,343 frees, 816,475,976 bytes allocated
    ==13103==
    ==13103== All heap blocks were freed -- no leaks are possible
    ```

### Problem 4: too much unnecessary file swapping

#### Details

* Originally, when the B+ tree is full, it is split into two parts and both parts are written to two different files.
* However, since the PUT buffer is sorted in ascending order, it is likely that the next key belongs to one of the files we just wrote.
* As a result, there is no need to save both parts of the B+ tree to files.
* in `flush_put_buffer` (original):
    ```c=
    /* Flushes the B+ tree */
    if (bptree.is_full()) {
        /* Splits B+ tree into two parts and saves them separately */
        char file1[MAX_FILENAME_LENGTH + 1];
        char file2[MAX_FILENAME_LENGTH + 1];
        size_t file_number1 = (loaded_file == -1) ?
                                  meta_count++ : loaded_file;
        size_t file_number2 = meta_count++;
        metatable[file_number1].file_number = file_number1;
        metatable[file_number2].file_number = file_number2;
        snprintf(file1, MAX_FILENAME_LENGTH, "%s/%lu", dir_path, file_number1);
        snprintf(file2, MAX_FILENAME_LENGTH, "%s/%lu", dir_path, file_number2);
        bptree.split_and_save(&metatable[file_number1],
                              &metatable[file_number2], file1, file2);
        loaded_file = -1;
        min_key = UINT64_MAX;
        max_key = 0;
    }
    ```
* test:
    ```bash
    time ./main put_5m.input
    ```
* result:
    ![](https://i.imgur.com/LN77npC.png)
    ![](https://i.imgur.com/TW8BvZs.png)

#### Solution 1

* Find the position of the key in the B+ tree (scan until a key greater than or equal to the target key is found).
* Split the B+ tree into two parts, save one to file, and rebuild the tree with the other.
    * If the key belongs to the first quarter of the B+ tree, write the second half of the tree to file and rebuild a new B+ tree with the first part.
    * Otherwise, write the key-values whose keys are less than the target key to file and rebuild a new B+ tree with the remaining part.
* in `flush_put_buffer`:
    ```c=
    /* Flushes the B+ tree */
    if (bptree.is_full()) {
        /* Splits B+ tree into two parts and saves one of them depending on
         * the current key */
        char file[MAX_PATH + 1];
        size_t file_number = (loaded_file == -1) ? meta_count++ : loaded_file;
        metatable[file_number].file_number = file_number;
        snprintf(file, MAX_PATH, "%s/%lu", dir_path, file_number);
        bptree.split_and_save_one(&metatable[file_number], file, key);
        loaded_file = -1;
        min_key = bptree.get_min_key();
        max_key = bptree.get_max_key();
    }
    ```
* `split_and_save_one`: splits the B+ tree into two parts and saves one of the two to file, according to the passed-in key
* The new version prevents unnecessary swaps, but unfortunately, the result didn't change much.
    ![](https://i.imgur.com/o5snW9E.png)
    ![](https://i.imgur.com/88tDgL8.png)

#### Solution 2

* Set a larger buffer and a larger B+ tree.
* PUT buffer size: $8 \times 10^6$ * B+ tree capacity: $4 \times 10^6$ * Increasing the size leads to a better result. ![](https://i.imgur.com/SlxFA0j.png) ## Execution Time * PUT 100 lines: ```bash time ./main put_100.input initializing database ... initializing bloom filter ... closing database ... saving bloom filter ... keys in buffer: 100 saving B+ tree to storage/0 ... saved 100 keys to storage/0 saving metatable ... real 0m0.720s user 0m0.328s sys 0m0.380s ``` * PUT 1K lines (cont.): ```bash time ./main put_1k.input initializing database ... initializing bloom filter ... loading bloom filter ... loading metatable ... file_number: 0, start: 265559698061076404, end: 9150388517377406808, total_keys: 100 closing database ... saving bloom filter ... keys in buffer: 1000 swapping to file 0 ... loading 100 keys from storage/0 saving B+ tree to storage/0 ... saved 1100 keys to storage/0 saving metatable ... real 0m6.022s user 0m0.363s sys 0m0.623s ``` * PUT 10K lines (cont.): ```bash time ./main put_10k.input initializing database ... initializing bloom filter ... loading bloom filter ... loading metatable ... file_number: 0, start: 364261332623835, end: 9209232481591014772, total_keys: 1100 closing database ... saving bloom filter ... keys in buffer: 10000 swapping to file 0 ... loading 1100 keys from storage/0 saving B+ tree to storage/0 ... saved 11100 keys to storage/0 saving metatable ... real 0m6.295s user 0m0.388s sys 0m0.605s ``` * PUT 100K lines (cont.): ```bash time ./main put_100k.input initializing database ... initializing bloom filter ... loading bloom filter ... loading metatable ... file_number: 0, start: 21106261949620, end: 9223086143511058432, total_keys: 11100 closing database ... saving bloom filter ... keys in buffer: 100000 swapping to file 0 ... loading 11100 keys from storage/0 saving B+ tree to storage/0 ... saved 111100 keys to storage/0 saving metatable ... 
real 0m6.945s user 0m0.612s sys 0m0.620s ``` * PUT 2.5M lines (cont.): ```bash time ./main put_2.5m.input initializing database ... initializing bloom filter ... loading bloom filter ... loading metatable ... file_number: 0, start: 21106261949620, end: 9223361605341576555, total_keys: 111100 keys in buffer: 2000000 swapping to file 0 ... loading 111100 keys from storage/0 key 8709600155309441019 is the 1000002th key in the tree saved 1000000 keys to storage/0 inserting the remaining part of size 1000000 to the new B+ tree ... min_key: 4369552140444686358, max_key: 9223361605341576555 closing database ... saving bloom filter ... keys in buffer: 500000 swapping to file 0 ... saving B+ tree to storage/1 ... saved 1111100 keys to storage/1 loading 1000000 keys from storage/0 swapping to file 1 ... saving B+ tree to storage/0 ... saved 1237206 keys to storage/0 loading 1111100 keys from storage/1 saving B+ tree to storage/1 ... saved 1373894 keys to storage/1 saving metatable ... real 0m19.610s user 0m5.131s sys 0m1.292s ``` * PUT 800K lines (cont.): ```bash time ./main put_800k.input initializing database ... initializing bloom filter ... loading bloom filter ... loading metatable ... file_number: 0, start: 2204325261791, end: 4369542047299018435, total_keys: 1237206 file_number: 1, start: 4369547278087225560, end: 9223371123632748569, total_keys: 1373894 closing database ... saving bloom filter ... keys in buffer: 800000 swapping to file 0 ... loading 1237206 keys from storage/0 swapping to file 1 ... saving B+ tree to storage/0 ... saved 1616500 keys to storage/0 loading 1373894 keys from storage/1 saving B+ tree to storage/1 ... saved 1794600 keys to storage/1 saving metatable ... real 0m20.112s user 0m3.358s sys 0m1.386s ``` * PUT 500K lines (cont.): ```bash time ./main put_500k.input initializing database ... initializing bloom filter ... loading bloom filter ... loading metatable ... 
file_number: 0, start: 2204325261791, end: 4369542047299018435, total_keys: 1616500 file_number: 1, start: 4369544860375114708, end: 9223371123632748569, total_keys: 1794600 closing database ... saving bloom filter ... keys in buffer: 500000 swapping to file 0 ... loading 1616500 keys from storage/0 swapping to file 1 ... saving B+ tree to storage/0 ... saved 1853306 keys to storage/0 loading 1794600 keys from storage/1 key 8161117537835197582 is the 1000003th key in the tree saved 1000000 keys to storage/1 inserting the remaining part of size 1000000 to the new B+ tree ... min_key: 6727028331190238375, max_key: 9223371123632748569 saving B+ tree to storage/2 ... saved 1057794 keys to storage/2 saving metatable ... real 0m20.288s user 0m3.008s sys 0m1.510s ``` * PUT 5M lines (cont.): :::spoiler result (very long) ```bash time ./main put_5m.input initializing database ... initializing bloom filter ... loading bloom filter ... loading metatable ... file_number: 0, start: 1258970314881, end: 4369542047299018435, total_keys: 1853306 file_number: 1, start: 4369544860375114708, end: 6727021897436184075, total_keys: 1000000 file_number: 2, start: 6727025624595350936, end: 9223371123632748569, total_keys: 1057794 keys in buffer: 2000000 swapping to file 0 ... loading 1853306 keys from storage/0 key 674513559108135227 is the 432863th key in the tree saved 999998 keys to storage/0 inserting the remaining part of size 1000002 to the new B+ tree ... min_key: 2204325261791, max_key: 2011130222439663578 swapping to file 0 ... saving B+ tree to storage/3 ... saved 1290237 keys to storage/3 loading 999998 keys from storage/0 swapping to file 1 ... saving B+ tree to storage/0 ... saved 1511212 keys to storage/0 loading 1000000 keys from storage/1 swapping to file 2 ... saving B+ tree to storage/1 ... saved 1511230 keys to storage/1 loading 1057794 keys from storage/2 keys in buffer: 2000000 swapping to file 3 ... saving B+ tree to storage/2 ... 
saved 1598421 keys to storage/2 loading 1290237 keys from storage/3 swapping to file 0 ... saving B+ tree to storage/3 ... saved 1726354 keys to storage/3 loading 1511212 keys from storage/0 key 4261890766214267824 is the 1000002th key in the tree saved 1000000 keys to storage/0 inserting the remaining part of size 1000000 to the new B+ tree ... min_key: 3177375184135198352, max_key: 4369543443190554006 swapping to file 1 ... saving B+ tree to storage/4 ... saved 1023476 keys to storage/4 loading 1511230 keys from storage/1 key 6623792535245645186 is the 1000002th key in the tree saved 999999 keys to storage/1 inserting the remaining part of size 1000001 to the new B+ tree ... min_key: 5534857028396038253, max_key: 6727024436135558302 swapping to file 2 ... saving B+ tree to storage/5 ... saved 1022329 keys to storage/5 loading 1598421 keys from storage/2 key 8582588515427229060 is the 1000003th key in the tree saved 1000000 keys to storage/2 inserting the remaining part of size 1000000 to the new B+ tree ... min_key: 7894821718381319607, max_key: 9223371123632748569 closing database ... saving bloom filter ... keys in buffer: 1000000 swapping to file 3 ... saving B+ tree to storage/6 ... saved 1138942 keys to storage/6 loading 1726354 keys from storage/3 swapping to file 0 ... saving B+ tree to storage/3 ... saved 1944634 keys to storage/3 loading 1000000 keys from storage/0 swapping to file 4 ... saving B+ tree to storage/0 ... saved 1126735 keys to storage/0 loading 1023476 keys from storage/4 swapping to file 1 ... saving B+ tree to storage/4 ... saved 1152747 keys to storage/4 loading 999999 keys from storage/1 swapping to file 5 ... saving B+ tree to storage/1 ... saved 1125965 keys to storage/1 loading 1022329 keys from storage/5 swapping to file 2 ... saving B+ tree to storage/5 ... saved 1151529 keys to storage/5 loading 1000000 keys from storage/2 swapping to file 6 ... saving B+ tree to storage/2 ... 
saved 1126591 keys to storage/2
loading 1138942 keys from storage/6
saving B+ tree to storage/6 ...
saved 1282899 keys to storage/6
saving metatable ...

real 1m41.153s
user 0m22.844s
sys 0m5.557s
```
:::

## References

* [Reading And Writing Data to Apache HBase](https://www.corejavaguru.com/bigdata/hbase-tutorial/read-and-write)
* [Bloom filter - Wikipedia](https://en.wikipedia.org/wiki/Bloom_filter)
* [資料結構大便當:Bloom Filter (in Chinese)](https://medium.com/@Kadai/%E8%B3%87%E6%96%99%E7%B5%90%E6%A7%8B%E5%A4%A7%E4%BE%BF%E7%95%B6-bloom-filter-58b0320a346d)
* [Universal hashing - Wikipedia](https://en.wikipedia.org/wiki/Universal_hashing)
* [Universal and Perfect Hashing - CMU](https://www.cs.cmu.edu/~avrim/451f11/lectures/lect1004.pdf)
* [B+ tree - Wikipedia](https://en.wikipedia.org/wiki/B%2B_tree)
* [Introduction of B+ Tree - GeeksforGeeks](https://www.geeksforgeeks.org/introduction-of-b-tree/)
* [B+ Tree Visualization](https://www.cs.usfca.edu/~galles/visualization/BPlusTree.html)
* [B+ Tree Basics 1](https://www.youtube.com/watch?v=CYKRMz8yzVU)
* [B+ Trees Basics 2 (insertion)](https://www.youtube.com/watch?v=_nY8yR6iqx4)