Enhance find_func through trie structure

[jserv/shecc](https://github.com/jserv/shecc), [ISSUE#7](https://github.com/jserv/shecc/issues/7#issue-695472170), [PR](https://github.com/jserv/shecc/pull/59#issue-1806029497) ## Development enviroment ```shell= > uname -a Linux zoo868e 5.15.90.1-microsoft-standard-WSL2 #1 SMP Fri Jan 27 02:56:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux > gcc --version gcc (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0 ``` ## Problem Statement In [issue #7](https://github.com/jserv/shecc/issues/7#issue-695472170), [reborn2266](https://github.com/reborn2266) used uftrace to analyze the shecc project and found that the strcmp() function is consuming the most execution time in shecc. This is because the find_ function, such as find_func, uses strcmp to check if a function has been declared. Currently, shecc compares each inserted name every time until it finds a match or determines that it hasn't been inserted. [reborn2266](https://github.com/reborn2266) suggested that we can enhance the search performance by utilizing a trie data structure to store the inserted names. ## Implementation Here is the revised structure for the trie. The `index` member stores the index of the function in the `FUNCS` array, while the `next` member keeps track of the index of the next trie node. If the value of `index` is 0, it indicates that the function hasn't been inserted, and if the value of `next` is 0, it means there is no next trie node. There are 128 `next` elements since ASCII has 128 different characters. Even though the user may only utilize a subset of them, it's important to perform the necessary checks before insertion. ```c= typedef struct { int index; int next[128]; } trie_t; ``` I had implement the function `insert_trie` and the function `find_trie` for the insertion and the searching. ```clike= int insert_trie(trie_t *trie, char *name, int funcs_index) { char first_char = *name; int i; if (!first_char) { if (!trie->index) trie->index = funcs_index; return trie->index; } if (!trie->next[first_char]) { trie->next[first_char] = func_tries_idx++; for (i = 0; i < 128; i++) FUNC_TRIES[trie->next[first_char]].next[i] = 0; FUNC_TRIES[trie->next[first_char]].index = 0; } return insert_trie(&FUNC_TRIES[trie->next[first_char]], name + 1, funcs_index); } int find_trie(trie_t *trie, char *name) { char first_char = *name; if (!first_char) return trie->index; else if (!trie->next[first_char]) return 0; return find_trie(&FUNC_TRIES[trie->next[first_char]], name + 1); } ``` ### add_func I have made modifications to the implementation by utilizing the original `FUNCS` array to store function information. When adding a new function, I also insert its index into the trie. Additionally, I have replaced the `strcmp` function with `insert_trie`, which is able to verifies if the function has already been inserted. **Before:** ```clike= func_t *add_func(char *name) { func_t *fn; int i; for (i = 0; i < funcs_idx; i++) if (!strcmp(FUNCS[i].return_def.var_name, name)) return &FUNCS[i]; fn = &FUNCS[funcs_idx++]; strcpy(fn->return_def.var_name, name); return fn; } ``` **After:** ```clike= func_t *add_func(char *name) { func_t *fn; int index = insert_trie( FUNC_TRIES, name, funcs_idx + 1); if (index == funcs_idx + 1) { fn = &FUNCS[funcs_idx++]; strcpy(fn->return_def.var_name, name); } fn = &FUNCS[index - 1]; return fn; } ``` ### find_func Use the function `find_trie` to search the function name. ```clike= func_t *find_func(char func_name[]) { int index = find_trie(FUNC_TRIES, func_name); if (index) return &FUNCS[index - 1]; return NULL; } ``` ## Experimental I have created a sample code to analyze the performance of the `find_func` function. The code sums up the return values of the function from 1 to n, and the main function returns this sum. Here is an example of the code where the main function returns the sum of the return values of `f1`, `f2`, and `f3`. ```clike= int f1(){return 1;} int f2(){return 2;} int f3(){return 3;} int main(){ return f1() + f2() + f3(); } ``` I conducted experiments with `find_func`, ranging from 1 function to 1024 functions, and the self-time calculated by `uftrace` is as follows: |Number of the Function|Number of find_func called|Before|After |---|---|---|---| |1|451|263.596us|31.329us| |2|456|296.300us|35.612us| |4|466|313.824us|38.806us| |8|486|349.073us|49.190us| |16|526|413.888us|43.033us| |32|606|904.453us|64.273us| |64|766|1.890ms|95.702us| |128|1086|4.436ms|133.466us| |256|1726|16.381ms|193.744us| |512|3006|62.848ms|276.330us| |1024|5566|241.019ms|**Segmentation Fault**| When the number of functions exceeds 919, the `find_func` that utilizes the trie structure encounters a **Segmentation Fault**. The reason behind this is that I only allocated 1024 units for the trie, which is the same as the maximum number of functions. As we know, the trie structure requires more than one unit to store a function if the function name has more than one character. Therefore, the maximum number of functions becomes restricted by the maximum number of units in the trie, and it depends on the length of the function names. I attempted to dynamically allocate memory for the trie, but encountered a compilation error due to an explicit cast. I believe the reason for this is that the cast from a void pointer to a pointer of a custom struct type is not yet supported. Despite the limitation imposed on the maximum number of functions when using the trie struct, the performance has improved significantly. The more functions there are, the greater the improvement, as observed in below Figure. ![Figure1](https://hackmd.io/_uploads/S1gNM7yqh.png)