# Improvement [Github](https://github.com/ilkclord/Map_Reduce/tree/main/word-count) ## Buffer The method of extrating each word from the file is reading each character of the file and put into the buffer `word` .It may be time-costing ,because we keep copying the character to the buffer , but actually we can just read the content of mapped memory since its pointer is an global variable So I change the extracting method to using `start` and `size` to indicate the word out . `start` is declared as a global variable , and `size` is the original `count` . ```clike static __thread char *start ; ``` ### Change of Source Code Each funtion related to extracting has add an argument , `size` (count). We also seperate the new-defined word and word by checking `size` , `0` for original word . * **buff_proceed** Modified the loop to the loop without calling `add_letter` ```clike static int buff_proceed(uint32_t tid, char *buff, size_t size, char last) { start = buff ; for (; size; size--, buff++) { if (!IS_LETTER(*buff)) { if (add_sep(tid)) /* Not a letter */ return -1; start = buff + 1 ; continue; } count++ ; } if (last) /* If this is the last buffer, end the word (if any) */ add_sep(tid); return 0; } ``` * **wc_compare** strncasecmp instead of strcasecmp ```clike static inline int __wc_cmp(struct ht_node *n, void *key, char m , int size) { struct wc_word *w = m ? container_of(n, struct wc_word, node_main) : container_of(n, struct wc_word, node); return strncasecmp(GET_WORD(w), (char *) key , size); } ``` * **get_code** Adding the way to calculate the hash value of `start` , `size` pair . ```clike static uint32_t get_code(char *word , int count) { uint32_t code = 0; /* The hash value is a concatenation of the letters */ char shift = (char) (sizeof(unsigned int) * CHAR_BIT - __builtin_clz(N_LETTERS)); //RRR if(count){ for (int i = ((sizeof(code) * 8) / shift) - 1; i >= 0 && count > 0 ; i-- , count--) { char c = tolower(*(word)) - FIRST_LOWER_LETTER; code |= ((uint32_t) c) << (i * shift); word++; } } else{ for (int i = ((sizeof(code) * 8) / shift) - 1; i >= 0 && *word; i--) { char c = tolower(*(word)) - FIRST_LOWER_LETTER; code |= ((uint32_t) c) << (i * shift); word++; } } return code; } ``` ### Performance * **Median Number of 50 times execution** I think there must be a better way to measure ..... ```clike Original : Done in 3.31592 msec New : Done in 2.93286 msec ``` It has improve 20% efficiency ! ## Error Testing by valgrind , it occured an error . ``` ==8729== Invalid read of size 8 ==8729== at 0x109406: list_next (in /home/ilkclord/linux2020/quiz8/project/word-count) ==8729== by 0x109807: ht_get_next (in /home/ilkclord/linux2020/quiz8/project/word-count) ==8729== by 0x10A14B: __wc_destroy (in /home/ilkclord/linux2020/quiz8/project/word-count) ==8729== by 0x10A1AB: wc_destroy (in /home/ilkclord/linux2020/quiz8/project/word-count) ==8729== by 0x10A54F: mr_destroy (in /home/ilkclord/linux2020/quiz8/project/word-count) ==8729== by 0x10AE7A: main (in /home/ilkclord/linux2020/quiz8/project/word-count) ==8729== Address 0x4bfbed8 is 56 bytes inside a block of size 96 free'd ==8729== at 0x483CA3F: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==8729== by 0x10A136: __wc_destroy (in /home/ilkclord/linux2020/quiz8/project/word-count) ==8729== by 0x10A1AB: wc_destroy (in /home/ilkclord/linux2020/quiz8/project/word-count) ==8729== by 0x10A54F: mr_destroy (in /home/ilkclord/linux2020/quiz8/project/word-count) ==8729== by 0x10AE7A: main (in /home/ilkclord/linux2020/quiz8/project/word-count) ==8729== Block was alloc'd at ==8729== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==8729== by 0x109C7A: wc_add_word (in /home/ilkclord/linux2020/quiz8/project/word-count) ==8729== by 0x10A69D: add_sep (in /home/ilkclord/linux2020/quiz8/project/word-count) ==8729== by 0x10A70F: buff_proceed (in /home/ilkclord/linux2020/quiz8/project/word-count) ==8729== by 0x10AACF: mr_map (in /home/ilkclord/linux2020/quiz8/project/word-count) ==8729== by 0x4869608: start_thread (pthread_create.c:477) ==8729== by 0x49A5292: clone (clone.S:95) ==8729== ==8729== Invalid read of size 8 ==8729== at 0x10941A: list_next (in /home/ilkclord/linux2020/quiz8/project/word-count) ==8729== by 0x109807: ht_get_next (in /home/ilkclord/linux2020/quiz8/project/word-count) ==8729== by 0x10A14B: __wc_destroy (in /home/ilkclord/linux2020/quiz8/project/word-count) ==8729== by 0x10A1AB: wc_destroy (in /home/ilkclord/linux2020/quiz8/project/word-count) ==8729== by 0x10A54F: mr_destroy (in /home/ilkclord/linux2020/quiz8/project/word-count) ==8729== by 0x10AE7A: main (in /home/ilkclord/linux2020/quiz8/project/word-count) ==8729== Address 0x4c3dfb8 is 56 bytes inside a block of size 96 free'd ==8729== at 0x483CA3F: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==8729== by 0x10A136: __wc_destroy (in /home/ilkclord/linux2020/quiz8/project/word-count) ==8729== by 0x10A1AB: wc_destroy (in /home/ilkclord/linux2020/quiz8/project/word-count) ==8729== by 0x10A54F: mr_destroy (in /home/ilkclord/linux2020/quiz8/project/word-count) ==8729== by 0x10AE7A: main (in /home/ilkclord/linux2020/quiz8/project/word-count) ==8729== Block was alloc'd at ==8729== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==8729== by 0x109C7A: wc_add_word (in /home/ilkclord/linux2020/quiz8/project/word-count) ==8729== by 0x10A69D: add_sep (in /home/ilkclord/linux2020/quiz8/project/word-count) ==8729== by 0x10A70F: buff_proceed (in /home/ilkclord/linux2020/quiz8/project/word-count) ==8729== by 0x10AACF: mr_map (in /home/ilkclord/linux2020/quiz8/project/word-count) ==8729== by 0x4869608: start_thread (pthread_create.c:477) ==8729== by 0x49A5292: clone (clone.S:95) ``` We trace it to `__wc_destroy` , where the code is ```clike static int __wc_destroy(struct wc_cache *wcc, int id) { int valid = (id == -1); for (uint32_t j = 0; j < n_buckets; j++) { struct ht_node *iter = ht_get_first(&wcc->htable, j); for (; iter; iter = ht_get_next(&wcc->htable, j, iter)) { struct wc_word *w = valid ? container_of(iter, struct wc_word, node_main) : container_of(iter, struct wc_word, node); free(w->full_word); free(w); } } return 0; } ``` It is obvious that `free(w)` will also free the ht_node , and makes the iter become invalid , so we exchnge the sequence . ```clike static int __wc_destroy(struct wc_cache *wcc, int id) { int valid = (id == -1); for (uint32_t j = 0; j < n_buckets; j++) { struct ht_node *iter = ht_get_first(&wcc->htable, j); for (; iter; ) { struct wc_word *w = valid ? container_of(iter, struct wc_word, node_main) : container_of(iter, struct wc_word, node); iter = ht_get_next(&wcc->htable, j, iter) ; //update iter first free(w->full_word); free(w); } } return 0; } ``` We update `iter` first , and the error were fixed .