# Improvement
[Github](https://github.com/ilkclord/Map_Reduce/tree/main/word-count)
## Buffer
The method of extrating each word from the file is reading each character of the file and put into the buffer `word` .It may be time-costing ,because we keep copying the character to the buffer , but actually we can just read the content of mapped memory since its pointer is an global variable
So I change the extracting method to using `start` and `size` to indicate the word out . `start` is declared as a global variable , and `size` is the original `count` .
```clike
static __thread char *start ;
```
### Change of Source Code
Each funtion related to extracting has add an argument , `size` (count). We also seperate the new-defined word and word by checking `size` , `0` for original word .
* **buff_proceed**
Modified the loop to the loop without calling `add_letter`
```clike
static int buff_proceed(uint32_t tid, char *buff, size_t size, char last)
{
start = buff ;
for (; size; size--, buff++) {
if (!IS_LETTER(*buff)) {
if (add_sep(tid)) /* Not a letter */
return -1;
start = buff + 1 ;
continue;
}
count++ ;
}
if (last) /* If this is the last buffer, end the word (if any) */
add_sep(tid);
return 0;
}
```
* **wc_compare**
strncasecmp instead of strcasecmp
```clike
static inline int __wc_cmp(struct ht_node *n, void *key, char m , int size)
{
struct wc_word *w = m ? container_of(n, struct wc_word, node_main)
: container_of(n, struct wc_word, node);
return strncasecmp(GET_WORD(w), (char *) key , size);
}
```
* **get_code**
Adding the way to calculate the hash value of `start` , `size` pair .
```clike
static uint32_t get_code(char *word , int count)
{
uint32_t code = 0;
/* The hash value is a concatenation of the letters */
char shift =
(char) (sizeof(unsigned int) * CHAR_BIT - __builtin_clz(N_LETTERS)); //RRR
if(count){
for (int i = ((sizeof(code) * 8) / shift) - 1; i >= 0 && count > 0 ; i-- , count--) {
char c = tolower(*(word)) - FIRST_LOWER_LETTER;
code |= ((uint32_t) c) << (i * shift);
word++;
}
}
else{
for (int i = ((sizeof(code) * 8) / shift) - 1; i >= 0 && *word; i--) {
char c = tolower(*(word)) - FIRST_LOWER_LETTER;
code |= ((uint32_t) c) << (i * shift);
word++;
}
}
return code;
}
```
### Performance
* **Median Number of 50 times execution**
I think there must be a better way to measure .....
```clike
Original :
Done in 3.31592 msec
New :
Done in 2.93286 msec
```
It has improve 20% efficiency !
## Error
Testing by valgrind , it occured an error .
```
==8729== Invalid read of size 8
==8729== at 0x109406: list_next (in /home/ilkclord/linux2020/quiz8/project/word-count)
==8729== by 0x109807: ht_get_next (in /home/ilkclord/linux2020/quiz8/project/word-count)
==8729== by 0x10A14B: __wc_destroy (in /home/ilkclord/linux2020/quiz8/project/word-count)
==8729== by 0x10A1AB: wc_destroy (in /home/ilkclord/linux2020/quiz8/project/word-count)
==8729== by 0x10A54F: mr_destroy (in /home/ilkclord/linux2020/quiz8/project/word-count)
==8729== by 0x10AE7A: main (in /home/ilkclord/linux2020/quiz8/project/word-count)
==8729== Address 0x4bfbed8 is 56 bytes inside a block of size 96 free'd
==8729== at 0x483CA3F: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==8729== by 0x10A136: __wc_destroy (in /home/ilkclord/linux2020/quiz8/project/word-count)
==8729== by 0x10A1AB: wc_destroy (in /home/ilkclord/linux2020/quiz8/project/word-count)
==8729== by 0x10A54F: mr_destroy (in /home/ilkclord/linux2020/quiz8/project/word-count)
==8729== by 0x10AE7A: main (in /home/ilkclord/linux2020/quiz8/project/word-count)
==8729== Block was alloc'd at
==8729== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==8729== by 0x109C7A: wc_add_word (in /home/ilkclord/linux2020/quiz8/project/word-count)
==8729== by 0x10A69D: add_sep (in /home/ilkclord/linux2020/quiz8/project/word-count)
==8729== by 0x10A70F: buff_proceed (in /home/ilkclord/linux2020/quiz8/project/word-count)
==8729== by 0x10AACF: mr_map (in /home/ilkclord/linux2020/quiz8/project/word-count)
==8729== by 0x4869608: start_thread (pthread_create.c:477)
==8729== by 0x49A5292: clone (clone.S:95)
==8729==
==8729== Invalid read of size 8
==8729== at 0x10941A: list_next (in /home/ilkclord/linux2020/quiz8/project/word-count)
==8729== by 0x109807: ht_get_next (in /home/ilkclord/linux2020/quiz8/project/word-count)
==8729== by 0x10A14B: __wc_destroy (in /home/ilkclord/linux2020/quiz8/project/word-count)
==8729== by 0x10A1AB: wc_destroy (in /home/ilkclord/linux2020/quiz8/project/word-count)
==8729== by 0x10A54F: mr_destroy (in /home/ilkclord/linux2020/quiz8/project/word-count)
==8729== by 0x10AE7A: main (in /home/ilkclord/linux2020/quiz8/project/word-count)
==8729== Address 0x4c3dfb8 is 56 bytes inside a block of size 96 free'd
==8729== at 0x483CA3F: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==8729== by 0x10A136: __wc_destroy (in /home/ilkclord/linux2020/quiz8/project/word-count)
==8729== by 0x10A1AB: wc_destroy (in /home/ilkclord/linux2020/quiz8/project/word-count)
==8729== by 0x10A54F: mr_destroy (in /home/ilkclord/linux2020/quiz8/project/word-count)
==8729== by 0x10AE7A: main (in /home/ilkclord/linux2020/quiz8/project/word-count)
==8729== Block was alloc'd at
==8729== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==8729== by 0x109C7A: wc_add_word (in /home/ilkclord/linux2020/quiz8/project/word-count)
==8729== by 0x10A69D: add_sep (in /home/ilkclord/linux2020/quiz8/project/word-count)
==8729== by 0x10A70F: buff_proceed (in /home/ilkclord/linux2020/quiz8/project/word-count)
==8729== by 0x10AACF: mr_map (in /home/ilkclord/linux2020/quiz8/project/word-count)
==8729== by 0x4869608: start_thread (pthread_create.c:477)
==8729== by 0x49A5292: clone (clone.S:95)
```
We trace it to `__wc_destroy` , where the code is
```clike
static int __wc_destroy(struct wc_cache *wcc, int id)
{
int valid = (id == -1);
for (uint32_t j = 0; j < n_buckets; j++) {
struct ht_node *iter = ht_get_first(&wcc->htable, j);
for (; iter; iter = ht_get_next(&wcc->htable, j, iter)) {
struct wc_word *w =
valid ? container_of(iter, struct wc_word, node_main)
: container_of(iter, struct wc_word, node);
free(w->full_word);
free(w);
}
}
return 0;
}
```
It is obvious that `free(w)` will also free the ht_node , and makes the iter become invalid , so we exchnge the sequence .
```clike
static int __wc_destroy(struct wc_cache *wcc, int id)
{
int valid = (id == -1);
for (uint32_t j = 0; j < n_buckets; j++) {
struct ht_node *iter = ht_get_first(&wcc->htable, j);
for (; iter; ) {
struct wc_word *w =
valid ? container_of(iter, struct wc_word, node_main)
: container_of(iter, struct wc_word, node);
iter = ht_get_next(&wcc->htable, j, iter) ; //update iter first
free(w->full_word);
free(w);
}
}
return 0;
}
```
We update `iter` first , and the error were fixed .