Week 8 - Hashing

# Week 8 - Hashing ## Team Team name:Hashing Date:17/05/2022 Members: Nikola Zahariev, Casian Ionescu, Attila Poldan | Role | Name | |-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------| | **Facilitator** keeps track of time, assigns tasks and makes sure all the group members are heard and that decisions are agreed upon. | | | **Spokesperson** communicates group’s questions and problems to the teacher and talks to other teams; presents the group’s findings. | | | **Reflector** observes and assesses the interactions and performance among team members. Provides positive feedback and intervenes with suggestions to improve groups’ processes. | | | **Recorder** guides consensus building in the group by recording answers to questions. Collects important information and data. | | ## Activities Make sure to have the activities signed off regularly to ensure progress is tracked. Download the provided project and open it in CLion. ### Activity 1: The costs of maintaining order Record your answer here Whenever we insert the movments follow the pattern of: For every insertion of an id there will be a rellocation of n-1 (where n is the size of the arraY) record_pair_ts across the given array. ### Activity 2: Calculate the moduli | id | 8609 | 6348 | 4992 | 5839 | 1622 | 5450 | 7198 | 4827 | | ---------- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | | **mod 8** | 1 | 4 | 0 | 7 | 6 | 2 | 6 | 3 | | **mod 16** | 1 | 12 | 0 | 15 | 6 | 10 | 14 | 11 | | **mod 17** | 7 | 7 | 11 | 8 | 7 | 10 | 7 | 16 | ### Activity 3: Hash functions Hash functions are desinged to convert any arbitrary sized value into a fixed sized one which can be used to index the original value in a data structure (hashmap). ### Activity 4: A simple hash function This function is not good because, the function only takes the lenght of a word and uses it as a key, but there are multiple words with the same lenght. This will create keys which can be linked to multiple words. This makes the created key useless because we can not identify a singular value which we are searching for. Instead of a single value it will point to multiple making identifing the desired value not feasible. In other words, it will create collision instances. ### Activity 5: Implement FNV-1a ```c= size_t fnv_hash(const char* key) { const size_t FNV_PRIME = 0x00000100000001B3; const size_t FNV_OFFSET_BASIS = 0xcbf29ce484222325; // TODO: Activity 5 size_t h = FNV_OFFSET_BASIS; for (int i = 0; i < strlen(key); ++i) { h=h ^ key[i]; h=h*FNV_PRIME; } return h; } int main(void) { assert(fnv_hash("") == 0xcbf29ce484222325); assert(fnv_hash("a") == 0xAF63DC4C8601EC8C); assert(fnv_hash("ab") == 0x89C4407B545986A); assert(fnv_hash("abc") == 0xE71FA2190541574B); } ``` ### Activity 6: Implement hash-based indexing ```c= // returns the modulo index of a string key based on its FNV-1a hash size_t hash_index_str(const char* key, size_t size) { size_t h= fnv_hash(key); h=h%size; return h; } int main(void) { assert(hash_index_str("", 4201) == 2552); assert(hash_index_str("a", 4201) == 3432); assert(hash_index_str("ab", 4201) == 618); assert(hash_index_str("abc", 4201) == 1350); } ``` | key | index | | ---------- | ----- | | Alice | 1 | | in | 9 | | Wonderland | 22 | | potion | 20 | | tea | 6 | | rabbit | 24 | ### Activity 7: Function pointers ```c typedef void (*myfunc)(int); void foo(int value) { printf("The answer is %d\n", value); } void bar(myfunc f) { f(42); } int main(void) { myfunc fun = foo; bar(fun); } ``` In the main we declare a function pointer named fun and we pass the function foo to it. "fun"-s value will be the pointer which point to foo-s location. Then we pass this pointer to the function bar and then bar passes an integer of 42 to the pointer. Because we passed the integer to the pointer which points to foo-s location, foo will recive it and executes, printing the string "The answer is 42". ### Activity 8: Hashing and equality testing ```c= bool equal_strings(const char* s1, const char* s2) { if(strcmp(s1,s2)) return false; else return true; } int main(void) { hashmap_t map; map_init(&map, 7, equal_strings, hash_index_str); assert(map.eq_fun("Alice", "Alice")); assert(!map.eq_fun("Alice", "alice")); assert(map.hash_fun("Alice", 31) == 1); assert(map.hash_fun("alice", 31) == 28); } ``` ### Activity 9: Implement lookup ```c kv_pair_t *map_lookup(const hashmap_t *map, key_type key) { // - compute index of the slot at which the key is stored size_t index= map->hash_fun(key,map->capacity); // - search the linked list for the key if(map->slots[index].head==NULL) return NULL; node_t *temp=map->slots[index].head; while(strcmp(temp->kv_pair->key,key)!=0) { if(temp->next==NULL)return NULL; temp=temp->next; } // - if found, return the key-value pair, otherwise return NULL return temp->kv_pair; } ``` ### Activity 10: Implementation time ```c bool map_insert(hashmap_t *map, key_type key, value_type value) { size_t index = map->hash_fun(key, map->capacity); node_t *temp=map->slots[index].head; if (temp!= NULL) { while (strcmp(temp->kv_pair->key, key) != 0) { if (temp->next == NULL) { temp = NULL; break; } else temp = temp->next; } } if (temp == NULL) { list_prepend(&map->slots[index], make_pair(key, value)); map->count = map->count + 1; return true; } else { temp->kv_pair = make_pair(key, value); return false; } } void map_remove(hashmap_t *map, key_type key) { size_t index = map->hash_fun(key, map->capacity); node_t *temp = map->slots[index].head; if (map->slots[index].head != NULL) { while (strcmp(temp->kv_pair->key, key) != 0) { if (temp->next == NULL) { temp = NULL; break; } else temp = temp->next; } } if(temp!=NULL){ list_remove(&map->slots[index],temp); map->count=map->count-1; } } ``` ### Activity 11: Word counting ```c void process_line(char* line, hashmap_t * map) { const char* sep = "\t ()[]_,.;:?!/\\\"\n'"; char* token = strtok(line, sep); while (token) { if (strlen(token)) { kv_pair_t* temp= map_lookup(map,token); if (temp==NULL) map_insert(map,token,1); else temp->value++; } token = strtok(NULL, sep); } } int main(void) { hashmap_t map; map_init(&map, 31, compare, hash_index_str); process_file("short.txt", &map); map_print(&map); map_destroy(&map); } ``` | word | frequency | | ------- | --------- | | "ipsum" | 2 | | "dolor" | 4 | | "sit" | 3 | | "amet" | 3 | ### Activity 12: Tracking load statistics | Capacity | 4201 | 6101 | 15149 | | ---------------------------- | ---- | ---- | ----- | | Fraction of free slots | 0.44 | | | | Load factor | 0.83 | 0.57 | 0.23 | | Slots with more than 1 items | 861 | 676 | 3292 | | Length of longest chain | 6 | 5 | 4 | | Keys in longest chain | | | | Capacity smaller than 200000 for which no slot has more than one word stored after processing "alice0.txt": ### Activity 13: Time complexity | Operation | Time complexity | | --------- | --------------- | | Lookup | O(1) | | Insert | O(1) | | Remove | O(1) | ### Activity 14: Implement rehashing (optional) ```c bool map_rehash(hashmap_t *original) { // if no need to rehash, then do nothing if (original->count * 10 < original->capacity * 7) return false; // set new capacity so that load factor will become 0.5 size_t new_capacity = original->count * 2; // initialize new dynamic array for storing new slots list_t * new_slots = malloc(sizeof(list_t) * new_capacity); if (new_slots == NULL) return false; // TODO: Activity 14 - implement rehashing // Deallocate now-empty array of slots free(original->slots); // Update hashmap to use new slots and new capacity original->slots = new_slots; original->capacity = new_capacity; return true; } ``` ## Looking back ### What we've learnt Formulate at least one lesson learned. ### What were the surprises Fill in... ### What problems we've encountered Fill in... ### What was or still is unclear Fill in... ### How did the group perform? How was the collaboration? What were the reasons for hick-ups? What worked well? What can be improved next time?