# Week 8 - Hashing ## Team Team name: Date: Members | Role | Name | |-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------| | **Facilitator** keeps track of time, assigns tasks and makes sure all the group members are heard and that decisions are agreed upon. | | | **Spokesperson** communicates group’s questions and problems to the teacher and talks to other teams; presents the group’s findings. | | | **Reflector** observes and assesses the interactions and performance among team members. Provides positive feedback and intervenes with suggestions to improve groups’ processes. | | | **Recorder** guides consensus building in the group by recording answers to questions. Collects important information and data. | | ## Activities Make sure to have the activities signed off regularly to ensure progress is tracked. Download the provided project and open it in CLion. ### Activity 1: The costs of maintaining order 18 ### Activity 2: Calculate the moduli | id | 8609 | 6348 | 4992 | 5839 | 1622 | 5450 | 7198 | 4827 | | ---------- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | | **mod 8** | 1 | 4 | 0 | 7 | 6 | 2 | 6 |3| | **mod 16** | 1 | 12 | 0 | 15 | 6 | 10 | 11 | 11 | | **mod 17** | 7 | 7 | 11 | 8 | 7 | 10 | 7 | 16 | ### Activity 3: Hash functions Hash functie is een fucntie die maar een richting op kan. Een goede hash functie krijgt zelden de zelfde waarden uit verschillende invoer. ### Activity 4: A simple hash function Omdat hier vaak de zelfde waarde uitkomt ### Activity 5: Implement FNV-1a ```c int main(void) { assert(fnv_hash("") == 0xcbf29ce484222325); assert(fnv_hash("a") == 0xAF63DC4C8601EC8C); assert(fnv_hash("ab") == 0x89C4407B545986A); assert(fnv_hash("abc") == 0xE71FA2190541574B); } ``` ```c= size_t fnv_hash(const char* key) { const size_t FNV_PRIME = 0x00000100000001B3; const size_t FNV_OFFSET_BASIS = 0xcbf29ce484222325; size_t h = FNV_OFFSET_BASIS; for (int i = 0; i < strlen(key); ++i) { h = h^key[i]; h = h*FNV_PRIME; } return h; } ``` ### Activity 6: Implement hash-based indexing ```c // returns the modulo index of a string key based on its FNV-1a hash size_t hash_index_str(const char* key, size_t M); int main(void) { assert(hash_index_str("", 4201) == 2552); assert(hash_index_str("a", 4201) == 3432); assert(hash_index_str("ab", 4201) == 618); assert(hash_index_str("abc", 4201) == 1350); } ``` | key | index | | ---------- | ----- | | Alice | 1 | | in | 9 | | Wonderland | 22 | | potion | 20 | | tea | 6 | | rabbit | 24 | ### Activity 7: Function pointers ```c typedef void (*myfunc)(int); void foo(int value) { printf("The answer is %d\n", value); } void bar(myfunc f) { f(42); } int main(void) { myfunc fun = foo; bar(fun); } ``` Fun is een pointer naar een functie ### Activity 8: Hashing and equality testing ```c int main(void) { hashmap_t map; map_init(&map, 7, equal_strings, hash_index_str); assert(map.eq_fun("Alice", "Alice")); assert(!map.eq_fun("Alice", "alice")); assert(map.hash_fun("Alice", 31) == 1); assert(map.hash_fun("alice", 31) == 28); } ``` ```c= bool compare(const char* s1, const char* s2) { // TODO: Activity 8 if (strcmp(s1, s2) == 0) { return true;// match! } return false; } ``` ### Activity 9: Implement lookup ```c size_t simple_hash(const char* key, size_t size) { return strlen(key) % size; } int main(void) { hashmap_t map; map_init(&map, 7, compare, simple_hash); list_prepend(&map.slots[5], make_pair("Alice", 42)); list_prepend(&map.slots[3], make_pair("Bob", 84)); list_prepend(&map.slots[5], make_pair("Bob Dobalina", 126)); list_prepend(&map.slots[5], make_pair("Cindy", 6)); kv_pair_t * p1 = map_lookup(&map, "Joe"); kv_pair_t * p2 = map_lookup(&map, "Alice"); kv_pair_t * p3 = map_lookup(&map, "Bob"); kv_pair_t * p4 = map_lookup(&map, "Bob Dobalina"); kv_pair_t * p5 = map_lookup(&map, "Cindy"); assert(p1 == NULL); assert(p2 != NULL && p3 != NULL && p4 != NULL && p5 != NULL); assert(strcmp(p2->key, "Alice") == 0); assert(p2->value == 42); assert(strcmp(p3->key, "Bob") == 0); assert(p3->value == 84); assert(strcmp(p4->key, "Bob Dobalina") == 0); assert(p4->value == 126); assert(strcmp(p5->key, "Cindy") == 0); assert(p5->value == 6); } ``` ```c= kv_pair_t *map_lookup(const hashmap_t *map, key_type key) { (void) key; (void) map; // TODO: Activity 9 - return the key-value pair by pointer if it is present // otherwise, return NULL size_t index = map->hash_fun(key, map->capacity); element_t linkedList = map->slots[index]; node_t *node = linkedList.head; while(!map->eq_fun(node->kv_pair->key, key)){ if(node->next == NULL) return NULL; node = node->next; } return node->kv_pair; // steps: // - compute index of the slot at which the key is stored // - search the linked list for the key // - if found, return the key-value pair, otherwise return NULL return NULL; } ``` ### Activity 10: Implementation time ```c size_t simple_hash(const char* key, size_t size) { return strlen(key) % size; } int main(void) { map_init(&map, 7, compare, simple_hash); map_insert(&map, "Alice", 24); assert(map.count == 1); map_insert(&map, "Alice", 42); assert(map.count == 1); map_insert(&map, "Bob", 84); map_insert(&map, "Bob Dobalina", 126); map_insert(&map, "Cindy", 6); assert(map.count == 4); kv_pair_t * p2 = map_lookup(&map, "Alice"); kv_pair_t * p3 = map_lookup(&map, "Bob"); kv_pair_t * p4 = map_lookup(&map, "Bob Dobalina"); kv_pair_t * p5 = map_lookup(&map, "Cindy"); assert(p2 != NULL && p3 != NULL && p4 != NULL && p5 != NULL); assert(strcmp(p2->key, "Alice") == 0); assert(p2->value == 42); assert(strcmp(p3->key, "Bob") == 0); assert(p3->value == 84); assert(strcmp(p4->key, "Bob Dobalina") == 0); assert(p4->value == 126); assert(strcmp(p5->key, "Cindy") == 0); assert(p5->value == 6); } ``` ```c= void map_remove(hashmap_t *map, key_type key) { (void) key; (void) map; // TODO: Activity 10 - remove the key-value pair from the hashmap if it is present // steps: // - compute index of the slot at which the key is stored // - search the linked list for the key // - if found, remove the corresponding node from the list (see the list_remove function) // - decrement the count of pairs in the map if something was removed size_t index = map->hash_fun(key, map->capacity); element_t *linkedList = &map->slots[index]; node_t *node = linkedList->head; bool found = true; while(!map->eq_fun(node->kv_pair->key, key)){ if(node->next == NULL) { found = false; break; } } if(found){ list_remove(linkedList, node); map->count --; } } ``` ```c= bool map_insert(hashmap_t *map, key_type key, value_type value) { (void) map; (void) key; (void) value; // TODO: Activity 10 - if the key is not in the hashmap, add the // key-value pair. // otherwise, update the current value with the given value // steps: // - compute index of the slot at which the key is stored // - search the linked list for the key // - update the value if the key was found, or // - prepend a new pair to the list if it wasn't found (see the list_prepend function) // - increment the count of pairs in the map (only if a new pair was added) // - return true if a new pair was added, and false otherwise size_t index = map->hash_fun(key, map->capacity); element_t *linkedList = &map->slots[index]; node_t *node = linkedList->head; if(node == NULL){ kv_pair_t *newPair = make_pair(key, value); list_prepend(linkedList, newPair); map->count++; return true; } bool found = true; while(!map->eq_fun(node->kv_pair->key, key)){ if(node->next == NULL) { found = false; break; } node = node->next; } if(found){ node->kv_pair->value = value; } else{ kv_pair_t *newPair = make_pair(key, value); list_prepend(linkedList, newPair); map->count++; return true; } return false; } ``` ### Activity 11: Word counting ```c int main(void) { hashmap_t map; map_init(&map, 31, compare, hash_index_str); process_file("short.txt", &map); map_print(&map); map_destroy(&map); } ``` | word | frequency | | ------- | --------- | | "ipsum" | 2 | | "dolor" |4| | "sit" |3| | "amet" |3| ### Activity 12: Tracking load statistics | Capacity | 4201 | 6101 | 15149 | | ---------------------------- | ---- | ---- | ----- | | Fraction of free slots | 0.44 |0.565645|0.793584| | Load factor | 0.83 | 0.569907|0.229520| | Slots with more than 1 items |861 |676|329| | Length of longest chain |6 |5|4| | Keys in longest chain |GIVE clearer meanwhile caterpillar stretching Queen|jogged already panting tunnel again|lot tasted called Are| Capacity smaller than 200000 for which no slot has more than one word stored after processing "alice0.txt": ### Activity 13: Time complexity | Operation | Time complexity | | --------- | --------------- | | Lookup | O(1) | | Insert | O(1) | | Remove | O(1) | ### Activity 14: Implement rehashing (optional) ```c bool map_rehash(hashmap_t *original) { // if no need to rehash, then do nothing if (original->count * 10 < original->capacity * 7) return false; // set new capacity so that load factor will become 0.5 size_t new_capacity = original->count * 2; // initialize new dynamic array for storing new slots list_t * new_slots = malloc(sizeof(list_t) * new_capacity); if (new_slots == NULL) return false; // TODO: Activity 14 - implement rehashing // Deallocate now-empty array of slots free(original->slots); // Update hashmap to use new slots and new capacity original->slots = new_slots; original->capacity = new_capacity; return true; } ``` ## Looking back ### What we've learnt Formulate at least one lesson learned. ### What were the surprises Fill in... ### What problems we've encountered Fill in... ### What was or still is unclear Fill in... ### How did the group perform? How was the collaboration? What were the reasons for hick-ups? What worked well? What can be improved next time? > Written with [StackEdit](https://stackedit.io/).