# Week 8 - Hashing ## Team Team name: Peach Technologies Date: Members | Role | Name | |-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------| | **Facilitator** keeps track of time, assigns tasks and makes sure all the group members are heard and that decisions are agreed upon. | Rimma Davletova 508350 | | **Spokesperson** communicates group’s questions and problems to the teacher and talks to other teams; presents the group’s findings. | Yazan Tayeh 473427 | | **Reflector** observes and assesses the interactions and performance among team members. Provides positive feedback and intervenes with suggestions to improve groups’ processes. | Husam Kanoa 512881 | | **Recorder** guides consensus building in the group by recording answers to questions. Collects important information and data. | Nafiz Redwan 515199 | ## Activities Make sure to have the activities signed off regularly to ensure progress is tracked. Download the provided project and open it in CLion. ### Activity 1: The costs of maintaining order 23 ``record_pair_ts`` will be moved ### Activity 2: Calculate the moduli | id | 8609 | 6348 | 4992 | 5839 | 1622 | 5450 | 7198 | 4827 | | ---------- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | | **mod 8** | 1 | 4 | 0 | 7 | 6 | 2 | 6 | 3 | | **mod 16** | 1 | 12 | 0 | 15 | 6 | 10 | 14 | 11 | | **mod 17** | 7 | 7 | 11 | 8 | 7 | 10 | 7 | 16 | ### Activity 3: Hash functions The hash function is a function that uses the constant-time operation to store and retrieve the value from the hash table, which is applied on the keys as integers and this is used as the address for values in the hash table. ### Activity 4: A simple hash function ``simple_hash`` is not a good hash function, because the most common words have the same lengths. ### Activity 5: Implement FNV-1a ```c int main(void) { assert(fnv_hash("") == 0xcbf29ce484222325); assert(fnv_hash("a") == 0xAF63DC4C8601EC8C); assert(fnv_hash("ab") == 0x89C4407B545986A); assert(fnv_hash("abc") == 0xE71FA2190541574B); } size_t fnv_hash(const char* key) { const size_t FNV_PRIME = 0x00000100000001B3; const size_t FNV_OFFSET_BASIS = 0xcbf29ce484222325; // TODO: Activity 5 size_t h = FNV_OFFSET_BASIS; for (const char* p = key; *p; p++) { h ^= (size_t)(unsigned char)(*p); h *= FNV_PRIME; } return h; } ``` ### Activity 6: Implement hash-based indexing ```c // returns the modulo index of a string key based on its FNV-1a hash size_t hash_index_str(const char* key, size_t M); int main(void) { assert(hash_index_str("", 4201) == 2552); assert(hash_index_str("a", 4201) == 3432); assert(hash_index_str("ab", 4201) == 618); assert(hash_index_str("abc", 4201) == 1350); } ``` | key | index | | ---------- | ----- | | Alice | 1 | | in | 9 | | Wonderland | 22 | | potion | 20 | | tea | 6 | | rabbit | 31 | ### Activity 7: Function pointers ```c typedef void (*myfunc)(int); void foo(int value) { printf("The answer is %d\n", value); } void bar(myfunc f) { f(42); } int main(void) { myfunc fun = foo; bar(fun); } ``` The variable ``fun`` represents a function, and the function ``bar`` receives a function pointer and calls that function passing the number ``42`` as an argument ### Activity 8: Hashing and equality testing ```c int main(void) { hashmap_t map; map_init(&map, 7, equal_strings, hash_index_str); assert(map.eq_fun("Alice", "Alice")); assert(!map.eq_fun("Alice", "alice")); assert(map.hash_fun("Alice", 31) == 1); assert(map.hash_fun("alice", 31) == 28); } bool equal_strings(const char* s1, const char* s2) { // TODO: Activity 8 if(strlen(s1) == strlen(s2)){ for(size_t i=0; i< strlen(s1); i++){ if(*(s1 +i) != *(s2 + i)){ return false; } } }else{ return false; } return true; } ``` ### Activity 9: Implement lookup ```c size_t simple_hash(const char* key, size_t size) { return strlen(key) % size; } int main(void) { hashmap_t map; map_init(&map, 7, compare, simple_hash); list_prepend(&map.slots[5], make_pair("Alice", 42)); list_prepend(&map.slots[3], make_pair("Bob", 84)); list_prepend(&map.slots[5], make_pair("Bob Dobalina", 126)); list_prepend(&map.slots[5], make_pair("Cindy", 6)); kv_pair_t * p1 = map_lookup(&map, "Joe"); kv_pair_t * p2 = map_lookup(&map, "Alice"); kv_pair_t * p3 = map_lookup(&map, "Bob"); kv_pair_t * p4 = map_lookup(&map, "Bob Dobalina"); kv_pair_t * p5 = map_lookup(&map, "Cindy"); assert(p1 == NULL); assert(p2 != NULL && p3 != NUlL && p4 != NULL && p5 != NULL); assert(strcmp(p2->key, "Alice") == 0); assert(p2->value == 42); assert(strcmp(p3->key, "Bob") == 0); assert(p3->value == 84); assert(strcmp(p4->key, "Bob Dobalina") == 0); assert(p4->value == 126); assert(strcmp(p5->key, "Cindy") == 0); assert(p5->value == 6); } kv_pair_t *map_lookup(const hashmap_t *map, key_type key) { (void) key; (void) map; size_t index = map->hash_fun(key, map->capacity); node_t *current = map->slots[index].head; while(current != NULL){ if(map->eq_fun(current->kv_pair->key, key)){ return current->kv_pair; }else{ current = current->next; } } return NULL; } ``` ### Activity 10: Implementation time ```c size_t simple_hash(const char* key, size_t size) { return strlen(key) % size; } int main(void) { map_init(&map, 7, compare, simple_hash); map_insert(&map, "Alice", 24); assert(map.count == 1); map_insert(&map, "Alice", 42); assert(map.count == 1); map_insert(&map, "Bob", 84); map_insert(&map, "Bob Dobalina", 126); map_insert(&map, "Cindy", 6); assert(map.count == 4); kv_pair_t * p2 = map_lookup(&map, "Alice"); kv_pair_t * p3 = map_lookup(&map, "Bob"); kv_pair_t * p4 = map_lookup(&map, "Bob Dobalina"); kv_pair_t * p5 = map_lookup(&map, "Cindy"); assert(p2 != NULL && p3 != NUlL && p4 != NULL && p5 != NULL); assert(strcmp(p2->key, "Alice") == 0); assert(p2->value == 42); assert(strcmp(p3->key, "Bob") == 0); assert(p3->value == 84); assert(strcmp(p4->key, "Bob Dobalina") == 0); assert(p4->value == 126); assert(strcmp(p5->key, "Cindy") == 0); assert(p5->value == 6); } bool map_insert(hashmap_t *map, key_type key, value_type value) { (void) map; (void) key; (void) value; size_t index = map->hash_fun(key, map->capacity); node_t *current = map->slots[index].head; while(current != NULL){ if(map->eq_fun(current->kv_pair->key, key)){ current->kv_pair->value = value; return false; }else{ current = current->next; } } kv_pair_t *newPair = make_pair(key,value); list_prepend(&map->slots[index],newPair); map->count++; return true; } void map_remove(hashmap_t *map, key_type key) { (void) key; (void) map; size_t index = map->hash_fun(key, map->capacity); node_t *current = map->slots[index].head; while(current != NULL){ if(map->eq_fun(current->kv_pair->key, key)){ list_remove(&map->slots[index],current); map->count--; return; }else{ current = current->next; } } } ``` ### Activity 11: Word counting ```c int main(void) { hashmap_t map; map_init(&map, 31, compare, hash_index_str); process_file("short.txt", &map); map_print(&map); map_destroy(&map); } ``` | word | frequency | | ------- | --------- | | "ipsum" | 2 | | "dolor" | 4 | | "sit" | 3 | | "amet" | 3 | ### Activity 12: Tracking load statistics | Capacity | 4201 | 6101 | 15149 | | ---------------------------- | ---- | ---- | ----- | | Fraction of free slots | 0.44 | 0.43 | 0.77 | | Load factor | 0.83 | 0.57 | 0.23 | | Slots with more than 1 items | 861 | 676 | 329 | | Length of longest chain | 6 | 5 | 4 | | Keys in longest chain |GIVE, clearer, meanwhile, caterpillar, stretching,Queen |jogged, already, panting, tunnel, again | lot, tasted, called, Are | Capacity smaller than 200000 for which no slot has more than one word stored after processing "alice0.txt": ### Activity 13: Time complexity | Operation | Time complexity | | --------- | --------------- | | Lookup | O(n) | | Insert | O(n) | | Remove | O(n) | ### Activity 14: Implement rehashing (optional) ```c bool map_rehash(hashmap_t *original) { // if no need to rehash, then do nothing if (original->count * 10 < original->capacity * 7) return false; // set new capacity so that load factor will become 0.5 size_t new_capacity = original->count * 2; // initialize new dynamic array for storing new slots list_t * new_slots = malloc(sizeof(list_t) * new_capacity); if (new_slots == NULL) return false; // TODO: Activity 14 - implement rehashing // Deallocate now-empty array of slots free(original->slots); // Update hashmap to use new slots and new capacity original->slots = new_slots; original->capacity = new_capacity; return true; } ``` ## Looking back ### What we've learnt Formulate at least one lesson learned. ### What were the surprises Fill in... ### What problems we've encountered Fill in... ### What was or still is unclear Fill in... ### How did the group perform? How was the collaboration? What were the reasons for hick-ups? What worked well? What can be improved next time? > Written with [StackEdit](https://stackedit.io/).