# Week 8 - Hashing
## Team
Team name:
Date:
Members
| Role | Name |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------|
| **Facilitator** keeps track of time, assigns tasks and makes sure all the group members are heard and that decisions are agreed upon. | |
| **Spokesperson** communicates group’s questions and problems to the teacher and talks to other teams; presents the group’s findings. | |
| **Reflector** observes and assesses the interactions and performance among team members. Provides positive feedback and intervenes with suggestions to improve groups’ processes. | |
| **Recorder** guides consensus building in the group by recording answers to questions. Collects important information and data. | |
## Activities
Make sure to have the activities signed off regularly to ensure progress is tracked.
Download the provided project and open it in CLion.
### Activity 1: The costs of maintaining order
18
### Activity 2: Calculate the moduli
| id | 8609 | 6348 | 4992 | 5839 | 1622 | 5450 | 7198 | 4827 |
| ---------- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| **mod 8** | 1 | 4 | 0 | 7 | 6 | 2 | 6 |3|
| **mod 16** | 1 | 12 | 0 | 15 | 6 | 10 | 11 | 11 |
| **mod 17** | 7 | 7 | 11 | 8 | 7 | 10 | 7 | 16 |
### Activity 3: Hash functions
Hash functie is een fucntie die maar een richting op kan. Een goede hash functie krijgt zelden de zelfde waarden uit verschillende invoer.
### Activity 4: A simple hash function
Omdat hier vaak de zelfde waarde uitkomt
### Activity 5: Implement FNV-1a
```c
int main(void) {
assert(fnv_hash("") == 0xcbf29ce484222325);
assert(fnv_hash("a") == 0xAF63DC4C8601EC8C);
assert(fnv_hash("ab") == 0x89C4407B545986A);
assert(fnv_hash("abc") == 0xE71FA2190541574B);
}
```
```c=
size_t fnv_hash(const char* key) {
const size_t FNV_PRIME = 0x00000100000001B3;
const size_t FNV_OFFSET_BASIS = 0xcbf29ce484222325;
size_t h = FNV_OFFSET_BASIS;
for (int i = 0; i < strlen(key); ++i) {
h = h^key[i];
h = h*FNV_PRIME;
}
return h;
}
```
### Activity 6: Implement hash-based indexing
```c
// returns the modulo index of a string key based on its FNV-1a hash
size_t hash_index_str(const char* key, size_t M);
int main(void) {
assert(hash_index_str("", 4201) == 2552);
assert(hash_index_str("a", 4201) == 3432);
assert(hash_index_str("ab", 4201) == 618);
assert(hash_index_str("abc", 4201) == 1350);
}
```
| key | index |
| ---------- | ----- |
| Alice | 1 |
| in | 9 |
| Wonderland | 22 |
| potion | 20 |
| tea | 6 |
| rabbit | 24 |
### Activity 7: Function pointers
```c
typedef void (*myfunc)(int);
void foo(int value) {
printf("The answer is %d\n", value);
}
void bar(myfunc f) {
f(42);
}
int main(void) {
myfunc fun = foo;
bar(fun);
}
```
Fun is een pointer naar een functie
### Activity 8: Hashing and equality testing
```c
int main(void) {
hashmap_t map;
map_init(&map, 7, equal_strings, hash_index_str);
assert(map.eq_fun("Alice", "Alice"));
assert(!map.eq_fun("Alice", "alice"));
assert(map.hash_fun("Alice", 31) == 1);
assert(map.hash_fun("alice", 31) == 28);
}
```
```c=
bool compare(const char* s1, const char* s2) {
// TODO: Activity 8
if (strcmp(s1, s2) == 0) {
return true;// match!
}
return false;
}
```
### Activity 9: Implement lookup
```c
size_t simple_hash(const char* key, size_t size) {
return strlen(key) % size;
}
int main(void) {
hashmap_t map;
map_init(&map, 7, compare, simple_hash);
list_prepend(&map.slots[5], make_pair("Alice", 42));
list_prepend(&map.slots[3], make_pair("Bob", 84));
list_prepend(&map.slots[5], make_pair("Bob Dobalina", 126));
list_prepend(&map.slots[5], make_pair("Cindy", 6));
kv_pair_t * p1 = map_lookup(&map, "Joe");
kv_pair_t * p2 = map_lookup(&map, "Alice");
kv_pair_t * p3 = map_lookup(&map, "Bob");
kv_pair_t * p4 = map_lookup(&map, "Bob Dobalina");
kv_pair_t * p5 = map_lookup(&map, "Cindy");
assert(p1 == NULL);
assert(p2 != NULL && p3 != NULL && p4 != NULL && p5 != NULL);
assert(strcmp(p2->key, "Alice") == 0);
assert(p2->value == 42);
assert(strcmp(p3->key, "Bob") == 0);
assert(p3->value == 84);
assert(strcmp(p4->key, "Bob Dobalina") == 0);
assert(p4->value == 126);
assert(strcmp(p5->key, "Cindy") == 0);
assert(p5->value == 6);
}
```
```c=
kv_pair_t *map_lookup(const hashmap_t *map, key_type key) {
(void) key;
(void) map;
// TODO: Activity 9 - return the key-value pair by pointer if it is present
// otherwise, return NULL
size_t index = map->hash_fun(key, map->capacity);
element_t linkedList = map->slots[index];
node_t *node = linkedList.head;
while(!map->eq_fun(node->kv_pair->key, key)){
if(node->next == NULL) return NULL;
node = node->next;
}
return node->kv_pair;
// steps:
// - compute index of the slot at which the key is stored
// - search the linked list for the key
// - if found, return the key-value pair, otherwise return NULL
return NULL;
}
```
### Activity 10: Implementation time
```c
size_t simple_hash(const char* key, size_t size) {
return strlen(key) % size;
}
int main(void) {
map_init(&map, 7, compare, simple_hash);
map_insert(&map, "Alice", 24);
assert(map.count == 1);
map_insert(&map, "Alice", 42);
assert(map.count == 1);
map_insert(&map, "Bob", 84);
map_insert(&map, "Bob Dobalina", 126);
map_insert(&map, "Cindy", 6);
assert(map.count == 4);
kv_pair_t * p2 = map_lookup(&map, "Alice");
kv_pair_t * p3 = map_lookup(&map, "Bob");
kv_pair_t * p4 = map_lookup(&map, "Bob Dobalina");
kv_pair_t * p5 = map_lookup(&map, "Cindy");
assert(p2 != NULL && p3 != NULL && p4 != NULL && p5 != NULL);
assert(strcmp(p2->key, "Alice") == 0);
assert(p2->value == 42);
assert(strcmp(p3->key, "Bob") == 0);
assert(p3->value == 84);
assert(strcmp(p4->key, "Bob Dobalina") == 0);
assert(p4->value == 126);
assert(strcmp(p5->key, "Cindy") == 0);
assert(p5->value == 6);
}
```
```c=
void map_remove(hashmap_t *map, key_type key) {
(void) key;
(void) map;
// TODO: Activity 10 - remove the key-value pair from the hashmap if it is present
// steps:
// - compute index of the slot at which the key is stored
// - search the linked list for the key
// - if found, remove the corresponding node from the list (see the list_remove function)
// - decrement the count of pairs in the map if something was removed
size_t index = map->hash_fun(key, map->capacity);
element_t *linkedList = &map->slots[index];
node_t *node = linkedList->head;
bool found = true;
while(!map->eq_fun(node->kv_pair->key, key)){
if(node->next == NULL) {
found = false;
break;
}
}
if(found){
list_remove(linkedList, node);
map->count --;
}
}
```
```c=
bool map_insert(hashmap_t *map, key_type key, value_type value) {
(void) map;
(void) key;
(void) value;
// TODO: Activity 10 - if the key is not in the hashmap, add the
// key-value pair.
// otherwise, update the current value with the given value
// steps:
// - compute index of the slot at which the key is stored
// - search the linked list for the key
// - update the value if the key was found, or
// - prepend a new pair to the list if it wasn't found (see the list_prepend function)
// - increment the count of pairs in the map (only if a new pair was added)
// - return true if a new pair was added, and false otherwise
size_t index = map->hash_fun(key, map->capacity);
element_t *linkedList = &map->slots[index];
node_t *node = linkedList->head;
if(node == NULL){
kv_pair_t *newPair = make_pair(key, value);
list_prepend(linkedList, newPair);
map->count++;
return true;
}
bool found = true;
while(!map->eq_fun(node->kv_pair->key, key)){
if(node->next == NULL) {
found = false;
break;
}
node = node->next;
}
if(found){
node->kv_pair->value = value;
}
else{
kv_pair_t *newPair = make_pair(key, value);
list_prepend(linkedList, newPair);
map->count++;
return true;
}
return false;
}
```
### Activity 11: Word counting
```c
int main(void) {
hashmap_t map;
map_init(&map, 31, compare, hash_index_str);
process_file("short.txt", &map);
map_print(&map);
map_destroy(&map);
}
```
| word | frequency |
| ------- | --------- |
| "ipsum" | 2 |
| "dolor" |4|
| "sit" |3|
| "amet" |3|
### Activity 12: Tracking load statistics
| Capacity | 4201 | 6101 | 15149 |
| ---------------------------- | ---- | ---- | ----- |
| Fraction of free slots | 0.44 |0.565645|0.793584|
| Load factor | 0.83 | 0.569907|0.229520|
| Slots with more than 1 items |861 |676|329|
| Length of longest chain |6 |5|4|
| Keys in longest chain |GIVE clearer meanwhile caterpillar stretching Queen|jogged already panting tunnel again|lot tasted called Are|
Capacity smaller than 200000 for which no slot has more than one word stored after processing "alice0.txt":
### Activity 13: Time complexity
| Operation | Time complexity |
| --------- | --------------- |
| Lookup | O(1) |
| Insert | O(1) |
| Remove | O(1) |
### Activity 14: Implement rehashing (optional)
```c
bool map_rehash(hashmap_t *original) {
// if no need to rehash, then do nothing
if (original->count * 10 < original->capacity * 7) return false;
// set new capacity so that load factor will become 0.5
size_t new_capacity = original->count * 2;
// initialize new dynamic array for storing new slots
list_t * new_slots = malloc(sizeof(list_t) * new_capacity);
if (new_slots == NULL) return false;
// TODO: Activity 14 - implement rehashing
// Deallocate now-empty array of slots
free(original->slots);
// Update hashmap to use new slots and new capacity
original->slots = new_slots;
original->capacity = new_capacity;
return true;
}
```
## Looking back
### What we've learnt
Formulate at least one lesson learned.
### What were the surprises
Fill in...
### What problems we've encountered
Fill in...
### What was or still is unclear
Fill in...
### How did the group perform?
How was the collaboration? What were the reasons for hick-ups? What worked well? What can be improved next time?
> Written with [StackEdit](https://stackedit.io/).