# Week 8 - Hashing
## Team
Team name: Gappies van andere wappies
Date: 10-05-2022
Members
| Role | Name |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------|
| **Facilitator** keeps track of time, assigns tasks and makes sure all the group members are heard and that decisions are agreed upon. |Stephan|
| **Spokesperson** communicates group’s questions and problems to the teacher and talks to other teams; presents the group’s findings. |Ingmar|
| **Reflector** observes and assesses the interactions and performance among team members. Provides positive feedback and intervenes with suggestions to improve groups’ processes. |Ingmar|
| **Recorder** guides consensus building in the group by recording answers to questions. Collects important information and data. |Stijn|
## Activities
Make sure to have the activities signed off regularly to ensure progress is tracked.
Download the provided project and open it in CLion.
### Activity 1: The costs of maintaining order✅
19
### Activity 2: Calculate the moduli ✔️
| id | 8609 | 6348 | 4992 | 5839 | 1622 | 5450 | 7198 | 4827 |
| ---------- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| **mod 8** | 1 |4 |0 | 7 |6 | 2 | 6 | 3 |
| **mod 16** | 1 |12 |0 |15 |6 | 10 |14 | 11 |
| **mod 17** | 7 |7 |11 | 8 |7 | 10 | 7 | 16 |
### Activity 3: Hash functions✅
Hash functions are functions that transform keys into adresses, this can both be done to numbers and strings, the strings are first transformed by using the ASCII chart.
good: They are very efficient in terms of data storage, they only need a little more storage than the data it self.
bad: hash functions have the possibility of collision, this means say for example we have an array of 10, and the key is 100, or 1000 they will both be placed at position 0. Thus causing an collision
### Activity 4: A simple hash function✅
Because in the worst case, 21.19% of the words will have the same hash.
### Activity 5: Implement FNV-1a✅
```c
size_t fnv_hash(const char* key) {
const size_t FNV_PRIME = 0x00000100000001B3;
const size_t FNV_OFFSET_BASIS = 0xcbf29ce484222325;
unsigned long long i = 0;
size_t h = FNV_OFFSET_BASIS;
while (i != strlen(key)) {
h = h^key[i];
h = h * FNV_PRIME;
i++;
}
return h;
}
```
### Activity 6: Implement hash-based indexing✅
```c
// returns the modulo index of a string key based on its FNV-1a hash
size_t hash_index_str(const char* key, size_t size) {
return fnv_hash(key) % size;
}
```
| key | index |
| ---------- | ----- |
| Alice | 1 |
| in | 9 |
| Wonderland | 22 |
| potion | 20 |
| tea | 6 |
| rabbit | 24 |
### Activity 7: Function pointers✅
```c
typedef void (*myfunc)(int);
void foo(int value) {
printf("The answer is %d\n", value);
}
void bar(myfunc f) {
f(42);
}
int main(void) {
myfunc fun = foo;
bar(fun);
}
```
The variable fun represents an instance of the foo function as a variable, so it can be passed around to other functions.
The bar function takes in another function, and calls that function with the argument '42'
### Activity 8: Hashing and equality testing ✅
```c
bool compare(const char* s1, const char* s2) {
return strcmp(s1, s2) == 0;
}
```
### Activity 9: Implement lookup ✅
```c
kv_pair_t *map_lookup(const hashmap_t *map, key_type key) {
size_t index = map->hash_fun(key, map->capacity);
node_t * current = map->slots[index].head;
while (current != NULL) {
if (compare(current->kv_pair->key, key)) return current->kv_pair;
current = current->next;
}
return NULL;
}
```
### Activity 10: Implementation time ✅
```c
bool map_insert(hashmap_t *map, key_type key, value_type value) {
kv_pair_t * ptr = map_lookup(map, key);
if (ptr != NULL) {
ptr->value = value;
return false;
}
size_t index = map->hash_fun(key, map->capacity);
list_prepend(&map->slots[index], make_pair(key, value));
map->count++;
return true;
}
void map_remove(hashmap_t *map, key_type key) {
(void) key;
(void) map;
kv_pair_t * ptr = map_lookup(map, key);
if (ptr != NULL) {
size_t index = map->hash_fun(key, map->capacity);
node_t * node = map->slots[index].head;
while (node->kv_pair->key != key) {
node = node->next;
}
list_remove(&map->slots[index], node);
map->count--;
}
}
```
### Activity 11: Word counting ✅
```c
void process_line(char* line, hashmap_t * map) {
const char* sep = "\t ()[]_,.;:?!/\\\"\n'";
char* token = strtok(line, sep);
while (token) {
if (strlen(token)) {
kv_pair_t * kvPair = map_lookup(map, token);
if (kvPair == NULL) {
map_insert(map, token, 1);
} else {
map_insert(map, token, kvPair->value + 1);
}
}
token = strtok(NULL, sep);
}
}
```
| word | frequency |
| ------- | --------- |
| "ipsum" | 2 |
| "dolor" | 4 |
| "sit" | 3 |
| "amet" | 3 |
### Activity 12: Tracking load statistics ✅
| Capacity | 4201 | 6101 | 15149 |
| ---------------------------- | ---- | ---- | ----- |
| Fraction of free slots | 0.44 | 0.57 | 0.79 |
| Load factor | 0.83 | 0.57 | 0.23 |
| Slots with more than 1 items | 861 | 676 | 329 |
| Length of longest chain | 6 | 5 | 4 |
| Keys in longest chain | [GIVE: 1, clearer: 1, meanwhile: 1, caterpillar: 1, stretching: 1, Queen: 75] | [jogged: 1, already: 3, panting: 2, tunnel: 1, again: 83] | [lot: 1, tasted: 3, called: 15, Are: 5] |
Capacity smaller than 200000 for which no slot has more than one word stored after processing "alice0.txt":
### Activity 13: Time complexity ✅
| Operation | Time complexity |
| --------- | --------------- |
| Lookup | O(1) |
| Insert | O(1) |
| Remove | O(1) |
### Activity 14: Implement rehashing (optional)
```c
bool map_rehash(hashmap_t *original) {
// if no need to rehash, then do nothing
if (original->count * 10 < original->capacity * 7) return false;
// set new capacity so that load factor will become 0.5
size_t new_capacity = original->count * 2;
// initialize new dynamic array for storing new slots
list_t * new_slots = malloc(sizeof(list_t) * new_capacity);
if (new_slots == NULL) return false;
// TODO: Activity 14 - implement rehashing
// Deallocate now-empty array of slots
free(original->slots);
// Update hashmap to use new slots and new capacity
original->slots = new_slots;
original->capacity = new_capacity;
return true;
}
```
## Looking back
### What we've learnt
how to use hash maps and where we can implement them(in real life)
### What were the surprises
that they were a lot of advantages, but still some major disadvantages
### What problems we've encountered
That there were more collisions than we would have liked to
### What was or still is unclear
We asked the things which were unclear in class
### How did the group perform?
perfect, we divided the tasks as usual