# Week 8 - Hashing
## Team
Team name: Error
Date: 29/04/2022
Members
| Role | Name |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------|
| **Facilitator** keeps track of time, assigns tasks and makes sure all the group members are heard and that decisions are agreed upon. |Thanh 516467 |
| **Spokesperson** communicates group’s questions and problems to the teacher and talks to other teams; presents the group’s findings. |Kaloyan 511216 |
| **Reflector** observes and assesses the interactions and performance among team members. Provides positive feedback and intervenes with suggestions to improve groups’ processes. |Minh 511907 |
| **Recorder** guides consensus building in the group by recording answers to questions. Collects important information and data. |Hoa 487272 |
## Activities
Make sure to have the activities signed off regularly to ensure progress is tracked.
Download the provided project and open it in CLion.
### Activity 1: The costs of maintaining order
19 record_pair items will need to be moved
### Activity 2: Calculate the moduli
| id | 8609 | 6348 | 4992 | 5839 | 1622 | 5450 | 7198 | 4827 |
| ---------- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| **mod 8** | 1 | 4 | 0 |7 |6 |2 | 6 |7 |
| **mod 16** | 1 | 12 | 0 |15 |6 |10 |14 |15 |
| **mod 17** | 7 | 7 | 11 |8 |7 |10 |7 |8 |
The modula operation has converted the IDs into a smaller value which can be used to obtain an indice to store value to the array.
### Activity 3: Hash functions
Hash functions are used as a technique to map a big number or string to a small integer that can be used as the index in the hash table.
**Properties of a hash function:**
- Generate different hash values for the similar strings
- The hash is much smaller than the input data
- Efficiency in operation: Computationally hash functions are much faster than original one
- Should minimize the number of collisions
**A few examples:**
- In a project, the project manager assign a unique ID number to team members. Then, the manager can use that ID number to retrieve information about that team member.
- In a library, each book will be assigned a unique code. Then, this code is used to retrieve the status and position of the book.
**A good hash function needs to have:**
- Efficiently computable
- Should uniformly distribute the data across the entire set of possible hash values
- Should minimize the number of collisions
- It uses all the input data
### Activity 4: A simple hash function
The simple_hash is not a good hash function because it does not create the unique value for every input.
### Activity 5: Implement FNV-1a
```c
size_t fnv_hash(const char* key) {
size_t FNV_PRIME = 0x00000100000001B3;
size_t FNV_OFFSET_BASIS = 0xcbf29ce484222325;
// TODO: Activity 5
while(*(key)!='\0'){
FNV_OFFSET_BASIS=FNV_OFFSET_BASIS^(*key);
FNV_OFFSET_BASIS=FNV_OFFSET_BASIS*FNV_PRIME;
key++;
}
size_t h = FNV_OFFSET_BASIS;
return h;
}
int main(void) {
printf("FNV: 0x%llx\n",fnv_hash(""));
printf("FNV: 0x%llx\n",fnv_hash("a"));
printf("FNV: 0x%llx\n",fnv_hash("ab"));
printf("FNV: 0x%llx\n",fnv_hash("abc"));
assert(fnv_hash("") == 0xcbf29ce484222325);
assert(fnv_hash("a") == 0xAF63DC4C8601EC8C);
assert(fnv_hash("ab") == 0x89C4407B545986A);
assert(fnv_hash("abc") == 0xE71FA2190541574B);
}
```
### Activity 6: Implement hash-based indexing
```c
size_t hash_index_str(const char* key, size_t size) {
// TODO: Activity 6
size_t hash=fnv_hash(&key[0]);
return hash%size;
}
int main(void) {
assert(hash_index_str("", 4201) == 2552);
assert(hash_index_str("a", 4201) == 3432);
assert(hash_index_str("ab", 4201) == 618);
assert(hash_index_str("abc", 4201) == 1350);
}
```
| key | index |
| ---------- | ----- |
| Alice | 1 |
| in | 9 |
| Wonderland | 22 |
| potion | 20 |
| tea | 6 |
| rabbit | 24 |
### Activity 7: Function pointers
```c
typedef void (*myfunc)(int);
void foo(int value) {
printf("The answer is %d\n", value);
}
void bar(myfunc f) {
f(42);
}
int main(void) {
myfunc fun = foo;
bar(fun);
}
```
So the program will print “The answer is 42” in the console. The variable fun contains the address of function “foo”. The function bar will evoke the function “foo” to print the statement.
### Activity 8: Hashing and equality testing
```c
bool equal_strings(const char *s1, const char *s2) {
// TODO: Activity 8
if (strcmp(s1,s2)==0) return true;
return false;
}
int main(void) {
hashmap_t map;
map_init(&map, 7, equal_strings, hash_index_str);
assert(map.eq_fun("Alice", "Alice"));
assert(!map.eq_fun("Alice", "alice"));
assert(map.hash_fun("Alice", 31) == 1);
assert(map.hash_fun("alice", 31) == 28);
}
```
### Activity 9: Implement lookup
```c
size_t simple_hash(const char* key, size_t size) {
return strlen(key) % size;
}
kv_pair_t *map_lookup(const hashmap_t *map, key_type key) {
// TODO: Activity 9 - return the key-value pair by pointer if it is present
// otherwise, return NULL
size_t pos = simple_hash(key, map->capacity);
node_t *list = map->slots[pos].head;
node_t *temp = list;
while (temp) {
if (equal_strings(temp->kv_pair->key, key)) {
return temp->kv_pair;
}
temp = temp->next;
}
return NULL;
}
int main(void) {
hashmap_t map;
map_init(&map, 7, compare, simple_hash);
list_prepend(&map.slots[5], make_pair("Alice", 42));
list_prepend(&map.slots[3], make_pair("Bob", 84));
list_prepend(&map.slots[5], make_pair("Bob Dobalina", 126));
list_prepend(&map.slots[5], make_pair("Cindy", 6));
kv_pair_t * p1 = map_lookup(&map, "Joe");
kv_pair_t * p2 = map_lookup(&map, "Alice");
kv_pair_t * p3 = map_lookup(&map, "Bob");
kv_pair_t * p4 = map_lookup(&map, "Bob Dobalina");
kv_pair_t * p5 = map_lookup(&map, "Cindy");
assert(p1 == NULL);
assert(p2 != NULL && p3 != NUlL && p4 != NULL && p5 != NULL);
assert(strcmp(p2->key, "Alice") == 0);
assert(p2->value == 42);
assert(strcmp(p3->key, "Bob") == 0);
assert(p3->value == 84);
assert(strcmp(p4->key, "Bob Dobalina") == 0);
assert(p4->value == 126);
assert(strcmp(p5->key, "Cindy") == 0);
assert(p5->value == 6);
}
```
### Activity 10: Implementation time
```c
size_t simple_hash(const char* key, size_t size) {
return strlen(key) % size;
}
bool map_insert(hashmap_t *map, key_type key, value_type value) {
// TODO: Activity 10 - if the key is not in the hashmap, add the
// key-value pair.
// otherwise, update the current value with the given value
kv_pair_t *temp = map_lookup(map, key);
size_t index = simple_hash(key, map->capacity);
if (temp == NULL) {
list_prepend(&map->slots[index], make_pair(key, value));
map->count++;
return true;
} else {
map->slots[index].head->kv_pair->value = value;
}
return false;
}
void map_remove(hashmap_t *map, key_type key) {
// TODO: Activity 10 - remove the key-value pair from the hashmap if it is present
size_t index = simple_hash(key, map->capacity);
node_t *list = map->slots[index].head;
node_t *temp = list;
while (temp) {
if (equal_strings(temp->kv_pair->key, key)) {
list_remove(map->slots, temp);
map->count--;
}
temp = temp->next;
}
}
int main(void) {
map_init(&map, 7, compare, simple_hash);
map_insert(&map, "Alice", 24);
assert(map.count == 1);
map_insert(&map, "Alice", 42);
assert(map.count == 1);
map_insert(&map, "Bob", 84);
map_insert(&map, "Bob Dobalina", 126);
map_insert(&map, "Cindy", 6);
assert(map.count == 4);
kv_pair_t * p2 = map_lookup(&map, "Alice");
kv_pair_t * p3 = map_lookup(&map, "Bob");
kv_pair_t * p4 = map_lookup(&map, "Bob Dobalina");
kv_pair_t * p5 = map_lookup(&map, "Cindy");
assert(p2 != NULL && p3 != NUlL && p4 != NULL && p5 != NULL);
assert(strcmp(p2->key, "Alice") == 0);
assert(p2->value == 42);
assert(strcmp(p3->key, "Bob") == 0);
assert(p3->value == 84);
assert(strcmp(p4->key, "Bob Dobalina") == 0);
assert(p4->value == 126);
assert(strcmp(p5->key, "Cindy") == 0);
assert(p5->value == 6);
}
```
### Activity 11: Word counting
```c
void process_line(char* line, hashmap_t * map) {
const char *sep = "\t ()[]_,.;:?!/\\\"\n'";
char *token = strtok(line, sep);
while (token) {
if (strlen(token)) {
// TODO: Acitivity 11 - update the frequency of the word contained in 'token'
// - use the map_lookup and the map_insert functions
kv_pair_t *temp = map_lookup(map, token);
if (temp == NULL) {
map_insert(map, token, map->count);
}
}
token = strtok(NULL, sep);
}
}
int main(void) {
hashmap_t map;
map_init(&map, 31, compare, hash_index_str);
process_file("short.txt", &map);
map_print(&map);
map_destroy(&map);
}
```
| word | frequency |
| ------- | --------- |
| "ipsum" | 1 |
| "dolor" | 2 |
| "sit" | 3 |
| "amet" | 4 |
### Activity 12: Tracking load statistics
| Capacity | 4201 | 6101 | 15149 |
| ---------------------------- | ---- | ---- | ----- |
| Fraction of free slots | 0.44 | | |
| Load factor | 0.83 | | |
| Slots with more than 1 items | | | |
| Length of longest chain | | | |
| Keys in longest chain | | | |
Capacity smaller than 200000 for which no slot has more than one word stored after processing "alice0.txt":
### Activity 13: Time complexity
| Operation | Time complexity |
| --------- | --------------- |
| Lookup | O(1) |
| Insert | O(n) |
| Remove | O(n) |
### Activity 14: Implement rehashing (optional)
```c
bool map_rehash(hashmap_t *original) {
// if no need to rehash, then do nothing
if (original->count * 10 < original->capacity * 7) return false;
// set new capacity so that load factor will become 0.5
size_t new_capacity = original->count * 2;
// initialize new dynamic array for storing new slots
list_t * new_slots = malloc(sizeof(list_t) * new_capacity);
if (new_slots == NULL) return false;
// TODO: Activity 14 - implement rehashing
// Deallocate now-empty array of slots
free(original->slots);
// Update hashmap to use new slots and new capacity
original->slots = new_slots;
original->capacity = new_capacity;
return true;
}
```
## Looking back
### What we've learnt
- Concept of hash function
- Recognize what is a good hash function
- Understand how hashmap works and its time complexity
- Be able to implement a simple hashmap in C
### What were the surprises
Nothing
### What problems we've encountered
Problem with Activity 12
### What was or still is unclear
Activity 12
### How did the group perform?
Our group work perfectly as usual.
> Written with [StackEdit](https://stackedit.io/).