# Week 8 - Hashing
## Team
Team name: Soon(tm)
Date: 2022/05/11
Members
| Role | Name |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------|
| **Facilitator** keeps track of time, assigns tasks and makes sure all the group members are heard and that decisions are agreed upon. | Stefan |
| **Spokesperson** communicates group’s questions and problems to the teacher and talks to other teams; presents the group’s findings. | Adelina |
| **Reflector** observes and assesses the interactions and performance among team members. Provides positive feedback and intervenes with suggestions to improve groups’ processes. | Tautvydas |
| **Recorder** guides consensus building in the group by recording answers to questions. Collects important information and data. | Ines |
## Activities
### Activity 1: The costs of maintaining order
| Step / array[pos] | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | Moves |
| ---------- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| 1 Value | **8609** | null | null | null | null | null | null | null | 0 |
| 2 Values| **6348** | 8609 | null | null | null | null | null | null | 1 |
| 3 Values | **4992** | 6348 | 8609 | null | null | null | null | null | 2 |
| 4 Values | 4992 | **5839** | 6348 | 8609 | null | null | null | null | 2 |
| 5 Values | **1622** | 4992 | 5839 | 6348 | 8609 | null | null | null | 4 |
| 6 Values | 1622 | 4992 | **5450** | 5839 | 6348 | 8609 | null | null | 3 |
| 7 Values | 1622 | 4992 | 5450 | 5839 | 6348 | **7197** | 8609 | null | 1 |
| 8 Values | 1622 | **4827** | 4992 | 5450 | 5839 | 6348 | 7197 | 8609 | 6 |
| Final: | 1622 | 4827 | 4992 | 5450 | 5839 | 6348 | 7197 | 8609 | 19 |
### Activity 2: Calculate the moduli
| id | 8609 | 6348 | 4992 | 5839 | 1622 | 5450 | 7198 | 4827 |
| ---------- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| **mod 8** | 1 | 4 | 0 | 7 | 6 | 2 | 5 | 3 |
| **mod 16** | 1 | 12 | 0 | 15 | 6 | 10 | 13 | 11 |
| **mod 17** | 7 | 7 | 11 | 8 | 7 | 10 | 6 | 16 |
### Activity 3: Hash functions
A hash function is any function that can be used to map data of arbitrary size to fixed-size values.
The values returned by a hash function are called hash values, hash codes, digests, or simply hashes.
The values are usually used to index a fixed-size table called a hash table. Use of a hash function to
index a hash table is called hashing or scatter storage addressing.
Hash functions and their associated hash tables are used in data storage and retrieval applications to
access data in a small and nearly constant time per retrieval. They require an amount of storage space only
fractionally greater than the total space required for the data or records themselves.
A good hash function should have the following properties:
1.Efficiently computable.
2.Should uniformly distribute the keys (Each table position equally likely for each key)
Examples:
With modular hashing, the hash function is simply h(k) = k mod m for some m (usually, the number of buckets).
The value k is an integer hash code generated from the key. If m is a power of two (i.e., m=2p), then h(k) is just the p lowest-order bits of k. The SML/NJ
implementation of hash tables does modular hashing with m equal to a power of two. This is very fast but
the the client needs to design the hash function carefully.
Multiplicative hashing, in which the hash index is computed as m * frac(ka).
Here k is again an integer hash code, a is a real number and frac is the function that returns the fractional part of a real number.
Multiplicative hashing sets the hash index from the fractional part of multiplying k by a large real number.
### Activity 4: A simple hash function
Simple hash is not a good hash function because a hash function has the ability to map different keys to different values. And this "simple hash" function only returns the length of the string and therefore doesn't accomplish the goal of executing such.
### Activity 5: Implement FNV-1a
```c
size_t fnv_hash(const char* key) {
const size_t FNV_PRIME = 0x00000100000001B3;
const size_t FNV_OFFSET_BASIS = 0xcbf29ce484222325;
// TODO: Activity 5
size_t h = FNV_OFFSET_BASIS;
for (int i = 0; i < strlen(key); ++i)
{
h ^=key[i];
h = h*FNV_PRIME;
}
return h;
}
```
### Activity 6: Implement hash-based indexing
```c
// returns the modulo index of a string key based on its FNV-1a hash
size_t hash_index_str(const char* key, size_t size) {
size_t temp = fnv_hash(key);
temp = temp % size;
return temp;
}
```
| key | index |
| ---------- | ----- |
| Alice | 1 |
| in | 9 |
| Wonderland | 32 |
| potion | 20 |
| tea | 6 |
| rabbit | 24 |
### Activity 7: Function pointers
```c
typedef void (*myfunc)(int);
void foo(int value) {
printf("The answer is %d\n", value);
}
void bar(myfunc f) {
f(42);
}
int main(void) {
myfunc fun = foo;
bar(fun);
}
```
We point to the function bar, the first line of executable code which is the function foo. So we essentially use the function foo without using a copy of the value but we just point to the compiler where to use the value of 42.
### Activity 8: Hashing and equality testing
```c
/*int main(void) {
hashmap_t map;
map_init(&map, 7, equal_strings, hash_index_str);
assert(map.eq_fun("Alice", "Alice"));
assert(!map.eq_fun("Alice", "alice"));
assert(map.hash_fun("Alice", 31) == 1);
assert(map.hash_fun("alice", 31) == 28);
}*/
bool compare(const char* s1, const char* s2) {
// TODO: Activity 8
if(strcmp(s1,s2) == 0)
{
return true;
}
else{
return false;
}
}
```
### Activity 9: Implement lookup
```c
size_t simple_hash(const char* key, size_t size) {
return strlen(key) % size;
}
int main(void) {
hashmap_t map;
map_init(&map, 7, compare, simple_hash);
list_prepend(&map.slots[5], make_pair("Alice", 42));
list_prepend(&map.slots[3], make_pair("Bob", 84));
list_prepend(&map.slots[5], make_pair("Bob Dobalina", 126));
list_prepend(&map.slots[5], make_pair("Cindy", 6));
kv_pair_t * p1 = map_lookup(&map, "Joe");
kv_pair_t * p2 = map_lookup(&map, "Alice");
kv_pair_t * p3 = map_lookup(&map, "Bob");
kv_pair_t * p4 = map_lookup(&map, "Bob Dobalina");
kv_pair_t * p5 = map_lookup(&map, "Cindy");
assert(p1 == NULL);
assert(p2 != NULL && p3 != NUlL && p4 != NULL && p5 != NULL);
assert(strcmp(p2->key, "Alice") == 0);
assert(p2->value == 42);
assert(strcmp(p3->key, "Bob") == 0);
assert(p3->value == 84);
assert(strcmp(p4->key, "Bob Dobalina") == 0);
assert(p4->value == 126);
assert(strcmp(p5->key, "Cindy") == 0);
assert(p5->value == 6);
}
kv_pair_t *map_lookup(const hashmap_t *map, key_type key) {
(void) key;
(void) map;
int temp = map->hash_fun(key, map->capacity);
if (map->slots[temp].head != NULL)
{
node_t * next = map->slots[temp].head;
while(next != NULL)
{
if(map->eq_fun(next->kv_pair->key , key))
{
return next->kv_pair;
}
next = next->next;
}
}
return NULL;
}
```
### Activity 10: Implementation time
```c
bool map_insert(hashmap_t *map, key_type key, value_type value) {
(void) map;
(void) key;
(void) value;
int temp = map->hash_fun(key,map->capacity);
if(map->slots[temp].head != NULL)
{
node_t * next = map->slots[temp].head;
while (next != NULL)
{
if(map->eq_fun(next->kv_pair->key,key))
{
next->kv_pair->value = value;
return false;
}
next = next->next;
}
}
list_prepend(&map->slots[temp] , make_pair(key,value));
map->count++;
return true;
}
void map_remove(hashmap_t *map, key_type key) {
(void) key;
(void) map;
int temp = map->hash_fun(key,map->capacity);
if(map->slots[temp].head != NULL)
{
node_t * next = map->slots[temp].head;
while (next != NULL)
{
if(map->eq_fun(next->kv_pair->key,key))
{
list_remove(map->slots[temp].head ,next);
map->count--;
}
next = next->next;
}
}
}
```
### Activity 11: Word counting
```c
int main(void) {
hashmap_t map;
map_init(&map, 31, compare, hash_index_str);
process_file("short.txt", &map);
map_print(&map);
map_destroy(&map);
}
void process_fine(char* line, hashmap_t * map) {
const char* sep = "\t ()[]_,.;:?!/\\\"\n'";
char* token = strtok(line, sep);
while (token) {
if (strlen(token)) {
kv_pair_t * temp = map_lookup(map,token);
if(temp == NULL)
{
map_insert(map,token,1);
} else{
temp->value++;
}
}
token = strtok(NULL, sep);
}
}
```
| word | frequency |
| ------- | --------- |
| "ipsum" | 2 |
| "dolor" | 4 |
| "sit" | 3 |
| "amet" | 3 |
### Activity 12: Tracking load
| Capacity | 4201 | 6101 | 15149 |
| ---------------------------- | ---- | ---- | ----- |
| Fraction of free slots | 0.44 | 0.57 | 0.79 |
| Load factor | 0.83 | 0.57 | 0.23 |
| Slots with more than 1 items | 2345 | 2650 | 3127 |
| Length of longest chain | 6 | 5 | 4 |
| Keys in longest chain | nodes -> [GIVE: 1, clearer: 1, meanwhile: 1, caterpillar: 1, stretching: 1, Queen: 75] | nodes -> [jogged: 1, already: 3, panting: 2, tunnel: 1, again: 83] | nodes -> [lot: 1, tasted: 3, called: 15, Are: 5] |
Code:
```c
hashmap_t map;
map_init(&map, 15149, compare, hash_index_str);
process_file("alice0.txt", &map);
/*map_print(&map);*/
long slots_more_than_1 = 0;
for (unsigned long i = 0; i < map.capacity; ++i)
{
if(map.slots[i].head == NULL)
{
slots_more_than_1++;
}
}
printf("%0.2f\n",(double) slots_more_than_1/map.capacity);
printf("%0.2f\n",(double) map.count/map.capacity);
long number_of_chains = 0;
for (unsigned long i = 0; i < map.capacity; ++i)
{
if(list_length(&map.slots[i]) > 0)
{
number_of_chains++;
}
}
printf("%lu\n",number_of_chains);
unsigned long longest_chain_i = 0 , index_longest_chain = 0;
for (unsigned long i = 0; i <map.capacity ; ++i)
{
if(list_length(&map.slots[i]) > longest_chain_i)
{
longest_chain_i = list_length(&map.slots[i]);
index_longest_chain = i;
}
}
printf("%lu\n",longest_chain_i);
node_list_print(map.slots[index_longest_chain].head);
```
### Activity 13: Time complexity
| Operation | Time complexity |
| --------- | --------------- |
| Lookup | O(1) |
| Insert | O(1) |
| Remove | O(1) |
### Activity 14: Implement rehashing (optional)
```c
bool map_rehash(hashmap_t *original) {
// if no need to rehash, then do nothing
if (original->count * 10 < original->capacity * 7) return false;
// set new capacity so that load factor will become 0.5
size_t new_capacity = original->count * 2;
// initialize new dynamic array for storing new slots
list_t * new_slots = malloc(sizeof(list_t) * new_capacity);
if (new_slots == NULL) return false;
// TODO: Activity 14 - implement rehashing
// Deallocate now-empty array of slots
free(original->slots);
// Update hashmap to use new slots and new capacity
original->slots = new_slots;
original->capacity = new_capacity;
return true;
}
```
## Looking back
### What we've learnt
* Have a general understanding as to how a hash map works and their time complexities
### What were the surprises
*
### What problems we've encountered
* Disorganization due to upcoming exams and holidays, which put us off track.
### What was or still is unclear
*
*
### How did the group perform?
* Since the group is newly formed with a new member this week, we were only able to establish the way we are going to work. This week's assignments were mainly done by relying on the group members, who had more knowledge concerning the hash maps. Naturally, the ones that are lagging behind must step up, to alleviate the burden of the more advanced members.