# Week 7 - Hashing
## Team
Team name: Group
Date: 29-03-2021
Members
| Role | Name |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------|
| **Facilitator** keeps track of time, assigns tasks and makes sure all the group members are heard and that decisions are agreed upon. | Cas |
| **Spokesperson** communicates group’s questions and problems to the teacher and talks to other teams; presents the group’s findings. | Cas |
| **Reflector** observes and assesses the interactions and performance among team members. Provides positive feedback and intervenes with suggestions to improve groups’ processes. | Khan |
| **Recorder** guides consensus building in the group by recording answers to questions. Collects important information and data. | Andy |
## Activities
### Activity 1: Find out: cost of maintaining order
*How many times a relocation of items in this array will be needed if the elements are being
inserted with ids in the following order: [8609, 6348, 4992, 5839, 1622, 5450, 7197, 4827]? How
many RecordPairs will be moved in total during this process?*
Answer: The program will have to insert an array 7 times and will have to move 17 Record pairs in total.
### Activity 2: Making numbers fit
*What kind of transformations can you think of that can make the array needed to store elements
with ids [8609, 6348, 4992, 5839, 1622, 5450, 7197, 4827] smaller? *
Answer: Modulo, square root
*What is the time complexity if the insertion, removal and lookup operations of a map, where keys are transformed into indices directly?*
Answer: I think it is O(n).
### Activity 3: Find out: calculate the moduli
| ID | 8609 | 6348 |4992|5839|1622|5450|7197|4827|
| - | - | -|-|-| - | -|-|-|
| Mod 8 | 1 | 4 |0|7|6|2|5|3|
|Mod 16|1 |12 |0|15|6|10|13|11|
| Mod 31| 22| 24|1|11|10|25|5|22|
What can you say about the results? Can indices obtained in this way
be used for storing the elements with those ids in an array?
Answer: Most of them can be stored in this way but some of them will have the same ID, so that's not going to work.
### Activity 4: Find out: hash functions
A hash function is any function that can be used to map data of arbitrary size to fixed-size values.
Properties of Hash Functions:
1) Pre-Image Resistance
- This property means that it should be computationally hard to reverse a hash function.
- In other words, if a hash function h produced a hash value z, then it should be a difficult process to find any input value x that hashes to z.
- This property protects against an attacker who only has a hash value and is trying to find the input.
2) Second Pre-Image Resistance
- This property means given an input and its hash, it should be hard to find a different input with the same hash.
- In other words, if a hash function h for an input x produces hash value h(x), then it should be difficult to find any other input value y such that h(y) = h(x).
- This property of hash function protects against an attacker who has an input value and its hash, and wants to substitute different value as legitimate value in place of original input value.
3) Collision Resistance
- This property means it should be hard to find two different inputs of any length that result in the same hash. This property is also referred to as collision free hash function.
- In other words, for a hash function h, it is hard to find any two different inputs x and y such that h(x) = h(y).
- Since, hash function is compressing function with fixed hash length, it is impossible for a hash function not to have collisions. This property of collision free only confirms that these collisions should be hard to find.
- This property makes it very difficult for an attacker to find two input values with the same hash.
- Also, if a hash function is collision-resistant then it is second pre-image resistant.
*Give a few examples of hash functions*
Message Digest (MD)
Secure Hash Function (SHA)
RIPEMD
Whirlpool
### Activity 5: Implement FNV-1a
```c=
hash_t hash_fnv_gen(const void* data_, size_t size){
hash_t hash = FNV_OFFSET_BASIS;
const unsigned char* data = data_;
for(size_t i = 0; i < size; i++)
{
hash = hash ^ data[i];
hash = FNV_PRIME;
}
return 0;
}
hash_t hash_fnv_str(const char* str){
while(1)
{
static int i = 0;
if (str[i] == '\0')
break;
i++;
}
return 0;
}
```
### Activity 6: Implement hash-based indexing
```c=
size_t hash_index_gen(const void* data, size_t size, size_t M){
hash_t hash = hash_fnv_gen(data, size);
static int index;
for (size_t i = 0; i < size; i++)
{
index = hash % M;
}
return index;
}
size_t hash_index_str(const char* str, size_t M){
static int index;
while(1)
{
int count = 0;
if(str[count] != '\0')
{
index = str[count] % M;
break;
}
count++;
}
return index;
}
```
### Activity 7: Word counting
### Activity 8: Hash collisions
### Activity 9: Activity
## Look back
### What we've learnt?
In this week's activities we have learnt what a hash function is and we're able to name properties of a good hash function. We can also explain how a hashmap works and discuss its time complexity.
### What were the surprises?
In the worst case performace, a hashMap reduces to a linkedList.
### What problems we've encountered?
Finishing the activities.
### What was or still is unclear?
Last 3 activities
### How did the group perform?
The group performed well, but the communication between certain team members could have been better.