# Week 3 - Sorting and Searching
## Team
Team name:Group 6
Date:05-03-2021
Members
| Role | Name |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------|
| **Facilitator** keeps track of time, assigns tasks and makes sure all the group members are heard and that decisions are agreed upon. | Thijs Bish |
| **Spokesperson** communicates group’s questions and problems to the teacher and talks to other teams; presents the group’s findings. | Thijs Bish |
| **Reflector** observes and assesses the interactions and performance among team members. Provides positive feedback and intervenes with suggestions to improve groups’ processes. | Sebastian Wojciechowski |
| **Recorder** guides consensus building in the group by recording answers to questions. Collects important information and data. | IJeoma Oduche |
## Activities
### Activity 1: Purpose of sorted lists
A high level of certainty.
In short, searching in an unsorted array takes O(n) time: you potentially have to look at every item to find out if what you're looking for is there. A sorted array lets you speed up the search. Instead of having to examine every item, you only have to examine at most log2(n) items. That makes a huge difference when numbers get large.
### Activity 2: Explanation of selection sort
https://www.youtube.com/watch?v=pcJHkWwjNl4
### Activity 3: Performance of selection sort
O(n^2), to add that none of the loops depend on the data in the array.
### Activity 4: Implement selection sort
```c=
void selection_sort(Array *pArray)
{
int i, j;
for(i=0; i<pArray->count; i++)
{
for(j=i+1; j<pArray->count; j++)
{
if(compare(&pArray->values[i], &pArray->values[j]) > 0 )
array_swap(pArray, i, j);
}
}
}
```
### Activity 5: Explanation of merge sort
https://www.youtube.com/watch?v=4VqmGXwpLqc&ab_channel=MichaelSambol
### Activity 6: Merge sort - step by step
[9] [3] [0] [17] [12] [1] [1] [5] [2] [10] [8] [4] [9] [3] [2]
[3|9] [0|17] [1|12] [1|5] [2|10] [4|8] [3|9] [2|-]
[0|3|9|17] [1|1|5|12] [2|4|8|10] [2|3|9|-]
[0|1|1|3|5|9|12|17] [2|2|3|4|8|9|10|-]
[0|1|1|2|2|3|3|4|5|8|9|9|10|12|17|-]
### Activity 7: Implement the merge function
```c=
int merge(Array *pArray, Data *pTarget, int iA, int nA)
{
int indexA = iA, indexB = iA + nA;
int nB = nA;
if (pArray->count < indexB + nB)
nB = pArray->count - indexB;
int target_idx = indexA;
while (indexA < iA + nA && indexB < iA + nA + nB)
{
// compare pA[indexA] and pB[indexB]
if (compare(&pArray->values[indexA], &pArray->values[indexB]) > 0)
THIS PART OF CODE NEEDED TO BE DONE BY US - IT WORKS
DELETE THIS PART BEFORE SUBMITTING
------------------------------------------------------
{
// write value from first array to target
pTarget[target_idx] = pArray->values[indexB];
indexB++;
}
else
{
// write value from second array to target
pTarget[target_idx] = pArray->values[indexA];
indexA++;
}
------------------------------------------------------------
target_idx++;
}
// copy remaining elements of the first array
while (indexA < iA + nA)
pTarget[target_idx++] = pArray->values[indexA++];
// copy remaining elements of the second array
while (indexB < iA + nA + nB)
pTarget[target_idx++] = pArray->values[indexB++];
// return size of merged array
return nA + nB;
}
```
### Activity 8: Implement the divide-and-conquer algorithm
```c=
void mergeSort(Array *pArray)
{
// initially, each element is a single array
printf("count:%d\n", pArray->count);
int numArrays = pArray->count;
int arraySize = 1;
// create a temporary space for holding the merged arrays
Data *pWork = malloc(sizeof(Data) * pArray->capacity);
printf("numArrays:%d\n", numArrays);
// merge the arrays until one sorted array is left
while (numArrays > 1)
{
// compute the number of pairs
const int numPairs = numArrays / 2;
// number of elements remaining to be sorted
int remaining = pArray->count;
// merge each pair of consecutive (sub)arrays into a single array
for (int i = 0; i < numPairs; i++)
{
THIS PART OF CODE NEEDED TO BE DONE BY US - IT WORKS
DELETE THIS PART BEFORE SUBMITTING
--------------------------------------------------
merge(pArray, pWork, 2 * i * arraySize, arraySize);
remaining -= arraySize*2; // decrease the number of remaining elements by the number of ←-merged elements
// call merge(pArray, pWork, 2 * i * arraySize, arraySize) to ←- merge the two sorted (sub)arrays into a sorted larger array
--------------------------------------------------
}
// copy the unmerged values
merge(pArray, pWork, 2 * numPairs * arraySize, remaining);
// update numArrays: divide by 2 and round up
numArrays = (numArrays + 1) / 2;
// double the arraySize
arraySize = arraySize * 2;
// swap working space with array
Data *tmp = pArray->values;
pArray->values = pWork;
pWork = tmp;
}
// free the working space
free(pWork);
}
```
### Activity 9: Test merge sort
```c=
Array array;
int sorted_ok;
array_init(&array, 10);
if (!parse_csv(&array, "data_10.csv"))
{
return 1;
}
printf("%d records read\n", array.count);
// TODO: sort the array, based on some criterion, using either selection sort or merge sort
//selection_sort(&array);
mergeSort(&array);
sorted_ok = 1;
printf("Contents after sorting:\n");
printf("[0] ");
data_print(&array.values[0]);
for (int i = 1; i < array.count; i++)
{
printf("[%d] ", i);
data_print(&array.values[i]);
if (compare(&array.values[i - 1], &array.values[i]) > 0)
sorted_ok = 0;
}
if (sorted_ok)
printf("\nData is sorted correctly!\n");
else
printf("\nData is *NOT* sorted correctly!\n");
```
Result of this test is a success
### Activity 10: Binary search - step by step
| Step | Search range | Middle element | Discarded range |
| ---- | ------------ | ---------------| ----------------|
| 1 | [0 .. 22) | 11 ----------- 20| [0..11] |
|2|[12..22)|17 ----------- 55| [17..22]|
|3|[12..16)|14 ----------- 41| [12..14]|
|4|[15..16]|16 ----------- 50| [16]|
|5|[15]|15 ----------- 42| [15]
### Activity 11: Implement binary search
```c
int binarySearch(const Array *pArray, const Data *pValue)
{
int lpos = 0;
int rpos = pArray->count;
while (lpos < rpos)
{
// search the interval [lpos, rpos)
// (lpos is included in the interval, rpos is not)
// determine mid position in [lpos, rpos)
int mid = (lpos + rpos) / 2;
// compare element at middle position against value searched for
int ordering = compare(&pArray->values[mid], pValue);
THIS PART OF CODE NEEDED TO BE DONE BY US - IT WORKS
DELETE THIS PART BEFORE SUBMITTING
------------------------------------
if (ordering > 0)
{
rpos = mid;
// must continue search in interval [lpos, mid)
}
else if (ordering < 0)
{
// must continue search in interval [mid + 1, rpos)
lpos = mid + 1;
}
--------------------------------------
else
{
// value is found at index mid!
return mid;
}
}
// we didn't find the value, so stop and indicate
// to caller by returning -1.
return -1;
}
```
### Activity 12: Test binary search
```c
Data data;
printf("Enter popularity score for searching:");
scanf("%d", &data.popularity);
int position = binarySearch(&array, &data);
if (position!=-1)
{
printf("The element is found on index: %d\n ", position) ;
}
else
{
printf("The element doesn't exist\n") ;
}
```
### Activity 13: Time complexity
Identifying findmax is using bubble sort and findmax2 is using selection sort.
FindMax (Bubble Sort) scales quite great as it uses more swapping times. Hence it scales in a much better way than relative as the worst case of the time complexity would be O(n^2)
FindMax2 (Selection Sort) the scalability is not that great as it first finds the minimum element of all and place it at index 0 of elements. It scales poorly and the time complexity would be O(n^2). Please note that even the best case scenario selection sort is still O(n^2).
### Activity 14: Compare merge sort and selection sort
(times per sort)
| input size| merge sort | selection sort |
| -------- | -------- | ------------ |
| 10 | 0.17ms | 0.0ms |
| 1000 | 0.34ms | 4.8ms |
| 5000 | 1.28ms | 103.9ms |
| 10000 | 2.02ms | 332.31ms |
| 25000 | 11.28ms | 1880.55ms |
| 50000 | 27.4ms | 9902.09ms |
| 100000 | 50.96ms | 44590.81ms |
### Activity 15: Compare naive search and binary search
Linear search is more efficient in comparison of the functions. However, the time complexities change when applied to their corresonspondant data structure. Binary search is operates better in an array with the time complexity of O(log(n)) and linear search will be better in a linked list O(n). Due to the fact that binary search requires random access to the data and linear search requires sequential access.
## Look back
### What we've learnt
Time complexity of different sorting algorithms, which is better in what scenarios.
How fast they are, as well the general O(n) understanding and diving deeper into that. As well the unexpected timings of of binary search.
### What were the surprises
Not many surprises initially, Binary search was a lot faster than expected. Errors that could not be spotted in our code even though we used a debugging tool.(Stuck in console)
### What problems we've encountered
Activity 14, our code was not giving bugs, however, it was stuck on the console. (Selection sorting function was incorrect).Merge sorting worked well, we attempted to debug (watches). However, still faced difficulties diagnosing the problem that was incorrect in our functions. Finally we figured out our two main functions
### What was or still is unclear
How one sorting algorithm succeeded whilst the other failed. Figuring the problem behind the code is unclear since it was only stuck in console and the debugger and build were not giving off any errors we chose to rewrite both functions completely.
### How did the group perform?
It was great, everyone was always supoer postive and putting their foot-forward, trying to make the work load as calming as possible. Everyone was naturally in sync. There were no hickups it was just that our time schedules didn't correspond which we as a group just worked around and respected everyone's time. Everything worked well, from coding to giving our own input as well as the teamwork collectiveness. Hopefully a compensation from other classes that students have to work together for Friday.