Week 3 - Sorting and Searching

# Week 3 - Sorting and Searching ## Team Team name: **Group [497441]** Date: 01/03/2021 Members: Max Thielen | Role | Name | | - | - | | **Facilitator** keeps track of time, assigns tasks and makes sure all the group members are heard and that decisions are agreed upon. | Max Thielen | | **Spokesperson** communicates group’s questions and problems to the teacher and talks to other teams; presents the group’s findings. | Max Thielen | | **Reflector** observes and assesses the interactions and performance among team members. Provides positive feedback and intervenes with suggestions to improve groups’ processes. | Max Thielen | | **Recorder** guides consensus building in the group by recording answers to questions. Collects important information and data. | Max Thielen | ## Activities ### Activity 1: Purpose of sorted lists A sorting algorithm is a very usefull function in a datastructure toolset since it: *1.* makes the data more readable and comprehendable to the user *2.* allows for easier traversing through the dataset for the computer (easier searching) For example, the **merge** and **sorted_insert** functions from last weeks Linked List activities required sorted lists as inputs in order to properly complete their resepective tasks. Another good example, is the **filing** done by a computer's OS. In order to make it easier on the user to find files stored on the hard drive the OS is able to sort the files and folders based on the users prefeance for instance, name or date last accessed. ### Activity 2: Explanation of selection sort Selection sort splits the array into two sections, first being the sorted side which to start with is empty and second being the given unsorted array. From there a varible scannes through the array searching for the lowest number which is then swapped with the first index in the array. Next, the size of the sorted side of the array is increased by one and the size of the unsorted side of the array decreased by one. The scan is repeated until the unsorted array is all take up by the sorted one comfirming that the selection sort function is complete. * *Source: https://www.youtube.com/watch?v=xWBP4lzkoyM&feature=youtu.be* ### Activity 3: Performance of selection sort Since selection sort requires the computer to scan the array as many times as there are elements in the array the run time of the function increases exponentially. This means that for an array of size n versus 2n or 4n the time needed would increase to 4 or 16 times as long. For a linked list the time would most likly increase even more since the swap function is slower due to the iteration required to optain the nodes needed for the swap. ### Activity 4: Implement selection sort ```c= typedef struct Data_Spotify{ char artists[80]; char name[80]; int popularity; // 0..100 float danceability; // 0..1 float energy; // 0..1 int year; // 1921.. float tempo; // (BPM) char id[32]; } Data; int compare_popularity(const Data *pFirst, const Date *pSecond){ if (pFirst->popularity > pSecond->popularity){ // pFirst should come after pSecond return 1; } else if (pFirst->popularity < pSecond->popularity){ // pSecond should come after pFirst return -1; } else{ // no need to change the order return 0; } } // sorts the array using selection sort void selection_sort(Array *pArray) { int i, j, MIN_IDX; for(i=0; i<pArray->count-1; i++){ MIN_IDX = i; for(j=i+1; j<pArray->count; j++){ if(compare(&pArray->values[MIN_IDX], &pArray->values[j])==-1){ MIN_IDX = j; } } array_swap(pArray, MIN_IDX, i); } } ``` ### Activity 5: Explanation of merge sort Merge sort implements a divide an conquer technice to bring order to a data structure. By splitting the array into individual pieces and then slowly putting it back together first making multible size two ordered arrays then moving to four and continuing to double the size until the original size is restored. This approach naturally produces less steps than the select sort. * *Source: https://www.youtube.com/watch?v=4VqmGXwpLqc&feature=youtu.be* ### Activity 6: Merge sort - step by step ![](https://i.imgur.com/U5yyaGQ.png) ### Activity 7: Implement the merge function ```c= // merges the two adjacent sub-arrays, which start at indices // iA and (iA + nA), and have lengths at most nA, // into the larger array pTarget. // returns the number of merged elements. int merge(Array *pArray, Data *pSpace, int iA, int nA) { int indexA = iA, indexB = iA + nA; int nB = nA; if (pArray->count < indexB + nB){ nB = pArray->count - indexB; } int target_idx = indexA; while (indexA < iA + nA && indexB < iA + nA + nB){ // compare pA[indexA] and pB[indexB] if (compare(&pArray->values[indexA], &pArray->values[indexB]) > 0) { // write value from first array to target pSpace[target_idx] = pArray->values[indexA]; ++indexA; } else { // write value from second array to target pSpace[target_idx] = pArray->values[indexB]; ++indexB; } target_idx++; } // copy remaining elements of the first array while (indexA < iA + nA){ pSpace[target_idx++] = pArray->values[indexA++]; } // copy remaining elements of the second array while (indexB < iA + nA + nB){ pSpace[target_idx++] = pArray->values[indexB++]; } // return size of merged array return nA + nB; } ``` ### Activity 8: Implement the divide-and-conquer algorithm ```c= void merge_sort(Array *pArray) { // initially, each element is a single array int numArrays = pArray->count; int arraySize = 1; // create a temporary space for holding the merged arrays Data *pWork = malloc(sizeof(Data) * pArray->capacity); // merge the arrays until one sorted array is left while (numArrays > 1){ // compute the number of pairs const int numPairs = numArrays / 2; // number of elements remaining to be sorted int remaining = pArray->count; // merge each pair of consecutive (sub)arrays into a single array for (int i = 0; i < numPairs; i++){ // call merge(pArray, pWork, 2 * i * arraySize, arraySize) to merge the two sorted (sub)arrays into a sorted larger array // decrease the number of remaining elements by the number of merged elements merge(pArray, pWork, 2 * i * arraySize, arraySize); --remaining; } // copy the unmerged values merge(pArray, pWork, 2 * numPairs * arraySize, remaining); // update numArrays: divide by 2 and round up numArrays = (numArrays + 1) / 2; // double the arraySize arraySize = arraySize * 2; // swap working space with array Data *tmp = pArray->values; pArray->values = pWork; pWork = tmp; } // free the working space free(pWork); } ``` ### Activity 9: Test merge sort ```c= int compareE(const Data *pFirst, const Data *pSecond) { return compare_energy(pFirst, pSecond); } int compareP(const Data *pFirst, const Data *pSecond) { return compare_popularity(pFirst, pSecond); } ``` ### Activity 10: Binary search - step by step ![](https://i.imgur.com/YzNdqW6.png) ### Activity 11: Implement binary search ```c= int binarySearch(const Array *pArray, const Data *pValue) { int lpos = 0; int rpos = pArray->count; while (lpos < rpos) { // determine mid position in [lpos, rpos) int mid = (lpos+rpos)/2; // compare element at middle position against value searched for int ordering = compare(&pArray->values[mid], pValue); if (ordering < 0) {rpos=mid;} else if (ordering > 0) {lpos=mid;} else {return mid;} } // we didn't find the value, so stop and indicate return -1; } ``` ### Activity 12: Test binary search ```c= int main() { Array array; array_init(&array, 10000); if (!parse_csv(&array, "data_1000.csv")) { return 1; } printf("%d records read\n", array.count); selection_sort(&array); printf("Contents:\n"); for (int i = 0; i < array.count; i++) { printf("[%d] ", i); data_print(&array.values[i]); } Data value = array.values[99]; int found = binarySearch(&array, &value); printf("\nSearched for: %d\n", found); return 0; } ``` ### Activity 13: Time complexity Compare both algorithms: ```c= // finds the max. element in array, assumes pArray->count > 0 int findMax(const Array *pArray) { // assume that max is at index 0 int max = pArray->values[0]; // update max by iterating over array for (int i = 1; i < pArray->count; i++){ if (pArray->values[i] > max){max = pArray->values[i];} return max; } } // finds the max. value in array, assumes pArray->count > 0 int findMax2(const Array *pArray) { for (int i = 0; i < pArray->count; i++){ // assume that the max is at index i int indexIContainsTheMax = 1; for (int j = 0; j < pArray->count; j++){ if (pArray->values[i] < pArray->values[j]){ // pArray->values[i] can't be the max indexIContainsTheMax = 0; break; } } // the max value must indeed be at index i if (indexIContainsTheMax){return pArray->values[i];} } } ``` * The **findMax** function is very straight forward. It utilizes a singe for loop to iterate over the given array and stores the highest number always comapring it to the next values until the last element in the array. Due to the single for loop the algorithm has O(n) notation due to the fact that the time complexity liniarly increases as the array inceases in size. * The **findMax2** function takes a more complex approach to the problem by assuming that a certain number in the array is the max and then comparing it against every other element in the array before moving on to assuming that the next number is the max and repeating the comparision against every other value until the max is found in the array. Although this algorithm is faster if the max value is somewhere in the begining of the array, the worst case time is much worse than the **findMax** function. Due to the nested for loops used in the function it becomes O(n^2) notation. ### Activity 14: Compare merge sort and selection sort ![](https://i.imgur.com/R1SaL4y.png) As seen in the run times above the merge sort algorithm is much more efficient than select sort. So much so that once the file size goes up to 100,000 the run time of select sort is almost **half a minute** compared to less than **half a second** with the merge sort. It turns out that due to the nested for loops in the select sort algorithm the O notation is **O(n^2)** making it radically inefficient for large sized arrays. On the other hand the merge sort turns out to be getting more and more efficient the bigger the array becomes. Very interesting but also understandibe with the divide and conquer technice. For this reason the algorithm is labled as **O(log n)**. ![](https://i.imgur.com/FJVQ1O3.png) ![](https://i.imgur.com/2Tr9GEx.png) ![](https://i.imgur.com/chCXEib.png) ### Activity 15: Compare naive search and binary search ```c= int linearSearch(const Array *pArray, const Data *pValue) { for (int i = 0; i < pArray->count; i++){ if (compare(&pArray->values[i], pValue) == 0){return i;} } return -1; } ``` This linear search algorithm simply iterarate over the given array in the search of the value requested. Although this would be an acceptible solution for an unorder data structure the algorithm makes no use of the benifits of an order data structure. Due to the for loop going from index 0 until the index of the value requested the worst case run time would be noted as **O(n)**. In the binary seach, which can only be used on an order data structure, the run time becomes much faster. Similar to the merge sort, the biniary search breaks the array into smaller chuncks, each time creating a smaller search range giving it **O(log n)** run time much like the merge sort. Unfortunatly this does not hold true for a linked list. Since the linked list does not have **random access** capability like in an array, the linked list node retrival would cost the binary search alot of time and therefor would not be much better than the simple liniar/naive search method. ## Look back ### What we've learnt This week I learned about two new sorting methods for data strucutres as well as a new searching strategy for an order data structure. ### What were the surprises I was suprised that I was given the majority of the code required for this weeks problem. I was also surpised about the efficiency achieved by the merge sort and binary search functions and how the O(log n) time complexity behaves. It was facinating to see the relitive efficiency improve as the array size increased. ### What problems we've encountered The only problem I encountered this week was with trying to work with already established functions and inserting only a couple lines of code. This seemed easy on the first look however, proved to be challenging in a new way since reading and understanding someone elses code is difficult as well. ### What was or still is unclear I found this weeks activities to be very clear and unerstandable especially with the given youtube videos as help. ### How did the group perform? Success! Working alone was much easier and enjoyable :D Thank you Dawid!