# Week 3 - Sorting and Searching

## Team

Team name: Group 1

Date: 5/3/2021

Members: Len Bauer, Zahir Josefina, Hlib Hyrshko

| Role | Name |
|------|------|
| **Facilitator** keeps track of time, assigns tasks and makes sure all the group members are heard and that decisions are agreed upon. | Zahir Josefina |
| **Spokesperson** communicates group’s questions and problems to the teacher and talks to other teams; presents the group’s findings. | Len Bauer |
| **Reflector** observes and assesses the interactions and performance among team members. Provides positive feedback and intervenes with suggestions to improve the group’s processes. | Len Bauer |
| **Recorder** guides consensus building in the group by recording answers to questions. Collects important information and data. | Hlib Hryshko |

## Activities

### Activity 1: Purpose of sorted lists

- In a sorted list you have an idea of where the item you are looking for is before you start looking. In an unsorted list, the position of an item is unknown until you go through the list.
- Sorting can be critical to the user experience in your application, whether it’s ordering a user’s most recent activity by timestamp, or putting a list of email recipients in alphabetical order by last name.

e.g.

- Databases
- Students at a school
- Finance

### Activity 2: Explanation of selection sort

Selection sort is a comparison-based sorting algorithm that works by repeatedly finding the lowest element of the unsorted part of an array and placing it at the beginning. It has a time complexity of `O(n^2)`, but due to its simplicity it can sometimes have performance advantages over more complicated algorithms, particularly where auxiliary memory is limited. It works by dividing the input list into two sublists: one that has already been sorted, and one that still needs to be.
It starts off with all elements in the unsorted sublist and then, one by one, moves the lowest element from the unsorted to the sorted sublist. One thing that separates it from other sorting algorithms like bubble sort (which is also `O(n^2)` in the worst case) is that it makes the minimum possible number of swaps, `n − 1` in the worst case.

> *Selection sort. (2019). Retrieved 9 March 2021, from https://en.wikipedia.org/wiki/Selection_sort*
>
> *Selection Sort - GeeksforGeeks. (2014). Retrieved 9 March 2021, from https://www.geeksforgeeks.org/selection-sort/*

### Activity 3: Performance of selection sort

For N elements, selection sort has a time complexity of O(N^2). For example, if sorting N elements takes T milliseconds, then sorting 2N elements takes about 4T and sorting 4N elements takes about 16T. Using a linked list instead of an array does not change the complexity.

### Activity 4: Implement selection sort

```c=
void selection_sort(Array *pArray) {
    int i;
    for (i = 0; i < pArray->count; i++) {
        int j;
        // index of the smallest element found so far in [i, count)
        int index = i;
        for (j = i + 1; j < pArray->count; j++) {
            if (compare(&pArray->values[j], &pArray->values[index]) < 0) {
                index = j;
            }
        }
        // move the smallest remaining element to position i
        array_swap(pArray, i, index);
    }
}
```

### Activity 5: Explanation of merge sort

Merge sort is a comparison-based sorting algorithm that is generally considered efficient, and most implementations produce a stable sort. It has a time complexity of `O(n*log(n))`, which means that it scales incredibly well, especially compared to algorithms such as bubble sort or selection sort. It works by first dividing the unsorted list into `n` sublists of one element each, as a list of only one element is considered sorted. It then repeatedly merges adjacent sublists to produce new sorted sublists until only one list remains, which is then sorted.
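To make the scaling difference concrete, here is a minimal sketch (not part of the assignment code) that counts the comparisons both algorithms perform. The function names `selection_sort_count` and `merge_sort_count` are hypothetical, and the sketch works on plain `int` arrays rather than the `Data`/`Array` types used in this report. Selection sort always performs `n(n-1)/2` comparisons, while a bottom-up merge sort needs at most `n * ceil(log2(n))`.

```c
#include <stdlib.h>
#include <string.h>

/* Sorts a[0..n) with selection sort and returns the number of comparisons. */
long selection_sort_count(int *a, int n)
{
    long comparisons = 0;
    for (int i = 0; i < n - 1; i++) {
        int min = i; /* index of the smallest element in a[i..n) */
        for (int j = i + 1; j < n; j++) {
            comparisons++;
            if (a[j] < a[min])
                min = j;
        }
        int tmp = a[i];
        a[i] = a[min];
        a[min] = tmp;
    }
    return comparisons;
}

/* Sorts a[0..n) with bottom-up merge sort and returns the number of
 * comparisons, or -1 if the work buffer cannot be allocated. */
long merge_sort_count(int *a, int n)
{
    long comparisons = 0;
    int *work = malloc(sizeof(int) * (n > 0 ? n : 1));
    if (!work)
        return -1;
    /* width is the size of the already sorted runs; it doubles each pass */
    for (int width = 1; width < n; width *= 2) {
        for (int lo = 0; lo < n; lo += 2 * width) {
            int mid = lo + width < n ? lo + width : n;
            int hi = lo + 2 * width < n ? lo + 2 * width : n;
            int i = lo, j = mid, k = lo;
            /* merge a[lo..mid) and a[mid..hi) into work[lo..hi) */
            while (i < mid && j < hi) {
                comparisons++;
                work[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
            }
            while (i < mid)
                work[k++] = a[i++];
            while (j < hi)
                work[k++] = a[j++];
        }
        memcpy(a, work, sizeof(int) * n);
    }
    free(work);
    return comparisons;
}
```

For 8 elements, selection sort performs 28 comparisons while merge sort needs at most 24; the gap widens quickly as `n` grows.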
### Activity 6: Merge sort - step by step

- Divide the unsorted list into `n` sublists, each containing 1 element.
- Take adjacent pairs of singleton lists and merge them to form sorted lists of 2 elements. The `n` lists of size 1 become `n/2` lists of size 2.
- Repeat the process until a single sorted list is obtained.

![](https://i.imgur.com/WqHTSKW.png)

> *Merge Sort Tutorials & Notes | Algorithms | HackerEarth. (2021). Retrieved 10 March 2021, from https://www.hackerearth.com/practice/algorithms/sorting/merge-sort/tutorial/*

### Activity 7: Implement the merge function

```c=
int merge(Array *pArray, Data *pTarget, int iA, int nA) {
    int indexA = iA, indexB = iA + nA;
    int nB = nA;
    if (pArray->count < indexB + nB)
        nB = pArray->count - indexB;
    int target_idx = indexA;
    while (indexA < iA + nA && indexB < iA + nA + nB) {
        // compare pA[indexA] and pB[indexB]
        if (compare(&pArray->values[indexA], &pArray->values[indexB]) < 0) {
            // write value from first array to target
            pTarget[target_idx] = pArray->values[indexA++];
        } else {
            // write value from second array to target
            pTarget[target_idx] = pArray->values[indexB++];
        }
        target_idx++;
    }
    // copy remaining elements of the first array
    while (indexA < iA + nA)
        pTarget[target_idx++] = pArray->values[indexA++];
    // copy remaining elements of the second array
    while (indexB < iA + nA + nB)
        pTarget[target_idx++] = pArray->values[indexB++];
    // return size of merged array
    return nA + nB;
}
```

### Activity 8: Implement the divide-and-conquer algorithm

```c=
void merge_sort(Array *pArray) {
    // initially, each element is a single array
    int num_arrays = pArray->count;
    int array_size = 1;
    // create a temporary space for holding the merged arrays
    Data *pWork = malloc(sizeof(Data) * pArray->capacity);
    // merge the arrays until one sorted array is left
    while (num_arrays > 1) {
        // compute the number of pairs
        const int num_pairs = num_arrays / 2;
        // number of elements remaining to be sorted
        int remaining = pArray->count;
        //
        // merge each pair of consecutive (sub)arrays into a single array
        int i;
        for (i = 0; i < num_pairs; i++) {
            // call merge(pArray, pWork, 2 * i * arraySize, arraySize) to merge
            // the two sorted (sub)arrays into a sorted larger array
            merge(pArray, pWork, 2 * i * array_size, array_size);
            // decrease the number of remaining elements by the number of merged elements
            remaining -= array_size * 2;
        }
        // copy the unmerged values
        merge(pArray, pWork, 2 * num_pairs * array_size, remaining);
        // update numArrays: divide by 2 and round up
        num_arrays = (num_arrays + 1) / 2;
        // double the arraySize
        array_size = array_size * 2;
        // swap working space with array
        Data *tmp = pArray->values;
        pArray->values = pWork;
        pWork = tmp;
    }
    // free the working space
    free(pWork);
}
```

### Activity 9: Test merge sort

```c=
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include "spotify.h"

/* An example compare function that delegates the comparison */
int compare(const Data *pFirst, const Data *pSecond) {
    return compare_popularity(pFirst, pSecond);
}

void merge_sort(Array *pArray);
int merge(Array *pArray, Data *pTarget, int iA, int nA);

int main() {
    Array array;
    int sorted_ok;
    array_init(&array, 100);
    if (!parse_csv(&array, "data_10.csv")) {
        return 1;
    }
    printf("%d records read\n", array.count);
    // only merge sort is run here, so the check below verifies its result
    merge_sort(&array);
    sorted_ok = 1;
    printf("Contents after sorting:\n");
    printf("[0] ");
    data_print(&array.values[0]);
    int i;
    for (i = 1; i < array.count; i++) {
        printf("[%d] ", i);
        data_print(&array.values[i]);
        if (compare(&array.values[i - 1], &array.values[i]) > 0)
            sorted_ok = 0;
    }
    if (sorted_ok)
        printf("\nData is sorted correctly!\n");
    else
        printf("\nData is *NOT* sorted correctly!\n");
    return 0;
}

int merge(Array *pArray, Data *pTarget, int iA, int nA) {
    int indexA = iA, indexB = iA + nA;
    int nB = nA;
    if (pArray->count < indexB + nB)
        nB = pArray->count - indexB;
    int target_idx = indexA;
    while (indexA < iA + nA && indexB < iA + nA + nB) {
        // compare pA[indexA] and
        // pB[indexB]
        if (compare(&pArray->values[indexA], &pArray->values[indexB]) < 0) {
            // write value from first array to target
            pTarget[target_idx] = pArray->values[indexA++];
        } else {
            // write value from second array to target
            pTarget[target_idx] = pArray->values[indexB++];
        }
        target_idx++;
    }
    // copy remaining elements of the first array
    while (indexA < iA + nA)
        pTarget[target_idx++] = pArray->values[indexA++];
    // copy remaining elements of the second array
    while (indexB < iA + nA + nB)
        pTarget[target_idx++] = pArray->values[indexB++];
    // return size of merged array
    return nA + nB;
}

void merge_sort(Array *pArray) {
    // initially, each element is a single array
    int num_arrays = pArray->count;
    int array_size = 1;
    // create a temporary space for holding the merged arrays
    Data *pWork = malloc(sizeof(Data) * pArray->capacity);
    // merge the arrays until one sorted array is left
    while (num_arrays > 1) {
        // compute the number of pairs
        const int num_pairs = num_arrays / 2;
        // number of elements remaining to be sorted
        int remaining = pArray->count;
        // merge each pair of consecutive (sub)arrays into a single array
        int i;
        for (i = 0; i < num_pairs; i++) {
            // call merge(pArray, pWork, 2 * i * arraySize, arraySize) to merge
            // the two sorted (sub)arrays into a sorted larger array
            merge(pArray, pWork, 2 * i * array_size, array_size);
            // decrease the number of remaining elements by the number of merged elements
            remaining -= array_size * 2;
        }
        // copy the unmerged values
        merge(pArray, pWork, 2 * num_pairs * array_size, remaining);
        // update numArrays: divide by 2 and round up
        num_arrays = (num_arrays + 1) / 2;
        // double the arraySize
        array_size = array_size * 2;
        // swap working space with array
        Data *tmp = pArray->values;
        pArray->values = pWork;
        pWork = tmp;
    }
    // free the working space
    free(pWork);
}
```

### Activity 10: Binary search - step by step

![](https://i.imgur.com/Ni56sUL.png)

![](https://i.imgur.com/JVb77kQ.png)

### Activity 11: Implement binary search

```c=
int
binary_search(const Array *pArray, const Data *pValue) {
    int l_pos = 0;
    int r_pos = pArray->count;
    while (l_pos < r_pos) {
        // search the interval [l_pos, r_pos)
        // (l_pos is included in the interval, r_pos is not)
        // determine mid position in [l_pos, r_pos)
        int mid = (l_pos + r_pos) / 2;
        // compare element at middle position against value searched for
        int ordering = compare(&pArray->values[mid], pValue);
        if (ordering > 0) {
            // must continue search in interval [l_pos, mid)
            r_pos = mid;
        } else if (ordering < 0) {
            // must continue search in interval [mid + 1, r_pos)
            l_pos = mid + 1;
        } else {
            // value is found at index mid!
            return mid;
        }
    }
    // we didn't find the value, so stop and indicate
    // to caller by returning -1.
    return -1;
}
```

### Activity 12: Test binary search

```c=
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include "spotify.h"

/* An example compare function that delegates the comparison */
int compare(const Data *pFirst, const Data *pSecond) {
    return compare_popularity(pFirst, pSecond);
}

void merge_sort(Array *pArray);
int merge(Array *pArray, Data *pTarget, int iA, int nA);
int binary_search(const Array *pArray, const Data *pValue);

int main() {
    Array array;
    int sorted_ok;
    array_init(&array, 100);
    if (!parse_csv(&array, "data_100000.csv")) {
        return 1;
    }
    printf("%d records read\n", array.count);
    merge_sort(&array);
    sorted_ok = 1;
    printf("Contents after sorting:\n");
    printf("[0] ");
    data_print(&array.values[0]);
    int i;
    for (i = 1; i < array.count; i++) {
        printf("[%d] ", i);
        data_print(&array.values[i]);
        if (compare(&array.values[i - 1], &array.values[i]) > 0)
            sorted_ok = 0;
    }
    if (sorted_ok)
        printf("\nData is sorted correctly!\n");
    else
        printf("\nData is *NOT* sorted correctly!\n");
    // read the popularity value to search for; stop if the input is not a number
    Data data;
    if (scanf("%d", &data.popularity) != 1) {
        return 1;
    }
    int position = binary_search(&array, &data);
    //position = binary_search(&array, &array.values);
    if (position != -1) {
        printf("The element is found on index: %d\n", position);
    } else {
        printf("The element doesn't exist\n");
    }
    return 0;
}

int merge(Array *pArray, Data *pTarget, int iA, int nA) {
    int indexA = iA, indexB = iA + nA;
    int nB = nA;
    if (pArray->count < indexB + nB)
        nB = pArray->count - indexB;
    int target_idx = indexA;
    while (indexA < iA + nA && indexB < iA + nA + nB) {
        // compare pA[indexA] and pB[indexB]
        if (compare(&pArray->values[indexA], &pArray->values[indexB]) < 0) {
            // write value from first array to target
            pTarget[target_idx] = pArray->values[indexA++];
        } else {
            // write value from second array to target
            pTarget[target_idx] = pArray->values[indexB++];
        }
        target_idx++;
    }
    // copy remaining elements of the first array
    while (indexA < iA + nA)
        pTarget[target_idx++] = pArray->values[indexA++];
    // copy remaining elements of the second array
    while (indexB < iA + nA + nB)
        pTarget[target_idx++] = pArray->values[indexB++];
    // return size of merged array
    return nA + nB;
}

void merge_sort(Array *pArray) {
    // initially, each element is a single array
    int num_arrays = pArray->count;
    int array_size = 1;
    // create a temporary space for holding the merged arrays
    Data *pWork = malloc(sizeof(Data) * pArray->capacity);
    // merge the arrays until one sorted array is left
    while (num_arrays > 1) {
        // compute the number of pairs
        const int num_pairs = num_arrays / 2;
        // number of elements remaining to be sorted
        int remaining = pArray->count;
        // merge each pair of consecutive (sub)arrays into a single array
        int i;
        for (i = 0; i < num_pairs; i++) {
            // call merge(pArray, pWork, 2 * i * arraySize, arraySize) to merge
            // the two sorted (sub)arrays into a sorted larger array
            merge(pArray, pWork, 2 * i * array_size, array_size);
            // decrease the number of remaining elements by the number of merged elements
            remaining -= array_size * 2;
        }
        // copy the unmerged values
        merge(pArray, pWork, 2 * num_pairs * array_size, remaining);
        // update numArrays: divide by 2 and round up
        num_arrays = (num_arrays + 1) / 2;
        // double the arraySize
        array_size = array_size * 2;
        // swap working space with array
        Data *tmp = pArray->values;
        pArray->values = pWork;
        pWork = tmp;
    }
    // free the working space
    free(pWork);
}

int binary_search(const Array *pArray, const Data *pValue) {
    int l_pos = 0;
    int r_pos = pArray->count;
    while (l_pos < r_pos) {
        // search the interval [l_pos, r_pos)
        // (l_pos is included in the interval, r_pos is not)
        // determine mid position in [l_pos, r_pos)
        int mid = (l_pos + r_pos) / 2;
        // compare element at middle position against value searched for
        int ordering = compare(&pArray->values[mid], pValue);
        if (ordering > 0) {
            // must continue search in interval [l_pos, mid)
            r_pos = mid;
        } else if (ordering < 0) {
            // must continue search in interval [mid + 1, r_pos)
            l_pos = mid + 1;
        } else {
            // value is found at index mid!
            return mid;
        }
    }
    // we didn't find the value, so stop and indicate
    // to caller by returning -1.
    return -1;
}
```

### Activity 13: Time complexity

- The first algorithm is more efficient than the second one.
- For the first algorithm the time complexity is O(n).
- For the second one the time complexity is O(n^2).

### Activity 14: Compare merge sort and selection sort

- Merge sort is more efficient than selection sort. The only case in which selection sort could be more efficient is when the array is not very large.
- In both cases the time complexity stays the same regardless of whether the array is already sorted or not.

### Activity 15: Compare naive search and binary search

- The efficiency of the functions depends on the size of the array. The larger the array, the more efficient binary search becomes compared to linear search, as long as the array is sorted in either increasing or decreasing order. For smaller arrays a linear search can be more efficient.
- The time complexity of a binary search is O(log n) and the time complexity of a linear search is O(n).
- The time complexity of a binary search on a linked list is O(n), because reaching the middle element requires walking the list, and a linear search stays the same at O(n).

## Look back

### What we've learnt

We learned about sorting and searching. Specifically:

- Bubble sort
- Selection sort
- Merge sort
- Binary search
- Linear search

We also learned about the time complexity of different sorting algorithms, expressed in 'Big O notation', as well as how to handle somewhat larger data sets of around 100,000 data points.

### What were the surprises

How efficient some of the existing sorting and searching algorithms can be, especially for large data sets.

That many of the efficient searching algorithms, e.g. binary search, require a presorted data set to work effectively.

### What problems we've encountered

There were no major problems and the entire process went fairly smoothly.

### What was or still is unclear

Nothing is really still unclear, especially conceptually. The only thing that could require some more time is fully understanding all of the code, in particular the provided source code.

### How did the group perform?

How was the collaboration? What were the reasons for hiccups? What worked well? What can be improved next time?

Overall the collaboration within the group was good. All team members did their assigned tasks on time, and everyone showed up to the scheduled meetings on time. No major hiccups were encountered throughout the project, and if anything was ever unclear, one of the other group members could often answer the question.

As already mentioned, the overall collaboration worked very well. Everyone did what was asked of them, and no one really had any complaints regarding any of their group members. The only thing that could be improved next time is to work through the tasks together, instead of dividing them up, doing them alone and then discussing them after the fact.
However, this way of working did allow us to work far more efficiently than if we had gone through every exercise as a group.

### Files with the code

Here is the main file:

```c=
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include "spotify.h"

/* An example compare function that delegates the comparison */
int compare(const Data *pFirst, const Data *pSecond) {
    return compare_popularity(pFirst, pSecond);
}

void selection_sort(Array *pArray);
void merge_sort(Array *pArray);
int merge(Array *pArray, Data *pTarget, int iA, int nA);
int binary_search(const Array *pArray, const Data *pValue);

int main() {
    Array array;
    int sorted_ok;
    array_init(&array, 100);
    if (!parse_csv(&array, "data_100000.csv")) {
        return 1;
    }
    printf("%d records read\n", array.count);
    merge_sort(&array);
    sorted_ok = 1;
    printf("Contents after sorting:\n");
    printf("[0] ");
    data_print(&array.values[0]);
    int i;
    for (i = 1; i < array.count; i++) {
        printf("[%d] ", i);
        data_print(&array.values[i]);
        if (compare(&array.values[i - 1], &array.values[i]) > 0)
            sorted_ok = 0;
    }
    if (sorted_ok)
        printf("\nData is sorted correctly!\n");
    else
        printf("\nData is *NOT* sorted correctly!\n");
    // read the popularity value to search for; stop if the input is not a number
    Data data;
    if (scanf("%d", &data.popularity) != 1) {
        return 1;
    }
    int position = binary_search(&array, &data);
    //position = binary_search(&array, &array.values);
    if (position != -1) {
        printf("The element is found on index: %d\n", position);
    } else {
        printf("The element doesn't exist\n");
    }
    return 0;
}

void selection_sort(Array *pArray) {
    int i;
    for (i = 0; i < pArray->count; i++) {
        int j;
        // index of the smallest element found so far in [i, count)
        int index = i;
        for (j = i + 1; j < pArray->count; j++) {
            if (compare(&pArray->values[j], &pArray->values[index]) < 0) {
                index = j;
            }
        }
        // move the smallest remaining element to position i
        array_swap(pArray, i, index);
    }
}

int merge(Array *pArray, Data *pTarget, int iA, int nA) {
    int indexA = iA, indexB = iA + nA;
    int nB = nA;
    if (pArray->count < indexB + nB)
        nB = pArray->count - indexB;
    int target_idx = indexA;
    while (indexA < iA
           + nA && indexB < iA + nA + nB) {
        // compare pA[indexA] and pB[indexB]
        if (compare(&pArray->values[indexA], &pArray->values[indexB]) < 0) {
            // write value from first array to target
            pTarget[target_idx] = pArray->values[indexA++];
        } else {
            // write value from second array to target
            pTarget[target_idx] = pArray->values[indexB++];
        }
        target_idx++;
    }
    // copy remaining elements of the first array
    while (indexA < iA + nA)
        pTarget[target_idx++] = pArray->values[indexA++];
    // copy remaining elements of the second array
    while (indexB < iA + nA + nB)
        pTarget[target_idx++] = pArray->values[indexB++];
    // return size of merged array
    return nA + nB;
}

void merge_sort(Array *pArray) {
    // initially, each element is a single array
    int num_arrays = pArray->count;
    int array_size = 1;
    // create a temporary space for holding the merged arrays
    Data *pWork = malloc(sizeof(Data) * pArray->capacity);
    // merge the arrays until one sorted array is left
    while (num_arrays > 1) {
        // compute the number of pairs
        const int num_pairs = num_arrays / 2;
        // number of elements remaining to be sorted
        int remaining = pArray->count;
        // merge each pair of consecutive (sub)arrays into a single array
        int i;
        for (i = 0; i < num_pairs; i++) {
            // call merge(pArray, pWork, 2 * i * arraySize, arraySize) to merge
            // the two sorted (sub)arrays into a sorted larger array
            merge(pArray, pWork, 2 * i * array_size, array_size);
            // decrease the number of remaining elements by the number of merged elements
            remaining -= array_size * 2;
        }
        // copy the unmerged values
        merge(pArray, pWork, 2 * num_pairs * array_size, remaining);
        // update numArrays: divide by 2 and round up
        num_arrays = (num_arrays + 1) / 2;
        // double the arraySize
        array_size = array_size * 2;
        // swap working space with array
        Data *tmp = pArray->values;
        pArray->values = pWork;
        pWork = tmp;
    }
    // free the working space
    free(pWork);
}

int binary_search(const Array *pArray, const Data *pValue) {
    int l_pos = 0;
    int r_pos = pArray->count;
    while (l_pos < r_pos) {
        //
        // search the interval [l_pos, r_pos)
        // (l_pos is included in the interval, r_pos is not)
        // determine mid position in [l_pos, r_pos)
        int mid = (l_pos + r_pos) / 2;
        // compare element at middle position against value searched for
        int ordering = compare(&pArray->values[mid], pValue);
        if (ordering > 0) {
            // must continue search in interval [l_pos, mid)
            r_pos = mid;
        } else if (ordering < 0) {
            // must continue search in interval [mid + 1, r_pos)
            l_pos = mid + 1;
        } else {
            // value is found at index mid!
            return mid;
        }
    }
    // we didn't find the value, so stop and indicate
    // to caller by returning -1.
    return -1;
}
```

Here is the header `spotify.h` that you must add to the C project to make it work:

```c=
#ifndef SPOTIFY_H_INCLUDED
#define SPOTIFY_H_INCLUDED

#define MAX_ARTIST_SIZE (40)
#define MAX_NAME_SIZE (30)

#include <stdlib.h>

typedef struct Data_Spotify {
    char artists[MAX_ARTIST_SIZE];
    float danceability; // 0..1
    float energy;       // 0..1
    char id[24];
    char name[MAX_NAME_SIZE];
    int popularity;     // 0..100
    float tempo;        // BPM
    int year;           // 1921...
} Data;

typedef struct Array_Type {
    int count, capacity;
    Data *values;
} Array;

/*
 * pretty-prints the contents of the Data structure
 */
void data_print(const Data *pData);

/*
 * Initializes the array with a given capacity and count of zero
 */
void array_init(Array *pArray, int capacity);

/*
 * Resizes the array with the given capacity
 */
void array_resize(Array *pArray, int capacity);

/*
 * Appends the given element to the end of the array
 */
void array_append(Array *pArray, const Data *pData);

/*
 * Removes the element at the given index
 */
void array_remove(Array *pArray, int index);

/*
 * Inserts an element before the specified index
 */
void array_insert(Array *pArray, int index, const Data *pData);

/*
 * Clears the array, leaves the capacity unchanged
 */
void array_clear(Array *pArray);

/*
 * Swaps two elements of the array
 */
void array_swap(Array *pArray, int idxA, int idxB);

/*
 * Shuffles the array
 */
void array_shuffle(Array *pArray);

/*
 * Compares two data structures on the field 'popularity'
 */
int compare_popularity(const Data *pFirst, const Data *pSecond);

/*
 * Compares two data structures on the field 'energy'
 */
int compare_energy(const Data *, const Data *);

/*
 * Parses a CSV file of Spotify Data records and stores them in the array
 */
int parse_csv(Array *pArray, const char *filename);

#endif // SPOTIFY_H_INCLUDED
```

And here is the implementation that you also have to put in the C project:

```c=
#include "spotify.h"
#include <stdio.h>
#include <stdlib.h>

void data_print(const Data *pData) {
    printf("%s - %s (%d), danceability: %.2f, energy: %.2f, popularity: %d, tempo: %.2f\n",
           pData->artists, pData->name, pData->year, pData->danceability,
           pData->energy, pData->popularity, pData->tempo);
}

void array_init(Array *pArray, int capacity) {
    pArray->count = 0;
    pArray->capacity = capacity;
    pArray->values = (Data*) malloc(sizeof(Data[capacity]));
}

void array_resize(Array *pArray, int
                  capacity) {
    pArray->values = (Data*) realloc(pArray->values, sizeof(Data[capacity]));
    pArray->capacity = capacity;
}

void array_append(Array *pArray, const Data *pData) {
    if (pArray->count == pArray->capacity)
        array_resize(pArray, (pArray->capacity + 1) * 3 / 2);
    pArray->values[pArray->count++] = *pData;
}

void array_remove(Array *pArray, int index) {
    pArray->count--;
    int i;
    for (i = index; i < pArray->count; i++) {
        pArray->values[i] = pArray->values[i + 1];
    }
}

void array_insert(Array *pArray, int index, const Data *pData) {
    if (pArray->count == pArray->capacity)
        array_resize(pArray, (pArray->capacity + 1) * 3 / 2);
    int i;
    for (i = pArray->count; i > index; i--)
        pArray->values[i] = pArray->values[i - 1];
    pArray->values[index] = *pData;
    pArray->count++;
}

void array_clear(Array *pArray) {
    pArray->count = 0;
}

void array_swap(Array *pArray, int idxA, int idxB) {
    Data tmp = pArray->values[idxA];
    pArray->values[idxA] = pArray->values[idxB];
    pArray->values[idxB] = tmp;
}

void array_shuffle(Array *pArray) {
    int i;
    for (i = 0; i < pArray->count; i++) {
        array_swap(pArray, i, rand() % pArray->count);
    }
}

int compare_popularity(const Data *pFirst, const Data *pSecond) {
    if (pFirst->popularity < pSecond->popularity)
        return -1;
    else if (pFirst->popularity > pSecond->popularity)
        return 1;
    else
        return 0;
}

int compare_energy(const Data *pFirst, const Data *pSecond) {
    if (pFirst->energy < pSecond->energy)
        return -1;
    else if (pFirst->energy > pSecond->energy)
        return 1;
    else
        return 0;
}

const char *parseQuoteDelimitedString(char *buf, int size, const char *line) {
    int pos = 0;
    while (line[0] != 0 && !(line[0] == '"' && line[1] != '"')) {
        if (line[0] == '"' && line[1] == '"') {
            line++;
            if (pos < size - 1)
                buf[pos++] = '\"';
        } else if (pos < size - 1) {
            buf[pos++] = line[0];
        }
        line++;
    }
    if (line[0] == '"')
        line++;
    buf[pos] = 0;
    return line;
}

const char *parseCommaDelimitedString(char *buf, int size, const char *line) {
    int pos = 0;
    while (line[0] != 0 && line[0] != ',') {
        if
            (line[0] == '"' && line[1] == '"') {
            line++;
            if (pos < size - 1)
                buf[pos++] = '\"';
        } else if (pos < size - 1) {
            buf[pos++] = line[0];
        }
        line++;
    }
    buf[pos] = 0;
    return line;
}

const char *parseString(char *buf, int size, const char *line) {
    // skip spaces
    while (line[0] == ' ')
        line++;
    if (line[0] == '"') {
        // parse as quote-delimited string
        line = parseQuoteDelimitedString(buf, size, line + 1);
    } else {
        // parse as comma-delimited string
        line = parseCommaDelimitedString(buf, size, line);
    }
    // expect comma
    if (line[0] != ',') {
        return NULL;
    } else {
        return line;
    }
}

int parse_record(const char *line, Data *pData) {
    int count;
    const char *original = line;
    line = parseString(pData->artists, MAX_ARTIST_SIZE, line);
    if (!line) {
        fprintf(stderr, "Error parsing %s\n", original);
        return 0;
    }
    if (sscanf(line, ",%f,%f,%22s,%n", &pData->danceability, &pData->energy, pData->id, &count) < 3) {
        return 0;
    }
    line = parseString(pData->name, MAX_NAME_SIZE, line + count);
    if (!line) {
        return 0;
    }
    if (sscanf(line, ",%d,%f,%d", &pData->popularity, &pData->tempo, &pData->year) < 3) {
        return 0;
    }
    return 1;
}

int parse_csv(Array *pArray, const char *filename) {
    array_clear(pArray);
    FILE *dataFile = fopen(filename, "r");
    if (dataFile) {
        // storage for a single line
        char buf[1024];
        // skip the first (header) line
        if (fgets(buf, 1024, dataFile)) {
            while (fgets(buf, 1024, dataFile)) {
                Data record;
                if (!parse_record(buf, &record)) {
                    fprintf(stderr, "Error parsing the following line:\n*%s*\n", buf);
                } else {
                    array_append(pArray, &record);
                }
            }
        }
        // close the file now that all records have been read
        fclose(dataFile);
        return 1;
    } else {
        fprintf(stderr, "Error opening file %s\n", filename);
        return 0;
    }
}
```
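As a supplement to Activity 15, the sketch below contrasts linear and binary search on a sorted `int` array and counts how many elements each one examines. The names `linear_search_int` and `binary_search_int` and the `steps` output parameter are hypothetical and only serve this illustration; the binary search follows the same half-open interval approach as `binary_search` in this report, but it is not the project code.

```c
/* Linear search: returns the index of key in a[0..n), or -1 if absent.
 * *steps receives the number of elements that were examined. */
int linear_search_int(const int *a, int n, int key, int *steps)
{
    *steps = 0;
    for (int i = 0; i < n; i++) {
        (*steps)++;
        if (a[i] == key)
            return i;
    }
    return -1;
}

/* Binary search on an array sorted in increasing order: returns the index
 * of key, or -1 if absent. *steps receives the number of elements examined. */
int binary_search_int(const int *a, int n, int key, int *steps)
{
    int lo = 0, hi = n; /* search the half-open interval [lo, hi) */
    *steps = 0;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        (*steps)++;
        if (a[mid] > key)
            hi = mid;      /* continue in [lo, mid) */
        else if (a[mid] < key)
            lo = mid + 1;  /* continue in [mid + 1, hi) */
        else
            return mid;
    }
    return -1;
}
```

On a sorted array of 1000 elements, linear search examines up to 1000 elements, whereas binary search examines at most 10.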