Array Data Structure

# Introduction Arrays are essential data structures used in programming. They allow us to store and manipulate multiple elements of the same type efficiently. Arrays are used to build more complex data structures like stacks and queues. This article covers definition of arrays, operations on them, and how they are stored in computer memory. We will also see some basic to advanced level problems involving array data structure. ## Definition *Arrays are collections or systematic arrangements of objects of the same data type.* Data types can be integers, characters, doubles, etc. Let's see how to declare an integer array. **Syntax**: `data_type array_name[size]` `int marks[10]` declares an integer array named "marks" with a size of 10. # Usefulness of Arrays in Computer Programming Let's consider an example to understand the usefulness of arrays. **Problem:** We have a class with 10 students, and we need to find the average marks of all the students. **Traditional Approach:** - Let's say we use individual integer variables (`m1`, `m2`, `m3`, ..., `m10`) for storing the marks of each student. - `int m1, m2, . . ., m10` - Now, to find the average, we need to first find the sum. - `int sum = m1 + m2 + . . . + m10` - Then, the average is: - `int average = sum / 10` But if we have a class of 100 students, then it would be very difficult to maintain the 100 individual variables. Also, calculating the sum and average would be cumbersome. **Array based Approach:** - Arrays provide an elegant solution to store and process such data efficiently. - For the given scenario, we can declare an integer array to store the marks of students, `int marks[10]`. - Each element of the array corresponds to the marks of a student. - We can access elements using their indices, starting from 0. So, `marks[0]` indicates the marks of first student, `marks[1]` indicates the marks of second student and so on. - The sum of marks can be calculated using a loop that iterates through the array and adds up the values. ```cpp= int sum = 0; for (i <- 0 to 99) sum = sum + marks[i] ``` - The average is then computed by dividing the sum by the total number of students. ```cpp= int average = sum / 10 ``` ### Note > - Arrays can be created using different data types. > - Apart from integers, arrays can hold characters, floats, and other data types. > - Example: `char characters[5]`, `float prices[20]`, etc. > - Arrays of different data types allow us to store and manipulate diverse kinds of information. > - Arrays can have multiple dimensions. > - In addition to one-dimensional arrays (like `marks[10]`), we can have two-dimensional (like `sudoko[9][9]`), three-dimensional (like `maze[10][10][10]`, and higher-dimensional arrays. > - Two-dimensional arrays can be thought of as matrices or tables. > - Multidimensional arrays provide a powerful way to represent and process structured data. # Array Storage in Computer Memory ![](https://hackmd.io/_uploads/H1WQgTkth.jpg) Computer memory is divided into segments or partitions, with each byte of memory having an address. Memory can be considered as a large array of bytes, and each byte has a unique address. > In C and C++, there are pointer variables which can store these memory addresses. When an array is declared, the computer allocates the required memory for its elements. For example, if we have an integer array with a size of 5, `int A[5]` and each integer occupies 4 bytes, the total size of the array would be 20 bytes. Therefore, the computer will allocate 20 bytes of memory to the array `A`. ![](https://hackmd.io/_uploads/ByA7g6Jt3.jpg) - The memory allocation for the array is continuous, meaning the elements are stored in adjacent memory locations. - Accessing array elements becomes efficient due to the continuous memory allocation. - Modification of array elements takes constant time (O(1)). Let us see why - The array `A` is also known as the `pointer to the first element` and respresents the `base address`. So, we can access any element's address and value using the base address (i.e., `A`) and an offset (discussed below), resulting in constant, i.e., O(1) time. ## Arrays and Pointers * In C and C++, arrays are represented by a pointer to the first element of the array. * The base address of an array is also the address of its first element. * Array indexing is performed using the base address and an offset based on the element's data type size. * Pointer arithmetic allows us to access elements by calculating the address using the base address and offset. *Let's explore how we can retrieve the address and value of array elements using a C program.* **Example**: ```c= #include<stdio.h> int main() { int A[5]; printf("Base address: %d\n", A); printf("Value at base address: %d\n", *A); } ``` Output: ``` x0f5869765859 -4 ``` Here, `A` represents the base address so when printed it prints the address of the memory location it has been assigned to. Whereas, with `*A`, we dereference a pointer and get its value. Since, we have not initialized the array `A`, we got a garbage value stored at the first location. Let us now initialize the array first and try accessing its address and elements. ```c= #include<stdio.h> int main() { int A[5]; A[0] = 2; A[1] = 4; A[2] = 5; A[3] = 8; A[4] = 1; printf("Base address: %d\n", A); printf("Value at base address: %d\n", *A); } ``` Output: ``` x0f5869765859 2 ``` This time we have got 2, which is the value present at first location. Now, let's say, we add another line: `printf("%d\n", A + 1)`. *What would this represent?* This represents the address of second element. We have added 1 here which is known as offset, the offset increases the address by the size of the data type. Since it is an integer array, `A + 1` = `base address + 4 bytes` = location of second element (as array elements are stored in contiguous memory locations). Similary, to get the value present at the second location, we have to just dereference the pointer associated with second element's memory location, i.e., `printf("%d\n", *(A + 1))` Thus, accessing an element of an array can be done in constant time either via `A[i]` or `*(A + i)`. > Java doesn't have pointers but internal working and storage of array elements in memory follows the same set of rules. # Problems based on Arrays In previous sections, we have discussed the basic concepts, internal working, design and access pattern of arrays. Let us now get our hands dirty on some very famous array problems. ## Problem 1 *Given an array representing the heights of buildings, the task is to calculate the total amount of water trapped between the buildings.* ![](https://hackmd.io/_uploads/rkqklayY3.jpg) `heights = [0, 1, 0, 2, 1, 0, 1, 3, 2, 1, 2, 1]` Output: 6 ### Observations * To calculate the total amount of water, we need to determine the amount of water trapped above each building. * The amount of water trapped above a building depends on the presence of taller buildings on its left and right sides. * A building cannot trap water if there is no taller building on its left and right side. * The height of the water column trapped above a building is the minimum of the maximum heights on its left and right sides, minus its own height. ### Approach * For each building, calculate the maximum height on its left side (leftMax). * For each building, calculate the maximum height on its right side (rightMax). * For each building, the height of the water column trapped above it will be: `min(left_max, right_max) - height of the building`. Please note if there is no longer building than the current building, the above formula can yield a negative height, which isn't practically possible. So, use `max(0, trapped_water_height)` for such scenario. * Sum up the heights of all the water columns calculated in the previous step to get the total amount of trapped water. Example: - Given heights: `[0, 1, 0, 2, 1, 0, 1, 3, 2, 1, 2, 1]` - Water trapped at building 1 is: - current_height = 0, left_max = 0, right_max = 3 - trapped_water_height = min(0, 3) - 0 = 0 - actual_trapped_water = max(0, 0) = 0 - Water trapped at building 2 is: - current_height = 1, left_max = 0, right_max = 3 - trapped_water_height = min(0, 3) - 1 = -1 - actual_trapped_water = max(0, -1) = 0 - Water trapped at building 3 is: - current_height = 0, left_max = 1, right_max = 3 - trapped_water_height = min(1, 3) - 0 = 1 - actual_trapped_water = max(0, 1) = 1 - Similarly, water trapped at building 12 is: - current_height = 1, left_max = 3, right_max = 0 - trapped_water_height = min(1, 0) - 1 = -1 - actual_trapped_water = max(0, -1) = 0 **Time Complexity Analysis:** * Calculating `left_max` for an element takes O(N) time as it would required traversing the array from current element to the left end. * Similarly, calculating `right_max` for an element takes O(N) time as it would required traversing the array from current element to the right end. * Therefore, calculating `left_max` and `right_max` for an element would take O(N) + O(N) = O(2N) time. * Doing it for N elements would take O(N) * O(2N) = O(2N^2) = O(N^2) time. **How can we optimize this?** For the array, `heights = [0, 1, 0, 2, 1, 0, 1, 3, 2, 1, 2, 1]`. Let's try calculating the `left_max` for the last building (at 11th index). - First initialize a variable with 0 height, i.e., `left_max = 0` - Start a loop from i = 0 to 11 - At i = 0, `left_max = max(left_max, height[0]) = 0` - At i = 1, `left_max = max(left_max, height[1]) = 1` - At i = 2, `left_max = max(left_max, height[2]) = 1` - At i = 3, `left_max = max(left_max, height[3]) = 2` - At i = 4, `left_max = max(left_max, height[4]) = 2` - At i = 5, `left_max = max(left_max, height[5]) = 2` - At i = 6, `left_max = max(left_max, height[6]) = 2` - At i = 7, `left_max = max(left_max, height[7]) = 3` - At i = 8, `left_max = max(left_max, height[8]) = 3` - At i = 9, `left_max = max(left_max, height[9]) = 3` - At i = 10, `left_max = max(left_max, height[10])= 3` So, the `left_max` at 11th index is 3. If you observe carefully, in this complete process, we calculated the maximum left height at every index. The value of `left_max` variable at index `i` is the maximum left height for the building at index `i + 1`. Therefore, we do not need to calculate the `left_max` for every index separately, we can simply traverse the array once and calculate the `left_max` at every index and store them in an array. Similary, we can calculate the `right_max` of every element in single traversal and store it in another array. In the end, we just need another traversal of array to sum up the height of trapped water using formula: `max(0, min(left_max[i-1], right_max[i+1]) - height[i])` This has decreased our time complexity from O(N ^ 2) to O(N) as we are just traversing the array thrice. ### Pseudocode ```c= left_max[n], right_max[n]; left_max[0] = height[0]; for (i = 1; i < n; i++) { left_max[i] = max(left_max[i-1], height[i-1]) } right_max[n-1] = height[n-1]; for (i = n-2; i >= 0; i--) { right_max[i] = max(right_max[i+1], height[i+1]) } water = 0 for (int i = 0; i < n; i++) { water += max(0, min(left_max[i-1], right_max[i+1]) - height[i]) } ``` ### Complexity Analysis: * Time Complexity: O(N) - We iterate through the array once to calculate left_max and right_max each, and once to calculate the water trapped above each building. * Space Complexity: O(N) - We use additional space to store the left_max and right_max arrays. ## Problem 2 *Given an array, we need to find the majority element.* > The majority element is the element that occurs more than n/2 times, where n is the size of the array. *If there is a majority element, return it. Otherwise, return -1.* Input: `array = [3, 3, 4, 2, 4, 4, 2, 4, 4]` Output: 4 Explanation: - Since the size of the array is 9, the majority element in this case is 4 because it occurs 5 times (more than n/2). > *Can there be two majority elements?* > > Let's say there is an array, `A` of size N, and it has two majority elements, `a` and `b`. So, > `count(a)` + `count(b)` <= N > For `a` and `b` to be the majority element, there count should be atleast N/2 + 1 each. Therefore, > N/2 + 1 + N/2 + 1 <= N > N + 2 <= N, which can never be true. > > So, there can be at most 1 majority element. ### Approach **1. Brute Force Approach** * One way to solve the problem is by using a brute force approach. * We can run two loops to iterate through each element in the array and find its count. For each element, we select it and count its occurrences in the remaining array. * If the total count is greater than n/2, we return the element as the majority element. * If no majority element is found, we return -1. **Example**: - Consider the array: [3, 3, 4, 2, 4, 4, 2, 4, 4] - We start with the first element 3 and count its occurrences: 2 times. Since 2 < 9/2, skip and go to next element. - Next element is 4. Its count is 5. Since 5 > 9/2, we return the element 4. **Complexity Analysis:** * Time Complexity: O(N ^ 2) - We need two nested loops to calculate the frequency of each element. * Space Complexity: O(1) - We need not to use any additional space. **2. Using HashMap**: - An optimized approach involves using a HashMap. - We can create a HashMap to store the frequency of each array element. - Iterate through the array and update the frequency count in the HashMap. - After iterating through the array, we check each element's frequency in the HashMap. - If an element's frequency is greater than n/2, we return it as the majority element. **Example**: - Consider the array: [3, 3, 4, 2, 4, 4, 2, 4, 4] - We create a HashMap to store the frequency of each element: {3: 2, 4: 5, 2: 2} - Iterate through the HashMap and check if any element's frequency is greater than n/2. - The element 4 has a frequency of 5, which is greater than 9/2. - Thus, the majority element is 4. **Complexity Analysis:** * Time Complexity: O(N) - We will iterate the array once to update the frequency against every element. * Space Complexity: O(N) - We will need to use a hashmap which will cost O(N) space at worst. **3. Using Boyer-Moore Voting Algorithm:** **Observations:** - We know that, |M| > N/2, where M is majority element and |M| is the frequency of M. - So, if majority element exists, then `count of all other elements combined < N/2`. Therefore, |M| > |other elements combined|. - Now, if we remove one majority element and one other element, even then |M| > |other elements combined|. - The Boyer-Moore Voting Algorithm is based on the same observation. It assumes that the majority element will exist. And, if exists, and we remove two distinct elements from the array the majority element will still be there. **Algorithm:** - We initialize a candidate variable to the first element of the array and a count variable to 1. - Iterate through the remaining elements of the array: - If the current element is equal to the candidate element, increment the count. - If the current element is different from the candidate element, decrement the count (which essentially means removing two distinct elements from the array). - If the count becomes zero, update the candidate to the current element and reset the count to 1. - After iterating through the array, the candidate element will be the potential majority element. - We need to verify if it occurs more than n/2 times in the array. - Iterate through the array again and count the occurrences of the candidate element. - If the count is greater than n/2, return the candidate element as the majority element. **Example using Boyer-Moore Voting Algorithm:** - Consider the array: [3, 3, 4, 2, 4, 4, 2, 4, 4] - Initialize the candidate as 3 and count as 1. - Iterate through the remaining elements: - Element 3 is the same as the candidate, so increment the count to 2. - Element 4 is different from the candidate, so decrement the count to 1. - Element 2 is different from the candidate, so decrement the count to 0. - Update the candidate to 4 and reset the count to 1. - Element 4 is the same as the candidate, so increment the count to 2. - Element 2 is different from the candidate, so decrement the count to 1. - After iterating through the array, the candidate element is 4. - We need to verify if it occurs more than 9/2 times (since the array size is 9). - Count the occurrences of 4 in the array: 4, 4, 4, 4, 4 (5 times). - Since 5 > 9/2, the majority element is 4. ### Pseudocode ```c= candidate = arr[0] count = 1 for i = 1 to N-1 do if arr[i] == candidate then count = count + 1 else count = count - 1 if count == 0 then candidate = arr[i] count = 1 count = 0 for i = 0 to N-1 do if arr[i] == candidate then count = count + 1 if count > N / 2 then return candidate ``` **Complexity Analysis:** * Time Complexity: O(N) - We are iterating the array twice with a single loop. * Space Complexity: O(1) - We didn't require any additional space. ## Problem 3 *Given a matrix of size N X M, where each row and column is sorted in ascending order. Find the target value in the matrix and return its coordinates, if found.* Input: Matrix, A = ``` [10, 20, 30, 40, 50] [15, 25, 35, 45, 55] [17, 26, 37, 46, 56] [27, 40, 58, 60, 71] ``` Target Value: 37 Output: `(2, 2)` ### Approach **1. Brute Force Approach**: - The brute force approach involves traversing each element of the matrix and checking if it matches the target. - This approach has a time complexity of O(N * M), where N is the number of rows and M is the number of columns. **2. Binary Search Approach:** - Since each row and column is sorted, we can utilize binary search to optimize the search process. - Binary search has a time complexity of O(log(X)), where 'X' is the length of the array. - We can apply binary search on each row of the matrix to find the target, resulting in time complexity of O(M * log(N)). **3. Optimized Approach:** Let's start with making some observations: - Since the matrix is sorted row wise as well as column wise, if target element is less than the first element of matrix OR greater than the last element of matrix, then the target doesn't exist in the matrix. - Now if the target exist, we should have a starting point to traverse the matrix. - Case 1: If we start from (0, 0), we can either go down or towards right if target is greater than `A[0][0]`. - Case 2: If we start from (M-1, N-1), we can either go up or towards left if target is less than `A[M-1][N-1]`. - Case 3: If we start from (0, N-1), we can go down if target is greater than `A[0][N-1]` otherwise we have to go towards left. - Case 4: If we start from (M-1, ), we can go down if target is less than `A[0][N-1]` otherwise we have to go towards right. - If we choose Case 3 or Case 4 as starting point, our search space will keep on decreasing as we discard a row or column at every step, which isn't the case with starting points discussed in Case 1 and 2. - Here we will choose the Case 3 and start from the top right corner of the matrix. - By comparing the target with the top right element, we can determine whether to move left or down. - If the target is smaller, we move down to the next row. - If the target is larger, we move left to the previous column. - This way, we can discard rows or columns based on the comparison with the target value. - For example, let's find the target value 37 in the matrix: ``` [10, 20, 30, 40, 50] [15, 25, 35, 45, 55] [17, 26, 37, 46, 56] [27, 40, 58, 60, 71] ``` - Starting from the top right corner, we compare 37 with 50. Since 37 < 50, we move left to the previous column. - Now, we compare 37 with 40. Since 37 < 40, we move left to the previous column. - Then, we compare 37 with 30. Since 37 > 30, we move down to the next row. - Finally, we compare 37 with 35. Since 37 > 35, we move down to the next row. - At this point, we have found the target value 37. ### Pseudocode ```c= if (A[0][0] < target || A[N - 1][M - 1] > target) return false; i = 0, j = M - 1 while(i < N && j >= 0) do if (A[i][j] == target) return true; else if (A[i][j] > target) j--; else if (A[i][j] < target) i++; end ``` **Complexity Analysis:** * Time Complexity: O(N + M) - In the worst case, we will be traversing the entire row and entire column, resulting in O(N + M) time. * Space Complexity: O(1) - We didn't require any additional space.