<h1 style=" display: flex; align-items: center; justify-content: center;" > DSaA Lab 3 - Algorithm Analysis </h1> <h4 style="text-align: center;"> The Islamic University of Gaza<br> Engineering Faculty<br> Department of Computer Engineering </h4> <font color="#ec53c4"> Prepared by: Eng. Ahmad Albarasy </font> <br> <font color="#ec53c4"> Under the supervision of: Prof. Ahmad Mahdi </font> --- In this lab, we'll learn how to analyze our algorithms in terms of time and space complexity, which is a crucial skill to have as an engineer. Understanding complexity helps us evaluate how an algorithm performs with different input sizes and compare it with others to determine which one is faster and more efficient. ## Why do we focus on the Worst-Case Scenario? You'll often find people studying the **worst-case scenario** or the **worst-case input**. But why is that you may ask? Well, the main reason behind that is because the **average-case** is usually hard to determine, given the different set of inputs that our algorithm might get tested with, and their probability of occurence. For instance, take a look at this chart from our book: ![image](https://hackmd.io/_uploads/ryPSwayg-e.png) Say that, for example, our inputs are really only of types `A` or `D`. In that case, we are far from accurately measuring the average-case. An average-case analysis usually requires that we calculate expected running times based on a given input distribution, which usually involves sophisticated probability theory. To give a quick glimpse on how we analyze average case, consider the following simple linear search algorithm that iterates through an array of integers `arr` and returns the index of our `target`. For simplicity, we will also assume that there is always an answer: ```java= int linearSearch(int[] arr, int target) { for (int i = 0; i < arr.length; i++) { if (arr[i] == target) { return i; } } return -1; // won't happen based on our assumption } ``` Intuitively, we know that the best case is finding our target at the first index, taking only one iteration. In contrast, the worst case is that our target is located at index `n-1` at the end of the array. Now, to calculate the average case, we use simple mathematics: $$ E = \frac{1 + 2 + 3 + \dots + n}{n} = \frac{n + 1}{2} $$ But life isn't always that simple, because this result is based on the assumption that the probability distibution of each possible input is a uniform one :) If that’s not the case, we need to account for the probability of finding the target at each position. In other words, each position $i$ has its own probability $Pi$, and the expected number of comparisons becomes: $$ E = (1 \times P_1) + (2 \times P_2) + \dots + (n \times P_n) $$ When earlier elements are more likely to be the target, the expected value $E$ becomes smaller, meaning the algorithm performs better *on average* than in the uniform case. **For that reason**, the average case analysis is rarely used in practice, and that comes in the favor of the Worst-case analysis, because it is much easier than average-case analysis, as it requires only the ability to identify the worst-case input, which is often simple. Also, this approach typically leads to better algorithms, making the standard of success for an algorithm to perform well in the worst case necessarily requires that it will do well on every input. That is, designing for the worst case leads to stronger algorithmic “muscles,” much like a track star who always practices by running up an incline. 
**Because realistic input distributions are hard to pin down**, average-case analysis is rarely used in practice, which works in favor of worst-case analysis: it is much easier, since it requires only the ability to identify the worst-case input, which is often simple. This approach also typically leads to better algorithms, because making good worst-case performance the standard of success necessarily requires the algorithm to do well on every input. That is, designing for the worst case leads to stronger algorithmic “muscles,” much like a track star who always practices by running up an incline.

## Measuring an Algorithm's Efficiency

As mentioned in the introduction, our goal is to analyze and evaluate the efficiency of algorithms. We have two ways of doing so:

1. Experimental Studies
2. Theoretical Analysis

### Experimental Studies

One way to study the efficiency of an algorithm is to implement it and experiment by running the program on various test inputs while recording the time spent during each execution. A simple mechanism for collecting such running times in Java is based on the `currentTimeMillis` method of the `System` class. That method reports the number of milliseconds that have passed since a benchmark time known as the epoch (January 1, 1970 UTC):

```java=
long startTime = System.currentTimeMillis(); // record the starting time
/* (run the algorithm) */
long endTime = System.currentTimeMillis();   // record the ending time
long elapsed = endTime - startTime;          // compute the elapsed time
```

For extremely quick operations, Java provides a method `nanoTime` that measures in nanoseconds rather than milliseconds.

Experimenting with algorithms in this way can be helpful for getting a general idea of how one algorithm performs compared to another. However, the measured times reported by both `currentTimeMillis` and `nanoTime` will vary greatly from machine to machine, and will likely vary from trial to trial, even on the same machine. This is because many processes share use of a computer's central processing unit (or CPU) and memory system; therefore, the elapsed time will depend on what other processes are running on the computer when a test is performed. While the precise running time may not be dependable, experiments are quite useful when comparing the efficiency of two or more algorithms, so long as the measurements are gathered under similar circumstances.

---

### Theoretical Analysis

Our goal is to develop an approach to analyzing the efficiency of algorithms that:

1. Allows us to evaluate the relative efficiency of any two algorithms in a way that is independent of the hardware and software environment.
2. Is performed by studying a high-level description of the algorithm without need for implementation.
3. Takes into account all possible inputs.

**To achieve this goal, we use theoretical analysis**, where a high-level description of the algorithm is sufficient for evaluation, eliminating the need for full implementation.

#### Random Access Machine Model (RAM)

The RAM model is an idealized model of computation used to analyze algorithms in a way that reflects how real computers work, but in a simplified form. Think of it as an imaginary computer that has a CPU and an essentially unlimited amount of memory, where every memory access takes the same (constant) amount of time.

#### The Seven Functions Used for Theoretical Analysis

![image](https://hackmd.io/_uploads/BJw2--WxWl.png)

> [!tip] Fun fact
> The Travelling Salesman Problem (TSP) is one of the most famous problems in computer science. It asks for the shortest route that visits every city exactly once and returns to the starting point. Even though it sounds simple, no one has ever found an efficient (polynomial-time) algorithm for it. The best exact algorithms still take exponential time, making it a classic example of a problem that resists optimization to this day.

#### Defining Primitive Operations

When we analyze an algorithm's efficiency, we don't measure real time on a computer. Instead, we count primitive operations. Primitive operations are the basic computations that a computer can perform in a single step, such as:

1. Evaluating an expression: `(x + y) * z`
2. Assigning a value to a variable: `x = 10`
3. Following an object reference: `arr[2]`, `list.length`
4. Calling a method: `sort(arr)`
5. Returning from a method: `return false`
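To see what counting primitive operations looks like in practice, here is a small illustrative sketch (our own example, not one of the lab tasks) that finds the maximum value in an array, with rough per-line operation counts in the comments.

```java=
public class ArrayMax {
    // Hypothetical example for counting primitive operations (not one of the lab tasks).
    public static int arrayMax(int[] data) {
        int n = data.length;             // 2 ops: read the length field, assign it
        int currentMax = data[0];        // 2 ops: index into the array, assign
        for (int i = 1; i < n; i++) {    // initialization once; comparison and increment run on every pass
            if (data[i] > currentMax) {  // 2 ops per iteration: index, compare
                currentMax = data[i];    // at most 2 ops per iteration: index, assign
            }
        }
        return currentMax;               // 1 op: return
    }
}
```

Whatever exact constants we charge per line, the total count is at most $c_1 \cdot n + c_2$ for some constants $c_1$ and $c_2$, so the number of operations grows linearly with the input size `n`; the asymptotic analysis below makes this kind of statement precise.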
---

#### Asymptotic Analysis

Asymptotic analysis studies how the running time or space requirements of an algorithm grow as the input size `n` increases toward infinity.

* It ignores constant factors and lower-order terms that become insignificant for large inputs.
* It provides a high-level estimate of an algorithm's efficiency.

---

#### The Big-Oh Notation

Let $f(n)$ and $g(n)$ be functions mapping positive integers to positive real numbers. We say that $f(n)$ is $O(g(n))$ if there is a real constant $c > 0$ and an integer constant $n_0 \ge 1$ such that:

$$
f(n) \le c \cdot g(n), \quad \text{for } n \ge n_0
$$

This definition is often referred to as the “big-Oh” notation, for it is sometimes pronounced as “$f(n)$ is big-Oh of $g(n)$.” For example, $2n + 10$ is $O(n)$: taking $c = 3$ and $n_0 = 10$, we have $2n + 10 \le 3n$ for all $n \ge 10$.

![image](https://hackmd.io/_uploads/rynL-GWlbl.png)

## Tasks

### Task 1 (5 points)

You are given the following algorithm that generates all 2-element combinations from an array of unique integers:

```java=
public class FindCombinations {
    public static int findCombinations(int[] arr) {
        int n = arr.length;
        int total = n * (n - 1) / 2;
        int[][] result = new int[total][2];
        int index = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                result[index][0] = arr[i];
                result[index][1] = arr[j];
                index++;
            }
        }
        return total;
    }
}
```

Analyze the algorithm's running time and determine its order of growth using the Big-Oh notation.

### Task 2 (5 points)

You are given the following algorithm that returns the middle node of a Singly Linked List:

```java=
/*
 * class Node {
 *     int data;
 *     Node next;
 *
 *     Node(int data) {
 *         this.data = data;
 *         this.next = null;
 *     }
 * }
 */
public class FindMiddle {
    public static Node findMiddle(Node head) {
        int length = 0;
        Node temp = head;
        while (temp != null) {
            length++;
            temp = temp.next;
        }
        int middleIndex = length / 2;
        temp = head;
        for (int i = 0; i < middleIndex; i++) {
            temp = temp.next;
        }
        return temp;
    }
}
```

1. Analyze the algorithm's running time and determine its order of growth using the Big-Oh notation.
2. Rewrite the algorithm to find the middle in one pass.

---

<div style="display: flex; align-items: center; justify-content: center;" > Happy Coding 🙂 </div>