Prefix Sum(前綴和) - HackMD

<style> .reveal .slides { text-align: left; } </style> # Prefix Sum(前綴和) #### TA 許哲瑋 ##### Email: jasper990091@gmail.com --- # What is Prefix Sum? ---- ## Question Given an array of $n$ integers and two indices, $l$ and $r$ $(l \leq r)$, find the sum of all the numbers between index $l$ and index $r$ (inclusive). ---- The simplest approach is to directly add up all the numbers one by one between index $l$ and index $r$ (inclusive). **e.g.** ```shell array: 1 3 5 7 9 11 l = 2, r = 4 (0-based) ``` The answer will be 5 + 7 + 9 = 21 ---- But what if we need to compute the sum for multiple pairs of $l, r$ instead of just one? ---- If we use the previous method, when handling $q$ queries, the worst-case time complexity would be $O(nq)$ ($n$ is the size of array). e.g. ```shell array: 1 3 5 7 9 11 l1 = 1, r1 = 3 -> 3 + 5 + 7 = 15 l2 = 0, r2 = 3 -> 1 + 3 + 5 + 7 = 16 l3 = 2, r3 = 4 -> 5 + 7 + 9 = 21 (0-based) ``` We can observe that the same parts of the array are being recalculated multiple times, which is clearly an inefficient approach. ---- ## That's why we need prefix sum! ---- We can start by creating a new array of the same size as the original array. Let's call the original array `arr` and the newly created array `pre`. We initialize ```shell pre[0] = arr[0] pre[1] = arr[0] + arr[1] pre[2] = arr[0] + arr[1] + arr[2] ``` and so on. ---- In other words, $pre[i]$ stores the sum of all elements from index $0$ to $i$ in `arr`. This array is known as the **prefix sum array**, which allows us to efficiently compute range sums. ---- Returning to the previous problem ```shell array: 1 3 5 7 9 11 l = 2, r = 4 (0-based) ``` We can now obtain the sum of the range by simply computing `pre[4] - pre[1]`. ```shell pre[4] = arr[0] + arr[1] + arr[2] + arr[3] + arr[4] pre[1] = arr[0] + arr[1] pre[4] - pre[1] = arr[2] + arr[3] + arr[4] ``` ---- ## More Generalized: Case 1 ($l=0$): - If $l=0$, then $sum(l,r)=pre[r]$, since there's no need to subtract anything. Case 2 (Otherwise): - $sum(l,r)=pre[r]−pre[l−1]$ ---- Case 1 ($l=0$): ![image](https://hackmd.io/_uploads/ryYJhRsnyg.png) ---- Case 2 (Otherwise): ![image](https://hackmd.io/_uploads/rkJVbXshJx.png =80%x) ---- Using this method, we can create the prefix sum array in $O(n)$ time complexity and reduce each query's complexity to $O(1)$. Thus, the overall complexity becomes $O(n + q)$. ---- Code (in C++): ```cpp vector<int> FindPrefixSum(vector<int> &arr) { int n = arr.size(); vector<int> pre(n); pre[0] = arr[0]; for (int i = 1; i < n; i++){ pre[i] = pre[i - 1] + arr[i]; } return pre; } ``` --- # Example Problem ---- [CSES-1661](https://cses.fi/problemset/task/1661) Given an array of $n$ integers, your task is to count the number of subarrays having sum $x$. ---- We can observe that this problem asks how many subarrays have a sum equal to a given number. By first creating the prefix sum array, the problem is simplified to finding how many pairs $l, r$ $(l \leq r)$ satisfy the equation: $\text{pre}[r] - \text{pre}[l - 1] = x$ (i.e. $sum(l, r) = x$) ---- Now, the problem is transformed into finding how many pairs of numbers in `pre` satisfy the condition that the later number minus the earlier number equals $x$. So, we just need to iterate through the `pre` array one number at a time. For each number, we check how many previous numbers, when subtracted from the current number, equal $x$. Summing up all these counts gives us the final answer. ---- e.g. Suppose we are currently iterating to the **fifth** number in the `pre` array. ```shell x = 6 pre: 1 2 7 2 [8] 6 2 ``` Since **8 needs to subtract 2 to equal the target value 6**, we need to check how many times **2** has appeared before this **8**. In this example, we can see that **there are two occurrences of 2**. ---- Therefore, you can utilize a data structure like **C++'s `map`**, which supports both lookup and modification in $O(\log n)$ time complexity. We use a **map** where the **key $x$** represents a number, and the **value $y$** represents how many times $x$ has appeared **before** the current number. ---- At the beginning, the **map** is empty. After processing each number, we increment its count in **map**. This allows us to efficiently find how many times a specific number has appeared before in $O(\log n)$ time. This allows us to solve the problem efficiently in $O(n \log n)$ time complexity. ---- **Note** - Don't forget to handle the special case where `pre[r]` is equal to the target value, which means the sum of `arr[0]` to `arr[r]` is exactly the target value. - If you are writing the program in C++, use `long long` instead of `int` because the numbers in this problem may exceed the range of `int`. (e.g. `int a = 1;` => `long long a = 1;`) ---- Code (in C++): ```cpp #include<bits/stdc++.h> using namespace std; #define int long long signed main(){ int n, x; cin >> n >> x; vector<int> arr(n); for (int i = 0; i < n; i++){ cin >> arr[i]; } vector<int> pre(n); pre[0] = arr[0]; for (int i = 1; i < n; i++){ pre[i] = pre[i - 1] + arr[i]; } int ans = 0; map<int, int> cnt; for(int i = 0; i < n; i++){ ans += cnt[pre[i] - x]; cnt[pre[i]]++; if(pre[i] == x){ ans++; } } cout << ans << "\n"; } ```