owned this note
owned this note
Published
Linked with GitHub
# [AIdrifter CS 浮生筆錄](https://hackmd.io/@iST40ExoQtubds5LhuuaAw/rypeUnYSb?type=view#Competitive-Programmer%E2%80%99s-Handbook) Competitive Programming's Handbook : Ch3 Sorting and Binary Search && Ch4 Data Structure(STL)
# Sorting
## Sorting theory
### Basic $O(n^2)$
- bubble sort
```C++
for (int i = 0; i < n; i++) {
for (int j = 0; j < n-1; j++) {
if (array[j] > array[j+1]) {
swap(array[j],array[j+1]);
}
}
}
```
- Inversion
- 元素放在錯的位置 排序需要修正到對的位置
The number of inversions indicates
how much work is needed to sort the array.
On the other hand, if the array elements are in the
**reverse order**, the number of inversions is the largest possible:
$1+2+\cdots+(n-1)=\frac{n(n-1)}{2} = O(n^2)$
- worst case $O(n^2)$
- :::info
如果只能swap**連續元素** => worst case會到 $O(n^2)$
:::
Swapping a pair of consecutive elements that are
in the wrong order removes exactly one inversion
from the array. Hence, if a sorting algorithm can only
swap **consecutive elements**, each swap removes
at most one inversion, and the time complexity
of the algorithm is at least $O(n^2)$.
### Advanced O(nlogn)
:::info
高等sort不需要限定只能交換連續元素 => worst case可以到 $O(n \log n)$
:::
- It is possible to sort an array efficiently
in $O(n \log n)$ time using algorithms
that are not limited to **swapping consecutive elements**.
- Sorting lower bound
- 在compare and swap 概念下 有可能突破O(nlogn)速度嘛?
Is it possible to sort an array faster than in O(nlogn) time?
It turns out that this is not possible when we restrict ourselves to sorting algorithms that are based on comparing array elements.
- **Decision tree**
![](https://i.imgur.com/FnekLN0.png)
leaf為最後的排序全部結果
3!種可能 = {1,2,3} {1,3,2} {3,1,2} {2,1,3} {2,3,1} {3,2,1}
說明了對於基於比較的排序算法,我們**至少需要 Ω(nlogn)次比較**,不然最下面那支的葉子節點的結果我們就得不到。當然,最重要的的是,我們這裡得出的結論是對於**任意的排序算法**,任意的哦!
Here ''$x<y?$'' means that some elements $x$ and $y$ are compared.
If $x<y$, the process continues to the left, and otherwise to the right.
The results of the process are the possible ways to sort the array, a total of $n!$ ways.
For this reason, the height of the tree must be at least
- $log_2(n!) = \log_2(1)+\log_2(2)+\cdots+\log_2(n).$
=> $log_2(n!) \ge (n/2) \cdot \log_2(n/2)$
- merge Sort
![](https://i.imgur.com/sOu6jAz.png)
![](https://i.imgur.com/2q6pqIb.png)
### linear Sorting O(n)
- counting sort
- 用array index當作value ,用array 欄位紀錄出現次數
**counting sort** that sorts an array in
$O(n)$ time assuming that every element in the array
is an integer between $0 \ldots c$ and $c=O(n)$.
Counting sort is a very efficient algorithm
- 只有在range C 不能很大的時候用 => array 元素不可以太多
but it can only be used when the constant $c$
is small enough, so that the array elements can
be used as indices in the bookkeeping array.
## Sorting in C++
```cpp
/* increasing or ascending */
vector<int> v = {4,2,5,3,5,8,7};
int a[] = {4,2,5,3,5,8,7};
sort(v.begin(),v.end());
sort(a,a+7);
// desending
sort(v.rbegin(),v.rend());
// string
string s = "monkey";
sort(s.begin(), s.end());
```
### Comparsion operators
pairs,先比first元素,在比second 元素,如果想要自定義有兩種作法:
1. user-defined structure : 自己寫運算子重載
2. comparsion functions : 自己寫compare function
### User-defined structure
For example, the following struct **{P}** contains the x and y coordinates of a point.
The comparison operator **<** is defined so that
the points are sorted primarily by the x coordinate and secondarily by the y coordinate.
```cpp
struct P {
int x, y;
bool operator<(const P &p) {
if (x != p.x) return x < p.x;
else return y < p.y;
}
};
```
### Comparison functions
- It is also possible to give an external
**comparison function** to the sort function
as a **callback function**.
```cpp
bool comp(string a, string b) {
if (a.size() != b.size()) return a.size() < b.size();
return a < b;
}
sort(v.begin(), v.end(), comp);
```
## Binary Search
### homemade
- 如果元素已經全部排序好,這是最好的方法。
The algorithm halves the size of the region at each step,
so the time complexity is $O(\log n)$.
- 正常版
```cpp
int a = 0, b = n-1;
while (a <= b) {
int k = (a+b)/2;
if (array[k] == x) {
// x found at index k
}
if (array[k] > x) b = k-1;
else a = k+1;
}
```
- **控制 jump 長度b的大小**
- **b** : jump的長度,會逐步遞減:
At each step, the jump length will be halved:
first $n/4$, then $n/8$, $n/16$, etc., until
finally the length is 1.
- 如果[k+b] <= x 代表x在右邊 k=n/2 (now loop, condition success)
如果[k+b] > x 代表x在左邊 k=n/4 (next loop)
```cpp
int k = 0;
for (int b = n/2; b >= 1; b /= 2) {
while (k+b < n && array[k+b] <= x) k += b;
}
if (array[k] == x) {
// x found at index k
}
```
### C++ functions
- 這些function都是base binary search去實作的
- `lower_bound()` returns a pointer to the first array element whose value is **at least** x **(<)**.
- `upper_bound()` returns a pointer to the first array element whose value is **larger than** x **(>)**.
- equal_range returns both above pointers.
```cpp
// 0 1 2 3 4 5 6
vector<int> V = {10,10,10,20,20,20,30};
auto l = lower_bound(V.begin(), V.begin()+V.size(), 19) - V.begin();
auto u = upper_bound(V.begin(), V.end(), 20) - V.begin();
// l(3) u(6)
printf(" l(%lu) u(%lu)",l,u);
```
- binary Serach
- `{10,10,10,20,20,20,30}`
- 如果是x=20就會剛好找到,如果是x=19就找不到。
```cpp
auto k = lower_bound(A, A+n, x) - A;
if (k<n && A[k] == x){
// x found at index k
}
```
- 找出value為x的總共有幾個元素:
```cpp
auto l = lower_bound(A, A+n, x) - A;
auto u = upper_bound(A, A+n, x) - A;
cout<< b - a<< "\n";
```
```cpp
auto r = equal_range(A, A+n, x);
cout<< r.second - r.first <<"\n";
```
### Finding the smallest solution
We are given a function ${ok}(x)$
that returns ${true}$ if $x$ is a valid solution and ${false}$ otherwise.
In addition, we know that ${ok}(x)$ is ${false}$
when $x<k$ and ${true}$ when $x \ge k$.
The situation looks as follows:
這邊可以和前面的`while(k+b < n && A[k+b]<=x ) k+=b` 比較:
![](https://i.imgur.com/zOvKr4Z.png)
```cpp
int x = -1;
// The initial jump length z has to be large enough
// Time Complexity: o(log(z))
for (int b = z; b >= 1; b /= 2) {
while (!ok(x+b)) x += b;
}
int k = x+1;
```
### Finding the maximum Value
可以幫我們找到最大y座標時候的x(接近)。
Binary search can also be used to find the maximum value for a function that is first increasing and then decreasing.
$f(x)<f(x+1)$ when $x<k$, and
$f(x)>f(x+1)$ when $x \ge k$.
![](https://i.imgur.com/sreAPlU.png)
```cpp
int f(int x){
return -(2*x-1)*(x+2);
}
int x = -3;
for (int b = 5; b>=1; b/=2){
while( f(x+b) < f(x+b+1)) x += b;
}
int k = x+1;
// k is -1
cout<<k<<"\n";
```
Note that unlike in the ordinary binary search,
here it is not allowed that **consecutive values** of the function are equal.
# Data Structures
## Dynamic arrays
### Vector
- C++ vector,支援以下操作:
- init
- push_back()
- pop_back()
- erase() // be careful ...
- back()
- insert()
```cpp
// size 10, initial value 0
vector<int> v(10);
// size 10, initial value 5
vector<int> v1(10, 5);
// A shorter way to iterate through a vector is as follows
for (auto x : v) {
cout << x << "\n";
}
v.push_back(5); // 0 ... 0 5
v.push_back(2); // 0 ... 0 5 2
cout << v.back() << "\n"; // 2
v.pop_back();
cout << v.back() << "\n"; // 5
v.insert(v.begin()+9, 10); // 0 ... 10 5
```
### String
- ${substr}(k,x)$ returns the substring that begins at position $k$ and has length $x$,
- $\texttt{find}(\texttt{t})$ finds the position of the first occurrence of a substring $t$.
```cpp
string a = "hatti";
// there is special syntax for strings
// that is not available in other data structures.
// Strings can be combined using the {+} symbol.
string b = a+a;
cout << b << "\n"; // hattihatti
b[5] = 'v';
cout << b << "\n"; // hattivatti
string c = b.substr(3,4);
cout << c << "\n"; // tiva
```
## Set
`<set>`
- balanced binary tree : operations work $O(\log n)$
- maintains the order of the elements
- count只有**0 or 1**,sets is that all their elements aredistinct.
- set會用ordered去存,所以最後一個元素必為largest element。
`<unordered_set>`
- hahsing : operations work $O(1)$ time on average.
- unorder_set並沒有用orderd去存,所以速度快。
```cpp
set<int> s;
s.insert(3);
s.insert(2);
s.insert(5);
cout << s.count(3) << "\n"; // 1
cout << s.count(4) << "\n"; // 0
s.erase(3);
s.insert(4);
cout << s.count(3) << "\n"; // 0
cout << s.count(4) << "\n"; // 1
```
```cpp
set<int> s = {2,5,6,8};
cout << s.size() << "\n"; // 4
for (auto x : s) {
cout << x << "\n";
}
```
- multiset and unordered_multiset
- count 可以突破1
```cpp
multiset<int> s;
s.insert(5);
s.insert(5);
s.insert(5);
cout << s.count(5) << "\n"; // 3
// reombe only one instance
s.erase(s.find(5));
cout << s.count(5) << "\n"; // 2
// remove all einstances of an element
s.erase(5);
cout << s.count(5) << "\n"; // 0
```
## Map
- A mapis a generalized array that consists of key-value-pairs.
- the keys in a map can be of any data type and they do nothave to be consecutive values
`<map>` : balanced binary tree: $O(\log n)$
`<unodered_map>`: hashing : $O(1)$
```cpp
// if the value of a key is requested but the map does not contain it,
// the keyis automatically added to the map with a default value.
map<string,int> m;
cout << m["aybabtu"] << "\n"; // 0
// check key exist
if (m.count("aybabtu")) {
// key exists
}
// traversal
for (auto x : m) {
cout << x.first << " " << x.second << "\n";
}
```
## Iterators and ranges
Note the asymmetry in the iterators:
$\texttt{s.begin()}$ points to an element in the data structure,
while $\texttt{s.end()}$ points outside the data structure.
Thus, the range defined by the iterators is $emph{half-open}$.
![](https://i.imgur.com/LBseQ41.png)
```cpp
sort(v.begin(), v.end());
reverse(v.begin(), v.end());
random_shuffle(v.begin(), v.end());
sort(a, a+n);
reverse(a, a+n);
random_shuffle(a, a+n);
```
```cpp
auto it = s.find(x);
if (it == s.end()) {
// x is not found
}
```
- finds the element nearest to $x$
- assumes that the set is not empty, and goes through all possiblecases using an iteratorit.
```cpp
auto it = s.lower_bound(x);
if (it == s.begin()) {
cout << *it << "\n";
} else if (it == s.end()) {
it--;
cout << *it << "\n";
} else {
int a = *it; it--;
int b = *it;
if (x-b < a-x) cout << b << "\n";
else cout << a << "\n";
}
```
## Bitset
- A bitsetis an array whose each value is either 0 or 1.
- require less memory than ordinaryarrays, because each element in a bitset only uses one bit of memory
```cpp
// count 1
bitset<10> s(string("0010011010"));
cout << s.count() << "\n"; // 4
// logic operation
bitset<10> a(string("0010110110"));
bitset<10> b(string("1011011000"));
cout << (a&b) << "\n"; // 0010010000
cout << (a|b) << "\n"; // 1011111110
cout << (a^b) << "\n"; // 1001101110
```
## Deque
- 比vecotr多了`push_front()` 和 `pop_front()` 操作。
```cpp
deque<int> d;
d.push_back(5); // [5]
d.push_back(2); // [5,2]
d.push_front(3); // [3,5,2]
d.pop_back(); // [3,5]
d.pop_front(); // [5]
```
## Stack
- adding an element to the top => `push()`
- removing an element from the top => `pop()`
- It is only possible to access the top element of a stack => `top()`
```cpp
stack<int> s;
s.push(3);
s.push(2);
s.push(5);
cout << s.top(); // 5
s.pop();
cout << s.top(); // 2
```
## Queue
- It is only possible to access the first and last element of a queue.
```cpp
queue<int> q;
q.push(3);
q.push(2);
q.push(5);
cout << q.front(); // 3
q.pop();
cout << q.front(); // 2
```
## Priority Queue
- A priority queuemaintains a set of elements.
- **Heap** is faster than balanced binary tree(ordered set)
- By default, the elements in a C++ priority queue are sorted in **decreasing order**, and it is possible to find and remove the largest element in the queue.
```cpp
priority_queue<int> q;
q.push(3);
q.push(5);
q.push(7);
q.push(2);
cout << q.top() << "\n"; // 7
q.pop();
cout << q.top() << "\n"; // 5
q.pop();
q.push(6);
cout << q.top() << "\n"; // 6
q.pop();
```
- 如果要改成decreased order
- `greather<int>`
```cpp
priority_queue<int,vector<int>,greater<int>> q;
```
## Policy-Based data structure
`<index_set>`
- logarithmatic time
- 可以直接透過額外的inedx去access key。
- The speciality of this set is that we have access to the indices that the elementswould have in a sorted array.
```cpp
#include <ext/pb_ds/assoc_container.hpp>
using namespace __gnu_pbds;
indexed_set s;
s.insert(2);
s.insert(3);
s.insert(7);
s.insert(9);
// index: 0 1 2 3
// key : 2 3 7 9
// val : 0 0 0 0
auto x = s.find_by_order(2);
cout << *x << "\n"; // 7
cout << s.order_of_key(7) << "\n"; // 2
// If the element does not appear in the set,
// we get the position that the element would have in the set:
cout << s.order_of_key(6) << "\n"; // 2
cout << s.order_of_key(8) << "\n"; // 3
```
## Comparsion Sorting
Let us consider a problem where
we are given two lists $A$ and $B$
that both contain $n$ elements.
Our task is to calculate the number of elements
that belong to both of the lists.
For example, for the lists
$A = [5,2,8,9]$ $B = [3,2,9,5]$
the answer is 3 because the numbers **2, 5 and 9** belong to both of the lists.
${Algorithm 1}$
We construct a set of the elements that appear in $A$,
and after this, we iterate through the elements
of $B$ and check for each elements if it also belongs to $A$.
This is efficient because the elements of $A$ are in a set.
Using the $set$ structure,
the time complexity of the algorithm is $O(nlog n)$.
```cpp
int count = 0;
set<int> s;
for(auto a:A) // n * log(n)
s.insert(a);
for(auto b:B) // n * O(1) ??
if(s.count(b)) count++;
return count;
```
${Algorithm 2}$
It is not necessary to maintain an ordered set,
so instead of the ${set}$ structure
we can also use the ${unordered\_set}$ structure.
This is an easy way to make the algorithm
more efficient, because we only have to change
the underlying data structure.
The time complexity of the new algorithm is $O(n)$.
```cpp
int count = 0;
unordered_set<int> s;
for(auto a:A) // n * O(1)
s.insert(a);
for(auto b:B) // n * O(1)
if(s.count(b)) count++;
return count;
```
${Algorithm 3}$
Instead of data structures, we can use sorting.
First, we sort both lists $A$ and $B$.
After this, we iterate through both the lists
at the same time and find the common elements.
The time complexity of sorting is $O(n \log n)$,
and the rest of the algorithm works in $O(n)$ time,
so the total time complexity is $O(n \log n)$.
```cpp
sort(A.begin(), A.end()); // O(nlogn)
sort(B.begin(), B.end()); // O(nlogn)
int count;
auto a_itr = A.begin();
auto b_itr = B.begin();
// O(n)
while(a_itr != A.end() || b_itr != B.end()){
if(*a_itr > *b_itr) b_itr++;
if(*a_itr < *b_itr) a_itr++;
if(*a_itr == *b_itr) count++;
return count;
}
```
- 因為unodered_set是用hash table去實作,所以algorith2一定比algotirhm1快,但是涉及到set的操作,即便時間複雜度都是$O(nlogn)$,sort還是快上許多,因為他操作的資料結構沒這麼複雜,不需要去維護balanced binary tree(unordered_set), 或是hash table(set)。
![](https://i.imgur.com/tlr4G3d.png)
# Reference
Lower Bound:
http://www.zitaoliu.com/cs/algorithm/2011/04/20/introduction-to-algorithm-problem-complexity-and-adversarial-lower-bound/
C++ Standard Libraries(STL):
https://hackmd.io/ww2WwMVDRZWo9CMm2FgtYA?view