排序與搜尋 - HackMD

{%hackmd @hipp0\Hippotumuxthem %} <style> .text-center{ text-align: center; //文字置中 } .text-left{ text-align: left; //文字靠左 } .text-right{ text-align: right; //文字靠右 } </style> <style> .reveal .slides { text-align: left; font-size:30px; } </style> # 排序與搜尋 --- # 排序與搜尋 - 複習 - 常用小工具 - 排序 - 內建排序 - 二分搜 --- <h2 style='color:#C4C400'> 複習 </h2> ---- <h2 style='color:#C4C400'> STL </h2> 最常用 - vector - set - map 某些演算法會用到(暫時先跳過) - queue - priority_queue ---- <h3 style='color:#C4C400'> vector </h3> 可以把它當成 array 的用法，而且他是有序的，所以可以用 vec\[i]。 ---- <h3 style='color:#C4C400'> push_back() </h3> ```cpp= vector<int> vec; int n = 5; for (int i = 0 ; i < 5 ; i++) { vec.push_back(i); } // 之後就可以當 arr for (int i = 0 ; i < 5 ; i++) { cout << vec[i] << endl; } ``` ---- <h3 style='color:#C4C400'> 設置大小 </h3> ```cpp= int n = 5; vector<int> vec(n); // or vec.resize(n); // 直接預設大小為 n for (int i = 0 ; i < 5 ; i++) { vec[i] = i; } for (int i = 0 ; i < 5 ; i++) { cout << vec[i] << endl; } ``` ---- <h3 style='color:#C4C400'> pop_back()/empty() </h3> ```cpp= vector<int> vec; int n = 5; for (int i = 0 ; i < 5 ; i++) { vec.push_back(i); } for (int i = 0 ; i < 5 ; i++) { vec.pop_back(); } if (vec.empty()) { cout << "yes" << endl; } ``` ---- <h3 style='color:#C4C400'> 二維vector </h3> ```cpp= int n = 5, m = 6; // int arr[n][m] vector<vector<int>> vec2(n, vector<int>(m)); // or vector<vector<int>> vec2; vec2.resize(n); for (int i = 0.; i < n ; i++) vec2[i].resize(m); // 之後用法跟 arr[][] 差不多 ``` ---- <h3 style='color:#C4C400'> function </h3> 用 STL 傳進函式會很方便，像是 string 和 char 差別 ```cpp= // 可以這樣傳進來 int f (vector<int> vec, int n) { } // arr 版本 int f (int arr[], int n) { } ``` ---- <h3 style='color:#C4C400'> 差別？ </h3> - vector 丟進去改值的話不會改變（複製一個新的） - array 會 (視為 pointer 傳入) 那要怎麼讓 vector 也可以改值？ ---- <h3 style='color:#C4C400'> reference </h3> Reference 就像就像別名一樣，改個名字而已，但實際引用是同一個 ```cpp= int f (vector<int> &vec) { } ``` ---- <h3 style='color:#C4C400'> fill </h3> 想把 vector 裡面設定一個數字需 O(n) 時間 ```cpp= // vec is a vector<int> with size n fill(vec.begin(), vec.end(), key); // array用法 memset(arr, key, sizeof(arr)); ``` ---- <h3 style='color:#C4C400'> set </h3> - set 預設會由小排到大（同個元素只會有一個） - pair 放 set 會先排 first 再排 second - 不是有序的他是一棵 tree （不支援拿取第 k 個元素） - 插入 n 個元素時間複雜度 $O(nlogn)$ - 搜尋元素 (logn) ---- <h3 style='color:#C4C400'> 用法 </h3> ```cpp= set<int> st; for (int i = 5 ; i >= 0 ; i--) { st.insert(i); } st.erase(4); int key = 4; if (st.find(key) != st.end()) cout << "find it" << endl; else cout << "not find" << endl; for (auto it : st) { cout << it << endl; } ``` ---- <h3 style='color:#C4C400'> map </h3> - 一個 key 對應一個 value 會根據 key 由小排到大 - 只會有一個 key，不會有相同的 - 不是有序的他是一棵 tree - 插入搜尋時間複雜度和 set 一樣 ---- <h3 style='color:#C4C400'> 用法 </h3> ```cpp= map<string, int> mp; for (int i = 5 ; i >= 0 ; i--) { string s; int val; cin >> s >> val; mp[s] = val; } if (st.find(key) != st.end()) cout << "find it" << endl; else cout << "not find" << endl; for (auto it : st) { cout << it.first << it.second << endl; } ``` ---- <h3 style='color:#C4C400'> find 指標 </h3> 和尋找有關的指令，都會回傳一個 pointer ```cpp= // mp is map<string, int> // st is set<int> auto ind = mp.find(key); cout << (*)ind.first << (*)ind.second << endl; auto ind2 = st.find(val); cout << (*)ind2 << endl; ``` ---- <h3 style='color:#C4C400'> 清空 </h3> 目前學到的STL支援 clear() 用法注意不是把內容物設 0, 而是回到原始的空容器 ```cpp= mp.clear(); vec.clear(); st.clear(); // vec 需要重設大小 or push_back() insert .. mp[a] = b; st.insert(c); vec.push_back(d); ``` --- <h2 style='color:#C4C400'> 常用小工具 </h2> - min/max() - swap() - __gcd() - abs() - sqrt() - ceil()/floor()/round() ---- <h3 style='color:#C4C400'> min/max </h3> - 兩個 val ```cpp= int a = 5, b = 3; cout << max(a,b); ``` - 多個 val ```cpp= int a = 1, b = 3, c = 2, d = 4; // ... cout << min({a,b,c,d}); ``` ---- <h3 style='color:#C4C400'> swap() </h3> ```cpp= int a = 3, b = 5; swap(a,b); cout << a << " " << b << endl; ``` - xor寫法 (正常不會這樣寫) ```cpp= int a = 3, b = 5; a ^= b; b ^= a; a ^= b; cout << a << " " << b << endl; ``` ---- <h3 style='color:#C4C400'> __gcd() </h3> - 輾轉相除法的 gcd function - c\+\+17 支援 gcd() 用法但考試沒有C\+\+17 - 兩個底線 _ 和 gcd ```cpp= int a = 24654; int b = 123944; cout << __gcd(a,b); ``` ---- <h3 style='color:#C4C400'> abs() </h3> - 取絕對值 ```cpp= int a = 10, b = 5; if (abs(b-a) > 5) { cout << "long distance" << endl; }else{ cout << "short distance" << endl; } ``` ---- <h3 style='color:#C4C400'> sqrt() </h3> - 開根號 ```cpp= // 會回傳 double，用 int 存會自動捨去小數 double c = 1234; cout << sqrt(c) << endl; ``` ---- <h3 style='color:#C4C400'> 取整 </h3> - 四捨五入 round() - 無條件進位 ceil() - 無條件捨去 floor() - 注意是取整 ```cpp= double c = 1.23; double d = 1.35; cout << ceil(c); // 2 cout << floor(c); // 1 cout << round(c+d); // 3 ``` --- <h2 style='color:#C4C400'> 排序演算法 </h2> - 選擇排序法 - 氣泡排序法 - 插入排序法 - 合併排序法 - 快速排序法 ---- <h3 style='color:#C4C400'> Selection Sort </h3> 反覆從未排序的資料中找出最大(小)值，與最前端尚未排序的資料交換。 ---- ![截圖 2024-04-25 下午3.32.16](https://hackmd.io/_uploads/S1bGvtvbC.png) ---- ![截圖 2024-04-25 下午3.33.08](https://hackmd.io/_uploads/S1brDYw-C.png) ---- ![截圖 2024-04-25 下午3.33.33](https://hackmd.io/_uploads/SJc8wYwZ0.png) ---- ![截圖 2024-04-25 下午3.33.54](https://hackmd.io/_uploads/ByRDwFvWR.png) ---- ### <h3 style='color:#C4C400'> 實作 $O(n^2)$ </h3> ```cpp= // remember using reference void solection_sort (vector<int> &arr, int n){ for (int i = 0 ; i < n ; i++) { int mn = i; for (int j = i + 1 ; j < n ; j++) { if (arr[j] < arr[mn]) mn = j; } swap(arr[i],arr[mn]); } } ``` ---- <h3 style='color:#C4C400'> Bubble Sort </h3> 原理是從第一筆資料開始，比較相鄰兩筆資料，如果兩筆大小順序有誤則做交換，反之則不動，接者再進行下一筆資料比較，所有資料比較完第1回合後，可以確保最後一筆資料是正確的位置。 ---- ![截圖 2024-04-25 下午3.36.30](https://hackmd.io/_uploads/H16Z_Fw-0.png) ---- ![截圖 2024-04-25 下午3.36.53](https://hackmd.io/_uploads/S1V7uKDWA.png) ---- ![截圖 2024-04-25 下午3.37.19](https://hackmd.io/_uploads/rJ6E_YDZR.png) ---- ![截圖 2024-04-25 下午3.37.57](https://hackmd.io/_uploads/r1bwuKD-C.png) ---- ### <h3 style='color:#C4C400'> 實作 $O(n^2)$ </h3> ```cpp= // remember using reference void bubble_sort (vector<int> &arr, int n) { while (n--) { bool check = false; for (int i = 0 ; i < n ; i++) { if (arr[i] > arr[i+1]) swap(arr[i],arr[i+1]), check = true; } if (!check) break; } } ``` ---- <h3 style='color:#C4C400'> Insertion Sort </h3> 原理是逐一將原始資料加入已排序好資料中，並逐一與已排序好的資料作比較，找到對的位置插入。 ---- ![image](https://hackmd.io/_uploads/rJJ2viw-0.png) ---- ![image](https://hackmd.io/_uploads/SJY2Psv-R.png) ![image](https://hackmd.io/_uploads/B1ahDsDbA.png) ---- ![image](https://hackmd.io/_uploads/SJY2Psv-R.png) ![image](https://hackmd.io/_uploads/S1-AvsP-0.png) ---- ![image](https://hackmd.io/_uploads/SJY2Psv-R.png) ![image](https://hackmd.io/_uploads/ryF0DiDZ0.png) ---- ![image](https://hackmd.io/_uploads/SJY2Psv-R.png) ![image](https://hackmd.io/_uploads/SJzkuivWR.png) ---- ### <h3 style='color:#C4C400'> 實作 $O(n^2)$ </h3> ```cpp= void insert_sort (vector<int> &arr, int n) { for (int i = 1 ; i < n ; i++) { int target = i; for (int j = i-1 ; j >= 0 ; j--) { if (arr[target] < arr[j]) { swap(arr[target],arr[j]); target = j; } } } } ``` ---- <h3 style='color:#C4C400'> 練習 </h3> 使用以上三種其中一種試試看 - H001 - 休息一下 ---- <h3 style='color:#C4C400'> Merge Sort </h3> 將資料分割成兩部分，再把分割的部分再分割成兩部分，直到每個部份都剩下2個以下，再回頭將這些部分合併。 ### 時間複雜度 $O(nlogn)$ ---- ![image](https://hackmd.io/_uploads/ryqTdjvbA.png) ---- 實作的部分，因為是使用分治法，所以會在分治那篇做實作，先知道原理就好，在排序的部分很少會自己刻了。 ---- <h3 style='color:#C4C400'> 快速排序法 </h3> 在混亂的資料中，快速排序是目前公認效率極佳的演算法，原理是先從原始資料列中找一個基準值(Pivot)，接著逐一將資料與基準值比較，小於基準值的資料放在左邊，大於基準值的資料放在右邊，再將兩邊區塊分別再找出基準值，重複前面的步驟，直到排序完為止。 ### 時間複雜度 $O(nlogn)$ ---- 操作流程: 1. 資料列中找出一個基準值(Pivot) 2. 將小於Pivot的資料放在左邊，大於Pivot的資料放在右邊 3. 左右兩邊資料分別重複1~2步驟，直到剩下1筆資料 4. 合併 ---- ![image](https://hackmd.io/_uploads/Bkx2FjDbR.png) ---- 實作的部分，同樣是使用分治法，所以會在分治那篇做實作，先知道原理就好，在排序的部分很少會自己刻了。 --- <h2 style='color:#C4C400'> 內建排序 sort ☆☆☆☆☆ </h2> ---- - 在c++ <algorithm\> 裡面的函式，用法是 - sort( RandomIt first, RandomIt last, Compare comp ); - 就是一個資料的頭，跟尾巴，以及排序方式的比較函式 - 同時其時間複雜度也是$O(nlogn)$ 而且優化的最好(如果你能寫贏下個sort就是你的程式碼了) ---- <h3 style='color:#C4C400'> 實作 </h3> 如果不寫比較函數，預設就是由小排到大 ```cpp= vector<int> vec = {1,3,4,2,6}; sort(vec.begin(), vec.end()); for (int i = 0 ; i < 5 ; i++) cout << vec[i] << " "; ``` ---- <h3 style='color:#C4C400'> 比較函數 cmp </h3> - 需要定義一個 bool function - 傳入兩個和排序一樣的值 - 回傳你想要排序的關係 (true false) ---- <h3 style='color:#C4C400'> 實作 </h3> ```cpp= // 假設要排序 vector<int> 要丟 int 進去 bool cmp (int a, int b) { // 把 a 看成資料前面的 // 把 b 看成資料後面的 // 所以假設回傳 a > b // 也就是說，前面資料 > 後面資料才會是 true return a > b; } // main function sort(vec.begin(), vec.end(), cmp); ``` ---- <h3 style='color:#C4C400'> 實作II </h3> ```cpp= // 假設排序 vector<pair<int,int>> vec; // 猜一下如何排序的 bool cmp (pair<int,int> a, pair<int,int> b) { if (a.first == b.first) return a.second > b.second; else return a.first < b.first; } // main function sort(vec.begin(), vec.end(), cmp); ``` ---- <h3 style='color:#C4C400'> 答案 </h3> - 先根據 first 由小到大 - 如果 first 一樣 second 由大排到小 ---- <h3 style='color:#C4C400'> 練習 </h3> 提示： (會用其他型態的也可以 ex: struct) 下次再教 ```cpp= // 存 vector<pair<string, pair<int,int>>> vec; ``` - H002 - 休息一下 --- <h2 style='color:#C4C400'> 搜尋演算法 </h2> 想要在資料裡面，找到某個值在哪 - 線性搜尋 - 二分搜尋 ---- <h3 style='color:#C4C400'> 線性搜尋 </h3> 最簡單也最爆力的方法，直接走訪每個元素，找要的值。舉例來說，想知道班上考X分的那個人是誰。 (用pair存名字和分數) ```cpp= // 假設這邊已經給好所有人的值了 first為名字，second為分數 vector<pair<string,int>> point; int target; cin >> target; for (auto it : point){ if (it.second == target) { cout << it.first << '\n'; break; } } ``` ---- <h3 style='color:#C4C400'> 二分搜尋 </h3> 針對 "以排序好" 的資料，做二分的動作。但請不要認為這很容易，這是很多人很苦惱的地方，因為邊界的判斷以及判斷的方法其實很難掌握。 ---- <h3 style='color:#C4C400'> 會遇到的問題? </h3> 你可能第一次看到二分，會寫出以下程式碼 ```cpp= int l = 0, r = n - 1; while (l < r) { int mid = (l + r)/2; if (vec[mid] > key) l = mid; else if (vec[mid] < key) r = mid; else { cout << mid << endl; break; } } ``` ---- <h3 style='color:#C4C400'> 會遇到的問題? </h3> - 假設現在 l = 5, r = 6，而 key 的位置在 6 - 但是 mid = (l + r)/2 會是 5 - 你會發現會跑無窮迴圈 ---- <h3 style='color:#C4C400'> 方法一 </h3> 針對修改邊界做一點處理 ```cpp= // l 是左邊界 , r 是右邊界 int BinarySearch_min(vector<int> arr,int n,int key) { int l = 0,r = n - 1; while(l < r){ int mid = (l + r)/2; if (arr[mid] < key) l = mid + 1; else r = mid; } return (l + r)/2; } ``` ---- <h3 style='color:#C4C400'> 方法二 </h3> ```cpp= // l 是左邊界 , r 是右邊界 int BinarySearch_Max(vector<int> arr,int n,int key) { int l = 0, r = n - 1; while (l < r) { int mid = (l + r + 1) / 2;//取中間靠後 if (arr[mid] <= key) l = mid; else r = mid - 1; } return (l + r + 1)/2; } ``` ---- <h3 style='color:#C4C400'> 方法三 </h3> 有人會稱之跳跳二分搜 ```cpp= int jump_search (int a[], int n, int x) { if (a[0] >= x) return -1 ; // 檢查第一個是否已經大於X了 int ans = 0; for (int jump = n/2 ; jump > 0 ; jump /= 2) { //jump /= 2 也可以寫成 jump >>= 1 while (ans + jump < n && a[ans + jump] < X) ans += jump; } return ans; } ``` ---- ![image](https://hackmd.io/_uploads/SJ4oynv-0.png) ---- ### <h3 style='color:#C4C400'> 時間複雜度 $O(logn)$ </h3> ---- <h3 style='color:#C4C400'> 練習 </h3> - H003 - 休息一下 ---- <h3 style='color:#C4C400'> 內建二分搜 </h3> - binary_search()：回傳是否在以排序數據中「等於」val (True or False)： ```cpp= bool check = binary_search(v.begin(), v.end(), val); ``` - lower_bound() 找出以排序數據中「大於或等於」val的「最小值」的位置： ```cpp= auto it = lower_bound(v.begin(), v.end(), val); cout << *it; ``` - upper_bound() 找出vector中「大於」val的「最小值」的位置: ```cpp= auto it = upper_bound(v.begin(), v.end(), val); cout << *it ``` ---- <h3 style='color:#C4C400'> 注意 </h3> - 使用lower\upper內建的話，回傳是位置，要取值記得 * - 這兩個雖然也很常用，不過很多時候的二分還是要靠自己寫，所以上面的二分模板請務必記熟。 ---- <h3 style='color:#C4C400'> 挑戰題 </h3> - H004