Try   HackMD

Crust of Rust: Sorting Algorithms

直播錄影

  • 主機資訊
    ​​​​wilson@wilson-HP-Pavilion-Plus-Laptop-14-eh0xxx ~/CrustOfRust> neofetch --stdout
    ​​​​wilson@wilson-HP-Pavilion-Plus-Laptop-14-eh0xxx 
    ​​​​----------------------------------------------- 
    ​​​​OS: Ubuntu 22.04.3 LTS x86_64 
    ​​​​Host: HP Pavilion Plus Laptop 14-eh0xxx 
    ​​​​Kernel: 6.2.0-37-generic 
    ​​​​Uptime: 22 mins 
    ​​​​Packages: 2367 (dpkg), 11 (snap) 
    ​​​​Shell: bash 5.1.16 
    ​​​​Resolution: 2880x1800 
    ​​​​DE: GNOME 42.9 
    ​​​​WM: Mutter 
    ​​​​WM Theme: Adwaita 
    ​​​​Theme: Yaru-dark [GTK2/3] 
    ​​​​Icons: Yaru [GTK2/3] 
    ​​​​Terminal: gnome-terminal 
    ​​​​CPU: 12th Gen Intel i5-12500H (16) @ 4.500GHz 
    ​​​​GPU: Intel Alder Lake-P 
    ​​​​Memory: 2048MiB / 15695MiB 
    
  • Rust 編譯器版本 :
    ​​​​wilson@wilson-HP-Pavilion-Plus-Laptop-14-eh0xxx ~/CrustOfRust> rustc --version
    ​​​​rustc 1.70.0 (90c541806 2023-05-31) (built from a source tarball)
    

Introduction

0:00:00

In this Crust of Rust episode, we implement some common sorting algorithms in Rust. This episode doesn't aim to explain any single concept, but rather showcase what writing "normal" Rust code is like, and explaining various "odd bits" we come across along the way. The thinking here is that sorting algorithms are both familiar and easy to compare across languages, so this might serve as a good bridge into Rust if you are familiar with other languages.

排序在電腦科學是相當常見的,雖然你幾乎不需要在編寫程式的時候自己實作排序演算法,事實上由於它常見的特性,所以它很好作為與其他程式語言的比較點。本次直播的真正目的不是教你如何排序,而是讓你看到這些排序演算法是如何在 Rust 使用慣用的手法(idiomatic way) 去實作的,這樣你就可以跟你熟悉的程式語言做比較。

之前直播的教學方式是專注於特定的 Rust 功能,本次直播更關注更廣泛的問題,讓我們對程式碼進行排序,然後看看我們一路上學到了什麼

Ord and sort in std

0:02:42

Trait std::cmp::Ord

Rust 中的 Ord trait 是一種表達可以相互比較的類型的方式。

// Trait for types that form a total order. // total order : 除了基本的要能與自己做比較, // 還必須總是要能回答值與值之間的大小關係。 // Ex : "43" 與 "43";"43" 與 "foobar" pub trait Ord: Eq + PartialOrd { // Eq : complete equality // PartialOrd : partial ordering. // 可以讓不完全相同的型別進行比較。 // Ex : u16 與 u32 }

本次實作使用 Ord 而不是 PartialOrd,因為如果該集合中某些東西的順序無關緊要,那麼真的很難排序某個東西。不過 Ord 有一些奇怪的屬性,因為 Rust 中的某些類型您可能期望是可排序的,但實際上卻不是,特別是浮點數類型 (f32, f64)。它們實作了 PartialOrd,但沒有實作 Ord,因為浮點數有特別的值叫做 NaN,而 NaN 與 NaN 之間是無法比較的。浮點數類型沒有實作 Eq 的原因也是因為 NaN 的關係,兩個無法比較的東西卻說它們是相等的也是有點奇怪。

搜尋標準函式庫的排序

pub fn sort(&mut self) where T: Ord, // This sort is stable // stable : 如果兩個值 a, b 相同, // 則在排序的過程中不會 swap。 // 1. stable 性質在原始型別看似沒什麼用處, // 但在複雜的型別就很重要了。 // 2. stable 比 unstable 多了一些操作上的限制, // 所以你有可能付出比較多一點的成本在 stable 上。 // 所以 unstable 可能比 stable 使用更少的記憶體或更快速。 pub fn sort_unstable(&mut self) where T: Ord,

Current Implementation : The current algorithm is an adaptive, iterative merge sort inspired by timsort.
等等可能會稍微提一下 timsort,如果沒有的話請自行查閱。

而 sort_by 是客製化的比較函式,例如你可以指定比較結構的某欄位,又或是可以反序排列,後者的應用較為常見 :

pub fn sort_by<F>(&mut self, compare: F) where F: FnMut(&T, &T) -> Ordering,

sort_by_key 的話你要傳入函式,而該函式的回傳值必須有實作 Ord :

pub fn sort_by_key<K, F>(&mut self, f: F) where F: FnMut(&T) -> K, K: Ord,

Sorting algorithms

0:10:04

我們接下來將要做的事是定義 Sort trait,接著為這個 trait 實作很多不同的排序機制。詳細的排序資訊請參考 : Sorting algorithm

Bubble sort

0:12:02

首先先看到 Bubble sort,雖然它的 worst case 是

n2,但它相當的直覺。

:pencil2:題外話
實作完所有的排序演算法後,我們會再實作非常簡易的 benchmark 用來計數我們實作的每個演算法的比較次數,並將輸入長度和該輸入長度的比較次數印出來。

開始建置 Rust 專案 :

$ cargo new --lib orst
$ cd orst
$ vim src/lib.rs

先定義排序演算法的 API :

pub trait Sorter { fn sort<T>(slice: &mut[T]) where T: Ord; } // 如果我們想要讓程式更 fancy, // 我們可以在 slice 上有一個擴展 trait, // 可以讓某些東西使用 Sorter 進行排序。 // // 但現在要做的只是讓人可以呼叫的獨立排序函式。 pub fn sort<T, S>(slice: &mut [T]) where T: Ord, S: Sorter { S::sort(slice) } // mod bubblesort; #[cfg(test)] mod tests { use super::*; struct StdSorter; impl Sorter for StdSorter { fn sort<T>(slice: &mut [T]) where T: Ord, { // 使用標準函式庫的 slice 排序 slice.sort(); } } #[test] fn std_works() { let mut things = vec![4, 2, 3, 1]; sort::<_, StdSorter>(&mut things); assert_eq!(things, &[1, 2, 3, 4]); } }

測試目前 API 的正確性 :

$ cargo test
...
running 1 test
test tests::std_works ... ok
...

Q : How much harder would it be implement these for Iter (or IterMut) instead of slices?
A : 比較難為 Iter 實作,因為 Iter 的話你不知道你將會得到幾個輸入值,例如 stream,除非你用 Vector 來收集這些 stream。所以為 slice 實作排序的情況比為 Iter 實作排序的情況要為常見。

Q : Random access is very useful in sorting
A : 你基本上都需要 random access。

Q : Could we use ExactSizeIterator?
A : 如果你使用 ExactSizeIterator 並知道總共有多少元素,但最終你必須等到全部的值才能做排序。

開始實作 bubble sort 之前,先 uncomment src/lib.rs 的其中一行 :

// mod bubblesort;

接著開啟 src/bubblesort.rs 檔案開始實作原型 :

use super::Sorter; // 定義為 unit struct, // 因為 Sorter 只有儲存 configuration,它並沒有狀態。 pub struct BubbleSort; impl Sorter for BubbleSort { fn sort<T>(slice: &mut [T]) where T: Ord, { // TODO } }

Q : why does vim say [object Object] in the status bar?
A : 可能 Jon 的 vim 某些設定壞了吧,Jon 也不確定真正原因。

Q : coc should keep rust-analyzer up to date. Does for me
A : Jon 用它的 coc 集手動編譯 rust-analyzer,因為他想要讓他 rust-analyzer 支援最新的 Nightly Rust

繼續為 BubbleSort 實作 Sorter :

impl Sorter for BubbleSort { fn sort<T>(slice: &mut [T]) where T: Ord, { // 用來檢查本次 iteration 是否有 swap 的情形 let mut swapped = true; while swapped { swapped = false; for i in 0..slice.len() { // 考慮到邊界的問題 if i = slice.len() - 1 { continue; } if slice[i] > slice[i + 1] { slice.swap(i, i + 1); swapped = true; } } } } }

也可以省掉考慮邊界條件判斷 (省 branch),只要調整 for 終止條件即可 :

impl Sorter for BubbleSort { fn sort<T>(slice: &mut [T]) where T: Ord, { let mut swapped = true; while swapped { swapped = false; // 更改迭代範圍就省下邊界判斷了 for i in 0..(slice.len() - 1) { if slice[i] > slice[i + 1] { slice.swap(i, i + 1); swapped = true; } } } } }

接著新增 test case :

#[test] fn it_works() { // 偶數個元素 // let mut things = vec![4, 2, 3, 1]; // super::sort::<_, BubbleSort>(&mut things); // assert_eq!(things, &[1, 2, 3, 4]); // 奇數個元素 let mut things = vec![4, 2, 5, 3, 1]; super::sort::<_, BubbleSort>(&mut things); assert_eq!(things, &[1, 2, 3, 4, 5]); }

通過了測試 :

$ cargo test
...
running 2 tests
test bubblesort::it_works ... ok
test tests::std_works ... ok
...

再次更改範圍,程式可能看起來更漂亮 ? :

impl Sorter for BubbleSort { fn sort<T>(slice: &mut [T]) where T: Ord, { let mut swapped = true; while swapped { swapped = false; // 起始位置改為 1 的話,終止位置就不用扣 1 for i in 1..slice.len() { // 需做以下替換 : // i -> i - 1 // i + 1 -> i if slice[i - 1] > slice[i] { slice.swap(i - 1, i); swapped = true; } } } } }

一般的 swap 需要暫存變數 :

temp = a a = b b = temp

標準函式庫的 slice::swap 函式簽章 :

pub fn swap(&mut self, a: usize, b: usize) // 實作是用到 std::mem::swap

Q : make sort's generic <S, T> to not require the underscore?
A :

super::sort::<_, BubbleSort>(&mut things); // ok // 第一個泛型參數是我們想要排序的元素的類型, // 第二個泛型參數是排序演算法 super::sort::<BubbleSort, _>(&mut things); // also ok super::sort::<BubbleSort>(&mut things); // not ok // 之所以不行這樣的原因是,因為只我們試著去給排序型別取名字。 // 如果你想要在 Rust 命名每個泛型參數,你必須列舉全部出來。 // `_` 是不想給該泛型參數取名字但要滿足列舉所有泛型參數的限制。
目前程式碼

src/lib.rs :

pub trait Sorter { fn sort<T>(slice: &mut[T]) where T: Ord; } pub fn sort<T, S>(slice: &mut [T]) where T: Ord, S: Sorter { S::sort(slice) } mod bubblesort; #[cfg(test)] mod tests { use super::*; struct StdSorter; impl Sorter for StdSorter { fn sort<T>(slice: &mut [T]) where T: Ord, { slice.sort(); } } #[test] fn std_works() { let mut things = vec![4, 2, 3, 1]; sort::<_, StdSorter>(&mut things); assert_eq!(things, &[1, 2, 3, 4]); } }

src/bubblesort.rs :

use super::Sorter; pub struct BubbleSort; impl Sorter for BubbleSort { fn sort<T>(slice: &mut [T]) where T: Ord, { let mut swapped = true; while swapped { swapped = false; for i in 1..slice.len() { if slice[i - 1] > slice[i] { slice.swap(i - 1, i); swapped = true; } } } } } #[test] fn it_works() { let mut things = vec![4, 2, 5, 3, 1]; super::sort::<_, BubbleSort>(&mut things); assert_eq!(things, &[1, 2, 3, 4, 5]); }

Insertion sort

0:27:42

Insertion sort 也很直覺。

新增一個模組到 src/lib.rs :

... mod bubblesort; +mod insertionsort; ...

接著開啟 src/insertionsort.rs 檔案開始實作原型 :

use super::Sorter; pub struct InsertionSort; impl Sorter for InsertionSort { fn sort<T>(slice: &mut [T]) where T: Ord, { // TODO } } #[test] fn it_works() { let mut things = vec![4, 2, 5, 3, 1]; super::sort::<_, InsertionSort>(&mut things); assert_eq!(things, &[1, 2, 3, 4, 5]); }

繼續為 InsertionSort 實作 Sorter :

impl Sorter for InsertionSort { fn sort<T>(slice: &mut [T]) where T: Ord, { // [ sorted | not sorted ] // sorted 初始為空集合, not sorted 初始為整個 slice。 } }

Q : why not do Sorter::sort(..) directly without free-standing function? am I missing something?
A : 可以。這樣做可以讓你的程式看起來更漂亮,不過有一個缺點是你要讓 Sorter trait 在 scope 內 :

  • src/bubblesort.rs :
    ​​​​#[test] ​​​​fn it_works() ​​​​{ ​​​​ let mut things = vec![4, 2, 5, 3, 1]; ​​​​ BubbleSort::sort(&mut things); ​​​​ assert_eq!(things, &[1, 2, 3, 4, 5]); ​​​​}
  • src/insertionsort.rs :
    ​​​​#[test] ​​​​fn it_works() ​​​​{ ​​​​ let mut things = vec![4, 2, 5, 3, 1]; ​​​​ InsertionSort::sort(&mut things); ​​​​ assert_eq!(things, &[1, 2, 3, 4, 5]); ​​​​}
  • src/lib.rs :
    ​​​​...
    ​​​​    fn std_works() 
    ​​​​    {
    ​​​​        let mut things = vec![4, 2, 3, 1];
    ​​​​        StdSorter::sort(&mut things);
    ​​​​        assert_eq!(things, &[1, 2, 3, 4]);
    ​​​​    }
    ​​​​...
    

Q : Wouldn't be more lose if T: PartialEq?
A : 不行,PartialEq 不能讓你排列。也不能是 PartialOrd,因為你必須在排列的時候必須知道確切的排序規則。

繼續為 InsertionSort 實作 Sorter :

impl Sorter for InsertionSort { fn sort<T>(slice: &mut [T]) where T: Ord, { // [ sorted | not sorted ] // 任何在 unsorted 位置之後的元素都為尚未排序。 // 起始位置可以是 1,因為第 0 個元素本身就排序好了。 for unsorted in 1..slice.len() { // slice[unsorted..] is not sorted // take slice[unsorted] and place in sorted location in slice[..=unsorted] // [ 1 3 4 | 2 ] // 將 key (2) 慢慢往左 swap , // 直到遇到目前小的值 (1) 才停下來。 // [ 1 3 4 2 | ] // [ 1 3 2 4 | ] // [ 1 2 3 4 | ] let mut i = unsorted; while i > 0 && slice[i - 1] > slice[i] { slice.swap(i, i - 1); i -= 1; } } } }

Q : That just sounds like bubble sort going through an identity crisis
A : insertion sort 和 bubble sort 不太相同,而且通常 bubble sort 的表現會更差,因為 bubble sort 每次迭代範圍最多只少掉一個元素,而 insertion sort 只要找到插入點,該次迭代就結束了,所以理論上 bubble sort 的 swap 次數 > insertion sort 的 swap 次數。
如果現在的資料結構是鏈結串列的話,insertion sort 美次迭代實際上就只要找到插入點插入,不需要跟 slice 一樣,將插入點後的元素右移;但如果是 bubble sort 的話每次迭代仍然要慢慢 swap 鏈結串列的節點。所以在鏈結串列上更可以看到 insertion sort 的表現一定會比 bubble sort 還要好。

圖解 insertion sort :
image
我們的實作是採用 Method 1 一直 swap 到插入點為止。Method 2 是找到插入點之後,將所有元素右移。這兩個方法的執行時間主要取決於輸入資料的特徵,但它們的複雜度是相同的。

通過測試 :

$ cargo test ... test insertionsort::it_works ... ok ...
目前程式碼

src/lib.rs :

pub trait Sorter { fn sort<T>(slice: &mut[T]) where T: Ord; } pub fn sort<T, S>(slice: &mut [T]) where T: Ord, S: Sorter { S::sort(slice) } mod bubblesort; mod insertionsort; #[cfg(test)] mod tests { use super::*; struct StdSorter; impl Sorter for StdSorter { fn sort<T>(slice: &mut [T]) where T: Ord, { slice.sort(); } } #[test] fn std_works() { let mut things = vec![4, 2, 3, 1]; StdSorter::sort(&mut things); assert_eq!(things, &[1, 2, 3, 4]); } }

src/bubblesort.rs :

use super::Sorter; pub struct BubbleSort; impl Sorter for BubbleSort { fn sort<T>(slice: &mut [T]) where T: Ord, { let mut swapped = true; while swapped { swapped = false; for i in 1..slice.len() { if slice[i - 1] > slice[i] { slice.swap(i - 1, i); swapped = true; } } } } } #[test] fn it_works() { let mut things = vec![4, 2, 5, 3, 1]; BubbleSort::sort(&mut things); assert_eq!(things, &[1, 2, 3, 4, 5]); }

src/insertionsort.rs :

use super::Sorter; pub struct InsertionSort; impl Sorter for InsertionSort { fn sort<T>(slice: &mut [T]) where T: Ord, { // [ sorted | not sorted ] for unsorted in 1..slice.len() { // slice[unsorted..] is not sorted // take slice[unsorted] and place in sorted location in slice[..=unsorted] // [ 1 3 4 | 2 ] // [ 1 3 4 2 | ] // [ 1 3 2 4 | ] // [ 1 2 3 4 | ] let mut i = unsorted; while i > 0 && slice[i - 1] > slice[i] { slice.swap(i, i - 1); i -= 1; } } } } #[test] fn it_works() { use super::Sorter; let mut things = vec![4, 2, 5, 3, 1]; InsertionSort::sort(&mut things); assert_eq!(things, &[1, 2, 3, 4, 5]); }

Q : Would it be cheating to use binary_search + insert?
A : 這麼做可以減少比較次數,不過 swap 元素仍無法避免 :

impl Sorter for InsertionSort { fn sort<T>(slice: &mut [T]) where T: Ord, { // [ sorted | not sorted ] for unsorted in 1..slice.len() { // slice[unsorted..] is not sorted // take slice[unsorted] and place in sorted location in slice[..=unsorted] // [ 1 3 4 | 2 ] // [ 1 3 4 2 | ] // [ 1 3 2 4 | ] // [ 1 2 3 4 | ] // 新增 smart 用來判斷要不要使用 binary_search let smart = false; if !smart { let mut i = unsorted; while i > 0 && slice[i - 1] > slice[i] { slice.swap(i, i - 1); i -= 1; } } else { // use binary search to find index // then use .insert to splice in i let i = match slice[..unsorted].binary_search(&slice[unsorted]) { // [ a, c, e ].binary_search(c) => Ok(1) Ok(i) => i, // [ a, c, e ].binary_search(b) => Err(1) // 如果 slice 元素有 b,它會在 1 的位置。 Err(i) => i, }; // vector // 1. vector 有 insert 方法可以使用, // 你就不用自己處理插入點後的元素了。 // 2. 如果在 insert 值到 vector 的過程中, // 超過了 vector 的 capacity,它就會 resize。 // slice // 1. slice 沒有 insert 的方法, // 因為 slice 的長度固定, // 而且如果使用 insert 的方法會使得多一個重複元素, // 所以還要刪除原本位置的元素才不會有重複的情況。 // 2. 我們應該要用的是 rotate_right, // 才能維持 slice 大小不變。 // 為何 rotate_right 能夠維持 slice 大小不變 ?" // 因為 rotate_right 有 wrap around 功能。 // Ex: 1 2 [5 6 4] // 1 2 [4 5 6] slice[i..=unsorted].rotate_right(1); } } } }

現在 smart 無論設為 True 還是 False 都可以通過測試了 :

$ cargo test
...
test insertionsort::it_works ... ok
...

為了讓程式碼更彈性,首先為 InsertionSort 結構新增成員 :

pub struct InsertionSort { // 先宣告成 pub pub smart: bool, }

回頭更改 API,src/lib.rs :

pub trait Sorter { // 因為會參照到成員,所以 &self 也要作為輸入 // fn sort<T>(slice: &mut [T]) fn sort<T>(&self, slice: &mut[T]) where T: Ord; } -pub fn sort<T, S>(slice: &mut [T]) -where - T: Ord, - S: Sorter -{ - S::sort(slice) -} mod bubblesort; mod insertionsort; #[cfg(test)] mod tests { use super::*; struct StdSorter; impl Sorter for StdSorter { - fn sort<T>(slice: &mut [T]) + fn sort<T>(&self, slice: &mut [T]) + // 因為會參照到成員,所以 &self 也要作為輸入 where T: Ord, { slice.sort(); } } #[test] fn std_works() { let mut things = vec![4, 2, 3, 1]; - StdSorter::sort(&mut things); + StdSorter.sort(&mut things); assert_eq!(things, &[1, 2, 3, 4]); } }

為了一致性,BubbleSort 也要更改,src/bubblesorc.rs :

... impl Sorter for BubbleSort { - fn sort<T>(slice: &mut [T]) + fn sort<T>(&self, slice: &mut [T]) + // 雖然不會參照到成員, + // 但為了一致性所以 &self 也要作為輸入 where T: Ord, { ... } } #[test] fn it_works() { let mut things = vec![4, 2, 5, 3, 1]; - BubbleSort::sort(&mut things); + BubbleSort.sort(&mut things); assert_eq!(things, &[1, 2, 3, 4, 5]); }

讓 InsertionSort 使用 smart 成員,並新增一個 test case :

use super::Sorter; pub struct InsertionSort { pub smart: bool, } impl Sorter for InsertionSort { - fn sort<T>(slice: &mut [T]) + fn sort<T>(&self, slice: &mut [T]) where T: Ord, { // [ sorted | not sorted ] for unsorted in 1..slice.len() { ... - let smart = true; - if !smart { + if !self.smart { ... } } } } #[test] -fn it_works() +fn it_works_dumb() { let mut things = vec![4, 2, 5, 3, 1]; - InsertionSort::sort(&mut things); + InsertionSort { smart: false }.sort(&mut things); assert_eq!(things, &[1, 2, 3, 4, 5]); } // 新增 test case +#[test] +fn it_works_smart() +{ + let mut things = vec![4, 2, 5, 3, 1]; + InsertionSort { smart: true }.sort(&mut things); + assert_eq!(things, &[1, 2, 3, 4, 5]); +}

:pencil2: :: vs. .

BubbleSort:: -> BubbleSort.

你可以將 :: 等價於其他程式語言的 classmethod,它是該型別的 associated method;而 . 則是該型別的方法,所以我們必須建構出 BubbleSort 才能呼叫它的 sort 方法。

Q : what happens if you binary_search() on an unsorted slice?
A : 你只會得到隨機的索引值。

:pencil2: match vs. unwrap_or_else vs.

let i = match slice[..unsorted].binary_search(&slice[unsorted]) { Ok(i) => i, Err(i) => i, }; let i = slice[..unsorted] .binary_search(&slice[unsorted]) .unwrap_or_else(|i| i);

兩種都可以,Jon 偏好 match,因為 match 更明確。

:pencil2: or(|) let pattern (in the future)

let Ok(i) | Err(i) = slice[..];

現在支援 or pattern 的只有在 match 裡面 :

let x = 1; match x { 1 | 2 => println!("one or two"), 3 => println!("three"), _ => println!("anything"), }

Q : isnt static function when you dont need an instance of the "object" to call that method, sry im noob
A : 是的,你可以將它想為 static 方法。

目前程式碼

src/lib.rs :

pub trait Sorter { fn sort<T>(&self, slice: &mut[T]) where T: Ord; } mod bubblesort; mod insertionsort; #[cfg(test)] mod tests { use super::*; struct StdSorter; impl Sorter for StdSorter { fn sort<T>(&self, slice: &mut [T]) where T: Ord, { slice.sort(); } } #[test] fn std_works() { let mut things = vec![4, 2, 3, 1]; StdSorter.sort(&mut things); assert_eq!(things, &[1, 2, 3, 4]); } }

src/bubblesort.rs :

use super::Sorter; pub struct BubbleSort; impl Sorter for BubbleSort { fn sort<T>(&self, slice: &mut [T]) where T: Ord, { let mut swapped = true; while swapped { swapped = false; for i in 1..slice.len() { if slice[i - 1] > slice[i] { slice.swap(i - 1, i); swapped = true; } } } } } #[test] fn it_works() { let mut things = vec![4, 2, 5, 3, 1]; BubbleSort.sort(&mut things); assert_eq!(things, &[1, 2, 3, 4, 5]); }

src/insertionsort.rs :

use super::Sorter; pub struct InsertionSort { pub smart: bool, } impl Sorter for InsertionSort { fn sort<T>(&self, slice: &mut [T]) where T: Ord, { // [ sorted | not sorted ] for unsorted in 1..slice.len() { // slice[unsorted..] is not sorted // take slice[unsorted] and place in sorted location in slice[..=unsorted] // [ 1 3 4 | 2 ] // [ 1 3 4 2 | ] // [ 1 3 2 4 | ] // [ 1 2 3 4 | ] if !self.smart { let mut i = unsorted; while i > 0 && slice[i - 1] > slice[i] { slice.swap(i, i - 1); i -= 1; } } else { // use binary search to find index // then use .insert to splice in i let i = match slice[..unsorted].binary_search(&slice[unsorted]) { // [ a, c, e ].binary_search(c) => Ok(1) Ok(i) => i, // [ a, c, e ].binary_search(b) => Err(1) Err(i) => i, }; slice[i..=unsorted].rotate_right(1); } } } } #[test] fn it_works_dumb() { let mut things = vec![4, 2, 5, 3, 1]; InsertionSort { smart: false }.sort(&mut things); assert_eq!(things, &[1, 2, 3, 4, 5]); } #[test] fn it_works_smart() { let mut things = vec![4, 2, 5, 3, 1]; InsertionSort { smart: true }.sort(&mut things); assert_eq!(things, &[1, 2, 3, 4, 5]); }

Selection sort

0:52:18

Selection sort 比 insertion sort 還要糟糕,但一樣很直覺而且不需要在排序過程中使用到輔助的記憶體。

新增一個模組到 src/lib.rs :

... mod bubblesort; mod insertionsort; +mod selectionsort; ...

接著開啟 src/selectionsort.rs 檔案開始實作原型 :

use super::Sorter; pub struct SelectionSort; impl Sorter for SelectionSort { fn sort<T>(&self, slice: &mut [T]) where T: Ord, { // [ sorted | not sorted ] for unsorted in 1..slice.len() { // TODO } } } #[test] fn it_works() { let mut things = vec![4, 2, 5, 3, 1]; SelectionSort.sort(&mut things); assert_eq!(things, &[1, 2, 3, 4, 5]); }

繼續為 SelectionSort 實作 Sorter :

impl Sorter for SelectionSort { fn sort<T>(&self, slice: &mut [T]) where T: Ord, { // [ sorted | not sorted ] // selection sort 必須從 0 開始, // 因為第 0 個不一定是最小元素。 // insertion sort 的情況不一樣, // 從 1 開始是因為第 0 個元素本身就排序好了。 for unsorted in 0..slice.len() { let mut smallest_in_rest = unsorted; for i in (unsorted + 1)..slice.len() { if slice[i] < slice[smallest_in_rest] { smallest_in_rest = i; } } if unsorted != smallest_in_rest { slice.swap(unsorted, smallest_in_rest) } } } }
目前程式碼
use super::Sorter; pub struct SelectionSort; impl Sorter for SelectionSort { fn sort<T>(&self, slice: &mut [T]) where T: Ord, { // [ sorted | not sorted ] for unsorted in 0..slice.len() { let mut smallest_in_rest = unsorted; for i in (unsorted + 1)..slice.len() { if slice[i] < slice[smallest_in_rest] { smallest_in_rest = i; } } if unsorted != smallest_in_rest { slice.swap(unsorted, smallest_in_rest) } } } } #[test] fn it_works() { let mut things = vec![4, 2, 5, 3, 1]; SelectionSort.sort(&mut things); assert_eq!(things, &[1, 2, 3, 4, 5]); }

Q : There's a min() on iterators
A : slice 是 iterator,min 會僅回傳最小值,而不是最小值所在的位置 :

fn min(self) -> Option<Self::Item> // 回傳型態為 `Option` 是因為傳入 slice 可能為空。 where Self: Sized, Self::Item: Ord,

使用 enumerate 函式來解決 min 沒辦法回傳最小值位置的問題 :

impl Sorter for SelectionSort { fn sort<T>(&self, slice: &mut [T]) where T: Ord, { // [ sorted | not sorted ] for unsorted in 0..slice.len() { // 迭代器函式版本 let (smallest_in_rest, _) = slice[unsorted..] // 只接最小值位置的值 .iter() .enumerate() .min_by_key(|&(_, v)| v) .expect("slice is non-empty"); let smallest_in_rest = unsorted + smallest_in_rest; // or // Explicit 版本 let mut smallest_in_rest_2 = unsorted; for i in (unsorted + 1)..slice.len() { if slice[i] < slice[smallest_in_rest_2] { smallest_in_rest_2 = i; } } // 檢查迭代器函式版本與 Explicit 版本的值是否相同。 assert_eq!(smallest_in_rest, smallest_in_rest_2); if unsorted != smallest_in_rest { slice.swap(unsorted, smallest_in_rest) } } } }

雖然兩種版本的效能經過編譯器的程式碼最佳化之後效能是差不多的,但迭代器函式版本可以清楚知道每一步在做什麼,所以我們會保留迭代器函式版本。

:question: 生命週期問題 1:02:45

為何 Line 2 正確,Line 錯誤 ? 待解決 !

let a = [-1_i32, 0, 1]; let b = a.iter().min_by_key(|x| x.abs()).unwrap(); let c = a.iter().enumerate().min_by_key(|(x,_)| x).unwrap();

.min_by_key(|(_, v)| v) 編譯失敗 :

error: lifetime may not live long enough
  --> src\selectionsort.rs:20:38
   |
20 |                 .min_by_key(|(_, v)| v)
   |                              ------- ^ returning this value requires that `'1` must outlive `'2`
   |                              |     |
   |                              |     return type of closure is &'2 &T
   |                              has type `&'1 (usize, &T)`

min_by_key 的回傳型態是 <Option<Self::Item>>,而其中的 Self::Item 為參考型態,從 min_by_key 的 a 需要參考可以得知 :

let a = [-3_i32, 0, 1, 5, -10]; assert_eq!(*a.iter().min_by_key(|x| x.abs()).unwrap(), 0);

Self::Item 之所以是回傳型態是因為在尋找最小元素的過程中會走訪所有的元素,最後在回傳最小元素的參考。所以在走訪的時候不能將元素的所有權給 closure。

  • ​​// .min_by_key(|(_, v)| v) ~ERR
    ​​//     等價於 .min_by_key(|t| &t.1)
    
    v 本身是 slice 的參考,而 tuple 的參考只存活在迭代期間,這個 tuple 的參考活這麼短是因為它是由迭代器產生出來的。我們想要 min_by_key 回傳 slice 的參考,而現在的寫法是會回傳 slice 的參考的參考,當迭代結束時,迭代器的值 (slice 的參考的參考) 早已消失,所以編譯器才會報錯。
  • ​​// .min_by_key(|&(_, v)| v)
    ​​//     等價於 .min_by_key(|t| t.1)
    
    & 告訴 closure,解參考這個 tuple ((_, v)) ,並抓到 v 本身,而不是 v 的參考。

使用 map 使實作更加精簡,同時只保留迭代器函式版本 :

impl Sorter for SelectionSort { fn sort<T>(&self, slice: &mut [T]) where T: Ord, { // [ sorted | not sorted ] for unsorted in 0..slice.len() { - let (smallest_in_rest, _) = slice[unsorted..] + let smallest_in_rest = slice[unsorted..] .iter() .enumerate() .min_by_key(|&(_, v)| v) + // 直接在這裡取出 tuple 的元素並調整索引 + .map(|(i, _)| unsorted + i) .expect("slice is non-empty"); - let smallest_in_rest = unsorted + smallest_in_rest; if unsorted != smallest_in_rest { slice.swap(unsorted, smallest_in_rest) } } } }
目前程式碼

src/selectionsort.rs

use super::Sorter; pub struct SelectionSort; impl Sorter for SelectionSort { fn sort<T>(&self, slice: &mut [T]) where T: Ord, { // [ sorted | not sorted ] for unsorted in 0..slice.len() { let smallest_in_rest = slice[unsorted..] .iter() .enumerate() .min_by_key(|&(_, v)| v) .map(|(i, _)| unsorted + i) .expect("slice is non-empty"); if unsorted != smallest_in_rest { slice.swap(unsorted, smallest_in_rest) } } } } #[test] fn it_works() { let mut things = vec![4, 2, 5, 3, 1]; SelectionSort.sort(&mut things); assert_eq!(things, &[1, 2, 3, 4, 5]); }

Quicksort

1:07:27

Quicksort : 並沒那麼直覺,但也沒那麼複雜。

Q : does the compiler take advantage of assert! statements? like if you do assert!(i < slice.len()); let _ = slice[i]; is it gonna skip the slice range check?
A : Jon 覺得可能會。因為 assert 變成了條件語句,並在條件不成立的 branch 上退出。因此編譯器可以假設以下程式碼中的條件成立。Jon 不確定它在多大程度上利用了這一點。

新增一個模組到 src/lib.rs :

... mod bubblesort; mod insertionsort; mod selectionsort; +mod quicksort; ...

接著開啟 src/quicksort.rs 檔案開始實作原型 :

use super::Sorter; pub struct QuickSort; fn quicksort<T: Ord>(slice: &mut [T]) {} impl Sorter for QuickSort { fn sort<T>(&self, slice: &mut [T]) where T: Ord, { // [ sorted | pivot | unsorted ] quicksort(slice) } } #[test] fn it_works() { let mut things = vec![4, 2, 5, 3, 1]; QuickSort.sort(&mut things); assert_eq!(things, &[1, 2, 3, 4, 5]); }

我們首先實作配置記憶體版本,因為比較好理解,再來實作原地版本。

開始實作配置記憶體版本 quicksort :

fn quicksort<T: Ord>(slice: &mut [T]) { match slice.len() { 0 | 1 => return, 2 => { // 剩兩個元素也可以當作終止條件先處理 if slice[0] > slice[1] { slice.swap(0, 1); } return; } _ => {} } let pivot = &slice[0]; let mut left = vec![]; let mut right = vec![]; for i in slice { if slice[i] <= pivot { left.push(slice[i]); } else { right.push(slice[i]); } } quicksort(left); quicksort(right); // merge them together }

我們試圖移出 slice,但我們不被允許移出 slice。因此,我們必須做一些類似的技巧來移動參考,或者像 leftright 只保存索引值。這樣就開始變得很麻煩,雖然配置記憶體版本並沒有實作完整,但至少現在你有一個 high level 的想法知道 quicksort 在做什麼。

Q : Whats the difference between <T: Ord> and <T> ... Where T: Ord?
A : 沒有區別。

開始實作原地版本 quicksort,這次我們 leftright 只用在 slice,而不用像配置記憶體版本用在 vector :

fn quicksort<T: Ord>(slice: &mut [T]) { match slice.len() { 0 | 1 => return, 2 => { if slice[0] > slice[1] { slice.swap(0, 1); } return; } _ => {} } // split_at_mut : 切成 2 個 mutable sub-slices。 // 一個為包含一個元素的 sub-slice,它要當作 pivot; // 一個為包含剩餘元素的 sub-slice,它將被我們排序。 // 1. 配置記憶體版本使用 left vector 和 right vector // 2. 原地版本則是使用 left index 和 right index // pivot : right slice ; slice : left slice // let (pivot, slice) = slice.split_at_mut(1); let (pivot, slice) = slice.split_first_mut().expect("slice is non-empty"); let pivot = &pivot[0]; let mut left = 0; let mut right = 0; for i in 0..slice.len() { if slice[i] <= pivot { // already on the correct side left += 1; } else { // move element to the right side // 因為現在我們正在查看的當前元素需要搬到另一側, // 但這意味著我們必須進行 swap, // 並且在 swap 之後我們必須繼續查看 swap 兩側, // 但 for 迴圈只能讓我們查看一次, // 所以 for 迴圈不能達成我們的目的。 } } ... }

這次嘗試將 leftright 指向 slice 的兩端 :

fn quicksort<T: Ord>(slice: &mut [T]) { ... let (pivot, slice) = slice.split_first_mut().expect("slice is non-empty"); let mut left = 0; let mut right = slice.len() - 1; // 改用 while 迴圈而不是 for 迴圈 while left <= right { // 將 slice[left] 從 T 轉成參考` // 因為 T 不能與參考做比較,參考與參考才能做比較。 if &slice[left] <= pivot { // already on the correct side left += 1; } else { // move element to the right side slice.swap(left, right); right -= 1; } } }

不像前面 for 迴圈的版本在 swap 之後無法查看同一位置,現在的 while 迴圈讓我們可以在 swap 之後繼續查看同一位置 :
image

邊界條件示意圖 :
image

Q : you also need to leave a hole for your pivot :s
A : 我們不需要為 pivot 留位置,因為 pivot 已在正確位置上。

Q : the swap line is wrong, you are swapping beyond the end of the slice
A : 沒有錯,right 的值是 slice.len() - 1

剛剛的實作並沒有排序完,我們必須遞迴的去呼叫 quicksort 函式 :

fn quicksort<T: Ord>(slice: &mut [T]) { match slice.len() { 0 | 1 => return, 2 => { if slice[0] > slice[1] { slice.swap(0, 1); } return; } _ => {} } // slice 變數名稱改為 rest let (pivot, rest) = slice.split_first_mut().expect("slice is non-empty"); // 注意 : left 現在是用在 rest 的索引,而不是 用在 slice let mut left = 0; let mut right = rest.len() - 1; // 改用 while 迴圈而不是 for 迴圈 while left <= right { // 將 slice[left] 從 T 轉成參考` // 因為 T 不能與參考做比較,參考與參考才能做比較。 if &rest[left] <= pivot { // already on the correct side left += 1; } else { // move element to the right side rest.swap(left, right); right -= 1; } } // 原地版本的好處除了不需要額外的記憶體配置,還有不需要有合併的步驟 // split_at_mut(mid: usize) -> (&mut [..mut], &mut [mid..]) let (left, right) = slice.split_at_mut(left); // 加入遞迴呼叫 quicksort(left); quicksort(right); }

Q : Could you add an extra ifelse branch to just move the right counter without swap if it's already on the right side?
A : 如下 :

fn quicksort<T: Ord>(slice: &mut [T]) { ... let mut left = 0; let mut right = rest.len() - 1; while left <= right { if &rest[left] <= pivot { // already on the correct side left += 1; + } else if &rest[right] > pivot { + // right already on the correct side + // avoid unnecessary swaps back and forth + right -= 1 } else { // move element to the right side rest.swap(left, right); right -= 1; } } ... }

若不使用新增條件會造成不必要的交換,如下面動畫的 5 就多做了 1 次沒必要的交換 :
image

Q : why are we splitting again if we already sorted rest?
A : 我們還沒到 rest 做排序,只是將分成大的一堆,小的一堆而已。

Q : Instead of while+<=, loop+if left == right { break; } can also be used
A : 我們現在要的是 left 與 right 相等的情況下,處理完再退出;如果用後者的寫法會再 left 與 right 相等的情況下就直接退出了,是可以這樣寫,但條件式要稍微修改,寫起來會比較不好看一點。

Q : Basically we just want to swap elements that are on the wrong side. So we can just move the counters if they are already on the correct side.
A : 對,如下 :

fn quicksort<T: Ord>(slice: &mut [T]) { ... let (pivot, rest) = slice.split_first_mut().expect("slice is non-empty"); let mut left = 0; let mut right = rest.len() - 1; while left <= right { if &rest[left] <= pivot { // already on the correct side left += 1; } else if &rest[right] > pivot { // right already on the correct side // avoid unnecessary swaps back and forth right -= 1 } else { // 剛剛新增的條件讓我們的在 else 也可以移動 left // left holds a right, and right holds a left, swap them rest.swap(left, right); left += 1; right -= 1; } } ... }
目前程式碼

src/quicksort.rs :

use super::Sorter; pub struct QuickSort; fn quicksort<T: Ord>(slice: &mut [T]) { match slice.len() { 0 | 1 => return, 2 => { if slice[0] > slice[1] { slice.swap(0, 1); } return; } _ => {} } let (pivot, rest) = slice.split_first_mut().expect("slice is non-empty"); let mut left = 0; let mut right = rest.len() - 1; while left <= right { if &rest[left] <= pivot { // already on the correct side left += 1; } else if &rest[right] > pivot { // right already on the correct side // avoid unnecessary swaps back and forth right -= 1 } else { // left holds a right, and right holds a left, swap them rest.swap(left, right); left += 1; right -= 1; } } // split_at_mut(mid: usize) -> (&mut [..mut], &mut [mid..]) let (left, right) = slice.split_at_mut(left); quicksort(left); quicksort(right); } impl Sorter for QuickSort { fn sort<T>(&self, slice: &mut [T]) where T: Ord, { // [ sorted | pivot | unsorted ] quicksort(slice) } } #[test] fn it_works() { let mut things = vec![4, 2, 5, 3, 1]; QuickSort.sort(&mut things); assert_eq!(things, &[1, 2, 3, 4, 5]); }

並沒有通過測試 :

$ cargo test
...
thread 'quicksort::it_works' panicked at 'assertion failed: `(left == right)`
  left: `[2, 4, 1, 3, 5]`,
 right: `[1, 2, 3, 4, 5]`', src\quicksort.rs:58:5
...

pivot 應該最終會位於陣列中最終排序的位置。但我們現在的作法是,我們選擇 pivot 作為第一個元素,然而我們最後沒有移動 pivot。因此 pivot 仍包含在兩個 slice 中。因此,我們的方法是將 pivot 明確放置在最終排序的位置 :

fn quicksort<T: Ord>(slice: &mut [T]) { ... // re-align left to account for the pivot at 0 let left = left + 1; // place the pivot at its final location // 0 為 pivot 的位置 // 新 pivot 的位置在 left - 1 slice.swap(0, left - 1); // split_at_mut(mid: usize) -> (&mut [..mut], &mut [mid..]) // 這樣切會讓 pivot 在 right 的第 0 個位置。 let (left, right) = slice.split_at_mut(left - 1); // 判斷 left 的尾端與 right 的頭端大小關係是否正確 assert!(left.last() <= right.first()); quicksort(left); // 因為現在要傳入的是 right 的部分而不是全部 // 所以前面要加 &mut 才能以 mutable 參考的方式傳入函式 // 我們不想要在 split 的時候包含 pivot quicksort(&mut right[1..]); }

Q : to avoid this you can choose the pivot at the middle of the slice, then you about some moves.
A : Jon 認為並非事實。如果你選擇中間的 pivot,您可能會遇到必須移動 pivot 位置的情形,這聽起來更煩人。

Q : partition_at_index used instead of split_at_mut will nicely exclude the pivot
A : 這個函式還幫你排序,這樣就作弊了 :

pub fn partition_at_index( &mut self, index: usize ) -> (&mut [T], &mut T, &mut [T]) where T: Ord, [src] 👎 Deprecated since 1.49.0: use the select_nth_unstable() instead 🔬 This is a nightly-only experimental API. (slice_partition_at_index #55300) // 使用比較器函式對 slice 進行重新排序, // 以使 index 處的元素處於其最終排序位置。
目前程式碼

src/quicksort.rs :

use super::Sorter; pub struct QuickSort; fn quicksort<T: Ord>(slice: &mut [T]) { match slice.len() { 0 | 1 => return, 2 => { if slice[0] > slice[1] { slice.swap(0, 1); } return; } _ => {} } let (pivot, rest) = slice.split_first_mut().expect("slice is non-empty"); let mut left = 0; let mut right = rest.len() - 1; while left <= right { if &rest[left] <= pivot { // already on the correct side left += 1; } else if &rest[right] > pivot { // right already on the correct side // avoid unnecessary swaps back and forth right -= 1 } else { // left holds a right, and right holds a left, swap them rest.swap(left, right); left += 1; right -= 1; } } // re-align left to account for the pivot at 0 let left = left + 1; // place the pivot at its final location slice.swap(0, left - 1); // split_at_mut(mid: usize) -> (&mut [..mut], &mut [mid..]) let (left, right) = slice.split_at_mut(left - 1); assert!(left.last() <= right.first()); quicksort(left); quicksort(&mut right[1..]); } impl Sorter for QuickSort { fn sort<T>(&self, slice: &mut [T]) where T: Ord, { // [ sorted | pivot | unsorted ] quicksort(slice) } } #[test] fn it_works() { let mut things = vec![4, 2, 5, 3, 1]; QuickSort.sort(&mut things); assert_eq!(things, &[1, 2, 3, 4, 5]); }

通過測試 :

running 6 tests ... test quicksort::it_works ... ok ...

Benchmarking

1:46:22

創建 benckmark 目錄以及檔案 :

$ cd orst
$ mkdir src/bin
$ vi src/bin/main.rs

本次要評估的是比較次數,而不是複雜度。

新增程式碼到 src/lib.rs 使得當匯入 orst 函式庫時,會自動匯入我們實作的排序演算法 :

... +pub use bubblesort::BubbleSort; +pub use insertionsort::InsertionSort; +pub use quicksort::QuickSort; +pub use selectionsort::SelectionSort; ...

開始實作 src/bin/main.rs :

use orst::*; use std::cmp::Ordering; // 本次不討論 Ordering 的用途 use std::sync::{ atomic::{AtomicUsize, ArcRel}, Arc, }; #[derive(Clone)] struct SortEvaluator<T> { t: T, // 每次比較發生就加 1 cmps: Arc<AtomicUsize>, } // 因為我們只想要比較 SortEvaluator::t, // 所以才要為 SortEvaluator 實作 PartialEq impl<T: PartialEq> PartialEq for SortEvaluator<T> { fn eq(&self, other: &Self) -> bool { self.t == other.t // 即使是 Eq,我們實際上也不需要增加計數。 } } impl<T: Eq> Eq for SortEvaluator<T> {} impl<T: PartialOrd> PartialOrd for SortEvaluator<T> { fn partial_cmp(&self, other: &Self) -> Option<Ordering> { // 通常 `cmp` forward 到 partial_cmp self.t.partial_cmp(&other.t) } } impl<T: Ord> Ord for SortEvaluator<T> { fn cmp(&self, other: &Self) -> Ordering { self.cmps.fetch_add(1, ArcRel); self.t.cmp(&other.t) // 在這種情況下,我們知道我們實際上正在測試 Ord,因此 Ord 是在這裡處理的正確選擇。 } } fn main() { }

Q : will we also count swaps?
A : 因為當兩個值交換的時候,並沒有方法被呼叫。所以我們事實上沒有辦法可以計算交換次數。我們需要實作與我們合作。我們可以有一個實作應該呼叫的觸發器,但這似乎有點煩人。

Q : was this faster than just using criterion?
A : Jon 這裡不想使用 criterion 是因為它帶來了更多的複雜性,在這種情況下並不那麼有趣,而且因為 criterion 不會衡量比較次數而是給我們是執行時間的估計值。我們可以在這裡測量執行時間,但我們現在想看看複雜度,這與執行時間有關,但並不完全相同。

Q : Can't the increment ordering be Relaxed, since you're not locking anything?
A : 可以,因為我們的效能評估程式是單執行緒的。雖然 ArcRel 的使用會帶來額外的開銷,但我們現在要測量的是比較次數,所以執行時間比較長沒關係。

ArcRel 都換成 Relaxed :

... use std::sync::{ - atomic::{AtomicUsize, ArcRel}, + atomic::{AtomicUsize, Relaxed}, Arc, }; ... impl<T: PartialOrd> PartialOrd for SortEvaluator<T> { fn partial_cmp(&self, other: &Self) -> Option<Ordering> { - self.cmps.fetch_add(1, ArcRel); + self.cmps.fetch_add(1, Relaxed); self.t.partial_cmp(&other.t) } } ...

實作 main 函式 :

fn main() { let counter = Arc::new(AtomicUsize::new(0)); // 不同資料大小 for &n in &[0, 1, 10, 100, 1000, 10000] { // 每個資料大小都做個幾次 for _ in 0..10 { let values = vec![1; n]; let bench = |sorter| { // 之所以要用 clone 的是因為不同的排序法要比相同的輸入資料才有意義。 let mut value = values.clone(); // 1. 該陣列中的每個值都將包裝一些值, // 但所有這些值都將指向相同的單個AtomicUsize, // 這樣我們就得到了完整的比較次數。 // 2. 在開始每次實際排序之前, // 我們將在該計數器中儲存 0。 counter.store(0, Relaxed); sorter.sort(&mut values); counter.load(Relaxed); }; let took = bench(BubbleSort); let took = bench(InsertionSort {smart : true}); let took = bench(InsertionSort {smart : false}); let took = bench(SelectionSort); let took = bench(QuickSort); } } }

Q : is there a performance difference between using atomicusize and cell<usize> if u only use 1 thread?
A : 有一點差異。我們的實作可以這麼做,因為我們現在不需要 thread safe :
:warning: 以下程式碼有錯誤,馬上會解決,註解有寫錯誤原因。

use orst::*; use std::cmp::Ordering; +use std::cell::Cell; +use std::rc::Rc; -use std::sync::{ - atomic::{AtomicUsize, ArcRel}, - Arc, -}; #[derive(Clone)] struct SortEvaluator<T> { t: T, - cmps: Arc<AtomicUsize>, + cmps: Rc<Cell<usize>>, } impl<T: PartialEq> PartialEq for SortEvaluator<T> { fn eq(&self, other: &Self) -> bool { self.t == other.t } } impl<T: Eq> Eq for SortEvaluator<T> {} impl<T: PartialOrd> PartialOrd for SortEvaluator<T> { fn partial_cmp(&self, other: &Self) -> Option<Ordering> { self.t.partial_cmp(&other.t) } } impl<T: Ord> Ord for SortEvaluator<T> { fn cmp(&self, other: &Self) -> Ordering { - self.cmps.fetch_add(1, ArcRel); + self.cmps.set(self.cmps.get() + 1); self.t.cmp(&other.t) } } fn main() { - let counter = Arc::new(AtomicUsize::new(0)); + let counter = Rc::new(Cell::new(0)); for &n in &[0, 1, 10, 100, 1000, 10000] { for _ in 0..10 { let values = vec![1; n]; + // Line 70 - 74 : rust-analyer 報錯 : + // error[E0038]: the trait `orst::Sorter` cannot be made into an object object + // 因為並不是 Object Safe。 + // Line 61 - Line 68 移動到等等要實作的 bench 函式。 - let bench = |sorter| { + let bench = | sorter: &dyn Sorter| { let mut value = values.clone(); - counter.store(0, Relaxed); + counter.set(0); sorter.sort(&mut values); - counter.load(Relaxed); + counter.get() }; let took = bench(BubbleSort); let took = bench(InsertionSort {smart : true}); let took = bench(InsertionSort {smart : false}); let took = bench(SelectionSort); let took = bench(QuickSort); } } }

Q : why is it not object safe tho
A : Sorter 不是 object safe 的原因是因為我們有一個 generic method。

Q : Can’t you Box the Sorter?
A : 因為 Sorter 有 generic method,所以我們不能 box 它。

解決編譯不過的問題,改用呼叫函式的方法。另外寫一個 bench 函式 :

fn bench<T: Ord + Clone, S: Sorter>( sorter: S, values: &[T], counter: &Cell<usize> ) -> usize { // let mut value = values.clone(); // Q : Isn't that what `to_vec()` does? // A : 改成以下 let mut values: Vec<_> = values.to_vec(); counter.set(0); sorter.sort(&mut values); counter.get() }

回頭修改 main 函式,使用函式呼叫,而不是用到 closure :

fn main() { let counter = Rc::new(Cell::new(0)); for &n in &[0, 1, 10, 100, 1000, 10000] { for _ in 0..10 { let values = vec![1; n]; - let bench = | sortedL &dyn Sorter| { - let mut value = values.clone(); - counter.set(0); - sorter.sort(&mut values); - counter.get() - }; - let took = bench(BubbleSort); + let took = bench(BubbleSort, &values, &counter); - let took = bench(InsertionSort {smart : true}); + let took = bench(InsertionSort {smart : true}, &values, &counter); - let took = bench(InsertionSort {smart : false}); + let took = bench(InsertionSort {smart : false}, &values, &counter); - let took = bench(SelectionSort); + let took = bench(SelectionSort, &values, &counter); - let took = bench(QuickSort); + let took = bench(QuickSort, &values, &counter); } } }

Q : can't you simply sort SortEvaluator<T> instead of T?
A :修改 bench 的函式簽章 :

fn bench<T: Ord + Clone, S: Sorter>( sorter: S, - values: &[T], + values: &[SortEvaluator<T>], counter: &Cell<usize> ) -> usize
目前程式碼
use orst::*; use std::cmp::Ordering; use std::cell::Cell; use std::rc::Rc; struct SortEvaluator<T> { t: T, cmps: Rc<Cell<usize>>, } impl<T: PartialEq> PartialEq for SortEvaluator<T> { fn eq(&self, other: &Self) -> bool { self.t == other.t } } impl<T: Eq> Eq for SortEvaluator<T> {} impl<T: PartialOrd> PartialOrd for SortEvaluator<T> { fn partial_cmp(&self, other: &Self) -> Option<Ordering> { self.t.partial_cmp(&other.t) } } impl<T: Ord> Ord for SortEvaluator<T> { fn cmp(&self, other: &Self) -> Ordering { self.cmps.set(self.cmps.get() + 1); self.t.cmp(&other.t) } } fn main() { let counter = Rc::new(Cell::new(0)); for &n in &[0, 1, 10, 100, 1000, 10000] { for _ in 0..10 { let values = vec![1; n]; let took = bench(BubbleSort, &values, &counter); let took = bench(InsertionSort {smart : true}, &values, &counter); let took = bench(InsertionSort {smart : false}, &values, &counter); let took = bench(SelectionSort, &values, &counter); let took = bench(QuickSort, &values, &counter); } } } fn bench<T: Ord + Clone, S: Sorter>( sorter: S, values: &[SortEvaluator<T>], counter: &Cell<usize> ) -> usize { let mut values: Vec<_> = values.into_iter().cloned().collect(); counter.set(0); sorter.sort(&mut values); counter.get() }

現在問題變成是,我們如何建構我們實際要測試的東西? 先匯入 rand crate,修改 orst/Cargo.toml :

[package] name = "orst" version = "0.1.0" edition = "2021" # See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html [dependencies] rand = "0.8"

src/bin/main.rs 開頭匯入 rand crate :

use orst::*; use std::cmp::Ordering; +use rand::prelude::*; use std::cell::Cell; use std::rc::Rc; ...

接著修改 benches/main.rs 的 main 函式 : 2:14:55~~~

fn main() { + // 創出亂數產生器 + let mut rand = rand::thread_rng(); let counter = Rc::new(Cell::new(0)); for &n in &[0, 1, 10, 100, 1000, 10000] { for _ in 0..10 { - let values = vec![1; n]; + // 隨機產生欲排序的資料 + let mut values = Vec::with_capacity(n); + for _ in 0..n{ + values.push(SortEvaluator { + t: rand.gen::<usize>(), + cmps: Rc::clone(&counter), + }); + } + // 後面畫圖會用到,先印出欄位名稱 + println!("algorithm n comparisons"); let took = bench(BubbleSort, &values, &counter); + println!("{} {} {}", "bubble", n, took); let took = bench(InsertionSort {smart : true}, &values, &counter); + println!("{} {} {}", "insertion-smart", n, took); let took = bench(InsertionSort {smart : false}, &values, &counter); + println!("{} {} {}", "insertion-dumb", n, took); let took = bench(SelectionSort, &values, &counter); + println!("{} {} {}", "selection", n, took); let took = bench(QuickSort, &values, &counter); + println!("{} {} {}", "quick", n, took); } } }
目前程式碼

src/bin/main.rs :

use orst::*; use std::cmp::Ordering; use rand::prelude::*; use std::cell::Cell; use std::rc::Rc; #[derive(Clone)] struct SortEvaluator<T> { t: T, cmps: Rc<Cell<usize>>, } impl<T: PartialEq> PartialEq for SortEvaluator<T> { fn eq(&self, other: &Self) -> bool { self.t == other.t } } impl<T: Eq> Eq for SortEvaluator<T> {} impl<T: PartialOrd> PartialOrd for SortEvaluator<T> { fn partial_cmp(&self, other: &Self) -> Option<Ordering> { self.t.partial_cmp(&other.t) } } impl<T: Ord> Ord for SortEvaluator<T> { fn cmp(&self, other: &Self) -> Ordering { self.cmps.set(self.cmps.get() + 1); self.t.cmp(&other.t) } } fn main() { let mut rand = rand::thread_rng(); let counter = Rc::new(Cell::new(0)); for &n in &[0, 1, 10, 100, 1000, 10000] { for _ in 0..10 { let mut values = Vec::with_capacity(n); for _ in 0..n{ values.push(SortEvaluator { t: rand.gen::<usize>(), cmps: Rc::clone(&counter), }); } println!("algorithm n comparisons"); let took = bench(BubbleSort, &values, &counter); println!("{} {} {}", "bubble", n, took); let took = bench(InsertionSort {smart : true}, &values, &counter); println!("{} {} {}", "insertion-smart", n, took); let took = bench(InsertionSort {smart : false}, &values, &counter); println!("{} {} {}", "insertion-dumb", n, took); let took = bench(SelectionSort, &values, &counter); println!("{} {} {}", "selection", n, took); let took = bench(QuickSort, &values, &counter); println!("{} {} {}", "quick", n, took); } } } fn bench<T: Ord + Clone, S: Sorter>( sorter: S, values: &[SortEvaluator<T>], counter: &Cell<usize> ) -> usize { let mut values: Vec<_> = values.into_iter().cloned().collect(); counter.set(0); sorter.sort(&mut values); counter.get() }

Quicksort underflow

2:06:20

測試發現我們的 quick sort 有 underflow 的問題 :
(

264=18446744073709551616)

$ cargo run --release
...
thread 'main' panicked at 'index out of bounds: the len is 6 but the index is 18446744073709551615', /home/wilson/CrustOfRust/orst/src/quicksort.rs:25:20
...

Q : why not work in debug mode?
src/quicksort.rs :

- while left <= right { + while right != usize::MAX && left <= right {

A : 因為在 DEBUG 模式下,Rust 會因為 overflow 和 underflow 的發生而 panic。

在解決 quicksort underflow 之前,先改一下 src/bin/main.rs 產生隨機亂數資料的方法 :

fn main() { let mut rand = rand::thread_rng(); let counter = Rc::new(Cell::new(0)); for &n in &[0, 1, 10, 100, 1000, 10000] { + let mut values = Vec::with_capacity(n); + for _ in 0..n{ + values.push(SortEvaluator { + t: rand.gen::<usize>(), + cmps: Rc::clone(&counter), + }); + } for _ in 0..10 { - let mut values = Vec::with_capacity(n); - for _ in 0..n{ - values.push(SortEvaluator { - t: rand.gen::<usize>(), - cmps: Rc::clone(&counter), - }); - } + values.shuffle(&mut rand); ... } } }

開始解決 right 遇到 underflow 的問題 (不是最佳解),修改 src/quicksort.rs 的 quicksort 函式 :

fn quicksort<T: Ord>(slice: &mut [T]) { ... while left <= right { if &rest[left] <= pivot { // already on the correct side left += 1; } else if &rest[right] > pivot { // right already on the correct side // avoid unnecessary swaps back and forth + if right == 0 { + // we must be done + break; + } right -= 1 } else { // left holds a right, and right holds a left, swap them rest.swap(left, right); left += 1; + if right == 0 { + // we must be done + break; + } right -= 1; } } ... }
目前程式碼

src/bin/main.rs :

use orst::*; use std::cmp::Ordering; use rand::prelude::*; use std::cell::Cell; use std::rc::Rc; #[derive(Clone)] struct SortEvaluator<T> { t: T, cmps: Rc<Cell<usize>>, } impl<T: PartialEq> PartialEq for SortEvaluator<T> { fn eq(&self, other: &Self) -> bool { self.t == other.t } } impl<T: Eq> Eq for SortEvaluator<T> {} impl<T: PartialOrd> PartialOrd for SortEvaluator<T> { fn partial_cmp(&self, other: &Self) -> Option<Ordering> { self.t.partial_cmp(&other.t) } } impl<T: Ord> Ord for SortEvaluator<T> { fn cmp(&self, other: &Self) -> Ordering { self.cmps.set(self.cmps.get() + 1); self.t.cmp(&other.t) } } fn main() { let mut rand = rand::thread_rng(); let counter = Rc::new(Cell::new(0)); for &n in &[0, 1, 10, 100, 1000, 10000] { for _ in 0..10 { let mut values = Vec::with_capacity(n); for _ in 0..n{ values.push(SortEvaluator { t: rand.gen::<usize>(), cmps: Rc::clone(&counter), }); } println!("algorithm n comparisons"); let took = bench(BubbleSort, &values, &counter); println!("{} {} {}", "bubble", n, took); let took = bench(InsertionSort {smart : true}, &values, &counter); println!("{} {} {}", "insertion-smart", n, took); let took = bench(InsertionSort {smart : false}, &values, &counter); println!("{} {} {}", "insertion-dumb", n, took); let took = bench(SelectionSort, &values, &counter); println!("{} {} {}", "selection", n, took); let took = bench(QuickSort, &values, &counter); println!("{} {} {}", "quick", n, took); } } } fn bench<T: Ord + Clone, S: Sorter>( sorter: S, values: &[SortEvaluator<T>], counter: &Cell<usize> ) -> usize { let mut values: Vec<_> = values.into_iter().cloned().collect(); counter.set(0); sorter.sort(&mut values); counter.get() }

src/quicksort.rs :

use super::Sorter; pub struct QuickSort; fn quicksort<T: Ord>(slice: &mut [T]) { match slice.len() { 0 | 1 => return, 2 => { if slice[0] > slice[1] { slice.swap(0, 1); } return; } _ => {} } let (pivot, rest) = slice.split_first_mut().expect("slice is non-empty"); let mut left = 0; let mut right = rest.len() - 1; while left <= right { if &rest[left] <= pivot { // already on the correct side left += 1; } else if &rest[right] > pivot { // right already on the correct side // avoid unnecessary swaps back and forth if right == 0 { // we must be done break; } right -= 1 } else { // left holds a right, and right holds a left, swap them rest.swap(left, right); left += 1; if right == 0 { // we must be done break; } right -= 1; } } // re-align left to account for the pivot at 0 let left = left + 1; // place the pivot at its final location slice.swap(0, left - 1); // split_at_mut(mid: usize) -> (&mut [..mut], &mut [mid..]) let (left, right) = slice.split_at_mut(left - 1); assert!(left.last() <= right.first()); quicksort(left); quicksort(&mut right[1..]); } impl Sorter for QuickSort { fn sort<T>(&self, slice: &mut [T]) where T: Ord, { // [ sorted | pivot | unsorted ] quicksort(slice) } } #[test] fn it_works() { let mut things = vec![4, 2, 5, 3, 1]; QuickSort.sort(&mut things); assert_eq!(things, &[1, 2, 3, 4, 5]); }

Back to benchmarking

2:13:46

測試發現 bubble, insertion-dumb, 以及 quick 的比較次數為 0 :

$ cargo run --release
...
bubble 10000 0
insertion-smart 10000 118949
insertion-dumb 10000 0
selection 10000 49995000
quick 10000 0
...

因為它們都是使用 partial_cmp。Rust 中的 < 運算子和> 運算子都會呼叫 partial_cmp,而不是 cmp。有兩種方法可以解決這個問題。一種是不使用這些運算子,直接使用呼叫 cmp。另一個是將其移至 partial_cmp,我們採用第二種方法,修改 src/bin/main.sr 檔案 :

impl<T: PartialOrd> PartialOrd for SortEvaluator<T> { fn partial_cmp(&self, other: &Self) -> Option<Ordering> { + self.cmps.set(self.cmps.get() + 1); self.t.partial_cmp(&other.t) } } impl<T: Ord> Ord for SortEvaluator<T> { fn cmp(&self, other: &Self) -> Ordering { - self.cmps.set(self.cmps.get() + 1); - self.t.cmp(&other.t) + self.partial_cmp(other).expect("T: Ord") } }

Q : Is there a reason the loop body needs to run if left==right? Can't you just replace the loop condition with left < right?
A : 有,因為 left==right 的時候還沒結束,因為還有一個元素我們沒查看到。


Q : Is incrementing left correct in the else statement too? Don't we need to check it again if we've swapped it?
A : 增加 left 值是 ok,雖然理論上 left 有可能 overflow,但實際上 overflow 並不太可能發生,因為要 overflow 的話數字要很大,而 underflow 的話數字不是很大,只要是小陣列就有可能觸發 underflow。

Q : can you add a check that the list is sorted after bench? just to be sure
A : 可以。

  • nightly 版本支援的用法,修改 src/bin/main.rs :
fn bench<T: Ord + Clone, S: Sorter>( sorter: S, values: &[SortEvaluator<T>], counter: &Cell<usize> ) -> usize { let mut values: Vec<_> = values.into_iter().cloned().collect(); counter.set(0); sorter.sort(&mut values); + assert!(values.is_sorted()); counter.get() }
  • 自己寫,修改 src/bin/main.rs:
fn bench<T: Ord + Clone, S: Sorter>( sorter: S, values: &[SortEvaluator<T>], counter: &Cell<usize> ) -> usize { let mut values: Vec<_> = values.into_iter().cloned().collect(); counter.set(0); sorter.sort(&mut values); + for i in 1..values.len() { + assert!(values[i] >= values[i - 1]); + } counter.get() }

:bulb: 也可以使用 is_sorted crate


:question: 2:17:00
Q : can't you increment on both Ords, since the Ord implementation will never call ord on the wrapper but only on T?
A : 我們可以這麼做,因為這最終會重定向,或者最終會直接呼叫 T。修改 src/bin/main.rs :

... impl<T: Ord> Ord for SortEvaluator<T> { fn cmp(&self, other: &Self) -> Ordering { - self.partial_cmp(other).expect("T: Ord") + self.cmps.set(self.cmps.get() + 1); + self.t.cmp(&other.t) } } ...

Equality 也可以做相同的事。修改 src/bin/main.rs :

... impl<T: PartialEq> PartialEq for SortEvaluator<T> { fn eq(&self, other: &Self) -> bool { + self.cmps.set(self.cmps.get() + 1); self.t == other.t } } ...

Q : wont checking if array is sorted also increment the counter?
A : 對,修改 src/bin/main.rs :

fn bench<T: Ord + Clone, S: Sorter>( sorter: S, values: &[SortEvaluator<T>], counter: &Cell<usize> ) -> usize { let mut values: Vec<_> = values.into_iter().cloned().collect(); counter.set(0); sorter.sort(&mut values); + let count = counter.get(); for i in 1..values.len() { assert!(values[i] >= values[i - 1]); } - counter.get(); + count }

修正完之後再次測試 :

$ cargo run --release
...
bubble 10000 99130086
insertion-smart 10000 118962
insertion-dumb 10000 25110465
selection 10000 49995000
quick 10000 224603
...
目前程式碼 (Tag)

src/bin/main.rs :

use orst::*; use std::cmp::Ordering; use rand::prelude::*; use std::cell::Cell; use std::rc::Rc; #[derive(Clone)] struct SortEvaluator<T> { t: T, cmps: Rc<Cell<usize>>, } impl<T: PartialEq> PartialEq for SortEvaluator<T> { fn eq(&self, other: &Self) -> bool { self.cmps.set(self.cmps.get() + 1); self.t == other.t } } impl<T: Eq> Eq for SortEvaluator<T> {} impl<T: PartialOrd> PartialOrd for SortEvaluator<T> { fn partial_cmp(&self, other: &Self) -> Option<Ordering> { self.cmps.set(self.cmps.get() + 1); self.t.partial_cmp(&other.t) } } impl<T: Ord> Ord for SortEvaluator<T> { fn cmp(&self, other: &Self) -> Ordering { self.cmps.set(self.cmps.get() + 1); self.t.cmp(&other.t) } } fn main() { let mut rand = rand::thread_rng(); let counter = Rc::new(Cell::new(0)); for &n in &[0, 1, 10, 100, 1000, 10000] { for _ in 0..10 { let mut values = Vec::with_capacity(n); for _ in 0..n{ values.push(SortEvaluator { t: rand.gen::<usize>(), cmps: Rc::clone(&counter), }); } println!("algorithm n comparisons"); let took = bench(BubbleSort, &values, &counter); println!("{} {} {}", "bubble", n, took); let took = bench(InsertionSort {smart : true}, &values, &counter); println!("{} {} {}", "insertion-smart", n, took); let took = bench(InsertionSort {smart : false}, &values, &counter); println!("{} {} {}", "insertion-dumb", n, took); let took = bench(SelectionSort, &values, &counter); println!("{} {} {}", "selection", n, took); let took = bench(QuickSort, &values, &counter); println!("{} {} {}", "quick", n, took); } } } fn bench<T: Ord + Clone, S: Sorter>( sorter: S, values: &[SortEvaluator<T>], counter: &Cell<usize> ) -> usize { let mut values: Vec<_> = values.into_iter().cloned().collect(); counter.set(0); sorter.sort(&mut values); let count = counter.get(); for i in 1..values.len() { assert!(values[i] >= values[i - 1]); } counter.get(); count }

src/lib.rs :

pub trait Sorter { fn sort<T>(&self, slice: &mut[T]) where T: Ord; } mod bubblesort; mod insertionsort; mod selectionsort; mod quicksort; pub use bubblesort::BubbleSort; pub use insertionsort::InsertionSort; pub use quicksort::QuickSort; pub use selectionsort::SelectionSort; pub struct StdSorter; impl Sorter for StdSorter { fn sort<T>(&self, slice: &mut [T]) where T: Ord, { slice.sort(); } } #[cfg(test)] mod tests { use super::*; #[test] fn std_works() { let mut things = vec![4, 2, 3, 1]; StdSorter.sort(&mut things); assert_eq!(things, &[1, 2, 3, 4]); } }

將評估結果導到 values.dat :

$ cargo run --release > values.dat

Q : how bad is using nightly for an binary?
A : 如果不需要用就不要用。

Evaluating the results

2:18:56

:pencil2: 安裝 gnuplot-x11

$ cat values.dat | gnuplot -p -e 'plot "-" using 2:3 title 1'
WARNING: Plotting with an 'unknown' terminal.
No output will be generated. Please select a terminal with 'set terminal'
$ sudo apt-get install gnuplot-x11

執行以下程式沒辦法得到我們想要的結果,必須要用比較複雜的語法才可以,原本是希望第 1 欄當作 title,第 2 以及第 3 欄當作橫軸以及縱軸的資料 :

cat values.dat | gnuplot -p -e 'plot "-" using 2:3 title 1'

產出的圖如下 :
image

放棄使用 gnuplot,改用 R 語言的 ggplot2 package 來畫圖。

:pencil2: 安裝 R 語言以及 ggplot2 package :

sudo apt install r-base-core
$ R
> install.packages("ggplot2")

開始畫圖,方式有兩種 :

  • CLI :
    ​​​​$ R
    ​​​​> library(ggplot2)
    ​​​​> t <- read.table('values.dat', header = TRUE)
    ​​​​> t
    ​​​​> ggplot(t, aes(n, comparisons, colour = algorithm)) + geom_point() + geom_smooth()
    
  • 腳本 :
    1. 編寫腳本 plot.r :
      ​​​​​​​​#!/usr/bin/Rscript
      ​​​​​​​​library(ggplot2)
      ​​​​​​​​t <- read.table('values.dat', header = TRUE)
      ​​​​​​​​t
      ​​​​​​​​ggplot(t, aes(n, comparisons, colour = algorithm)) + geom_point() + geom_smooth()
      
    2. 執行腳本 :
      ​​​​​​​​$ R < plot.r --save
      ​​​​​​​​$ open Rplots.pdf
      

從圖可以看到,bubble sort 表現的最爛,insertion-smart 以及 quick sort 表現的最好 :
image

目前程式碼

搜尋 "目前程式碼 (Tag)"

Q : we have now pivoted to the crust of R episode
A : XDD

多測一個標準函式庫的排序演算法
src/lib.rs :

... pub use bubblesort::BubbleSort; pub use insertionsort::InsertionSort; pub use quicksort::QuickSort; pub use selectionsort::SelectionSort; +pub struct StdSorter; +impl Sorter for StdSorter +{ + fn sort<T>(&self, slice: &mut [T]) + where + T: Ord, + { + slice.sort(); + } +} #[cfg(test)] mod tests { use super::*; - struct StdSorter; - impl Sorter for StdSorter - { - fn sort<T>(&self, slice: &mut [T]) - where - T: Ord, - { - slice.sort(); - } - } ... }

修改 src/bin/main.rs 的 main 函式 :

fn main() { ... let took = bench(QuickSort, &values, &counter); println!("{} {} {}", "quick", n, took); + let took = bench(StdSorter, &values, &counter); + println!("{} {} {}", "std", n, took); } } }

再次輸出並畫圖 :

$ cargo run --release > values.dat
$ R < plot.r --save
$ open Rplots.pdf

image

目前程式碼

src/bin/main.rs :

use orst::*; use std::cmp::Ordering; use rand::prelude::*; use std::cell::Cell; use std::rc::Rc; #[derive(Clone)] struct SortEvaluator<T> { t: T, cmps: Rc<Cell<usize>>, } impl<T: PartialEq> PartialEq for SortEvaluator<T> { fn eq(&self, other: &Self) -> bool { self.cmps.set(self.cmps.get() + 1); self.t == other.t } } impl<T: Eq> Eq for SortEvaluator<T> {} impl<T: PartialOrd> PartialOrd for SortEvaluator<T> { fn partial_cmp(&self, other: &Self) -> Option<Ordering> { self.cmps.set(self.cmps.get() + 1); self.t.partial_cmp(&other.t) } } impl<T: Ord> Ord for SortEvaluator<T> { fn cmp(&self, other: &Self) -> Ordering { self.cmps.set(self.cmps.get() + 1); self.t.cmp(&other.t) } } fn main() { let mut rand = rand::thread_rng(); let counter = Rc::new(Cell::new(0)); for &n in &[0, 1, 10, 100, 1000, 10000] { for _ in 0..10 { let mut values = Vec::with_capacity(n); for _ in 0..n{ values.push(SortEvaluator { t: rand.gen::<usize>(), cmps: Rc::clone(&counter), }); } println!("algorithm n comparisons"); let took = bench(BubbleSort, &values, &counter); println!("{} {} {}", "bubble", n, took); let took = bench(InsertionSort {smart : true}, &values, &counter); println!("{} {} {}", "insertion-smart", n, took); let took = bench(InsertionSort {smart : false}, &values, &counter); println!("{} {} {}", "insertion-dumb", n, took); let took = bench(SelectionSort, &values, &counter); println!("{} {} {}", "selection", n, took); let took = bench(QuickSort, &values, &counter); println!("{} {} {}", "quick", n, took); let took = bench(StdSorter, &values, &counter); println!("{} {} {}", "std", n, took); } } } fn bench<T: Ord + Clone, S: Sorter>( sorter: S, values: &[SortEvaluator<T>], counter: &Cell<usize> ) -> usize { let mut values: Vec<_> = values.into_iter().cloned().collect(); counter.set(0); sorter.sort(&mut values); let count = counter.get(); for i in 1..values.len() { assert!(values[i] >= values[i - 1]); } counter.get(); count }

*src/lib.rs :

pub trait Sorter { fn sort<T>(&self, slice: &mut[T]) where T: Ord; } mod bubblesort; mod insertionsort; mod selectionsort; mod quicksort; pub use bubblesort::BubbleSort; pub use insertionsort::InsertionSort; pub use quicksort::QuickSort; pub use selectionsort::SelectionSort; pub struct StdSorter; impl Sorter for StdSorter { fn sort<T>(&self, slice: &mut [T]) where T: Ord, { slice.sort(); } } #[cfg(test)] mod tests { use super::*; #[cfg(test)] mod tests { use super::*; #[test] fn std_works() { let mut things = vec![4, 2, 3, 1]; StdSorter.sort(&mut things); assert_eq!(things, &[1, 2, 3, 4]); } } }

plot.r :

#!/usr/bin/Rscript library(ggplot2) t <- read.table('values.dat', header = TRUE) t ggplot(t, aes(n, comparisons, colour = algorithm)) + geom_point() + geom_smooth()

將圖取對數,修改 plot.r :

#!/usr/bin/Rscript library(ggplot2) t <- read.table('values.dat', header = TRUE) t -ggplot(t, aes(n, comparisons, colour = algorithm)) + geom_point() +ggplot(t, aes(n, comparisons, colour = algorithm)) + geom_point() + geom_smooth() + scale_x_log10() + scale_y_log10()

再次執行腳本產生圖 :

$ R < plot.r --save
$ open Rplots.pdf

改取 log 之後發現,insertion-smart 的比較次數其實比 quick 還要少,感覺很奇怪,不過這裡是衡量比較次數而已,並沒有把交換的次數考慮進來,所以其實還是 quick 會比較快 :
image

目前程式碼

src/bin/main.rs :

use orst::*; use std::cmp::Ordering; use rand::prelude::*; use std::cell::Cell; use std::rc::Rc; #[derive(Clone)] struct SortEvaluator<T> { t: T, cmps: Rc<Cell<usize>>, } impl<T: PartialEq> PartialEq for SortEvaluator<T> { fn eq(&self, other: &Self) -> bool { self.cmps.set(self.cmps.get() + 1); self.t == other.t } } impl<T: Eq> Eq for SortEvaluator<T> {} impl<T: PartialOrd> PartialOrd for SortEvaluator<T> { fn partial_cmp(&self, other: &Self) -> Option<Ordering> { self.cmps.set(self.cmps.get() + 1); self.t.partial_cmp(&other.t) } } impl<T: Ord> Ord for SortEvaluator<T> { fn cmp(&self, other: &Self) -> Ordering { self.cmps.set(self.cmps.get() + 1); self.t.cmp(&other.t) } } fn main() { let mut rand = rand::thread_rng(); let counter = Rc::new(Cell::new(0)); for &n in &[0, 1, 10, 100, 1000, 10000] { for _ in 0..10 { let mut values = Vec::with_capacity(n); for _ in 0..n{ values.push(SortEvaluator { t: rand.gen::<usize>(), cmps: Rc::clone(&counter), }); } println!("algorithm n comparisons"); let took = bench(BubbleSort, &values, &counter); println!("{} {} {}", "bubble", n, took); let took = bench(InsertionSort {smart : true}, &values, &counter); println!("{} {} {}", "insertion-smart", n, took); let took = bench(InsertionSort {smart : false}, &values, &counter); println!("{} {} {}", "insertion-dumb", n, took); let took = bench(SelectionSort, &values, &counter); println!("{} {} {}", "selection", n, took); let took = bench(QuickSort, &values, &counter); println!("{} {} {}", "quick", n, took); let took = bench(StdSorter, &values, &counter); println!("{} {} {}", "std", n, took); } } } fn bench<T: Ord + Clone, S: Sorter>( sorter: S, values: &[SortEvaluator<T>], counter: &Cell<usize> ) -> usize { let mut values: Vec<_> = values.into_iter().cloned().collect(); counter.set(0); sorter.sort(&mut values); let count = counter.get(); for i in 1..values.len() { assert!(values[i] >= values[i - 1]); } counter.get(); count }

*src/lib.rs :

pub trait Sorter { fn sort<T>(&self, slice: &mut[T]) where T: Ord; } mod bubblesort; mod insertionsort; mod selectionsort; mod quicksort; pub use bubblesort::BubbleSort; pub use insertionsort::InsertionSort; pub use quicksort::QuickSort; pub use selectionsort::SelectionSort; pub struct StdSorter; impl Sorter for StdSorter { fn sort<T>(&self, slice: &mut [T]) where T: Ord, { slice.sort(); } } #[cfg(test)] mod tests { use super::*; #[test] fn std_works() { let mut things = vec![4, 2, 3, 1]; StdSorter.sort(&mut things); assert_eq!(things, &[1, 2, 3, 4]); } }

plot.r :

#!/usr/bin/Rscript library(ggplot2) t <- read.table('values.dat', header = TRUE) t ggplot(t, aes(n, comparisons, colour = algorithm)) + geom_point() + geom_smooth() + scale_x_log10() + scale_y_log10()

接著測試執行時間,修改 src/bin/main.rs 的 main 函式以及 bench 函式 :

fn main() { let mut rand = rand::thread_rng(); let counter = Rc::new(Cell::new(0)); for &n in &[0, 1, 10, 100, 1000, 10000] { for _ in 0..10 { let mut values = Vec::with_capacity(n); for _ in 0..n{ values.push(SortEvaluator { t: rand.gen::<usize>(), cmps: Rc::clone(&counter), }); } - println!("algorithm n comparisons"); + println!("algorithm n comparisons time"); let took = bench(BubbleSort, &values, &counter); - println!("{} {} {}", "bubble", n, took); + println!("{} {} {} {}", "bubble", n, took.0, took.1); let took = bench(InsertionSort {smart : true}, &values, &counter); - println!("{} {} {}", "insertion-smart", n, took); + println!("{} {} {} {}", "insertion-smart", n, took.0, took.1); let took = bench(InsertionSort {smart : false}, &values, &counter); - println!("{} {} {}", "insertion-dumb", n, took); + println!("{} {} {} {}", "insertion-dumb", n, took.0, took.1); let took = bench(SelectionSort, &values, &counter); - println!("{} {} {}", "selection", n, took); + println!("{} {} {} {}", "selection", n, took.0, took.1); let took = bench(QuickSort, &values, &counter); - println!("{} {} {}", "quick", n, took); + println!("{} {} {} {}", "quick", n, took.0, took.1); let took = bench(StdSorter, &values, &counter); - println!("{} {} {}", "std", n, took); + println!("{} {} {} {}", "std", n, took.0, took.1); } } } fn bench<T: Ord + Clone, S: Sorter>( sorter: S, values: &[SortEvaluator<T>], counter: &Cell<usize> -) -> usize +) -> (usize, f64) { let mut values: Vec<_> = values.into_iter().cloned().collect(); counter.set(0); + let time = std::time::Instant::now(); sorter.sort(&mut values); + let took = time.elapsed(); let count = counter.get(); for i in 1..values.len() { assert!(values[i] >= values[i - 1]); } counter.get(); - count + (count, took.as_secs_f64()) }

修改腳本:

#!/usr/bin/Rscript library(ggplot2) t <- read.table('values.dat', header = TRUE) t -ggplot(t, aes(n, time, colour = algorithm)) + geom_point() + geom_smooth() + scale_x_log10() + scale_y_log10() +ggplot(t, aes(n, time, colour = algorithm)) + geom_point() + scale_x_log10() + scale_y_log10() + geom_line()

再次輸出並畫圖 :

$ cargo run --release > values.dat
$ R < plot.r --save
$ open Rplots.pdf

image

目前程式碼

src/bin/main.rs :

use orst::*; use std::cmp::Ordering; use rand::prelude::*; use std::cell::Cell; use std::rc::Rc; #[derive(Clone)] struct SortEvaluator<T> { t: T, cmps: Rc<Cell<usize>>, } impl<T: PartialEq> PartialEq for SortEvaluator<T> { fn eq(&self, other: &Self) -> bool { self.cmps.set(self.cmps.get() + 1); self.t == other.t } } impl<T: Eq> Eq for SortEvaluator<T> {} impl<T: PartialOrd> PartialOrd for SortEvaluator<T> { fn partial_cmp(&self, other: &Self) -> Option<Ordering> { self.cmps.set(self.cmps.get() + 1); self.t.partial_cmp(&other.t) } } impl<T: Ord> Ord for SortEvaluator<T> { fn cmp(&self, other: &Self) -> Ordering { self.cmps.set(self.cmps.get() + 1); self.t.cmp(&other.t) } } fn main() { let mut rand = rand::thread_rng(); let counter = Rc::new(Cell::new(0)); for &n in &[0, 1, 10, 100, 1000, 10000] { for _ in 0..10 { let mut values = Vec::with_capacity(n); for _ in 0..n{ values.push(SortEvaluator { t: rand.gen::<usize>(), cmps: Rc::clone(&counter), }); } println!("algorithm n comparisons time"); let took = bench(BubbleSort, &values, &counter); println!("{} {} {} {}", "bubble", n, took.0, took.1); let took = bench(InsertionSort {smart : true}, &values, &counter); println!("{} {} {} {}", "insertion-smart", n, took.0, took.1); let took = bench(InsertionSort {smart : false}, &values, &counter); println!("{} {} {} {}", "insertion-dumb", n, took.0, took.1); let took = bench(SelectionSort, &values, &counter); println!("{} {} {} {}", "selection", n, took.0, took.1); let took = bench(QuickSort, &values, &counter); println!("{} {} {} {}", "quick", n, took.0, took.1); let took = bench(StdSorter, &values, &counter); println!("{} {} {} {}", "std", n, took.0, took.1); } } } fn bench<T: Ord + Clone, S: Sorter>( sorter: S, values: &[SortEvaluator<T>], counter: &Cell<usize> ) -> (usize, f64) { let mut values: Vec<_> = values.into_iter().cloned().collect(); counter.set(0); let time = std::time::Instant::now(); sorter.sort(&mut values); let took = time.elapsed(); let count = counter.get(); for i in 1..values.len() { assert!(values[i] >= values[i - 1]); } counter.get(); (count, took.as_secs_f64()) }

src/lib.rs :

pub trait Sorter { fn sort<T>(&self, slice: &mut[T]) where T: Ord; } mod bubblesort; mod insertionsort; mod selectionsort; mod quicksort; pub use bubblesort::BubbleSort; pub use insertionsort::InsertionSort; pub use quicksort::QuickSort; pub use selectionsort::SelectionSort; pub struct StdSorter; impl Sorter for StdSorter { fn sort<T>(&self, slice: &mut [T]) where T: Ord, { slice.sort(); } } #[cfg(test)] mod tests { use super::*; #[test] fn std_works() { let mut things = vec![4, 2, 3, 1]; StdSorter.sort(&mut things); assert_eq!(things, &[1, 2, 3, 4]); } }

plot.r :

#!/usr/bin/Rscript library(ggplot2) t <- read.table('values.dat', header = TRUE) t ggplot(t, aes(n, time, colour = algorithm)) + geom_point() + scale_x_log10() + scale_y_log10() + geom_line()

Q : could we try an all decreasing slice, I think quick gets pretty bad with that
A : 確實輸入資料為反序的話 quick sort 表現很差,但我們這次直播主要並不是討論排序演算法本身,而是如何用 Rust 去實作排序法,所以我們才隨機的產生資料。實際上你也會想要用到 Timsort 之類的東西,而不是使用我們實作的其中任何一個。

待整理

  1. 1:02:45
  2. Object Safety
  3. Benchmarking 整個小節
  4. 2:17:00