
Crust of Rust: Lifetime Annotations

Livestream recording

  • Host information
    ​​​​wilson@wilson-HP-Pavilion-Plus-Laptop-14-eh0xxx ~/CrustOfRust> neofetch --stdout
    ​​​​wilson@wilson-HP-Pavilion-Plus-Laptop-14-eh0xxx 
    ​​​​----------------------------------------------- 
    ​​​​OS: Ubuntu 22.04.3 LTS x86_64 
    ​​​​Host: HP Pavilion Plus Laptop 14-eh0xxx 
    ​​​​Kernel: 6.2.0-37-generic 
    ​​​​Uptime: 22 mins 
    ​​​​Packages: 2367 (dpkg), 11 (snap) 
    ​​​​Shell: bash 5.1.16 
    ​​​​Resolution: 2880x1800 
    ​​​​DE: GNOME 42.9 
    ​​​​WM: Mutter 
    ​​​​WM Theme: Adwaita 
    ​​​​Theme: Yaru-dark [GTK2/3] 
    ​​​​Icons: Yaru [GTK2/3] 
    ​​​​Terminal: gnome-terminal 
    ​​​​CPU: 12th Gen Intel i5-12500H (16) @ 4.500GHz 
    ​​​​GPU: Intel Alder Lake-P 
    ​​​​Memory: 2876MiB / 15695MiB 
    
  • Rust compiler version:
    ​​​​wilson@wilson-HP-Pavilion-Plus-Laptop-14-eh0xxx ~/CrustOfRust> rustc --version
    ​​​​rustc 1.70.0 (90c541806 2023-05-31) (built from a source tarball)
    

Introduction

0:00:00

In the 2019 Rust Survey, a lot of people were asking for video content covering intermediate Rust content. So in this first video (possibly of many), we're going to investigate a case where you need multiple explicit lifetime annotations. We explore why they are needed, and why we need more than one in this particular case. We also talk about some of the differences between the string types and introduce generics over a self-defined trait in the process.

Q: Will I be able to follow at all if I have never seen rust before? I have done python and some C/C++ though
A: Hard to say. This video assumes you have already read the Rust book.

Rust Survey 2019 Results

Start a rust project

0:03:36
Create the Rust project:

$ cargo new --lib strsplit
$ cd strsplit
$ vim src/lib.rs

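At the top of the file, add a set of warn lint attributes (the "prelude"):

#![warn(missing_debug_implementations, rust_2018_idioms, missing_docs)]

We use warn rather than deny because the compiler gets smarter over time, which changes what some lints flag; you don't want a lint to break your build just because you compiled with a newer compiler. Early in development you may not want these warnings at all, since they distract from debugging. The point of adding the prelude now is that you won't forget the small details it checks for later in development.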

Struct and method definitions for StrSplit and first test

0:05:20

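First, sketch the prototype of the StrSplit constructor:

pub struct StrSplit {}

impl StrSplit {
    pub fn new(haystack: &str, delimiter: &str) -> Self {
        todo!() // body is filled in later
    }
}

haystack is the string to search, and delimiter is what we split on. The function returns Self, which refers to the type named in the impl block. Writing Self rather than StrSplit means that if the type is later renamed, the return type does not have to change, which keeps the code a bit more flexible.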

Next, implement Iterator for StrSplit:

// let x: StrSplit;
// for part in x {
// }
impl Iterator for StrSplit {
    type Item = &str;
    fn next(&mut self) -> Option<Self::Item> {}
}

A for loop really just calls next() on the iterator over and over, taking each Some value, and stops when it gets None.
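As a rough illustration of that desugaring, the sketch below writes the same loop twice: once with for, and once as explicit next() calls. The function names are hypothetical, and the second form is a simplification of what the compiler actually generates.

```rust
/// Sums the values with an ordinary `for` loop.
pub fn sum_with_for(values: Vec<i32>) -> i32 {
    let mut total = 0;
    for v in values {
        total += v;
    }
    total
}

/// Roughly what the `for` loop above turns into: obtain an iterator,
/// then keep calling `next()` until it returns `None`.
pub fn sum_desugared(values: Vec<i32>) -> i32 {
    let mut total = 0;
    let mut iter = values.into_iter();
    while let Some(v) = iter.next() {
        total += v;
    }
    total
}
```

Both functions return the same result for the same input; only the spelling of the loop differs.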

Then write a test case for the library:

#[test]
fn it_works() {
    let haystack = "a b c d e";
    let letters = StrSplit::new(haystack, " ");
    assert_eq!(letters, vec!["a", "b", "c", "d", "e"].into_iter());
}

Q: equality comparison compares element-wise?
(assert_eq!(letters, vec!["a", "b", "c", "d", "e"].into_iter());)
A: The comparison is element-wise: it checks that every element is equal.

Q: Will Higher-Kind lifetimes to be covered?
A: No. WK: read up on "Using higher-ranked trait bounds with generics" when there is time.

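Q: Won't that just add noise when debugging early prototypes? (about the prelude line)
A: Early in development you probably don't want the prelude warnings: you haven't written documentation or followed the conventions yet, so the compiler emits a pile of warnings that don't affect compilation. If the code also has a real error, those warnings distract you and make debugging harder. It is better to turn the prelude warnings on once the program has matured a bit.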

How you decide between a library and a binary

0:09:32

Q: how do you decide between library and binary and how do your check the library output results while coding?
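A: Anything you run from the command line is a binary; everything else is a library. A binary crate gets src/main.rs and a library crate gets src/lib.rs, and a single crate can have both. The way to check a library's output is to write test cases.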

Q: what do you use to mock external dependencies in your projects? I have tried mockall unit testing library but am hoping to find something that does not rely on traits for mocking.
A: Not covered in this video, though there are good ways to do it.

:question: 0:10:22 (I still need to work out what this means.)
Q: i thought all loops desugared to loop with a break condition
A: while loops desugar to loop as well. Describing for as turning into while is just easier to explain, but you're right: deeper down, while itself turns into loop.

Start implementing StrSplit

0:10:58

Go back and define the fields StrSplit needs:

pub struct StrSplit {
    remainder: &str,
    delimiter: &str,
}

remainder is the part of the string we have not looked at yet, and delimiter is what we split on.

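The full implementation of the StrSplit constructor:

impl StrSplit {
    pub fn new(haystack: &str, delimiter: &str) -> Self {
        Self {
            remainder: haystack,
            delimiter,
        }
    }
}

When a field and the variable assigned to it have the same name (the delimiter field), you can omit the colon and the value, which removes the duplication. The colon is needed only when the names differ (remainder: haystack).

Why is the string called haystack outside StrSplit but remainder inside it? Because StrSplit processes part of the string on each step and keeps whatever has not been processed yet, then continues from there until every character has been seen. Hence the internal name remainder.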

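Next, flesh out the Iterator implementation for StrSplit:

impl Iterator for StrSplit {
    type Item = &str;
    fn next(&mut self) -> Option<Self::Item> {
        if let Some(next_delim) = self.remainder.find(self.delimiter) {
            let until_delimiter = &self.remainder[..next_delim];
            self.remainder = &self.remainder[(next_delim + self.delimiter.len())..];
            Some(until_delimiter)
        } else if self.remainder.is_empty() {
            // TODO: bug
            None
        } else {
            let rest = self.remainder;
            self.remainder = "";
            Some(rest)
        }
    }
}

The goal of the code above is to find the next delimiter in remainder, split the string there, return the part before the delimiter, set remainder to the part after it, and repeat until the whole string has been processed.

if let is used because searching remainder for delimiter has two possible outcomes: found or not found.

  • If it is found, next_delim is bound by pattern matching. The pattern on the left is Some(...) and the value on the right is an Option; Some is one of the variants of Option, so the pattern matches and the value inside the Option is bound to next_delim (rather than the whole Option).
  • If it is not found, the pattern is Some but the value is None, so the if branch is not entered.

In [..next_delim], the omitted lower bound means the slice starts at the beginning of the string. In [(next_delim + self.delimiter.len())..], the omitted upper bound means the slice runs to the end of the string.

If the final else branch returned self.remainder without also executing self.remainder = "";, every later call would land in that else branch again and the iterator would never terminate.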
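The find-then-slice step can be tried in isolation. The following is a minimal sketch using a hypothetical helper, split_at_delim, which is not part of the stream's code; it assumes the delimiter is a non-empty string.

```rust
/// Splits `s` at the first occurrence of `delim`.
/// Returns the text before the delimiter and the text after it,
/// or `None` when the delimiter does not occur in `s`.
pub fn split_at_delim<'a>(s: &'a str, delim: &str) -> Option<(&'a str, &'a str)> {
    // `find` returns the byte index where the delimiter starts.
    let i = s.find(delim)?;
    // `..i` runs from the start of the string up to the delimiter;
    // `i + delim.len()..` runs from just past the delimiter to the end.
    Some((&s[..i], &s[i + delim.len()..]))
}
```

Calling it repeatedly on the second half of the result is essentially what next() does with remainder.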

Q: Is the cascaded Self {} really the "preferred" way of implementing that? I'm very much new to Rust and it seems a bit odd coming from other languages
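A: Jon likes Self because it keeps the code more flexible, but it has two costs:

  1. You lose local reasoning: you have to find out which type the impl block is for. That is no trouble in short code, but it gets harder when the impl block is long.
  2. Older compiler versions did not support Self this flexibly. The feature was added long ago, though, so this matters little in practice.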

Q: Jon, maybe you could explain later when should I use associated types VS generics, they don't look that different to me, and thus I always use generics
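A: Use a generic type parameter when one type may need several implementations of the trait (one per type argument). Use an associated type when only one implementation makes sense for a given type.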
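A small sketch of that distinction follows. The traits Container and ConvertTo are hypothetical names made up for illustration, not traits from the standard library or the stream.

```rust
/// Associated type: each implementor picks exactly one `Item`.
pub trait Container {
    type Item;
    fn head(&self) -> Option<&Self::Item>;
}

impl<T> Container for Vec<T> {
    type Item = T;
    fn head(&self) -> Option<&T> {
        // Uses the slice method `first`.
        self.first()
    }
}

/// Generic parameter: the same type can implement the trait many times,
/// once for each `T`.
pub trait ConvertTo<T> {
    fn convert(&self) -> T;
}

impl ConvertTo<String> for i32 {
    fn convert(&self) -> String {
        self.to_string()
    }
}

impl ConvertTo<f64> for i32 {
    fn convert(&self) -> f64 {
        f64::from(*self)
    }
}
```

Iterator uses an associated type for the same reason: a given iterator type yields one kind of item.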

When to use match vs if let some

0:16:15

Q: When do you use match ... with vs if let Some(...)?
A: Use match when there are several patterns to match; use if let when you care about only one pattern.
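To make the rule of thumb concrete, here is a short sketch with two hypothetical functions: describe handles several patterns and so uses match, while first_word cares about a single pattern and uses if let.

```rust
/// Several cases to distinguish: `match` reads best.
pub fn describe(pos: Option<usize>) -> String {
    match pos {
        Some(0) => "at the start".to_string(),
        Some(n) => format!("at byte {}", n),
        None => "not found".to_string(),
    }
}

/// Only the `Some` case needs special handling: `if let` is enough.
pub fn first_word(s: &str) -> &str {
    if let Some(space) = s.find(' ') {
        &s[..space]
    } else {
        s
    }
}
```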

Doesn't compile! missing lifetime specifier

0:17:10

Current code
pub struct StrSplit {
    remainder: &str,
    delimiter: &str,
}

impl StrSplit {
    pub fn new(haystack: &str, delimiter: &str) -> Self {
        Self {
            remainder: haystack,
            delimiter,
        }
    }
}

impl Iterator for StrSplit {
    type Item = &str;
    fn next(&mut self) -> Option<Self::Item> {
        if let Some(next_delim) = self.remainder.find(self.delimiter) {
            let until_delimiter = &self.remainder[..next_delim];
            self.remainder = &self.remainder[(next_delim + self.delimiter.len())..];
            Some(until_delimiter)
        } else if self.remainder.is_empty() {
            // TODO: bug
            None
        } else {
            let rest = self.remainder;
            self.remainder = "";
            Some(rest)
        }
    }
}

#[test]
fn it_works() {
    let haystack = "a b c d e";
    let letters = StrSplit::new(haystack, " ");
    assert_eq!(letters, vec!["a", "b", "c", "d", "e"].into_iter());
}

Trying to compile the current code produces errors; only the key part is shown here:

$ cargo test
...
error[E0106]: missing lifetime specifier
...
$ cargo check  # shows each error only once

Next, change the code as the compiler suggests.

Current code
#![warn(missing_debug_implementations, rust_2018_idioms, missing_docs)]

pub struct StrSplit<'a> {
    remainder: &'a str,
    delimiter: &'a str,
}

impl StrSplit<'_> {
    pub fn new(haystack: &str, delimiter: &str) -> Self {
        Self {
            remainder: haystack,
            delimiter,
        }
    }
}

impl Iterator for StrSplit<'_> {
    type Item = &str;
    fn next(&mut self) -> Option<Self::Item> {
        if let Some(next_delim) = self.remainder.find(self.delimiter) {
            let until_delimiter = &self.remainder[..next_delim];
            self.remainder = &self.remainder[(next_delim + self.delimiter.len())..];
            Some(until_delimiter)
        } else if self.remainder.is_empty() {
            // TODO: bug
            None
        } else {
            let rest = self.remainder;
            self.remainder = "";
            Some(rest)
        }
    }
}

#[test]
fn it_works() {
    let haystack = "a b c d e";
    let letters = StrSplit::new(haystack, " ");
    assert_eq!(letters, vec!["a", "b", "c", "d", "e"].into_iter());
}

About next(): <'_> is the anonymous lifetime. It works essentially like type inference (the compiler works out how long each lifetime is). type Item = &str; also needs a lifetime parameter, because the item type is &str (a pointer to a str) and Rust does not know how long the program may hold on to that pointer, so it cannot guarantee that the value outlives the pointer. (We never want the pointer to outlive the value, because that would be a dangling pointer.)

:question: 0:18:55
(I need to clarify this concept further.)

Give Item a lifetime parameter as well:

impl<'a> Iterator for StrSplit<'a>
{
    type Item = &'a str;
    fn next(&mut self) -> Option<Self::Item>

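Giving Item a lifetime parameter also gives the return value of next() a lifetime. Every call to next() takes &mut self, which ties that lifetime to StrSplit (fn next(&mut self) -> Option<Self::Item> is effectively fn next(self: &mut StrSplit<'a>) -> Option<&'a str>). StrSplit in turn is tied to the lifetime of the haystack argument, so the returned value ends up tied to the lifetime of haystack as well.

With these changes the code still does not compile; we keep improving it below.

:pencil2: As an aside, if you change the lifetime parameter after StrSplit from <'a> to <'_> and compile, the compiler reports: error[E0207]: the lifetime parameter 'a is not constrained by the impl trait, self type, or predicates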

Can I be wrong by specifying lifetimes?

0:20:33
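Q: can I be wrong by specifying lifetimes?
A: No, because incorrect lifetimes do not compile. It is like accidentally using the wrong type: when you call a function that expects one type and pass another, the compiler compares the two, finds the mismatch, and compilation fails.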

Anonymous lifetime '_

0:21:25

Q: how to tell where anonymous lifetime can be used?
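A: <'_> tells the compiler to infer the lifetime itself. The compiler can only do this when there is exactly one possible guess. (This does not mean every '_ in the same impl block denotes the same lifetime. Each use of '_ gets its own distinct lifetime, although some of them may turn out to be equally long.)
To elaborate: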

impl Foo 
{
    fn get_ref(&self) -> &'_ str {}
}

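In get_ref(), the only lifetime available is that of &self, so the compiler can infer that the return value lives as long as the &self argument. There is therefore no need to write it out in the following form: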

impl Foo 
{
    fn get_ref<'a>(&'a self) -> &'a str {}
}
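For reference, the three spellings can be placed side by side in a compilable form. This is a sketch that assumes Foo holds a String field called name; the stream does not show Foo's definition, so that field is an invention for illustration.

```rust
/// Hypothetical definition of `Foo`; the `name` field is assumed.
pub struct Foo {
    pub name: String,
}

impl Foo {
    /// Fully elided: the result borrows from `&self`.
    pub fn get_ref(&self) -> &str {
        &self.name
    }

    /// Anonymous lifetime: same meaning, but the borrow is visible in the signature.
    pub fn get_ref_anon(&self) -> &'_ str {
        &self.name
    }

    /// Named lifetime: what the two forms above are shorthand for.
    pub fn get_ref_named<'a>(&'a self) -> &'a str {
        &self.name
    }
}
```

All three signatures describe the same relationship between the argument and the return value.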

Q: What is the difference between 'a and '_ ?
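A: The underscore in '_ tells the compiler to infer the lifetime itself. Since we know there is only one choice the compiler can make, we can safely leave it to the compiler instead of handling it ourselves. 'a, by contrast, is a specific named lifetime, somewhat like a generic T.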

Order lifetimes based on how long they are

0:23:10

Q: Is there any kind of ordering on lifetime specifiers? Like, is 'a > 'b? Or is it just a way of grouping references together as a unit?
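A: Yes, through subtyping.
For example, the special lifetime 'static lasts from the point of declaration until the end of the program. So a lifetime 'a can be shorter than 'static. The name of a lifetime parameter does not matter; you could just as well call it 'b.

Q: How can the compiler know something is wrong and yet be unable to infer the right answer?
A: Consider this example: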

fn multiply(x: (), y: i32) -> i32 
{
    
}

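The compiler knows this is wrong, because a unit value () cannot be multiplied with an i32. What it does not know is which type x was supposed to have; only you know that. So the compiler can report the error but cannot tell you the correct answer.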

:question: 0:24:43
Q: why would you not elide the lifetime if your leaving the ’_ in the type
A: You basically want to elide whenever you can. (???) In some cases, '_ tells the compiler not to consider this lifetime for the purpose of guessing.

Q: there is a way to use multiple lifetimes specifiers at same impl?
A: That comes later.

Q: Is there any kind of ordering on lifetime specifiers? Like, is 'a > 'b? Or is it just a way of grouping references together as a unit?
A: Yes, but we won't need it this time.

Anonymous lifetime '_ (with multiple lifetimes)

0:25:18

Q: Does '_ can be used when there is only one possible lifetime? So the compiler can guess properly
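A: Not exactly. Two forms were shown on stream:

  • Explicit (generic) lifetimes:
    fn foo<'x, 'y>(x: &'x str, y: &'y str) -> &'x str {}

  • '_:
    fn foo(x: &str, y: &'_ str) -> &'_ str {}

    As explained on stream, a '_ on a parameter stands for an arbitrary, unique lifetime that nothing else shares, and a '_ on the return type asks the compiler to infer it, the intent being that the result is tied to x rather than to y. Note, however, that rustc applies the ordinary elision rules here: with two reference parameters and no &self it cannot choose between x and y, and it rejects this signature with E0106 (missing lifetime specifier). When the return value borrows from x, use the explicit form above.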

Q: So in other words, the life time of StrSplit.remainder and StrSplit.delimiter, is now tied to the lifetime of the StrSplit itself?

pub struct StrSplit<'a> {
    remainder: &'a str,
    delimiter: &'a str,
}

A: Not quite. WK: StrSplit.remainder and StrSplit.delimiter are tied to the lifetimes of the arguments passed in, not to StrSplit itself.

Compile error: lifetime of reference outlives lifetime of borrowed content

0:26:52

Current code
#![warn(missing_debug_implementations, rust_2018_idioms, missing_docs)]

pub struct StrSplit<'a> {
    remainder: &'a str,
    delimiter: &'a str,
}

impl StrSplit<'_> {
    pub fn new(haystack: &str, delimiter: &str) -> Self {
        Self {
            remainder: haystack,
            delimiter,
        }
    }
}

impl<'a> Iterator for StrSplit<'a> {
    type Item = &'a str;
    fn next(&mut self) -> Option<Self::Item> {
        if let Some(next_delim) = self.remainder.find(self.delimiter) {
            let until_delimiter = &self.remainder[..next_delim];
            self.remainder = &self.remainder[(next_delim + self.delimiter.len())..];
            Some(until_delimiter)
        } else if self.remainder.is_empty() {
            // TODO: bug
            None
        } else {
            let rest = self.remainder;
            self.remainder = "";
            Some(rest)
        }
    }
}

#[test]
fn it_works() {
    let haystack = "a b c d e";
    let letters = StrSplit::new(haystack, " ");
    assert_eq!(letters, vec!["a", "b", "c", "d", "e"].into_iter());
}

On my machine the compiler reports the following errors:

$ cargo check
    Checking strsplit v0.1.0 (/home/wilson/CrustOfRust/strsplit)
error: lifetime may not live long enough
  --> src/lib.rs:10:9
   |
8  |       pub fn new(haystack: &str, delimiter: &str) -> Self
   |                            -                         ---- return type is StrSplit<'2>
   |                            |
   |                            let's call the lifetime of this reference `'1`
9  |       {
10 | /         Self {
11 | |             remainder: haystack,
12 | |             delimiter,
13 | |         }
   | |_________^ associated function was supposed to return data with lifetime `'2` but it is returning data with lifetime `'1`

error: lifetime may not live long enough
  --> src/lib.rs:10:9
   |
8  |       pub fn new(haystack: &str, delimiter: &str) -> Self
   |                                             -        ---- return type is StrSplit<'2>
   |                                             |
   |                                             let's call the lifetime of this reference `'3`
9  |       {
10 | /         Self {
11 | |             remainder: haystack,
12 | |             delimiter,
13 | |         }
   | |_________^ associated function was supposed to return data with lifetime `'2` but it is returning data with lifetime `'3`

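On Jon's machine the compiler reported the following error:
[image: screenshot of the compiler error on Jon's machine]

:pencil2: My compiler is newer than the one Jon used and its inference is stronger, so I did not get the same error message Jon got.

The lifetime of Self is 'a (inferred by the compiler), and remainder should likewise get 'a (from the definition of StrSplit<'a>), but it is given the lifetime of haystack instead. The compiler does not know whether the haystack reference lives longer or shorter than the lifetime of StrSplit. The same applies to delimiter.

If the caller dropped the haystack/delimiter values right after calling new(), the fields of StrSplit could point at values that no longer exist, which would be dangling pointers.

Improve the program by tying the lifetimes of the StrSplit fields to the lifetimes of the arguments:

-impl StrSplit<'_> {
+impl<'a> StrSplit<'a> {
-    pub fn new(haystack: &str, delimiter: &str) -> Self
+    pub fn new(haystack: &'a str, delimiter: &'a str) -> Self
     {
         Self {
             remainder: haystack,
             delimiter,
         }
     }
 }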

Q: why do we use generic names like 'a, 'b, etc. for lifetimes and not proper names like (typical) variables?
A: We will make the lifetime parameter names more descriptive in a moment.

Q: how resilient is the anonymous lifetime? will you get yourself in trouble if you rely on it too much or is the compiler going to pick correctly the vast majority of the time?
A: Use anonymous lifetimes whenever you can.

Q: Can you impose restrictions between lifetimes?
A: Yes. You can declare several lifetime parameters on an impl block and state relationships between them, for example that 'a must outlive 'b (live at least as long as 'b). We won't go into that here.
Q: why is the 'a next to the "impl" keyword needed?

impl<'a> StrSplit<'a> {
    pub fn new(haystack: &'a str, delimiter: &'a str) -> Self {
        Self {
            remainder: haystack,
            delimiter,
        }
    }
}

A: Consider the following example:

  • Incorrect version
    struct Foo<T>;
    impl Foo<T> {}

    The compiler tells you that you are using T but that it does not know what T is.
  • Correct version
    struct Foo<T>;
    impl<T> Foo<T> {}

    This says that the impl block is generic over T.
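A compilable variant of the correct version is sketched below. It uses a hypothetical Wrapper type with a field of type T, because a struct that declares T without using it (as in the shorthand struct Foo<T>; above) is itself rejected by the compiler.

```rust
/// Hypothetical generic type; `T` is used by the `value` field.
pub struct Wrapper<T> {
    pub value: T,
}

// `impl<T>` declares `T`; `Wrapper<T>` then uses it.
// Lifetimes work the same way: `impl<'a> StrSplit<'a>` declares `'a` first.
impl<T> Wrapper<T> {
    pub fn new(value: T) -> Self {
        Self { value }
    }

    pub fn into_inner(self) -> T {
        self.value
    }
}
```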

Q: The Rust typesystem has two bottom types
A: Yes.

Q: "subtyping" is actually the language used for lifetimes in the Rustonomicon
A: Yes.

Current code
// #![warn(missing_debug_implementations, rust_2018_idioms, missing_docs)]

pub struct StrSplit<'a> {
    remainder: &'a str,
    delimiter: &'a str,
}

impl<'a> StrSplit<'a> {
    pub fn new(haystack: &'a str, delimiter: &'a str) -> Self {
        Self {
            remainder: haystack,
            delimiter,
        }
    }
}

impl<'a> Iterator for StrSplit<'a> {
    type Item = &'a str;
    fn next(&mut self) -> Option<Self::Item> {
        if let Some(next_delim) = self.remainder.find(self.delimiter) {
            let until_delimiter = &self.remainder[..next_delim];
            self.remainder = &self.remainder[(next_delim + self.delimiter.len())..];
            Some(until_delimiter)
        } else if self.remainder.is_empty() {
            // TODO: bug
            None
        } else {
            let rest = self.remainder;
            self.remainder = "";
            Some(rest)
        }
    }
}

#[test]
fn it_works() {
    let haystack = "a b c d e";
    let letters = StrSplit::new(haystack, " ");
    assert_eq!(letters, vec!["a", "b", "c", "d", "e"].into_iter());
}

Have the compiler check the current code once more. This time it passes:

$ cargo check
    Finished dev [unoptimized + debuginfo] target(s) in 0.04s

Static lifetime

0:34:45

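Why is it OK to assign a &'static str to a &'a str?

self.remainder = "";
  • This is where subtyping comes in. If a place has some lifetime, you may assign to it a reference with any lifetime, or with a specific lifetime, provided that the assigned value lives at least as long as the place you assign it to. As before, the rule exists to prevent dangling pointers.
  • As for why "" is 'static: at compile time the literal really is placed in the initialized data section of the binary stored on disk.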
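The assignment can be reproduced in isolation. The sketch below uses a hypothetical Holder type that stands in for StrSplit: it stores a &'a str borrowed from a local String and then overwrites it with the 'static literal "".

```rust
/// Hypothetical stand-in for StrSplit: holds one borrowed string.
pub struct Holder<'a> {
    pub text: &'a str,
}

impl<'a> Holder<'a> {
    /// Assigns a `&'static str` to a `&'a str` field.
    /// This is accepted because `'static` outlives every `'a`.
    pub fn clear(&mut self) {
        self.text = "";
    }
}

/// Borrows from a short-lived local, then replaces the borrow with a literal.
pub fn demo() -> usize {
    let owned = String::from("abc");
    let mut holder = Holder { text: &owned };
    holder.clear();
    holder.text.len()
}
```

The reverse direction, storing a short-lived borrow where 'static is required, is what the compiler refuses.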
Current code
// #![warn(missing_debug_implementations, rust_2018_idioms, missing_docs)]

#[derive(Debug)]
pub struct StrSplit<'a> {
    remainder: &'a str,
    delimiter: &'a str,
}

impl<'a> StrSplit<'a> {
    pub fn new(haystack: &'a str, delimiter: &'a str) -> Self {
        Self {
            remainder: haystack,
            delimiter,
        }
    }
}

impl<'a> Iterator for StrSplit<'a> {
    type Item = &'a str;
    fn next(&mut self) -> Option<Self::Item> {
        if let Some(next_delim) = self.remainder.find(self.delimiter) {
            let until_delimiter = &self.remainder[..next_delim];
            self.remainder = &self.remainder[(next_delim + self.delimiter.len())..];
            Some(until_delimiter)
        } else if self.remainder.is_empty() {
            // TODO: bug
            None
        } else {
            let rest = self.remainder;
            self.remainder = "";
            Some(rest)
        }
    }
}

#[test]
fn it_works() {
    let haystack = "a b c d e";
    let letters = StrSplit::new(haystack, " ");
    // assert_eq!(letters, vec!["a", "b", "c", "d", "e"].into_iter());
    assert!(letters.eq(vec!["a", "b", "c", "d", "e"].into_iter()));
}

Run the tests again; they pass too:

$ cargo test
...
running 1 test
test it_works ... ok
...

Q: everything by default has static lifetime?
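A: A value's lifetime depends on when it is dropped. If a value is not declared 'static but is never dropped, it gives the illusion of having a static lifetime. Variables declared inside a function live on the stack; when the function returns, those local values are cleaned up, and at that point the lifetimes of the function's local variables have ended.

Q: can i think about strsplit like a foldr?
A: No. All StrSplit does is split a string.

Make the test case more concise:

#[test]
fn it_works() {
    let haystack = "a b c d e";
    let letters: Vec<_> = StrSplit::new(haystack, " ").collect();
    assert_eq!(letters, vec!["a", "b", "c", "d", "e"]);
}

Q: Don't variables die at end-of-scope, not just return?
A: As long as a value has not been dropped, its lifetime has not ended. A value is dropped when it goes out of scope, and that is when its lifetime expires. Going back to whether values are 'static by default: again, it depends on how long the value stays in memory. Lifetimes do not default to 'static.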

Bug when a delimiter tails a string

0:41:27

Add a test case with the delimiter at the tail of the string. The last substring is expected to be "":

#[test]
fn tail() {
    let haystack = "a b c d ";
    let letters: Vec<_> = StrSplit::new(haystack, " ").collect();
    assert_eq!(letters, vec!["a", "b", "c", "d", ""]);
}
Current code
// #![warn(missing_debug_implementations, rust_2018_idioms, missing_docs)]

#[derive(Debug)]
pub struct StrSplit<'a> {
    remainder: &'a str,
    delimiter: &'a str,
}

impl<'a> StrSplit<'a> {
    pub fn new(haystack: &'a str, delimiter: &'a str) -> Self {
        Self {
            remainder: haystack,
            delimiter,
        }
    }
}

impl<'a> Iterator for StrSplit<'a> {
    type Item = &'a str;
    fn next(&mut self) -> Option<Self::Item> {
        if let Some(next_delim) = self.remainder.find(self.delimiter) {
            let until_delimiter = &self.remainder[..next_delim];
            self.remainder = &self.remainder[(next_delim + self.delimiter.len())..];
            Some(until_delimiter)
        } else if self.remainder.is_empty() {
            // TODO: bug
            None
        } else {
            let rest = self.remainder;
            self.remainder = "";
            Some(rest)
        }
    }
}

#[test]
fn it_works() {
    let haystack = "a b c d e";
    let letters = StrSplit::new(haystack, " ");
    assert!(letters.eq(vec!["a", "b", "c", "d", "e"].into_iter()));
}

#[test]
fn tail() {
    let haystack = "a b c d ";
    let letters: Vec<_> = StrSplit::new(haystack, " ").collect();
    assert_eq!(letters, vec!["a", "b", "c", "d", ""]);
}

The actual result, however, is:

$ cargo test
  left: `["a", "b", "c", "d"]`,
 right: `["a", "b", "c", "d", ""]`', src/lib.rs:50:5

So next() has to change. This is the part we need to modify:

else if self.remainder.is_empty() {
            // TODO: bug
            None
        } else {
            let rest = self.remainder;
            self.remainder = "";
            Some(rest)
        }

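Here remainder ends up as "", and we need to tell two situations apart: remainder is "" because we are done, or remainder is "" but we have not yielded it yet.

To solve this, go back to the StrSplit struct and change the type of remainder to Option. This is the key step, because we are about to use Option's take() method to take ownership of the value. The constructor has to change as well, wrapping the incoming argument in Some:

 #[derive(Debug)]
 pub struct StrSplit<'a> {
-    remainder: &'a str,
+    remainder: Option<&'a str>,
     delimiter: &'a str,
 }

 impl<'a> StrSplit<'a> {
     pub fn new(haystack: &'a str, delimiter: &'a str) -> Self {
         Self {
-            remainder: haystack,
+            remainder: Some(haystack),
             delimiter,
         }
     }
 }

Q: lifetimes are for stack allocated memory? heap allocations like String don't have specified lifetimes?
A: Heap values have lifetimes too: once a heap value is dropped, its lifetime is over. A heap value may also never be dropped, in which case it behaves as if it were static. The way to get a heap value that is never dropped is Box::leak, which returns a 'static reference. This feature is not the same thing as a memory leak.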
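A minimal sketch of the Box::leak point follows; leak_string is a hypothetical helper written for illustration. It deliberately gives up ownership of a heap allocation so that the reference to it can be 'static.

```rust
/// Converts an owned String into a `&'static str` by leaking its allocation.
/// The memory is never freed, so this should be used sparingly.
pub fn leak_string(s: String) -> &'static str {
    // `into_boxed_str` turns the String into a Box<str>;
    // `Box::leak` then hands back a reference that is valid forever.
    Box::leak(s.into_boxed_str())
}
```

The returned reference can be stored anywhere a 'static reference is required, at the cost of the allocation living until the program exits.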

Q: If you dumped the binary, could you spot the static allocation ?
A: You can see static allocations in a dump, but not the empty string (""), because the compiler optimizes it away.

We originally meant to fix only part of next(), but ended up rewriting the whole function:

impl<'a> Iterator for StrSplit<'a> {
    type Item = &'a str;
    fn next(&mut self) -> Option<Self::Item> {
        if let Some(ref mut remainder) = self.remainder {
            // similar in spirit to `let remainder = &mut self.remainder` (a mutable borrow),
            // not `let mut remainder = &self.remainder;`
            if let Some(next_delim) = remainder.find(self.delimiter) {
                let until_delimiter = &remainder[..next_delim];
                *remainder = &remainder[(next_delim + self.delimiter.len())..];
                Some(until_delimiter)
            } else {
                // the first time we are left with the empty string, this lets us yield it
                self.remainder.take()
            }
        } else {
            // after the last piece has been yielded, take() has moved the value out
            // of self.remainder, so it is None
            None
        }
    }
}

if let only checks whether the pattern matches; it does not compare the two sides for equality.

Q: what does ref keyword mean?
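A: Consider the following:

  • An approach that does not do what we want (although it is valid syntax):
    if let Some(remainder) = self.remainder {

    This takes the value out of self.remainder instead of borrowing it.
  • An approach that does what we want:
    If the pattern matches (Some(ref mut x) against an Option holding y), x becomes a mutable borrow &mut y, rather than y being moved into x.
        if let Some(ref mut remainder /* &mut &'a str */) = self.remainder /* Option<&'a str> */ {

    :pencil2: If the pattern is Some(mut x) and it matches an Option holding y, x receives y itself as a mutable binding; that moves y into x.

    All we want is a &mut to the value inside self.remainder when it is Some. The mut here means we can change the value the reference points at; it does not mean we can make the reference point somewhere else. The line below changes the referenced value:
        *remainder = &remainder[(next_delim + self.delimiter.len())..]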
    

What is the ref keyword and why not &

0:48:07

Q: what is ref keyword means? Is it same as & ?
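A: Consider the following:

  • An approach that does not do what we want (although it is valid syntax):
    if let Some(&mut remainder) = self.remainder

    This pattern matches when the right-hand side is Some(&mut T); remainder is then bound to the T behind the reference.
  • There are two ways to do what we want:
    1. Match a right-hand side of type Option<T>; with ref mut, remainder then has type &mut T:
      if let Some(ref mut remainder) = self.remainder

    2. Q: if let Some(remainder) = &mut self.remainder {} ?
      A: A less preferred way of writing it, but it still does what we want.
          if let Some(remainder) = &mut self.remainder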
      

What's the * on the left of remainder

0:51:36
Q: what is the deref on the left side of the assignment doing?
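A: Consider this line:

*remainder = &remainder[(next_delim + self.delimiter.len())..]

Type of the left-hand side (remainder): &mut &'a str (a pointer to a pointer)
Type of the right-hand side: &'a str (a pointer)
Because the two sides have different types, the left-hand side has to be dereferenced before the right-hand value can be assigned to it.

:bulb: The compiler reads &remainder[(next_delim + self.delimiter.len())..] in this order:

  1. remainder[(next_delim + self.delimiter.len())..] selects a range of the string.
  2. &remainder[(next_delim + self.delimiter.len())..] takes a reference to that range.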
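The binding modes and the dereferencing assignment discussed in the last few answers can be compared in a small, self-contained sketch. The three functions below are hypothetical and operate on a plain Option<&str> instead of self.remainder; each tries to drop the first byte of the stored string, which assumes a non-empty ASCII string.

```rust
/// `ref mut` borrows the value inside the Option: `rest` is `&mut &str`,
/// so writing through `*rest` updates the caller's Option.
pub fn advance_with_ref_mut(slot: &mut Option<&str>) {
    if let Some(ref mut rest) = *slot {
        *rest = &rest[1..];
    }
}

/// Matching on a `&mut Option<_>` gives the same `&mut &str` binding.
pub fn advance_with_borrowed_scrutinee(slot: &mut Option<&str>) {
    if let Some(rest) = slot {
        *rest = &rest[1..];
    }
}

/// A plain binding copies the `&str` out of the Option, so only the
/// local copy advances and the caller's Option is left untouched.
pub fn advance_a_copy<'a>(slot: &mut Option<&'a str>) -> Option<&'a str> {
    if let Some(mut rest) = *slot {
        rest = &rest[1..];
        Some(rest)
    } else {
        None
    }
}
```

Only the first two functions change the value the caller sees, which is the behaviour next() needs for self.remainder.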

What is take() doing

0:52:46
Q: What is the ".take()" call doing ?
A: See the explanation below:

self.remainder.take()
// impl<T> Option<T> { fn take(&mut self) -> Option<T> }
// if the Option is None, return None
// if the Option is Some, set the Option to None and return the Some
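The effect of take() on the trailing empty string can be checked directly. In this sketch, yield_once is a hypothetical wrapper that mirrors the else branch of next().

```rust
/// Returns whatever is stored in `slot` and leaves `None` behind,
/// so a stored value (even the empty string) is handed out exactly once.
pub fn yield_once<'a>(slot: &mut Option<&'a str>) -> Option<&'a str> {
    slot.take()
}
```

Calling it twice on Some("") yields Some("") the first time and None the second time, which is how the iterator distinguishes "empty but not yet yielded" from "finished".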

:pencil2: Every let statement is a pattern match.

Simplify the body of next():

impl<'a> Iterator for StrSplit<'a> {
    type Item = &'a str;
    fn next(&mut self) -> Option<Self::Item> {
        let ref mut remainder = self.remainder?;
        // the line above can also be written as:
        // let remainder = &mut self.remainder?;
        if let Some(next_delim) = remainder.find(self.delimiter) {
            let until_delimiter = &remainder[..next_delim];
            *remainder = &remainder[(next_delim + self.delimiter.len())..];
            Some(until_delimiter)
        } else {
            self.remainder.take()
        }
    }
}
Current code (still needs fixing: it loops forever)
// #![warn(missing_debug_implementations, rust_2018_idioms, missing_docs)]

#[derive(Debug)]
pub struct StrSplit<'a> {
    remainder: Option<&'a str>,
    delimiter: &'a str,
}

impl<'a> StrSplit<'a> {
    pub fn new(haystack: &'a str, delimiter: &'a str) -> Self {
        Self {
            remainder: Some(haystack),
            delimiter,
        }
    }
}

impl<'a> Iterator for StrSplit<'a> {
    type Item = &'a str;
    fn next(&mut self) -> Option<Self::Item> {
        let ref mut remainder = self.remainder?;
        if let Some(next_delim) = remainder.find(self.delimiter) {
            let until_delimiter = &remainder[..next_delim];
            *remainder = &remainder[(next_delim + self.delimiter.len())..];
            Some(until_delimiter)
        } else {
            self.remainder.take()
        }
    }
}

#[test]
fn it_works() {
    let haystack = "a b c d e";
    let letters = StrSplit::new(haystack, " ");
    assert!(letters.eq(vec!["a", "b", "c", "d", "e"].into_iter()));
}

#[test]
fn tail() {
    let haystack = "a b c d ";
    let letters: Vec<_> = StrSplit::new(haystack, " ").collect();
    assert_eq!(letters, vec!["a", "b", "c", "d", ""]);
}
$ cargo test
    Finished test [unoptimized + debuginfo] target(s) in 0.00s
     Running unittests src/lib.rs (target/debug/deps/strsplit-a9fa65918e243300)

running 2 tests
test it_works ... FAILED

Mutable references are one level deep

0:54:48

Q: If self is mutable here, why is self.remainder not mutable by default? (Coming from a C background, I'm thinking about this kind of like const)

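A: Mutable references are only one level deep. Taking &mut self lets us modify any field of self, but not necessarily what a field points to. Take delimiter as an example: it points to an immutable string, so the string it points to cannot be modified, but delimiter itself can be changed to point to a different immutable string.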
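A short sketch of the "one level deep" rule, using a hypothetical Splitter struct with a single delimiter field:

```rust
/// Hypothetical struct holding a shared reference, like StrSplit's delimiter.
pub struct Splitter<'a> {
    pub delimiter: &'a str,
}

/// With `&mut Splitter` we may re-point the field at another string...
pub fn retarget<'a>(splitter: &mut Splitter<'a>, new_delimiter: &'a str) {
    splitter.delimiter = new_delimiter;
    // ...but we may not modify the string it points to, because the field
    // is a shared `&str`. The following line would not compile:
    // splitter.delimiter.make_ascii_uppercase();
}
```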

Solving a hang with as_mut()

0:55:39

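The infinite loop above happens because self.remainder is never updated: ref mut does not have the intended effect, for the following reason:

let ref mut remainder = self.remainder?;

The ? operator returns None from the function if self.remainder is None; otherwise it yields the value inside the Some, like unwrapping a package. One might expect the statement above to do what we want, but the value inside the self.remainder Option is a Copy type, so the expression copies the value instead of moving it. As a result, the remainder on the left (a borrow of ptrRemainderCopy) and self.remainder on the right (ptrRemainder) refer to two different pointers: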

  • Move case (what we want)
    [Diagram: ref mut remainder (&ptrRemainderMove) points to ptrRemainderMove (&str); ptrRemainderMove and ptrRemainder (&str) both point to the string data "a b c d e".]
  • Copy case (the current situation)
    [Diagram: ref mut remainder (&ptrRemainderCopy) points to ptrRemainderCopy (&str); ptrRemainderCopy and ptrRemainder (&str) both point to the string data "a b c d e".]

So when we execute *remainder = &remainder[(next_delim + self.delimiter.len())..];, only the copy (ptrRemainderCopy) changes. self.remainder (ptrRemainder) is not updated along with it, which ultimately causes the infinite loop.

:question: Open question: why is the &'a str inside the Option a Copy type?
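On the open question: shared references (&T) are always Copy, and Option<T> is Copy whenever T is Copy, so Option<&'a str> is Copy as well. The sketch below, built from hypothetical helper functions, checks that bound at compile time and reproduces the difference between borrowing a copy and borrowing in place with as_mut(). It assumes non-empty ASCII input.

```rust
/// Compiles only if `T` implements `Copy`.
fn assert_copy<T: Copy>() {}

/// Type-checks because `&str` is `Copy`, hence `Option<&str>` is too.
pub fn option_of_str_is_copy() {
    assert_copy::<Option<&str>>();
}

/// `?` works on a copy of the Option, so `rest` borrows a temporary
/// and the caller's Option never changes.
pub fn advance_copy(slot: &mut Option<&str>) -> Option<()> {
    let ref mut rest = (*slot)?;
    *rest = &rest[1..];
    Some(())
}

/// `as_mut()` yields `Option<&mut &str>`, so `rest` points into the
/// caller's Option and the update is visible afterwards.
pub fn advance_in_place(slot: &mut Option<&str>) -> Option<()> {
    let rest = slot.as_mut()?;
    *rest = &rest[1..];
    Some(())
}
```

After advance_copy the stored string is unchanged, which is exactly why next() kept finding the same delimiter.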

To make the left side borrow the value in place, call as_mut() on the right side:

 impl<'a> Iterator for StrSplit<'a> {
     type Item = &'a str;
     fn next(&mut self) -> Option<Self::Item> {
-        let ref mut remainder = self.remainder?;
+        let remainder = self.remainder.as_mut()?;
+        // impl<T> Option<T> { fn as_mut(&mut self) -> Option<&mut T> }
+        // combined with ? to unwrap, this does what we want
         if let Some(next_delim) = remainder.find(self.delimiter) {
             let until_delimiter = &remainder[..next_delim];
             *remainder = &remainder[(next_delim + self.delimiter.len())..];
             Some(until_delimiter)
         } else {
             self.remainder.take()
         }
     }
 }
Current code
// #![warn(missing_debug_implementations, rust_2018_idioms, missing_docs)]

#[derive(Debug)]
pub struct StrSplit<'a> {
    remainder: Option<&'a str>,
    delimiter: &'a str,
}

impl<'a> StrSplit<'a> {
    pub fn new(haystack: &'a str, delimiter: &'a str) -> Self {
        Self {
            remainder: Some(haystack),
            delimiter,
        }
    }
}

impl<'a> Iterator for StrSplit<'a> {
    type Item = &'a str;
    fn next(&mut self) -> Option<Self::Item> {
        let remainder = self.remainder.as_mut()?;
        if let Some(next_delim) = remainder.find(self.delimiter) {
            let until_delimiter = &remainder[..next_delim];
            *remainder = &remainder[(next_delim + self.delimiter.len())..];
            Some(until_delimiter)
        } else {
            self.remainder.take()
        }
    }
}

#[test]
fn it_works() {
    let haystack = "a b c d e";
    let letters = StrSplit::new(haystack, " ");
    assert!(letters.eq(vec!["a", "b", "c", "d", "e"].into_iter()));
}

#[test]
fn tail() {
    let haystack = "a b c d ";
    let letters: Vec<_> = StrSplit::new(haystack, " ").collect();
    assert_eq!(letters, vec!["a", "b", "c", "d", ""]);
}

This fixes the infinite loop:

$ cargo test
...
running 2 tests
test tail ... ok
test it_works ... ok
...

Multiple lifetimes, implementing until_char

0:57:49

So far we have not dealt with multiple lifetimes. That is what we turn to now.

First, add an until_char() function:

pub fn until_char(s: &str, c: char) -> &str {
    StrSplit::new(s, &format!("{}", c))
        .next()
        .expect("StrSplit always gives at least one result!")
}

Add a test case:

#[test]
fn until_char_test() {
    assert_eq!(until_char("hello world", 'o'), "hell");
}
Current code
// #![warn(missing_debug_implementations, rust_2018_idioms, missing_docs)]

#[derive(Debug)]
pub struct StrSplit<'a> {
    remainder: Option<&'a str>,
    delimiter: &'a str,
}

impl<'a> StrSplit<'a> {
    pub fn new(haystack: &'a str, delimiter: &'a str) -> Self {
        Self {
            remainder: Some(haystack),
            delimiter,
        }
    }
}

impl<'a> Iterator for StrSplit<'a> {
    type Item = &'a str;
    fn next(&mut self) -> Option<Self::Item> {
        let remainder = self.remainder.as_mut()?;
        if let Some(next_delim) = remainder.find(self.delimiter) {
            let until_delimiter = &remainder[..next_delim];
            *remainder = &remainder[(next_delim + self.delimiter.len())..];
            Some(until_delimiter)
        } else {
            self.remainder.take()
        }
    }
}

pub fn until_char(s: &str, c: char) -> &str {
    StrSplit::new(s, &format!("{}", c))
        .next()
        .expect("StrSplit always gives at least one result!")
}

#[test]
fn until_char_test() {
    assert_eq!(until_char("hello world", 'o'), "hell");
}

#[test]
fn it_works() {
    let haystack = "a b c d e";
    let letters = StrSplit::new(haystack, " ");
    assert!(letters.eq(vec!["a", "b", "c", "d", "e"].into_iter()));
}

#[test]
fn tail() {
    let haystack = "a b c d ";
    let letters: Vec<_> = StrSplit::new(haystack, " ").collect();
    assert_eq!(letters, vec!["a", "b", "c", "d", ""]);
}

The compiler reports the following error:

$ cargo check
    Checking strsplit v0.1.0 (/home/wilson/CrustOfRust/strsplit)
error[E0515]: cannot return value referencing temporary value
  --> src/lib.rs:35:5
   |
35 |       StrSplit::new(s, &format!("{}", c))
   |       ^                 ---------------- temporary value created here
   |  _____|
   | |
36 | |         .next()
37 | |         .expect("StrSplit always gives at least one result!")
   | |_____________________________________________________________^ returns a value referencing data owned by the current function

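The return value's lifetime is tied to &format!("{}", c), but that temporary String is dropped when the function returns, so the returned reference would point into invalid memory. Why is the return value tied to &format!("{}", c) rather than to s? Because we declared both parameters of new() with the same lifetime 'a. Since &format!("{}", c) has the shorter lifetime (it lives only inside the function), 'a becomes that shorter lifetime (when several lifetimes have to be unified, the shortest one wins). The return value is therefore tied to the function body, because &format!("{}", c) is alive only while the function is running. What we actually want is:

pub fn until_char<'s>(s: &'s str, c: char) -> &'s str

How do we tell Rust that this is OK? We need more than one lifetime.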

Difference between a str and a String

1:03:19

Q: Should we copy the delimiter into our struct?
A: We could declare delimiter as a String, which would avoid the multiple-lifetime problem:

#[derive(Debug)]
pub struct StrSplit<'a> {
    remainder: Option<&'a str>,
    delimiter: String,
}

Because a String is heap-allocated and owns its data, no lifetime is attached to it.

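  1. str -> [char]
    str is similar to [char] (strictly speaking it is a sequence of UTF-8 bytes rather than of char values). str has no statically known size because, like a slice, it is just a sequence of characters; the type itself does not know how long the sequence is.
  2. &str -> &[char]
    &str is a fat pointer. A fat pointer is a two-word value: a pointer to the first element of the slice plus the length of the slice.
    It can point anywhere in memory, e.g. to data on the stack, on the heap, or in static memory.
  3. String -> Vec<char>
    String is heap-allocated and its length can change dynamically.
    • If you have a String, getting a &str is easy:
      String -> &str (cheap, AsRef)
    • Going from &str to String involves copying the data and a heap allocation:
      &str -> String (expensive, Clone)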
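The conversions and sizes described above can be checked with a few lines of code. This is a sketch with hypothetical helper names; it also shows that str lengths are counted in UTF-8 bytes, which is where the [char] analogy is only approximate.

```rust
/// String -> &str: cheap, no allocation, just a borrowed view.
pub fn borrow_cheaply(owned: &String) -> &str {
    owned.as_str()
}

/// &str -> String: allocates on the heap and copies the bytes.
pub fn copy_to_heap(borrowed: &str) -> String {
    borrowed.to_string()
}

/// Number of machine words in a `&str` (pointer plus length).
pub fn words_in_str_ref() -> usize {
    std::mem::size_of::<&str>() / std::mem::size_of::<usize>()
}

/// Length in bytes versus number of `char`s; they differ for non-ASCII text.
pub fn bytes_vs_chars(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}
```

For ASCII-only input such as "a b c d e" the two counts returned by bytes_vs_chars are equal, which is why the byte-based slicing in next() behaves as expected in the tests.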

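But declaring delimiter as a String has two downsides:

  1. It requires a heap allocation, which costs performance.
  2. Using String means you need a memory allocator, which makes the library incompatible with environments that have none, such as some embedded devices.

So we will not use the String approach here.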

Multiple lifetimes (continued)

1:08:15

You usually do not need multiple lifetimes; they come up only in special cases such as the one discussed here. We hold several references, the important point is that those references do not point to the same thing, and we want the return value to be tied to only one of them:

 #[derive(Debug)]
-pub struct StrSplit<'a>
+pub struct StrSplit<'haystack, 'delimiter>
 {
-    remainder: Option<&'a str>,
+    remainder: Option<&'haystack str>,
-    delimiter: &'a str,
+    delimiter: &'delimiter str,
 }

-impl<'a> StrSplit<'a> {
+impl<'haystack, 'delimiter> StrSplit<'haystack, 'delimiter> {
-    pub fn new(haystack: &'a str, delimiter: &'a str) -> Self
+    pub fn new(haystack: &'haystack str, delimiter: &'delimiter str) -> Self
     {
         Self {
             remainder: Some(haystack),
             delimiter,
         }
     }
 }

-impl<'a> Iterator for StrSplit<'a>
+impl<'haystack, 'delimiter> Iterator for StrSplit<'haystack, 'delimiter>
 {
-    type Item = &'a str;
+    type Item = &'haystack str;
+    // Now the return value is tied to the haystack only
     fn next(&mut self) -> Option<Self::Item> {
         ...
     }
 }
 ...

The compiler no longer requires the arguments passed to new() to share the same lifetime.

Next, deliberately replace Some(until_delimiter) with Some(self.delimiter).

Current code
// #![warn(missing_debug_implementations, rust_2018_idioms, missing_docs)]

#[derive(Debug)]
pub struct StrSplit<'haystack, 'delimiter>
{
    remainder: Option<&'haystack str>,
    delimiter: &'delimiter str,
}

impl<'haystack, 'delimiter> StrSplit<'haystack, 'delimiter> {
    pub fn new(haystack: &'haystack str, delimiter: &'delimiter str) -> Self
    {
        Self {
            remainder: Some(haystack),
            delimiter,
        }
    }
}

impl<'haystack, 'delimiter> Iterator for StrSplit<'haystack, 'delimiter>
{
    type Item = &'haystack str;
    fn next(&mut self) -> Option<Self::Item> {
        let remainder = self.remainder.as_mut()?;
        if let Some(next_delim) = remainder.find(self.delimiter) {
            let until_delimiter = &remainder[..next_delim];
            *remainder = &remainder[(next_delim + self.delimiter.len())..];
            Some(self.delimiter)
        } else {
            self.remainder.take()
        }
    }
}

pub fn until_char(s: &str, c: char) -> &str
{
    StrSplit::new(s, &format!("{}", c))
        .next()
        .expect("StrSplit always gives at least one result!")
}

#[test]
fn until_char_test() {
    assert_eq!(until_char("hello world", 'o'), "hell");
}

#[test]
fn it_works() {
    let haystack = "a b c d e";
    let letters = StrSplit::new(haystack, " ");
    assert!(letters.eq(vec!["a", "b", "c", "d", "e"].into_iter()));
}

#[test]
fn tail() {
    let haystack = "a b c d ";
    let letters: Vec<_> = StrSplit::new(haystack, " ").collect();
    assert_eq!(letters, vec!["a", "b", "c", "d", ""]);
}

The compiler now reports the following error:

$ cargo check
error: lifetime may not live long enough
  --> src\lib.rs:28:13
   |
17 | impl<'haystack, 'delimiter> Iterator for StrSplit<'haystack, 'delimiter>
   |      ---------  ---------- lifetime `'delimiter` defined here
   |      |
   |      lifetime `'haystack` defined here
...
28 |             Some(self.delimiter)
   |             ^^^^^^^^^^^^^^^^^^^^ method was supposed to return data with lifetime `'haystack` but it is returning data with lifetime `'delimiter`
   |
   = help: consider adding the following bound: `'delimiter: 'haystack`

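As an experiment, add a where bound telling the compiler that 'delimiter lives at least as long as 'haystack. The bound reads like "'delimiter implements 'haystack" and relies on the subtyping relationship mentioned earlier:

impl<'haystack, 'delimiter> Iterator for StrSplit<'haystack, 'delimiter> where 'delimiter: 'haystack

With this bound, declaring type Item = &'haystack str; and then returning Some(self.delimiter) compiles. But the earlier compile error comes back: the return value's lifetime is once again as short as that of &format!("{}", c), which ends when the function exits.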
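A minimal sketch of what such an outlives bound permits, using a hypothetical function pick_second that is not part of the stream's code:

```rust
/// `'b: 'a` reads "'b outlives 'a". Without this bound, returning `y` is
/// rejected, because the compiler cannot assume that `y` is valid for `'a`.
fn pick_second<'a, 'b: 'a>(_x: &'a str, y: &'b str) -> &'a str {
    y
}
```

The bound is a promise the caller has to keep: every caller must now supply a second argument that lives at least as long as the first, which is exactly what until_char cannot do with its temporary String.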

Current code
// #![warn(missing_debug_implementations, rust_2018_idioms, missing_docs)]

#[derive(Debug)]
pub struct StrSplit<'haystack, 'delimiter>
{
    remainder: Option<&'haystack str>,
    delimiter: &'delimiter str,
}

impl<'haystack, 'delimiter> StrSplit<'haystack, 'delimiter> {
    pub fn new(haystack: &'haystack str, delimiter: &'delimiter str) -> Self
    {
        Self {
            remainder: Some(haystack),
            delimiter,
        }
    }
}

impl<'haystack, 'delimiter> Iterator for StrSplit<'haystack, 'delimiter>
where
    'delimiter: 'haystack
{
    type Item = &'haystack str;
    fn next(&mut self) -> Option<Self::Item> {
        let remainder = self.remainder.as_mut()?;
        if let Some(next_delim) = remainder.find(self.delimiter) {
            let until_delimiter = &remainder[..next_delim];
            *remainder = &remainder[(next_delim + self.delimiter.len())..];
            Some(self.delimiter)
        } else {
            self.remainder.take()
        }
    }
}

pub fn until_char(s: &str, c: char) -> &str
{
    StrSplit::new(s, &format!("{}", c))
        .next()
        .expect("StrSplit always gives at least one result!")
}

#[test]
fn until_char_test() {
    assert_eq!(until_char("hello world", 'o'), "hell");
}

#[test]
fn it_works() {
    let haystack = "a b c d e";
    let letters = StrSplit::new(haystack, " ");
    assert!(letters.eq(vec!["a", "b", "c", "d", "e"].into_iter()));
}

#[test]
fn tail() {
    let haystack = "a b c d ";
    let letters: Vec<_> = StrSplit::new(haystack, " ").collect();
    assert_eq!(letters, vec!["a", "b", "c", "d", ""]);
}

The compiler reports the following error:

$ cargo check
error[E0515]: cannot return value referencing temporary value
  --> src\lib.rs:39:5
   |
39 |       StrSplit::new(s, &format!("{}", c))
   |       ^                 ---------------- temporary value created here
   |  _____|
   | |
40 | |         .next()
41 | |         .expect("StrSplit always gives at least one result!")
   | |_____________________________________________________________^ returns a value referencing data owned by the current function

Modify the code as follows:

 ...
 impl<'haystack, 'delimiter> Iterator for StrSplit<'haystack, 'delimiter>
-where
-    'delimiter: 'haystack
 {
     type Item = &'haystack str;
     fn next(&mut self) -> Option<Self::Item> {
         let remainder = self.remainder.as_mut()?;
         if let Some(next_delim) = remainder.find(self.delimiter) {
             let until_delimiter = &remainder[..next_delim];
             *remainder = &remainder[(next_delim + self.delimiter.len())..];
-            Some(self.delimiter)
+            Some(until_delimiter)
         } else {
             self.remainder.take()
         }
     }
 }

-pub fn until_char(s: &str, c: char) -> &str
+pub fn until_char<'s>(s : &'s str, c: char) -> &'s str
 {
     StrSplit::new(s, &format!("{}", c))
         .next()
         .expect("StrSplit always gives at least one result!")
 }

 #[test]
 fn until_char_test() {
     assert_eq!(until_char("hello world", 'o'), "hell");
 }

 #[test]
 fn it_works() {
     let haystack = "a b c d e";
-    let letters = StrSplit::new(haystack, " ");
+    let letters: Vec<_> = StrSplit::new(haystack, " ").collect();
-    assert!(letters.eq(vec!["a", "b", "c", "d", "e"].into_iter()));
+    assert_eq!(letters, vec!["a", "b", "c", "d", "e"]);
 }
 ...

Current code
// #![warn(missing_debug_implementations, rust_2018_idioms, missing_docs)]

#[derive(Debug)]
pub struct StrSplit<'haystack, 'delimiter>
{
    remainder: Option<&'haystack str>,
    delimiter: &'delimiter str,
}

impl<'haystack, 'delimiter> StrSplit<'haystack, 'delimiter> {
    pub fn new(haystack: &'haystack str, delimiter: &'delimiter str) -> Self
    {
        Self {
            remainder: Some(haystack),
            delimiter,
        }
    }
}

impl<'haystack, 'delimiter> Iterator for StrSplit<'haystack, 'delimiter>
{
    type Item = &'haystack str;
    fn next(&mut self) -> Option<Self::Item> {
        let remainder = self.remainder.as_mut()?;
        if let Some(next_delim) = remainder.find(self.delimiter) {
            let until_delimiter = &remainder[..next_delim];
            *remainder = &remainder[(next_delim + self.delimiter.len())..];
            Some(until_delimiter)
        } else {
            self.remainder.take()
        }
    }
}

pub fn until_char<'s>(s : &'s str, c: char) -> &'s str
{
    StrSplit::new(s, &format!("{}", c))
        .next()
        .expect("StrSplit always gives at least one result!")
}

#[test]
fn until_char_test() {
    assert_eq!(until_char("hello world", 'o'), "hell");
}

#[test]
fn it_works() {
    let haystack = "a b c d e";
    let letters: Vec<_> = StrSplit::new(haystack, " ").collect();
    assert_eq!(letters, vec!["a", "b", "c", "d", "e"]);
}

#[test]
fn tail() {
    let haystack = "a b c d ";
    let letters: Vec<_> = StrSplit::new(haystack, " ").collect();
    assert_eq!(letters, vec!["a", "b", "c", "d", ""]);
}

Run the tests again:

$ cargo test
   Compiling strsplit v0.1.0 (/home/wilson/CrustOfRust/strsplit)
    Finished test [unoptimized + debuginfo] target(s) in 0.28s
     Running unittests src/lib.rs (target/debug/deps/strsplit-dd83426d9f98ae71)

running 3 tests
test until_char_test ... ok
test it_works ... ok
test tail ... ok

test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

   Doc-tests strsplit

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

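Why do we not need to tie the return value's lifetime to &format!("{}", c)? Because the return value never uses the delimiter's data, so there is no reason to tie the two together.

Q: can you put _ for delimiter lifetime to say it's not needed?
A: Yes. The two signatures can be changed as follows:

  • '_ : an anonymous lifetime. It stands for some unique lifetime, unrelated to 'haystack, that we do not need to name.
    impl<'haystack> Iterator for StrSplit<'haystack, '_>

  • The compiler must tie '_ to s, because there is no other lifetime it could be tied to.
    pub fn until_char(s : &str, c: char) -> &'_ str 

    Jon thinks the compiler should suggest writing '_ in this signature to make explicit that the lifetime is inferred, but in practice the code also compiles without '_:
    pub fn until_char(s : &str, c: char) -> &str 


format!("{}", c) returns a String, so we still pay for a heap allocation. Next we get rid of that allocation.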

Generic delimiter (Delimiter trait)

1:15:24

How can we avoid the allocation made by &format!("{}", c)? Instead of converting c into a String, how can the delimiter be any type that knows how to find itself in a string?

First, add a new trait:

pub trait Delimiter {
    fn find_next(&self, s: &str) -> Option<(usize, usize)>;
}

Then do the following:

  1. Replace every 'delimiter lifetime parameter with a generic type D
  2. Add a new bound to the Iterator impl that contains next()
  3. Use find_next() from the new Delimiter trait
 #[derive(Debug)]
-pub struct StrSplit<'haystack, 'delimiter>
+pub struct StrSplit<'haystack, D>
 {
     remainder: Option<&'haystack str>,
-    delimiter: &'delimiter str,
+    delimiter: D,
 }

-impl<'haystack, 'delimiter> StrSplit<'haystack, 'delimiter> {
+impl<'haystack, D> StrSplit<'haystack, D> {
-    pub fn new(haystack: &'haystack str, delimiter: &'delimiter str) -> Self
+    pub fn new(haystack: &'haystack str, delimiter: D) -> Self
     {
         Self {
             remainder: Some(haystack),
             delimiter,
         }
     }
 }

-impl<'haystack, 'delimiter> Iterator for StrSplit<'haystack, 'delimiter>
+impl<'haystack, D> Iterator for StrSplit<'haystack, D>
+where
+    D: Delimiter
 {
     type Item = &'haystack str;
     fn next(&mut self) -> Option<Self::Item> {
         let remainder = self.remainder.as_mut()?;
-        if let Some(next_delim) = remainder.find(self.delimiter) {
+        if let Some((delim_start, delim_end)) = self.delimiter.find_next(remainder) {
-            let until_delimiter = &remainder[..next_delim];
+            let until_delimiter = &remainder[..delim_start];
-            *remainder = &remainder[(next_delim + self.delimiter.len())..];
+            *remainder = &remainder[delim_end..];
             Some(until_delimiter)
         } else {
             self.remainder.take()
         }
     }
 }
 ...

Next, implement the Delimiter trait for &str:

impl Delimiter for &str {
    fn find_next(&self, s: &str) -> Option<(usize, usize)> {
        s.find(self).map(|start| (start, start + self.len()))
    }
}

Also change &format!("{}", c) to &*format!("{}", c) (which has type &str), because Delimiter is implemented for &str but not for &String.

:bulb: Converting a String to a &str
Excerpt from What does &* combined together do in Rust?:

let s = "hi".to_string();  // : String
let a = &s;

What's the type of a? It's simply &String! This shouldn't be very surprising, since we take the reference of a String. Ok, but what about this?

let s = "hi".to_string();  // : String
let b = &*s;   // equivalent to `&(*s)`

What's the type of b? It's &str! Wow, what happened?

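The change above makes the code generic over any D. D can be a reference, or it can be a type that lives as long as it likes; the only requirement is that the type you pass implements the Delimiter trait.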
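To illustrate that D really can be any type, here is a self-contained sketch with a hypothetical delimiter, AnyOf, that matches any one of several characters. AnyOf is an assumption made for this example and does not appear in the stream; the trait and StrSplit are repeated only so the block compiles on its own:

```rust
pub trait Delimiter {
    fn find_next(&self, s: &str) -> Option<(usize, usize)>;
}

pub struct StrSplit<'haystack, D> {
    remainder: Option<&'haystack str>,
    delimiter: D,
}

impl<'haystack, D> StrSplit<'haystack, D> {
    pub fn new(haystack: &'haystack str, delimiter: D) -> Self {
        Self { remainder: Some(haystack), delimiter }
    }
}

impl<'haystack, D: Delimiter> Iterator for StrSplit<'haystack, D> {
    type Item = &'haystack str;
    fn next(&mut self) -> Option<Self::Item> {
        let remainder = self.remainder.as_mut()?;
        if let Some((start, end)) = self.delimiter.find_next(remainder) {
            let until_delimiter = &remainder[..start];
            *remainder = &remainder[end..];
            Some(until_delimiter)
        } else {
            self.remainder.take()
        }
    }
}

/// Hypothetical delimiter: matches the first char that is in the given set.
pub struct AnyOf<'d>(pub &'d [char]);

impl Delimiter for AnyOf<'_> {
    fn find_next(&self, s: &str) -> Option<(usize, usize)> {
        s.char_indices()
            .find(|(_, c)| self.0.contains(c))
            // The end index is start plus the byte length of the matched char.
            .map(|(start, c)| (start, start + c.len_utf8()))
    }
}
```

StrSplit itself needs no changes to support the new delimiter; only the trait impl is added.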

Current code (it compiles but still heap-allocates; the only goal at this step is to make the delimiter generic):
#[derive(Debug)]
pub struct StrSplit<'haystack, D>
{
    remainder: Option<&'haystack str>,
    delimiter: D,
}

impl<'haystack, D> StrSplit<'haystack, D> {
    pub fn new(haystack: &'haystack str, delimiter: D) -> Self
    {
        Self {
            remainder: Some(haystack),
            delimiter,
        }
    }
}

pub trait Delimiter {
    fn find_next(&self, s: &str) -> Option<(usize, usize)>;
}

impl<'haystack, D> Iterator for StrSplit<'haystack, D>
where
    D: Delimiter
{
    type Item = &'haystack str;
    fn next(&mut self) -> Option<Self::Item> {
        let remainder = self.remainder.as_mut()?;
        if let Some((delim_start, delim_end)) = self.delimiter.find_next(remainder) {
            let until_delimiter = &remainder[..delim_start];
            *remainder = &remainder[delim_end..];
            Some(until_delimiter)
        } else {
            self.remainder.take()
        }
    }
}

impl Delimiter for &str {
    // Here &self has type &&str
    fn find_next(&self, s: &str) -> Option<(usize, usize)> {
        s.find(self).map(|start| (start, start + self.len()))
        /*
        What does s.find(self) do?
        find is a method on str: given a pattern (here another string), it
        returns the byte index where the pattern first occurs, as an Option.
        map(...) passes None through unchanged; for Some, it changes the
        value inside to (start, start + self.len()).

        Q: why self.len() and not s.len()?
        A: self is the delimiter we are searching for, so self.len() is the
           delimiter's length. Adding it to start gives the end position of
           the delimiter we just found.
        */
    }
}

pub fn until_char(s : &str, c: char) -> &str
{
    StrSplit::new(s, &*format!("{}", c))
        .next()
        .expect("StrSplit always gives at least one result!")
}

#[test]
fn until_char_test() {
    assert_eq!(until_char("hello world", 'o'), "hell");
}

#[test]
fn it_works() {
    let haystack = "a b c d e";
    let letters: Vec<_> = StrSplit::new(haystack, " ").collect();
    assert_eq!(letters, vec!["a", "b", "c", "d", "e"]);
}

#[test]
fn tail() {
    let haystack = "a b c d ";
    let letters: Vec<_> = StrSplit::new(haystack, " ").collect();
    assert_eq!(letters, vec!["a", "b", "c", "d", ""]);
}

Next, implement the Delimiter trait for char:

impl Delimiter for char {
    fn find_next(&self, s: &str) -> Option<(usize, usize)> {
        s.char_indices()
            .find(|(_, c)| c == self)
            .map(|(start, _)| (start, start + 1))
    }
}
  1. char_indices() : iterates over the whole string
  2. find(...) : searches for the character we are looking for
  3. map(...) : turns the result of find into the value we want, (start, start + 1), where + 1 assumes the char has length 1

Then replace StrSplit::new(s, &*format!("{}", c)) with StrSplit::new(s, c), and we have a version that does not heap-allocate.

Current code
#[derive(Debug)]
pub struct StrSplit<'haystack, D>
{
    remainder: Option<&'haystack str>,
    delimiter: D,
}

impl<'haystack, D> StrSplit<'haystack, D> {
    pub fn new(haystack: &'haystack str, delimiter: D) -> Self
    {
        Self {
            remainder: Some(haystack),
            delimiter,
        }
    }
}

pub trait Delimiter {
    fn find_next(&self, s: &str) -> Option<(usize, usize)>;
}

impl<'haystack, D> Iterator for StrSplit<'haystack, D>
where
    D: Delimiter
{
    type Item = &'haystack str;
    fn next(&mut self) -> Option<Self::Item> {
        let remainder = self.remainder.as_mut()?;
        if let Some((delim_start, delim_end)) = self.delimiter.find_next(remainder) {
            let until_delimiter = &remainder[..delim_start];
            *remainder = &remainder[delim_end..];
            Some(until_delimiter)
        } else {
            self.remainder.take()
        }
    }
}

impl Delimiter for &str {
    fn find_next(&self, s: &str) -> Option<(usize, usize)> {
        s.find(self).map(|start| (start, start + self.len()))
    }
}

impl Delimiter for char {
    fn find_next(&self, s: &str) -> Option<(usize, usize)> {
        s.char_indices()
            .find(|(_, c)| c == self)
            .map(|(start, _)| (start, start + 1))
    }
}

pub fn until_char(s : &str, c: char) -> &str
{
    StrSplit::new(s, c)
        .next()
        .expect("StrSplit always gives at least one result!")
}

#[test]
fn until_char_test() {
    assert_eq!(until_char("hello world", 'o'), "hell");
}

#[test]
fn it_works() {
    let haystack = "a b c d e";
    let letters: Vec<_> = StrSplit::new(haystack, " ").collect();
    assert_eq!(letters, vec!["a", "b", "c", "d", "e"]);
}

#[test]
fn tail() {
    let haystack = "a b c d ";
    let letters: Vec<_> = StrSplit::new(haystack, " ").collect();
    assert_eq!(letters, vec!["a", "b", "c", "d", ""]);
}

char length utf8

1:23:14

The 1 in start + 1 should be replaced with self.len_utf8(), because a char can occupy anywhere from 1 to 4 bytes in UTF-8:

impl Delimiter for char {
    fn find_next(&self, s: &str) -> Option<(usize, usize)> {
        s.char_indices()
            .find(|(_, c)| c == self)
            .map(|(start, _)| (start, start + self.len_utf8()))
    }
}
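A small sketch of why the byte length matters, using a hypothetical free function find_char that mirrors the impl above:

```rust
/// Finds `needle` in `s` and returns the byte range it occupies.
fn find_char(s: &str, needle: char) -> Option<(usize, usize)> {
    s.char_indices()
        .find(|(_, c)| *c == needle)
        .map(|(start, _)| (start, start + needle.len_utf8()))
}
```

For the string "a\u{e9}bc" (the letter e with an acute accent takes two bytes in UTF-8), the delimiter spans bytes 1..3. With start + 1 the end index would be 2, which falls in the middle of the character, and slicing the remainder at that index would panic.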

Standard library split

1:25:30
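Everything we implemented today has a more complete counterpart in the standard library:

  • str::find()
  • str::split()
    Notice that the standard library also gives the string being split a lifetime parameter, 'a, and that the delimiter (here a Pattern, which supports more complex matching) is likewise generic.
    pub fn split<'a, P>(&'a self, pat: P) -> Split<'a, P>
    where
        P: Pattern<'a>,

    Trait std::str::pattern::Pattern

Everything we built today can be done by calling the standard library's split() directly. The goal of this session was to explore lifetimes, not to teach the standard library API or to publish this code as a crate, since the standard library implementation is already complete.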
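For comparison, the same behaviour written with str::split might look like the sketch below. The function names until_char_std and split_on_space are made up for this example:

```rust
/// Same behaviour as the hand-written until_char, built on str::split.
fn until_char_std(s: &str, c: char) -> &str {
    s.split(c)
        .next()
        .expect("split always yields at least one item")
}

/// Splitting on a char needs no allocation for the delimiter.
fn split_on_space(s: &str) -> Vec<&str> {
    s.split(' ').collect()
}
```

As with StrSplit, the returned slices borrow from the input string only, and a trailing delimiter produces a final empty string.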

Q&A

1:27:39

Q: Why can't you create a String from the str fat pointer? You already know where the bytes are in memory and the length of it.
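A: Because you do not own the memory that the str fat pointer points to. A String assumes it owns its underlying memory: it assumes it must free that memory when it is dropped, and that it may grow or shrink the allocation when needed. Taking an arbitrary pointer and length and deciding "I own this now" would be incorrect, because you do not have ownership of the underlying memory.

Q: Don't you think Rust is kind of less readable than other languages like Go, Python? the syntax is kind of different I guess?
A: Not really. If you use only the features that other languages also have, Rust is just as readable. Rust adds some features that need extra syntax, and that extra syntax is what lets Rust do things other languages cannot.

Q: The pattern and the haystack seem to be sharing the same lifetime 'a
A: Because the Rust developers had not settled on a design yet, so they made the two the same for now. Addendum: the lifetime on the Searcher is the lifetime of the string that the Pattern is searching in.

Q: when you see something like Type<'x>, how do you know what x is the lifetime of?
A: You don't know what 'x is the lifetime of, just as when you see a type parameter T you don't know what T is.

Q: what do you think of Rust having a future in the industry?
A: See my earlier talk.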

Q: could you publish this as a gist so we can play with it?
A: github repo

Q: Can you make it work with stdin() as an input instead of a &str? :)
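A: It is not easy to implement, because stdin is a stream rather than a constant string, so you cannot seek in it. But you can try implementing it yourself.

Q: How do you think generic associates types will improve trait definitions? (aka think StreamIterator that allows complex iterators with borrowed items)
A: Not much, because what you need more of is existential types. It can help you clone a bit less, though.

Q: do you intend to do some lectures for newcomers to Rust from other languages and if not, is there some resources/streamers that you would recommend?
A: I have no plans to make videos for newcomers. Each video in this series focuses on a specific topic, and the target audience is intermediate Rust programmers only.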

:pencil2: GitHub Comment
Q : Sorry if this was already asked elsewhere, but I am still not sure why:

if let Some(ref mut remainder) = self.remainder {  // (1) ok
let ref mut remainder = self.remainder?;           // (2) nok
if let Some(remainder) = &mut self.remainder {     // (3) ok
let remainder = &mut self.remainder?;              // (4) nok
if let Some(remainder) = self.remainder.as_mut() { // ok
let remainder = self.remainder.as_mut()?;          // ok

I see that if a type is Copy, it gets copied instead of moved. But aren't we dealing with same types in 1 vs 2, 3 vs 4?
A : The ? in 2 and 4 copies the entire Option when the inner type is Copy. What we then take a mutable reference to is what is inside of that copy, not the original Option. In the last case, we turn Option<T> into Option<&mut T>, which, when copied, still yields a mutable reference into the original Option. Does that help?
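The difference between case (4) and the as_mut() case can be reproduced with a small sketch. The Holder type and its two methods are hypothetical and exist only for this illustration:

```rust
struct Holder<'a> {
    remainder: Option<&'a str>,
}

impl<'a> Holder<'a> {
    /// Case (4): `self.remainder?` copies the Option (it is Copy), so the
    /// mutable reference points at a temporary, not at the field.
    fn advance_copy(&mut self) -> Option<()> {
        let remainder = &mut self.remainder?;
        *remainder = &remainder[1..]; // only the temporary copy changes
        Some(())
    }

    /// as_mut() yields Option<&mut &str>, so `?` hands back a mutable
    /// reference into the original field.
    fn advance_in_place(&mut self) -> Option<()> {
        let remainder = self.remainder.as_mut()?;
        *remainder = &remainder[1..]; // the field itself changes
        Some(())
    }
}
```

Both methods compile, which is what makes the bug easy to miss: in StrSplit::next the copying variant would never advance the remainder.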

Q : Hey @jonhoo, how common is it to see or do:

impl Trait for &Type {...} //?
// such as
impl Delimiter for &str {...}

Additionally, if you intend to use both Type and &Type as implementers of Trait, for instance:

let x1: Vec<_> = StrSplit::new(haystack, MyOtherDelimiterFlavor::new(...)).collect();
let x2: Vec<_> = StrSplit::new(haystack, &MyOtherDelimiterFlavor::new(...)).collect();

would one then need to write redundant impl blocks, or is there some syntactic sugar for controlling this?
A : It's actually fairly common, precisely to improve the ergonomics of using the trait as you indicate. In general, if you implement a trait that only takes &self it's pretty reasonable to implement the trait for &MyType, &mut MyType and Box<MyType>. If a trait method takes &mut self, skip &MyType. But of course, the downside is that now if the trait changes in the future, more breakage will ensue since your consumers expected to be able to transparently use & (or &mut).
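One way to avoid writing a separate impl block per type is a single forwarding impl for references. The sketch below is an assumption about how this could be done for the Delimiter trait from these notes; it is not something shown in the stream, and first_match is a hypothetical helper:

```rust
pub trait Delimiter {
    fn find_next(&self, s: &str) -> Option<(usize, usize)>;
}

impl Delimiter for char {
    fn find_next(&self, s: &str) -> Option<(usize, usize)> {
        s.char_indices()
            .find(|(_, c)| c == self)
            .map(|(start, _)| (start, start + self.len_utf8()))
    }
}

/// Forwarding impl: a reference to any Delimiter is itself a Delimiter.
impl<T: Delimiter + ?Sized> Delimiter for &T {
    fn find_next(&self, s: &str) -> Option<(usize, usize)> {
        // self is &&T here; dereference twice to reach the underlying T.
        (**self).find_next(s)
    }
}

/// Generic helper that takes its delimiter by value, like StrSplit::new.
fn first_match<D: Delimiter>(delimiter: D, s: &str) -> Option<(usize, usize)> {
    delimiter.find_next(s)
}
```

With the forwarding impl in place, both first_match('o', "hello") and first_match(&'o', "hello") type-check. Similar forwarding impls can be written for &mut T and Box<T>.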

To be organized

  1. 0:10:22
  2. 0:18:55
  3. 0:24:43
  4. Why is the &'a str inside the Option a Copy type?