---
tags: RUST LANGUAGE
---
# [Crust of Rust](/jLixlQ1ASZaBHqCBDFu1Iw): Lifetime Annotations
==[Stream recording](https://www.youtube.com/watch?v=Oxaex566Irg)==
* Host information
```shell
wilson@wilson-HP-Pavilion-Plus-Laptop-14-eh0xxx ~/CrustOfRust> neofetch --stdout
wilson@wilson-HP-Pavilion-Plus-Laptop-14-eh0xxx
-----------------------------------------------
OS: Ubuntu 22.04.3 LTS x86_64
Host: HP Pavilion Plus Laptop 14-eh0xxx
Kernel: 6.2.0-37-generic
Uptime: 22 mins
Packages: 2367 (dpkg), 11 (snap)
Shell: bash 5.1.16
Resolution: 2880x1800
DE: GNOME 42.9
WM: Mutter
WM Theme: Adwaita
Theme: Yaru-dark [GTK2/3]
Icons: Yaru [GTK2/3]
Terminal: gnome-terminal
CPU: 12th Gen Intel i5-12500H (16) @ 4.500GHz
GPU: Intel Alder Lake-P
Memory: 2876MiB / 15695MiB
```
* Rust compiler version:
```shell
wilson@wilson-HP-Pavilion-Plus-Laptop-14-eh0xxx ~/CrustOfRust> rustc --version
rustc 1.70.0 (90c541806 2023-05-31) (built from a source tarball)
```
## Introduction
[0:00:00](https://www.youtube.com/watch?v=rAl-9HwD858&t=0s)
In the 2019 Rust Survey, a lot of people were asking for video content covering intermediate Rust content. So in this first video (possibly of many), we're going to investigate a case where you need multiple explicit lifetime annotations. We explore why they are needed, and why we need more than one in this particular case. We also talk about some of the differences between the string types and introduce generics over a self-defined trait in the process.
Q: Will I be able to follow at all if I have never seen rust before? I have done python and some C/C++ though
A: Not sure. This video is best watched after you have already read the Rust book.
[Rust Survey 2019 Results](https://blog.rust-lang.org/2020/04/17/Rust-survey-2019.html)
## Start a rust project
[0:03:36](https://www.youtube.com/watch?v=rAl-9HwD858&t=216s)
Set up the Rust project:
``` shell
$ cargo new --lib strsplit
$ cd strsplit
$ vim src/lib.rs
```
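At the very top of the file, add a crate-level `warn` attribute (Jon's usual "prelude" of lints, not to be confused with the standard library [prelude](https://doc.rust-lang.org/std/prelude/index.html)):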
```rust
#![warn(missing_debug_implementations, rust_2018_idioms, missing_docs)]
```
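`warn` is used instead of `deny` because the compiler gets smarter over time, and that can change what some [lints](https://doc.rust-lang.org/rustc/lints/levels.html) flag; you don't want a lint to break your build just because you compiled with a newer compiler. Early in development you may not want these warnings at all, since the extra output distracts from real errors while debugging. The point here is only that adding these lints keeps you from forgetting small details in the middle and late stages of development.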
## Struct and method definitions for StrSplit and first test
[0:05:20](https://www.youtube.com/watch?v=rAl-9HwD858&t=320s)
Start by sketching the constructor for `StrSplit`:
```rust=
pub struct StrSplit {}
impl StrSplit {
pub fn new(haystack: &str, delimiter: &str) -> Self {}
}
```
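`haystack` is the string to search through, and `delimiter` is what it gets split on. The return type is `Self`, which refers to the type named by the `impl` block. Writing `Self` rather than `StrSplit` means the return type does not have to change if the type is later renamed, which keeps the code a bit more flexible.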
:::info
:bulb: [Self vs. self](https://stackoverflow.com/questions/32304595/whats-the-difference-between-self-and-self)
:::
Next, implement `Iterator` for `StrSplit`:
```rust=
// let x: StrSplit;
// for part in x {
// }
impl Iterator for StrSplit
{
type Item = &str;
fn next(&mut self) -> Option<Self::Item> {}
}
```
A `for` loop really just calls `next()` on the iterator over and over: it keeps going while `next()` returns `Some`, and stops when it returns `None`.
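As a rough sketch of that desugaring (the helper names `sum_with_for` and `sum_with_while_let` are made up for illustration, and the real expansion has a few more details), the two functions below behave the same:

```rust
/// Sums the items with an ordinary `for` loop.
fn sum_with_for(items: Vec<i32>) -> i32 {
    let mut total = 0;
    for item in items {
        total += item;
    }
    total
}

/// Roughly what the `for` loop above expands to: turn the value into an
/// iterator, then call `next()` until it returns `None`.
fn sum_with_while_let(items: Vec<i32>) -> i32 {
    let mut total = 0;
    let mut iter = items.into_iter();
    while let Some(item) = iter.next() {
        total += item;
    }
    total
}
```

Any type that implements `Iterator`, such as the `StrSplit` being built here, can be driven by either form.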
Then write a test case for the library:
```rust=
#[test]
fn it_works()
{
let haystack = "a b c d e";
let letters = StrSplit::new(haystack, " ");
assert_eq!(letters, vec!["a", "b", "c", "d", "e"].into_iter());
}
```
Q: equality comparison compares element-wise?
(`assert_eq!(letters, vec!["a", "b", "c", "d", "e"].into_iter());`)
A: The comparison is intended to be element-wise: it checks that all elements are equal.
Q: Will Higher-Kind lifetimes to be covered?
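A: No. WK: [我也浅谈【泛型参数】的【晚·绑定late bound】](https://rustcc.cn/article?id=7568f6b8-72c4-4418-9247-fdfeddab6ad8) is worth reading when time permits.
Q: Won't that just add noise when debugging early prototypes? (about the prelude line)
A: In the early stage of development you probably don't want the prelude warnings: you haven't written documentation yet or followed every convention, so the compiler emits a pile of warnings that don't affect compilation. If the code then has a real error, the warnings distract you and make debugging harder. It is better to turn the prelude warnings on once the program has matured a bit.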
## How you decide between a library and a binary
[0:09:32](https://www.youtube.com/watch?v=rAl-9HwD858&t=572s)
Q: how do you decide between library and binary and how do your check the library output results while coding?
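A: Anything you run from the command line is a binary; everything else is a library. A binary crate is created with `src/main.rs`, a library crate with `src/lib.rs`, and a single crate can contain both. To check a library's output, write test cases.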
Q: what do you use to mock external dependencies in your projects? I have tried mockall unit testing library but am hoping to find something that does not rely on traits for mocking.
A: That is outside the scope of this video, though there are good ways to do it.
Q: i thought all loops desugared to `loop` with a break condition
A: `while` loops desugar to `loop` as well. It's easier to explain it as `for` turning into `while` (jumping straight from `for` to the most basic `loop` skips too many conceptual layers and is harder to follow), but you're right that deeper down `while` turns into `loop`.
## Start implementing StrSplit
[0:10:58](https://www.youtube.com/watch?v=rAl-9HwD858&t=658s)
Go back and define the fields `StrSplit` needs:
```rust=
pub struct StrSplit
{
remainder: &str,
delimiter: &str,
}
```
`remainder` is the rest of the string that has not been looked at yet, and `delimiter` is what the string is split on.
Now implement the `StrSplit` constructor in full:
```rust=
impl StrSplit {
pub fn new(haystack: &str, delimiter: &str) -> Self
{
Self {
remainder: haystack,
delimiter,
}
}
}
```
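When a field and the value assigned to it have the same name (Line 6), the `:` can be omitted, which avoids repeating the name. The `:` is only needed when the names differ (Line 5).
Why is the string called `haystack` outside `StrSplit` but `remainder` inside it? Internally, `StrSplit` consumes part of the string on each step and keeps whatever has not been processed yet, then continues from that unprocessed part until every character has been seen. That is why the field is named `remainder`.
Continue by fleshing out the `Iterator` implementation for `StrSplit`: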
```rust=
impl Iterator for StrSplit
{
type Item = &str;
fn next(&mut self) -> Option<Self::Item>
{
if let Some(next_delim) = self.remainder.find(self.delimiter) {
let until_delimiter = &self.remainder[..next_delim];
self.remainder = &self.remainder[(next_delim + self.delimiter.len())..];
Some(until_delimiter)
} else if self.remainder.is_empty() {
// TODO: bug
None
} else {
let rest = self.remainder;
self.remainder = "";
Some(rest)
}
}
}
```
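The goal of the code above is to find the next `delimiter` in `remainder`, split the string at that point, return the part before the delimiter, and set `remainder` to the part after it, repeating until the whole string has been processed.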
:::info
:bulb: [Option, Some, Result](https://blog.vgot.net/archives/rust-some.html)
:::
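Line 6 uses `if let` because searching `remainder` for `delimiter` has two possible outcomes: found or not found.
* If it is found, `find` returns `Some(index)`. The pattern `Some(next_delim)` on the left matches the `Option` on the right (`Some` is one of the variants of `Option`), so the value inside the `Option` is bound to `next_delim`, rather than the whole `Option`.
* If it is not found, the right side is `None`, which does not match `Some`, so the `if` body is skipped.
On Line 7, `[..next_delim]` leaves out the start of the range, so the slice runs from the beginning of the string up to `next_delim`. On Line 8, `[(next_delim + self.delimiter.len())..]` leaves out the end, so the slice runs to the end of the string.
If the `else` branch on Line 13 only returned `self.remainder` without executing `self.remainder = "";`, every later call would enter that `else` branch again, producing an infinite loop.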
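To see `find` and the two range forms in isolation, here is a minimal sketch. The helper `split_at_first` is hypothetical and not part of the note's `StrSplit`; it performs a single split step:

```rust
/// Splits `text` at the first occurrence of `delimiter`, returning the part
/// before it and the part after it, or `None` if the delimiter is absent.
fn split_at_first<'a>(text: &'a str, delimiter: &str) -> Option<(&'a str, &'a str)> {
    // `find` returns the byte index where the delimiter starts.
    let start = text.find(delimiter)?;
    // `..start`: from the beginning up to (not including) `start`.
    let before = &text[..start];
    // `start + len..`: from just past the delimiter to the end.
    let after = &text[start + delimiter.len()..];
    Some((before, after))
}
```

Both returned slices point into `text`; nothing is copied.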
Q: Is the cascaded `Self {}` really the "preferred" way of implementing that? I'm very much new to Rust and it seems a bit odd coming from other languages
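A: Jon likes `Self` because it keeps the code more flexible, but it has two costs:
1. You lose some [local reasoning](https://stackoverflow.com/questions/61207079/what-is-local-reasoning-exactly): you have to work out which type the `impl` block is for. That is not much trouble in short code, but it gets harder when the block is long.
2. You can't use a very old Rust compiler, because earlier versions did not support using `Self` this flexibly. The feature was added long ago, though, so the impact is small.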
Q: Jon, maybe you could explain later when should I use associated types VS generics, they don't look that different to me, and thus I always use generics
A:
* Use generics if you think that multiple implementations of that trait might exist for a given type. Like with `Iterator<Item=T>`, where a type might have multiple different ways of iterating.
* Use associated types if only one implementation makes sense for any given type. Like the `type Item;` in the `Iterator` trait, where an iterator naturally has just one output item type.
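The contrast can be sketched as follows. `ConvertTo`, `Celsius`, and `Countdown` are names invented for this illustration; only `Iterator` comes from the standard library:

```rust
/// Generic trait: one type may implement it many times, once per `T`.
trait ConvertTo<T> {
    fn convert(&self) -> T;
}

struct Celsius(f64);

impl ConvertTo<f64> for Celsius {
    fn convert(&self) -> f64 {
        self.0
    }
}

impl ConvertTo<String> for Celsius {
    fn convert(&self) -> String {
        format!("{} C", self.0)
    }
}

/// Associated type: each implementor picks exactly one `Item`.
struct Countdown(u32);

impl Iterator for Countdown {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        if self.0 == 0 {
            None
        } else {
            self.0 -= 1;
            Some(self.0 + 1)
        }
    }
}
```

With the generic trait the caller has to say which conversion is wanted (for example through a type annotation), whereas `Countdown` can only ever yield `u32`.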
## When to use match vs if let some
[0:16:15](https://www.youtube.com/watch?v=rAl-9HwD858&t=975s)
Q: When do you use `match ... with` vs `if let Some(...)`?
A: Use `match` when there are several patterns to match; use `if let` when there is only <font color=red>one</font> pattern you care about.
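A small sketch of that rule of thumb, using two hypothetical helpers (`shout` and `describe` are not from the video):

```rust
/// Only one pattern matters here, so `if let` is enough.
fn shout(input: Option<&str>) -> String {
    if let Some(text) = input {
        return text.to_uppercase();
    }
    String::new()
}

/// Several patterns matter here, so `match` keeps them side by side.
fn describe(input: Option<i32>) -> &'static str {
    match input {
        Some(0) => "zero",
        Some(n) if n < 0 => "negative",
        Some(_) => "positive",
        None => "nothing",
    }
}
```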
## Doesn't compile! missing lifetime specifier
[0:17:10](https://www.youtube.com/watch?v=rAl-9HwD858&t=1030s)
:::spoiler {state="close"} Current code
```rust=
pub struct StrSplit
{
remainder: &str,
delimiter: &str,
}
impl StrSplit {
pub fn new(haystack: &str, delimiter: &str) -> Self
{
Self {
remainder: haystack,
delimiter,
}
}
}
impl Iterator for StrSplit
{
type Item = &str;
fn next(&mut self) -> Option<Self::Item>
{
if let Some(next_delim) = self.remainder.find(self.delimiter) {
let until_delimiter = &self.remainder[..next_delim];
self.remainder = &self.remainder[(next_delim + self.delimiter.len())..];
Some(until_delimiter)
} else if self.remainder.is_empty() {
// TODO: bug
None
} else {
let rest = self.remainder;
self.remainder = "";
Some(rest)
}
}
}
#[test]
fn it_works()
{
let haystack = "a b c d e";
let letters = StrSplit::new(haystack, " ");
assert_eq!(letters, vec!["a", "b", "c", "d", "e"].into_iter());
}
```
:::
Try compiling the current code. It fails with an error; only the key part is shown below:
```shell
$ cargo test
...
error[E0106]: missing lifetime specifier
...
$ cargo check # shows each error only once
```
Now modify the code following the compiler's hints.
:::spoiler {state="close"} Current code
```rust=
#![warn(missing_debug_implementations, rust_2018_idioms, missing_docs)]
pub struct StrSplit<'a>
{
remainder: &'a str,
delimiter: &'a str,
}
impl StrSplit<'_> {
pub fn new(haystack: &str, delimiter: &str) -> Self
{
Self {
remainder: haystack,
delimiter,
}
}
}
impl Iterator for StrSplit<'_>
{
type Item = &str;
fn next(&mut self) -> Option<Self::Item>
{
if let Some(next_delim) = self.remainder.find(self.delimiter) {
let until_delimiter = &self.remainder[..next_delim];
self.remainder = &self.remainder[(next_delim + self.delimiter.len())..];
Some(until_delimiter)
} else if self.remainder.is_empty() {
// TODO: bug
None
} else {
let rest = self.remainder;
self.remainder = "";
Some(rest)
}
}
}
#[test]
fn it_works()
{
let haystack = "a b c d e";
let letters = StrSplit::new(haystack, " ");
assert_eq!(letters, vec!["a", "b", "c", "d", "e"].into_iter());
}
```
:::
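Notes on the `next()` part: `<'_>` is the anonymous lifetime, which works much like [type inference](https://rustc-dev-guide.rust-lang.org/type-inference.html) (the compiler works out the lifetime on its own). `type Item = &str;` also needs a lifetime parameter because the item type is `&str` (a pointer to a `str`), and Rust does not know how long the program may hold on to that pointer. Without a lifetime it cannot guarantee that the value lives longer than the pointer to it (we never want the pointer to outlive the value, because that would be a [dangling pointer](https://en.wikipedia.org/wiki/Dangling_pointer)).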
:::info
:bulb: [impl<'a> Type<'a>](https://stackoverflow.com/questions/39355984/what-does-the-first-explicit-lifetime-specifier-on-an-impl-mean)
:::
:::info
:bulb: [Elision rules are as follows](https://doc.rust-lang.org/nomicon/lifetime-elision.html):
* Each elided lifetime in input position becomes a distinct lifetime parameter.
```rust
// Elided form
fn process_strings(s1: &str, s2: &str) {
    // ...
}
// Equivalent to (what the compiler infers)
fn process_strings<'a, 'b>(s1: &'a str, s2: &'b str) {
    // ...
}
```
* If there is exactly one input lifetime position (elided or not), that lifetime is assigned to all elided output lifetimes.
```rust
// Elided form
fn get_prefix(s: &str) -> &str {
    &s[0..3]
}
// Equivalent to (what the compiler infers)
fn get_prefix<'a>(s: &'a str) -> &'a str {
    &s[0..3]
}
```
* If there are multiple input lifetime positions, but one of them is `&self` or `&mut self`, the lifetime of `self` is assigned to all elided output lifetimes.
```rust
struct Parser {
data: String,
}
impl Parser {
    // Elided form
    fn process(&self, input: &str) -> &str {
        &self.data
    }
    // Equivalent to (what the compiler infers)
    fn process<'a, 'b>(&'a self, input: &'b str) -> &'a str {
        &self.data
    }
}
```
* Otherwise, it is an error to elide an output lifetime.
```rust
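// This fails to compile
fn choose(x: &str, y: &str) -> &str {
    if x.len() > y.len() { x } else { y }
}
// The lifetime must be written out explicitly
fn choose<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() { x } else { y }
}
```
* There are two input references, `x` and `y`
* There is no `&self` parameter
* None of the first three rules applies, so the output lifetime cannot be elided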
:::
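Why would the lifetime of `next()`'s return value be tied to `&mut self`? Because the function falls under the second elision rule: `&mut self` is its only input lifetime. Giving the item type an explicit lifetime looks like this: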
```rust
impl<'a> Iterator for StrSplit<'a>
{
type Item = &'a str;
fn next(&mut self) -> Option<Self::Item>
```
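Why does the current code still fail to compile? Because the lifetime of the value returned by `next()` ended up tied to the lifetime of the `StrSplit`, whereas what we actually want is for the returned value to be able to outlive the `StrSplit`: even after the `StrSplit` is dropped, the values obtained from the iterator's `next()` must remain usable. So the lifetime should be tied to `haystack` instead, so that dropping the `StrSplit` does not invalidate the values returned by `next()`.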
:::success
:pencil2: As an aside, if the lifetime parameter after `StrSplit` is changed from `<'a>` to `<'_>` and the code is compiled, the compiler reports: ``error[E0207]: the lifetime parameter `'a` is not constrained by the impl trait, self type, or predicates``
:::
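The requirement that yielded items outlive the iterator can be sketched with a cut-down stand-in. `Words` and `first_word` are hypothetical names; `Words` splits on single spaces only and is not the note's `StrSplit`:

```rust
/// Minimal stand-in for the note's `StrSplit`, splitting on single spaces.
struct Words<'a> {
    remainder: &'a str,
}

impl<'a> Iterator for Words<'a> {
    // Items borrow from the original string (`'a`), not from the iterator.
    type Item = &'a str;

    fn next(&mut self) -> Option<Self::Item> {
        if self.remainder.is_empty() {
            return None;
        }
        match self.remainder.find(' ') {
            Some(i) => {
                let word = &self.remainder[..i];
                self.remainder = &self.remainder[i + 1..];
                Some(word)
            }
            None => {
                let word = self.remainder;
                self.remainder = "";
                Some(word)
            }
        }
    }
}

/// The iterator is dropped when this function returns, yet the returned
/// slice stays valid because it is tied to `text`, not to `Words`.
fn first_word<'a>(text: &'a str) -> Option<&'a str> {
    let mut words = Words { remainder: text };
    words.next()
}
```

If `Item` were tied to the borrow of `self` instead, `first_word` could not return the slice, because `words` goes away at the end of the function.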
## Can I be wrong by specifying lifetimes?
[0:20:33](https://www.youtube.com/watch?v=rAl-9HwD858&t=1233s)
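Q: can I be wrong by specifying lifetimes?
A: No, because incorrect lifetimes will not compile. It is like accidentally using the wrong type: when you call a function that expects one type and pass another, the compiler compares them, finds they do not match, and compilation fails.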
## Anonymous lifetime '_
[0:21:25](https://www.youtube.com/watch?v=rAl-9HwD858&t=1285s)
Q: how to tell where anonymous lifetime can be used?
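A: `<'_>` tells the compiler to work out the lifetime itself, which it can only do when there is exactly one possible guess. (That does not mean every `'_` in the same `impl` block stands for the same lifetime; in fact, each use of `'_` gets its own unique lifetime, ~~though some of them may happen to be equally long~~.)
To elaborate: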
```rust
impl Foo
{
fn get_ref(&self) -> &'_ str {}
}
```
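In `get_ref()`, the only lifetime around is that of `&self`, so the compiler can work out that the return value lives as long as the `&self` argument. There is therefore no need to write it like this: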
```rust
impl Foo
{
fn get_ref<'a>(&'a self) -> &'a str {}
}
```
Q: What is the difference between 'a and '_ ?
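A: The underscore in `'_` tells the compiler to infer the lifetime itself. Because we know the compiler has only one possible choice, we can safely leave it to the compiler instead of handling it ourselves. `'a`, on the other hand, is a specific named lifetime, a bit like the `T` of a generic type.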
## Order lifetimes based on how long they are
[0:23:10](https://www.youtube.com/watch?v=rAl-9HwD858&t=1390s)
Q: Is there any kind of ordering on lifetime specifiers? Like, is 'a > 'b? Or is it just a way of grouping references together as a unit?
A: Yes, [subtyping](https://doc.rust-lang.org/nomicon/subtyping.html)
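For example, the special lifetime `'static` lasts from the point of declaration until the end of the program, so any lifetime `'a` can be shorter than `'static`. The name of a lifetime does not matter; you could just as well call it `'b`.
Q: How can the compiler know something is wrong and still be unable to infer the right answer?
A: Consider this example: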
```rust
fn multiply(x: (), y: i32) -> i32
{
}
```
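The compiler can tell that this is wrong, but it cannot tell you the right answer: it has no way of knowing which type you meant `x` to have. What was written is [unit](https://doc.rust-lang.org/std/primitive.unit.html) (`()`), and only you know what was intended. So the compiler can reject the code but cannot supply the fix for you.
Q: why would you not elide the lifetime if your leaving the ’_ in the type
A: Example code: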
```rust
struct Parser<'a> {
input: &'a str,
}
impl<'a> Parser<'a> {
// Version 1: elided
fn parse(&self) -> &str {
// some processing logic
&self.input[0..5]
}
// Version 2: using '_
fn parse_explicit(&self) -> &'_ str {
// some processing logic
&self.input[0..5]
}
}
```
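In version 1, Rust infers that the returned `&str` has the same lifetime as `self`; that is, the lifetime of the returned string reference is tied to the borrow of `self`.
In version 2, `&'_ str` is resolved by exactly the same elision rules, so the signature means the same thing as version 1. Writing `'_` does not detach the return value from `self`; it only makes it visible to the reader that the return type contains a borrowed lifetime.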
Q: there is a way to use multiple lifetimes specifiers at same impl?
A: This is covered later.
Q: Is there any kind of ordering on lifetime specifiers? Like, is 'a > 'b? Or is it just a way of grouping references together as a unit?
A: Yes, but it is not needed this time.
## Anonymous lifetime '_ (with multiple lifetimes)
[0:25:18](https://www.youtube.com/watch?v=rAl-9HwD858&t=1518s)
:::info
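:bulb: Where `'_` can appear:
* In input position (parameters), `'_` introduces a fresh anonymous lifetime
* In output position (return values), `'_` is inferred according to the standard lifetime elision rules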
:::
Q: Does '_ can be used when there is only one possible lifetime? So the compiler can guess properly
A: Compare the following two examples:
* Generic lifetime parameters (OK):
```rust
fn foo<'x, 'y>(x: &'x str, y: &'y str) -> &'x str {}
```
* `'_` (Error):
```rust
fn foo(x: &str, y: &'_ str) -> &'_ str {}
// x: &str → equivalent to x: &'_ str
```
:::warning
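Jon's explanation: ~~A `'_` on a parameter becomes an arbitrary unique lifetime that nothing else shares. A `'_` on the return value is inferred to be tied to parameter `x` rather than parameter `y`, because `y` has its own lifetime.~~
However, Jon's explanation may be mistaken: the compiler reports `error[E0106]: missing lifetime specifier` rather than tying the return value's lifetime to parameter `x`. The error occurs because this signature falls under elision rule 4, so compilation fails.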
:::
Q: So in other words, the life time of StrSplit.remainder and StrSplit.delimiter, is now tied to the lifetime of the StrSplit itself?
```rust=
pub struct StrSplit<'a>
{
remainder: &'a str,
delimiter: &'a str,
}
```
A: Not quite. WK: `StrSplit.remainder` and `StrSplit.delimiter` are tied to the lifetimes of the arguments passed in, not to the `StrSplit` itself.
## Compile error: lifetime of reference outlives lifetime of borrowed content
[0:26:52](https://www.youtube.com/watch?v=rAl-9HwD858&t=1612s)
:::spoiler {state="close"} Current code
```rust=
#![warn(missing_debug_implementations, rust_2018_idioms, missing_docs)]
pub struct StrSplit<'a>
{
remainder: &'a str,
delimiter: &'a str,
}
impl StrSplit<'_> {
pub fn new(haystack: &str, delimiter: &str) -> Self
{
Self {
remainder: haystack,
delimiter,
}
}
}
impl<'a> Iterator for StrSplit<'a>
{
type Item = &'a str;
fn next(&mut self) -> Option<Self::Item>
{
if let Some(next_delim) = self.remainder.find(self.delimiter) {
let until_delimiter = &self.remainder[..next_delim];
self.remainder = &self.remainder[(next_delim + self.delimiter.len())..];
Some(until_delimiter)
} else if self.remainder.is_empty() {
// TODO: bug
None
} else {
let rest = self.remainder;
self.remainder = "";
Some(rest)
}
}
}
#[test]
fn it_works()
{
let haystack = "a b c d e";
let letters = StrSplit::new(haystack, " ");
assert_eq!(letters, vec!["a", "b", "c", "d", "e"].into_iter());
}
```
:::
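On my machine the compiler reports the errors below. (With the current implementation, the `StrSplit` could outlive `haystack` and `delimiter`, which would leave [dangling pointers](https://en.wikipedia.org/wiki/Dangling_pointer), so compilation fails.)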
```rust
$ cargo check
Checking strsplit v0.1.0 (/home/wilson/CrustOfRust/strsplit)
error: lifetime may not live long enough
--> src/lib.rs:10:9
|
8 | pub fn new(haystack: &str, delimiter: &str) -> Self
| - ---- return type is StrSplit<'2>
| |
| let's call the lifetime of this reference `'1`
9 | {
10 | / Self {
11 | | remainder: haystack,
12 | | delimiter,
13 | | }
| |_________^ associated function was supposed to return data with lifetime `'2` but it is returning data with lifetime `'1`
error: lifetime may not live long enough
--> src/lib.rs:10:9
|
8 | pub fn new(haystack: &str, delimiter: &str) -> Self
| - ---- return type is StrSplit<'2>
| |
| let's call the lifetime of this reference `'3`
9 | {
10 | / Self {
11 | | remainder: haystack,
12 | | delimiter,
13 | | }
| |_________^ associated function was supposed to return data with lifetime `'2` but it is returning data with lifetime `'3`
```
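The compiler on Jon's machine reported a differently worded error (shown in the video at this timestamp).

:::success
:pencil2: My compiler is newer than Jon's and its inference is more capable, so I did not get the error message Jon saw when compiling.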
:::
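`Self` has the lifetime `'a` (inferred by the compiler), so `remainder` should also get the lifetime `'a` (from the definition of `StrSplit<'a>`), but instead it receives the lifetime of `haystack`. The compiler does not know whether the `haystack` reference lives longer or shorter than the `StrSplit`. The same applies to `delimiter`.
If the caller dropped the memory behind `haystack`/`delimiter` right after calling `new()`, the fields of `StrSplit` could end up pointing at freed values, which is a [dangling pointer](https://en.wikipedia.org/wiki/Dangling_pointer).
Improve the code by tying the lifetime of the `StrSplit` fields to the lifetime of the arguments: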
```diff=
-impl StrSplit<'_> {
+impl<'a> StrSplit<'a> {
- pub fn new(haystack: &str, delimiter: &str) -> Self
+ pub fn new(haystack: &'a str, delimiter: &'a str) -> Self
{
Self {
remainder: haystack,
delimiter,
}
}
}
```
Q: why do we use generic names like 'a, 'b, etc. for lifetimes and not proper names like (typical) variables?
A: The lifetime parameter names will be made more descriptive shortly.
Q: how resilient is the anonymous lifetime? will you get yourself in trouble if you rely on it too much or is the compiler going to pick correctly the vast majority of the time?
A: Use the anonymous lifetime whenever you can.
Q: Can you impose restrictions between lifetimes?
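A: Yes. You can declare several lifetime parameters on an `impl` block and state relationships between them. For example, you can require that `'a` outlives `'b`, meaning it lives at least as long as `'b`. That is not covered here.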
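Although the video does not go into it, a lifetime bound can be sketched like this. `Pair` and `shorter_of_both` are hypothetical names chosen for the illustration:

```rust
/// Holds two references with separate lifetimes. The bound `'long: 'short`
/// reads as "`'long` lives at least as long as `'short`".
struct Pair<'long, 'short>
where
    'long: 'short,
{
    haystack: &'long str,
    needle: &'short str,
}

impl<'long: 'short, 'short> Pair<'long, 'short> {
    /// Because `'long` outlives `'short`, a `&'long str` can be handed out
    /// where only a `&'short str` is promised.
    fn shorter_of_both(&self) -> &'short str {
        if self.haystack.len() <= self.needle.len() {
            self.haystack
        } else {
            self.needle
        }
    }
}
```

Without the `'long: 'short` bound, returning `self.haystack` as `&'short str` would be rejected, since the compiler could not assume any relationship between the two lifetimes.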
Q: why is the 'a next to the "impl" keyword needed?
```rust=
impl<'a> StrSplit<'a> {
pub fn new(haystack: &'a str, delimiter: &'a str) -> Self
{
Self {
remainder: haystack,
delimiter,
}
}
}
```
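A: Consider the following example:
* Incorrect version
```rust
struct Foo<T>(T);
impl Foo<T> {}
```
The compiler tells you that you are using `T`, but it does not know what `T` is.
* Correct version
```rust
struct Foo<T>(T);
impl<T> Foo<T> {}
```
This says that the `impl` block itself is generic over the parameter `T`.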
Q: The Rust typesystem has two [bottom types](https://stackoverflow.com/questions/51832396/why-does-rust-have-a-never-primitive-type)
A: Yes: [never](https://doc.rust-lang.org/std/primitive.never.html) (`!`) and `enum Void {}`.
:::info
:bulb: [bottom type](https://hackmd.io/@FohtAO04T92wF-8_TVATYQ/SJ0vcjyWL#What-is-the--type)
* These types are unusual in that it is impossible to construct an instance of them - they are said to be 'uninhabited'.
* Unlike other empty enums, the ! type has the ability to coerce to any other type.
```rust
fn match_it(val: Option<u8>) {
let a: bool = match val {
Some(_) => true,
_ => panic!()
};
}
```
Here, we match on an `Option<u8>`, producing a bool. Each arm of a match expression must produce the same type, so that the overall expression always produces the same type. The first arm produces `true` (a `bool`), while the second arm calls `panic!()`, which produces `!`.
Coercion works because any code that has a `!` type available/in-scope is known to be statically unreachable. In the case of the match expression, we know that anything after the call to `panic!()` is unreachable. So, <font color="red">it's fine to "pretend"</font> that `panic!()` produced a `bool`, <font color="red">since no one will be able to observe the fact</font> that no `bool` was ever produced.
:::
Q: "subtyping" is actually the language used for lifetimes in the Rustonomicon
A: Yes.
:::spoiler {state="close"} Current code
```rust=
// #![warn(missing_debug_implementations, rust_2018_idioms, missing_docs)]
pub struct StrSplit<'a>
{
remainder: &'a str,
delimiter: &'a str,
}
impl<'a> StrSplit<'a> {
pub fn new(haystack: &'a str, delimiter: &'a str) -> Self
{
Self {
remainder: haystack,
delimiter,
}
}
}
impl<'a> Iterator for StrSplit<'a>
{
type Item = &'a str;
fn next(&mut self) -> Option<Self::Item>
{
if let Some(next_delim) = self.remainder.find(self.delimiter) {
let until_delimiter = &self.remainder[..next_delim];
self.remainder = &self.remainder[(next_delim + self.delimiter.len())..];
Some(until_delimiter)
} else if self.remainder.is_empty() {
// TODO: bug
None
} else {
let rest = self.remainder;
self.remainder = "";
Some(rest)
}
}
}
#[test]
fn it_works()
{
let haystack = "a b c d e";
let letters = StrSplit::new(haystack, " ");
assert_eq!(letters, vec!["a", "b", "c", "d", "e"].into_iter());
}
```
:::
Run the compiler check on the current code again, and it finally passes:
```shell
$ cargo check
Finished dev [unoptimized + debuginfo] target(s) in 0.04s
```
## Static lifetime
[0:34:45](https://www.youtube.com/watch?v=rAl-9HwD858&t=2085s)
`&'a str` <- `&'static str`: why is this OK?
```rust
self.remainder = "";
```
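* This is where the [subtyping](https://doc.rust-lang.org/nomicon/subtyping.html) relationship comes in. If a place expects a reference with some lifetime, you can assign to it a reference with any lifetime (arbitrary or specific), provided that the lifetime of the value being assigned is at least as long as the lifetime the place expects. This rule, too, exists to prevent dangling pointers.
* The reason `""` has the `'static` lifetime is that string literals are compiled directly into the binary stored on disk (in its initialized data area), so they are available for the whole run of the program.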
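The same coercion can be shown outside `StrSplit`. In this sketch, `or_placeholder` is a hypothetical helper whose return type is `&'a str`, yet one branch returns a string literal:

```rust
/// Returns `text` unless it is blank, in which case a string literal is
/// returned. The literal is `&'static str`, and `'static` outlives any `'a`,
/// so it can be used where `&'a str` is expected.
fn or_placeholder<'a>(text: &'a str) -> &'a str {
    if text.trim().is_empty() {
        "<empty>"
    } else {
        text
    }
}
```

This is the same reason `self.remainder = "";` is accepted when the field has type `&'a str`.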
:::spoiler {state="close"} Current code
```rust=
// #![warn(missing_debug_implementations, rust_2018_idioms, missing_docs)]
#[derive(Debug)]
pub struct StrSplit<'a>
{
remainder: &'a str,
delimiter: &'a str,
}
impl<'a> StrSplit<'a> {
pub fn new(haystack: &'a str, delimiter: &'a str) -> Self
{
Self {
remainder: haystack,
delimiter,
}
}
}
impl<'a> Iterator for StrSplit<'a>
{
type Item = &'a str;
fn next(&mut self) -> Option<Self::Item>
{
if let Some(next_delim) = self.remainder.find(self.delimiter) {
let until_delimiter = &self.remainder[..next_delim];
self.remainder = &self.remainder[(next_delim + self.delimiter.len())..];
Some(until_delimiter)
} else if self.remainder.is_empty() {
// TODO: bug
None
} else {
let rest = self.remainder;
self.remainder = "";
Some(rest)
}
}
}
#[test]
fn it_works()
{
let haystack = "a b c d e";
let letters = StrSplit::new(haystack, " ");
// assert_eq!(letters, vec!["a", "b", "c", "d", "e"].into_iter());
assert!(letters.eq(vec!["a", "b", "c", "d", "e"].into_iter()));
}
```
:::
Run the tests again; they pass too:
```rust
$ cargo test
...
running 1 test
test it_works ... ok
...
```
Q: everything by default has static lifetime?
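A: The lifetime of a value depends on when it is dropped. If a value is not declared `'static` but is never dropped, it merely appears to have a `'static` lifetime. Variables declared inside a function live on the stack; when the function returns, those local values are cleaned up, and at that point their lifetimes have ended.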
Q: can i think about strsplit like a [foldr](https://stackoverflow.com/questions/1757740/how-does-foldr-work)?
A: No, all `StrSplit` does is split a string.
Make the test case a bit more concise:
```rust=
#[test]
fn it_works()
{
let haystack = "a b c d e";
let letters: Vec<_> = StrSplit::new(haystack, " ").collect();
assert_eq!(letters, vec!["a", "b", "c", "d", "e"]);
}
```
Q: Don't variables die at end-of-scope, not just return?
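A: As long as a value has not been dropped, its lifetime has not ended. Values are dropped when they go out of scope, and that is when their lifetime expires. Back to the earlier question of whether values are `'static` by default: again, it depends on how long the value stays in memory, not on a default `'static` lifetime.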
## Bug when a delimiter tails a string
[0:41:27](https://www.youtube.com/watch?v=rAl-9HwD858&t=2487s)
Add a test case where the `delimiter` is at the tail of the string; the last substring is expected to be `""`:
```rust=
#[test]
fn tail()
{
let haystack = "a b c d ";
let letters: Vec<_> = StrSplit::new(haystack, " ").collect();
assert_eq!(letters, vec!["a", "b", "c", "d", ""]);
}
```
:::spoiler {state="close"} Current code
```rust=
// #![warn(missing_debug_implementations, rust_2018_idioms, missing_docs)]
#[derive(Debug)]
pub struct StrSplit<'a>
{
remainder: &'a str,
delimiter: &'a str,
}
impl<'a> StrSplit<'a> {
pub fn new(haystack: &'a str, delimiter: &'a str) -> Self
{
Self {
remainder: haystack,
delimiter,
}
}
}
impl<'a> Iterator for StrSplit<'a>
{
type Item = &'a str;
fn next(&mut self) -> Option<Self::Item>
{
if let Some(next_delim) = self.remainder.find(self.delimiter) {
let until_delimiter = &self.remainder[..next_delim];
self.remainder = &self.remainder[(next_delim + self.delimiter.len())..];
Some(until_delimiter)
} else if self.remainder.is_empty() {
// TODO: bug
None
} else {
let rest = self.remainder;
self.remainder = "";
Some(rest)
}
}
}
#[test]
fn it_works()
{
let haystack = "a b c d e";
let letters = StrSplit::new(haystack, " ");
assert!(letters.eq(vec!["a", "b", "c", "d", "e"].into_iter()));
}
#[test]
fn tail()
{
let haystack = "a b c d ";
let letters: Vec<_> = StrSplit::new(haystack, " ").collect();
assert_eq!(letters, vec!["a", "b", "c", "d", ""]);
}
```
:::
But the actual result is:
```rust
$ cargo test
left: `["a", "b", "c", "d"]`,
right: `["a", "b", "c", "d", ""]`', src/lib.rs:50:5
```
So `next()` needs to change. This is the part to modify:
```rust
else if self.remainder.is_empty() {
// TODO: bug
None
} else {
let rest = self.remainder;
self.remainder = "";
Some(rest)
}
```
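Here `remainder` ends up as `""`. We need to distinguish between `remainder` being `""` because everything has already been yielded, and `remainder` being `""` with that empty string not yet yielded.
To solve this, go back to the `StrSplit` struct and change the type of `remainder` to an `Option`. This is the key step, because we will use `Option`'s `take()` method to take ownership of the value. The constructor must change too, wrapping the incoming argument in `Some`: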
```diff=
#[derive(Debug)]
pub struct StrSplit<'a>
{
- remainder: &'a str,
+ remainder: Option<&'a str>,
delimiter: &'a str,
}
impl<'a> StrSplit<'a> {
pub fn new(haystack: &'a str, delimiter: &'a str) -> Self
{
Self {
- remainder: haystack,
+ remainder: Some(haystack),
delimiter,
}
}
}
```
Q: lifetimes are for stack allocated memory? heap allocations like String don't have specified lifetimes?
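A: Heap values have lifetimes too: once a heap value is dropped, its lifetime has ended. A heap value can also never be dropped at all, in which case it behaves as if it were static. One way for that to happen is [Box::leak](https://www.zhihu.com/question/511520023/answer/2310578784), which returns a `'static` reference. Using it deliberately is not the same thing as an accidental [memory leak](https://en.wikipedia.org/wiki/Memory_leak).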
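A minimal sketch of that idea, using the standard `String::into_boxed_str` and `Box::leak`; the wrapper name `leak_to_static` is made up here:

```rust
/// Leaks a heap-allocated `String` so that the returned slice is valid for
/// the rest of the program. The memory is never freed, so use sparingly.
fn leak_to_static(text: String) -> &'static str {
    Box::leak(text.into_boxed_str())
}
```

The result has the `'static` lifetime even though the data was created at run time on the heap, which shows that `'static` is about how long a value stays alive, not about where it is stored.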
Q: If you dumped the binary, could you spot the static allocation ?
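A: You can see static allocations in a dump, but not for the empty string (`""`), because the compiler optimizes it away.
The plan was to fix only part of it, but now the whole `next()` function is rewritten: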
```rust=
impl<'a> Iterator for StrSplit<'a>
{
type Item = &'a str;
fn next(&mut self) -> Option<Self::Item>
{
if let Some(ref mut remainder) = self.remainder {
// behaves like `let remainder = &mut self.remainder` (a mutable borrow into the field)
// rather than `let mut remainder = &self.remainder;`
if let Some(next_delim) = remainder.find(self.delimiter) {
let until_delimiter = &remainder[..next_delim];
*remainder = &remainder[(next_delim + self.delimiter.len())..];
Some(until_delimiter)
} else {
self.remainder.take() // the first time we reach the empty string, this lets us yield it
}
} else {
None // once the empty string has been yielded,
// the value in self.remainder has been taken, leaving None
}
}
}
```
:::info
:bulb: References
1. [Keyword ref](https://doc.rust-lang.org/std/keyword.ref.html): `ref` annotates pattern bindings to <font color="red">make them borrow rather than move</font>. It is not a part of the pattern as far as matching is concerned: <font color="red">it does not affect whether a value is matched, only how it is matched</font>.
2. [if let Statement](https://www.geeksforgeeks.org/rust-if-let-statement/)
:::
`if let` only checks whether the pattern matches; it does not compare the two sides for equality.
Q: what does ref keyword mean?
A: Consider the following:
* An approach that does not do what we want (although it is valid syntax):
```rust
if let Some(remainder) = self.remainder {
```
This <font color="red">copies the pointer</font> in `self.remainder` into `remainder`, but here we only want to borrow the pointer stored in `self.remainder`. If `remainder` and `self.remainder` are not the same pointer, modifying `remainder` has no effect on `self.remainder`.
:::info
:bulb: Why is the `&'a str` inside the `Option` a `Copy` type?
See [Enum Option](https://doc.rust-lang.org/std/option/enum.Option.html#impl-Copy-for-Option%3CT%3E):
```rust
impl<T> Copy for Option<T>
where
T: Copy,
```
Note: `str` itself does not implement the `Copy` trait; `&str` does. <font color="red">`Copy` here means copying the pointer, not the value it points to</font>! (From [Pointer types](https://doc.rust-lang.org/reference/types/pointer.html#shared-references-): Copying a reference is a “shallow” operation: it involves only copying the pointer itself, that is, pointers are `Copy`.)
Addendum: because `str` is a [Dynamically Sized Type](https://doc.rust-lang.org/reference/dynamically-sized-types.html), a variable cannot have type `str`; it can only be used behind a pointer or in a few other contexts.
:::
* An approach that does what we want:
If the left side matches the right side (`Some(ref mut x)` against `Some(y)`), `x` becomes a mutable borrow, `&mut y`.
```rust
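if let Some(ref mut remainder /* &mut &'a str */) = self.remainder /* Option<&'a str> */{
// The inner &'a str is an immutable string reference with lifetime 'a.
// The outer &mut is a mutable reference pointing at that immutable string reference.
// Since remainder is a pointer to a pointer, it must be dereferenced before the string reference can be replaced.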
```
:::warning
```rust
if let Some(mut remainder /* &'a str */) = self.remainder /* Option<&'a str> */{
// remainder is a new pointer; it and self.remainder both point at the same immutable string.
```
:::
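The difference between the by-value binding and the `ref mut` binding can be observed directly. This is only a sketch: `advance_copy` and `advance_in_place` are hypothetical helpers, and they assume a non-empty ASCII string so that `[1..]` is a valid slice:

```rust
/// Binds by value: `&str` is `Copy`, so `rest` is a copy of the pointer and
/// assigning to it leaves `slot` untouched.
fn advance_copy(slot: &mut Option<&str>) {
    if let Some(mut rest) = *slot {
        rest = &rest[1..];
        let _ = rest; // the updated copy is simply discarded
    }
}

/// Binds by mutable reference: `rest` is `&mut &str`, so writing through it
/// updates the value stored inside `slot`.
fn advance_in_place(slot: &mut Option<&str>) {
    if let Some(ref mut rest) = *slot {
        *rest = &rest[1..];
    }
}
```

Calling `advance_copy` on `Some("abc")` leaves the option unchanged, whereas `advance_in_place` turns it into `Some("bc")`.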
## What is the ref keyword and why not &
[0:48:07](https://www.youtube.com/watch?v=rAl-9HwD858&t=2887s)
Q: what is ref keyword means? Is it same as & ?
A: Consider the following:
* An approach that does not do what we want (although it is valid syntax):
```rust
if let Some(&mut remainder) = self.remainder
```
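This matches only if the right side is `Some(&mut T)`, in which case `remainder` is bound to the `T` itself (the pattern strips the reference).
* There are two approaches that do what we want:
1. The pattern matches a right side of `Some(T)` and binds `remainder` as a mutable reference to that `T` (`&mut T`):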
```rust
if let Some(ref mut remainder) = self.remainder
```
2. Q: if let Some(remainder) = &mut self.remainder {...} ?
A: A less preferred way to write it, but it still achieves our goal
```rust
if let Some(remainder) = &mut self.remainder
```
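For comparison, here is a sketch of three ways to get a mutable reference to the value inside an `Option`. The function names are invented, and `Option<String>` is used instead of `Option<&str>` only to keep the example short; `Option::as_mut` is the standard library method:

```rust
/// Explicit `ref mut` binding.
fn clear_with_ref_mut(slot: &mut Option<String>) {
    if let Some(ref mut text) = *slot {
        text.clear();
    }
}

/// Matching on a `&mut Option<_>`: the binding becomes `&mut String`
/// automatically (default binding modes).
fn clear_with_borrowed_match(slot: &mut Option<String>) {
    if let Some(text) = slot {
        text.clear();
    }
}

/// `Option::as_mut` converts `&mut Option<T>` into `Option<&mut T>`.
fn clear_with_as_mut(slot: &mut Option<String>) {
    if let Some(text) = slot.as_mut() {
        text.clear();
    }
}
```

All three leave a `Some` holding an empty string and do nothing for `None`.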
## What's the * on the left of remainder
[0:51:36](https://www.youtube.com/watch?v=rAl-9HwD858&t=3096s)
Q: what is the deref on the left side of the assignment doing?
A: Look at this line:
```rust
*remainder = &remainder[(next_delim + self.delimiter.len())..]
```
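Type of the left side: `&mut &'a str` (a pointer to a pointer)
Type of the right side: `&'a str` (a pointer)
Because the types on the two sides differ, the left side must be dereferenced before the right-hand value can be assigned to it.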
:::info
:bulb: Addendum:
In `&remainder[(next_delim + self.delimiter.len())..]`, `remainder` is a `&mut &str`, but Rust's auto-dereferencing turns it into a `str` for indexing, so the right side is really:
```rust
&(**remainder)[(next_delim + self.delimiter.len())..]
```
From the official [Array and slice indexing expressions](https://doc.rust-lang.org/reference/expressions/array-expr.html) documentation:
For other types an index expression `a[b]` is equivalent to `*std::ops::Index::index(&a, b)`, or `*std::ops::IndexMut::index_mut(&mut a, b)` in a mutable place expression context. Just as with methods, <font color="red">Rust will also insert dereference operations on `a` repeatedly to find an implementation (`Index` or `IndexMut` trait)</font>.
:::
## What is take() doing
[0:52:46](https://www.youtube.com/watch?v=rAl-9HwD858&t=3166s)
Q: What is the ".take()" call doing ?
A: See the explanation below:
```rust
self.remainder.take()
// impl<T> Option<T> { fn take(&mut self) -> Option<T> }
// if the Option is None, return None
// if the Option is Some, set the Option to None and return that Some
```
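The behaviour of `Option::take` is easy to check in isolation. `yield_once` below is a hypothetical wrapper added only for the demonstration:

```rust
/// Hands out the stored value exactly once; afterwards the slot is `None`.
fn yield_once(slot: &mut Option<&'static str>) -> Option<&'static str> {
    slot.take()
}
```

The first call returns the stored `Some` and leaves `None` behind; every later call returns `None`. That is the behaviour `next()` relies on to yield the final piece once and then stop.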
:::success
:pencil2: Every `let` statement is a pattern match
:::
Simplify the body of `next()`:
```rust=
impl<'a> Iterator for StrSplit<'a>
{
type Item = &'a str;
fn next(&mut self) -> Option<Self::Item>
{
let ref mut remainder = self.remainder?;
// The line above can also be written as follows, with the same effect
// let remainder = &mut self.remainder?;
if let Some(next_delim) = remainder.find(self.delimiter) {
let until_delimiter = &remainder[..next_delim];
*remainder = &remainder[(next_delim + self.delimiter.len())..];
Some(until_delimiter)
} else {
self.remainder.take()
}
}
}
```
:::spoiler {state="close"} Current code (still needs fixing: it loops forever)
```rust=
// #![warn(missing_debug_implementations, rust_2018_idioms, missing_docs)]
#[derive(Debug)]
pub struct StrSplit<'a>
{
remainder: Option<&'a str>,
delimiter: &'a str,
}
impl<'a> StrSplit<'a> {
pub fn new(haystack: &'a str, delimiter: &'a str) -> Self
{
Self {
remainder: Some(haystack),
delimiter,
}
}
}
impl<'a> Iterator for StrSplit<'a>
{
type Item = &'a str;
fn next(&mut self) -> Option<Self::Item>
{
let ref mut remainder = self.remainder?;
if let Some(next_delim) = remainder.find(self.delimiter) {
let until_delimiter = &remainder[..next_delim];
*remainder = &remainder[(next_delim + self.delimiter.len())..];
Some(until_delimiter)
} else {
self.remainder.take()
}
}
}
#[test]
fn it_works()
{
let haystack = "a b c d e";
let letters = StrSplit::new(haystack, " ");
assert!(letters.eq(vec!["a", "b", "c", "d", "e"].into_iter()));
}
#[test]
fn tail()
{
let haystack = "a b c d ";
let letters: Vec<_> = StrSplit::new(haystack, " ").collect();
assert_eq!(letters, vec!["a", "b", "c", "d", ""]);
}
```
:::
```rust=
$ cargo test
Finished test [unoptimized + debuginfo] target(s) in 0.00s
Running unittests src/lib.rs (target/debug/deps/strsplit-a9fa65918e243300)
running 2 tests
test it_works ... FAILED
```
## Mutable references are one level deep
[0:54:48](https://www.youtube.com/watch?v=rAl-9HwD858&t=3288s)
Q: If `self` is mutable here, why is `self.remainder` not mutable by default? (Coming from a C background, I'm thinking about this kind of like const)
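A: Mutable references are only one level deep. Taking `&mut self` lets us modify any field of `self`, but not necessarily what a field points to. Take `delimiter` as an example: it points to an immutable string, so the string it points to cannot be modified, but the field itself can be re-pointed at a different immutable string.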
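A small sketch of the "one level deep" rule. The `Holder` type and its `repoint` method are hypothetical and stand in for `StrSplit` and its `delimiter` field.

```rust
pub struct Holder<'a> {
    pub text: &'a str,
}

impl<'a> Holder<'a> {
    /// `&mut self` lets us re-point the field at a different string slice...
    pub fn repoint(&mut self, other: &'a str) {
        self.text = other;
        // ...but the string data behind the `&str` stays read-only. A call like
        // `self.text.make_ascii_uppercase()` would not compile, because that
        // method needs `&mut str` and we only hold a shared `&str`.
    }
}

#[test]
fn field_can_be_repointed() {
    let mut h = Holder { text: "hello" };
    h.repoint("world");
    assert_eq!(h.text, "world");
}
```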
## Solving a hang with as_mut()
[0:55:39](https://www.youtube.com/watch?v=rAl-9HwD858&t=3339s)
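The earlier version loops forever because `self.remainder` is never advanced: the `ref mut` binding does not refer to the field itself. The reason is as follows: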
```rust
let ref mut remainder = self.remainder?;
```
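The `?` operator returns `None` early if `self.remainder` is `None`; otherwise it yields the value inside the `Some`, like unwrapping a package. You might expect the line above to work, but the value inside the `Option` is a `Copy` type, so the value is **copied** rather than **moved**. <font color=red>As a result, the `remainder` binding refers to a fresh copy (`ptrRemainderCopy`), not to the pointer stored in `self.remainder` (`ptrRemainder`)</font>:
* The move case (what we wanted)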
```graphviz
digraph structs {
node[shape=record]
{rank=same; structa}
structp [label="ref mut remainder|<p> &ptrRemainderMove"];
structaptr [label="<name_ptr> ptrRemainderMove|<ptr> &str"];
structbptr [label="<name_ptr> ptrRemainder|<ptr> &str" style=filled];
structa [label="<A> a|b|c|d|e|"];
structaptr:ptr -> structa:A:nw
structbptr:ptr -> structa:A:nw[style=dashed]
structp:p -> structaptr:name_ptr:nw
}
```
* The copy case (what actually happens)
```graphviz
digraph structs {
node[shape=record]
{rank=same; structa}
structp [label="ref mut remainder|<p> &ptrRemainderCopy"];
structaptr [label="<name_ptr> ptrRemainderCopy|<ptr> &str"];
structbptr [label="<name_ptr> ptrRemainder|<ptr> &str"];
structa [label="<A> a|b|c|d|e|"];
structaptr:ptr -> structa:A:nw
structbptr:ptr -> structa:A:nw
structp:p -> structaptr:name_ptr:nw
}
```
So when `*remainder = &remainder[(next_delim + self.delimiter.len())..];` runs, it only updates the copy (`ptrRemainderCopy`). `self.remainder` (`ptrRemainder`) is left unchanged, so every call to `next()` sees the same string and the iterator never ends.
:::warning
:question: Open question: why is the `&'a str` inside the `Option` a `Copy` type?
:::
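One likely answer, sketched here rather than taken from the stream: shared references `&T` are always `Copy`, and `Option<T>` is `Copy` whenever `T` is. The helpers `assert_copy` and `duplicate` are hypothetical and exist only so the compiler checks the bound.

```rust
/// Compiles only when `T: Copy`. Hypothetical helper for illustration.
pub fn assert_copy<T: Copy>() {}

/// Returns a copy of the option and leaves the original untouched.
pub fn duplicate<'a>(opt: &Option<&'a str>) -> Option<&'a str> {
    // Allowed because Option<&str> is Copy; nothing is moved out of the borrow.
    *opt
}

#[test]
fn option_of_shared_ref_is_copy() {
    assert_copy::<&str>();
    assert_copy::<Option<&str>>();
    let original = Some("a b c");
    assert_eq!(duplicate(&original), original);
}
```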
To make the left-hand side borrow the value in place instead of copying it, call `as_mut()` on the right-hand side:
```diff=
impl<'a> Iterator for StrSplit<'a>
{
type Item = &'a str;
fn next(&mut self) -> Option<Self::Item>
{
- let ref mut remainder = self.remainder?;
+ let remainder = self.remainder.as_mut()?;
+ // impl<T> Option<T> { fn as_mut(&mut self) -> Option<&mut T> }
+ // combined with ? to unwrap the Option, this gives us what we need
if let Some(next_delim) = remainder.find(self.delimiter) {
let until_delimiter = &remainder[..next_delim];
*remainder = &remainder[(next_delim + self.delimiter.len())..];
Some(until_delimiter)
} else {
self.remainder.take()
}
}
}
```
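The difference between the two versions can be reproduced outside the iterator. This is a sketch with made-up function names (`advance_copy`, `advance_in_place`) operating on a bare `Option<&str>` instead of `self.remainder`.

```rust
/// Mirrors the buggy version: the inner `&str` is copied out, so only the
/// copy is advanced and the caller's slot stays the same.
pub fn advance_copy(slot: &mut Option<&str>, n: usize) -> Option<()> {
    let ref mut remainder = (*slot)?;
    *remainder = &remainder[n..];
    Some(())
}

/// Mirrors the fixed version: `as_mut()` yields `Option<&mut &str>`, a
/// mutable reference into the caller's slot.
pub fn advance_in_place(slot: &mut Option<&str>, n: usize) -> Option<()> {
    let remainder = slot.as_mut()?;
    *remainder = &remainder[n..];
    Some(())
}

#[test]
fn only_as_mut_updates_the_slot() {
    let mut a = Some("a b");
    assert_eq!(advance_copy(&mut a, 2), Some(()));
    assert_eq!(a, Some("a b")); // unchanged: this is what caused the hang

    let mut b = Some("a b");
    assert_eq!(advance_in_place(&mut b, 2), Some(()));
    assert_eq!(b, Some("b")); // advanced
}
```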
:::spoiler {state="close"} Current code
```rust=
// #![warn(missing_debug_implementations, rust_2018_idioms, missing_docs)]
#[derive(Debug)]
pub struct StrSplit<'a>
{
remainder: Option<&'a str>,
delimiter: &'a str,
}
impl<'a> StrSplit<'a> {
pub fn new(haystack: &'a str, delimiter: &'a str) -> Self
{
Self {
remainder: Some(haystack),
delimiter,
}
}
}
impl<'a> Iterator for StrSplit<'a>
{
type Item = &'a str;
fn next(&mut self) -> Option<Self::Item>
{
let remainder = self.remainder.as_mut()?;
if let Some(next_delim) = remainder.find(self.delimiter) {
let until_delimiter = &remainder[..next_delim];
*remainder = &remainder[(next_delim + self.delimiter.len())..];
Some(until_delimiter)
} else {
self.remainder.take()
}
}
}
#[test]
fn it_works()
{
let haystack = "a b c d e";
let letters = StrSplit::new(haystack, " ");
assert!(letters.eq(vec!["a", "b", "c", "d", "e"].into_iter()));
}
#[test]
fn tail()
{
let haystack = "a b c d ";
let letters: Vec<_> = StrSplit::new(haystack, " ").collect();
assert_eq!(letters, vec!["a", "b", "c", "d", ""]);
}
```
:::
This resolves the infinite loop:
```rust
$ cargo test
...
running 2 tests
test tail ... ok
test it_works ... ok
...
```
## Multiple lifetimes, implementing until_char
[0:57:49](https://www.youtube.com/watch?v=rAl-9HwD858&t=3469s)
So far we have not needed multiple lifetimes; that is what we turn to next.
First, add an `until_char()` function:
```rust=
pub fn until_char(s: &str, c: char) -> &str
{
StrSplit::new(s, &format!("{}", c))
.next()
.expect("StrSplit always gives at least one result!")
}
```
Add a test case:
```rust=
#[test]
fn until_char_test()
{
assert_eq!(until_char("hello world", 'o'), "hell");
}
```
:::spoiler {state="close"} Current code
```rust=
// #![warn(missing_debug_implementations, rust_2018_idioms, missing_docs)]
#[derive(Debug)]
pub struct StrSplit<'a>
{
remainder: Option<&'a str>,
delimiter: &'a str,
}
impl<'a> StrSplit<'a> {
pub fn new(haystack: &'a str, delimiter: &'a str) -> Self
{
Self {
remainder: Some(haystack),
delimiter,
}
}
}
impl<'a> Iterator for StrSplit<'a>
{
type Item = &'a str;
fn next(&mut self) -> Option<Self::Item>
{
let remainder = self.remainder.as_mut()?;
if let Some(next_delim) = remainder.find(self.delimiter) {
let until_delimiter = &remainder[..next_delim];
*remainder = &remainder[(next_delim + self.delimiter.len())..];
Some(until_delimiter)
} else {
self.remainder.take()
}
}
}
pub fn until_char(s: &str, c: char) -> &str
{
StrSplit::new(s, &format!("{}", c))
.next()
.expect("StrSplit always gives at least one result!")
}
#[test]
fn until_char_test()
{
assert_eq!(until_char("hello world", 'o'), "hell");
}
#[test]
fn it_works()
{
let haystack = "a b c d e";
let letters = StrSplit::new(haystack, " ");
assert!(letters.eq(vec!["a", "b", "c", "d", "e"].into_iter()));
}
#[test]
fn tail()
{
let haystack = "a b c d ";
let letters: Vec<_> = StrSplit::new(haystack, " ").collect();
assert_eq!(letters, vec!["a", "b", "c", "d", ""]);
}
```
:::
The compiler reports the following error:
```rust
$ cargo check
Checking strsplit v0.1.0 (/home/wilson/CrustOfRust/strsplit)
error[E0515]: cannot return value referencing temporary value
--> src/lib.rs:35:5
|
35 | StrSplit::new(s, &format!("{}", c))
| ^ ---------------- temporary value created here
| _____|
| |
36 | | .next()
37 | | .expect("StrSplit always gives at least one result!")
| |_____________________________________________________________^ returns a value referencing data owned by the current function
```
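The return value's lifetime is tied to `&format!("{}", c)`, but that temporary is dropped when the function returns, so the returned reference would point to invalid memory. Why is the return value tied to `&format!("{}", c)` rather than to `s`? Because `new()` declares both parameters with the same lifetime `'a`. The temporary `String` lives only inside the function, so the compiler has to pick that shorter lifetime as `'a` (when several references must share one lifetime, the shortest one wins). The return value is therefore tied to the function body: it is valid only while the temporary is alive. What we actually want is: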
```rust
pub fn until_char<'s>(s : &'s str, c: char) -> &'s str
```
How do we tell Rust that this is fine? We need more than one lifetime.
## Difference between a str and a String
[1:03:19](https://www.youtube.com/watch?v=rAl-9HwD858&t=3799s)
Q: Should we copy the delimiter into our struct?
A: We could declare `delimiter` as a `String`, which sidesteps the multiple-lifetime problem:
```rust=
#[derive(Debug)]
pub struct StrSplit<'a>
{
remainder: Option<&'a str>,
delimiter: String,
}
```
A `String` is heap-allocated and owned, so there is no lifetime attached to it.
:::info
:bulb: [String vs. str](https://stackoverflow.com/questions/24158114/what-are-the-differences-between-rusts-string-and-str)
:::
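1. `str -> [char]`
`str` is similar to `[char]` (as an analogy; the actual contents are UTF-8 bytes). A `str` has no statically known size because, like a slice, it is just a sequence of characters; the type itself does not say how long the sequence is.
2. `&str -> &[char]`
`&str` is a [fat pointer](https://stackoverflow.com/questions/57754901/what-is-a-fat-pointer): a two-word value holding a pointer to the first element of the slice and the length of the slice.
It can point to memory anywhere: on the stack, on the heap, in static memory, and so on.
3. `String -> Vec<char>`
`String` is heap-allocated and can grow or shrink dynamically.
* If you have a `String`, getting a `&str` is easy:
`String -> &str` (cheap -- AsRef)
* Going from `&str` to `String` involves copying the contents and a heap allocation:
`&str -> String` (expensive) -- Clone
Declaring `delimiter` as a `String` has two downsides, though:
1. It requires a memory allocation, which costs performance.
2. Using `String` means you need a memory allocator, which makes the library incompatible with environments that have none, such as some embedded devices.
So we will not take the `String` approach here.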
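The two conversion directions listed above can be sketched as follows. The function names `borrow_it` and `own_it` are made up; the pointer comparison is just one way to make the "cheap borrow versus new allocation" difference visible.

```rust
/// `String -> &str`: cheap, only borrows the existing heap buffer.
pub fn borrow_it(owned: &String) -> &str {
    owned.as_str()
}

/// `&str -> String`: allocates a new heap buffer and copies the bytes into it.
pub fn own_it(borrowed: &str) -> String {
    borrowed.to_owned()
}

#[test]
fn conversions_round_trip() {
    let owned = String::from("hello");
    let view: &str = borrow_it(&owned);
    // The borrowed view points at the very same bytes as the String.
    assert_eq!(view.as_ptr(), owned.as_ptr());
    let copy = own_it(view);
    // The new String has its own allocation with equal contents.
    assert_ne!(copy.as_ptr(), owned.as_ptr());
    assert_eq!(copy, owned);
}
```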
## Multiple lifetimes (continued)
[1:08:15](https://www.youtube.com/watch?v=rAl-9HwD858&t=4095s)
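You usually do not need multiple lifetimes; they only come up in special cases such as the one we are looking at today, where a type holds several references. The key point is that these references do not point to the same thing, and we want the return value to be tied to only one of them: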
```diff=
#[derive(Debug)]
-pub struct StrSplit<'a>
+pub struct StrSplit<'haystack, 'delimiter>
{
- remainder: Option<&'a str>,
+ remainder: Option<&'haystack str>,
- delimiter: &'a str,
+ delimiter: &'delimiter str,
}
-impl<'a> StrSplit<'a> {
+impl<'haystack, 'delimiter> StrSplit<'haystack, 'delimiter> {
- pub fn new(haystack: &'a str, delimiter: &'a str) -> Self
+ pub fn new(haystack: &'haystack str, delimiter: &'delimiter str) -> Self
{
Self {
remainder: Some(haystack),
delimiter,
}
}
}
-impl<'a> Iterator for StrSplit<'a>
+impl<'haystack, 'delimiter> Iterator for StrSplit<'haystack, 'delimiter>
{
- type Item = &'a str;
+ type Item = &'haystack str;
+ // now the return value is tied only to haystack
fn next(&mut self) -> Option<Self::Item>
{
...
}
}
...
```
The compiler no longer requires the two arguments passed to `new()` to share a lifetime.
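The effect of separating the two lifetimes can be seen in a stripped-down sketch that uses a plain function instead of the `StrSplit` struct. The names `before` and `before_char` are hypothetical.

```rust
/// Returns the part of `haystack` before `delimiter`.
/// The result is tied to `'h` only; `'d` plays no part in it.
pub fn before<'h, 'd>(haystack: &'h str, delimiter: &'d str) -> &'h str {
    match haystack.find(delimiter) {
        Some(i) => &haystack[..i],
        None => haystack,
    }
}

/// The delimiter is a local `String` that is dropped when this function
/// returns, yet the result may outlive it because it only borrows from `s`.
pub fn before_char(s: &str, c: char) -> &str {
    let delimiter = c.to_string();
    before(s, &delimiter)
}

#[test]
fn result_outlives_local_delimiter() {
    assert_eq!(before_char("hello world", 'o'), "hell");
}
```

If `before` used a single lifetime for both parameters, `before_char` would be rejected for the same reason `until_char` was.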
Next, as an experiment, deliberately replace `Some(until_delimiter)` with `Some(self.delimiter)`.
:::spoiler Current code
```rust=
// #![warn(missing_debug_implementations, rust_2018_idioms, missing_docs)]
#[derive(Debug)]
pub struct StrSplit<'haystack, 'delimiter>
{
remainder: Option<&'haystack str>,
delimiter: &'delimiter str,
}
impl<'haystack, 'delimiter> StrSplit<'haystack, 'delimiter> {
pub fn new(haystack: &'haystack str, delimiter: &'delimiter str) -> Self
{
Self {
remainder: Some(haystack),
delimiter,
}
}
}
impl<'haystack, 'delimiter> Iterator for StrSplit<'haystack, 'delimiter>
{
type Item = &'haystack str;
fn next(&mut self) -> Option<Self::Item>
{
let remainder = self.remainder.as_mut()?;
if let Some(next_delim) = remainder.find(self.delimiter) {
let until_delimiter = &remainder[..next_delim];
*remainder = &remainder[(next_delim + self.delimiter.len())..];
Some(self.delimiter)
} else {
self.remainder.take()
}
}
}
pub fn until_char(s: &str, c: char) -> &str
{
StrSplit::new(s, &format!("{}", c))
.next()
.expect("StrSplit always gives at least one result!")
}
#[test]
fn until_char_test()
{
assert_eq!(until_char("hello world", 'o'), "hell");
}
#[test]
fn it_works()
{
let haystack = "a b c d e";
let letters = StrSplit::new(haystack, " ");
assert!(letters.eq(vec!["a", "b", "c", "d", "e"].into_iter()));
}
#[test]
fn tail()
{
let haystack = "a b c d ";
let letters: Vec<_> = StrSplit::new(haystack, " ").collect();
assert_eq!(letters, vec!["a", "b", "c", "d", ""]);
}
```
:::
The compiler reports the following error:
```rust
$ cargo check
error: lifetime may not live long enough
--> src\lib.rs:28:13
|
17 | impl<'haystack, 'delimiter> Iterator for StrSplit<'haystack, 'delimiter>
| --------- ---------- lifetime `'delimiter` defined here
| |
| lifetime `'haystack` defined here
...
28 | Some(self.delimiter)
| ^^^^^^^^^^^^^^^^^^^^ method was supposed to return data with lifetime `'haystack` but it is returning data with lifetime `'delimiter`
|
= help: consider adding the following bound: `'delimiter: 'haystack`
```
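Try adding a `where` bound that tells the compiler `'delimiter` outlives `'haystack`, that is, it lives at least as long. You can read it as "`'delimiter` implements `'haystack`"; it relies on the [subtyping](https://doc.rust-lang.org/nomicon/subtyping.html) relationship mentioned earlier: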
```rust=
impl<'haystack, 'delimiter> Iterator for StrSplit<'haystack, 'delimiter>
where
'delimiter: 'haystack
```
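With this bound, declaring `type Item = &'haystack str;` and returning `Some(self.delimiter)` compiles, but we are back to the earlier error: the return value's lifetime is only as long as that of `&format!("{}", c)`, which ends when the function returns.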
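A minimal sketch of what an outlives bound permits, using a hypothetical free function `pick_delimiter` rather than the iterator impl.

```rust
/// Compiles only because of the `'d: 'h` bound ("'d outlives 'h"):
/// a `&'d str` can then be used where a `&'h str` is expected.
/// Without the bound the compiler reports the same
/// "lifetime may not live long enough" error shown above.
pub fn pick_delimiter<'h, 'd: 'h>(_haystack: &'h str, delimiter: &'d str) -> &'h str {
    delimiter
}

#[test]
fn delimiter_can_stand_in_for_haystack_lifetime() {
    assert_eq!(pick_delimiter("a b", " "), " ");
}
```

The cost of the bound is that the result is limited by both inputs again, which is exactly what brings back the `until_char` error.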
:::spoiler {state="close"} Current code
```rust=
// #![warn(missing_debug_implementations, rust_2018_idioms, missing_docs)]
#[derive(Debug)]
pub struct StrSplit<'haystack, 'delimiter>
{
remainder: Option<&'haystack str>,
delimiter: &'delimiter str,
}
impl<'haystack, 'delimiter> StrSplit<'haystack, 'delimiter> {
pub fn new(haystack: &'haystack str, delimiter: &'delimiter str) -> Self
{
Self {
remainder: Some(haystack),
delimiter,
}
}
}
impl<'haystack, 'delimiter> Iterator for StrSplit<'haystack, 'delimiter>
where
'delimiter: 'haystack
{
type Item = &'haystack str;
fn next(&mut self) -> Option<Self::Item>
{
let remainder = self.remainder.as_mut()?;
if let Some(next_delim) = remainder.find(self.delimiter) {
let until_delimiter = &remainder[..next_delim];
*remainder = &remainder[(next_delim + self.delimiter.len())..];
Some(self.delimiter)
} else {
self.remainder.take()
}
}
}
pub fn until_char(s: &str, c: char) -> &str
{
StrSplit::new(s, &format!("{}", c))
.next()
.expect("StrSplit always gives at least one result!")
}
#[test]
fn until_char_test()
{
assert_eq!(until_char("hello world", 'o'), "hell");
}
#[test]
fn it_works()
{
let haystack = "a b c d e";
let letters = StrSplit::new(haystack, " ");
assert!(letters.eq(vec!["a", "b", "c", "d", "e"].into_iter()));
}
#[test]
fn tail()
{
let haystack = "a b c d ";
let letters: Vec<_> = StrSplit::new(haystack, " ").collect();
assert_eq!(letters, vec!["a", "b", "c", "d", ""]);
}
```
:::
The compiler reports the following error:
```rust
$ cargo check
error[E0515]: cannot return value referencing temporary value
--> src\lib.rs:39:5
|
39 | StrSplit::new(s, &format!("{}", c))
| ^ ---------------- temporary value created here
| _____|
| |
40 | | .next()
41 | | .expect("StrSplit always gives at least one result!")
| |_____________________________________________________________^ returns a value referencing data owned by the current function
```
Modify the code as follows:
```diff=
...
impl<'haystack, 'delimiter> Iterator for StrSplit<'haystack, 'delimiter>
-where
- 'delimiter: 'haystack
{
type Item = &'haystack str;
fn next(&mut self) -> Option<Self::Item>
{
let remainder = self.remainder.as_mut()?;
if let Some(next_delim) = remainder.find(self.delimiter) {
let until_delimiter = &remainder[..next_delim];
*remainder = &remainder[(next_delim + self.delimiter.len())..];
- Some(self.delimiter)
+ Some(until_delimiter)
} else {
self.remainder.take()
}
}
}
-pub fn until_char(s: &str, c: char) -> &str
+pub fn until_char<'s>(s : &'s str, c: char) -> &'s str
{
StrSplit::new(s, &format!("{}", c))
.next()
.expect("StrSplit always gives at least one result!")
}
#[test]
fn until_char_test()
{
assert_eq!(until_char("hello world", 'o'), "hell");
}
#[test]
fn it_works()
{
let haystack = "a b c d e";
- let letters = StrSplit::new(haystack, " ");
+ let letters: Vec<_> = StrSplit::new(haystack, " ").collect();
- assert!(letters.eq(vec!["a", "b", "c", "d", "e"].into_iter()));
+ assert_eq!(letters, vec!["a", "b", "c", "d", "e"]);
}
...
```
:::spoiler {state="close"} Current code
```rust=
// #![warn(missing_debug_implementations, rust_2018_idioms, missing_docs)]
#[derive(Debug)]
pub struct StrSplit<'haystack, 'delimiter>
{
remainder: Option<&'haystack str>,
delimiter: &'delimiter str,
}
impl<'haystack, 'delimiter> StrSplit<'haystack, 'delimiter> {
pub fn new(haystack: &'haystack str, delimiter: &'delimiter str) -> Self
{
Self {
remainder: Some(haystack),
delimiter,
}
}
}
impl<'haystack, 'delimiter> Iterator for StrSplit<'haystack, 'delimiter>
{
type Item = &'haystack str;
fn next(&mut self) -> Option<Self::Item>
{
let remainder = self.remainder.as_mut()?;
if let Some(next_delim) = remainder.find(self.delimiter) {
let until_delimiter = &remainder[..next_delim];
*remainder = &remainder[(next_delim + self.delimiter.len())..];
Some(until_delimiter)
} else {
self.remainder.take()
}
}
}
pub fn until_char<'s>(s : &'s str, c: char) -> &'s str
{
StrSplit::new(s, &format!("{}", c))
.next()
.expect("StrSplit always gives at least one result!")
}
#[test]
fn until_char_test()
{
assert_eq!(until_char("hello world", 'o'), "hell");
}
#[test]
fn it_works()
{
let haystack = "a b c d e";
let letters: Vec<_> = StrSplit::new(haystack, " ").collect();
assert_eq!(letters, vec!["a", "b", "c", "d", "e"]);
}
#[test]
fn tail()
{
let haystack = "a b c d ";
let letters: Vec<_> = StrSplit::new(haystack, " ").collect();
assert_eq!(letters, vec!["a", "b", "c", "d", ""]);
}
```
:::
Run the tests again:
```rust
$ cargo test
Compiling strsplit v0.1.0 (/home/wilson/CrustOfRust/strsplit)
Finished test [unoptimized + debuginfo] target(s) in 0.28s
Running unittests src/lib.rs (target/debug/deps/strsplit-dd83426d9f98ae71)
running 3 tests
test until_char_test ... ok
test it_works ... ok
test tail ... ok
test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Doc-tests strsplit
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
```
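Why does the return value not need to be tied to `&format!("{}", c)`? Because the returned value never borrows from it, so there is no reason to tie the two lifetimes together.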
Q: can you put _ for delimiter lifetime to say it's not needed?
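A: Yes. The two signatures become:
* `'_` is an anonymous lifetime: it tells the compiler to pick some lifetime of its own, distinct from `'haystack`, that we do not need to name.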
```rust
impl<'haystack> Iterator for StrSplit<'haystack, '_>
```
* Here the compiler has to tie `'_` to `s`, because there is no other lifetime it could refer to.
```rust
pub fn until_char(s : &str, c: char) -> &'_ str
```
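Jon thinks the compiler ought to point out that such a signature is missing `'_`, so that it is explicit that a lifetime is being inferred. In practice, though, the code compiles without `'_`: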
```rust
pub fn until_char(s : &str, c: char) -> &str
```
`format!("{}", c)` returns a `String`, so this still involves a heap allocation. Next we get rid of that allocation.
## Generic delimiter (Delimiter trait)
[1:15:24](https://www.youtube.com/watch?v=rAl-9HwD858&t=4524s)
How do we avoid the allocation caused by `&format!("{}", c)`? Instead of converting `c` into a `String`, we want the delimiter to be any type that can find itself in a string.
First, add a new trait:
```rust=
pub trait Delimiter
{
fn find_next(&self, s: &str) -> Option<(usize, usize)>;
}
```
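Then we need to do the following:
1. Replace every `'delimiter` lifetime parameter with a generic type `D`
2. Add a `D: Delimiter` bound to the `impl` block that provides `next()`
3. Use `find_next()` from the new `Delimiter` trait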
```diff=
#[derive(Debug)]
-pub struct StrSplit<'haystack, 'delimiter>
+pub struct StrSplit<'haystack, D>
{
remainder: Option<&'haystack str>,
- delimiter: &'delimiter str,
+ delimiter: D,
}
-impl<'haystack, 'delimiter> StrSplit<'haystack, 'delimiter> {
+impl<'haystack, D> StrSplit<'haystack, D> {
- pub fn new(haystack: &'haystack str, delimiter: &'delimiter str) -> Self
+ pub fn new(haystack: &'haystack str, delimiter: D) -> Self
{
Self {
remainder: Some(haystack),
delimiter,
}
}
}
-impl<'haystack, 'delimiter> Iterator for StrSplit<'haystack, 'delimiter>
+impl<'haystack, D> Iterator for StrSplit<'haystack, D> where D: Delimiter
{
type Item = &'haystack str;
fn next(&mut self) -> Option<Self::Item>
{
let remainder = self.remainder.as_mut()?;
- if let Some(next_delim) = remainder.find(self.delimiter) {
+ if let Some((delim_start, delim_end)) = self.delimiter.find_next(remainder) {
- let until_delimiter = &remainder[..next_delim];
+ let until_delimiter = &remainder[..delim_start];
- *remainder = &remainder[(next_delim + self.delimiter.len())..];
+ *remainder = &remainder[delim_end..];
Some(until_delimiter)
} else {
self.remainder.take()
}
}
}
...
```
Next, implement the `Delimiter` trait for `&str`:
```rust=
impl Delimiter for &str
{
fn find_next(&self, s: &str) -> Option<(usize, usize)>
{
s.find(self).map(|start| (start, start + self.len()))
}
}
```
Also change `&format!("{}", c)` to `&*format!("{}", c)` (which has type `&str`), because `Delimiter` is implemented for `&str`, not for `&String`.
:::info
:bulb: How to turn a `String` into a `&str`
Excerpt from [What does &* combined together do in Rust?](https://stackoverflow.com/questions/41273041/what-does-combined-together-do-in-rust):
```rust=
let s = "hi".to_string(); // : String
let a = &s;
```
What's the type of a? It's simply &String! This shouldn't be very surprising, since we take the reference of a String. Ok, but what about this?
```rust=
let s = "hi".to_string(); // : String
let b = &*s; // equivalent to `&(*s)`
```
What's the type of b? It's &str! Wow, what happened?
...
:::
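For comparison, here is a sketch of several spellings that produce a `&str` from a `String`. The function `all_views_equal` is hypothetical; the conversions themselves are standard.

```rust
/// Builds a `&str` view of the same `String` in four ways and checks that
/// they all agree.
pub fn all_views_equal(s: String) -> bool {
    let a: &str = &*s; // deref the String to str, then re-borrow
    let b: &str = s.as_str(); // explicit method
    let c: &str = &s[..]; // full-range slice
    let d: &str = &s; // deref coercion, driven by the type annotation
    a == b && b == c && c == d
}

#[test]
fn string_to_str_spellings_agree() {
    assert!(all_views_equal("hi".to_string()));
}
```

The last form relies on a known target type. When the parameter is a generic `D`, there is no such target type, which is why `until_char` spells out `&*format!(...)`.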
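Because of these changes the code is now generic over any `D`. `D` can be a reference, or an owned type that lives as long as it likes; the only requirement is that the type implements the `Delimiter` trait.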
:::spoiler {state="close"} Current code (compiles, but still heap-allocates; the goal at this step is only to make `delimiter` generic)
```rust=
#[derive(Debug)]
pub struct StrSplit<'haystack, D>
{
remainder: Option<&'haystack str>,
delimiter: D,
}
impl<'haystack, D> StrSplit<'haystack, D> {
pub fn new(haystack: &'haystack str, delimiter: D) -> Self
{
Self {
remainder: Some(haystack),
delimiter,
}
}
}
pub trait Delimiter
{
fn find_next(&self, s: &str) -> Option<(usize, usize)>;
}
impl<'haystack, D> Iterator for StrSplit<'haystack, D> where D: Delimiter
{
type Item = &'haystack str;
fn next(&mut self) -> Option<Self::Item>
{
let remainder = self.remainder.as_mut()?;
if let Some((delim_start, delim_end)) = self.delimiter.find_next(remainder) {
let until_delimiter = &remainder[..delim_start];
*remainder = &remainder[delim_end..];
Some(until_delimiter)
} else {
self.remainder.take()
}
}
}
impl Delimiter for &str
{
// Here `&self` has type `&&str`
fn find_next(&self, s: &str) -> Option<(usize, usize)>
{
s.find(self).map(|start| (start, start + self.len()))
/*
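What does s.find(self) do?
find is a method on str: you give it a pattern, and it tells you the byte index where the pattern starts.
find returns Option<index of the match>
map(...) returns None if find returned None; otherwise it turns the value inside the Some into (start, start + self.len())
Q: why self.len() and not s.len()?
A: Because self is what we are searching for, self.len() is the length of the delimiter, so start + self.len() gives the end position of the delimiter we just found, alongside its start position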
*/
}
}
pub fn until_char(s : &str, c: char) -> &str
{
StrSplit::new(s, &*format!("{}", c))
.next()
.expect("StrSplit always gives at least one result!")
}
#[test]
fn until_char_test()
{
assert_eq!(until_char("hello world", 'o'), "hell");
}
#[test]
fn it_works()
{
let haystack = "a b c d e";
let letters: Vec<_> = StrSplit::new(haystack, " ").collect();
assert_eq!(letters, vec!["a", "b", "c", "d", "e"]);
}
#[test]
fn tail()
{
let haystack = "a b c d ";
let letters: Vec<_> = StrSplit::new(haystack, " ").collect();
assert_eq!(letters, vec!["a", "b", "c", "d", ""]);
}
```
:::
Next, implement the `Delimiter` trait for `char`:
```rust=
impl Delimiter for char
{
fn find_next(&self, s: &str) -> Option<(usize, usize)>
{
s.char_indices()
.find(|(_, c)| c == self)
.map(|(start, _)| (start, start + 1))
}
}
```
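1. `char_indices()`: iterates over the string, yielding each character together with its byte index
2. `find(...)`: looks for the character we want
3. `map(...)`: maps the result of `find` into what we need: `(start, start + 1)`. The `+ 1` assumes the `char` is 1 byte long, which holds only for ASCII characters.
Then replace `StrSplit::new(s, &*format!("{}", c))` with `StrSplit::new(s, c)`, and we have a version that does not heap-allocate.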
:::spoiler {state="close"} Current code
```rust=
#[derive(Debug)]
pub struct StrSplit<'haystack, D>
{
remainder: Option<&'haystack str>,
delimiter: D,
}
impl<'haystack, D> StrSplit<'haystack, D> {
pub fn new(haystack: &'haystack str, delimiter: D) -> Self
{
Self {
remainder: Some(haystack),
delimiter,
}
}
}
pub trait Delimiter
{
fn find_next(&self, s: &str) -> Option<(usize, usize)>;
}
impl<'haystack, D> Iterator for StrSplit<'haystack, D> where D: Delimiter
{
type Item = &'haystack str;
fn next(&mut self) -> Option<Self::Item>
{
let remainder = self.remainder.as_mut()?;
if let Some((delim_start, delim_end)) = self.delimiter.find_next(remainder) {
let until_delimiter = &remainder[..delim_start];
*remainder = &remainder[delim_end..];
Some(until_delimiter)
} else {
self.remainder.take()
}
}
}
impl Delimiter for &str
{
fn find_next(&self, s: &str) -> Option<(usize, usize)>
{
s.find(self).map(|start| (start, start + self.len()))
}
}
impl Delimiter for char
{
fn find_next(&self, s: &str) -> Option<(usize, usize)>
{
s.char_indices()
.find(|(_, c)| c == self)
.map(|(start, _)| (start, start + 1))
}
}
pub fn until_char(s : &str, c: char) -> &str
{
StrSplit::new(s, c)
.next()
.expect("StrSplit always gives at least one result!")
}
#[test]
fn until_char_test()
{
assert_eq!(until_char("hello world", 'o'), "hell");
}
#[test]
fn it_works()
{
let haystack = "a b c d e";
let letters: Vec<_> = StrSplit::new(haystack, " ").collect();
assert_eq!(letters, vec!["a", "b", "c", "d", "e"]);
}
#[test]
fn tail()
{
let haystack = "a b c d ";
let letters: Vec<_> = StrSplit::new(haystack, " ").collect();
assert_eq!(letters, vec!["a", "b", "c", "d", ""]);
}
```
:::
## char length utf8
[1:23:14](https://www.youtube.com/watch?v=rAl-9HwD858&t=4994s)
The `1` in `start + 1` should be replaced with `self.len_utf8()`, because a `char` takes 1 to 4 bytes when encoded as UTF-8:
```rust=
impl Delimiter for char
{
fn find_next(&self, s: &str) -> Option<(usize, usize)>
{
s.char_indices()
.find(|(_, c)| c == self)
.map(|(start, _)| (start, start + self.len_utf8()))
}
}
```
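A sketch of why the byte length matters, using a standalone function (`find_char` is a made-up name) with the same body as the `char` impl above.

```rust
/// Finds `needle` in `s` and returns the byte range it occupies.
pub fn find_char(s: &str, needle: char) -> Option<(usize, usize)> {
    s.char_indices()
        .find(|(_, c)| *c == needle)
        .map(|(start, _)| (start, start + needle.len_utf8()))
}

#[test]
fn multi_byte_delimiter() {
    // U+00F1 (n with tilde) is encoded as two bytes in UTF-8.
    let needle = '\u{00F1}';
    assert_eq!(needle.len_utf8(), 2);
    let s = "a\u{00F1}b";
    let (start, end) = find_char(s, needle).unwrap();
    assert_eq!((start, end), (1, 3));
    // Slicing at `end` is valid.
    assert_eq!(&s[end..], "b");
    // `start + 1` falls inside the two-byte character, so slicing there would panic.
    assert!(!s.is_char_boundary(start + 1));
}
```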
## Standard library split
[1:25:30](https://www.youtube.com/watch?v=rAl-9HwD858&t=5130s)
Everything implemented today has a more complete counterpart in the standard library:
* [`str::find()`](https://doc.rust-lang.org/std/primitive.str.html#method.find)
* [`str::split()`](https://doc.rust-lang.org/std/primitive.str.html#method.split)
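In the standard library, the lifetime of the string being split is also named `'a`, and the delimiter (here a `Pattern`, which supports more sophisticated matching) is likewise generic.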
```
pub fn split<'a, P>(&'a self, pat: P) -> Split<'a, P>
where
P: Pattern<'a>,
```
[`Trait std::str::pattern::Pattern`](https://doc.rust-lang.org/std/str/pattern/trait.Pattern.html)
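All of this could be done by simply calling the standard library's `split()`. The purpose of this session is to explore lifetimes, not to teach the standard library API or to publish this code as a crate, since the standard library implementation is already complete.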
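For reference, a short sketch of the standard library's `str::split` producing the same results as the tests in this note. The wrapper `split_words` is a made-up name.

```rust
/// Splits on single spaces using the standard library.
pub fn split_words(s: &str) -> Vec<&str> {
    s.split(' ').collect()
}

#[test]
fn std_split_matches_our_tests() {
    assert_eq!(split_words("a b c d e"), vec!["a", "b", "c", "d", "e"]);
    // A trailing delimiter yields a final empty piece, as in the `tail` test.
    assert_eq!(split_words("a b c d "), vec!["a", "b", "c", "d", ""]);
    // `&str` patterns and closures are accepted as well.
    assert_eq!("a, b".split(", ").collect::<Vec<_>>(), vec!["a", "b"]);
    assert_eq!(
        "a1b2c".split(|c: char| c.is_ascii_digit()).collect::<Vec<_>>(),
        vec!["a", "b", "c"]
    );
}
```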
## Q&A
[1:27:39](https://www.youtube.com/watch?v=rAl-9HwD858&t=5259s)
Q: Why can't you create a String from the str fat pointer? You already know where the bytes are in memory and the length of it.
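A: Because you do not own the memory that the `str` fat pointer points to. A `String` assumes it owns the underlying memory: that it must free the memory when it is dropped, and that it can grow or shrink the allocation as needed. Taking an arbitrary pointer and length and deciding that you now own it would be incorrect, because you do not have ownership of the underlying memory.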
Q: Don't you think Rust is kind of less readable than other languages like Go, Python? the syntax is kind of different I guess?
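A: Not really. If you only use the features that other languages also have, Rust is just as readable. Rust adds some extra features that require extra syntax, and that extra syntax is what lets Rust do things other languages cannot.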
Q: The pattern and the haystack seem to be sharing the same lifetime 'a
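A: Because the Rust developers had not yet settled on a design, so the two were made the same for now. Addendum: the lifetime on [`Searcher`](https://doc.rust-lang.org/std/str/pattern/trait.Searcher.html) is the lifetime of the string that the `Pattern` is searching in.
Q: when you see something like Type<'x>, how do you know what x is the lifetime of?
A: You cannot tell what `'x` is the lifetime of, just as you cannot tell what a type parameter `T` is when you see it.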
Q: what do you think of Rust having a future in the industry?
A: See this earlier [talk](https://youtu.be/DnT-LUQgc7s?t=3485).
Q: could you publish this as a gist so we can play with it?
A: [github repo](https://gist.github.com/jonhoo/2a7fdcf79be03e51a5f95cd326f2a1e8)
Q: Can you make it work with stdin() as an input instead of a &str? :)
A: It is not easy to implement, because stdin is a stream rather than a fixed buffer, so you cannot seek in it. But you can try implementing it yourself.
Q: How do you think generic associates types will improve trait definitions? (aka think StreamIterator that allows complex iterators with borrowed items)
A: It does not help much, because you would need more [existential types](https://fenix0.com/rust-existential-type/). It may help you clone a bit less, though.
Q: do you intend to do some lectures for newcomers to Rust from other languages and if not, is there some resources/streamers that you would recommend?
A: There are no plans for beginner videos. This series focuses on specific topics, and its target audience is intermediate Rust programmers.
:::success
:pencil2: GitHub Comment
[Q : ](https://gist.github.com/jonhoo/2a7fdcf79be03e51a5f95cd326f2a1e8?permalink_comment_id=3301724#gistcomment-3301724)Sorry if this was already asked elsewhere, but I am still not sure why:
```rust=
if let Some(ref mut remainder) = self.remainder { // (1) ok
let ref mut remainder = self.remainder?; // (2) nok
if let Some(remainder) = &mut self.remainder { // (3) ok
let remainder = &mut self.remainder?; // (4) nok
if let Some(remainder) = self.remainder.as_mut() { // ok
let remainder = self.remainder.as_mut()?; // ok
```
I see that if a type is Copy, it gets copied instead of moved. But aren't we dealing with same types in 1 vs 2, 3 vs 4?
[A : ](https://gist.github.com/jonhoo/2a7fdcf79be03e51a5f95cd326f2a1e8?permalink_comment_id=3302571#gistcomment-3302571)The `?` in 2 and 4 copies the entire `Option` when the inner type is `Copy`. What we then take a mutable reference to is what is inside of that copy, not the original Option. In the last case, we turn `Option<T>` into `Option<&mut T>`, which, when copied, still yields a mutable reference into the original Option. Does that help?
[Q : ](https://gist.github.com/jonhoo/2a7fdcf79be03e51a5f95cd326f2a1e8?permalink_comment_id=3691447#gistcomment-3691447)Hey @jonhoo, how common is it to see or do:
```rust=
impl Trait for &Type {...} //?
// such as
impl Delimiter for &str {...}
```
Additionally, if you intend to use both Type and &Type as implementers of Trait, for instance:
```rust=
let x1: Vec<_> = StrSplit::new(haystack, MyOtherDelimiterFlavor::new(...)).collect();
let x2: Vec<_> = StrSplit::new(haystack, &MyOtherDelimiterFlavor::new(...)).collect();
```
would one then need to write redundant impl blocks, or is there some syntactic sugar for controlling this?
[A : ](https://gist.github.com/jonhoo/2a7fdcf79be03e51a5f95cd326f2a1e8?permalink_comment_id=3692966#gistcomment-3692966)It's actually fairly common, precisely to improve the ergonomics of using the trait as you indicate. In general, if you implement a trait that only takes `&self` it's pretty reasonable to implement the trait for `&MyType`, `&mut MyType` and `Box<MyType>`. If a trait method takes `&mut self`, skip `&MyType`. But of course, the downside is that now if the trait changes in the future, more breakage will ensue since your consumers expected to be able to transparently use `&` (or `&mut`).
:::
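On the question above about redundant `impl` blocks: one common technique, not mentioned in the reply, is a blanket impl over `&T`. The sketch below uses a made-up trait `Shout` and type `Dog` instead of `Delimiter`, so it makes no claim about how the gist itself is written.

```rust
pub trait Shout {
    fn shout(&self) -> String;
}

pub struct Dog;

impl Shout for Dog {
    fn shout(&self) -> String {
        String::from("WOOF")
    }
}

/// One blanket impl covers `&T` for every `T: Shout`, so no per-type
/// `impl Shout for &Dog` is needed.
impl<T: Shout + ?Sized> Shout for &T {
    fn shout(&self) -> String {
        // `self` is `&&T`; dereference twice to call the impl on `T`.
        (**self).shout()
    }
}

/// Generic consumer, analogous to `StrSplit::new` taking any `D: Delimiter`.
pub fn call<S: Shout>(s: S) -> String {
    s.shout()
}

#[test]
fn owned_and_borrowed_both_work() {
    assert_eq!(call(Dog), "WOOF");
    assert_eq!(call(&Dog), "WOOF");
}
```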
## To be organized
1. ~~0:10:22~~
2. ~~0:18:55~~
3. ~~0:24:43~~
4. ~~Why is the `&'a str` inside `Option` a `Copy` type?~~