分析「快慢指標」

在 LeetCode 上，「快慢指標」(fast and slow pointers) 是常見處理鏈結串列的手法，不過以「快慢指標」的方式來取得鏈結串列的中點，是否較「單一指標」走訪二遍能更快呢？其實二者具備相同的時間複雜度 \(O(n)\)，本文嘗試從編譯器最佳化和記憶體階層架構，觀察二者的表現。

對比演算法的 C 語言描述
首先，作為鏈結串列節點的結構體為：

struct list_node {
    int val;
    struct list_node *next;
};

「快慢指標」方式的 C 語言描述為：

struct list_node *middle_node(struct list_node *head)
{
    struct list_node *slow, *fast;
    slow = fast = head;
    while (fast && fast->next) {
        slow = slow->next;
        fast = fast->next->next;
    }
    return slow;
}

圖示如下：

digraph "Circular Linked List" {
    rankdir = "LR";
    node1[shape = circle, label = "1"];
    node2[shape = circle, label = "2"];    
    node3[shape = circle, label = "3"];
    node4[shape = circle, label = "4"];
    node5[shape = circle, label = "5"];
    
    node0__[shape = none, label = "fast", fontcolor = "blue"];
    node1__[shape = circle, label = "1", style = invis];
    node2__[shape = circle, label = "2", style = invis];    
    
    node0_[shape = none, label = "slow", fontcolor = "red"];
    node1_[shape = circle, label = "1", style = invis];     
    node0_ -> node1_;
    node0__ -> node1__ -> node2__;
    
    node1 -> node2;
    node2 -> node1;
    node2 -> node3;
    node3 -> node2;
    node3 -> node4;
    node4 -> node3;
    node4 -> node5;
    node5 -> node4;
}

「單一指標」方式的 C 語言描述為：

struct list_node *middle_node(struct list_node *head)
{
    struct list_node *cur = head;
    int n = 0;
    while (cur) {
        ++n;
        cur = cur->next;
    }
    int k = 0;
    cur = head;
    while (k < n / 2) {
        ++k;
        cur = cur->next;
    }
    return cur;
}

假設有一條長度為 9 個節點的鏈結串列，記憶體存取的次數為:

使用「快慢指標」方式，快指標完整的走訪鏈結串列 9 個節點，慢指標走訪前 5 個節點
使用「單一指標」方式，首先走訪鏈結串列一次共 9 個節點，再逐一走訪至中點，亦即 5 個節點

兩種方式都涉及對鏈結串列的 \(9 + 5 = 14\) 個節點的記憶體位置進行存取，二者的時間複雜度皆為：
\[ O(n) + O\left(\frac{n}{2}\right) = O\left(\frac{3}{2}n\right). \]

接著，對照上述二者的編譯器輸出。用 GCC 分別編譯上述二個程式後，可得以下組合程式碼:

「快慢指標」方式的組合程式碼：

    movq	%rdi, %rax
	jmp	.L2
.L4:
	movq	8(%rax), %rax ; fast = tmp->next
	movq	8(%rdi), %rdi ; slow = slow->next
.L2:
	testq	%rax, %rax    ; 判斷 fast == 0 ？ 亦即判斷 fast == NULL ？
	je	.L3 ; 若 fast == NULL 則跳躍至 .L3
	movq	8(%rax), %rax ; 把 tmp = fast->next
	testq	%rax, %rax ; 判斷 tmp == NULL ?
	jne	.L4 ; 若 tmp 不為 NULL，則跳躍至 .L4
.L3:
	movq	%rdi, %rax ; 把返回值複製至 %rax
	ret

「單一指標」方式的組合程式碼：























	movq	%rdi, %rax ; cur = head
	movl	$0, %edx   ; n = 0
	jmp	.L2
.L3:
	addl	$1, %edx      ; ++n
	movq	8(%rax), %rax ; cur = cur->next
.L2:
	testq	%rax, %rax ; cur == NULL ?
	jne	.L3        ; 若 cur 不為 NULL，則跳躍至 .L3
	movl	$0, %ecx   ; k = 0
	jmp	.L4
.L5:
	addl	$1, %ecx ; ++k
	movq	8(%rdi), %rdi ; cur = cur->next
.L4:
	movl	%edx, %eax ; int tmp = n
	shrl	$31, %eax  ; tmp = tmp < 0 ? 1 : 0
	addl	%edx, %eax ; tmp +=n
	sarl	%eax       ; tmp >>= 1
	cmpl	%ecx, %eax ; tmp - k
	jg	.L5
	movq	%rdi, %rax
	ret

「單一指標」方式的組合程式碼中，第 16 行至第 19 行的指令是為了計算 n / 2 的結果，因為除數是 2 的冪，於是除法可取代為位移運算，即使開啟更積極的編譯器最佳化 (如 -O3)，上述的程式碼沒有顯著的指令差異。

排除編譯器最佳化的因素後，我們來看現代處理器的影響。在具有良好時間局部性 (temporal locality) 的程式中，被參照過一次的記憶體位置可能不久後仍會被多次參照；在具有良好空間局部性 (spatial locality) 的程式中，若一個記憶體位置被參照過，那程式很可能在隨後參照其鄰近的記憶體位置。

從空間局部性來看：鏈結串列中的節點儲存的位置並無規律，亦即每個節點在記憶體中分散地保存，都具有較差的空間局部性，於是可推論二者演算法的空間局部性表現相同。

從時間局部性來看：

「快慢指標」方式的迴圈中，鏈結串列的前半部分中，快指標存取完成後、迴圈結束前，慢指標已存取相同的節點．同一個節點被二次存取的時間間隔較近。
「單一指標」方式的迴圈中，鏈結串列的前半部分中，同一個節點的第二次存取在第一次走訪完成之後，同一個節點被兩次存取的時間間隔較遠。

於是，「快慢指標」方式具有更好的時間局部性，而「單一指標」方式的第二次走訪鏈結串列時，可供其利用的「前半段鏈結串列中的節點」的快取更容易被鏈結串列的「後半段鏈結串列中的節點」的快取所替換，由此將更頻繁的觸發快取錯失 (cache miss)。更甚者，若鏈結串列的節點數量增大，則在相同的快取中能快取的節點更少，快取的替換將更頻繁，「單一指標」方式相較於「快慢指標」方式，蒙受更大的效能衝擊，因為前半段的鏈結串列的節點的快取更容易被替換。

延伸閱讀: 現代處理器設計: Cache 原理和實際影響

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`	在筆記中貼入程式碼
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.