# The Block I/O Layer
主要分別為存取到底是不是randomly的
(whether the device can seek to one position from another)
* Block devices (random)
* random access of fixed-size chunks of data(block)
* hard disk
* floppy drives
* blu-ray readers
* flash memory
* character devices (serial)
* accessed as a stream of sequential data, one byte after another.
* serial ports
* keyboards
block device比char divice複雜得多
# Anatomy of a Block Device
* block device
* sector : 最小的位址單元
* 512bytes最常見
* 要看實際HW設計
* 不能在往下提取更小的單元
* 但同時間可提取多個sectors
* block
* software的最小邏輯位址單元
* filesystem的抽象化
* filesystems can be accessed only in multiples of a block
* 雖然disk上最小的執行單元是sector,但kernel總是用blocks來做各種操作
* block size need to be a multiple of a sector
* and need to be power of 2
* block還被規定不能比page size大
* 綜上所述
* Common block sizes are
* 512 bytes
* 1k bytes
* 4kbytes
有些人常常會創造出很多名詞讓人非常迷惑,比如說把sectors叫做hard sectors或 "device blocks",然後把blocks叫做filesystem blocks或 I/O blocks,這個章節不會這樣叫
sectors就是sectors, blocks就是blocks
sector得重要性在於,所有的 device I/O都是以sectors為基礎完成的,the higher-level concept used by the kernel--blocks--is built on top of sectors.

# Buffers and Buffer Heads
Buffer Heads的目的就是把memory buffer跟 disk blocks之間的關係描述清楚
When a block is stored in memory—say, after a read or pending a write—it is stored in a
> buffer
>
* buffer
* 每個buffer 1對1 associated with one block
* buffer就代表disk block in memory
* 前面有提到a block is several sectors but no more than a page size
* a page can hold one or more blocks in memory
* buffer descriptor
* buffer head
* struct buffer_head
* <linux/buffer_head.h>

* b_state
* the state of this particular buffer
* the legal flags are stored in the bh_state_bits enum


* bh_state_bits
* 
* 
* b_count
* 
physical block on disk跟 buffer in memory的關聯用b_bdev的b_blocknr-th logical block來描述
kernel 2.6之前這個structure是主要的,但有缺點而且kernel比較prefer用Page來工作,buffer head的最小單位是blocks這樣很沒效率,所以kernel 2.6之後 block I/O的控制全部都直接透過pages跟address space了,在16章 The Page Cache and Page Writeback有討論
* address_space
* pdflush daemons
還有一個缺點是,buffer head只描述了一個單一的buffer. 當在做I/O operations時, kernel只能把一個大的block I/O operations切成好幾個buffer_head來做事,This results in needless overhead and space consumption,所以下面討論改良版
# The bio Structure
* bio structure
* basic container for block I/O in kernel
* <linux/bio.h>
* 他把block I/O operations切成 a list of segments
* a segment is a chunk of buffer that is contiguous in memory
* 所以個別的buffer不用在memory裡是連續的
* (阿不就是多了linklist來描述的好處....)
* Vector I/O such as this is called
* scatter-gather I/O

bio structure的主要目的是代表 in-flight(執行中的動態) block I/O operation
most important fields are
* bi_io_vec
* bi_vcnt
* bi_idx

# I/O vectors
* bi_io_vec
* points to an array of bio_vec structures
* bio_vec
* vector of the form <page, offset, len>
* 描述segment
* 是放在哪個physical page裡

總結來說
* 一個block I/O request可以用bio structure表達
* 每個request包含了一個到數個的blocks
* 被bio_vec用array的形式存起來
* bio_vec表達了segment跟physical page的對應
# Request Queues
see <linux/blkdev.h>
Block devices用 request queues來存放pending的block I/O requests.
* block device driver負責處理相關事宜
* 空了滿了要怎樣沙小的
* each element in queue are request
* 用 struct request表達
* 雖然disk的blocks必須是相鄰的,但放到memory後不用了(前面有提到用linked list串起來了)
* each bio structure can describe multiple segments
* each request can be composed of multiple bio structures
ex:
* request
* bio1
* segment1
* segment2
* segment3
* bio2
* bio3
# I/O Schedulers
sending out requests to the block devices in the order that the kernel issues them.
因為disk的seek動作太慢,kernel不直接送block I/O requests to the disk.kernel 用merging and sorting來改善performance(I/O Schedulers的工作)
* process scheduler
* 把工作抽象化成process
* I/O Scheduler
* 把工作抽象化成block I/O requests
# The Job of an I/O Scheduler
管理block device's request queue.
有點像process scheduler
* decides the order of requests in the queue
* and what time each request is dispatched to the block device
* 終極目標就是要減少使用disk seeks的時間
* results in greater global throughput
2 primary actions to minimize seeks
* Merging
* 把多個requests合併成一個
* 考慮一個例子,某個filesystem現在送了一個request給queue, 說從一個file去read a chunk of data好了
* scheduler會去找queue裡面是不是有鄰近的sector的request,有就把它們合併成同一個,總之就是減少disk seek time的次數,有相鄰就一次連續的讀出來就好。(移動磁頭是最耗時間的)
* Sorting
* 如果沒有可以合併的,把他丟到queue的最尾端等待處理
這有點像電梯演算法
This is similar to the algorithm employed in elevators—elevators do not jump all over, wildly, from floor to floor. Instead, they try to move gracefully in a single direction.
When the final floor is reached in one direction, the elevator can reverse course and move in the other direction. Because of this similarity, I/O schedulers (or sometimes just their sorting algorithm) are called elevators.
# The Linus Elevator
The Linus Elevator performs both merging and sorting, When a request is added to the queue
* first checked against every other pending request to see whether it is a possible candidate for merging
* front merging
* If the new request immediately proceeds an existing request
* back merging
* if the new request immediately precedes an existing request
沒辦法merge再insert進queue內,插入要參考sector等等的資訊插的聰明點
# The Deadline I/O Scheduler





# The Anticipatory I/O Scheduler



# The Complete Fair Queuing I/O Scheduler

# The Noop I/O Scheduler


# I/O Scheduler Selection

隨時可以在command line 切換 I/O scheduling演算法