fio
profiler
htop
iostat
perf
strace
Fio spawns a number of threads or processes doing a particular type of I/O action as specified by the user.
job files
#I/O type
Defines the I/O pattern issued to the file(s). We may only be reading sequentially from this file(s), or we may be writing randomly. Or even mixing reads and writes, sequentially or randomly. Should we be doing buffered I/O, or direct/raw I/O?
#Block size
In how large chunks are we issuing I/O? This may be a single value, or it may describe a range of block sizes.
#I/O size
How much data are we going to be reading/writing.
#I/O engine
How do we issue I/O? We could be memory mapping the file, we could be using regular read/write, we could be using splice, async I/O, or even SG (SCSI generic sg).
#I/O depth
If the I/O engine is async, how large a queuing depth do we want to maintain?
#Target file/device
How many files are we spreading the workload over.
#Threads, processes and job synchronization
How many threads or processes should we spread this workload over.
; -- start job file --
[random-writers]
ioengine=libaio
iodepth=4
rw=randwrite
bs=32k
direct=0
size=64m
numjobs=4
; -- end job file --
The job file above means: use libaio (asynchronous I/O) with an I/O depth of 4 in-flight units, do random writes with a 32k block size and buffered I/O (direct=0), stop once the full size (64m) has been written, and fork() 4 such jobs in total.
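The same workload can also be launched straight from the command line; a rough equivalent of the job file above (a sketch, option for option):

fio --name=random-writers --ioengine=libaio --iodepth=4 --rw=randwrite \
    --bs=32k --direct=0 --size=64m --numjobs=4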
| | libaio | psync |
|---|---|---|
| CPU% (htop) | 106 | 37.9 |
| MEM% (htop) | 10.8 | 10.8 |
| IOPS (k) | 34.4 | 18.6 |
| wMiB/s | 134 | 72.7 |
| | libaio | psync |
|---|---|---|
| IOPS (k) | 34.4 to 45.0 | 18.6 to 18.6 |
| wMiB/s | 134 to 180 | 72.7 to 72.7 |
| ioengine | times |
|---|---|
| libaio | 1424 |
| psync | 372 |
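For reference, the psync run can be written as a variant of the same job section (a sketch; I am assuming only the ioengine changes, and iodepth is irrelevant for a synchronous engine):

; psync variant of [random-writers]
[random-writers-psync]
ioengine=psync
rw=randwrite
bs=32k
direct=0
size=64m
numjobs=4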
| | non-buffered | buffered |
|---|---|---|
| IOPS (k) | 34.4 | 173 |
| wMiB/s | 134 | 676 |
| buffering | times |
|---|---|
| non-buffered | 785 |
| buffered | 11,517 |
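The buffered vs. non-buffered comparison comes down to the direct flag; a minimal sketch, assuming the same job parameters as above:

# buffered I/O (goes through the page cache)
fio --name=buffered --ioengine=libaio --iodepth=4 --rw=randwrite --bs=32k --size=64m --direct=0

# non-buffered I/O (O_DIRECT, bypasses the page cache)
fio --name=non-buffered --ioengine=libaio --iodepth=4 --rw=randwrite --bs=32k --size=64m --direct=1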
| | 1k | 4k | 4M |
|---|---|---|---|
| IOPS (k) | 37.3 | 37.2 | 0.116 |
| wMiB/s | 36.4 | 145 | 464 |
| block size | times |
|---|---|
| 1k | 2337 |
| 4k | 818 |
| 4M | 44 |
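The block-size sweep can be scripted; a sketch assuming the same job parameters, writing one output file per run:

for bs in 1k 4k 4M; do
    fio --name=bs-$bs --ioengine=libaio --iodepth=4 --rw=randwrite \
        --bs=$bs --direct=0 --size=64m --output=fio-bs-$bs.log
done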
| syscall | 1k | 4k | 4M |
|---|---|---|---|
| nanosleep | 7738/2967 | 7060/2965 | 291/2874 |
| shmdt | 5000/1 | 5000/1 | 52000/1 |
We can see that the larger the block size, the less time is spent sleeping, and the more time goes into handling the shared-memory mapping (shmdt).
Cell format: (usec/call) / (number of calls).
For example, 7738/2967 means that system call was invoked 2967 times, averaging 7738 usec per call.
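The calls and usec/call figures are the kind of numbers an strace summary reports; a minimal sketch of collecting them from a running worker (picking the PID via pgrep is an assumption):

# attach to the newest fio process, follow its threads, and print a per-syscall
# summary (calls, errors, usecs/call) when tracing is stopped with Ctrl-C
sudo strace -c -f -p "$(pgrep -n fio)"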
To save space, a sparse file only records the file size in its header (metadata); no data blocks are allocated up front.
A fallocated file asks the disk for space, but the space is not zeroed; the extents are only marked as uninitialized, so allocation is fast and the remaining work is deferred until the blocks are actually used.
Performance-wise, regular > fallocate > sparse: fallocate still has to deal with the uninitialized mark, while sparse has to handle metadata (block allocation) on write.
For sparse and fallocated files, performance also changes as the number of writes increases.
Although the sparse file issues more I/O requests, its IOPS is lower; many of those requests go to writing metadata, so few of them write the actual data.
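To reproduce this comparison, the three kinds of files can be created as follows (file names and the 1G size are arbitrary examples):

truncate -s 1G sparse.img          # sparse: size recorded in metadata only
fallocate -l 1G falloc.img         # fallocated: blocks reserved, marked uninitialized
dd if=/dev/zero of=regular.img bs=1M count=1024   # regular: blocks actually written

# compare apparent size vs. real disk usage
du -h --apparent-size sparse.img falloc.img regular.img
du -h sparse.img falloc.img regular.img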
Next, change rw from write to read.
interactive process viewer
Shows in real time the system load caused by each process, including CPU utilization, memory usage, and so on.
I find it a good first step for seeing which process is consuming the most resources.
Q: In the CPU usage bars at the top, what does each color mean?
Q: What does the load average mean?
Q: What can we read from the memory bar?
Q: What information does each process column provide?
PID: process ID number.
USER: The process’s owner.
PR: The process's priority; the smaller the number, the higher the priority.
NI: The nice value of the process, which affects its priority.
VIRT: How much virtual memory the process is using.
RES: How much physical RAM the process is using (in KB).
SHR: How much shared memory the process is using.
S: The current status of the process (zombie, sleeping, running, uninterruptible sleep, or traced).
%CPU: The percentage of the processor time used by the process.
%MEM: The percentage of physical RAM used by the process.
TIME+: How much processor time the process has used.
COMMAND: The name of the command that started the process.
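htop can also be narrowed to just the processes under test; a small sketch (assuming the workers are fio processes):

# -p takes a comma-separated PID list; pgrep -d, builds one
htop -p "$(pgrep -d, fio)"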
monitoring system input/output device loading by observing the time the devices are active in relation to their average transfer rates.
iostat -x 1
: show extended statistics, refreshing every second
Used to analyze the I/O performance of each device.
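A variant limited to one device and reporting throughput in MB/s (the device name sda is just a placeholder):

# -d: device report only, -x: extended stats, -m: MB/s; refresh every second
iostat -dxm sda 1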
Interpretations of %nice and %steal:
%nice - the percentage of CPU time spent running user-level processes at an altered (nice) priority
%steal - the percentage of time the virtual CPU(s) spent in involuntary wait while the hypervisor was servicing another virtual processor
perf works with three kinds of events: events raised by the hardware PMU (supported by the CPU); software events raised by the kernel; and events triggered by hooks (tracepoints) left in the Linux kernel source code.
(note) The related block-layer events can be found in include/trace/events/block.h
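For example, those block-layer tracepoints can be listed and sampled with perf (a sketch; block:block_rq_issue is one of the events defined in that header):

sudo perf list 'block:*'                                # list block-layer tracepoint events
sudo perf record -e block:block_rq_issue -a sleep 10    # record block request issues system-wide for 10s
sudo perf report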
perf record -g -a $1 $2 # Run a command and record its profile into perf.data
perf script > /tmp/perf.raw.events # Read perf.data (created by perf record) and display trace output
stackcollapse-perf.pl /tmp/perf.raw.events > /tmp/perf.events.folded
flamegraph.pl /tmp/perf.events.folded > perf.events.svg
A flame graph visualizes call stacks based on the output of a profiler (perf_events, DTrace, ...).
The x-axis is not ordered by time; from left to right the frames are sorted alphabetically. A wider box means that function appears more often in the sampled stacks.
The y-axis is stack depth, with the root at the bottom and the leaves at the top; so if a is below b, then a is b's parent (caller).
From stackcollapse-perf.pl we can see that it parses the raw events and folds each call stack into a single line with a sample count. This is also why time disappears and is replaced by frequency.
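To see the folded form directly, inspect the intermediate file from the pipeline above; each line has the form comm;frame1;frame2;... count:

head -n 3 /tmp/perf.events.folded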
-c / --summary-only: count time, calls, and errors for each system call and report a summary on program exit, instead of the regular per-call output.
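A minimal sketch of using strace on a fio run: first get the per-syscall summary with -c, then time individual calls with -T (job.fio is a placeholder job file, and the syscall names passed to -e are only examples):

# step 1: which syscalls dominate? (-c prints only the summary table)
strace -c -f -o /tmp/strace.summary fio job.fio

# step 2: time the interesting calls (-T appends the time spent in every syscall)
strace -f -T -e trace=io_submit,write -o /tmp/strace.timed fio job.fio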
Summary:
htop: see in real time each process's load on the system, including CPU utilization, memory usage, etc.
iostat: see in real time each device's I/O performance, including current read/write throughput, I/O queue size, time spent servicing I/O requests, and so on (and what the utilization figure means).
perf: analyze CPU performance, including how often events such as cache misses occur. Combined with a flame graph, it can visualize the call stacks, including the fraction of samples a given call accounts for.
strace: trace signals and system calls. For example, first use -c to see which system calls take a lot of time and are worth chasing, then use -T to measure the time spent in each system call and locate the bottleneck.
see:
Try reading more blocks at a time.