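Bash tricks that break through your blind spots

bash is the most widely used interactive shell on Linux. Before graphical interfaces existed, the shell was all a Unix user knew of the computer. This talk introduces a range of lesser-known bash tricks, covering loops, multitasking, interactive usage, and script writing, and invites the audience to rethink the design philosophy of the shell. Scripts will mostly target sh, while interactive tricks will target bash.

Speaker: gholk

Linux user, web developer for fun.
Claims to write only vanilla js, but is really just too lazy to learn a framework.
Sees himself as merely a Linux user, so insists on never touching general-purpose languages like C or JS for anything the shell can handle.
Not a CS major and probably won't make a living from code, so can willfully position himself as a user.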

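Shell vs. general-purpose programming languages

A typical full-featured programming language is complete internally, but communicating with other external programs is awkward.
The shell, on the other hand, is a language designed to call and combine every other program.

The trade-off between the two

A program written in a general-purpose language should behave simply and focus on its main job.
When you use it, you wrap it in the shell and combine it with wildcards, pipes, loops and so on to get all sorts of functionality.
(Because handling arguments in node.js used to drive me up the wall.)

The shell philosophy

Every process has stdin, stdout, and stderr, can be given argv, and inherits environment variables. The core job of the shell is to call and combine processes to get things done.

Besides relying on external processes, the shell also lets you define functions and command groups (in { } or ( )). Functions, like processes, have std streams, argv, environment variables and so on; command groups have std streams too, but no argv.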
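A minimal sketch of the difference, in plain sh. `count_args` is a hypothetical name: as a function it sees its own "$@", and its call can be redirected like any program. The `{ ...; }` group can be piped as a whole but has no arguments of its own.

```sh
#!/bin/sh
# A function has its own positional parameters ("$#", "$1", ...),
# just like the argv of an external program.
count_args() {
    echo "$# args, first=$1"
}

count_args foo bar        # prints: 2 args, first=foo
count_args x < /dev/null  # redirections apply to a function call too

# A { ...; } group can be redirected or piped as a whole,
# but it takes no arguments of its own.
{ echo hello; echo world; } | tr 'a-z' 'A-Z'
```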

Extracting part of a stream

cat some-stream | {
    head -c 8 >/dev/null # discard the first 8 bytes
    foo=$(head -c 4) # read the next 4 bytes
    # do something
    basename "$foo"
    cat # pass the rest of the input through unchanged
} > result # save the result

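Doesn't work for line-based processing

Processing by lines loses data. The reason is that head reads its input in blocks: on a pipe it usually takes in more than the one line it prints, and since a pipe cannot be rewound, that surplus is gone before the next command in the group gets to read.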

yes hello | nl | {
    head -n 1 >/dev/null # discard the first line
    foo=$(head -n 1) # read one line
    echo $foo
}

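Fall back to read

yes hello | nl | {
    read -r skipped # discard the first line
    n=3 # number of lines to read (example value)
    for i in $(seq 1 "$n")
    do
        read -r line # each read consumes exactly one line
        echo "$line"
    done
}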
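A quick check of why read works here, as a sketch in plain sh: on a pipe, the read builtin consumes input only up to the newline, so everything after it is still there for the next command. `skip_first_line` is a hypothetical helper name.

```sh
#!/bin/sh
# read stops right after the newline, so the rest of the pipe
# is left intact for cat.
skip_first_line() {
    read -r first_line   # consume exactly one line
    cat                  # pass the remaining lines through untouched
}

printf 'a\nb\nc\n' | skip_first_line
# prints:
# b
# c
```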

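How should a pipeline be indented?

When the pipe symbol is the last character on the line, you can omit the backslash that escapes the trailing newline.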

awk '{sum += $1; print sum}' < file |
    sed -n '/[13579]$/p' |
    cut -d ' ' -f 2-4 |
    sort -n

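Some people feel it should be spelled out, and that escaping explicitly by hand is better ~~, just like some JS developers who insist on semicolons~~.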

cat file |\
    awk '{sum += $1; print sum}' |\
    sed -n '/[13579]$/p'

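Then just mix them together, like the pissing beef balls in Stephen Chow's The God of Cookery! This seems to be GNU-style indentation? I'm not sure where I picked it up, but for now I find it the most pleasing layout.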

cat file \
| awk '{sum += $1; print sum}' \
| sed -n '/[13579]$/p'

Why use cat

The advantage of cat is a uniform layout: the input file always goes at the very beginning.

cat file \
| awk '{sum += $1; print sum}' \
| sed -n '/[13579]$/p' \
| cut -d ' ' -f 2-4 \
| sort -n

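Without cat, the redirected input file conventionally goes at the end of the first command, while the same spot on the following commands is empty, which feels unbalanced.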

awk '{sum += $1; print sum}' <file \
| sed -n '/[13579]$/p' \
| cut -d ' ' -f 2-4 \
| sort -n

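Where the redirection operator goes

But the redirection operator doesn't actually have to be at the end: it can go at the beginning, in the middle, or at the end.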

awk '{sum += $1; print sum}' < file \
| sed -n '/[13579]$/p'

<file awk '{sum += $1; print sum}' \
| sed >result -n '/[13579]$/p'

>df.log df

So you can write it like this. Anti-human enough? Doesn't cat suddenly look nicer?

< file \
    awk '{sum += $1; print sum}' |
    sed -n '/[13579]$/p' |
    cut -d ' ' -f 2-4 |
    sort -n

If you remember command groups, you can use them here too:

{
    awk '{sum += $1; print sum}' |
    sed -n '/[13579]$/p' |
    cut -d ' ' -f 2-4 |
    sort -n
} <file >result

It's a bit like a function that was never declared:

do_something() {
    awk '{sum += $1; print sum}' |
    sed -n '/[13579]$/p' |
    cut -d ' ' -f 2-4 |
    sort -n
}

do_something <file >result

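The same file cannot be both the input and the output of a pipeline

The shell performs the > redirection, which empties the file, before the command starts, so the input has nothing left to read.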

do_something <file >file
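A common workaround, sketched under the assumption that do_something is an ordinary filter (here a hypothetical stand-in using sort -n): write to a temporary file next to the original and rename it over the original only if the command succeeded. The helper name rewrite_in_place is also hypothetical.

```sh
#!/bin/sh
# Hypothetical stand-in for the filter from the text.
do_something() {
    sort -n
}

# Write to a temporary file, then replace the original only on success.
rewrite_in_place() {
    target=$1
    tmp=$(mktemp "$target.XXXXXX") || return 1
    if do_something < "$target" > "$tmp"
    then
        mv "$tmp" "$target"    # note: the result keeps mktemp's 0600 mode
    else
        rm -f "$tmp"
        return 1
    fi
}

demo_dir=$(mktemp -d)
printf '3\n1\n2\n' > "$demo_dir/numbers.txt"
rewrite_in_place "$demo_dir/numbers.txt"
cat "$demo_dir/numbers.txt"    # prints 1, 2, 3 on separate lines
rm -r "$demo_dir"
```

Keeping the temporary file in the same directory means the final mv is a simple rename on the same filesystem.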

Building one-to-many pipelines

Complex cases need several inputs and outputs combined, so to simulate such cases, here are a few programs:

paste

Merges two files line by line

~$ seq 0 2 8 > even
~$ seq 1 2 9 > odd
~$ paste even odd
0       1
2       3
4       5
6       7
8       9

Coordinate conversion program proj4

<lonlat.txt awk '{print $2,$3,$4}' \
| proj -f %.4f +proj=utm \
> utm.txt

There is only one stdin

Sometimes we want to merge several results into one file; paste can join the corresponding lines of two files.

<xyz.txt cs2cs -f %.4f +proj=cart +to +proj=utm \
| paste xyz.txt - >xyz-utm.txt

But what if you want to merge several files at once?

(
<xyz.txt cs2cs -f %.4f +proj=cart +to +proj=utm
<xyz.txt cs2cs -f %.4f +proj=cart +to +proj=lonlat
) | paste - - # ?????

Unfortunately, you only have one stdin to work with.

process substitution

This feature lets you treat a command as a file to write to or read from. Writing is >( ), reading is <( ).

diff <(tar -xOf v1.tar main.c) <(tar -xOf v2.tar main.c)

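Process substitution actually replaces that piece of syntax with a filename (such as /dev/fd/63) that refers to a pipe connected to the command.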

~:$ printf 'process substitution: "%s"\n' <(true)
process substitution: "/dev/fd/63"
~:$ ll >(true)
l-wx------ 1 gholk gholk 64  8月  9 11:39 /dev/fd/63 -> pipe:[109154]
~:$ ll <(true)
lr-x------ 1 gholk gholk 64  8月  9 11:39 /dev/fd/63 -> pipe:[109168]

Combine the two, and you can merge the outputs of several processes!

paste \
<( <xyz.txt awk '{print $2,$3,$4}' \
   | cs2cs -f %.4f +proj=cart +to +proj=utm
) \
<( <xyz.txt awk '{print $2,$3,$4}' \
   | cs2cs -f %.4f +proj=cart +to +proj=lonlat
)
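A runnable miniature of the same idea, assuming bash, with seq standing in for the two cs2cs conversions: each <( ) becomes a /dev/fd path that paste opens like a normal file.

```bash
#!/bin/bash
# bash only: each <( ) is replaced by a /dev/fd/N path connected to a pipe,
# so paste receives two "files" while its stdin stays unused.
paste <(seq 1 3) <(seq 4 6)
# prints 1, 2, 3 in the first column and 4, 5, 6 in the second, tab-separated
```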

There's just one small problem: process substitution is a bash feature, and sh doesn't support it.

tee and fifo

So to run this in sh, fifos need an introduction. A fifo combined with tee gives you a one-to-many pipeline.

fifo

mkfifo my-fifo
echo hey >my-fifo &
cat my-fifo

Flowchart: file with name and xyz coordinates → awk extracts the xyz part → tee duplicates the xyz stream → one copy goes through proj (xyz to lonlat), the other through proj (xyz to utm) → paste merges both with the original file → file with name, xyz, utm, lonlat.

fifo_list='xyz2lonlat xyz2utm lonlat utm'
mkfifo $fifo_list

# writing to a fifo blocks until a reader opens it, so run these in the background
awk '{print $2,$3,$4}' <xyz.txt | tee xyz2lonlat > xyz2utm &
cs2cs -f %.8f +proj=cart +to +proj=lonlat <xyz2lonlat >lonlat &
cs2cs -f %.4f +proj=cart +to +proj=utm <xyz2utm >utm &

paste xyz.txt lonlat utm > all.txt
rm $fifo_list
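The same layout can be tried without cs2cs. This sketch assumes plain sh plus mkfifo, and uses two awk programs as hypothetical stand-ins for the lonlat and utm conversions; the fifos live in a temporary directory so nothing is left behind.

```sh
#!/bin/sh
# Plain-sh version of the one-to-many layout: input -> tee -> two filters
# -> paste. The awk programs are stand-ins for the two cs2cs conversions.
fifo_demo() {
    dir=$(mktemp -d) || return 1
    seq 1 3 > "$dir/input"
    mkfifo "$dir/to_double" "$dir/to_triple" "$dir/double" "$dir/triple"

    # Opening a fifo blocks until the other end is opened as well,
    # so every producer runs in the background.
    tee "$dir/to_double" < "$dir/input" > "$dir/to_triple" &
    awk '{ print $1 * 2 }' < "$dir/to_double" > "$dir/double" &
    awk '{ print $1 * 3 }' < "$dir/to_triple" > "$dir/triple" &

    # paste is the foreground reader that drains all three streams.
    paste "$dir/input" "$dir/double" "$dir/triple"
    wait
    rm -r "$dir"
}

fifo_demo
# prints three tab-separated columns: n, 2n, 3n for n = 1, 2, 3
```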

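What a fifo really solves is that streams have no names. paste needs several inputs but there is only one stdin, so several fifos connect the outputs to those inputs.
The key to one-to-many is still tee; tee can also pipe to several programs through >( ).

Blocking in tee and fifos

Both tee and fifos block. When you read from or write to a fifo and no other program has the other end open, you wait until one does.

tee blocks as well: if any one of the processes or files tee writes to blocks, all of tee's outputs stall.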

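Using the server like your own machine

ssh doesn't have to start an interactive shell; it can also run a command directly.

ssh user@my.lab.ml ls

If you have more than one computer and some programs are installed on only one of them, ssh lets you use them almost seamlessly.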

cat xyz.txt \
| ssh user@my.lab.ml cs2cs +proj=cart +to +proj=lonlat

Use aliases

ssh can refer to a server by an alias; otherwise you'd type a long user and domain every time, which doesn't feel like home at all.

# ~/.ssh/config
# see `man ssh_config`
Host lab
HostName my.lab.ml
User user

cat xyz.txt \
| ssh lab cs2cs +proj=cart +to +proj=lonlat

Automatic or manual compression

When there's a lot of data, consider compressing it to speed things up:

gzip --to-stdout xyz.txt \
| ssh lab zcat - \
       \| cs2cs +proj=cart +to +proj=lonlat \
       \| gzip - \
| zcat -

Compressing by hand every time gets tiring, so just enable automatic compression at the ssh layer:

# ~/.ssh/config
# global
Compression yes

# or enable compression for one host only
Host lab
HostName my.lab.ml
Compression yes

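Make good use of stdin

The catch with running programs on the remote machine over ssh is that only stdin is available. If you really need to send several files, consider bundling them with tar, but then you have to run a long string of commands to unpack them and pack the results back up. Also take care not to send anything else to stdout, or the tar coming back will be corrupted. (Logging has to go through stderr.)

Sending a single tar archive:

tar -cf - file1 file2 | ssh lab tar -xf -

Sending a tar archive, running a program remotely, then sending the results back as a tar: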

tar -cf - x y z | ssh lab '
tar -xf -
paste x y z | cs2cs +proj=cart +to +proj=lonlat > lonlat
tar -cf - x y z lonlat
rm x y z lonlat
' > result.tar
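To try the tar round trip without a server, `sh -c` can play the role of `ssh lab`: it likewise receives data only on stdin and returns results only on stdout. The helper names remote and tar_round_trip are hypothetical, and paste stands in for the real remote computation.

```sh
#!/bin/sh
# Local stand-in for `ssh lab`: data in on stdin, results out on stdout,
# logs on stderr.
remote() {
    sh -c "$1"
}

# The function body is a ( ) subshell, so the cd does not leak out.
tar_round_trip() (
    dir=$(mktemp -d) && cd "$dir" || exit 1
    printf '1\n2\n' > x
    printf '3\n4\n' > y
    tar -cf - x y | remote '
        work=$(mktemp -d) && cd "$work" || exit 1
        tar -xf -                 # unpack what arrived on stdin
        paste x y > merged        # placeholder for the real computation
        echo "remote: merged" >&2 # logs go to stderr only
        tar -cf - merged          # the only bytes written to stdout
        cd / && rm -r "$work"
    ' > result.tar
    tar -xf result.tar
    cat merged
    cd / && rm -r "$dir"
)

tar_round_trip   # prints "1<TAB>3" and "2<TAB>4"; the log line goes to stderr
```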

ssh key

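You'll probably also need passwordless login with an ssh key to reach true make-yourself-at-home convenience. If you're worried about security, you can edit ~/.ssh/authorized_keys on the server by hand and comment out the key's line when you're done.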

ssh control master

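Or use ssh control master: log in once, and later ssh invocations reuse the existing connection. Besides not having to log in again while the connection is alive, this also saves the time of setting up a new TCP connection and ssh handshake.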

~:$ time ssh lab true
real    0m1.123s
user    0m0.032s
sys     0m0.032s
~:$ # using ssh control master
~:$ time ssh lab true
real    0m0.030s
user    0m0.004s
sys     0m0.008s

# ~/.ssh/config
Host lab
HostName my.lab.ml
User gholk
ControlMaster auto
ControlPath ~/.ssh/ssh-control-master-%r@%h:%p
ControlPersist 300

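Of course, the first login still needs a password, so for full automation you still need an ssh key.

With control master, no matter how many ssh commands you run at once, they all run over the same single ssh connection.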

ssh -N lab # login with password then C-z to background
bg # run previous command in background
for i in *
do
    # do something with ssh
    cat "$i" | ssh lab cs2cs +proj=cart +to +proj=lonlat >"result-$i"
done
kill %% # kill ssh -N process

Use disown to keep a program running after you log out

(
for tar in *.tar
do 
    tar -xf $tar summary
    mv summary $(basename $tar .tar)-summary
done
) >extract-tar.log 2>&1 &
disown -h %%
exit

tail -f extract-tar.log
# or with modern command `less`
less +F extract-tar.log

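Ditch the awkward nohup

nohup only works on executables, so it can't wrap a shell function, loop, or command group directly. If you have to use sh, though, it's a different story: disown doesn't exist in plain sh, only in bash.

In fact, putting & right after a loop's done sends the whole loop to the background: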

while sleep 1s
do
  echo dont sleep
done &

As for what & means, it's really closer to ;: both end a command, but & doesn't wait for it to finish.

echo a & echo b
echo a ; echo b

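The parentheses are for when you need to run several commands in sequence and want all of them in the background: wrap them in parentheses to form a group, then send the group to the background.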

(
   uncompress NCTU0010.19d.Z
   crx2rnx NCTU0010.19d > NCTU0010.19o
   xz NCTU0010.19o
) &
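A sketch of keeping track of a backgrounded group in plain sh: `$!` holds the PID of the last background job, and `wait` on that PID returns the group's exit status. `slow_step` is a hypothetical placeholder for commands like uncompress or crx2rnx.

```sh
#!/bin/sh
# Hypothetical stand-in for a slow step such as crx2rnx or xz.
slow_step() {
    sleep 1
    echo "finished $1"
}

(
    slow_step uncompress
    slow_step convert
) &
group_pid=$!                 # PID of the background subshell

echo "the main shell carries on meanwhile"

wait "$group_pid"            # block until the whole group is done
echo "group exited with status $?"
```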

Loop idioms

Some commands fit nicely after while.

find . -name '*.jpg' | while read file
do
    cp "$file" /tmp
    # some other code
done
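If file names may contain leading spaces or backslashes, a slightly more defensive form of the loop helps. This sketch assumes names without embedded newlines; `list_lines` is a hypothetical helper that just shows exactly what read delivered.

```sh
#!/bin/sh
# IFS= keeps leading/trailing blanks and -r keeps backslashes, so each
# line arrives in "$file" exactly as it was printed.
list_lines() {
    while IFS= read -r file
    do
        printf '[%s]\n' "$file"   # quoted, so spaces stay inside one argument
    done
}

printf '%s\n' 'a b.jpg' '  lead.jpg' 'back\slash.jpg' | list_lines
# prints:
# [a b.jpg]
# [  lead.jpg]
# [back\slash.jpg]
```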

Uglier, but easier to understand:

find . -name '*.jpg' | while true
do
    read file
    if [ -z "$file" ]
    then break
    fi
    # some other code
done

I often use sleep too:

while sleep 1s
do
    : # do something every second
done

# or use `true`, with the sleep inside the loop
while true
do
    sleep 1s
done

Parallel processing to save time

for gzip in *.gz
do
    gzip -d $gzip
    compute-some-thing ${gzip%.gz}
done

Processing two at a time (decompress the next file while the previous one is being computed)

for gzip in *.gz
do
    gzip -d $gzip
    wait
    compute-some-thing ${gzip%.gz} &
done

I have a bold idea: start every file at once

for gzip in *.gz
do
    (
    gzip -d $gzip
    compute-some-thing ${gzip%.gz}
    ) &
done
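Starting every file at once can overload the machine when there are many files. One simple middle ground, sketched here in plain sh, is to start jobs in batches of a fixed size and wait after each batch. `process_one` and `run_batched` are hypothetical names; process_one stands in for the gzip -d plus compute-some-thing body.

```sh
#!/bin/sh
# Hypothetical stand-in for: gzip -d "$1" && compute-some-thing "${1%.gz}"
process_one() {
    sleep 1
    echo "done $1"
}

# Run at most $1 jobs at a time: start a batch, then wait for all of it.
run_batched() {
    max=$1
    shift
    running=0
    for item in "$@"
    do
        process_one "$item" &
        running=$((running + 1))
        if [ "$running" -ge "$max" ]
        then
            wait           # block until the current batch has finished
            running=0
        fi
    done
    wait                   # the last batch may be smaller than $max
}

run_batched 2 a b c        # a and b run together, then c
```

With real files this would be called as, say, `run_batched 4 *.gz`. The trade-off is that each batch waits for its slowest job.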

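A safer way to parallelize is to simply run two loops. (Files whose names don't end in a digit before .gz match neither glob, so they are skipped.)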

for gzip in *[13579].gz
do
    gzip -d $gzip
    compute-some-thing ${gzip%.gz}
done &

for gzip in *[24680].gz
do
    gzip -d $gzip
    compute-some-thing ${gzip%.gz}
done &
wait # in a script, block until both loops have finished

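Editing and re-running previous commands

Most people probably know you can press ↑ ↓ to bring back previous commands.

Searching the history

Even handier is readline's C-r (reverse search), which searches previous commands, so you don't have to press C-p over and over to find one.

C-r
Type a string that the command you're looking for contains
Each character you type updates the match immediately
Press enter to run it, ← → to edit it, C-r again to jump to the next match, C-c to cancel.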

fc: fix command

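This command opens an editor on the last shell command you ran, and runs it once you save and quit. Options let you edit the n-th command instead, or the most recent command starting with a given string.

But fc isn't very handy in practice: you're unlikely to remember the number of the command you want to change, and the first search match isn't necessarily the one you had in mind.

Open an editor directly

Another readline shortcut is C-x C-e (edit-and-execute-command), which opens an editor on the shell command you're halfway through typing. It's mainly useful when the command has grown so long that editing it with the shell's basic line editing is painful.

The two pick their editor slightly differently: fc uses $FCEDIT, then $EDITOR, and falls back to vi, while C-x C-e tries $VISUAL, then $EDITOR, and falls back to emacs. So make sure at least one of these names an editor that runs in a terminal.

Replacing fc

Combined with C-r, just search for the command and then press C-x C-e to edit it. Interactive search gives much better results than fc's blind search.

Why do multi-line commands get squashed onto one line?

The culprits are the two options cmdhist and lithist.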

shopt -s cmdhist # save a multi-line command as a single history entry,
                 # but joined into one line with `;`

shopt -s lithist # keep the `\n` instead of using `;` (only takes effect with cmdhist)

Multi-line command

~:$ for i in `seq 2 6`
> do
> echo $i
> done
2
3
4
5
6

only enable cmdhist

~:$ for i in `seq 2 6`; do echo $i; done

long multi-line command

~:$ for id in $(tail -n +2 csrs.id); do echo $id; curl-csrs-ppp get $id > $id.zip; basename=$(basename $(unzip -l $id.zip | awk '$4 ~ /mari.*pdf/ { print $4 }') .pdf); mv $id.zip $basename.19o.zip; unzip $basename.19o.zip $basename.csv; sleep 20s; done

enable lithist

~:$ for id in $(tail -n +2 csrs.id)
do echo $id
curl-csrs-ppp get $id > $id.zip
basename=$(basename $(unzip -l $id.zip | awk '$4 ~ /mari.*pdf/ { print $4 }') .pdf)
mv $id.zip $basename.19o.zip
unzip $basename.19o.zip $basename.csv
sleep 20s
done

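When lithist breaks

lithist sometimes breaks, and multi-line commands get split into one line per entry. This may be because the old history file ~/.bash_history has a messed-up format; for example, when the history file grows past its size limit it gets truncated, which can mangle the format. Fixing the file or simply deleting it makes things work again.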

edit function, alias, script

https://github.com/GHolk/loco/blob/master/bash_function#L78

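Starting another shell inside a script

When a script needs to run a series of commands inside another program, the usual approach is to write them to a separate file and run that.

sftp -b batch.sftp remote-server

But sometimes you don't want an extra file, since it makes things harder to manage; you'd rather keep everything in the same file.

here doc

The more intuitive and safer approach is a heredoc, but the downside is that variables get expanded, so you may need to escape them. (Not a problem if your IDE has a shortcut for escaping.)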

#!/bin/sh

rinex=NCTU0010.19o
docker exec --interactive gxh bash <<GUEST # no --tty: stdin is the heredoc, not a terminal

. /usr/local/GipsyX/rc_GipsyX.sh

rinex=inside-docker
echo $rinex # NCTU0010.19o
echo \$rinex # inside-docker

gd2e.py -rnxFile $rinex

GUEST
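When the inner script doesn't need any outer variables, quoting the delimiter (<<'GUEST') turns off expansion in the whole body, so nothing needs escaping. A sketch in plain sh, with `sh` standing in for the docker exec command; the function names are hypothetical.

```sh
#!/bin/sh
# `sh` stands in for `docker exec --interactive gxh bash` here.
rinex=NCTU0010.19o

# Quoted delimiter: the body is passed through untouched,
# so the child shell expands $rinex itself.
quoted_demo() {
    sh <<'GUEST'
rinex=inside-child
echo "$rinex"
GUEST
}

# Unquoted delimiter: the outer shell expands $rinex before sh ever runs.
unquoted_demo() {
    sh <<GUEST
rinex=inside-child
echo "$rinex"
GUEST
}

quoted_demo     # prints: inside-child
unquoted_demo   # prints: NCTU0010.19o
```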

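Extracting the script's own content with tail

The main trick is that $0 points to the script file itself, so you work out the line number in advance. This relies on $0 being a usable path to the script: that holds when the script is run by its path or found through $PATH, but not when it is fed to the shell on stdin (for example sh < script), where $0 is just the shell's name.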

#!/bin/sh
a=b
tail -n +5 "$0" | su -l guest -c sh
exit

whoami # guest
a=c
echo $a
file=$(echo *.*)

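This is how grub-mkconfig does it on Debian (the /etc/grub.d/40_custom file below): grub scripts use lots of variables, and escaping every one of them would make the script much harder to read.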

#!/bin/sh
exec tail -n +3 $0
# This file provides an easy way to add custom menu entries.  Simply type the
# menu entries you want to add after this comment.  Be careful not to change
# the 'exec tail' line above.

menuentry '[system] shutdown' {
    halt
}

Locating exit with sed

No need to count lines by hand, but make sure the pattern doesn't match anything unexpected.

#!/bin/sh
sed '1,/^exit$/d' $0 | dbus-run-session sh
exit

gvfs-mount ftp://my.lab.ml
cp .gvfs/ftp/some-file.zip .
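A self-contained way to try the sed trick: the sketch below writes a tiny script to a temporary file and runs it, with a plain `sh` standing in for `dbus-run-session sh`. `run_self_script` is a hypothetical helper name.

```sh
#!/bin/sh
# Everything after the first line that is exactly `exit`
# is fed to a child shell.
run_self_script() {
    script=$(mktemp) || return 1
    cat > "$script" <<'EOF'
#!/bin/sh
sed '1,/^exit$/d' "$0" | sh
exit

echo "child shell says: $((1 + 2))"
EOF
    sh "$script"
    rm -f "$script"
}

run_self_script   # prints: child shell says: 3
```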

Lesser-known uses

Folding

seq 10 | paste - -

Formatting similar strings

seq -f "(%.0f)" 10

printf "%s\0" *
# similar to
find -maxdepth 1 -print0

Repeating output

printf "yes\n%.0s" `seq 5`

yes "yes" | head -n 5

What yes is really for

When a program asks lots of yes/no questions while running, let yes answer them.

yes | sudo apt install the-world

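Batch editing with ex

sed can batch-edit too, but some jobs are easier in a real editor that can jump back and forth, and I figured an editor would also be faster.

Later I found they're about the same: with large files, both get stuck on the disk-write bottleneck.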

for rinex in *.rnx
do
    echo '
1
/ANT #
s/-Unknown-/TPSG3_A1/
w
n
'
done | ex *.rnx

Why not ed?

ED(1)               Unix Programmer's Manual                ED(1)

NAME
     ed - text editor

SYNOPSIS
     ed [ - ] [ -x ] [ name ]
DESCRIPTION
     Ed is the standard text editor.

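Where there's vi, there's ex

The advantage of ex: newer distributions don't always install ed, but they almost always ship vi, and where there's vi, there's ex.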
