# awk-getline-example
###### tags: `awk`,`rhel`

> REF:
> [Explicit Input with getline](https://www.gnu.org/software/gawk/manual/html_node/Getline.html)
> [String-Manipulation Functions](https://www.gnu.org/software/gawk/manual/html_node/String-Functions.html)
> [GNU: difference between RS and RT](https://www.gnu.org/software/gawk/manual/html_node/gawk-split-records.html)

[Toc]

# Example
## [*Plain getline*: remove C-style comments ('/*...*/') from input](https://www.gnu.org/software/gawk/manual/html_node/Plain-Getline.html)
Typically used when the work on the current record is finished but you want to act on the next record right now.
* `getline`
    * Reads the next record ahead of time and stores it in `$0`
    * Returns 0 when no record is read (end of file), 1 when a record is read, and -1 on an error (for example, the file cannot be opened)
    * The variables `$0`, `NF`, `NR`, `FNR`, and `RT` are updated
* `getline` vs. [next](https://hackmd.io/@3oatvhDfTSqijOwo0tingw/BkxO94Gtd#Next)
    - Both `getline` and `next` make awk read the next record immediately; what happens to that record differs:
    - `next` abandons the current record: the rest of the current action is not executed, and the next record is processed from the top of the awk program, testing every pattern and running the matching actions again
    - `getline` does not restart and does not change the flow: the next record is stored in `$0` (updated), then the rest of the current action and the remaining patterns (rules) are applied to it
* index()
    * position=index("peanut","an")
    * Searches "peanut" for the first occurrence of "an" and returns its position.
In this example `position` is 3.
    * If `index()` finds nothing, it returns 0
* substr()
    * sub_str=substr("abcdefg",3,2)
    * From "abcdefg", take the 2 characters starting at position 3 ("c"); here sub_str="cd"
    * sub_str=substr("abcdefg",3)
    * If no length is given, everything from the start position to the end of the string is returned
    * sub_str="cdefg"

``` awk
{
    while ((start = index($0, "/*")) != 0) {
        # index() found "/*": keep the text in front of it
        out = substr($0, 1, start - 1)
        # keep the text after "/*" in the current record (comment body and beyond)
        rest = substr($0, start + 2)
        while ((end = index(rest, "*/")) == 0) {
            # no "*/" after the "/*" in the text collected so far,
            # so the comment spans several lines: read more text.
            # getline stores the next record in $0; on EOF or a read error,
            # print a message and stop the program
            if (getline <= 0) {
                print("unexpected EOF or error:", ERRNO) > "/dev/stderr"
                exit
            }
            # append the new line ($0) to rest
            rest = rest $0
            # the loop condition then rechecks rest for "*/":
            # not found -> keep appending text; found -> the loop stops
        }
        # index() found "*/" in rest, so the inner loop has ended;
        # dropping everything up to and including "*/" removes the comment
        rest = substr(rest, end + 2)
        # rebuild $0 from the text before "/*" and the text after "*/"
        $0 = out rest
    }
    # records without a comment skip the while loop and are printed unchanged
    print $0
}
```
sample text
```
mon/*comment*/key
rab/*commen
t*/bit
horse /*comment*/more text
part 1 /*comment*/part 2 /*comment*/part 3
no comment
```
output
```
monkey
rabbit
horse more text
part 1 part 2 part 3
no comment
```

## [*getline/variables* Given a list, swap every two lines](https://www.gnu.org/software/gawk/manual/html_node/Getline_002fVariable.html)
* `getline`
    * `getline vars`
    * Reads the next record ahead of time and stores it in `vars`
    * `getline vars` does not disturb the current flow: it only updates a few variables, and the following rules (patterns) continue without restarting
    * The variables `NR`, `FNR`, and `RT` are updated
    - Because the next record goes into `vars` and is not split into fields, `$0` and `NF` are left unchanged
* RS and RT
    * RS is a single character: RT is that same character
    * RS is a *regular expression*: RT is the actual input text that matched the regular expression
    * RT is the null string only when the input ends without any text that matches RS (this is not the same as RS itself being the null string, which selects paragraph mode)

``` awk
{
    # read the following line into tmp; $0 still holds the current line
    if ((getline tmp) > 0){
        print tmp
        print $0
    } else
        # odd number of lines: the last one has no partner
        print $0
}
```
sample text
```
tiger
jim
bono
benson
eric
jy
jason
```
output
```
jim
tiger
benson
bono
jy
eric
jason
```

## [*getline file* getline from file redirection](https://www.gnu.org/software/gawk/manual/html_node/Getline_002fFile.html)
- `getline < "/path/to/file"`
    - Pulls one line from another file into the awk program (it replaces `$0`)
    - `NR` and `FNR` do not change

``` bash
cat /tmp/name.list
#Tiger
#jim
#JY
#Verinica
echo $'20 30\n30 40\n10 55' | \
awk '{
    if($1==10){
        # replace $0 with the first line of the other file
        getline < "/tmp/name.list";
        print $0
    } else
        print $0
}'
```
output
```
20 30
30 40
Tiger
```

## *getline/file/var* [not fully understood yet](https://www.gnu.org/software/gawk/manual/html_node/Getline_002fVariable_002fFile.html)
```getline var < file``` reads the next record from the input file and stores it in ```var```.
Only ```var``` and `RT` are set; `$0`, `NF`, `NR`, and `FNR` are left unchanged.
The record is not split into fields.

``` bash
echo $'10 20\n30 40\n@include 56\n55 100' | \
awk '{
    if (NF == 2 && $1 == "@include") {
        # $2 is the name of the file to include ("56" in this sample input)
        while ((getline line < $2) > 0)
            print line
        close($2)
    } else
        print $0
}'
```

## Using getline from a Piped command
Use the stdout of a command as the input of `getline`:
1. `command | getline`
    * The stdout of `df -h` is stored by `getline` in `$0`
    * The stdout of `df -h` is split into fields by awk:
        * `RS == "\n"`, `FS == " "`
        * `NF` differs for each line of the command's stdout: it is 6 or 7
    * `NR` and `FNR` both stay at the record that triggered `getline`; they do not advance with the lines of the command's stdout, so in this example they remain 4
2. ```command | getline var```
    * The stdout of `df -h` is stored in the `var` variable; `$0` is not changed
    * The stdout of `df -h` is **not split into fields by awk**
        - `NF` is not changed; the record being processed is "@execute df", so `NF` is 2
        - `NR` and `FNR` stay at the "@execute df" record, so both are 4
        - (Quoted from GNU:)
        ```
        none of the predefined variables are changed
        However, RT is set
        ```
3. It is good practice to wrap the command expression piped to `getline` in parentheses, so that the awk script stays portable across awk implementations.
In this example the command is first built into the `tmp` variable, so `tmp | getline` is already unambiguous; a command written inline as a concatenation needs the parentheses, as in `("df" " -h") | getline`.

``` bash
echo $'foo\nbar\nbaz\n@execute df\nbletch' | \
awk '{
    if ($1 == "@execute") {
        # build the command: the text after "@execute " plus the -h option
        tmp = substr($0, 10) " -h"
        while ((tmp | getline) > 0){print($0,NR,FNR,NF,"[",RT,"]","[",RS,"]","[",FS,"]")}
        # variant 2: read each line into var instead of $0
        #while ((tmp | getline var) > 0){print($0,NR,FNR,NF,"[",RT,"]","[",RS,"]","[",FS,"]")}
        close(tmp)
    } else
        print($0,NR,FNR,NF,"[",RT,"]","[",RS,"]","[",FS,"]")
}'
```
output (`RT` and `RS` are newlines, shown collapsed inside the brackets)
```
foo 1 1 1 [ ] [ ] [ ]
bar 2 2 1 [ ] [ ] [ ]
baz 3 3 1 [ ] [ ] [ ]
Filesystem Size Used Avail Use% Mounted on 4 4 7 [ ] [ ] [ ]
devtmpfs 142G 0 142G 0% /dev 4 4 6 [ ] [ ] [ ]
tmpfs 142G 0 142G 0% /dev/shm 4 4 6 [ ] [ ] [ ]
tmpfs 142G 9.5M 142G 1% /run 4 4 6 [ ] [ ] [ ]
tmpfs 142G 0 142G 0% /sys/fs/cgroup 4 4 6 [ ] [ ] [ ]
/dev/mapper/rhel-root 50G 4.6G 46G 10% / 4 4 6 [ ] [ ] [ ]
/dev/mapper/rhel-home 689G 5.3G 683G 1% /home 4 4 6 [ ] [ ] [ ]
/dev/sda2 1014M 161M 854M 16% /boot 4 4 6 [ ] [ ] [ ]
/dev/md1 1.8T 157G 1.6T 9% /opt 4 4 6 [ ] [ ] [ ]
/dev/sda1 599M 6.6M 593M 2% /boot/efi 4 4 6 [ ] [ ] [ ]
tmpfs 29G 0 29G 0% /run/user/0 4 4 6 [ ] [ ] [ ]
bletch 5 5 1 [ ] [ ] [ ]
```

## Locating the last line of the input
While reading a file, do something different for the last line:
- In the BEGIN block, run a loop first to count the total number of records and store it in `last`
- In the main rule, test for the last line and handle it differently

``` awk
#!/bin/gawk -f
BEGIN{
    # pre-read the file named on the command line to count its lines
    while((getline t < ARGV[1]) > 0){last++;}
    close(ARGV[1]);
}
{
    # append the marker only when the current line is the last one
    print((last==FNR)?$0" this is the last line":$0)
}
```

## Locating the last line of the input: read target file twice
This relies on `FNR` restarting when awk moves on to the second file while `NR` keeps accumulating.
``` bash
awk -f test.awk target-file target-file
```
``` awk
{
    # first pass: FNR == NR holds only while the first copy is being read
    if(FNR==NR){
        last++;next
    }
    # second pass: print each line and flag the last one
    print($0, ((last==FNR)?"I am Last":""))
}
```

## Using getline from a Coprocess
{%hackmd k2jFfreJS1q0SbtsMB8-IA?edit %}

## Using getline into variable from a coprocess
(same as the example above)

## Using *getline* in BEGIN and END clauses
> ref
> [BEGIN and END special pattern](https://www.gnu.org/software/gawk/manual/html_node/BEGIN_002fEND.html)

## Summary of ```getline``` Variants
[summary](https://www.gnu.org/software/gawk/manual/html_node/Getline-Summary.html)
![](https://i.imgur.com/8E72wEk.png)
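
The variants summarized above can be checked with a small experiment. This is only a rough sketch, assuming a POSIX awk and `mktemp` are available; `getline_demo`, the four-line input, and the one-line side file are hypothetical names and data made up for this illustration. It prints `$0` and `NR` after plain `getline`, `getline var`, and `getline var < file`.

```bash
# getline_demo is a hypothetical helper; the input lines and the
# temporary side file exist only for this illustration.
getline_demo() {
    side_file=$(mktemp)
    printf 'x\n' > "$side_file"
    printf 'a\nb\nc\nd\n' | awk -v f="$side_file" '
    NR == 1 {
        getline                      # plain getline: new $0, NR goes 1 -> 2
        print "plain:", $0, NR
        getline v                    # getline var: fills v, NR goes 2 -> 3, $0 kept
        print "var:", $0, v, NR
    }
    NR == 4 {
        # getline var < file: fills w from the side file, NR and $0 untouched
        if ((getline w < f) > 0)
            print "file:", $0, w, NR
        close(f)
    }'
    rm -f "$side_file"
}

getline_demo
```

Running it should print `plain: b 2`, `var: b c 3`, and `file: d x 4`: the first two variants advance `NR`, while reading from a file leaves both `NR` and `$0` alone. The piped forms are left out because whether they advance `NR` may differ between awk implementations.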