# awk-getline-example
###### tags: `awk`,`rhel`
> REF:
> [Explicit Input with getline](https://www.gnu.org/software/gawk/manual/html_node/Getline.html)
> [String-Manipulation Functions](https://www.gnu.org/software/gawk/manual/html_node/String-Functions.html)
> [GNU: difference between RS and RT](https://www.gnu.org/software/gawk/manual/html_node/gawk-split-records.html)
[Toc]
# Example
## [*Plain getline*: remove C-style comments (`/*...*/`) from input](https://www.gnu.org/software/gawk/manual/html_node/Plain-Getline.html)
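Useful when the work on the current record is finished but you want to act on the next record right away.
* `getline`
    * Reads the next record ahead of time and stores it in `$0`
    * Returns 1 if a record was read, 0 at end of file, and -1 on error (for example, a file that cannot be opened)
    * `$0`, `NF`, `NR`, `FNR`, and `RT` are updated
* `getline` vs. [next](https://hackmd.io/@3oatvhDfTSqijOwo0tingw/BkxO94Gtd#Next)
    - Both make awk move on to the next record, but what happens afterwards differs:
        - `next` abandons the current record: the rest of the action is skipped, and awk restarts from the top of the program, matching patterns and running actions against the next record
        - `getline` does not restart or change the flow: the next record is stored in `$0`, the current action continues from where it was, and the remaining rules then run against the new `$0`
* `index()`
    * `position = index("peanut", "an")`
    * Searches "peanut" for the first occurrence of "an" and returns its position; here `position` is 3
    * Returns 0 if nothing is found
* `substr()`
    * `sub_str = substr("abcdefg", 3, 2)`
        * From "abcdefg", take 2 characters starting at "c" (position 3); here `sub_str` is "cd"
    * `sub_str = substr("abcdefg", 3)`
        * Without a length, everything from the start position to the end of the string is returned
        * `sub_str` is "cdefg"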
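
The two string functions can be checked quickly from the shell. This is a minimal sketch; the wrapper name `show_index_substr` is only for illustration and is not part of the original notes.

```bash
show_index_substr() {
  awk 'BEGIN {
    print index("peanut", "an")      # position of the first "an"
    print index("peanut", "xy")      # not found, so 0
    print substr("abcdefg", 3, 2)    # 2 characters starting at position 3
    print substr("abcdefg", 3)       # from position 3 to the end
  }'
}
show_index_substr
```

The four printed lines should correspond, in order, to the four cases listed above.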
``` awk
{
    while ((start = index($0, "/*")) != 0) {
        # "/*" found: keep the text in front of it
        out = substr($0, 1, start - 1)
        # keep the rest of the record after "/*" (the comment body starts here)
        rest = substr($0, start + 2)
        while ((end = index(rest, "*/")) == 0) {
            # no "*/" after the "/*" in this record, so the comment spans
            # several lines: keep reading more text.
            # getline stores the next record in $0; at end of input or on a
            # read error, print a message and stop (ERRNO is gawk-specific)
            if (getline <= 0) {
                print("unexpected EOF or error:", ERRNO) > "/dev/stderr"
                exit
            }
            # append the new record ($0) to rest
            rest = rest $0
            # the loop condition then searches rest for "*/" again:
            # if it is still missing, keep appending text to rest;
            # if it is found, this inner loop ends
        }
        # index() found "*/" in rest, so the inner loop has ended.
        # rest becomes the text after "*/", which drops the comment
        rest = substr(rest, end + 2)
        # join the text before "/*" with the new rest and store it in $0
        $0 = out rest
    }
    # records without comments skip the while loop and are printed as is
    print $0
}
```
sample text
```
mon/*comment*/key
rab/*commen
t*/bit
horse /*comment*/more text
part 1 /*comment*/part 2 /*comment*/part 3
no comment
```
output
```
monkey
rabbit
horse more text
part 1 part 2 part 3
no comment
```
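The difference between `getline` and `next` described in this section can be seen with two small programs that differ only in that one statement. This is a hedged sketch: the function names `demo_getline` and `demo_next` and the four-line input are made up for illustration.

```bash
demo_getline() {
  printf 'a\nb\nc\nd\n' | awk '
    # getline loads "b" into $0, then the action and the later rule continue
    $0 == "a" { getline; print "rule1 after getline:", $0 }
    { print "rule2:", $0 }'
}

demo_next() {
  printf 'a\nb\nc\nd\n' | awk '
    # next drops "a" immediately; the print after it is never reached
    $0 == "a" { next; print "never printed" }
    { print "rule2:", $0 }'
}

demo_getline
demo_next
```

In the `getline` variant the record "b" is seen by both rules, while in the `next` variant the record "a" simply disappears and processing restarts with "b" at the first rule.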
## [*getline/variables* Given a list, swap every two lines](https://www.gnu.org/software/gawk/manual/html_node/Getline_002fVariable.html)
* `getline`
    * `getline vars`
        * Reads the next record ahead of time and stores it in `vars`
        * `getline vars` does not disturb the current flow: it only updates a few variables, and the remaining rules continue without restarting
        * `NR`, `FNR`, and `RT` are updated
            - Because the next record goes into `vars` and is not split into fields, `$0` and `NF` stay unchanged
* RS and RT
    * RS is a single character > RT is that same single character
    * RS is a *regular expression* > RT is the actual input text that matched the regular expression
    * Input ends without any text matching RS > RT is the null string
``` awk
{
if ((getline tmp) > 0){
print tmp
print $0
} else
print $0
}
```
sample text
```
tiger
jim
bono
benson
eric
jy
jason
```
output
```
jim
tiger
benson
bono
jy
eric
jason
```
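To see which variables `getline var` touches, the following sketch pairs up lines and prints `NR` next to them. The function name `pair_lines`, the variable `partner`, and the three-line input are assumptions made for this illustration.

```bash
pair_lines() {
  printf 'tiger\njim\nbono\n' | awk '{
    # getline var: $0 and NF stay as they are, NR and FNR advance
    if ((getline partner) > 0)
      print NR ": " $0 " + " partner
    else
      print NR ": " $0 " (no partner)"
  }'
}
pair_lines
```

`$0` still holds the record that started the action, while `NR` already counts the record that was read into `partner`.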
## [*getline file* getline from file redirection](https://www.gnu.org/software/gawk/manual/html_node/Getline_002fFile.html)
- `getline < "/path/to/file"`
    - Reads one line from another file into the awk program (into `$0`)
    - `NR` and `FNR` are not changed
``` bash
cat /tmp/name.list
#Tiger
#jim
#JY
#Verinica
echo $'20 30\n30 40\n10 55' | \
awk '{
if($1==10){
getline < "/tmp/name.list";
print $0
} else print $0
}'
```
output
```
20 30
30 40
Tiger
```
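The example above does not check the return value of `getline`. A slightly more defensive sketch is shown below; it uses a temporary file as a hypothetical stand-in for `/tmp/name.list`, and the function name `peek_first_line` is made up for illustration.

```bash
peek_first_line() {
  list=$(mktemp)                      # hypothetical stand-in for /tmp/name.list
  printf 'Tiger Woods\njim\n' > "$list"
  printf '10 55 99\n' | awk -v list="$list" '{
    before = NF
    if ((getline < list) > 0)         # replaces $0 and NF; NR is untouched
      print NR, before, NF, $0
    else
      print "cannot read " list > "/dev/stderr"
    close(list)
  }'
  rm -f "$list"
}
peek_first_line
```

The printed line shows `NR` still at 1, `NF` going from 3 to 2, and `$0` replaced by the first line of the file.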
## [*getline/file/var*: read a line from a file into a variable](https://www.gnu.org/software/gawk/manual/html_node/Getline_002fVariable_002fFile.html)
`getline var < file` reads the next record from `file` and stores it in `var`.
Only `var` is changed (gawk also sets `RT`); none of the other predefined variables are touched.
The record is not split into fields.
``` bash
echo $'10 20\n30 40\n@include 56\n55 100' | \
awk '{
if (NF == 2 && $1 == "@include") {
while ((getline line < $2) > 0)
print line
close($2)
} else
print $0
}'
```
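A common use of `getline var < file` is loading a small lookup table in `BEGIN` before the main input is processed. The sketch below assumes a hypothetical two-column file (id, name) created on the fly; `join_names`, `map`, and the sample data are illustrative only.

```bash
join_names() {
  map=$(mktemp)                       # hypothetical id-to-name file
  printf '10 tiger\n20 jim\n' > "$map"
  printf '20 30\n10 55\n99 1\n' | awk -v map="$map" '
    BEGIN {
      # Read the whole file line by line; $0, NF and NR are not involved.
      while ((getline line < map) > 0) {
        split(line, f, " ")
        name[f[1]] = f[2]
      }
      close(map)
    }
    {
      n = ($1 in name) ? name[$1] : "unknown"
      print n, $2
    }'
  rm -f "$map"
}
join_names
```

Because the record goes into `line` and is never split automatically, the fields are extracted with an explicit `split()`.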
## Using getline from a Piped command
Use the stdout of a command as the input of `getline`:
1. `command | getline`
    * Each line of the `df -h` stdout is stored in `$0` by `getline`
    * awk splits that line into fields:
        * `RS == "\n"`, `FS == " "`
        * `NF` varies with each line of the command's stdout; here it is 6 or 7
        * `NR` and `FNR` both stay at the record that triggered `getline` and do not advance with the lines of the command's stdout; in this example they stay at 4
2. `command | getline var`
    * Each line of the `df -h` stdout is stored in `var`; `$0` is not changed
    * The line is **not split into fields**
        - `NF` is not changed; the record being processed is "@execute df", so `NF` is 2
        - `NR` and `FNR` stay at the "@execute df" record, so both are 4
        - (Quoted from GNU:)
```
none of the predefined variables are changed
However, RT is set
```
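3. Wrapping the command that is piped to `getline` in "( )" is a good habit: it keeps the awk script portable across awk versions. In this example that is the `tmp` variable, which holds the command to run, wrapped in "( )".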
``` bash
echo $'foo\nbar\nbaz\n@execute df\nbletch' | \
awk '{
if ($1 == "@execute") {
tmp = "(" substr($0, 10) " " "-h" ")"
while ((tmp | getline ) > 0){print($0,NR,FNR,NF,"[",RT,"]","[",RS,"]","[",FS,"]")}
#while ((tmp | getline var ) > 0){print($0,NR,FNR,NF,"[",RT,"]","[",RS,"]","[",FS,"]")}
close(tmp)
} else
print($0,NR,FNR,NF,"[",RT,"]","[",RS,"]","[",FS,"]")
}'
```
```
foo 1 1 1 [
] [
] [ ]
bar 2 2 1 [
] [
] [ ]
baz 3 3 1 [
] [
] [ ]
Filesystem Size Used Avail Use% Mounted on 4 4 7 [
] [
] [ ]
devtmpfs 142G 0 142G 0% /dev 4 4 6 [
] [
] [ ]
tmpfs 142G 0 142G 0% /dev/shm 4 4 6 [
] [
] [ ]
tmpfs 142G 9.5M 142G 1% /run 4 4 6 [
] [
] [ ]
tmpfs 142G 0 142G 0% /sys/fs/cgroup 4 4 6 [
] [
] [ ]
/dev/mapper/rhel-root 50G 4.6G 46G 10% / 4 4 6 [
] [
] [ ]
/dev/mapper/rhel-home 689G 5.3G 683G 1% /home 4 4 6 [
] [
] [ ]
/dev/sda2 1014M 161M 854M 16% /boot 4 4 6 [
] [
] [ ]
/dev/md1 1.8T 157G 1.6T 9% /opt 4 4 6 [
] [
] [ ]
/dev/sda1 599M 6.6M 593M 2% /boot/efi 4 4 6 [
] [
] [ ]
tmpfs 29G 0 29G 0% /run/user/0 4 4 6 [
] [
] [ ]
bletch 5 5 1 [
] [
] [ ]
```
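The `command | getline var` form (the commented-out line in the script above) can be tried with a deterministic command instead of `df`. This is a small sketch under that assumption; `run_and_capture`, `cmd`, and `out` are illustrative names.

```bash
run_and_capture() {
  printf 'a b c\n' | awk '{
    cmd = "(echo x y)"                # parenthesised, as suggested above
    if ((cmd | getline out) > 0)
      print out "|" $0 "|" NF         # out is new; $0 and NF are untouched
    close(cmd)                        # lets the command run again later
  }'
}
run_and_capture
```

Calling `close(cmd)` matters when the same command string is used for more than one record: without it, a later `cmd | getline` would continue reading from the already finished pipe and return 0.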
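## Locating the last line of the input
To treat the last line differently while reading a file:
- In the BEGIN block, loop over the file first and store the total number of records in `last`
- Add a condition that detects the last line and handles it differently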
``` awk
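#!/bin/gawk -f
BEGIN {
    # first pass: count the records of the first file argument
    while ((getline t < ARGV[1]) > 0) { last++ }
    close(ARGV[1])
}
# main pass: tag only the record whose number equals the total
{ print((last == FNR) ? $0 " this is the last line" : $0) }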
```
## Locating the last line of the input: read target file twice
This relies on `FNR` restarting when awk starts reading the second file, while `NR` keeps accumulating.
``` bash
awk -f test.awk target-file target-file
```
``` awk
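{
    # first pass (FNR == NR only while reading the first copy): count records
    if (FNR == NR) {
        last++; next
    }
    # second pass: mark the record whose number equals the total
    print($0, ((last == FNR) ? "I am Last" : ""))
}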
```
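As an alternative to the two approaches above, the last line can also be found in a single pass by reading one record ahead with `getline var`. This is only a sketch, not taken from the original notes: `mark_last`, `line`, and `ahead` are illustrative names, and a read error is treated the same as end of input.

```bash
mark_last() {
  printf 'tiger\njim\nbono\n' | awk '{
    line = $0
    # Look one record ahead; when getline fails, "line" was the last one.
    while ((getline ahead) > 0) {
      print line
      line = ahead
    }
    print line " <- last line"
  }'
}
mark_last
```

The action runs only once, for the first record; every later record is consumed inside the loop, so this works on piped input that cannot be read twice.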
## Using getline from a Coprocess
{%hackmd k2jFfreJS1q0SbtsMB8-IA?edit %}
## Using getline into variable from a coprocess
(Same as the example above.)
## Using *getline* in BEGIN and END clauses
> ref
> [BEGIN and END special pattern](https://www.gnu.org/software/gawk/manual/html_node/BEGIN_002fEND.html)
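
A program that consists only of a `BEGIN` rule can still consume the normal input through `getline`. The following sketch counts records that way; `count_in_begin` and the three-line input are assumptions for illustration.

```bash
count_in_begin() {
  printf 'a\nb\nc\n' | awk 'BEGIN {
    # No main rules: getline pulls records from the normal input.
    while ((getline line) > 0)
      n++
    print n, NR
  }'
}
count_in_begin
```

Since `getline line` without redirection advances `NR`, both numbers printed at the end agree.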
## Summary of ```getline``` Variants
[summary](https://www.gnu.org/software/gawk/manual/html_node/Getline-Summary.html)
