# UNIX tool: sed & awk note
###### tags: `NSYSU`
[TOC]
## sed
### one-line
#### double space a file
```shell=
sed G
```
#### double space a file which already has some blank lines in it:
```shell=
sed '/^$/d;G'
```
#### triple space a file:
```shell=
sed 'G;G'
```
#### undo double-spacing (assumes all even-numbered lines are always blank):
```shell=
sed 'n;d'
```
#### insert a blank line above every line which matches "regex":
```shell=
sed '/regex/{x;p;x;}'
```
#### insert a blank line below every line which matches "regex":
```shell=
sed '/regex/G'
```
#### insert a blank line above and below every line which matches "regex":
```shell=
sed '/regex/{x;p;x;G;}'
```
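The spacing one-liners above are easiest to understand by running them on a few lines. The following is a minimal sketch, assuming a POSIX shell and a standard `sed`; the sample text and the variable names (`input`, `doubled`, `restored`, `above`) are made up for illustration.

```shell
# Made-up three-line sample.
input=$(printf 'alpha\nbeta\ngamma')

# G appends a newline plus the (empty) hold space to every line.
# Note: $( ) drops trailing newlines, so the final blank line is not stored.
doubled=$(printf '%s\n' "$input" | sed G)

# n prints the current line and fetches the next (blank) one, d deletes it.
restored=$(printf '%s\n' "$input" | sed G | sed 'n;d')

# x;p;x swaps in the empty hold space, prints it, then swaps the line back.
above=$(printf '%s\n' "$input" | sed '/beta/{x;p;x;}')

printf '%s\n' "$doubled" '---' "$restored" '---' "$above"
```

Undoing the double spacing gives back the original three lines, and only the `beta` line gets a blank line above it.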
#### number each line of a file (like grep -n "^"):
```shell=
sed = filename | sed 'N;s/\n/:/'
```
#### number each line of a file (like cat -n):
```shell=
cat -n filename
```

```shell=
sed = filename | \
sed 'N; s/^/     /; s/ *\(.\{6,\}\)\n/\1\t/'
```

#### number each line of file, but only print numbers if line is not blank:
```shell=
sed '/./=' filename | sed '/./N;s/\n/ /'
```
#### count lines (like "wc -l"):
```shell=
sed -n '$='
```
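The numbering recipes all rely on the same two-step idea: `=` prints the line number on its own line, and a second `sed` joins each number with the line that follows it. A small sketch on made-up input (the variable names are illustrative only):

```shell
# Made-up sample with a blank line in the middle.
input=$(printf 'one\n\nthree')

# Number every line, "grep -n" style.
numbered=$(printf '%s\n' "$input" | sed = | sed 'N;s/\n/:/')

# Number only the non-blank lines; blank lines pass through untouched.
nonblank=$(printf '%s\n' "$input" | sed '/./=' | sed '/./N;s/\n/ /')

# Print just the number of the last line, like "wc -l".
count=$(printf '%s\n' "$input" | sed -n '$=')

printf '%s\n' "$numbered" '---' "$nonblank" '---' "$count"
```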
#### delete leading whitespace (spaces, tabs) from front of each line:
```shell=
sed 's/^[ \t]*//'
```
#### delete trailing whitespace (spaces, tabs) from end of each line:
```shell=
sed 's/[ \t]*$//'
```
#### delete BOTH leading & trailing whitespace from each line:
```shell=
sed 's/^[ \t]*//;s/[ \t]*$//'
```
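Not every `sed` treats `\t` inside a bracket expression as a tab (GNU sed does). As a hedged alternative, the POSIX character class `[[:blank:]]` matches both spaces and tabs; the sketch below uses a made-up line to show the combined trim.

```shell
# Made-up line with a leading tab and spaces, and trailing space/tab.
trimmed=$(printf '\t  hello world \t \n' |
    sed 's/^[[:blank:]]*//;s/[[:blank:]]*$//')

# Brackets make the trimmed edges visible.
printf '[%s]\n' "$trimmed"
```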
#### insert 5-space margin on left of each line:
```shell=
sed 's/^/     /'
```
#### change scarlet or ruby or puce to "red":
```shell=
sed 's/scarlet/red/g;s/ruby/red/g;s/puce/red/g'
```
#### align all text flush right on a 79-column width (set at 78 plus 1 space):
```shell=
sed ':a;s/^.\{1,78\}$/ &/;ta'
```
#### substitute (find and replace) "foo" with "bar" on each line (replace only the 1st instance):
```shell=
sed 's/foo/bar/'
```
#### replace only the 4th instance:
```shell=
sed 's/foo/bar/4'
```
#### replace ALL instances:
```shell=
sed 's/foo/bar/g'
```
#### replace only the last case:
```shell=
sed 's/\(.*\)foo/\1bar/'
```
#### replace the next-to-last case:
```shell=
sed 's/\(.*\)foo\(.*foo\)/\1bar\2/'
```
#### substitute "foo" with "bar" ONLY for lines which contain "baz":
```shell=
sed '/baz/s/foo/bar/g'
```
#### substitute "foo" with "bar" EXCEPT for lines which contain "baz":
```shell=
sed '/baz/!s/foo/bar/g'
```
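To compare the substitution variants side by side, the sketch below runs each of them on the same made-up line. The helper `try` and the variable names are hypothetical and exist only for this illustration.

```shell
line='foo foo foo foo foo'

# Hypothetical helper: run one sed script against the sample line.
try() { printf '%s\n' "$line" | sed "$1"; }

first=$(try 's/foo/bar/')                       # 1st instance only
fourth=$(try 's/foo/bar/4')                     # 4th instance only
all=$(try 's/foo/bar/g')                        # every instance
last=$(try 's/\(.*\)foo/\1bar/')                # greedy .* finds the last one
penult=$(try 's/\(.*\)foo\(.*foo\)/\1bar\2/')   # next-to-last instance

# Address negation: substitute only on lines that do NOT contain "baz".
except=$(printf 'foo baz\nfoo qux\n' | sed '/baz/!s/foo/bar/g')

printf '%s\n' "$first" "$fourth" "$all" "$last" "$penult" "$except"
```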
#### center all text in the middle of 79-columns, with spaces on the right to fill the columns and with leading spaces being significant:
```shell=
sed ':a;s/^.\{1,77\}$/ & /;ta'
```
#### center all text in the middle of 79-columns, with no trailing spaces and ignoring leading spaces:
```shell=
sed ':a;s/^.\{1,77\}$/ &/;ta;s/\( *\)\1/\1/'
```
#### reverse order of lines (like "tac"):
```shell=
sed '1!G;h;$!d' # method 1
sed -n '1!G;h;$p' # method 2
sed -n '2,$G;h;$p' # method 3
```
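All three methods build the reversed file in the hold space: each new line has the accumulated text appended to it (`G`), the result is saved back (`h`), and only the last line's pattern space is printed. A quick check on made-up input, assuming a standard `sed`:

```shell
input=$(printf '1\n2\n3')

m1=$(printf '%s\n' "$input" | sed '1!G;h;$!d')     # delete all but the last cycle
m2=$(printf '%s\n' "$input" | sed -n '1!G;h;$p')   # print only on the last line
m3=$(printf '%s\n' "$input" | sed -n '2,$G;h;$p')  # same, with a line range

printf '%s\n' "$m1"
```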
#### reverse the characters on each line (like "rev"):
```shell=
sed '/\n/!G;s/\(.\)\(.*\n\)/&\2\1/;//D;s/.//'
```
#### join pairs of lines side-by-side:
```shell=
sed '$!N;s/\n/ /'
```
#### if a line ends with a backslash, append the next line to it:
```shell=
sed ':a;/\\$/N;s/\\\n//;ta'
```
#### if a line begins with "=" then append it to the previous line & replace the "=" with a space:
```shell=
sed ':a;$!N;s/\n=/ /;ta;P;D'
```
#### add commas to numeric strings, changing "1234567" to "1,234,567":
```shell=
sed ':a;s/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'
sed ':a;s/\B[0-9]\{3\}\>/,&/;ta' # GNU sed
```
#### add commas to numbers with decimal points and minus signs (GNU sed only):
```shell=
sed -r ':a;s/(^|[^0-9.])([0-9]+)([0-9]{3})/\1\2,\3/g;ta'
```
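Both comma recipes loop with `ta` until no further group of three digits can be split off. The sketch below runs them on made-up numbers; it writes the label with a separate `-e` for portability and uses `-E` as the more widely supported spelling of `-r`, which is an assumption about the installed `sed`.

```shell
# Basic form: each pass inserts one comma before the last three digits.
plain=$(echo 1234567 |
    sed -e ':a' -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta')

# Extended-regex form: digits after a decimal point are left alone
# because "." is excluded from the character that may precede a number.
decimal=$(echo '-1234567.8912' |
    sed -E -e ':a' -e 's/(^|[^0-9.])([0-9]+)([0-9]{3})/\1\2,\3/g;ta')

printf '%s\n' "$plain" "$decimal"
```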
#### add a blank line after every 5 lines (after lines 5, 10, 15, 20, etc.):
```shell=
sed 'n;n;n;n;G;'
sed '0~5G' # GNU sed only
```
## awk
### built-in variables
--------------------
VAR | Description
----|--------------
$0 | The entire line
$n | Field n
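ARGC| Number of command-line arguments, counting the command name; EX: ARGC == 2
ARGV| Array of command-line arguments; EX: ARGV[0]\=="awk", ARGV[1]\=="filename"; Note: the program text and options are not included, and the remaining arguments are normally input filenames.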
NF | Number of fields in current line (or record)
NR | Number of lines (or records) read so far
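FS | Input field separator; Default: BEGIN{FS\=" "} (a single space, which splits on runs of spaces and tabs and ignores leading and trailing whitespace); Note: a change only applies to future input lines/records.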
RS | Input Record Separator; Default: BEGIN{RS\="\n"}
OFS | Output field separator; Default: BEGIN{OFS\=" "}
ORS | Output Record Separator; Default: BEGIN{ORS="\n"}
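
To see several of these variables at work, the sketch below prints `NR`, `NF`, the first field and the last field of some made-up comma-separated records. The second command sets the same separators from the command line with `-F` and the standard `-v var=value` option instead of a `BEGIN` block.

```shell
# Made-up records: name,rate,hours
data=$(printf 'alice,4.5,10\nbob,6.25,8')

# Separators set inside the program.
report=$(printf '%s\n' "$data" |
    awk 'BEGIN { FS = ","; OFS = " | " } { print NR, NF, $1, $NF }')

# Same result with the separators given as options.
same=$(printf '%s\n' "$data" |
    awk -F, -v OFS=' | ' '{ print NR, NF, $1, $NF }')

printf '%s\n' "$report"
```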
### Flags
----
flag | Description
--------|-------------
-f <filename> | Reads the awk program from the given file instead of the command line; a script can also use #!/usr/bin/awk -f as its first line.
-F "x"| Uses the symbol(s) in "x" for the field separator
### Operators
----
op | Description
----|----
= | assignment operator; sets a variable equal to a value or string
== | equality operator; TRUE if both sides are equal
!= | inverse equality operator
\~, \!~ | matches / does not match an extended regular expression
&& | logical AND
\|\| | logical OR
! | logical NOT
<, >, <=, >= | relational operators
+, -, \/, *, %, ^ | Math operators
space | String concatenation(implicit or explicit)
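
A short sketch that combines several of these operators in one rule: a regex match, a logical OR, a numeric comparison, multiplication, and implicit string concatenation. The fruit data is made up for illustration.

```shell
# Select lines whose name contains "an" OR whose count exceeds 5,
# then print name, "=", and twice the count glued together.
out=$(printf 'apple 3\nbanana 12\ncherry 7\n' |
    awk '$1 ~ /an/ || $2 > 5 { print $1 "=" $2 * 2 }')

printf '%s\n' "$out"
```

Multiplication binds tighter than concatenation, so `$1 "=" $2 * 2` is read as `$1 "=" ($2 * 2)`.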
### Pattern
1. use extended regular expression
2. syntax example
```shell=
/regex/{action} #if match pattern, do...
```
3. special pattern
+ BEGIN
+ def: matches before the first input line is read
+ END
+ def: matches after the last input line has been read
+ example
```shell=
BEGIN{FS=",";print "NAME RATE HOURS"; print "" }{ print }
END{ print "total number of employees is", NR }
```
:::info
NOTE:
Although NR retains its value after the last input line has been read, $0 does not (on some systems):
solution:
```shell=
{ last = $0 }
END{ print NR ":", last }
```
:::
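
The `BEGIN`/`END` example and the `last = $0` workaround can be tried together on a few made-up employee records (name, rate, hours). This is only a sketch; the data and the variable names `emp` and `out` are assumptions for the demonstration.

```shell
# Made-up comma-separated employee records.
emp=$(printf 'Beth,4.00,0\nDan,3.75,0\nKathy,4.00,10')

out=$(printf '%s\n' "$emp" | awk '
BEGIN { FS = ","; print "NAME RATE HOURS"; print "" }   # runs before any input
      { print $1, $2, $3 }                              # runs for every record
      { last = $0 }                                     # remember the record for END
END   { print "total number of employees is", NR
        print NR ":", last }                            # runs after the last record
')

printf '%s\n' "$out"
```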
---
### Control flow
#### if-else
example
```shell=
$2 > 6 { n = n + 1; pay = pay + $2 * $3 }
END {
    if (n > 0)
        print n, "employees, total pay is", pay, "average pay is", pay/n
    else
        print "no employees are paid more than $6/hour"
}
```
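To exercise both branches of the `if`/`else`, the sketch below feeds the same program two made-up data sets (name, hourly rate, hours): one where nobody earns more than 6 per hour and one where two people do.

```shell
# The program is kept in a shell variable so it can be reused.
prog='$2 > 6 { n = n + 1; pay = pay + $2 * $3 }
END {
    if (n > 0)
        print n, "employees, total pay is", pay, "average pay is", pay/n
    else
        print "no employees are paid more than $6/hour"
}'

low=$(printf 'Beth 4.00 0\nMary 5.50 22\n' | awk "$prog")    # else branch
high=$(printf 'Ann 7.50 10\nBob 6.50 20\n' | awk "$prog")    # if branch

printf '%s\n' "$low" "$high"
```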
---
#### while loop
example
```shell=
{ i = 1
  while (i <= $3) {
      printf("\t%.2f\n", $1 * (1 + $2) ^ i)
      i = i + 1
  }
}
```
---
#### for loop
example
```shell=
{ for (i = 1; i <= $3; i = i + 1)
      printf("\t%.2f\n", $1 * (1 + $2) ^ i)
}
```
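Both loops compute the same compound-interest table: for an input line `amount rate years`, they print the value of the amount after each year. A sketch with a made-up input line (1000 at 6% for 3 years) shows that the `while` and `for` versions agree.

```shell
# while version
while_out=$(echo '1000 .06 3' | awk '{ i = 1
    while (i <= $3) {
        printf("\t%.2f\n", $1 * (1 + $2) ^ i)
        i = i + 1
    }
}')

# for version: initialisation, test and increment on one line
for_out=$(echo '1000 .06 3' | awk '{
    for (i = 1; i <= $3; i = i + 1)
        printf("\t%.2f\n", $1 * (1 + $2) ^ i)
}')

printf '%s\n' "$while_out"
```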
---
### built-in function
#### string
1. length()
+ Def: returns the number of characters in a string (many implementations also accept an array and return its number of elements).
+ example: equivalent of wc
```shell=
#!/usr/bin/awk -f
{ nc = nc + length($0) + 1
nw = nw + NF
}
END { print NR, "lines,", nw, "words,", nc, "characters" }
```
2. index()
+ Def:
+ equivalent of the strstr() function of C
+ Returns a positive value when the substring is found, and 0 when it is not.
+ The number returned is the 1-based position where the substring starts.
:::warning
If the substring consists of 2 or more characters, all of them must be found, sequentially, in the same order, for a match.
:::
+ example: Find a comma
```shell=
{ sentence="This is a short, useless sentence.";
if (index(sentence, ",") > 0)
{printf("Found comma at %d\n", index(sentence,","));}
}
```
3. sub()
+ Def: perform string substitution
+ syntax
```shell=
sub(/old/, "new", stringvariable)
```
:::info
+ stringvariable: Optional, default is $0
+ 1 is returned if a substitution occurs, otherwise 0
:::
+ example
```shell=
echo 1 + 2 + 3 | awk '{sub("[1 ]+","@")}1'
```
4. gsub()
+ Def: global substitution
+ Like sed global substitution
```shell=
sed 's/regex/new/g'
```
5. substr()
+ Def: extracts part of a string.
:::info
+ It takes 2 or 3 arguments.
+ The 1st is the string.
+ The 2nd is the position of the start of the substring to extract.
+ The 3rd is the length of the string to extract.
+ If this 3rd argument is missing, the rest of the string is used.
:::
+ example: process some mail addresses
```shell=
#!/bin/awk -f
{ if ((x=index($1,"@")) > 0)
{ username = substr($1,1,x-1);
hostname = substr($1,x+1,length($1)-x);
printf("username = %s, hostname = %s\n", username, hostname);
} }
```
6. split()
+ Def: divides up a string
:::info
+ Takes 3 arguments: the string, an array to fill, and a separator.
+ The 3rd argument is a regular expression.
+ split($0,A,FS) will create an array such that:
```shell=
$1==A[1], $2==A[2], … $NF==A[NF].
```
:::
+ example: break up a sentence into words
```shell=
#!/usr/bin/awk -f
BEGIN {
    string="This is a string, is it not?"; search=" ";
    n=split(string,array,search);
    for (i=1; i<=n; i++) {
        printf("Word[%d]=%s\n",i,array[i]);
    }
    exit;
}
```
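The string functions above can be tried together in a single `BEGIN` block. The following is a sketch on a made-up mail address; the variable names are illustrative, and the last command repeats the `sub()` example from this section to show that only the first match is replaced.

```shell
report=$(awk 'BEGIN {
    addr = "alice@example.org"        # made-up address
    at   = index(addr, "@")           # 1-based position of "@", 0 if absent
    user = substr(addr, 1, at - 1)    # text before "@"
    host = substr(addr, at + 1)       # 3rd argument omitted: rest of string
    n    = split(addr, part, "@")     # fills part[1] and part[2]
    copy = addr
    hits = gsub(/e/, "E", copy)       # gsub returns the number of replacements
    print length(addr), at, n, hits
    print user, host, part[2], copy
}')

# sub() replaces only the first (leftmost, longest) match.
first_only=$(echo '1 + 2 + 3' | awk '{ sub("[1 ]+", "@") } 1')

printf '%s\n' "$report" "$first_only"
```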
#### Output
1. printf()
+ syntax
```shell=
printf( format, val1, val2, val3, … )
```
:::warning
AWK converts between numbers and strings as the context requires, so the format specifier (e.g. %d or %s) decides how a value is printed.
:::
2. print
+ Def
:::info
+ Separates comma-separated arguments with OFS (a single space by default)
+ Appends ORS (a newline by default) at the end
:::
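
The difference between the two output statements shows up quickly in a small sketch: `print` joins its comma-separated arguments with `OFS`, while `printf` prints exactly what the format string describes and adds no newline of its own. The values are made up.

```shell
out=$(awk 'BEGIN {
    OFS = "-"
    print "a", "b" "c"                              # comma -> OFS, space -> concatenation
    printf("%-5s|%5.1f|%03d\n", "ab", 3.14159, 7)   # left-justify, 1 decimal, zero-pad
}')

printf '%s\n' "$out"
```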
### Associative Arrays
+ Def
+ like python dictionary
+ use hash table to make some task more efficient.
+ example
+ awk
```shell=
#!/usr/bin/awk -f
BEGIN{
dict["danke"] = "thanks"
dict["schoen"] = "beautiful"
}
{for(i=1;i<=NF;i++)if($i in dict) $i=dict[$i]}
1
```
:::info
Reserved word: in
>$i in dict is true if $i is a key of dict, and false otherwise
:::
+ python
```python=
dict = {"danke" : "thanks",
"schoen" : "beautiful"}
```
:::warning
+ NOTE
+ Keys are case-sensitive and must match in full.
+ A substring of a key does not match.
+ The code below will not match a key that spans two fields.
```shell=
BEGIN{
dict["danke"] = "thanks"
dict["schoen"] = "beautiful"
dict["danke schoen"] = "thank you very much."
}
{for(i=1;i<=NF;i++)if($i in dict) $i=dict[$i]}
1
```
+ solution
```shell=
#!/usr/bin/awk -f
BEGIN{
dict["danke"] = "thanks"
dict["schoen"] = "beautiful"
dict["danke schoen"] = "thank you very much."
}
{
for(i = 1; i <= NF; i++)
{
if(i < NF && ($i" "$(i+1)) in dict)
{
$i = dict[$i" "$(i+1)]
$(i+1) = ""
}
if($i in dict)
$i = dict[$i]
}
}
1
```
:::
+ multi-dimensional array
+ example
```shell=
{ L[NF" "NR]=$0; }
{ L[NF "," var]=$0; }
{ L[$1" "$3]=$0; }
```
+ Split() and Associative Array
+ How to create an array indexed by numbers?
+ month[1] is 'Jan', month[2] is 'Feb', month[3] is 'Mar', …
```shell=
split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", month, " ")
```
+ How to create an array named "mdigit", indexed by these “month” strings?
+ mdigit["Jan"] is 1, mdigit["Feb"] is 2
```shell=
for (i=1; i<=12; i++)
mdigit[month[i]] = i
```
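
The array snippets in this section can be combined into one runnable sketch. The input sentence is made up. The `grid[2, 3]` line is an addition not taken from the notes above: it uses awk's standard comma subscript, which joins the parts with `SUBSEP`, as an alternative to building the key by hand with a space or comma.

```shell
# Word-by-word translation through a lookup table.
translated=$(echo 'danke schoen Anna' | awk '
BEGIN { dict["danke"] = "thanks"; dict["schoen"] = "beautiful" }
{ for (i = 1; i <= NF; i++) if ($i in dict) $i = dict[$i] }
1')

months=$(awk 'BEGIN {
    # split() fills month[1..12]; the loop builds the reverse mapping.
    n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", month, " ")
    for (i = 1; i <= n; i++)
        mdigit[month[i]] = i
    print n, month[3], mdigit["Sep"]

    grid[2, 3] = "x"                  # stored under the key 2 SUBSEP 3
    if ((2, 3) in grid)
        print "grid has (2,3)"
}')

printf '%s\n' "$translated" "$months"
```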
### AWK one-liners
#### file spacing
+ Double space a file
```shell=
awk '1;{print ""}'
#or
awk 'BEGIN{ORS="\n\n"};1'
```
+ Double space a file which already has blank lines in it?
+ Output file should contain no more than one blank line between lines of text.
```shell=
awk 'NF{print $0 "\n"}'
```
:::warning
NOTE:
On Unix systems, DOS lines which have only CRLF (\r\n) are
often treated as non-blank, and thus 'NF' alone will return TRUE.
:::
+ Triple space a file?
```shell=
awk '1;{print "\n"}'
```
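
A quick sketch of the spacing one-liners on made-up input. The last command is an assumption about how one might handle the CRLF case from the note above: it strips a trailing carriage return before testing `NF`, so DOS-style blank lines are also treated as blank.

```shell
# Double space: print each line, then an empty line.
doubled=$(printf 'a\nb\n' | awk '1; { print "" }')

# Existing blank lines are dropped (NF is 0), so spacing stays uniform.
squeezed=$(printf 'a\n\n\nb\n' | awk 'NF { print $0 "\n" }')

# CRLF input: remove the trailing \r first, then apply the same rule.
dos=$(printf 'a\r\n\r\nb\r\n' | awk '{ sub(/\r$/, "") } NF { print $0 "\n" }')

printf '%s\n' "$doubled" '---' "$squeezed" '---' "$dos"
```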