# AWK
###### tags: `publish` `coding`
This is a summary of "The AWK Programming Language".
<img src=https://i.imgur.com/yi7ib5M.jpg width="300"/>
# Chapter 1: An Awk Tutorial
```bash=
# This is the example file used in this chapter (fields: name, pay rate in dollars per hour, hours worked)
Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18
```
* The program is enclosed in single quotes. Actions are enclosed in braces to distinguish them from patterns.
* If a pattern matches but has no action, awk prints the whole input line. An action with no pattern is performed for every input line.
```bash=
awk 'program' infiles
awk 'pattern { action }' infiles
```
```bash=
awk 'program'            # no input files: awk reads standard input, so you can type lines and see the results (end with Ctrl-D)
awk -f progfile infiles  # read the program from the file progfile instead of the command line
```
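A minimal shell sketch of the two ways of running a program, assuming a POSIX shell and an awk on the PATH. The file names `emp.data` and `prog.awk` are hypothetical; `emp.data` holds the sample file above. Both commands should print the names of the employees who worked zero hours.

```bash=
# Work in a scratch directory so the hypothetical files do not clutter anything
cd "$(mktemp -d)" || exit 1

# emp.data (hypothetical name): the sample employee file from this chapter
cat > emp.data <<'EOF'
Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18
EOF

# prog.awk (hypothetical name): the same program text, stored in a file
cat > prog.awk <<'EOF'
$3 == 0 { print $1 }
EOF

# Program on the command line, then the program read from a file with -f
inline=$(awk '$3 == 0 { print $1 }' emp.data)
fromfile=$(awk -f prog.awk emp.data)
echo "$inline"
echo "$fromfile"
```

Putting the program in a file avoids shell quoting problems once it grows beyond a line or two.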
* Here is an example of a syntax error message (the exact wording varies between awk implementations); here `[` was typed where `{` was intended:
```bash=
awk: syntax error at source line 1
context is
$3 == 0 >>> [ <<<
extra }
missing ]
awk: bailing out at source line 1
```
* There are two types of data in awk: numbers and strings of characters.
* Awk reads its input one line at a time and splits each line into fields, where, by default, a field is a sequence of characters that doesn't contain any blanks or tabs.
```bash=
# printing every line
{ print }
# printing certain fields
{ print $1, $3}
```
* NF is the number of fields in the current line, so $NF is the last field.
* NR is the number of lines (records) read so far, so in the main program it is the current line number.
```bash=
{ print NR, $0} # this gives line number for the output
```
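To see `NR`, `NF`, and `$NF` side by side, here is a small shell sketch that feeds the chapter's sample data to awk. The shell variable `data` is only a convenience introduced here.

```bash=
# Sample employee data from the top of this chapter
data='Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18'

# For each line: line number, number of fields, first field, last field
result=$(printf '%s\n' "$data" | awk '{ print NR, NF, $1, $NF }')
echo "$result"
```

Every line has three fields here, so `$NF` is the same as `$3`.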
* printf produces fancier, formatted output:
```bash=
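# general form: printf(format, value_1, value_2, ..., value_n)
{ printf("total pay for %s is %.2f\n", $1, $2 * $3) }
{ printf("%-8s $%6.2f\n", $1, $2 * $3) }
# %-8s prints a string left-justified in a field 8 characters wide;
# $%6.2f prints a literal $ and then a number 6 characters wide with 2 digits after the decimal point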
```
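A sketch that runs the second `printf` program on the sample data so the effect of `%-8s` and `%6.2f` is visible. The expected layout follows from the format string; each amount is rate times hours.

```bash=
# Sample employee data from the top of this chapter
data='Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18'

# Name left-justified in 8 columns, then $ and the pay right-justified in 6
result=$(printf '%s\n' "$data" | awk '{ printf("%-8s $%6.2f\n", $1, $2 * $3) }')
echo "$result"
```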
```bash=
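$2 >= 5                 # pay rate is at least $5.00
$2 * $3 > 50            # pay is more than $50
$1 == "Susie"           # name is Susie
!($2 >= 4 || $3 >= 20)  # neither rate >= 4 nor hours >= 20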
```
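* These patterns select lines by comparison, by computation, by text content, and by combining patterns with `&&`, `||`, and `!`.
* Regular expressions can specify much more elaborate patterns; for example, `/Susie/` matches every line that contains Susie.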
* The special pattern BEGIN matches before the first line of the first input file is read, and END matches after the last line of the last file has been processed.
```bash=
BEGIN { print "NAME RATE HOURS"; print "" } # print "" prints an empty line
{ print }
# The output is:
NAME RATE HOURS

Beth 4.00 0
Dan 3.75 0
...
```
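A sketch combining BEGIN and END on the sample data: the header is printed before any input is read, and END runs after the last line, when `NR` still holds the number of lines read. The END line is an addition for illustration, not part of the example above.

```bash=
# Sample employee data from the top of this chapter
data='Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18'

result=$(printf '%s\n' "$data" | awk '
BEGIN { print "NAME RATE HOURS"; print "" }  # runs before the first line
{ print }                                    # runs for every line
END { print NR, "lines read" }               # runs after the last line
')
echo "$result"
```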
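* Programs can compute with variables. Variables need no initialization: they start out as 0 (or the empty string). For example, this counts the employees who worked more than 15 hours: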
```bash=
$3 > 15 { emp = emp + 1 }
END { print emp, "employees worked more than 15 hours." }
# This prints the number of employees who worked more than 15 hours, after all input has been read
```
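Running the counting program on the sample data, as a sketch. Only Mark, Mary, and Susie have more than 15 hours, so the count should be 3.

```bash=
# Sample employee data from the top of this chapter
data='Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18'

# emp is incremented once per matching line and printed in END
result=$(printf '%s\n' "$data" | awk '
$3 > 15 { emp = emp + 1 }
END { print emp, "employees worked more than 15 hours." }')
echo "$result"
```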
* Strings are concatenated by writing them one after another; this collects all the names on one line, separated by blanks:
```bash=
{ names = names $1 " "}
END { print names }
```
* To print the last line of the input, save each line in a variable and print it in the END action:
```bash=
{ last = $0 }
END { print last }
```
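A sketch checking the two END idioms above on the sample data. The concatenated names end with a trailing blank, because a blank is appended after every name.

```bash=
# Sample employee data from the top of this chapter
data='Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18'

# Concatenation: append each name and a blank to the variable names
names=$(printf '%s\n' "$data" | awk '{ names = names $1 " " } END { print names }')
# Last line: overwrite last with every line; END sees the final one
last=$(printf '%s\n' "$data" | awk '{ last = $0 } END { print last }')
echo "$names"
echo "$last"
```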
* If-else statement (total and average pay of the employees paid more than $6/hour):
```bash=
$2 > 6 { n = n + 1; pay = pay + $2 * $3 }
END {
    if (n > 0)
        print n, "employees, total pay is", pay, "average pay is", pay/n
    else
        print "no employees are paid more than $6/hour"
}
```
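On the sample data nobody earns more than $6/hour (the highest rate is 5.50), so this sketch should take the `else` branch:

```bash=
# Sample employee data from the top of this chapter
data='Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18'

result=$(printf '%s\n' "$data" | awk '
$2 > 6 { n = n + 1; pay = pay + $2 * $3 }
END {
    if (n > 0)
        print n, "employees, total pay is", pay, "average pay is", pay/n
    else
        print "no employees are paid more than $6/hour"
}')
echo "$result"
```

The `n > 0` test also guards the division `pay/n` against dividing by zero.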
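* While statement. This program computes compound interest: each input line holds an amount, an interest rate, and a number of years, and it prints the amount at the end of each year: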
```bash=
{
    i = 1
    while (i <= $3) {
        printf("\t%.2f\n", $1 * (1 + $2)^i)
        i = i + 1
    }
}
```
* For statement (the same computation, written with `for`):
```bash=
{
    for (i = 1; i <= $3; i = i + 1)
        printf("\t%.2f\n", $1 * (1 + $2)^i)
}
```
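A sketch that runs both loop versions on one assumed input line, `1000 .06 5` (amount 1000, rate 6%, 5 years), and checks that they agree. Each output line is a tab followed by the amount after that year.

```bash=
# Assumed input: amount, interest rate, number of years
line='1000 .06 5'

# while version
while_out=$(echo "$line" | awk '{
    i = 1
    while (i <= $3) {
        printf("\t%.2f\n", $1 * (1 + $2)^i)
        i = i + 1
    }
}')

# for version: initialization, test, and increment on one line
for_out=$(echo "$line" | awk '{
    for (i = 1; i <= $3; i = i + 1)
        printf("\t%.2f\n", $1 * (1 + $2)^i)
}')

echo "$for_out"
```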
* Arrays: this program prints its input in reverse order.
```bash=
# reverse the lines in the file
{ line[NR] = $0 }  # remember each input line
END {
    i = NR
    while (i > 0) {
        print line[i]
        i = i - 1
    }
}
```
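Reversing the sample file with the array program, as a sketch. The lines should come out from last to first.

```bash=
# Sample employee data from the top of this chapter
data='Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18'

result=$(printf '%s\n' "$data" | awk '
{ line[NR] = $0 }   # store each line under its line number
END {
    i = NR          # start from the last line read
    while (i > 0) {
        print line[i]
        i = i - 1
    }
}')
echo "$result"
```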
* Some handy one-liners (only two of them are listed here):
```bash=
# Print the total number of lines that contain Beth
/Beth/ { nlines = nlines + 1}
END { print nlines}
# Print the largest first field and the line that contains it (assumes at least one first field is positive)
$1 > max { max = $1; maxline = $0 }
END { print max, maxline }
```
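A sketch of both one-liners. The first runs on the sample employee data. The second uses a small hypothetical input with a numeric first field, since the employee file's first field is a name.

```bash=
# Sample employee data from the top of this chapter
data='Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18'

# Count the lines that contain Beth
beth=$(printf '%s\n' "$data" | awk '/Beth/ { nlines = nlines + 1 } END { print nlines }')

# Hypothetical input: the first field is a number
largest=$(printf '3 apples\n17 pears\n5 plums\n' |
    awk '$1 > max { max = $1; maxline = $0 } END { print max, maxline }')

echo "$beth"
echo "$largest"
```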
# Chapter 2: The AWK Language
```bash=
# This is the input file used in this chapter (fields: country, area in thousands of square miles, population in millions, continent)
USSR 8649 275 Asia
Canada 3852 25 North America
China 3705 1032 Asia
USA 3615 237 North America
Brazil 3286 134 South America
India 1267 746 Asia
Mexico 762 78 North America
France 211 55 Europe
Japan 144 120 Asia
Germany 96 61 Europe
England 94 56 Europe
```
* A long statement may be spread over several lines by inserting a backslash and newline at each break.
* Summary of patterns:
* BEGIN { statements }: The statements are executed once before any input has been read.
* END { statements }: The statements are executed once after all input has been read.
* expression { statements }: The statements are executed at each input line where the expression is true, that is, nonzero or nonnull.
* /regular expression/ { statements }: The statements are executed at each input line that contains a string matched by the regular expression.
* compound pattern { statements }: A compound pattern combines expressions with &&, ||, ! and parentheses; the statements are executed at each input line where the compound pattern is true.
* pattern1, pattern2 { statements }: A range pattern matches each input line from a line matched by pattern1 to the next line matched by pattern2, inclusive; the statements are executed at each matching line.
* BEGIN and END:
1. They provide a way to gain control for initialization and wrap-up.
2. If there is more than one BEGIN (or END), they are executed in the order they appear.
3. They may appear anywhere in the program, but by convention BEGIN comes first and END last.
```bash=
awk '
BEGIN {
    FS = " "  # the default: blanks separate fields, so "North America" becomes two fields and $4 is only "North"
    printf("%10s %6s %5s %s\n\n", "Country", "Area", "Pop", "Continent")
}
{
    printf("%10s %6d %5d %s\n", $1, $2, $3, $4)
    area = area + $2
    pop = pop + $3
}
END { printf("\n%10s %6d %5d\n", "TOTAL", area, pop) }
' test.txt
# Output:
   Country   Area   Pop Continent

      USSR   8649   275 Asia
    Canada   3852    25 North
     China   3705  1032 Asia
       USA   3615   237 North
    Brazil   3286   134 South
     India   1267   746 Asia
    Mexico    762    78 North
    France    211    55 Europe
     Japan    144   120 Asia
   Germany     96    61 Europe
   England     94    56 Europe

     TOTAL  25681  2819
```
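The output above shows only "North" and "South" because blanks split "North America" into two fields. A sketch of the alternative, assuming the file is rewritten with a tab between fields (the three rows here are generated with `printf` for illustration): setting `FS` to `"\t"` keeps multi-word continents whole in `$4`.

```bash=
# Three rows of the countries file, with a tab between fields
result=$(printf 'USSR\t8649\t275\tAsia\nCanada\t3852\t25\tNorth America\nBrazil\t3286\t134\tSouth America\n' |
    awk 'BEGIN { FS = "\t" }   # split on tabs only, so blanks stay inside fields
         { printf("%10s %6d %5d %s\n", $1, $2, $3, $4) }')
echo "$result"
```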
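* Expressions as patterns
1. Values are converted automatically from strings to numbers or vice versa, depending on the operator. A comparison is numeric when both operands are numbers (or fields that look like numbers); otherwise the operands are compared as strings.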
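
| Operator | Meaning |
| -------- | -------- |
| `<` | less than |
| `<=` | less than or equal to |
| `==` | equal to |
| `!=` | not equal to |
| `>=` | greater than or equal to |
| `>` | greater than |
| `~` | matched by |
| `!~` | not matched by |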
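A sketch of numeric versus string comparison and of `~` and `!~`. Fields that look like numbers compare numerically, while string constants compare as strings, so `10` and `9` give opposite answers. The comparisons are wrapped in parentheses to keep them unambiguous inside `print`. The three countries rows are taken from the file above.

```bash=
# Both fields look numeric, so they compare as numbers: 10 < 9 is false (0)
num=$(echo "10 9" | awk '{ print ($1 < $2) }')
# String constants compare as strings: "10" sorts before "9" (1)
str=$(awk 'BEGIN { print ("10" < "9") }')

# A few rows of the countries file (blank-separated)
data='USSR 8649 275 Asia
Canada 3852 25 North America
China 3705 1032 Asia'

# ~ selects lines whose fourth field matches /Asia/; !~ selects the rest
asia=$(printf '%s\n' "$data" | awk '$4 ~ /Asia/ { print $1 }')
other=$(printf '%s\n' "$data" | awk '$4 !~ /Asia/ { print $1 }')

echo "$num $str"
echo "$asia"
echo "$other"
```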
# Chapter 3: Data Processing
# Chapter 4: Reports and Databases
# Chapter 5: Processing Words
# Chapter 6: Little Languages
# Chapter 7: Experiments with Algorithms
# Chapter 8: Epilogue
# Appendix A: AWK Summary