# AWK
###### tags: `publish` `coding`

This is a summary of "The AWK Programming Language".

<img src=https://i.imgur.com/yi7ib5M.jpg width="300"/>

# Chapter 1: An Awk Tutorial
```bash=
# This is the example file used in this chapter
Beth    4.00    0
Dan     3.75    0
Kathy   4.00    10
Mark    5.00    20
Mary    5.50    22
Susie   4.25    18
```
* The program is wrapped in single quotes. Actions are enclosed in braces to distinguish them from patterns.
* If a pattern matches but has no action, the matching line is printed unchanged.
```bash=
awk 'program' infiles
awk 'pattern { action }' infiles
```
```bash=
awk 'program'            # with no input files awk reads standard input, so you can type input and test the result
awk -f progfile infiles  # the program can be written in a separate progfile and loaded with -f
```
* Here is an example of a standard error message:
```bash=
awk: syntax error at source line 1
 context is
        $3 == 0 >>> [ <<<
        extra }
        missing ]
awk: bailing out at source line 1
```
* There are two types of data in awk: numbers and strings of characters.
* Awk reads its input one line at a time and splits each line into fields, where, by default, a field is a sequence of characters that doesn't contain any blanks or tabs.
```bash=
# printing every line
{ print }
# printing certain fields
{ print $1, $3 }
```
* NF is the number of fields in the current line.
* NR is the number of lines read so far.
```bash=
{ print NR, $0 }   # prefixes each output line with its line number
```
* Fancier output with printf
```bash=
printf(format, value_1, value_2, ..., value_n)
{ printf("total pay for %s is %.2f\n", $1, $2 * $3) }
{ printf("%-8s $%6.2f\n", $1, $2 * $3) }
# %-8s prints a string left-justified in a field 8 characters wide
# %6.2f prints a number in a field 6 characters wide with 2 digits after the decimal point
```
```bash=
$2 >= 5                   # comparison
$2 * $3 > 50              # computation
$1 == "Susie"             # text content
!($2 >= 4 || $3 >= 20)    # combination of patterns
```
* The patterns above select lines by comparison, computation, text content, and combinations of patterns.
* Regular expressions can be used to specify much more elaborate patterns.
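
As a small illustration of regular-expression patterns, here is a sketch run against the sample employee data from this chapter. The file name `emp.data` and the shell variable names are assumptions of this sketch, not something the notes specify.

```bash
# Recreate the sample data (file name emp.data is an assumption of this sketch)
cat > emp.data <<'EOF'
Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18
EOF

# /^M/ selects lines that start with M
starts_with_m=$(awk '/^M/ { print $1 }' emp.data)
echo "$starts_with_m"
# prints:
# Mark
# Mary

# $1 ~ /y$/ selects lines whose first field ends in y
ends_in_y=$(awk '$1 ~ /y$/ { print $1, $3 }' emp.data)
echo "$ends_in_y"
# prints:
# Kathy 10
# Mary 22
```

A bare `/regex/` is tested against the whole line, while `field ~ /regex/` restricts the match to a single field.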
* The special pattern BEGIN matches before the first line of the first input file is read, and END matches after the last line of the last file has been processed.
```bash=
BEGIN { print "NAME    RATE    HOURS"; print "" }   # print "" prints a blank line
      { print }
# The output is:
NAME    RATE    HOURS

Beth    4.00    0
Dan     3.75    0
...
```
* You can compute in the program, for example:
```bash=
$3 > 15 { emp = emp + 1 }
END     { print emp, "employees worked more than 15 hours." }
# After all input is read, this prints the number of employees who worked more than 15 hours
```
* You can do string concatenation by writing values next to each other:
```bash=
    { names = names $1 " " }
END { print names }
```
* You can print the last line of the input file with:
```bash=
    { last = $0 }
END { print last }
```
* If-Else Statement
```bash=
$2 > 6 { n = n + 1; pay = pay + $2 * $3 }
END    { if (n > 0)
             print n, "employees, total pay is", pay,
                      "average pay is", pay/n
         else
             print "no employees are paid more than $6/hour"
       }
```
* While Statement
```bash=
# each input line holds: amount rate years
{ i = 1
  while (i <= $3) {
      printf("\t%.2f\n", $1 * (1 + $2) ^ i)
      i = i + 1
  }
}
```
* For Statement
```bash=
# same computation as the while loop above
{ for (i = 1; i <= $3; i = i + 1)
      printf("\t%.2f\n", $1 * (1 + $2) ^ i)
}
```
* Arrays
```bash=
# reverse the lines in the file
    { line[NR] = $0 }   # remember each input line
END { i = NR            # print the lines in reverse order
      while (i > 0) {
          print line[i]
          i = i - 1
      }
    }
```
* Some handy one-liners (only two are picked here)
```bash=
# Print the total number of lines that contain Beth
/Beth/ { nlines = nlines + 1 }
END    { print nlines }
# Print the largest first field and the line that contains it
$1 > max { max = $1; maxline = $0 }
END      { print max, maxline }
```
# Chapter 2: The AWK Language
```bash=
# This is the input file used in this chapter
USSR     8649  275   Asia
Canada   3852  25    North America
China    3705  1032  Asia
USA      3615  237   North America
Brazil   3286  134   South America
India    1267  746   Asia
Mexico   762   78    North America
France   211   55    Europe
Japan    144   120   Asia
Germany  96    61    Europe
England  94    56    Europe
```
* A long statement may be spread over several
lines by inserting a backslash and newline at each break.
* Summary of patterns:
    * BEGIN { statements }: The statements are executed once before any input has been read.
    * END { statements }: The statements are executed once after all input has been read.
    * expression { statements }: The statements are executed at each input line where the expression is true, that is, nonzero or nonnull.
    * /regular expression/ { statements }: The statements are executed at each input line that contains a string matched by the regular expression.
    * compound pattern { statements }: A compound pattern combines expressions with &&, ||, ! and parentheses; the statements are executed at each input line where the compound pattern is true.
    * pattern1, pattern2 { statements }: A range pattern matches each input line from a line matched by pattern1 to the next line matched by pattern2, inclusive; the statements are executed at each matching line.
* BEGIN and END
    1. They provide a way to gain control for initialization and wrap-up.
    2. If there is more than one BEGIN or END, they are executed in the order they appear.
    3. It is not mandatory, but preferred, to put BEGIN first and END last.
```bash=
# FS = " " (the default) splits on blanks and tabs, so "North America" becomes
# two fields and $4 is only "North"; with a tab-separated file, FS = "\t" keeps it whole
awk '
BEGIN { FS = " "
        printf("%10s %6s %5s %s\n\n", "Country", "Area", "Pop", "Continent")
}
{ printf("%10s %6d %5d %s\n", $1, $2, $3, $4)
  area = area + $2
  pop = pop + $3
}
END { printf("\n%10s %6d %5d\n", "TOTAL", area, pop) }
' test.txt
# Output:
   Country   Area   Pop Continent

      USSR   8649   275 Asia
    Canada   3852    25 North
     China   3705  1032 Asia
       USA   3615   237 North
    Brazil   3286   134 South
     India   1267   746 Asia
    Mexico    762    78 North
    France    211    55 Europe
     Japan    144   120 Asia
   Germany     96    61 Europe
   England     94    56 Europe

     TOTAL  25681  2819
```
* Expressions as patterns
    1. The value of an expression is automatically converted from string to number or vice versa, depending on what the operator requires.
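
The automatic conversion described above can be sketched with a comparison that is done once numerically and once as strings. This is only an illustration: the file name `countries`, the five-row subset of the chapter's data, and the shell variable names are assumptions of this sketch.

```bash
# A five-row subset of the chapter's input (file name is an assumption of this sketch)
cat > countries <<'EOF'
USSR 8649 275 Asia
China 3705 1032 Asia
India 1267 746 Asia
France 211 55 Europe
Japan 144 120 Asia
EOF

# Fields that look like numbers are compared numerically:
# no area ($2) is smaller than its population ($3)
numeric=$(awk '$2 < $3 { n++ } END { print n + 0 }' countries)
echo "$numeric"
# prints: 0

# Concatenating "" forces both operands to strings, so the comparison
# becomes character by character ("1267" sorts before "746")
as_strings=$(awk '($2 "") < ($3 "") { print $1 }' countries)
echo "$as_strings"
# prints:
# India
# France
```

Going the other way, adding zero (`n + 0`) forces a numeric value, which is why the first program prints `0` rather than an empty line when nothing matches.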

| Operator | Meaning |
| -------- | -------- |
| <, <=, ==, !=, >=, > | the usual comparisons: less than, less than or equal to, equal to, not equal to, greater than or equal to, greater than |
| ~ | matched by |
| !~ | not matched by |

# Chapter 3: Data Processing
# Chapter 4: Reports and Databases
# Chapter 5: Processing Words
# Chapter 6: Little Languages
# Chapter 7: Experiments with Algorithms
# Chapter 8: Epilog
# Appendix A: AWK Summary