# UNIX tool: sed & awk note
###### tags: `NSYSU`

[TOC]

## sed
### one-line
#### double space a file
```shell=
sed G
```
#### double space a file which already has some blank lines in it:
```shell=
sed '/^$/d;G'
```
#### triple space a file:
```shell=
sed 'G;G'
```
#### undo double-spacing (assumes all even-numbered lines are always blank):
```shell=
sed 'n;d'
```
#### insert a blank line above every line which matches "regex":
```shell=
sed '/regex/{x;p;x;}'
```
#### insert a blank line below every line which matches "regex":
```shell=
sed '/regex/G'
```
#### insert a blank line above and below every line which matches "regex":
```shell=
sed '/regex/{x;p;x;G;}'
```
#### number each line of a file (like grep -n "^"):
```shell=
sed = filename | sed 'N;s/\n/:/'
```
#### number each line of a file (like cat -n):
```shell=
cat filename
```
![](https://i.imgur.com/OpFL6fW.png)
```shell=
sed = filename | \
sed 'N; s/^/     /;s/ *\(.\{6,\}\)\n/\1\t/'
```
![](https://i.imgur.com/bUhhOiE.png)
#### number each line of file, but only print numbers if line is not blank:
```shell=
sed '/./=' filename | sed '/./N;s/\n/ /'
```
#### count lines (like "wc -l"):
```shell=
sed -n '$='
```
#### delete leading whitespace (spaces, tabs) from front of each line:
```shell=
sed 's/^[ \t]*//'
```
#### delete trailing whitespace (spaces, tabs) from end of each line:
```shell=
sed 's/[ \t]*$//'
```
#### delete BOTH leading & trailing whitespace from each line:
```shell=
sed 's/^[ \t]*//;s/[ \t]*$//'
```
#### insert 5-space margin on left of each line:
```shell=
sed 's/^/     /'
```
#### change scarlet or ruby or puce to "red":
```shell=
sed 's/scarlet/red/g;s/ruby/red/g;s/puce/red/g'
```
#### align all text flush right on a 79-column width (set at 78 plus 1 space):
```shell=
sed ':a;s/^.\{1,78\}$/ &/;ta'
```
#### substitute (find and replace) "foo" with "bar" on each line (replace only the 1st instance):
```shell=
sed 's/foo/bar/'
```
#### replace only the 4th instance:
```shell=
sed 's/foo/bar/4'
```
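
As a quick check of the occurrence flag, the sketch below runs the 1st-instance and 4th-instance substitutions on a made-up input line. The sample text and the variable names `line`, `first`, and `fourth` are assumptions for illustration only.

```shell
# Hypothetical sample line containing five occurrences of "foo"
line='foo foo foo foo foo'

# No flag: only the first "foo" on the line is replaced
first=$(printf '%s\n' "$line" | sed 's/foo/bar/')

# Numeric flag 4: only the fourth "foo" on the line is replaced
fourth=$(printf '%s\n' "$line" | sed 's/foo/bar/4')

printf '%s\n' "$first"    # bar foo foo foo foo
printf '%s\n' "$fourth"   # foo foo foo bar foo
```

The numeric flag counts matches within each line separately, so a line with fewer than four matches is left unchanged.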
#### replace ALL instances:
```shell=
sed 's/foo/bar/g'
```
#### replace only the last case:
```shell=
sed 's/\(.*\)foo/\1bar/'
```
#### replace the next-to-last case:
```shell=
sed 's/\(.*\)foo\(.*foo\)/\1bar\2/'
```
#### substitute "foo" with "bar" ONLY for lines which contain "baz":
```shell=
sed '/baz/s/foo/bar/g'
```
#### substitute "foo" with "bar" EXCEPT for lines which contain "baz":
```shell=
sed '/baz/!s/foo/bar/g'
```
#### center all text in the middle of 79-columns, with spaces on the right to fill the columns and with leading spaces being significant:
```shell=
sed ':a;s/^.\{1,77\}$/ & /;ta'
```
#### center all text in the middle of 79-columns, with no trailing spaces and ignoring leading spaces:
```shell=
sed ':a;s/^.\{1,77\}$/ &/;ta;s/\( *\)\1/\1/'
```
#### reverse order of lines (like "tac"):
```shell=
sed '1!G;h;$!d'     # method 1
sed -n '1!G;h;$p'   # method 2
sed -n '2,$G;h;$p'  # method 3
```
#### reverse the characters on each line (like "rev"):
```shell=
sed '/\n/!G;s/\(.\)\(.*\n\)/&\2\1/;//D;s/.//'
```
#### join pairs of lines side-by-side:
```shell=
sed '$!N;s/\n/ /'
```
#### if a line ends with a backslash, append the next line to it:
```shell=
sed ':a;/\\$/N;s/\\\n//;ta'
```
#### if a line begins with "=" then append it to the previous line & replace the "=" with a space:
```shell=
sed ':a;$!N;s/\n=/ /;ta;P;D'
```
#### add commas to numeric strings, changing "1234567" to "1,234,567":
```shell=
sed ':a;s/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'
sed ':a;s/\B[0-9]\{3\}\>/,&/;ta'  # GNU sed
```
#### add commas to numbers with decimal points and minus signs (GNU sed only):
```shell=
sed -r ':a;s/(^|[^0-9.])([0-9]+)([0-9]{3})/\1\2,\3/g;ta'
```
#### add a blank line after every 5 lines (after lines 5, 10, 15, 20, etc.):
```shell=
sed 'n;n;n;n;G;'
sed '0~5G'  # GNU sed only
```

## awk
### built-in variable

---

VAR | Description
----|--------------
$0  | The entire line
$n  | Field n
ARGC | EX: `ARGC == 2`
ARGV | EX: `ARGV[0] == "awk"`, `ARGV[1] == "filename"`; Note: awk arguments are filenames.
NF  | Number of fields in current line (or record)
NR  | Number of lines (or records) read so far
FS  | Input field separator; Default: `BEGIN{FS=" "}`, which splits on runs of spaces and tabs (roughly `[ \t]+`); Note: a new value only applies to input lines/records read afterwards.
RS  | Input Record Separator; Default: `BEGIN{RS="\n"}`
OFS | Output field separator; Default: `BEGIN{OFS=" "}`
ORS | Output Record Separator; Default: `BEGIN{ORS="\n"}`

### Flags

---

flag | Description
--------|-------------
`-f <filename>` | Reads the awk program from the file instead; Can also use `#!/usr/bin/awk -f` as the first line of the script.
`-F "x"` | Uses the symbol(s) in "x" as the field separator

### Operators

---

op | Description
----|----
`=` | assignment operator; sets a variable equal to a value or string
`==` | equality operator; TRUE if both sides are equal
`!=` | inverse equality operator
`~`, `!~` | extended regular expression match / non-match
`&&` | logical AND
`\|\|` | logical OR
`!` | logical NOT
`<`, `>`, `<=`, `>=` | relational operators
`+`, `-`, `/`, `*`, `%`, `^` | Math operators
space | String concatenation (implicit or explicit)

### Pattern
1. use extended regular expression
2. syntax example
    ```shell=
    /regex/{action} #if match pattern, do...
    ```
3. special pattern
    + BEGIN
        + def: matches before the first input line is read
    + END
        + def: matches after the last input line has been read
    + example
        ```shell=
        BEGIN{FS=",";print "NAME RATE HOURS"; print "" }
        { print }
        END{ print "total number of employees is", NR }
        ```
    :::info
    NOTE: Although NR retains its value after the last input line has been read, $0 does not (on some systems).
    Solution: save the line yourself.
    ```shell=
    { last = $0 }
    END{ print NR ":", last }
    ```
    :::

---

### Control flow
#### if-else example
```shell=
$2 > 6 { n = n + 1; pay = pay + $2 * $3 }
END {
    if (n > 0)
        print n, "employees, total pay is", pay, "average pay is", pay/n
    else
        print "no employees are paid more than $6/hour"
}
```

---

#### while loop example
```shell=
{
    i = 1
    while (i <= $3) {
        printf("\t%.2f\n", $1 * (1 + $2) ^ i)
        i = i + 1
    }
}
```

---

#### for loop example
```shell=
{
    for (i = 1; i <= $3; i = i + 1)
        printf("\t%.2f\n", $1 * (1 + $2) ^ i)
}
```

---

### built-in function
#### string
1. length()
    + Def: returns the number of elements in an array OR characters in a string.
    + example: equivalent of wc
        ```shell=
        #!/usr/bin/awk -f
        {
            nc = nc + length($0) + 1   # +1 for the newline
            nw = nw + NF
        }
        END { print NR, "lines,", nw, "words,", nc, "characters" }
        ```
2. index()
    + Def:
        + similar to the strstr() function of C
        + Returns a positive value when the substring is found, 0 otherwise.
        + The number returned is the position (counting from 1) where the substring starts.
    :::warning
    If the substring consists of 2 or more characters, all of them must be found, sequentially, in the same order, for a match.
    :::
    + example: Find a comma
        ```shell=
        {
            sentence="This is a short, useless sentence.";
            if (index(sentence, ",") > 0) {
                printf("Found comma at %d\n", index(sentence,","));
            }
        }
        ```
3. sub()
    + Def: perform string substitution (first match only)
    + syntax
        ```shell=
        sub(/old/, "new", stringvariable)
        ```
    :::info
    + stringvariable: Optional, default is $0
    + 1 is returned if a substitution occurs, otherwise 0
    :::
    + example
        ```shell=
        echo 1 + 2 + 3 | awk '{sub("[1 ]+","@")}1'
        ```
4. gsub()
    + Def: global substitution
    + Like sed global substitution
        ```shell=
        sed 's/regex/new/g'
        ```
5. substr()
    + Def: extracts part of a string.
    :::info
    + It takes 2 or 3 arguments.
    + The 1st is the string.
    + The 2nd is the position of the start of the substring to extract.
    + The 3rd is the length of the substring to extract.
    + If this 3rd argument is missing, the rest of the string is used.
    :::
    + example: process some mail addresses
        ```shell=
        #!/bin/awk -f
        {
            if ((x=index($1,"@")) > 0) {
                username = substr($1,1,x-1);
                hostname = substr($1,x+1,length($1)-x);
                printf("username = %s, hostname = %s\n", username, hostname);
            }
        }
        ```
6. split()
    + Def: divides up a string
    :::info
    + Takes 3 arguments: the string, an array to fill, and a separator.
    + The 3rd argument is a regular expression.
    + split($0,A,FS) will create an array such that:
    ```shell=
    $1==A[1], $2==A[2], … $NF==A[NF].
    ```
    :::
    + example: break up a sentence into words
        ```shell=
        #!/usr/bin/awk -f
        BEGIN {
            string="This is a string, is it not?";
            search=" ";
            n=split(string,array,search);
            for (i=1; i<=n; i++) {
                printf("Word[%d]=%s\n",i,array[i]);
            }
            exit;
        }
        ```
#### Output
1. printf()
    + syntax
        ```shell=
        printf( format, val1, val2, val3, … )
        ```
    :::warning
    AWK makes no strict distinction between a number and a string: a value is converted to whichever the context (here, the format specifier) requires.
    :::
2. print
    + Def
    :::info
    + Puts a space (the value of OFS) wherever there is a comma
    + Appends a newline (the value of ORS) at the end
    :::

### Associative Arrays
+ Def
    + like a Python dictionary
    + uses a hash table, which makes some tasks more efficient.
+ example
    + awk
        ```shell=
        #!/usr/bin/awk -f
        BEGIN{
            dict["danke"] = "thanks"
            dict["schoen"] = "beautiful"
        }
        {for(i=1;i<=NF;i++)if($i in dict) $i=dict[$i]} 1
        ```
    :::info
    Reserved word: `in`
    > `$i in dict` is true if `$i` is a key of dict, false otherwise
    :::
    + python
        ```python=
        dict = {"danke" : "thanks", "schoen" : "beautiful"}
        ```
:::warning
+ NOTE
    + Keys are case-sensitive and must match in full.
    + A substring of a key does not match.
    + The code below never matches the key "danke schoen", because that key spans two fields and each field is looked up on its own.
    ```shell=
    BEGIN{
        dict["danke"] = "thanks"
        dict["schoen"] = "beautiful"
        dict["danke schoen"] = "thank you very much."
    }
    {for(i=1;i<=NF;i++)if($i in dict) $i=dict[$i]} 1
    ```
+ solution
    ```shell=
    #!/usr/bin/awk -f
    BEGIN{
        dict["danke"] = "thanks"
        dict["schoen"] = "beautiful"
        dict["danke schoen"] = "thank you very much."
    }
    {
        for(i = 1; i <= NF; i++) {
            # try the two-field key first, then fall back to the single field
            if(($i" "$(i+1)) in dict) {
                $i = dict[$i" "$(i+1)]
                $(i+1) = ""
            }
            if($i in dict) $i = dict[$i]
        }
    } 1
    ```
:::
+ multi-dimensional array
    + example
        ```shell=
        { L[NF" "NR]=$0; }
        { L[NF "," var]=$0; }
        { L[$1" "$3]=$0; }
        ```
+ Split() and Associative Array
    + How to create an array indexed by numbers?
        + month[1] is 'Jan', month[2] is 'Feb', month[3] is 'Mar', …
        ```shell=
        split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", month, " ")
        ```
    + How to create an array named "mdigit", indexed by these "month" strings?
        + mdigit["Jan"] is 1, mdigit["Feb"] is 2
        ```shell=
        for (i=1; i<=12; i++) mdigit[month[i]] = i
        ```

### AWK one-liners
#### file spacing
+ Double space a file
    ```shell=
    awk '1;{print ""}'
    #or
    awk 'BEGIN{ORS="\n\n"};1'
    ```
+ Double space a file which already has blank lines in it?
    + Output file should contain no more than one blank line between lines of text.
    ```shell=
    awk 'NF{print $0 "\n"}'
    ```
    :::warning
    NOTE: On Unix systems, DOS lines which have only CRLF (\r\n) are often treated as non-blank, and thus 'NF' alone will return TRUE.
    :::
+ Triple space a file?
    ```shell=
    awk '1;{print "\n"}'
    ```
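
To compare the spacing one-liners above, the sketch below feeds each one the same three-line input and counts the output lines. The input and the variable names `sample`, `doubled`, `squeezed`, and `tripled` are made up for illustration.

```shell
# Hypothetical three-line input whose middle line is blank: "a", "", "b"
sample=$(printf 'a\n\nb')

# Plain double spacing: every line, blank or not, gets an extra empty line
doubled=$(printf '%s\n' "$sample" | awk '1;{print ""}' | awk 'END { print NR }')

# NF variant: blank input lines are skipped before spacing is added
squeezed=$(printf '%s\n' "$sample" | awk 'NF{print $0 "\n"}' | awk 'END { print NR }')

# Triple spacing: print "\n" emits the newline plus ORS, i.e. two empty lines
tripled=$(printf '%s\n' "$sample" | awk '1;{print "\n"}' | awk 'END { print NR }')

echo "$doubled $squeezed $tripled"   # 6 4 9
```

Counting with `awk 'END { print NR }'` rather than `wc -l` avoids the leading padding some `wc` implementations print.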