
# AWK and SED tips

###### tags: `linux` `awk` `sed`

```
# Sum the values in the first column of every line in a file
awk '{ sum += $1 } END { print sum }' <file name>

# Print all lines from a specific line number onward (e.g. from line 1000, including line 1000)
cat <file name> | awk '{if (NR>=1000) print}'

# Get odd or even lines from a text file
- get odd lines
cat <file name> | sed -n 'p;n;d'
cat <file name> | sed -n 'p;n'
gawk '{if(NR%2!=0)print}' <file name>
cat <file name> | awk 'NR%2'
- get even lines
cat <file name> | sed -n 'n;p;d'

# Use multiple patterns in SED
! use -e to combine multiple patterns
! or separate them with ; inside one quoted script
sed -e 's/<Find Pattern>/<Replace Pattern>/g' -e 's/<Find Pattern>/<Replace Pattern>/g'
sed 's/<Find Pattern>/<Replace Pattern>/g;s/<Find Pattern>/<Replace Pattern>/g'
- Example:
ubuntu@ip-172-31-29-119:~$ _date=$(date +'%D_%T')
ubuntu@ip-172-31-29-119:~$ echo $_date
05/28/16_07:58:28
ubuntu@ip-172-31-29-119:~$ echo $_date | sed -e 's/\//_/g' -e 's/:/_/g'
05_28_16_07_58_28
ubuntu@ip-172-31-29-119:~$ echo $_date | sed 's/\//_/g; s/:/_/g'
05_28_16_07_58_28

# Basic AWK usage
ubuntu@ip-172-31-29-119:/tmp$ cat file2
string1 string2 string3 string4 string5 string6 string7 string8
string9 string10 string11 string12 string13 string14 string15 string16
string17 string18 string19 string20 string21 string22 string23 string24
ubuntu@ip-172-31-29-119:/tmp$ cat file
stringA:stringB:stringC:stringD
stringE:stringF:stringG:stringH
stringI:stringJ:stringK:stringL

[ Print Columns by Number using AWK ]
- Print all columns.
ubuntu@ip-172-31-29-119:/tmp$ awk '{print $0}' file2
string1 string2 string3 string4 string5 string6 string7 string8
string9 string10 string11 string12 string13 string14 string15 string16
string17 string18 string19 string20 string21 string22 string23 string24
- Print the 3rd column.
ubuntu@ip-172-31-29-119:/tmp$ awk '{print $3}' file2
string3
string11
string19
- Print the 1st and the 3rd columns.
ubuntu@ip-172-31-29-119:/tmp$ awk '{print $1 $3}' file2
string1string3
string9string11
string17string19
ubuntu@ip-172-31-29-119:/tmp$ awk '{print $1,$3}' file2
string1 string3
string9 string11
string17 string19

[ Change Field Separator in AWK ]
- Use ":" (colon) as the separator and print the 3rd and 4th columns.
ubuntu@ip-172-31-29-119:/tmp$ awk -F ":" '{print $3,$4}' file
stringC stringD
stringG stringH
stringK stringL

[ Exclude Specific Columns using AWK ]
- Print all columns except the 3rd one (the emptied field keeps its separator, hence the double space).
ubuntu@ip-172-31-29-119:/tmp$ awk '{$3=""; print $0}' file2
string1 string2  string4 string5 string6 string7 string8
string9 string10  string12 string13 string14 string15 string16
string17 string18  string20 string21 string22 string23 string24
- Print all columns except the 1st and the 2nd (again leaving the separators of the emptied fields).
ubuntu@ip-172-31-29-119:/tmp$ awk '{$1=$2=""; print $0}' file2
  string3 string4 string5 string6 string7 string8
  string11 string12 string13 string14 string15 string16
  string19 string20 string21 string22 string23 string24

[ Print or Exclude a Range of Columns using AWK ]
- Print a range of columns, from the 2nd through the 4th.
ubuntu@ip-172-31-29-119:/tmp$ awk -v f=2 -v t=4 '{for(i=f;i<=t;i++) printf("%s%s",$i,(i==t)?"\n":OFS)}' file2
string2 string3 string4
string10 string11 string12
string18 string19 string20
- Exclude the columns from the 2nd through the 4th and print the rest.
ubuntu@ip-172-31-29-119:/tmp$ awk -v f=2 -v t=4 '{for(i=1;i<=NF;i++)if(i>=f&&i<=t)continue;else printf("%s%s",$i,(i!=NF)?OFS:ORS)}' file2
string1 string5 string6 string7 string8
string9 string13 string14 string15 string16
string17 string21 string22 string23 string24

# Extract a string between two specific words
ubuntu@ip-172-31-29-119:~$ email="<some text> from=someuser@somedomain.com, <some text>"
ubuntu@ip-172-31-29-119:~$ echo $email | awk -v FS="(=|,)" '{print $2}'
someuser@somedomain.com
ubuntu@ip-172-31-29-119:~$ string="other strings1...start string_i_want end...other strings2..."
ubuntu@ip-172-31-29-119:~$ echo $string | awk -v FS="(start|end)" '{print $2}'
 string_i_want

[ Use grep with Perl-compatible regexps (GNU grep -P) ]
ubuntu@ip-172-31-29-119:~$ echo $string | grep -oP 'start\K.*(?=end)'
 string_i_want
ubuntu@ip-172-31-29-119:~$ echo $string | grep -oP 'start\K(?:(?!end).)*'
 string_i_want
ubuntu@ip-172-31-29-119:~$ echo $string | grep -oP 'start\s*\K.*(?=\s+end)'
string_i_want
ubuntu@ip-172-31-29-119:~$ echo $string | grep -oP 'start\s*\K(?:(?!\s+end).)*'
string_i_want
```

Source: https://www.shortcutfoo.com/app/dojos/awk/cheatsheet

Basics I

```
$1                              Reference first column
awk '/pattern/ {action}' file   Execute action for lines matching 'pattern' in file 'file'
;                               Separates two actions
print                           Print current record line
$0                              Reference current record line
```

Variables I

```
$2    Reference second column
FS    Field separator of input file (default whitespace)
NF    Number of fields in current record
NR    Number of the current record (line number, counted across all input files)
```

Basics II

```
^          Match beginning of field (regex anchor)
~          Match operator
!~         Do not match operator
-F         Command line option to specify input field delimiter
BEGIN      Denotes block executed once at start, before any input is read
END        Denotes block executed once at end, after all input is read
str1 str2  Concatenate str1 and str2
```

One-Line Exercises I

```
awk '{print $1}' file        Print first field for each record in file
awk '/regex/' file           Print only lines that match regex in file
awk '!/regex/' file          Print only lines that do not match regex in file
awk '$2 == "foo"' file       Print any line where field 2 is equal to "foo" in file
awk '$2 != "foo"' file       Print lines where field 2 is NOT equal to "foo" in file
awk '$1 ~ /regex/' file      Print line if field 1 matches regex in file
awk '$1 !~ /regex/' file     Print line if field 1 does NOT match regex in file
```

Variables II

```
FILENAME  Reference current input file
FNR       Number of the current record relative to the current input file
OFS       Field separator of the output data (default a single space)
ORS       Record separator of the output data (default newline)
RS        Record separator of input file (default newline)
```

Variables III

```
CONVFMT   Format used when converting numbers to strings (default %.6g)
SUBSEP    Separates multiple array subscripts (default "\034")
OFMT      Output format for numbers (default %.6g)
ARGC      Argument count, assignable
ARGV      Argument array, assignable
ENVIRON   Array of environment variables
```

Functions I

```
index(s,t)            Position in string s where string t occurs, 0 if not found
length(s)             Length of string s (or $0 if no arg)
rand()                Random number n with 0 <= n < 1
substr(s,index,len)   Return len-char substring of s that begins at index (counted from 1); without len, the rest of s
srand(x)              Set seed for rand() (time of day if x is omitted) and return previous seed
int(x)                Truncate x to integer value
```

Functions II

```
split(s,a,fs)   Split string s into array a by fs, returning the number of elements in a
match(s,r)      Position in string s where regex r occurs, or 0 if not found; also sets RSTART and RLENGTH
sub(r,t,s)      Substitute t for first occurrence of regex r in string s (or $0 if s not given)
gsub(r,t,s)     Substitute t for all occurrences of regex r in string s (or $0 if s not given)
```

Functions III

```
system(cmd)   Execute cmd and return its exit status
tolower(s)    String s to lowercase
toupper(s)    String s to uppercase
getline       Set $0 to next input record from current input file (also updates NF, NR, FNR)
```

One-Line Exercises II

```
awk 'NR!=1{print $1}' file                    Print first field for each record in file, excluding the first record
awk 'END{print NR}' file                      Count lines in file
awk '/foo/{n++}; END {print n+0}' file        Print total number of lines that contain foo
awk '{total=total+NF};END{print total}' file  Print total number of fields in all lines
awk '/regex/{getline;print}' file             Print the line immediately after a regex match, but not the matching line itself
awk 'length > 32' file                        Print lines with more than 32 characters in file
awk 'NR==12' file                             Print line number 12 of file
```
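The first two tips in these notes (summing a column, printing from a given line onward) can be wrapped as small shell functions. This is only a sketch: `sum_first_column` and `lines_from` are hypothetical names, and the `+ 0` in the END block is a small addition so that empty input prints 0 instead of a blank line.

```shell
#!/usr/bin/env bash
# Hypothetical helpers wrapping the first two one-liners in these notes.

# Sum the first whitespace-separated field of every line.
# A field that does not start with a number counts as 0;
# "+ 0" makes empty input print 0 rather than an empty line.
sum_first_column() {
  awk '{ sum += $1 } END { print sum + 0 }' "$@"
}

# Print every line whose record number is >= the first argument
# (the starting line itself is included). Remaining args are files;
# with none, awk reads standard input.
lines_from() {
  local start=$1
  shift
  awk -v n="$start" 'NR >= n' "$@"
}

printf '10\n20\n30\n40\n' | sum_first_column   # prints 100
printf 'a\nb\nc\nd\n' | lines_from 3           # prints c, then d
```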
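The odd/even-lines tip lists several equivalent commands. A quick way to check that they agree is to run an awk and a sed variant side by side on numbered input. The function names are hypothetical; the sed versions rely on `n` ending the script quietly at end of input when `-n` is in effect.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: pick odd or even lines with awk or sed.

odd_lines_awk()  { awk 'NR % 2' "$@"; }       # NR % 2 is 1 (true) on odd lines
even_lines_awk() { awk 'NR % 2 == 0' "$@"; }  # true on even lines
odd_lines_sed()  { sed -n 'p;n' "$@"; }       # print this line, read (and drop) the next
even_lines_sed() { sed -n 'n;p' "$@"; }       # read the next line, then print it

printf 'l1\nl2\nl3\nl4\nl5\n' | odd_lines_awk    # l1, l3, l5 (one per line)
printf 'l1\nl2\nl3\nl4\nl5\n' | even_lines_sed   # l2, l4 (one per line)
```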
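For the "multiple patterns in SED" tip, here are three equivalent ways to turn `/` and `:` into `_`. The timestamp is a fixed sample of what `date +'%D_%T'` prints, so the output is reproducible. The third variant swaps in `|` as the `s` delimiter (sed accepts any character there) so the slash needs no escaping, and uses one bracket expression instead of two substitutions. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: three equivalent ways to turn "/" and ":" into "_".

to_underscores_e()     { sed -e 's/\//_/g' -e 's/:/_/g'; }  # two -e expressions
to_underscores_semi()  { sed 's/\//_/g; s/:/_/g'; }          # one script, ";"-separated
to_underscores_class() { sed 's|[/:]|_|g'; }                 # "|" delimiter, one bracket expression

stamp='05/28/16_07:58:28'   # fixed sample of date +'%D_%T' output
echo "$stamp" | to_underscores_e     # 05_28_16_07_58_28
```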
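In the exclude-columns examples, assigning `""` to a field empties it but does not remove it: awk rebuilds `$0` by joining all NF fields with OFS, so `$3=""` leaves a doubled space and `$1=$2=""` leaves leading spaces. If that matters, one option (a sketch, not the only approach) is to rebuild the line without the unwanted field. `drop_column` is a hypothetical name, and it assumes the default whitespace field splitting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove one column entirely instead of blanking it.

# Usage: drop_column N [file...]
# Splits on default whitespace and joins the kept fields with OFS.
drop_column() {
  local col=$1
  shift
  awk -v c="$col" '{
    out = ""; sep = ""
    for (i = 1; i <= NF; i++) {
      if (i == c) continue   # skip the unwanted field
      out = out sep $i       # append the field, with OFS between kept fields
      sep = OFS
    }
    print out
  }' "$@"
}

printf 'a b c d\n' | awk '{ $3 = ""; print }'   # "a b  d" (blanked field keeps its separator)
printf 'a b c d\n' | drop_column 3              # "a b d"
```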
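The "extract a string between two specific words" examples split the line on a regex (`FS="(start|end)"`) or use PCRE lookarounds with GNU `grep -P`. A portable alternative, sketched below, uses only `index()` and `substr()` from the Functions I table, which treat the two markers as literal text rather than regexes, plus `sub()` to trim the surrounding blanks. The function name `between` is hypothetical; note that `awk -v` interprets backslash escapes in the values it assigns.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the text between the first occurrence of $1
# and the next occurrence of $2 on each line (lines lacking either are skipped).
between() {
  awk -v a="$1" -v b="$2" '{
    s = index($0, a)                    # literal search, no regex
    if (s == 0) next
    rest = substr($0, s + length(a))    # everything after the start marker
    e = index(rest, b)
    if (e == 0) next
    out = substr(rest, 1, e - 1)        # text before the end marker
    sub(/^[ \t]+/, "", out)             # trim leading blanks
    sub(/[ \t]+$/, "", out)             # trim trailing blanks
    print out
  }'
}

echo "other strings1...start string_i_want end...other strings2..." | between start end
echo "<some text> from=someuser@somedomain.com, <some text>" | between "from=" ","
```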
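The Functions tables are easier to remember with a tiny `BEGIN`-only program, which runs without any input file. The sample strings and the function name `awk_functions_demo` are arbitrary; the comments show what the standard functions return for these inputs.

```shell
#!/usr/bin/env bash
# Hypothetical demo of several built-in awk string functions.
awk_functions_demo() {
  awk 'BEGIN {
    n = split("a:b:c", parts, ":")    # n = 3, parts[1] = "a", parts[3] = "c"
    s = "hello world"
    k = gsub(/o/, "0", s)             # k = 2, s = "hell0 w0rld"
    head = toupper(substr(s, 1, 5))   # "HELL0"
    pos = index(s, "w0")              # 7 (1-based)
    match(s, /w0r/)                   # sets RSTART = 7, RLENGTH = 3
    print n, parts[3], k, head, length(s), pos, RSTART, RLENGTH
  }'
}

awk_functions_demo   # 3 c 2 HELL0 11 7 7 3
```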
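Two of the One-Line Exercises II entries are handy to wrap when the pattern comes from a shell variable: passing it with `-v` and testing `$0 ~ re` uses the string as a dynamic regex. For the line-after-match case this sketch uses a flag instead of `getline`, because when the match is on the last line `getline` returns 0 and leaves `$0` unchanged, so `'/regex/{getline;print}'` would print the matching line itself. One behavioral difference: the flag version treats every matching line as a trigger, even one it just printed. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers built from two "One-Line Exercises II" entries.

# Count lines matching the regex in $1; prints 0 when nothing matches.
count_matching() {
  local re=$1
  shift
  awk -v re="$re" '$0 ~ re { n++ } END { print n + 0 }' "$@"
}

# Print the line that follows each line matching the regex in $1.
# A flag replaces getline, so a match on the last line prints nothing.
after_match() {
  local re=$1
  shift
  awk -v re="$re" 'found { print; found = 0 } $0 ~ re { found = 1 }' "$@"
}

printf 'foo\nbar\nfoo\nbaz\n' | count_matching foo   # 2
printf 'foo\nbar\nfoo\n' | after_match foo           # bar
```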
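The difference between `NR` (Variables I) and `FNR` (Variables II) only shows up with more than one input file. This sketch creates two throwaway files with `mktemp -d`; the file names, contents, and the function name `show_nr_fnr` are arbitrary.

```shell
#!/usr/bin/env bash
# Show NR (running count across all files) next to FNR (count within each file).
show_nr_fnr() {
  awk '{ print NR, FNR, $0 }' "$@"
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT   # clean up the temporary directory on exit
printf 'a\nb\n' > "$tmp/one"
printf 'c\n' > "$tmp/two"

show_nr_fnr "$tmp/one" "$tmp/two"
# 1 1 a
# 2 2 b
# 3 1 c
```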