[Toc]

# UNIX III

## the sed command

`sed` stands for "Stream EDitor". It is a "non-interactive" text editor. The reason to use sed is to eliminate the tedium of routine editing tasks: find, replace, delete, append, insert, etc.

![Screenshot 2024-05-02 4.14.10 PM](https://hackmd.io/_uploads/B1gPsaeMC.png)

you specify:
```sed [-option][command][/pattern][/replacement][/flag] file```
1. a __type of action__ to perform when matched (such as `s`, which means substitute)
2. a __pattern__ that you are looking for (the patterns are regex)
3. the exact __details of the action__
4. some __flags__ (such as `/g`, which means global; without `/g`, only the ++first++ match on ++each++ line is replaced)

Whatever character follows the `s` becomes the delimiter, so we can use other symbols as well, such as `sed 's-/-\\-g'`, which uses `-` as the delimiter.
```tcsh
% pwd
/home/yen/unix
% pwd | sed 's/.*\///'
unix
% pwd | sed 's-.*/--'
unix
```

---

useful ```sed``` command-line [-option] and flag symbols
* ```-n``` no auto-printing of the pattern space
* ```-e``` execute the command sequence specified in the argument following this flag (this is the default when the script is given as the first argument)
* ```-f``` obtain a command sequence from ++a file++ with the name specified in the argument following this flag
* ```&```: in the replacement, `&` reinserts all of the matched pattern

---

three common substitute flags:
```SUBSTITUTE s/pattern/replacement/[flags]```
* ```/g```
Replace all instances of /pattern/, not just the first on the line.
![image](https://hackmd.io/_uploads/Sk0nqqee0.png)
* ```/n``` (a number)
Replace only the n^th^ instance of /pattern/ on the line.
![image](https://hackmd.io/_uploads/HJWwo5xg0.png)
* ```/p```
Print the line if a successful substitution is done. If a `/g` flag allows several substitutions to be done, only the result after the final substitution is printed.
![Screenshot 2024-04-08 6.18.06 AM](https://hackmd.io/_uploads/Sytso9xgR.png)
_*note: the ```/p``` flag is usually combined with the output suppression option ```-n```, so that only the lines that matched are printed_

---

_*Note that sed always finds the longest match, starting at the leftmost position where a match is possible._
```
% cat -n file
     1  Joe paid $20 to 100 people
     2  I paid ten dollars to five people
% cat file | sed -n 's/ten/ten/p'
I paid ten dollars to five people
% cat file | sed -n 's/ten/&/p'
I paid ten dollars to five people
% cat file | sed 's/[0-9]\{1,\}/(number=&)/'
Joe paid $(number=20) to 100 people
I paid ten dollars to five people
% echo '5Hello! %$%' | sed \
? 's/[A-Za-z][A-Za-z]*/"I found a word"/'
5"I found a word"! %$%
%
% cat s
the quick brown fox jumped over the lazy dog
% cat s | sed 's,\(the \([a-z]* \)*the\),[\1],'
[the quick brown fox jumped over the] lazy dog
% cat s | sed 's,\(the \([a-z]* \)*the\),[\2],'
[over ] lazy dog
% cat s | sed 's,\(the \(\([a-z]* \)*\)the\),[\2],'
[quick brown fox jumped over ] lazy dog
```

Multiple sed commands can be run by separating them with a __semicolon__ or by giving each its own `-e` flag. There are different ways to hand the commands to `sed`:
1. passing them as a string argument to sed
2. giving sed a script filename, such as `sed -f way < file`
3. running them directly as a sed script
```
#!/usr/bin/sed -f
s/A/a/;s/B/b/
```

## how sed works?
pseudo code of how sed works
```
while(!EOF){
    1. load the pattern space with the next line from STDIN.
       pattern space = a data buffer - the "current text" as it's being edited
       “As it’s being edited” means that your substitutions change the pattern space.
    2. foreach subcommand within this sed command
       - use the pattern space as input to the subcommand
       - do the subcommand, possibly printing to STDOUT
         “printing to STDOUT” means that it’s not going to the pattern space
       - put the answer into the pattern space
    3. write the pattern space to STDOUT (if the -n flag is not used)
}
```
algorithm:
![image](https://hackmd.io/_uploads/BkHKPfWzC.png)

consider:
```
yenubuntu:~/unix> echo "A B C" | sed 's/B/b/'
A b C
yenubuntu:~/unix> echo "A B C" | sed 's/B/b/p'
A b C
A b C
yenubuntu:~/unix> echo "A B C" | sed -n 's/B/b/p'
A b C
yenubuntu:~/unix>
```
The first one auto-printed the pattern space, since there was no `-n` flag to stop it from printing (`sed` is a stream editor, so its working mode is to write the pattern space to STDOUT by default); the second one printed twice, once because of the `/p` and once because there was no `-n` flag; the last one has both `-n` and `/p`, so it printed once, because of the `/p` flag.
```
yenubuntu:~/unix> echo "A B C" | sed 's/B/b/p;s/C/c/'
A b C
A b c
yenubuntu:~/unix> echo "A B C" | sed -n 's/B/b/p;s/C/c/'
A b C
yenubuntu:~/unix>
```
In the last run there is no output from the 2^nd^ subcommand, since we use `-n` to tell sed not to print the pattern space at the end.

## the sed commands
1. without action (the comment): `#`
2. perform an action: a, c, d, D, g, G, h, H, i, l, n, N, p, P, r, s, w, x, y, z, =
3. related to control flow: b, q, t, T, !, :, ;, \n, {, }, \, /, a number, $, ","

All of the commands above are grouped into 7 categories:
> 1. [command separators](#1-command-separators): `;`, `\n`, `{`, `}`
> 2. [direct to STDOUT](#2-direct-to-STDOUT): `a`, `c`, `i`, `p`, `P`, `=`
> 3. [update the pattern space](#3-update-the-pattern-space): `d`, `D`, `n`, `N`, `s`, `y`, `z`
> 4. [use the hold space](#4-use-the-hold-space): `g`, `G`, `h`, `H`, `x`
> 5. [general control flow](#5-general-control-flow): `b`, `q`, `t`, `T`, `:`, (`c`, `d`, `D`)
> 6. [predicated execution](#6-predicated-execution): `!`, `/`, `\`, `a number`, `$`, `,`
> 7. [unusual output](#7-unusual-output): `l`, `r`, `w`

### 1. command separators
`sed` commands can be separated by either a ++semicolon++ or a ++newline++ character, and the command sequence can be extended with further `-e` or `-f` options or be grouped with `{` and `}`. But you should put a `;` before `}`, that is, `;}`. Most people’s sed versions will not require the `;`, but omitting it is non-standard. If commands also follow the `}`, then use `;};`
* `;`: semicolon
* `\n`: __a__ newline character
* `{`
* `}`

### 2. direct to STDOUT
* `p`: print the pattern space to STDOUT
it prints during step 2, before step 3, of the pseudo code of how sed works
![Screenshot 2024-05-02 11.10.12 PM](https://hackmd.io/_uploads/BJmkT7-z0.png)
![image](https://hackmd.io/_uploads/B1JGhHrGC.png)
* `P`: print the pattern space to STDOUT, but only up to the first newline character
* `=`: print the line number to STDOUT
it prints during step 2, before step 3, of the pseudo code of how sed works
* `i`: following the i, the ++rest of the line++ is a string to __insert__ to the STDOUT
![image](https://hackmd.io/_uploads/Sy1b0HSGC.png)
* `a`: following the a, the ++rest of the line++ is a string to __append__ to the STDOUT __after__ the pattern space gets printed (which happens later)
![Screenshot 2024-05-06 2.22.26 AM](https://hackmd.io/_uploads/BksOCSBf0.png)

==EXAMPLE==
programming assignment II, spring '24
```tcsh=
#!/usr/bin/tcsh
echo
echo Echo echoed, '"'You said, "'$*'".'"'
echo
echo | sed -n "iSed's "'"i"'" said, "'"You said, '"'$*'"'."'
echo
echo $*:q | sed "s/.*/Sed's "'"s"'" said, "'"You said, '"'&'"'."/'
echo
echo $*:q | sed "iSed's "'"i" and "a" said, "You said, '"'""\
a'."'"'\
|tr -d "\n"; echo
echo
echo $*:q | sed "x;s/.*/Sed's hold space was used to say, "'"You said, '"'/;G;s/\n//;x;s/.*/'."'"'"/;x;G;s/\n//"
echo
```
![Screenshot 2024-05-06 2.25.54 AM](https://hackmd.io/_uploads/HyWrkUSzA.png)
* `c`: following the c, the ++rest of the line++ is a string to print to the STDOUT.
Immediately afterwards, start a new cycle for the next line of input
![Screenshot 2024-05-06 2.41.06 AM](https://hackmd.io/_uploads/HyXRMLBzR.png)

### 3. update the pattern space
* `s`: substitute pattern with string, as mentioned above
* `z`: zap the pattern space (equivalent to `s/.*//g`)
* `y`: do a `tr`-like list-based substitution

|         | `tr`                       | `sed y`   |
| ------- | -------------------------- | --------- |
| match   | last                       | __first__ |
| padding | with the final replacement | N/A       |
| range   | allowed, such as `a-z`     | NOT allowed |

\
![Screenshot 2024-05-07 9.41.04 PM](https://hackmd.io/_uploads/Hynuk3vzA.png)
![Screenshot 2024-05-07 9.41.31 PM](https://hackmd.io/_uploads/Hkd9y2PMC.png)
_*note: whatever comes after y is the __delimiter__._

==EXAMPLE==
![Screenshot 2024-05-07 10.05.15 PM](https://hackmd.io/_uploads/SkaQrnvMA.png)

comparison:
1. `tr`
    - allows you to indicate a range
    `tr -d 0-9` == `tr -d 0123456789`
    - allows you to use padding in the replacement string
    `tr 0-9 01` == `tr 0-9 0111111111`
    - ignores leftover characters in the replacement string
    `tr 0-9 a-z` == `tr 0-9 a-j`
    - uses the ++last++ match in the replacement string
    `tr banana 123456` == `tr bna 156`
2. `sed y`
    - disallows ranges
    - requires the replacement string to have the __same__ size
    - uses the ++first++ match in the replacement string
    `sed y/banana/123456/` == `sed y/ban/123/`

* `n`: **replace** the pattern space with the next input line, ++after printing the old pattern space++ (unless `-n`)
```
yenubuntu:~/unix> seq 5 | sed -n n
yenubuntu:~/unix> seq 5 | sed n | tr '\n' , ; echo
1,2,3,4,5,
yenubuntu:~/unix> seq 5 | sed -n 'n;p' | tr '\n' , ; echo
2,4,
yenubuntu:~/unix> seq 5 | sed 'n;p' | tr '\n' , ; echo
1,2,2,3,4,4,5,
yenubuntu:~/unix>
```
* `N`: **append** the next input line to the pattern space (with a newline inserted before it)
```
yenubuntu:~/unix> cat sample
Hello world
This is a test
yenubuntu:~/unix> sed N sample
Hello world
This is a test
yenubuntu:~/unix> echo -n || every two lines now are treated as a single line
yenubuntu:~/unix> sed 'N;s/\n/ /g' sample
Hello world This is a test
yenubuntu:~/unix> seq 7 | sed 'N;N;s/\n/,/g'
1,2,3
4,5,6
7
yenubuntu:~/unix>
```
> differences between `n` and `N`:
> 1. Pattern space management: `n` effectively resets the pattern space with the next line, whereas `N` appends the next line to the existing contents of the pattern space.
> 2. Use cases: Use `n` for selective processing or skipping lines; use `N` for handling patterns that span multiple lines.
>
> Whether you use `n` or `N` depends on the specific requirements of the text processing task at hand.

* `d`: delete the pattern space, then immediately start a new cycle for the next line of input
```
yenubuntu:~/unix> echo "A B C" | tr " " "\n" | sed '=;d;ino'
1
2
3
yenubuntu:~/unix> echo "A B C" | tr " " "\n" | sed '=;i yes\
? ;d'
1
yes
2
yes
3
yes
```
_*note: `d` has the side effect that it stops processing the input line.
So the `i` needs to go first._

So far it looks like `d` just stops output, the same as `-n`
```
% echo "A B C" | tr " " "\n" | sed 's/B/b/p; d'
b
% echo "A B C" | tr " " "\n" | sed -n 's/B/b/p'
b
%
```
But its usefulness is with control flow, as we will see later.
* `D`: If there is no newline in the pattern space, perform a “d”. Otherwise, delete the pattern space up to the first newline, and restart with the resultant pattern space, without reading new input.
```
yenubuntu:~/unix> seq 7 | sed 'N;D'
7
yenubuntu:~/unix> seq 3 | sed 'N;N;D'
2
3
yenubuntu:~/unix> seq 5 | sed 'N;N;D'
3
4
5
yenubuntu:~/unix>
yenubuntu:~/unix> echo "lineA\nlineB\nlineC" | sed 'N;s/\n/+/'
lineA+lineB
lineC
yenubuntu:~/unix> echo "lineA\nlineB\nlineC" | sed 'N;D'
lineC
```

### 4. use the hold space
In addition to the pattern space, sed provides a second space, the **hold space**.
Not many commands work with the hold space, and none of them edits it in place: you cannot access its content unless you first bring it into the pattern space.
* `h`: copy the pattern space into the hold space
* `H`: append the pattern space to the hold space
* `g`: get the hold space (i.e. load it into the pattern space)
* `G`: append the hold space to the pattern space
* `x`: ++eXchange++ the pattern space and the hold space

![Screenshot 2024-05-08 1.43.04 PM](https://hackmd.io/_uploads/r14gWq_zC.png)
```
yenubuntu:~/unix> echo world | sed "x;ihello\
? ;g"
hello
world
yenubuntu:~/unix> echo world | sed "x;s/.*/hello/;x"
world
yenubuntu:~/unix> echo world | sed "x;s/.*/hello/;x;x"
hello
yenubuntu:~/unix> echo world | sed "x;s/.*/hello/;x;x;G;s/\n/ /"
hello world
yenubuntu:~/unix>
```
:::info
![Screenshot 2024-05-08 1.46.13 PM](https://hackmd.io/_uploads/H1VhbqOMA.png)
:::

### 5. general control flow
sed achieves control flow the same way computer hardware does: with labels and branches. Since the computer can run our compiled programs with nothing more than that, this must be enough syntax to achieve any desired control flow: if/else, while or for.
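As a rough sketch of that claim, the two pipelines below emulate a `while` loop and an `if/else` using only a label (`:`), an unconditional branch (`b`) and a conditional branch (`t`), the commands this section describes. The input strings, the label names (`again`, `else`) and the variable names are made up for illustration. The commands are written for a POSIX `sh` and use separate `-e` fragments so that every label ends cleanly; they are an assumption-laden illustration, not the course's own solution.

```shell
# while loop: keep replacing two spaces with one until no substitution succeeds
collapsed=$(printf 'a    b\n' | sed -e ':again' -e 's/  / /' -e 't again')
printf '%s\n' "$collapsed"

# if/else: lines starting with '#' jump to the "else" part,
# every other line runs the "then" part and branches to the end
tagged=$(printf '#x\ny\n' | sed -e '/^#/b else' -e 's/^/code: /' -e 'b' \
                                -e ':else' -e 's/^/comment: /')
printf '%s\n' "$tagged"
```

The `t again` plays the role of the loop condition (it only jumps back while the last `s` succeeded), and the bare `b` is what keeps the "then" part from falling through into the "else" part.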
sed control flow is simple, but ugly. Using 'x' as an example label:
* `:x` defines a label 'x' you can branch to
* `bx`: branches to a label called 'x' (if no label is given, then branch to the end)

_*note: other languages put ':' after the label_
```c
$ cat test.c
#include <stdio.h>
int main(){
    printf("hello");
    goto L1;
    printf("pass");
L1:
    printf(" world!\n");
}
$ gcc -o test test.c
$ ./test
hello world!
```
* `q`: this quits sed, but prints the pattern space.
    - It will __not fetch any more input lines__
    - It will, however, print the pattern space (unless the “-n” flag was used).
    - This is control flow in the same sense as b, t, d, & D, because it affects the program counter of this sed program.
* `tx`: the ++t++est to conditionally branch to label 'x', if any previous `s` command had matched
    * when an `s` is successful, it sets a certain flag, which remains set until the next `t` executes
    * a `t` tests the flag to decide whether to branch
    * but it will also reset the flag, so if you only want to cause the reset, use `tx;:x`

:::info
![Screenshot 2024-05-08 3.35.32 PM](https://hackmd.io/_uploads/ByDIsj_f0.png)
:::
* `Tx`: opposite of 't' - branch if the flag is not set (:warning: not standard)

---

These have the side effect of going back to the top of the control flow: `c`, `d`, `D`
* `d` & `D` will “Update the pattern space.” But they also affect control flow, by causing a restart (perhaps after loading a new input line).
    * `d`: delete the pattern space, then immediately ++start a new program++ for the next input line
    * `D`: If there is no '\n' in the pattern space, perform a “d”. Otherwise, delete the pattern space up to the first '\n', and ++restart the program++ on the resultant pattern space
* `c` will “Write to stdout.” But it also affects control flow, by causing a restart (after loading a new input line).
* `q`: ++quits sed, but prints the pattern space (unless `-n`)++

### 6. predicated execution
Predication is non-general control flow.
It won’t change the program counter, but it may prevent the predicated command (__singular__, which means predication applies to only the immediately following command) from executing. However, we can group command++s++ by using "{}".
* `a number`: execute the command(s) that follows only if the ++line number++ matches the one given
* `$`: indicates the final line number
* `,`: execute over a range

commands can be predicated for specific line numbers
```
1      the first input line
2      the second input line
...
$      the last input line
i, j   from the i-th to the j-th line, inclusive. j can be $
```
==EXAMPLE==

| command          | meaning                           |
| ---------------- | --------------------------------- |
| `sed '3d'`       | prints everything _except_ line 3 |
| `sed -n '2p'`    | prints only line 2                |
| `sed -n '1,23p'` | prints only the first 23 lines    |
| `sed '23q'`      | prints only the first 23 lines    |
| `sed '24,$d'`    | prints only the first 23 lines    |

* `/`: execute the command(s) that follows only if the line __matches the ++pattern++ given__
* `\`: e.g., `\XregexX` has the same effect as /regex/ but allows any character X as the delimiter

> 1. to remove (or blank out) all lines containing the name:
> `% sed '/Steve/ d' < file`
> `% sed '/Steve/ s/.*/CONFIDENTIAL/'`
> `% sed '/Steve/cCONFIDENTIAL'`
> or don't use predication (this empties the line but keeps it):
> `% sed 's/.*Steve.*//'`
>
> 2. to print all of the lines within C comments:
> `% sed -n '/\/\*/ , /\*\// p'`
> `/\/\*/` and `/\*\//` are regex
> ```c
> yenubuntu:~/unix> sed -n '/\/\*/, /\*\// p' < test.c
>     /* this ia a test program.
>     the C / *..* / operator
>     does not allow nesting. */
> ```

---

problems with nesting: in C, "{...}"s nest but "/\*...\*/"s don't
```c
yenubuntu:~/unix> cat broken.c
#include <stdio.h>
int main(){
    /* this ia a test program.
    the C /*..*/ operator
    does not allow nesting. */
}
yenubuntu:~/unix> gcc broken.c
test.c: In function ‘main’:
test.c:4:18: error: unknown type name ‘operator’
    4 |     the C /*..*/ operator
      |                  ^~~~~~~~
test.c:5:10: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘not’
    5 |     does not allow nesting. */
      |          ^~~
test.c:6:1: error: expected declaration or statement at end of input
    6 | }
      | ^
yenubuntu:~/unix> cat fixed.c
#include <stdio.h>
int main(){
    /* this ia a test program.
    the C / *..* / operator
    does not allow nesting. */
}
yenubuntu:~/unix> gcc fixed.c
yenubuntu:~/unix>
```
The real point is: __sed's `/.../,/.../` ranges do not nest.__ C lets you open and close a `/*..*/` comment on one line, but a sed range never ends on the line that starts it: sed only begins looking for the end pattern on the following line.
```c
yenubuntu:~/unix> cat fixed.c | sed '/\/\*/, /\*\// s/^/>>> /'
#include <stdio.h>
int main(){
>>>     /* this ia a test program.
>>>     the C / *..* / operator
>>>     does not allow nesting. */
}
yenubuntu:~/unix> cat broken.c | sed '/\/\*/, /\*\// s/^/>>> /'
#include <stdio.h>
int main(){
>>>     /* this ia a test program.
>>>     the C /*..*/ operator
    does not allow nesting. */
}
yenubuntu:~/unix> cat test.c | sed '/\/\*/, /\*\// s/^/>>> /'
#include <stdio.h>
int main(){
>>>     /* this ia a test program.
>>>     the C / *..* / operator
>>>     does not allow nesting. */
    int n;
    scanf("%d", &n);
    if (n > 0){
        printf("yes\n");
>>>         while (n < 0){ /* comment still work here */
>>>             printf("%d\n", n--);
>>>         }
>>>     }
>>> }
yenubuntu:~/unix> gcc test.c
yenubuntu:~/unix>
```
The last 4 lines aren't actually inside a C comment. But the ">>> " indicates that sed thinks they're inside a `/*..*/`

_*note: predication works with any sed command._
![Screenshot 2024-05-10 2.51.05 AM](https://hackmd.io/_uploads/SJNQj55f0.png)

---

1. combining line numbers and regex:
`% sed '1,/stop/ s/#.*//'`
removes comments from the beginning of the file until it finds the keyword "stop"
2. put several subcommands inside the condition block with { and }, just like C:
`% sed -n '/unix/ {s/x/y/; s/unix/UNIX/p;}'`
the last `;` is required by the standard, though not by every sed implementation.
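The two forms above can be tried on tiny inputs. This is a minimal sketch for a POSIX `sh`; the sample input lines and the variable names `stripped` and `grouped` are invented for illustration and do not come from the course material.

```shell
# address range: from line 1 up to the first line matching /stop/,
# delete everything from '#' to the end of the line
stripped=$(printf '# a\nx # b\nstop\ny # c\n' | sed '1,/stop/ s/#.*//')
printf '%s\n' "$stripped"

# grouping: both substitutions run only on lines matching /unix/,
# and the block ends with the standard ';}'
grouped=$(printf 'unix\nlinux\n' | sed -n '/unix/ {s/u/U/; s/x/X/p;}')
printf '%s\n' "$grouped"
```

In the first pipeline the line after `stop` keeps its comment, because the range has already ended; in the second, `linux` produces no output because it never matches the `/unix/` predicate and `-n` suppresses auto-printing.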
==EXAMPLE==
Note: the command(s) that follows executes only if the line __matches the ++pattern++ given__
![image](https://hackmd.io/_uploads/SJtGAkTGC.png)

---

* `!`: __negate__ the condition under which to execute the following command

_*Note: in the tcsh interpreter `!` recalls earlier commands (history substitution), even inside `'` quotes. Hence we escape it with a __backslash__ (`\!`); the backslash is not needed when the sed program is run from a file._
```
yenubuntu:~/unix> echo "A B" | tr " " "\n" | sed -n p
A
B
yenubuntu:~/unix> echo "A B" | tr " " "\n" | sed -n \!p
yenubuntu:~/unix> echo "A B" | tr " " "\n" | sed '\!d'
A
B
yenubuntu:~/unix>
```
==EXAMPLE==
![image](https://hackmd.io/_uploads/ryH5Pepz0.png)
![Screenshot 2024-05-11 9.51.24 PM](https://hackmd.io/_uploads/BJoJ_lpf0.png)

> professor's humor :smiling_face_with_smiling_eyes_and_hand_covering_mouth:
> ![Screenshot 2024-05-11 9.56.16 PM](https://hackmd.io/_uploads/B1nZYe6zA.png)
> ![Screenshot 2024-05-11 9.55.46 PM](https://hackmd.io/_uploads/BkJeFlaM0.png)

### 7. unusual output
* `r`: read a file and print it to STDOUT -- but only after the current program finishes. (It is a variant of `a`: whereas a’s argument is a string to print, for r it is the name of the file to print from.)
```
yenubuntu:~/unix> echo hello > f2 ; echo bye >> f2 ;
yenubuntu:~/unix> seq 2 | sed 'rf2\
? ;s/.*/(&)/;'
(1)
hello
bye
(2)
hello
bye
yenubuntu:~/unix> seq 2 | sed 'ahello\nbye\
? ;s/.*/(&)/;'
(1)
hello
bye
(2)
hello
bye
yenubuntu:~/unix>
```
* `w`: write the pattern space to the file ++named by w’s argument++. If the file exists, ++overwrite++ it. (It is a variant of `p`, just with output redirection.)
```
yenubuntu:~/unix> seq 3 | sed -n 's/.*/<&>/;wf2'
yenubuntu:~/unix> cat f2
<1>
<2>
<3>
yenubuntu:~/unix> echo hi | sed -n 'wf3;p'
yenubuntu:~/unix> cat f3
cat: f3: No such file or directory
yenubuntu:~/unix> ls f3*
'f3;p'
yenubuntu:~/unix> cat 'f3;p'
hi
yenubuntu:~/unix>
```
For both `r` and `w`, the rest of the line is the filename (which is why `;p` above became part of the name). Also, `w` writes immediately, based on what the pattern space currently is:
```
yenubuntu:~/unix> echo hi | sed -n 'wf2\
? ;z;wf2\
? ;s/^/hello/;wdifffile\
? ;s/$/ there/;wf2'
yenubuntu:~/unix> cat difffile
hello
yenubuntu:~/unix> cat f2
hi
yenubuntu:~/unix> cat f2
hi

hello there
yenubuntu:~/unix>
```
* `l`: Print the pattern space in a longer form that is “visually unambiguous”. (It is a variant of `p`, just with special characters printed in a plain-text format.)
    * except for adding a “$”, sed’s l changes nothing here:
    ```
    yenubuntu:~/unix> echo hello world | sed -n l
    hello world$
    ```
    * It is useful for unicode or nonprintable characters:
    ```
    yenubuntu:~/unix> echo '你好\r' | sed l
    \344\275\240\345\245\275\r$
    你好
    ```
    * It prints directly and immediately to stdout, similar to p:
    ```
    yenubuntu:~/unix> seq 2|sed 's/$/:你好?/p;l;s/...$/再見\!/'
    1:你好?
    1:\344\275\240\345\245\275?$
    1:再見!
    2:你好?
    2:\344\275\240\345\245\275?$
    2:再見!
    ```

## remembering these sed commands!
![image](https://hackmd.io/_uploads/SkTvrZaMA.png)

## optimizing for speed
* Substitution executes quicker if a "find" expression is put before the `s/.../.../`:
```
% sed 's/foo/bar/g' file        # standard
% sed '/foo/ s/foo/bar/g' file  # faster
% sed '/foo/ s//bar/g' file     # sed shorthand
```
If the regular expression you want to type is the same as the last expression you used, then you can leave it blank.
* Also, if you only need to output lines from the first part of the file, use a `q` command:
```
% sed -n '45,50p' file      # prints lines 45-50
% sed -n '51q;45,50p' file  # same, but faster
```

## useful one-line script for sed
[reference](https://www.pement.org/sed/sed1line.txt), last updated: Dec. 29, 2005
contents:
> [file spacing](#file-spacing)
> [numbering](#numbering)
> [text substitution](#text-substitution)
> [selecting printing of lines](#selecting-printing-of-lines)
> [selecting deleting of lines](#selecting-deleting-of-lines)

### file spacing
* double space a file:
`sed G`
`G`: append the hold space, where the hold space is currently blank
* double space a file which already has some blank lines in it
`sed '/^$/d;G'`, where `/^$/` is a regex meaning there is nothing between the beginning (`^`) and the end (`$`) of the line. That is, a blank line, which is then deleted (`d`).
* triple space a file:
`sed 'G;G'`
* undo double-spacing (assuming all even-numbered lines are always blank)
`sed 'n;d'`

==EXAMPLE==
```=
yenubuntu:~/unix> cat f2
hi

hello there
yenubuntu:~/unix> cat f2 | sed '/^$/d;G'
hi

hello there

```
* insert a blank line __above__ every line which matches `regex`
`sed '/regex/{x;p;x;}'` or `sed 's/.*regex/\n&/'` (one character shorter)
* insert a blank line __below__ every line which matches `regex`
`sed '/regex/G'`
* insert a blank line __above__ and __below__ every line which matches `regex`
`sed '/regex/{x;p;x;G;}'`

_*note: `;}` is the standard form._

==EXAMPLE==
```=
yenubuntu:~/unix> cat f2
hi

hello there
yenubuntu:~/unix> cat f2 | sed '/hi/{x;p;x;G}'

hi


hello there
```

### numbering
* number each line of a file
    * like `grep -n ^`
    `sed = $filename | sed 'N;s/\n/:/'`
    (`N`: **append** the next input line to the pattern space, with a newline inserted before it. Every two lines are now treated as a single line, with a `\n` between the line number and the text.)
    * like `cat -n`:
```
% sed = filename | sed \
? 'N;s/^/     /;s/ *\(.\{6\}\)\n/\1\t/'
? 'N;s/^/     /;s/ *\(......\)\n/\1\t/'   (same as above)
? 'N;:L;s/^/ /;/......\n/\!bL;s/\n/\t/'   (adds 1 by 1)
? 'N;:L;s/^.\{,5\}\n/ &/;tL;s/\n/\t/'     (shortest)
```
==EXAMPLE==
```=
yenubuntu:~/unix> sed = f2 | sed 'N;s/\n/:/'
1:hi
2:hello there
yenubuntu:~/unix> sed = f2 | sed \
? 'N;s/^/     /;s/ *\(.\{6\}\)\n/\1\t/'
     1	hi
     2	hello there
```
* number each line of a file, but only print numbers if the line is not blank
`sed '/./=' $filename | sed '/./N; s/\n/:/'` or `sed = $filename | sed 'N;s/\n/:/;/:$/z'`

==EXAMPLE==
```=
yenubuntu:~/unix> cat f2
hi

hello there
yenubuntu:~/unix> sed /./= f2 | sed '/./N;s/\n/:/'
1:hi

3:hello there
```
* count lines (like `wc -l`)
`sed -n '$='` or `sed '$=;d'`

### text substitution
> additional:
> in a UNIX environment, convert DOS newlines (CR/LF, `\r\n`) to Unix format (LF, `\n`). in regex: `\r?\n`
> `sed 's/.$//'`

* delete leading whitespace (spaces, tabs) from the front of each line
`sed 's/^[ \t]*//'` or `sed 's/[ \t]*//'` (it also works without the `^` at the front)
* delete trailing whitespace (spaces, tabs) from the end of each line
`sed 's/[ \t]*$//'`
* delete both leading and trailing whitespace (spaces, tabs): just combine the two commands
`sed 's/[ \t]*//;s/[ \t]*$//'`

---

formatting:
* align all text flush ++right++ on a ++79-column width++ (set at 78 + 1 space)
`sed ':a;s/^.\{1,78\}$/ &/;ta'`
_*note: `:a` defines a label you can branch to, and `ta` is the conditional branch back to it._
* center all text in the ++middle++ of ++79 columns++, with spaces on the right to fill the columns and leading spaces being significant
`sed ':a;s/^.\{1,77\}$/ & /;ta'`
* center in the middle of 79 columns, with no trailing spaces and ignoring leading spaces
`sed ':a;s/^.\{1,77\}$/ &/;ta;s/\( *\)\1/\1/'`
_*note: the `:a;s/^.\{1,77\}$/ &/;ta` part makes the line right-justified (the last example below), so `s/\( *\)\1/\1/` then removes half of the leading spaces._
![Screenshot 2024-05-25 8.49.45 PM](https://hackmd.io/_uploads/S1LuC814C.png)

---

* substitute (find and replace) "foo" with "bar" on each line (replace only the 1^st^ instance)
`sed 's/foo/bar/'`
* replace only the 4^th^ instance
`sed 's/foo/bar/4'`
* replace ALL instances
`sed 's/foo/bar/g'`
* replace only the last instance
`sed 's/\(.*\)foo/\1bar/'`
* replace the next-to-last instance
`sed 's/\(.*\)foo\(.*foo\)/\1bar\2/'`
_*note: `\1`, `\2`, ... refer back to the text matched by the 1^st^, 2^nd^, ... `\(...\)` group of the pattern._
* substitute "foo" with "bar" __ONLY__ for lines which contain "baz"
`sed '/baz/s/foo/bar/g'`
* substitute "foo" with "bar" __EXCEPT__ for lines which contain "baz"
`sed '/baz/\!s/foo/bar/g'`

---

ordering:
* reverse the order of lines (like `tac`, `seq 4 | tac | tr \\n \ ;echo`)
`sed '1\!G;h;$\!d'` or `sed -n '1\!G;h;$p'` or `sed -n '2,$G;h;$p'`
\
==EXAMPLE==
```
yenubuntu:~/unix> echo "hello\nworld" | sed '1\!G;h;$\!d'
world
hello
yenubuntu:~/unix> echo "hello\nworld" | sed '1\!G;h'
hello
world
hello
```
* reverse the characters on the line (like `rev`)
`sed '/\n/\!G; s/\(.\)\(.*\n\)/&\2\1/; //D; s/.//'` or `sed 'G;:L;s/\(.\)\(.*\n\)/\2\1/;tL;s/.//'`
_*note: see the note in [use the hold space](#4-use-the-hold-space)_
```
% seq 4| tr \\n \ | rev;echo
4 3 2 1
% seq 4| sed G\;h|tr \\n \ ;echo
1 2 1 3 2 1 4 3 2 1
```
==EXAMPLE==
```
yenubuntu:~/unix> echo 1234|\
? sed '/\n/\!G; s/\(.\)\(.*\n\)/&\2\1/; //D; s/.//'
4321
```
In this example, the `G` appends the hold space onto the end of the pattern space (with a `\n` separating the two parts). However, there are no `h`, `H` or `x` instructions, so nothing _ever_ goes into the hold space. Then why the `G`? To add the `\n` to the pattern space. And why a `\n`? Actually the code doesn't care that it is a `\n`. It just wants ++a symbol++ which won't appear on an input line. This `\n` is being used __as a marker__ to separate things that haven't been reversed (i.e., those to the left of the marker) from things that have been (i.e., those to the right of it).
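To watch the marker move, the loop-based variant of the one-liner can be instrumented with `l`, which prints the pattern space with the `\n` made visible. This is only a sketch: it assumes GNU sed (where `.` also matches the newline, as the final `s/.//` in the one-liner relies on) and a POSIX `sh`; the variable name `trace` is made up for illustration.

```shell
# G adds the "\n" marker; inside the loop, l dumps the pattern space before
# each pass, and each successful s moves one character to the right of the marker
trace=$(printf '1234\n' | sed -n -e 'G' -e ':L' -e 'l' \
                              -e 's/\(.\)\(.*\n\)/\2\1/' -e 'tL' \
                              -e 's/.//' -e 'p')
printf '%s\n' "$trace"
```

Each line printed by `l` shows the not-yet-reversed characters to the left of `\n` and the already-reversed ones to its right; the last line, printed by `p`, is the reversed text after the marker has been removed.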
> * `/\n/`: if there is already a `\n` in the pattern space...
> * `\!`: ...then don't...
> * `G`: add a `\n` to the pattern space
>
> In sed logic, the sentence above reads: add a `\n` if it hasn't already been done.
> * `\(.\)\(.*\n\)`: this pattern separates the first character from everything else up to the marker
> * `\2\1`: ...then moves that character to right after the marker
> _hint: `\(.\)` is `\1` and `\(.*\n\)` is `\2`_
> * `&`: oddly, it also copies back the original pattern space, so there are now 2 markers temporarily.
>
> * `//D`: when no pattern is given, the previous pattern is used. Hence, `//D` = `/\(.\)\(.*\n\)/D` = `/..*\n/D`. The expression `..*\n` means there must be something before the `\n` (i.e., something _before the marker_, so reversing is _not done_)
> If reversing is __not done__: delete up to the first marker (the copy that `&` put back), then restart, since `D` has the side effect of restarting.
> \
> _*note: `D` -- If no newline in pattern space, perform a “d”. Otherwise, delete the pattern space up to first newline, __and restart with the resultant pattern space, without reading new input__._
> * `s/.//`: delete the marker, `\n`

[example I](https://imgur.com/mXE7PrG)
[example II](https://i.imgur.com/dKA8ZRP.gifv)

* put pairs of lines side-by-side (like `paste - -`)
`sed '$\!N;s/\n/\t/'`
![image](https://hackmd.io/_uploads/rksykYyV0.png)

---

:::success
* if a line ends with a backslash, append the next line to it
`sed ':a;/\\$/N;s/\\\n//;ta'`
* if a line begins with "=" then append it to the previous line & replace the "=" with a space:
`sed ':a;$\!N;s/\n=/ /;ta;P;D'`
* add commas to numeric strings, changing "1234567" to "1,234,567"
`sed ':a;s/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'` or `sed ':a;s/\B[0-9]\{3\}\>/,&/;ta'` (GNU sed)
* add commas to numbers with decimal points and minus signs:
`sed 's/\.[0-9]/&\n./g;:a;s/\n\.\([0-9]\)/\n\1\n./;ta;s/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta;s/\n\.//g;s/\n//g'` or `sed -r ':a;s/(^|[^0-9.])([0-9]+)([0-9]{3})/\1\2,\3/g;ta'` (GNU sed only)
* add a blank line after every 5
lines (after lines 5, 10, 15, 20, etc.):
`sed 'n;n;n;n;G'` or `sed '0~5G'` (GNU sed only)
:::

### selecting printing of lines
* print the first 10 lines of a file (like `head`)
`sed 10q`
* print the last 10 lines of a file (like `tail`)
`sed ':a;$q;N;11,$D;ba'`
* print the first line of a file (like `head -1`)
`sed q`
_*note: `q`: this quits sed, but __prints__ the pattern space._
* print the last line of a file (like `tail -1`)
`sed '$\!d'` or `sed -n '$p'`
_*note: `$` indicates the final line number_
* print the last 2 lines of a file (like `tail -2`)
`sed '$\!N;$\!D'`
![Screenshot 2024-05-26 8.16.20 PM](https://hackmd.io/_uploads/B11Q_igVR.png)
* print the next-to-the-last line of a file (if there is only 1 line in the file, print a blank line)
`sed '$\!{h;d;};x'` or `sed '$ba;h;d;:a;x'` or `sed '${g;p;};h;d'`
* print the next-to-the-last line (if 1 line, print it)
`sed '1{$q;};$\!{h;d;};x'` or `sed '1{$q;};${g;p;};h;d'`
* print the next-to-the-last line (if 1 line, print nothing):
`sed '1{$d;};$\!{h;d;};x'` or `sed '1{$d;};${g;p;};h;d'`

---

* print only lines that contain a specific regex (like `grep`)
`sed -n '/regexp/p'` or `sed '/regexp/\!d'`
* print lines without regexp (like `grep -v`)
`sed -n '/regexp/\!p'` or `sed '/regexp/d'`
_*note: `grep -v` inverts the match (i.e., print if not matched)_
* print the line immediately _before_ a regexp but not the line containing the regexp
`sed -n '/regexp/{g;1\!p;};h'` or `sed -n '/regexp/{g;1ba;p;:a}h'`
* print the line immediately _after_ a regexp but not the line containing the regexp
`sed -n '/regexp/{n;p;}'`

:::success
* print 1 line of context before and after regexp, with a line number indicating where the regexp occurred (like `grep -A1 -B1`)
`sed -n '/regexp/{=;x;1\!p;g;$\!N;p;D;};h'`
:::

---

* grep for AAA or BBB or CCC
`sed '/AAA/b;/BBB/b;/CCC/b;d'` or `sed -n '/\([ABC]\)\1\1/p'` or `sed '/AAA\|BBB\|CCC/\!d'` (GNU sed only)
* grep for AAA and BBB and CCC (any order)
`sed '/AAA/\!d;/BBB/\!d;/CCC/\!d'`
* grep for AAA and BBB and CCC (that order)
`sed '/AAA.*BBB.*CCC/\!d'`
* print only lines of 65 characters or longer
`sed '/.\{65\}/\!d'`
* print only lines of less than 65 characters
`sed -n '/.\{65\}/\!p'` or `sed '/.\{65\}/d'`

:::danger
what is `b` doing here?
With no label, `b` branches to the end of the script, so a matching line skips the final `d` and is auto-printed.
:::

:::success
* print paragraph if it contains AAA (blank lines separate paragraphs)
`sed '/./{H;$\!d;};x;/AAA/\!d'`
* print paragraph if it contains AAA, BBB and CCC
`sed '/./{H;$\!d;};x;/AAA/\!d;/BBB/\!d;/CCC/\!d'`
* print paragraph if it has AAA or BBB or CCC
`sed '/./{H;$\!d;};x;/AAA/b;/BBB/b;/CCC/b;d'` or `sed '/./{H;$ba;d;:a;};x;/AAA/b;/BBB/b;/CCC/b;d'` or `sed '/./{H;$\!d;};x;/\([ABC]\)\1\1/\!d'` or `sed '/./{H;$\!d;};x;/AAA\|BBB\|CCC/b;d'` (GNU sed only)
:::

* print section of file _from regexp to end of file_
`sed -n '/regexp/,$p'`
* print section of file based on line numbers (lines 8-12, inclusive)
`sed -n 8,12p` or `sed 8,12\!d`
* print line number 52
`sed -n 52p` or `sed 52q\;d` (efficient on big files)
* beginning at line 3, print every 7th line
`sed -n '3,${p;n;n;n;n;n;n;}'` or `sed -n '3~7p'` (GNU sed only)
* print section of file between two regular expressions (inclusive)
`sed -n '/regexp1/,/regexp2/p'`

### selecting deleting of lines
* print all lines _EXCEPT_ between 2 regexps
`sed '/regexp1/,/regexp2/d'`
> _COMPARISON_
> print section of file between two regular expressions (inclusive)
> `sed -n '/regexp1/,/regexp2/p'`
* delete duplicate, _consecutive_ lines from a file (like `uniq`). That is, the first line of each run is kept and the duplicates are deleted
`sed '$\!N; /^\(.*\)\n\1$/\!P; D'`
* delete duplicate, _nonconsecutive_ lines from a file. Beware __not to overflow the buffer size__ of the hold space, or else use GNU sed
`sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//;h;P'`
_*note: This `[ -~]` is just an ASCII hack. The space ` ` is ASCII code 32 and the tilde `~` is ASCII code 126.
The range from 32 to 126 covers all __printable__ ASCII characters._

---

* delete all lines except duplicates (like `uniq -d`)
`sed '$\!N; s/^\(.*\)\n\1$/\1/;t;D'`
* delete the first 10 lines of a file
`sed '1,10d'`
* delete the last line of a file
`sed '$d'`
* delete the last 2 lines of a file
`sed 'N;$\!P;$\!D;$d'`
* delete the last 10 lines of a file
`sed ':a;$d;N;2,10ba;P;D'` or `sed -n ':a;1,10\!{P;N;D;};N;ba'`
* delete every 8th line
`sed 'n;n;n;n;n;n;n;d;'` or `sed '0~8d'` # GNU sed only
* delete lines matching pattern
`sed '/pattern/d'`
* delete ALL blank lines (like `grep .`)
`sed '/^$/d'` or `sed '/./\!d'`

---

:::success
* delete all CONSECUTIVE blank lines from file except the first; also deletes all blank lines from top and end of file (like `cat -s`)
`sed '/./,/^$/\!d'` (allows 0 blank lines at top, 1 at end)
`sed '/^$/N;/\n$/D'` (allows 1 blank line at top, 0 at end)
* limit the number of CONSECUTIVE blank lines to two
`sed '/^$/N;/\n$/N;//D'`
* delete all leading blank lines at _top_ of file
`sed '/./,$\!d'`
* delete all trailing blank lines at _end_ of file
`sed ':a;/^\n*$/{$d;N;ba;}'` or `sed ':a;/^\n*$/N;/\n$/ba'`
* delete the last line of each paragraph
`sed -n '/^$/{p;h;};/./{x;/./p;}'`
:::

## man sed
```
SED(1)                           User Commands                          SED(1)

NAME
       sed - stream editor for filtering and transforming text

SYNOPSIS
       sed [OPTION]... {script-only-if-no-other-script} [input-file]...

DESCRIPTION
       Sed is a stream editor. A stream editor is used to perform basic text
       transformations on an input stream (a file or input from a pipeline).
       While in some ways similar to an editor which permits scripted edits
       (such as ed), sed works by making only one pass over the input(s), and
       is consequently more efficient. But it is sed's ability to filter text
       in a pipeline which particularly distinguishes it from other types of
       editors.
       -n, --quiet, --silent
              suppress automatic printing of pattern space

       --debug
              annotate program execution

       -e script, --expression=script
              add the script to the commands to be executed

       -f script-file, --file=script-file
              add the contents of script-file to the commands to be executed

       --follow-symlinks
              follow symlinks when processing in place

       -i[SUFFIX], --in-place[=SUFFIX]
              edit files in place (makes backup if SUFFIX supplied)

       -l N, --line-length=N
              specify the desired line-wrap length for the `l' command

       --posix
              disable all GNU extensions.

       -E, -r, --regexp-extended
              use extended regular expressions in the script (for portability
              use POSIX -E).

       -s, --separate
              consider files as separate rather than as a single, continuous
              long stream.

       --sandbox
              operate in sandbox mode (disable e/r/w commands).

       -u, --unbuffered
              load minimal amounts of data from the input files and flush the
              output buffers more often

       -z, --null-data
              separate lines by NUL characters

       --help
              display this help and exit

       --version
              output version information and exit

       If no -e, --expression, -f, or --file option is given, then the first
       non-option argument is taken as the sed script to interpret. All
       remaining arguments are names of input files; if no input files are
       specified, then the standard input is read.

       GNU sed home page: <https://www.gnu.org/software/sed/>. General help
       using GNU software: <https://www.gnu.org/gethelp/>. E-mail bug reports
       to: <bug-sed@gnu.org>.

COMMAND SYNOPSIS
       This is just a brief synopsis of sed commands to serve as a reminder to
       those who already know sed; other documentation (such as the texinfo
       document) must be consulted for fuller descriptions.

   Zero-address ``commands''
       : label
              Label for b and t commands.

       #comment
              The comment extends until the next newline (or the end of a -e
              script fragment).

       }      The closing bracket of a { } block.

   Zero- or One- address commands
       =      Print the current line number.

       a \
       text   Append text, which has each embedded newline preceded by a
              backslash.
       i \
       text   Insert text, which has each embedded newline preceded by a
              backslash.

       q [exit-code]
              Immediately quit the sed script without processing any more
              input, except that if auto-print is not disabled the current
              pattern space will be printed. The exit code argument is a GNU
              extension.

       Q [exit-code]
              Immediately quit the sed script without processing any more
              input. This is a GNU extension.

       r filename
              Append text read from filename.

       R filename
              Append a line read from filename. Each invocation of the command
              reads a line from the file. This is a GNU extension.

   Commands which accept address ranges
       {      Begin a block of commands (end with a }).

       b label
              Branch to label; if label is omitted, branch to end of script.

       c \
       text   Replace the selected lines with text, which has each embedded
              newline preceded by a backslash.

       d      Delete pattern space. Start next cycle.

       D      If pattern space contains no newline, start a normal new cycle
              as if the d command was issued. Otherwise, delete text in the
              pattern space up to the first newline, and restart cycle with
              the resultant pattern space, without reading a new line of
              input.

       h H    Copy/append pattern space to hold space.

       g G    Copy/append hold space to pattern space.

       l      List out the current line in a ``visually unambiguous'' form.

       l width
              List out the current line in a ``visually unambiguous'' form,
              breaking it at width characters. This is a GNU extension.

       n N    Read/append the next line of input into the pattern space.

       p      Print the current pattern space.

       P      Print up to the first embedded newline of the current pattern
              space.

       s/regexp/replacement/
              Attempt to match regexp against the pattern space. If
              successful, replace that portion matched with replacement. The
              replacement may contain the special character & to refer to that
              portion of the pattern space which matched, and the special
              escapes \1 through \9 to refer to the corresponding matching
              sub-expressions in the regexp.
       t label
              If a s/// has done a successful substitution since the last
              input line was read and since the last t or T command, then
              branch to label; if label is omitted, branch to end of script.

       T label
              If no s/// has done a successful substitution since the last
              input line was read and since the last t or T command, then
              branch to label; if label is omitted, branch to end of script.
              This is a GNU extension.

       w filename
              Write the current pattern space to filename.

       W filename
              Write the first line of the current pattern space to filename.
              This is a GNU extension.

       x      Exchange the contents of the hold and pattern spaces.

       y/source/dest/
              Transliterate the characters in the pattern space which appear
              in source to the corresponding character in dest.

Addresses
       Sed commands can be given with no addresses, in which case the command
       will be executed for all input lines; with one address, in which case
       the command will only be executed for input lines which match that
       address; or with two addresses, in which case the command will be
       executed for all input lines which match the inclusive range of lines
       starting from the first address and continuing to the second address.
       Three things to note about address ranges: the syntax is addr1,addr2
       (i.e., the addresses are separated by a comma); the line which addr1
       matched will always be accepted, even if addr2 selects an earlier line;
       and if addr2 is a regexp, it will not be tested against the line that
       addr1 matched.

       After the address (or address-range), and before the command, a ! may
       be inserted, which specifies that the command shall only be executed if
       the address (or address-range) does not match.

       The following address types are supported:

       number Match only the specified line number (which increments
              cumulatively across files, unless the -s option is specified on
              the command line).

       first~step
              Match every step'th line starting with line first.
              For example, ``sed -n 1~2p'' will print all the odd-numbered
              lines in the input stream, and the address 2~5 will match every
              fifth line, starting with the second. first can be zero; in
              this case, sed operates as if it were equal to step. (This is
              an extension.)

       $      Match the last line.

       /regexp/
              Match lines matching the regular expression regexp. Matching is
              performed on the current pattern space, which can be modified
              with commands such as ``s///''.

       \cregexpc
              Match lines matching the regular expression regexp. The c may
              be any character.

       GNU sed also supports some special 2-address forms:

       0,addr2
              Start out in "matched first address" state, until addr2 is
              found. This is similar to 1,addr2, except that if addr2 matches
              the very first line of input the 0,addr2 form will be at the end
              of its range, whereas the 1,addr2 form will still be at the
              beginning of its range. This works only when addr2 is a regular
              expression.

       addr1,+N
              Will match addr1 and the N lines following addr1.

       addr1,~N
              Will match addr1 and the lines following addr1 until the next
              line whose input line number is a multiple of N.

REGULAR EXPRESSIONS
       POSIX.2 BREs should be supported, but they aren't completely because of
       performance problems. The \n sequence in a regular expression matches
       the newline character, and similarly for \a, \t, and other sequences.
       The -E option switches to using extended regular expressions instead;
       it has been supported for years by GNU sed, and is now included in
       POSIX.

BUGS
       E-mail bug reports to bug-sed@gnu.org. Also, please include the output
       of ``sed --version'' in the body of your report if at all possible.

AUTHOR
       Written by Jay Fenlason, Tom Lord, Ken Pizzini, Paolo Bonzini, Jim
       Meyering, and Assaf Gordon.

       This sed program was built with SELinux support. SELinux is enabled on
       this system.

       GNU sed home page: <https://www.gnu.org/software/sed/>. General help
       using GNU software: <https://www.gnu.org/gethelp/>. E-mail bug reports
       to: <bug-sed@gnu.org>.

COPYRIGHT
       Copyright © 2020 Free Software Foundation, Inc. License GPLv3+: GNU
       GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
       This is free software: you are free to change and redistribute it.
       There is NO WARRANTY, to the extent permitted by law.

SEE ALSO
       awk(1), ed(1), grep(1), tr(1), perlre(1), sed.info, any of various
       books on sed, the sed FAQ
       (http://sed.sf.net/grabbag/tutorials/sedfaq.txt),
       http://sed.sf.net/grabbag/.

       The full documentation for sed is maintained as a Texinfo manual. If
       the info and sed programs are properly installed at your site, the
       command

              info sed

       should give you access to the complete manual.

sed 4.8                           January 2020                          SED(1)
```
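To see a few of the one-liners from the printing and deletion lists in action, here is a minimal sketch for a POSIX `sh`. The function names (`dedupe_adjacent`, `last_two`, and so on) are made up for this example and are not part of sed. Note that `!` is written without a backslash here: the `\!` used in the notes is needed under tcsh, which performs history substitution even inside single quotes, but in `sh` the backslash would be passed through to sed and break the script. The `-e` form is used so that the scripts do not rely on GNU-only handling of `;` after `b` and `}`.

```shell
#!/bin/sh
# Sketch only: the function names below are hypothetical, chosen for this example.
# Written for a POSIX shell, so `!` is NOT backslash-escaped as it is under tcsh.

# Collapse runs of identical consecutive lines (like `uniq`).
dedupe_adjacent() { sed '$!N; /^\(.*\)\n\1$/!P; D'; }

# Print the last two lines of input (like `tail -2`).
last_two() { sed '$!N;$!D'; }

# Print the line just before each line matching "regexp".
line_before() { sed -n -e '/regexp/{g;1!p;}' -e h; }

# Keep lines containing AAA, BBB or CCC: `b` with no label jumps to the
# end of the script, so a matching line is auto-printed and skips the `d`.
any_of_three() { sed -e '/AAA/b' -e '/BBB/b' -e '/CCC/b' -e d; }

# Print each blank-line-separated paragraph that contains AAA.
# Each printed paragraph starts with an empty line, because `H` appends
# a newline plus the current line to the hold space.
para_with_aaa() { sed -e '/./{H;$!d;}' -e 'x;/AAA/!d'; }

printf 'a\na\nb\nb\nb\nc\n'          | dedupe_adjacent   # a, b, c
printf '1\n2\n3\n4\n5\n'             | last_two          # 4, 5
printf 'x\ny\nregexp\nz\n'           | line_before       # y
printf 'AAA\nxxx\nCCC y\n'           | any_of_three      # AAA, CCC y
printf 'one\nAAA here\n\ntwo\nBBB\n' | para_with_aaa     # (empty line), one, AAA here
```

Wrapping each one-liner in a function makes it easy to feed it a tiny, known input with `printf` and compare the result by eye before trusting it on a real file.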