# sed programming (reversing, 14 teams solved) ###### tags: `SECCON CTF 2021` ## Overview The main theme of this challenge is "Substitution is Turing-complete". ``` $ echo "SECCON{dummy}" | sed -f checker.sed WRONG ``` checker.sed: ``` #!/bin/sed -f # Check flag format # Some characters are used internally /^SECCON{[02-9A-HJ-Z_a-km-z]*}$/!{ cINVALID FORMAT b } :t s/1Illl11IlIl1/1IlIl11Illl1/;tt s/1Illl11III1/1III11Illl1/;tt s/1Ill11IlIl1/1IlIl11Ill1/;tt s/1Illl11l1/1l11Illl1/;tt : s/o/1IlIl11IIll11IIll11IlIl11IIll11IIll11IIll11IIll1/;tt s/O/1IlIl11IIll11IlIl11IlIl11IIll11IIll11IIll11IIll1/;tt s/j/1IlIl11IIll11IIll11IlIl11IIll11IlIl11IIll11IlIl1/;tt s/^/1IIll11IIll11IlIl11IIll11IIlI11l1/;tt ``` ## sed We often use `sed` command like this: ``` $ echo "SECCON CTF 2020" | sed -e s/2020/2021/ SECCON CTF 2021 ``` Actually, `sed` has many more features. See [info sed](https://www.gnu.org/software/sed/manual/sed.html). However, checker.sed mainly uses only two features. - `s` ubstitute. You may already know. - `t` est. Jump if a substitution is done. test.sed: ``` s/seccon/SECCON/ t label_123 s/substitution/substitution is NOT done/ t # without label jump to the end :label_123 s/substitution/substitution is done/ ``` ``` $ echo "seccon substitution" | sed -f test.sed SECCON substitution is done $ echo "secc0n substitution" | sed -f test.sed secc0n substitution is NOT done ``` There is only one label `t`. What checker.sed do is very simple. Find a substitution rule that matches the current string, replace it and move to the begin or end. This is a [Markov algorithm](https://en.wikipedia.org/wiki/Markov_algorithm). ## Solution The command `p` prints the current string. It is useful for reversing. Inserting a line of `p` after a line of `t:`, we can see the string for each substitution. First, let us tackle `1II11IIll11I...` (ones, large `I`s and small `l`s). ![](https://i.imgur.com/P5SU2uf.png) Since all of them are in the form of `1[Il]+1`, replace them to single characters. Note that if we use a character `x`, we must remove a rule `s/x/1IlIl11I...` to avoid conflicts. solve1&#46;py: ```python= R = [ ("1IlI1" , "#"), ("1I1" , "-"), ("1IIlI1", "<"), ("1l1" , ">"), ("1IllI1", "@"), ("1III1" , "a"), ("1IIII1", "b"), ("1IIIl1", "f"), ("1II1" , "t"), ("1Ill1" , "x"), ("1Illl1", "y"), ("1IlIl1", "0"), ("1IIll1", "1"), ] for line in open("checker.sed"): line = line[:-1] remove = False for _k, v in R: if line.startswith(f"s/{v}/"): remove = True if remove: line = "#"+line for k, v in R: line = line.replace(k, v) print(line) if line==":t": print("p") ``` ``` $ python3 solve1.py > checker2.sed ``` checker2.sed: ``` #!/bin/sed -f # Check flag format # Some characters are used internally /^SECCON{[02-9A-HJ-Z_a-km-z]*}$/!{ cINVALID FORMAT b } :t p s/y0/0y/;tt s/ya/ay/;tt s/x0/0x/;tt s/y>/>y/;tt s/xb/bx/;tt s/xa/ax/;tt s/x1/1x/;tt s/y1/1y/;tt s/yb/by/;tt s/x>/>x/;tt s/Gt/GRt/;tt s/t1111000t1/t1111001t/;tt s/t10010t0/f/;tt ``` The replacements are selected in my knowledge as the author. However, you can guess `0` and `1` from the following rules. They are binary representations of the ASCII codes. ``` : s/v/01110110/;tt s/d/01100100/;tt s/R/01010010/;tt s/3/00110011/;tt : ``` See the following rules. ``` s/Gt/GRt/;tt s/Wf/WRf/;tt s/Ot/ONt/;tt s/Of/ONf/;tt s/Nf/NG/;t s/Rf/ROf/;tt s/Tt/TS/;t s/Rt/RAt/;tt s/Ct/COt/;tt s/Nt/NGt/;tt s/At/ATt/;tt s/f/Wf/;tt s/t/Ct/;tt ``` They generate `WRONG` from `f` and `CONGRATS` from `t` and exit. We can see `f` means that the input is wrong and `t` means the input is correct. Most of the rules are the forms of `s/t bin(n) t[01]/t bin(n+1) t/` or the forms of `s/t bin(n) t[01]/f/` where `bin(n)` is the binary representation of `n`. ``` s/t0t1/t1t/;tt s/t0t0/f/;tt s/t1t1/t10t/;tt s/t1t0/f/;tt s/t10t0/t11t/;tt s/t10t1/f/;tt s/t11t1/t100t/;tt s/t11t0/f/;tt : ``` They check that the encrypted input is `1101...`. Let us extract the encrypted input. solve2&#46;py: ```python= import re r = re.compile(r"^s/t([01]+)t([01])/t[01]+t/;tt$") target = {} for line in open("checker2.sed"): line = line[:-1] m = r.match(line) if m: target[int(m.group(1), 2)] = m.group(2) print("target:", "".join(target[k] for k in range(len(target)))) ``` ``` $ python3 solve2.py target: 110101101111111011101100110001110111101011110100101101011011110011000101001001011111000101001011100000100101001111001111111011100100001001011100110010000011111101110110110100000111000010101101 ``` These rules simply convert the input to binary representation. ``` : s/v/01110110/;tt s/d/01100100/;tt s/R/01010010/;tt s/3/00110011/;tt : ``` The rest is the main encryption process. ``` s/y0/0y/;tt s/ya/ay/;tt s/x0/0x/;tt s/y>/>y/;tt s/xb/bx/;tt s/xa/ax/;tt s/x1/1x/;tt s/y1/1y/;tt s/yb/by/;tt s/x>/>x/;tt s/01-/00/;tt s/#a/0#/;tt s/f>/f/;tt s/t11000000t>/t/;tt s/y/b/;tt s/>0/0>/;tt s/f1/f/;tt s/#b/1#/;tt s/>1/1>/;tt s/x/a/;tt s/@11>/>x/;tt s/@01>/>y/;tt s/@011/@x11/;tt s/@00>/>x/;tt s/@010/@y10/;tt s/@100/@y00/;tt s/@111/@y11/;tt s/@10>/>y/;tt s/<>/<#/;tt s/t11000000t/f/;tt s/#/>/;tt s/f0/f/;tt s/@001/@y01/;tt s/0-/-1/;tt s/@110/@x10/;tt s/@101/@x01/;tt s/@000/@x00/;tt s/11-/10/;tt s/1-//;tt s/0</0-<@0/;tt s/1</1-<@0/;tt s/</t0t/;tt s/^/1101<>/;tt ``` In Markov algorithms *cursors* play an import role. Cursors move from left to right or from right to left and do something. For example, in the following rules, `!` is the cursor and change `a` to `x` and `b` to `y`. ``` s/!a/x!/;tt s/!b/y!/;tt ``` ``` !abaa x!baa xy!aa xyx!a xyxx! ``` In the following, I will show how the input is substituted along with the applied rules. ### Step 1 Replace characters of the input with their binary representations. ``` s/u/01110101/;tt s/6/00110110/;tt s/_/01011111/;tt s/p/01110000/;tt : ``` ``` FLAG 01000110LAG 0100011001001100AG 0100011001001100A01000111 01000110010011000100000101000111 ``` ### Step 2 Add `1101<>`. ``` s/^/1101<>/;tt ``` ``` 01000110010011000100000101000111 1101<>01000110010011000100000101000111 ``` ### Step 3 Move `>` to the end. ``` s/>0/0>/;tt s/>1/1>/;tt ``` ``` 1101<>01000110010011000100000101000111 1101<0>1000110010011000100000101000111 1101<01>000110010011000100000101000111 : 1101<0100011001001100010000010100011>1 1101<01000110010011000100000101000111> ``` ### Step 4 Pop up `-` and `@0` if there is `0` or `1` to the left of `<`, otherwise go to step 12. ``` s/0</0-<@0/;tt s/1</1-<@0/;tt ``` ``` 1101<01000110010011000100000101000111> 1101-<@001000110010011000100000101000111> ``` ### Step 5 Decrement the number to the left of `<`. ``` s/01-/00/;tt s/11-/10/;tt s/0-/-1/;tt s/1-//;tt ``` ``` 1101-<@001000110010011000100000101000111> 1100<@001000110010011000100000101000111> ``` In the next loop, this step is as follows. ``` 1100-<@... 110-1<@... 11-11<@... 1011<@... ``` Note that rules of this step are interlaced with rules of the following steps and substitutions are done simultaneously. ### Step 6 Generate `x` if the number of `1`s in the three characters to the right of `@` is even, otherwise generate `y`. ``` s/@000/@x00/;tt s/@001/@y01/;tt s/@010/@y10/;tt s/@011/@x11/;tt s/@100/@y00/;tt s/@101/@x01/;tt s/@110/@x10/;tt s/@111/@y11/;tt s/@00>/>x/;tt s/@01>/>y/;tt s/@10>/>y/;tt s/@11>/>x/;tt ``` ``` 1100<@001000110010011000100000101000111> 1100<@y01000110010011000100000101000111> ``` ### Step 7 Move `x` and `y` to the end. ``` s/x0/0x/;tt s/y0/0y/;tt s/x1/1x/;tt s/y1/1y/;tt s/x>/>x/;tt s/y>/>y/;tt s/xa/ax/;tt s/ya/ay/;tt s/xb/bx/;tt s/yb/by/;tt ``` ``` 1100<@y01000110010011000100000101000111> 1100<@0y1000110010011000100000101000111> 1100<@01y000110010011000100000101000111> 1100<@010y00110010011000100000101000111> : 1100<@01000110010011000100000101000111y> 1100<@01000110010011000100000101000111>y ``` ### Step 8 Change `x` to `a` and `y` to `b`. Go to Step 6 if `0` or `1` remain. ``` s/x/a/;tt s/y/b/;tt ``` ``` 1100<@01000110010011000100000101000111>y 1100<@01000110010011000100000101000111>b ``` ### Step 9 Change `<>` to `<#`. ``` s/<>/<#/;tt ``` ``` 1100<>bbbabaabbbbbaababbbaaabbabbababa 1100<#bbbabaabbbbbaababbbaaabbabbababa ``` ### Step 10 Move `#` to the end with changing `a` to `0` and `b` to `1`. ``` s/#a/0#/;tt s/#b/1#/;tt ``` ``` 1100<#bbbabaabbbbbaababbbaaabbabbababa 1100<1#bbabaabbbbbaababbbaaabbabbababa 1100<11#babaabbbbbaababbbaaabbabbababa 1100<111#abaabbbbbaababbbaaabbabbababa : 1100<1110100111110010111000110110101#a 1100<11101001111100101110001101101010# ``` ### Step 11 Change `#` to `>`. Go to Step 4. ``` s/#/>/;tt ``` ``` 1100<11101001111100101110001101101010# 1100<11101001111100101110001101101010> ``` In Step 4 to Step 11, this algorithm computes the next generation of a [1-dimensional cellular automaton](https://en.wikipedia.org/wiki/Elementary_cellular_automaton) of rule 150. Implementing 1-dimensional cellular automatons on some system prove that the system is Turing-complete. Although I want to implement the [Rule 110](https://en.wikipedia.org/wiki/Rule_110), it is too hard to solve. ### Step 12 Change `<` to `t0t`, check whether the current string is the target string or not and show `WRONG` or `CONGRATS` as explained above. ``` : s/</t0t/;tt ``` ``` <11101001001000000100011000111010> t0t11101001001000000100011000111010> t1t1101001001000000100011000111010> t10t101001001000000100011000111010> f01001001000000100011000111010> f1001001000000100011000111010> : f0> f> f Wf WRf WROf WRONf WRONG ``` ---- What we have to do is to get the initial state of the cellular automaton from the state of 13th (0b1101) generation. Let $x_1$, $x_2$, $x_3$, ... be the current state and $y_1$, $y_2$, $y_3$, .. be the next state. The value $x_{i+1}$ is uniquely determined from $x_{i-1}$, $x_i$ and $y_i$. We can compute the first values of the state since the flag format is `SECCON{...}`. By computing the first part of 13th generation and bringing back, we can get the flag. solve3&#46;py: ```python= N = 13 target = "110101101111111011101100110001110111101011110100101101011011110011000101001001011111000101001011100000100101001111001111111011100100001001011100110010000011111101110110110100000111000010101101" n = len(target) flag = "SECCON{"+"X"*(n//8-8)+"}" flag = "".join(bin(ord(f)+0x100).replace("0b1", "") for f in flag) T = [list(flag)] for i in range(N): t = ["0"]+T[i]+["0"] T += [[]] for j in range(n): T[i+1] += [str(t[j:j+3].count("1")%2)] T[N] = list(target) for i in range(N)[::-1]: for j in range(1, n-1): T[i][j+1] = str((T[i][j-1:j+1]+[T[i+1][j]]).count("1")%2) flag = "".join(T[0]) flag = "".join(chr(int("".join(flag[i:i+8]), 2)) for i in range(0, n, 8)) print("flag:", flag) ``` ``` $ python3 solve3.py flag: SECCON{mARkOV_4Lg0Ri7hM} $ echo "SECCON{mARkOV_4Lg0Ri7hM}" | sed -f checker.sed CONGRATS ``` `SECCON{mARkOV_4Lg0Ri7hM}`