# sed programming (reversing, 14 teams solved)
###### tags: `SECCON CTF 2021`
## Overview
The main theme of this challenge is "Substitution is Turing-complete".
```
$ echo "SECCON{dummy}" | sed -f checker.sed
WRONG
```
checker.sed:
```
#!/bin/sed -f
# Check flag format
# Some characters are used internally
/^SECCON{[02-9A-HJ-Z_a-km-z]*}$/!{
cINVALID FORMAT
b
}
:t
s/1Illl11IlIl1/1IlIl11Illl1/;tt
s/1Illl11III1/1III11Illl1/;tt
s/1Ill11IlIl1/1IlIl11Ill1/;tt
s/1Illl11l1/1l11Illl1/;tt
:
s/o/1IlIl11IIll11IIll11IlIl11IIll11IIll11IIll11IIll1/;tt
s/O/1IlIl11IIll11IlIl11IlIl11IIll11IIll11IIll11IIll1/;tt
s/j/1IlIl11IIll11IIll11IlIl11IIll11IlIl11IIll11IlIl1/;tt
s/^/1IIll11IIll11IlIl11IIll11IIlI11l1/;tt
```
## sed
We often use `sed` command like this:
```
$ echo "SECCON CTF 2020" | sed -e s/2020/2021/
SECCON CTF 2021
```
Actually, `sed` has many more features.
See [info sed](https://www.gnu.org/software/sed/manual/sed.html).
However, checker.sed mainly uses only two features.
- `s` ubstitute. You may already know.
- `t` est. Jump if a substitution is done.
test.sed:
```
s/seccon/SECCON/
t label_123
s/substitution/substitution is NOT done/
t # without label jump to the end
:label_123
s/substitution/substitution is done/
```
```
$ echo "seccon substitution" | sed -f test.sed
SECCON substitution is done
$ echo "secc0n substitution" | sed -f test.sed
secc0n substitution is NOT done
```
There is only one label `t`.
What checker.sed do is very simple.
Find a substitution rule that matches the current string, replace it and move to the begin or end.
This is a [Markov algorithm](https://en.wikipedia.org/wiki/Markov_algorithm).
## Solution
The command `p` prints the current string.
It is useful for reversing.
Inserting a line of `p` after a line of `t:`, we can see the string for each substitution.
First, let us tackle `1II11IIll11I...` (ones, large `I`s and small `l`s).
![](https://i.imgur.com/P5SU2uf.png)
Since all of them are in the form of `1[Il]+1`, replace them to single characters.
Note that if we use a character `x`, we must remove a rule `s/x/1IlIl11I...` to avoid conflicts.
solve1.py:
```python=
R = [
("1IlI1" , "#"),
("1I1" , "-"),
("1IIlI1", "<"),
("1l1" , ">"),
("1IllI1", "@"),
("1III1" , "a"),
("1IIII1", "b"),
("1IIIl1", "f"),
("1II1" , "t"),
("1Ill1" , "x"),
("1Illl1", "y"),
("1IlIl1", "0"),
("1IIll1", "1"),
]
for line in open("checker.sed"):
line = line[:-1]
remove = False
for _k, v in R:
if line.startswith(f"s/{v}/"):
remove = True
if remove:
line = "#"+line
for k, v in R:
line = line.replace(k, v)
print(line)
if line==":t":
print("p")
```
```
$ python3 solve1.py > checker2.sed
```
checker2.sed:
```
#!/bin/sed -f
# Check flag format
# Some characters are used internally
/^SECCON{[02-9A-HJ-Z_a-km-z]*}$/!{
cINVALID FORMAT
b
}
:t
p
s/y0/0y/;tt
s/ya/ay/;tt
s/x0/0x/;tt
s/y>/>y/;tt
s/xb/bx/;tt
s/xa/ax/;tt
s/x1/1x/;tt
s/y1/1y/;tt
s/yb/by/;tt
s/x>/>x/;tt
s/Gt/GRt/;tt
s/t1111000t1/t1111001t/;tt
s/t10010t0/f/;tt
```
The replacements are selected in my knowledge as the author.
However, you can guess `0` and `1` from the following rules.
They are binary representations of the ASCII codes.
```
:
s/v/01110110/;tt
s/d/01100100/;tt
s/R/01010010/;tt
s/3/00110011/;tt
:
```
See the following rules.
```
s/Gt/GRt/;tt
s/Wf/WRf/;tt
s/Ot/ONt/;tt
s/Of/ONf/;tt
s/Nf/NG/;t
s/Rf/ROf/;tt
s/Tt/TS/;t
s/Rt/RAt/;tt
s/Ct/COt/;tt
s/Nt/NGt/;tt
s/At/ATt/;tt
s/f/Wf/;tt
s/t/Ct/;tt
```
They generate `WRONG` from `f` and `CONGRATS` from `t` and exit.
We can see `f` means that the input is wrong and `t` means the input is correct.
Most of the rules are the forms of `s/t bin(n) t[01]/t bin(n+1) t/` or the forms of `s/t bin(n) t[01]/f/` where `bin(n)` is the binary representation of `n`.
```
s/t0t1/t1t/;tt
s/t0t0/f/;tt
s/t1t1/t10t/;tt
s/t1t0/f/;tt
s/t10t0/t11t/;tt
s/t10t1/f/;tt
s/t11t1/t100t/;tt
s/t11t0/f/;tt
:
```
They check that the encrypted input is `1101...`.
Let us extract the encrypted input.
solve2.py:
```python=
import re
r = re.compile(r"^s/t([01]+)t([01])/t[01]+t/;tt$")
target = {}
for line in open("checker2.sed"):
line = line[:-1]
m = r.match(line)
if m:
target[int(m.group(1), 2)] = m.group(2)
print("target:", "".join(target[k] for k in range(len(target))))
```
```
$ python3 solve2.py
target: 110101101111111011101100110001110111101011110100101101011011110011000101001001011111000101001011100000100101001111001111111011100100001001011100110010000011111101110110110100000111000010101101
```
These rules simply convert the input to binary representation.
```
:
s/v/01110110/;tt
s/d/01100100/;tt
s/R/01010010/;tt
s/3/00110011/;tt
:
```
The rest is the main encryption process.
```
s/y0/0y/;tt
s/ya/ay/;tt
s/x0/0x/;tt
s/y>/>y/;tt
s/xb/bx/;tt
s/xa/ax/;tt
s/x1/1x/;tt
s/y1/1y/;tt
s/yb/by/;tt
s/x>/>x/;tt
s/01-/00/;tt
s/#a/0#/;tt
s/f>/f/;tt
s/t11000000t>/t/;tt
s/y/b/;tt
s/>0/0>/;tt
s/f1/f/;tt
s/#b/1#/;tt
s/>1/1>/;tt
s/x/a/;tt
s/@11>/>x/;tt
s/@01>/>y/;tt
s/@011/@x11/;tt
s/@00>/>x/;tt
s/@010/@y10/;tt
s/@100/@y00/;tt
s/@111/@y11/;tt
s/@10>/>y/;tt
s/<>/<#/;tt
s/t11000000t/f/;tt
s/#/>/;tt
s/f0/f/;tt
s/@001/@y01/;tt
s/0-/-1/;tt
s/@110/@x10/;tt
s/@101/@x01/;tt
s/@000/@x00/;tt
s/11-/10/;tt
s/1-//;tt
s/0</0-<@0/;tt
s/1</1-<@0/;tt
s/</t0t/;tt
s/^/1101<>/;tt
```
In Markov algorithms *cursors* play an import role.
Cursors move from left to right or from right to left and do something.
For example, in the following rules, `!` is the cursor and change `a` to `x` and `b` to `y`.
```
s/!a/x!/;tt
s/!b/y!/;tt
```
```
!abaa
x!baa
xy!aa
xyx!a
xyxx!
```
In the following, I will show how the input is substituted along with the applied rules.
### Step 1
Replace characters of the input with their binary representations.
```
s/u/01110101/;tt
s/6/00110110/;tt
s/_/01011111/;tt
s/p/01110000/;tt
:
```
```
FLAG
01000110LAG
0100011001001100AG
0100011001001100A01000111
01000110010011000100000101000111
```
### Step 2
Add `1101<>`.
```
s/^/1101<>/;tt
```
```
01000110010011000100000101000111
1101<>01000110010011000100000101000111
```
### Step 3
Move `>` to the end.
```
s/>0/0>/;tt
s/>1/1>/;tt
```
```
1101<>01000110010011000100000101000111
1101<0>1000110010011000100000101000111
1101<01>000110010011000100000101000111
:
1101<0100011001001100010000010100011>1
1101<01000110010011000100000101000111>
```
### Step 4
Pop up `-` and `@0` if there is `0` or `1` to the left of `<`, otherwise go to step 12.
```
s/0</0-<@0/;tt
s/1</1-<@0/;tt
```
```
1101<01000110010011000100000101000111>
1101-<@001000110010011000100000101000111>
```
### Step 5
Decrement the number to the left of `<`.
```
s/01-/00/;tt
s/11-/10/;tt
s/0-/-1/;tt
s/1-//;tt
```
```
1101-<@001000110010011000100000101000111>
1100<@001000110010011000100000101000111>
```
In the next loop, this step is as follows.
```
1100-<@...
110-1<@...
11-11<@...
1011<@...
```
Note that rules of this step are interlaced with rules of the following steps and substitutions are done simultaneously.
### Step 6
Generate `x` if the number of `1`s in the three characters to the right of `@` is even, otherwise generate `y`.
```
s/@000/@x00/;tt
s/@001/@y01/;tt
s/@010/@y10/;tt
s/@011/@x11/;tt
s/@100/@y00/;tt
s/@101/@x01/;tt
s/@110/@x10/;tt
s/@111/@y11/;tt
s/@00>/>x/;tt
s/@01>/>y/;tt
s/@10>/>y/;tt
s/@11>/>x/;tt
```
```
1100<@001000110010011000100000101000111>
1100<@y01000110010011000100000101000111>
```
### Step 7
Move `x` and `y` to the end.
```
s/x0/0x/;tt
s/y0/0y/;tt
s/x1/1x/;tt
s/y1/1y/;tt
s/x>/>x/;tt
s/y>/>y/;tt
s/xa/ax/;tt
s/ya/ay/;tt
s/xb/bx/;tt
s/yb/by/;tt
```
```
1100<@y01000110010011000100000101000111>
1100<@0y1000110010011000100000101000111>
1100<@01y000110010011000100000101000111>
1100<@010y00110010011000100000101000111>
:
1100<@01000110010011000100000101000111y>
1100<@01000110010011000100000101000111>y
```
### Step 8
Change `x` to `a` and `y` to `b`. Go to Step 6 if `0` or `1` remain.
```
s/x/a/;tt
s/y/b/;tt
```
```
1100<@01000110010011000100000101000111>y
1100<@01000110010011000100000101000111>b
```
### Step 9
Change `<>` to `<#`.
```
s/<>/<#/;tt
```
```
1100<>bbbabaabbbbbaababbbaaabbabbababa
1100<#bbbabaabbbbbaababbbaaabbabbababa
```
### Step 10
Move `#` to the end with changing `a` to `0` and `b` to `1`.
```
s/#a/0#/;tt
s/#b/1#/;tt
```
```
1100<#bbbabaabbbbbaababbbaaabbabbababa
1100<1#bbabaabbbbbaababbbaaabbabbababa
1100<11#babaabbbbbaababbbaaabbabbababa
1100<111#abaabbbbbaababbbaaabbabbababa
:
1100<1110100111110010111000110110101#a
1100<11101001111100101110001101101010#
```
### Step 11
Change `#` to `>`. Go to Step 4.
```
s/#/>/;tt
```
```
1100<11101001111100101110001101101010#
1100<11101001111100101110001101101010>
```
In Step 4 to Step 11, this algorithm computes the next generation of a [1-dimensional cellular automaton](https://en.wikipedia.org/wiki/Elementary_cellular_automaton) of rule 150.
Implementing 1-dimensional cellular automatons on some system prove that the system is Turing-complete.
Although I want to implement the [Rule 110](https://en.wikipedia.org/wiki/Rule_110), it is too hard to solve.
### Step 12
Change `<` to `t0t`, check whether the current string is the target string or not and show `WRONG` or `CONGRATS` as explained above.
```
:
s/</t0t/;tt
```
```
<11101001001000000100011000111010>
t0t11101001001000000100011000111010>
t1t1101001001000000100011000111010>
t10t101001001000000100011000111010>
f01001001000000100011000111010>
f1001001000000100011000111010>
:
f0>
f>
f
Wf
WRf
WROf
WRONf
WRONG
```
----
What we have to do is to get the initial state of the cellular automaton from the state of 13th (0b1101) generation.
Let $x_1$, $x_2$, $x_3$, ... be the current state and $y_1$, $y_2$, $y_3$, .. be the next state.
The value $x_{i+1}$ is uniquely determined from $x_{i-1}$, $x_i$ and $y_i$.
We can compute the first values of the state since the flag format is `SECCON{...}`.
By computing the first part of 13th generation and bringing back, we can get the flag.
solve3.py:
```python=
N = 13
target = "110101101111111011101100110001110111101011110100101101011011110011000101001001011111000101001011100000100101001111001111111011100100001001011100110010000011111101110110110100000111000010101101"
n = len(target)
flag = "SECCON{"+"X"*(n//8-8)+"}"
flag = "".join(bin(ord(f)+0x100).replace("0b1", "") for f in flag)
T = [list(flag)]
for i in range(N):
t = ["0"]+T[i]+["0"]
T += [[]]
for j in range(n):
T[i+1] += [str(t[j:j+3].count("1")%2)]
T[N] = list(target)
for i in range(N)[::-1]:
for j in range(1, n-1):
T[i][j+1] = str((T[i][j-1:j+1]+[T[i+1][j]]).count("1")%2)
flag = "".join(T[0])
flag = "".join(chr(int("".join(flag[i:i+8]), 2)) for i in range(0, n, 8))
print("flag:", flag)
```
```
$ python3 solve3.py
flag: SECCON{mARkOV_4Lg0Ri7hM}
$ echo "SECCON{mARkOV_4Lg0Ri7hM}" | sed -f checker.sed
CONGRATS
```
`SECCON{mARkOV_4Lg0Ri7hM}`