# s/<script>//gi (misc, 115 teams solved)
###### tags: `SECCON CTF 2021`
## Overview
> Can you figure out why `s/<script>//gi` is insufficient for sanitizing?
> This can be bypassed with `<scr<script>ipt>`.
>
> Remove `<script>` (case insensitive) from the input until the input contains no `<script>`.
In this problem, we are required to speed up collect sanitizing since the size of flag.txt is 64 MiB.
## Bad approach
We can do that with `sed` or favorite programming languages.
```
$ echo 'S3CC0N{dum<scr<script>ipt>my}' | sed -e ':loop; s/<script>//gi; tloop'
S3CC0N{dummy}
$ cat small.txt | sed -e ':loop; s/<script>//gi; tloop'
S3CC0N{dummy_flag>_<_pt>>PT><sCRIp<scr<scr<scr!pt>ipt>ipt>}
```
However, for flag.txt, this approach takes very long time and will not finish before the end of the CTF.
```
$ cat flag.txt | sed -e ':loop; s/<script>//gi; tloop'
```
Let $n$ be the size of input.
Removing `<script>` once takes $O(n)$.
We have to remove `<script>` at most $O(n)$ times.
So, overall execution time is $O(n^2)$, which is too long for 64 MiB.
## Good approach
To reduce the time complexity, we can use a [stack](https://en.wikipedia.org/wiki/Stack_\(abstract_data_type\)).
Push each character of the input and if the top of the stack is `<script>`, pop it.
What is left in the stack will be the answer.
Since push and pop take $O(1)$ time, overall execution time is $O(n)$.
Note that `stack` must be a `list`, not a `string`.
If `stack` is `string`, a single instruction `stack = stack[-8:]` takes $O(n)$ time.
```python=
import sys
with open(sys.argv[1]) as f:
data = f.read()[:-1]
stack = []
for i, c in enumerate(data):
if i%0x10000==0:
print(f"{i/len(data)*100:6.2f} %")
stack += [c]
if "".join(stack[-8:]).lower()=="<script>":
for i in range(8):
stack.pop()
print("".join(stack))
```
```
$ time python3 solve.py flag.txt
0.00 %
0.10 %
0.20 %
0.29 %
:
99.71 %
99.80 %
99.90 %
100.00 %
SECCON{sanitizing_is_not_so_good><_escaping_is_better_iPt><SCript<ScrIpT<scRIp<scRI<Sc<scr!pt>}
real 0m27.691s
user 0m27.424s
sys 0m0.140s
```