# UIUCTF 2020 - Bot Protection IV
>###### tags: `machine learning`
>[name=whysw@PLUS]
## Attachments
- problem
- [index.html](https://gist.github.com/YangSeungWon/cfd13aca223f5ac5cddd44d30998486a#file-index-html)
- [captchas.zip](https://drive.google.com/file/d/1jLqs7HnPI6YmYKjGHYqGKFJXkfuKaC8q/view?usp=sharing)
- [mc.ttf](https://gist.github.com/YangSeungWon/cfd13aca223f5ac5cddd44d30998486a#file-mc-ttf)
- writeup
- [Bot Protection IV.ipynb](https://gist.github.com/YangSeungWon/cfd13aca223f5ac5cddd44d30998486a#file-bot-protection-iv-ipynb)
Attachments are uploaded on [gist](https://gist.github.com/YangSeungWon/cfd13aca223f5ac5cddd44d30998486a) and [google drive](https://drive.google.com/file/d/1jLqs7HnPI6YmYKjGHYqGKFJXkfuKaC8q/view?usp=sharing).
## Challenge
```
When on website: +1 spam resistance +10 user annoyance
Gotta be fast! 500 in 10 minutes!
https://captcha.chal.uiuc.tf
Author: tow_nater
```
![](https://i.imgur.com/4O9Kr62.jpg)
As the comment in `index.html` hints, there is a `captchas.zip` file at https://captcha.chal.uiuc.tf/captchas.zip.
```html:index.html=2
<!--TODO: we don't need /captchas.zip anymore now that we dynamically create captchas. We should delete this file.-->
```
![](https://i.imgur.com/opr9P1O.png)
The archive contained 69696 PNG files, each paired with the correct answer to its captcha.
---
Additionally, these strange characters are the `Minecraft Enchantment Table Language`. The TTF font file was at https://captcha.chal.uiuc.tf/static/mc.ttf.
![](https://i.imgur.com/aii3KO7.png)
It is just a one-to-one correspondence with the alphabet, so after solving captchas for about an hour, I was able to recognize and type these characters in ~5 seconds. (which is not fast enough to get the FLAG!)
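
To see why ~5 seconds is too slow, here is a quick back-of-the-envelope calculation. It assumes the numbers from the challenge text (500 captchas in 10 minutes) and treats the ~5 seconds as the time per captcha; the variable names are only for illustration.

```python
# Time budget implied by the challenge text: 500 captchas in 10 minutes.
TARGET = 500
LIMIT_SECONDS = 10 * 60

budget = LIMIT_SECONDS / TARGET          # seconds available per captcha
human_seconds = 5                        # assumed manual solving speed
minutes_needed = TARGET * human_seconds / 60

print(f"budget per captcha: {budget} s")                   # 1.2 s
print(f"manual solving would take ~{minutes_needed:.1f} min")  # ~41.7 min
```

So a human would need roughly four times the allowed time, which is what pushed us toward automation.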
## Solution
### Machine Learning?
My teammates and I tried hard to find other web vulnerabilities, but failed.
So we thought that this challenge might be about machine learning...? (even though this chall is in the web category) In that case, `captchas.zip` must be the dataset for training.
Each captcha has 5 characters, so I searched GitHub for TensorFlow code that does OCR on more than one character at once.
https://github.com/JackonYang/captcha-tensorflow
And here it is!
---
### Adapt the GitHub code to this challenge
#### Change Variables
The original code on GitHub solves captchas made of **4 digits**.
We are dealing with 5 alphabet characters, so we changed the constants as below.
>Previous:
>```=
>H, W, C = 100, 120, 3
>N_LABELS = 10
>D = 4
>```
Changed to:
```=
H, W, C = 75, 250, 3
N_LABELS = 26
D = 5
```
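
With `D = 5` and `N_LABELS = 26`, each answer becomes a 5 x 26 one-hot target, matching the `Reshape((D, N_LABELS))` at the end of the model. Below is a minimal sketch of that encoding in plain Python. The `encode` and `decode` helpers and the choice of uppercase A-Z are assumptions for illustration, not the notebook's actual code.

```python
import string

N_LABELS = 26
D = 5
ALPHABET = string.ascii_uppercase  # assumption: labels map to A-Z in order


def encode(label):
    """Turn a D-letter answer into a D x N_LABELS one-hot matrix (nested lists)."""
    if len(label) != D:
        raise ValueError(f"expected {D} characters, got {len(label)}")
    rows = []
    for ch in label.upper():
        row = [0] * N_LABELS
        row[ALPHABET.index(ch)] = 1
        rows.append(row)
    return rows


def decode(rows):
    """Inverse of encode: take the index of the largest entry in each row."""
    letters = []
    for row in rows:
        best = max(range(N_LABELS), key=lambda i: row[i])
        letters.append(ALPHABET[best])
    return "".join(letters)


print(decode(encode("HELLO")))  # HELLO
```

The same `decode` idea works on model output, since each row of the prediction holds one score per letter.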
#### Increase Accuracy
At first, we used exactly the same layer settings as that code, but it failed at least once in 10 trials.
```python=
input_layer = tf.keras.Input(shape=(H, W, C))
x = layers.Conv2D(32, 3, activation='relu')(input_layer)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Conv2D(64, 3, activation='relu')(x)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Conv2D(64, 3, activation='relu')(x)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Flatten()(x)
x = layers.Dense(1024, activation='relu')(x)
# x = layers.Dropout(0.5)(x)
x = layers.Dense(D * N_LABELS, activation='softmax')(x)
x = layers.Reshape((D, N_LABELS))(x)
```
To improve it, we first removed one layer,
```python
x = layers.Conv2D(64, 3, activation='relu')(x)
x = layers.MaxPooling2D((2, 2))(x)
```
but that made the situation worse...
So instead we added one more layer to the original setting!
```python
input_layer = tf.keras.Input(shape=(H, W, C))
x = layers.Conv2D(32, 3, activation='relu')(input_layer)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Conv2D(64, 3, activation='relu')(x)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Conv2D(64, 3, activation='relu')(x)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Conv2D(64, 3, activation='relu')(x)
x = layers.MaxPooling2D((2, 2))(x)
```
The output was awesome. We rarely failed! But this is not the end.
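
For reference, here is a small sketch of how the feature map shrinks as `Conv2D` + `MaxPooling2D` blocks are stacked. It assumes the Keras defaults used above (3x3 kernels with `padding='valid'`, 2x2 pooling) and only illustrates tensor sizes; it does not by itself explain the change in accuracy. `conv_pool_shape` is a hypothetical helper.

```python
def conv_pool_shape(h, w, blocks, kernel=3, pool=2):
    """Spatial size after `blocks` rounds of Conv2D(valid) + MaxPooling2D."""
    for _ in range(blocks):
        h, w = h - (kernel - 1), w - (kernel - 1)  # 3x3 valid conv trims 2 px
        h, w = h // pool, w // pool                # 2x2 pooling floors the half
    return h, w


H, W = 75, 250
FILTERS = 64  # the last conv layer has 64 filters in every variant above

for blocks in (2, 3, 4):
    h, w = conv_pool_shape(H, W, blocks)
    print(blocks, (h, w), h * w * FILTERS)
# 2 (17, 61) 66368
# 3 (7, 29) 12992
# 4 (2, 13) 1664
```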
---
### Retrying + Human Learning
There is no penalty for a wrong answer, so we can simply try again right after a failure. Because the model ends in softmax, we were able to sort candidate answers by probability.
```python=
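import base64
from io import BytesIO
from itertools import product

import numpy as np
import tensorflow as tf
from PIL import Image

# `data` is the base64-encoded captcha PNG; `model` is the trained network.
im = Image.open(BytesIO(base64.b64decode(data)))
data = np.array([np.array(im) / 255.0])  # batch of one, scaled to [0, 1]
y_pred = model.predict_on_batch(data)
res = tf.math.top_k(y_pred, k=3)  # top 3 labels for each character position
prob = np.array(res[0][0])
indices = np.array(res.indices[0])
l = []
beta = prob[0].size       # candidates per position (3)
beka = prob.size // beta  # number of positions (5)
for i in range(beka):
    k = []
    for j in range(beta):
        k.append([indices[i][j], prob[i][j]])
    l.append(k)
wasm = list(product(*l))  # every combination: 3**5 = 243 candidates
def f(x):
    # Score a candidate by the sum of its per-character probabilities.
    s = 0
    for i in x:
        s += i[1]
    return s
res = sorted(wasm, key=f, reverse=True)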
```
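
The same ranking idea can be tried without a model. The sketch below is a standalone toy version in plain Python: `rank_candidates` is a hypothetical helper, the index-to-letter mapping (0 is `A`) is an assumption, and candidates are scored by the sum of probabilities as in the code above.

```python
from itertools import product


def rank_candidates(probs, k=3):
    """probs holds one probability list per character position.

    Returns candidate strings, most promising first.
    """
    per_position = []
    for row in probs:
        # Indices of the k most probable labels at this position.
        top = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        per_position.append([(i, row[i]) for i in top])
    combos = product(*per_position)
    # Same scoring as above: sum of the per-character probabilities.
    ranked = sorted(combos, key=lambda c: sum(p for _, p in c), reverse=True)
    return ["".join(chr(ord("A") + i) for i, _ in c) for c in ranked]


# Toy input: 2 positions, 3 possible labels each.
toy = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]]
print(rank_candidates(toy, k=2))  # ['AB', 'AC', 'BB', 'BC']
```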
---
This challenge uses a session cookie to track the 10-minute limit. This means we can open multiple windows with the same cookie.
So we opened another window and used it in emergencies.
```python=
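# `s`, `toCh`, `solve_captcha` and `get_img` are defined elsewhere (see the notebook).
def send(res):
    # Submit the 30 most likely candidates until one is accepted.
    for arr in res[:30]:
        trial = ""
        for pair in arr:
            trial += toCh(pair[0])
        r = s.post("https://captcha.chal.uiuc.tf/", data={"captcha": trial})
        ret = r.text.split('<h2>')[1].split('</h2>')[0]
        print(ret)
        if ret != "Invalid captcha":
            return True
    return False

while True:
    im, res = solve_captcha(get_img())
    if not send(res):
        # All 30 guesses failed: pause so a human can solve this one.
        input("ALEEEEEEEEEEEEEEEEEEEEEERT!!!!!!!!!!!!!")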
```
When all 30 tries fail, we type the answer manually in the other window, then press Enter in Python to continue.
AND WE GOT...
![](https://i.imgur.com/FoHtYs4.png)
![](https://i.imgur.com/JGbOiCf.jpg)
output : `uiuctf{i_knew_a_guy_in_highschool_that_could_read_this}`
> p.s. Now I can read this too! haha - whysw