UIUCTF 2020 - Bot Protection IV

tags: machine learning

whysw@PLUS

Attachments

Attachments are uploaded on gist and google drive.

Challenge

When on website: +1 spam resistance +10 user annoyance

Gotta be fast! 500 in 10 minutes!

https://captcha.chal.uiuc.tf

Author: tow_nater

As a comment in index.html reveals, there is a captchas.zip file at https://captcha.chal.uiuc.tf/captchas.zip.

<!--TODO: we don't need /captchas.zip anymore now that we dynamically create captchas. We should delete this file.-->

The archive contained 69696 PNG files, each with the correct answer to its captcha.


Additionally, these strange characters are the Minecraft enchantment table language. The TTF file was at https://captcha.chal.uiuc.tf/static/mc.ttf.

It is just a one-to-one correspondence with the alphabet, so after doing captchas for about an hour, I was able to recognize and type these characters in ~5 seconds. (Which is not fast enough to get the flag!)
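To see how far off that is, here is the arithmetic as a small Python sketch. It uses the limits from the challenge description (500 captchas in 10 minutes) and assumes the ~5 seconds is per captcha.

```python
# Limits from the challenge description: 500 captchas in 10 minutes.
REQUIRED = 500
TIME_LIMIT_S = 10 * 60

# Assumption: typing by hand takes about 5 seconds per captcha.
HUMAN_S_PER_CAPTCHA = 5

budget_s = TIME_LIMIT_S / REQUIRED                      # seconds allowed per captcha
human_total_min = REQUIRED * HUMAN_S_PER_CAPTCHA / 60   # minutes needed by hand
human_solved = TIME_LIMIT_S // HUMAN_S_PER_CAPTCHA      # captchas solved by hand in time

print(f"budget per captcha: {budget_s:.1f} s")
print(f"by hand: {human_total_min:.1f} min for {REQUIRED}, or {human_solved} within the limit")
```

So each captcha has to be fetched, solved, and submitted in roughly 1.2 seconds on average, about four times faster than typing by hand under this assumption.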

Solution

Machine Learning?

My teammates and I tried hard to find other web vulnerabilities, but failed.
So we wondered whether this challenge might actually be about machine learning (even though it is in the web category). In that case, captchas.zip must be the training dataset.

Each captcha has 5 characters, so I searched GitHub for TensorFlow code that does OCR on multi-character images.

https://github.com/JackonYang/captcha-tensorflow

And here it is!


Adapt github code to this challenge

Change Variable

The original code on GitHub solves captchas with 4 digits.
We are dealing with 5 alphabetic characters, so we changed the variables as below.

Previous:

H, W, C = 100, 120, 3
N_LABELS = 10
D = 4

Changed to:

H, W, C = 75, 250, 3
N_LABELS = 26
D = 5
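With these settings each answer becomes a D x N_LABELS (5 x 26) target. A minimal sketch of how a 5-letter answer could be mapped to that shape and back, assuming the answers use the letters A-Z (the case is an assumption). The `encode` and `decode` names are hypothetical and not taken from the GitHub code.

```python
import string

N_LABELS = 26
D = 5
ALPHABET = string.ascii_uppercase  # assumption: answers are the letters A-Z

def encode(answer):
    """Turn a D-letter answer into a D x N_LABELS one-hot matrix (nested lists)."""
    if len(answer) != D:
        raise ValueError(f"expected {D} characters, got {len(answer)}")
    rows = []
    for ch in answer.upper():
        row = [0] * N_LABELS
        row[ALPHABET.index(ch)] = 1  # raises ValueError for non-letters
        rows.append(row)
    return rows

def decode(rows):
    """Pick the highest-scoring class in each row and map it back to a letter."""
    return "".join(
        ALPHABET[max(range(N_LABELS), key=row.__getitem__)] for row in rows
    )

print(decode(encode("HELLO")))
```

`decode` works on model output as well as on one-hot rows, since it only looks for the largest value in each row.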

Increase Accuracy

At first, we used exactly the same layer settings as that code, but the model failed at least once in every 10 trials.

input_layer = tf.keras.Input(shape=(H, W, C))
x = layers.Conv2D(32, 3, activation='relu')(input_layer)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Conv2D(64, 3, activation='relu')(x)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Conv2D(64, 3, activation='relu')(x)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Flatten()(x)
x = layers.Dense(1024, activation='relu')(x)
# x = layers.Dropout(0.5)(x)
x = layers.Dense(D * N_LABELS, activation='softmax')(x)
x = layers.Reshape((D, N_LABELS))(x)

To improve it, we first tried removing one Conv2D/MaxPooling2D pair,

x = layers.Conv2D(64, 3, activation='relu')(x)
x = layers.MaxPooling2D((2, 2))(x)

but that made the results worse.

So instead we added one more pair to the original network!

input_layer = tf.keras.Input(shape=(H, W, C))
x = layers.Conv2D(32, 3, activation='relu')(input_layer)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Conv2D(64, 3, activation='relu')(x)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Conv2D(64, 3, activation='relu')(x)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Conv2D(64, 3, activation='relu')(x)
x = layers.MaxPooling2D((2, 2))(x)
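One way to see what the extra block changes is to track the feature-map size. The sketch below applies the usual output-size rules to the 75 x 250 input: a 3x3 convolution with Keras's default `padding='valid'` shrinks each side by 2, and a 2x2 max pool halves each side, rounding down. The function name is made up for illustration, and it assumes the Flatten and Dense layers stay as before.

```python
def flattened_size(h, w, conv_filters):
    """Shape and size of the Flatten() output after a stack of
    Conv2D(3x3, valid) + MaxPooling2D(2x2) blocks."""
    channels = None
    for channels in conv_filters:
        h, w = h - 2, w - 2      # 3x3 convolution, no padding
        h, w = h // 2, w // 2    # 2x2 max pooling
    return h, w, channels, h * w * channels

H, W = 75, 250
print(flattened_size(H, W, [32, 64, 64]))       # original: three blocks
print(flattened_size(H, W, [32, 64, 64, 64]))   # with the extra block
```

Under these rules the three-block network flattens to 7 x 29 x 64 = 12992 values and the four-block one to 2 x 13 x 64 = 1664, so the Dense(1024) layer has far fewer weights to fit. Whether that explains the accuracy gain is only a guess.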

The output was awesome. We rarely failed! But this is not the end.


Retrying + Human Learning

There is no penalty when we fail, which means we can try again right after a wrong answer. Because the model ends in a softmax, we were able to sort candidate answers by probability.

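import base64
from io import BytesIO
from itertools import product

import numpy as np
import tensorflow as tf
from PIL import Image

# Decode the base64 PNG, scale pixels to [0, 1], and add a batch dimension.
im = Image.open(BytesIO(base64.b64decode(data)))
data = np.array([np.array(im) / 255.0])
y_pred = model.predict_on_batch(data)

# Top 3 labels (and their scores) for each character position.
res = tf.math.top_k(y_pred, k=3)
prob = np.array(res[0][0])
indices = np.array(res.indices[0])

l = []
beta = prob[0].size        # candidates per position (3)
beka = prob.size // beta   # number of positions (5)
for i in range(beka):
    k = []
    for j in range(beta):
        k.append([indices[i][j], prob[i][j]])
    l.append(k)

# Every combination of per-position candidates: 3**5 = 243 guesses.
wasm = list(product(*l))

def f(x):
    # Score a guess by the sum of its per-character scores.
    s = 0
    for i in x:
        s += i[1]
    return s

res = sorted(wasm, key=f, reverse=True)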
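The same ranking idea in a form that runs without the model: a sketch with made-up scores for a 2-character captcha and the top 2 candidates per position. The function name and the numbers are illustrative only.

```python
from itertools import product

def rank_guesses(candidates):
    """candidates[i] is a list of (letter, score) pairs for position i.
    Returns every combination as (guess, total_score), best first."""
    ranked = []
    for combo in product(*candidates):
        guess = "".join(letter for letter, _ in combo)
        total = sum(score for _, score in combo)
        ranked.append((guess, total))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked

# Made-up scores: position 0 is confident, position 1 is a close call.
candidates = [
    [("A", 0.9), ("B", 0.1)],
    [("X", 0.5), ("Y", 0.4)],
]
for guess, total in rank_guesses(candidates):
    print(guess, round(total, 2))
```

The second-best guess changes only the position the model was unsure about, which is why retrying down this list pays off.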

This challenge uses the session cookie to track the time limit, which means we can open multiple windows with the same cookie.
So we opened another window to use in emergencies.

def send(res):
    for arr in res[:30]:
        trial = ""
        for pair in arr:
            trial += toCh(pair[0])
        r = s.post("https://captcha.chal.uiuc.tf/", data = {"captcha":trial})
        ret = r.text.split('<h2>')[1].split('</h2>')[0]
        print(ret)
        if ret != "Invalid captcha":
            return True
    return False

while True:
    im, res = solve_captcha(get_img())
    if not send(res):
        input("ALEEEEEEEEEEEEEEEEEEEEEERT!!!!!!!!!!!!!")

When all 30 tries fail, we type the answer manually, then press Enter in the Python console to continue.
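One fragile spot in `send` is the `split('<h2>')[1]` expression, which raises an IndexError if a response has no `<h2>` element (an error page, for instance). A more defensive variant might look like the sketch below. `extract_h2` is a hypothetical helper, and the sample HTML strings are made up.

```python
import re

H2_RE = re.compile(r"<h2>(.*?)</h2>", re.DOTALL)

def extract_h2(html):
    """Return the stripped text of the first <h2> element, or None if there is none."""
    match = H2_RE.search(html)
    return match.group(1).strip() if match else None

print(extract_h2("<html><h2>Invalid captcha</h2></html>"))
print(extract_h2("<html>502 Bad Gateway</html>"))
```

A `None` result could then be treated like a failed attempt and retried, instead of crashing the loop in the middle of a timed run.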

AND WE GOT

output : uiuctf{i_knew_a_guy_in_highschool_that_could_read_this}

p.s. Now I can read this too! haha - whysw