# UIUCTF - Bot Protection IV Writeup
by nthistle with DiceGang
Challenge:
We start off by heading to https://captcha.chal.uiuc.tf, where we're presented with a Minecraft enchanting table and some weird text. At this point we can already make the guess that this is going to have to do with CAPTCHAs written in Minecraft Enchanting Language.
Sure enough, we check the source, and see this lovely comment at the top:
Right, then we'll just download /captchas.zip, and– oh. It's 1GB. And they claim they're creating the captchas dynamically. Oh no. At this point, I'm already starting to suspect that they want us to use Machine Learning to crack their CAPTCHAs automatically. The 1GB zip turns out to be full of labelled CAPTCHAs, which looks suspiciously like a training set.
Of course, you should never trust what they tell you in a CTF. It's worth checking that they are in fact generating CAPTCHAs dynamically, and not just randomly pulling from `captchas.zip`. However, by this point I was already fairly certain they wanted us to use ML, so a teammate did this check for me while I started on my approach (spoiler: they were, in fact, generating them dynamically).

I decided to use PyTorch, simply because it's what I have installed at the moment (although I've also used Keras to similar effect before). The initial work is fairly straightforward: it's just importing the data and transforming it into a format that we can work with:
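The original loading code is not reproduced here, but a minimal sketch of that step might look like the following (the directory layout, the filename-as-label convention, and the image format are assumptions on my part):

```python
# Sketch of the data-loading step. Assumption: the extracted zip is a flat
# directory of PNGs whose filenames are their 5-character labels, e.g. ABCDE.png.
import os

import numpy as np
import torch
from PIL import Image

CAPTCHA_DIR = "captchas"  # assumed extraction path


def load_captchas(path=CAPTCHA_DIR):
    images, labels = [], []
    for fname in sorted(os.listdir(path)):
        if not fname.endswith(".png"):
            continue
        label = fname[:-4]  # "ABCDE.png" -> "ABCDE"
        img = Image.open(os.path.join(path, fname)).convert("L")  # grayscale
        arr = np.asarray(img, dtype=np.float32) / 255.0           # scale to [0, 1]
        images.append(arr)
        labels.append([ord(c) - ord("A") for c in label])          # chars -> 0..25
    # (N, 1, H, W) image tensor plus (N, 5) integer label tensor
    return torch.tensor(np.stack(images)).unsqueeze(1), torch.tensor(labels)
```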
Once we have this working, we have an important decision to make: the network architecture. Of course, since we're working with image data, we'll naturally use some kind of Convolutional Neural Network, but we need to decide things like the input shape and the specific sizes of layers. Like any experienced machine learning practitioner, I naturally picked something completely random for my architecture.
However, I also made a critical mistake here. That is, I decided to make a single-character classifier and use a sliding-window approach. This is commonly how OCR approaches work: since not all words have a fixed size, you train on individual characters and then "slide" the classifier over your image to read out an entire sentence.
…but you only need this if you're recognizing variable numbers of characters, or have a large amount of sequential text to recognize. These CAPTCHAs are always 5 characters long, so we could just use a single pass (through a larger neural network) hardcoded to always detect 5 character segments.
Oh well. Hindsight is 20/20.
Anyways, here's the architecture I chose to use:
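A sketch reconstructed from the description that follows; the input size (one 40x40 grayscale character crop) and the no-padding convolutions are my assumptions:

```python
# CNN sketch matching the prose description: two 3x3 convs (16, 32 filters),
# 2x2 max pool, a 3x3 conv (64 filters), flatten, then FC layers of 64 and 26.
# Input size of 1x40x40 per character crop is an assumption.
import torch
import torch.nn as nn


class CaptchaNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3), nn.LeakyReLU(),   # 40x40 -> 38x38
            nn.Conv2d(16, 32, 3), nn.LeakyReLU(),  # 38x38 -> 36x36
            nn.MaxPool2d(2),                       # 36x36 -> 18x18
            nn.Conv2d(32, 64, 3), nn.LeakyReLU(),  # 18x18 -> 16x16
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, 64), nn.LeakyReLU(),
            nn.Linear(64, 26),
            nn.LogSoftmax(dim=1),                  # log-probabilities over A-Z
        )

    def forward(self, x):
        return self.net(x)
```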
It basically consists of two 3x3 convolutional layers (of 16 and 32 filters, respectively), then a 2x2 max pooling layer, followed by another 3x3 convolution (with 64 filters), a flattening, and then two fully connected layers with 64 and 26 neurons each (the latter is our output layer). Like I said, arbitrary. An aside: I personally love LeakyReLU as an activation function (I've been burned by ReLU before), so I use it everywhere. You'll find that many ML practitioners have their own "tricks" of questionable efficacy, and this is mine. Of course, the final layer's activation function is a Softmax (we use `LogSoftmax` because that's what you're supposed to do in PyTorch, I guess).

Great, now we have the next hurdle: how do we get training data? "But we have training data," you thought just now. Wrong. We have 5-character CAPTCHA training data, not single-character training data. Sure, so we'll just split each CAPTCHA into– oh, wait. That's right, we don't know where each character ends and the next one begins. We could just split each CAPTCHA into 5 segments of equal width horizontally, but that has… questionable results.
This is where I employ an advanced technique known as "knowing when to not care". Here's the idea: sure, we'll get some bad splits. But we'll also get a good number of good splits. If we train our network on both bad and good splits, it'll struggle to learn the bad ones, but will presumably do well on the good ones. This will deflate training accuracy a little, but won't matter much in the end, because we'll only do inference (test-time prediction) on good images.
This was my second mistake. In principle, my logic was sound, but I forgot that I would be doing the sliding-window approach for inference. This is a problem, because I need the network to have low confidence in between characters for the sliding window to be able to tell where new characters begin. If I'm training on bad data that consists of the "in between characters" region… well, let's just say it made my life harder than it needed to be.
Anyways, armed with our faulty logic, we're ready to add this step to the input processing:
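The naive equal-width split can be sketched in a few lines (the exact image dimensions are assumptions):

```python
# Naive splitting step: cut each W-pixel-wide CAPTCHA into 5 equal-width
# character crops. Some crops will straddle two characters -- per the
# "knowing when to not care" principle, we accept that.
import torch


def split_captcha(img):
    """img: tensor of shape (1, H, W); returns a list of 5 (1, H, W//5) crops."""
    w = img.shape[-1] // 5
    return [img[..., i * w:(i + 1) * w] for i in range(5)]
```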
Finally, we're ready for the main training process:
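A minimal version of such a loop might look like this (the optimizer, learning rate, and the epoch/minibatch counts are assumptions; `model`, `train_x`, and `train_y` are the pieces built in the earlier steps):

```python
# Training-loop sketch: NLLLoss pairs with the model's LogSoftmax output, and
# each "epoch" is just a fixed number of randomly sampled minibatches.
import torch
import torch.nn as nn


def train(model, train_x, train_y, epochs=30, batches_per_epoch=20, batch_size=64):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.NLLLoss()
    for epoch in range(epochs):
        total, correct, loss_sum = 0, 0, 0.0
        for _ in range(batches_per_epoch):
            idx = torch.randint(0, len(train_x), (batch_size,))
            x, y = train_x[idx], train_y[idx]
            opt.zero_grad()
            out = model(x)
            loss = loss_fn(out, y)
            loss.backward()
            opt.step()
            loss_sum += loss.item()
            correct += (out.argmax(dim=1) == y).sum().item()
            total += batch_size
        print(f"epoch {epoch}: loss={loss_sum / batches_per_epoch:.4f} "
              f"acc={correct / total:.3f}")
```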
Note that I'm using the term "epoch" very loosely here. Traditionally, in machine learning, one "epoch" is an entire pass over the training dataset. However, I typically work on problems where the dataset space is very large (or can be sampled randomly), so I just use an "epoch" as a count of minibatches, quantizing the training however we see fit (also, I use a ridiculously small number of minibatches per epoch here; I just wanted to see nice training statistics print more often).
Other than that, this training process is fairly standard – you can probably find something similar if you google "PyTorch Image Classification Tutorial". Another aside: initially I didn't bother splitting my data into training/testing, because I figured the model was simple enough and I had enough data that overfitting wouldn't really be a problem, but I mentioned what I was doing in our team chat, and someone else in the team who works with machine learning nagged me to make a train/test split. The code for that is fairly straightforward:
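Something along these lines (the 90/10 split ratio is an assumption):

```python
# Simple random train/test split over pre-loaded tensors.
import torch


def train_test_split(x, y, test_frac=0.1):
    perm = torch.randperm(len(x))
    n_test = int(len(x) * test_frac)
    test_idx, train_idx = perm[:n_test], perm[n_test:]
    return x[train_idx], y[train_idx], x[test_idx], y[test_idx]
```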
Now we're ready to actually train our model, and…
By 30 epochs (which is only `30 * 20 * 64 = 38k` individual images), we're already seeing accuracy around 90%, which is good enough for government work. Now, we need to actually implement sliding-window inference.

Again, rather than consult any established literature on the topic, I decided to roll my own sliding-window approach, because what could go wrong? Well, in short, a lot. I ended up with a lot of very messy code, but for reference, the basic slide-and-get-confidence routine looked like this:
From here I did a variety of things, including de-duplicating `out` (although this doesn't work if the captcha really was, say, `DVVXR`) and thresholding the confidence scores. Unfortunately, courtesy of that Mistake #2 I mentioned earlier, the confidence scores were very messy. I was able to get decent results by smoothing the confidence scores with a 1D Gaussian approximation and then taking local maxima of that, but it was way messier than it needed to be.

The next step was to test against live: for this, I chose to use Selenium. My reasons for choosing Selenium were twofold: (1) I anticipated having to make several manual corrections when my neural network was wrong, and a raw `requests` solution would be less user-friendly and probably take longer to show the image for manual solving, and (2) I had only recently started using Selenium, and I wanted some practice.

This loop continuously scrapes the CAPTCHA
`<img>` tag, saving the image locally as `cur_captcha.png`, which it can then open and format properly for the neural network to make a prediction. The `ans == last` tidbit is just to keep it from continually spamming the same answer when it's wrong (yes, it will fail if the next CAPTCHA happens to be the same, but I'm willing to accept a 1/(26^5) risk).

At this point, I thought my hard work was about to pay off with a juicy first blood (at this point, Bot Protection had no solves). This was also before any clarification posts were released about Bot Protection, so my team speculated that there were only 30 levels (I think the 10-minute time limit had been guessed by someone else on the team who was experimenting with it). After all, if you translate the text on the CAPTCHA page, you get "Level 0 is not high enough" (incidentally, this is just regular text using the Minecraft Enchanting Language font, so you can read it by copying it into Notepad), and the maximum level to enchant with in Minecraft is level 30.
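The scrape-predict-submit loop described above might be sketched like this (the element locators, the download method, and the `predict` interface are all assumptions; the `ans == last` guard is factored out so it can be shown on its own):

```python
# Sketch of the Selenium scrape-predict-submit loop. Locators and predict()
# are hypothetical; ans_is_new() is the `ans == last` guard from the writeup.
def ans_is_new(ans, last):
    """Only submit if this answer differs from the last (failed) one."""
    return ans != last


def solve_loop(predict):
    # Imported here so the guard above stays usable without a browser.
    import urllib.request
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()
    driver.get("https://captcha.chal.uiuc.tf")
    last = None
    while True:
        img = driver.find_element(By.TAG_NAME, "img")
        urllib.request.urlretrieve(img.get_attribute("src"), "cur_captcha.png")
        ans = predict("cur_captcha.png")  # neural-network guess, e.g. "ABCDE"
        if ans_is_new(ans, last):
            box = driver.find_element(By.TAG_NAME, "input")
            box.send_keys(ans + "\n")
            last = ans
```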
I fire up the script, and, woohoo, I hit level 30! …but it keeps going. I stopped around 41 because I was somewhat confused (and tired of doing CAPTCHAs manually). I realized that level 30 would probably actually be too low a target, since with a good mastery of Enchanting Language, you could probably get to level 30 by hand in 10 minutes (one team member was already hard at work learning it with this Quizlet). So, I try again. And then I hit 100, and… it keeps going. I got to around 110 or so before the 10-minute timer hit.
Note that it was considerable effort to get this far, since my script only had about an 85–90% success rate on the entire CAPTCHA, so I had to solve a little more than every tenth CAPTCHA by hand. At this point I decided it wasn't worth the effort of doing it by hand with no end in sight, so we used the Modmail to ask how many CAPTCHAs we had to do. Turns out they had received this question already, so they just decided to publicly announce: "Huge note on Bot Protection IV: You need to solve 500 captchas in 10 minutes."
Five.
Hundred.
CAPTCHAs.
To cut to the chase, I realized I needed to significantly up my automatic success rate. Naturally wanting to take the path of least resistance, I decided to "fuzz" my existing results. To do this, I looked at typical inputs that were giving my algorithm trouble. I won't go over every single thing that my "fuzzer" does, but a few examples:
With all this, I was able to amp up my average accuracy to ~95%, and I also played with the timings on Selenium in order to make it as fast as possible. I also rigged it up so that it would print every minute, so I would have a good idea if I was on "WR pace" or not. My approach from here was to just restart the run if I wasn't on CAPTCHA #100, ideally better, by the 2 minute mark (I need to average 50/minute to make 500/10 minutes).
I had one great run where I was at #130 at the 2 minute mark, but then Selenium suddenly died and threw an error. I had set the timing too aggressively, and it complained about not being able to find the input element, since it was looking before the pageload finished (there's probably a way to wait for pageload to finish, but I didn't bother looking for it). I scaled back the timing, sprinkled in some try-catch blocks, and set off again.
In the end, I think I got my "god run" within the first 5 attempts (that weren't restarted very early, anyways). My 1m and 2m splits were slightly above-average, and I fell back to average pace by 5m, but I got really lucky on the last half and ended up finishing over a minute ahead of pace. It's also worth noting that I started unintentionally memorizing Enchanting Language as I was doing this, to the point that I could recognize 50% of characters instantly, and another 30% without much trouble, only having to actually consult my pinned table ~20% of the time.
After much effort, we're rewarded with the flag:
uiuctf{i_knew_a_guy_in_highschool_that_could_read_this}
…I became that guy. Overall, this was a fun (if painful) challenge; much thanks to `tow_nater` for writing it!

Here's my full source code (warning: it is messy). For some reason I decided to do everything in the same file, hence the flag for whether to train or load the model from the saved file.