# UIUCTF - Bot Protection IV Writeup
by nthistle with DiceGang
**Challenge:**
```
When on website: +1 spam resistance +10 user annoyance
https://captcha.chal.uiuc.tf
Author: tow_nater
```
![](https://i.imgur.com/fgRckXY.png)
We start off by heading to https://captcha.chal.uiuc.tf, where we're presented with a Minecraft enchanting table and some weird text. At this point we can already make the guess that this is going to have to do with CAPTCHAs written in [Minecraft Enchanting Language](https://minecraft.gamepedia.com/Enchanting_Table#Standard_Galactic_Alphabet).
Sure enough, we check the source, and see this lovely comment at the top:
```
<!doctype html>
<!--TODO: we don't need /captchas.zip anymore now that we dynamically create captchas. We should delete this file.-->
<html>
<title>UIUCTF</title>
<link rel="stylesheet" href="/static/style.css">
<body>
<div class="bg"/>
...
```
Right, then we'll just download [/captchas.zip](https://captcha.chal.uiuc.tf/captchas.zip), and– oh. It's 1GB. And they claim they're creating the captchas dynamically. Oh no. At this point, I'm already starting to suspect that they want us to use [Machine Learning](https://en.wikipedia.org/wiki/Machine_learning) to crack their CAPTCHAs automatically. The 1GB zip turns out to be full of labelled CAPTCHAs, which looks _suspiciously_ like a [training set](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets).
Of course, you should never trust what they tell you in a CTF. It's worth checking to make sure that they are in fact generating CAPTCHAs dynamically, and not just randomly pulling from `captchas.zip`. However, by this point, I was already fairly certain they wanted us to use ML, so a teammate did this check for me instead while I started on my approach (Spoiler: they were, in fact, generating them dynamically).
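For the curious, a check like this only takes a few lines. Here's a rough sketch of the kind of thing my teammate did (my reconstruction, not his actual code); it assumes the page embeds the CAPTCHA as a base64 data URI, which the Selenium code later in this writeup relies on:
```python
import base64, hashlib, re, zipfile
import requests

# hash every image in the provided zip
with zipfile.ZipFile("captchas.zip") as z:
    known = {hashlib.sha256(z.read(name)).hexdigest()
             for name in z.namelist() if not name.endswith("/")}

# fetch a live captcha and see if it's one we already have
html = requests.get("https://captcha.chal.uiuc.tf/").text
b64 = re.search(r'base64,([^"]+)', html).group(1)
digest = hashlib.sha256(base64.b64decode(b64)).hexdigest()
print("recycled from the zip!" if digest in known else "looks dynamically generated")
```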
I decided to use PyTorch, simply because it's what I have installed at the moment (though I've used Keras to similar effect before). The initial work is fairly straightforward: just importing the data and transforming it into a format we can work with:
```python
import os
import random

source_images = os.listdir("captchas")
random.shuffle(source_images)

from PIL import Image
import numpy as np

LABEL_LOOKUP = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def show_image(im):
    Image.fromarray((255 * im).astype(np.uint8)).show()

def normalize_image(im):
    # rescale pixel values into [0, 1]
    im = im.astype(np.float64)
    im = im - im.min()
    im = im / im.max()
    return im

def get_images(n_images):
    images = random.sample(source_images, n_images)
    ims = [np.array(Image.open(os.path.join("captchas/", image)))
           for image in images]
    # the label is the filename up to the first underscore
    labels = [image[:image.find("_")] for image in images]
    return ims, labels

def encode_labels(labels):
    idxs = [LABEL_LOOKUP.find(label) for label in labels]
    return np.array(idxs)
```
Once we have this working, we have an important decision to make: the network architecture. Of course, since we're working with image data, we'll naturally use some kind of [Convolutional Neural Network](https://en.wikipedia.org/wiki/Convolutional_neural_network), but we need to decide things like the input shape and the specific sizes of layers. Like any experienced machine learning practitioner, I naturally picked something completely random for my architecture.
However, I also made a critical mistake here: I decided to build a single-character classifier and use a sliding-window approach. This is commonly how [OCR](https://en.wikipedia.org/wiki/Optical_character_recognition) systems work: since not all words have a fixed length, you train on individual characters and then "slide" the classifier across the image to read out an entire sequence.
<div style="text-align:center"><img src="https://i.imgur.com/BjJTHLz.png"/> </div>
...but you only need this if you're recognizing variable numbers of characters, or have a large amount of sequential text to read. These CAPTCHAs are always exactly 5 characters long, so we could have just used a single pass (through a larger neural network) hardcoded to always predict 5 characters.
Oh well. Hindsight is 20/20.
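For what it's worth, the single-pass alternative wouldn't even have been much more code. Here's a hypothetical sketch (arbitrary layer sizes of my choosing; not a network I actually trained) of a CNN that reads the whole 250x75 image and emits one 26-way distribution per character position:
```python
import torch
import torch.nn as nn

class FiveCharCNN(nn.Module):
    """Hypothetical single-pass network: one trunk, a 5x26 output head."""
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 16, 3), nn.LeakyReLU(0.1),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3), nn.LeakyReLU(0.1),
            nn.MaxPool2d(2),
        )
        # input 74x250 -> conv 72x248 -> pool 36x124 -> conv 34x122 -> pool 17x61
        self.head = nn.Linear(32 * 17 * 61, 5 * 26)

    def forward(self, x):  # x: (batch, 3, 74, 250)
        x = self.trunk(x).flatten(1)
        x = self.head(x).view(-1, 5, 26)
        # one log-softmax per character position; train with NLLLoss
        # on output.view(-1, 26) against targets.view(-1)
        return x.log_softmax(dim=2)
```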
Anyways, here's the architecture I chose to use:
```python
# Base images are 250x75, so fifths will be approximately
# 50x74, rounding the 75 to 74 to make the pooling nice.
class EnchantmentLanguageCNN(nn.Module):
    def __init__(self):
        super(EnchantmentLanguageCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, 3)
        self.relu1 = nn.LeakyReLU(0.1)
        self.conv2 = nn.Conv2d(16, 32, 3)
        self.relu2 = nn.LeakyReLU(0.1)
        self.maxpool = nn.MaxPool2d(2)
        self.conv3 = nn.Conv2d(32, 64, 3)
        self.relu3 = nn.LeakyReLU(0.1)
        self.fc1 = nn.Linear(21 * 33 * 64, 64)
        self.relu4 = nn.LeakyReLU(0.1)
        self.fc2 = nn.Linear(64, 26)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, x):
        x = self.conv1(x)
        x = self.relu1(x)
        x = self.conv2(x)
        x = self.relu2(x)
        x = self.maxpool(x)
        x = self.conv3(x)
        x = self.relu3(x)
        x = x.view(-1, 21 * 33 * 64)  # flatten: 21x33 spatial, 64 channels
        x = self.fc1(x)
        x = self.relu4(x)
        x = self.fc2(x)
        x = self.softmax(x)
        return x
```
It consists of two 3x3 convolutional layers (of 16 and 32 filters, respectively), then a 2x2 max-pooling layer, followed by another 3x3 convolution (with 64 filters), a flatten, and then two fully connected layers with 64 and 26 neurons each (the latter is our output layer). Like I said, arbitrary. An aside: I personally _love_ LeakyReLU as an activation function (I've been burned by ReLU before), so I use it everywhere. You'll find that many ML practitioners have their own "tricks" of questionable efficacy, and this is mine. Of course, the final layer's activation is a softmax (we use `LogSoftmax` because [that's what you're supposed to do in PyTorch](https://discuss.pytorch.org/t/what-is-the-difference-between-log-softmax-and-softmax/11801/2), I guess).
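If the `LogSoftmax` business seems mysterious: PyTorch's `NLLLoss` expects log-probabilities as input, and the `LogSoftmax` + `NLLLoss` pair computes exactly the same thing as feeding raw logits into `CrossEntropyLoss`. A quick sanity check (my own aside, not part of the solution):
```python
import torch
import torch.nn as nn

logits = torch.randn(4, 26)            # raw scores for a batch of 4 windows
targets = torch.tensor([0, 3, 7, 25])  # ground-truth character indices

loss_a = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)
loss_b = nn.CrossEntropyLoss()(logits, targets)
assert torch.allclose(loss_a, loss_b)  # identical up to floating point
```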
Great, now we have the next hurdle: how do we get training data? "But we have training data," you thought just now. Wrong. We have 5-character CAPTCHA training data, not single-character training data. Sure, so we'll just split each CAPTCHA into– oh, wait. That's right: we don't know where each character ends and the next one begins. We could just split each CAPTCHA horizontally into 5 segments of equal size, but that has... questionable results.
<div style="text-align:center"><img src="https://i.imgur.com/B8J1QBA.png"/> </div>
This is where I employ an advanced technique known as _"knowing when to not care"_. Here's the idea: sure, we'll get some bad splits. But we'll also get a good number of good splits. If we train our network on both bad and good splits, it'll struggle to learn the bad ones, but will presumably do well on the good ones. This will deflate training accuracy a little, but won't matter much in the end, because we'll only do inference (test-time prediction) on good images.
This was my second mistake. In principle, my logic was sound, but I forgot that I would be doing the sliding-window approach for inference. This is a problem, because I need the network to have low confidence in between characters for the sliding window to be able to tell where new characters begin. If I'm training on bad data that consists of the "in between characters" region... well, let's just say it made my life harder than it needed to be.
Anyways, armed with our faulty logic, we're ready to add this step to the input processing:
```python
def get_fifth(im, i):
    start = (250 * i) // 5
    return im[0:74, start:start+50, :]

def get_batch(batch_size=64, test=False):
    ims, labels = get_images(batch_size, test)
    ims = [normalize_image(im) for im in ims]
    # pick one of the five character positions at random per image
    selections = [random.randint(0, 4) for im in ims]
    fifths = [get_fifth(im, selection)[None]
              for selection, im in zip(selections, ims)]
    labels = [label[selection] for selection, label in zip(selections, labels)]
    # move channels to axis 1 for PyTorch's NCHW convention
    return np.moveaxis(np.concatenate(fifths), 3, 1), encode_labels(labels)
```
Finally, we're ready for the main training process:
```python
elcnn = EnchantmentLanguageCNN()
opt = optim.Adam(elcnn.parameters(), lr=0.001)
loss = nn.NLLLoss()

NUM_EPOCHS = 50
NUM_MBS = 20
TEST_MBS = 5

for epoch in range(NUM_EPOCHS):
    average_loss = 0
    for _ in range(NUM_MBS):
        opt.zero_grad()
        x, y = get_batch(64)
        x = torch.from_numpy(x).float()
        y = torch.from_numpy(y).long()
        objective = loss(elcnn(x), y)
        average_loss += objective.item() / NUM_MBS
        objective.backward()
        opt.step()
    print(f"Epoch {epoch+1}")
    print(" Training Loss = %0.04f" % average_loss)
    test_loss = 0
    test_acc = 0
    for _ in range(TEST_MBS):
        x, y = get_batch(64, test=True)
        x = torch.from_numpy(x).float()
        y = torch.from_numpy(y).long()
        h = elcnn(x)
        pred = np.argmax(torch.exp(h).detach().numpy(), 1)
        test_loss += loss(h, y).item() / TEST_MBS
        test_acc += np.mean(pred == y.numpy()) / TEST_MBS
    print(" Test Loss = %0.04f" % test_loss)
    print(" Test Accuracy = %0.04f" % test_acc)
    print()
```
Note that I'm using the term "epoch" _very_ loosely here. Traditionally, in machine learning, one "epoch" is an entire pass over the training dataset. However, I typically work on problems where the dataset is very large (or can be sampled randomly), so an "epoch" becomes just an arbitrary count of minibatches used to chunk up training however we see fit (also, I used a ridiculously small number of minibatches per epoch here, just so the training statistics would print more often).
Other than that, this training process is fairly standard -- you can probably find something similar by googling "PyTorch image classification tutorial". Another aside: initially I didn't bother splitting my data into training/testing sets, because I figured the model was simple enough (and I had enough data) that overfitting wouldn't really be a problem. But I mentioned what I was doing in our team chat, and [someone else on the team](https://ctf.harrisongreen.me/about/) who works with machine learning nagged me into making a train/test split. The code for that is fairly straightforward:
```python
TRAIN_SPLIT = 0.6

n = len(source_images)
train_images = source_images[:int(TRAIN_SPLIT * n)]
test_images = source_images[int(TRAIN_SPLIT * n):]

def get_images(n_images, test):
    source = test_images if test else train_images  # note change
    images = random.sample(source, n_images)
    ims = [np.array(Image.open(os.path.join("captchas/", image)))
           for image in images]
    labels = [image[:image.find("_")] for image in images]
    return ims, labels
```
Now we're ready to actually train our model, and...
```
Epoch 1
Training Loss = 3.3112
Test Loss = 3.1667
Test Accuracy = 0.0469
Epoch 2
Training Loss = 3.0750
Test Loss = 2.8879
Test Accuracy = 0.1469
Epoch 3
Training Loss = 2.5853
Test Loss = 2.1856
Test Accuracy = 0.3594
Epoch 4
Training Loss = 1.9622
Test Loss = 1.6823
Test Accuracy = 0.5219
Epoch 5
Training Loss = 1.5606
Test Loss = 1.5452
Test Accuracy = 0.5656
...
Epoch 28
Training Loss = 0.4170
Test Loss = 0.5681
Test Accuracy = 0.8750
Epoch 29
Training Loss = 0.4582
Test Loss = 0.4630
Test Accuracy = 0.9062
Epoch 30
Training Loss = 0.3660
Test Loss = 0.5343
Test Accuracy = 0.9031
```
By 30 epochs (which is only `30 * 20 * 64 ≈ 38k` individual training windows), we're already seeing accuracy around 90%, which is good enough ~~for government work~~. Now we need to actually implement sliding-window inference.
Again, rather than consult any established literature on the topic, I decided to roll my own sliding-window approach, because what could go wrong? Well, in short: a lot. I ended up with a lot of _very_ messy code, but for reference, the basic slide-and-get-confidence routine looked like this:
```python
def predict_sliding(im, elcnn, resolution=5):
    im = normalize_image(im)
    out = ""
    confidence = []
    for start in range(0, 250 - 50, resolution):
        window = np.moveaxis(im[0:74, start:start+50, :][None], 3, 1)
        window = torch.from_numpy(window).float()
        pred = torch.exp(elcnn(window)).detach().numpy()
        best = np.argmax(pred, 1)[0]
        out += LABEL_LOOKUP[best]
        confidence.append(pred[0, best])
    return out, confidence
```
From here I did a variety of things, including de-duplicating `out` (which doesn't work if the captcha really was, say, `DVVXR`) and thresholding the confidence scores. Unfortunately, courtesy of that second mistake I mentioned earlier, the confidence scores were very noisy. I got decent results by smoothing them with an approximate 1D Gaussian kernel and then taking local maxima of the result, but it was way messier than it needed to be.
```python
def denoise(slide, confidence):
    # keep only characters that persist across 3 consecutive windows
    basic_denoised = "".join(c for p, c, n in zip(slide, slide[1:], slide[2:])
                             if p == c == n)
    if len(set(basic_denoised)) == 5:
        out = ""
        for c in basic_denoised:
            if c not in out:
                out += c
        return out
    else:
        out = ""
        smoothed_confidence = []
        confidence = [confidence[0], confidence[0]] + confidence \
                     + [confidence[-1], confidence[-1]]
        for i in range(2, len(confidence) - 2):
            smoothed_confidence.append(
                1 * confidence[i-2] +
                4 * confidence[i-1] +
                4 * confidence[i] +
                4 * confidence[i+1] +
                1 * confidence[i+2])  # it was better with 1,4,4,4,1 than 1,4,6,4,1, don't ask why
        smoothed_confidence = [0] + smoothed_confidence + [0]
        # take the characters at local maxima of the smoothed confidence
        for i in range(len(slide)):
            if smoothed_confidence[i+1] > smoothed_confidence[i] and \
               smoothed_confidence[i+1] > smoothed_confidence[i+2]:
                out += slide[i]
        return out
```
The next step was to test against live: for this, I chose to use [Selenium](https://www.selenium.dev/). My reasons for choosing Selenium were twofold: (1) I anticipated having to do several manual corrections of the CAPTCHAs when my neural network was wrong, and a raw `requests` solution would be less user-friendly and probably take longer to show the image for manual solution, and (2) I only recently started to use Selenium, and I wanted some practice.
```python
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
import base64, time

driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get("https://captcha.chal.uiuc.tf/")

last = None
while True:
    img = driver.find_element_by_class_name("captcha")
    input_element = driver.find_element_by_id("captcha")
    src = img.get_attribute("src")
    _, data = src.split(";")
    _, data = data.split(",")
    with open("cur_captcha.png", "wb") as f:
        _ = f.write(base64.b64decode(data))
    im = np.array(Image.open("cur_captcha.png"))
    im = normalize_image(im)
    slide, conf = predict_sliding(im, elcnn)
    ans = denoise(slide, conf)
    if ans == last:
        continue
    input_element.send_keys(ans)
    input_element.submit()
    last = ans
    time.sleep(0.1)
```
This loop continuously scrapes the CAPTCHA `<img>` tag, saves the image locally as `cur_captcha.png`, then opens and formats it for the neural network to make a prediction. The `ans == last` tidbit just stops it from spamming the same answer repeatedly when it's wrong (yes, it will fail if the next CAPTCHA happens to be identical to the last, but I'm willing to accept a 1/26^5 risk).
At this point, I thought my hard work was about to pay off with a juicy first blood (at the time, Bot Protection had no solves). This was also before any clarification posts were released about Bot Protection, so my team speculated that there were only 30 levels (I think the 10-minute time limit had been figured out by someone else on the team who was experimenting with it). After all, if you translate the text on the CAPTCHA page, you get "Level 0 is not high enough" (incidentally, this is just regular text rendered in the Minecraft Enchanting Language font, so you can read it by copying it into Notepad), and the maximum level to enchant with in Minecraft is level 30.
I fire up the script, and, woohoo, I hit level 30! ...but it keeps going. I stopped around level 41 because I was somewhat confused (and tired of doing CAPTCHAs manually). I realized that level 30 would probably have been too low anyway, since with a good mastery of Enchanting Language you could probably reach level 30 by hand in 10 minutes (one team member was already hard at work learning it with [this Quizlet](https://quizlet.com/421263738/minecraft-enchantment-table-language-flash-cards/)). So, I try again. This time I hit 100, and... it keeps going. I got to around 110 before the 10-minute timer hit.
<div style="text-align:center"><img src="https://i.imgur.com/GedfMZf.png"/> <i>No, Level 108 is not high enough</i></div> <br />
Note that it took considerable effort to get this far, since my script only had about an 85-90% success rate on the entire CAPTCHA, so I had to solve roughly every tenth one by hand. At this point I decided it wasn't worth continuing by hand with no end in sight, so we used the Modmail to ask how many CAPTCHAs we had to do. It turns out they had already received this question, so they decided to publicly announce: "Huge note on Bot Protection IV: You need to solve 500 captchas in 10 minutes."
Five.
Hundred.
CAPTCHAs.
To cut to the chase, I realized I needed to significantly raise my automatic success rate. Naturally wanting to take the path of least resistance, I decided to "fuzz" my existing results. To figure out what to fuzz, I looked at the typical inputs that were giving my algorithm trouble. I won't go over every single thing my "fuzzer" does, but a few examples (a condensed sketch follows the list):
- Suppose we get the prediction "ZCCUWQ". Originally, in cases where the prediction had length 6 but contained duplicates, I would just cut out the duplicate to get "ZCUWQ". As it turns out, in a lot of these cases the duplicate was actually real, so simply deleting a character at random instead would often stumble upon the real answer (eventually).
- The letter "J" in Minecraft Enchanting Language is very thin. It's just three dots stacked on top of each other. As a result, the network misses "J"s entirely fairly often. Simply injecting "J"s at random into the resulting prediction helped a surprising amount.
- Rather than do this fancy sliding window confidence threshold detection, why not just predict the way we trained, by splitting into 5 boxes and using that? Okay, well, this one was actually pretty bad, but in Machine Learning, [two bad models make one not-so-bad model](https://en.wikipedia.org/wiki/Boosting_(machine_learning)).
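In condensed form, the first two heuristics look something like this (a simplified version of the `randomize_guess` function in the full source at the bottom):
```python
import random

def fuzz_guess(ans):
    # the network misses thin "J" glyphs, so sometimes inject one at random
    if random.random() < 0.5:
        idx = random.randint(0, len(ans) - 1)
        ans = ans[:idx] + "J" + ans[idx:]
    # if the guess is too long, delete a character at random, preferring
    # one of a doubled pair (the duplicate is sometimes real, sometimes not)
    while len(ans) > 5:
        pairs = [i for i in range(len(ans) - 1) if ans[i] == ans[i + 1]]
        choice = random.choice(pairs) if pairs else random.randint(0, len(ans) - 1)
        ans = ans[:choice] + ans[choice + 1:]
    return ans

# e.g. fuzz_guess("ZCCUWQ") might return "ZCCUW", "ZCUWQ", "ZCCWQ", ...
```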
With all this, I was able to push my average accuracy up to ~95%, and I also played with the timings in Selenium to make the loop as fast as possible. I rigged it up to print every minute, so I'd have a good idea of whether I was on "WR pace" or not. My strategy from there was to restart the run if I wasn't at CAPTCHA #100, ideally better, by the 2-minute mark (I needed to average 50/minute to make 500 in 10 minutes).
I had one great run where I was at #130 at the 2-minute mark, but then Selenium suddenly died and threw an error. I had set the timing too aggressively, and it complained about not being able to find the input element, since it was looking before the page load had finished (there's probably a way to wait for the page load to finish, but I didn't bother looking for it at the time). I scaled back the timing, sprinkled in some try/except blocks, and set off again.
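As it turns out, that "way to wait" does exist: Selenium's explicit waits. Something like this would have replaced the fixed sleeps (a sketch; I didn't actually use this during the CTF):
```python
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# block (up to 10 seconds) until the captcha input actually exists on the
# freshly loaded page, instead of sleeping a fixed amount and hoping
input_element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, "captcha"))
)
```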
In the end, I think I got my "god run" within the first 5 attempts (not counting ones that were restarted very early). My 1-minute and 2-minute splits were slightly above average, then fell back to average by the 5-minute mark, but I got really lucky in the second half and ended up finishing over a minute ahead of pace. It's also worth noting that I unintentionally started memorizing Enchanting Language as I went, to the point where I could recognize 50% of characters instantly and another 30% without much trouble, only having to consult my pinned table ~20% of the time.
<div style="text-align:center"><img src="https://i.imgur.com/5iMVXTK.png"/> <i>My custom lookup table, rearranged by character morphology. T, E, and M were the hardest to tell apart, whereas A, B, O, and R became instantly recognizable fairly quickly.</i></div> <br />
After much effort, we're rewarded with the flag:
`uiuctf{i_knew_a_guy_in_highschool_that_could_read_this}`
...I became that guy. Overall, this was a fun (if painful) challenge; many thanks to `tow_nater` for writing it!
Here's my full source code (warning: it's messy). For some reason I decided to do everything in the same file, hence the flag for whether to train from scratch or load the model from the saved file.
```python
import os
import random

source_images = os.listdir("captchas")
random.shuffle(source_images)

from PIL import Image
import numpy as np

import torch
import torch.nn as nn
import torch.optim as optim

TRAIN_SPLIT = 0.6

n = len(source_images)
train_images = source_images[:int(TRAIN_SPLIT * n)]
test_images = source_images[int(TRAIN_SPLIT * n):]

LABEL_LOOKUP = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def predict_sliding(im, elcnn, resolution=5):
    im = normalize_image(im)
    out = ""
    confidence = []
    for start in range(0, 250 - 50, resolution):
        window = np.moveaxis(im[0:74, start:start+50, :][None], 3, 1)
        window = torch.from_numpy(window).float()
        pred = torch.exp(elcnn(window)).detach().numpy()
        best = np.argmax(pred, 1)[0]
        out += LABEL_LOOKUP[best]
        confidence.append(pred[0, best])
    return out, confidence

def denoise(slide, confidence):
    # keep only characters that persist across 3 consecutive windows
    basic_denoised = "".join(c for p, c, n in zip(slide, slide[1:], slide[2:])
                             if p == c == n)
    if len(set(basic_denoised)) == 5:
        out = ""
        for c in basic_denoised:
            if c not in out:
                out += c
        return out
    else:
        out = ""
        smoothed_confidence = []
        confidence = [confidence[0], confidence[0]] + confidence \
                     + [confidence[-1], confidence[-1]]
        for i in range(2, len(confidence) - 2):
            smoothed_confidence.append(
                1 * confidence[i-2] +
                4 * confidence[i-1] +
                4 * confidence[i] +
                4 * confidence[i+1] +
                1 * confidence[i+2])
        smoothed_confidence = [0] + smoothed_confidence + [0]
        # take the characters at local maxima of the smoothed confidence
        for i in range(len(slide)):
            if smoothed_confidence[i+1] > smoothed_confidence[i] and \
               smoothed_confidence[i+1] > smoothed_confidence[i+2]:
                out += slide[i]
        return out

def show_image(im):
    Image.fromarray((255 * im).astype(np.uint8)).show()

def normalize_image(im):
    im = im.astype(np.float64)
    im = im - im.min()
    im = im / im.max()
    return im

def get_fifth(im, i):
    start = (250 * i) // 5
    return im[0:74, start:start+50, :]

def get_images(n_images, test):
    source = test_images if test else train_images
    images = random.sample(source, n_images)
    ims = [np.array(Image.open(os.path.join("captchas/", image)))
           for image in images]
    labels = [image[:image.find("_")] for image in images]
    return ims, labels

def encode_labels(labels):
    idxs = [LABEL_LOOKUP.find(label) for label in labels]
    return np.array(idxs)

def get_batch(batch_size=64, test=False):
    ims, labels = get_images(batch_size, test)
    ims = [normalize_image(im) for im in ims]
    selections = [random.randint(0, 4) for im in ims]
    fifths = [get_fifth(im, selection)[None]
              for selection, im in zip(selections, ims)]
    labels = [label[selection] for selection, label in zip(selections, labels)]
    return np.moveaxis(np.concatenate(fifths), 3, 1), encode_labels(labels)

# Base images are 250x75, so fifths will be approximately
# 50x74, rounding the 75 to 74 to make the pooling nice.
class EnchantmentLanguageCNN(nn.Module):
    def __init__(self):
        super(EnchantmentLanguageCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, 3)
        self.relu1 = nn.LeakyReLU(0.1)
        self.conv2 = nn.Conv2d(16, 32, 3)
        self.relu2 = nn.LeakyReLU(0.1)
        self.maxpool = nn.MaxPool2d(2)
        self.conv3 = nn.Conv2d(32, 64, 3)
        self.relu3 = nn.LeakyReLU(0.1)
        self.fc1 = nn.Linear(21 * 33 * 64, 64)
        self.relu4 = nn.LeakyReLU(0.1)
        self.fc2 = nn.Linear(64, 26)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, x):
        x = self.conv1(x)
        x = self.relu1(x)
        x = self.conv2(x)
        x = self.relu2(x)
        x = self.maxpool(x)
        x = self.conv3(x)
        x = self.relu3(x)
        x = x.view(-1, 21 * 33 * 64)
        x = self.fc1(x)
        x = self.relu4(x)
        x = self.fc2(x)
        x = self.softmax(x)
        return x

elcnn = EnchantmentLanguageCNN()
opt = optim.Adam(elcnn.parameters(), lr=0.001)
loss = nn.NLLLoss()

NUM_EPOCHS = 50
NUM_MBS = 20
TEST_MBS = 5

do_train = False
if do_train:
    for epoch in range(NUM_EPOCHS):
        average_loss = 0
        for _ in range(NUM_MBS):
            opt.zero_grad()
            x, y = get_batch(64)
            x = torch.from_numpy(x).float()
            y = torch.from_numpy(y).long()
            objective = loss(elcnn(x), y)
            average_loss += objective.item() / NUM_MBS
            objective.backward()
            opt.step()
        print(f"Epoch {epoch+1}")
        print(" Training Loss = %0.04f" % average_loss)
        test_loss = 0
        test_acc = 0
        for _ in range(TEST_MBS):
            x, y = get_batch(64, test=True)
            x = torch.from_numpy(x).float()
            y = torch.from_numpy(y).long()
            h = elcnn(x)
            pred = np.argmax(torch.exp(h).detach().numpy(), 1)
            test_loss += loss(h, y).item() / TEST_MBS
            test_acc += np.mean(pred == y.numpy()) / TEST_MBS
        print(" Test Loss = %0.04f" % test_loss)
        print(" Test Accuracy = %0.04f" % test_acc)
        print()
    torch.save(elcnn.state_dict(), "model.pt")
else:
    elcnn.load_state_dict(torch.load("model.pt"))

def randomize_guess(ans):
    # occasionally inject a "J", since the network often misses them entirely
    if random.random() < 0.5:
        idx = random.randint(0, len(ans) - 1)
        ans = ans[:idx] + "J" + ans[idx:]
    # pad short guesses by doubling a random character
    while len(ans) < 5:
        idx = random.randint(0, len(ans) - 1)
        ans = ans[:idx] + ans[idx] + ans[idx] + ans[idx+1:]
    # trim long guesses, preferring to delete one of a doubled pair
    while len(ans) > 5:
        candidates = [i for i in range(len(ans) - 1) if ans[i] == ans[i+1]]
        if len(candidates) == 0:
            choice = random.randint(0, len(ans) - 1)
        else:
            choice = random.choice(candidates)
        ans = ans[:choice] + ans[choice+1:]
    return ans

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
import base64, time

print("Ready to test against website!")
time.sleep(5)

driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get("https://captcha.chal.uiuc.tf/")

last = ""
failures = 0
minutes = 0
last_time = time.time()
FAILURE_THRESHOLD = 12

while True:
    if time.time() - last_time > 60:
        minutes += 1
        print(f"{minutes} minutes elapsed!")
        last_time = time.time()
    try:
        img = driver.find_element_by_class_name("captcha")
        input_element = driver.find_element_by_id("captcha")
        src = img.get_attribute("src")
        _, data = src.split(";")
        _, data = data.split(",")
        with open("cur_captcha.png", "wb") as f:
            _ = f.write(base64.b64decode(data))
        im = np.array(Image.open("cur_captcha.png"))
        im = normalize_image(im)
        slide, conf = predict_sliding(im, elcnn)
        ans = denoise(slide, conf)
        if ans == last:
            failures += 1
        else:
            failures = 0
            print(f"Prediction: {ans}")
        last = ans
        if failures < FAILURE_THRESHOLD:
            if failures == 0 and len(ans) == 5:
                input_element.send_keys(ans)
                input_element.submit()
            elif 1 <= failures <= 3:
                # fall back to a coarse subsample of the raw sliding prediction
                guess = slide[failures-1::8]
                input_element.send_keys(guess)
                input_element.submit()
            else:
                guess = randomize_guess(ans)
                input_element.send_keys(guess)
                input_element.submit()
        elif failures == FAILURE_THRESHOLD:
            # give up on this captcha and leave it for a human to solve
            input_element.click()
    except:
        time.sleep(0.01)
        continue
    time.sleep(0.07)
```