
UIUCTF - Bot Protection IV Writeup

by nthistle with DiceGang

Challenge:

When on website: +1 spam resistance +10 user annoyance

https://captcha.chal.uiuc.tf

Author: tow_nater

We start off by heading to https://captcha.chal.uiuc.tf, where we're presented with a Minecraft enchanting table and some weird text. At this point we can already make the guess that this is going to have to do with CAPTCHAs written in Minecraft Enchanting Language.

Sure enough, we check the source, and see this lovely comment at the top:

<!doctype html>

<!--TODO: we don't need /captchas.zip anymore now that we dynamically create captchas. We should delete this file.-->

<html>
<title>UIUCTF</title>
<link rel="stylesheet" href="/static/style.css">

<body>
<div class="bg"/>
...

Right, then we'll just download /captchas.zip, and– oh. It's 1GB. And they claim they're creating the captchas dynamically. Oh no. At this point, I'm already starting to suspect that they want us to use Machine Learning to crack their CAPTCHAs automatically. The 1GB zip turns out to be full of labelled CAPTCHAs, which looks suspiciously like a training set.

Of course, you should never trust what they tell you in a CTF. It's worth checking to make sure that they are in fact generating CAPTCHAs dynamically, and not just randomly pulling from captchas.zip. However, by this point, I was already fairly certain they wanted us to use ML, so a teammate did this check for me instead while I started on my approach (Spoiler: they were, in fact, generating them dynamically).

I decided to use PyTorch, simply because it's what I have installed at the moment (although I've also used Keras to similar effect before). The initial work is fairly straightforward: it's just importing the data and transforming it into a format we can work with:

import os
import random

source_images = os.listdir("captchas")
random.shuffle(source_images)

from PIL import Image
import numpy as np

LABEL_LOOKUP = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def show_image(im):
    Image.fromarray((255 * im).astype(np.uint8)).show()

def normalize_image(im):
    im = im.astype(np.float64)
    im = im - im.min()
    im = im / im.max()
    return im

def get_images(n_images):
    images = random.sample(source_images, n_images)
    ims = [np.array(Image.open(os.path.join("captchas/", image))) \
           for image in images]
    labels = [image[:image.find("_")] for image in images]
    return ims, labels

def encode_labels(labels):
    idxs = [LABEL_LOOKUP.find(label) for label in labels]
    return np.array(idxs)
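To make the data format concrete, here's a quick usage sketch (my addition; the "QWERT" filename is hypothetical, and the only assumption the code actually relies on is that the label precedes the underscore, with the 250x75 image size coming from a comment below):

ims, labels = get_images(2)
print(ims[0].shape)              # (75, 250, 3): one 250x75 RGB captcha
print(labels[0])                 # e.g. "QWERT" from a file named "QWERT_1234.png"
print(encode_labels(labels[0]))  # [16 22  4 17 19]: per-character indices into LABEL_LOOKUP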

Once we have this working, we have an important decision to make: the network architecture. Of course, since we're working with image data, we'll naturally use some kind of Convolutional Neural Network, but we need to decide things like the input shape and the specific sizes of layers. Like any experienced machine learning practitioner, I naturally picked something completely random for my architecture.

However, I also made a critical mistake here: I decided to make a single-character classifier and use a sliding window approach. This is commonly how OCR approaches work: since not all words have a fixed size, you train on individual characters and then "slide" the classifier over your image to read out an entire sequence.

But you only need this if you're recognizing variable numbers of characters, or have a large amount of sequential text to recognize. These CAPTCHAs are always 5 characters long, so we could have just used a single pass (through a larger neural network) hardcoded to always detect 5 character segments.

Oh well. Hindsight is 20/20.
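For reference, here's a rough sketch of what that single-pass alternative could look like (my illustration, not code I actually wrote for the challenge): take the full 250x75 image and emit five independent 26-way predictions, one per character position.

import torch.nn as nn
import torch.nn.functional as F

class FiveCharCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3), nn.LeakyReLU(0.1),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3), nn.LeakyReLU(0.1),
            nn.MaxPool2d(2),
        )
        # 75x250 input -> 73x248 (conv) -> 36x124 (pool)
        #              -> 34x122 (conv) -> 17x61 (pool)
        self.fc = nn.Linear(32 * 17 * 61, 5 * 26)

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)
        # five independent 26-way log-probability heads; train with
        # NLLLoss over the flattened (batch*5, 26) view of the output
        return F.log_softmax(self.fc(x).view(-1, 5, 26), dim=2)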

Anyways, here's the architecture I chose to use:

# Base images are 250x75, so fifths will be 50x74,
# cropping the height from 75 to 74 to make the pooling nice.
class EnchantmentLanguageCNN(nn.Module):
    def __init__(self):
        super(EnchantmentLanguageCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, 3)
        self.relu1 = nn.LeakyReLU(0.1)
        self.conv2 = nn.Conv2d(16, 32, 3)
        self.relu2 = nn.LeakyReLU(0.1)
        self.maxpool = nn.MaxPool2d(2)
        self.conv3 = nn.Conv2d(32, 64, 3)
        self.relu3 = nn.LeakyReLU(0.1)
        self.fc1 = nn.Linear(21 * 33 * 64, 64)
        self.relu4 = nn.LeakyReLU(0.1)
        self.fc2 = nn.Linear(64, 26)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, x):
        x = self.conv1(x)
        x = self.relu1(x)
        x = self.conv2(x)
        x = self.relu2(x)
        x = self.maxpool(x)
        x = self.conv3(x)
        x = self.relu3(x)
        x = x.view(-1, 21 * 33 * 64)
        x = self.fc1(x)
        x = self.relu4(x)
        x = self.fc2(x)
        x = self.softmax(x)
        return x

It basically consists of two 3x3 convolutional layers (of 16 and 32 filters, respectively), then a 2x2 max pooling layer, followed by another 3x3 convolution (with 64 filters), a flattening, and then two fully connected layers, with 64 and 26 neurons respectively (the latter is our output layer). Like I said, arbitrary. An aside: I personally love LeakyReLU as an activation function (I've been burned by ReLU before), so I use it everywhere. You'll find that many ML practitioners have their own "tricks" of questionable efficacy, and this is mine. Of course, the final layer's activation function is a Softmax (we use LogSoftmax because that's what you're supposed to do in PyTorch, I guess).
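For what it's worth, the reason LogSoftmax is "what you're supposed to do" here is that it pairs with the NLLLoss used in training below; PyTorch's CrossEntropyLoss fuses the two, so an equivalent setup would drop the LogSoftmax from the model and feed raw logits to CrossEntropyLoss:

import torch
import torch.nn as nn

logits = torch.randn(4, 26)           # raw scores for a batch of 4
targets = torch.randint(0, 26, (4,))  # true character indices

# what this model does: LogSoftmax inside the network + NLLLoss
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)
# the fused equivalent
ce = nn.CrossEntropyLoss()(logits, targets)

assert torch.allclose(nll, ce)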

Great, now we have the next hurdle: how do we get training data? "But we have training data," you thought just now. Wrong. We have 5-character CAPTCHA training data, not single-character training data. Sure, so we'll just split each CAPTCHA into– oh, wait. That's right, we don't know where each character ends and the next one begins. We could just split each CAPTCHA into 5 horizontal segments of equal size, but that has questionable results.

This is where I employ an advanced technique known as "knowing when to not care". Here's the idea: sure, we'll get some bad splits. But we'll also get a good number of good splits. If we train our network on both bad and good splits, it'll struggle to learn the bad ones, but will presumably do well on the good ones. This will deflate training accuracy a little, but won't matter much in the end, because we'll only do inference (test-time prediction) on good images.

This was my second mistake. In principle, my logic was sound, but I forgot that I would be doing the sliding-window approach for inference. This is a problem, because I need the network to have low confidence in between characters for the sliding window to be able to tell where new characters begin. Since the bad splits I was training on consist largely of that "in between characters" region, well, let's just say it made my life harder than it needed to be.

Anyways, armed with our faulty logic, we're ready to add this step to the input processing:

def get_fifth(im, i):
    start = (250 * i) // 5
    return im[0:74, start:start+50, :]

def get_batch(batch_size=64, test=False):
    ims, labels = get_images(batch_size, test)  # the test flag comes from the train/test split introduced below
    ims = [normalize_image(im) for im in ims]
    selections = [random.randint(0,4) for im in ims]
    fifths = [get_fifth(im, selection)[None] for selection, im in zip(selections, ims)]
    labels = [label[selection] for selection, label in zip(selections, labels)]
    return np.moveaxis(np.concatenate(fifths),3,1), encode_labels(labels)
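A quick sanity check on shapes (my addition): after the moveaxis, the batch is already in PyTorch's channels-first (N, C, H, W) layout, which is what the network's Conv2d(3, 16, 3) input layer expects.

x, y = get_batch(8)
print(x.shape)  # (8, 3, 74, 50): eight channels-first fifths
print(y.shape)  # (8,): one character index per fifth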

Finally, we're ready for the main training process:

elcnn = EnchantmentLanguageCNN()
opt = optim.Adam(elcnn.parameters(), lr=0.001)
loss = nn.NLLLoss()

NUM_EPOCHS = 50
NUM_MBS = 20
TEST_MBS = 5

for epoch in range(NUM_EPOCHS):
    average_loss = 0
    for _ in range(NUM_MBS):
        opt.zero_grad()
        x, y = get_batch(64)
        x = torch.from_numpy(x).float()
        y = torch.from_numpy(y).long()
        objective = loss(elcnn(x), y)
        average_loss += objective.item() / NUM_MBS
        objective.backward()
        opt.step()

    print(f"Epoch {epoch+1}")
    print(" Training Loss = %0.04f" % average_loss)

    test_loss = 0
    test_acc = 0
    for _ in range(TEST_MBS):
        x, y = get_batch(64, test=True)
        x = torch.from_numpy(x).float()
        y = torch.from_numpy(y).long()
        h = elcnn(x)
        pred = np.argmax(torch.exp(h).detach().numpy(), 1)
        test_loss += loss(h, y).item() / TEST_MBS
        test_acc += np.mean(pred == y.numpy()) / TEST_MBS

    print(" Test Loss = %0.04f" % test_loss)
    print(" Test Accuracy = %0.04f" % test_acc)
    print()

Note that I'm using the term "epoch" very loosely here. Traditionally, in machine learning, one "epoch" is an entire pass over the training dataset. However, I typically work on problems where the dataset space is very large (or can be sampled randomly), so an "epoch" is just however many minibatches we choose to group together for bookkeeping (also, I have a ridiculously small number of minibatches per epoch here, just because I wanted to see training statistics print more often).

Other than that, this training process is fairly standard; you can probably find something similar if you google "PyTorch Image Classification Tutorial". Another aside: initially I didn't bother splitting my data into training/testing, because I figured the model was simple enough and I had enough data that overfitting wouldn't really be a problem, but I mentioned what I was doing in our team chat, and someone on the team who works with machine learning nagged me into making a train/test split. The code for that is fairly straightforward:

TRAIN_SPLIT = 0.6
n = len(source_images)
train_images = source_images[:int(TRAIN_SPLIT * n)]
test_images = source_images[int(TRAIN_SPLIT * n):]

def get_images(n_images, test):
    source = test_images if test else train_images # note change
    images = random.sample(source, n_images)
    ims = [np.array(Image.open(os.path.join("captchas/", image))) \
           for image in images]
    labels = [image[:image.find("_")] for image in images]
    return ims, labels

Now we're ready to actually train our model, and…

Epoch 1
 Training Loss = 3.3112
 Test Loss = 3.1667
 Test Accuracy = 0.0469

Epoch 2
 Training Loss = 3.0750
 Test Loss = 2.8879
 Test Accuracy = 0.1469

Epoch 3
 Training Loss = 2.5853
 Test Loss = 2.1856
 Test Accuracy = 0.3594

Epoch 4
 Training Loss = 1.9622
 Test Loss = 1.6823
 Test Accuracy = 0.5219

Epoch 5
 Training Loss = 1.5606
 Test Loss = 1.5452
 Test Accuracy = 0.5656
 
...

Epoch 28
 Training Loss = 0.4170
 Test Loss = 0.5681
 Test Accuracy = 0.8750

Epoch 29
 Training Loss = 0.4582
 Test Loss = 0.4630
 Test Accuracy = 0.9062

Epoch 30
 Training Loss = 0.3660
 Test Loss = 0.5343
 Test Accuracy = 0.9031

By 30 epochs (which is only 30 * 20 * 64 ≈ 38k individual images), we're already seeing accuracy around 90%, which is good enough for government work. Now we need to actually implement sliding-window inference.

Again, rather than consult any established literature on the topic, I decided to roll my own sliding window approach, because what could go wrong? Well, in short: a lot. I ended up with a lot of very messy code, but for reference, the basic slide-and-get-confidence routine looked like this:

def predict_sliding(im, elcnn, resolution=5):
    im = normalize_image(im)
    out = ""
    confidence = []
    for start in range(0, 250 - 50, resolution):
        window = np.moveaxis(im[0:74, start:start+50, :][None],3,1)
        window = torch.from_numpy(window).float()
        pred = torch.exp(elcnn(window)).detach().numpy()
        best = np.argmax(pred, 1)[0]
        out += LABEL_LOOKUP[best]
        confidence.append(pred[0,best])
    return out, confidence

From here I did a variety of things, including de-duplicating the output (although this doesn't work if the CAPTCHA really was, say, DVVXR) and thresholding the confidence scores. Unfortunately, courtesy of that Mistake #2 I mentioned earlier, the confidence scores were very messy. I was able to get decent results by smoothing the confidence scores with a 1D Gaussian approximation and then taking local maxima of that, but it was way messier than it needed to be.

def denoise(slide, confidence):
    basic_denoised = "".join(c for p,c,n in zip(slide,slide[1:],slide[2:]) if p == c == n)
    if len(set(basic_denoised)) == 5:
        out = ""
        for c in basic_denoised:
            if c not in out:
                out += c
        return out
    else:
        out = ""
        smoothed_confidence = []
        confidence = [confidence[0], confidence[0]] + confidence + [confidence[-1], confidence[-1]]
        for i in range(2,len(confidence)-2):
            smoothed_confidence.append(
                1 * confidence[i-2] + \
                4 * confidence[i-1] + \
                4 * confidence[i]   + \
                4 * confidence[i+1] + \
                1 * confidence[i+2]) # it was better with 1,4,4,4,1 than 1,4,6,4,1, don't ask why
        smoothed_confidence = [0] + smoothed_confidence + [0]
        for i in range(len(slide)):
            if smoothed_confidence[i+1] > smoothed_confidence[i] and \
               smoothed_confidence[i+1] > smoothed_confidence[i+2]:
                out += slide[i]
        return out
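In hindsight, the same smooth-then-find-peaks idea exists off the shelf in SciPy; a sketch like this (untested against my actual confidence arrays, so treat it as an assumption) would have replaced the hand-rolled kernel:

import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.signal import find_peaks

def denoise_scipy(slide, confidence):
    # smooth the per-window confidences with a true Gaussian,
    # then read off the characters at the local maxima
    smoothed = gaussian_filter1d(np.asarray(confidence, dtype=float), sigma=1.5)
    peaks, _ = find_peaks(smoothed)
    return "".join(slide[i] for i in peaks)

You'd still want to sanity-check that exactly five peaks come out, of course.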

The next step was to test against the live site; for this, I chose Selenium. My reasons were twofold: (1) I anticipated having to manually correct several CAPTCHAs when my neural network was wrong, and a raw requests solution would be less user-friendly and probably slower at showing me the image for a manual solve, and (2) I had only recently started using Selenium, and I wanted the practice.

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
import base64, time

driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get("https://captcha.chal.uiuc.tf/")

last = None

while True:
    img = driver.find_element_by_class_name("captcha")
    input_element = driver.find_element_by_id("captcha")
    src = img.get_attribute("src")
    _, data = src.split(";")
    _, data = data.split(",")
    with open("cur_captcha.png","wb") as f:
        _ = f.write(base64.b64decode(data))
    im = np.array(Image.open("cur_captcha.png"))
    im = normalize_image(im)
    slide, conf = predict_sliding(im, elcnn)
    ans = denoise(slide, conf)
    if ans == last:
        continue
    input_element.send_keys(ans)
    input_element.submit()
    last = ans
    time.sleep(0.1)

This loop continuously scrapes the CAPTCHA <img> tag, saving the image locally as cur_captcha.png, which it can then open and format properly for the neural network to make a prediction. The ans == last tidbit just stops it from continually spamming the same answer when it's wrong (yes, it will fail if the next CAPTCHA happens to be the same, but I'm willing to accept a 1/(26^5) risk).

By this point, Bot Protection had no solves, so I thought my hard work was about to pay off with a juicy first blood. This was also before any clarification posts were released about Bot Protection, so my team speculated that there were only 30 levels (I think the 10 minute time limit had been guessed by someone else on the team who was experimenting with it). After all, if you translate the text on the CAPTCHA page, you get "Level 0 is not high enough" (incidentally, this is just regular text using the Minecraft Enchanting Language font, so you can read it by copying it into Notepad), and the maximum level to enchant with in Minecraft is level 30.

I fire up the script, and, woohoo, I hit level 30! But it keeps going. I stopped around 41 because I was somewhat confused (and tired of correcting CAPTCHAs manually). I realized that level 30 would probably be too low anyway, since with a good mastery of Enchanting Language, you could probably get to level 30 by hand in 10 minutes (one team member was already hard at work learning it with this Quizlet). So, I try again. And then I hit 100, and it keeps going. I got to ~110 or so before the 10 minute timer hit.

No, Level 108 is not high enough

Note that it took considerable effort to get this far, since my script only had about an 85~90% success rate on the entire CAPTCHA, so I had to solve a little more than every tenth CAPTCHA by hand. At this point I decided it wasn't worth the effort of continuing by hand with no end in sight, so we used the Modmail to ask how many CAPTCHAs we had to do. Turns out they had received this question already, so they just decided to publicly announce: "Huge note on Bot Protection IV: You need to solve 500 captchas in 10 minutes."

Five.

Hundred.

CAPTCHAs.

To cut to the chase, I realized I needed to significantly up my automatic success rate. Naturally wanting to take the path of least resistance, I decided to "fuzz" my existing results. To do this, I looked at typical inputs that were giving my algorithm trouble. I won't go over every single thing that my "fuzzer" does, but a few examples:

  • Suppose we get the prediction "ZCCUWQ". Originally, in cases where it predicted something of length 6 but with duplicates, I would just cut out the duplicate to get "ZCUWQ". As it turns out, in a lot of these cases the duplicate was actually real. Simply deleting a character at random in these cases would often stumble upon the real answer (eventually).
  • The letter "J" in Minecraft Enchanting Language is very thin. It's just three dots stacked on top of each other. As a result, the network misses "J"s entirely fairly often. Simply injecting "J"s at random into the resulting prediction helped a surprising amount.
  • Rather than do this fancy sliding window confidence threshold detection, why not just predict the way we trained, by splitting into 5 boxes and using that? Okay, well, this one was actually pretty bad, but in Machine Learning, two bad models make one not-so-bad model.
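(The first two of these are what the randomize_guess routine in the full source at the bottom implements: random "J" injection, plus duplicating or deleting characters to force the guess back to length 5.)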

With all this, I was able to push my average accuracy up to ~95%, and I also played with the Selenium timings to make it as fast as possible. I also rigged it up to print every minute, so I'd have a good idea whether I was on "WR pace" or not. My approach from there was to just restart the run if I wasn't at CAPTCHA #100, ideally better, by the 2 minute mark (I needed to average 50/minute to make 500 in 10 minutes).

I had one great run where I was at #130 at the 2 minute mark, but then Selenium suddenly died and threw an error. I had set the timing too aggressively, and it complained about not being able to find the input element, since it was looking before the pageload finished (there's probably a way to wait for the pageload to finish, but I didn't bother looking for it). I scaled back the timing, sprinkled in some try/except blocks, and set off again.
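(For the record, the "way to wait" I didn't bother looking up is Selenium's explicit waits; assuming the same element IDs as the loop above, it would look something like this:)

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# block for up to 10 seconds until the captcha input actually exists,
# instead of sleeping a fixed amount and hoping the page has loaded
input_element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, "captcha"))
)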

In the end, I think I got my "god run" within the first 5 attempts (not counting ones that were restarted very early, anyways). My 1m and 2m splits were slightly above average, then fell back to average by 5m, but I got really lucky in the last half and ended up finishing over a minute ahead of pace. It's also worth noting that I unintentionally started memorizing Enchanting Language as I went, to the point that I could recognize 50% of characters instantly, and another 30% without much trouble, only having to consult my pinned table ~20% of the time.

(Image: my custom lookup table, rearranged by character morphology. T, E, and M were the hardest to tell apart, whereas A, B, O, and R quickly became instantly recognizable.)

After much effort, we're rewarded with the flag:

uiuctf{i_knew_a_guy_in_highschool_that_could_read_this}

I became that guy. Overall, this was a fun (if painful) challenge, much thanks to tow_nater for writing it!

Here's my full source code (warning: it is messy). For some reason I decided to do everything in the same file, hence the do_train flag for whether to train the model or load it from the saved file.

import os
import random

source_images = os.listdir("captchas")
random.shuffle(source_images)

from PIL import Image
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim

TRAIN_SPLIT = 0.6
n = len(source_images)
train_images = source_images[:int(TRAIN_SPLIT * n)]
test_images = source_images[int(TRAIN_SPLIT * n):]

LABEL_LOOKUP = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def predict_sliding(im, elcnn, resolution=5):
    im = normalize_image(im)
    out = ""
    confidence = []
    for start in range(0, 250 - 50, resolution):
        window = np.moveaxis(im[0:74, start:start+50, :][None],3,1)
        window = torch.from_numpy(window).float()
        pred = torch.exp(elcnn(window)).detach().numpy()
        best = np.argmax(pred, 1)[0]
        out += LABEL_LOOKUP[best]
        confidence.append(pred[0,best])
    return out, confidence

def denoise(slide, confidence):
    basic_denoised = "".join(c for p,c,n in zip(slide,slide[1:],slide[2:]) if p == c == n)
    if len(set(basic_denoised)) == 5:
        out = ""
        for c in basic_denoised:
            if c not in out:
                out += c
        return out
    else:
        out = ""
        smoothed_confidence = []
        confidence = [confidence[0], confidence[0]] + confidence + [confidence[-1], confidence[-1]]
        for i in range(2,len(confidence)-2):
            smoothed_confidence.append(
                1 * confidence[i-2] + \
                4 * confidence[i-1] + \
                4 * confidence[i]   + \
                4 * confidence[i+1] + \
                1 * confidence[i+2])
        smoothed_confidence = [0] + smoothed_confidence + [0]
        for i in range(len(slide)):
            if smoothed_confidence[i+1] > smoothed_confidence[i] and \
               smoothed_confidence[i+1] > smoothed_confidence[i+2]:
                out += slide[i]
        return out
        

def show_image(im):
    Image.fromarray((255 * im).astype(np.uint8)).show()

def normalize_image(im):
    im = im.astype(np.float64)
    im = im - im.min()
    im = im / im.max()
    return im

def get_fifth(im, i):
    start = (250 * i) // 5
    return im[0:74, start:start+50, :]

def get_images(n_images, test):
    source = test_images if test else train_images
    images = random.sample(source, n_images)
    ims = [np.array(Image.open(os.path.join("captchas/", image))) \
           for image in images]
    labels = [image[:image.find("_")] for image in images]
    return ims, labels

def encode_labels(labels):
    idxs = [LABEL_LOOKUP.find(label) for label in labels]
    return np.array(idxs)

def get_batch(batch_size=64, test=False):
    ims, labels = get_images(batch_size, test)
    ims = [normalize_image(im) for im in ims]
    selections = [random.randint(0,4) for im in ims]
    fifths = [get_fifth(im, selection)[None] for selection, im in zip(selections, ims)]
    labels = [label[selection] for selection, label in zip(selections, labels)]
    return np.moveaxis(np.concatenate(fifths),3,1), encode_labels(labels)

# Base images are 250x75, so fifths will be 50x74,
# cropping the height from 75 to 74 to make the pooling nice.
class EnchantmentLanguageCNN(nn.Module):
    def __init__(self):
        super(EnchantmentLanguageCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, 3)
        self.relu1 = nn.LeakyReLU(0.1)
        self.conv2 = nn.Conv2d(16, 32, 3)
        self.relu2 = nn.LeakyReLU(0.1)
        self.maxpool = nn.MaxPool2d(2)
        self.conv3 = nn.Conv2d(32, 64, 3)
        self.relu3 = nn.LeakyReLU(0.1)
        self.fc1 = nn.Linear(21 * 33 * 64, 64)
        self.relu4 = nn.LeakyReLU(0.1)
        self.fc2 = nn.Linear(64, 26)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, x):
        x = self.conv1(x)
        x = self.relu1(x)
        x = self.conv2(x)
        x = self.relu2(x)
        x = self.maxpool(x)
        x = self.conv3(x)
        x = self.relu3(x)
        x = x.view(-1, 21 * 33 * 64)
        x = self.fc1(x)
        x = self.relu4(x)
        x = self.fc2(x)
        x = self.softmax(x)
        return x

elcnn = EnchantmentLanguageCNN()
opt = optim.Adam(elcnn.parameters(), lr=0.001)
loss = nn.NLLLoss()
NUM_EPOCHS = 50
NUM_MBS = 20
TEST_MBS = 5

do_train = False

if do_train:
    for epoch in range(NUM_EPOCHS):
        average_loss = 0
        for _ in range(NUM_MBS):
            opt.zero_grad()
            x, y = get_batch(64)
            x = torch.from_numpy(x).float()
            y = torch.from_numpy(y).long()
            objective = loss(elcnn(x), y)
            average_loss += objective.item() / NUM_MBS
            objective.backward()
            opt.step()

        print(f"Epoch {epoch+1}")
        print(" Training Loss = %0.04f" % average_loss)

        test_loss = 0
        test_acc = 0
        for _ in range(TEST_MBS):
            x, y = get_batch(64, test=True)
            x = torch.from_numpy(x).float()
            y = torch.from_numpy(y).long()
            h = elcnn(x)
            pred = np.argmax(torch.exp(h).detach().numpy(), 1)
            test_loss += loss(h, y).item() / TEST_MBS
            test_acc += np.mean(pred == y.numpy()) / TEST_MBS

        print(" Test Loss = %0.04f" % test_loss)
        print(" Test Accuracy = %0.04f" % test_acc)
        print()
    torch.save(elcnn.state_dict(), "model.pt")
else:
    elcnn.load_state_dict(torch.load("model.pt"))

def randomize_guess(ans):
    if random.random() < 0.5:
        idx = random.randint(0,len(ans)-1)
        ans = ans[:idx] + "J" + ans[idx:]
    while len(ans) < 5:
        idx = random.randint(0,len(ans)-1)
        ans = ans[:idx] + ans[idx] + ans[idx] + ans[idx+1:]
    while len(ans) > 5:
        candidates = [i for i in range(len(ans)-1) if ans[i] == ans[i+1]]
        if len(candidates) == 0:
            choice = random.randint(0,len(ans)-1)
        else:
            choice = random.choice(candidates)
        ans = ans[:choice] + ans[choice+1:]
    return ans

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
import base64, time

print("Ready to test against website!")
time.sleep(5)

driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get("https://captcha.chal.uiuc.tf/")

last = ""
failures = 0
minutes = 0
last_time = time.time()
FAILURE_THRESHOLD = 12
while True:
    if time.time() - last_time > 60:
        minutes += 1
        print(f"{minutes} minutes elapsed!")
        last_time = time.time()
    try:
        img = driver.find_element_by_class_name("captcha")
        input_element = driver.find_element_by_id("captcha")
        src = img.get_attribute("src")
        _, data = src.split(";")
        _, data = data.split(",")
        with open("cur_captcha.png","wb") as f:
            _ = f.write(base64.b64decode(data))
        im = np.array(Image.open("cur_captcha.png"))
        im = normalize_image(im)
        slide, conf = predict_sliding(im, elcnn)
        ans = denoise(slide, conf)
        if ans == last:
            failures += 1
        else:
            failures = 0
            print(f"Prediction: {ans}")
            last = ans
        if failures < FAILURE_THRESHOLD:
            if failures == 0 and len(ans) == 5:
                input_element.send_keys(ans)
                input_element.submit()
            elif 1 <= failures <= 3:
                guess = slide[failures-1::8]  # every 8th of the 40 windows ~ one per character (the "5 boxes" fallback)
                input_element.send_keys(guess)
                input_element.submit()
            else:
                guess = randomize_guess(ans)
                input_element.send_keys(guess)
                input_element.submit()
        elif failures == FAILURE_THRESHOLD:
            input_element.click()
    except Exception:
        time.sleep(0.01)
        continue
    time.sleep(0.07)