M30W@DiceGang
MuJS Official Repo
Challenge Resource & Solve Script
I played UIUCTF with DiceGang last weekend, and we took first place with only one challenge left unsolved. Sadly, I was busy on the second day of the CTF and only managed to work on Accounting Accidents and MuJS, with help from teammates.
MuJS was a pretty cool challenge, though: I had always wanted to try JS engine pwn but never really got into it. The fact that I failed to solve one-line JS (another MuJS challenge, from 0CTF 2020) was even more reason for me to do this one.
Here I'd like to share how my team managed to tackle the challenge.
In this challenge, the author provided a README that describes the goal of the challenge as well as some additional requirements.
While getting a shell is not required, we must implement three primitives commonly used in JS engine exploits:
- read32(address): Read an arbitrary 32-bit value from address.
- write32(address, value): Write an arbitrary 32-bit value at address.
- exec(address): Execute a function at address. We don't need control of any arguments.
These few lines caught our attention:
To ensure your exploit isn't trash, our server will run your exploit against 5 different builds of the same MuJS code, that we aren't going to give you.
All 5 builds are for Linux AARCH64 or X86-64 systems, with reasonable compiler options.
Hmm, so we can't use any features tied to a particular shared library, or do stuff that only works on a specific architecture. This might be troublesome in later stages, but let's worry about it later.
This section covers the prerequisite knowledge about MuJS internals. Readers who are already familiar with MuJS, or don't want to spend too much time on it, can feel free to skip to 0x04, where we discuss the vulnerable patch.
Object is one of the key components in every JS engine afaik, and MuJS is no exception. Apart from Object, related structures in MuJS such as Value and Property are also covered here.
Object is defined by struct js_Object
An interesting characteristic of MuJS Objects is that they do not support in-place edits. Any modification of an object will result in creating a new instance of it.
Value is defined by struct js_Value
Property is defined by struct js_Property
Digging deeper into the source code, we can understand how those three structures are linked together.
For every independent Object, there exists one corresponding Property (I'll refer to this as the Identity Property, aka Id_Prop, later).
Id_Props are stored in a balanced tree and serve lookup purposes. Whenever an Object is referenced in js code, the JS engine traverses the tree to find the corresponding Id_Prop. Finally, the Object can be dereferenced through the Value structure embedded within the Id_Prop.
** Note that although every independent Object must have a matching Property, the reverse is not necessarily true. Some values, such as numbers or strings, may be stored directly in a Value, which eliminates the need for a dedicated Object.
A rough graphic outline is shown below (Id_Prop of Object B Prototype is omitted here for simplicity) :
Since all structures are allocated dynamically upon use, and MuJS doesn't implement any object type isolation mechanism like gigacage in JSC, we can expect to see all kinds of structures scattered across the heap during runtime.
More details, such as how the lookup tree is balanced and how a sentinel node is used as the guarding leaf node, are also interesting but will not be covered here. Readers are encouraged to read the source code to learn about them.
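For readers who do want a taste of the lookup, here is a minimal C sketch of a binary-search-tree lookup that terminates on a shared sentinel node instead of NULL. It is an illustration under stated assumptions, not MuJS's code: the Prop struct and lookup() below are hypothetical simplifications, tree balancing is left out entirely, and the real js_Property carries more fields.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified property node; the real js_Property has more fields. */
typedef struct Prop {
    const char *name;
    struct Prop *left, *right;
    double value;              /* stand-in for the embedded js_Value */
} Prop;

/* One shared sentinel terminates every branch instead of NULL. */
static Prop sentinel = { "", &sentinel, &sentinel, 0 };

/* Walk the tree in strcmp() order until the name matches or the sentinel is hit. */
static Prop *lookup(Prop *node, const char *name)
{
    while (node != &sentinel) {
        int c = strcmp(name, node->name);
        if (c == 0)
            return node;
        node = (c < 0) ? node->left : node->right;
    }
    return NULL;               /* reached the sentinel: no such property */
}
```

Because every leaf points at the same sentinel, the loop needs a single termination test and never dereferences NULL.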
So we now know how Objects are linked together. The next step is to understand how the methods / attributes of an Object, or more precisely, the Internal Properties of Objects, are accessed. From now on, I will refer to object attributes as Obj_Attr.
MuJS handles JavaScript by parsing the raw source code into an AST and compiling it into bytecode. It then hands control over to a stack-based VM, which executes those opcodes.
Diagram of the workflow is shown below :
As mentioned above, the only part we care about is how Obj_Attr are retrieved in the VM, so let's focus on that. (The actual reason we are discussing this is that it will be used for the arbitrary function call in our exploit, but we'll stick with code review here and leave exploitation for later.)
The opcode for accessing an Obj_Attr by attribute name string is OP_GETPROP_S, which then calls jsR_getproperty(). jsR_getproperty() is just a wrapper around jsR_hasproperty():
You may notice that I've omitted part of the code below; the omitted part is basically some "special properties" or "fast paths" for specific object types. Since those hardcoded properties are unlikely to be of much use in exploitation, we'll ignore them and focus on the generic path.
The generic path searches for a Property with the correct name in jsV_getproperty(). If the Property is found, two possible branches follow:
1. call the ref->getter function object, if there is one;
2. otherwise, push ref->value.
The second route is the default behaviour, while the first route provides something like a hook. For those familiar with pwn, it should be obvious that by controlling the hook, there might be a chance to gain arbitrary code execution.
Let's dissect jsV_getproperty() first to see how an Obj_Attr is fetched.
Recall that in the previous section, we saw how Properties are chained in a tree-like structure, and that the tree is traversed whenever an object is referenced. lookup() is exactly where the tree traversal happens.
jsV_getproperty() first searches for the Property in the obj->properties tree. If nothing is found in obj->properties, it then falls back to walking the prototype chain for inherited Properties.
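A reduced C model of this two-stage search is sketched below. The Obj/Prop types and get_property() are hypothetical, and own properties live in a small array instead of the real balanced tree; the point is only the order: the object's own properties are consulted before any prototype, so an own property shadows an inherited one with the same name.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model: own properties in a small array instead of MuJS's tree. */
typedef struct Prop { const char *name; int value; } Prop;

typedef struct Obj {
    Prop props[4];
    int count;
    struct Obj *prototype;     /* NULL terminates the chain */
} Obj;

static Prop *own_lookup(Obj *obj, const char *name)
{
    for (int i = 0; i < obj->count; i++)
        if (strcmp(obj->props[i].name, name) == 0)
            return &obj->props[i];
    return NULL;
}

/* Search the object itself first, then each prototype in turn. */
static Prop *get_property(Obj *obj, const char *name)
{
    for (; obj != NULL; obj = obj->prototype) {
        Prop *p = own_lookup(obj, name);
        if (p != NULL)
            return p;
    }
    return NULL;
}
```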
Here is a program flow graph on the procedures covered above
That's about enough background knowledge, now it's time to see what the challenge is about.
Similar to most js engine pwnables, the author provided a patched version of MuJS. The patch basically consists of the following parts.
The README actually mentions that the bug is introduced by the patch to Ap_join(), a function used to join array contents into a single string.
The patch appears to be quite simple: some integers are replaced by uint16_t ones. Further inspection shows that those numbers are used as the malloc size and the string length tracker.
We immediately realized that this patch could lead to an integer overflow in the length tracker and, in turn, a heap buffer overflow.
Details about how to exploit this bug will be discussed in later sections.
The author also replaced the default allocator with a custom one, which looks like a simplified version of the magazine allocator used in macOS.
The custom allocator logic can be summarized as follows:
- malloc(): the allocator looks for the best-fit zone (the zone with the smallest chunk size that satisfies the request) and returns a chunk from it. If the target zone runs out of free chunks, an error is raised and the process aborts.
- free(): the chunk is released and linked into the free chunk list.
- realloc(): acts as a wrapper around malloc()/free(). Notably, it never shrinks chunks: in all cases, it malloc()s a new chunk of the desired size and memcpys the contents there.
Zones are initialized as below:
Notice that the chunks are placed into free list from low address to high address, this observation will be useful later.
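The best-fit rule for malloc() can be sketched as a tiny C function. The zone chunk sizes below are assumptions for illustration (the writeup only mentions a 0x80 zone), and the real allocator aborts instead of returning -1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical zone chunk sizes, sorted ascending; only 0x80 is from the writeup. */
static const size_t zone_sizes[] = { 0x10, 0x20, 0x40, 0x80, 0x100, 0x200, 0x400 };
#define NZONES (sizeof zone_sizes / sizeof zone_sizes[0])

/* Best fit: the first (smallest) zone whose chunk size satisfies the request.
   Returns -1 when no zone is large enough. */
static int pick_zone(size_t request)
{
    for (size_t i = 0; i < NZONES; i++)
        if (zone_sizes[i] >= request)
            return (int)i;
    return -1;
}
```

Under these assumed sizes, a 0x68-byte js_Object request lands in the 0x80 zone, so anything we want to sit next to an object has to be allocated from that zone too.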
And realloc code here:
Finally, a new object type, DataView, is introduced.
Quoting struct js_Object from the previous section, the inner structure of DataView is as below.
DataView basically acts as an ArrayBuffer (a JavaScript component missing in MuJS), and supports Uint8 / Uint16 / Uint32 accesses.
The Uint8 version of get/set functions is shown here
The fact that DataView supports raw memory operations makes it a juicy candidate for attacking, since once we succeed in hijacking its buffer pointer, arbitrary read/write is quite easy.
Noticeably, all DataView related functions have this line in it, which checks whether the Object type is correct.
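To make the role of that check concrete, here is a hedged C model of getUint8/setUint8. The View struct, its (data, length) field order, and the CLASS_REGEXP value are assumptions (0x11 as the DataView class comes from later in this writeup); the real functions throw a JS error rather than returning -1. Note that only the stored type and length are consulted, which is exactly what the type confusion later abuses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical class tags: 0x11 is from the writeup, the RegExp value is a placeholder. */
enum { CLASS_REGEXP = 0x9, CLASS_DATAVIEW = 0x11 };

typedef struct {
    int type;          /* first field, like js_Object.type */
    uint8_t *data;     /* assumed DataView payload: buffer pointer ... */
    uint32_t length;   /* ... followed by its length */
} View;

/* Returns -1 when the type or bounds check fails. */
static int get_uint8(const View *v, uint32_t idx)
{
    if (v->type != CLASS_DATAVIEW)   /* the per-method type check */
        return -1;
    if (idx >= v->length)            /* bounds check against the stored length */
        return -1;
    return v->data[idx];
}

static int set_uint8(View *v, uint32_t idx, uint8_t val)
{
    if (v->type != CLASS_DATAVIEW || idx >= v->length)
        return -1;
    v->data[idx] = val;
    return 0;
}
```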
Now we've got to know MuJS internal mechanisms, and reviewed the patch, time to work on pwning it.
Debugging tools make life easier. Good examples are describe() from JSC or %DebugPrint from v8.
However, there don't seem to be any similar tools in MuJS, so I decided to build one myself. Below is the snippet I used to patch MuJS.
debug() prints the u field of js_Value, which is a pointer to js_Object in most cases. This helps reduce the time spent finding objects in the heap.
readline() is a native function of MuJS. It is useful for setting breakpoints to attach gdb. The author commented it out in his patch, so I just uncommented and reclaimed it.
Our initial attack plan was quite simple: create a DataView object, and then use the heap overflow to overwrite its length field. This plan seemed feasible at first glance, but was soon called off due to some difficulties.
Recall that the DataView object would have a structure as below.
To overwrite the object.u.length field, we would first need to overwrite all the fields before it, including object.properties and object.prototype. While it is possible to leak a heap address through some crazy heap manipulation, we can't have any NULL bytes in the payload used for the overwrite. This means the two overwritten pointers are doomed to point to some inaccessible memory. Once again, recall that Obj_Attr are accessed through the lookup() function, which relies on those two pointers. Thus, if we clobber them, it is no longer possible to access the inherited methods that allow DataView to perform reads / writes.
At this point, we concluded that overwriting length directly is impossible. A new plan is needed.
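The NULL byte problem can be shown in a few lines of C. On x86-64 and AArch64 Linux, user-space pointers have zero upper bytes, so a little-endian pointer embedded in a string payload ends it early and strcat() never reaches the fields after it. bytes_strcat_copies() is a hypothetical helper, and the 16-byte offsets are arbitrary.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build a would-be overflow payload: filler, an 8-byte little-endian pointer,
   then more bytes aimed at a field after the pointer. Returns how many bytes
   strcat() would actually copy from it. */
static size_t bytes_strcat_copies(uint64_t ptr)
{
    char payload[64];
    memset(payload, 'A', 16);                      /* filler up to the pointer field */
    for (int i = 0; i < 8; i++)
        payload[16 + i] = (char)(ptr >> (8 * i));  /* little-endian pointer bytes */
    memset(payload + 24, 'B', 16);                 /* bytes meant for a later field */
    payload[40] = '\0';
    return strlen(payload);                        /* strcat() stops exactly here */
}
```

With a typical pointer such as 0x00007fffdeadbeef, only 6 of its 8 bytes are copied, and nothing aimed at the following field is ever written.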
Now we know editing a DataView object to fit our need is hard. How about trying to do some type confusion on other objects? Searching high and low, we concluded that two object types, userdata & RegExp, are good candidates for type confusion.
Let's look at the union field of DataView, userdata and RegExp to see why they were chosen as candidates.
It is easy to notice that both RegExp and userdata have pointers as their first two fields. This means that if we manage to get a type confusion, the length field of the DataView object would be taken from a pointer value, which is relatively large and would most likely allow OOB access.
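The effect of the confusion on those two fields can be modelled with a C union. The struct layouts are assumptions inferred from this writeup (RegExp starting with a prog and a source pointer, DataView with a data pointer followed by a 32-bit length); confuse() is a hypothetical helper, not MuJS code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical union members; field order assumed from the writeup. */
struct regexp_u   { void *prog; char *source; };
struct dataview_u { uint8_t *data; uint32_t length; };

union payload {
    struct regexp_u r;
    struct dataview_u dv;
};

/* Store a RegExp payload, then read the same bytes back as a DataView payload. */
static struct dataview_u confuse(void *prog, char *source)
{
    union payload u;
    memset(&u, 0, sizeof u);
    u.r.prog = prog;
    u.r.source = source;
    return u.dv;
}
```

On a little-endian 64-bit target, the resulting length is the low 32 bits of the source pointer, which for a heap address is typically a large number.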
The following question is, out of the two candidates, which should we use?
Also, most of us should have heard of RegExp before, but what exactly is userdata?
Reading the MuJS documents, I noticed this
Objects with the userdata class are provided to allow arbitrary C data to be attached to Javascript objects. A userdata object has a pointer to a block of raw memory, which is managed by the host. Userdata values cannot be created or modified in Javascript, only through the C API. This guarantees the integrity of data owned by the host program.
So userdata is actually a C API for developers, and we can't access it in js exploit scripts. This leaves RegExp as our only choice.
Next up is the type confusion itself. Readers might have noticed that there is a type field at the very start of the Object structure; this is exactly how MuJS differentiates between object types. type is actually just a number from enum js_Class, which we can easily change, given that it is the first field of Object.
Ok, suppose we have a RegExp object with type overwritten to DataView, but how can we access the DataView methods?
For normal DataView objects, the methods are inherited directly from the DataView prototype; however, the type-confused object we have at hand inherits its methods from the RegExp prototype. We must find a way to sneak the DataView methods into our originally-RegExp-now-DataView object.
It turns out JavaScript has this cool feature where we can refer to a prototype function with something like DataView.prototype.setUint32.
So with the js code below, we can actually add DataView methods to the Property tree of any arbitrary object.
Recall, as I mentioned earlier, that when accessing Object methods, lookup() will search object->properties first. This means that the code above has actually succeeded in transferring the DataView methods to our originally-RegExp object.
We've now got a solid plan, let's carry on to actually writing some exploit.
Up to this point, we have been treating the heap overflow as a given. Now we need to get our hands dirty and craft the actual payload.
As mentioned earlier, the bug came from patching some ints to uint16_t. Now let's zoom in on how the Ap_join() function actually works:
1. Set n (a uint16_t) to 1, pre-allocating space for the terminating NULL byte of the joined string.
2. For each element, take its strlen() and add it to n, then realloc() the chunk based on n and concat the strings. This is where stuff starts getting interesting.
Below is the code where the vulnerability resides. I've removed/replaced some irrelevant stuff with pseudo code to make it easier to read.
It is obvious that n += strlen(r) is the line where the overflow happens. When n wraps around, js_realloc() returns a new chunk whose size satisfies n but is probably not large enough to accommodate r, resulting in a heap BOF.
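The bookkeeping bug can be reproduced in isolation. This C sketch is a simplified, hypothetical model of the patched Ap_join() length tracking that ignores separators: the same sum is computed once with a uint16_t, as in the patch, and once with size_t for comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Patched bookkeeping: n starts at 1 for the terminator and accumulates each
   element's strlen(), but a uint16_t silently wraps modulo 0x10000. */
static uint16_t tracked_size(const size_t *lens, size_t count)
{
    uint16_t n = 1;
    for (size_t i = 0; i < count; i++)
        n += lens[i];
    return n;
}

/* What the joined string really needs, computed without truncation. */
static size_t needed_size(const size_t *lens, size_t count)
{
    size_t n = 1;
    for (size_t i = 0; i < count; i++)
        n += lens[i];
    return n;
}
```

For element lengths 0xffff and 0x10, the tracked size wraps to 0x10 while the joined string really needs 0x10010 bytes, so realloc() hands back a tiny chunk that strcat() then overruns.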
Our first step was to craft an array of many short strings that sum up to a large length and trigger the bug.
This immediately led to a heap exhaustion error. Fair enough.
So how can we do this in a more elegant way? The repeat() function is not present in MuJS, but we can easily create our own simplified version of repeat with the following. This code essentially produces an array filled with STRING, with a length that is a power of 2.
And now we can craft an array to trigger the overflow.
However, simply overflowing is not enough: since our plan is to overwrite obj.type of a RegExp object, fine-grained control over the joined string is necessary.
So let's examine how realloc() works when the overflow happens. The relevant code is shown below; once again, unrelated code is removed and some pseudo code is used.
On overflow, allocation_size is most likely larger than size, so we take the min = size path here. And since the original string is longer than min, the memcpy'd content will not contain any NULL bytes.
Summing that up, we can conclude that my_realloc() basically creates a chunk, fills it up to size (n in Ap_join()), and hands it back for strcat().
** The catch here is that since my_realloc() does not clear the chunk, the memcpy'd content may be immediately followed by non-NULL bytes. We will deal with this problem in the heap grooming section, but let's assume for now that the memcpy'd data is followed by a NULL byte.
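Here is a hedged C sketch of the realloc() behaviour summarized above. It is not the challenge allocator: libc malloc()/free() stand in for the custom zones, my_realloc_model() is a hypothetical name, and the caller passes old_alloc_size because the real allocator knows each chunk's size from its zone.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Always allocate a fresh chunk, copy min(old, new) bytes, free the old one.
   Nothing is shrunk in place and the new chunk is never cleared, so when the
   old contents were longer, the copy carries no NULL terminator. */
static void *my_realloc_model(void *old, size_t old_alloc_size, size_t size)
{
    void *fresh = malloc(size);
    if (fresh == NULL)
        return NULL;
    if (old != NULL) {
        size_t min = old_alloc_size < size ? old_alloc_size : size;
        memcpy(fresh, old, min);
        free(old);
    }
    return fresh;
}
```

Shrinking a 32-byte string of 'A's to 8 bytes leaves 8 'A's with no terminator, so whatever stale bytes follow decide where strcat() starts appending.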
Back to crafting the overflow. We already know that the size of js_Object is 0x68, so we need a chunk from the 0x80 zone to perform the intended overwrite. A favorable setup would be to have a large string as the first element of the array, and a smaller one to perform the strcat overflow. After some calculation, I arrived at the following array:
This script overflows 0x81 bytes of data, and the last byte will be 0x11 (the type of DataView). We need a 0x81-byte overflow instead of a 1-byte overflow because RegExp mallocs a chunk to store its source string, and we would like that string to be in the same zone as js_Object (the reason is discussed later in the heap grooming section).
So we get the overflower to realloc a chunk right above the victim.u.source string buffer; the 0x11 will then be written into victim.type, achieving our goal of changing victim to the DataView type.
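The overwrite itself can be simulated inside one byte array, so nothing outside the buffer is touched. The three-chunk layout, the helper names, and the original tag value are hypothetical, strcpy() stands in for the real strcat(), and only the 0x80 chunk size, the 0x81-byte overflow and the 0x11 DataView tag come from this section. It also assumes a little-endian int, which holds for both x86-64 and AArch64 Linux.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simulated slice of the 0x80 zone: [overflower][RegExp source][victim object]. */
#define CHUNK 0x80
#define DATAVIEW_TAG 0x11

static unsigned char zone[3 * CHUNK];

/* Copy a string 0x81 bytes longer than the overflower chunk. */
static void overflow_into_victim(void)
{
    char payload[2 * CHUNK + 2];
    memset(payload, 'A', 2 * CHUNK);       /* our chunk plus the whole source chunk */
    payload[2 * CHUNK] = DATAVIEW_TAG;     /* 0x81st overflowing byte hits victim.type */
    payload[2 * CHUNK + 1] = '\0';         /* terminator zeroes the next byte of type */
    strcpy((char *)zone, payload);         /* stays inside the zone array */
}

static int victim_type(void)
{
    int type;
    memcpy(&type, zone + 2 * CHUNK, sizeof type);
    return type;
}
```

The terminating NULL clears the second byte of type, so the field reads exactly 0x11 as long as its upper bytes were already zero.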
We have achieved heap overflow and type hijack, but only under a lot of assumption of where chunks are allocated. In this section, we are going to discuss the reasons behind certain heap layout as well as how to achieve it.
From my experience with 0CTF, the MuJS heap layout is "super dynamic", meaning it changes easily upon the slightest change to the js script. This is partially because 0CTF used the glibc default allocator, with a lot of heap consolidating/splitting happening throughout the process. Another reason is that MuJS tends to allocate lots of objects during setup and the JavaScript parsing phase. To defeat this, we first need to rethink our target and have a clear mental image of the heap layout needed.
Recall our ultimate goal is to gain AAR/AAW, and we would like to do so with Object type confusion. Now let's take a moment to think about it.
The type confusion mentioned earlier does grant OOB access, but is that truly arbitrary address access? Sadly… no.
The type confusion only provides access to the memory region between prog and prog + (unsigned int)source, where prog and source are the original RegExp's two pointer fields. So how do we transform this large OOB into true arbitrary address access?
The answer is quite simple: a master/slave construction. If we have another, slave DataView object within reach of the type-confused master, we can use master to modify the slave's data pointer, and then perform AAR/AAW with slave.
Let's try illustrating an ideal setup.
In this setup, slave Object will definitely be within master data access range. This is the very reason I would like to have master data (Previously RegExp source) allocated in the same zone as js_Object.
Finally, we need to figure out how to get this setup. The main obstacle to getting contiguous chunks allocated is that there are freed chunks scattered all around the heap, and we might accidentally acquire one while doing malloc(). Thus, it is necessary to eliminate those non-contiguous freed chunks.
The following script sprays a lot of js_Object sized chunks onto heap to take the non-contiguous chunks out of heap, and provides a non-fragmented (and pretty likely, completely fresh, unused before) heap for further operation.
Another benefit of this spraying snippet is that it handles the realloc memcpy non-NULL-terminated problem mentioned earlier, since the Ap_join string will most likely land in some new, unused area.
Getting our hands on a non-fragmented heap is neat, but this isn't quite enough to guarantee everything will be in the right place. The order of creating objects also matters.
I mentioned earlier that the custom heap's free linked list operates in a LIFO manner, and that chunks within an arena are placed into the free list from lower address to higher address. This means objects at higher addresses will be created before the ones at lower addresses. Summing all this up, we get the JavaScript code below.
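A toy C model of a single zone shows both effects at once. The zone size, chunk count, helper names and the memcpy-threaded free list are assumptions, not the challenge allocator's code. Initializing low to high puts the highest chunk on top, so fresh allocations walk downwards, while freed holes are handed out first, which is exactly what the spray drains.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy zone with a LIFO free list; sizes are hypothetical. */
#define NCHUNKS 16
#define CHUNK   0x80

static unsigned char arena[NCHUNKS * CHUNK];
static void *free_head;

static void zone_free(void *p)
{
    memcpy(p, &free_head, sizeof free_head);   /* store the old head inside the chunk */
    free_head = p;                             /* most recently freed is reused first */
}

static void zone_init(void)
{
    free_head = NULL;
    for (int i = 0; i < NCHUNKS; i++)          /* push low to high ... */
        zone_free(arena + (size_t)i * CHUNK);  /* ... so the highest chunk is on top */
}

static void *zone_malloc(void)
{
    void *p = free_head;
    if (p != NULL)
        memcpy(&free_head, p, sizeof free_head);
    return p;                                  /* the real allocator aborts when empty */
}
```

In the usage below, two scattered frees come back before any fresh chunk; once they are drained, allocations are contiguous and descending again.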
At this point, creating the AAR/AAW primitives is a piece of cake; just don't forget to assign the DataView.prototype methods to master before starting.
set master methods
javascript realization of primitives
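The primitive construction can be modelled in plain C, with ordinary memory standing in for the engine's heap. Everything here is hypothetical: the View layout, the 0x40 offset of the slave payload inside the master's window, and the helper names. It restores the slave's original data pointer after every access, a slightly more defensive variant of the restore-before-exit fix discussed near the end of this writeup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t *data; uint32_t length; } View;

static uint8_t region[0x100];      /* memory reachable through the master */
static uint8_t slave_buf[0x10];    /* the slave's legitimate backing store */
static View master = { region, sizeof region };
#define SLAVE_OFF 0x40             /* assumed location of the slave payload */
#define DATA_FIELD (SLAVE_OFF + offsetof(View, data))

static void slave_setup(void)
{
    View slave = { slave_buf, sizeof slave_buf };
    memcpy(region + SLAVE_OFF, &slave, sizeof slave);
}

/* Master-side write, bounds-checked against the master's own length. */
static void master_write(size_t off, const void *src, size_t n)
{
    assert(off + n <= master.length);
    memcpy(master.data + off, src, n);
}

static View slave_view(void)
{
    View v;
    memcpy(&v, region + SLAVE_OFF, sizeof v);
    return v;
}

/* Retarget the slave, access 4 bytes through it, then restore its pointer. */
static uint32_t read32(uintptr_t addr)
{
    uint8_t *saved = slave_view().data, *target = (uint8_t *)addr;
    uint32_t v;
    master_write(DATA_FIELD, &target, sizeof target);
    memcpy(&v, slave_view().data, sizeof v);
    master_write(DATA_FIELD, &saved, sizeof saved);
    return v;
}

static void write32(uintptr_t addr, uint32_t value)
{
    uint8_t *saved = slave_view().data, *target = (uint8_t *)addr;
    master_write(DATA_FIELD, &target, sizeof target);
    memcpy(slave_view().data, &value, sizeof value);
    master_write(DATA_FIELD, &saved, sizeof saved);
}
```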
We've reached the last part of the exploit: the arbitrary execute primitive. Now, with AAR/AAW at hand, how could we tackle this? The answer is getter functions.
Before going into the details, keep in mind that this is far from the simplest way to achieve arbitrary code execution. I only chose this exploit path to exercise my skills over manipulating jsengine objects, and to keep the exploit strictly within master/slave objects.
Now let's see how it is actually done. Recall that if a getter is present in some object attribute (a js_Property struct), it will be called when that attribute is retrieved. However, after reviewing the js_Property struct one more time, we can see that getter is a js_Object pointer rather than a function pointer.
This got me wondering about how js_call() actually works.
So I looked up the source code and figured out that js_call() handles 4 kinds of objects, and out of those 4, JS_CCFUNCTION is the one that handles raw C function pointers.
That should be enough information, let's draw a graph of the objects used on getter call.
The graph should have made it pretty clear that to call a fake getter, we would need to craft two structures, one fake Property and one fake JS_CCFUNCTION Object. These two objects can be crafted in master.data region to satisfy the "strictly contained" rule I mentioned earlier.
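A reduced C model of that path is sketched below. Only two ideas come from the writeup: a property's getter is an object pointer rather than a function pointer, and js_call() treats a CCFUNCTION-class object as a raw C function. The structs, enum values, the call()/get() helpers and target() are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*CFunc)(void);

enum Class { CLS_OBJECT, CLS_FUNCTION, CLS_SCRIPT, CLS_CFUNCTION };

typedef struct Object {
    enum Class type;
    CFunc function;            /* meaningful only for CLS_CFUNCTION */
} Object;

typedef struct Property {
    Object *getter;            /* an object pointer, not a function pointer */
    int value;
} Property;

static int target_hits;
static void target(void) { target_hits++; }   /* stand-in for the exec() address */

/* Dispatch on the object's class; a CFUNCTION object is called directly. */
static void call(Object *fn)
{
    if (fn->type == CLS_CFUNCTION)
        fn->function();
}

/* Attribute read: run the getter if one is present, otherwise return the value. */
static int get(Property *p)
{
    if (p->getter != NULL) {
        call(p->getter);
        return 0;
    }
    return p->value;
}
```

Faking the two structures then amounts to writing a Property whose getter points at an Object with the CFUNCTION tag and the target address in its function slot.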
Here is the code for execute primitive (u64/write64 are helper functions crafted from pure js code or previous primitives)
We finally got a locally working exploit after some 15 hours, and decided to test it on the remote server, only to realize stdout wasn't echoing back properly (stderr, on the other hand, had no such issues). After contacting the admins and while waiting for them to figure out what was wrong, we took our time trying all kinds of stuff to get the flag through stderr. Several attempts worked, such as using throw(flag) / eval(flag) instead of print(flag).
A bit later, challenge author @samsonites told me a possible cause might be stdout buffering. His guess was since our exploit script crashes at some point, stdout never gets flushed properly. After learning about this, I decided to review my exploit once again and try to avoid the crash.
It turns out that the root cause of crash is quite simple. We changed slave.data while performing AAR/AAW, but never bothered to change it back. So in the cleanup procedure, process tries to free stuff pointed to by slave.data and fails miserably. The solution is thus to store original slave.data pointer value, and restore it before exiting.
Source Scripts :
Complete Exploit Script
The setting of this challenge is pretty interesting, as it closely resembles a real life situation where we might have access to source code of some vulnerable project, but not the exact build running on attack target.
This led me to think about how it could be possible to attack such a grey-box model. Using MuJS as experiment material, I realized it is actually not that hard. This section is about how to go from grey-box to white-box given the primitives, and demos how I got a shell on one of the target binaries.
The main problem here is that we don't have access to the binary running on the remote server. But keep in mind that we've already got an AAR primitive. So if we manage to get our hands on any pointer into the binary, it is actually possible to exfiltrate the entire binary by doing AAR at offsets from that pointer. For most JS engines, pointers into the binary are inevitable, since there will be function pointers within objects.
Let's get working on a function for this.
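One common way to do this is to page-align a leaked pointer and walk downwards until a page starts with the ELF magic, then dump from there. The sketch below assumes the image is mapped contiguously from its ELF header, a 4 KiB scan step (64 KiB-aligned bases are also 4 KiB-aligned), and a little-endian target; read32_fn, find_elf_base(), dump() and the sim_* simulated memory are hypothetical helpers for trying it offline, not the author's function.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE 0x1000u
#define ELF_MAGIC 0x464c457fu   /* "\x7fELF" read as a little-endian u32 */

typedef uint32_t (*read32_fn)(uint64_t addr);

/* Walk backwards page by page from a leaked pointer to the ELF header.
   Loops forever if no header exists, so only use it on a real image pointer. */
static uint64_t find_elf_base(read32_fn read32, uint64_t leak)
{
    uint64_t page = leak & ~(uint64_t)(PAGE - 1);
    while (read32(page) != ELF_MAGIC)
        page -= PAGE;
    return page;
}

/* Copy n bytes (a multiple of 4) starting at base, one read32 call per word. */
static void dump(read32_fn read32, uint64_t base, uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; i += 4) {
        uint32_t w = read32(base + i);
        memcpy(out + i, &w, sizeof w);
    }
}

/* Simulated target memory so the sketch can be exercised without a target. */
#define SIM_BASE 0x555555554000ull
static uint8_t sim_mem[4 * PAGE];

static uint32_t sim_read32(uint64_t addr)
{
    uint32_t v;
    memcpy(&v, sim_mem + (addr - SIM_BASE), sizeof v);
    return v;
}
```

Against the real target, sim_read32 would be replaced by the AAR primitive, and the dump length taken from the program headers once the ELF header is recovered.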
After retrieving the binary, we can then retrieve shared library with pointers into it (these should be extractable from binary if shared libraries exist).
The next step after data exfiltration would be spawning a shell. The challenge here is that all remote processes run in a chroot, so I couldn't access /bin/sh or any other executables on the server.
To demonstrate remote get shell under such limitation, I implemented a minimal shell with 3 commands (ls, cat, exit) in assembly. Below is the code and screenshot of getting shell on first remote binary. (Utility codes used to parse dumped binary/run minimal shell can be found in my github repo).
Source Scripts :
>> Complete Get Shell Script
>> Minimal Reverse Shell
>> Memory Dump Parser
The writeup ended up far longer than I expected, but this much is required to cover all the stuff I learned here. And again, since I am new to jsengine pwning, it is possible for me to make mistakes. If you happen to spot anything wrong in this writeup, please let me know.
Overall, MuJS was a cool challenge. Thanks to Sigpwny for holding such an amazing CTF, samsonites for making this challenge, and Rob & pepsipu for helping me through this challenge.