---
title: "SMP WG, April 11, 2023"
tags: unikraft, smp, synchronization
datetime: 2023-04-11T10:00:00+02:00
location: Online, Discord (https://bit.ly/UnikraftDiscord), the `#monkey-business` voice channel
teams:
- smp
participants:
- RăzvanD
- Marc
- Michalis
- Sairaj
- Simon
- RăzvanV
- Andra
---
## :dart: Agenda
- Status update
- Next steps
## :closed_book: Discussions
RV: I had some medical issues.
I started to look on Friday.
RV: The task was to include buddy allocator with the VM API.
MR: The most simple step that you can do is to use the nopaging support.
Then synchronize the buddy allocator.
MR: Then decide where this allocator is getting memory from.
MR: Starting with nopaging is easier.
Starting with paging you could get on top of the memory allocator and this is more complex.
RV: There is already a PR on synchronizing ukallocbuddy, so this is done.
MR: Maybe this will synchronize paging operations.
First it would work if paging operations use directly `ukvmem`.
MR: Even if `ukvmem` is there, you can make paging API calls directly.
Of course, then you need to synchronize every level.
If we use `ukvmem`, you don't need to sublayers, such as the frame allocator.
Since the top-layer is synchronized, everything below that will be synchronized.
SK: You said you have a PR for the buddy allocator.
Was there something open?
RV: Andra, Simon and Sairaj are assigned as reviewers.
I need to look into the buddy allocator synchronization PR.
RV: I used the SMP API to test it.
There is a test inside the PR description.
MR: In `ukvmem` you still need to synchornized for internal data structures.
MR: Don't just think about correctness, but also about performance.
Maybe use reader-writer lock.
Play with different approaches to see if there are performance issues.
SK: I would look in the DPDK implementation for synchronization primitives around SMP.
MR: This is one of the additional topics: lock-free data structures.
MR: For now synchronize with standard synchronization primitives.
But for the future, we would look into lock-free structures.
MR: We have new synchronization primitives.
It's more about lock-free data structures.
RD/MR: We would have lock-free data structures as a next step.
Let's first focus on the synchronizing libraries (such as memory allocation).
MR: If, in case of mimalloc, there is a `mmap()` call, leave it like that.
You want to keep the code as unmodified as possible, so it is easier to update.
RV: We are now patching mimalloc to call the region allocator.
MR: As long as you don't have paging enabled, this is OK.
MR: How would the ukbuddyalloc will work with paging enabled?
Two approaches:
1) Create virtual memory area large enough to include all physical memory. Allocate physical memory by just accessing it and letting demand-paging do its magic. You can release physical memory via `madvise()`. Limitation: Out-of-memory will be a page fault that cannot be resolved.
2) Explicitly manage different areas in the buddy allocator and allocate/release these areas with `ukvmem`. Much like `mimalloc` works. Very explicit out-of-memory behavior during `malloc` if buddy allocator cannot get additional memory from `ukvmem`. Disadvantage: buddy allocator needs to manage multiple regions (there are linear searches in there).
MR: Not sure what the best approach is.
If multiple regions, there is overhead in managing multiple regions.
SK: You're talking about the buddy allocator being paging-aware?
MR: Yes, you could use `madvise()` to release memory for the buddy allocator.
SK: Something to keep track of is what regions are mapped and what are not mapped.
MR: In buddy allocator, you don't need to.
It's totally transparent with the amount of memory you have in the system.
If you do it explictly, you can ask to prepopulate entries to have the physical memory.
And handle in a more elegant manner if you run out of heap.
MR: Maybe it could be that it's more costly, depending on how the buddy allocator works behind the scenes.
SK: Personally, I would prefer a more explicitly allocate / deallocate.
I wouldn't use the `posix-mmap` API, but I would use the underlying `ukvmem` calls.
MR:
Currently we do it this way:
+ When we start with `ukvmem`, we create a large enough virtual memory area so that we can use all physical memory
+ The buddy allocator splits the heap into chunks of $2^n$ size. Whenever we need to allocate a certain size and there is no chunk of that size, we take a larger block and split it recusively until we have a new chunk of the desired size
+ However, physical memory is not explicitly assigned. Only on the first access to a page via demand-paging. That means, allocation of physical memory is totally transparent to the buddy allocator. Problem 1: out-of-memory behavior = crash on page fault. Problem 2: physical memory is never released
Better way to do it:
+ At start, still allocate a VMA that is large enough to include all physical memory (already done by `ukboot`)
+ Within the buddy allocator, we add to the block metadata a flag which indicates if the block has physical memory assigned. The flag must be **inherited** if the block is split.
+ For a configurable order $x$ (i.e., chunk sizes $>= 2^x$) we say that **no** physical memory should be assigned, except if a block of that size or larger has been directly requested. This is to release physical memory when possible while limiting overhead for assigning and freeing physical memory compared to doing it also for sizes $<2^x$.
+ At the beginning no physical memory is assigned. When we split a block of size $2^x$, we explicitly call `uk_vma_advise()` with `UK_VMA_ADV_WILLNEED` to one of the buddies (it will have size $2^{x-1}$) to assign physical memory to the region. The call will fail if not enough physical memory can be allocated. Since already some physical memory might have been allocated with the attempted advise, a call to `uk_vma_advise()` with `UK_VMA_ADV_DONTNEED` should follow an error to release any partially allocated physical memory again. On success, the flag for the block can be set, indicating that this block has physical memory assigned. The existing flow of allocation within the buddy allocator continues.
+ If the call to `uk_vma_advise()` fails since there is not enough physical memory, we can still continue to split the block to the desired block size. However, the flag will indicate that **no** physical memory is allocated yet.
+ Whenever the buddy allocator gives out a block of memory it checks the new flag and performs a `uk_vma_advise()` with `UK_VMA_ADV_WILLNEED` to make sure physical memory is allocated if the flag is not set. This will implicitly handle proper physical memory allocation for direct heap allocations with sizes $>=2^x$. And it will also enable a second attempt of physical memory allocation with a smaller size when we could not allocate a block of size $2^{x-1}$. In case this `uk_vma_advise()` fails, the heap allocation fails (again do a `UK_VMA_ADV_DONTNEED` on the range to make sure no physical memory is partially allocated to it).
+ When a heap allocation is freed and two buddies can be merged, we check if the order of the new block is $>=x$. In that case we release the physical memory with `UK_VMA_ADV_DONTNEED`. If only one of the buddies has physical memory assigned, release it no matter the order of the target block. The new block would have the flag **not** set.
+ HINT: Whenever physical memory is released using `UK_VMA_ADV_DONTNEED`, make sure you do not include the metadata in the freed range
SjK: The synchronization PR (476) done.
SjK: I did a report on the university.
MR: There are multiple layers that need synchronization.
One are is the frame allocator, because it will be used by multiple cores.
MR: There is also the idea of partitioning memory and leave it out of this hot path.
And you can ask at some point for extra memory: steal memory.
And synchronize that.
MR: The ideas is that, irrespective if you do partitioning or not, you require the `ukvmem` synchronization.
## :wrench: TODOs and Decisions
Synchronization PR: https://github.com/unikraft/unikraft/pull/476
MR: Review #476 for release 0.13
RV: Keep an eye for the ukalloc PR: https://github.com/unikraft/unikraft/pull/801
RV: Synchronize `ukvmem`.
SjK: Research on frame allocation synchronization on different OSes and research papers.