[Debug] IEEE 802.1X EAPoL & EAPoR authentication results in kernel oops
===
###### tags: `debug` `IEEE 802.1X` `EAPoL` `EAPoR` `kernel oops` `NULL pointer dereference` `memory overflow`
[ToC]
## Environment
First, we have an 802.1X authentication server, a switch acting as the authenticator, and a client acting as the supplicant. The switch is connected to the server, and the client connects to the switch, as shown below.
<div style="text-align: center"><img src="https://i.imgur.com/4BQw5jm.png"/></div>
Here, the authentication server is a Raspberry Pi running FreeRADIUS, the authenticator is an L2 managed switch supporting IEEE 802.1X, and the supplicant is a PC or a notebook.
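For reference, registering a test account in FreeRADIUS is a one-line entry in its `users` file (a minimal sketch: the credentials are placeholders, and the file path varies by FreeRADIUS version and packaging):
```
# /etc/freeradius/users -- hypothetical test entry
testuser    Cleartext-Password := "testpass"
```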
## Background
Before getting started, let's see what EAPoL and EAPoR are.
- EAPoL is EAP over LAN.
The communication between Supplicant and Authenticator is EAPoL.
- EAPoR is EAP over Radius.
The communication between Authenticator and Server is EAPoR.
- EAP is Extensible Authentication Protocol.
In brief, the message flow roughly looks like this:
```sequence
Supplicant->Authenticator: EAPoL
Authenticator-->Supplicant:EAPoL
Authenticator->Server: EAPoR
Server-->Authenticator:EAPoR
Supplicant->Authenticator: EAPoL
Authenticator-->Supplicant:EAPoL
```
The diagram below shows the EAPoL and EAPoR handshake.
```sequence
Note over Supplicant,Authenticator: EAPoL
Supplicant->Authenticator: (1)EAPoL-Start
Authenticator-->Supplicant:(2)EAP-Request/Identity
Supplicant->Authenticator: (3)EAP-Response/Identity
Note over Authenticator,Server: EAPoR
Authenticator->Server:(4)RADIUS Access-Request
Server-->Authenticator:(5)RADIUS Access-Challenge
Authenticator-->Supplicant:(6)EAP-Challenge-Request
Supplicant->Authenticator: (7)EAP-Challenge-Response
Authenticator->Server:(8)RADIUS Access-Request
Server-->Authenticator:(9)RADIUS Access-Accept
Authenticator-->Supplicant:(10)EAP-Success
Note over Supplicant,Authenticator: (11) Dynamic encryption keys created\n4-Way Handshake
Note over Supplicant,Authenticator: (12) Controlled port opens
```
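For concreteness, the EAPoL frames above carry a small fixed header right after the Ethernet header (EtherType 0x888E), followed by an EAP packet. A sketch of the two layouts in C (the struct names are ours; the field widths follow IEEE 802.1X and RFC 3748):
``` C
#include <stdint.h>

/* EAPoL header: follows the Ethernet header, EtherType 0x888E. */
struct eapol_hdr {
    uint8_t  version;    /* 1 = 802.1X-2001, 2 = 802.1X-2004 */
    uint8_t  type;       /* 0 = EAP-Packet, 1 = EAPOL-Start,
                          * 2 = EAPOL-Logoff, 3 = EAPOL-Key */
    uint16_t length;     /* body length, network byte order */
} __attribute__((packed));

/* EAP packet header (RFC 3748): carried in an EAPoL EAP-Packet body on
 * the supplicant side, and inside the RADIUS EAP-Message attribute on
 * the EAPoR side -- the switch relays it between the two. */
struct eap_hdr {
    uint8_t  code;       /* 1 = Request, 2 = Response, 3 = Success, 4 = Failure */
    uint8_t  identifier; /* matches a Response to its Request */
    uint16_t length;     /* total EAP packet length, network byte order */
    /* a Type byte (e.g. 1 = Identity) follows for Request/Response */
} __attribute__((packed));
```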
## Symptoms
However, when the user on the client logged in through EAPoL, the switch hit a kernel oops with the logs below.
The failure does not happen every time; at first it seemed to occur randomly. After trying again and again, I found that a particular pattern of username and password reliably triggers it.
:::spoiler
```
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = cb9cc000
[00000000] *pgd=6c91e831, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1] SMP ARM
Modules linked in: usb_storage linux_bcm_net(PO) linux_uk_proxy(PO) linux_bcm_core(PO) linux_kernel_bde(PO)
CPU: 0 Tainted: P O (3.6.5-Broadcom Linux #1)
PC is at put_page+0xc/0x50
LR is at skb_release_data+0x94/0xf0
pc : [<c007a3c0>] lr : [<c0268208>] psr: 20000013
sp : cb919d20 ip : cb919d30 fp : cb919d2c
r10: 0000007b r9 : ffffffea r8 : 0000007b
r7 : cb919ee4 r6 : cb85bc00 r5 : cb8d3780 r4 : 00000001
r3 : cb8d8ac0 r2 : 00000004 r1 : 00000000 r0 : 00000000
Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
Control: 10c53c7d Table: 6c9cc04a DAC: 00000015
Process 1x_srv (pid: 691, stack limit = 0xcb918270)
Stack: (0xcb919d20 to 0xcb91a000)
9d20: cb919d44 cb919d30 c0268208 c007a3c0 cb8d3780 cb8d3780 cb919d5c cb919d48
9d40: c0267e20 c0268180 00000001 cb85bc00 cb919d6c cb919d60 c0267f60 c0267e10
9d60: cb919d84 cb919d70 c026ba88 c0267f18 00000000 00000060 cb919ddc cb919d88
9d80: c0322688 c026ba7c cb919dbc cb919d98 c0009a5c 00000640 0000007b cb919ee4
9da0: 00000000 00008e88 00000000 cb85bc00 00000000 ce47ed80 00000060 00000640
9dc0: cb919ee4 ce8f18c0 cb918000 020045f0 cb919ecc cb919de0 c025e9a4 c03221fc
9de0: 00000060 cb919df0 cb919e14 cb919df8 00000060 00000640 ce47ed80 00000001
9e00: 00000000 cb919ee4 c07bb300 c07bb348 c07bb300 ce8f18c0 cb919e4c cb919e28
9e20: c004b45c c004ac00 00000000 00000000 00000000 00000001 ffffffff 00000000
9e40: 00000000 00000000 00000000 00000000 ce8f18c0 00000000 00000000 00000000
9e60: 00000000 00000000 cb919de8 00000000 00000000 00000000 00000000 00000000
9e80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9ea0: 00000000 00000000 cb919ecc 00000640 ce47ed80 00000060 cb919f00 be988c5c
9ec0: cb919fa4 cb919ed0 c0260e30 c025e91c 0000c350 fffffff7 00000000 0200466b
9ee0: 000005c5 cb919f00 00000012 cb919edc 00000001 00000000 00000000 cb919f08
9f00: 8e880011 0000001e 06020001 684ce000 cb912800 cb919f88 05f6a450 00000000
9f20: cb919f21 00000000 00000000 00000000 2fd5672a 00000011 2fd4a3da 00000011
9f40: c0040b44 c07ba6d0 00000000 00000000 00000000 be988b78 cb919fa4 00000000
9f60: be988c80 001699dc 000000a2 c0009304 cb918000 00000000 cb919fa4 cb919f88
9f80: be988c5c be988c58 00000020 00000124 c0009304 00000000 00000000 cb919fa8
9fa0: c00091a0 c0260dac be988c5c be988c58 00000007 020045f0 00000640 00000020
9fc0: be988c5c be988c58 00000020 00000124 020045f0 00000007 00000007 01fda2f8
9fe0: 00000000 be988c30 00010488 b6df5464 60000010 00000007 00000000 00000000
Backtrace:
[<c007a3b4>] (put_page+0x0/0x50) from [<c0268208>] (skb_release_data+0x94/0xf0)
[<c0268174>] (skb_release_data+0x0/0xf0) from [<c0267e20>] (__kfree_skb+0x1c/0xd4)
r5:cb8d3780 r4:cb8d3780
[<c0267e04>] (__kfree_skb+0x0/0xd4) from [<c0267f60>] (consume_skb+0x54/0x58)
r4:cb85bc00 r3:00000001
[<c0267f0c>] (consume_skb+0x0/0x58) from [<c026ba88>] (skb_free_datagram+0x18/0x40)
[<c026ba70>] (skb_free_datagram+0x0/0x40) from [<c0322688>] (packet_recvmsg+0x498/0x4b0)
r4:00000060 r3:00000000
[<c03221f0>] (packet_recvmsg+0x0/0x4b0) from [<c025e9a4>] (sock_recvmsg+0x94/0xa8)
[<c025e910>] (sock_recvmsg+0x0/0xa8) from [<c0260e30>] (sys_recvfrom+0x90/0xe0)
r8:be988c5c r7:cb919f00 r6:00000060 r5:ce47ed80 r4:00000640
[<c0260da0>] (sys_recvfrom+0x0/0xe0) from [<c00091a0>] (ret_fast_syscall+0x0/0x30)
Code: c04ddcfc e1a0c00d e92dd800 e24cb004 (e5902000)
---[ end trace 5d55fb4d3fa0bbdb ]---
Kernel panic - not syncing: Fatal exception
Watchdog Timeout
```
:::
## Root cause analysis
### Prerequisite
In this case, we should first clarify where the bug lives:
:::info
1. In EAPoL or EAPoR?
2. In the kernel space or user space?
:::
### Hypothesis
According to the backtrace, we think this failure is caused by **accessing a NULL pointer**. Working from the top down, we boldly hypothesize that the bug hides in the EAPoR (EAP over RADIUS) code or in the kernel.
**[Hypothesis 1]** This pattern of username is specially handled or restricted by our code.
After testing again and again, we found that it depends on **the length of the username**: if the username is 9 bytes long, it always fails.
**[Hypothesis 2]** The memory buffer storing the username is not big enough.
After tracing the EAPoL and EAPoR code, we found the buffer is large enough, and authentication passes even when the username is longer than 9 bytes.
**[Hypothesis 3]** The packets received or sent during the ~~EAPoL~~ EAPoR handshake cause this catastrophe.
As an experiment, we disconnected the link between the server and the switch before submitting the username and password, and then submitted them. The issue could not be reproduced.
:::success
This proved that the issue occurs during the handshake between the switch and the server, i.e., **EAPoR**.
:::
**[Hypothesis 4]** Something goes wrong in the skb when the switch sends EAPoR packets to the server.
After tracing the EAPoR code and comparing it against the debug messages on the COM port, we confirmed that the kernel oops occurs after the packets have been sent from the switch to the server.
==So, the bug is on the ++RX side of the switch++.==
**[Hypothesis 5]** Something goes wrong in the skb when the switch receives EAPoR packets from the server.
After setting up a sniffer to capture the packets between the switch and the server, we found the **"key"** packet that triggers the kernel oops: the switch crashes right after receiving it.
We rebooted the switch and regenerated this ***bomb packet*** with Colasoft. This proved our hypothesis: every time the switch received the bomb packet, it crashed!
:::success
It turned out the bug was **in kernel space**. Therefore, we have to check our ++device driver++ or the Linux kernel.
:::
**[Hypothesis 6]** The device driver does not handle received packets correctly in the bottom half of the ISR.
:::danger
The length of the *bomb packet* is ***118 bytes*** and its ***EtherType is 0x888E***.
:::
We generated packets of the same length but with a different EtherType, and everything went well.
:::success
After tracing the driver code, we found the bug in the ISR path that allocates the skb for IEEE 802.1X frames:
we did not allocate enough memory to receive these packets. In other words, it is a **++memory overflow++** issue.
:::
## Brainstorm
The process described above is top-down debugging. Here, based on the clues gathered so far, I try to show the other approach: bottom-up debugging.
I programmed a testing tool to send IEEE 802.1X packets of various lengths to the switch, and it turns out that the failure is not limited to 118-byte packets; other lengths fail as well.
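A reconstruction of that tool is sketched below. It sends one IEEE 802.1X frame of an arbitrary length through an AF_PACKET raw socket; the interface name (`eth0`) and the zeroed EAPoL body are assumptions, while the PAE group address and EtherType 0x888E come from the standard. Build with gcc and run as root, e.g. `./eapol_test 118`.
``` C
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <linux/if_ether.h>   /* ETH_P_PAE (0x888E), ETH_ALEN */
#include <linux/if_packet.h>  /* struct sockaddr_ll */
#include <net/if.h>           /* if_nametoindex() */

int main(int argc, char **argv)
{
    unsigned char frame[1514] = {0};
    const unsigned char dst[ETH_ALEN] = {0x01, 0x80, 0xC2, 0x00, 0x00, 0x03};
    struct sockaddr_ll addr = {0};
    int len = (argc > 1) ? atoi(argv[1]) : 118;  /* frame length under test */
    int sock = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_PAE));

    if (sock < 0) {
        perror("socket");
        return 1;
    }

    memcpy(frame, dst, ETH_ALEN);       /* bytes 0-5: 802.1X PAE group MAC */
                                        /* bytes 6-11: src MAC left zeroed */
    frame[12] = 0x88;                   /* EtherType 0x888E                */
    frame[13] = 0x8E;

    addr.sll_family  = AF_PACKET;
    addr.sll_ifindex = if_nametoindex("eth0");   /* assumed interface */
    addr.sll_halen   = ETH_ALEN;
    memcpy(addr.sll_addr, dst, ETH_ALEN);

    if (sendto(sock, frame, len, 0,
               (struct sockaddr *)&addr, sizeof(addr)) < 0)
        perror("sendto");
    close(sock);
    return 0;
}
```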
Out of curiosity, I started to dig into the skb allocation and deallocation paths in the kernel.
## SKB Free
### Sequence
`__kfree_skb` -> `skb_release_data` -> `skb_frag_unref` -> `__skb_frag_unref` -> `put_page`
### Explanation
In **skb_release_data**, we crash inside the for loop shown below.
``` C
static void skb_release_data(struct sk_buff *skb)
{
    ...
    if (skb_shinfo(skb)->nr_frags) {
        int i;
        for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
            skb_frag_unref(skb, i);
    }
    ...
}
```
This piece of code frees the pages of paged (fragment) data, but our case does not use any paged data at all. The following is quoted from David S. Miller's SKB notes (see the skb_data reference below):
>Things start to get a little bit more complicated once paged data begins to be used. For the most part the ability to use [page, offset, len] tuples for SKB data came about so that file system file contents could be directly sent over a socket. But, as it turns out, it is sometimes beneficial to use this for normal buffering of process sendmsg() data.
>
>It must be understood that once paged data starts to be used on an SKB, this puts a specific restriction on all future SKB data area operations. In particular, it is no longer possible to do skb_put() operations.
>
>We will now mention that there are actually two length variables associated with an SKB, len and data_len. The latter only comes into play when there is paged data in the SKB. skb->data_len tells how many bytes of paged data there are in the SKB. From this we can derive a few more things:
>
>The existence of paged data in an SKB is indicated by skb->data_len being non-zero. This is codified in the helper routine skb_is_nonlinear(), so that is the function you should use to test this.
>
>The amount of non-paged data at skb->data can be calculated as skb->len - skb->data_len. Again, there is a helper routine already defined for this called skb_headlen() so please use that.
>
>The main abstraction is that, when there is paged data, the packet begins at skb->data for skb_headlen(skb) bytes, then continues on into the paged data area for skb->data_len bytes. That is why it is illogical to try and do an skb_put(skb) when there is paged data. You have to add data onto the end of the paged data area instead.
>
>Each chunk of paged data in an SKB is described by the following structure:
>
>    struct skb_frag_struct {
>        struct page *page;
>        __u16 page_offset;
>        __u16 size;
>    };
>
>There is a pointer to the page (which you must hold a proper reference to), the offset within the page where this chunk of paged data starts, and how many bytes are there.
>
>The paged frags are organized into an array in the shared SKB area, defined by this structure:
>
>    #define MAX_SKB_FRAGS (65536/PAGE_SIZE + 2)
>
>    struct skb_shared_info {
>        atomic_t dataref;
>        unsigned int nr_frags;
>        unsigned short tso_size;
>        unsigned short tso_segs;
>        struct sk_buff *frag_list;
>        skb_frag_t frags[MAX_SKB_FRAGS];
>    };
>
>The nr_frags member states how many frags there are active in the frags[] array. The tso_size and tso_segs is used to convey information to the device driver for TCP segmentation offload. The frag_list is used to maintain a chain of SKBs organized for fragmentation purposes, it is _not_ used for maintaining paged data. And finally the frags[] holds the frag descriptors themselves.
After printing this field in the kernel code, we found that the kernel oops occurs when `nr_frags` holds a non-zero value, a value we ourselves wrote due to the memory overflow. The garbage frag descriptor then makes `put_page()` dereference a NULL page pointer. As a consequence, <font color="red">**virtual address 00000000 is accessed**</font>.
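The instrumentation was essentially a conditional printk in the free path; roughly (a sketch from memory, not the verbatim patch):
``` C
/* Sketch of our instrumentation in skb_release_data(): a linear skb
 * (no paged data) must have nr_frags == 0 here, so any non-zero value
 * exposes a corrupted skb_shared_info. */
if (skb_shinfo(skb)->nr_frags)
    printk(KERN_ERR "skb %p: nr_frags=%u len=%u data_len=%u\n",
           skb, skb_shinfo(skb)->nr_frags, skb->len, skb->data_len);
```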
:::info
Now we are going to dig further into why it only fails in some cases, i.e.,
when exactly do we miswrite this field?
:::
https://elixir.bootlin.com/linux/v3.6.5/source/net/core/skbuff.c#L581
``` C=
//File: net/core/skbuff.c

/**
 * __kfree_skb - private function
 * @skb: buffer
 *
 * Free an sk_buff. Release anything attached to the buffer.
 * Clean the state. This is an internal helper function. Users should
 * always call kfree_skb
 */
void __kfree_skb(struct sk_buff *skb)
{
    skb_release_all(skb);
    kfree_skbmem(skb);
}

/* Free everything but the sk_buff shell. */
static void skb_release_all(struct sk_buff *skb)
{
    skb_release_head_state(skb);
    skb_release_data(skb);
}

static void skb_release_data(struct sk_buff *skb)
{
    if (!skb->cloned ||
        !atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1,
                           &skb_shinfo(skb)->dataref)) {
        if (skb_shinfo(skb)->nr_frags) {
            int i;
            for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
                skb_frag_unref(skb, i);
        }

        /*
         * If skb buf is from userspace, we need to notify the caller
         * the lower device DMA has done;
         */
        if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) {
            struct ubuf_info *uarg;

            uarg = skb_shinfo(skb)->destructor_arg;
            if (uarg->callback)
                uarg->callback(uarg);
        }

        if (skb_has_frag_list(skb))
            skb_drop_fraglist(skb);

        skb_free_head(skb);
    }
}

/**
 * skb_frag_unref - release a reference on a paged fragment of an skb.
 * @skb: the buffer
 * @f: the fragment offset
 *
 * Releases a reference on the @f'th paged fragment of @skb.
 */
static inline void skb_frag_unref(struct sk_buff *skb, int f)
{
    __skb_frag_unref(&skb_shinfo(skb)->frags[f]);
}

/**
 * __skb_frag_unref - release a reference on a paged fragment.
 * @frag: the paged fragment
 *
 * Releases a reference on the paged fragment @frag.
 */
static inline void __skb_frag_unref(skb_frag_t *frag)
{
    put_page(skb_frag_page(frag));
}
```
## SKB Allocation
### Sequence
`dev_alloc_skb` -> `netdev_alloc_skb` -> `__netdev_alloc_skb`
### Explanation
In **__netdev_alloc_skb**, the skb is allocated with size **fragsz**, which covers the cache-aligned data buffer plus the cache-aligned shared info.
In our case, we allocate an skb of size X (valid from skb->data[0] to skb->data[X-1]) but write data to skb->data[X] and beyond. When X rounds up flush against the alignment boundary, that stray write lands on the first byte of the `skb_shared_info` struct, i.e., `nr_frags`, and this mistake triggers the catastrophe.
``` C
struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
                                   unsigned int length, gfp_t gfp_mask)
{
    ...
    unsigned int fragsz = SKB_DATA_ALIGN(length + NET_SKB_PAD) +
                          SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
    ...
}
```
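This also explains why only certain packet lengths crash: the data area is rounded up to a cache-line multiple, so a small overrun is normally absorbed by the alignment slack and only reaches `nr_frags` when the requested length sits flush against the boundary. A small user-space check of that slack (the 32-byte cache line is an assumption, as is the number of extra bytes our MAC actually writes):
``` C
#include <stdio.h>

/* User-space mirror of SKB_DATA_ALIGN()/NET_SKB_PAD, assuming
 * SMP_CACHE_BYTES = 32 and NET_SKB_PAD = 32. */
#define CACHE_BYTES 32
#define PAD         32
#define ALIGN_UP(x) (((x) + CACHE_BYTES - 1) & ~(CACHE_BYTES - 1))

int main(void)
{
    unsigned int pkt_size;

    /* slack = bytes between the end of the packet data and
     * skb_shared_info; writing more than `slack` extra bytes
     * lands on nr_frags. */
    for (pkt_size = 110; pkt_size <= 130; pkt_size++)
        printf("pkt_size=%3u -> slack=%2u\n", pkt_size,
               ALIGN_UP(pkt_size + PAD) - (pkt_size + PAD));
    return 0;
}
```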
https://elixir.bootlin.com/linux/v3.6.5/source/include/linux/skbuff.h#L388

``` C=
//File: include/linux/skbuff.h

//SMP_CACHE_BYTES = 32 or 64
#define SKB_DATA_ALIGN(X) (((X) + (SMP_CACHE_BYTES - 1)) & \
                           ~(SMP_CACHE_BYTES - 1))

#define NET_SKB_PAD max(32, L1_CACHE_BYTES)

struct skb_shared_info {
    unsigned char nr_frags;
    __u8 tx_flags;
    unsigned short gso_size;
    /* Warning: this field is not always filled in (UFO)! */
    unsigned short gso_segs;
    unsigned short gso_type;
    struct sk_buff *frag_list;
    struct skb_shared_hwtstamps hwtstamps;
    __be32 ip6_frag_id;

    /*
     * Warning : all fields before dataref are cleared in __alloc_skb()
     */
    atomic_t dataref;

    /* Intermediate layers must ensure that destructor_arg
     * remains valid until skb destructor */
    void *destructor_arg;

    /* must be last field, see pskb_expand_head() */
    skb_frag_t frags[MAX_SKB_FRAGS];
};

/* legacy helper around netdev_alloc_skb() */
static inline struct sk_buff *dev_alloc_skb(unsigned int length)
{
    return netdev_alloc_skb(NULL, length);
}

/**
 * netdev_alloc_skb - allocate an skbuff for rx on a specific device
 * @dev: network device to receive on
 * @length: length to allocate
 *
 * Allocate a new &sk_buff and assign it a usage count of one. The
 * buffer has unspecified headroom built in. Users should allocate
 * the headroom they think they need without accounting for the
 * built in space. The built in space is used for optimisations.
 *
 * %NULL is returned if there is no free memory. Although this function
 * allocates memory it can be called from an interrupt.
 */
static inline struct sk_buff *netdev_alloc_skb(struct net_device *dev,
                                               unsigned int length)
{
    return __netdev_alloc_skb(dev, length, GFP_ATOMIC);
}
```
``` C=
static void myRxISR(void *pkt, int pkt_size, ...)
{
    struct sk_buff *skb;

    /* BUG: pkt_size does not cover every byte the hardware hands us,
     * so for "flush" lengths the write overruns into skb_shared_info. */
    skb = dev_alloc_skb(pkt_size);
}
```
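The shape of the fix follows directly: reserve tailroom for every byte the hardware may hand us beyond `pkt_size`. A minimal sketch, assuming a 4-byte margin (`RX_EXTRA_BYTES` is our placeholder, not the value from the actual driver):
``` C=
#define RX_EXTRA_BYTES 4   /* assumed margin, e.g. for a trailing FCS */

static void myRxISR(void *pkt, int pkt_size, ...)
{
    struct sk_buff *skb;

    /* Allocate room for pkt_size plus whatever the MAC/DMA appends,
     * so a "flush" length can no longer spill into skb_shared_info. */
    skb = dev_alloc_skb(pkt_size + RX_EXTRA_BYTES);
    if (!skb)
        return;            /* drop the frame on allocation failure */
    memcpy(skb_put(skb, pkt_size), pkt, pkt_size);
}
```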
https://elixir.bootlin.com/linux/v3.6.5/source/net/core/skbuff.c
``` C=
/**
 * __netdev_alloc_skb - allocate an skbuff for rx on a specific device
 * @dev: network device to receive on
 * @length: length to allocate
 * @gfp_mask: get_free_pages mask, passed to alloc_skb
 *
 * Allocate a new &sk_buff and assign it a usage count of one. The
 * buffer has unspecified headroom built in. Users should allocate
 * the headroom they think they need without accounting for the
 * built in space. The built in space is used for optimisations.
 *
 * %NULL is returned if there is no free memory.
 */
struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
                                   unsigned int length, gfp_t gfp_mask)
{
    struct sk_buff *skb = NULL;
    unsigned int fragsz = SKB_DATA_ALIGN(length + NET_SKB_PAD) +
                          SKB_DATA_ALIGN(sizeof(struct skb_shared_info));

    if (fragsz <= PAGE_SIZE && !(gfp_mask & (__GFP_WAIT | GFP_DMA))) {
        void *data;

        if (sk_memalloc_socks())
            gfp_mask |= __GFP_MEMALLOC;

        data = __netdev_alloc_frag(fragsz, gfp_mask);

        if (likely(data)) {
            skb = build_skb(data, fragsz);
            if (unlikely(!skb))
                put_page(virt_to_head_page(data));
        }
    } else {
        skb = __alloc_skb(length + NET_SKB_PAD, gfp_mask,
                          SKB_ALLOC_RX, NUMA_NO_NODE);
    }
    if (likely(skb)) {
        skb_reserve(skb, NET_SKB_PAD);
        skb->dev = dev;
    }
    return skb;
}
EXPORT_SYMBOL(__netdev_alloc_skb);

/**
 * __alloc_skb - allocate a network buffer
 * @size: size to allocate
 * @gfp_mask: allocation mask
 * @flags: If SKB_ALLOC_FCLONE is set, allocate from fclone cache
 *     instead of head cache and allocate a cloned (child) skb.
 *     If SKB_ALLOC_RX is set, __GFP_MEMALLOC will be used for
 *     allocations in case the data is required for writeback
 * @node: numa node to allocate memory on
 *
 * Allocate a new &sk_buff. The returned buffer has no headroom and a
 * tail room of at least size bytes. The object has a reference count
 * of one. The return is the buffer. On a failure the return is %NULL.
 *
 * Buffers may only be allocated from interrupts using a @gfp_mask of
 * %GFP_ATOMIC.
 */
struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
                            int flags, int node)
{
    struct kmem_cache *cache;
    struct skb_shared_info *shinfo;
    struct sk_buff *skb;
    u8 *data;
    bool pfmemalloc;

    cache = (flags & SKB_ALLOC_FCLONE)
            ? skbuff_fclone_cache : skbuff_head_cache;

    if (sk_memalloc_socks() && (flags & SKB_ALLOC_RX))
        gfp_mask |= __GFP_MEMALLOC;

    /* Get the HEAD */
    skb = kmem_cache_alloc_node(cache, gfp_mask & ~__GFP_DMA, node);
    if (!skb)
        goto out;
    prefetchw(skb);

    /* We do our best to align skb_shared_info on a separate cache
     * line. It usually works because kmalloc(X > SMP_CACHE_BYTES) gives
     * aligned memory blocks, unless SLUB/SLAB debug is enabled.
     * Both skb->head and skb_shared_info are cache line aligned.
     */
    size = SKB_DATA_ALIGN(size);
    size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
    data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc);
    if (!data)
        goto nodata;
    /* kmalloc(size) might give us more room than requested.
     * Put skb_shared_info exactly at the end of allocated zone,
     * to allow max possible filling before reallocation.
     */
    size = SKB_WITH_OVERHEAD(ksize(data));
    prefetchw(data + size);

    /*
     * Only clear those fields we need to clear, not those that we will
     * actually initialise below. Hence, don't put any more fields after
     * the tail pointer in struct sk_buff!
     */
    memset(skb, 0, offsetof(struct sk_buff, tail));
    /* Account for allocated memory : skb + skb->head */
    skb->truesize = SKB_TRUESIZE(size);
    skb->pfmemalloc = pfmemalloc;
    atomic_set(&skb->users, 1);
    skb->head = data;
    skb->data = data;
    skb_reset_tail_pointer(skb);
    skb->end = skb->tail + size;
#ifdef NET_SKBUFF_DATA_USES_OFFSET
    skb->mac_header = ~0U;
#endif

    /* make sure we initialize shinfo sequentially */
    shinfo = skb_shinfo(skb);
    memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
    atomic_set(&shinfo->dataref, 1);
    kmemcheck_annotate_variable(shinfo->destructor_arg);

    if (flags & SKB_ALLOC_FCLONE) {
        struct sk_buff *child = skb + 1;
        atomic_t *fclone_ref = (atomic_t *) (child + 1);

        kmemcheck_annotate_bitfield(child, flags1);
        kmemcheck_annotate_bitfield(child, flags2);
        skb->fclone = SKB_FCLONE_ORIG;
        atomic_set(fclone_ref, 1);

        child->fclone = SKB_FCLONE_UNAVAILABLE;
        child->pfmemalloc = pfmemalloc;
    }
out:
    return skb;
nodata:
    kmem_cache_free(cache, skb);
    skb = NULL;
    goto out;
}
EXPORT_SYMBOL(__alloc_skb);
```
# Reference
https://elixir.bootlin.com/linux/latest/source
http://vger.kernel.org/~davem/skb_data.html
https://support.huawei.com/enterprise/zh/doc/EDOC1100058974/67dadfe0
http://www.cc.ntu.edu.tw/chinese/epaper/0006/20080920_6003.htm
https://www.hitchhikersguidetolearning.com/2017/09/17/wireless-capture-example-eap-handshake-part-2/