[Debug] IEEE 802.1X EAPoL & EAPoR authentication results in kernel oops
===
###### tags: `debug` `IEEE 802.1X` `EAPoL` `EAPoR` `kernel oops` `NULL pointer dereference` `memory overflow`
[ToC]

## Environment
First, we have an 802.1X authentication server, a switch as the authenticator, and a client as the supplicant. The switch connects to the server, and the client connects to the switch, as shown below.

<div style="text-align: center"><img src="https://i.imgur.com/4BQw5jm.png"/></div>

Here, the authentication server is an embedded device (a Raspberry Pi) running FreeRADIUS, the authenticator is an L2 managed switch supporting IEEE 802.1X, and the supplicant is a PC or a notebook.

## Background
Before getting started, let's see what EAPoL and EAPoR are.
- EAP is the Extensible Authentication Protocol.
- EAPoL is EAP over LAN: the communication between the Supplicant and the Authenticator.
- EAPoR is EAP over RADIUS: the communication between the Authenticator and the Server.

In brief, the message flow looks roughly like this.
```sequence
Supplicant->Authenticator: EAPoL
Authenticator-->Supplicant: EAPoL
Authenticator->Server: EAPoR
Server-->Authenticator: EAPoR
Supplicant->Authenticator: EAPoL
Authenticator-->Supplicant: EAPoL
```
The diagram below shows the full EAPoL and EAPoR handshake.
```sequence
Note over Supplicant,Authenticator: EAPoL
Supplicant->Authenticator: (1)EAPoL-Start
Authenticator-->Supplicant: (2)EAP-Request/Identity
Supplicant->Authenticator: (3)EAP-Response/Identity
Note over Authenticator,Server: EAPoR
Authenticator->Server: (4)RADIUS Access-Request
Server-->Authenticator: (5)RADIUS Access-Challenge
Authenticator-->Supplicant: (6)EAP-Challenge-Request
Supplicant->Authenticator: (7)EAP-Challenge-Response
Authenticator->Server: (8)RADIUS Access-Request
Server-->Authenticator: (9)RADIUS Access-Accept
Authenticator-->Supplicant: (10)EAP-Success
Note over Supplicant,Authenticator: (11) Dynamic encryption keys created\n4-Way Handshake
Note over Supplicant,Authenticator: (12) Controlled port opens
```
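For reference, the on-wire layout of these frames is small and fixed, and it matters later when we talk about packet lengths. The sketch below is illustrative (it is not the switch's code); the field layout follows IEEE 802.1X and RFC 3748.
``` C
#include <stdint.h>

#define ETHERTYPE_PAE 0x888E      /* EtherType for EAPoL (IEEE 802.1X) */

/* EAPoL header (IEEE 802.1X) */
struct eapol_hdr {
    uint8_t  version;             /* 1 = 802.1X-2001, 2 = 802.1X-2004 */
    uint8_t  type;                /* 0 = EAP-Packet, 1 = EAPoL-Start,
                                   * 2 = EAPoL-Logoff, 3 = EAPoL-Key */
    uint16_t length;              /* body length, network byte order */
} __attribute__((packed));

/* EAP header (RFC 3748), carried in EAP-Packet bodies */
struct eap_hdr {
    uint8_t  code;                /* 1 = Request, 2 = Response,
                                   * 3 = Success, 4 = Failure */
    uint8_t  identifier;          /* pairs a Response with its Request */
    uint16_t length;              /* whole EAP packet, network order */
    /* Requests/Responses carry a type byte and data after this header;
     * type 1 = Identity, which is where the username travels. */
} __attribute__((packed));
```
The username from step (3) rides in an EAP-Response/Identity packet. For step (4), the authenticator re-wraps the same EAP payload into a RADIUS EAP-Message attribute; that is all "EAP over RADIUS" means, and it is why the username length influences packet sizes on both legs.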
## Symptoms
However, when the user on the client logged in through EAPoL, the switch hit a kernel oops with the logs below. This failure did not happen every time; it seemed to occur randomly at first. After trying again and again, I found that a certain pattern of username and password reliably triggers this case.
:::spoiler
```
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = cb9cc000
[00000000] *pgd=6c91e831, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1] SMP ARM
Modules linked in: usb_storage linux_bcm_net(PO) linux_uk_proxy(PO) linux_bcm_core(PO) linux_kernel_bde(PO)
CPU: 0    Tainted: P           O  (3.6.5-Broadcom Linux #1)
PC is at put_page+0xc/0x50
LR is at skb_release_data+0x94/0xf0
pc : [<c007a3c0>]    lr : [<c0268208>]    psr: 20000013
sp : cb919d20  ip : cb919d30  fp : cb919d2c
r10: 0000007b  r9 : ffffffea  r8 : 0000007b
r7 : cb919ee4  r6 : cb85bc00  r5 : cb8d3780  r4 : 00000001
r3 : cb8d8ac0  r2 : 00000004  r1 : 00000000  r0 : 00000000
Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 10c53c7d  Table: 6c9cc04a  DAC: 00000015
Process 1x_srv (pid: 691, stack limit = 0xcb918270)
Stack: (0xcb919d20 to 0xcb91a000)
9d20: cb919d44 cb919d30 c0268208 c007a3c0 cb8d3780 cb8d3780 cb919d5c cb919d48
9d40: c0267e20 c0268180 00000001 cb85bc00 cb919d6c cb919d60 c0267f60 c0267e10
9d60: cb919d84 cb919d70 c026ba88 c0267f18 00000000 00000060 cb919ddc cb919d88
9d80: c0322688 c026ba7c cb919dbc cb919d98 c0009a5c 00000640 0000007b cb919ee4
9da0: 00000000 00008e88 00000000 cb85bc00 00000000 ce47ed80 00000060 00000640
9dc0: cb919ee4 ce8f18c0 cb918000 020045f0 cb919ecc cb919de0 c025e9a4 c03221fc
9de0: 00000060 cb919df0 cb919e14 cb919df8 00000060 00000640 ce47ed80 00000001
9e00: 00000000 cb919ee4 c07bb300 c07bb348 c07bb300 ce8f18c0 cb919e4c cb919e28
9e20: c004b45c c004ac00 00000000 00000000 00000000 00000001 ffffffff 00000000
9e40: 00000000 00000000 00000000 00000000 ce8f18c0 00000000 00000000 00000000
9e60: 00000000 00000000 cb919de8 00000000 00000000 00000000 00000000 00000000
9e80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9ea0: 00000000 00000000 cb919ecc 00000640 ce47ed80 00000060 cb919f00 be988c5c
9ec0: cb919fa4 cb919ed0 c0260e30 c025e91c 0000c350 fffffff7 00000000 0200466b
9ee0: 000005c5 cb919f00 00000012 cb919edc 00000001 00000000 00000000 cb919f08
9f00: 8e880011 0000001e 06020001 684ce000 cb912800 cb919f88 05f6a450 00000000
9f20: cb919f21 00000000 00000000 00000000 2fd5672a 00000011 2fd4a3da 00000011
9f40: c0040b44 c07ba6d0 00000000 00000000 00000000 be988b78 cb919fa4 00000000
9f60: be988c80 001699dc 000000a2 c0009304 cb918000 00000000 cb919fa4 cb919f88
9f80: be988c5c be988c58 00000020 00000124 c0009304 00000000 00000000 cb919fa8
9fa0: c00091a0 c0260dac be988c5c be988c58 00000007 020045f0 00000640 00000020
9fc0: be988c5c be988c58 00000020 00000124 020045f0 00000007 00000007 01fda2f8
9fe0: 00000000 be988c30 00010488 b6df5464 60000010 00000007 00000000 00000000
Backtrace:
[<c007a3b4>] (put_page+0x0/0x50) from [<c0268208>] (skb_release_data+0x94/0xf0)
[<c0268174>] (skb_release_data+0x0/0xf0) from [<c0267e20>] (__kfree_skb+0x1c/0xd4)
 r5:cb8d3780 r4:cb8d3780
[<c0267e04>] (__kfree_skb+0x0/0xd4) from [<c0267f60>] (consume_skb+0x54/0x58)
 r4:cb85bc00 r3:00000001
[<c0267f0c>] (consume_skb+0x0/0x58) from [<c026ba88>] (skb_free_datagram+0x18/0x40)
[<c026ba70>] (skb_free_datagram+0x0/0x40) from [<c0322688>] (packet_recvmsg+0x498/0x4b0)
 r4:00000060 r3:00000000
[<c03221f0>] (packet_recvmsg+0x0/0x4b0) from [<c025e9a4>] (sock_recvmsg+0x94/0xa8)
[<c025e910>] (sock_recvmsg+0x0/0xa8) from [<c0260e30>] (sys_recvfrom+0x90/0xe0)
 r8:be988c5c r7:cb919f00 r6:00000060 r5:ce47ed80 r4:00000640
[<c0260da0>] (sys_recvfrom+0x0/0xe0) from [<c00091a0>] (ret_fast_syscall+0x0/0x30)
Code: c04ddcfc e1a0c00d e92dd800 e24cb004 (e5902000)
---[ end trace 5d55fb4d3fa0bbdb ]---
Kernel panic - not syncing: Fatal exception
Watchdog Timeout
```
:::

## Root cause analysis
### Prerequisite
In this case, we should first clarify where the bug lives.
:::info
1. In EAPoL or EAPoR?
2. In kernel space or user space?
:::

### Hypothesis
According to the backtrace, we believe this failure is caused by **accessing a NULL pointer**. Going top-down, we boldly hypothesize that the bug hides in the EAPoR (EAP over RADIUS) code or in the kernel.

**[Hypothesis 1]** This pattern of username is mishandled by our code.
After testing again and again, we found that it depends on **the length of the username**: if the username is 9 bytes long, it always fails.

**[Hypothesis 2]** The memory buffer that stores the username is not big enough.
After tracing the EAPoL and EAPoR code, we found the buffer is large enough, and authentication even passes when the username is longer than 9 bytes.

**[Hypothesis 3]** The packets received or sent during the ~~EAPoL~~ EAPoR handshake cause this catastrophe.
We ran an experiment: disconnect the link between the server and the switch before submitting the username and password, then submit them. The issue could not be reproduced.
:::success
This proves that the issue occurs during the handshake between the switch and the server, i.e., **EAPoR**.
:::

**[Hypothesis 4]** Something goes wrong in the skb when the switch sends EAPoR packets to the server.
After tracing the EAPoR code and comparing against the debug messages on the COM port, we confirmed that the kernel oops happened after the packets had already been sent from the switch to the server. ==So the bug is on the ++RX side of the switch++.==

**[Hypothesis 5]** Something goes wrong in the skb when the switch receives EAPoR packets from the server.
After setting up a sniffer to capture the packets between the switch and the server, we found the **"key"** packet that triggers the kernel oops: the oops happens right after the switch receives this packet. We rebooted the switch and regenerated this ***bomb packet*** with Colasoft. This proved our hypothesis: every time the switch received the bomb packet, it crashed!
:::success
It turned out the bug was **in kernel space**, so we had to check our ++device driver++ or the Linux kernel itself.
:::

**[Hypothesis 6]** The device driver does the wrong thing when handling received packets in its bottom-half ISR.
:::danger
The *bomb packet* is ***118 bytes*** long with ***EtherType 0x888E***.
:::
We generated packets of the same length with a different EtherType, and everything went well.
:::success
After tracing the driver code, we found the bug in the ISR that allocates the skb for IEEE 802.1X frames: we did not allocate enough memory for these packets. In other words, it is a **++memory overflow++** issue.
:::

## Brainstorm
The description above is top-down debugging. Here, based on the clues gathered so far, I will show the other approach: bottom-up debugging. Out of curiosity, I wrote a testing tool that sends IEEE 802.1X packets of various lengths to the switch (a sketch of such a tool follows below), and the switch fails not only at 118 bytes but at other lengths as well. So I started to dig into how skb buffers are allocated and de-allocated in the kernel.
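The write-up does not include the tool's source; below is a minimal sketch of how such a length-sweeping sender can look on Linux, using a raw AF_PACKET socket. The interface name "eth0", the zeroed source MAC, and the 64-256 sweep range are illustrative choices.
``` C
/* Build: gcc -o eapol_sweep eapol_sweep.c ; run as root. */
#include <arpa/inet.h>
#include <linux/if_ether.h>     /* ETH_P_PAE = 0x888E, ETH_ALEN */
#include <linux/if_packet.h>
#include <net/if.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_PAE));
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_ll addr = {0};
    addr.sll_family   = AF_PACKET;
    addr.sll_protocol = htons(ETH_P_PAE);
    addr.sll_ifindex  = if_nametoindex("eth0");  /* test interface */
    addr.sll_halen    = ETH_ALEN;
    /* 802.1X PAE group address 01:80:C2:00:00:03 */
    memcpy(addr.sll_addr, "\x01\x80\xc2\x00\x00\x03", ETH_ALEN);

    unsigned char frame[1514] = {0};
    memcpy(frame, addr.sll_addr, ETH_ALEN);      /* dst MAC          */
    /* bytes 6..11: src MAC, left zeroed for the lab setup           */
    frame[12] = 0x88; frame[13] = 0x8e;          /* EtherType 0x888E */
    frame[14] = 0x01;                            /* EAPoL version    */
    frame[15] = 0x01;                            /* EAPoL-Start      */
    /* bytes 16..17: EAPoL body length = 0; the rest is padding      */

    /* Sweep frame lengths around the failing case (118 bytes). */
    for (int len = 64; len <= 256; len++) {
        if (sendto(fd, frame, len, 0,
                   (struct sockaddr *)&addr, sizeof(addr)) < 0)
            perror("sendto");
        usleep(10000);                           /* pace the switch  */
    }
    close(fd);
    return 0;
}
```
Sending one frame per length makes it easy to bisect exactly which lengths make the switch's RX path blow up.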
## SKB Free
### Sequence
__kfree_skb -> skb_release_data -> skb_frag_unref -> __skb_frag_unref -> put_page
### Explanation
We fail in the for loop of **skb_release_data**, shown below.
``` C
static void skb_release_data(struct sk_buff *skb)
{
	...
	if (skb_shinfo(skb)->nr_frags) {
		int i;
		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
			skb_frag_unref(skb, i);
	}
	...
}
```
This piece of code frees the memory of paged data, but we do not use any paged data in our case.

>Things start to get a little bit more complicated once paged data begins to be used. For the most part the ability to use [page, offset, len] tuples for SKB data came about so that file system file contents could be directly sent over a socket. But, as it turns out, it is sometimes beneficial to use this for normal buffering of process sendmsg() data.
>
>It must be understood that once paged data starts to be used on an SKB, this puts a specific restriction on all future SKB data area operations. In particular, it is no longer possible to do skb_put() operations.
>
>We will now mention that there are actually two length variables associated with an SKB, len and data_len. The latter only comes into play when there is paged data in the SKB. skb->data_len tells how many bytes of paged data there are in the SKB. From this we can derive a few more things:
>
>The existence of paged data in an SKB is indicated by skb->data_len being non-zero. This is codified in the helper routine skb_is_nonlinear(), so that is the function you should use to test this. The amount of non-paged data at skb->data can be calculated as skb->len - skb->data_len. Again, there is a helper routine already defined for this called skb_headlen(), so please use that. The main abstraction is that, when there is paged data, the packet begins at skb->data for skb_headlen(skb) bytes, then continues on into the paged data area for skb->data_len bytes. That is why it is illogical to try and do an skb_put(skb) when there is paged data. You have to add data onto the end of the paged data area instead. Each chunk of paged data in an SKB is described by the following structure:
>
>```
>struct skb_frag_struct {
>	struct page *page;
>	__u16 page_offset;
>	__u16 size;
>};
>```
>
>There is a pointer to the page (which you must hold a proper reference to), the offset within the page where this chunk of paged data starts, and how many bytes are there. The paged frags are organized into an array in the shared SKB area, defined by this structure:
>
>```
>#define MAX_SKB_FRAGS (65536/PAGE_SIZE + 2)
>
>struct skb_shared_info {
>	atomic_t dataref;
>	unsigned int nr_frags;
>	unsigned short tso_size;
>	unsigned short tso_segs;
>	struct sk_buff *frag_list;
>	skb_frag_t frags[MAX_SKB_FRAGS];
>};
>```
>
>The nr_frags member states how many frags there are active in the frags[] array. The tso_size and tso_segs is used to convey information to the device driver for TCP segmentation offload. The frag_list is used to maintain a chain of SKBs organized for fragmentation purposes, it is _not_ used for maintaining paged data. And finally the frags[] holds the frag descriptors themselves.

After printing this flag in the kernel code, we found that the kernel oops occurred when `nr_frags` held a non-zero value, one that we ourselves had written there through the memory overflow. The loop then calls put_page() on stale `frags[]` entries that were never initialized; in our case `frags[0].page` was NULL (note r0 = 00000000 in the oops), so put_page+0xc dereferenced it. As a consequence, a <font color="red">**virtual address 00000000 is accessed**</font>.
:::info
Now we have to dig further into why it only fails in some cases, i.e., when exactly do we overwrite this flag?
:::
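This length dependence can be reproduced entirely in user space. The sketch below (illustrative, not driver code) places a small metadata struct right behind a cache-line-aligned data area, the same way an skb places `skb_shared_info` behind the data buffer, and overruns the data by one byte to stand in for the driver's out-of-bounds write (the real driver's overrun offset is not documented here).
``` C
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_LINE 32
#define ALIGN_UP(x) (((x) + CACHE_LINE - 1) & ~(CACHE_LINE - 1))

/* Stand-in for skb_shared_info: nr_frags is its first member. */
struct fake_shinfo {
	unsigned char nr_frags;
};

int main(void)
{
	for (unsigned int len = 110; len <= 130; len++) {
		size_t data_sz = ALIGN_UP(len);
		unsigned char *buf = calloc(1, data_sz + sizeof(struct fake_shinfo));
		struct fake_shinfo *shinfo = (struct fake_shinfo *)(buf + data_sz);
		if (!buf)
			return 1;

		memset(buf, 0xAB, len + 1);   /* one-byte overrun, like the driver */

		printf("len=%3u aligned=%3zu nr_frags=0x%02x%s\n",
		       len, data_sz, shinfo->nr_frags,
		       shinfo->nr_frags ? "   <-- corrupted" : "");
		free(buf);
	}
	return 0;
}
```
Only lengths whose aligned size leaves no slack get the `nr_frags` stand-in corrupted; other lengths silently scribble on alignment padding. That is exactly why the failure looked "random" at the EAPoL level.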
https://elixir.bootlin.com/linux/v3.6.5/source/net/core/skbuff.c#L581
``` C=
//File: net/core/skbuff.c

/**
 *	__kfree_skb - private function
 *	@skb: buffer
 *
 *	Free an sk_buff. Release anything attached to the buffer.
 *	Clean the state. This is an internal helper function. Users should
 *	always call kfree_skb
 */
void __kfree_skb(struct sk_buff *skb)
{
	skb_release_all(skb);
	kfree_skbmem(skb);
}

/* Free everything but the sk_buff shell. */
static void skb_release_all(struct sk_buff *skb)
{
	skb_release_head_state(skb);
	skb_release_data(skb);
}

static void skb_release_data(struct sk_buff *skb)
{
	if (!skb->cloned ||
	    !atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1,
			       &skb_shinfo(skb)->dataref)) {
		if (skb_shinfo(skb)->nr_frags) {
			int i;
			for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
				skb_frag_unref(skb, i);
		}

		/*
		 * If skb buf is from userspace, we need to notify the caller
		 * the lower device DMA has done;
		 */
		if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) {
			struct ubuf_info *uarg;

			uarg = skb_shinfo(skb)->destructor_arg;
			if (uarg->callback)
				uarg->callback(uarg);
		}

		if (skb_has_frag_list(skb))
			skb_drop_fraglist(skb);

		skb_free_head(skb);
	}
}

/**
 * skb_frag_unref - release a reference on a paged fragment of an skb.
 * @skb: the buffer
 * @f: the fragment offset
 *
 * Releases a reference on the @f'th paged fragment of @skb.
 */
static inline void skb_frag_unref(struct sk_buff *skb, int f)
{
	__skb_frag_unref(&skb_shinfo(skb)->frags[f]);
}

/**
 * __skb_frag_unref - release a reference on a paged fragment.
 * @frag: the paged fragment
 *
 * Releases a reference on the paged fragment @frag.
 */
static inline void __skb_frag_unref(skb_frag_t *frag)
{
	put_page(skb_frag_page(frag));
}
```

## SKB Allocation
### Sequence
dev_alloc_skb -> netdev_alloc_skb -> __netdev_alloc_skb
### Explanation
In the function **__netdev_alloc_skb**, the skb is allocated with size **fragsz**, which covers the aligned data buffer plus the aligned shared info. In our case, we allocate an skb of size `X` (legal from skb[0] to skb[X-1]) but write data to skb[X] and beyond. That write most likely lands on the first byte of the skb_shared_info struct, i.e., `nr_frags`, and this mistake results in the catastrophe.
``` C
struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
				   unsigned int length, gfp_t gfp_mask)
{
	...
	unsigned int fragsz = SKB_DATA_ALIGN(length + NET_SKB_PAD) +
			      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
	...
}
```
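To make the numbers concrete, here is the fragsz arithmetic for the bomb packet. SMP_CACHE_BYTES = 32 and NET_SKB_PAD = 32 are assumptions for a 32-bit ARM part with 32-byte cache lines (see the header excerpt below); the failing switch may differ.
``` C
#include <stdio.h>

/* Assumed platform values (see the skbuff.h excerpt below):
 * SMP_CACHE_BYTES = 32, NET_SKB_PAD = max(32, L1_CACHE_BYTES) = 32. */
#define SMP_CACHE_BYTES 32
#define NET_SKB_PAD     32
#define SKB_DATA_ALIGN(X) (((X) + (SMP_CACHE_BYTES - 1)) & \
			   ~(SMP_CACHE_BYTES - 1))

int main(void)
{
	unsigned int length = 118;   /* pkt_size of the bomb packet */
	unsigned int data_area = SKB_DATA_ALIGN(length + NET_SKB_PAD);

	printf("aligned data area           : %u bytes\n", data_area);
	printf("packet occupies offsets     : [%u, %u)\n",
	       NET_SKB_PAD, NET_SKB_PAD + length);
	printf("slack before skb_shared_info: %u bytes\n",
	       data_area - NET_SKB_PAD - length);   /* 160 - 32 - 118 = 10 */
	return 0;
}
```
Under these assumptions a 118-byte frame leaves only 10 bytes of slack before `nr_frags`. The slack varies between 0 and 31 bytes with the packet length, which is why an overrun of a fixed size clobbers the shared info only for some lengths.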
https://elixir.bootlin.com/linux/v3.6.5/source/include/linux/skbuff.h#L388
![](https://i.imgur.com/98c7WXj.png)
``` C=
//File: include/linux/skbuff.h

//SMP_CACHE_BYTES = 32 or 64
#define SKB_DATA_ALIGN(X)	(((X) + (SMP_CACHE_BYTES - 1)) & \
				 ~(SMP_CACHE_BYTES - 1))

#define NET_SKB_PAD	max(32, L1_CACHE_BYTES)

struct skb_shared_info {
	unsigned char	nr_frags;
	__u8		tx_flags;
	unsigned short	gso_size;
	/* Warning: this field is not always filled in (UFO)! */
	unsigned short	gso_segs;
	unsigned short	gso_type;
	struct sk_buff	*frag_list;
	struct skb_shared_hwtstamps hwtstamps;
	__be32		ip6_frag_id;

	/*
	 * Warning : all fields before dataref are cleared in __alloc_skb()
	 */
	atomic_t	dataref;

	/* Intermediate layers must ensure that destructor_arg
	 * remains valid until skb destructor */
	void *		destructor_arg;

	/* must be last field, see pskb_expand_head() */
	skb_frag_t	frags[MAX_SKB_FRAGS];
};

/* legacy helper around netdev_alloc_skb() */
static inline struct sk_buff *dev_alloc_skb(unsigned int length)
{
	return netdev_alloc_skb(NULL, length);
}

/**
 *	netdev_alloc_skb - allocate an skbuff for rx on a specific device
 *	@dev: network device to receive on
 *	@length: length to allocate
 *
 *	Allocate a new &sk_buff and assign it a usage count of one. The
 *	buffer has unspecified headroom built in. Users should allocate
 *	the headroom they think they need without accounting for the
 *	built in space. The built in space is used for optimisations.
 *
 *	%NULL is returned if there is no free memory. Although this function
 *	allocates memory it can be called from an interrupt.
 */
static inline struct sk_buff *netdev_alloc_skb(struct net_device *dev,
					       unsigned int length)
{
	return __netdev_alloc_skb(dev, length, GFP_ATOMIC);
}
```

Our driver's RX ISR allocates the skb with exactly the received packet size:
``` C=
static void myRxISR(void *pkt, int pkt_size, ...)
{
	struct sk_buff *skb;

	skb = dev_alloc_skb(pkt_size);
	...
}
```
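The original write-up stops at the diagnosis, so the following is only a sketch of the kind of fix, assuming the copy into the skb can exceed `pkt_size` by some bounded amount. `RX_EXTRA` and its value are illustrative, not the real driver's.
``` C
/* Hypothetical fix sketch: RX_EXTRA must cover everything that can be
 * written beyond pkt_size (trailing FCS, DMA burst rounding, ...).
 * 32 is an illustrative value, not a measured one. */
#define RX_EXTRA 32

static void myRxISR(void *pkt, int pkt_size, ...)
{
	struct sk_buff *skb;

	skb = dev_alloc_skb(pkt_size + RX_EXTRA);
	if (!skb)
		return;                     /* drop the frame, don't oops */

	memcpy(skb->data, pkt, pkt_size);   /* any overrun now lands in slack */
	skb_put(skb, pkt_size);
	...
}
```
Many drivers sidestep this class of bug entirely by allocating a fixed worst-case RX buffer (e.g. 1536 bytes) per descriptor instead of sizing each skb to the received frame.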
https://elixir.bootlin.com/linux/v3.6.5/source/net/core/skbuff.c#L581
``` C=
/**
 *	__netdev_alloc_skb - allocate an skbuff for rx on a specific device
 *	@dev: network device to receive on
 *	@length: length to allocate
 *	@gfp_mask: get_free_pages mask, passed to alloc_skb
 *
 *	Allocate a new &sk_buff and assign it a usage count of one. The
 *	buffer has unspecified headroom built in. Users should allocate
 *	the headroom they think they need without accounting for the
 *	built in space. The built in space is used for optimisations.
 *
 *	%NULL is returned if there is no free memory.
 */
struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
				   unsigned int length, gfp_t gfp_mask)
{
	struct sk_buff *skb = NULL;
	unsigned int fragsz = SKB_DATA_ALIGN(length + NET_SKB_PAD) +
			      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));

	if (fragsz <= PAGE_SIZE && !(gfp_mask & (__GFP_WAIT | GFP_DMA))) {
		void *data;

		if (sk_memalloc_socks())
			gfp_mask |= __GFP_MEMALLOC;

		data = __netdev_alloc_frag(fragsz, gfp_mask);

		if (likely(data)) {
			skb = build_skb(data, fragsz);
			if (unlikely(!skb))
				put_page(virt_to_head_page(data));
		}
	} else {
		skb = __alloc_skb(length + NET_SKB_PAD, gfp_mask,
				  SKB_ALLOC_RX, NUMA_NO_NODE);
	}
	if (likely(skb)) {
		skb_reserve(skb, NET_SKB_PAD);
		skb->dev = dev;
	}
	return skb;
}
EXPORT_SYMBOL(__netdev_alloc_skb);

/**
 *	__alloc_skb	-	allocate a network buffer
 *	@size: size to allocate
 *	@gfp_mask: allocation mask
 *	@flags: If SKB_ALLOC_FCLONE is set, allocate from fclone cache
 *		instead of head cache and allocate a cloned (child) skb.
 *		If SKB_ALLOC_RX is set, __GFP_MEMALLOC will be used for
 *		allocations in case the data is required for writeback
 *	@node: numa node to allocate memory on
 *
 *	Allocate a new &sk_buff. The returned buffer has no headroom and a
 *	tail room of at least size bytes. The object has a reference count
 *	of one. The return is the buffer. On a failure the return is %NULL.
 *
 *	Buffers may only be allocated from interrupts using a @gfp_mask of
 *	%GFP_ATOMIC.
 */
struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
			    int flags, int node)
{
	struct kmem_cache *cache;
	struct skb_shared_info *shinfo;
	struct sk_buff *skb;
	u8 *data;
	bool pfmemalloc;

	cache = (flags & SKB_ALLOC_FCLONE)
		? skbuff_fclone_cache : skbuff_head_cache;

	if (sk_memalloc_socks() && (flags & SKB_ALLOC_RX))
		gfp_mask |= __GFP_MEMALLOC;

	/* Get the HEAD */
	skb = kmem_cache_alloc_node(cache, gfp_mask & ~__GFP_DMA, node);
	if (!skb)
		goto out;
	prefetchw(skb);

	/* We do our best to align skb_shared_info on a separate cache
	 * line. It usually works because kmalloc(X > SMP_CACHE_BYTES) gives
	 * aligned memory blocks, unless SLUB/SLAB debug is enabled.
	 * Both skb->head and skb_shared_info are cache line aligned.
	 */
	size = SKB_DATA_ALIGN(size);
	size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
	data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc);
	if (!data)
		goto nodata;
	/* kmalloc(size) might give us more room than requested.
	 * Put skb_shared_info exactly at the end of allocated zone,
	 * to allow max possible filling before reallocation.
	 */
	size = SKB_WITH_OVERHEAD(ksize(data));
	prefetchw(data + size);

	/*
	 * Only clear those fields we need to clear, not those that we will
	 * actually initialise below. Hence, don't put any more fields after
	 * the tail pointer in struct sk_buff!
	 */
	memset(skb, 0, offsetof(struct sk_buff, tail));
	/* Account for allocated memory : skb + skb->head */
	skb->truesize = SKB_TRUESIZE(size);
	skb->pfmemalloc = pfmemalloc;
	atomic_set(&skb->users, 1);
	skb->head = data;
	skb->data = data;
	skb_reset_tail_pointer(skb);
	skb->end = skb->tail + size;
#ifdef NET_SKBUFF_DATA_USES_OFFSET
	skb->mac_header = ~0U;
#endif

	/* make sure we initialize shinfo sequentially */
	shinfo = skb_shinfo(skb);
	memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
	atomic_set(&shinfo->dataref, 1);
	kmemcheck_annotate_variable(shinfo->destructor_arg);

	if (flags & SKB_ALLOC_FCLONE) {
		struct sk_buff *child = skb + 1;
		atomic_t *fclone_ref = (atomic_t *) (child + 1);

		kmemcheck_annotate_bitfield(child, flags1);
		kmemcheck_annotate_bitfield(child, flags2);
		skb->fclone = SKB_FCLONE_ORIG;
		atomic_set(fclone_ref, 1);

		child->fclone = SKB_FCLONE_UNAVAILABLE;
		child->pfmemalloc = pfmemalloc;
	}
out:
	return skb;
nodata:
	kmem_cache_free(cache, skb);
	skb = NULL;
	goto out;
}
EXPORT_SYMBOL(__alloc_skb);
```

# Reference
- https://elixir.bootlin.com/linux/latest/source
- http://vger.kernel.org/~davem/skb_data.html
- https://support.huawei.com/enterprise/zh/doc/EDOC1100058974/67dadfe0
- http://www.cc.ntu.edu.tw/chinese/epaper/0006/20080920_6003.htm
- https://www.hitchhikersguidetolearning.com/2017/09/17/wireless-capture-example-eap-handshake-part-2/