# Hierarchical spinlock embedding problem in the Linux kernel

Contributed by `Julian Fang`

## Environment

* Target version: linux-5.9.1
* Building environment

```bash
$ uname -a
Linux 4.15.0-121-generic
$ gcc -v
gcc version 7.5.0
```

## Target spinlock

* Reference: [multicore-locks/litl](https://github.com/multicore-locks/litl)
* In [include/htlockepfl.h](https://github.com/multicore-locks/litl/blob/916469ca797ee299a4ae674b41c4fac9ac4ae21b/include/htlockepfl.h#L47):

```c
typedef struct {
    union {
        volatile uint64_t u;
        struct {
            volatile uint32_t grant;
            volatile uint32_t request;
        } s;
    } u;
    char __pad[pad_to_cache_line(sizeof(uint64_t))];
} ticket_lock_local_t __attribute__((aligned(L_CACHE_LINE_SIZE)));
```

The size of a local lock is at least 64 bits (not including the padding).

## Target scenario

* Reference: [torvalds/linux](https://github.com/torvalds/linux)
* In [include/linux/spinlock_types.h](https://github.com/torvalds/linux/blob/c4d6fe7311762f2e03b3c27ad38df7c40c80cc93/include/linux/spinlock_types.h#L71):

```c
typedef struct spinlock {
    union {
        struct raw_spinlock rlock;

#ifdef CONFIG_DEBUG_LOCK_ALLOC
# define LOCK_PADSIZE (offsetof(struct raw_spinlock, dep_map))
        struct {
            u8 __padding[LOCK_PADSIZE];
            struct lockdep_map dep_map;
        };
#endif
    };
} spinlock_t;
```

The spinlock definition is at line 71. Without the debug flag `CONFIG_DEBUG_LOCK_ALLOC`, the struct reduces to:

```c
typedef struct spinlock {
    union {
        struct raw_spinlock rlock;
    };
} spinlock_t;
```

Next we need `struct raw_spinlock` in order to test embedding the htlock. `raw_spinlock` is defined in the same header file at line 20:

```c
typedef struct raw_spinlock {
    arch_spinlock_t raw_lock;
#ifdef CONFIG_DEBUG_SPINLOCK
    unsigned int magic, owner_cpu;
    void *owner;
#endif
#ifdef CONFIG_DEBUG_LOCK_ALLOC
    struct lockdep_map dep_map;
#endif
} raw_spinlock_t;
```

As before, we can omit the debugging parts:

```c
typedef struct raw_spinlock {
    arch_spinlock_t raw_lock;
} raw_spinlock_t;
```

The actual lock here is `arch_spinlock_t`. On my machine, `arch_spinlock_t` is the x86_64 implementation, i.e. the queued spinlock defined in [include/asm-generic/qspinlock_types.h](https://github.com/torvalds/linux/blob/master/include/asm-generic/qspinlock_types.h). `struct qspinlock` is at line 14:

```c
typedef struct qspinlock {
    union {
        atomic_t val;

        /*
         * By using the whole 2nd least significant byte for the
         * pending bit, we can allow better optimization of the lock
         * acquisition for the pending bit holder.
         */
#ifdef __LITTLE_ENDIAN
        struct {
            u8 locked;
            u8 pending;
        };
        struct {
            u16 locked_pending;
            u16 tail;
        };
#else
        struct {
            u16 tail;
            u16 locked_pending;
        };
        struct {
            u8 reserved[2];
            u8 pending;
            u8 locked;
        };
#endif
    };
} arch_spinlock_t;
```

We have now found the structure of the spinlock inside the Linux kernel.

## The different-size spinlock

Next we try to modify the size of the spinlock:

```c
typedef struct qspinlock {
    union {
        atomic_t val;

        /*
         * By using the whole 2nd least significant byte for the
         * pending bit, we can allow better optimization of the lock
         * acquisition for the pending bit holder.
         */
#ifdef __LITTLE_ENDIAN
        struct {
            u8 locked;
            u8 pending;
        };
        struct {
            u16 locked_pending;
            u16 tail;
            u32 test_pending;
        };
#else
        struct {
            u32 test_pending;
            u16 tail;
            u16 locked_pending;
        };
        struct {
            u8 reserved[2];
            u8 pending;
            u8 locked;
        };
#endif
    };
} arch_spinlock_t;
```

With the modification above, I appended a 32-bit field (`test_pending`) to `arch_spinlock_t`.
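To see what the extra field does to the lock size, here is a minimal userspace sketch (not kernel code: it models the little-endian layout only, uses standard fixed-width types in place of the kernel's `u8`/`u16`/`atomic_t`, and the names `qspinlock_orig`/`qspinlock_mod` are my own). The original union fits in 4 bytes, while the modified one grows to 8 bytes:

```c
#include <stdint.h>
#include <stdio.h>

/* Original little-endian qspinlock layout, modeled with standard types. */
struct qspinlock_orig {
    union {
        uint32_t val;                 /* stands in for atomic_t */
        struct { uint8_t locked; uint8_t pending; };
        struct { uint16_t locked_pending; uint16_t tail; };
    };
};

/* Modified layout with the extra 32-bit test_pending field appended. */
struct qspinlock_mod {
    union {
        uint32_t val;
        struct { uint8_t locked; uint8_t pending; };
        struct { uint16_t locked_pending; uint16_t tail; uint32_t test_pending; };
    };
};

int main(void)
{
    /* The original lock stays 4 bytes; the modified one doubles to 8 bytes. */
    printf("original: %zu bytes\n", sizeof(struct qspinlock_orig)); /* 4 */
    printf("modified: %zu bytes\n", sizeof(struct qspinlock_mod));  /* 8 */
    return 0;
}
```

Since every `spinlock_t` in the kernel ultimately wraps this union, any code that assumes the lock is exactly 32 bits wide will now break, as the build error below shows.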
Compiling the kernel with this change produces an error:

```bash
arch/x86/kernel/hpet.c: In function ‘read_hpet’:
././include/linux/compiler_types.h:319:38: error: call to ‘__compiletime_assert_149’ declared with attribute error: BUILD_BUG_ON failed: sizeof(union hpet_lock) != 8
  _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
```

The enlarged lock trips a compile-time assertion in the HPET counter code.

## HPET counter

HPET stands for `High Precision Event Timer`. From line 651 of [/arch/x86/kernel/hpet.c](https://github.com/torvalds/linux/blob/38525c6919e2f6b27c1855905f342a0def3cbdcf/arch/x86/kernel/hpet.c#L651):

> Reading the HPET counter is a very slow operation. If a large number of CPUs are trying to access the HPET counter simultaneously, it can cause massive delays and slow down system performance dramatically.
>
> ...
>
> If multiple CPUs are trying to access the HPET counter at the same time, we don't actually need to read the counter multiple times. Instead, the other CPUs can use the counter value read by the first CPU in the group.
>
> ...
>
> The lock and the HPET value are stored together and can be read in a single atomic 64-bit read. It is explicitly assumed that arch_spinlock_t is 32 bits in size.

The highlights are:

* Reading the HPET counter is a very slow operation.
* To avoid frequent accesses, the first CPU in a group reads the counter and the other CPUs reuse the value it read.
* To keep this cheap, the lock and the cached value are fetched together in a single atomic 64-bit read.
* That read packs a 32-bit value together with a 32-bit lock, so `arch_spinlock_t` is assumed to be 32 bits in size.
* A 64-bit spinlock therefore contradicts this implementation unless the surrounding code is modified as well.
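The following userspace model illustrates the layout that the failing `BUILD_BUG_ON` guards. It is a sketch inferred from the error message and the quoted comment, not a copy of the kernel source: the field names, the `hpet_lock_model` shape, and the 32-bit `arch_spinlock_model_t` stand-in are assumptions.

```c
#include <stdint.h>
#include <assert.h>

/* Stand-in for arch_spinlock_t, 32 bits as the HPET code assumes. */
typedef struct { uint32_t val; } arch_spinlock_model_t;

/* Model of the lock/value pair guarded by the failing BUILD_BUG_ON:
 * the lock and the cached HPET counter value share one 64-bit word. */
union hpet_lock_model {
    struct {
        arch_spinlock_model_t lock; /* 32-bit spinlock */
        uint32_t value;             /* cached HPET counter value */
    };
    uint64_t lockval;               /* both fields in one 64-bit load */
};

/* Userspace stand-in for the kernel's compile-time size check. */
static_assert(sizeof(union hpet_lock_model) == 8,
              "lock + value must fit in a single 64-bit word");

int main(void)
{
    /* Other CPUs can pick up the cached value with a single 64-bit read. */
    union hpet_lock_model m = { .lockval = 0 };
    (void)m;
    return 0;
}
```

If the lock grows to 64 bits, the lock/value pair no longer fits in 8 bytes, the size check fails at compile time, and the single-load fast path in `read_hpet()` would have to be reworked.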