# Bitfields in IDO ## Bitfields Bitfields are a stupid[^stupid] C optimisation to pack unaligned variables made of a small or odd number of bits. Their memory layout is unspecified, the notation looks like [^stupid]: See, e.g. https://lwn.net/Articles/478657/ **** ```c struct example1 { u32 m1 : 1; u32 m2 : 4; u32 m3 : 7; }; ``` Any integer type may be used, which makes things even more confusing; it is generally most sensible to use an unsigned int unless you are expecting it to be a small total number of bits. Their memory layout is unspecified apart from lying within the area of the struct reserved by the type. ## IDO memory layout IDO is a big-endian MIPS compiler. Bitfields are stored strictly left-to-right, most significant to least significant, e.g. ``` aaabbbbb ccccccddd dddd0000 00000000 ``` corresponds to ```c struct example2 { u32 a : 3; u32 b : 5; u32 c : 6; u32 d : 7; } v; ``` Bitfields reserve memory and align the struct as if it contains an element of the widest type used in the bitfield: in `v`'s case, this is `u32`, so the struct aligns to 4, and `v` takes up a minimum of 32 bytes (see the 0s at the end). If an element of a bitfield straddles an alignment boundary of the type, it instead aligns to the boundary, and the bits between are unused. For example, ```c struct example3 { u32 a : 21; u32 b : 13; }; ``` is laid out in memory as ``` aaaaaaaa aaaaaaaa aaaaa000 00000000 bbbbbbbb bbbbb000 00000000 00000000 ``` ## Notes on optimisation considerations `v` is a useful example because it does not quite fit in a u16, which causes weird stuff to happen when trying to access `d`. Before we move on, notice that if we have a variable `v` of this type, `a`, `b`, `c`, `d` may be accessed as ```c v.a == ((u8*)&v)[0] >> 5; v.b == ((u8*)&v)[0] & 0x1F; v.c == ((u8*)&v)[0] >> 2; v.d == (((u32*)&v)[0] >> 25) & 0x7F; ``` IDO will do the middle two of these but not first or last: for `a`, it instead does ```c v.a == ((u32*)&v)[0] >> 29; ``` while for `d`, it does ```c v.c == (((u32*)&v)[0] << 14) >> 25; ``` A natural question is "why?". - Firstly, if a bitfield aligns to a byte boundary we may save an instruction by loading the byte and doing the appropriate shift or `and`. Since this is not the case for the start of the bitfield, it follows the more ordinary pattern of just loading the whole word and shifting. - Secondly, generally IDO will use a shift left/shift right to get elements of bitfields instead of an `and` 1. it is easier to adjust for signed bitfields: one simply replaces the `srl` by an `sra`, whereas an `and` would need to be replaced by an `sll`/`sra` pair. 2. It adapts better to wider fields: a mask for a field wider than 16 bits requires an extra `lui` to load its top half. Below we go into detail about the various possible special cases. ## IDO reading Generally IDO will load the whole word and shift away the irrelevant parts. Exceptions are made when one of the field's boundaries lines up with a byte boundary: in such cases, a narrower load and a single `and` or `sra`/`srl` will be used where possible. The principle will become clear from looking at the examples given below. Using smaller types makes no difference apart from limiting the maximum width. Signed types use `sra` instead of `srl` but are otherwise the same. Using 64-bit types generates more instructions. ### Examples with small bit width (Throughout, special cases are indicated in these tables by **bold**.) For ```c struct bitfield { u32 field_0_80 : 1; u32 field_0_40 : 1; u32 field_0_20 : 1; u32 field_0_10 : 1; u32 field_0_08 : 1; u32 field_0_04 : 1; u32 field_0_02 : 1; u32 field_0_01 : 1; u32 field_1_80 : 1; u32 field_1_40 : 1; u32 field_1_20 : 1; u32 field_1_10 : 1; u32 field_1_08 : 1; u32 field_1_04 : 1; u32 field_1_02 : 1; u32 field_1_01 : 1; u32 field_2_80 : 1; u32 field_2_40 : 1; u32 field_2_20 : 1; u32 field_2_10 : 1; u32 field_2_08 : 1; u32 field_2_04 : 1; u32 field_2_02 : 1; u32 field_2_01 : 1; u32 field_3_80 : 1; u32 field_3_40 : 1; u32 field_3_20 : 1; u32 field_3_10 : 1; u32 field_3_08 : 1; u32 field_3_04 : 1; u32 field_3_02 : 1; u32 field_3_01 : 1; } b; ``` Width 1: | bits | load | operation | | -------- | ------- | ----------- | | **0_80** | (u32*) | >> 31 | | 0_40 | (u32*) | << 1 >> 31 | | 0_20 | (u32*) | << 2 >> 31 | | 0_10 | (u32*) | << 3 >> 31 | | 0_08 | (u32*) | << 4 >> 31 | | 0_04 | (u32*) | << 5 >> 31 | | 0_02 | (u32*) | << 6 >> 31 | | **0_01** | (u8*) | & 1 | | **1_80** | (u8*) | >> 7 | | 1_40 | (u32*) | << 9 >> 31 | | 1_20 | (u32*) | << 10 >> 31 | | 1_10 | (u32*) | << 11 >> 31 | | 1_08 | (u32*) | << 12 >> 31 | | 1_04 | (u32*) | << 13 >> 31 | | 1_02 | (u32*) | << 14 >> 31 | | **1_01** | (u16*) | & 1 | | **2_80** | (u8*)+2 | >> 15 | | 2_40 | (u32*) | << 17 >> 31 | | 2_20 | (u32*) | << 18 >> 31 | | 2_10 | (u32*) | << 19 >> 31 | | 2_08 | (u32*) | << 20 >> 31 | | 2_04 | (u32*) | << 21 >> 31 | | 2_02 | (u32*) | << 22 >> 31 | | **2_01** | (u8*)+2 | & 1 | | **3_80** | (u8*)+3 | >> 7 | | 3_40 | (u32*) | << 25 >> 31 | | 3_20 | (u32*) | << 26 >> 31 | | 3_10 | (u32*) | << 27 >> 31 | | 3_08 | (u32*) | << 28 >> 31 | | 3_04 | (u32*) | << 29 >> 31 | | 3_02 | (u32*) | << 30 >> 31 | | **3_01** | (u32*) | & 1 | Width 2, no offset | bits | operation | load | | -------- | ----------- | -------- | | **0_C0** | >> 30 | (u32*) | | 0_30 | << 2 >> 30 | (u32*) | | 0_0C | << 4 >> 30 | (u32*) | | **0_03** | & 3 | (u32*) | | **1_C0** | >> 6 | (u8*)+1 | | 1_30 | << 10 >> 30 | (u32*) | | 1_0C | << 12 >> 30 | (u32*) | | **1_03** | & 3 | (u16*) | | **2_C0** | >> 14 | (u16*)+2 | | 2_30 | << 18 >> 30 | (u32*) | | 2_0C | << 20 >> 30 | (u32*) | | **2_03** | & 3 | (u8*)+2 | | **3_C0** | >> 6 | (u*)+3 | | 3_30 | << 26 >> 30 | (u32*) | | 3_0C | << 28 >> 30 | (u32*) | | **3_03** | & 3 | (u32*) | Width 2, offset by 1 | bits | load | operation | | ------ | ------- | ----------- | | 0_60 | (u32*) | << 1 >> 30 | | 0_18 | (u32*) | << 3 >> 30 | | 0_06 | (u32*) | << 5 >> 30 | | 0_0180 | (u32*) | << 7 >> 30 | | 1_60 | (u32*) | << 9 >> 30 | | 1_18 | (u32*) | << 11 >> 30 | | 1_06 | (u32*) | << 13 >> 30 | | 1_0180 | (u32*) | << 15 >> 30 | | 2_60 | (u32*) | << 17 >> 30 | | 2_18 | (u32*) | << 19 >> 30 | | 2_06 | (u32*) | << 21 >> 30 | | 2_0180 | (u32*) | << 23 >> 30 | | 3_60 | (u32*) | << 25 >> 30 | | 3_18 | (u32*) | << 27 >> 30 | | 3_06 | (u32*) | << 29 >> 30 | (A putative 3_0180 doesn't exist, it aligns to the next word boundary instead.) Width 3, no offset | bits | load | operation | | -------- | ------- | ----------- | | 0_E0 | (u32*) | >> 29 | | 0_1C | (u32*) | << 3 >> 29 | | 0_0380 | (u32*) | << 6 >> 29 | | 1_70 | (u32*) | << 9 >> 29 | | 1_0E | (u32*) | << 12 >> 29 | | 1_01C0 | (u32*) | << 15 >> 29 | | 2_38 | (u32*) | << 18 >> 29 | | **2_07** | (u8*)+2 | & 7 | | **3_E0** | (u8*)+3 | >> 5 | | 3_1C | (u32*) | << 27 >> 29 | ## Storing At -O1, almost all are of the form ```c new_val = (((in_val & (mask >> shift)) << shift) & mask) | (*(type*)cur_val & ~mask) ``` The `type` is generally u8 but can be larger if the mask spans multiple bytes. -O2 can remove unnecessary masking. There is one exception: fields that cross the mid-word boundary (that is, include both of the middle two bits) will *always* use an xor construction: for example, ```c void set_bitfield_1_01C0(u32 x) { b ^= (u32) (((arg0 & 7) ^ ((u32) b >> 0xE)) << 0x1D) >> 0xF; } ``` If `in_val` is signed, `srl/sra` is used to clear higher bits of the input instead of `and`, other operations are the same. ### The `xor` algorithm The `xor` cases work as follows: 1. Load the whole word. 2. Shift it right to put the bits to be replaced in a position to line up with those replacing them. 3. `and` the replacement with the shifted mask, or shift/unshift it, to clear higher bits. 3. `xor` with replacement. 4. Shift right to clear out the now spurious top bits. 5. Shift left to get the replaced bits in the right place. 6. `xor` with the original value. Example: suppose we are looking at the following bitfield: ```c struct xor_example { u32 x : 7; u32 y : 13; } bitfield; ``` `7+12 = 19 > 16`, so this crosses the mid-word boundary and we should use the xor method. Suppose the current value of the bitfield is represented by ``` A = aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa ``` and the value we wish to put into `y` is `Y = yyyyyyyy yyyyyyyy yyyyyyyy yyyyyyyy`. 1. Load whole word's current value ``` A = aaaaaaaa aaaaaaaa aaaaaaaa aaaaaaaa ``` 2. Shift current value to clear bits below where `Y` is to go. ``` A >> (32 - (13 + 7)) = A >> 11 = 00000000 000aaaaa aaaaaaaa aaaaaaaa ``` 3. And `y` with mask ``` Y & 0x1FFF = yyyyy yyyyyyyy ``` 2. xor with masked `y`: ``` (Y & 0x1FFF) ^ (A >> 12) = zzzzzzzz zzzzzzzz zzz(a^y)(a^y)(a^y)(a^y)(a^y) (a^y)(a^y)(a^y)(a^y)(a^y)(a^y)(a^y)(a^y) ``` where the `z` are numbers we do not care about due to the next step 3. shift to clear bits above where `Y` goes ``` ((Y & 0x1FFF) ^ (A >> 12)) << (32 - 13) = (a^y)(a^y)(a^y)(a^y)(a^y)(a^y)(a^y)(a^y) (a^y)(a^y)(a^y)(a^y)(a^y)000 00000000 00000000 ``` 4. shift to get `Y` bits to the right position ``` (((Y & 0x1FFF) ^ (A >> 12)) << 19) >> 7 = 0000000(a^y) (a^y)(a^y)(a^y)(a^y)(a^y)(a^y)(a^y)(a^y) (a^y)(a^y)(a^y)(a^y)0000 00000000 ``` 5. xor with `A` again: `a^0 = a` and `a^(a^y) = (a^a)^y = y` since xor is associative and `a^a = 0`. ``` A ^ ((((Y & 0x1FFF) ^ (A >> 12)) << 19) >> 7) = aaaaaaay yyyyyyyy yyyyaaaa aaaaaaaa ``` as required. ### Examples for small widths Size 1: | bits/mask | load_type | mask | left shift | | --------- | --------- | ----- | ---------- | | 0_80 | (u8*) | 0x80 | << 7 | | 0_40 | (u8*) | 0x40 | << 6 | | 0_20 | (u8*) | 0x20 | << 5 | | 0_10 | (u8*) | 0x10 | << 4 | | 0_08 | (u8*) | 0x8 | << 3 | | 0_04 | (u8*) | 0x4 | << 2 | | 0_02 | (u8*) | 0x2 | << 1 | | 0_01 | (u8*) | 0x1 | | (pattern then repeats for other bytes) Size 2, displaced by 0: | bits/mask | load_type | mask | left shift | | --------- | --------- | ----- | ---------- | | 0_C0 | (u8*) | 0xC0 | << 6 | | 0_30 | (u8*) | 0x30 | << 4 | | 0_0C | (u8*) | 0xC | << 2 | | 0_03 | (u8*) | 0x3 | | (pattern then repeats for other bytes) Size 2, displaced by 1: | bits/mask | load_type | mask | left shift | | ---------- | --------- | ----- | ---------- | | 0_60 | (u8*) | 0x60 | << 5 | | 0_18 | (u8*) | 0x18 | << 3 | | 0_06 | (u8*) | 0x6 | << 1 | | 0_0180 | (u16*) | 0x180 | << 7 | | 1_60 | (u8*)+1 | 0x60 | << 5 | | 1_18 | (u8*)+1 | 0x18 | << 3 | | 1_06 | (u8*)+1 | 0x6 | << 1 | | **1_0180** | xor | ++++ | ++++ | | 2_60 | (u8*)+2 | 0x60 | << 5 | | 2_18 | (u8*)+2 | 0x18 | << 3 | | 2_06 | (u8*)+2 | 0x6 | << 1 | | 3_0180 | (u16*)+2 | 0x180 | << 7 | | 3_60 | (u8*)+3 | 0x60 | << 5 | | 3_18 | (u8*)+3 | 0x18 | << 3 | | 3_06 | (u8*)+3 | 0x6 | << 1 | | bits/mask | load_type | mask | left shift | | ---------- | --------- | ----- | ---------- | | 0_E0 | (u8*) | 0xE0 | << 5 | | 0_1C | (u8*) | 0x1C | << 2 | | 0_0380 | (u16*) | 0x380 | << 7 | | 1_70 | (u8*) | 0x70 | << 7 | | 1_0E | (u8*) | 0x0E | << 1 | | **1_01C0** | xor | ++++ | ++++ | | 2_38 | (u8*) | 0x38 | << 3 | | 2_07 | (u8*) | 0x7 | | | 3_E0 | (u8*) | 0xE0 | << 5 | | 3_1C | (u8*) | 0x1C | << 2 |