
# KMC/GCC decompilation patterns

(This document is publicly editable, please contribute.)

As a reference, here's the IDO version of this document: https://hackmd.io/vPmcgdaFSlq4R2mfkq4bJg

PMRet GCC tips and tricks document: https://github.com/pmret/papermario/wiki/GCC-2.8.1-Tips-and-Tricks

A clamp to zero such as `MAX(x, 0)` can decompile as a sign-bit mask:

```c
// temp_v1 = arg5 - 1;
// temp_t1 = temp_v1 & (~temp_v1 >> 0x1F);
temp_t1 = MAX(arg5 - 1, 0);
```

* When two adjacent 16-bit members are tested in the same condition, the compiler can coalesce the loads into a single `lw` instead of emitting two `lh` instructions. This only happens if the two members are next to each other in memory and the first one is guaranteed to be 4-byte aligned. For example:

```c
struct Test {
    s16 a;
    s16 b;
    s32 for_alignment;
};

int foo(struct Test* t) {
    if (t->a || t->b) {
        bar();
    }
}
```

```
foo:
    addiu $sp, $sp, -0x18
    sw    $ra, 0x10($sp)
    lw    $v0, 0x0($a0) # Coalesced load of `a` and `b`
    beqz  $v0, 1f
    nop
    jal   bar
    nop
1:
    lw    $ra, 0x10($sp)
    addiu $sp, $sp, 0x18
    jr    $ra
    nop
```

The compiler can make a similar optimization when combining multiple 8-bit comparisons. If the members don't total 4 bytes, it can still coalesce the loads, but it masks the combined value before the comparison. Loads of non-adjacent members can be coalesced too, as long as they lie within the same aligned 4-byte word; the skipped members in between are masked out of the comparison with a combination of `lui`, `ori`, and `and`.

```c
struct Test {
    s8 a;
    s8 b;
    s8 c;
    s8 d;
    s32 for_alignment;
};

int foo(struct Test* t) {
    if (t->a || t->b || t->d) {
        bar();
    }
}
```

```
foo:
    addiu $sp, $sp, -0x18
    sw    $ra, 0x10($sp)
    lw    $v0, 0x0($a0)
    lui   $v1, 0xffff
    ori   $v1, $v1, 0xff
    and   $v0, $v0, $v1
    beqz  $v0, 1f
    nop
    jal   bar
    nop
1:
    lw    $ra, 0x10($sp)
    jr    $ra
    addiu $sp, $sp, 0x18
```

At `-O0`, `for(;;)` generates different code than `while(1)`.

## Signed division by 2

There are three different patterns for a signed variable divided by 2. Which one appears depends on the size of the type.
```c
s32 temp_v0;
// temp_v0 = ((s32)(temp_v0 + ((u32)temp_v0 >> 0x1F)) >> 1)
temp_v0 /= 2;
```

```c
s16 temp_v0;
// temp_v0 = ((s32)(temp_v0 + (((u32)(temp_v0 << 0x10)) >> 0x1F))) >> 1
temp_v0 /= 2;
```

```c
s8 temp_v0;
// temp_v0 = ((s32)(temp_v0 + (((u32)(temp_v0 << 0x18)) >> 0x1F))) >> 1
temp_v0 /= 2;
```
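All three patterns share one idea: an arithmetic right shift rounds toward negative infinity (`-3 >> 1` is `-2`), while C division truncates toward zero (`-3 / 2` is `-1`), so the sign bit is added before shifting. The sketch below is not from the original notes. It is a host-side check that each commented expression matches `/ 2`. It assumes that `s8`, `s16`, `s32`, and `u32` map to the fixed-width types in `<stdint.h>`, and that `>>` on a negative signed value is an arithmetic shift, as it is with GCC. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed typedefs; decomp projects normally provide these in a shared header. */
typedef int8_t s8;
typedef int16_t s16;
typedef int32_t s32;
typedef uint32_t u32;

/* s32: the sign bit is already bit 31, so it is extracted directly. */
s32 div2_s32(s32 x) {
    return (s32)(x + ((u32)x >> 0x1F)) >> 1;
}

/* s16: shift left by 16 so the 16-bit sign bit lands in bit 31.
   The cast to u32 happens before the shift here to avoid left-shifting
   a negative signed value, which is undefined in standard C. */
s16 div2_s16(s16 x) {
    return (s16)((s32)(x + (((u32)x << 0x10) >> 0x1F)) >> 1);
}

/* s8: same idea, shifting left by 24. */
s8 div2_s8(s8 x) {
    return (s8)((s32)(x + (((u32)x << 0x18) >> 0x1F)) >> 1);
}

/* Exhaustively checks the 8- and 16-bit patterns and spot-checks the
   32-bit one against plain division by 2. Aborts on the first mismatch. */
void check_div2_patterns(void) {
    static const s32 samples[] = {
        INT32_MIN, INT32_MIN + 1, -3, -2, -1, 0, 1, 2, 3, INT32_MAX
    };
    s32 i;

    for (i = 0; i < (s32)(sizeof(samples) / sizeof(samples[0])); i++) {
        assert(div2_s32(samples[i]) == samples[i] / 2);
    }
    for (i = INT16_MIN; i <= INT16_MAX; i++) {
        assert(div2_s16((s16)i) == (s16)i / 2);
    }
    for (i = INT8_MIN; i <= INT8_MAX; i++) {
        assert(div2_s8((s8)i) == (s8)i / 2);
    }
}
```

Calling `check_div2_patterns()` from a small host-side test program should return silently on a compiler with arithmetic right shift. When matching, writing `temp_v0 /= 2;` with the correct variable type is usually enough to reproduce the pattern.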