# IL Hooks For Beginners

## Preface

You can skip to [here](#But-what-even-is-IL) if you know what IL hooks are.

You have probably encountered this issue while modding Celeste before: a vanilla method that you cannot change to fit your mod without copy-pasting the source code and editing the part you need (by the way, this is absolutely terrible and breaks compatibility with other mods, *don't ever do this*). `On` hooks are powerful enough for most situations, but sometimes you need to go directly into a vanilla function and change something yourself. This is where IL hooks come into play.

IL hooks are much more powerful than simple method hooks, since they give you the freedom to change anything you want about the code. *However, you should avoid using them when possible*. Not only are they way more clunky to read than normal hooks, but multiple IL hooks in one place can conflict with each other and cause one of them to do nothing, crash the game with an `InvalidProgramException` or, in the worst case, make the game go haywire.

In order to view a program's IL, you will need a .NET disassembler. Visual Studio does not come with such a program, so you will need a tool such as [dnSpy](https://github.com/dnSpy/dnSpy) or [ILSpy](https://github.com/icsharpcode/ILSpy). Alternatively, JetBrains Rider and ReSharper both let you view IL in their respective editors; however, these tools aren't free.

### But what even is IL?

C# does not compile directly to machine code that can be understood by your CPU. Instead, the C# compiler translates the code into an intermediate language called, well... "Common Intermediate Language", shortened to CIL or just IL *(also MSIL, but only by psychopaths)*. This language is then translated to CPU code by the "Common Language Runtime" (the program that runs Celeste's code) as the game runs.

With MonoMod you can modify the code inside of methods, but nothing else.
Assemblies and types cannot be modified once they're loaded (this would break things *very* badly).

Here's an example of a method that prints "Hello!" to the console, written in IL:

```= !
IL_0000: nop
IL_0001: ldstr "Hello!"
IL_0006: call void [System.Console]System.Console::WriteLine(string)
IL_000b: nop
IL_000c: ret
```

When inspecting C# methods in IL view, you will often find IL code wrapped by method and type signatures. *You only need to focus on the IL inside methods, the rest is irrelevant since you cannot change it.*

## Concepts in IL-Land

### IL, as steps for a recipe

Have you ever heard the analogy that *"C# is just a recipe for your CPU to follow"*? IL fits this analogy even better; in fact, it's just a list of instructions boiled down to the simplest form possible. Every instruction only has two components:

- The `Opcode`
- an optional `Operand`

Every instruction can only do one thing. This is determined by the `Opcode`, which dictates what the instruction does, such as *"load this variable"* or *"add these things together"*. The `Operand` provides context to the instruction, such as what variable to load, what method to call and so on.

IL might look scary at first, but in reality it's just a bunch of steps that each do only one thing at a time. Luckily, Wikipedia provides [a list of all opcodes](https://en.wikipedia.org/wiki/List_of_CIL_instructions), so you can easily look up what an opcode stands for.

But wait, if an instruction can only have one operand, how do we, for example, add two numbers together?

### Memory, as a stack of cards

Another behaviour that IL instructions can have is that they can either *put something on the stack* or *take something from the stack*. Before explaining what a stack is, let's look at an example:

```= !
ldc.i4.1 // load the number 1 as an int onto the stack
ldc.i4.2 // load the number 2 as an int onto the stack
add      // take two values off the stack, add them together and put the result on the stack
ret      // return the only value currently on the stack
```

The comments here explain what the code does, but what actually is this mysterious "stack"? Well, stacks in general are a special type of collection that *only supports putting a new element on top or taking the topmost element off.* You can imagine it like a stack of cards, where players either draw cards from or put cards on the top of the stack. Here, the instructions do the same but with references, booleans, integers and more.

You can see this with the first two instructions, which push both 1 and 2 onto the stack. The `add` instruction then takes both those numbers off the stack and adds them together, putting the result back onto the stack. Lastly, the `ret` instruction takes the result of `add` and returns it to the caller.

As you may guess, the equivalent code in C# looks like this:

```csharp= !
// the function signature exists just for completeness' sake, it's not the focus here
static int Example() {
    return 1 + 2;
}
```

The stack is the first level of memory in which a C# program stores values, and is needed for instructions to be able to work with anything stored in memory. As explained, some opcodes load values from fields, locals or methods onto the stack, while others take values from the stack to perform an operation and put the result onto the stack again. Again, a crucial piece of IL functionality.

### Labels, to know where to jump to

So, now you know how to make a list of IL instructions and how these instructions interoperate with each other to make a program. You could theoretically write something in pure IL now, but it would be very boring. Of course, we need to cover program flow!
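
If you want to try writing pure IL before moving on, the `1 + 2` listing from the stack section can be emitted and run at runtime. The following is only a minimal sketch that uses `System.Reflection.Emit` from the standard .NET library rather than MonoMod, and the names `StackDemo` and `BuildExample` are made up for illustration:

```csharp
using System;
using System.Reflection.Emit;

public static class StackDemo
{
    // Builds a method equivalent to `static int Example() { return 1 + 2; }`
    // by emitting the same four instructions as the IL listing.
    public static Func<int> BuildExample()
    {
        var method = new DynamicMethod("Example", typeof(int), Type.EmptyTypes);
        ILGenerator il = method.GetILGenerator();

        il.Emit(OpCodes.Ldc_I4_1); // push 1 onto the stack
        il.Emit(OpCodes.Ldc_I4_2); // push 2 onto the stack
        il.Emit(OpCodes.Add);      // pop both values, push their sum
        il.Emit(OpCodes.Ret);      // return the single value left on the stack

        return (Func<int>)method.CreateDelegate(typeof(Func<int>));
    }

    public static void Main()
    {
        Func<int> example = BuildExample();
        Console.WriteLine(example()); // prints 3
    }
}
```

If you remove one of the two `ldc.i4` lines, `add` no longer has two values to take off the stack and the runtime should reject the method with an `InvalidProgramException`, which is the same failure mode a broken IL hook can cause.
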
The most relevant instructions for control flow inside of a method (not covering method calls) are the *branch instructions*, such as `brtrue` (jump if the value is not zero) and `br` (unconditional jump). These use a special operand: *labels*. Labels are also the simplest concept covered in this guide, because they are just indices that point to instructions. You are just telling the CLR what instruction to jump to. What a shocker.

### Signatures, to know what to access

Last but not least, we need to learn how to work with signatures like field, method or type names. This is important to be able to work with `ldfld`/`stfld` (load a field/store into a field), `call`/`callvirt` (direct call/late-bound call) or `newobj` (create an object), and they are also really simple. Emit methods identify these via Reflection objects (such as a `FieldInfo` or `MethodInfo`), so just passing those when emitting IL will be enough.

One special mention is the difference between `call` and `callvirt`. You will see both on your escapades through decompilations, and it's important to know when to use either of them. `call` is the basic one of the two, as it will directly call the method in its operand. `callvirt`, on the other hand, will take an instance, look up its most derived version of the method and call that. This is called "late binding". You want to use `call` when you know ahead of time what method you want to call, i.e. you are calling a static method or you are in an overriding method and you want to call the base version, and you use `callvirt` if you want to call an instance's method but you cannot know the most derived version ahead of time.

## Now you're thinking with IL: Code examples

Now that we are done with theory and explanations, we can actually start working with IL. The first step is registering a method as an IL hook. The method gets executed immediately upon registering and we receive an `ILContext` instance. From this context we can create an `ILCursor`.
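
As a rough sketch of that first step, registration could look like the following. This assumes an Everest mod with the generated `IL.Celeste.*` hook events available; the class name `ExampleHooks`, the choice of `Player.Update` as the target and the log tag are picked purely for illustration, and the snippet only compiles inside a Celeste mod project:

```csharp
using Celeste.Mod;  // Everest's Logger and LogLevel
using MonoMod.Cil;  // ILContext and ILCursor

public static class ExampleHooks
{
    // Call this from your module's Load() method.
    public static void Load()
    {
        // Registering runs ModifyPlayerUpdate right away.
        IL.Celeste.Player.Update += ModifyPlayerUpdate;
    }

    // Call this from your module's Unload() method to undo the hook.
    public static void Unload()
    {
        IL.Celeste.Player.Update -= ModifyPlayerUpdate;
    }

    private static void ModifyPlayerUpdate(ILContext il)
    {
        // The cursor is created from the context we were handed.
        ILCursor cursor = new ILCursor(il);

        // This sketch changes nothing; it only reports what it sees.
        Logger.Log(LogLevel.Info, "ExampleHooks",
            $"Hooked {il.Method.FullName}: {il.Instrs.Count} instructions, cursor at index {cursor.Index}");
    }
}
```

The hook body runs once at registration time to edit the instruction list, not every time `Player.Update` is called.
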
The cursor tracks which instruction in the list it's currently pointing to (it will always start at the first instruction), and it defines methods we can call to move to a certain set of instructions (`Goto()`) or insert an instruction *before* the current index (`EmitInstr()`).

WIP add examples

## Notes for writing IL hooks

WIP add points like "avoid IL hooks until needed"

## Resources

WIP link resources like dnSpy or the opcode list