# CodeInstance, MethodInstances, CodeInfo, world ages, and more

Let's take a look at each of these internal representations of code, and document which of the fields are relevant to each consumer. This is not necessarily the current state, but rather the intended future state.

## Definitions

First we have `Method`: this seems fairly self-explanatory. It represents a user-defined dispatch entry target.

Next we have `MethodInstance`: this is the pair of details `(Method, specialization types)`, which uniquely defines each object by egal and type-equality.

Next we have `CodeInfo`: this describes a precise input to codegen. It might be unspecialized (just the direct output of code lowering) or optimized. It contains the statements of the function, and some metadata for each of the statements. It may contain some extra data unused by codegen, but generally that data may not be available.

Next we have `IRCode`: this represents a superset of the information in `CodeInfo`, but in particular represents it in an exploded way that is easier to mutate. The `CodeInfo` representation is instead optimized for size when compressed, and is not easy to mutate directly (though it can be approximately converted back to an `IRCode` with some effort).

And finally we have `CodeInstance`: this represents an executable chunk of code OR an inference edge. It will be important to know which fields are valid for each use case, so this will be refined further below.

A `CodeInstance` has both "extrinsic" and "intrinsic" properties (using the terms as defined here: https://hackmd.io/@Og2_pcUySm6R_RPqbZ06JA/S1bqP1D_6), which are as follows.

The "extrinsic" properties are:

- `MethodInstance`: The cache that contained this. This might be the same as the `def` field (as is guaranteed now), or perhaps might be a subtype of `def` in the future.
- `owner` (e.g.
the cache_owner from AbstractInterpreter): the particular inference configuration that generated this, or the token `nothing` for the builtin rules
- `min_world`/`max_world`: this inclusive range defines when the intrinsic properties are valid
- `edges` (not yet present): This collects the inference predicates that fed into defining this. It is an exact dual of the `worlds`, in that the `worlds` can be computed from the `edges` and the `edges` could be computed from the `worlds`:
  - Computing `worlds` from `edges` is the intersection of the `worlds` in each of the `edges`.
  - Computing `edges` from `worlds` is inference, recording each of the intrinsic properties that influenced this result.

The "intrinsic" properties are most of the remaining fields (except for `next`, which is a structural property), such as `inferred`, `invoke`, `ipo_effects`, `rettype`, and so on. The list is not fixed, but rather is any property that we decide to add to inference to track. Of particular note: these fields do _not_ define a specialization axis, but rather are defined _by_ it. One particularly crucial consequence is that they can be narrowed at any time by an intersection over their lattice, provided that they remain compatible with the `worlds` fields--or the `worlds` can be widened, provided that they remain compatible with these properties. Of additional note: the `edges` list is also not part of the intrinsic properties, since it is equivalent to `worlds`. However, for implementation simplicity, we don't typically narrow these intrinsic properties, as those updates can race with another thread trying to widen the `worlds`. It is nonetheless legal to narrow these properties over their lattice, as long as that is consistent with the current (or wider) `worlds`, and there are cases where we will do so. Generally `edges` doesn't get changed, as there is no reason to add new edges, and intersection on its lattice is rather complicated.

Lastly there is a "debuginfo" field.
This seems to fall somewhere between being an intrinsic property and an edge, as it contains the metadata of what source code was inlined to create the "inferred" field contents (and thus also the "invoke" field contents). This makes it an extrinsic property like an edge, but an edge to the source code definition of that exact Method, rather than to just the inferred intrinsic properties.

Additionally, a new boolean field can be added to the `MethodInstance` that states whether it is currently the invoke target for its own signature in the `world_counter` world.

## Usage Steps

Now let's turn to looking at how these definitions connect in a simple program, then we will add details about concurrency and (de)serialization.

First the user defines some Methods, which get added to the global MethodTable (represented piecemeal in lookups from various `TypeName` fields).

Secondly the user calls one of these methods, resulting in selection of a Method by comparison of `morespecific`. This ends up being heavily cached in various places, such that this lookup in this particular world can be repeated quickly.

Next, the caching logic selects a particular `MethodInstance` to represent the desired monomorphization (this is usually, but not always, the same as what will be codegened later). Then it tries to invoke this MethodInstance, leading to the next phase: CodeInstance creation.

The cache then tries to locate a CodeInstance that is valid for executing in the current world. Failing that, it asks type-inference (`jl_type_infer`) to give it one. If inference fails to do that, it will create one without optimizations.

Each of these steps is mediated by the JIT engine (henceforth just called "engine", as it does not actually JIT or optimize code itself, but simply maintains the ownership of it), as follows. In the following description, the word "then" should be taken to imply an SCC graph ordering, not a linear ordering.
- The process first submits a notification to the engine for a right to create this CodeInstance. The engine will atomically determine if another thread has already started work on an equivalent request and give back a handle to that work. This will be repeated for each vertex in the graph, so the process will collect quite a number of these rights. These are not locks, as the presence of cycles in inference may lead to different threads needing to acquire these rights in a different order. Thus, in particular, failure to acquire one of these rights will need extra handling.
  - _Initially, the process will simply acquire the `codegen_lock` before asking for any rights and while holding any rights. This simplifies the initial implementation, although it prevents any parallelism from being realized yet._
- The process then runs inference, resulting in a cache of inference details for a graph of functions, in an internal representation (e.g. `InferenceResult`).
- The process then runs optimizations, resulting in a cache of optimizations, still in an internal representation.
- The process then releases those rights with the engine by trading each for an object known to the engine, such as a `CodeInfo`. In exchange, the engine gives it back a `CodeInstance` object containing the intrinsic data and worlds that identify it as an edge.
- The engine can be asked at any time later for a native pointer and will provide one. However, if it has lost the ability to generate one exactly for the specific intrinsic properties, the native pointers (both `invoke` and `specptr`) might be a trampoline that calls back to invoke on the MethodInstance and restarts this process later. It might even be an interpreter call. This is because it is always legal to redispatch from CodeInstance to MethodInstance, since the intrinsic properties are inherent to any execution in the given world, and not inherent to that particular copy of it (e.g. because invalidation of worlds is not possible).
- Alternatively, the process can instead release the rights without fulfilling them at all, or partially fulfill them, or fulfill them with only the exact content needed for the engine to create a `CodeInstance` (including llvm bitcode or native pointers). The process might not fulfill a right at all if LimitedAccuracy got returned, for example.
- The process might partially fulfill a right if it cannot get a later right that it needed to finish creating it. When a process only partially fulfills a request, it also may choose to give up any or all of the other rights it was holding and enter a waiting state. The next time a process tries to take that right, it can choose to steal as much of that partial work as it wants to. Eventually, when that right becomes available or has been fulfilled, the process will resume and can finish fulfilling any others that didn't get fulfilled.

## Edges

Everything up to this point is a description of the containers, so now let's turn our attention to the contents and the properties of those contents, otherwise known as "edges". So let's talk about what those do.

Edges represent the complete set of facts used by inference interprocedurally. Edges also represent the static call graph. These are almost the same, but not in all cases.

Previously, the former set of facts was only represented as backedges from a MethodInstance to other MethodInstances, as a summary of the CodeInstances. Additionally, the backedges only represented the set of facts that might be falsified later. Therefore, they exclude info that already has an upper bound on the usable world or which did not influence the final resulting code generated. Previously also, the latter was represented by Expr(:invoke) calls to a MethodInstance. Except during incremental precompile, when both used a third representation. Now, it would benefit us to try to unify those.
What I propose is this: The argument to `Expr(:invoke)` is changed from a MethodInstance to an integer index into the edges table, and commits codegen to emitting a particular call signature based on the intrinsic properties at that edge table index. The edges table is a list of CodeInstance objects (each of them typically in the cache, though not required). These serve several different purposes, and the valid fields to examine change based on the consumer (take care here!):

- When `codegen` examines a CodeInstance, from an `:invoke` target, it may use the `specTypes` and `rettype` fields as well as any other intrinsic properties it considers useful. Excluded are `inferred` and `invoke`. In particular, codegen does not in any way recurse.
- When `inference` examines a CodeInstance, from a cache lookup, it may use the same fields as `codegen`, plus `inferred`. In particular, it may recurse to recreate `inferred` also, if it chooses to.
- If the `optimizer` were to examine a CodeInstance (such as from a cache lookup or `:invoke`), it may use the same fields as `codegen`, which does not give it much useful or stable info. Instead, this information should typically be given to it by `inference`.
- When `serialization` examines a CodeInstance, it keeps the edges and intrinsic properties, but discards the `worlds`.
- When `method-add` examines a CodeInstance following backedges, it truncates the `max_world` field of any it finds that contains an edge intersecting with the new Method.
- When `deserialization` or re-validation examines a CodeInstance:
  - from serialization: it recomputes the valid worlds, for the given set of intrinsic properties, from the edges, if the edges are recursively still valid
  - from recursion: there must be found a provably equivalent set of edges, which can be proven in several ways:
    - the set of method lookup results is identical
    - or inference over the lookup has the same intrinsic properties, but has a new set of edges

### Edges Encoding

There are several other encodings possible in the table, as well as just a CodeInstance, which have slightly different meanings for "identical method lookup". In these cases, where it states a lookup of a CodeInstance, it means specifically the extrinsic properties (e.g. just the `MethodInstance`): for example, that computing `only(methods_by_ftype(ci.def.specTypes))` returns `ci.def.def::Method` in the lookup world.

- A CodeInstance alone means it is both the call target and specialization target. This is most common, as it works for all `call` where there is only one result and a rettype was successfully inferred.
- A type followed by a CodeInstance means it is an invoke target. The same as a `call`, but it only considers whether the exact subtype lookup of that `type` results in that CodeInstance, and does not consider all intersections.
- An `Int` value, `N`, followed by a type followed by `N` CodeInstances means that a lookup of methods via `type` should result in all of those following CodeInstances (and no others).
  - Corollary: A `0` followed by a type means it is a "missing" edge, which targets the lookup of that `type` being empty.
- A MethodInstance can appear anywhere a CodeInstance is valid in the previous list, and means that that target was used for properties beyond what could be captured in a CodeInstance.
For example, because it was inlined or because some form of constant propagation was run that refined the result. It may be necessary to also simultaneously inline the edges from that CodeInstance, if we are not keeping that CodeInstance around, to preserve that graph connection for invalidation.

If those edge properties all hold, recursively over each CodeInstance in the list, then the current CodeInstance is valid. Otherwise, inference can attempt to locate replacement CodeInstances for each of those, such that the intrinsic properties of it still hold, resulting in a copy of the current CodeInstance with the same `inferred` field and intrinsic properties, but with a different list of edges in it.

## Implementation APIs

This is a list of the APIs expected to be needed to implement the runtime, as well as the documentation to include with them. Note that `jl_engine_reserve` is where most of the state management complexity occurs. The APIs here correspond most closely to the current `SOURCE_MODE_ABI`, while `SOURCE_MODE_FORCE_SOURCE*` will not be relevant (or possible) anymore. Any caller that used `SOURCE_MODE_FORCE_SOURCE*` in the past will instead use the `(Core.Compiler).code_typed` reflection utility from Julia, as everything becomes "push" based instead of calling back into inference from C for these utilities.

```cpp=
// Request inference to populate an optimized jl_code_instance_t for execution
//jl_code_instance_t *jl_type_infer(jl_method_instance_t *m, size_t world, int notrequired) __nullable;
void jl_type_infer(jl_code_instance_t *ci, size_t world, int notrequired);

// Request codegen to emit native code for a given src object.
// Note that there is no recursion/workqueue. If the code is not available already for a given edge, no attempt is made to infer it and it simply immediately falls back to making a slower trampoline.
jl_llvm_functions_t jl_emit_code(Module &M, jl_code_info_t *src, jl_codegen_params_t &params) __nullable;

// Request engine to grant the right to infer m for owner (that includes world).
// This may block if another thread has already reserved this right.
// Returns either a new, empty, unattached jl_code_instance_t that just records the reservation (with the `partial` boolean set),
// or a previously inferred object. The `inferred` field is volatile, so if the caller needs to ensure it sticks around, it should copy it to a local field.
// If it is empty (the `partial` boolean is true), the caller may use the `inferred` field to stash a handle to the current progress.
// If the `partial` boolean is false, then inference is done on this, and it is now part of the global cache. Do not mutate this further. There may or may not be source code available for it in the `inferred` field (assume not).
// The `leases` array is a list of all current rights that the caller is holding. If the call blocks, other tasks may be permitted to borrow those rights. All rights will be returned to the caller (possibly still incomplete or possibly completed) when this returns. If they are already completed, the caller should not try to fulfill them again.
// The `leases` array can be NULL, in which case the engine will not block (it will return NULL) if the request cannot be immediately fulfilled. This is useful if the caller knows that it can keep working on other things while it waits.
// Additionally, this will return NULL if the request would result in a deadlock (because the caller already holds that reservation). The caller should treat that result similarly to toplevel LimitedAccuracy (and avoid caching the current result, to force use of the fallback interpreter).
// The returned object may have a finalizer attached, such that if it gets forgotten, we can print a warning and reject the promise implicitly, as well as release any internal resources before the object becomes invalid.
jl_code_instance_t *jl_engine_reserve(jl_method_instance_t *m, size_t world, jl_value_t *owner, int notrequired, jl_array_t *leases __nullable) __nullable;

// Fulfill a promise made for the right to infer a signature. This will finish populating `ci` with the data from `src`, call `jl_emit_code` to associate it with LLVM code in the JIT, and release all tasks waiting on this lease afterwards. This also handles installing their backedges correctly.
void jl_engine_fulfill(jl_code_instance_t *ci, jl_code_info_t *src);

// Advanced API for jl_engine_fulfill which takes arguments separately instead of nicely pre-packaged in a src object. The user is required to also pass in either an LLVM Module or populated native function pointers.
void jl_engine_fulfill_advanced(jl_code_instance_t *ci, /* intrinsic properties, etc */);

// Sometimes fulfilling a promise requires simultaneously fulfilling other promises, since they have circularities in their edges. Since all edges must be valid before fulfilling a reservation, sometimes it will be necessary to return an entire list of strongly connected components simultaneously.
void jl_engine_fulfill_scc(jl_code_instance_t *ci[], jl_code_info_t *src[]);
void jl_engine_fulfill_scc_advanced(jl_code_instance_t *ci[], /* arrays of all the intrinsic properties, etc */);

// Reject a promise made for the right to infer a signature (garbage collect it or allow the next task waiting on the lease to take it back).
void jl_engine_reject(jl_code_instance_t *ci);

// Query if there is a jl_code_instance_t from an older world, which is already inferred but might not be relevant, so that inference can decide if it is relevant.
jl_code_instance_t *jl_engine_closest(jl_method_instance_t *m, jl_value_t *owner, size_t world) __nullable;

// Internally, the engine can add a Module to the OrcJIT and retrieve a list of function pointers to content in it, given a map of edges (by index) to their corresponding function (jl_callptr_t) or specialization (specsig) pointers, derived from the edges list of the CodeInstance along with the calling convention selector.
void jl_add_to_ee(std::unique_ptr<ThreadSafeModule> M, std::vector<jl_fptr_t> LinkMap);
```

For example, here is roughly the body of the fallback code:

```cpp=
jl_code_instance_t *fallback_inference(jl_method_instance_t *mi, jl_value_t *owner, size_t world)
{
    jl_code_instance_t *ci = jl_engine_reserve(mi, world, owner, 0, jl_an_empty_array);
    if (ci->partial) {
        JL_GC_PUSH1(&ci);
        jl_code_info_t *src = jl_code_for_interpreter(mi, world);
        if (src) {
            jl_engine_fulfill(ci, src);
        }
        else {
            jl_engine_reject(ci);
            ci = NULL;
        }
        JL_GC_POP();
    }
    return ci;
}
```

The non-fallback code should look very similar to this, except of course with inference in there too! As a special case, this can also be run without a reservation, to get out of situations where it would cause deadlocks.
## Implementation progress

- [x] add edges and rettype fields to CodeInfo
- [x] refactor inference to allocate CodeInstance up-front, then fill it in and cache it at the end (instead of doing both at once)
- [ ] keep codegen data as an array (svec) until after optimizations, so we can drop any unused values that do not survive it instead of adding them to global roots immediately
- [ ] add an atomic byte to CodeInfo to control whether the contents are currently shared or volatile (instead of is_src_volatile); remove explicit compression from optimization (in lieu of just doing codegen/optimization more eagerly--on a thread--if code is not inlineable)
- [x] teach inference to correctly generate edges
- [x] teach optimizer to more correctly generate edges
- [ ] ~~add isonlymatch boolean to MethodInstance, or world age of last success~~
- [x] convert staticdata to using edges, rather than recomputing from backedges
- [x] alter codegen to take a CodeInfo directly, never computing it
- [x] update code_llvm implementation functions to take as input a CodeInfo (from code_typed), never computing it (similarly for code_llvm with dump_module=true; take a CodeInstance directly for dump_module=false)
- [ ] update codegen to expect :invoke to have a CodeInstance, to get specsig explicitly, never computing it, and deprecate having a MethodInstance there
- [ ] update downstream consumers as needed (e.g. GPUCompiler, JuliaInterpreter, Cthulhu)
- [x] add multithreading "engine"
- [ ] add workstealing when compilation begins in JIT
- [ ] teach runtime how/when it is legal to "reuse" a CodeInstance
- [ ] teach JIT how to register with the engine before compiling, instead of using big codegen lock (BCL)