Let's take a look at each of these internal representations of code, and document which of the fields are relevant to each consumer. This is not necessarily the current state, but rather the intended future state.
First we have Method: this seems fairly self-explanatory. This represents a user-defined dispatch entry target.
Next we have MethodInstance: this is the pair of details (Method, specialization types), which uniquely defines each object by egal and type-equality.
Next we have CodeInfo: this describes a precise input to codegen. It might be unspecialized (just the direct output of code lowering) or optimized. It contains the statements of the function and some metadata for each of the statements. It may contain some extra data unused by codegen, but generally that data may not be available.
Next we have IRCode: this represents a superset of the information in CodeInfo, but in particular represents it in an exploded way that is easier to mutate. The CodeInfo representation is instead optimized for size when compressed. It might not be easy to mutate this further (though it can be approximately converted back to an IRCode with some effort).
And finally we have CodeInstance: this represents an executable chunk or an inference edge. It will be important to know which fields are valid for each use case, so this will be refined further now. A CodeInstance has both "extrinsic" and "intrinsic" properties (using the terms as defined at https://hackmd.io/@Og2_pcUySm6R_RPqbZ06JA/S1bqP1D_6), which are as follows:
The "extrinsic" properties are:

- MethodInstance: the cache that contained this. This might be the same as the def field (as is guaranteed now), or perhaps might be a subtype of def in the future.
- owner (e.g. the cache_owner from AbstractInterpreter): the particular inference configuration that generated this, or the token nothing for the builtin rules.
- min_world/max_world: this inclusive range defines when the intrinsic properties are valid.
- edges (not yet present): this collects the inference predicates that fed into defining this. It is an exact dual of the worlds, in that the worlds can be computed from the edges and the edges could be computed from the worlds:
  - worlds from edges is the intersection of the worlds in each of the edges.
  - edges from worlds is inference, recording each of the intrinsic properties that influenced this result.

The "intrinsic" properties are most of the remaining fields (except for next
which is a structural property), such as inferred, invoke, ipo_effects, rettype, and so on. The list is not fixed, but rather is any property that we decide to add to inference to track. Of particular note: these fields do not define a specialization axis, but rather are defined by it. One particularly crucial aspect of this is that it means semantically they can be narrowed at any time by an intersection over their lattice, provided that they remain compatible with the worlds fields, or the worlds can be widened provided that they remain compatible with these properties. Of additional note: the edges list is also not part of the intrinsic properties, since it is equivalent to worlds. However, for implementation simplicity, we don't typically narrow these intrinsic properties, as those updates can race with another thread trying to widen the worlds. However, it is legal to observe that these properties can be narrower over their lattice if consistent with the current (or wider) worlds, and there may be cases where we will do so. Generally edges doesn't get changed, as there is no reason to add new edges, and intersection on its lattice is rather complicated.
Lastly there is a "debuginfo" field. This seems to fall somewhat between being an intrinsic property and an edge, as it contains the metadata of what source code was inlined to create the "inferred" field contents (and thus also the "invoke" field contents). This makes it an extrinsic property like an edge, but as an edge to the source code definition of that exact Method, rather than to just the inferred intrinsic properties.
Additionally, a new boolean field can be added to the MethodInstance that states whether it is currently the invoke target for its own signature in the world_counter world.
Now let's turn to looking at how these definitions connect in a simple program, then we will add details about concurrency and (de)serialization.
First the user defines some Methods, which get added to the global MethodTable (represented piecemeal in lookups from various TypeName fields).
Secondly the user calls one of these methods, resulting in selection of a Method by comparison of morespecific. This ends up being heavily cached in various places, such that this lookup in this particular world can be repeated efficiently and quickly.
Next, the caching logic selects a particular MethodInstance to represent the desired monomorphization (this is usually, but not always, the same as what will be codegened later).
Then it tries to invoke this MethodInstance, leading to the next phase: CodeInstance creation. The cache then tries to locate a CodeInstance that is valid for executing in the current world. Upon failing that, it asks type-inference (jl_type_infer) to give it one. If inference fails to do that, it will create one without optimizations. Each of these steps is mediated by the JIT engine (henceforth just called "engine", as it does not actually JIT or optimize code itself, but simply maintains the ownership of it), as follows. In the following description, the word "then" should be taken to imply a SCC graph ordering, not a linear ordering.

- The codegen_lock is held before asking for any rights and while holding any rights. This simplifies the initial implementation, although it prevents any parallelism from being realized yet.
- Inference is performed (producing an InferenceResult).
- The engine is handed the resulting CodeInfo. In exchange, the engine gives it back a CodeInstance object containing the intrinsic data and worlds that identify it as an edge.
- The initial content of that CodeInstance (invoke and specptr) might be a trampoline that calls back to invoke on the MethodInstance and restarts this process later. It might even be an interpreter call. This is because it is always legal to redispatch from CodeInstance to MethodInstance, since the intrinsic properties are inherent to any execution in the given world, and not inherent to that particular copy of it (e.g. because invalidation of worlds is not possible).
- Later, the CodeInstance gets fulfilled (including llvm bitcode or native pointers). The process might not fulfill it at all if LimitedAccuracy got returned, for example.

Everything up to this point is a description of the containers, so now let's turn our attention to the contents and the properties of those contents, otherwise known as "edges". So let's talk about what those do.
Edges represent the complete set of facts used by inference interprocedurally.
Edges also represent the static call graph. This is almost the same, but not in all cases.
Previously, the former set of facts was represented only as backedges from a MethodInstance to other MethodInstances, as a summary of the CodeInstances. Additionally, the backedges only represented the set of facts that might be falsified later. Therefore, they exclude info that already has an upper bound on the usable world or which did not influence the final resulting code generated.
Previously also, the latter was represented by Expr(:invoke) calls to a MethodInstance.
Except during incremental precompile, when both used a third representation.
Now, it would benefit us to try to unify those.
What I propose is this:
The argument to Expr(:invoke) is changed from a MethodInstance to a CodeInstance, and commits codegen to emitting a particular call signature based on the intrinsic properties at that edge table index. If a MethodInstance is put there instead, it will generate a call to jl_invoke instead. When compressed, this turns into an integer index into the edges table, instead of requiring adding to the root table.
The edges table is a list of CodeInstance objects (each of them typically in the cache, though not required). These serve several different purposes, and the valid fields to examine change based on the consumer (take care here!):
- When codegen examines the CodeInstance from an :invoke target, it may use the spectypes and rettypes fields as well as any other intrinsic properties it considers useful. Excluded are inferred and invoke. In particular, codegen does not in any way recurse.
- When inference examines a CodeInstance from a cache lookup, it may use the same fields as codegen, plus inferred. In particular, it may recurse to recreate inferred also, if it chooses to.
- If the optimizer were to examine a CodeInstance (such as from a cache lookup or :invoke), it may use the same fields as codegen, which does not give it much useful or stable info. Instead, this information should typically be given to it by inference.
- When serialization examines a CodeInstance, it keeps the edges and intrinsic properties, but discards the worlds.
- When method-add examines a CodeInstance following backedges, it truncates the max_world field of any it finds that contains an edge intersecting with the new Method.
- When deserialization or re-validation examines a CodeInstance:
There are several other encodings possible in the table, as well as just a CodeInstance, which have slightly different meanings for "identical method lookup". In these cases, where it states a lookup of a CodeInstance, it means specifically the extrinsic properties (e.g. just the MethodInstance): for example, checking that computing only(methods_by_ftype(ci.def.specTypes)) returns ci.def.def::Method in the lookup world.
- call, where there is only one result and a rettype was successfully inferred.
- Like call, but only considers that the exact subtype lookup of that type results in that CodeInstance, and does not consider all intersections.
- An Int value, N, followed by a type followed by N CodeInstances, means that a lookup of methods via type should result in all of those following CodeInstances (and no others).
- 0 followed by a type means it is a "missing" edge, targeting the lookup of that type being empty.

If those edge properties all hold, recursively over each CodeInstance in the list, then the current CodeInstance is valid. Otherwise, inference can attempt to locate replacement CodeInstances for each of those, such that the intrinsic properties still hold, resulting in a copy of the current CodeInstance with the same inferred field and intrinsic properties, but with a different list of edges in it.
We also currently have a "legacy" edge kind:

- MethodTable means that the lookup was incomplete for that signature, so we need a backedge from that table too.

This is a list of the APIs expected to be needed to implement the runtime, as well as the documentation to include with them. Note that jl_engine_reserve is where most of the state management complexity occurs. The APIs here correspond most closely to the current SOURCE_MODE_ABI, while SOURCE_MODE_FORCE_SOURCE* will not be relevant (or possible) anymore. Any caller that used SOURCE_MODE_FORCE_SOURCE* in the past will use the (Core.Compiler).code_typed reflection utility from Julia instead, as everything becomes "push" based instead of calling back into inference from C for these utilities.
// Request inference to populate an optimized jl_code_instance_t for execution
//jl_code_instance_t *jl_type_infer(jl_method_instance_t *m, size_t world, int notrequired) __nullable;
void jl_type_infer(jl_code_instance_t *ci, size_t world, int notrequired) __nullable;
// Request codegen to emit native code for a given src object
// Note that there is no recursion/workqueue. If the code is not available already for a given edge, no attempt is made to infer it and it simply immediately falls back to making a slower trampoline.
jl_llvm_functions_t jl_emit_code(Module &M, jl_code_info_t *src, jl_codegen_params_t &params) __nullable;
// Request engine to grant the right to infer m for owner (that includes world)
// This may block if another thread already has reserved this right.
// Return either a new, empty, unattached jl_code_instance_t that just records the reservation (with the `partial` boolean set)
// or a previously inferred object. The `inferred` field is volatile, so if the caller needs to ensure that it sticks around, it should copy it to a local field.
// If it is empty (the `partial` boolean is true), the caller may use the `inferred` field to stash a handle to the current progress.
// If the `partial` boolean is false, then inference is done on this, and it is now part of the global cache. Do not mutate this further. There may or may not be source code available for it in the `inferred` field (assume not).
// The `leases` array is a list of all current rights that the caller is holding. If the call blocks, other tasks may be permitted to borrow those rights. All rights will be returned to the caller (possibly still incomplete or possibly completed) when this returns. If they are already completed, the caller should not try to fulfill them again.
// The `leases` array can be NULL, in which case the engine will not block (it will return NULL) if the request cannot be immediately fulfilled. This is useful if the caller knows that it can keep working on other things while it waits.
// Additionally, this will return NULL if the request would result in a deadlock (because it already has that reservation). The caller should treat that result similar to toplevel LimitedAccuracy (and avoid caching this current result to force use of the fallback interpreter).
// The returned object may have a finalizer attached, such that if this gets forgotten, we can print a warning and reject the promise implicitly, as well as release any internal resources before the object becomes invalid.
jl_code_instance_t *jl_engine_reserve(jl_method_instance_t *m, size_t world, jl_value_t *owner, int notrequired, jl_array_t *leases __nullable) __nullable;
// Fulfill a promise made for the right to infer a signature, which will finish populating `ci` with the data from `src`, call `jl_emit_code` with this to associate it with LLVM code in the JIT, and release all tasks waiting on this lease afterwards. This also handles installing their backedges correctly.
void jl_engine_fulfill(jl_code_instance_t *ci, jl_code_info_t *src);
// Advanced API for jl_engine_fulfill which takes arguments separately instead of nicely pre-packaged in a src object. The user is required to pass in either an LLVM Module or populate native function pointers as well.
void jl_engine_fulfill_advanced(jl_code_instance_t *ci, /* intrinsic properties, etc */);
// Sometimes fulfilling a promise requires simultaneously fulfilling other promises, since they have circularities in their edges. Since all edges must be valid before fulfilling a reservation, sometimes it will be necessary to return an entire list of strongly connected components simultaneously.
void jl_engine_fulfill_scc(jl_code_instance_t *ci[], jl_code_info_t *src[]);
void jl_engine_fulfill_scc_advanced(jl_code_instance_t *ci[], /* arrays of all the intrinsic properties, etc */);
// Reject a promise made for the right to infer a signature (garbage collect it or allow the next task waiting on the lease to take it back).
void jl_engine_reject(jl_code_instance_t *ci);
// Query if there is a jl_code_instance_t from an older world, which is already inferred but might not be relevant, so that inference can decide if it is relevant
jl_code_instance_t *jl_engine_closest(jl_method_instance_t *m, jl_value_t *owner, size_t world) __nullable;
// Internally, the engine can add a Module to the OrcJIT and retrieve a list of function pointers to content in it, given a map of edges by index to their corresponding function (jl_callptr_t) or specialization (specsig) pointers that was derived from the edges list of the CodeInstance along with the calling convention selector.
void jl_add_to_ee(std::unique_ptr<ThreadSafeModule> M, std::vector<jl_fptr_t> LinkMap);
For example, here is roughly the body of the fallback code:
jl_code_instance_t *fallback_inference(jl_method_instance_t *mi, jl_value_t *owner, size_t world)
{
    jl_code_instance_t *ci = jl_engine_reserve(mi, world, owner, 0, jl_an_empty_array);
    if (ci->partial) {
        JL_GC_PUSH1(&ci);
        jl_code_info_t *src = jl_code_for_interpreter(mi, world);
        if (src)
            jl_engine_fulfill(ci, src);
        else {
            jl_engine_reject(ci);
            ci = NULL;
        }
        JL_GC_POP();
    }
    return ci;
}
The non-fallback code should look very similar to this, except of course with inference in there too! As a special case, this can also be run without a reservation, to get out of situations where it would cause deadlocks.