# TypedCallable

## Introduction

This document proposes a new built-in closure object for efficiently calling a runtime-known function with compile-time known argument types, similar to a C function pointer (but for Julia code).

Users can construct a `TypedCallable{AT,RT}` from any callable object `f` and a set of argument + return types:

```julia
global logger::TypedCallable{Tuple{String},Nothing} # declare type for type-stability
logger = TypedCallable{Tuple{String},Nothing}(FileLogger()) # set logger dynamically
logger("Test logging message") # calls `(::FileLogger)(::String)`
```

Calling this object (e.g. `typed_callable(args...)`) is then a fast equivalent of:

```julia
invokelatest(f, (args::AT)...)::RT
```

i.e. calling `f` in the latest world with an implied set of argument and return type assertions.

Unlike a typical "dynamic dispatch", calling a `TypedCallable` can be nearly as fast as a fully static function invocation. The fast execution path is an integer comparison + indirect jump via a function pointer. It also infers well by default, without additional type assertions.

### Advantages

The primary goals of `TypedCallable` are to provide (1) **performance** and (2) **`--trim`-compatibility** for what would normally be a "dynamic dispatch" (a comparatively slow and `--trim`-incompatible function call).

The returned object:
- uses fast, internal ABIs for calling the wrapped function
- infers well with existing user code
- supports `--trim` compilation

As a secondary benefit, the built-in argument / return type assertions help document / enforce an **interface** for highly-dynamic code with third-party callbacks, etc.

## Use Cases / Motivation

It is common to encounter user code that wishes to invoke a function of unknown identity on known arguments. Because the compiler does not know the identity of the object `f`, these calls are currently forced to fall back to a fully-dynamic dispatch, incurring CPU / memory overhead.
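As a minimal illustration of the problem (the `callbacks` registry and `fire` function here are hypothetical, not part of this proposal), a container of arbitrary callables forces every call through dynamic dispatch:

```julia
# Hypothetical callback registry: the element type of `callbacks` is `Any`,
# so the compiler cannot know the identity (or type) of any `f` stored here.
const callbacks = Any[]

register!(f) = push!(callbacks, f)

function fire(msg::String)
    for f in callbacks
        f(msg)  # fully-dynamic dispatch: the result infers as `::Any`
    end
end

register!(msg -> println("received: ", msg))
fire("hello")  # works, but each call pays the dynamic-dispatch cost
```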
Even worse, these calls often infer very poorly by default, tainting downstream type inference with `::Any`. They also frequently suffer from invalidations that require awkward "dispatch barriers" to resolve.

These issues can be resolved, but they are costly to discover, requiring `SnoopCompile`, `code_typed`, etc. to find. Even after the inference / invalidation issues are fixed, the resulting call is still slow compared to a standard `invoke`:

```julia
# naive version:
stringified = prepare_message(message) # infers poorly + invalidates!

# "dispatch barrier" + type-assert:
stringified = invokelatest(prepare_message, message)::String # works, but slow to call!
```

None of these solutions provide `--trim` compatibility.

#### Examples

- `LazyLibrary`: calls the `onload` callback for all of its dynamically-registered dependencies
- `Logging`: calls `logger(::String)` for a dynamically-registered `logger` object
- `atexit()`: calls `f()` for each dynamically-registered `f` object

## Implementation

Any "function pointer" in Julia has to perform dispatch in three steps:

```
(1) `invokelatest` dispatch (cached)
  ⤷ (2) ABI bridge (`specfun` adapter)
      ⤷ (3) callee function / CodeInstance
```

This proposal recommends:
- **(1)** is an (inline) responsibility of the **caller**
- **(2)** is a dedicated ABI bridge (shared with `@cfunction`, etc.)
- **(3)** is a standard `CodeInstance` (as `invoke`d by "normal" Julia code)

#### Object layout

To implement the above scheme, a `TypedCallable` will have two pieces:

```julia
# similar to `@cfunction` / `@ccallable`
mutable struct InvokeLatestTrampoline
    # This is not a first-class object - it is specially created and cached
    # by the Julia runtime. This is just a sketch of its contents.
    @atomic world::UInt         # latest validated world
    @atomic specptr::Ptr{Cvoid} # ABI bridge (2)
    ci::CodeInstance            # callee CI (3)
    mi::MethodInstance          # call signature - used for updates
    # + addl.
    # fields as needed for caching
end

struct TypedCallable{AT,RT}
    f::Any
    fptr::Ptr{InvokeLatestTrampoline}
end
```

This split has two advantages:
1. It allows the trampoline object to be cached together with the ABI bridge itself and re-used for both `@ccallable` / `@cfunction` and `TypedCallable`
2. It means that constructing a `TypedCallable` with a known ABI + cached adapter only requires allocating `f` (which is often free, due to being a singleton / pre-allocated)

### Implementation Challenges

#### ABI adapters + caching

Due to its fully-dynamic design, `TypedCallable` requires additional caching support versus `@cfunction` / `@ccallable`. Those can predict the ABI + function object "statically", but `TypedCallable` can emit code for many different ABIs / function objects dynamically. To avoid unbounded JIT work, the `InvokeLatestTrampoline` and the ABI bridge (machine code) both require caching by the runtime.

This cache is also required for `--trim` support, so that trampolines / adapters can be generated ahead-of-time and saved to the resulting `pkgimage`.

#### Serialization + `--trim` support

Serialization will be a four-step process:
1. Query the runtime (GC or cache) for a conservative set of live `TypedCallable` objects.
2. Add each `TypedCallable`'s code to the serialization queue.
3. Perform code-generation.
4. During the serialization walk of system data (when we perform trimming), prune any `TypedCallable` code whose objects did not end up in the final image.

This allows the runtime to `--trim` the code associated with a `TypedCallable` without having to repeat steps (3) and (4) in a loop, which can be difficult to guarantee reaches a fixed point while respecting mutation constraints (since the cache system is part of what is being scanned).

## Prior Solutions

#### FunctionWrappers.jl

FunctionWrappers.jl fills a very similar niche, but:
1. It is restricted to C-ABI functions (limited `kwargs` support, no custom union-split ABI, etc.)
2. It is not `--trim` compatible by design (the internal `Ptr{Cvoid}` cannot be understood as a JIT-provided callback object by the serializer)

#### Base.Experimental.OpaqueClosure

There are two problems with `OpaqueClosure`:
1. it is invoked in a frozen `world`
2. it cannot survive `precompile` (similar to a `Ptr`, its function pointer becomes `NULL`)

The first is a problem for users who expect a closure to behave like a normal Julia closure. Freezing the world at closure construction time makes many "reasonable" Method definition orderings fail:

```julia
const print_stdout = Base.Experimental.@opaque (x) -> print(stdout, x)

# this might be done by the user interactively, or in some package that is `using`'d
struct FancyNumber <: Number
    val::Int
end
Base.print(io::IO, x::FancyNumber) = print(io, "FancyNumber(val=$(x.val))")

# despite the added method, this fails with a MethodError
print_stdout(FancyNumber(1))
```

This is especially common when the closure uses abstract types in an argument (as above), but it can also occur when accessing `mutable` or global data that returns types whose methods are only recently available.

The second problem is that an `OpaqueClosure` is unable to tell the runtime which Julia code it was derived from, which means that the `pkgimage` serializer has no way to re-generate its machine code for ahead-of-time compilation or to restore it properly when `using` a package. This can easily lead to bugs when saving / restoring OpaqueClosures:

```julia
const oc = @opaque () -> println("Hello")
function __init__()
    oc() # uh-oh, calling an OpaqueClosure defined at pre-compile time
         # the function pointer is NULL now, so this crashes!
end
```

In the future this may throw a `MissingCodeError` instead of seg-faulting, but either way it will not run as a user would expect.
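For reference, the call semantics proposed above (latest-world invocation plus argument / return type assertions) can be emulated in today's Julia, just without the fast trampoline path or `--trim` support. This is only an illustrative sketch; `SlowTypedCallable` is a hypothetical name, not part of the proposal:

```julia
# Slow, purely-functional emulation of the proposed `TypedCallable` semantics:
# `invokelatest` stands in for the cached trampoline, with the same implied
# argument / return type assertions. No fast path, no `--trim` support.
struct SlowTypedCallable{AT<:Tuple,RT}
    f::Any
end

function (tc::SlowTypedCallable{AT,RT})(args...) where {AT,RT}
    # assert the argument tuple type, call in the latest world, assert the return
    Base.invokelatest(tc.f, (args::AT)...)::RT
end

logger = SlowTypedCallable{Tuple{String},Nothing}(s -> (println("LOG: ", s); nothing))
logger("Test logging message")  # runs in the latest world, unlike an OpaqueClosure
# logger(42)                    # would throw a TypeError: 42 is not a `String`
```

Because the wrapped `f` is stored as `Any`, newly defined methods remain visible to later calls, matching the `invokelatest` behavior described in the Introduction rather than `OpaqueClosure`'s frozen-world behavior.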