# TypedCallable
## Introduction
This document proposes a new built-in closure object for efficiently calling a runtime-known function with compile-time known argument types, similar to a C function pointer (but for Julia code).
Users can construct a `TypedCallable{AT,RT}` from any callable object `f` and a set of argument + return types:
```julia
global logger::TypedCallable{Tuple{String},Nothing} # declare type for type-stability
logger = TypedCallable{Tuple{String},Nothing}(FileLogger()) # set logger dynamically
logger("Test logging message") # calls `(::FileLogger)(::String)`
```
Calling this object (e.g. `typed_callable(args...)`) is then a fast equivalent of:
```julia
invokelatest(f, (args::AT)...)::RT
```
i.e. calling `f` in the latest world with an implied set of argument and return type assertions.
Unlike a typical "dynamic dispatch", calling a `TypedCallable` can be nearly as fast as a fully static function invocation. The fast execution path is an integer comparison + indirect jump via a function pointer.
It also infers well by default, without additional type assertions.
### Advantages
The primary goals of `TypedCallable` are to provide (1) **performance** and (2) **`--trim`-compatibility** for what would normally be a "dynamic dispatch" (a comparatively slow and `--trim` incompatible function call).
The returned object:
- uses fast, internal ABIs for calling the wrapped function
- infers well with existing user code
- supports `--trim` compilation
As a secondary benefit, the built-in argument / return type-assertions help document / enforce an **interface** for highly-dynamic code with third-party callbacks, etc.
## Use Cases / Motivation
It is common to encounter user code that wishes to invoke a function of unknown identity on known arguments.
Because the compiler does not know the identity of the object `f`, these calls are currently forced to fall back to a fully-dynamic dispatch, incurring CPU / memory overhead. Worse, these calls often infer very poorly by default, tainting downstream type inference with `::Any`. They also frequently suffer from invalidations that require awkward "dispatch barriers" to resolve.
These issues can be resolved, but they are costly to discover, requiring tools like `SnoopCompile` and `code_typed`. Even after the inference / invalidation issues are fixed, the resulting call is still slow compared to a standard `invoke`:
```julia
# naive version:
stringified = prepare_message(message) # infers poorly + invalidates!
# "dispatch barrier" + type-assert:
stringified = invokelatest(prepare_message, message)::String # works, but slow to call!
```
None of these solutions provide `--trim` compatibility.
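Under this proposal, the same callback could instead be wrapped once and called directly. This is a sketch only: `TypedCallable` does not yet exist, and `prepare_message` / `message` are the placeholder names from the example above:

```julia
# sketch (pseudocode): the proposed replacement for the dispatch barrier above
const prepare_cb = TypedCallable{Tuple{Any},String}(prepare_message)

stringified = prepare_cb(message) # fast, infers as `String`, `--trim`-compatible
```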
#### Examples
- `LazyLibrary`: calls `onload` callback for all of its dynamically-registered dependencies
- `Logging`: calls `logger(::String)` for a dynamically-registered `logger` object
- `atexit()`: calls `f()` for each dynamically-registered `f` object
## Implementation
Any "function pointer" in Julia has to perform dispatch in three steps:
```
(1) `invokelatest` dispatch (cached)
⤷ (2) ABI bridge (`specfun` adapter)
⤷ (3) callee function / CodeInstance
```
This proposal recommends:
- **(1)** is an (inline) responsibility of the **caller**
- **(2)** is a dedicated ABI bridge (shared with `@cfunction`, etc.)
- **(3)** is a standard `CodeInstance` (as `invoke`d by "normal" Julia code)
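The caller-side fast path for step (1) might look roughly like the following. This is illustrative pseudocode: `current_world`, `speccall`, and `slowpath!` are hypothetical names, and `unsafe_load` stands in for however the runtime reads the trampoline:

```julia
# illustrative pseudocode for the inline, caller-side dispatch (step 1)
function (tc::TypedCallable{AT,RT})(args...) where {AT,RT}
    tr = unsafe_load(tc.fptr)            # load the InvokeLatestTrampoline
    if tr.world == current_world()       # fast path: integer comparison...
        return speccall(tr.specptr, tc.f, args...)::RT # ...+ indirect jump (2)
    else
        return slowpath!(tc, args...)    # re-validate world, update specptr / ci
    end
end
```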
#### Object layout
To implement the above scheme, a `TypedCallable` will have two pieces:
```julia
# similar to `@cfunction` / `@ccallable`
mutable struct InvokeLatestTrampoline
# This is not a first-class object - it is specially created and cached
# by the Julia runtime. This is just a sketch of its contents.
@atomic world::UInt # latest validated world
@atomic specptr::Ptr{Cvoid} # ABI bridge (2)
ci::CodeInstance # callee CI (3)
mi::MethodInstance # call signature - used for updates
# + addl. fields as needed for caching
end
struct TypedCallable{AT,RT}
f::Any
fptr::Ptr{InvokeLatestTrampoline}
end
```
This split has two advantages:
1. It allows the trampoline object to be cached together with the ABI bridge itself and re-used for both `@cfunction` / `@ccallable` and `TypedCallable`
2. It means that constructing a `TypedCallable` with a known ABI + cached adapter only requires allocating `f` (which is often free, since `f` is frequently a singleton or pre-allocated)
### Implementation Challenges
#### ABI adapters + caching
Due to its fully-dynamic design, `TypedCallable` requires additional caching support versus `@cfunction` / `@ccallable`. Those can determine the ABI + function object statically, but `TypedCallable` may emit code for many different ABIs / function objects dynamically.
To avoid unbounded JIT work, the `InvokeLatestTrampoline` and ABI bridge (machine code) both require caching by the runtime. This cache is also required for `--trim` support, so that trampolines / adapters can be generated ahead-of-time and saved to the resulting `pkgimage`.
#### Serialization + `--trim` support
Serialization will be a four-step process:
1. Query the runtime (GC or cache) for a conservative set of live `TypedCallable` objects.
2. Add each `TypedCallable`'s code to the serialization queue.
3. Perform code-generation.
4. During the serialization walk of system data (when we perform trimming), prune any `TypedCallable` code whose objects did not end up in the final image.
This allows the runtime to `--trim` the code associated with a `TypedCallable` without repeating steps (3) and (4) in a loop; such a loop is difficult to guarantee reaches a fixed point and must respect mutation constraints (since the cache system is part of what is being scanned).
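In pseudocode, the first three steps might be driven as follows. Every function name here is hypothetical; this only sketches the shape of the pass, not a real serializer API:

```julia
# hypothetical sketch of steps (1)-(3); step (4) happens later, during the
# trimming walk of system data
function enqueue_typed_callables!(queue)
    for tc in runtime_live_typed_callables()   # (1) conservative liveness query
        push!(queue, codeinstance_for(tc))     # (2) queue the callee + bridge
    end
    generate_code!(queue)                      # (3) ahead-of-time codegen
end
```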
## Prior Solutions
#### FunctionWrappers.jl
FunctionWrappers.jl fills a very similar niche, but:
1. It is restricted to C ABI functions (limited `kwargs` support, no custom `union-split` ABI, etc.)
2. It is not `--trim` compatible by design (the internal `Ptr{Cvoid}` cannot be understood as a JIT-provided callback object by the serializer)
#### Base.Experimental.OpaqueClosure
There are two problems with an `OpaqueClosure`:
1. it is invoked in a frozen `world`
2. it cannot survive `precompile` (similar to a `Ptr`, its function pointer becomes `NULL`)
The first is a problem for users who expect a closure to behave like a normal Julia closure. Freezing the world at closure construction time makes many "reasonable" Method definition orderings fail:
```julia
const print_stdout = Base.Experimental.@opaque (x) -> print(stdout, x)
# this might be done by the user interactively, or in some package that is using'd
struct FancyNumber <: Number
val::Int
end
Base.print(io::IO, x::FancyNumber) = print(io, "FancyNumber(val=$(x.val))")
# despite the added method, the closure's frozen world cannot see it, so this
# either uses the old generic `print` or fails with a MethodError
print_stdout(FancyNumber(1))
```
This is especially common when the closure takes an abstract type as an argument (as above), but it can also occur when the closure accesses mutable or global data whose runtime types gained their methods only after the closure was constructed.
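By contrast, the proposed `TypedCallable` always calls in the latest world, so the same ordering of definitions would work (again a sketch, since `TypedCallable` does not yet exist):

```julia
# sketch (pseudocode): `TypedCallable` has `invokelatest` semantics, so method
# definitions added after construction are visible at call time
const print_stdout = TypedCallable{Tuple{Any},Nothing}((x) -> print(stdout, x))
# ... define `FancyNumber` and its `print` method as above ...
print_stdout(FancyNumber(1)) # sees the new `print` method
```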
The second problem is that an `OpaqueClosure` is unable to tell the runtime which Julia code it was derived from, which means that the `pkgimage` serializer has no way to re-generate its machine code for ahead-of-time compilation or restore it properly when `using` a package.
This can easily lead to bugs when storing OpaqueClosures across precompilation:
```julia
const oc = @opaque ()->println("Hello")
function __init__()
oc() # uh-oh, using an OpaqueClosure defined at pre-compile time
# the function pointer is NULL now, this crashes!
end
```
In the future this may throw a `MissingCodeError` instead of seg-faulting, but either way it will not run as a user would expect.