# `#compilernauts` meeting time
- zoom link: <https://mit.zoom.us/j/92880616939>
- meeting time: 9am Wed. (EDT) / 6am Wed. (PDT) / 10pm Wed. (JST) / 3pm Wed. (CEST)
[TOC]
## July 06, 2022
Attendees:
- Shuhei K.
- Takafumi A.
- Jameson N.
- Gabriel B.
- Valentin C.
- Julian S.
Discussion:
- Shuhei: worried about a compilation slowdown:
- seems to happen within Julia-level compilation? (appearing within JET/Cthulhu too)
```julia
~/julia/julia5 master
❯ ./usr/bin/julia -e 'using Cthulhu; @time Cthulhu.@interp sum(rand(1:100))'
5.256613 seconds (1.96 M allocations: 132.617 MiB, 0.56% gc time, 99.93% compilation time)
~/julia/julia5 master
❯ ./usr/bin/julia -e 'using Cthulhu; @time Cthulhu.@interp sum(rand(1:100))'
5.100521 seconds (1.96 M allocations: 132.617 MiB, 0.64% gc time, 99.94% compilation time)
~/julia/julia5 master
❯ ./usr/bin/julia -e 'using Cthulhu; @time Cthulhu.@interp sum(rand(1:100))'
5.169596 seconds (1.96 M allocations: 132.617 MiB, 0.54% gc time, 99.94% compilation time)
```
```julia
~/julia/julia4 remotes/origin/backports-release-1.8
❯ ./usr/bin/julia -e 'using Cthulhu; @time Cthulhu.@interp sum(rand(1:100))'
3.425331 seconds (2.82 M allocations: 139.480 MiB, 0.65% gc time, 99.78% compilation time)
~/julia/julia4 remotes/origin/backports-release-1.8
❯ ./usr/bin/julia -e 'using Cthulhu; @time Cthulhu.@interp sum(rand(1:100))'
3.348820 seconds (2.82 M allocations: 139.480 MiB, 0.64% gc time, 99.90% compilation time)
~/julia/julia4 remotes/origin/backports-release-1.8
❯ ./usr/bin/julia -e 'using Cthulhu; @time Cthulhu.@interp sum(rand(1:100))'
3.382658 seconds (2.82 M allocations: 139.480 MiB, 0.60% gc time, 99.90% compilation time)
```
## Done: June 22, 2022
Attendees:
- Shuhei K.
- Takafumi A.
- Jameson N.
- Gabriel B.
- Valentin C.
- Julian S.
Discussion:
- Shuhei:
- [inference perf improvements!](https://github.com/JuliaCI/NanosoldierReports/blob/master/benchmark/by_hash/3110266_vs_c683e1d/report.md)
- compiler performance enhancements
- [sparse inference state management](https://github.com/JuliaLang/julia/pull/45276)
- [inlining improvement](https://github.com/JuliaLang/julia/pull/44512)
- [a refactoring to reduce unnecessary copies](https://github.com/JuliaLang/julia/pull/45404)
- Gabriel and Valentin:
- changes to `jl_create_native` API for GPUCompiler
- GPUCompiler-using packages (CUDA, Enzyme, etc.) want to be able to cache more
- we would like to port `jl_create_native` and `jl_compile_workqueue` to Julia
- create an API: given a `MethodInstance`, return its module (TSM?), jlcall func, specialized func, ABI of the specfunc, and roots
- recursion is difficult - return just one level of dependencies for each API call
- or provide a callback, which checks cache and returns cached name
- also returns a dictionary with pairs of MI -> LLVM function names
- what to do about globals
- return a mapping from LLVM global name -> Julia value (essentially a pointer)
- need to consider what information is sufficient to re-constitute the global across sessions
- lift the post-processing of globals so it can be customized
- should enable compilation with LLVM's ORC
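The proposed API could look roughly like this. This is only a sketch of the shape discussed above; `CompiledEntry` and `compile_method_instance` are hypothetical names, not an agreed-upon interface:

```julia
# Hypothetical shape of a Julia-level replacement for `jl_create_native`;
# all names here are illustrative only.

# What one API call returns for a single MethodInstance: the generated module
# plus enough metadata for the caller (e.g. GPUCompiler) to cache it.
struct CompiledEntry
    llvm_module::Any          # the generated module (a ThreadSafeModule in practice)
    jlcall_name::String       # generic-ABI entry point
    specfunc_name::String     # specialized-ABI function
    specfunc_abi::Symbol      # ABI of the specialized function
    roots::Vector{Any}        # GC roots referenced by the generated code
    # recursion is handled one level at a time: callees the caller must compile
    # (or find in its own cache), as MethodInstance => LLVM function name
    callees::Dict{Any,String}
end

# `lookup` is the cache-checking callback from the discussion: it returns the
# cached LLVM function name for a MethodInstance, or `nothing` on a miss.
function compile_method_instance(mi; lookup = _ -> nothing)
    # (stub) a real implementation would drive codegen here
    return CompiledEntry(nothing, "jlcall_f", "julia_f", :specsig,
                         Any[], Dict{Any,String}())
end
```

The one-level `callees` field is the "return just one level of dependencies" idea; the `lookup` keyword is the alternative callback-based design.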
## Done: June 8, 2022
Attendees:
- Takafumi A.
- Shuhei K.
- Jameson N.
- Tim B.
- Gabriel B.
Discussion:
- Shuhei:
- type-based alias analysis ([#41199](https://github.com/JuliaLang/julia/pull/41199/))
- no longer need to extract a field into a local variable to gain extra type refinement
```julia
x = Some{Any}(...)
# before
v = x.value
if isa(v, Int)
    return sin(v#=::Int=#)
end
# after
if isa(x.value, Int)
    return sin(x.value#=::Int=#)
end
# still
y = Ref{Any}(...)
if isa(y[], Int)
    return sin(y[]#=::Any=#)
end
```
- will be merged soonish (a few pkgeval errors)
- Taka
- if we have some extra time: `unsafe_bitcast` https://github.com/JuliaLang/julia/pull/43065
## Skipped: May 25, 2022
## Done: May 18, 2022
Attendees:
- Takafumi A.
- Shuhei K.
- Jameson N.
- Prem C.
- Gabriel B.
Discussion:
- Shuhei:
- an interesting inference-debugging experience from [#45276](https://github.com/JuliaLang/julia/pull/45276):
- [a segfault](https://github.com/JuliaLang/julia/pull/45276#issuecomment-1128904439) reported in pkgeval: seemingly related to wrong type inference or something
- but no apparent difference in `code_typed(...; optimize=false|true)`
- the end result (`code_typed`) can be misleading in the presence of recursion
- => use Cthulhu
```julia
julia> using Cthulhu, MutableArithmetics
julia> interp, = Cthulhu.@interp MutableArithmetics.rewrite(:(-x+x));
julia> linfos = collect(keys(interp.unopt));
julia> targets = linfos[findall(linfos) do key
           if key isa Core.Compiler.InferenceResult
               key = key.linfo
           end
           key.def.name === :_rewrite
       end]
7-element Vector{Union{Core.Compiler.InferenceResult, Core.MethodInstance}}:
MethodInstance for MutableArithmetics._rewrite(::Bool, ::Bool, ::Expr, ::Symbol, ::Vector{Any}, ::Vector{Any}, ::Symbol)
MethodInstance for MutableArithmetics._rewrite(::Bool, ::Bool, ::Any, ::Nothing, ::Vector{Any}, ::Vector{Any}, ::Symbol)
MethodInstance for MutableArithmetics._rewrite(::Bool, ::Bool, ::Expr, ::Nothing, ::Vector{Any}, ::Vector{Any}, ::Symbol)
MethodInstance for MutableArithmetics._rewrite(::Bool, ::Bool, ::Expr, ::Nothing, ::Vector{Any}, ::Vector{Any})
MethodInstance for MutableArithmetics._rewrite(::Bool, ::Bool, ::Any, ::Symbol, ::Vector{Any}, ::Vector{Any}, ::Symbol)
MethodInstance for MutableArithmetics._rewrite(::Bool, ::Bool, ::Any, ::Symbol, ::Vector{Any}, ::Vector{Any})
MethodInstance for MutableArithmetics._rewrite(::Bool, ::Bool, ::Any, ::Nothing, ::Vector{Any}, ::Vector{Any})
julia> interp.unopt[targets[5]].src # inspect...
```
- the root problem: `typeinf_local` shouldn't manage its own local abstract-interpretation state; `InferenceState` should do so
- [fix](https://github.com/JuliaLang/julia/pull/45276/commits/d9b8ffb73255f49e1cb786854246ee02e7d83f1b)
- Taka:
- Quick check: Is it OK to run optimizer outside the typeinf lock at the moment? https://github.com/JuliaLang/julia/pull/45306#discussion_r874375278
## Done: May 11, 2022
Attendees:
- Valentin C.
- Shuhei K.
- Takafumi A.
- Prem C.
- Jameson N.
- Gabriel B.
Discussion:
- Shuhei:
- :tada: succeeded in bootstrapping/passing tests on "per-BB state abstract interpretation PR" [`avi/inferencerefactor`](https://github.com/JuliaLang/julia/pull/45276)!
- motivations
- inference blows up on generated code that has many statements but simple control flow
- before _per statement_: `O(<number of statements>*<number of slots>)` state
- after _per basic-block_: `O(<number of basic blocks>*<number of slots>)` state
- problems
- [x] main changes by Keno
- [x] came with some orthogonal changes => separated them into separate pieces
- very good for understanding the code itself
- avoids one giant, involved PR!
- [x] recover statement-level type information from per-BB post-analysis states
- in Julia, a single variable can have different types at different places
```julia
# we have `a::Union{Int,Float64}` in `src.slottypes`
function f(x::Int)
    a = x::Int # SlotNumber(3) = SlotNumber(2)::Int => sin(TypedSlot(3, Int))
    a = sin(a) # SlotNumber(3) = sin(SlotNumber(3))::Float64 => sin(TypedSlot(3, Float64))
    a = cos(a) # SlotNumber(3) = cos(SlotNumber(3))::Float64 => sin(TypedSlot(3, Float64))
    return a
end
```
- => linear scan for slot assignment & get statement-wise type information from `src.ssavaluetypes`
- [x] a bug in state propagation (regression in `undef`-marking)
- => track "unanalyzed-yet" blocks
- ideas?
- seems like there is still some regression in the inference routine itself
- theoretically it should improve memory usage in general
- used [CCProfile.jl](https://gist.github.com/aviatesk/54ffa6e99d77e7bd8824e3616961de7f) (same as the `"inference"` benchmark set)
- > on master
```
[ Info: Base.return_types(println, (QuoteNode,); interp = CCProfiler())
1.321158 seconds (3.78 M allocations: 252.495 MiB, 10.60% gc time, 99.99% compilation time)
[ Info: Base.return_types(println, (QuoteNode,); interp = CCProfiler())
0.845018 seconds (3.77 M allocations: 251.843 MiB, 12.76% gc time, 100.00% compilation time)
[ Info: Base.return_types(println, (QuoteNode,); interp = CCProfiler())
0.883337 seconds (3.77 M allocations: 251.843 MiB, 13.76% gc time, 100.00% compilation time)
[ Info: Base.return_types(sin, (Int,); interp = CCProfiler())
0.038477 seconds (287.92 k allocations: 19.233 MiB, 99.90% compilation time)
[ Info: Base.return_types(Base.init_stdio, (Ptr{Cvoid},); interp = CCProfiler())
5.175899 seconds (15.55 M allocations: 1023.103 MiB, 12.68% gc time, 100.00% compilation time)
[ Info: Base.return_types(CC.construct_ssa!, (Core.CodeInfo, CC.IRCode, CC.DomTree, Vector{CC.SlotInfo}, Vector{Any}); interp = CCProfiler())
0.842408 seconds (3.13 M allocations: 213.755 MiB, 18.34% gc time, 99.93% compilation time)
[ Info: Base.return_types(CC.domsort_ssa!, (CC.IRCode, CC.DomTree); interp = CCProfiler())
0.539910 seconds (991.68 k allocations: 67.227 MiB, 15.00% gc time, 99.99% compilation time)
[ Info: Base.return_types(CC.abstract_call_gf_by_type, (CC.NativeInterpreter, Any, CC.ArgInfo, Any, CC.InferenceState, Int); interp = CCProfiler())
3.013060 seconds (11.82 M allocations: 806.225 MiB, 18.51% gc time, 100.00% compilation time)
```
- > on PR
```
[ Info: Base.return_types(println, (QuoteNode,); interp = CCProfiler())
1.264429 seconds (3.90 M allocations: 258.103 MiB, 13.06% gc time, 99.99% compilation time)
[ Info: Base.return_types(println, (QuoteNode,); interp = CCProfiler())
1.305389 seconds (3.90 M allocations: 257.429 MiB, 9.23% gc time, 100.00% compilation time)
[ Info: Base.return_types(println, (QuoteNode,); interp = CCProfiler())
0.992389 seconds (3.90 M allocations: 257.429 MiB, 11.55% gc time, 100.00% compilation time)
[ Info: Base.return_types(sin, (Int,); interp = CCProfiler())
0.062240 seconds (296.97 k allocations: 19.728 MiB, 33.22% gc time, 99.94% compilation time)
[ Info: Base.return_types(Base.init_stdio, (Ptr{Cvoid},); interp = CCProfiler())
4.143143 seconds (15.98 M allocations: 1.019 GiB, 12.32% gc time, 100.00% compilation time)
[ Info: Base.return_types(CC.construct_ssa!, (Core.CodeInfo, CC.IRCode, CC.DomTree, Vector{CC.SlotInfo}, Vector{Any}); interp = CCProfiler())
0.692781 seconds (3.24 M allocations: 217.713 MiB, 19.75% gc time, 99.99% compilation time)
[ Info: Base.return_types(CC.domsort_ssa!, (CC.IRCode, CC.DomTree); interp = CCProfiler())
0.277184 seconds (1.02 M allocations: 68.574 MiB, 10.58% gc time, 99.99% compilation time)
[ Info: Base.return_types(CC.abstract_call_gf_by_type, (CC.NativeInterpreter, Any, CC.ArgInfo, Any, CC.InferenceState, Int); interp = CCProfiler())
4.220345 seconds (12.33 M allocations: 828.224 MiB, 15.23% gc time, 100.00% compilation time)
```
- Valentin:
- propagate the new effect information to LLVM level?
- attach information during codegen
## Skipped: May 4, 2022
## Skipped: April 27, 2022
## Done: April 20, 2022
Attendees:
- Valentin C.
- Shuhei K.
- Takafumi A.
- Ian A.
- Prem C.
- Jameson N.
- Collin W.
- Julian S.
- Jeff B.
Discussion:
- Prem:
- ORC compile-on-demand progress
- InteractiveUtils test failure
- jl_dump_fptr_asm doesn't work (probably because our pointers are getting hot-swapped a lot)
- commenting out and using the fallback llvmf->dump_asm implementation fails for binary printing
- also fails on master with the same edit
- Memory Management
- 64-bit linux might work on LLVM 13 with our current memory manager
- Pooled memory managers
- windows is not supported due to debuginfo requirements
- Might be supported now
- macos failure is not debugged
- Bus error; Valentin suggests permissions are wrong?
- mac+aarch64 works because of JITLink
- Compilation timers?
- codegen-on-demand
- probably poor interaction with threadcall (need PTLS for garbage collection?)
- Valentin mentions @cfunction will pose an issue too
- typeinfer lock?
- If atomic load is re-read from, mark as consume (monotonic def)
- Can we require that all codegen globals be declared as part of JuliaOJIT class from now on?
## Done: April 13, 2022
Attendees:
- Valentin C.
- Shuhei K.
- Takafumi A.
- Ian A.
- Jameson N.
- Jeff B.
Discussion:
- Valentin
- https://github.com/JuliaLang/julia/pull/44527
- How to best implement linkage to functions in other images
- Jameson:
- this is broken: <https://github.com/JuliaLang/julia/blob/5ce65ab8e876e7b5fd371cb6b52eda428cba6829/base/compiler/abstractinterpretation.jl#L458>
- Shuhei
- problems with `return_type`/`infer_effects` approximations: revisiting [#35800](https://github.com/JuliaLang/julia/issues/35800) while working on [`Core.Compiler.infer_effects`](https://github.com/JuliaLang/julia/pull/44822)
- recursion may lead to idempotency issues, or even wrong results (esp. for `infer_effects`)
- local cycles: ~~ok~~ (assuming we have interprocedurally-valid approximations)
- interprocedural cycles: bad (we don't know if inference is approximated or not)
- idempotency issue: <https://github.com/JuliaLang/julia/issues/35800#issuecomment-1097605769>
- wrong approximation
- ```julia
find_consistent(::Tuple{}) = nothing
function find_consistent(ts::Tuple)
    t = first(ts)
    effects = Core.Compiler.infer_effects(maybe_consistent, Tuple{t})
    if Core.Compiler.is_consistent(effects)
        return t
    end
    return find_consistent(Base.tail(ts))
end
```
- `find_consistent` and `maybe_consistent` may be mutually recursive
- there doesn't seem to be a good way to judge if the result from virtualized `infer_effects` has been approximated or not
- it may be valid to use that information as far as it is "final" (i.e. `ALWAYS_TRUE`/`ALWAYS_FALSE`)
- but now we need to assume that the order of type-lattice corresponds to that of effect-lattice?
- approach:
- defer folding to optimization
- but now it's hard to fold succeeding operations e.g. `Core.Compiler.is_consistent(effects)::Bool`, and no DCE, etc.
- implement a reliable way to detect that inference of a child CFG has any cycles with the current frame?
- Jameson: "uncomputable presently (possibly always / fundamentally?)"
- Prem
- Sink typeinf lock into \_typeinf or lower?
- LLVM updating build toolchain requirements for C++17 (soft in LLVM15, hard in LLVM16)
- YT
- Speculative compilation + other JIT techniques?
## Skipped: April 6, 2022
- Skipped due to conflict with Atomics meeting
## Done: March 30, 2022
Attendees:
- Shuhei K.
- Ian A.
- Julian S.
- Jameson N.
- Collin W.
- Prem C.
- Takafumi A.
For discussion:
- Ian:
- Processing of `static_parameter`?
- https://github.com/JuliaLang/julia/pull/44660/files#diff-0fcc5b3f1870b66e0adbebb39ee9ad51f60194ee100f5c51e368c572ade9ca34R1798
- Advice for testing effects wrt Cthulhu callsites
- Maybe just use existing callsite tests and https://github.com/JuliaLang/julia/pull/44785
- Taka:
- `Core.Compiler.return_type` equivalent for the effect analysis
## Done: March 23, 2022
Attendees:
- Shuhei K.
- Jameson N.
- Takafumi A.
- Valentin C.
For discussion:
- Shuhei:
- [covariance of `Tuple`](https://github.com/JuliaLang/julia/issues/44705)
- concrete types aren't necessarily "final"?
```julia
julia> isconcretetype(DataType)
true
julia> Type{Int} <: DataType
true
```
- how does this interact with dispatch?
```julia
julia> foo(::DataType) = :DataType
foo (generic function with 1 method)
julia> foo(::Type{Int}) = :Int
foo (generic function with 2 methods)
julia> foo(Int)
:Int
julia> foo(Integer)
:DataType
```
```julia
julia> bar(::Tuple{DataType}) = :DataType
bar (generic function with 1 method)
julia> bar(::Tuple{Type{Int}}) = :Int
bar (generic function with 2 methods)
julia> bar((Int,))
:DataType
julia> bar((Integer,))
:DataType
julia> (Int,) isa Tuple{DataType}
true
julia> (Int,) isa Tuple{Type{Int}}
false
```
- how does it interact with union-split:
- transform `foo(x::Any)` to
```julia
if isa(x, Type{Int})
    :Int
elseif isa(x, DataType)
    :DataType
else
    foo(x) # dispatch error
end
```
- transform `bar(x::Any)` to
```julia
if isa(x, Tuple{Type{Int}}) # doesn't work!
    :Int
elseif isa(x, Tuple{DataType})
    :DataType
else
    bar(x) # dispatch error
end
```
## Done: March 16, 2022
Attendees:
- Shuhei K.
- Ian A.
- Julian S.
- Jameson N.
- Prem C.
- Jeff B.
- Takafumi A.
For discussion:
- Prem:
- hello
- past work
- loop allocation hoisting, bounds check elimination, array optimizations, context-sensitive codegen
- current work
- multi-context codegen, migration to TSModule/TSContext, instrumented codegen/optimization passes
- **finer codegen locking**
- potential future work
- module compile-on-demand JIT layer
- only run optimizations when we actually run the function, not when we codegen
- codegen-on-demand?
- no compile_workqueue, no codegen->typeinfer recursion, just insert function stubs in the IR that run typeinfer/compilation when called
- speculative compilation?
- locking problems
* Current framework:
- Lock priorities are hardcoded
* Problems:
- More parallelism = multiple TSCtx = unknown number of locks
- Codegen is arbitrarily recursive, no clean mechanism to send contexts from codegen -> typeinfer -> codegen
- Solution
- Create infinite pool of contexts and draw new one every time one appears in codegen
- Cannot limit size of pool or risk deadlock
- Pool size = O(concurrent codegen operations)
- Includes reentrant codegen
- Acquire/release context back to pool when we acquire/release codegen lock
- Doesn't guarantee compilation is never blocked (our JIT layers still need to acquire the interior context lock from time to time which competes with codegen), but should guarantee we don't deadlock within codegen
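The pool idea above can be sketched generically. A minimal sketch only: `ContextPool`/`acquire!`/`release!` are hypothetical names, and real entries would be LLVM ThreadSafeContexts rather than whatever the placeholder `create` callback produces:

```julia
# Unbounded context pool: `acquire!` pops a free context or creates a fresh
# one (it never blocks waiting for a context, so codegen cannot deadlock on
# the pool); `release!` returns a context for reuse. The pool therefore grows
# to O(concurrent codegen operations), including reentrant codegen.
struct ContextPool{T}
    lk::ReentrantLock
    free::Vector{T}
    create::Function   # makes a new context when none are free
end

ContextPool{T}(create::Function) where {T} = ContextPool{T}(ReentrantLock(), T[], create)

acquire!(pool::ContextPool) = lock(pool.lk) do
    isempty(pool.free) ? pool.create() : pop!(pool.free)
end

release!(pool::ContextPool{T}, ctx::T) where {T} = lock(pool.lk) do
    push!(pool.free, ctx)
    nothing
end
```

As in the notes, acquire/release would be tied to taking and dropping the codegen lock, so contexts cycle back into the pool as soon as a codegen operation finishes.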
## Done: March 09, 2022
Attendees:
- Shuhei K.
- Ian A.
- Julian S.
- Jameson N.
- Collin W.
- Prem C.
- Valentin C.
- Jeff B.
- Takafumi A.
For discussion:
- Shuhei:
- any ideas on documenting `Core.Compiler` coding tips?
- main point: how to deal with untyped objects in Julia
+ avoid excessive specializations
- use `@nospecialize` and `isa`-based specializations
- avoid complex `Union`s
+ avoid arbitrary `Tuple`-type constructions
- structures
+ the situation: excessive code specializations via runtime dispatch
- explain how arbitrary objects sneak into compiler code, e.g. `ex::Expr`
+ the basic strategies
- `@nospecialize` mechanism (c.f. [introduce `@noinfer` macro to tell the compiler to avoid excess inference](https://github.com/JuliaLang/julia/pull/41931))
- `isa`-based specializations
- caveats
- anonymous function: `any(x->isa(x, SSAValue), ex.args)`
- complex `Union`s: `Any` is better than `Union{Nothing,Type}` (at least until [optimizer: inline abstract union-split callsite](https://github.com/JuliaLang/julia/pull/44512))
- abstract field types: `sig::Any`/`item::Any` is often better than `sig::Type`/`item::Union{InliningTodo, MethodInstance, ConstantCase}`
+ avoid arbitrary `Tuple`-type constructions
- bad: `return ex.args[1], false` vs. good: `return SomeInfo(ex.args[1], false)`
## Done: March 02, 2022
Attendees:
- Shuhei K.
- Jameson N.
- Prem C.
- Valentin C.
- Jeff B.
- Takafumi A.
For discussion:
- Shuhei:
- some progress report on the compiler-plugin project
- contextual dispatch
- code transformation mechanism ("overdubbing mechanism")
- tagging system
- will take some time off from EA
- Taka:
- https://github.com/JuliaLang/julia/pull/44340
## Done: February 23, 2022
Attendees:
- Shuhei K.
- Ian A.
- Julian S.
- Jameson N.
- Collin W.
- Prem C.
- Valentin C.
- Jeff B.
- Takafumi A.
For discussion:
- Ian:
- Precondition/postcondition stuff & abstract interpreter?
- Taka
* Automatic promotion of a Julia `Task` to a Tapir task
* Tapir
```julia
Tapir.@sync begin
    Tapir.@spawn f()
    g()
end
```
compiles to
```
detach #child reattach to #cont
#child
f()
reattach #cont
#cont
g()
sync
```
* Two strategies:
* Approach 1:
Create Tapir-like control-flow-based IR for Julia `Task` (detach/reattach/sync)
* Approach 2:
Analyze `Task`'s closure and convert it to Tapir task
* Challenges of Approach 1:
* Traversing IR (`typeinf_local`) is tricky since you'd need to pretend that a control flow edge is very different ("like function call but not really").
* Challenges of Approach 2:
* How to avoid Box?
* Variables set multiple times outside of `@sync` => capture-by-value
* Variables that are OK to be task local => capture-by-value closure won't capture it
* Variables that are set in only one child; aka task output => special construct?
* Task output requires special handling of a Ref-like object? (something similar to "capturing a slot")
* Capture-by-value https://juliafolds.github.io/FLoops.jl/dev/explanation/faq/
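The `Box` issue above is the usual closure-capture problem in Julia lowering; a minimal illustration (not Tapir-specific):

```julia
# `x` is reassigned after the closure is created, so lowering captures it by
# reference in a `Core.Box` (heap-allocated and type-unstable):
function boxed()
    x = 1
    f = () -> x
    x = 2          # this later assignment forces `x` into a Core.Box
    return f
end

# capture-by-value: rebinding `x` in a `let` gives the closure its own
# binding, so later assignments to the outer `x` don't affect it:
function unboxed()
    x = 1
    f = let x = x
        () -> x
    end
    x = 2
    return f
end
```

`boxed()()` returns 2 (the closure observes the reassignment) while `unboxed()()` returns 1; the captured field is a `Core.Box` in the first case and a plain `Int` in the second.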
## Done: February 16, 2022
Attendees:
- Shuhei K.
- Ian A.
- Julian S.
- Jameson N.
- Collin W.
- Prem C.
- Valentin C.
- Jeff B.
For discussion:
- Shuhei:
- EA-based SROA was unsuccessful!
- a serious problem of EA: EA's alias information just tracks values that can be aliased (and doesn't encode flow-sensitivity explicitly)
- good: control-flow can be recovered from locations of aliased values
```julia
let
    if cond
        x = Ref("x")
    else
        x = Ref("y")
    end
    return x[] # alias info: :x => [SSAValue(1), SSAValue(2)]
end
```
- bad (though the PhiNode itself still carries the necessary CFG information):
```julia
let
    x = Ref("x")
    y = Ref("y")
    if cond
        z = x
    else
        z = y
    end
    # we want to replace this load
    #   getfield(φ([x, y]), :x)
    # with
    #   φ(["x", "y"])
    return z[] # alias info: :x => [SSAValue(1), SSAValue(2)]
end
```
- worse:
```julia
let
    x = Ref("x")
    y = Ref("y")
    if cond
        z = x
    else
        z = y
    end
    z′ = Ref(z)
    return z′[][] # alias info: :x => [SSAValue(1), SSAValue(2)]
end
let
    x = Ref("x")
    y = Ref("y")
    ifelse(cond, x, y)[] # alias info: :x => [SSAValue(1), SSAValue(2)]
end
```
- Valentin:
- release blocker: <https://github.com/JuliaLang/julia/issues/44174>
## Done: February 09, 2022
Attendees:
- Shuhei K.
- Ian A.
- Julian S.
- Jameson N.
- Collin W.
- Prem C.
- Valentin C.
For discussion:
- Ian
- Shuhei: next steps on EA?
- Revise fix
- See below :)
- Shuhei: EA TODOs...
- [x] latency problem
- [ ] interprocedural alias analysis
- ```julia
getlen(obj) = length(obj.s) # => impose AllEscape
code_escapes((String,)) do s
    x = Ref(s)
    # now: impose AllEscape on `x` (including `s`)
    # want: impose AllEscape only on `s`
    return getlen(x)
end
```
- prob:
- local (easy): propagate escape to aliased values
- interprocedural (complicated): no explicit aliased values, can be aliased to return values, etc...
- wip at [EscapeAnalysis.jl#93](https://github.com/aviatesk/EscapeAnalysis.jl/pull/93)
- [ ] `_apply_iterate`...
- background
- `Local EA`: works on post-inlining state, many things are inlined and simplified
- `IPO EA` works on pre-inlining state, calls are not resolved yet
- `IPO EA` only handles "simple" calls ATM, doesn't handle complicated calls (like `_apply_iterate`, OC call)
- ```julia
# IPO EA
julia> code_escapes(broadcast, (typeof(identity), Base.RefValue{String},); optimize=false)
broadcast(✓ _2::Core.Const(identity), X _3::Tuple{Base.RefValue{String}}) in Base.Broadcast at broadcast.jl:798
◌ 1 ─ %1 = Core.tuple(_2)::Core.Const((identity,))
X │ %2 = Core._apply_iterate(Base.iterate, Base.Broadcast.broadcasted, %1, _3)::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0}, Nothing, typeof(identity), Tuple{Base.RefValue{String}}}
↑ │ %3 = Base.Broadcast.materialize(%2)::String
◌ └── return %3
# Local EA
julia> noescape(x) = (broadcast(identity, x); nothing)
noescape (generic function with 1 method)
julia> code_escapes((String,)) do s
           x = Ref(s)
           noescape(x)
       end
#5(* _2::String) in Main at REPL[10]:2
*′ 1 ─ %1 = %new(Base.RefValue{String}, _2)::Base.RefValue{String}
◌ └── goto #3 if not true
◌ 2 ─ nothing::Nothing
* 3 ┄ Base.getfield(%1, :x)::String
◌ └── goto #4
◌ 4 ─ goto #5
◌ 5 ─ goto #6
◌ 6 ─ goto #7
◌ 7 ─ %9 = Main.nothing::Core.Const(nothing)
◌ └── goto #8
◌ 8 ─ return %9
julia> code_escapes((String,)) do s
           x = Ref(s)
           @noinline noescape(x)
       end
#7(X _2::String) in Main at REPL[11]:2
X 1 ─ %1 = %new(Base.RefValue{String}, _2)::Base.RefValue{String}
◌ │ %2 = invoke Main.noescape(%1::Base.RefValue{String})::Core.Const(nothing)
◌ └── return %2
```
- [ ] edge tracking
- if an optimization uses interprocedural escape information, we need to add a backedge to it to fully support invalidation
- we need to generalize the inliner's `EdgeTracker` logic
- [x] account for method match ambiguity
- `IPO EA` needs to account for `ThrownEscape` via `MethodError`
- currently only accounts for `no method matching ...`, not for ambiguity error
- the same problem is happening at [`kf/effectsstaging`](https://github.com/JuliaLang/julia/pull/43852#discussion_r802069515)
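Both flavors of `MethodError` mentioned above can be demonstrated with a hypothetical pair of ambiguous methods (`amb` is made up for illustration):

```julia
# Two methods, neither more specific than the other for (Int, Int):
amb(::Int, ::Any) = 1
amb(::Any, ::Int) = 2

# ambiguity: both methods match, so the call throws a MethodError
ambiguous_err = try amb(1, 1) catch e; e end

# "no method matching": nothing matches (String, String) at all
nomatch_err = try amb("a", "b") catch e; e end
```

Both surface as `MethodError`, which is why `IPO EA` must account for the ambiguity case as well when modeling `ThrownEscape`.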
## Done: February 02, 2022
Attendees:
- Shuhei K.
- Ian A.
- Jeff B.
- Julian S.
- Jameson N.
- Collin W.
- Prem C.
- Valentin C.
For Discussion:
- Ian
- [perf_sum4 regression due to LLVM13?](https://github.com/JuliaLang/julia/commit/dd0c14ba1d0add2ce89524a26684a1194a83312c)
- Idea: Linear types (use once)
- Seems like we may be able to statically determine this using EA-like analysis?
- Ref counting may be hard
- Could use a restricted version of linear types --- [uniqueness type (only one ref allowed)](https://docs.idris-lang.org/en/latest/reference/uniqueness-types.html)
- Still need some sort of borrowing mechanism
- Opens up possibilities for interesting / powerful optimizations
- Similar to ImmutableArrays optimization, could allow for in-place performance w/ better composability
- In distributed context:
- "unique" objects are threadsafe
- Linear channels play nicely with message passing [1](https://docs.rs/session_types/latest/session_types/) [2](https://alcestes.github.io/lchannels/)
- Shuhei
- EA progress
- finished most of the [remaining TODOs](https://github.com/JuliaLang/julia/pull/43800#issuecomment-1021310830)
- how EA will be used:
- pre-inlining EA (`IPO EA`): always, for generating interprocedural cache
- post-inlining EA (`Local EA`): selectively, used for local optimizations
- [some numbers](https://github.com/JuliaLang/julia/pull/43800#issuecomment-1027684071) on latency impact
- IPO with EA
- (obviously) local optimizations can be powered by interprocedural escape information
- e.g. load forwarding, `mutating_arrayfreeze`, finalizer elision, stack alloc
```julia
@noinline somecall(x) = length(x[])
let
    x = Ref("julia")
    l = somecall(x)
    return x[], l # => "julia", l
end
```
- TODO: interprocedural alias analysis
- now: give up alias analysis whenever it's called by gf
- want: "if", "how"
- EA can consume effect analysis work by Keno?
- TODO: dimension propagation (for array SROA)
- (interesting) type inference: form `PartialStruct` (and even [`MustAlias`](https://github.com/JuliaLang/julia/pull/41199)) for mutables
- ```julia
@noinline somecall(x) = length(x[])
let
    a = "julia"
    x = Ref(a) # ::PartialStruct(Ref, Any[Const("julia")])
    somecall(x)
    return x[] # => Const("julia")
end
```
- consideration: how to add backedges properly?
- e.g. the backedge to `somecall` should be added
- currently lattice elements don't have a way to contribute to backedge tracking
- might be another source of latency problems?
- => not likely, since types are supposed to be mostly concrete when we have successful escape information
- Valentin:
- https://github.com/JuliaLang/julia/pull/43990
## Done: January 26, 2022
Attendees:
- Shuhei K.
- Jameson N.
- Jeff B.
- Julian S.
For Discussion:
- Shuhei: Julia-level compilation benchmark
- script: [CCProfiler.jl](https://gist.github.com/aviatesk/54ffa6e99d77e7bd8824e3616961de7f)
- nanosoldier integration: [`“inference”` benchmark suite](https://github.com/JuliaCI/BaseBenchmarks.jl/blob/master/src/inference/InferenceBenchmarks.jl)
- Tim Holy [may work on the suite](https://github.com/JuliaLang/julia/issues/43157#issuecomment-1022171386) :)
- Shuhei: EA blockers
- all remaining TODOs: https://github.com/JuliaLang/julia/pull/43800#issuecomment-1021310830
- (not so hard) a proper cache infrastructure (integration with `CodeInstance`)
- (maybe hard) allow EA to run before inlining (for generating IPO cache)
- why before inlining?
- e.g. `fastmath` may change a branch, and then change the escape state of certain arguments
- inlining changes control flows (seriously)
- mem2reg would be safe (since it accounts for escapes by itself)
- (problem) now EA needs to resolve calls by itself
- background: EA has borrowed the inliner's efforts to resolve calls
- (idea) EA resolves calls using the inliner's existing logic, then cascades that information to the inliner?
- Anyone: Finalizer elision
- Who wants to work on this?
- Julian will give it a try
- define an API for declaring a finalizer as not elidable
- implement Julia-level elision pass
- check if heap-allocated object escapes; if so, skip this object
- add direct call to `finalize` at end of object scope
- finalizer might appear to escape the object - might need changes to EA to detect this?
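What the proposed pass would do, written out by hand. A conceptual sketch only: the real transform would operate on optimized IR after the escape check, not on source, and `Handle`/`close_handle!` are made-up names:

```julia
mutable struct Handle
    open::Bool
end

close_handle!(h::Handle) = (h.open = false)

# today: a finalizer is registered and runs whenever GC gets around to it
function use()
    h = Handle(true)
    finalizer(close_handle!, h)
    # ... work with h; escape analysis proves `h` never escapes ...
    return nothing
end

# after elision (conceptually): the registration is removed and the finalizer
# body is instead called directly at the end of the object's scope
function use_elided()
    h = Handle(true)
    # ... work with h ...
    close_handle!(h)
    return nothing
end
```

The escape check is what makes this sound: if the object (or its finalizer) escapes, the pass must skip it, per the last bullet above.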