<!-- using Weave; weave("docs/escapeanalysis20210614.jmd", doctype="github", mod=Main) -->

# Escape Analysis MTG @ 2021/06/14

## Wrap-ups of 2021/06/07

- we will try to implement escape analysis as a separate pass between inference and optimization
  * for easier prototyping, without taking the risk of breaking/complicating inference
  * at least for this GSoC term
- we will focus on using escape information for Julia-level optimizations
  * especially for:
    + **scalar replacements** and
    + **heap to stack allocation conversions**
  * although there are many other applications of escape information, they'll be out of scope for the term

## Possible Target Examples (Shuhei)

### Scalar Replacements

Object fields are scalar-replaced only when they are:
- fields of immutable objects with constant fields
  * these are constant-folded at inference
- fields of mutable objects, when the calls taking the object are fully inlined at optimization:
  * this is done by [`getfield_elim_pass`](https://github.com/JuliaLang/julia/blob/82ae530db62f364853b355bcd64b28c71445a2c4/base/compiler/ssair/passes.jl#L550-L885)
  * mutable objects can't be constant-folded at inference even when they are fully known to be constant

```julia
struct Result
    inf
    opt
end
function Base.show(io::IO, res::Result)
    printstyled(io, "=== inference ===\n"; color=:cyan)
    show(io, res.inf)
    println(io)
    printstyled(io, "=== optimization ===\n"; color=:light_green)
    show(io, res.opt)
end
function inf_vs_opt(f, types=(); kwargs...)
    Result(
        first(code_typed(f, types; optimize=false, kwargs...)),
        first(code_typed(f, types; optimize=true, kwargs...)))
end

struct Immutable{S,T}
    fld1::S
    fld2::T
end
mutable struct Mutable{S,T}
    fld1::S
    fld2::T
end

g = nothing # very untyped
@noinline forcenoinline(_) = return g # force not to be inlined because of `g`
falseescape(cond, o) = cond ? o : forcenoinline(o)
```

```
falseescape (generic function with 1 method)
```

NOTE: `getproperty` calls will be inlined and are thus equivalent to `getfield` in the cases below.

```julia
#############################
### fully constant-folded ###
#############################

# works pretty well for immutable objects
julia> inf_vs_opt((Bool,)) do cond
           o = Immutable(0, 1) # ::Const
           falseescape(cond, o)
           return o.fld1, o.fld2 # ::Const((0, 1))
       end
=== inference ===
CodeInfo(
1 ─      (o = Main.Immutable(0, 1))::Core.Const(Immutable{Int64, Int64}(0, 1))
│        Main.falseescape(cond, o::Core.Const(Immutable{Int64, Int64}(0, 1)))::Any
│   %3 = Base.getproperty(o::Core.Const(Immutable{Int64, Int64}(0, 1)), :fld1)::Core.Const(0)
│   %4 = Base.getproperty(o::Core.Const(Immutable{Int64, Int64}(0, 1)), :fld2)::Core.Const(1)
│   %5 = Core.tuple(%3, %4)::Core.Const((0, 1))
└──      return %5
) => Tuple{Int64, Int64}
=== optimization ===
CodeInfo(
1 ─ goto #3 if not cond
2 ─ goto #4
3 ─ invoke Main.forcenoinline($(QuoteNode(Immutable{Int64, Int64}(0, 1)))::Immutable{Int64, Int64})::Any
└── goto #4
4 ┄ return (0, 1)
) => Tuple{Int64, Int64}

# mutable fields can't be scalar-replaced even when the object is fully known to be constant
julia> inf_vs_opt((Bool,)) do cond
           o = Mutable(0, 1) # ::Mutable{Int} (allocation)
           falseescape(cond, o)
           return o.fld1, o.fld2 # ::Tuple{Int,Int}
       end
=== inference ===
CodeInfo(
1 ─      (o = Main.Mutable(0, 1))::Mutable{Int64, Int64}
│        Main.falseescape(cond, o)::Any
│   %3 = Base.getproperty(o, :fld1)::Int64
│   %4 = Base.getproperty(o, :fld2)::Int64
│   %5 = Core.tuple(%3, %4)::Tuple{Int64, Int64}
└──      return %5
) => Tuple{Int64, Int64}
=== optimization ===
CodeInfo(
1 ─ %1 = %new(Mutable{Int64, Int64}, 0, 1)::Mutable{Int64, Int64}
└──      goto #3 if not cond
2 ─      goto #4
3 ─      invoke Main.forcenoinline(%1::Mutable{Int64, Int64})::Any
└──      goto #4
4 ┄ %6 = Base.getfield(%1, :fld1)::Int64
│   %7 = Base.getfield(%1, :fld2)::Int64
│   %8 = Core.tuple(%6, %7)::Tuple{Int64, Int64}
└──      return %8
) => Tuple{Int64, Int64}

# if we succeed in fully inlining, we can eliminate the `getfield`s
# NOTE the return type is still not `Const` (the constant-folding is a result of optimization and didn't happen at inference)
julia> inf_vs_opt((Bool,)) do cond
           o = Mutable(0, 1) # ::Mutable{Int,Int} (allocation)
           falseescape(true, o) # inlined
           return o.fld1, o.fld2 # ::Tuple{Int,Int} (NOTE not constant-folded)
       end
=== inference ===
CodeInfo(
1 ─      (o = Main.Mutable(0, 1))::Mutable{Int64, Int64}
│        Main.falseescape(true, o)::Any
│   %3 = Base.getproperty(o, :fld1)::Int64
│   %4 = Base.getproperty(o, :fld2)::Int64
│   %5 = Core.tuple(%3, %4)::Tuple{Int64, Int64}
└──      return %5
) => Tuple{Int64, Int64}
=== optimization ===
CodeInfo(
1 ─ %1 = Core.tuple(0, 1)::Tuple{Int64, Int64}
└──      return %1
) => Tuple{Int64, Int64}

julia> @noinline function not_folded(cond)
           o = Mutable(0, 1) # ::Mutable{Int,Int} (allocation)
           falseescape(true, o) # inlined
           return o.fld1, o.fld2 # ::Tuple{Int,Int} (NOTE not constant-folded)
       end
not_folded (generic function with 1 method)

julia> inf_vs_opt((Bool,)) do cond
           not_folded(cond) # ::Tuple{Int,Int}, not Const(0, 1)
       end
=== inference ===
CodeInfo(
1 ─ %1 = Main.not_folded(cond)::Tuple{Int64, Int64}
└──      return %1
) => Tuple{Int64, Int64}
=== optimization ===
CodeInfo(
1 ─ %1 = invoke Main.not_folded(_2::Bool)::Tuple{Int64, Int64}
└──      return %1
) => Tuple{Int64, Int64}

#################################
### partially constant-folded ###
#################################

# works pretty well for immutable objects
julia> inf_vs_opt((Bool,Int,)) do cond, a0
           o = Immutable(a0, 1) # ::PartialStruct
           falseescape(cond, o)
           return o.fld1, o.fld2 # ::PartialStruct(Tuple{Int,Int},Any[Int,Const(1)])
       end
=== inference ===
CodeInfo(
1 ─      (o = Main.Immutable(a0, 1))::Core.PartialStruct(Immutable{Int64, Int64}, Any[Int64, Core.Const(1)])
│        Main.falseescape(cond, o::Core.PartialStruct(Immutable{Int64, Int64}, Any[Int64, Core.Const(1)]))::Any
│   %3 = Base.getproperty(o::Core.PartialStruct(Immutable{Int64, Int64}, Any[Int64, Core.Const(1)]), :fld1)::Int64
│   %4 = Base.getproperty(o::Core.PartialStruct(Immutable{Int64, Int64}, Any[Int64, Core.Const(1)]), :fld2)::Core.Const(1)
│   %5 = Core.tuple(%3, %4)::Core.PartialStruct(Tuple{Int64, Int64}, Any[Int64, Core.Const(1)])
└──      return %5
) => Tuple{Int64, Int64}
=== optimization ===
CodeInfo(
1 ─ %1 = %new(Immutable{Int64, Int64}, a0, 1)::Immutable{Int64, Int64}
└──      goto #3 if not cond
2 ─      goto #4
3 ─      invoke Main.forcenoinline(%1::Immutable{Int64, Int64})::Any
└──      goto #4
4 ┄ %6 = Core.tuple(a0, 1)::Tuple{Int64, Int64}
└──      return %6
) => Tuple{Int64, Int64}

# basically the same as the fully constant-folded cases
julia> inf_vs_opt((Bool,Int,)) do cond, a0
           o = Mutable(a0, 1) # ::PartialStruct
           falseescape(cond, o)
           return o.fld1, o.fld2 # ::Tuple{Int,Int}
       end
=== inference ===
CodeInfo(
1 ─      (o = Main.Mutable(a0, 1))::Mutable{Int64, Int64}
│        Main.falseescape(cond, o)::Any
│   %3 = Base.getproperty(o, :fld1)::Int64
│   %4 = Base.getproperty(o, :fld2)::Int64
│   %5 = Core.tuple(%3, %4)::Tuple{Int64, Int64}
└──      return %5
) => Tuple{Int64, Int64}
=== optimization ===
CodeInfo(
1 ─ %1 = %new(Mutable{Int64, Int64}, a0, 1)::Mutable{Int64, Int64}
└──      goto #3 if not cond
2 ─      goto #4
3 ─      invoke Main.forcenoinline(%1::Mutable{Int64, Int64})::Any
└──      goto #4
4 ┄ %6 = Base.getfield(%1, :fld1)::Int64
│   %7 = Base.getfield(%1, :fld2)::Int64
│   %8 = Core.tuple(%6, %7)::Tuple{Int64, Int64}
└──      return %8
) => Tuple{Int64, Int64}

julia> inf_vs_opt((Bool,Int,)) do cond, a0
           o = Mutable(a0, 1) # ::PartialStruct
           falseescape(true, o) # inlined
           return o.fld1, o.fld2 # ::Tuple{Int,Int}
       end
=== inference ===
CodeInfo(
1 ─      (o = Main.Mutable(a0, 1))::Mutable{Int64, Int64}
│        Main.falseescape(true, o)::Any
│   %3 = Base.getproperty(o, :fld1)::Int64
│   %4 = Base.getproperty(o, :fld2)::Int64
│   %5 = Core.tuple(%3, %4)::Tuple{Int64, Int64}
└──      return %5
) => Tuple{Int64, Int64}
=== optimization ===
CodeInfo(
1 ─ %1 = Core.tuple(a0, 1)::Tuple{Int64, Int64}
└──      return %1
) => Tuple{Int64, Int64}
```

> :warning: maybe off-topic
> Probably we can form `PartialStruct` even when the object type is not fully concrete, as long as it is immutable?
> Specifically, replace [`isconcretetype(t)` here](https://github.com/JuliaLang/julia/blob/3f27f88ec9344e96d4c51b538bf0352e4674b037/base/compiler/abstractinterpretation.jl#L1431) with a less restrictive check.

```julia
julia> inf_vs_opt((Bool,Any,)) do cond, a0
           o = Immutable(a0, 1) # want ::PartialStruct(Immutable{_A,Int} where _A, Any[Any,Const(1)])
           falseescape(cond, o)
           return o.fld1, o.fld2 # want ::PartialStruct(Tuple{Any,Int}, Any[Any,Const(1)])
       end
=== inference ===
CodeInfo(
1 ─      (o = Main.Immutable(a0, 1))::Immutable{_A, Int64} where _A
│        Main.falseescape(cond, o)::Any
│   %3 = Base.getproperty(o, :fld1)::Any
│   %4 = Base.getproperty(o, :fld2)::Int64
│   %5 = Core.tuple(%3, %4)::Tuple{Any, Int64}
└──      return %5
) => Tuple{Any, Int64}
=== optimization ===
CodeInfo(
1 ─ %1 = Main.Immutable(a0, 1)::Immutable{_A, Int64} where _A
│        Main.falseescape(cond, %1)::Any
│   %3 = Base.getfield(%1, :fld1)::Any
│   %4 = Base.getfield(%1, :fld2)::Int64
│   %5 = Core.tuple(%3, %4)::Tuple{Any, Int64}
└──      return %5
) => Tuple{Any, Int64}
```

### Heap to Stack Allocation Conversions

Maybe this code allocates `x` on the heap?

```julia
# by @vtjnash
julia> @noinline f(x) = length(x) # no escape actually
f (generic function with 1 method)

julia> inf_vs_opt() do
           x = Any[]
           y = "new thing"
           push!(x, y) # everything inlined up to this point
           return f(x) # not inlined, but `x` can actually be stack allocated ???
       end
=== inference ===
CodeInfo(
1 ─      (x = Base.getindex(Main.Any))::Vector{Any}
│        (y = "new thing")::Core.Const("new thing")
│        Main.push!(x, y::Core.Const("new thing"))::Any
│   %4 = Main.f(x)::Int64
└──      return %4
) => Int64
=== optimization ===
CodeInfo(
1 ── %1  = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Vector{Any}, svec(Any, Int64), 0, :(:ccall), Vector{Any}, 0, 0))::Vector{Any}
│    %2  = π ("new thing", String)
│    %3  = Core.lshr_int(1, 63)::Int64
│    %4  = Core.trunc_int(Core.UInt8, %3)::UInt8
│    %5  = Core.eq_int(%4, 0x01)::Bool
└───       goto #3 if not %5
2 ──       invoke Core.throw_inexacterror(:check_top_bit::Symbol, UInt64::Type{UInt64}, 1::Int64)::Union{}
└───       unreachable
3 ──       goto #4
4 ── %10 = Core.bitcast(Core.UInt64, 1)::UInt64
└───       goto #5
5 ──       goto #6
6 ──       goto #7
7 ──       goto #8
8 ──       $(Expr(:foreigncall, :(:jl_array_grow_end), Nothing, svec(Any, UInt64), 0, :(:ccall), :(%1), :(%10), :(%10)))::Nothing
└───       goto #9
9 ── %17 = Base.arraylen(%1)::Int64
│          Base.arrayset(true, %1, %2, %17)::Vector{Any}
└───       goto #10
10 ─ %20 = invoke Main.f(%1::Vector{Any})::Int64
└───       return %20
) => Int64
```

#### Questions

- what is an "escape"? (toy examples of the candidate escape sources are sketched after this section)
  - escape to where?
  - does `setfield!(x, ...)` "escape" `x`?
    - for construction and `setfield!`, we need to prove that the arguments / the parent object do not escape
  - ~~`unsafe_xxx` ?~~
  - ccall with `Any`-typed arguments (and `Ref{Any}`?)
  - global (non-local?) assignments
  - not-analyzed function calls
  - return values
  - throw
  - finalizers (a special case of escaping)
- (again, long-term) other possible benefits of escape information for the inference stage
  * "product abstract domain": type information × escape information
  * we will use the former to derive the latter, but we can also do the reverse
    + in the abstract interpretation literature, this seems to be called "reducing"
  * e.g. once escape information is available at inference time, we can:
    + constant-fold mutable objects
    + form `MustAlias` for mutable objects (?)
  * possible integration?
    1. forward analysis (inference)
    2. backward analysis (escape analysis, etc.)
    3. forward analysis (inference, only locally and as lightweight as possible)
    4. optimization
- how can we judge which objects are heap-/stack-allocated?
  - <https://discourse.julialang.org/t/a-nice-explanation-of-memory-stack-vs-heap/53915> From @mbauman
    > It's automatic. But roughly:
    > Arrays, Dicts, and mutable structs are on the heap
    > StaticArrays, Tuples, Named Tuples and non-mutable structs are on the stack (but sometimes never exist at all)
  - we can use `@allocated` to test the optimization (a sketch follows this section): <https://discourse.julialang.org/t/why-is-heap-memory-bad/33290/3>
- how will we want to tell the compiler to allocate objects on the stack?
- "forward" vs. "reverse", from Slack:
  > The difference in terminology reflects that forward-inference means information is created at the definition site, and flows forward to the uses.
  > In reverse-inference, the information is created at the usage sites ("is this an escape?") and flows backwards to the definitions.

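To make the candidate escape sources above concrete, here is a minimal, hypothetical set of functions (not from the meeting; `Node`, `GLOBAL_CACHE`, `blackbox`, and the `escape_by_*` names are made up for illustration), each exercising one of the bullet points: returning an object, storing it into a global, handing it to an un-analyzed callee, and throwing it.

```julia
# Hypothetical illustrations of the escape sources listed above (illustration only).
mutable struct Node
    val::Int
end

const GLOBAL_CACHE = Node[]                        # global container

escape_by_return() = Node(1)                       # escapes via the return value
escape_by_global() = push!(GLOBAL_CACHE, Node(2))  # escapes via a store into a global
@noinline blackbox(x) = (x.val += 1; nothing)      # stand-in for a call we don't analyze
escape_by_call()   = (n = Node(3); blackbox(n); n.val)  # conservatively escapes into the callee
escape_by_throw()  = throw(Node(4))                # thrown objects escape to the handler
no_escape()        = (n = Node(5); n.val + 1)      # `n` never leaves this frame
```

An escape analysis pass would presumably mark the `Node` as escaping in all but the last function; only in `no_escape` does the allocation stay within the frame and become a candidate for scalar replacement or stack allocation.
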
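And a rough sketch of the `@allocated`-based check mentioned above (not from the notes; whether the allocation is actually elided depends on the Julia version and on what the current optimizer can already do). Call the function once first so that compilation and its allocations are not measured.

```julia
# Sketch of checking allocation elision with `@allocated`; no concrete numbers
# are claimed here, since the result depends on the Julia version.
use_ref() = (r = Ref(0); r[] += 1; r[])  # the `Ref` allocation may or may not be elided

use_ref()             # warm up so compilation is not measured
@allocated use_ref()  # 0 would mean the allocation was removed
```
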
## Related

- [Julep/Very WIP - Heap allocated immutable arrays and compiler support](https://github.com/JuliaLang/julia/pull/31630):
  * to enable allocation-less mutable->immutable array conversion, Keno also tried to get (very limited and seemingly only local) escape information in this PR

## Week 1 progress (Xuanda)

- familiarized myself with the AbstractInterpreter code (for type inference) and the optimization code, thanks to @aviatesk's code snippets and XCompiler
- put up a WIP PR here: https://github.com/TH3CHARLie/julia/pull/1
- some current design ideas:
  1. run right after the conversion to IRCode
  2. SSA-based
  3. a backward pass
  4. pseudo algorithm shown in the comment of `escape_analysis.jl`

## Some questions (Xuanda)

- About SSA definitions: I checked https://docs.julialang.org/en/v1/devdocs/ssair/ about Julia's SSA IR, but I found nodes other than those mentioned in the docs (e.g. `ReturnNode`, and `Expr`, which is a front-end type). Do we have related docs for the mapping between the SSA language constructs and the Julia structs that represent them?
- Currently I create two kinds of escape sites: returns and calls (calls still need debugging to pass compilation, so they are not in the pull request). Is there anything else significant that I am missing? For calls, I can think of recursively walking into the callee's CFG and doing an escape analysis on it, returning the result back to the caller to see whether the arguments escape.
  - A: `setfield!`, construction, assigning to a global, non-analyzed function calls
- Backward alias analysis: if I understand correctly, we need alias analysis to identify the real escaped source. I am not very familiar with how to do alias analysis in a backward direction.

## Type-based analysis alternative?

Jameson to write.