Shuhei Kadowaki
---
breaks: false
---

<!--{%hackmd theme-dark %}-->

# Julia Escape Analysis Project

- Team: _Shuhei Kadowaki_, _Jameson Nash_
- Meeting:
    - zoom link: <https://mit.zoom.us/j/93134458604>
    - schedule:
        - 2022: TBD
        - 2021: (every Wed.) 22:00-23:00 (JST), 9:00-10:00 (EDT)
- Scope:
    - implement a strong Julia-level escape analysis
    - implement memory optimizations based on the analysis
    - prototype them as a testbed for [compiler-plugin](https://hackmd.io/bVhb97Q4QTWeBQw8Rq4IFw?both)
    - ... and more?
        - feed escape information back to inference?
        - summarize performance improvements, and write up a paper?
- [Timeline](#Development-Plan): upstream to Julia Base by the end of the year

# Targets

As midterm objectives of this project, we will try to implement the following optimizations:
- [ ] Enhanced SROA & DCE
- [ ] ~~Heap-to-stack allocation conversion~~
- [ ] (Allocation movement?)
- [ ] `finalize`r elision

## Enhanced SROA & DCE

### The current state

Julia's compiler performs SROA (Scalar Replacement of Aggregates) on two levels:
1. on Julia-level IR: [`getfield_elim_pass!`](https://github.com/JuliaLang/julia/blob/f8d3bd22f68f10bd2dc3f1be3b74bfa7cb125bc5/base/compiler/ssair/passes.jl#L527)
2. on LLVM IR: COMBAK (which one does LLVM SROA ?)

The goal here is to enhance the Julia-level SROA with escape analysis.
The current SROA pass can be described as follows (per [#42357](https://github.com/JuliaLang/julia/pull/42357)):

> `getfield_elim_pass!(ir::IRCode) -> newir::IRCode`
>
> `getfield` elimination pass, a.k.a. the Scalar Replacement of Aggregates optimization.
>
> This pass is based on a local alias analysis that collects field information by def-use chain walking.
> It looks for struct allocation sites ("definitions"), and `getfield` calls as well as
> `:foreigncall`s that preserve the structs ("usages"). If "definitions" have enough information,
> then this pass will replace corresponding usages with lifted values.
> `mutable struct`s require additional care and need to be handled separately from immutables.
> For `mutable struct`s, `setfield!` calls also account for "definitions", and the pass should
> conservatively give up the lifting when there are any "intermediate usages" that may escape
> the mutable struct (e.g. a non-inlined generic function call that takes the mutable struct as
> its argument).
>
> In the case where all usages are fully eliminated, the `struct` allocation may also be erased
> as a result of dead code elimination.

The important observations are:
- successful SROA can sometimes even eliminate allocations entirely, as a result of dead code elimination
- `getfield_elim_pass!` is limited in the sense that:
    - it can't reason about anything inter-procedurally
    - it requires "safe" allocation sites to kick off the whole optimization

Let's see successful examples first, and then understand the limitations.

Here, the allocation site `r = Ref{Int}(0)` is "safe" and has enough information, so `r[]` (inlined to `getfield`) will be eliminated -- and even more, the allocation and the `setfield!` are voided out too:
```julia
julia> code_typed() do
           r = Ref{Int}(0)
           r[] = 42
           b = sin(r[])
           return b
       end |> only
CodeInfo(
1 ─ %1 = Base.sitofp(Float64, 42)::Float64
│   %2 = invoke Base.Math.sin(%1::Float64)::Float64
└──      return %2
) => Float64
```

`getfield_elim_pass!` requires a "safe" allocation, so if we initialize with `r = Ref{Int}()` (leaving the field uninitialized), the whole optimization won't work:
```julia
julia> code_typed() do
           r = Ref{Int}()
           r[] = 42
           b = sin(r[])
           return b
       end |> only
CodeInfo(
1 ─ %1 = %new(Base.RefValue{Int64})::Base.RefValue{Int64}
│        Base.setfield!(%1, :x, 42)::Int64
│   %3 = Base.getfield(%1, :x)::Int64
│   %4 = Base.sitofp(Float64, %3)::Float64
│   %5 = invoke Base.Math.sin(%4::Float64)::Float64
└──      return %5
) => Float64
```

Still, the later LLVM-level SROA may be able to optimize it further, and it seems to be more powerful than the Julia-level one in general -- e.g.
it can "fully" optimize the above code (the argument of `@j_sin_4761` is replaced with the scalar value `double 4.20000e+01`, and we don't have any allocation) ```julia julia> code_llvm(()) do r = Ref{Int}() r[] = 42 b = sin(r[]) return b end ; @ none:2 within `#49` define double @"julia_#49_4759"() #0 { top: ; @ none:4 within `#49` ; ┌ @ math.jl:1228 within `sin` %0 = call double @j_sin_4761(double 4.200000e+01) #0 ; └ ; @ none:5 within `#49` ret double %0 } ``` Now let's bother LLVM -- we can just add non-inlined function call, then LLVM can't help giving up the optimization: ```julia julia> @noinline setter!(r, x) = r[] = x setter! (generic function with 1 method) julia> code_llvm(()) do r = Ref{Int}() setter!(r, 42) b = sin(r[]) return b end ; @ none:2 within `#45` define double @"julia_#45_4943"() #0 { top: %gcframe7 = alloca [3 x {}*], align 16 %gcframe7.sub = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe7, i64 0, i64 0 %0 = bitcast [3 x {}*]* %gcframe7 to i8* call void @llvm.memset.p0i8.i32(i8* nonnull align 16 dereferenceable(24) %0, i8 0, i32 24, i1 false) %1 = call {}*** inttoptr (i64 140733735208165 to {}*** (i64)*)(i64 261) #6 ; ┌ @ refpointer.jl:135 within `Ref` ; │┌ @ refvalue.jl:7 within `RefValue` %2 = bitcast [3 x {}*]* %gcframe7 to i64* store i64 4, i64* %2, align 16 %3 = load {}**, {}*** %1, align 8 %4 = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe7, i64 0, i64 1 %5 = bitcast {}** %4 to {}*** store {}** %3, {}*** %5, align 8 %6 = bitcast {}*** %1 to {}*** store {}** %gcframe7.sub, {}*** %6, align 8 %ptls_field3 = getelementptr inbounds {}**, {}*** %1, i64 2305843009213693954 %7 = bitcast {}*** %ptls_field3 to i8** %ptls_load45 = load i8*, i8** %7, align 8 %8 = call noalias nonnull {}* @jl_gc_pool_alloc(i8* %ptls_load45, i32 1392, i32 16) #2 %9 = bitcast {}* %8 to i64* %10 = getelementptr inbounds i64, i64* %9, i64 -1 store atomic i64 4617487696, i64* %10 unordered, align 8 %11 = getelementptr inbounds [3 x {}*], [3 x {}*]* 
%gcframe7, i64 0, i64 2 store {}* %8, {}** %11, align 16 ; └└ ; @ none:3 within `#45` %12 = call i64 @"j_setter!_4945"({}* nonnull %8, i64 signext 42) #0 ; @ none:4 within `#45` ; ┌ @ refvalue.jl:56 within `getindex` ; │┌ @ Base.jl:38 within `getproperty` %13 = load i64, i64* %9, align 8 ; └└ ; ┌ @ math.jl:1232 within `sin` ; │┌ @ float.jl:269 within `float` ; ││┌ @ float.jl:243 within `AbstractFloat` ; │││┌ @ float.jl:146 within `Float64` %14 = sitofp i64 %13 to double ; │└└└ ; │ @ math.jl:1234 within `sin` %15 = call double @j_sin_4946(double %14) #0 %16 = load {}*, {}** %4, align 8 %17 = bitcast {}*** %1 to {}** store {}* %16, {}** %17, align 8 ; └ ; @ none:5 within `#45` ret double %15 } ``` LLVM can also be confused with some other Julia level semantics other than inter-procedural calls. For example, `typeassert` can pretty much null out LLVM SROA. In the following code snippet, neither of Julia-level nor LLVM-level SROA fails to eliminate closure constructions: ```julia= julia> doit(f, args...) = f(args...) 
julia> code_typed() do
           local arr::Vector{Any}
           doit() do
               arr = Any[]
           end
           arr
       end |> only
CodeInfo(
1 ─ %1  = Core.Box::Type{Core.Box}
│   %2  = %new(%1)::Core.Box
│   %3  = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Vector{Any}, svec(Any, Int64), 0, :(:ccall), Vector{Any}, 0, 0))::Vector{Any}
│         Core.setfield!(%2, :contents, %3)::Vector{Any}
│   %5  = Core.isdefined(%2, :contents)::Bool
└──       goto #3 if not %5
2 ─       goto #4
3 ─       $(Expr(:throw_undef_if_not, :arr, false))::Any
4 ┄ %9  = Core.getfield(%2, :contents)::Any
│         Core.typeassert(%9, Vector{Any})::Vector{Any}
│   %11 = π (%9, Vector{Any})
└──       return %11
) => Vector{Any}

julia> code_llvm(()) do
           local arr::Vector{Any}
           doit() do
               arr = Any[]
           end
           arr
       end
;  @ none:2 within `#55`
define nonnull {}* @"julia_#55_4772"() #0 {
pass:
;  @ none:3 within `#55`
; ┌ @ none:1 within `doit`
; │┌ @ none:4 within `#56`
; ││┌ @ array.jl:423 within `getindex`
; │││┌ @ boot.jl:471 within `Array` @ boot.jl:452
      %0 = call nonnull {}* inttoptr (i64 4483450916 to {}* ({}*, i64)*)({}* inttoptr (i64 4648194800 to {}*), i64 0)
; └└└└
;  @ none:6 within `#55`
  %1 = bitcast {}* %0 to i64*
  %2 = getelementptr inbounds i64, i64* %1, i64 -1
  %3 = load atomic i64, i64* %2 unordered, align 8
  %4 = and i64 %3, -16
  %5 = inttoptr i64 %4 to {}*
  %6 = icmp eq {}* %5, inttoptr (i64 4648194800 to {}*)
  br i1 %6, label %pass3, label %fail2

fail2:                                            ; preds = %pass
  call void @jl_type_error(i8* getelementptr inbounds ([11 x i8], [11 x i8]* @_j_str1, i64 0, i64 0), {}* inttoptr (i64 4648194800 to {}*), {}* %0)
  unreachable

pass3:                                            ; preds = %pass
  ret {}* %0
}

# interesting fact:
# LLVM can optimize it away if we strip out the type declaration!
julia> code_llvm(()) do
           local arr
           doit() do
               arr = Any[]
           end
           arr
       end
;  @ none:2 within `#55`
define nonnull {}* @"julia_#55_4959"() #0 {
pass:
;  @ none:3 within `#55`
; ┌ @ none:1 within `doit`
; │┌ @ none:4 within `#56`
; ││┌ @ array.jl:423 within `getindex`
; │││┌ @ boot.jl:471 within `Array` @ boot.jl:452
      %0 = call nonnull {}* inttoptr (i64 4431128612 to {}* ({}*, i64)*)({}* inttoptr (i64 4614886128 to {}*), i64 0)
; └└└└
;  @ none:6 within `#55`
  ret {}* %0
}
```

### Enhanced SROA & More aggressive DCE

We want "better" Julia-level SROA and DCE. In particular, they should be:
- (_**inter-procedural**_) the new SROA should analyze aliases and escapes better, and work better with non-inlined calls
- (_**more aggressive**_) the new DCE should be able to null out `typeassert`/`isdefined`/`isa` checks more aggressively so that we can eliminate more allocations[^1]

[^1]: This optimization can be developed standalone. For example, we could eliminate the `else` branch in the following code snippet w/o escape analysis if we had a more aggressive DCE:
    ```julia
    julia> code_typed(()) do
               r = Ref{Any}(42)
               a = r[]
               if isa(a, Int)
                   return 0
               else
                   return nothing
               end
           end
    1-element Vector{Any}:
     CodeInfo(
    1 ─ %1 = (42 isa Main.Int)::Bool
    └──      goto #3 if not %1
    2 ─      return 0
    3 ─      return Main.nothing
    ) => Union{Nothing, Int64}
    ```

Target examples:
- propagate inter-procedural memory effects:
  ```julia
  julia> @noinline setter!(r, x) = r[] = x
  setter! (generic function with 1 method)

  julia> code_typed() do
             r = Ref{Int}()
             setter!(r, 42)
             b = sin(r[]) # should be turned into `sin(42)`
             return b
         end |> only
  CodeInfo(
  1 ─ %1 = %new(Base.RefValue{Int64})::Base.RefValue{Int64}
  │        invoke Main.setter!(%1::Base.RefValue{Int64}, 42::Int64)::Any
  │   %3 = Base.getfield(%1, :x)::Int64
  │   %4 = Base.sitofp(Float64, %3)::Float64
  │   %5 = invoke Base.Math.sin(%4::Float64)::Float64
  └──      return %5
  ) => Float64
  ```
- (even local analysis improvement):
  ```julia
  # the `RefValue{String}` allocation won't be eliminated
  (s::String) -> broadcast(identity, Ref(s))

  # abstract-interpretation-based field information propagation
  # should allow us to optimize this IR
  2 1 ─ %1 = %new(Base.RefValue{String}, _2)::Base.RefValue{String}  │╻╷╷   Ref
    │   %2 = Core.tuple(%1)::Tuple{Base.RefValue{String}}            │╻     broadcast
    │   %3 = Core.getfield(%2, 1)::Base.RefValue{String}             ││
    └──      goto #3 if not true                                     ││╻╷    materialize
    2 ─      nothing::Nothing                                        │
    3 ┄ %6 = Base.getfield(%3, :x)::String                           │││╻╷╷╷╷ copy
    └──      goto #4                                                 ││││┃     getindex
    4 ─      goto #5                                                 ││││
    5 ─      goto #6                                                 │││
    6 ─      goto #7                                                 ││
    7 ─      return %6                                               │
  ```
- nullify `typeassert` checks:
  ```julia
  julia> code_typed() do
             r = Ref{Any}()
             r[] = 42
             # this assertion is useful for inference
             # but "harmful" for LLVM
             return r[]::Int
         end
  1-element Vector{Any}:
   CodeInfo(
  1 ─ %1 = %new(Base.RefValue{Any})::Base.RefValue{Any}
  │        Base.setfield!(%1, :x, 42)::Int64
  │   %3 = Base.getfield(%1, :x)::Any
  │        Core.typeassert(%3, Main.Int)::Int64
  │   %5 = π (%3, Int64)
  └──      return %5
  ) => Int64
  ```

## Heap-to-stack allocation conversion

### The idea

Julia allocates every actually-allocated mutable object on the heap, while in an ideal world we would place them on the stack whenever their lifetime doesn't escape the execution frame.
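A quick way to observe this is to measure heap allocation directly. Below is a small illustration (the `Wrapper`/`unwrap`/`sum_wrapped` names are hypothetical, made up for this sketch); `@allocated` reports the bytes heap-allocated by an expression:

```julia
mutable struct Wrapper
    x::Int
end

# A non-inlined accessor defeats both the Julia-level and the LLVM-level SROA,
# so each `Wrapper` is heap-allocated even though it never leaves this frame.
@noinline unwrap(w::Wrapper) = w.x

function sum_wrapped(n)
    s = 0
    for i in 1:n
        s += unwrap(Wrapper(i))  # one heap allocation per iteration
    end
    return s
end

sum_wrapped(10)                      # warm up (compile) first
println(@allocated sum_wrapped(10))  # nonzero: the allocations survive
```

With a working heap-to-stack conversion (or an inter-procedural SROA), the `@allocated` count here should drop to zero.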
An obvious example would be like this -- Julia will allocate `MyRef("foo")` even though it doesn't escape anywhere, because the `@noinline`d `getx` call prevents both the Julia-level and the LLVM-level SROA, and neither can eliminate the allocation:
```julia
julia> mutable struct MyRef{T}; x::T; end

julia> @noinline getx(x::MyRef) = x.x
getx (generic function with 1 method)

julia> code_typed(()) do
           r = MyRef("foo")
           x = getx(r)
           sizeof(x)
       end |> only
CodeInfo(
1 ─ %1 = %new(MyRef{String}, "foo")::MyRef{String}
│   %2 = invoke Main.getx(%1::MyRef{String})::String
│   %3 = Core.sizeof::typeof(Core.sizeof)
│   %4 = (%3)(%2)::Int64
└──      return %4
) => Int64

julia> code_llvm(()) do
           r = MyRef("foo")
           x = getx(r)
           sizeof(x)
       end
;  @ none:2 within `#75`
define i64 @"julia_#75_4792"() #0 {
top:
  %0 = alloca {}*, align 8
  %gcframe7 = alloca [3 x {}*], align 16
  %gcframe7.sub = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe7, i64 0, i64 0
  %1 = bitcast [3 x {}*]* %gcframe7 to i8*
  call void @llvm.memset.p0i8.i32(i8* nonnull align 16 dereferenceable(24) %1, i8 0, i32 24, i1 false)
  %2 = call {}*** inttoptr (i64 140733733901541 to {}*** (i64)*)(i64 261) #7
; ┌ @ none:1 within `MyRef` @ none:1
   %3 = bitcast [3 x {}*]* %gcframe7 to i64*
   store i64 4, i64* %3, align 16
   %4 = load {}**, {}*** %2, align 8
   %5 = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe7, i64 0, i64 1
   %6 = bitcast {}** %5 to {}***
   store {}** %4, {}*** %6, align 8
   %7 = bitcast {}*** %2 to {}***
   store {}** %gcframe7.sub, {}*** %7, align 8
   %ptls_field3 = getelementptr inbounds {}**, {}*** %2, i64 2305843009213693954
   %8 = bitcast {}*** %ptls_field3 to i8**
   %ptls_load45 = load i8*, i8** %8, align 8
   %9 = call noalias nonnull {}* @jl_gc_pool_alloc(i8* %ptls_load45, i32 1392, i32 16) #2
   %10 = bitcast {}* %9 to i64*
   %11 = getelementptr inbounds i64, i64* %10, i64 -1
   store atomic i64 5398336928, i64* %11 unordered, align 8
   %12 = bitcast {}* %9 to {}**
   store {}* inttoptr (i64 5376165704 to {}*), {}** %12, align 8
   %13 = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe7, i64 0, i64 2
   store {}* %9, {}** %13, align 16
; └
;  @ none:3 within `#75`
  store {}* %9, {}** %0, align 8
  %14 = call nonnull {}* @j1_getx_4794({}* inttoptr (i64 5377254840 to {}*), {}** nonnull %0, i32 1)
;  @ none:4 within `#75`
; ┌ @ Base.jl:175 within `sizeof`
   %15 = bitcast {}* %14 to i64*
   %16 = load i64, i64* %15, align 8
   %17 = load {}*, {}** %5, align 8
   %18 = bitcast {}*** %2 to {}**
   store {}* %17, {}** %18, align 8
; └
  ret i64 %16
}
```

The basic idea of this optimization is to modify the LLVM IR so that we can allocate `MyRef("foo")` on the stack rather than the heap.

### Do we really need this ?

We worked on this optimization during the first project iteration (the GSoC term), but for now I'd like to give it lower priority:
- (_**SROA and DCE**_) as explained above, successful SROA can eliminate allocations entirely, so an enhanced SROA subsumes a large part of this optimization; e.g. for this code snippet, we can just convert
  ```julia
  let
      r = MyRef("foo")
      x = getx(r)
      sizeof(x)
  end
  ```
  to
  ```julia
  let
      sizeof("foo")
  end
  ```
  in the enhanced SROA pass, and then there is no need for the heap-to-stack conversion
- (_**GC interaction**_) we need to make (complicated) design choices, mainly around interactions with the GC (Jameson can elaborate on this)
- (_**throw exception**_) the current throwness check is not very accurate, so we often need to conservatively account for escapes through exceptions, which makes stack allocation harder[^2]

[^2]: There seems to be a way to circumvent this escape. Elaborated [here](https://github.com/aviatesk/EscapeAnalysis.jl/issues/15).

## Allocation movement

TODO: elaborate this.

## `finalize`r elision

Julia finalizers are managed by the GC, so they may hang around for a while, as the GC lets the mutable object hang around. This means we may end up consuming a lot of memory when finalizers are associated with e.g. resource management, a common practice in e.g. GPU programming.
The idea of this optimization is to call a finalizer early, by inserting a `finalize(obj)` call when we can prove that the lifetime of `obj` ends within a frame:
```julia
julia> mutable struct WithFinalizer
           v
           function WithFinalizer(v)
               x = new(v)
               f(t) = @async println("Finalizing $t.")
               return finalizer(f, x)
           end
       end

julia> make_obj(v = 10) = WithFinalizer(v)
make_obj (generic function with 1 method)

julia> let o = make_obj()
           copy(o.v) # <= automatically insert `finalize(o)` here
       end
10
```

### Considerations

- [ ] can we just call `finalize(obj)` ?
    - `finalize(obj)` is super costly !
        1. we want to "enqueue" the finalizer (per Valentin)
        2. make it an intrinsic and make it fast
- [ ] how can we find objects that have a `finalizer` ?
- [ ] do we need to account for possible exceptions within the finalizer call itself ... ?
- [ ] do we want flow-sensitivity (i.e. insert `finalize` at the return site) ?

# Pipeline Design

1. type inference
2. optimization
    1. preps
    2. inlining
    3. escape analysis
    4. optimization passes
        1. enhanced SROA
        2. aggressive dead code elimination
        3. finalizer elision
3. caching

Even if we have a useful inter-procedural cache, escape analysis and SROA are better run after inlining, simply because inter-procedural handling is then much easier: we can just look for the `:invoke` expressions that the inliner has resolved.

Caching:
- create a new field of `CodeInstance`, and
- pack (minimized) escape information into it, as we do for the inference result.
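To make the caching idea a bit more concrete, the "minimized" per-argument summary could be a small join-semilattice. The following is only a sketch under assumed names (`EscapeKind`, `EscapeSummary`, `merge_summaries` are all hypothetical, not the actual `Core.Compiler` or EscapeAnalysis.jl API), showing the kind of data one could pack into a new `CodeInstance` field:

```julia
# A deliberately tiny escape lattice: NoEscape ⊑ ArgEscape ⊑ GlobalEscape.
# A real analysis tracks much more (return escape, escapes via exceptions,
# field aliasing, ...), but the caching shape is the same.
@enum EscapeKind NoEscape ArgEscape GlobalEscape

# Lattice join: keep the more conservative (greater) element.
⊔(a::EscapeKind, b::EscapeKind) = a ≥ b ? a : b

# A per-method summary: one lattice element per argument -- the kind of
# minimized information that could be stored alongside the inference result.
struct EscapeSummary
    args::Vector{EscapeKind}
end

# Merging summaries from two analysis states (e.g. at a control-flow merge).
merge_summaries(x::EscapeSummary, y::EscapeSummary) =
    EscapeSummary(map(⊔, x.args, y.args))

s = merge_summaries(EscapeSummary([NoEscape, GlobalEscape]),
                    EscapeSummary([ArgEscape, NoEscape]))
println(s.args)  # conservative join, element-wise per argument
```

A caller-side consumer would then only need to read this summary for an `:invoke`d callee instead of re-analyzing its body.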
# Development Plan

## Schedule

- [x] 21/09 (kick-off): re-define targets, set up milestones
- [ ] 21/10 (stage 1): implement as a "dummy" `AbstractInterpreter`
    - optimizations
        - enhanced SROA
        - aggressive DCE
        - finalizer elision prototyping
    - analysis
- [ ] 21/11 (stage 2): executable `AbstractInterpreter`
    - keep working on stage 1 leftovers
    - scratch compiler-plugin infrastructure to actually execute that pipeline
- [ ] 21/12 (stage 3): make a PR on Julia Base, finish reviews

---

# Discussions

**NOTE: the agenda for the most recent mtg should come before the others**

## 2021/12/17

- For `ImmutableArray`
    - `mutating_arrayfreeze`
    - lifetime: whether or not the mutable array is "mutated"
    - alias analysis?
- about accuracy
    - may not need so much?
    - `BoundsError`
        - if the only escape is from `BoundsError`
        - language semantics requires a "non-semantic copy"
        - lifetime information?
        - might not be too important?
        - but it's very rare!
- focus:
    - phase 1: reviving `ImmutableArray`
    - phase 2: LLVM escape
        - limitations: `Union`-type, inter-procedural
- `ImmutableArray` PR
- taint analysis

## 2021/11/24

### Agenda

- [ ] Implementation of field/alias analysis for enhanced SROA
- [ ] "Non-semantic" copy for stack allocation

### Implementation of field/alias analysis for enhanced SROA

#### Allocation-elimination targets

:arrow_double_up: easy / :arrow_double_down: hard:
- [x] simplest SROA
  ```julia
  code_typed((Any,)) do a
      r = Ref{Any}(a)
      r[] # => a
  end
  ```
- [x] domination-aware SROA
  ```julia
  code_typed((Any,Any)) do a, b
      r = Ref{Any}(a)
      r[] = b
      r[] # => b
  end

  # uninitialized field
  # https://github.com/JuliaLang/julia/pull/42834
  code_typed((Any,)) do a
      r = Ref{Any}()
      r[] = a
      r[] # => a
  end
  code_typed((Bool,Any,Any)) do cond, a, b
      r = Ref{Any}()
      if cond
          r[] = a
          r[] # => a
      else
          r[] = b
          r[] # => b
      end
  end
  ```
- [x] φnode-fying SROA
  ```julia
  code_typed((Bool,Any,Any)) do cond, a, b
      r = Ref{Any}(nothing)
      if cond
          r[] = a
      else
          r[] = b
      end
      r[] # => φ(a, b)
  end

  # uninitialized field
  # https://github.com/JuliaLang/julia/pull/43208
  code_typed((Bool,Any,Any)) do cond, a, b
      r = Ref{Any}()
      if cond
          r[] = a
      else
          r[] = b
      end
      r[] # => φ(a, b)
  end
  ```
- [ ] alias-aware SROA (nested load elimination)
  ```julia
  # `s -> broadcast(identity, Ref(s))`
  code_typed((String,)) do s
      r = Ref(s)
      t = (r,)
      t[1][] # => s
  end

  code_typed() do
      local arr::Vector{Any}
      alloc! = () -> arr = Any[]
      alloc!()
      arr
  end

  code_typed((Bool,Any,Any)) do cond, a, b
      r1 = Ref{Any}(a)
      r2 = Ref{Any}(b)
      (cond ? r1 : r2)[] # => φ(a, b)
      # or => (cond ? r1[] : r2[])
  end
  ```
- [ ] "partial" SROA
  ```julia
  code_typed((Bool,Any,)) do rare_cond, a
      r = Ref{Any}(a)
      if rare_cond
          throw(r) # move `r` allocation here
      else
          return r[] # => a
      end
  end
  ```
- [ ] stack allocation
  ```julia
  @noinline safe_op(a) = ...

  code_typed((Any,)) do a
      r = Ref{Any}(a) # can be allocated on stack
      safe_op(r)
  end
  ```

#### Design of analysis/optimization

0. current Julia Base SROA
   ```
   [SROA]
       [simple analysis]
       [domination analysis]
       [actual code transformation]
   ```
0. multiple SROA: **benchmark**
1. backward escape/alias/field analysis & load forwarding by domination analysis
   - <https://github.com/aviatesk/EscapeAnalysis.jl/pull/54>
   - overview
     ```
     # analysis
     while [converge]
         [backward escape/alias/field analysis]
     end

     # transformation
     [domination analysis]
     [SROA] [load forwarding]
     [DCE]
     ```
2. backward escape/alias analysis & forward field propagation
   - [`EscapeAnalysis.jl/avi/flowsensitive-aliasanalysis`](https://github.com/aviatesk/EscapeAnalysis.jl/tree/avi/flowsensitive-aliasanalysis)
   - overview:
     ```
     # analysis
     while [converge]
         [backward escape/alias analysis]
         while [converge]
             [forward field information propagation]
         end
     end

     # transformation
     [SROA] [simple load replacement]
     [DCE]
     ```
3.
forward escape/alias/field analysis
   - JVM compilers' approach
   - overview
     ```
     # analysis
     while [converge]
         [forward escape/alias analysis]
         [forward field information propagation]
     end

     # transformation
     [SROA] [simple load replacement]
     [DCE]
     ```
- Points
    - complexity of domination/control-flow
        - "last" definition
        - field merge (φnode-fying)
        - uninitialized fields
    - conflicts between escape information and field information
        - escape information: flows from usage to definition
        - field information: flows from definition to usage
    - versatility of EA
        - Valentin: separate SROA from EA

### Discussion

- Valentin: do we want to have a "slot" in `IRCode`?
    - represents a mutable memory
    - might be useful for:
        - Tapir integration
        - stack allocation
    - i.e. may be able to reuse the slot2ssa system?
- Valentin: arrays?
    - array optimization
        - target?
            - ref/load is all local
            - constant indices
        - approaches?
    - Valentin: don't make arrays special!
        - LLVM can handle them nicely
        - Julia-level analysis: tell it the inter-procedural information!

### "Non-semantic" copy for stack allocation

- Problem:
    - so many escapes through exceptions
    - in Julia an exception means "global escape"
- Details:
    - conditions for a "non-semantic" copy
        - no escape (including `finalize` calls)
        - no lifetime overlap
      ```julia
      local o
      # finalizer(f, o) # may escape in the finalizer
      try
          unsafe_op(o)
      catch err
          # o.x = "bar" # lifetime overlap
      end
      ```
    - but these two are **NOT** enough: other threads can access `err` through the evil `jl_current_exception`
- Ideas ?
    - special errors ?
        - `MethodError`: static
        - `BoundsError`: runtime
    - move `throw` ?
        - as in LICM: but need to preserve the order
        - unroll the loop by 1, and use the obtained constants

## 2021/11/10

### Agenda

- [x] When to run EA
- ~~[ ] discussion on "non-semantic" copy~~

### When to run EA

- tl;dr, **do escape analysis at optimization time**
- reasons:
    - inference works on the type domain vs.
optimization works on SSA form
    - the current applications of escape analysis are mostly about optimizations:
        - SROA
        - `ImmutableArray`: copy elision
        - AD: sensitivity analysis
    - as a first step, it would be easier and cleaner if they are just separated
- more thoughts:
    - inference already does some sort of alias analysis, in the forms of `Const` and `PartialStruct`
        - but they don't encode anything about the CFG
        - encoding CFG information into the type lattice would be far more complicated (like `Conditional`)
    - also, if we reason about inter-procedural mutation analysis, we also need to think about cycle detection
- more details:
    - [Partial Escape Analysis and Scalar Replacement for Java](https://ssw.jku.at/Research/Papers/Stadler14/Stadler2014-CGO-PEA.pdf) can be a good reference
    - local analysis:
        - make EscapeAnalysis.jl control-flow sensitive
        - introduce alias (field) analysis (revive <https://github.com/aviatesk/EscapeAnalysis.jl/pull/43>)
    - inter-procedural analysis:
        - easy ?
        - can be used for
            - stack allocation
            - `ImmutableArray`
    - SROA is better separated from escape analysis
        - escape analysis: a backward analysis
        - SROA: a forward iteration of CFG transformation
        - other applications: some transformation
- inter-procedural
    - how to cache the inter-procedural information ?
        - a new field of `CodeInstance` ?
    - what kind of information
        - argument escapes ?
        - returned-value escapes ?
        - side-effect analysis ?
            - _global_ side effect: read/write on global memory
                - e.g. does this loop write to global memory? then it can't be hoisted; if not, start thinking about local effects
                - obvious example: `stdout`
            - _local_ side effect: mutation of a field
                - LICM
                - better DCE by load forwarding ?
            - query:
                - input: `MethodInstance`, "x"
                - output: does this `MethodInstance` affect the state of "x"
            - side effect: a restriction on possible optimizations
                - synchronization optimization ?
            - what kinds of optimization should side-effects target ?
            - reuse LLVM ?
                - LLVM-level purity vs.
Julia-level purity
    - TODO: think much more about the applications of "side effects" !
- does "recursion" matter ?
    - "recursion" means the same `MethodInstance`
    - it won't be ideal if recursion => escape

## 2021/10/13

### Agenda

- [x] more optimization ideas
- [x] how to get Ian involved with this project ?
    - we will collaborate on finishing the [`ImmutableArray` PR](https://github.com/JuliaLang/julia/pull/42465)
- aggressive DCE
    - what is needed ?
        - make clear the cost of updating the domtree dynamically (benchmarking)
            - `AbstractInterpreter` can help with benchmarking
        - if it turns out to be costly, how can we mitigate it ?
            - switch iterative processing to batch processing ?
            - just compute the entire domtree at certain points ?
            - c.f. LLVM can reason about the relations of various analyses to mitigate the cost of dynamic domtree updates
        - correctness
- [ ] (if time remains) `Vararg` handling is complicated, any hints ?

### Optimization ideas

- **"allocation movement" (I'm not quite sure this naming is right)**
    * idea: move allocations to enable further optimizations (elaborated below)
- **memory reuse**:
    - idea: try to reuse the same memory for allocations that happen in a loop
        * we can optimize when it doesn't escape the loop (use a scope marker, like gcpreserve)
        * we want to make it control-flow sensitive
- **provide alias information to LLVM**
    - background: the LLVM-level TBAA pass just checks types
    - idea: enrich LLVM-level optimizations with Julia-level alias information
        - do we want to communicate lifetime information too ?
        - how do we want to communicate it to LLVM ?
            - attach metadata to the LLVM AST
            - customizable: the metadata may be consumed by our own optimization passes
- **`ImmutableArray`**
    - <https://github.com/JuliaLang/julia/pull/42465>
    - we want to figure out what information will be needed
    - provide a nice API to query object lifetime ?

#### "allocation movement"

- observations: we may not be able to prove the effect-free-ness of something (e.g.
`typeassert`)
    - but we may sometimes know it doesn't escape
    - and then we can try to "delay" the allocation to where it can be thrown, and do optimizations (maybe SROA ?) more aggressively
- idea: "move the allocation site to where it is really needed"
    * should enable further SROA
    * we can have a dynamic check to enable stack allocation ?
        - make the "type-tag" keep information about the allocation, and have a query to check whether an object is on the stack or the heap
    * can be extended to optimize `Array`s ?

> successful SROA
```julia
julia> code_typed((Int,)) do n
           s = 0
           for i in 1:n
               r = Ref(i)
               # r[] < 0 && throw(r) # very rare condition
               s += r[]
           end
           s
       end |> only
... # no allocation of `Base.RefValue{Int}`
```

> unsuccessful SROA because of the rare `throw` case (actually it's impossible for this case)
```julia
julia> code_typed((Int,)) do n
           s = 0
           for i in 1:n
               r = Ref(i)
               r[] < 0 && throw(r) # very rare condition
               s += r[]
           end
           s
       end |> only
CodeInfo(
1 ── %1  = Base.sle_int(1, n)::Bool
│    %2  = Base.ifelse(%1, n, 0)::Int64
│    %3  = Base.slt_int(%2, 1)::Bool
└───       goto #3 if not %3
2 ──       Base.nothing::Nothing
└───       goto #4
3 ──       goto #4
4 ┄─ %8  = φ (#2 => true, #3 => false)::Bool
│    %9  = φ (#3 => 1)::Int64
│    %10 = φ (#3 => 1)::Int64
│    %11 = Base.not_int(%8)::Bool
└───       goto #12 if not %11
5 ┄─ %13 = φ (#4 => %9, #11 => %30)::Int64
│    %14 = φ (#4 => %10, #11 => %31)::Int64
│    %15 = φ (#4 => 0, #11 => %23)::Int64
│    %16 = %new(Base.RefValue{Int64}, %13)::Base.RefValue{Int64}
│    %17 = Base.getfield(%16, :x)::Int64
│    %18 = Base.slt_int(%17, 0)::Bool
└───       goto #7 if not %18
6 ──       Main.throw(%16)::Union{}
└───       unreachable
7 ── %22 = Base.getfield(%16, :x)::Int64
│    %23 = Base.add_int(%15, %22)::Int64
│    %24 = (%14 === %2)::Bool
└───       goto #9 if not %24
8 ──       Base.nothing::Nothing
└───       goto #10
9 ── %28 = Base.add_int(%14, 1)::Int64
└───       goto #10
10 ┄ %30 = φ (#9 => %28)::Int64
│    %31 = φ (#9 => %28)::Int64
│    %32 = φ (#8 => true, #9 => false)::Bool
│    %33 = Base.not_int(%32)::Bool
└───       goto #12 if not %33
11 ─       goto #5
12 ┄ %36 = φ (#10 => %23, #4 => 0)::Int64
└───       return %36
) => Int64
```

> fun fact: `unoptimize_throw_blocks` makes SROA sad
```julia
julia> code_typed((Int,)) do n
           s = 0
           for i in 1:n
               r = Ref(i)
               r[] < 0 && throw(r[]) # very rare condition
               s += r[]
           end
           s
       end |> only
CodeInfo(
1 ── %1  = Base.sle_int(1, n)::Bool
│    %2  = Base.ifelse(%1, n, 0)::Int64
│    %3  = Base.slt_int(%2, 1)::Bool
└───       goto #3 if not %3
2 ──       Base.nothing::Nothing
└───       goto #4
3 ──       goto #4
4 ┄─ %8  = φ (#2 => true, #3 => false)::Bool
│    %9  = φ (#3 => 1)::Int64
│    %10 = φ (#3 => 1)::Int64
│    %11 = Base.not_int(%8)::Bool
└───       goto #12 if not %11
5 ┄─ %13 = φ (#4 => %9, #11 => %31)::Int64
│    %14 = φ (#4 => %10, #11 => %32)::Int64
│    %15 = φ (#4 => 0, #11 => %24)::Int64
│    %16 = %new(Base.RefValue{Int64}, %13)::Base.RefValue{Int64}
│    %17 = Base.getfield(%16, :x)::Int64
│    %18 = Base.slt_int(%17, 0)::Bool
└───       goto #7 if not %18
6 ── %20 = Base.getindex(%16)::Any
│          Main.throw(%20)::Union{}
└───       unreachable
7 ── %23 = Base.getfield(%16, :x)::Int64
│    %24 = Base.add_int(%15, %23)::Int64
│    %25 = (%14 === %2)::Bool
└───       goto #9 if not %25
8 ──       Base.nothing::Nothing
└───       goto #10
9 ── %29 = Base.add_int(%14, 1)::Int64
└───       goto #10
10 ┄ %31 = φ (#9 => %29)::Int64
│    %32 = φ (#9 => %29)::Int64
│    %33 = φ (#8 => true, #9 => false)::Bool
│    %34 = Base.not_int(%33)::Bool
└───       goto #12 if not %34
11 ─       goto #5
12 ┄ %37 = φ (#10 => %24, #4 => 0)::Int64
└───       return %37
) => Int64
```

#### so what is needed ?

- they are really similar, in the sense that what they both need is "lifetime" information ?
- is the current `EscapeLattice` enough, or does it need more props ?

### `Vararg` handling ...

```julia
julia> @noinline function foo(s...)
           string(s)
       end
foo (generic function with 1 method)

# when is `Vararg` introduced ?
julia> code_typed((String,)) do s foo(s) end 1-element Vector{Any}: CodeInfo( 1 ─ %1 = invoke Main.foo(_2::String)::String └── return %1 ) => String julia> code_typed((String,)) do s foo(s, s) end 1-element Vector{Any}: CodeInfo( 1 ─ %1 = invoke Main.foo(_2::String, _2::Vararg{String})::String └── return %1 ) => String julia> code_typed((String,)) do s foo(s, s, s) end 1-element Vector{Any}: CodeInfo( 1 ─ %1 = invoke Main.foo(_2::String, _2::Vararg{String}, _2)::String └── return %1 ) => String # it seems like inference/optimizer has dealt with `foo(::String, ::String)` !? julia> only(methods(foo)).specializations svec(MethodInstance for foo(::String), MethodInstance for foo(::String, ::String), MethodInstance for foo(::String, ::String, ::String), nothing, nothing, nothing, nothing, MethodInstance for foo(::String, ::Vararg{String})) ``` ## 2021/10/06 - [x] plans - [x] updates: * (mostly) debuginfo is available for EscapeAnalysis.jl ### Next actions - Xuanda will take a look at more aggressive DCE - first step: read through [#27547](https://github.com/JuliaLang/julia/issues/27547) and [#37882](https://github.com/JuliaLang/julia/pull/37882) - Shuhei will learn much more about Julia runtime, and try to fully understand other optimization targets other than SROA and finalzier elision, and re-define project targets and update timeline if necessary - Shuhei will keep working on field analysis and enhanced SROA ### DCE plan - step 1: `typeassert` can simply be eliminated - step 2: fix [#27547](https://github.com/JuliaLang/julia/issues/27547) ( [#37882](https://github.com/JuliaLang/julia/pull/37882) can be revived) - step 3: eliminate dead `isdefined` / `isa` / `===` branches (maybe `<:` too ?) 
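As a sanity check for step 1: a `typeassert` is side-effect free exactly when it provably succeeds, i.e. when the inferred type of the asserted value is already a subtype of the asserted type. A minimal sketch of that legality check, using a hypothetical `removable_typeassert` helper (not an actual compiler function):

```julia
# Hypothetical helper: a `typeassert` is dead (guaranteed not to throw,
# and returning its argument unchanged) when the statically inferred
# type of the value is a subtype of the asserted type.
removable_typeassert(@nospecialize(inferred), @nospecialize(asserted)) =
    inferred <: asserted

removable_typeassert(Vector{Any}, AbstractVector)  # true: provably succeeds, safe to delete
removable_typeassert(Any, Vector{Any})             # false: may throw, must be kept
```

Anything subtler than this subtype test (e.g. deleting a `typeassert` that might fail, as in the "delay allocation to the throwing site" idea above) needs the escape/lifetime information discussed earlier.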
```julia=
julia> code_typed() do
           local arr::Vector{Any}
           doit() do
               arr = Any[]
           end
           arr
       end |> only
CodeInfo(
1 ─ %1  = Core.Box::Type{Core.Box}
│   %2  = %new(%1)::Core.Box
│   %3  = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Vector{Any}, svec(Any, Int64), 0, :(:ccall), Vector{Any}, 0, 0))::Vector{Any}
│         Core.setfield!(%2, :contents, %3)::Vector{Any}
│   %5  = Core.isdefined(%2, :contents)::Bool        # <= step 2
└──       goto #3 if not %5                          # <= step 2
2 ─       goto #4
3 ─       $(Expr(:throw_undef_if_not, :arr, false))::Any # <= step 2
4 ┄ %9  = Core.getfield(%2, :contents)::Any
│         Core.typeassert(%9, Vector{Any})::Vector{Any}  # <= step 1
│   %11 = π (%9, Vector{Any})
└──       return %11
) => Vector{Any}
```

### More optimization ideas
- move the allocation (by copy) to where it can be thrown
- observations: we may not be able to prove effect-free-ness of something (e.g. `typeassert`)
    - but we may sometimes know it doesn't escape
    - and we can try to "delay" the allocation to where it can be thrown, and do optimizations (maybe SROA?) more aggressively
    - when it escapes only in a limited scope, allocate the object only when it is actually thrown
```julia
s = 0
for i in 1:n
    arr = [i]     # want to eliminate this allocation
    s += arr[1]   # may throw
end
s
```
```julia
let
    r = Ref(x)
    s = r::Base.RefValue{String}
    r[]
end
```
- conditions ... ?
    - it shouldn't alias to anything
    - its lifetime should end within the optimization scope
- dynamic check to enable stack allocation?
    - make the "type-tag" keep information about the allocation
    - "pool-allocation"
        - optimize if an object is in the pool
        - make a query to check whether it's allocated in the pool
        - when the GC allocates, it's going to allocate in the pool
        - (as an implementation detail of the heap-to-stack allocation optimization)
- immutables ~~don't need analysis ...?~~
```julia
let
    # the type-tag needs to be on the stack (next to the object) so that GC can see it ..?
    # 1. the immutable points to a reference to a heap object (`Any`)
    #    but the `Any` object might be able to be allocated on the stack as well
    #    type-tag (size unknown ?)
    # 2. if the immutable escapes, allocate it on the heap
    s = Some{Any}(10) # <= points to a heap-allocated object (if it escapes)
end
```
- AD may also possibly benefit from mutation analysis
    - Zygote: doesn't support mutation
    - Diffractor: ???
    - Enzyme: uses alias/escape analysis to find objects whose mutation order should be recorded
- Julia's memory management/optimization
    - `llvm-alloc-opt.cpp`: SROA pass on heap memory
        - LLVM itself has its own SROA pass on stack memory
    - `gc.c`: memory management (allocation, finalization)
        - defines the standard runtime semantics
        - keywords:
            - mark-and-sweep
            - non-moving
            - tri-color
            - precise: never changes
            - generational
            - stop-the-world: => concurrent garbage collection?

## 2021/09/29

### Agenda
- [x] agree on target optimizations ([Targets](#Targets))
- [x] plan collaboration ([Development](#Development-Plan))
    * [x] meeting schedule
    * [x] discuss roles

### more optimization ideas ?
- alias analysis at the Julia level
    - we have the LLVM-level TBAA pass: it just checks types
    - we can tell it more information
- memory-reuse: try to reuse the same memory for allocations that happen in a loop
    - we can optimize it when the object doesn't escape the loop (use a scope marker, like gcpreserve)
    - we want to make it control-flow sensitive
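The memory-reuse idea above corresponds to a rewrite Julia programmers already apply by hand; a before/after sketch of what an escape-aware pass could automate (illustrative function names, not compiler code):

```julia
# Before: a fresh buffer is heap-allocated on every loop iteration.
function sum_squares_alloc(xs)
    s = 0.0
    for x in xs
        buf = [x, x^2]   # allocation inside the loop
        s += buf[2]
    end
    return s
end

# After: `buf` never escapes an iteration, so one pre-allocated buffer
# can be reused across iterations -- the transformation an escape-aware
# memory-reuse pass would perform automatically.
function sum_squares_reuse(xs)
    s = 0.0
    buf = Vector{Float64}(undef, 2)  # hoisted out of the loop, reused
    for x in xs
        buf[1] = x
        buf[2] = x^2
        s += buf[2]
    end
    return s
end
```

Both functions compute the same result; the pass would only be legal because the per-iteration buffer neither aliases anything nor outlives its iteration, which is exactly the control-flow-sensitive lifetime question raised above.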

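One way to observe from the REPL whether SROA actually removed an allocation, such as the `Ref(i)` in the loop examples earlier, is `Base.@allocated`, measured inside a function so global-scope overhead doesn't distort the count (a small sketch):

```julia
# The non-throwing kernel from the notes: `r` never escapes the loop
# body, so SROA should replace the `Ref` cell with its scalar field.
function kernel(n)
    s = 0
    for i in 1:n
        r = Ref(i)   # candidate for scalar replacement
        s += r[]
    end
    return s
end

kernel(10)              # run once first so compilation is not measured
@allocated kernel(10)   # expected to be 0 when SROA succeeds
```

Re-adding the rare `throw(r)` branch and re-running the same measurement makes the lost optimization from the "unsuccessful SROA" example directly visible.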