As midterm objectives of this project, we will try to implement the following optimizations:
- enhancement of the Julia-level SROA pass (`getfield_elim_pass!`)
- stack allocation of non-escaping mutable objects
- `finalize`r elision

Julia's compiler does SROA (Scalar Replacement of Aggregates) optimization on two levels: the Julia-level `getfield_elim_pass!` and the LLVM-level pass in `llvm-alloc-opt.cpp`.
The purpose of this optimization is to enhance the Julia-level SROA with escape analysis.
The current SROA pass can be described as follows (quoting the docstring from #42357):
    getfield_elim_pass!(ir::IRCode) -> newir::IRCode

`getfield` elimination pass, a.k.a. Scalar Replacement of Aggregates optimization.

This pass is based on a local alias analysis that collects field information by def-use chain walking. It looks for struct allocation sites ("definitions"), and `getfield` calls as well as `:foreigncall`s that preserve the structs ("usages"). If "definitions" have enough information, then this pass will replace corresponding usages with lifted values.

`mutable struct`s require additional care and need to be handled separately from immutables. For `mutable struct`s, `setfield!` calls account for "definitions" also, and the pass should give up the lifting conservatively when there are any "intermediate usages" that may escape the mutable struct (e.g. a non-inlined generic function call that takes the mutable struct as its argument).

In a case when all usages are fully eliminated, the `struct` allocation may also be erased as a result of dead code elimination.
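For a flavor of the immutable case, here is a hedged example (the `Point` definition is just for illustration): the two `getfield` usages are expected to be replaced with the lifted values `x` and `y`, leaving no reference to the struct:

julia> struct Point
           x::Float64
           y::Float64
       end

julia> code_typed((Float64, Float64)) do x, y
           p = Point(x, y)  # allocation site ("definition")
           p.x + p.y        # `getfield` calls ("usages")
       end |> only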
The important observation is that `getfield_elim_pass!` is limited in several ways. Let's see successful examples first, and then understand the limitations.
Here the allocation site `r = Ref{Int}(0)` is "safe" and has enough information, so `r[]` (inlined to `getfield`) will be eliminated, and even more, the allocation and the `setfield!` are voided out too:
julia> code_typed() do
r = Ref{Int}(0)
r[] = 42
b = sin(r[])
return b
end |> only
CodeInfo(
1 ─ %1 = Base.sitofp(Float64, 42)::Float64
│ %2 = invoke Base.Math.sin(%1::Float64)::Float64
└── return %2
) => Float64
`getfield_elim_pass!` requires a "safe" allocation. So if we initialize `r = Ref{Int}()` (leaving the field undefined), then the whole optimization won't work:
julia> code_typed() do
r = Ref{Int}()
r[] = 42
b = sin(r[])
return b
end |> only
CodeInfo(
1 ─ %1 = %new(Base.RefValue{Int64})::Base.RefValue{Int64}
│ Base.setfield!(%1, :x, 42)::Int64
│ %3 = Base.getfield(%1, :x)::Int64
│ %4 = Base.sitofp(Float64, %3)::Float64
│ %5 = invoke Base.Math.sin(%4::Float64)::Float64
└── return %5
) => Float64
Still, the later LLVM-level SROA may be able to optimize it further, and it seems to be more powerful than the Julia-level one in general; e.g. it can "fully" optimize the above code (the argument of `@j_sin_4761` is replaced with the scalar value `double 4.200000e+01`, and we don't have any allocation):
julia> code_llvm(()) do
r = Ref{Int}()
r[] = 42
b = sin(r[])
return b
end
; @ none:2 within `#49`
define double @"julia_#49_4759"() #0 {
top:
; @ none:4 within `#49`
; ┌ @ math.jl:1228 within `sin`
%0 = call double @j_sin_4761(double 4.200000e+01) #0
; └
; @ none:5 within `#49`
ret double %0
}
Now let's bother LLVM: if we just add a non-inlined function call, then LLVM can't help but give up on the optimization:
julia> @noinline setter!(r, x) = r[] = x
setter! (generic function with 1 method)
julia> code_llvm(()) do
r = Ref{Int}()
setter!(r, 42)
b = sin(r[])
return b
end
; @ none:2 within `#45`
define double @"julia_#45_4943"() #0 {
top:
%gcframe7 = alloca [3 x {}*], align 16
%gcframe7.sub = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe7, i64 0, i64 0
%0 = bitcast [3 x {}*]* %gcframe7 to i8*
call void @llvm.memset.p0i8.i32(i8* nonnull align 16 dereferenceable(24) %0, i8 0, i32 24, i1 false)
%1 = call {}*** inttoptr (i64 140733735208165 to {}*** (i64)*)(i64 261) #6
; ┌ @ refpointer.jl:135 within `Ref`
; │┌ @ refvalue.jl:7 within `RefValue`
%2 = bitcast [3 x {}*]* %gcframe7 to i64*
store i64 4, i64* %2, align 16
%3 = load {}**, {}*** %1, align 8
%4 = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe7, i64 0, i64 1
%5 = bitcast {}** %4 to {}***
store {}** %3, {}*** %5, align 8
%6 = bitcast {}*** %1 to {}***
store {}** %gcframe7.sub, {}*** %6, align 8
%ptls_field3 = getelementptr inbounds {}**, {}*** %1, i64 2305843009213693954
%7 = bitcast {}*** %ptls_field3 to i8**
%ptls_load45 = load i8*, i8** %7, align 8
%8 = call noalias nonnull {}* @jl_gc_pool_alloc(i8* %ptls_load45, i32 1392, i32 16) #2
%9 = bitcast {}* %8 to i64*
%10 = getelementptr inbounds i64, i64* %9, i64 -1
store atomic i64 4617487696, i64* %10 unordered, align 8
%11 = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe7, i64 0, i64 2
store {}* %8, {}** %11, align 16
; └└
; @ none:3 within `#45`
%12 = call i64 @"j_setter!_4945"({}* nonnull %8, i64 signext 42) #0
; @ none:4 within `#45`
; ┌ @ refvalue.jl:56 within `getindex`
; │┌ @ Base.jl:38 within `getproperty`
%13 = load i64, i64* %9, align 8
; └└
; ┌ @ math.jl:1232 within `sin`
; │┌ @ float.jl:269 within `float`
; ││┌ @ float.jl:243 within `AbstractFloat`
; │││┌ @ float.jl:146 within `Float64`
%14 = sitofp i64 %13 to double
; │└└└
; │ @ math.jl:1234 within `sin`
%15 = call double @j_sin_4946(double %14) #0
%16 = load {}*, {}** %4, align 8
%17 = bitcast {}*** %1 to {}**
store {}* %16, {}** %17, align 8
; └
; @ none:5 within `#45`
ret double %15
}
LLVM can also be confused by Julia-level semantics other than inter-procedural calls.
For example, a `typeassert` can pretty much null out LLVM's SROA.
In the following code snippet, neither the Julia-level nor the LLVM-level SROA manages to fully eliminate the closure construction (the `Core.Box` and the associated `typeassert`):
julia> doit(f, args...) = f(args...)
julia> code_typed() do
local arr::Vector{Any}
doit() do
arr = Any[]
end
arr
end |> only
CodeInfo(
1 ─ %1 = Core.Box::Type{Core.Box}
│ %2 = %new(%1)::Core.Box
│ %3 = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Vector{Any}, svec(Any, Int64), 0, :(:ccall), Vector{Any}, 0, 0))::Vector{Any}
│ Core.setfield!(%2, :contents, %3)::Vector{Any}
│ %5 = Core.isdefined(%2, :contents)::Bool
└── goto #3 if not %5
2 ─ goto #4
3 ─ $(Expr(:throw_undef_if_not, :arr, false))::Any
4 ┄ %9 = Core.getfield(%2, :contents)::Any
│ Core.typeassert(%9, Vector{Any})::Vector{Any}
│ %11 = π (%9, Vector{Any})
└── return %11
) => Vector{Any}
julia> code_llvm(()) do
local arr::Vector{Any}
doit() do
arr = Any[]
end
arr
end
; @ none:2 within `#55`
define nonnull {}* @"julia_#55_4772"() #0 {
pass:
; @ none:3 within `#55`
; ┌ @ none:1 within `doit`
; │┌ @ none:4 within `#56`
; ││┌ @ array.jl:423 within `getindex`
; │││┌ @ boot.jl:471 within `Array` @ boot.jl:452
%0 = call nonnull {}* inttoptr (i64 4483450916 to {}* ({}*, i64)*)({}* inttoptr (i64 4648194800 to {}*), i64 0)
; └└└└
; @ none:6 within `#55`
%1 = bitcast {}* %0 to i64*
%2 = getelementptr inbounds i64, i64* %1, i64 -1
%3 = load atomic i64, i64* %2 unordered, align 8
%4 = and i64 %3, -16
%5 = inttoptr i64 %4 to {}*
%6 = icmp eq {}* %5, inttoptr (i64 4648194800 to {}*)
br i1 %6, label %pass3, label %fail2
fail2: ; preds = %pass
call void @jl_type_error(i8* getelementptr inbounds ([11 x i8], [11 x i8]* @_j_str1, i64 0, i64 0), {}* inttoptr (i64 4648194800 to {}*), {}* %0)
unreachable
pass3: ; preds = %pass
ret {}* %0
}
# interesting fact:
# LLVM can optimize it away if we strip out the type declaration !
julia> code_llvm(()) do
local arr
doit() do
arr = Any[]
end
arr
end
; @ none:2 within `#55`
define nonnull {}* @"julia_#55_4959"() #0 {
pass:
; @ none:3 within `#55`
; ┌ @ none:1 within `doit`
; │┌ @ none:4 within `#56`
; ││┌ @ array.jl:423 within `getindex`
; │││┌ @ boot.jl:471 within `Array` @ boot.jl:452
%0 = call nonnull {}* inttoptr (i64 4431128612 to {}* ({}*, i64)*)({}* inttoptr (i64 4614886128 to {}*), i64 0)
; └└└└
; @ none:6 within `#55`
ret {}* %0
}
We want "better" Julia-level SROA and DCE. Especially, it should be:
typeassert
/isdefined
/isa
checks more aggressively so that we can eliminate more allocations[1]Target examples:
julia> @noinline setter!(r, x) = r[] = x
setter! (generic function with 1 method)
julia> code_typed() do
r = Ref{Int}()
setter!(r, 42)
b = sin(r[]) # should be turned into `sin(42)`
return b
end |> only
CodeInfo(
1 ─ %1 = %new(Base.RefValue{Int64})::Base.RefValue{Int64}
│ invoke Main.setter!(%1::Base.RefValue{Int64}, 42::Int64)::Any
│ %3 = Base.getfield(%1, :x)::Int64
│ %4 = Base.sitofp(Float64, %3)::Float64
│ %5 = invoke Base.Math.sin(%4::Float64)::Float64
└── return %5
) => Float64
# `RefValue{String}` allocation won't be eliminated
(s::String) -> broadcast(identity, Ref(s))
# abstract-interpretation based field information propagation
# should allow us to optimize this IR
2 1 ─ %1 = %new(Base.RefValue{String}, _2)::Base.RefValue{String} │╻╷╷ Ref
│ %2 = Core.tuple(%1)::Tuple{Base.RefValue{String}} │╻ broadcast
│ %3 = Core.getfield(%2, 1)::Base.RefValue{String} ││
└── goto #3 if not true ││╻╷ materialize
2 ─ nothing::Nothing │
3 ┄ %6 = Base.getfield(%3, :x)::String │││╻╷╷╷╷ copy
└── goto #4 ││││┃ getindex
4 ─ goto #5 ││││
5 ─ goto #6 │││
6 ─ goto #7 ││
7 ─ return %6 │
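For reference, a hedged way to reproduce the IR above in the REPL (equivalent to the closure form, modulo the argument name):

julia> code_typed((String,)) do s
           broadcast(identity, Ref(s))
       end |> only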
`typeassert` check:
julia> code_typed() do
r = Ref{Any}()
r[] = 42
# this assertion is useful for inference
# but "harmful" for LLVM
return r[]::Int
end
1-element Vector{Any}:
CodeInfo(
1 ─ %1 = %new(Base.RefValue{Any})::Base.RefValue{Any}
│ Base.setfield!(%1, :x, 42)::Int64
│ %3 = Base.getfield(%1, :x)::Any
│ Core.typeassert(%3, Main.Int)::Int64
│ %5 = π (%3, Int64)
└── return %5
) => Int64
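With precise field propagation, the `getfield` load above would fold to the constant `42`, the `typeassert` becomes provably dead, and the allocation can then be removed by DCE; a hedged sketch of the ideal target IR:

CodeInfo(
1 ─ return 42
) => Int64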
Julia allocates every mutable object that does get allocated on the heap, while in an ideal world we would want to have them on the stack if their lifetime doesn't escape the execution frame.
An obvious example would be the following: Julia will allocate `MyRef("foo")` even though it doesn't escape anywhere, because the `@noinline`d `getx` call prevents both Julia-level and LLVM-level SROA, and they can't eliminate the allocation:
julia> mutable struct MyRef{T}; x::T; end
julia> @noinline getx(x::MyRef) = x.x
getx (generic function with 1 method)
julia> code_typed(()) do
r = MyRef("foo")
x = getx(r)
sizeof(x)
end |> only
CodeInfo(
1 ─ %1 = %new(MyRef{String}, "foo")::MyRef{String}
│ %2 = invoke Main.getx(%1::MyRef{String})::String
│ %3 = Core.sizeof::typeof(Core.sizeof)
│ %4 = (%3)(%2)::Int64
└── return %4
) => Int64
julia> code_llvm(()) do
r = MyRef("foo")
x = getx(r)
sizeof(x)
end
; @ none:2 within `#75`
define i64 @"julia_#75_4792"() #0 {
top:
%0 = alloca {}*, align 8
%gcframe7 = alloca [3 x {}*], align 16
%gcframe7.sub = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe7, i64 0, i64 0
%1 = bitcast [3 x {}*]* %gcframe7 to i8*
call void @llvm.memset.p0i8.i32(i8* nonnull align 16 dereferenceable(24) %1, i8 0, i32 24, i1 false)
%2 = call {}*** inttoptr (i64 140733733901541 to {}*** (i64)*)(i64 261) #7
; ┌ @ none:1 within `MyRef` @ none:1
%3 = bitcast [3 x {}*]* %gcframe7 to i64*
store i64 4, i64* %3, align 16
%4 = load {}**, {}*** %2, align 8
%5 = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe7, i64 0, i64 1
%6 = bitcast {}** %5 to {}***
store {}** %4, {}*** %6, align 8
%7 = bitcast {}*** %2 to {}***
store {}** %gcframe7.sub, {}*** %7, align 8
%ptls_field3 = getelementptr inbounds {}**, {}*** %2, i64 2305843009213693954
%8 = bitcast {}*** %ptls_field3 to i8**
%ptls_load45 = load i8*, i8** %8, align 8
%9 = call noalias nonnull {}* @jl_gc_pool_alloc(i8* %ptls_load45, i32 1392, i32 16) #2
%10 = bitcast {}* %9 to i64*
%11 = getelementptr inbounds i64, i64* %10, i64 -1
store atomic i64 5398336928, i64* %11 unordered, align 8
%12 = bitcast {}* %9 to {}**
store {}* inttoptr (i64 5376165704 to {}*), {}** %12, align 8
%13 = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe7, i64 0, i64 2
store {}* %9, {}** %13, align 16
; └
; @ none:3 within `#75`
store {}* %9, {}** %0, align 8
%14 = call nonnull {}* @j1_getx_4794({}* inttoptr (i64 5377254840 to {}*), {}** nonnull %0, i32 1)
; @ none:4 within `#75`
; ┌ @ Base.jl:175 within `sizeof`
%15 = bitcast {}* %14 to i64*
%16 = load i64, i64* %15, align 8
%17 = load {}*, {}** %5, align 8
%18 = bitcast {}*** %2 to {}**
store {}* %17, {}** %18, align 8
; └
ret i64 %16
}
The basic idea of this optimization is to modify the LLVM IR so that we can allocate `MyRef("foo")` on the stack rather than on the heap.
We worked on this optimization in the first project iteration (the GSoC term), but for now I'd like to put a lower priority on it:
let
r = MyRef("foo")
x = getx(r)
sizeof(x)
end
let
sizeof("foo")
end
TODO: elaborate this.
`finalize`r elision

Julia finalizers are managed by the GC, and so they may hang around for a while, since the GC lets the mutable object hang around. This means we may end up consuming a lot of memory when finalizers are associated with e.g. resource management, which is a common practice seen e.g. in GPU programming.
The idea of this optimization is to call a finalizer early, by inserting a `finalize(obj)` call when we can prove that the lifetime of `obj` finishes within a frame:
julia> mutable struct WithFinalizer
v
function WithFinalizer(v)
x = new(v)
f(t) = @async println("Finalizing $t.")
return finalizer(f, x)
end
end
julia> make_obj(v = 10) = WithFinalizer(v)
make_obj (generic function with 1 method)
julia> let
o = make_obj()
copy(o.v) # <= automatically insert `finalize(o)` here
end
10
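A hand-written sketch (not actual compiler output) of what the inserted call would effectively give us:

let
    o = make_obj()
    r = copy(o.v)
    finalize(o)  # inserted early: the lifetime of `o` provably ends within this frame
    r
end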
Questions:
- where do we insert the `finalize(obj)` call?
- `finalize(obj)` is super costly!
- how do we handle `finalizer` registrations (insert `finalize` on the return site)?

Even if we have a useful inter-procedural cache, escape analysis and SROA are better run after inlining, simply because inter-procedural handling is much easier there:
we can just look for `:invoke` expressions that the inliner has resolved.
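A hedged sketch of what "just look for `:invoke` expressions" means in practice (the `IRCode` field layout is assumed from the Julia version these notes target and may differ across versions):

# collect the `MethodInstance`s of calls the inliner decided not to inline;
# these are the program points where inter-procedural escape information is needed
function invoked_targets(ir::Core.Compiler.IRCode)
    targets = Core.MethodInstance[]
    for stmt in ir.stmts.inst  # the statement array (field name assumed)
        if Meta.isexpr(stmt, :invoke)
            push!(targets, stmt.args[1]::Core.MethodInstance)
        end
    end
    return targets
end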
Caching: `CodeInstance`, and `AbstractInterpreter`
NOTE: the agenda for the most recent meeting should come before the others
For `ImmutableArray`:
- `mutating_arrayfreeze`: alias analysis?
- `BoundsError`: lifetime information?
focus:
- `ImmutableArray`
- `Union`-type, inter-procedural
- `ImmutableArray` PR
- taint analysis

easy / hard cases:
code_typed((Any,)) do a
r = Ref{Any}(a)
r[] # => a
end
code_typed((Any,Any)) do a, b
r = Ref{Any}(a)
r[] = b
r[] # => b
end
# uninitialized field
# https://github.com/JuliaLang/julia/pull/42834
code_typed((Any,)) do a
r = Ref{Any}()
r[] = a
r[] # => a
end
code_typed((Bool,Any,Any)) do cond, a, b
r = Ref{Any}()
if cond
r[] = a
r[] # => a
else
r[] = b
r[] # => b
end
end
code_typed((Bool,Any,Any)) do cond, a, b
r = Ref{Any}(nothing)
if cond
r[] = a
else
r[] = b
end
r[] # => φ(a, b)
end
# uninitialized field
# https://github.com/JuliaLang/julia/pull/43208
code_typed((Bool,Any,Any)) do cond, a, b
r = Ref{Any}()
if cond
r[] = a
else
r[] = b
end
r[] # => φ(a, b)
end
# `s -> broadcast(identity, Ref(s))`
code_typed((String,)) do s
r = Ref(s)
t = (r,)
t[1][] # => s
end
code_typed() do
local arr::Vector{Any}
alloc! = () -> arr = Any[]
alloc!()
arr
end
code_typed((Bool,Any,Any)) do cond, a, b
r1 = Ref{Any}(a)
r2 = Ref{Any}(b)
(cond ? r1 : r2)[] # φ(a, b)
# or => (cond ? r1[] : r2[])
end
code_typed((Bool,Any,)) do rare_cond, a
r = Ref{Any}(a)
if rare_cond
throw(r) # move `r` allocation here
else
return r[] # => a
end
end
@noinline safe_op(a) = ...
code_typed((Any,)) do a
r = Ref{Any}(a) # can be allocated on stack
safe_op(r)
end
[SROA]
[simple analysis]
[domination analysis]
[actual code transformation]
multiple SROA: benchmark
backward escape/alias/field analysis & load forwarding by domination analysis
# analysis
while [converge]
[backward escape/alias/field analysis]
end
# transformation
[domination analysis]
[SROA]
[load forwarding]
[DCE]
backward escape/alias analysis & forward field propagation
EscapeAnalysis.jl/avi/flowsensitive-aliasanalysis
# analysis
while [converge]
[backward escape/alias analysis]
while [converge]
[forward field information propagation]
end
end
# transformation
[SROA]
[simple load replacement]
[DCE]
forward escape/alias/field analysis
# analysis
while [converge]
[forward escape/alias analysis]
[forward field information propagation]
end
# transformation
[SROA]
[simple load replacement]
[DCE]
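A hedged Julia sketch of the `while [converge]` driver shared by the designs above; `transfer!` and the `state` object are hypothetical placeholders for the actual escape/alias/field analysis:

# apply the transfer function over all statements (backward or forward)
# until no lattice element changes anymore, i.e. until a fixed point is reached
function analyze!(state, ir; backward::Bool = true)
    order = backward ? reverse(1:length(ir.stmts)) : (1:length(ir.stmts))
    changed = true
    while changed                                 # "while [converge]"
        changed = false
        for idx in order
            changed |= transfer!(state, ir, idx)  # hypothetical: widens `state`, reports whether it changed
        end
    end
    return state
end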
Problem / details (where can we insert the `finalize` call?):

local o
# finalizer(f, o)  # `o` may escape in the finalizer
try
    unsafe_op(o)
catch err
    # o.x = "bar"  # lifetime overlap
end
# `o` may also escape into `err` through the evil `jl_current_exception`
Ideas?
- `MethodError`: static
- `BoundsError`: runtime
- `throw`?
tl;dr: do escape analysis at optimization time.

reasons:
- `ImmutableArray`: copy elision

more thoughts:
- `Const` and `PartialStruct` (and `Conditional`)

more details:
- `ImmutableArray`
- inter-procedural: cache on `CodeInstance`?
- `stdout`: given a `MethodInstance` and "x", which other `MethodInstance`s may affect the state of "x"?
- `ImmutableArray` PR: `AbstractInterpreter` can help benchmarking
- `Vararg` handling is complicated, any hints?
- `ImmutableArray` (`typeassert`)
- `Array`s?

successful SROA:
julia> code_typed((Int,)) do n
s = 0
for i in 1:n
r = Ref(i)
# r[] < 0 && throw(r) # very rare condition
s += r[]
end
s
end |> only
... # no allocation of `Base.RefValue{Int}`
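A hedged runtime cross-check of the "no allocation" claim (after a warm-up call, `@allocated` is expected to report zero bytes for this loop):

julia> function sum_refs(n)
           s = 0
           for i in 1:n
               r = Ref(i)
               s += r[]
           end
           return s
       end;

julia> sum_refs(10);  # warm up (compile)

julia> @allocated sum_refs(10)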
unsuccessful SROA because of the rare `throw` case (actually it's impossible in this case):
julia> code_typed((Int,)) do n
s = 0
for i in 1:n
r = Ref(i)
r[] < 0 && throw(r) # very rare condition
s += r[]
end
s
end |> only
CodeInfo(
1 ── %1 = Base.sle_int(1, n)::Bool
│ %2 = Base.ifelse(%1, n, 0)::Int64
│ %3 = Base.slt_int(%2, 1)::Bool
└─── goto #3 if not %3
2 ── Base.nothing::Nothing
└─── goto #4
3 ── goto #4
4 ┄─ %8 = φ (#2 => true, #3 => false)::Bool
│ %9 = φ (#3 => 1)::Int64
│ %10 = φ (#3 => 1)::Int64
│ %11 = Base.not_int(%8)::Bool
└─── goto #12 if not %11
5 ┄─ %13 = φ (#4 => %9, #11 => %30)::Int64
│ %14 = φ (#4 => %10, #11 => %31)::Int64
│ %15 = φ (#4 => 0, #11 => %23)::Int64
│ %16 = %new(Base.RefValue{Int64}, %13)::Base.RefValue{Int64}
│ %17 = Base.getfield(%16, :x)::Int64
│ %18 = Base.slt_int(%17, 0)::Bool
└─── goto #7 if not %18
6 ── Main.throw(%16)::Union{}
└─── unreachable
7 ── %22 = Base.getfield(%16, :x)::Int64
│ %23 = Base.add_int(%15, %22)::Int64
│ %24 = (%14 === %2)::Bool
└─── goto #9 if not %24
8 ── Base.nothing::Nothing
└─── goto #10
9 ── %28 = Base.add_int(%14, 1)::Int64
└─── goto #10
10 ┄ %30 = φ (#9 => %28)::Int64
│ %31 = φ (#9 => %28)::Int64
│ %32 = φ (#8 => true, #9 => false)::Bool
│ %33 = Base.not_int(%32)::Bool
└─── goto #12 if not %33
11 ─ goto #5
12 ┄ %36 = φ (#10 => %23, #4 => 0)::Int64
└─── return %36
) => Int64
fun fact: `unoptimize_throw_blocks` makes SROA sad:
julia> code_typed((Int,)) do n
s = 0
for i in 1:n
r = Ref(i)
r[] < 0 && throw(r[]) # very rare condition
s += r[]
end
s
end |> only
CodeInfo(
1 ── %1 = Base.sle_int(1, n)::Bool
│ %2 = Base.ifelse(%1, n, 0)::Int64
│ %3 = Base.slt_int(%2, 1)::Bool
└─── goto #3 if not %3
2 ── Base.nothing::Nothing
└─── goto #4
3 ── goto #4
4 ┄─ %8 = φ (#2 => true, #3 => false)::Bool
│ %9 = φ (#3 => 1)::Int64
│ %10 = φ (#3 => 1)::Int64
│ %11 = Base.not_int(%8)::Bool
└─── goto #12 if not %11
5 ┄─ %13 = φ (#4 => %9, #11 => %31)::Int64
│ %14 = φ (#4 => %10, #11 => %32)::Int64
│ %15 = φ (#4 => 0, #11 => %24)::Int64
│ %16 = %new(Base.RefValue{Int64}, %13)::Base.RefValue{Int64}
│ %17 = Base.getfield(%16, :x)::Int64
│ %18 = Base.slt_int(%17, 0)::Bool
└─── goto #7 if not %18
6 ── %20 = Base.getindex(%16)::Any
│ Main.throw(%20)::Union{}
└─── unreachable
7 ── %23 = Base.getfield(%16, :x)::Int64
│ %24 = Base.add_int(%15, %23)::Int64
│ %25 = (%14 === %2)::Bool
└─── goto #9 if not %25
8 ── Base.nothing::Nothing
└─── goto #10
9 ── %29 = Base.add_int(%14, 1)::Int64
└─── goto #10
10 ┄ %31 = φ (#9 => %29)::Int64
│ %32 = φ (#9 => %29)::Int64
│ %33 = φ (#8 => true, #9 => false)::Bool
│ %34 = Base.not_int(%33)::Bool
└─── goto #12 if not %34
11 ─ goto #5
12 ┄ %37 = φ (#10 => %24, #4 => 0)::Int64
└─── return %37
) => Int64
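A hedged way to cross-check that `unoptimize_throw_blocks` is the culprit is to re-run inference with that parameter disabled (these are compiler internals, so the exact keyword and constructor signatures may differ across Julia versions):

julia> interp = Core.Compiler.NativeInterpreter(; inf_params = Core.Compiler.InferenceParams(; unoptimize_throw_blocks = false));

julia> code_typed((Int,); interp) do n
           s = 0
           for i in 1:n
               r = Ref(i)
               r[] < 0 && throw(r[])
               s += r[]
           end
           s
       end |> only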
Is the current `EscapeLattice` enough, or does it need more props?

`Vararg` handling …

julia> @noinline function foo(s...)
string(s)
end
foo (generic function with 1 method)
# when is `Vararg` introduced ?
julia> code_typed((String,)) do s
foo(s)
end
1-element Vector{Any}:
CodeInfo(
1 ─ %1 = invoke Main.foo(_2::String)::String
└── return %1
) => String
julia> code_typed((String,)) do s
foo(s, s)
end
1-element Vector{Any}:
CodeInfo(
1 ─ %1 = invoke Main.foo(_2::String, _2::Vararg{String})::String
└── return %1
) => String
julia> code_typed((String,)) do s
foo(s, s, s)
end
1-element Vector{Any}:
CodeInfo(
1 ─ %1 = invoke Main.foo(_2::String, _2::Vararg{String}, _2)::String
└── return %1
) => String
# it seems like inference/optimizer has dealt with `foo(::String, ::String)` !?
julia> only(methods(foo)).specializations
svec(MethodInstance for foo(::String), MethodInstance for foo(::String, ::String), MethodInstance for foo(::String, ::String, ::String), nothing, nothing, nothing, nothing, MethodInstance for foo(::String, ::Vararg{String}))
- `typeassert` can simply be eliminated
- `isdefined` / `isa` / `===` branches (maybe `<:` too?)
julia> code_typed() do
local arr::Vector{Any}
doit() do
arr = Any[]
end
arr
end |> only
CodeInfo(
1 ─ %1 = Core.Box::Type{Core.Box}
│ %2 = %new(%1)::Core.Box
│ %3 = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Vector{Any}, svec(Any, Int64), 0, :(:ccall), Vector{Any}, 0, 0))::Vector{Any}
│ Core.setfield!(%2, :contents, %3)::Vector{Any}
│ %5 = Core.isdefined(%2, :contents)::Bool # <= step 2
└── goto #3 if not %5 # <= step 2
2 ─ goto #4
3 ─ $(Expr(:throw_undef_if_not, :arr, false))::Any # <= step 2
4 ┄ %9 = Core.getfield(%2, :contents)::Any
│ Core.typeassert(%9, Vector{Any})::Vector{Any} # <= step 1
│ %11 = π (%9, Vector{Any})
└── return %11
) => Vector{Any}
observations: we may not be able to prove the effect-free-ness of something (e.g. `typeassert`).
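For instance, `typeassert` is not effect-free in general since it may throw; a quick REPL illustration (exact error text may vary across versions):

julia> Core.typeassert(42, String)
ERROR: TypeError: in typeassert, expected String, got a value of type Int64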
when it does escape only in a limited scope (e.g. into a `throw`), allocate the object only when it is actually thrown:
s = 0
for i in 1:n
arr = [i] # want to eliminate this allocation
s += arr[1] # may throw, so `arr` escapes into the potential `BoundsError`
end
s
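A hand-written sketch of the intended "allocate only when thrown" transformation, applied to the earlier `Ref(i)` / `throw(r)` loop (cf. the `# move r allocation here` comment above); this is what the optimizer would aim for, not actual output:

s = 0
for i in 1:n
    # `r = Ref(i)` is removed from the hot path; the scalar `i` is used directly
    if i < 0
        throw(Ref(i))  # the allocation is sunk into the rarely-taken branch
    end
    s += i
end
s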
let
r = Ref(x)
s = r::Base.RefValue{String}
r[]
end
conditions … ?
dynamic check to enable stack allocation ?
let
# type-tag needs to be on stack (next to the object) so that GC can see it ..?
# 1. immutable points to reference to heap object (`Any`)
# but the `Any` object might be able to be allocated on stack as well
# type-tag (size unknown ?)
# 2. if immutable escapes, allocate it on the heap
s = Some{Any}(10) # <= points to heap allocated (if escapes)
end
- `llvm-alloc-opt.cpp`: SROA pass on heap memory
- `gc.c`: memory management (allocation, finalization)
This optimization can be developed standalone.
For example, we can eliminate the else
branch in the following code snippet w/o escape analysis
if we have more aggressive DCE:
julia> code_typed(()) do
r = Ref{Any}(42)
a = r[]
if isa(a, Int)
return 0
else
return nothing
end
end
1-element Vector{Any}:
CodeInfo(
1 ─ %1 = (42 isa Main.Int)::Bool
└── goto #3 if not %1
2 ─ return 0
3 ─ return Main.nothing
) => Union{Nothing, Int64}
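A hedged sketch of the IR a more aggressive DCE should produce here: the `isa` check on the constant `42` folds to `true`, so only the first branch survives (return-type refinement aside):

CodeInfo(
1 ─ return 0
)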
[1]: There seems to be a way to circumvent this escape. Elaborated here.