# Julia Escape Analysis Project
## Targets
As midterm objectives of this project, we will try to implement the following optimizations:

- Heap-to-stack allocation conversion
- `finalizer` elision
- Enhanced SROA & DCE
## The current state
Julia's compiler does SROA (Scalar Replacement of Aggregates) optimization on two levels:

- Julia-level: `getfield_elim_pass!`
- LLVM-level: LLVM's own SROA pass
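Both levels can be inspected directly (a minimal sketch; `f` is just an illustrative definition, not taken from this note):

```julia
using InteractiveUtils  # for code_llvm

# A trivially "safe" allocation that both SROA levels can remove.
f() = (r = Ref{Int}(1); r[] + 1)

code_typed(f, ())  # Julia-level optimized IR, after getfield_elim_pass!
code_llvm(f, ())   # LLVM IR, after LLVM's own SROA (among other passes)
```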
The purpose of this optimization is to enhance the Julia-level SROA with escape analysis.
The current SROA pass can be described as follows (per #42357):
The important observations are:
- `getfield_elim_pass!` is limited, in the sense that it only handles "safe" allocation sites it can fully analyze (see below).

Let's see successful examples first, and then understand the limitations.
Here, the allocation site `r = Ref{Int}(42)` is "safe" and has enough information, so `r[]` (inlined to `getfield`) will be eliminated, and even more, the allocation and `setfield!` are voided out too:
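A minimal sketch of such a "safe" case (an illustrative function, not necessarily the exact snippet from the original note):

```julia
# Fully initialized at the allocation site, only accessed via getfield/setfield!,
# and never escaping the frame, so Julia-level SROA can scalar-replace it.
function safe_ref()
    r = Ref{Int}(42)
    r[] = r[] + 1     # setfield!/getfield pairs that can be forwarded
    return r[]        # only the scalar value leaves the frame
end

code_typed(safe_ref, ())  # the optimized IR should contain no Ref{Int} allocation
```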
requires "safe" allocation. So if we initializer = Ref{}()
, then the whole optimization won't work:Still the later LLVM-level SROA may be able to optimize it further, and it seems to be more powerful than Julia-level one in general, e.g. it can "fully" optimize the above code (the argument of
@j_sin_4761
is replaced with the scalar valuedouble 4.20000e+01
, and we don't have any allocation)Now let's bother LLVM – we can just add non-inlined function call, then LLVM can't help giving up the optimization:
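A sketch of that situation, with illustrative names rather than the original snippet:

```julia
# The callee never lets `r` escape, but because it is not inlined, neither the
# Julia-level pass nor LLVM can see that, so the allocation survives.
@noinline use(r) = r[] + 1

function opaque_call()
    r = Ref{Int}(42)
    return use(r)     # non-inlined call: both SROA levels conservatively bail out
end

code_typed(opaque_call, ())
```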
LLVM can also be confused by some other Julia-level semantics besides inter-procedural calls. For example, `typeassert` can pretty much null out LLVM SROA: neither Julia-level nor LLVM-level SROA manages to eliminate closure constructions in the presence of a `typeassert`.

## Enhanced SROA & More aggressive DCE
We want "better" Julia-level SROA and DCE. Especially, it should be:
typeassert
/isdefined
/isa
checks more aggressively so that we can eliminate more allocations[1]Target examples:
typeassert
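A hedged sketch of this kind of target (`typeassert_blocks_sroa` is an assumed, illustrative function): the `typeassert` lowered from the `::Int` annotation is what keeps the allocation alive, even though it can never fail.

```julia
# `r[]` has inferred type `Any`, so the `::Int` annotation lowers to a
# `typeassert` call; the Julia-level SROA of the time doesn't look through it,
# and the `Ref{Any}` allocation survives.
function typeassert_blocks_sroa()
    r = Ref{Any}(42)
    return (r[]::Int) + 1
end

code_typed(typeassert_blocks_sroa, ())
```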
## Heap-to-stack allocation conversion
### The idea
Julia allocates every actually-allocated mutable object on the heap, while in an ideal world we would place objects on the stack whenever their lifetime doesn't escape the execution frame.
An obvious example would be like this – Julia will allocate `MyRef("foo")` even though it doesn't escape anywhere, because the `@noinline`d `getx` call prevents both Julia-level and LLVM-level SROA, so they can't eliminate the allocation:
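A minimal sketch of that example (`MyRef` and `getx` are illustrative definitions assumed here, since the original snippet isn't preserved):

```julia
mutable struct MyRef
    x::String
end

# Not inlined, so neither SROA level can look through this call,
# even though it never lets the object escape.
@noinline getx(r::MyRef) = r.x

function no_escape()
    r = MyRef("foo")   # heap-allocated today, although its lifetime ends in this frame
    return getx(r)
end

code_typed(no_escape, ())
```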
MyRef("foo")
on stack rather than heap.Do we really need this ?
We worked on this optimization in the first project iteration (the GSoC term), but for now I'd like to put a lower priority on it:
### Allocation movement
TODO: elaborate this.
## `finalizer` elision

Julia finalizers are managed by the GC, so they may hang around for a while, for as long as the GC lets the mutable object hang around. This means we may end up consuming a lot of memory when finalizers are associated with e.g. resource management, which is a common practice seen e.g. in GPU programming.
The idea of this optimization is to call a finalizer early, by inserting a `finalize(obj)` call when we can prove that the lifetime of `obj` finishes within a frame (a sketch follows):
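A hedged before/after sketch of the idea (`Resource`, `free`, and `do_work` are hypothetical names):

```julia
mutable struct Resource
    handle::Int
end

free(r::Resource) = println("releasing handle ", r.handle)
do_work(h) = h + 1   # stand-in for real work

function use_resource()
    r = Resource(42)
    finalizer(free, r)   # today: `free` runs whenever GC eventually collects `r`
    do_work(r.handle)
    # If the compiler can prove `r`'s lifetime ends here, it could insert
    #     finalize(r)
    # so the resource is released eagerly instead of waiting for GC.
    return nothing
end
```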
### Considerations

- … `finalize(obj)`?
- `finalize(obj)` is super costly!
- … `finalizer`?
- (… `finalize` on the return site)?

## Pipeline Design
Even if we have a useful inter-procedural cache, escape analysis and SROA are better run after inlining, simply because inter-procedural handling becomes much easier: we can just look for the `:invoke` expressions that the inliner has already resolved (a quick illustration below).
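A quick illustration of that point, as a sketch over `code_typed` output rather than the actual pipeline code (`caller`/`callee` are made-up names):

```julia
using Base.Meta: isexpr

# After inlining, statically-resolved calls appear as
# `Expr(:invoke, ::MethodInstance, f, args...)` in the optimized IR,
# so inter-procedural handling reduces to pattern-matching these.
function collect_invokes(f, argtypes)
    ci, _ = only(code_typed(f, argtypes))   # optimized (post-inlining) CodeInfo
    return [stmt for stmt in ci.code if isexpr(stmt, :invoke)]
end

@noinline callee(x) = x + 1
caller(x) = callee(x) * 2

collect_invokes(caller, (Int,))   # contains the resolved `:invoke` of `callee`
```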
Caching:

- … `CodeInstance`, and …

## Development Plan
### Schedule
- `AbstractInterpreter` …
- `AbstractInterpreter` …
## Discussions
NOTE: the agenda for the most recent meeting should come first, before the others.
### 2021/12/17
- For `ImmutableArray` (`mutating_arrayfreeze`): alias analysis?
- `BoundsError` … `BoundsError` … lifetime information?
- focus:
  - `ImmutableArray`: `Union`-type, inter-procedural
  - `ImmutableArray` PR
  - taint analysis
### 2021/11/24
Agenda:

- Implementation of field/alias analysis for enhanced SROA
- Allocation-elimination targets
(two embedded images did not survive export)

Design of analysis/optimization:
- multiple SROA: benchmark
- backward escape/alias/field analysis & load forwarding by domination analysis
- backward escape/alias analysis & forward field propagation
  - `EscapeAnalysis.jl/avi/flowsensitive-aliasanalysis`
- forward escape/alias/field analysis
Discussion
"Non-semantic" copy for stack allocation
Problem: …

Details:

- … (`finalize` call)
- `err` … through the evil `jl_current_exception`

Ideas?
- `MethodError`: static
- `BoundsError`: runtime
- `throw`?

### 2021/11/10
Agenda:

- [ ] discussion on "non-semantic" copy

When to run EA
tl;dr: do escape analysis at optimization time.
reasons:

- `ImmutableArray`: copy elision

more thoughts:

- `Const` and `PartialStruct` … (… `Conditional`)

more details:
- `ImmutableArray`
- inter-procedural … `CodeInstance`?
- `stdout` … `MethodInstance`, "x" … `MethodInstance` affect the state of "x" … `MethodInstance`
### 2021/10/13
Agenda:

- `ImmutableArray` PR
- `AbstractInterpreter` can help benchmarking
- `Vararg` handling is complicated, any hints?

Optimization ideas
- `ImmutableArray`
- "allocation movement"
- … (`typeassert`)
- … `Array`s?

so what is needed?

- is `EscapeLattice` enough, or does it need more props?
- `Vararg` handling …

### 2021/10/06
Next actions
DCE plan:

- `typeassert` can simply be eliminated
- `isdefined`/`isa`/`===` branches (maybe `<:` too?)

More optimization ideas
observations:

- we may not be able to prove effect-free-ness of something (e.g. `typeassert`)
- when it does escape only in a limited scope, only allocate the object when thrown
  - conditions …?
- dynamic check to enable stack allocation?
  - don't need analysis …?
- `llvm-alloc-opt.cpp`: SROA pass on heap memory
- `gc.c`: memory management (allocation, finalization)

### 2021/09/29
Agenda:

- more optimization ideas?
This optimization can be developed standalone. For example, we can eliminate the `else` branch in the following code snippet w/o escape analysis if we have more aggressive DCE:

```julia
julia> code_typed(()) do
           r = Ref{Any}(42)
           a = r[]
           if isa(a, Int)
               return 0
           else
               return nothing
           end
       end
1-element Vector{Any}:
 CodeInfo(
1 ─ %1 = (42 isa Main.Int)::Bool
└──      goto #3 if not %1
2 ─      return 0
3 ─      return Main.nothing
) => Union{Nothing, Int64}
```
[^1]: There seems to be a way to circumvent this escape. Elaborated here.