EscapeAnalysis.jl Overhaul

--- file-created: 2025-01-05 file-modified: 2025-02-01T21:15 --- #julia #design-docs #escape-analysis This document discusses the overhaul of the existing Julia-level escape analysis (a.k.a. `EscapeAnalysis.jl`), aiming to enhance its precision and efficiency while enabling optimizations for practical use cases. The current overhaul particularly focuses on achieving the following objectives: - Flow-sensitive load forwarding - Inter-procedural alias analysis - Improved SROA (Scalar-Replacement-of-Aggregates) with allocation sinking # Motivative Examples ## Optimizing Closure Call with Flow-Sensitive Load-Forwarding The primary focus of optimization this time is "capturing closures." For example, consider the following simple code: ```julia function issue15276_1(a) local x return (function (a) x = sin(a) return a + x end)(a) end ``` ```julia julia> descend(issue56561_1, (Float64,); optimize=true) issue56561_1(a) @ Main REPL[15]:1 Body::Any (!c,!e,!n,!t,!s,!m,!u,!o,!r) 2 1 ─ %1 = %new(Core.Box)::Core.Box │╻ Box 3 │ %2 = %new(Main.:(var"#issue56561##4#issue56561##5"), %1)::var"#issue56561##4#issue56561##5" │ %3 = invoke %2(a::Float64)::Any │ └── return %3 │ Select a call to descend into or ↩ to ascend. [q]uit. [b]ookmark. Toggles: [w]arn, [h]ide type-stable statements, [t]ype annotations, [s]yntax highlight for Source/LLVM/Native, [j]ump to source always, [o]ptimize, [d]ebuginfo, [r]emarks, [e]ffects, e[x]ception types, [i]nlining costs. Show: [S]ource code, [A]ST, [T]yped code, [L]LVM IR, [N]ative code Actions: [E]dit source code, [R]evise and redisplay Advanced: dump [P]arams cache. • %3 = invoke #issue56561##4(::Float64) (!c,!e,!n,!t,!s,!m,!u,!o,!r) ↩ (::var"#issue56561##4#issue56561##5")(a) @ Main REPL[15]:3 Body::Any (!c,!e,!n,!t,!s,!m,!u,!o,!r) 4 1 ─ %1 = invoke Main.sin(a::Float64)::Float64 │ │ %2 = builtin Core.getfield(#self#, :x)::Core.Box │ │ builtin Core.setfield!(%2, :contents, %1) │ 5 │ %4 = builtin Core.getfield(#self#, :x)::Core.Box │ │ %5 = builtin Core.isdefined(%4, :contents)::Bool │ └── goto #3 if not %5 │ 2 ─ %7 = builtin Core.getfield(%4, :contents)::Any │ │ %8 = dynamic (a + %7)::Any │ └── return %8 │ 3 ─ $(Expr(:throw_undef_if_not, :x, false)) │ └── unreachable │ Select a call to descend into or ↩ to ascend. [q]uit. [b]ookmark. Toggles: [w]arn, [h]ide type-stable statements, [t]ype annotations, [s]yntax highlight for Source/LLVM/Native, [j]ump to source always, [o]ptimize, [d]ebuginfo, [r]emarks, [e]ffects, e[x]ception types, [i]nlining costs. Show: [S]ource code, [A]ST, [T]yped code, [L]LVM IR, [N]ative code Actions: [E]dit source code, [R]evise and redisplay Advanced: dump [P]arams cache. • %1 = invoke sin(::Float64)::Float64 (+c,+e,!n,+t,+s,+m,+u,+o,+r) %8 = runtime < +(::Float64,::Any)::Any (!c,!e,!n,!t,!s,!m,!u,!o,!r) … ↩ ``` As discussed in [JuliaLang/julia#15276](https://github.com/JuliaLang/julia/issues/15276), even simple code like this can fail to perform type inference correctly when captured variables are present. Resolving this issue during the abstract interpretation phase (a.k.a. "type inference") would require escape analysis. However, since the results of escape analysis are most useful in the optimization phase, this document focuses on performing analysis directly on `IRCode`, similar to what `EscapeAnalysis.jl` currently does. (For further details, see [[#Idea Integrate Alias Analysis into Abstract Interpretation?]]). Examining the generated code closely reveals that in this case, the captured variables represented by `%2` and `%4` are (aliased to) mutable struct, and their allocation doesn't happen in the closure call (`(::var"#issue56561##4#issue56561##5")(a)`). As a result, load-forwarding when optimizing the call of the closure is not performed. However, by analyzing the code's data flow, it is immediately clear that `%1` is aliased to the field of `%2` and `%4`. If we could analyze such flow-sensitive[^1] alias information, it would be possible to derive new type information that could not be determined during the abstract interpretation phase. For instance, we could infer that the type of `%7 = builtin Core.getfield(%4, :contents)::Any` is `Float64`. With the improved type information for `%7`, we could [re-infer](https://github.com/JuliaLang/julia/pull/56687) `%8 = dynamic (a + %7)::Any`, not only improving the return type but also enabling inlining of the closure call into the caller context. This would allow Scalar Replacement of Aggregates (SROA) for the `Core.Box` allocation of the captured variable, resulting in significant performance improvements. ## Optimizing Closure Call with Inter-Procedural Alias Analysis So far, we have examined cases where captured variables are assigned within the closure. Now, let us consider cases where captured variables are assigned outside the closure: ```julia function issue56561_2(a) x = sin(a) x = identity(x) return (a -> a + x)(a) end ``` ```julia julia> only(Base.code_ircode(issue56561_2, (Float64,))) 2 1 ─ %1 = %new(Core.Box)::Core.Box │╻ Box │ %2 = invoke Main.sin(_2::Float64)::Float64 │ │ builtin Core.setfield!(%1, :contents, %2)::Float64 3 │ %4 = builtin Core.isdefined(%1, :contents)::Bool │ └── goto #4 if not %4 │ 2 ─ nothing::Nothing │ 3 ─ %7 = π (%1, Core.PartialStruct(Core.Box, Any[Any])) │ │ %8 = builtin Core.getfield(%7, :contents)::Any │ │ builtin Core.setfield!(%7, :contents, %8)::Any │ 4 │ %10 = %new(Main.:(var"#issue56561##2#issue56561##3"), %7)::Core.PartialStruct(var"#issue56561##2#issue56561##3", Any[Core.PartialStruct(Core.Box, Any[Any])]) │ %11 = invoke %10(_2::Float64)::Any │ └── return %11 │ 3 4 ─ $(Expr(:throw_undef_if_not, :x, false))::Any │ └── unreachable │ => Any ``` In this case, load-forwarding isn't performed for the captured variable represented by `%1 = %new(Core.Box)`. However, as before, tracing the data flow of the code makes it evident what values are stored in the fields of `%1` and `%7`. Using that alias information, the type of `%10 = %new(Main.:(var"#issue56561##2#issue56561##3"), %7)` could be refined to something like `::Core.PartialStruct(var"#issue56561##2#issue56561##3", Any[Core.PartialStruct(Core.Box, Any[Float64])])`. Using the improved type for `%10`, it would then be possible to re-infer the type of the call `invoke %10(…)`, thereby improving the return type. Similarly to the above case, this would also enable inlining of `invoke %10`, significantly enhancing performance. However, care must be taken regarding the seemingly straightforward statement that this "the type of `%10` could be refined to something like `::Core.PartialStruct(…, Any[Core.PartialStruct(Core.Box, Any[Float64])])`". This is because propagating the non-`const` fields of a mutable struct with a refined type beyond its declared type as a `PartialStruct` is generally invalid when mutability is involved (since it is unclear when and where those fields may be overwritten). Load-forwarding avoids such constraints by explicitly tracking the propagation of "definition"/"load" information. In this sense, what is effectively required here can be described as "inter-procedural load-forwarding." To "improve" the type of `%10` and subsequently re-infer `invoke %10`, it must be ensured that the fields of `%10` are not modified within `invoke %10`. In other words, inter-procedural alias analysis is necessary to propagate alias information from the callee back to the caller context (`issue56561_2`). For further details on optimizing cases where captured variables are modified both inside and outside closures, refer to [[#On Cases Where Captured Variables Are Modified Both Inside and Outside Closures]]. ## Allocation Optimization with "Sinking" If we could implement escape/alias analysis capable of optimizing the above capturing closures, we could also consider "allocation sinking" as an allocation optimization for cases like the following (xref: [[#Partial Escape Analysis and Scalar Replacement for Java]]). For example, consider the following code: ```julia mutable struct Key idx::Int ref Key(idx::Int, @nospecialize(ref)) = new(idx, ref) end import Base: == key1::Key == key2::Key = key1.idx == key2.idx && key1.ref === key2.ref global cache_key::Key global cache_value function get_value(idx::Int, ref) global cache_key, cache_value key = Key(idx, ref) if key == cache_key return cache_value else cache_key = key cache_value = create_value(key) return cache_value end end ``` ```julia julia> only(Base.code_ircode(get_value, (Int,Any))) 3 1 ── %1 = %new(Main.Key, _2, _3)::Key │╻ Key 4 │ %2 = Main.cache_key::Key │ │ %3 = builtin Base.getfield(%1, :idx)::Int64 │╻ == │ %4 = builtin Base.getfield(%2, :idx)::Int64 ││┃ getproperty │ %5 = builtin (%3 === %4)::Bool ││╻ == └─── goto #3 if not %5 ││ 2 ── %7 = builtin Base.getfield(%1, :ref)::Any ││╻ getproperty │ %8 = builtin Base.getfield(%2, :ref)::Any │││ │ %9 = builtin (%7 === %8)::Bool ││ └─── goto #4 ││ 3 ── goto #4 ││ 4 ┄─ %12 = φ (#2 => %9, #3 => false)::Bool │ └─── goto #6 if not %12 │ 5 5 ── %14 = Main.cache_value::Any │ └─── return %14 │ 6 ── nothing::Nothing │ 7 ── nothing::Nothing │ 7 8 ── (Main.cache_key = %1)::Key │ 8 │ %19 = Main.create_value::Any │ │ %20 = dynamic (%19)(%1)::Any │ │ %21 = builtin Core.get_binding_type(Main, :cache_value)::Type │ │ %22 = builtin (%20 isa %21)::Bool │ └─── goto #10 if not %22 │ 9 ── goto #11 │ 10 ─ %25 = dynamic Base.convert(%21, %20)::Any │ 11 ┄ %26 = φ (#9 => %20, #10 => %25)::Any │ │ (Main.cache_value = %26)::Any │ 9 │ %28 = Main.cache_value::Any │ └─── return %28 │ => Any ``` Here, "allocation sinking" refers to the following optimizations: - First, use context-sensitive load-forwarding to replace the following `getfield` calls with the stored values: - `%3 = builtin Base.getfield(%1, :idx)::Int64` => `%3 = _2` - `%7 = builtin Base.getfield(%1, :ref)::Any` => `%7 = _3` - Next, move ("sink") the allocation of `%1 = %new(Main.Key, _2, _3)::Key` to blocks after the basic block 8, thereby eliminating the allocation in cases where the condition `if key == cache_key` is hit. - Note that to sink the allocation to blocks after the basic block 8, it is necessary to track the liveness of `%1`. - The basic block 8 is the nearest common dominator of the liveness of `%1` except those eliminated `getfield` calls # Understading Escape/Alias Analysis - Alias-analysis is required for implementing a correct escape analysis - **"may" vs. "must" alias** - "may-alias": Essential for correct escape analysis (conservative information) - "must-alias": Necessary for actual optimizations such as SROA - Separately analyzing these pieces of information—first resolving "may-alias" and then constructing "must-alias"—might be inefficient. - **Backward/Forward** - **Forward** analysis: effective for analyzing **alias** information - Since aliasing happens at "definition" sites - e.g. aliasing via assignment, aliasing via `setfield!` to `getfield` chain - **Backward** analysis: effective for analyzing **escape** information - Since escape happens at "use" sites - **Control-Flow Insensitive vs Control-Flow Sensitive** - Analysis along the control flow enables more load-forwarding even for objects that later escape. - Alternatively, by moving the allocation site ("allocation sinking"), it is also possible to reduce allocations in specific branches. - Flow-sensitive analysis is made easier with forward analysis. # New Analysis Design ## Definitions ```julia abstract type MemoryInfo end struct UninitializedMemory <: MemoryInfo end struct AliasedMemory <: MemoryInfo alias::Any # anything that is valid as IR elements (e.g. `SSAValue`, `Argument`, `GlobalRef`, literals), or `IdSet{Any}` of them maybeundef::Bool # required when `AliasedMemory` is merged with `UninitializedMemory` end abstract type ObjectInfo end struct HasUnanalyzedMemory <: ObjectInfo end struct HasUnknownMemory <: ObjectInfo end struct HasIndexableMemory <: ObjectInfo MemoryInfos::Vector{MemoryInfo} end struct EscapeInfo ReturnEscape::BitSet ThrownEscape::BitSet Escape::BitSet ObjectInfo::ObjectInfo end struct BlockEscapeState escapes::Vector{EscapeInfo} # escapes for `Argument`s and `SSAValue`s aliasset::AliasSet # disjoint set to maintain groups of aliased values mustaliased::BitVector end struct EscapeState bbescapes::Vector{BlockEscapeState} ssamemoryinfo::Vector{MemoryInfo} # keep statement-wise memory info for SROA optimization end ``` ## Local Analysis - Forward, Flow-sensitive Analysis: - Manage the analysis state similarly to `typeinf_local`. - Maintain escape and alias information for each basic block. - Use a working set-based algorithm to avoid unnecessary traversals. - The analysis targets the escape/alias information of `SSAValue` and `Argument`. - For simplicity, the analyzed `SSAValue` and `Argument` are collectively referred to as **"values"** in the following. - `aliasset` manages "alias groups," which group aliasing "values." - It is efficient to implement this as a disjoint set. - Aliases represented by `aliasset` is usually "may-alias" possibilities - `mustaliased` is a `BitVector` of length `length(values)` that indicates whether each value is "must-alias." - While only managing `aliasset` is sufficient to propagate escapes, `mustaliased` is necessary to allow strict updates of `ObjectInfo`, as described later. - Thus, this analysis must distinguish between "may-alias" and "must-alias" when handling aliasing. - E.g., `y = PiNode(x, typ)` results in `x` and `y` being "must-alias," whereas `z = PhiNode(x, y)` makes `z`, `x`, and `y` "may-alias." - Think about the following example: `builtin Base.setfield!(%7, :x, "foo")::String` shouldn't strongly-update the `ObjectInfo` of `%7`, which may be aliased to `%2`, `%4` and `%5`. ```julia julia> Base.code_ircode((Bool,String,String,)) do c, x, y local w if c z = w = Ref(x) else z = Ref(y) end if !c z[] = "foo" end if @isdefined w return w[] end nothing end 1-element Vector{Any}: 3 1 ─ goto #3 if not _2 │ 4 2 ─ %2 = %new(Base.RefValue{String}, _3)::Base.RefValue{String} Ref └── goto #4 ││┃│ RefValue 6 3 ─ %4 = %new(Base.RefValue{String}, _4)::Base.RefValue{String} Ref 8 4 ┄ %5 = φ (#2 => %2, #3 => #undef)::PartialStruct(Base.RefValue{String}, Any[String]) │ %6 = φ (#2 => true, #3 => false)::Bool │ │ %7 = φ (#2 => %2, #3 => %4)::PartialStruct(Base.RefValue{String}, Any[String]) │ %8 = intrinsic Base.not_int(_2)::Bool │╻ ! └── goto #6 if not %8 │ 9 5 ─ builtin Base.setfield!(%7, :x, "foo")::String │╻╷ setindex! 11 6 ┄ goto #8 if not %6 │ 12 7 ─ %12 = builtin Base.getfield(%5, :x)::String │╻╷ getindex └── return %12 │ 8 ─ return Main.nothing │ => Union{Nothing, String} ``` - `escapes` is a `Vector{EscapeInfo}` of length `length(values)` that manages the escape information of "values." - "Values" within the same "alias group" share the same `EscapeInfo`. - Propagation of `BlockEscapeState`: - The impact of control flow must be expressed through the merging of `BlockEscapeState`s from branches. - `escapes` can be merged element-wise by applying the `EscapeInfo` merge operator: `⊔ₑ`/`⊑ₑ`. - [x] #COMBAK Think about how we can merge `aliasset`s from different blocks [created:: 2024-11-29] - The method for merging two different disjoint sets is not entirely straightforward, but I believe it can be achieved using the following approach: By traversing all the elements that belong to the same set in the source disjoint set `curraliasset` and performing `union!` operations in the target disjoint set `nextaliasset` to ensure that these elements belong to the same set: ```julia function merge_aliassets!(curraliasset::AliasSet, nextaliasset::AliasSet) anychanged = false for xidx = 1:length(curraliasset.parents) xroot = find_root!(curraliasset, xidx) if xroot ≠ xidx if !in_same_set(nextaliasset, xidx, xroot) union!(nextaliasset, xidx, xroot) anychanged |= true end end end return anychanged end ``` - About `ObjectInfo`: - Initialized with `HasUnanalyzedMemory`. - Updates to `ObjectInfo` for `x` occur with expressions like `x = Expr(:new, …)` or `setfield!(x, …)`: - `x = Expr(:new, typ, a, b, c)` updates `x.ObjectInfo` to something like `HasIndexableMemory(MemoryInfo[AliasedMemory(a, false), AliasedMemory(b, false), AliasedMemory(c, false)])`. - Similarly, `setfield!(x, f, v)` updates the relevant field in `x.ObjectInfo.MemoryInfos`. - Note there are two types of updates: - **Strong update**: Destructively updates any existing `MemoryInfo` for `x`. - This is particularly important for flow-sensitive load-forwarding. - Strong updates are only possible if `x` does not alias with anything or if the alias group of `x` is "must-alias." - **Weak update**: Merges new `MemoryInfo` with the existing `MemoryInfo` of `x`. - If `x` escapes anywhere, its `ObjectInfo` must be set to `HasUnknownMemory`. - In such cases, `a`, `b`, `c`, and `v` are "may-alias" with `x`, and their escape information is shared with `x`. - If the number of fields in `x` is unknown, an alternative like `HasUnindexableMemory` may be considered. However, since optimization in such cases is unlikely, this is not prioritized. - Operators `⊔ₘ`/`⊑ₘ` need to be implemented. - The effect of `y = getfield(x, …)` depends on the state of `x`: - If `x` has `HasUnknownMemoryInfo`: - Consider the possibility that `y` aliases with `x`. - Share `x`'s escape information with `y` (via aliasing). - If `x` has `HasIndexableMemoryInfo`: - Lookup the corresponding field in `MemoryInfo` for `y`. - If it is `UninitializedMemoryInfo`, `UndefRefError` error must occur at runtime. - If it is `AliasedMemoryInfo`, aliasing occurs between `y` and the "values" in `alias`, and their escape information is shared. - `ssamemoryinfo`: - Particularly useful for subsequent load-forwarding optimizations, it tracks aliased values in `getfield` calls. - The resulting `EscapeState` ultimately retains only the block-level escape information (`bb_escapes`). - However, for multiple `setfield!`/`getfield` calls within the same basic block, statement-wise information may be needed. - Alternatively, a `Dict{Int, MemoryInfo}` might be more efficient. - Computing global escape information from `EscapeState`: ^globalescapeinfo - Merge the escape information from all blocks. - Such merging is necessary to obtain information for the entire method when caching escape information for an `Argument` to propagate it to the caller context ([[#How to Cache the Escape Information of the "callee-context"]]). ## Inter-Procedural Analysis ### How to Cache the Escape Information ```julia struct ArgEscapeInfo ReturnEscape::Bool ThrownEscape::Bool Escape::Bool ObjectInfo::ObjectInfo end struct ArgAliasing # records aliasing between arguments aidx::Int # index into `argescapes` bidx::Int # index into `argescapes` end struct ArgEscapeCache argescapes::Vector{ArgEscapeInfo} argaliases::Vector{ArgAliasing} end ``` Here, `ArgEscapeInfo` can be created by converting from the merged `EscapeInfo` for an `Argument` across all blocks[[#^globalescapeinfo]]. This conversion is mostly straightforward, but one important consideration is that additional handling is required if the `alias` field of `ObjectInfo::AliasedMemory` contains `SSAValue`. Let's move this discussion forward using the following concrete example: ```julia const gyy = Ref(Ref("julia")) function someunsafefunc() println("someunsafefunc called") gyy[] = Ref("julia2") end function callerfunc(a::String, b::String) xxx = Ref(Ref(Ref(a))) y = Ref(b) calleefunc(xxx) xxx[][] = y someunsafefunc() return xxx[][][] # returns `b` end @noinline function calleefunc(xxx) yy = Ref(Ref("julia")) xxx[] = yy nothing end ``` `yy` escapes only through being stored in the field of `xxx`, which is an argument of `calleefunc`, but since it is replaced by `y` within `callerfunc`, it does not escape `callerfunc`. Moreover, if these modifications to the fields of `xxx` are correctly taken into account, it is also possible to deduce that `xxx[][][]` returns `b` at the final `return` statement. When considering escape analysis for this example, it is possible that the `ObjectInfo` of `xxx` within `callerfunc` retains `SSAValue`s corresponding to `Ref(Ref(a))` and `yy = Ref(Ref("julia"))`. What represented by the latter is essentially a memory in `calleefunc`, and that `SSAValue` does not hold meaning in the context of the local analysis on `callerfunc`. Thus, it must be converted into something sensible in the context of `callerfunc` memory representation and cached appropriately. As a specialized data structure to represent such "memory within the callee-context," this document proposes `CalleeMemory`. Two approaches to handling this data can be considered: 1. Treat `CalleeMemory` as unanalyzable within local escape analysis. 2. Consider the identity of each `SSAValue` representing memory in the callee and correspondingly create and cache `CalleeMemory` for each, taking into account the escape information and aliasing for each `CalleeMemory`. The approach 1 is much simpler than the approach 2. In essence, it gives up on analyzing memory set into the fields of arguments passed to a callee during the invocation of the callee. Applied to the example above, this means the memory held by `xxx[]` in `calleefunc` is considered unanalyzable. Consequently, in `callerfunc`, the statement `xxx[][] = y` would analyze `y` as potentially escaping to an unknown location[^2]. [^2]: Think `calleefunc` may be defined like `@noinline calleefunc(xxx) = (xxx[] = gyy; nothing)`. In contrast, the approach 2 should retain precision, allowing for accurate analysis that `y` does not escape (in the context of `callerfunc`) at `callerfunc`'s `xxx[][] = y`. ~~On the other hand, if `CalleeMemory` represents distinct memory within the context of `calleefunc`, it necessitates an extension of the local escape analysis state for `callerfunc`. Specifically, the definition of `BlockEscapeState` must be expanded so that its `escapes` and `aliasset` can accommodate individual instances of `CalleeMemory`. It is not difficult to imagine that implementing such an extension would be highly complex. Moreover, it is not immediately clear how to propagate such `CalleeMemory` further to the caller of `callerfunc`.~~ ~~And also, even if it cannot be proven that `y` does not escape `callerfunc`, the flow-sensitive escape analysis for `callerfunc` would still allow load-forwarding for `xy`. This means that the optimization targeted in this discussion can still be achieved with the approach 1, even if it doesn't achieve the best analysis precision.~~ ~~In conclusion, due to these implementation challenges, the approach 1 will be adopted for this iteration. `CalleeMemory` represents some memory within the callee-context. However, the memory referenced by `CalleeMemory` is indistinguishable and its escape information is unanalyzable and must be treated effectively as a mutable `GlobalRef`. Therefore, `CalleeMemory` can simply be a singleton that acts as a special token. When converting `EscapeInfo` into `ArgEscapeInfo`, all `SSAValue`s stored in `(ArgEscapeAnalysis.ObjectInfo::AliasedMemory).alias` are converted into `CalleeMemory`. Additionally, any other memory that aliases with these `SSAValue` entries must also be treated as escaping.~~ From an implementation perspective, the approach 1 is significantly simpler and more straightforward than the approach 2. This is because, in approach 1, `CalleeMemory` represents the memory in the callee context without distinguishing between identities. In contrast, approach 2 requires extending both `BlockEscapeState` and `ArgEscapeCache` to distinguish the identities of each `CalleeMemory`. Now, let's discuss how approach 2 can be implemented in practice: ```julia struct CachedCalleeMemory id::Int end struct CachedEscapeInfo ReturnEscape::Bool ThrownEscape::Bool Escape::Bool ObjectInfo::ObjectInfo end struct CachedAliasInfo # records aliasing between cached elements aidx::Int # index into `escapes` bidx::Int # index into `escapes` end struct EscapeCache escapes::Vector{CachedEscapeInfo} # `escapes` now may have more than `nargs` elements, including `CachedEscapeInfo` for `CachedCalleeMemory(nargs+id)` and the returned value aliases::Vector{CachedAliasInfo} end struct CalleeMemory id::Int end struct BlockEscapeState escapes::Vector{EscapeInfo} # escapes for `Argument`s and `SSAValue`s aliasset::AliasSet # disjoint set to maintain groups of aliased values mustaliased::BitVector end struct EscapeState bbescapes::Vector{BlockEscapeState} ssamemoryinfo::Vector{MemoryInfo} # keep statement-wise memory info for SROA optimization callee_memory_map::Dict{Int,UnitRange{Int}} # pc -> startindex:endindex? end ``` > [!note] > The above example appears to be analyzed by the current EscapeAnalysis.jl as like what the approach 2 would achieve, but this is due to the current implementation of EscapeAnalysis.jl failing to properly propagate the `EscapeInfo.AliasInfo` of `Argument`. So this is actually a correctness issue in EscapeAnalysis.jl. ### How to Initialize the `EscapeInfo` of `Argument` Arguments同士のエイリアスの可能性を考慮した場合、ローカル解析の正しさが保証できなくなる？ Consider `CallerMemory(id::Int)`, which represents memory passed from the caller. `CallerMemory` is a virtual IR element that holds the same meaning as `Argument` in the escape lattice. `CallerMemory` can be thought of as a generalized form of `Argument`. In the following, the escape information associated with `CallerMemory` is expressed as `CallerEscape()`. It is important to note that `CallerEscape()` is inherently unanalyzable. Specifically, its `ThrownEscape`, `ObjectInfo` and `Liveness` cannot be accurately determined, as it is unclear when and how they may change. Similarly, its alias information cannot be precisely tracked. For example, even in a very simple case like the one below, load-forwarding cannot be performed (because the potential aliasing between `a` and `b` must be considered): ```julia code_escapes((Base.RefValue{Char},Base.RefValue{Char})) do a, b a[] = 'a' b[] = 'b' a[] # might be `'b'` end ``` However, ==it is possible to observe and record the escape information==, i.e., while direct optimizations (including load-forwarding) cannot be performed on `CallerMemory`, we can still record escape information for `CallerMemory` while avoiding widening it to `⊤` and expand it in the caller context. This can help improve the precision of analysis within the caller context. Now, let us discuss how to analyze objects that possess `CallerEscape` in details. #TODO ~~Next, we consider how `EscapeInfo` for `Argument` should be handled in the first place.~~ ~~First, since all `Arguments` are passed from the caller context, their `ReturnEscape` should be initialized as `ReturnEscape(0) == BitSet((0,))`. Additionally, we need to account for the possibility of "may-alias" between `Argument`s[[#^argalias]]. Conservatively, we must consider the aliasing possibility for all `Arguments`. However, one idea to reduce the "may-alias" possibilities is to use type information[[#^type-based-alias-analysis]]. Here, we are focusing on the aliasing possibilities between `Argument`s themselves, so just performing `hasintersect`-like analysis on the argument types would be sufficient enough, while the aliasing possibilities between `Arguments` and their fields will be discussed later. #COMBAK Additionally, in some cases, it might be worth considering inter-procedural forward escape analysis that directly propagates alias information from the caller context, similar to const-prop' in type inference.~~ ~~Next, we will discuss how to handle aliasing between memory in the fields of an `Argument` and the `Argument` itself or other IR elements, using the following example:~~ ```julia const gxs = Ref{Any}() function callerfunc(s::String) x = Ref(s) calleefunc(x, gxs) nothing end function calleefunc(x, xs) xs[] = x # `x` may escape here return x end # aliasing between `HasIndexableMemoryInfo.alias` function callerfunc(s::String) zs = Ref(s) xxs = Ref(zs) yys = Ref(zs) calleefunc(xxs, yys) end function calleefunc(xxs, yys) xs = xxs[] gxs[] = xs return yys[] end ``` - The `ObjectInfo` of `Argument` is initialized as either `HasUnknownMemory` or `HasIndexableMemoryInfo`. - If it is `HasIndexableMemoryInfo`, it must represent that it is aliased with memory in the caller context. - To achieve this, a special element `struct CallerMemory` is introduced to represent "caller context memory" as an allowable element stored in the `alias` field of `AliasedMemory`. ```julia struct CachedCallerMemory argidx::Int fldidx::Int end struct EscapeCache escapes::Vector{CachedEscapeInfo} # `escapes` now may have more than `nargs` elements, including `CachedEscapeInfo` for `CachedCalleeMemory(nargs+id)` and the returned value and `CachedCallerMemory` aliases::Vector{CachedAliasInfo} num_callee_memory::Int num_caller_memory::Int end struct CallerMemory argidx::Int fldidx::Int end struct BlockEscapeState escapes::Vector{EscapeInfo} # escapes for `Argument`s and `SSAValue`s aliasset::AliasSet # disjoint set to maintain groups of aliased values mustaliased::BitVector end struct EscapeState bbescapes::Vector{BlockEscapeState} ssamemoryinfo::Vector{MemoryInfo} # keep statement-wise memory info for SROA optimization callee_memory_map::Dict{Int,UnitRange{Int}} # pc -> startindex:endindex? num_caller_memory::Int end ``` ### How to Convert the `ArgEscapeCache` into the "caller-context" - [ ] #COMBAK Think about `ArgEscapeCache` conversion [created:: 2024-11-29] # References ## Partial Escape Analysis and Scalar Replacement for Java - [[Partial Escape Analysis and Scalar Replacement for Java]] - URL: <https://www.ssw.uni-linz.ac.at/Research/Papers/Stadler14/Stadler2014-CGO-PEA.pdf> - Flow-sensitive escape analysis aiming to achieve a better SROA with allocation sinking - Motivative example: "sink" the allocation of `key` to the else branch, cutting away the allocation for the case when the cache is hit ```java class Key { int idx; Object ref; Key(int idx, Object ref) { this.idx = idx; this.ref = ref; } synchronized boolean equals(Key other) { return idx === other.idx && ref === other.ref; } } static CacheKey cacheKey static Object cacheValue Object getValue(int idx, Object ref) { Key key = new Key(idx, ref); if (key.equals(cacheKey)) { return cacheValue; } else { // `key` escapes here cacheKey = key; cacheValue = createValue(...); return cacheValue; } } ``` ## Escape Analysis in the Context of Dynamic Compilation and Deoptimization - [[Escape Analysis in the Context of Dynamic Compilation and Deoptimization]] - URL: <https://www.usenix.org/legacy/events/vee05/full_papers/p111-kotzmann.pdf> - This is what basically implemented within `EscapeAnalysis.jl` (modulo the inter-procedural alias handling) ## CS 6120 - Lesson 9: Alias Analysis - <https://www.cs.cornell.edu/courses/cs6120/2020fa/lesson/9/> - alias analysis is inherently inter-procedural analysis ^argalias ```c bool foo(char *a, char *b) { *a = 'a'; *b = 'b'; return *a == 'a'; } ``` - analysis answers: - "must alias": useful - maybe: less useful - "must not alias": useful # Appendix ## Idea: Integrate Alias Analysis into Abstract Interpretation? One idea is that incorporating alias analysis directly into type inference, which serves as the foundation for optimization, could make subsequent optimizations easier and more efficient. For example, consider the inferred code for `issue56561_2(::Float64)` in [[#Optimizing Closure Call with Inter-Procedural Alias Analysis]]: ```julia julia> only(code_typed(issue56561_2, (Float64,); optimize=false)) CodeInfo( 1 ─ Core.NewvarNode(:(#56))::Any │ (x@_4 = Core.Box())::Core.Box │ %3 = dynamic Main.sin(a)::Float64 │ %4 = x@_4::Core.Box │ builtin Core.setfield!(%4, :contents, %3)::Float64 │ %6 = Main.identity::Core.Const(identity) │ %7 = x@_4::Core.Box │ %8 = builtin Core.isdefined(%7, :contents)::Bool └── goto #3 if not %8 2 ─ goto #4 3 ─ Core.NewvarNode(:(x@_5))::Any └── x@_5::Union{} 4 ┄ %13 = x@_4::Core.PartialStruct(Core.Box, Any[Any]) │ %14 = builtin Core.getfield(%13, :contents)::Any │ %15 = dynamic (%6)(%14)::Any │ %16 = x@_4::Core.PartialStruct(Core.Box, Any[Any]) │ builtin Core.setfield!(%16, :contents, %15)::Any │ %18 = Main.:(var"#issue56561_2##0#issue56561_2##1")::Core.Const(var"#issue56561_2##0#issue56561_2##1") │ %19 = x@_4::Core.PartialStruct(Core.Box, Any[Any]) │ (#56 = %new(%18, %19))::Core.PartialStruct(var"#issue56561_2##0#issue56561_2##1", Any[Core.PartialStruct(Core.Box, Any[Any])]) │ %21 = #56::Core.PartialStruct(var"#issue56561_2##0#issue56561_2##1", Any[Core.PartialStruct(Core.Box, Any[Any])]) │ %22 = dynamic (%21)(a)::Any └── return %22 ) => Any ``` The idea of incorporating alias information into type inference could proceed as follows: - Update the type of `%4 == x@_4` to `PartialStruct(Core.Box, Any[Base.RefValue{Float64}])` after `setfield!(%4, :contents, %3)`. - Once this is achieved, it becomes possible to infer the types of operations like `Core.getfield(%13, :contents)` and `Core.setfield!(%16, :contents, %15)`. - Similarly, the type of `%16 == x@_4` can be inferred as `PartialStruct(Core.Box, Any[Base.RefValue{Float64}])` after `Core.setfield!(%16, :contents, %15)`. - [ ] Propagating this `PartialStruct` using standard constant propagation or IR interpretation allows the closure call to be easily inferred and optimized. However, in order to implement this idea correctly, it becomes necessary to "downgrade" the type of a variable if it escapes, ultimately requiring EA and AA: - For instance, if there is a call like `escapefunc(%4)`, the type of `%4` must be downgraded from `PartialStruct(Core.Box, Any[Base.RefValue{Float64}])` to `Core.Box`. - Additionally, if there is a seemingly non-escaping call like `escapefunc(xxx)`, it must be proven that `xxx` does not alias with `x@_4`. That said, this issue could potentially be resolved by devising a method to perform EA/AA along with type inference. Since the EA/AA discussed in this document is a forward/flow-sensitive analysis, integrating it into the existing abstract interpretation algorithm may not be impossible. #COMBAK Valentine's idea: Integrate each distinguished memory location information into the type lattice (to deduce the "must-not" alias information) ```julia function f(a, b) a .= b end function g(a) b = rand(length(a)) # not aliasing to `a` f(a, b) end ``` ## On Cases Where Captured Variables Are Modified Both Inside and Outside Closures Let us consider how cases where captured variables are modified both inside and outside closures can be optimized using the EA/AA approach being proposed. ```julia function issue56561_3(a) x = sin(a) return (function (a) x = identity(x) return a + x end)(a) end ``` ```julia julia> descend(issue56561_3, (Float64,); optimize=true) issue56561_3(a) @ Main REPL[41]:1 Body::Any (!c,!e,!n,!t,!s,!m,!u,!o,!r) 2 1 ─ %1 = %new(Core.Box)::Core.Box │╻ Box │ %2 = invoke Main.sin(a::Float64)::Float64 │ │ builtin Core.setfield!(%1, :contents, %2) │ 3 │ %4 = %new(Main.:(var"#issue56561_3##0#issue56561_3##1"), %1)::var"#issue56561_3##0#issue56561_3##1" │ %5 = invoke %4(a::Float64)::Any │ └── return %5 │ Select a call to descend into or ↩ to ascend. [q]uit. [b]ookmark. Toggles: [w]arn, [h]ide type-stable statements, [t]ype annotations, [s]yntax highlight for Source/LLVM/Native, [j]ump to source always, [o]ptimize, [d]ebuginfo, [r]emarks, [e]ffects, e[x]ception types, [i]nlining costs. Show: [S]ource code, [A]ST, [T]yped code, [L]LVM IR, [N]ative code Actions: [E]dit source code, [R]evise and redisplay Advanced: dump [P]arams cache. %2 = invoke sin(::Float64)::Float64 (+c,+e,!n,+t,+s,+m,+u,+o,+r) • %5 = invoke #issue56561_3##0(…)::Any (!c,!e,!n,!t,!s,!m,!u,!o,!r) ↩ (::var"#issue56561_3##0#issue56561_3##1")(a) @ Main REPL[41]:3 Body::Any (!c,!e,!n,!t,!s,!m,!u,!o,!r) 4 1 ─ %1 = builtin Core.getfield(#self#, :x)::Core.Box │ │ %2 = builtin Core.isdefined(%1, :contents)::Bool │ └── goto #5 if not %2 │ 2 ─ %4 = builtin Core.getfield(%1, :contents)::Any │ │ %5 = builtin Core.getfield(#self#, :x)::Core.Box │ │ builtin Core.setfield!(%5, :contents, %4) │ 5 │ %7 = builtin Core.getfield(#self#, :x)::Core.Box │ │ %8 = builtin Core.isdefined(%7, :contents)::Bool │ └── goto #4 if not %8 │ 3 ─ %10 = builtin Core.getfield(%7, :contents)::Any │ │ %11 = dynamic (a + %10)::Any │ └── return %11 │ 4 ─ $(Expr(:throw_undef_if_not, :x, false)) │ └── unreachable │ 4 5 ─ $(Expr(:throw_undef_if_not, :x, false)) │ └── unreachable │ Select a call to descend into or ↩ to ascend. [q]uit. [b]ookmark. Toggles: [w]arn, [h]ide type-stable statements, [t]ype annotations, [s]yntax highlight for Source/LLVM/Native, [j]ump to source always, [o]ptimize, [d]ebuginfo, [r]emarks, [e]ffects, e[x]ception types, [i]nlining costs. Show: [S]ource code, [A]ST, [T]yped code, [L]LVM IR, [N]ative code Actions: [E]dit source code, [R]evise and redisplay Advanced: dump [P]arams cache. • %11 = runtime < +(::Float64,::Any)::… (!c,!e,!n,!t,!s,!m,!u,!o,!r) > ↩ ``` In such cases, since the fields of `%4` are actually being modified inside `invoke %4(a::Float64)`, the approach described in [[#Optimizing Closure Call with Inter-Procedural Alias Analysis]] cannot be applied. It is only possible to re-infer `invoke %4` with `PartialStruct` carrying the precise field type if inter-procedural alias analysis confirms that the field types do not change in the closure call. To achieve this, it is necessary to integrate type information into the alias analysis. [^1]: Referring to such information as "flow-sensitive" might sound unconventional. However, in this case, simply analyzing the def-chain of `getfield`/`setfield!` operations for `%2 == %4` globally would not yield the precise alias information described above. Instead, it is necessary to perform data-flow analysis to update and manage alias information along the control flow.

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.