Intensive Prior Research on Escape Analysis

Prior Researches

Format:

  1. what ?
  2. what is the point of the idea ?
  3. how is it interesting compared to prior works ?
  4. how is it validated ?
  5. is there any remaining discussion ?
  6. what's next to read ?

(JVM) Escape Analysis in the Context of Dynamic Compilation and Deoptimization

  1. what ?

    • intra-procedural and inter-procedural algorithm of escape analysis
      • with the support for dynamic class loading and "deoptimization"[1]
    • target optimization: SROA / synchronization removal[2] / stack allocation
    • by-product: a better inlining heuristic based on escape information
  2. what is the point of the idea ?

    • intra-procedural analysis: forward abstract interpretation on SSA from
      • escape state:
        • local objects (SSA allocation): fixed-length array
        • fields: growing-length array
        • NOTE: propagate locals escapes to fields
      • control flow handling (i.e. phi-node handling)
        • if any operand of phi-function escapes, propagate the escapes to any other (since then "global SROA doesn't make any sense"[3])
        • use equi-escape set (EES) to propagate escape information among the related phi-operands
          • EES is implemented as an union-find algorithm with path compression
    • inter-procedural analysis: propagate the ("formal"[4]) arguments escape information as a method descriptor
      • purpose: stack allocation (i.e. identifies objects that can be allocated on the caller stack)
        • benefits of stack allocation:
        • cheap allocation (reuse the same stack slot for allocation in a loop)
        • efficient field load (access can be relative to frame's base pointer)
        • (efficient field store: possible but note that actual argument might be heap-allocated)
      • by-product: inter-procedural escape information turned out to be a good inlining heuristics
        • inline methods up to twice as large as the normal threshold if there is at least one parameter that does neither escape the caller nor the callee

    • runtime support
      • support for deoptimization:
        • debugging information keeps a record of how SROA and synchronization removal was performed by the compiler
      • GC modification for stack allocation:
        • A stack object must not be moved in memory, but the heap objects referenced by its fields need to be visited and kept alive. Therefore, we modified the garbage collector so that it does not visit stack objects but treats pointer fields within stack objects as root pointers

  3. how is it interesting compared to prior works ?

    • As as good summary of a basic escape analysis design, and basic target optimization design
    • Their idea to use equi-escape set to handle control flow effects
  4. how is it validated ?

    • SROA validation: insert runtime assertion (run without SROA but with assertions on otherwise-replaced values)
    • performance: benchmarks
  5. is there any remaining discussion ?

    • Is "graph" (EES) is really needed to handle control flow ? Couldn't backward analysis make it more simpler ?
    • How they limit the lattice of field escapes (since its size isn't limited) ? # of SSA statements is finite, and # of allocations is finite, thus field states is also finite
  6. what's next to read ?

    • F. Vivien and M. Rinard. Incrementalized pointer and
      escape analysis. In Proceedings of the ACM SIGPLAN
      Conference on Programming Language Design and
      Implementation, pages 35–46, Snowbird, June 2001.
      • general observation on allocations, proposes incremental analysis

(JVM) Partial Escape Analysis and Scalar Replacement for Java

  1. what ?
    • flow-sensitive escape analysis and scalar replacements
      • (with a support for deoptimization from speculative execution)
    • target optimizations: SROA, lock elision (on individual branches)
  2. what is the point of the idea ?
    • In many cases, however, an object escapes just in a single unlikely branch.

      ​​​​​​Object getValue(int idx, Object ref) {
      ​​​​​​    Key key = new Key(idx, ref);
      ​​​​​​    if (key.equals(cacheKey)) {
      ​​​​​​        return cacheValue;             // not escape
      ​​​​​​    } else {
      ​​​​​​        cacheKey = key;                // escapes to the static field
      ​​​​​​        cacheValue = createValue(...);
      ​​​​​​        return cacheValue;
      ​​​​​​    }
      ​​​​​​}
      
    • the basic idea is to transform above snippet into something like:
      ​​​​​​Object getValue(int idx, Object ref) {
      ​​​​​​    if (/*scalar-replaced operations*/ ...) {
      ​​​​​​        return cacheValue;
      ​​​​​​    } else {
      ​​​​​​        Key key = new(idx, ref);       // "materiazation"
      ​​​​​​        cacheKey = key;
      ​​​​​​        cacheValue = createValue(...);
      ​​​​​​        return cacheValue;
      ​​​​​​    }
      ​​​​​​}
      
    • concepts
      • "partial escape analysis": flow-sensitive escape analysis
      • "materialization": materialize allocation to where it escapes
    • analysis: forward, flow-sensitive, intra-procedural
      • allocation state
        ​​​​​​​​​​class Id extends Node {...} // allocation identity
        ​​​​​​​​​​class ObjectState {}
        ​​​​​​​​​​class VirtualState extends ObjectState {
        ​​​​​​​​​​    int lockCount;
        ​​​​​​​​​​    Node[] fields;
        ​​​​​​​​​​}
        ​​​​​​​​​​class EscapedState extends ObjectState {
        ​​​​​​​​​​    Node materializedValue;
        ​​​​​​​​​​}
        ​​​​​​​​​​class State { // each Node has this state
        
        ​​​​​​​​​​    Map<Id, ObjectState> states;
        
        ​​​​​​​​​​    /**
        ​​​​​​​​​​     * Loading an object with `VirtualState` from a field of another
        ​​​​​​​​​​     * `VirtualState` object inserts a new entry into this map, so
        ​​​​​​​​​​     * that the `Load node can be recognized as referring to the
        ​​​​​​​​​​     * `VirtualState` object during further processing.
        ​​​​​​​​​​     *
        ​​​​​​​​​​     * If any of the operands of a node is key in this `aliases` map,
        ​​​​​​​​​​     * then the node needs to be examined also.
        ​​​​​​​​​​     */
        ​​​​​​​​​​    Map<Node, Id> aliases;
        ​​​​​​​​​​}
        
      • state propagation
        • each "node" has the state (=> flow-sensitivity)
        • each node updates its state using the states from previous nodes
          • Merge node: merge control-flows
          • loop entry node: merge loop backedges (could be more than two ?)
          • others: just propagate
  3. how is it interesting compared to prior works ?
    • flow-sensitive SROA (and "materialization")
  4. how is it validated ?
    • performance benchmarks: reduced allocated memory by up to 58.5%, improves performance by up to 33%
  5. is there any remaining discussion ?
    • in worst case scenario, it may lead to bigger code size if "materialization" happens at both branches ?
    • how can it be extended to eliminate array allocations ?
  6. what's next to read ?
    • A. Shankar, M. Arnold, and R. Bodik. Jolt:
      lightweight dynamic analysis and removal of object
      churn. In Proceedings of the ACM SIGPLAN
      Conference on Object-Oriented Programming Systems,
      Languages, and Applications, pages 127–142. ACM
      Press, 2008.
      • a lightweight dynamic analysis to reduce excessive creation of short-lived objects, by guiding inlining so that escape analysis is more effective
    • P. Molnar, A. Krall, and F. Brandner. Stack allocation
      of objects in the CACAO virtual machine. In
      Proceedings of the International Conference on the
      Principles and Practice of Programming in Java,
      pages 153–161. ACM Press, 2009
      • similar approach as Kotzmann and Mössenböck to perform stack allocation on Cacao VM
    • V8: very local analysis (looking only at the usages of the allocation, independently)

Notes:

  • "Zone": heap areas with a known, limited lifetime. The whole area can be freed in bulk when a certain scope is left.

(JVM) Current state of EA and its uses in the JVM

This talk summarizes the status of escape analysis and related optimizations in Java community:

  • 3 main optimizations:
    1. monitor elision (lock elision)
    2. stack allocation
    3. scalar replacement
  • C2 vs. Graal vs. OpenJ9
    • C2: flow-insensitive escape analysis, global SROA
    • Graal: flow-sensitive escape analysis, partial SROA
    • OpenJ9: flow-sensitive escape analysis, partial SROA & stack allocation !
  • SROA vs. stack allocation: SA can be applied if the control flow is complicated:
    e.g. motivating example:
    ​​public static int simple_math_myobject_valueof(int val1, int val2) {
    ​​    // these allocations can't be eliminated because they are represented as
    ​​    // phi nodes of (global object, scalar-replaceable object)
    ​​    MyObject int1 = MyObject.valueOf(val1);
    ​​    MyObject int2 = MyObject.valueOf(val2);
    ​​    return int1.add(int2);
    ​​}
    
    ​​public MyOjbect valueOf(int val) {
    ​​    if (0 <= val && val <= 2) {
    ​​        return _cache[val];
    ​​    } else {
    ​​        return new MyObject(val);
    ​​    }
    ​​}
    
  • challenges with stack allocation
    • GC extension
    • lifetime overwrap detection
    • "heapification" at deoptimization
  • prototyped code can be found at: https://github.com/eclipse-openj9/openj9/blob/master/runtime/compiler/optimizer/EscapeAnalysis.cpp
    • complicated, 9432 lines of C++ code

(LuaJIT) Allocation Sinking Optimization

  1. what ?
    • a variant of code motion, which moves an allocation to where it's really needed (e.g. in a fallback path), built on top of flow-sensitive SROA enhanced by LuaJIT's "advanced alias analysis"
  2. what is the point of the idea ?
    • observations:
      • global escape analysis is pointless especially in dynamic languages, where their code typically has many fallback paths that often escape things
      • escapes often happen in uncommon paths, and allocations are avoidable in fast path
    • implementation: "mark & sweep"
      • (mark) marks all allocations that cannot be sunk
        • this phase is essentially one instance of escape analysis
        • intra-procedural, backward, flow-insensitive[5]
      • (sweep) tags the unmarked allocations and the related stores as sunk
  3. how is it interesting compared to prior works ?
    • one of the most earlier approaches of flow-sensitive SROA ?
  4. how is it validated ?
  5. is there any remaining discussion ?
  6. what's next to read ?

(JavaScript) Escape Analysis in V8

  1. what ?
    • speculative escape analysis and allocation elimination with load replacement (SROA)
    • forward, flow-sensitive, allocation analysis
      • each escape site will refine analysis state
      • state update can be costly since each "node" has state and # of nodes is large
      • => "functional map data structure" solved it
  2. how is it interesting compared to prior works ?
    • implement escape analysis on "unscheduled" SSA form
      • "unscheduled"
        • scheduled graph: one state per basic block
        • unscheduled: one state per "effectful node"
      • "functional map data structure" turns out to be successful
    • handle V8 deoptimization
  3. how is it validated ?
  4. is there any remaining discussion ?
  5. what's next to read ?

(LLVM) Alias Analysis

  • LLVM Alias Analysis Infrastructure
    • alias method
    • getModRefInfo(CS1, CS2): memory effect test between function calls
      • helper methods: doesNotAccessMemory / onlyReadsMemory
    • interesting: interface methods to keep the analysis state up-to-date even after other consumer transformations
  • analysis passes:
    • basic-aa: walks through domination
    • TBAA: walk through type domain

Other references

Observations

General transition of memory optimization:

  1. SROA: eliminates allocations
  2. partial SROA (allocation sinking): eliminates allocations in the existence of simpler branches
  3. stack allocation: eases allocation cost in the existence of complicated branches

Analysis design

General design choices:

  • flow-insensitive vs. flow-sensitive: simply, analysis complexity vs.analysis accuracy
  • intra-procedural vs. inter-procedural[6]:
    • intra-procedural for SROA
    • inter-procedural for stack allocation
  • forward vs. backward:
    • forward: easier to follow control-flow, suited for SROA
    • forward analysis will invalidate previous state whenever it sees new escape information
      (which may lead to more iterations to reach a fixpoint)
    • backward: easier to propagate escape information, suited for general escape analysis

Escape analyses in the wild:

  • (JVM C2) SROA: flow-insensitive, intra-procedural, forward
  • (JVM Graal) partial SROA: flow-sensitive, intra-procedural, forward,
  • (JVM OpenJ9) partial SROA & stack allocation: flow-sensitive, inter-procedural, ???
  • (JavaScript V8) partial SROA: flow-sensitive, intra-procedural, forward,
  • (LuaJIT) allocation sinking (partial SROA): flow-insensitive[5:1], intra-procedural, two-phases analysis
    1. backward analysis for escape analysis (collect escape information about allocation)
    2. forward analysis for partial SROA (to tag sunk allocations)

What's for Julia

status quo, and future

  • SROA and stack allocation
    • flow-insensitive SROA :+1: (:warning:[7])
      ​​​​​​struct Point
      ​​​​​​    x::Float64
      ​​​​​​    y::Float64
      ​​​​​​end
      ​​​​​​add(a::Point, b::Point) = Point(a.x + b.x, a.y + b.y)
      ​​​​​​function compute()
      ​​​​​​    a = Point(1.5, 2.5)
      ​​​​​​    b = Point(2.25, 4.75)
      ​​​​​​    for i in 0:(100000000-1)
      ​​​​​​        a = add(add(a, b), b)
      ​​​​​​    end
      ​​​​​​    a.x, a.y
      ​​​​​​end
      
    • flow-sensitive SROA :disappointed:
      ​​​​​​struct Key
      ​​​​​​    idx::Int
      ​​​​​​    ref::Symbol
      ​​​​​​end
      ​​​​​​const CACHED_KEY = Ref(Key(0, :orig))
      ​​​​​​@inline eqkey(k1, k2) = k1.idx == k2.idx && k1.ref === k2.ref
      
      ​​​​​​function getref(idx, ref)
      ​​​​​​    key = Key(idx, ref) # move the allocation into the `if` clause
      ​​​​​​    if !eqkey(key, CACHED_KEY[])
      ​​​​​​        CACHED_KEY[] = key
      ​​​​​​    end
      ​​​​​​    return key.ref
      ​​​​​​end
      
    • stack allocation :drooling_face:
      ​​​​​​let
      ​​​​​​    if cond
      ​​​​​​        key = CACHED_KEY[]
      ​​​​​​    else
      ​​​​​​        key = Key(idx, ref) # can we avoid this allocation ?
      ​​​​​​        CACHED_KEY[] = key
      ​​​​​​    end
      ​​​​​​    return key.ref # key::ϕ(global object, allocation)
      ​​​​​​end
      
  • DCE, more constant-folding/inference: SROA can derive new type information
    • Java: every field is statically-typed (mostly ?)
    • Julia: untyped, Union-typed fields
  • third consumers of escape information
    • ImmutableArray
    • some taint analysis for efficient contribution prop' for Diffractor
  • some implementation ideas
    1. separate "state" into two parts
      • backward escape analysis: general usages, allocation movement (for partial SROA)
        • with escape state per basic block (rather than per SSA statement)
      • forward allocation analysis: for SROA
    2. SROA at inference time:
      • insight: we already have a decent forward analysis
      • inference time
        • (effect analysis) inter-procedural memory-effect propagation
          • form PartialStruct for mutables
        • (SROA, tagging): constant-folding or as encoded lattice property at inference
          • make PartialStruct.fields contain Replaceable(::SlotNumber)
        • (DCE, constant-folding): inference is already flow-sensitive
      • optimization time
        • (SROA, replacement): look for getfield(...)::Union{Const,Replaceable}
        • (SROA) allocation sinking: new code motion pass
        • escape analysis: just summarizes the result of inference for succeeding optimizations
          • what's needed ?
            • lifetime query

  • Q. "SROA at inference"?

    • Valentin: what kind of information do you want to propagate around ?
    • Shuhei (revisit)
      • memory effect analysis should be presented to inference
      • SROA tagging during inference:
        • may complicates thing:
          • brings a part of optimization into inference
          • the domain of inference lattice gets complicated: it now encodes CFG information
          • we need to keep tags up-to-date throughout many optimizations, including slot2ssa, inlining
    • idea: reuse existing flow-sensitive, forward, analysis to implement more "decent" SROA
    • stack allocation:
      • non-escape argument: ok
      • return value: (change calling convention)
        • calling convention:
          • LLVM level: struct pointer (*)
        • mutability is still very hard to handle
    • how information propagates ?
      • LLVM: rerun analysis
      • on high-level Julia IR: how to handle inter-procedural information
  • why on LLVM ?

    • existed for long
    • opportunities for more optimizations using LLVM infrastructure
  • problems:

    • intra-procedural
    • stack allocation
  • more researches

    • publish review paper: before something novel
  • focus on compactness ?

  • other compilers (tracing JIT) vs. Julia (ahead of runtime)

    • predictability
    • implementation complexities of tracing-JIT:
      • complexity vs. optimization opportunities

  1. This seems to be such a concept that is very specific to tracing-JIT compilers,
    where VM execution sometimes transfers the control to the interpreter
    in a case any speculative assumption for optimization gets invalidated. ↩︎

  2. "synchronization removal" here seems to simply means "lock elision" ↩︎

  3. This statement seems to reflect their purpose of SROA: SROA is done for allocation elimination and an object allocation can't be eliminated if there is any site it escapes. ↩︎

  4. They seem to use "formal" when referring to declared method arguments, and "actual" to those actually passed at callsite. ↩︎

  5. "How LuaJIT encodes flow-sensitivity ?"
    LuaJIT's tracing-JIT nature separates an unlikely branch into a separate snapshot.
    It allows escapes that happens in the rare snapshots not to escape the objects in the main trace.
    In other word, necessary flow-sensitivity is encoded within the nature of tracing JIT. ↩︎ ↩︎

  6. Well, usually, the "inter-procedural" part isn't so challenging: basically, inter-procedural
    information is encoded within callee method descriptor, and the main lifting is to translate
    the information in caller context. ↩︎

  7. Even flow-insensitive SROA sometimes fails on Julia-level.
    For example, if Point is declared as mutable struct, Julia-level SROA fails on the above example.
    But for such cases, succeeding LLVM passes will optimize them away. ↩︎

Select a repo