# RefCount Refactoring slide: https://hackmd.io/@j-/rJ8v-AVAD#/ --- --- # Intrusive RefCount ```cpp template <class Derived> class CRefCount { long refs_{1}; public: CRefCount(const CRefCount&) = delete; void AddRef() { ++refs_; } void Release() { if (--refs_ == 0) delete static_cast<Derived*>(this); } }; ``` ---- ## Derived ```cpp class CBitSet : public CRefCount<CBitSet> {}; ``` ---- ## Pros 1. Avoid the "split control block" pitfall of `shared_ptr` 1. Optimization for some particular workloads ---- ## Cons: 1. Losing `const`. (Compared to the idiom of `shared_ptr<const T>`) 1. Losing `weak_ptr`. (Have to avoid circularity) --- # Manual Reference Counting: The explict calls to `AddRef` and `Release` at the right spot to manage object lifetime. ---- ## Example 1 ```cpp COrderSpec *CPhysicalIndexOnlyScan::PosDerive( CMemoryPool *CExpressionHandle &) const override { m_pos->AddRef(); return m_pos; } ``` ```cpp CPhysicalIndexOnlyScan::~CPhysicalIndexOnlyScan() { m_pindexdesc->Release(); m_pos->Release(); } ``` ---- ## Example 2 <!-- .slide: style="font-size: 36px;" --> ```cpp pexprOuter->AddRef(); pexprInner->AddRef(); pexprPred->AddRef(); CExpression *pexprResult = new CExpression(new TJoin(), pexprOuter, pexprInner, pexprPred); // add alternative to results pxfres->Add(pexprResult); ``` ---- ## Pros 1. No magic, what you see is what you get 1. "Drive stick": allows for micro-optimization, pre- C++ 11 ---- ## Cons 1. Manual manipulation of reference count permeates the code base 1. Lifetime manipulation becomes part of the API * local variables * return value of functions * function parameters * constructor initializers * class data members * elements in a container / array 1. Extremely hard to reason about locally ---- ## Cons (ex) We want these <!-- .slide: style="width: 109%;" --> ```cpp vector<CBitset> bitsets_; unordered_map<int, CExpression> plans_; ``` We get those ```cpp CDynamicPtrArray<CBitSet, CleanupRelease> *bitsets_; CHashMap<int, CExpression, CleanupDelete, CleanupRelease> *hm_; ``` --- # Vision 1. Smart pointer that provide ownership semantics 1. Express ownership in code 1. Automatically handle lifecycle 1. `std::move` for micro optimization ---- ## Instead of <!-- .slide: style="font-size: 85%;" --> ```cpp! class U { CExpression* m_pexpr; void SetExpr(CGroup* pexpr) { if (m_pexpr) m_pexpr->Release(); m_pexpr = pexpr; } U(CExpression* pexpr) : m_pexpr(pexpr) {} ~U() { if (m_pexpr) m_pexpr->Release(); } }; ``` Client code ```cpp U* bar(CExpression* pexpr) { pexpr->AddRef(); return new U(pexpr); } void foo(U* u, CGroup* pgroup) { pgroup->AddRef(); u->SetGroup(pgroup); } ``` ---- ## Ideally... Destructor is gone! `AddRef` is gone. ```cpp! class U { Ref<CExpression> m_pexpr; void SetExpr(Ref<CGroup> pexpr) { m_pexpr = pexpr; } U(Ref<CExpression> pexpr) : m_pexpr(pexpr) {} }; ``` Client code ```cpp Ref<U> bar(Ref<CExpression> pexpr) { return new U(pexpr); } void foo(U* u, CGroup* pgroup) { u->SetGroup(pgroup); } ``` ---- <!-- .slide: style="font-size: 85%;" --> ## Optimization For the performance-minded... ```cpp! class U { Ref<CExpression> m_pexpr; void SetExpr(Ref<CGroup> pexpr) { m_pexpr = std::move(pexpr); } U(Ref<CExpression> pexpr) : m_pexpr(std::move(pexpr)) {} }; ``` Client code ```cpp Ref<U> bar(Ref<CExpression> pexpr) { return new U(std::move(pexpr)); } void foo(U* u, CGroup* pgroup) { u->SetGroup(pgroup); } ``` --- # Road map 1. "Semantic marker" / annotation 1. local reasoning 1. One-shot conversion guided by annotations ---- ## Annotation Helper tags: ```cpp template <class T> using owner = T; template <class T> using pointer = T; ``` To apply annotation on: ```cpp U* foo(int, U*); ``` We might get ```cpp owner<U*> foo(int, pointer<U*>); ``` ---- <!-- .slide: style="font-size: 36px;" --> ## Conversion * We "just" need to get every one of the following (reference-counted) annotated * variable; * function return type; * function parameter; * class data member ---- * We can then perform a conversion of the annotated code: * substitute a raw pointer `T*` for `pointer<T*>` * substitute a smart pointer `Ref<T>` for `owner<T*>` * remove all `Release` and `AddRef` calls --- ## Before: ```cpp struct U : CRefCount<U> {}; U *F(); U *foo(int i, bool b, U *param) { U *u = F(); if (i < 42) { u->AddRef(); return u; } param->AddRef(); return param; } ``` ---- ## Annotate ```cpp struct U : CRefCount<U> {}; U *F(); owner<U *> foo(int i, bool b, pointer<U *> param) { pointer<U *> u = F(); if (i < 42) { u->AddRef(); return u; } param->AddRef(); return param; } ``` ---- ## Convert ```cpp struct U : CRefCount<U> {}; U *F(); Ref<U> foo(int i, bool b, U *param) { U *u = F(); if (i < 42) { return u; } return param; } ``` --- # Tooling * link to LLVM / Clang for access to the AST (abstract syntax tree) * simple rules for local reasoning (examples) * generate edits to source file ---- ## Example rule * A field (data member) released in destructor is an owner. When we match ```cpp struct R { T* t; ~R() { t->Release(); } }; ``` We annotate ```cpp struct R { owner<T*> t; ~R() { t->Release(); } }; ``` ---- ## Local reasoning * We only need to look at code from one translation unit * Often we only need to look at code around one variable, or within one function * This is not only good for tooling, but it's also better for humans ---- ## Bonus: propagation rule example * Propagation rule: a rule that matches not only code pattern but existing annotation * e.g. virtual functions share the same "ownership" signature (return type, parameter) ---- ## Bonus example (Contd) <!-- .slide: style="width: 110%" --> <table> <tr> <td style="width: 50%">Before propagation</td> <td>After One iteration of propagation</td> </tr> <tr> <td> ```cpp! struct Q { virtual U* foo(); virtual U* bar(); }; struct R : Q { owner<U*> foo() override; pointer<U*> bar() override; }; ``` </td> <td> ```cpp struct Q { virtual owner<U*> foo(); virtual pointer<U*> bar(); }; struct R : Q { owner<U*> foo() override; pointer<U*> bar() override; }; ``` </td> </tr> </table> --- # Scale In a half-a-million LOC code base, our tool changed around 35K lines of code. ``` git diff --shortstat 1682 files changed, 33104 insertions(+), 28274 deletions(-) ``` --- # Bonus * How to identify `std::move` (CFG) * Future work (DFA) --- # Identifying `move` opportunities * Observation: an owner can be moved on its definite last use * Construct a control flow graph (CFG) * Find last use in the basic block (BB) immediately before the exit block <!-- .slide: style="font-size: 30px;" --> Before ```cpp bool F(gpos::owner<T*>); bool bar(T* t) { return F(t); } ``` After ```cpp bool F(gpos::owner<T*>); bool bar(owner<T*> t) { return F(std::move(t)); } ``` --- # We can do better than CFG <table style="margin: 0 -10% 0 -10%; width: 120%"> <tr> <td> Given </td> <td> We want </td> </tr> <tr> <td> ```cpp bool F(gpos::owner<T*>); bool bar(T* t, int x) { if (F(t)) { return x < 42; } else { return x > 420; } } ``` </td> <td> ```cpp bool F(gpos::owner<T*>); bool bar(owner<T*> t, int x) { if (F(std::move(t))) { return x < 42; } else { return x > 420; } } ``` </td> </tr> </table> ---- ## Data Flow Analysis * What if the definite last use occurs too early and there are branches after it? We lose an opportunity to move. * We can taylor the CFG for each variable, contracting nodes that don't affect the usage of a variable, simplifying the graph ---- <table style="margin: 0 -10% 0 -10%; width: 120%;"> <tr> <td> Given </td> <td> CFG </td> <td> Simp. for `t` </td> </tr> <tr style="vertical-align: top; "> <td style="width: 40%"> ```cpp bool F(gpos::owner<T*>); bool bar(T* t, int x) { if (F(t)) { return x < 42; } else { return x > 420; } } ``` </td> <td style="font-size: 60%;"> ``` [B0 (EXIT)] Preds: [1,2] [B1] 1: `return x < 42` Succs: 0 Preds: 3 [B2] 1: `return x > 420` Succs: 0 Preds: 3 [B3] 1: F(t) Terminal: if [B3.1] Succs: [1,2] Preds: 4 [B4 (ENTRY)] Succs: 3 ``` </td> <td style="font-size: 60%;"> ``` [B0 (EXIT)] Preds: [1,2] [B3] 1: F(t) Terminal: if [B3.1] Succs: [0,0] Preds: 4 [B4 (ENTRY)] Succs: 3 ``` </td> </tr> </table>
{"metaMigratedAt":"2023-06-15T18:07:52.852Z","metaMigratedFrom":"YAML","title":"RefCount Refactoring","breaks":false,"slideOptions":"{\"spotlight\":{\"enabled\":true}}","contributors":"[{\"id\":\"793f9801-a67c-4d0c-bbd8-ec73a0604629\",\"add\":173478,\"del\":164878}]"}
    235 views