---
title: Interval Tree Clocks
version: December 03, 2022
description: a novel clock mechanism that can be used in scenarios with a dynamic number of participants, allowing a completely decentralized creation of processes/replicas without need for global identifiers.
license: CC-SA-2.5
---

# ITC -- Interval Tree Clocks[^1]

[TOC]

[^1]: "Interval Tree Clocks: A Logical Clock for Dynamic Systems - Uminho," accessed December 3, 2022, https://gsd.di.uminho.pt/members/cbm/ps/itc2008.pdf.

> source https://gsd.di.uminho.pt/members/cbm/ps/itc2008.pdf.

**Interval Tree Clocks** are a novel clock mechanism that can be used in scenarios with a dynamic number of participants, allowing completely decentralized creation of processes/replicas without the need for global identifiers. The mechanism has a variable-size representation that adapts automatically to the number of existing entities, growing or shrinking appropriately.

There are two essential differences between ITC and classic clock mechanisms, from the point of view of our function-space framework:

* In classic mechanisms each participant uses a fixed, pre-defined function for the id; in ITC the id component of participants is manipulated to adapt to the dynamic number of participants.
* Classic mechanisms are based on functions over a discrete and typically finite domain; ITC is based on functions over a continuous infinite domain (R), with emphasis on the interval [0, 1). This domain can be split into an arbitrary number of subintervals as needed.

The idea is that each participant has available, in the id, a set of intervals that it can use to inflate the event component and to give to its successors when forking; a join operation merges the sets of intervals. Each interval results from successive partitions of [0, 1) into equal subintervals; the set of intervals is described by a binary tree structure.

ITCs can be used in all these settings and will excel in dynamic settings, i.e. whenever the number and set of active entities varies during the system execution, since they allow localized introduction and removal of entities. Before ITCs, the typical strategy to address these dynamic settings was to implement the classical vectors as mappings from a globally unique id to an integer counter. The drawback is that unique ids are not space efficient, and that if the active entities change over time (under churn) the state dedicated to the mapping will keep growing. This has led to ad-hoc pruning solutions (e.g. in Dynamo) that can introduce errors and compromise causality tracking.

ITCs encode the state needed to track causality in a stamp, composed of an id and an event component, and introduce 3 basic operations (sketched in code after this list):

* _Fork_ is used to introduce new stamps. It allows the cloning of the causal past of a stamp, resulting in a pair of stamps that have identical copies of the event component and distinct ids. E.g. it can be used to introduce new replicas to a system.
* _Join_ is used to merge two stamps. It produces a new stamp that incorporates both causal pasts. E.g. it can be used to retire replicas or to receive causal information from messages.
* _Event_ is used to add causal information to a stamp, "incrementing" the event component and keeping the id.
* (_Peek_ is a special case of fork that only copies the event component and creates a new stamp with a null id. It can be used to make messages that transport causal information.)
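To make these four operations concrete before getting into ITC's interval-tree encoding, here is a minimal sketch that represents a stamp as an `(id, events)` pair, using plain causal histories (sets of events) as the event component and a toy global id counter. Both of those crude mechanisms are exactly what ITC replaces; the function names follow the operations above, but the representation is illustrative, not the paper's.

```python
import itertools

_ids = itertools.count(1)   # toy source of fresh ids (a global counter,
                            # precisely the dependency ITC avoids)

def seed():
    """Initial stamp: some id, empty causal past."""
    return (next(_ids), frozenset())

def fork(stamp):
    """Clone the causal past into two stamps with distinct ids."""
    i, e = stamp
    return (i, e), (next(_ids), e)

def peek(stamp):
    """Anonymous copy: null id, same causal past."""
    i, e = stamp
    return (0, e), (i, e)

def event(stamp):
    """Strictly advance the causal past with a fresh event."""
    i, e = stamp
    return (i, e | {(i, len(e))})   # (id, seq) keeps events unique per id

def join(s1, s2):
    """Merge two causal pasts; keep one of the (non-null) ids."""
    (i1, e1), (i2, e2) = s1, s2
    return (i1 or i2, e1 | e2)

a = seed()
a, b = fork(a)              # two replicas share the same causal past
a = event(a); b = event(b)  # concurrent events on each replica
msg, b = peek(b)            # anonymous stamp carries b's past in a message
a = join(a, msg)            # a now dominates both histories
print(a)
```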
The trick that enables this is to base the clock on an ID that is divisible between nodes, such that any single member of the cluster can subdivide its own key space and hand a fraction of it to another one (as a fork), or reunite any two of them together (as a join).

### ITC Example[^2]

> Example from https://ferd.ca/interval-tree-clocks.html

[^2]: https://ferd.ca/interval-tree-clocks.html

The first initial ID is always 1:

```
1 = ████
```

When that ID gets forked, it becomes two distinct halves:

```
{1, 0}    {0, 1}
 ██▒▒      ▒▒██
```

If I fork the one to the left again, I now have 3 IDs:

```
{{1,0},0}  {{0,1},0}  {0, 1}
  █▒▒▒       ▒█▒▒      ▒▒██
```

These three IDs are still unique, and remain perfectly *mergeable* or <u>subdivisible</u> again. I could, for example, join the first ID with the third one and obtain:

```
{{0,1},0}  {{1,0},1}
  ▒█▒▒       █▒██
```

where the second ID now absorbs the identity of both previous values.

The ID is only half of the clock, though. The clock works by incrementing counters based on the ID you hold; only when both components are present do you have a full interval tree clock timestamp.

![Example 1](https://d.pr/i/n1Elez.jpeg)

---

## Improved Core Operations

> section from https://ferd.ca/interval-tree-clocks.html

Add 2 new core operations to fork, event, and join:

* explode
* rebuild

## On Time

You may have heard "relativity means there is no such thing as simultaneity" used as an argument that synchronized clocks cannot exist. This is a misunderstanding: the equations of special and general relativity provide exact equations for time transformations, and it is possible to define any number of sensible, globally-synchronized time clocks. Consistency constraints that refer to these clocks will depend on the choice of clock, e.g. depending on one's reference frame, a system might or might not provide linearizability.[^3]

[^3]: Jepsen, n.d. Accessed December 5, 2022. https://jepsen.io/consistency#fnref-3

The good news is that

> a.) for all intents and purposes, clocks on earth are so close to each other's velocities and accelerations that errors are much smaller than side-channel latencies, and
> b.) many of the algorithms for ensuring real-time bounds in asynchronous networks use causal messages to enforce a real-time order, and the resulting order is therefore invariant across all reference frames.

---

To implement this synchronous broadcast abstraction atop asynchronous networks, we introduce a class of protocols we call threshold logical clocks (TLC). Like Lamport clocks [57, 82], TLC assigns integer numbers to communication events independently of wall-clock time. Also like Lamport clocks, but unlike vector clocks [36, 37, 39, 62, 67, 82] or matrix clocks [34, 82, 87, 88, 102], all communicating nodes share a common logical time. Unlike Lamport clocks, which merely label arbitrary communication timelines, TLC not only labels but also actively paces communication so that all nodes progress through logical time in "lock-step", although different nodes may reach a logical time step at vastly different real (wall-clock) times. Nodes that fail (crash) may be conceptually viewed as reaching some logical time-steps only after an infinite real-time delay.

## Threshold logical clocks (TLC)

The main purpose of TLC is to provide the *illusion* of lock-step synchrony that the TSB abstraction presents and that QSC relies on, despite the underlying network being asynchronous. Secondarily, a **TLC also conveniently provides the communication patterns needed to implement the threshold reliability that the TSB abstraction promises** (a minimal sketch of the receive-threshold idea follows).
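A minimal, single-process simulation of the receive-threshold rule that TLCR (presented next) is built around: a node advances its logical step only after seeing a threshold $t_r$ of messages labeled with that step. The class name, the lossless in-memory "network", and the buffering policy here are illustrative assumptions, not the paper's algorithm.

```python
from collections import defaultdict

class TLCRNode:
    def __init__(self, node_id, t_r):
        self.node_id = node_id
        self.t_r = t_r                  # receive threshold
        self.step = 0                   # current logical time
        self.buffer = defaultdict(int)  # step -> messages received so far

    def message(self):
        """The message this node broadcasts, labeled with its step."""
        return self.step

    def on_message(self, msg_step):
        """Record a message; advance while the threshold is met."""
        self.buffer[msg_step] += 1
        while self.buffer[self.step] >= self.t_r:
            self.step += 1              # the logical clock ticks

# Toy run: 4 nodes, threshold 3, fully connected lossless "network".
nodes = [TLCRNode(i, t_r=3) for i in range(4)]
for _ in range(5):
    msgs = [n.message() for n in nodes]
    for m in msgs:
        for n in nodes:
            n.on_message(m)
print([n.step for n in nodes])          # all nodes advance in lock-step
```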
We first present TLCR, a simple receive-threshold logical clock algorithm realizing TSB($t_r$, 0, 0). Afterwards, in Section 5.2, we discuss TLCB, a broadcast-threshold logical clock algorithm building on top of TLCR to provide full-spread broadcast communication TSB($t_r$, $t_b$, $n$). In Section 5.3 we then present TLCW, a witnessed-threshold logical clock algorithm implementing TSB($t_r$, $t_b$, $t_s$) communication, amending some of the restrictions of TLCB. Finally, in Section 5.4, we describe TLCF, which builds full-spread witness broadcast communication TSB($t_r$, $t_b$, $n$) on top of TLCR and TLCW.

## Four contrasting notions of time

We utilize four different conceptual notions of time, in fact, in different elements and for different purposes in the architecture:

* **Real time**: Although consensus in our architecture is driven asynchronously purely by communication and requires no timeouts, we nevertheless use real or "wall-clock" time, as measured by each node's system clock, to label blocks with the date and time they were committed, and to allow the timed release of contributions after designated moments in real time as described below. We assume that correct nodes' system clocks are roughly synchronized purely for these block-labeling and timed-release purposes, but Byzantine nodes' clocks may behave arbitrarily.
* **Node-local log time**: For coordination and accountability purposes each node maintains its own tamper-evident append-only log of all the nondeterministic events it observes, such as message arrivals, as described below. Each such event increments the node's local log time by one, independent of real wall-clock time or other nodes' log timelines.
* **Vector time**: As nodes communicate and learn about new events in other nodes' logs, each node $i$ maintains an $N$-element vector of the most recent local log times it knows about across all nodes. This vector time [63, 66, 105, 114] represents the exact set of historical events across all nodes that are causally prior to the present moment at node $i$ (see the sketch after this list). Our architecture uses vector time to enable nodes to reason about what other nodes saw or knew at specific historical moments, and for systematic accountability via accountable state machines [78, 79].
* **Threshold logical time**: Finally, our consensus architecture both provides and internally uses threshold logical time as a global metric of asynchronous communication progress across the (fastest threshold subset of) all $N$ participants.
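A minimal sketch of the vector-time bookkeeping from the list above: each node keeps an $N$-element vector of the highest log time it knows per node, and merges vectors by pointwise maximum when it receives a message. Class and method names are illustrative; the tamper-evidence and accountability machinery from the paper is omitted.

```python
class NodeLog:
    def __init__(self, node_index, n_nodes):
        self.i = node_index
        self.vector = [0] * n_nodes       # known log time per node

    def observe(self, event):
        """A nondeterministic local event appends to the log."""
        self.vector[self.i] += 1          # local log time advances by one
        return (self.i, self.vector[:], event)  # log entry carries vector time

    def receive(self, sender_vector):
        """Learn what the sender had seen: pointwise maximum."""
        self.vector = [max(a, b) for a, b in zip(self.vector, sender_vector)]
        self.vector[self.i] += 1          # the arrival is itself a logged event

a, b = NodeLog(0, 2), NodeLog(1, 2)
a.observe("read request")                 # a: [1, 0]
b.receive(a.vector)                       # b: [1, 1], causally after a's event
print(a.vector, b.vector)
```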
---

## Timecode - A Review

> http://www.edlmax.com/EdlMaxHelp/Edl/maxguide.html#:~:text=APPENDIX-,Timecode%20%2D%20A%20Review,-Timecode%20is%20an

Timecode is an electronic signal which labels video frames on video tape. The ideas for the format were invented by NASA for telemetry tapes tracking space missions. The concept was adopted by EECO in 1967 for video editing, and, after many folks had built similar (but incompatible) systems, SMPTE created the standard in 1969. The European Broadcasting Union adopted the standard and it is now called SMPTE/EBU (referred to simply as "timecode").

Timecode may be recorded on tape in two ways:

A) Longitudinal Time Code, or LTC, is recorded on an audio channel or a dedicated "address" channel.
B) Vertical Interval Time Code, or VITC, is recorded in the vertical interval of the video signal itself.

These two formats are similar as far as the edit information they contain.

Longitudinal Time Code (LTC) is an electronic signal that switches from one voltage to another, forming a string of pulses, or bits. Each one-second-long chunk of this signal is "sliced" into 2400 parts (for NTSC) or 2000 parts (for PAL/SECAM). This makes 80 time code bits for each video frame:

* NTSC: 2400 bits/sec divided by 30 frames/sec = 80 bits/frame
* PAL/SECAM: 2000 bits/sec divided by 25 frames/sec = 80 bits/frame

These 80 bits are given certain assignments according to the standard. There are these important groups of information:

A) The hours/minutes/seconds/frames
B) Drop-frame flag - whether this timecode is drop-frame
C) Color-frame flag - whether color-framing is intended
D) User bits - "leftover" bits available for user assignment
E) "Sync word" - tells an electronic reader where the frame information begins & ends and which direction the tape is moving
F) "Sync bits" - help verify (together with the sync word) the position of the data as the tape moves

The VITC (Vertical Interval Time Code) format adds 10 extra bits to each frame label. These add the following to the LTC list above:

A) Additional "sync" bits between data bit groups
B) Field bit (allows video-field indexing accuracy)
C) Cyclical Redundancy Check Code (CRC), used for error detection

The hours/minutes/seconds/frames is the primary information we are interested in. This data is drop-frame, non-drop-frame, or EBU (25 frames/sec, PAL/SECAM), as indicated by the "drop-frame" flag.

The "sync word" and "sync bits" are the heart of how timecode works. As the tape moves, these bits instruct the electronic reader which direction the tape is moving; where the hours/minutes/seconds/frames, drop-frame bit, etc. are located; and (if properly aligned on the tape) where the video frames themselves are located with respect to the timecode stream.

All of this is pretty clever, when you think about it. The tape is moving at an unknown speed in either direction as the machine is used, yet the timecode can be read at all times (given a good recording and properly functioning equipment). Note that, at microscopic levels, the tape is jumping and jittering across the reader heads, stopping, reversing direction, going into fast-forward, etc. The sync words and the design of the readers make it all work! Cool.
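Since the hours/minutes/seconds/frames group is just a mixed-radix rendering of a running frame count, here is a small conversion sketch using the frame rates quoted above (30 fps NTSC, 25 fps PAL/SECAM). It handles non-drop-frame counting only; the function names are illustrative, and drop-frame arithmetic and the bit-level layout are deliberately left out.

```python
def frames_to_timecode(frame_count, fps):
    """Split a running frame count into HH:MM:SS:FF (non-drop-frame)."""
    frames = frame_count % fps
    seconds = (frame_count // fps) % 60
    minutes = (frame_count // (fps * 60)) % 60
    hours = frame_count // (fps * 3600)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"

def timecode_to_frames(tc, fps):
    """Inverse: HH:MM:SS:FF back to a frame count."""
    h, m, s, f = (int(part) for part in tc.split(":"))
    return (h * 3600 + m * 60 + s) * fps + f

tc = frames_to_timecode(108000, 30)       # one hour of NTSC non-drop frames
assert timecode_to_frames(tc, 30) == 108000
print(tc)                                 # -> "01:00:00:00"
```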
"Weak Consistency: A Generalized Theory and Optimistic Implementations for Distributed Transaction" *re: 3.2.5 Summary of Isolation Level* Atul Ady MIT 1999 https://pmg.csail.mit.edu/papers/adya-phd.pdf > Efficient Replication via Timestamp Stability > http://muratbuffalo.blogspot.com/2022/02/efficient-replication-via-timestamp.html Paper Reference ## Fork-Event-Join These Model Causality tracking mechanisms can be modeled by a set of core operations: fork; event and join, that act on stamps (logical clocks) whose structure is a pair (i, e), formed by an id and an event component that encodes causally known events. ### Fork **fork** The fork operation allows the cloning of the causal past of a stamp, resulting in a pair of stamps that have identical copies of the event component and distinct ids; fork(i, e) = ((i1, e),(i2, e)) such that i2 6= i1. Typically, i = i1 and i2 is a new id. In some systems i2 is obtained from an external source of unique ids, e.g. MAC addresses. In contrast, in Bayou [18] i2 is a function of the original stamp f((i, e)); consecutive forks are assigned distinct ids since an event is issued to increment a counter after each fork. ### peek **peek** A special case of fork when it is enough to obtain an anonymous stamp (0, e), with “null” identity, than can be used to transmit causal information but cannot register events, peek((i, e)) = ((0, e),(i, e)). Anonymous stamps are typically used to create messages or as inactive copies for later debugging of distributed executions. ### Event **event** An event operation adds a new event to the event component, so that if(i, e0 )results from event((i, e)) the causal ordering is such that e < e0 . This action does a strict advance in the partial order such that e 0 is not dominated by any other entity and does not dominate more events than needed: for any other event component x in the system, e 0 6≤ x and when x < e0 then x ≤ e. In version vectors the event operation increments a counter associated to the identity in the stamp: ∀k 6= i. e0 [k] = e[k] and e 0 [i] = e[i] + 1. ### Join join This operation merges two stamps, producing a new one. If join((i1, e1),(i2, e2)) = (i3, e3), the resulting event component e3 should be such that e1 ≤ e3 and e2 ≤ e3. Also, e3 should not dominate more that either e1 and e2 did. This is obtained by the order theoretical join, e3 = e1 t e2, that must be defined for all pairs; i.e. the order must form a join semilattice. In causal histories the join is defined by set union, and in version vectors it is obtained by the pointwise maximum of the two vectors. The identity should be based on the provided ones, i3 = f(i1, i2) and kept globally unique (with the exception of anonymous ids). In most systems this is obtained by keeping only one of the ids, but if ids are to be reused it should depend upon and incorporate both [2]. When one stamp is anonymous, join can model message reception, where join((i, e1),(0, e2)) = (i, e1 t e2). When both ids are defined, the join can be used to terminate an entity and collect its causal past. Also notice that joins can be applied when both stamps are anonymous, modeling in-transit aggregation of messages. --- At the junction of symbolic edit distances [1][3][5][7][11] and dynamic time warping measures [2][9][4][10] we propose a family of Time Warp Edit Distance that we denote $\delta_{\lambda, \gamma}$ to refer to the two parameters that characterize the family, namely the gap penalty $\lambda$ and the stiffness parameter $\gamma$. 
---

TSB's promise that a message $m \in B$ will "eventually" reach at least $t_s$ nodes becomes unconditional, independent of node failure. This is because any failed node $j$ in that set conceptually *does* reach step $s$ and receive $m$, only at real-time $\infty$.

---

At the junction of symbolic edit distances [1][3][5][7][11] and dynamic time warping measures [2][9][4][10], we propose a family of Time Warp Edit Distances that we denote $\delta_{\lambda, \gamma}$ to refer to the two parameters that characterize the family, namely the gap penalty $\lambda$ and the stiffness parameter $\gamma$.

We first define $\delta_{\lambda, \gamma}$ and then prove successively that, whenever $\lambda \geq 0$ and $\gamma > 0$:

1. Proposition 1: $\delta_{\lambda, \gamma}$ is a distance metric.
2. Proposition 2: $\delta_{\lambda, \gamma}$ is upper bounded by twice the $L_1$ distance.
3. Proposition 3: $\delta_{\lambda, \gamma}$ is an increasing function of $\lambda$ and $\gamma$.
4. Proposition 4: upper-bounding of the distance between a time series and its piecewise constant polygonal approximation.

### A. Definitions

Let $U$ be the set of finite time series: $U = \{A_1^p \mid p \in \mathbb{N}\}$. $A_1^p$ is a time series with discrete time index varying between 1 and $p$. We denote by $\Omega$ the empty time series (of null length), and by convention $A_1^0 = \Omega$, so that $\Omega$ is a member of the set $U$.

Let $A$ be a finite discrete time series. Let $a_i'$ be the $i^{\text{th}}$ sample of time series $A$. We will consider that $a_i' \in S \times T$, where $S \subset \mathbb{R}^d$ with $d \geq 1$ embeds the multidimensional space variables and $T \subset \mathbb{R}$ embeds the time stamp variable, so that we can write $a_i' = (a_i, t_{a_i})$ where $a_i \in S$ and $t_{a_i} \in T$, with the condition that $t_{a_i} > t_{a_j}$ whenever $i > j$ (time stamps strictly increase in the sequence of samples).

$A_i^j$ with $i \leq j$ is the sub time series consisting of the $i^{\text{th}}$ through the $j^{\text{th}}$ samples (inclusive) of $A$, so $A_i^j = a_i' a_{i+1}' \ldots a_j'$. $|A|$ denotes the length (the number of samples) of $A$. $\Lambda$ denotes the null sample. $A_i^j$ with $i > j$ is the null time series, noted $\Omega$.

An edit operation is a pair $(a', b') \neq (\Lambda, \Lambda)$ of time series samples, written $a' \rightarrow b'$. Time series $B$ results from the application of the edit operation $a' \rightarrow b'$ to time series $A$, written $A \Rightarrow B$ via $a' \rightarrow b'$, if $A = \sigma a' \tau$ and $B = \sigma b' \tau$ for some time series $\sigma$ and $\tau$. We call $a' \rightarrow b'$ a match operation if $a' \neq \Lambda$ and $b' \neq \Lambda$, a delete operation if $b' = \Lambda$, and an insert operation if $a' = \Lambda$.

Similarly to the edit distance defined for strings [11], we define $\delta(A, B)$ as the similarity between any two time series $A$ and $B$ of finite lengths $p$ and $q$ respectively, as:

$$
\delta\left(A_1^p, B_1^q\right)=\operatorname{Min} \begin{cases}\delta\left(A_1^{p-1}, B_1^q\right)+\Gamma\left(a_p^{\prime} \rightarrow \Lambda\right) & \text { delete } \\ \delta\left(A_1^{p-1}, B_1^{q-1}\right)+\Gamma\left(a_p^{\prime} \rightarrow b_q^{\prime}\right) & \text { match } \\ \delta\left(A_1^p, B_1^{q-1}\right)+\Gamma\left(\Lambda \rightarrow b_q^{\prime}\right) & \text { insert }\end{cases}
$$

where $p \geq 1$, $q \geq 1$, and $\Gamma$ is an arbitrary cost function which assigns a nonnegative real cost to each edit operation.
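A direct dynamic-programming transcription of the recursion above, with $\Gamma$ passed in as parameters. The sample costs at the bottom are a deliberately simple illustration ($L_1$ on values plus a constant gap penalty); they are not the paper's $(\lambda, \gamma)$-parameterized $\Gamma$, which this excerpt does not spell out.

```python
def edit_distance(A, B, match_cost, gap_cost):
    """delta(A, B) for sample sequences A, B of (value, timestamp) pairs."""
    p, q = len(A), len(B)
    # d[i][j] holds delta(A_1^i, B_1^j); row/column 0 is the empty series Omega.
    d = [[0.0] * (q + 1) for _ in range(p + 1)]
    for i in range(1, p + 1):
        d[i][0] = d[i - 1][0] + gap_cost(A[i - 1])
    for j in range(1, q + 1):
        d[0][j] = d[0][j - 1] + gap_cost(B[j - 1])
    for i in range(1, p + 1):
        for j in range(1, q + 1):
            d[i][j] = min(
                d[i - 1][j] + gap_cost(A[i - 1]),                  # delete
                d[i - 1][j - 1] + match_cost(A[i - 1], B[j - 1]),  # match
                d[i][j - 1] + gap_cost(B[j - 1]),                  # insert
            )
    return d[p][q]

# Illustrative costs (hypothetical, not the paper's definition of Gamma):
match = lambda a, b: abs(a[0] - b[0])   # L1 cost on the space component
gap = lambda a: abs(a[0]) + 1.0         # cost of editing against Lambda

A = [(1.0, 1), (2.0, 2), (3.0, 3)]      # (value, timestamp) samples
B = [(1.0, 1), (3.0, 2)]
print(edit_distance(A, B, match, gap))
```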