HackMD - Collaborative Markdown Knowledge Base

## General Response First of all, we would like to thank the reviewers for their time and effort put into our paper. We will first answer the overall concerns raised by the reviewers followed by individual comments per reviewer. Following reviewer A's suggestion, we summarized existing approaches together with our approach in the table below. The overall goal of our work is to avoid coordination whenever possible because it is costly. All existing approaches that support non-commutative operations guarantee *convergence* by coordinating non-commutative operations, or by resorting to a stronger consistency model. Our approach (ECRO) is the only one that allows non-commutative calls to execute without coordination, and instead, deterministically reorders them at each replica. As for application-level invariants, existing approaches coordinate unsafe operations to avoid conflicts (i.e. invariant violations) at runtime. In contrast, ECRO allows unsafe operations to run concurrently if a safe reordering of the calls exists. **Applicability of operation reordering.** In general, concurrent non-commutative operations do not require coordination but can be ordered deterministically. A second category are operations that lead to a state transition, in which some concurrent operations may become unavailable (e.g. `placeBid` is not available on a closed auction, `register` is no longer available once an online game started). Those conflicts can be solved without coordination by reordering unavailable operations before transitioning to the new state. Note that conflicts caused by mutually exclusive operations require coordination to restrict the operations from running concurrently, e.g. if usernames must be unique then the `registerUser(username)` operation must take a lock on `username` to avoid that the same username is being registered concurrently. **Comparison to Related Work.** Besides the differences mentioned before, we now briefly compare our approach to the approaches explicitly mentioned in the reviews. Most approaches (Indigo, RedBlue, IPA, PoR, Quelea, Q9) start from a replicated data type, e.g. a CRDT, and annotate it with invariants. However, building correct replicated data types (RDTs) is known to be difficult and error-prone [1,2,3] and those approaches do not aid with this task. In the case of CRDTs, the modifications required to make operations commute are usually data type specific. In contrast, ECROs allow programmers to start from an *existing* sequential data type and augment it with application-level invariants. The runtime algorithm *automatically* guarantees convergence and invariant preservation with minimal coordination. IPA studied coordination-avoidance techniques by tweaking the operations' implementation to make them invariant preserving. However, such a technique requires manual intervention and complicates the data type implementation even more. Other approaches (RedBlue, PoR, Indigo, Hamsaz, Hampa, etc.) statically detect unsafe operations and coordinate them to avoid conflicts at runtime, thereby impacting latency. **Novelty of Ordana's static analysis.** The key novelty of our static analysis tool is that it not only detects invariant violating operations, but also finds coordinationless solutions. Furthermore, Ordana's analyses consider equality relations between the arguments of method calls to derive the precise conditions for non-commutativity and invariant violations. This is crucial to achieve fine-grained locks and thus reduce overall contention between operations, and it was not considered by prior RDT designs. To illustrate why existing analyses are too conservative, consider a replicated auction system (cf. section 2.3). The invariants for `placeBid(auction, user, price)` are that the auction is open, the user exists, and the price is greater than zero. All approaches, including ECRO, statically detect a conflict between concurrent `placeBid(auction, user, price)` and `closeAuction(auction)` calls. Indeed, if an auction is closed and a concurrent bid on that auction arrives later, it violates the invariant that bids can only be placed on auctions that are open. As a result, existing approaches coordinate all calls to `placeBid` and `closeAuction`. This is too conservative because only concurrent `placeBid` and `closeAuction` calls that modify the **same** auction are conflicting, yet no calls to `placeBid` and `closeAuction` are allowed to run concurrently. Since our safety analysis considers equivalence relations between the arguments of the calls, it precisely detects the reason for this conflict, and finds that reordering the `placeBid` operation locally before the concurrent `closeAuction` operation solves this particular conflict. Hence, ECRO allows calls to `placeBid` and `closeAuction` to run concurrently, while related work requires coordination between these operations. | | Consistency Guarantees | Convergence | Invariants | |-|-|-|-| | **ECRO** | Hybrid | by totally ordering non-commutative ops | reorder conflicting operations and coordinate them if no safe reordering exists | | CRDT | SEC | by design | not supported | | Indigo | Hybrid | by relying on CRDTs | coordinate unsafe operations, or, repair broken invariants after the facts | | RedBlue | Hybrid | by coordinating non-commutative ops | coordinate unsafe operations | | IPA | SEC | by relying on CRDTs | modify operation's implementation such that conflicts are avoided | | CISE | Hybrid | by relying on CRDTs | determines consistency models that avoid the conflict | | Quelea | Hybrid | picks weakest consistency model that guarantees convergence | picks weakest consistency model that upholds the user-defined contract | | Q9 | Hybrid | picks weakest consistency model that guarantees convergence | picks weakest consistency model that avoids the conflict | | Hamsaz | Hybrid | by coordinating non-commutative ops | coordinate unsafe operations | | Hampa | Hybrid | by coordinating non-commutative ops | coordinate unsafe operations | ## Individual Responses We now address the remaining concerns of each reviewer individually. ### Response to reviewer A > One of my frustrations with the paper as it stands is that I'm left without a real understanding of why ECROs are superior to the collection of existing RDT approaches that they build on. ECROs improve on existing RDT approaches in two ways. First, ECROs allow programmers to turn any (existing) sequential data type into a correct replicated data type that guarantees convergence and upholds the specified invariants. In contrast, when using CRDTs, one is limited to the portfolio of CRDTs that can be found in the literature. If no existing CRDT suits her needs, she has to carefully design a custom CRDT which is far from trivial due to the reliance on commutativity. It should be noted that every CRDT is an ECRO because the operations commute and the operations have no invariants, hence, the ECRO algorithm allows concurrent operations to be delivered in different orders at different replicas. Other approaches such as Indigo, IPA, and CISE extend CRDTs with invariants but do not help programmers with devising a CRDT in the first place. Second, existing approaches are too conservative leading to unnecessary coordination as illustrated with the auction system example, thereby, impacting latency and availability. The crux of ECROs is that many conflicts can be avoided by locally reordering operations without requiring coordination. Still, ECROs provide the same guarantees (convergence and invariant preservation) as existing RDT approaches. > Shouldn't commutativity be "property of method calls" rather than "property of methods"? Indeed, we meant that commutativity is a property of method calls, that’s the essence of definition 3.4. Note, however, that if for some pair of methods (m1, m2) *all* calls to these methods commute, we say that those methods commute. > What do "concurrent method (call)" and "sequential method (call)" mean in the context of the paper? […] It looks like they are not synonyms for "independent method call" and "dependent method call" We should definitely clarify this in the paper. Two method calls c1, c2 are sequential iff c2 executed after c1 was observed. Two method calls c1 and c2 are concurrent iff none observed the other when they executed on their origin replica. In practice, every method call is tagged with a vector clock which allows the algorithm to determine the causality between them. Dependent and independent method calls are properties of *sequential* method calls. If c1, c2 are sequential method calls, then c2 may be dependent on c1 or it may be independent. For example, in a replicated auction system, replica A may `openAuction(auction1)` and then `placeBid(50$, auction2)`. Even tough `placeBid(50$, auction2)` happened after `openAuction(auction1)`, it does not depend on the former because it could also execute before opening auction1. On the other hand, consider `openAuction(auction1)` followed by `placeBid(40$, auction1)`. Now, `placeBid(40$, auction1)` depends on `openAuction(auction1)` because you cannot bid on an auction that does not yet exist. Detecting dependencies between sequential method calls is important because it allows the algorithm to ignore causal relations (which are a side-effect of communication) between independent calls. > since the paper says (lines 364-6, among other places) that the Ordana analysis specifically builds on Indigo and Hamsaz, why not compare with those systems? Indigo starts from eventually consistent RDTs (that already guarantee state convergence) and extend them with invariants. To this end, they propose to use CRDTs and annotate them with invariants. Hence, one cannot readily implement RUBiS (or any other RDT that features non-commutative operations) on top of Indigo. In an attempt to implement RUBiS, one may compose existing RDTs (cf. section 2.3) but invariants such as referential integrity (e.g. every bid must be associated to an existing user) span several RDTs. They thus require an atomic update across RDTs, something that is not yet known to exist. We did not compare to Hamsaz or Hampa because there is no publicly available implementation of their approach. Moreover, it is not trivial to interpret the mathematical assumptions necessary to implement the proposed protocols. Nevertheless, Hamsaz and Hampa are similar to RedBlue since they coordinate non-commutative operations and unsafe operations (cf. table in high-level response). Hence, the figures for those approaches would look similar to the figures reported for RedBlue. ### Response to reviewer B > The runtime cost of serializing potentially conflicting calls depends on ensuring that a replica's execution graph is causally stable, which in turn depends on the rate in which new operations are generated. The scalability arguments presented in the paper are too limited to assess the consequence of this tradeoff in practice. Causal stability is a well-known problem in replicated data types. The original CRDT proposal suffers from unbounded memory growth. Later, pure op-based RDTs [4] were proposed to avoid unbounded memory growth by taking into account information about causally stable operations. The ECRO algorithm commits method calls when they become causally stable, thereby, effectively removing them from the replica’s execution graph. How long it takes for an operation to become causally stable indeed depends on the rate at which operations are generated and disseminated. To speed up causal stability, replicas may acknowledge received operations by sending back their own vector clock; as such it is not a problem if one replica only generates operations every now and then [5]. Furthermore, our approach targets geo-replicated data centers for which we can reasonably assume good interconnectivity. Hence, it is unlikely that a replica’s execution graph grows unbounded. > The paper does not provide any substantive details about the specification language other than what is given in the example Due to space constraints we elided a description of the specification language to the appendix. > the paper makes no assumptions on the guarantees provided by the underlying distributed store. How would the generated results be affected by a store that provides stronger guarantees than just EC? We should definitely clarify our assumptions in the paper. The algorithm assumes that safe method calls are reliably propagated to all replicas exactly once, i.e. using a reliable causal broadcast mechanism as is often required by CRDTs. For unsafe operations, the algorithm relies on a locking mechanism to coordinate the calls using the fine-grained lock derived by the analysis. Our approach can still be implemented on top of distributed stores providing stronger guarantees since stronger guarantees will not hinder correctness. However, it will impact the latency of operations and the overall availability of the system as some safe operations may be upheld unnecessarily by the stronger consistency model. For example, disseminating safe operations under causal consistency unnecessarily delays the reception of independent but causally related operations. Indeed, if an operation `o2` happened after an operation `o1` the receival of operation `o2` will be delayed until we received `o1` even if `o2` does not depend on `o1`. > Finally, with respect to conceptualization, there have been several contemporary efforts at relating high-level invariants with replication behaviour. Besides [4] and [16], Enea et. al (PLDI'19) discuss the use of high-level specifications for verifying the correctness of CRDTS; Kaki et. al (OOPSLA'19) explore (bounded) verification of RDTs with respect to high-level specifications for a system implemented in OCaml. The goal of those works is fundamentally different from ours. Those works start from a replicated data type and seek to prove its correctness, whereas, our approach seeks to derive correct replicated data types from sequential code. As explained before, designing replicated data types is difficult and those approaches do not aid the programmer with this task. In contrast, ECROs enable programmers to turn existing sequential data types into replicated data types without modifications to the implementation. To do so, the ECRO algorithm requires information derived from several static analyses which share some characteristics with the analyses conducted in those works. ### Response to reviewer C > The authors claim their techniques work with any data type. I find this somewhat surprising, because some data types may have methods with very complex ordering constraints By combining reordering with coordination, our work is able to address the class of conflicts that can be solved by reordering (as detailed above) and fallback to coordination otherwise, making our approach generally applicable. > […] It is unclear whether the constraints of this data type's interface are expressible in the annotation language and whether they are solvable by the static analysis component. Our approach works for all data types whose invariants can be expressed using first-order logic. Related work showed that many useful invariants of RDTs can be expressed using first-order logic, e.g. refential integrity and numeric invariants. Works such as Indigo, IPA, Hamsaz, Hampa, Quelea, and CISE also employ first-order logic to express application-level invariants. If the solver is not able to analyze an invariant it will either abort or timeout. In that case we coordinate the operations to guarantee correctness. For example, if the safety analysis for a pair of operations `o1` and `o2` times out, we don’t know whether this pair is safe or not. Since we are uncertain, we consider it to be unsafe, thereby, preserving correctness. Similarly, if the commutativity analysis times out, we assume the operations to be non-commutative. Note that all RDTs presented in this paper are solvable by mainstream SMT solvers (e.g. Z3) because their invariants remain within the decidable fragment of first-order logic. >The authors claim their techniques work with any data type. I find this somewhat surprising, because some data types may have methods with very complex ordering constraints. [..] My concern here would be assuaged either by a clear characterization of the data types targeted by this work, a more extensive set of data types in the evaluation section, or a combination thereof. We agree that the paper should better convey the characterization of the data types so that it is clear that the implemented RDTs are representative of a great variety of RDTs. First, we cover existing RDTs (sets, maps, lists, etc.). Then, we also implemented new RDTs for which no prior $($C)RDT design exists as they require coordination, e.g. stacks and queues. Finally, we implemented custom replicated applications such as the RUBiS auction system from sequential data types only (without having to build ad-hoc RDTs). We believe that the current portfolio of data types demonstrates the flexibility of our approach as it does not require user-defined merge procedures nor data type specific modifications to make the operations commute and preserve invariants. > It would be great to see more details about the complexity of the data types, the complexity of the annotations, and the performance characteristics of all replicated versions of these data types (in the sense of the geo-replicated benchmark of RUBiS). It would also be good to get an idea of how close to the optimum performance the resulting RDT is, for a variety of RDTs (>8). We will submit the implementation and specification of all replicated data types mentioned in this paper as an artifact (including the benchmarks) should the paper be accepted. Regarding their performance characteristics, we could benchmark each RDT individually, however, they all share the same pattern: side-effect free methods and commutative methods exhibit low to no overhead and thus have latencies that are close to optimum (i.e. similar to the sequential implementation), for non-commutative operations the latency decreases with the number of operations in the execution graph (cf. figure 4 and its explanation in section 6.3) which can be kept reasonably low by committing causally stable operations. ## References [1] Almeida, P.S., Shoker, A., Baquero, C.: Efficient State-based CRDTs by Delta- Mutation. In: Bouajjani, A., Fauconnier, H. (eds.) Int. Conference on Networked Systems. pp. 62–76. Springer-Verslag, Agadir, Morocco (2015) [2] Kleppmann, M., Beresford, A.R.: A Conflict-Free Replicated JSON Datatype. IEEE Trans. on Parallel and Distributed Systems 28(10), 2733–2746 (2017) [3] Shapiro, M., Pregui ̧ca, N., Baquero, C., Zawirski, M.: Conflict-free Replicated Data Types. In: Défago, X., Petit, F., Villain, V. (eds.) 13th Int. Symp. on Stabilization, Safety, and Security of Distributed Systems. pp. 386–400. SSS’11, Springer-Verslag, Grenoble, France (2011) [4] Baquero, Carlos, Paulo Sérgio Almeida, and Ali Shoker. Pure operation-based replicated data types. arXiv preprint arXiv:1710.04469 (2017). [5] Bauwens, Jim, and Elisa Gonzalez Boix. From causality to stability: understanding and reducing meta-data in CRDTs. Proceedings of the 17th International Conference on Managed Programming Languages and Runtimes. 2020.

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.