
We thank the reviewers for their detailed and insightful reviews. We first answer the individual questions of the reviewers, and then propose a changelist based on their suggestions.

Response

Reviewer A

Sources for incompleteness

The reviewer is correct that \(\phi_{src}\) and \(\phi_{dst}\) may flag certain programs as non-robust even though they are actually robust. The conditions do not account for the scenario where a non-maximal write \(\mu_w\) is in the \(rf\) relation to a CAS event: in this case \(\mu_w\) cannot cause a robustness violation when the next event to be added (say \(\mu\)) is a write to the same location as \(\mu_w\) and the maximal write is in \(hb_{SC}\) order to the session containing \(\mu\). Below, we give a simple litmus test that realizes this behavior. In every other scenario, \(\phi_{src}\) and \(\phi_{dst}\) faithfully model the absence of a non-robustness witness: the conditions are violated exactly when the maximal write to a location is in \(hb_{SC}\) but not \(hb\) order to a session just before another access to the same location, and, excluding the scenario described above, such a violation implies the presence of a non-robustness witness.

We have observed that in library implementations, either all writes to a location happen through CAS operations, or none do (i.e. it doesn't happen that there is both a CAS and a normal write operation to the same location). Notice that a CAS access to a location is required to manifest the above scenario. If all accesses to a location are through CASes, we use the CAS constraints as explained in Sec. 7 (L:983) to encode additional properties, which helps us avoid any false positives arising out of the imprecision of \(\phi_{src}\) and \(\phi_{dst}\).

We also note that it would be difficult and cumbersome to faithfully encode this scenario in \(\phi_{src}\) and \(\phi_{dst}\), since they do not model the non-maximal write event \(\mu_w\). We will include this discussion in the revised version.

T1          | T2                | T3
------------|-------------------|------------
W(l1,1,rlx) | U(l1,1,2,rlx,rlx) | R(l2,1,rlx)
            | W(l2,1,rlx)       | W(l1,3,rlx)

The litmus test that concretely demonstrates the scenario is given above. The update operation to l1 in T2 reads from the write to l1 in T1. The read in T3 to location l2 reads from the write in T2; hence the maximal write to l1 (the update in T2) is in \(hb_{SC}\) order, but not in \(hb\) order, since the accesses to l2 are relaxed. Now consider the write to l1 in T3. Excluding this write, the execution is not a non-robustness witness: even though the maximal write to l1 is in \(hb_{SC}\) but not \(hb\) order, W(l1,3,rlx) cannot be placed in \(mo\) order between W(l1,1,rlx) and U(l1,1,2,rlx,rlx), as this would induce an \(fr;mo\) cycle between W(l1,3,rlx) and U(l1,1,2,rlx,rlx). Hence, the only possible \(mo\) relation is from U(l1,1,2,rlx,rlx) to W(l1,3,rlx), which makes the execution robust. However, neither \(\phi_{src}\) nor \(\phi_{dst}\) is satisfied.

Examples that were studied in the previous papers [35,38]

We do not anticipate any difficulties in verifying the benchmarks from [35,38]. We have already verified Peterson's lock, Dekker's algorithm, and the Chase-Lev queue using the same sources as [35,38]. We will extend our evaluation to include the Spinlock, Seqlock, and Atomic Reference Counter (ARC) implementations from those papers.

Fully automated or not

The approach is fully automated; one only needs to provide the library implementation in C and a set of axioms in textual format. If verification fails with a SAT result within the time limit, a finite counter-example exists and is output, and the programmer may add further axioms to rule out such cases. In our evaluation, we could verify all libraries using just the set of axioms we propose in Section 8, L1037:L1063.

non-atomic accesses

The memory model we considered does not support non-atomic accesses. None of our benchmarks used non-atomic accesses.

whether/how release/acquire fences are supported

Release/acquire fences are supported by our method; their encoding appears in the definition of sw at L443. Our SMT implementation instantiates these sub-relations whenever an acquire/release fence instruction appears as part of a subgraph. Our benchmarks do make use of acquire/release fences and thus exercise this part of the SMT encoding.

Definition 4.2 talks about the next event. I don't see why such an event is unique

The next event of \(E'\) is in the context of a fixed complete execution \(E\), whose prefix is \(E'\). Notice that in \(E\), an interleaving among the threads has already been fixed, and hence for a given prefix \(E'\) of \(E\), the next event is guaranteed to be unique. Def. 4.3 and the conditions \(\phi_{src}\), \(\phi_{dst}\) are with respect to a fixed complete execution \(E\), and Theorem 4.6 (and Theorem 4.4) then universally quantifies over all complete executions \(E\), thereby considering all possible interleavings.

Thm. 4.6 is very strange.

Thm. 4.6 directly establishes the soundness of our main algorithm (Fig. 9), which separately checks for \(\phi_{src}\) and \(\phi_{dst}\). Because we are approximating over unbounded executions using the specification axioms, we often find that for some locations in a library, \(\phi_{src}\) can be inferred using the axioms, but not \(\phi_{dst}\). This does not mean that \(\phi_{dst}\) does not hold, but rather that we get a spurious counter-example. Hence, Thm. 4.6 includes the disjunction of both \(\phi_{src}\) and \(\phi_{dst}\). An alternative would be to separately show \(\phi_{src} \implies \phi_{dst}\), and only include \(\phi_{dst}\) in Thm. 4.6, but we believe this would slightly break the close correspondence between Thm. 4.6 and our main algorithm.

Supplementary material:

Dekker's lock is missing an exit method?

We have implemented a critical-section algorithm using Dekker's algorithm, so it would be better to rename the method to critical_section. We will clarify this point in the final version of the paper and explain it better in the artifact submission.

Peterson's algorithm has typos

We agree that the Peterson's lock code provided in the supplementary material has a typo. The version that is verified, however, is the correct one, corresponding to the implementation in the artifact of [38]. We will fix the typo in the revised version.

in Chase-Lev it seems that the client doesn't have any concurrency.

The client programs provided in the artifact were only for a sanity test to be able to run the benchmarks. We do not use the client program in any way, and only take as input the actual library implementation.

In the implementation of Chase-Lev it seems that additional SC-fences are placed

We have followed the implementation as provided by [38]. We will explicitly mention the placement of the additional SC fences needed to make it robust.

Reviewer B

Is your method actually guaranteeing DRF? If yes why did you choose to go with the more involved notion of execution graph robustness?

Execution graph robustness is a more precise condition than DRF, as mentioned in [38]; we cite verbatim: "A (very imprecise) robustness criteria against a programming language memory model emerges from the well-known DRF guarantee [], and our non-robustness witness can be viewed as a precise DRF condition".

Moreover, we verify lock-free library implementations, which are inherently racy and use low-level synchronization constructs such as CAS, thereby violating the DRF requirement.

Suppose that instead of induced subgraph robustness, you used the notion of observational robustness, with ignoring reads that do not affect the return value. What would go wrong? Would the results of your experiments change?

It is not clear how one would go about using observational robustness while discarding reads that do not affect the return value: the current technique for verifying observational robustness [38] uses model checking to exhaustively explore complete execution graphs, and cannot be used for library executions in the presence of a most general client that can make an unbounded number of invocations.

Reviewer C

Two sorts of important libraries, the string and the linked list, are missing

We have verified the non-blocking set library from [27], which internally uses linked lists with operations such as insertion and deletion anywhere within the list. We are not aware of any concurrent string library, but are open to adding it to the evaluation, if a reference is provided.

..I wonder whether the proposed approach for RC20 can be adapted to more weak memory models, and maybe any WMMs that can be declaratively formalized.

Since our robustness verification approach relies on identifying executions where two events are in \(hb_{SC}\) order but not in \(hb\) order, it can be applied to other formalisms of the C11 memory model such as RC11 and IMM. In general, if a WMM satisfies the condition that all of its executions can be searched by an inductive search over SC executions, then identifying certain sub-structures in the execution graph (for C11, an \(hb_{SC}\) but not \(hb\) relation between events) can lead to a formulation of the robustness violation for that specific memory model. Our approach of using the SC specifications to constrain this search would then be applicable.

I reckon that not all robustness violation are problematic, so have you manually checked if the libraries proven to be robust are actually correct?

The presence of a robustness violation implies that the library has behaviors under the weak memory model that are not observed under SC. Judging whether such a behavior is correct presupposes a specification of the library under weak memory that permits it. Specifying libraries under a weak memory model is a problem orthogonal to ours, and other works have indeed looked at specifications under weak memory (e.g., [42]). The advantage of our approach is that it enables automated verification, and the correctness of an application using a robust library (provided the application correctly places SC fences at library boundaries) can then be reasoned about under SC, which is a significantly simpler problem.

I wonder if your method could achieve 'vertical' compositionality

If a library calls another library, we can in principle inline the call and apply our technique for the larger monolithic library. For compositional reasoning, we would need the caller library to place additional SC fences as described in Sec. 6 to ensure end-to-end robustness. It may still be possible to use robustness guarantees of the callee library to ensure robustness of the caller without the use of fences; we leave this as part of future work.

Change List

  • Add a discussion in the paper about the incompleteness of our approach and the gap between the definition of a non-robustness witness and the conditions \(\phi_{src}\), \(\phi_{dst}\).
  • Extend the evaluation to include all benchmarks from [35,38].
  • Correct identified typos in the main paper and supplementary material.

End of rebuttal

The initial value of location X is 0. T1 tries to CAS X from 0 to 1, T2 reads X and writes X+1 back, and T3 runs a CAS loop until X is set to 2. Both under SC and under the weak model, the possible final values of X are 2 or 3. Thus the litmus test is robust, but our criterion rejects it: in the SC execution where the events of T1 are in \(hb_{SC}\) order to T2, the maximal write is WRITE(X, V+1), which is not in \(hb\) order to the next event, CAS(X, V, 2). However, from the example we know that the CAS in T3 retries if the read of V returned an old value of X before the CAS was attempted.

X = 0

T1           | T2              | T3
-------------|-----------------|-------------------------
CAS(X, 0, 1) | V = READ(X)     | V = READ(X)
             | WRITE(X, V+1)   | WHILE(~CAS(X, V, 2)) {}

Having said that, if we could apply observational robustness to show that the \(\rho_{CAS}\) transformation is robustness-preserving, i.e., that for every RC20 execution of the original library there is an SC execution of the transformed library with the same invocation events and an induced subgraph, then observational robustness could be used. Induced subgraph robustness, however, gives us a cleaner approach for proving soundness of the \(\rho_{CAS}\) transformation.

Corrected version:

T1          | T2                | T3
------------|-------------------|------------
W(l1,1,rlx) | U(l1,1,2,acq,rel) | R(l2,1,acq)
            | W(l2,1,rel)       | W(l1,3,rlx)