Untitled - HackMD

We thank the reviewers for their detailed and thoughtful comments.

Overview

Our response primarily focuses on (i) the technical questions posed by Reviewer-A; (ii) the relationship of our work with DRF-SC proofs as questioned by Reviewer-B; and (iii) Reviewer-C's questions on how our technique compares to model-checking approaches for weak memory.

Proposed Changes

In addition to addressing all the technical clarifications and minor typos raised in the reviews, our revision will provide a more detailed in-depth comparison with both DRF-based verification techniques as well as model-checking approaches developed for weak memory systems. We will also move salient material from the appendix to the main body of the paper as suggested by Reviewer-A.

Response

Review-A

Dependence on specifications: In order to discharge verification conditions, we typically require \(hb\) relations to be established among events. Some of them can be automatically inferred through static analysis (e.g. events in the same invocation, or update events caused by CAS statements). However, in some cases, we do require information about LPs. For example, in the register library, since there are no CAS statements, and we require \(hb\) information between events in different invocations, LPs provide the necessary information required to establish correctness.

Client locations: We require that a library implementations "owns" its accessed locations, and that they should not be accesible by clients. This is consistent with the assumptions made by previous work related to library verification [e.g., Raad et. al 2019], and is necessary for modular verification.

C11 with LB: Allowing LB introduces \(so \cup rf\) cycles which essentially allows read events to read from write events in the future (according to \(hb_{SC}\)). Our verification technique is based on maintaining equality between \(hb\) and \(hb_{SC}\) among write events in the past, which ensures robust behavior under RC20. Hence, our technique, as currently defined, cannot be used to detect LB type violations. We will add this requirement in the revised version. We leave the question of whether the libraries proven correct under our current method are indeed robust under C11-semantics with LB as a topic for future investigation.

Method calls inside other methods: We note that we only disallow method calls from inside other methods to simplify our formal treatment. Because our verification conditions operate at the lower granularity of memory events, it is not sensitive to whether these memory events are generated directly by statements inside a method body, or by a method call inside the body. In either case, our verification technique would still search for a potential or actual non-robustness witness.

Minor Comments:

Fig 3: Yes, there will be another case where \(\mu\) is W<L,> and \(hb_{SC}\) edge is added from \(\mu_w\) to \(\mu\) through the \(mo\) edge. For this case, \(\Phi_{src}\) is trivial to establish, because \(\mu\) would actually be the most recent write to \(L\), and there would be no \(hb_{SC}\) path from \(\mu\) to any other event in the prefix. Due to the triviality of this case and lack of space, we did not include it. And yes, all the cases including \(so\) are also trivial, and hence not included. We will mention the missing cases in the revised version.
374: This simply means that a session can only have one invocation running at a time. Once an invocation begins, it must complete by executing its return statement, before the next invocation can run in the same session. This is formalized in the INVK rule in the supplemental material, which requires the session \(\tau\) to not be active.
In the INVK rule, \(l\) should be \(m\), as rightly pointed out by the reviewer.
689-692: The \(\mu\) event in the definition of the potential non-robustness witness is essentially the event which causes the maximal write event \(\mu'\) to come in non-\(hb\) but \(hb_{SC}\) order before the session containing \(\mu\). In the definition of actual non-robustness witness (Def 4.3, L-593), we only require \(\mu'\) to come in non-\(hb\) \(hb_{SC}\) order before some event of the session \(\tau\) (in condition-1). Hence, the event \(\mu\) itself does not feature in Def. 4.3; its role is to just ensure conditions 1 and 2.
692: The reviewer is correct - \(\mu_w\) should be replaced by \(\mu'\).
879: The first condition ensures that for every RC20 execution of the original implementation \(L\), there is an RC20 execution of the transformed implementation \(\rho(L)\) with the same observable behavior. Since our goal is to show that \(L\) is effect robust, we need to show that for every RC20 execution of \(L\), there is a SC execution with the same observable behavior. Condition-1 allows us to instead consider RC20 executions of \(\rho(L)\). By showing \(\rho(L)\) to be execution graph robust, we ensure effect robustness of \(L\) (this argument is formalized in the proof of Theorem 5.4 in the supplemental material).
Fig 8: Yes, it should be \(hb^?\). However, the \(hb\) in Fig. 8 is actually the binary FOL predicate, and hence writing \(hb^?\) (which is supposed to denote the reflexive closure of \(hb\)) is unnecessary. In the actual encoding, we separately handle this case by constraining the two events \(\eta(i_w,l)\) and \(\eta(i_1,l_1)\) to be equal.

Review-B

Comparison with DRF: The DRF property associated with a relaxed memory model typically asserts that if a program does not have races under SC (where a race can be defined either operationally or declaratively) then the program will not exhibit new behaviors under the relaxed memory model. As observed by the reviewer, this closely resembles the definition of execution-graph robustness, and indeed prior work ([Lahav and Margarit, PLDI 19], [Margarit and Lahav, POPL 21]) have formally proved a correspondence between the two definitions (see e.g., p14 of the POPL'21 paper). In this work, however, our goal is to automatically verify that a library implementation satisfies this property across all of its (unbounded) executions. Towards this end, we propose a novel inductive technique that crucially defines both (a) a general inductive invariant to handle unbounded executions and (b) a verification strategy that takes advantage of the high-level SC specifications of the library that helps to strengthen the inductive hypothesis and reduces the space of possible executions that need to be examined. We are unaware of any prior DRF-based verification technique that provides an automated verification methodology in this setting.

Moreover, our definition of effect robustness is specifically tailored for dealing with libraries, which often may have benign races that do not affect its observable behavior; effect robustness allows us to ignore such races, which, in turn, leads to important precision gains.

Specification language: Our technique establishes effect robustness, which ensures that every observable behavior of the library implementation (in terms of invocation argument and return values) under RC20 should also be possible under SC (L-834, Def. 5.1). Towards this end, we only consider robustness of events that contribute (directly or indirectly) towards the return value. Hence, the example provided by the reviewer is indeed effect-robust, since the return values of both methods are independent of all the other operations. Because the 'fail()' operation is an internal library event, it does not have any impact on observable behavior.

We specify correctness of SC executions using only the invocation events in an execution (Line 545, Def. 3.2, notice that each axiom \(\mathcal{A}_i\) is only applied on \(\Gamma\), the set of invocation events). As such, specification axioms can only constrain argument and return values, as well as the happens-before ordering among invocation events. We will expand on this description in the revised version.

For specifications involving unreachability of a program point, if it can affect the observable library behavior, then our technique will indeed be applicable. In the example provided by the reviewer, if the fail() location affected the return value of the method, then the program would be neither effect-robust nor execution-graph robust, and this would be detected by our tool in the form of an explicit non-robustness witness.

Review-C

We note that the primary contribution of this work is a fully automated technique to verify robustness of library implementations assuming the most general client. We are unaware of any prior work that automatically establishes robustness of arbitrarily long executions involving an unbounded number of threads, only given an SC correctness specification.

Novelty of the proposed work: While the definition of C11 robustness facilitates incremental verification, all prior works targeting library robustness (including those given by the reviewer) only perform bounded verification, assuming a constant number of threads. Hence, we do not believe they can be directly used for verifying library implementations in the presence of a most general client. Further, performing bounded verification for library implementation is unsound, as the library may manifest non-robust executions depending on a certain number of threads/invocations. As a concrete example, note the implementation in Fig. 7 in the paper exhibits a non-(execution-graph) robust execution with at least 4 invocations. Running this program on existing model-checking tools assuming e.g., only 3 invocations would not reveal the robustness violation.

As a more realistic example, we created a simple concurrent set library implementation to demonstrate this problem. The set is implemented as a sorted linked list. The add operation is implemented as a search for the correct position in the list, followed by the allocation of a new node; field updates are implemented using CAS operations. Our test program creates a designated thread to check membership and multiple concurrent threads to add elements. We run this program through GenMC (https://plv.mpi-sws.org/genmc/) a state-of-the-art stateless model checker for weak memory systems to check for robustness violations. We note that if the CAS operations are relaxed (not acquire release), then in two threads no robustness violations are observed, although a violation is detected when using 3 threads. In the two thread case, the thread executing the add operations induces a total order on its CAS operations (due to session order). But in 3 threads, concurrent CAS operations in the 2 concurrent adding threads do not form a total order. We highlight that we directly run the model checker as-is on the library implementation and the test code, without any changes on our end. The implementation and results of running the model checker on the implementation can be found at this anonymized link - https://github.com/m5vnv4sjvtc/automated-robustness-rebuttal. The output of our tool, which successfully identifies the robustness violation, along with an explanation of its results is also provided.

Finally, we also note that performing incremental robustness verification in the unbounded case is not straightforward, as we cannot directly use robustness as an inductive invariant (see Line 660). Our main insight is to strengthen the inductive hypothesis with two conditions \(\Phi_{src}\) and \(\Phi_{dst}\), which together ensure robustness at both source and destination to make it inductive. Our second important insight is to demonstrate that both these conditions can be automatically inferred from the high level SC specification of the library.

Silent actions: These correspond to actions in an invocation that do not access the shared memory (Line 377). These actions will occur in the program semantics transition system (\(\Omega_L\)), and not in the memory model transition system.

States in \(\S 3\): The states of the program semantics LTS maintain necessary bookkeeping information such as active invocations per thread and local variables valuation per invocation. The LTS is described in detail in the supplemental material. The states of the memory model LTS are just execution graphs; this is the standard LTS formulation used in previous work.

Well-formedness conditions: We do require \(rf\) to be functional and total in its range. We thank the reviewer for pointing out this omission; we will be sure to add this in the revised version.

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`	在筆記中貼入程式碼
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.