We thank the reviewers for their detailed and insightful comments. For the revised version, we propose to (i) improve the presentation with simpler examples and more intuition; (ii) expand the evaluation section with additional details on metrics, queries, solution analysis, limitations, and scalability characteristics.
**Context** (All): We assume a library whose methods have fine-grained (and likely complex) specifications, presumably written to aid in its verification. Other component-based synthesis efforts consider a large number of candidates with simple specifications (i.e., types) that are filtered using additional constraints such as input/output examples. In contrast, our setting involves searching a sparse space of possible solutions, leveraging only the provided rich specifications to derive a solution; this leads to a significant shift in perspective and approach.
*All* the benchmarks in our evaluation were taken from verified libraries whose specifications were provided by the library authors, not by us (Sec. 5.1). Cobalt is intended to be used for synthesis tasks against any library that provides effectful specifications (e.g., VOCAL (https://github.com/ocaml-gospel/vocal), JS libraries specified using F* [28], etc.). The only input required from the user is the specification of the synthesis goal.
**Cobalt Output** (All): Fig. 10 shows the size of each synthesized result in terms of #AST nodes. The number of components generated ranges from 2 to 6, which is comparable to other systems [8 (Fig. 8), 12 (Table 1)].
**Scalability** (All): To address scalability concerns, we conducted two experiments: (1) in the presence of fine-grained specifications: we reran all our experiments against a single library concatenated from all three domains (45 methods total) and recorded at most a 6% increase in synthesis times, with 71% of the queries showing no change; and (2) in the presence of libraries whose methods have trivial specifications: we imported a purely functional library from H+ [11], a component-based synthesis tool for Haskell libraries, comprising 105 functions, none of which have effectful specifications, concatenated these functions with the effectful methods in our existing libraries (150 methods total), and reran our benchmarks. The maximum increase in synthesis times was less than 10%, with 67% of the benchmarks showing no change.
**Reviewer-1:**
_Bw-call rule applications and chaining:_ The backward call rule also **applies** to effectful components, for instance in the database benchmarks (D3, D4, D5) and the imperative benchmark (I3). In principle, the backward call rule can be applied whenever a library spec is precise enough to imply the given goal postcondition in an environment. Many benchmarks sequence multiple method calls in a form where backward reasoning is applicable; all the black bars in Fig. 10 represent cases where BW alone can find a solution.
_Framing:_ Please see L320-329 for motivation.
_Aliasing:_ Imperative OCaml libraries assume arguments are not aliased (see, e.g., the Jane Street libraries (https://opensource.janestreet.com/core)), obviating the need for separation formulas in specifications and allowing a simpler disjointness qualifier check to find the frame for a given method call.
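To illustrate the simpler check this enables, here is a minimal OCaml sketch (the names `footprint` and `frame` are purely illustrative, not Cobalt's implementation): under the no-aliasing assumption, the frame of a call is simply the set of goal locations not mentioned among the call's arguments.

```ocaml
(* Illustrative sketch only, not Cobalt's implementation.  Under the
   no-aliasing assumption, a location is in the frame of a call exactly
   when it is not among the call's arguments, so the frame can be found
   by a set difference rather than by separation-logic reasoning. *)
module Locs = Set.Make (String)

(* Locations (argument names) a method call may read or write. *)
let footprint (call_args : string list) : Locs.t = Locs.of_list call_args

(* Locations constrained by the goal that the call leaves untouched. *)
let frame ~(goal_locs : Locs.t) ~(call_args : string list) : Locs.t =
  Locs.diff goal_locs (footprint call_args)

(* Example: a call touching [t] leaves the goal's constraint on [q] framed. *)
let () =
  let fr = frame ~goal_locs:(Locs.of_list ["t"; "q"]) ~call_args:["t"] in
  assert (Locs.elements fr = ["q"])
```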
_Choice of per-component conflicting predicate and CDCL-experience:_ The reviewer's conjecture is correct; our choice reduces the number and complexity of generated SMT queries.
_Query Selection:_ For some queries, we use the specifications of verified program fragments (e.g., Table, Firewall, and Newsletter); for others, we use standard definitions of correctness (e.g., Parsers); and we also invent a few meaningful queries (e.g., the Queue queries).
_Other metrics:_ Our results do not report the number of constructors (databases: 4, imperative: 3, parsers: 5) or of Boolean-valued functions (databases: 5, imperative: 7, parsers: 3).
We used a 10-minute timeout; experiments were executed on a standard Linux laptop with 16 GB of memory.
**Reviewer-2**
_Usability:_ We view the challenge of writing meaningful effectful specifications as orthogonal to our contributions, but note that using input/output examples as a proxy for such specifications is non-trivial; e.g., it is not apparent to us how we would write test cases that capture effectful specifications involving predicates like mem and size (line 245).
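For concreteness, the hypothetical sketch below (using the stdlib Hashtbl as a stand-in for the table library of Fig. 1) shows what such a test would look like: it exercises mem and size on one concrete table and key, whereas the specification quantifies over all tables, keys, and pre-states.

```ocaml
(* A hypothetical illustration, using the stdlib Hashtbl as a stand-in
   for the table library of Fig. 1.  The effectful specification relates
   mem and size across *all* tables, keys, and prior states; a test can
   only sample one concrete instance of that relation. *)
let test_add_tbl () =
  let t = Hashtbl.create 16 in
  let n = Hashtbl.length t in
  Hashtbl.add t "key" 1;
  assert (Hashtbl.mem t "key");      (* one instance of mem *)
  assert (Hashtbl.length t = n + 1)  (* one instance of size *)

let () = test_add_tbl ()
```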
_Novelty of CDCL:_ The application of CDCL is novel because of (a) the complexity of the formulas learnt and discharged to solvers, i.e., propositional formulas over components in [6] vs. formulas from EUFA theories (see Fig. 6 and the CDCL_LEARN rule in Fig. 9); and (b) the need to prune infeasible paths using a path-sensitive implication check (the CDCL_CHOICE rule in Fig. 9).
_Comparison with other tools:_ We tried to make a closer comparison with other component-based synthesis tools. The most feasible candidate was H+ [11], since its pure types are close to Cobalt's. Unfortunately, using Cobalt on their benchmarks is not meaningful since their libraries are not effectful. Conversely, applying their tool to our libraries was problematic since it operates over a predefined set of libraries that cannot be extended (based on our communication with the authors).
_Run without CDCL:_ The NO-CDCL version, whose results we will report in the revised version, is substantially less efficient than the CDCL version: it is either unable to find a solution or an order of magnitude slower.
**Reviewer-3**
_Small-footprint:_ All our specifications are indeed local; e.g., in Fig. 1, add_tbl only specifies changes to size and membership but does not specify the minmax property. However, because we did not write any of the specifications for the libraries, this claim is anecdotal; we will clarify this point in the revision.
**Reviewer-4**
_Tests for Synthesis:_ An example of an unsound program for goal2 in Fig. 1 is _"add (s)"_, which violates the uniqueness constraint of the add_tbl library. Unfortunately, writing tests to filter out this program is challenging since (a) the user must now be aware of the specifications of each library function, and (b) must write tests that differentiate each library method's precondition.
_k-value:_ Yes, we used k=5 for all experiments.
_Formalization:_ Section 4.2 gives soundness and completeness arguments (with definitions for stuck-nodes and full proofs given in the Appendix).
_CDCL_CHOICE:_ We bias the choice towards the component whose specification has maximum overlap with the current goal specification, where overlap is computed over the set of qualifiers in the specifications. This heuristic efficiently filters out unrelated functions.
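For concreteness, here is a minimal OCaml sketch of this bias, assuming qualifiers are represented by name; `overlap` and `best_candidate` are illustrative names, not Cobalt's actual API.

```ocaml
(* A minimal sketch of the qualifier-overlap bias, assuming qualifiers
   are identified by name.  Not Cobalt's actual implementation. *)
module Quals = Set.Make (String)

(* Overlap = number of qualifiers a component's spec shares with the goal. *)
let overlap (goal : Quals.t) (spec : Quals.t) : int =
  Quals.cardinal (Quals.inter goal spec)

(* Prefer the component whose specification overlaps the goal the most;
   components sharing no qualifiers with the goal are filtered out. *)
let best_candidate (goal : Quals.t) (components : (string * Quals.t) list) :
    string option =
  let ranked =
    components
    |> List.filter (fun (_, spec) -> overlap goal spec > 0)
    |> List.sort (fun (_, s1) (_, s2) ->
           compare (overlap goal s2) (overlap goal s1))
  in
  match ranked with [] -> None | (name, _) :: _ -> Some name
```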