# FSE 2022 rebuttal
# Reviewer A
## Q1: Why is the relation between each model and its implementation (correct?, complete?) not a factor that may significantly influence your findings?
The relation between models and their implementations is undoubtedly a significant factor in any such work. In this work, models are _executable instances_ of the process and of the controller:
* In the MIL setup, the model of the process is a set of equations that describe the process' physics and the model of the controller is a set of equations that describe the evolution of the actuation variables given the sensed data.
* In the SIL and HIL setups, the controller is the actual software that runs on the drone, and hence there is no controller model. However, the physics model implementation is exactly the same as in the MIL setup.
The _unique_ physics model is considered a good approximation of reality (correct, but not necessarily complete); unique in the sense that we use the [same implementation](https://github.com/dummy-testing-abstractions/cps-testing-abstractions/blob/main/testing-frameworks/mitl/Model.py) of the physics model in the MIL, SIL, and HIL setups. Hence, the model does not influence our comparison between MIL, SIL, and HIL; this was an explicit choice. The PIL setup is also used to confirm that the physics model is a good enough approximation for our testing purposes.
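To make this concrete, the sketch below illustrates what we mean by an _executable_ physics model that can be shared verbatim across setups. This is a minimal illustration, not the actual `Model.py` linked above: the class name, the state layout (a vertical double integrator), and the parameter values are assumptions made only for this example.

```python
import numpy as np

class PhysicsModel:
    """Illustrative discrete-time physics model: vertical thrust and gravity only.

    The point is that the very same object can be stepped by the MIL loop,
    by the SIL hardware emulator, or by the HIL bridge, which is what keeps
    the physics identical across the three setups.
    """

    def __init__(self, mass=0.03, dt=0.001, g=9.81):
        self.mass, self.dt, self.g = mass, dt, g
        self.pos = np.zeros(3)  # position [m]
        self.vel = np.zeros(3)  # velocity [m/s]

    def step(self, thrust):
        """Advance the physics by one time step, given the total thrust [N]."""
        acc = np.array([0.0, 0.0, thrust / self.mass - self.g])
        self.vel += acc * self.dt
        self.pos += self.vel * self.dt
        return self.pos.copy()  # what the (possibly emulated) sensors observe
```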
## Q2: What is the evidence that the common understanding of the three abstractions you study is a hierarchy?
We first and foremost note that the previous literature calls the different setups "testing levels". Since a set of levels is ordered, the naming itself implies a hierarchy. For this reason we explicitly avoided the term "testing levels" and instead talked about "setups" and "frameworks".
The hierarchical nature of the testing setups (levels) is implied in the following publications:
* [85, pp. 13-14] discusses how each testing setup adds detail to the testing capabilities (rather than complementing them),
* [36, pp. 3] discusses the increasing level of integration of the different testing setups,
* [35] and [74] discuss the re-use of test cases across testing setups and their incremental nature in approximating real-world behaviour,
* [This work](https://ieeexplore.ieee.org/document/5381627) makes the following statement on page 2: _With every new test level, the test object becomes more similar to the real system._
# Reviewer B
## Q1: Why is your empirical study conducted on only one CPS?
Our target domain is _control_ CPS. Every control application repeats a sequence of operations: (1) sense signals from the environment, (2) calculate a control action, (3) actuate the control action. While we chose only one application, it is a good representative of _control_ software in general.
Also, we carefully chose the specific application to include the most common control algorithms, namely Kalman filters and PID controllers (the industrial survey "Increasing Customer Value of Industrial Control Performance Monitoring - Honeywell's Experience" showed that 97% of the controllers worldwide are PIDs).
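To illustrate this sense-compute-actuate structure, here is a minimal, self-contained sketch of a PID-based control loop; the gains, names, and loop structure are placeholders for the purpose of this answer and are not the drone's actual firmware.

```python
class PID:
    """Textbook discrete-time PID controller; gains and sampling time are placeholders."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def compute(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def control_loop(sense, actuate, setpoint, pid, steps):
    """The sense -> compute -> actuate cycle common to control applications."""
    for _ in range(steps):
        measurement = sense()                        # (1) sense the environment
        action = pid.compute(setpoint, measurement)  # (2) calculate the control action
        actuate(action)                              # (3) actuate the control action
```

In the MIL setup, `sense` and `actuate` would read from and write to the physics model; in the PIL setup they would correspond to the drone's real sensors and motors.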
Furthermore, we would like to stress that:
* As far as we can tell, this is the only scientific work that implements _all_ the testing setups (MIL, SIL, HIL, and PIL) for a non-trivial and complete control application. On top of the development and nominal-conditions testing, injecting 10 bugs in 3 different setups (SIL, HIL, and PIL) requires performing and analysing 30 (repeated) test flights.
* Even in its current version, we had to exclude many details (our case study takes 5 of the 10 pages) and limit the number of injected bugs. Developing more than one case study would make it impossible to describe the findings in detail, analyze the root cause of each bug, and discuss its successful or unsuccessful detection in each setup.
## Q2: What is the rationale behind the bugs listed in Table 2? Are these common bugs? How can you categorize them?
The literature on common bugs in control software is very limited. The bugs we used are:
* taken from the Bitcraze repository (mining it would not give good results because of inconsistent naming conventions), and
* inspired by the literature [77, 80].
[77] is a survey from a robotics competition, and [80] provides a list of bugs that were found in a real application. Neither of these works includes the actual code that would be needed to replicate the bugs. As written in the paper, we nonetheless drew inspiration from the reported bugs to determine a set of bugs that are relevant in control systems (and we _provide the code_ to replicate them).
The bugs we injected can be categorized into:
1. functional (`voltageCompCast`, `initialPos`, `flowGyroData`, `motorRatioDef`, `simUpdate`),
2. low-level firmware-related (`byteSwap`, `gyroAxesSwap`, and partially `motorRatioDef`), and
3. timing-related (`timingKalman`, `flowDecktTiming`, and `slowTick`).
This categorization is discussed in lines 934-941, and introduces the answer to RQ1.
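For illustration only, the hypothetical Python fragments below convey the flavor of a functional and of a timing bug; the actual injections are the patches in the linked repository and target the drone software, so the names and values here are assumptions made just for this answer.

```python
import numpy as np

def initial_position(buggy=False):
    """Flavor of a functional bug (in the spirit of `initialPos`): a wrong
    initial condition biases the position estimate from the first iteration."""
    if buggy:
        return np.array([0.0, 0.0, 0.5])  # spurious 0.5 m offset on z (hypothetical)
    return np.zeros(3)

def control_period_ms(buggy=False):
    """Flavor of a timing bug (in the spirit of `slowTick`): the control task
    is scheduled at half the intended rate, so it acts on stale measurements."""
    if buggy:
        return 2  # 500 Hz instead of the intended 1 kHz (hypothetical)
    return 1
```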
# Reviewer C
## Q1: How did you ensure the consistency of the fault-injection implementation across multiple abstraction levels?
There is no guarantee of consistency between the MIL and the other setups. In fact, the MIL setup is mainly used for the development of the high-level control strategy (e.g., what are reasonable values for the PID controller gains, is the closed-loop system stable) and does _not_ include the actual controller implementation.
On the contrary, consistency among the remaining _software implementations_ is guaranteed in the SIL, HIL, and PIL setups because the drone software is _exactly the same_. We inject each bug only once in the drone software and then run the same software. For each bug, we provide [a single corresponding software patch](https://github.com/dummy-testing-abstractions/cps-testing-abstractions/tree/main/bugs) that is then used for _all_ setups. We execute the actual (nominal or patched) software in all three setups, either emulating the hardware and simulating the physics (SIL), or simulating only the physics (HIL), or in the actual flight conditions (PIL).
## Q2: Please clarify the testing input, strategy, and objectives.
We agree that the choice of the test cases is highly relevant. We made the following choices:
* Testing input: Our testing input is the position that we ask the drone to reach.
* Testing strategy: We test the drone with step changes in the x, y, and z-axes. Step responses are used in control engineering because of their wide frequency spectrum, which allows for the evaluation of most of the main properties of the control algorithm (e.g., stability, speed of convergence).
* Testing objective: The objective of the testing campaign is to assess whether the implementation of the control algorithm behaves similarly to the MIL oracle (i.e., to the performance that the control engineer expects the drone to achieve, in terms of how close the position gets to the setpoint and how quickly the drone reaches it); a minimal sketch of this evaluation is given below.
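To make the test cases concrete, here is a minimal sketch of a step-change test input and of its comparison against the MIL oracle; the function names, amplitudes, and tolerance are illustrative assumptions, not the exact values used in the paper.

```python
import numpy as np

def step_setpoints(axis, amplitude=0.5, t_step=2.0, duration=10.0, dt=0.01):
    """Step change on one axis: the position reference jumps from 0 to
    `amplitude` at t_step and is held until the end of the test."""
    t = np.arange(0.0, duration, dt)
    ref = np.zeros((len(t), 3))
    ref[t >= t_step, axis] = amplitude
    return t, ref

def matches_oracle(measured, oracle, tol=0.05):
    """Compare a recorded trajectory (SIL/HIL/PIL) against the MIL oracle.
    Here this is simply a bound on the worst-case deviation; in the paper the
    comparison is in terms of closeness to the setpoint and speed of
    convergence."""
    return np.max(np.abs(measured - oracle)) <= tol
```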
## Q3: Please clarify the relation of the proposed testing methods and CPS falsification/testing techniques.
For control systems, falsification could be, for example, the selection of a negative desired value for the z-coordinate (which is unachievable). Alternatively, one may think of a malicious attacker spoofing the measurement values to obtain a specific control action (this would, for example, be equivalent to introducing a bias, like the `initialPos` bug).
We consider test cases that are affected only by the bugs injected in the software and not by manipulation of the requests, so we do not provide any contribution towards falsification. However, we provide insights on which testing setup is most suitable for falsifying properties, depending on the type of property considered (e.g., for timing properties, the SIL setup seems the best choice for the considered drone).
## Q4: Please highlight the contribution and effort on each contribution of this paper.
We take this opportunity to clarify the practical contributions of the paper. We implemented each testing setup from scratch:
1. For the MIL setup, we implemented (in Python) the models of the controller, the state estimator, and the physics. The physics model is then also used in the SIL and HIL setups.
2. For the SIL setup we developed: (1) the simulator of the hardware, and (2) the software tools to make it interact with the Python model of the physics.
3. For the HIL setup we implemented the software to make the drone fly in our Python model of the physics, rather than in the real world.
4. For the PIL setup, we implemented the real-time logging of the relevant quantities during real flights.
In addition, we also developed the buggy versions of the drone software (for the bugs that are not taken from the Bitcraze repository).
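Finally, as an illustration of how these pieces fit together, the sketch below shows the kind of glue that lets the same physics model serve the MIL, SIL, and HIL setups; the class and method names are placeholders and not the actual tooling in the repository.

```python
class PhysicsBridge:
    """Illustrative glue between whatever closes the control loop and the
    shared physics model (any object exposing a step(thrust) method, like the
    sketch in the answer to Reviewer A).

    In MIL the caller is the controller model, in SIL the emulated firmware,
    and in HIL the real firmware over a communication link; the physics
    behind the bridge never changes.
    """

    def __init__(self, physics):
        self.physics = physics

    def on_actuation(self, thrust):
        """Forward the actuation to the physics and return the sensed state
        that is fed back to the controller."""
        return self.physics.step(thrust)
```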