## General Response Regarding the Robustness of StegaStamp
Our experiments on StegaStamp were conducted under a constrained noise scale, as described in the "Proposed Attacks" paragraph of the Evaluation section. Specifically, the noise scale was fixed at 60 diffusion steps for all evaluated watermarking techniques, including StegaStamp, so that the utility loss remained comparable across watermarks.
StegaStamp's relatively poor visual quality and large perturbation, documented in Table 2 (especially in terms of PSNR and FID) and Figure 3 of our manuscript, make it more resilient to watermark removal under a fixed, moderate noise scale. However, our new experiments show that increasing the noise scale of the diffusion attack substantially degrades StegaStamp's robustness. The results below illustrate this effect:
| Steps | Noise level | Avg. bit accuracy | Avg. detection accuracy |
| ----- | ----------- | ----------------- | ----------------------- |
| 60 | 0.251 | 0.861 | 0.991 |
| 150 | 0.457 | 0.709 | 0.861 |
| 200 | 0.571 | 0.658 | 0.677 |
| 250 | 0.696 | 0.614 | 0.405 |
| 300 | 0.832 | 0.585 | 0.229 |
| 350 | 0.988 | 0.558 | 0.087 |
| 400 | 1.164 | 0.546 | 0.062 |
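For reference, the "Noise level" column corresponds to the effective standard deviation of the Gaussian noise injected by running the forward diffusion process for the given number of steps. The sketch below is a minimal illustration of this mapping, assuming Stable Diffusion's default scaled-linear β-schedule; the schedule name and constants are illustrative assumptions rather than an excerpt of our attack code.

```python
import numpy as np

# Illustrative assumption: Stable Diffusion's scaled-linear beta schedule
# (beta_start = 0.00085, beta_end = 0.012, 1000 training timesteps).
betas = np.linspace(0.00085 ** 0.5, 0.012 ** 0.5, 1000) ** 2
alphas_cumprod = np.cumprod(1.0 - betas)

def noise_level(steps: int) -> float:
    """Effective std of the injected noise relative to the (rescaled) signal
    after running the forward diffusion process for `steps` timesteps."""
    a_bar = alphas_cumprod[steps - 1]
    return float(np.sqrt((1.0 - a_bar) / a_bar))

for t in (60, 150, 200, 250, 300, 350, 400):
    print(f"steps={t:3d}  noise_level~{noise_level(t):.3f}")
```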
## Reviewer A
**R1. On the Robustness of StegaStamp**
We kindly refer you to our general response.
**R2. Recent Work in Watermarking**
We thank the reviewer for bringing this concurrent work, WOUAF [1], to our attention. Following your suggestion, we attempted to evaluate its resilience to our attack; however, its code is not publicly available at this time.
We have reached out to the authors of WOUAF and will incorporate an evaluation of WOUAF in the revision.
## Reviewer B
**R1. Trade-off Curve for StegaStamp**
Thank you for your suggestion!
In our paper, we show this trade-off curve against the DwtDctSvd watermark (Figure 6). Upon your request, we have generated a table for StegaStamp that exhibits the trade-off under different noise scales. We kindly refer you to the table in our general response.
**R2. Quantifying Utility Loss for Tree-Ring**
Indeed, the preservation of semantic meaning is not easily measured with conventional metrics like L2.
In the original Tree-Ring paper, the authors employed the Fréchet Inception Distance (FID) [2] to assess generation quality and the CLIP score [3] (computed with OpenCLIP-ViT/G) to assess semantic consistency. Alternative metrics such as the BLIP score [4] and ImageReward [5] are also available for assessing vision-language alignment.
In light of your suggestion, we will discuss and evaluate more on semantic utility loss quantification in the revision.
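Should it help, here is a minimal sketch of how a CLIP-based semantic-consistency score can be computed with the open_clip library; the model/checkpoint tags, file name, and prompt below are illustrative assumptions rather than our exact evaluation setup.

```python
import torch
import open_clip
from PIL import Image

# Illustrative model/checkpoint tags; the evaluation may use a different OpenCLIP variant.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-g-14", pretrained="laion2b_s12b_b42k"
)
tokenizer = open_clip.get_tokenizer("ViT-g-14")

image = preprocess(Image.open("generated.png")).unsqueeze(0)   # hypothetical file
text = tokenizer(["a photo of an astronaut riding a horse"])   # hypothetical prompt

with torch.no_grad():
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(text)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    clip_score = (img_feat @ txt_feat.T).item()  # cosine similarity in [-1, 1]

print(f"CLIP score: {clip_score:.3f}")
```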
## Reviewer C
**R1. > "Guarantee similar to differential privacy"**
The reviewer is right that we used ideas and techniques from differential privacy. Our technical contribution is a novel application of modern techniques (e.g., f-DP and GDP) from the DP literature to a new problem.
Let us summarize the main differences:
- f-CWF differs from DP (and f-DP) in that f can depend on the individual (image, watermark) instance pair. This distinction is important because it allows us to quantify the effect of the embedding phi via the "local Lipschitz" property.
- The utility bound (Theorem 4.8) is new and relatively clean.
- The use of modern techniques (which is why the presentation may seem "contrived") is needed to obtain the tight characterization of the impossibility region shown in Figure 4. With classical (eps, delta)-DP, the certified region would be much smaller (and less valuable in practice); see the short sketch after this list.
- Unlike most DP mechanisms, we do not need to inject noise artificially; the noise is inherent to the diffusion model.
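For concreteness, the sketch below evaluates the Gaussian trade-off function from the f-DP / GDP literature that underlies the tight region in Figure 4; the mapping from the embedding-space perturbation and the attack's noise scale to the parameter mu is an illustrative assumption here, not a verbatim restatement of our theorem.

```python
import numpy as np
from scipy.stats import norm

def gaussian_tradeoff(alpha: np.ndarray, mu: float) -> np.ndarray:
    """mu-GDP trade-off function: a lower bound on any detector's
    false-negative rate when its false-positive rate is `alpha`."""
    return norm.cdf(norm.ppf(1.0 - alpha) - mu)

# Illustrative values: mu grows with the watermark's perturbation in the
# embedding space (via the local Lipschitz property of phi) and shrinks
# with the noise scale sigma used by the attack.
embedding_perturbation, sigma = 0.5, 1.0
mu = embedding_perturbation / sigma

alphas = np.linspace(0.01, 0.99, 5)
print(np.round(gaussian_tradeoff(alphas, mu), 3))
```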
**R2. > "Adding Gaussian noise was proposed before"**
The reviewer is right. Unlike existing works, we propose adding Gaussian noise in the "embedding space" rather than in the raw pixel space. Moreover, existing works do not provide formal guarantees of watermark removal, whereas we do.
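As a minimal sketch of the embedding-space noising step (the VAE checkpoint, file name, and noise scale below are illustrative assumptions; in the actual attack the noisy latent is denoised by the diffusion model rather than decoded directly):

```python
import torch
from PIL import Image
from torchvision import transforms
from diffusers import AutoencoderKL

# Illustrative checkpoint; any Stable-Diffusion-compatible VAE would do.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").eval()

preprocess = transforms.Compose([transforms.Resize((512, 512)), transforms.ToTensor()])
img = Image.open("watermarked.png").convert("RGB")   # hypothetical file
x = preprocess(img).unsqueeze(0) * 2.0 - 1.0          # scale to [-1, 1]

with torch.no_grad():
    z = vae.encode(x).latent_dist.sample()        # embedding-space representation
    z_noisy = z + 0.5 * torch.randn_like(z)       # Gaussian noise in latent space (sigma = 0.5, illustrative)
    x_rec = vae.decode(z_noisy).sample            # placeholder; the attack denoises z_noisy instead
```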
**R3. Presentation Issues**
We will strive to address all highlighted issues in the next revision.
## Reviewer D
**R1. > "Table 5 does not report L2" distance**
The PSNR that we reported is computed directly from the L2 distance: PSNR = 10 * log_10(N / d^2), where N is the number of pixels, d is the L2 distance, and pixel values are normalized to [0, 1]. PSNR is therefore a monotone (decreasing) function of the L2 distance.
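A minimal sketch of this conversion, assuming pixel values normalized to [0, 1] (the image size and distance below are made-up numbers for illustration):

```python
import numpy as np

def psnr_from_l2(l2_dist: float, num_pixels: int, max_val: float = 1.0) -> float:
    """PSNR computed from the L2 distance.
    MSE = l2_dist**2 / num_pixels, so PSNR = 10 * log10(max_val**2 / MSE)."""
    mse = l2_dist ** 2 / num_pixels
    return 10.0 * float(np.log10(max_val ** 2 / mse))

# Example: a 512x512 RGB image (3 channels) with an L2 perturbation of 5.0.
print(round(psnr_from_l2(5.0, 512 * 512 * 3), 2))  # roughly 45 dB
```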
**R2. > "utility of the theory"**
The advantage of the theory is that it is **future-proof**. Our results cover all future watermarks and detection methods, not just those that we empirically evaluated. Figure 4 clearly demonstrates the utility of our theory and its relevance to practice.
**R3. StegaStamp and StableSignature**
We believe we understand your question about how to interpret our empirical results for StegaStamp and StableSignature, the extent to which they justify our claim that "all invisible watermarks are removable", and why the results may appear "contradictory". Let us address this concern with the following arguments.
First, our results describe a fundamental trade-off between the "distortion" introduced by the watermark (measured in L2 distance) and its "security" against removal attacks. It is not a binary predicate but a continuous Pareto front. Specifically, if the watermark perturbs the image with a larger L2 distance (as StegaStamp and StableSignature do), our attack must add larger noise to achieve the same level of "watermark-free"-ness, and the reconstructed image will accordingly be less similar to the original.
Figure 8 in our paper clearly demonstrates that StegaStamp and StableSignature add substantially more distortion in L2 distance than the other methods, whereas the amount of noise we add in the attack is the same across all methods. It is therefore not surprising that the StegaStamp detector remains more effective than the others (as in Table 1). To demonstrate this, we have added experiments with a range of noise levels (pasted in our general response). Notably, when the number of diffusion steps is set to 400, the detection accuracy drops to merely 0.062.
**R4. > "StegaStamp give SSIM of 0.91 while "our reconstructed image has SSIM of 0.7". "How can you claim that StegaStamp suffers more visual artifacts, but the reconstructed images are good?"**
Note that we did not claim that the reconstructed image is closer to the original than the watermarked image is. In fact, we expect the reconstructed image to differ more from the original than the watermarked image does; that is the price to pay for removing the watermark. The exception is when a watermark is "noise-like" (e.g., high-frequency noise); only then can we hope that the attacked image is closer to the original.
This may appear to be at odds with our Theorem 4.8, but note that Theorem 4.8 is a relative guarantee. It proves that if the original image + noise can be effectively denoised, then the watermarked image + noise can be denoised *almost as effectively*.
Our hunch is that removing a larger-perturbation watermark like StegaStamp requires introducing larger noise; therefore the baseline (original image + noise), after denoising/reconstruction by the diffusion model, will itself differ from the original image (e.g., in the range of SSIM = 0.7).
Lastly, we want to emphasize that being different from the original image does not necessarily mean lower visual quality (thanks to the ability of Stable Diffusion to "hallucinate" details). Even at SSIM = 0.7, the reconstructed image can still be visually appealing and semantically similar to the original image. We demonstrate this with more examples in the attachment.
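For completeness, a minimal sketch of the SSIM comparison discussed above (the file names are hypothetical placeholders):

```python
import numpy as np
from PIL import Image
from skimage.metrics import structural_similarity as ssim

original = np.asarray(Image.open("original.png").convert("RGB"))
watermarked = np.asarray(Image.open("watermarked.png").convert("RGB"))
reconstructed = np.asarray(Image.open("reconstructed.png").convert("RGB"))

# SSIM(original, watermarked) is typically higher than SSIM(original, reconstructed);
# the gap is the price paid for removing the watermark.
print(ssim(original, watermarked, channel_axis=-1, data_range=255))
print(ssim(original, reconstructed, channel_axis=-1, data_range=255))
```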
**R5. > why "attack should increase both FPs and FNs." "only FNs matter."**
Our result covers all detectors, not just those designed for the watermark. That includes the trivial detector that always outputs "Yes, watermarked!", which has FN = 0 but is not useful. Requiring the attack to affect both FPs and FNs is precisely what rules out such trivial detectors.
### References
[1] Kim, Changhoon, et al. "WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models." arXiv preprint arXiv:2306.04744 (2023).
[2] Heusel, Martin, et al. "GANs trained by a two time-scale update rule converge to a local Nash equilibrium." Advances in Neural Information Processing Systems 30 (2017).
[3] Radford, Alec, et al. "Learning transferable visual models from natural language supervision." International Conference on Machine Learning. PMLR, 2021.
[4] Li, Junnan, et al. "BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation." International Conference on Machine Learning. PMLR, 2022.
[5] Xu, Jiazheng, et al. "ImageReward: Learning and evaluating human preferences for text-to-image generation." arXiv preprint arXiv:2304.05977 (2023).