Introducing zkDL++, a novel framework designed for provable AI. Leveraging zkDL++, we address a key challenge in generative AI watermarking: maintaining privacy while ensuring provability. Building on the watermarking system developed by Meta, zkDL++ lets the watermark extractor remain private, as it must to resist attacks, while still allowing its output to be proven to a third party. Beyond watermarking, zkDL++ can prove the integrity of any deep neural network (DNN) with high efficiency. In this post, we outline our approach, evaluate its performance, and propose avenues for further optimization.
Introduction
Stable Signature is a watermarking (WM) scheme introduced by Meta AI to identify images generated by their latent diffusion model (LDM). It uses a Convolutional Neural Network (CNN) to extract the WM and is very robust to image tampering: detection still works if the image has been filtered, cropped, compressed, etc. However, Stable Signature has a drawback: the extractor model must be kept private, because exposing it makes the WM susceptible to attacks. This drawback becomes significant whenever the ownership of an image is disputed. Meta, or any other company using such a WM method, can run the extractor and see that a certain image was generated by their model, but they cannot prove it to a third party, such as social media users, without exposing the weights of the extractor model. Such exposure could render the WM system obsolete.
We propose using the emerging technology of Zero-Knowledge Proofs (ZKPs) as a solution to this problem. In a ZKP scheme, a prover wishes to convince a verifier that they know a witness $W$ that satisfies a certain functional relation $F(X,W)=0$, without revealing $W$. The relation $F$ is public and is generally represented by an arithmetic circuit. The variable $X$ contains the public input and output of the circuit. In the case of Stable Signature, the relation $F$ is the architecture of the WM extractor, with its output compared to the key, and it is satisfied if and only if the WM extracted from the input image matches the key. The public variable $X=(x_{in},x_{out})$ includes the image and the Boolean output, and the private witness $W=(w,k)$ includes the weights of the extractor and the key, as depicted in the diagram above. A verifier can then run a verification algorithm that tests the proof and returns True if the relation is indeed satisfied, and False otherwise.
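To make the roles of $X$ and $W$ concrete, here is a minimal Python sketch of the relation itself, evaluated in the clear. It is not a zero-knowledge proof and not zkDL++ code: the one-layer `extract_bits` extractor, the class names, and the exact bit-for-bit comparison are all hypothetical stand-ins for illustration. We also assume that $F(X,W)=0$ means the public Boolean $x_{out}$ agrees with the comparison between the extracted WM and the key.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PublicInput:
    """X = (x_in, x_out): visible to both prover and verifier."""
    image: List[float]  # x_in: toy stand-in for a flattened image
    is_match: bool      # x_out: claimed Boolean output


@dataclass
class Witness:
    """W = (w, k): known only to the prover."""
    weights: List[List[float]]  # w: one row of weights per watermark bit
    key: List[int]              # k: secret watermark key bits


def extract_bits(weights: List[List[float]], image: List[float]) -> List[int]:
    """Hypothetical extractor: a single linear layer plus a sign threshold."""
    return [
        1 if sum(w * p for w, p in zip(row, image)) > 0 else 0
        for row in weights
    ]


def relation(x: PublicInput, w: Witness) -> int:
    """F(X, W): returns 0 iff x_out equals (extracted bits == key)."""
    matches = extract_bits(w.weights, x.image) == w.key
    return 0 if matches == x.is_match else 1


# Toy values, chosen only for illustration.
witness = Witness(weights=[[1.0, -1.0], [-1.0, 1.0]], key=[1, 0])
claim = PublicInput(image=[0.9, 0.1], is_match=True)
satisfied = relation(claim, witness) == 0
```

In an actual ZKP system, the prover would evaluate this relation inside an arithmetic circuit and publish only a proof, so the verifier learns that `relation` returned 0 without ever seeing `witness`.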
An additional challenge is ensuring that the weights weren’t deliberately chosen to produce a specific result for a given image. Fortunately, this issue can be easily resolved by the following: