---
title: zkPDF
---
# zkPDF
A set of zero-knowledge SP1 circuits and tools for proving facts from digitally signed PDFs without revealing the full document. zkPDF combines digital signature checks with selective content proving, enabling privacy-preserving claims from the most widely used document format.
## Verification Process
The `verify_pdf_signature` function performs two main checks:
1. **Content Integrity Check**
2. **Signature Authenticity Check**
### How Signed Data Works in PDFs
When a PDF is digitally signed, the signature typically doesn't cover the entire file. Instead, a specific portion of the PDF's data, defined by the **`ByteRange`**, is cryptographically signed. This `ByteRange` usually includes the document's main content and relevant metadata.
Within the PDF:
- The **`ByteRange`** specifies the exact byte sequences that were included in the signing process.
- During verification, only these byte ranges are read and processed; other parts of the PDF, such as the signature field itself or later additions like timestamps, are excluded from the initial integrity check.
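
As a minimal sketch of how the signed bytes could be gathered, assuming the `ByteRange` array has already been parsed into `(offset, length)` pairs. The function name `extract_signed_bytes` and its signature are illustrative and not taken from this crate:

```rust
/// Concatenates the byte spans named by a PDF `ByteRange` array.
///
/// `byte_range` holds (offset, length) pairs, so a PDF entry such as
/// `[0 840 960 240]` would be passed as `[(0, 840), (960, 240)]`.
/// Returns `None` if any span falls outside the file.
fn extract_signed_bytes(pdf: &[u8], byte_range: &[(usize, usize)]) -> Option<Vec<u8>> {
    let mut signed = Vec::new();
    for &(offset, len) in byte_range {
        // Guard against overflow and out-of-bounds spans in untrusted input.
        let end = offset.checked_add(len)?;
        signed.extend_from_slice(pdf.get(offset..end)?);
    }
    Some(signed)
}
```

The gap between the two spans is where the signature value itself lives, which is why it is skipped.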
## Content Integrity Check (Message Digest)
This check confirms that the signed bytes have not changed since signing. The process is:
- Extract the `signed_bytes` from the PDF using the `ByteRange`.
- Calculate the cryptographic hash of these bytes using the algorithm specified in the signature. Let's denote the hash function as $H$.
- Retrieve the `MessageDigest` value ($M$) stored within the PDF's signature dictionary.
The verification then involves checking the following equation:
```math
H(\text{signed\_bytes}) = M
```
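
The comparison above can be sketched as follows. This is an illustration only: `content_integrity_ok` and its `hash` parameter are hypothetical names, and the hash is passed in as a closure so the sketch stays free of external crates. A real verifier would supply the SHA variant named in the signature.

```rust
/// Returns true when hashing `signed_bytes` reproduces the stored `MessageDigest`.
///
/// `hash` is a caller-supplied digest function (an assumption of this sketch);
/// it plays the role of H in the equation H(signed_bytes) = M.
fn content_integrity_ok<F>(signed_bytes: &[u8], message_digest: &[u8], hash: F) -> bool
where
    F: Fn(&[u8]) -> Vec<u8>,
{
    hash(signed_bytes).as_slice() == message_digest
}
```

Any change to the signed bytes changes the computed hash, so the comparison fails and verification stops before the signature itself is examined.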
## Signature Authenticity Check
After the content integrity check, the digital signature itself is verified. The signature covers a data structure known as `signed_attributes`, which is typically an ASN.1 structure embedded within the PDF.
### What are signed_attributes?
`signed_attributes` is structured data, often represented as an ASN.1 `SET`. It contains critical metadata associated with the signing event, including the `MessageDigest`, the time of signing, and potentially other relevant information.
Before the digital signature is generated, the `signed_attributes` are encoded and then cryptographically hashed.
```asn1
SET {
    OBJECT IDENTIFIER (messageDigest)
    OCTET STRING (hash value)
    OBJECT IDENTIFIER (signingTime)
    UTCTime (time value)
    ...
}
```
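
One detail of the encoding step is easy to miss. In CMS (RFC 5652, section 5.4) the attributes are stored inside `SignerInfo` under an implicit `[0]` tag (`0xA0`), but the hash is computed over the same bytes re-tagged as a plain `SET` (`0x31`). A minimal sketch of that re-tagging, where `signed_attrs_for_hashing` is a hypothetical helper name and not this crate's API:

```rust
/// DER tag of the `[0] IMPLICIT` wrapper used for `signedAttrs` inside `SignerInfo`.
const IMPLICIT_CONTEXT_0: u8 = 0xA0;
/// DER tag of a universal, constructed `SET`.
const SET_TAG: u8 = 0x31;

/// Returns the bytes that should be hashed for the signature check.
///
/// Accepts attributes that still carry the `[0]` tag or that are already
/// tagged as a `SET`; anything else is rejected with `None`.
fn signed_attrs_for_hashing(raw: &[u8]) -> Option<Vec<u8>> {
    match raw.first() {
        Some(&IMPLICIT_CONTEXT_0) | Some(&SET_TAG) => {
            let mut der = raw.to_vec();
            der[0] = SET_TAG; // only the leading tag byte changes
            Some(der)
        }
        _ => None,
    }
}
```

The length and content bytes are left untouched, so the result is the same structure under a different outer tag.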
---
### Signature Verification Process
Define:
- $H(\text{signed\_attributes})$ as the cryptographic hash of the encoded `signed_attributes`.
- $Sig$ as the digital signature extracted from the PDF.
- $PK$ as the public key associated with the signer's certificate.

Then we verify:

$$
\text{Verify}(PK, H(\text{signed\_attributes}), Sig) = \text{true}
$$
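
To make the shape of this check concrete, here is a toy sketch using textbook RSA with a tiny modulus. It is not how the crate verifies signatures: real RSA signatures use keys of thousands of bits and a padded encoding of the digest, both of which this sketch omits. The names `mod_pow` and `toy_rsa_verify` are hypothetical.

```rust
/// Square-and-multiply modular exponentiation.
/// Only safe for small moduli, since intermediate products must fit in a u64.
fn mod_pow(mut base: u64, mut exp: u64, modulus: u64) -> u64 {
    let mut result = 1 % modulus;
    base %= modulus;
    while exp > 0 {
        if exp & 1 == 1 {
            result = result * base % modulus;
        }
        base = base * base % modulus;
        exp >>= 1;
    }
    result
}

/// Textbook RSA check: the signature is accepted when sig^e mod n equals the digest.
/// `(n, e)` plays the role of PK and `digest` stands in for H(signed_attributes),
/// reduced here to a small integer.
fn toy_rsa_verify(n: u64, e: u64, digest: u64, sig: u64) -> bool {
    mod_pow(sig, e, n) == digest % n
}
```

With the classic toy key `n = 3233`, `e = 17`, `d = 2753`, a signature produced as `mod_pow(digest, d, n)` passes the check, while the same signature presented with a different digest fails.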
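If both the content integrity check and the signature authenticity check pass, the PDF signature is considered valid.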
## Supported Algorithms (PKCS#7 with RSA)
This crate currently supports verification for PDF signatures using **PKCS#7/CMS SignedData** structures with the following algorithms:
| Algorithm | Support |
| --------------------------- | ------- |
| SHA-1 with RSA encryption | Yes |
| SHA-256 with RSA encryption | Yes |
| SHA-384 with RSA encryption | Yes |
| SHA-512 with RSA encryption | Yes |
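
As an illustration of how the table could be represented in code, the sketch below maps the standard PKCS#1 object identifiers for these four algorithms to an enum. The type `SignatureAlgorithm` and its methods are hypothetical and may not match how this crate models algorithms internally.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SignatureAlgorithm {
    Sha1WithRsa,
    Sha256WithRsa,
    Sha384WithRsa,
    Sha512WithRsa,
}

impl SignatureAlgorithm {
    /// Maps a dotted PKCS#1 OID string to a supported algorithm, if any.
    fn from_oid(oid: &str) -> Option<Self> {
        match oid {
            "1.2.840.113549.1.1.5" => Some(Self::Sha1WithRsa),
            "1.2.840.113549.1.1.11" => Some(Self::Sha256WithRsa),
            "1.2.840.113549.1.1.12" => Some(Self::Sha384WithRsa),
            "1.2.840.113549.1.1.13" => Some(Self::Sha512WithRsa),
            _ => None, // anything else is treated as unsupported
        }
    }

    /// Digest length in bytes, i.e. the expected size of `MessageDigest`.
    fn digest_len(self) -> usize {
        match self {
            Self::Sha1WithRsa => 20,
            Self::Sha256WithRsa => 32,
            Self::Sha384WithRsa => 48,
            Self::Sha512WithRsa => 64,
        }
    }
}
```

Returning `None` for unknown identifiers lets a caller reject unsupported signatures early, before any hashing is attempted.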
## Tests
PDFs tested:
1. DigiLocker-signed PAN card to verify birthdate for age verification.
2. Provided sample PDF to check whether it contains specific text.
Oyster-CVM setup: [oyster-sp1-zkpdf](https://github.com/KalypsoProver/oyster-sp1-zkpdf)
### Benchmarks
- Running the CPU and GPU provers outside the enclave for zkPDF using the provided PDF:
  - CPU (64 cores & 200 GB RAM): 13m 6.919s
  - GPU (RTX 4090, 64 cores & 200 GB RAM): 1m 7.404s
- Running the CPU prover on a 20-core, 64 GB RAM instance with `oyster-cvm simulate` inside Docker:
  - Proof generation time: **34 min**
  - First boot: ~37 min (due to artifact downloads)
- On Oyster CVM `c5a.12xlarge` (24 vCPUs, 50 GB RAM):
  - Proof generation time: **1 hr 7 min**
### Notes
The Kalypso GPU prover may not be suitable for zkPDF because private inputs need to be encrypted. To keep user documents confidential, the SP1 CPU prover must be used.