ProveKit is a zero-knowledge proof toolkit optimized for mobile devices by the World Foundation.
ProveKit compiles Noir circuits (a domain-specific language for zero-knowledge proofs) into R1CS (Rank-1 Constraint System) constraints and generates/verifies WHIR proofs. The workflow is:
- Takes Noir circuits (written in the Noir language) as input
- Compiles them to R1CS constraint matrices using an optimized compiler
- Generates zero-knowledge proofs using the WHIR proof system with witness solving and commitment
- Verifies those proofs efficiently
- Can recursively verify proofs in Gnark for on-chain verification
The project is built in Rust (~95%) with a Go recursive verifier component. It's designed to be modular with clear separation between compilation, proving, and verification stages.
- Mobile-optimized performance with custom SIMD-accelerated field arithmetic
- Multiple hash function support (Skyscraper, SHA256, Keccak, Blake3)
- Memory compression techniques for resource-constrained devices
- FFI bindings for iOS, Android, Python, Swift, and Kotlin
- Benchmarking and profiling tools included
The codebase includes extensive Noir example circuits for testing and benchmarking, from basic programs to complex passport verification systems.
graph TD
A[Noir Source Code .nr] --> B[Noir Compiler]
B --> C[ACIR Circuit]
C --> D[R1CS Compiler<br/>provekit/r1cs-compiler]
D --> E[R1CS Matrices A, B, C<br/>+ Witness Builders]
E --> F[Witness Solver<br/>provekit/prover]
G[Input Values<br/>Prover.toml] --> F
F --> H[Complete Witness Vector w]
H --> I[WHIR Prover<br/>Polynomial Commitment]
E --> I
I --> J[WHIR Proof π₁]
J --> K{Recursive<br/>Verification?}
K -->|Yes| L[Gnark Groth16<br/>recursive-verifier]
K -->|No| M[Direct Verification]
L --> N[Groth16 Proof π₂<br/>for Ethereum]
J --> O[WHIR Verifier<br/>provekit/verifier]
O --> P[Accept/Reject]
N --> Q[On-chain Verifier<br/>Ethereum Smart Contract]
Q --> R[Accept/Reject]
style A fill:#e1f5ff
style J fill:#ffe1e1
style N fill:#ffe1e1
style P fill:#e1ffe1
style R fill:#e1ffe1
graph TD
A[Start: ACIR + Inputs] --> B[Initialize R1CS<br/>Topic 1: R1CS]
B --> C[Compile ACIR to R1CS<br/>Topic 12: Noir Compilation]
C --> D[Optimize R1CS<br/>Binop/Range Check Batching]
D --> E[Create Witness Builders<br/>Topic 3: Witness Generation]
E --> F[Split Builders: w1 / w2<br/>Dependency Analysis]
F --> G[Compress R1CS<br/>Topic 10: Memory Optimization]
G --> H[Solve w1 Witnesses<br/>Layer-by-layer]
H --> I{Has<br/>Challenges?}
I -->|No| J[w1 = all witnesses]
I -->|Yes| K[Compress w2 Layers<br/>Topic 10: Memory Optimization]
K --> L[Commit to w1<br/>Topic 4: Polynomial Commitment]
L --> M[Build Merkle Tree<br/>Topic 8: Hash Functions]
M --> N[Generate Challenges<br/>Topic 7: Fiat-Shamir]
N --> O[Decompress w2 Layers]
O --> P[Solve w2 Witnesses<br/>Using Challenges]
P --> Q[Commit to w2<br/>Merkle Tree]
J --> R[Decompress R1CS]
Q --> R
R --> S[Verify Constraints<br/>A·w × B·w = C·w]
S --> T[Extract Public Inputs<br/>w at positions 1 to k]
T --> U[Run Sumcheck Protocol<br/>Topic 6: Sumcheck]
U --> V[Generate WHIR Proof]
V --> W[End: NoirProof]
style A fill:#e1f5ff
style W fill:#ffe1e1
style L fill:#fff4e1
style M fill:#fff4e1
style Q fill:#fff4e1
graph TD
FFA[Topic 2:<br/>Finite Field Arithmetic<br/>BN254]
FFA --> R1CS[Topic 1:<br/>R1CS Constraints<br/>A·w × B·w = C·w]
FFA --> POLY[Topic 4:<br/>Polynomial Commitments<br/>MLE Evaluation]
FFA --> HASH[Topic 8:<br/>Hash Functions<br/>Poseidon2, SHA256]
R1CS --> WIT[Topic 3:<br/>Witness Generation<br/>Layered Solving]
WIT --> BATCH[Batch Inversion<br/>Montgomery's Trick]
FFA --> BATCH
POLY --> MLE[Multilinear Extension<br/>f̃x₁,...,xₙ]
POLY --> MERKLE[Merkle Tree<br/>Commitment]
HASH --> MERKLE
MLE --> SUM[Topic 6:<br/>Sumcheck Protocol<br/>Round Reduction]
MERKLE --> SUM
SUM --> FS[Topic 7:<br/>Fiat-Shamir<br/>Non-Interactive]
HASH --> FS
POLY --> NTT[Topic 5:<br/>NTT Transform<br/>Fast Polynomial Ops]
FFA --> NTT
R1CS --> SPARSE[Topic 9:<br/>Sparse Matrices<br/>Delta Encoding]
SPARSE --> MEM[Topic 10:<br/>Memory Optimization<br/>Compression]
POLY --> MEM
R1CS --> COMP[Topic 12:<br/>Noir Compilation<br/>ACIR → R1CS]
SUM --> REC[Topic 11:<br/>Recursive Verification<br/>Groth16 Wrapper]
style FFA fill:#ffe1e1
style R1CS fill:#e1f5ff
style POLY fill:#e1ffe1
style SUM fill:#fff4e1
graph TD
A[Witness Vector<br/>w = None, None, ...] --> B[Layer 0: Constants<br/>w at 0 equals 1]
B --> C[Layer 0: ACIR Inputs<br/>w at 1 to k = public inputs]
C --> D[Layer 1: Sums<br/>w at i = sum of terms]
D --> E[Layer 1: Products<br/>w at i = w at j times w at k]
E --> F[Layer 2: Inverse<br/>Batch Inversion]
F --> G[Collect values to invert<br/>v₁, v₂, ..., vₙ]
G --> H[Montgomery's Trick<br/>1 inverse + n multiplies]
H --> I[Write inverses<br/>w at i = inverse of vⱼ]
I --> J{More<br/>Layers?}
J -->|Yes| K[Layer k: Other Ops<br/>Challenges, Hints, etc.]
K --> J
J -->|No| L{Need<br/>Challenges?}
L -->|No| M[All Witnesses Solved]
L -->|Yes| N[Commit to w1<br/>Merkle Tree]
N --> O[Fiat-Shamir Challenges<br/>c = Hash of transcript]
O --> P[Layer k+1: Challenge<br/>w at i = c]
P --> Q[Continue w2 Layers<br/>May depend on challenges]
Q --> M
style A fill:#e1f5ff
style M fill:#e1ffe1
style F fill:#fff4e1
style H fill:#fff4e1
graph TD
A[Dense R1CS Matrix<br/>1000×1000 = 1M entries] --> B{Sparse?<br/>Mostly zeros?}
B -->|Yes 99%| C[Sparse Matrix Format<br/>CSR Compressed Sparse Row]
B -->|No| D[Keep Dense]
C --> E[Store Only Non-zeros<br/>row_starts, col_indices, values]
E --> F[Interner<br/>Deduplicate Field Elements]
F --> G[Delta Encoding<br/>Column indices as deltas]
G --> H[Serialization<br/>30-50% size reduction]
C --> I[Matrix-Vector Multiply<br/>y = A times x]
I --> J[Parallel Iteration<br/>rayon: per-row threads]
J --> K[For each row i:<br/>y at i = sum of A at i,j times x at j]
K --> L[Result Vector y]
C --> M[Transpose<br/>A to A-transpose]
M --> N[Collect entries<br/>row, col, value]
N --> O[Sort by new_row, new_col<br/>Parallel sort]
O --> P[Rebuild CSR structure<br/>Transposed matrix]
style A fill:#ffe1e1
style C fill:#e1f5ff
style H fill:#e1ffe1
style L fill:#e1ffe1
style P fill:#e1ffe1
graph TD
A[Initialize Transcript] --> B[Domain Separator<br/>Hash protocol params]
B --> C[Prover: Compute w1]
C --> D[Prover: Commit to w1<br/>Merkle root]
D --> E[Absorb: transcript ← Hash transcript, root]
E --> F[Squeeze: challenge₁ ← Hash transcript]
F --> G[Prover: Use challenge₁<br/>Compute w2]
G --> H[Prover: Commit to w2<br/>Merkle root]
H --> I[Absorb: transcript ← Hash transcript, root]
I --> J[Squeeze: challenge₂ ← Hash transcript]
J --> K[Prover: Sumcheck round 1<br/>Send g₁X]
K --> L[Absorb: transcript ← Hash transcript, g₁]
L --> M[Squeeze: r₁ ← Hash transcript]
M --> N{More<br/>Rounds?}
N -->|Yes| O[Prover: Sumcheck round i<br/>Send gᵢX]
O --> P[Absorb: transcript ← Hash transcript, gᵢ]
P --> Q[Squeeze: rᵢ ← Hash transcript]
Q --> N
N -->|No| R[Prover: Final evaluation<br/>Send hints]
R --> S[Complete Proof]
T[Verifier: Replay Transcript] --> U[Same Domain Separator]
U --> V[Absorb same messages<br/>in same order]
V --> W[Squeeze same challenges]
W --> X{Transcripts<br/>Match?}
X -->|Yes| Y[Verification Succeeds]
X -->|No| Z[Verification Fails]
style A fill:#e1f5ff
style S fill:#ffe1e1
style Y fill:#e1ffe1
style Z fill:#ffcccc
graph TD
A[Start Proving] --> B[Load R1CS<br/>200 MB in memory]
B --> C[Solve w1 witnesses<br/>+50 MB]
C --> D[Compress R1CS<br/>postcard serialization]
D --> E[R1CS → 20 MB blob<br/>Free 180 MB!]
E --> F[Compress w2 layers<br/>→ 5 MB blob]
F --> G[Free w2 structures<br/>Save 30 MB]
G --> H[Commit to w1<br/>Build Merkle tree]
H --> I[Merkle tree: 100 MB<br/>Peak memory here]
I --> J[Free Merkle tree<br/>Keep only root]
J --> K[Decompress w2 layers<br/>30 MB]
K --> L[Solve w2 witnesses<br/>+20 MB]
L --> M[Decompress R1CS<br/>200 MB]
M --> N[Run Sumcheck<br/>+50 MB]
N --> O[Generate Proof<br/>~100 KB]
O --> P[Free Everything]
Q[Memory Timeline] --> R[Peak: 200 MB<br/>Not 400+ MB!]
style A fill:#e1f5ff
style E fill:#e1ffe1
style G fill:#e1ffe1
style I fill:#ffe1e1
style J fill:#e1ffe1
style O fill:#e1ffe1
style R fill:#fff4e1
graph TD
CLI[tooling/cli<br/>provekit-cli] --> PROVER[provekit/prover<br/>Witness Solving + WHIR Prove]
CLI --> VERIFIER[provekit/verifier<br/>WHIR Verify]
CLI --> COMPILER[provekit/r1cs-compiler<br/>ACIR → R1CS]
PROVER --> COMMON[provekit/common<br/>R1CS, Witness, Types]
VERIFIER --> COMMON
COMPILER --> COMMON
COMMON --> SKY[skyscraper/core<br/>BN254 Hash Engine]
COMMON --> NTT[ntt/<br/>Number Theoretic Transform]
COMMON --> POS[poseidon2/<br/>Poseidon2 Hash]
COMMON --> WHIR[whir library<br/>Sumcheck Protocol]
SKY --> MULT[skyscraper/bn254-multiplier<br/>Field Multiplication]
SKY --> HLA[skyscraper/hla<br/>High-Level Assembly]
PROVER --> NOIR[noir libraries<br/>ACIR, ABI]
COMPILER --> NOIR
FFI[tooling/provekit-ffi<br/>C FFI Bindings] --> PROVER
FFI --> VERIFIER
WASM[tooling/provekit-wasm<br/>WebAssembly] --> PROVER
WASM --> VERIFIER
SERVER[tooling/verifier-server<br/>HTTP API] --> VERIFIER
SERVER --> GREC[recursive-verifier<br/>Go + Gnark]
GREC --> GNARK[gnark library<br/>Groth16]
BENCH[tooling/provekit-bench<br/>Benchmarks] --> PROVER
BENCH --> VERIFIER
style COMMON fill:#e1f5ff
style PROVER fill:#ffe1e1
style VERIFIER fill:#e1ffe1
style GREC fill:#fff4e1
Based on the codebase analysis, here are the fundamental mathematical concepts and topics involved in ProveKit:
What it is: R1CS is a way to represent any computation as a system of quadratic equations. Think of it like converting a program into a mathematical puzzle where solving the puzzle proves you ran the program correctly.
The Math:
An R1CS instance consists of three matrices A, B, C (each of size m × n) and a witness vector w (size n):
For each constraint i (where i = 0, 1, ..., m-1):
(A[i] · w) × (B[i] · w) = C[i] · w (mod p)
Where:
A[i] · wmeans: sum of (A[i,j] × w[j]) for all j from 0 to n-1pis the BN254 prime: 21888242871839275222246405745257275088548364400416034343698204186575808495617w[0] = 1(constant one)w[1] to w[num_public_inputs]= public inputs (visible to everyone)w[num_public_inputs+1] to w[n-1]= private witnesses (secret values)
Example: To prove you know x such that x² = 9:
Witness: w = [1, 9, 3] (w[0]=1, w[1]=public output 9, w[2]=secret input 3)
Constraint: (0·w[0] + 0·w[1] + 1·w[2]) × (0·w[0] + 0·w[1] + 1·w[2]) = (0·w[0] + 1·w[1] + 0·w[2])
Simplifies to: w[2] × w[2] = w[1]
Check: 3 × 3 = 9 ✓
Special case - Linear constraints: When A[i] or B[i] is constant (only references w[0]), the constraint becomes linear:
If B[i] = [5, 0, 0, ...], then:
(A[i] · w) × 5 = C[i] · w
Which is just: 5·(A[i] · w) - (C[i] · w) = 0
Code reference:
// From provekit/common/src/r1cs.rs
pub struct R1CS {
pub num_public_inputs: usize,
pub a: SparseMatrix, // Left matrix
pub b: SparseMatrix, // Middle matrix
pub c: SparseMatrix, // Right matrix
}
// Adding a constraint (line ~70):
pub fn add_constraint(
&mut self,
a: &[(FieldElement, usize)], // (coefficient, witness_index) pairs
b: &[(FieldElement, usize)],
c: &[(FieldElement, usize)],
)Key files:
provekit/common/src/r1cs.rs- R1CS structure and operationsprovekit/common/src/sparse_matrix.rs- Efficient sparse matrix representation with delta encoding
What it is: All numbers in ProveKit live in a finite field - like a clock that wraps around at a huge prime number. Instead of regular arithmetic, we do "modular arithmetic" where everything is computed modulo p.
The Math:
The BN254 scalar field 𝔽_p where:
p = 21888242871839275222246405745257275088548364400416034343698204186575808495617
≈ 2^254 (hence "BN254")
Field operations:
Addition: (a + b) mod p
Subtraction: (a - b) mod p
Multiplication: (a × b) mod p
Division: a / b = a × b^(-1) mod p
Inverse: b^(-1) such that b × b^(-1) ≡ 1 (mod p)
Why modular arithmetic?
- Finite: numbers don't grow infinitely
- Efficient: computers can handle fixed-size numbers
- Cryptographic: hard to reverse operations without knowing secrets
Montgomery Representation: To make multiplication faster, numbers are stored in "Montgomery form":
Instead of storing x, store: x̃ = x × R mod p
where R = 2^256
Multiplication becomes:
x̃ × ỹ = (x × R) × (y × R) mod p = (x × y) × R² mod p
Then reduce: (x × y) × R² × R^(-1) mod p = (x × y) × R mod p = (x̃y)
This trades expensive divisions for cheap bit shifts!
Batch Inversion (Montgomery's Trick): Computing many inverses efficiently:
To compute a₁⁻¹, a₂⁻¹, ..., aₙ⁻¹:
1. Compute products: p₁ = a₁, p₂ = a₁×a₂, ..., pₙ = a₁×a₂×...×aₙ
2. Compute ONE inverse: pₙ⁻¹ = (a₁×a₂×...×aₙ)⁻¹
3. Work backwards:
aₙ⁻¹ = pₙ⁻¹ × pₙ₋₁
aₙ₋₁⁻¹ = (pₙ⁻¹ × pₙ₋₁) × aₙ
...
Cost: n multiplications + 1 inversion (instead of n inversions!)
Code reference:
// From provekit/common/src/lib.rs
pub type FieldElement = ark_bn254::Fr; // BN254 scalar field
// Montgomery representation is handled by arkworks internally
// Addition (line in ark_ff crate):
impl Add for Fr {
fn add(self, other: Fr) -> Fr {
// Adds in Montgomery form, reduces mod p
}
}
// Batch inversion used in witness solving
// From provekit/prover/src/r1cs.rs (solve_witness_vec function)
// Inverse layer uses Montgomery's trick for efficiencyExample:
Let p = 7 (tiny example, real p is huge)
Field elements: {0, 1, 2, 3, 4, 5, 6}
5 + 4 = 9 mod 7 = 2
5 × 4 = 20 mod 7 = 6
5 / 4 = 5 × 4⁻¹ mod 7
where 4⁻¹ = 2 (because 4 × 2 = 8 ≡ 1 mod 7)
so 5 / 4 = 5 × 2 = 10 mod 7 = 3
Key modules:
skyscraper/core/- Custom BN254 hash engine with optimized field arithmeticskyscraper/bn254-multiplier/- Specialized BN254 multiplicationprovekit/common/src/u256_arith.rs- 256-bit arithmetic utilities
What it is: Given the R1CS constraints and some inputs, we need to compute ALL the witness values (including intermediate calculations) that satisfy every constraint. This is like filling in a Sudoku puzzle where you know some numbers and need to figure out the rest.
The Math:
Given:
- R1CS matrices A, B, C
- Public inputs: w[1], w[2], ..., w[num_public_inputs]
- ACIR witness values (from the Noir program execution)
Compute: All remaining witness values w[i] such that every constraint is satisfied.
Witness Builder Types:
-
Constant:
w[i] = c(some constant value) -
Sum:
w[i] = Σ(cⱼ × w[jⱼ])(linear combination)Example: w[5] = 3×w[1] + 7×w[2] + w[4] -
Product:
w[i] = w[j] × w[k]Example: w[6] = w[2] × w[3] -
Inverse:
w[i] = w[j]⁻¹(multiplicative inverse)Example: w[7] = w[3]⁻¹ Means: w[7] × w[3] = 1 -
Challenge:
w[i] = Hash(transcript)(Fiat-Shamir challenge)Used to generate random values from the proof transcript
Layered Solving:
Witnesses must be computed in dependency order. We organize them into layers:
Layer 0: Constants and ACIR inputs (no dependencies)
w[0] = 1
w[1] = public_input_1
w[2] = public_input_2
Layer 1: Values that depend only on Layer 0
w[3] = w[1] + w[2]
w[4] = w[1] × w[2]
Layer 2 (Inverse): Batch-inverted values
w[5] = w[3]⁻¹
w[6] = w[4]⁻¹
(Computed together using Montgomery's trick!)
Layer 3: Values depending on earlier layers
w[7] = w[5] × w[6]
... and so on
Multi-Round Protocols (w1 → challenges → w2):
For some circuits, we need to:
- Solve first batch of witnesses (w1)
- Commit to w1 (create Merkle tree)
- Generate challenges from commitment
- Solve remaining witnesses (w2) using the challenges
w1 witnesses: [w[0], w[1], ..., w[k-1]]
↓
Commit(w1) → Merkle root
↓
Challenge: c = Hash(Merkle_root, transcript)
↓
w2 witnesses: [w[k], w[k+1], ..., w[n-1]]
(may depend on challenge c)
Code reference:
// From provekit/common/src/witness/witness_builder.rs
pub enum WitnessBuilder {
Constant(ConstantTerm), // w[i] = c
Sum(usize, Vec<SumTerm>), // w[i] = Σ(cⱼ × w[jⱼ])
Product(usize, usize, usize), // w[i] = w[j] × w[k]
Inverse(usize, usize), // w[i] = w[j]⁻¹
Challenge(usize), // w[i] = Hash(transcript)
// ... 15+ more variants
}
// From provekit/prover/src/r1cs.rs
pub fn solve_witness_vec(
witness: &mut [Option<FieldElement>],
layers: LayeredWitnessBuilders,
acir_witness_map: &WitnessMap<NoirElement>,
merlin: &mut ProverState<TranscriptSponge>,
) {
for layer in layers.layers {
match layer.layer_type {
LayerType::Inverse => {
// Use Montgomery's batch inversion trick
let values: Vec<_> = /* collect values to invert */;
let inverses = batch_invert(&values);
// Write back inverses
}
LayerType::Other => {
// Solve each builder individually
for builder in layer.witness_builders {
solve_single_witness(witness, builder, ...);
}
}
}
}
}Example:
Circuit: Prove you know x, y such that (x + y)² = 25
Witnesses:
w[0] = 1 (constant)
w[1] = 25 (public output)
w[2] = 3 (secret input x)
w[3] = 2 (secret input y)
w[4] = w[2] + w[3] = 5 (sum)
w[5] = w[4] × w[4] = 25 (square)
Constraints:
1. w[4] = w[2] + w[3] (sum constraint)
2. w[5] = w[4] × w[4] (square constraint)
3. w[5] = w[1] (output constraint)
Key files:
provekit/common/src/witness/witness_builder.rs- Witness builder typesprovekit/common/src/witness/scheduling/- Layer scheduling and dependency resolutionprovekit/prover/- Witness solving implementation
What it is: A polynomial commitment is like putting a polynomial in a locked box. You can prove properties about the polynomial (like its value at a specific point) without revealing the polynomial itself.
The Math:
Multilinear Extension (MLE): Given a vector v = [v₀, v₁, v₂, ..., v_{2ⁿ-1}], we extend it to a polynomial f̃(x₁, x₂, ..., xₙ) such that:
f̃(b₁, b₂, ..., bₙ) = v[b₁×2^(n-1) + b₂×2^(n-2) + ... + bₙ×2⁰]
for all binary values bᵢ ∈ {0, 1}
The MLE formula:
f̃(x₁, x₂, ..., xₙ) = Σ v[i] × L_i(x₁, x₂, ..., xₙ)
i=0 to 2ⁿ-1
where L_i is the Lagrange basis polynomial:
L_i(x₁, ..., xₙ) = ∏ (xⱼ if bit j of i is 1, else (1-xⱼ))
j=1 to n
Example:
Vector: v = [3, 7, 2, 5] (length 4 = 2²)
MLE: f̃(x₁, x₂)
At binary points:
f̃(0,0) = v[0] = 3
f̃(0,1) = v[1] = 7
f̃(1,0) = v[2] = 2
f̃(1,1) = v[3] = 5
At arbitrary point (0.5, 0.3):
f̃(0.5, 0.3) = 3×(1-0.5)×(1-0.3) + 7×(1-0.5)×0.3
+ 2×0.5×(1-0.3) + 5×0.5×0.3
= 3×0.5×0.7 + 7×0.5×0.3 + 2×0.5×0.7 + 5×0.5×0.3
= 1.05 + 1.05 + 0.7 + 0.75 = 3.55
Merkle Tree Commitment: To commit to a polynomial (represented as a vector):
1. Hash pairs of values bottom-up:
Level 0: [v₀, v₁, v₂, v₃, v₄, v₅, v₆, v₇]
↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓
Level 1: [H(v₀,v₁), H(v₂,v₃), H(v₄,v₅), H(v₆,v₇)]
↓ ↓ ↓ ↓
Level 2: [H(H(v₀,v₁),H(v₂,v₃)), H(H(v₄,v₅),H(v₆,v₇))]
↓ ↓
Root: H(...)
2. Commitment = Merkle root (single hash value)
3. To prove v₃ is in the tree, provide:
- v₃
- v₂ (sibling)
- H(v₀,v₁) (uncle)
- H(H(v₄,v₅),H(v₆,v₇)) (uncle)
Verifier recomputes root and checks it matches
Linear Forms and Covectors: A covector α is a row vector that defines a linear function:
α · v = α₀×v₀ + α₁×v₁ + ... + αₙ₋₁×vₙ₋₁
For MLE evaluation at point r = (r₁, r₂, ..., rₙ):
The covector is the Lagrange basis: α = [L₀(r), L₁(r), ..., L_{2ⁿ-1}(r)]
Then: f̃(r) = α · v
Memory Optimization - PrefixCovector: When α has many trailing zeros: α = [α₀, α₁, ..., α_k, 0, 0, ..., 0] Store only the prefix [α₀, α₁, ..., α_k] and remember the full size.
Code reference:
// From provekit/common/src/prefix_covector.rs
// Multilinear extension evaluation
impl LinearForm<FieldElement> for PrefixCovector {
fn mle_evaluate(&self, point: &[FieldElement]) -> FieldElement {
let k = self.vector.len().trailing_zeros() as usize;
let r = point.len() - k;
// Head factor: ∏(1 - rᵢ) for i in 0..r
let head_factor: FieldElement =
point[..r].iter().map(|p| FieldElement::one() - p).product();
// Evaluate MLE on the prefix
let prefix_mle = multilinear_extend(&self.vector, &point[r..]);
head_factor * prefix_mle
}
}
// From whir library (external dependency)
// multilinear_extend computes: Σ v[i] × L_i(point)
pub fn multilinear_extend(v: &[F], point: &[F]) -> F {
// Efficient evaluation using recursive formula
}Public Input Binding: To bind public inputs to the proof:
Public inputs: [p₁, p₂, ..., pₖ]
Witness: w = [1, p₁, p₂, ..., pₖ, private_values...]
Create weight vector: α = [1, x, x², x³, ..., x^k, 0, 0, ...]
where x is a Fiat-Shamir challenge
Compute: α · w = 1×1 + x×p₁ + x²×p₂ + ... + x^k×pₖ
This binds the public inputs into a single field element!
Key files:
provekit/common/src/prefix_covector.rs- Memory-optimized covectors (PrefixCovector, OffsetCovector)provekit/common/src/whir_r1cs.rs- WHIR R1CS scheme configuration
What it is: NTT is like the Fast Fourier Transform (FFT) but for finite fields. It's used to quickly multiply polynomials and evaluate them at many points simultaneously.
The Math:
The Problem: Multiply two polynomials:
p(x) = a₀ + a₁x + a₂x² + ... + aₙ₋₁x^(n-1)
q(x) = b₀ + b₁x + b₂x² + ... + bₙ₋₁x^(n-1)
Naive multiplication: O(n²) operations
NTT multiplication: O(n log n) operations
How NTT Works:
- Forward NTT: Convert coefficient representation to evaluation representation
Input: coefficients [a₀, a₁, a₂, ..., aₙ₋₁]
Output: evaluations [p(ω⁰), p(ω¹), p(ω²), ..., p(ω^(n-1))]
where ω is a primitive n-th root of unity:
ω^n = 1 (mod p)
ω^i ≠ 1 for 0 < i < n
- Pointwise Multiplication:
r(ωⁱ) = p(ωⁱ) × q(ωⁱ) for each i
- Inverse NTT: Convert back to coefficients
Input: evaluations [r(ω⁰), r(ω¹), ..., r(ω^(n-1))]
Output: coefficients of r(x) = p(x) × q(x)
The NTT Formula:
Forward NTT:
P[k] = Σ p[j] × ω^(jk) for k = 0, 1, ..., n-1
j=0 to n-1
Inverse NTT:
p[j] = (1/n) × Σ P[k] × ω^(-jk) for j = 0, 1, ..., n-1
k=0 to n-1
Butterfly Operation (core of NTT):
For indices i and j = i + n/2:
temp = ω^k × a[j]
a[j] = a[i] - temp
a[i] = a[i] + temp
Interleaved Polynomials: ProveKit supports processing multiple polynomials simultaneously:
Instead of: [a₀, a₁, a₂, a₃], [b₀, b₁, b₂, b₃]
Store as: [a₀, b₀, a₁, b₁, a₂, b₂, a₃, b₃]
NTT processes both in one pass!
Example (tiny field for illustration):
Field: 𝔽₇ (mod 7)
n = 4, ω = 2 (since 2⁴ = 16 ≡ 2 mod 7, and 2² = 4 ≠ 1)
Polynomial: p(x) = 1 + 2x + 3x²
Coefficients: [1, 2, 3, 0]
Forward NTT:
P[0] = 1×2⁰ + 2×2⁰ + 3×2⁰ + 0×2⁰ = 1+2+3+0 = 6 mod 7
P[1] = 1×2⁰ + 2×2¹ + 3×2² + 0×2³ = 1+4+5+0 = 10 ≡ 3 mod 7
P[2] = 1×2⁰ + 2×2² + 3×2⁴ + 0×2⁶ = 1+1+6+0 = 8 ≡ 1 mod 7
P[3] = 1×2⁰ + 2×2³ + 3×2⁶ + 0×2⁹ = 1+2+5+0 = 8 ≡ 1 mod 7
Evaluations: [6, 3, 1, 1]
Code reference:
// From ntt/src/lib.rs
pub struct NTT<T, C: NTTContainer<T>> {
container: C,
order: Pow2<usize>, // Must be power of 2
}
impl<T, C: NTTContainer<T>> NTT<T, C> {
pub fn new(vec: C, number_of_polynomials: usize) -> Option<Self> {
let n = vec.as_ref().len();
// All polynomials must be same size
if number_of_polynomials == 0 || n % number_of_polynomials != 0 {
return None;
}
// Order must be power of 2
Pow2::new(n / number_of_polynomials).map(|order| Self {
container: vec,
order,
_phantom: PhantomData,
})
}
}
// Power-of-two validation
pub struct Pow2<T = usize>(T);
impl InPowerOfTwoSet for usize {
fn in_set(&self) -> bool {
usize::is_power_of_two(*self) || *self == 0
}
}Why Power of 2? NTT requires n to be a power of 2 because:
- Recursive divide-and-conquer works on halves
- Butterfly operations pair elements at distance n/2, n/4, n/8, ...
- Root of unity ω must have order exactly n
Key files:
ntt/src/lib.rs- NTT implementation and container types
What it is: Sumcheck is an interactive proof protocol that lets a prover convince a verifier that a sum over a large domain equals a claimed value, without the verifier checking every term. It's the heart of WHIR.
The Math:
The Claim:
Prover claims: Σ f(x₁, x₂, ..., xₙ) = C
x∈{0,1}ⁿ
Where the sum is over all 2ⁿ binary inputs, and C is the claimed sum.
The Protocol (n rounds):
Round 1:
Prover sends univariate polynomial g₁(X₁):
g₁(X₁) = Σ f(X₁, x₂, ..., xₙ)
x₂,...,xₙ∈{0,1}
Verifier checks: g₁(0) + g₁(1) = C
Verifier sends random challenge: r₁
Round 2:
Prover sends g₂(X₂):
g₂(X₂) = Σ f(r₁, X₂, x₃, ..., xₙ)
x₃,...,xₙ∈{0,1}
Verifier checks: g₂(0) + g₂(1) = g₁(r₁)
Verifier sends random challenge: r₂
... Continue for n rounds ...
Round n:
Prover sends gₙ(Xₙ):
gₙ(Xₙ) = f(r₁, r₂, ..., rₙ₋₁, Xₙ)
Verifier checks: gₙ(0) + gₙ(1) = gₙ₋₁(rₙ₋₁)
Verifier sends random challenge: rₙ
Final Check:
Verifier evaluates: f(r₁, r₂, ..., rₙ) and checks it equals gₙ(rₙ)
Why It Works:
- If prover is honest, all checks pass
- If prover cheats, they must guess the random challenges (probability 1/|𝔽|ⁿ ≈ 0)
- Reduces checking 2ⁿ values to checking n polynomials + 1 evaluation
Example (n=2, tiny domain):
Claim: Σ f(x₁, x₂) = 10
x₁,x₂∈{0,1}
Suppose f(0,0)=1, f(0,1)=2, f(1,0)=3, f(1,1)=4
True sum: 1+2+3+4 = 10 ✓
Round 1:
Prover: g₁(X₁) = f(X₁,0) + f(X₁,1)
g₁(0) = f(0,0) + f(0,1) = 1+2 = 3
g₁(1) = f(1,0) + f(1,1) = 3+4 = 7
g₁(X₁) = 3 + 4X₁ (linear polynomial)
Verifier: g₁(0) + g₁(1) = 3 + 7 = 10 ✓
Sends r₁ = 0.5
Round 2:
Prover: g₂(X₂) = f(0.5, X₂)
g₂(0) = f(0.5, 0) = 0.5×f(0,0) + 0.5×f(1,0) = 0.5×1 + 0.5×3 = 2
g₂(1) = f(0.5, 1) = 0.5×f(0,1) + 0.5×f(1,1) = 0.5×2 + 0.5×4 = 3
g₂(X₂) = 2 + X₂
Verifier: g₂(0) + g₂(1) = 2 + 3 = 5
g₁(0.5) = 3 + 4×0.5 = 5 ✓
Sends r₂ = 0.7
Final:
Verifier: Compute f(0.5, 0.7) directly
Check it equals g₂(0.7) = 2 + 0.7 = 2.7
Optimization - Batching: Instead of one sumcheck, batch multiple claims:
Claim: Σ (α₁×f₁(x) + α₂×f₂(x) + ... + αₖ×fₖ(x)) = C
x∈{0,1}ⁿ
where α₁, α₂, ..., αₖ are random coefficients
This proves all k claims simultaneously!
In WHIR/R1CS Context:
Sumcheck proves: Σ (A[i]·w) × (B[i]·w) × (C[i]·w) = 0
i
This verifies all R1CS constraints are satisfied!
Code reference:
// Sumcheck is implemented in the external WHIR library
// ProveKit calls it through the whir crate
// From provekit/prover/src/whir_r1cs.rs (conceptual)
impl WhirR1CSProver {
fn prove_noir(...) -> Result<WhirR1CSProof> {
// 1. Commit to witness polynomial
let commitment = commit_to_polynomial(witness);
// 2. Run sumcheck on R1CS relation
let sumcheck_proof = whir::sumcheck::prove(
&r1cs_polynomial,
&witness,
&mut transcript
);
// 3. Package into proof
WhirR1CSProof {
commitment,
sumcheck_proof,
...
}
}
}
// Verifier checks sumcheck
impl WhirR1CSVerifier {
fn verify(...) -> Result<()> {
// 1. Verify commitment
verify_commitment(proof.commitment);
// 2. Verify sumcheck
whir::sumcheck::verify(
proof.sumcheck_proof,
&r1cs_polynomial,
&mut transcript
)?;
Ok(())
}
}Key concepts:
- Round-by-round reduction
- Fiat-Shamir transformation for non-interactivity
- Transcript management for challenge generation
What it is: Fiat-Shamir converts an interactive proof (where prover and verifier exchange messages) into a non-interactive proof (just one message from prover to verifier). It replaces the verifier's random challenges with hash outputs.
The Math:
Interactive Protocol:
Prover → Verifier: message₁
Verifier → Prover: challenge₁ = random()
Prover → Verifier: message₂
Verifier → Prover: challenge₂ = random()
...
Fiat-Shamir (Non-Interactive):
Prover computes:
challenge₁ = Hash(domain_separator || message₁)
message₂ = f(challenge₁)
challenge₂ = Hash(domain_separator || message₁ || challenge₁ || message₂)
message₃ = f(challenge₂)
...
Prover sends: (message₁, message₂, message₃, ...)
Verifier recomputes:
challenge₁ = Hash(domain_separator || message₁)
challenge₂ = Hash(domain_separator || message₁ || challenge₁ || message₂)
...
Checks all messages are consistent
The Transcript: A transcript is a running hash of all messages:
transcript₀ = Hash(domain_separator)
transcript₁ = Hash(transcript₀ || message₁)
transcript₂ = Hash(transcript₁ || challenge₁)
transcript₃ = Hash(transcript₂ || message₂)
...
Challenges are derived from transcript:
challenge_i = Hash(transcript_i)
Domain Separator: Prevents cross-protocol attacks:
domain_separator = Hash(protocol_name || protocol_version || parameters)
Example:
domain_separator = Hash("ProveKit-WHIR-v1" || num_constraints || num_witnesses || ...)
Prover Message vs Prover Hint:
Prover Message: Absorbed into transcript (affects challenges)
transcript = Hash(transcript || prover_message)
challenge = Hash(transcript)
Prover Hint: NOT absorbed (doesn't affect challenges)
// Hint is sent but not hashed
// Used for values that are independently verified (e.g., polynomial evaluations)
CRITICAL: Misusing hints for values that should be messages is a soundness bug!
Example:
Interactive Sumcheck:
Round 1:
Prover → Verifier: g₁(X) = [3, 4] (coefficients)
Verifier → Prover: r₁ = random() = 0.5
Fiat-Shamir Sumcheck:
Round 1:
Prover computes: g₁(X) = [3, 4]
Prover computes: r₁ = Hash(transcript || g₁) = Hash(...) = 0.5
(No interaction needed!)
Transcript Determinism: CRITICAL: Prover and verifier must construct IDENTICAL transcripts!
If prover hashes: Hash(A || B || C)
But verifier hashes: Hash(A || C || B)
→ Different challenges → Verification fails!
Common bugs:
- Wrong message order
- Missing domain separator
- Forgetting to hash a message
- Using hint instead of message
Code reference:
// From provekit/common/src/transcript_sponge.rs
pub struct TranscriptSponge {
// Wraps a cryptographic sponge (hash function)
}
impl TranscriptSponge {
// Absorb a prover message (affects transcript)
pub fn absorb(&mut self, message: &[u8]) {
self.sponge.absorb(message);
}
// Squeeze a challenge (derive from transcript)
pub fn squeeze(&mut self) -> FieldElement {
self.sponge.squeeze()
}
}
// From provekit/common/src/whir_r1cs.rs
impl WhirR1CSScheme {
pub fn create_domain_separator(&self) -> DomainSeparator {
// Hash all protocol parameters
transcript::DomainSeparator::protocol(self)
}
}
// From provekit/prover/src/lib.rs
fn prove_with_witness(...) -> Result<NoirProof> {
// Create transcript with domain separator
let ds = self.whir_for_witness
.create_domain_separator()
.instance(&Empty);
let mut merlin = ProverState::new(
&ds,
TranscriptSponge::from_config(self.hash_config)
);
// Absorb commitment (prover message)
merlin.absorb_commitment(&commitment);
// Squeeze challenge
let challenge: FieldElement = merlin.verifier_message();
// Continue protocol...
}Security: Fiat-Shamir is secure in the Random Oracle Model:
- Hash function behaves like a truly random function
- Prover cannot predict challenges before committing to messages
- Soundness: cheating prover must guess hash outputs (probability ≈ 1/2^256)
Key files:
provekit/common/src/transcript_sponge.rs- Transcript managementprovekit/common/src/whir_r1cs.rs- Domain separator creation
What it is: Hash functions take arbitrary input and produce a fixed-size output (digest). They're one-way (can't reverse) and collision-resistant (hard to find two inputs with same output).
The Math:
Hash Function Properties:
H: {0,1}* → {0,1}^256 (maps any input to 256-bit output)
1. Deterministic: H(x) always gives same output
2. One-way: Given y, hard to find x where H(x) = y
3. Collision-resistant: Hard to find x ≠ x' where H(x) = H(x')
4. Avalanche effect: Changing 1 bit of input changes ~50% of output bits
Merkle Tree Hashing:
Compress two values into one:
H(left, right) → parent
Example tree:
H(H(a,b), H(c,d))
/ \
H(a,b) H(c,d)
/ \ / \
a b c d
Poseidon2 (Algebraic Hash): Special hash designed for zero-knowledge proofs:
State: [s₀, s₁, s₂] (3 field elements)
Round function:
1. Add round constants: sᵢ ← sᵢ + cᵢ
2. S-box (non-linear): sᵢ ← sᵢ^5
3. MDS matrix multiply: s ← M × s
Repeat for R rounds
Why algebraic? Easy to express as R1CS constraints!
Regular hash (SHA256) needs ~25,000 constraints
Poseidon2 needs ~100 constraints
Hash Configurations in ProveKit:
1. Skyscraper (default): Custom BN254-optimized hash
- SIMD-accelerated (ARM NEON)
- Fastest on mobile devices
2. SHA256: Standard cryptographic hash
- 256-bit output
- Well-studied security
3. Keccak: Ethereum-compatible
- Used in Ethereum blockchain
- 256-bit output
4. Blake3: Modern high-performance
- Parallelizable
- Very fast on modern CPUs
5. Poseidon2: Algebraic hash for BN254
- Efficient in zero-knowledge circuits
- Used for in-circuit hashing
Code reference:
// From provekit/common/src/hash_config.rs
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
pub enum HashConfig {
Skyscraper, // Default
SHA256,
Keccak,
Blake3,
}
// From poseidon2/src/permutation.rs
pub fn poseidon2_permutation(state: &mut [FieldElement; 3]) {
for round in 0..ROUNDS {
// Add round constants
for i in 0..3 {
state[i] += ROUND_CONSTANTS[round][i];
}
// S-box: x^5
for i in 0..3 {
let x = state[i];
let x2 = x * x;
let x4 = x2 * x2;
state[i] = x4 * x;
}
// MDS matrix multiply
let t0 = state[0] + state[1] + state[2];
state[0] = t0 + state[0] * 2;
state[1] = t0 + state[1] * 2;
state[2] = t0 + state[2] * 2;
}
}
// From skyscraper/core/src/lib.rs
pub fn compress(left: [u64; 4], right: [u64; 4]) -> [u64; 4] {
// Custom BN254 compression using SIMD
// Optimized for ARM NEON instructions
}Key files:
provekit/common/src/hash_config.rs- Hash function configurationposeidon2/src/- Poseidon2 implementationskyscraper/core/- Skyscraper hash engine
What it is: Most R1CS matrices are sparse (mostly zeros). Sparse matrix representation stores only non-zero entries, saving massive amounts of memory.
The Math:
Dense vs Sparse:
Dense matrix (1000×1000 with 0.1% non-zero):
Storage: 1,000,000 field elements × 32 bytes = 32 MB
Sparse matrix (same matrix):
Storage: 1,000 non-zero entries × (32 + 4) bytes = 36 KB
Savings: 99.9%!
Sparse Matrix Format (CSR - Compressed Sparse Row):
Matrix:
[0 3 0 0]
[0 0 7 0]
[2 0 0 5]
Stored as:
row_starts: [0, 1, 2, 4] (where each row begins)
col_indices: [1, 2, 0, 3] (column of each non-zero)
values: [3, 7, 2, 5] (the non-zero values)
To get row 2: entries from row_starts[2]=2 to row_starts[3]=4
→ col_indices[2..4] = [0, 3]
→ values[2..4] = [2, 5]
→ Row 2 has: (0,2) and (3,5)
Delta Encoding Optimization: Within each row, store column indices as deltas:
Row: columns [5, 17, 100, 105]
Absolute encoding: [5, 17, 100, 105] (4 bytes each = 16 bytes)
Delta encoding: [5, 12, 83, 5] (smaller numbers = 1-2 bytes each = 6 bytes)
Savings: 30-50% on column indices!
Variable-length Integer Encoding:
Value range → Bytes needed:
0-127 → 1 byte
128-16383 → 2 bytes
16384-... → 3+ bytes
Delta encoding makes values smaller → fewer bytes!
Matrix-Vector Multiplication:
y = A × x
For each row i:
y[i] = Σ A[i,j] × x[j] (sum over non-zero entries only)
j
Sparse: O(nnz) operations (nnz = number of non-zeros)
Dense: O(n²) operations
Example: 1000×1000 matrix with 1000 non-zeros
Sparse: 1,000 operations
Dense: 1,000,000 operations
1000× faster!
Parallel Matrix-Vector Multiply:
Each row is independent, so parallelize:
Thread 1: Compute y[0..249]
Thread 2: Compute y[250..499]
Thread 3: Compute y[500..749]
Thread 4: Compute y[750..999]
Linear speedup with number of cores!
Interner (Deduplication): Many field elements repeat in R1CS matrices:
Without interner:
Matrix has 10,000 entries, 9,000 are the value "1"
Storage: 10,000 × 32 bytes = 320 KB
With interner:
Store unique values: {0, 1, 2, 3, ...} → [val₀, val₁, val₂, ...]
Matrix stores indices: [1, 1, 1, ..., 2, 3, 1, ...]
Storage: ~100 unique values × 32 bytes + 10,000 × 2 bytes = 23 KB
Savings: 93%!
Code reference:
// From provekit/common/src/sparse_matrix.rs
pub struct SparseMatrix {
pub num_rows: usize,
pub num_cols: usize,
new_row_indices: Vec<u32>, // Where each row starts
col_indices: Vec<u32>, // Column of each entry (absolute in memory)
values: Vec<InternedFieldElement>, // Interned values
}
// Delta encoding for serialization
fn encode_col_deltas(
col_indices: &[u32],
new_row_indices: &[u32],
total_entries: usize,
) -> Vec<u32> {
let mut deltas = Vec::with_capacity(col_indices.len());
for row in 0..num_rows {
let row_cols = &col_indices[start..end];
// First column is absolute
deltas.push(row_cols[0]);
// Subsequent columns are deltas
for i in 1..row_cols.len() {
deltas.push(row_cols[i] - row_cols[i - 1]);
}
}
deltas
}
// Parallel matrix-vector multiply
impl Mul<&[FieldElement]> for HydratedSparseMatrix<'_> {
type Output = Vec<FieldElement>;
fn mul(self, rhs: &[FieldElement]) -> Self::Output {
(0..self.matrix.num_rows)
.into_par_iter() // Parallel iterator (rayon)
.map(|row| {
self.iter_row(row)
.map(|(col, value)| value * rhs[col])
.fold(FieldElement::zero(), |acc, x| acc + x)
})
.collect()
}
}
// From provekit/common/src/interner.rs
pub struct Interner {
values: Vec<FieldElement>, // Unique values
map: HashMap<FieldElement, InternedFieldElement>, // Value → index
}
impl Interner {
pub fn intern(&mut self, value: FieldElement) -> InternedFieldElement {
if let Some(&idx) = self.map.get(&value) {
return idx; // Already interned
}
let idx = self.values.len();
self.values.push(value);
self.map.insert(value, idx);
idx
}
}Key files:
provekit/common/src/sparse_matrix.rs- Sparse matrix implementationprovekit/common/src/interner.rs- Field element deduplication
What it is: Mobile devices have limited RAM. ProveKit uses clever techniques to minimize memory usage during proving, making it possible to generate proofs on phones.
The Math & Techniques:
1. Prefix Covectors:
Full covector: α = [a₀, a₁, a₂, a₃, 0, 0, 0, 0, ..., 0]
↑___ k non-zero ___↑ ↑___ n-k zeros ___↑
Storage:
- Naive: n field elements × 32 bytes
- Prefix: k field elements × 32 bytes + remember n
For k=1000, n=1,000,000:
Naive: 32 MB
Prefix: 32 KB
Savings: 99.97%!
MLE Evaluation with Prefix:
f̃(r₁, ..., rₙ) where f has prefix of length 2^k
Split point: r = (r₁, ..., r_{n-k}, r_{n-k+1}, ..., rₙ)
↑___ head ___↑ ↑___ tail ___↑
f̃(r) = [∏(1 - rᵢ) for i in head] × MLE_prefix(tail)
i=1 to n-k
Only evaluate MLE on the small prefix!
2. Offset Covectors:
Covector with values at specific offset:
α = [0, 0, 0, a₀, a₁, a₂, 0, 0, 0]
↑___ offset=3 ___↑
Storage: Just [a₀, a₁, a₂] + offset + domain_size
3. R1CS Compression:
During proving:
1. Solve w1 witnesses
2. Compress R1CS to binary blob (postcard serialization)
3. Free R1CS matrices (saves ~100s of MB)
4. Commit to w1
5. Solve w2 witnesses
6. Decompress R1CS for sumcheck
7. Run sumcheck
Peak memory = max(R1CS_size, commitment_size)
Without compression: R1CS_size + commitment_size
Savings: ~50% peak memory!
4. Witness Layer Compression:
Similar to R1CS:
1. Compress w2_layers to binary blob
2. Free w2_layers data structures
3. Commit to w1
4. Decompress w2_layers
5. Solve w2
Saves memory during expensive commitment operation
5. Streaming Serialization:
Instead of:
1. Build entire proof in memory
2. Serialize to bytes
3. Write to disk
Do:
1. Serialize piece-by-piece
2. Write each piece immediately
3. Never hold full proof in memory
Peak memory = size of largest piece (not full proof)
6. Zero-Copy Operations:
Instead of copying data:
&[u8] → deserialize → T → process → serialize → &[u8]
Use references:
&[u8] → view as T → process in-place → &[u8]
Saves: 2× memory (no copies) + faster
Code reference:
// From provekit/common/src/prefix_covector.rs
pub struct PrefixCovector {
vector: Vec<FieldElement>, // Only non-zero prefix
domain_size: usize, // Full logical size
}
impl LinearForm<FieldElement> for PrefixCovector {
fn mle_evaluate(&self, point: &[FieldElement]) -> FieldElement {
let k = self.vector.len().trailing_zeros() as usize;
let r = point.len() - k;
// Head factor: ∏(1 - rᵢ)
let head_factor: FieldElement =
point[..r].iter()
.map(|p| FieldElement::one() - p)
.product();
// Evaluate MLE only on prefix
let prefix_mle = multilinear_extend(&self.vector, &point[r..]);
head_factor * prefix_mle
}
}
// From provekit/prover/src/r1cs.rs
pub struct CompressedR1CS {
blob: Vec<u8>, // Postcard-serialized R1CS
num_witnesses: usize,
num_constraints: usize,
}
impl CompressedR1CS {
pub fn compress(r1cs: R1CS) -> Result<Self> {
let blob = postcard::to_allocvec(&r1cs)?;
Ok(Self {
blob,
num_witnesses: r1cs.num_witnesses(),
num_constraints: r1cs.num_constraints(),
})
}
pub fn decompress(self) -> Result<R1CS> {
postcard::from_bytes(&self.blob)
}
}
// From provekit/prover/src/lib.rs
fn prove_with_witness(...) -> Result<NoirProof> {
// Compress R1CS
let compressed_r1cs = CompressedR1CS::compress(self.r1cs)?;
// Solve w1
solve_witness_vec(&mut witness, w1_layers, ...);
// Compress w2 layers
let compressed_w2 = CompressedLayers::compress(w2_layers)?;
// Commit to w1 (R1CS and w2_layers not in memory!)
let commitment = commit(&w1)?;
// Decompress for w2 solving
let w2_layers = compressed_w2.decompress()?;
solve_witness_vec(&mut witness, w2_layers, ...);
// Decompress R1CS for sumcheck
let r1cs = compressed_r1cs.decompress()?;
// Run sumcheck
let proof = sumcheck(&r1cs, &witness)?;
Ok(proof)
}Memory Timeline:
Time →
|--- Load R1CS (200 MB)
|--- Solve w1 (50 MB)
|--- Compress R1CS → 20 MB blob
|--- Free R1CS matrices (saves 180 MB)
|--- Compress w2_layers → 5 MB blob
|--- Free w2_layers (saves 30 MB)
|--- Commit w1 (100 MB Merkle tree)
|--- Free commitment (saves 100 MB)
|--- Decompress w2_layers (30 MB)
|--- Solve w2 (20 MB)
|--- Decompress R1CS (200 MB)
|--- Sumcheck (50 MB)
Peak memory: 200 MB (not 200+50+100+30+20 = 400 MB!)
Key concepts:
- Peak memory reduction during proving
- Streaming serialization/deserialization
- Zero-copy operations where possible
What it is: Verifying a WHIR proof inside another proof system. This creates a Groth16 proof that "I verified a WHIR proof correctly", which can be verified on-chain (Ethereum) very cheaply.
The Math:
The Problem:
WHIR proof: ~100 KB, verification takes ~100ms
Ethereum gas cost: ~10M gas (too expensive!)
Solution: Create a Groth16 proof of WHIR verification
Groth16 proof: ~200 bytes, verification takes ~1ms
Ethereum gas cost: ~300K gas (affordable!)
Recursive Verification:
Original statement: "I know x such that f(x) = y"
↓ WHIR proof
WHIR proof π₁: Proves the statement
Recursive statement: "I verified WHIR proof π₁ correctly"
↓ Groth16 proof
Groth16 proof π₂: Proves the verification
Ethereum verifies π₂ (cheap!)
→ π₂ is valid
→ WHIR verification was correct
→ π₁ is valid
→ Original statement is true
Groth16 Proof System:
Proof: (A, B, C) (3 elliptic curve points)
Verification equation:
e(A, B) = e(α, β) × e(C, γ) × e(public_inputs, δ)
where e is a pairing function on elliptic curves
Verification: 3 pairings + 1 multi-exponentiation
Cost: ~300K gas on Ethereum
The Workflow:
1. Compile WHIR verifier to R1CS:
- Input: WHIR proof π₁
- Computation: WHIR verification algorithm
- Output: accept/reject
- Result: R1CS with ~1M constraints
2. Generate Groth16 proving key (PK) and verification key (VK):
- PK: ~1 GB (used by prover)
- VK: ~1 KB (used by verifier)
- CRITICAL: PK and VK must match the R1CS exactly!
3. Prove with Groth16:
- Input: WHIR proof π₁, PK
- Output: Groth16 proof π₂
- Time: ~10 seconds
4. Verify on-chain:
- Input: π₂, VK, public inputs
- Output: accept/reject
- Cost: ~300K gas
R1CS Matching:
CRITICAL: The R1CS used to generate PK/VK must EXACTLY match
the R1CS used to create the WHIR proof!
If they don't match:
- Groth16 proof will be invalid
- Or worse: unsound (accepts invalid WHIR proofs)
Matching means:
- Same number of constraints
- Same number of witnesses
- Same constraint structure
- Same public input positions
Code reference:
// From recursive-verifier/cmd/cli/main.go (Go code)
func main() {
// Load WHIR proof
whirProof := loadWHIRProof("proof.np")
// Load R1CS for WHIR verifier circuit
r1cs := loadR1CS("r1cs.json")
// Generate or load Groth16 keys
pk, vk := groth16.Setup(r1cs)
// Create witness (WHIR proof + verification computation)
witness := createWitness(whirProof, r1cs)
// Generate Groth16 proof
groth16Proof, err := groth16.Prove(r1cs, pk, witness)
// Verify Groth16 proof
valid := groth16.Verify(groth16Proof, vk, publicInputs)
if valid {
fmt.Println("WHIR proof verified successfully!")
}
}Why Go + Gnark?
Gnark is a Go library for zero-knowledge proofs
- Mature Groth16 implementation
- Ethereum-compatible
- Good performance
- Well-tested
ProveKit uses:
- Rust for WHIR (performance-critical)
- Go for Groth16 wrapper (leverage Gnark)
Deployment Modes:
1. CLI mode:
./recursive-verifier --proof proof.np --r1cs r1cs.json
2. HTTP server mode:
./recursive-verifier-server
POST /verify { "proof": "...", "r1cs": "..." }
Used by tooling/verifier-server (Rust + Go hybrid)
Key directory:
recursive-verifier/- Go-based Groth16 wrapper
What it is: Translating high-level Noir programs (which look like Rust) into low-level R1CS constraints that the proof system can work with.
The Math:
Compilation Pipeline:
Noir source code (.nr)
↓ Noir compiler
ACIR (Abstract Circuit Intermediate Representation)
↓ ProveKit r1cs-compiler
R1CS (A, B, C matrices) + Witness Builders
↓ ProveKit prover
Proof
Example Noir Program:
fn main(x: Field, y: pub Field) {
assert(x * x == y);
}ACIR Representation:
Opcodes:
1. Mul: w[2] = w[0] × w[0] (square x)
2. AssertZero: w[2] - w[1] = 0 (check x² = y)
Witnesses:
w[0] = x (private input)
w[1] = y (public input)
w[2] = x² (intermediate)
R1CS Compilation:
From ACIR opcode: Mul(w[2], w[0], w[0])
Generate R1CS constraint:
A[0] = [0, 0, 1, 0, ...] (select w[0])
B[0] = [0, 0, 1, 0, ...] (select w[0])
C[0] = [0, 0, 0, 1, ...] (select w[2])
Constraint: (A[0]·w) × (B[0]·w) = C[0]·w
Simplifies: w[0] × w[0] = w[2] ✓
From ACIR opcode: AssertZero(w[2] - w[1])
Generate R1CS constraint:
A[1] = [1, 0, 0, 0, ...] (constant 1)
B[1] = [0, 1, -1, 0, ...] (w[1] - w[2])
C[1] = [0, 0, 0, 0, ...] (zero)
Constraint: 1 × (w[1] - w[2]) = 0
Simplifies: w[1] = w[2] ✓
Optimization Passes:
1. Binop Batching:
Without batching:
x AND y → 1 constraint
a AND b → 1 constraint
p AND q → 1 constraint
Total: 3 constraints
With batching:
All ANDs use shared lookup table
Table has 2^(2×bits) entries
Multiplicities track how often each entry is used
Total: 1 table + 3 lookups (more efficient!)
2. Range Check Batching:
Without batching:
Check x ∈ [0, 255] → 8 bit constraints
Check y ∈ [0, 255] → 8 bit constraints
Check z ∈ [0, 255] → 8 bit constraints
Total: 24 constraints
With batching:
Shared range table [0, 1, 2, ..., 255]
Multiplicities track usage
Total: 1 table + 3 lookups
3. Spread Table Caching:
Spread function: interleave bits with zeros
spread(0b1011) = 0b01_00_01_01
Without caching:
Compute spread(x) → constraints
Compute spread(y) → constraints
Compute spread(z) → constraints
With caching:
Build spread table once: [spread(0), spread(1), ..., spread(255)]
Look up values: O(1) per lookup
Black Box Functions: Special optimized implementations for common operations:
1. Poseidon2 hash:
- Native: ~25,000 constraints
- Black box: ~100 constraints
- 250× more efficient!
2. SHA256:
- Native: ~100,000 constraints
- Black box: ~20,000 constraints
- 5× more efficient
3. ECDSA signature verification:
- Native: ~1,000,000 constraints
- Black box: ~50,000 constraints
- 20× more efficient
Public Input Binding:
Noir: fn main(x: Field, y: pub Field)
↑ private ↑ public
R1CS witness layout:
w[0] = 1 (constant)
w[1] = y (public input 1)
w[2] = ... (public input 2, if any)
...
w[k] = x (private input)
...
Public inputs MUST be at positions 1..num_public_inputs
This is enforced during compilation!
Witness Mapping:
ACIR witness indices → R1CS witness indices
ACIR may have:
- Witness 0 = x
- Witness 1 = y
- Witness 2 = x²
R1CS must have:
- w[0] = 1 (constant)
- w[1] = y (public)
- w[2] = x (private)
- w[3] = x² (intermediate)
Compiler creates mapping: ACIR[i] → R1CS[j]
Code reference:
// From provekit/r1cs-compiler/src/lib.rs (conceptual)
pub fn noir_to_r1cs(
acir_circuit: &Circuit,
optimizations: &Optimizations,
) -> Result<(R1CS, Vec<WitnessBuilder>)> {
let mut r1cs = R1CS::new();
let mut witness_builders = Vec::new();
// Reserve witness 0 for constant 1
r1cs.add_witnesses(1);
witness_builders.push(WitnessBuilder::Constant(0, FieldElement::one()));
// Add public inputs
for pub_input in &acir_circuit.public_inputs {
let idx = r1cs.add_witnesses(1);
witness_builders.push(WitnessBuilder::Acir(idx, pub_input.index));
}
// Compile each ACIR opcode
for opcode in &acir_circuit.opcodes {
match opcode {
Opcode::AssertZero(expr) => {
compile_assert_zero(&mut r1cs, expr)?;
}
Opcode::BlackBoxFuncCall(bb) => {
compile_black_box(&mut r1cs, &mut witness_builders, bb)?;
}
// ... more opcodes
}
}
// Apply optimizations
if optimizations.batch_binops {
batch_binary_operations(&mut r1cs, &mut witness_builders)?;
}
if optimizations.batch_range_checks {
batch_range_checks(&mut r1cs, &mut witness_builders)?;
}
Ok((r1cs, witness_builders))
}
fn compile_assert_zero(r1cs: &mut R1CS, expr: &Expression) -> Result<()> {
// Convert linear expression to R1CS constraint
// expr = Σ(cᵢ × wᵢ) = 0
// Becomes: 1 × (Σ(cᵢ × wᵢ)) = 0
let a = vec![(FieldElement::one(), 0)]; // Constant 1
let b = expr.to_witness_coefficients(); // Linear combination
let c = vec![]; // Zero
r1cs.add_constraint(&a, &b, &c);
Ok(())
}Key files:
provekit/r1cs-compiler/- Noir to R1CS compilation
To deeply understand ProveKit's core math, you should be familiar with:
- Abstract Algebra: Finite fields, groups, rings, field extensions
- Linear Algebra: Matrices, vectors, linear systems, rank
- Polynomial Algebra: Multivariate polynomials, evaluation, interpolation
- Cryptography: Hash functions, commitment schemes, random oracles
- Complexity Theory: NP relations, witness-based proofs
- Number Theory: Modular arithmetic, prime fields, discrete logarithms
- Start with R1CS: understand constraint satisfaction and witness vectors
- Study finite field arithmetic: BN254 field operations
- Learn polynomial commitments: Merkle trees and evaluation proofs
- Understand sumcheck protocol: the heart of WHIR
- Study Fiat-Shamir: making proofs non-interactive
- Explore optimizations: sparse matrices, memory techniques
What it is: DKIM is an email authentication method that allows the receiver to verify that an email was actually sent by the domain it claims to be from. Every email has a cryptographic signature that can be verified using the sender's public key.
The Math:
DKIM Signature Algorithm:
signature = RSA_sign(SHA256(email_headers + body_hash), private_key)
Where:
- email_headers = "from:..., to:..., subject:..., date:..."
- body_hash = SHA256(email_body)
- private_key = sender domain's private key (kept secret)
Verification:
1. Extract signature from email header (DKIM-Signature field)
2. Fetch public key from DNS TXT records (domain._domainkey.domain.com)
3. Compute: SHA256(email_headers + body_hash)
4. Verify: RSA_verify(signature, hash, public_key) == true
RSA Signature Verification:
Given:
- signature s (2048-bit number)
- public key (n, e) where n is modulus, e is exponent (usually 65537)
- message hash h
Verify:
s^e mod n == PKCS1_padding(h)
If equal → signature is valid
If not equal → signature is forged
Example DKIM Header:
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=gmail.com; s=20230601;
h=from:to:subject:date;
bh=47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=;
b=ABC123...XYZ789
Where:
- v=1: DKIM version
- a=rsa-sha256: Algorithm (RSA with SHA256)
- d=gmail.com: Signing domain
- s=20230601: Selector (identifies which key)
- h=...: Headers included in signature
- bh=...: Body hash (base64 encoded)
- b=...: Signature (base64 encoded)
ZK Email Innovation: Instead of verifying DKIM on-chain (expensive!), we verify it in a ZK circuit:
Public Inputs:
- sender_domain (e.g., "twitter.com")
- rsa_modulus (public key)
- masked_message (only revealed parts, e.g., username)
Private Inputs:
- full_email_headers
- full_email_body
- dkim_signature
Circuit Proves:
1. SHA256(headers + body_hash) is correct
2. RSA signature verifies
3. Email structure is valid (proper headers)
4. Masked message matches regex extraction
Why This Matters:
- Proves email authenticity without revealing full email
- No need to trust centralized oracles
- Privacy-preserving: only reveal what you want
- Works with existing email infrastructure (billions of emails/day)
Code reference:
// From @zk-email/circuits
template RSAVerify(n, k) {
// n = number of bits in RSA modulus
// k = chunk size for big integer arithmetic
signal input message[k]; // SHA256 hash
signal input signature[k]; // RSA signature
signal input modulus[k]; // RSA public key modulus
// Compute signature^e mod modulus
component rsa = RSAPow(n, k, 65537);
rsa.base <== signature;
rsa.modulus <== modulus;
// Verify PKCS1 padding and hash match
component pkcs = PKCS1Verify(n, k);
pkcs.padded <== rsa.out;
pkcs.message <== message;
}Trust Assumptions:
- DNS public key is correct (fetched from DNS TXT records)
- Sending mailserver is honest (they control the private key)
- Receiving mailserver doesn't forge emails (they can read plaintext)
What it is: Proving that a string matches a regular expression pattern without revealing the string itself. Used to extract specific fields from emails (like username, amount, date) while keeping the rest private.
The Math:
Deterministic Finite Automaton (DFA): A regex is converted to a state machine:
DFA = (Q, Σ, δ, q₀, F)
Where:
- Q = set of states {q₀, q₁, q₂, ...}
- Σ = alphabet (all possible characters)
- δ = transition function: Q × Σ → Q
- q₀ = initial state
- F = set of accepting (final) states
Example Regex: to: ([a-z]+)@mit.edu
States:
q₀ (start) → q₁ (saw 't') → q₂ (saw 'to') → q₃ (saw 'to:')
→ q₄ (saw 'to: ') → q₅ (capturing username) → q₆ (saw '@')
→ q₇ (saw '@m') → q₈ (saw '@mi') → q₉ (saw '@mit')
→ q₁₀ (saw '@mit.') → q₁₁ (saw '@mit.e') → q₁₂ (saw '@mit.ed')
→ q₁₃ (accept: saw '@mit.edu')
Transitions:
δ(q₀, 't') = q₁
δ(q₁, 'o') = q₂
δ(q₂, ':') = q₃
δ(q₃, ' ') = q₄
δ(q₄, 'a'-'z') = q₅ (start capturing)
δ(q₅, 'a'-'z') = q₅ (continue capturing)
δ(q₅, '@') = q₆ (stop capturing)
...
ZK Circuit for DFA:
For each character c in input string:
current_state = next_state
// Compute next state based on current state and character
next_state = 0
for each possible (state, char) → new_state:
is_match = (current_state == state) AND (c == char)
next_state += is_match * new_state
// Track if we're in capturing state
if current_state in capturing_states:
revealed[i] = c
else:
revealed[i] = 0
// Final check: are we in an accepting state?
is_valid = (final_state in F)
Constraint Generation:
Each state transition becomes constraints:
// Check if current state matches
is_state_q5 = (state == 5)
// Check if character is in range 'a'-'z'
is_lowercase = (c >= 97) AND (c <= 122)
// Compute next state
next_state = is_state_q5 * is_lowercase * 5 // stay in q5
+ is_state_q5 * (c == 64) * 6 // '@' → q6
+ ... (all other transitions)
Substring Extraction:
To extract matched text:
reveal_mask[i] = (state in capturing_states) ? 1 : 0
revealed_char[i] = reveal_mask[i] * input[i]
Public output: revealed_char (zeros except for captured parts)
Example:
Input: "to: alice@mit.edu\r\nsubject: hello"
Regex: to: ([a-z]+)@mit.edu
DFA execution:
Position 0: 't' → state 1
Position 1: 'o' → state 2
Position 2: ':' → state 3
Position 3: ' ' → state 4
Position 4: 'a' → state 5 (START CAPTURE)
Position 5: 'l' → state 5 (CAPTURING)
Position 6: 'i' → state 5 (CAPTURING)
Position 7: 'c' → state 5 (CAPTURING)
Position 8: 'e' → state 5 (CAPTURING)
Position 9: '@' → state 6 (STOP CAPTURE)
Position 10: 'm' → state 7
...
Position 16: 'u' → state 13 (ACCEPT)
Revealed: [0,0,0,0,'a','l','i','c','e',0,0,...]
Public output: "alice"
Optimization - State Compression:
Instead of checking all transitions:
1. Group states by common transitions
2. Use lookup tables for character ranges
3. Precompute transition matrices
Reduces constraints from O(|Q| × |Σ|) to O(|Q| × log|Σ|)
Code reference:
// From @zk-email/circuits (generated by zk-regex)
template EmailRegex(msg_bytes) {
signal input msg[msg_bytes];
signal output out; // 1 if match, 0 if no match
signal states[msg_bytes+1][num_states];
states[0][0] <== 1; // Start in state 0
for (var i = 0; i < msg_bytes; i++) {
// Transition logic (auto-generated from regex)
states[i+1][0] <== states[i][0] * (1 - eq[0][i]);
states[i+1][1] <== states[i][0] * eq[0][i] + states[i][1] * (1 - eq[1][i]);
// ... more transitions
}
// Check if final state is accepting
out <== states[msg_bytes][accepting_state];
}Why This Matters:
- Extract specific fields (username, amount, date) from emails
- Prove email structure without revealing full content
- Flexible: any regex pattern can be converted to circuit
- Efficient: DFA is linear in string length
Tools:
- zkregex.com: Convert regex to Circom code
- @zk-email/zk-regex: Library for regex circuits
What it is: Verifying RSA signatures inside ZK circuits is challenging because RSA uses very large numbers (2048-4096 bits) and operates in a different field than the ZK circuit's native field (BN254).
The Math:
The Problem:
RSA verification: s^e mod n == h
Where:
- s = signature (2048 bits)
- e = public exponent (usually 65537)
- n = modulus (2048 bits)
- h = message hash (256 bits)
But ZK circuit operates in BN254 field (254 bits)!
Solution: Big Integer Arithmetic:
Represent large numbers as arrays of smaller chunks:
2048-bit number = [chunk₀, chunk₁, ..., chunk₁₅]
where each chunk is 128 bits (fits in BN254 field)
Example:
n = 0x1234...5678 (2048 bits)
= [0x5678, 0x..., 0x..., 0x1234] (16 chunks of 128 bits)
Modular Exponentiation:
Compute s^e mod n using repeated squaring:
function pow_mod(base, exp, mod):
result = 1
base = base mod mod
while exp > 0:
if exp is odd:
result = (result * base) mod mod
exp = exp >> 1
base = (base * base) mod mod
return result
For e = 65537 = 0b10000000000000001:
Only 2 bits set, so only 2 multiplications needed!
Big Integer Multiplication:
Multiply two k-chunk numbers:
function bigint_mul(a, b, k):
// School multiplication algorithm
result = [0] * (2*k)
for i in 0..k:
carry = 0
for j in 0..k:
prod = a[i] * b[j] + result[i+j] + carry
result[i+j] = prod mod 2^128
carry = prod >> 128
result[i+k] = carry
return result
Constraints: O(k²) multiplications
For k=16 (2048 bits): 256 multiplications
Modular Reduction:
Reduce result modulo n using Barrett reduction:
function barrett_reduce(x, n, k):
// Precompute: μ = floor(2^(2*k*128) / n)
// Estimate quotient: q ≈ floor(x / n)
q = floor((x * μ) >> (2*k*128))
// Compute remainder: r = x - q*n
r = x - q * n
// Final adjustment (at most 2 subtractions)
while r >= n:
r = r - n
return r
Constraints: O(k²) for multiplication, O(k) for subtraction
PKCS#1 Padding Verification:
RSA signature includes padding:
padded = 0x0001 || 0xFF...FF || 0x00 || DigestInfo || hash
Where:
- 0x0001: padding type
- 0xFF...FF: padding bytes (fills to modulus size)
- 0x00: separator
- DigestInfo: algorithm identifier (SHA256)
- hash: actual message hash
Circuit must verify:
1. First bytes are 0x0001
2. Middle bytes are all 0xFF
3. Separator 0x00 exists
4. DigestInfo matches expected algorithm
5. Final bytes match message hash
Constraint Count:
RSA verification in circuit:
1. Big integer multiplication: ~256 constraints per multiply
2. Modular reduction: ~256 constraints
3. Exponentiation (e=65537): ~17 multiplications
4. PKCS padding check: ~100 constraints
Total: ~5,000 - 10,000 constraints for 2048-bit RSA
Compare to:
- ECDSA verification: ~1,500 constraints
- EdDSA verification: ~2,000 constraints
RSA is 3-5× more expensive but necessary for DKIM!
Code reference:
// From @zk-email/circuits
template RSAVerifier(n, k) {
// n = bits in modulus (2048)
// k = chunk size (128)
signal input signature[k];
signal input modulus[k];
signal input message[32]; // SHA256 hash
// Compute signature^65537 mod modulus
component exp = BigIntExp(n, k, 65537);
exp.base <== signature;
exp.modulus <== modulus;
// Verify PKCS#1 v1.5 padding
component pkcs = PKCS1Verify(n, k);
pkcs.padded <== exp.out;
pkcs.message <== message;
pkcs.out === 1; // Must be valid
}
template BigIntMul(k) {
signal input a[k];
signal input b[k];
signal output out[2*k];
// School multiplication with carry
for (var i = 0; i < k; i++) {
for (var j = 0; j < k; j++) {
// Multiply and accumulate
// (constraints generated here)
}
}
}Optimization Techniques:
1. Chunk size selection:
- Larger chunks: fewer chunks, less overhead
- Smaller chunks: fits better in field, less overflow risk
- Sweet spot: 128-bit chunks for 2048-bit RSA
2. Exponent optimization:
- e = 65537 = 2^16 + 1 (only 2 bits set)
- Only 17 squarings + 1 multiplication needed
- Much faster than arbitrary exponent
3. Lazy reduction:
- Don't reduce after every operation
- Accumulate multiple operations
- Reduce only when necessary (before overflow)
4. Precomputation:
- Barrett reduction constant μ
- Montgomery form conversion
- Lookup tables for small multiplications
Why This Matters:
- DKIM uses RSA (not ECDSA), so we must support it
- Enables email verification in ZK
- Most expensive part of ZK Email circuit
- Optimization is critical for practical performance
What it is: Computing SHA256 hash inside a ZK circuit, with support for variable-length inputs. This is needed because emails have different lengths, but we want one circuit to handle all of them.
The Math:
SHA256 Algorithm:
SHA256 is a Merkle-Damgård hash function:
1. Pad message to multiple of 512 bits:
message || 0x80 || 0x00...00 || length (64 bits)
2. Split into 512-bit blocks: M₀, M₁, M₂, ...
3. Initialize state: H = [h₀, h₁, ..., h₇] (8 × 32-bit words)
4. For each block Mᵢ:
H = compress(H, Mᵢ)
5. Output: H (256 bits)
Compression Function:
compress(H, M):
// Expand M into 64 words
W[0..15] = M (split into 16 × 32-bit words)
for i in 16..63:
s0 = rotr(W[i-15], 7) XOR rotr(W[i-15], 18) XOR (W[i-15] >> 3)
s1 = rotr(W[i-2], 17) XOR rotr(W[i-2], 19) XOR (W[i-2] >> 10)
W[i] = W[i-16] + s0 + W[i-7] + s1
// Initialize working variables
a, b, c, d, e, f, g, h = H
// 64 rounds
for i in 0..63:
S1 = rotr(e, 6) XOR rotr(e, 11) XOR rotr(e, 25)
ch = (e AND f) XOR ((NOT e) AND g)
temp1 = h + S1 + ch + K[i] + W[i]
S0 = rotr(a, 2) XOR rotr(a, 13) XOR rotr(a, 22)
maj = (a AND b) XOR (a AND c) XOR (b AND c)
temp2 = S0 + maj
h = g
g = f
f = e
e = d + temp1
d = c
c = b
b = a
a = temp1 + temp2
// Add to state
H = [H[0]+a, H[1]+b, ..., H[7]+h]
return H
Where:
- rotr(x, n) = rotate right by n bits
- K[i] = round constants (first 32 bits of cube roots of first 64 primes)
Variable Length Challenge:
Problem: Circuit must handle emails of different lengths
Naive solution: Create different circuit for each length
→ Need different proving key for each length
→ Impractical!
Better solution: Partial hash computation
Partial Hash Technique:
Key insight: SHA256 processes blocks sequentially
For long email:
Block 0 → Block 1 → Block 2 → ... → Block N
↓ ↓ ↓ ↓
H₀ → H₁ → H₂ → ... Hₙ (final hash)
Optimization:
1. Compute H₀, H₁, ..., Hₖ₋₁ OUTSIDE the circuit (fast!)
2. Only compute Hₖ, ..., Hₙ INSIDE the circuit (slow but necessary)
3. Prove: "I have a substring that, when hashed with partial state Hₖ₋₁,
produces final hash Hₙ"
This works because SHA256 is a sponge function!
Example:
Email: 100 KB = 800,000 bits = 1,563 blocks
Without optimization:
- Circuit must process all 1,563 blocks
- ~100,000 constraints per block
- Total: ~156 million constraints (infeasible!)
With partial hash:
- Compute first 1,500 blocks outside circuit → H₁₅₀₀
- Circuit only processes last 63 blocks
- Total: ~6.3 million constraints (feasible!)
Savings: 96% reduction in constraints!
Padding Handling:
SHA256 padding must be done correctly:
message || 0x80 || 0x00...00 || length
In circuit:
1. Find actual message length (variable)
2. Add 0x80 byte after message
3. Fill with 0x00 until 64 bits remain
4. Add message length as 64-bit big-endian integer
Constraints:
- Length check: O(1)
- Padding insertion: O(block_size)
- Conditional logic: use selectors
Selector Pattern:
To handle variable length:
for i in 0..max_length:
is_message[i] = (i < actual_length) ? 1 : 0
is_padding[i] = (i == actual_length) ? 1 : 0
is_zero[i] = (i > actual_length AND i < length_field_start) ? 1 : 0
byte[i] = is_message[i] * message[i]
+ is_padding[i] * 0x80
+ is_zero[i] * 0x00
+ is_length[i] * length_byte[i]
Each selector is a constraint!
Code reference:
// From @zk-email/circuits
template Sha256Bytes(max_num_bytes) {
signal input in_padded[max_num_bytes];
signal input in_len_padded_bytes;
signal output out[256];
var num_blocks = max_num_bytes / 64;
component ha0 = H(0);
component hb0 = H(1);
// ... initialize state
for (var i = 0; i < num_blocks; i++) {
// Extract 512-bit block
component block[16];
for (var j = 0; j < 16; j++) {
block[j] = Bytes2Word(4);
for (var k = 0; k < 4; k++) {
block[j].in[k] <== in_padded[i*64 + j*4 + k];
}
}
// Compress
component compress = Sha256Compress();
compress.hin <== [ha0.out, hb0.out, ...];
compress.inp <== [block[0].out, block[1].out, ...];
// Update state
ha0.out <== compress.out[0];
hb0.out <== compress.out[1];
// ...
}
// Output final hash
out <== [ha0.out, hb0.out, ...];
}Constraint Count:
SHA256 in circuit:
Per block (512 bits):
- Message expansion: 48 rounds × ~10 constraints = 480
- Compression: 64 rounds × ~50 constraints = 3,200
- State update: ~100 constraints
Total per block: ~3,800 constraints
For email with 10 blocks in circuit:
Total: ~38,000 constraints
This is why partial hashing is critical!
Why This Matters:
- SHA256 is part of DKIM signature
- Must verify hash inside ZK circuit
- Variable length support enables one circuit for all emails
- Partial hashing makes it practical
- Still one of the most expensive operations in ZK Email
- R1CS Constraint Satisfaction: All constraints must be satisfied by the witness
- Public Input Binding:
public_inputs[i] == witness[1 + i](witness[0] = 1) - Transcript Determinism: Prover and verifier must construct identical transcripts
- Witness Layer Ordering: Dependencies must be respected in layered solving
- Prover Message vs Hint: Only messages affect Fiat-Shamir transcript
- DKIM Signature Validity: RSA signature must verify with correct public key
- Regex State Transitions: DFA must reach accepting state for valid match
- SHA256 Padding: Must follow RFC specification exactly
- Email Structure: Headers must be properly formatted (CRLF separators)
- DNS Key Trust: Public key must match domain's DNS TXT records
What it is: QAP is the polynomial representation of R1CS. It's a crucial transformation that enables efficient zero-knowledge proofs by converting constraint checking from linear algebra to polynomial algebra.
The Math:
From R1CS to QAP:
R1CS has m constraints and n variables
Each constraint: (A[i]·w) × (B[i]·w) = C[i]·w
QAP converts this to polynomials:
- For each variable position j, create 3 polynomials: Aⱼ(x), Bⱼ(x), Cⱼ(x)
- These polynomials pass through specific points
- Evaluating at x=i gives the i-th constraint
Lagrange Interpolation:
Given points: (1, y₁), (2, y₂), ..., (m, yₘ)
Find polynomial P(x) such that P(i) = yᵢ
Lagrange basis polynomial for point i:
Lᵢ(x) = ∏ (x - j) / (i - j) for all j ≠ i
j=1 to m
Final polynomial:
P(x) = Σ yᵢ × Lᵢ(x)
i=1 to m
Example:
R1CS constraint matrices (4 constraints, 6 variables):
A = [[0,1,0,0,0,0], B = [[0,1,0,0,0,0], C = [[0,0,0,1,0,0],
[0,0,0,1,0,0], [0,1,0,0,0,0], [0,0,0,0,1,0],
[0,1,0,0,1,0], [1,0,0,0,0,0], [0,0,0,0,0,1],
[5,0,0,0,0,1]] [1,0,0,0,0,0]] [0,0,1,0,0,0]]
For variable 0 (constant 1):
A₀ values at x=1,2,3,4: [0, 0, 0, 5]
Interpolate: A₀(x) = -5 + 9.166x - 5x² + 0.833x³
For variable 1 (input x):
A₁ values at x=1,2,3,4: [1, 0, 1, 0]
Interpolate: A₁(x) = 8 - 11.333x + 5x² - 0.666x³
... repeat for all 6 variables and all 3 matrices
The QAP Check:
Instead of checking m constraints individually:
(A[i]·w) × (B[i]·w) = C[i]·w for i = 1,2,...,m
Check ONE polynomial equation:
A(x) × B(x) - C(x) = H(x) × Z(x)
Where:
- A(x) = Σ wⱼ × Aⱼ(x) (linear combination of A polynomials)
- B(x) = Σ wⱼ × Bⱼ(x)
- C(x) = Σ wⱼ × Cⱼ(x)
- Z(x) = (x-1)(x-2)...(x-m) (target polynomial, zero at all constraint points)
- H(x) = quotient polynomial
Key insight: If all constraints are satisfied, then A(x)×B(x)-C(x)
equals zero at x=1,2,...,m, so it must be divisible by Z(x)!
Why This Works:
At each constraint point x=i:
A(i) = A[i]·w (by construction of Lagrange interpolation)
B(i) = B[i]·w
C(i) = C[i]·w
If constraint i is satisfied:
A(i) × B(i) - C(i) = 0
If ALL constraints satisfied:
A(x) × B(x) - C(x) = 0 at x=1,2,...,m
By polynomial division theorem:
A(x) × B(x) - C(x) = H(x) × Z(x) for some polynomial H(x)
Succinctness:
R1CS check: O(m) constraint checks
QAP check: 1 polynomial division check
But polynomials can be committed using cryptography!
This enables constant-size proofs regardless of m.
Code reference:
# From Vitalik's QAP tutorial
def r1cs_to_qap(A, B, C):
m = len(A) # number of constraints
n = len(A[0]) # number of variables
# For each variable, interpolate its values across all constraints
A_polys = []
B_polys = []
C_polys = []
for j in range(n):
# Extract column j from all constraints
A_vals = [A[i][j] for i in range(m)]
B_vals = [B[i][j] for i in range(m)]
C_vals = [C[i][j] for i in range(m)]
# Interpolate to get polynomials
A_polys.append(lagrange_interp(range(1, m+1), A_vals))
B_polys.append(lagrange_interp(range(1, m+1), B_vals))
C_polys.append(lagrange_interp(range(1, m+1), C_vals))
# Target polynomial Z(x) = (x-1)(x-2)...(x-m)
Z = [1]
for i in range(1, m+1):
Z = poly_mul(Z, [-i, 1])
return A_polys, B_polys, C_polys, ZInterview Questions:
-
Q: Why do we need QAP? Why not just use R1CS directly? A: QAP enables polynomial commitment schemes and pairings. Checking m R1CS constraints requires O(m) operations, but with QAP we can use cryptographic commitments to check all constraints with O(1) verification using pairings. This is the "S" (succinct) in zkSNARK.
-
Q: What's the degree of the polynomials in QAP? A: The degree is m-1 where m is the number of constraints. This is because we're interpolating through m points, which requires a polynomial of degree at most m-1.
-
Q: Why must A(x)×B(x)-C(x) be divisible by Z(x)? A: If all constraints are satisfied, then A(x)×B(x)-C(x) equals zero at x=1,2,...,m. Any polynomial that has zeros at these m points must be divisible by the minimal polynomial Z(x)=(x-1)(x-2)...(x-m). This is a fundamental theorem of algebra.
-
Q: What happens if one constraint is not satisfied? A: Then A(i)×B(i)-C(i) ≠ 0 for some i, which means A(x)×B(x)-C(x) ≠ 0 at x=i. This polynomial won't be divisible by Z(x), and the division will leave a remainder, causing verification to fail.
-
Q: How does QAP relate to Groth16? A: Groth16 uses QAP as its arithmetization. The prover computes H(x) such that A(x)×B(x)-C(x)=H(x)×Z(x), then creates cryptographic commitments to these polynomials using elliptic curve pairings. The verifier checks the pairing equation without seeing the actual polynomials.
What it is: A pairing is a special mathematical operation that takes two elliptic curve points and produces a field element, with the magical property that it "preserves multiplication" in a specific way. This enables checking polynomial equations on committed (encrypted) values.
The Math:
Bilinear Map Definition:
A pairing e: G₁ × G₂ → Gₜ is bilinear if:
e(aP, bQ) = e(P, Q)^(ab) for all a,b ∈ ℤ and P ∈ G₁, Q ∈ G₂
Where:
- G₁, G₂ are elliptic curve groups
- Gₜ is a multiplicative group (usually in a field extension)
- P, Q are points on elliptic curves
- a, b are scalars
Key Properties:
1. Bilinearity:
e(aP, bQ) = e(P, Q)^(ab) = e(bP, aQ) = e(P, abQ) = e(abP, Q)
2. Non-degeneracy:
If P ≠ 0 and Q ≠ 0, then e(P, Q) ≠ 1
3. Computability:
e(P, Q) can be efficiently computed (Miller's algorithm)
Why Pairings are "Magical":
Problem: Check if a × b = c without knowing a, b, c
With pairings:
Given: [a]G₁, [b]G₂, [c]Gₜ (commitments to a, b, c)
Check: e([a]G₁, [b]G₂) = [c]Gₜ
Because:
e([a]G₁, [b]G₂) = e(G₁, G₂)^(ab) = [ab]Gₜ
If ab = c, then [ab]Gₜ = [c]Gₜ ✓
Application to QAP Verification:
QAP equation: A(s) × B(s) = C(s) + H(s) × Z(s)
Prover commits:
- [A(s)]₁ = A(s) × G₁
- [B(s)]₂ = B(s) × G₂
- [C(s)]₁ = C(s) × G₁
- [H(s)]₁ = H(s) × G₁
Verifier checks using pairing:
e([A(s)]₁, [B(s)]₂) = e([C(s)]₁ + [H(s)]₁ × [Z(s)]₁, G₂)
This works because:
LHS = e(G₁, G₂)^(A(s)×B(s))
RHS = e(G₁, G₂)^(C(s)+H(s)×Z(s))
If A(s)×B(s) = C(s)+H(s)×Z(s), then LHS = RHS ✓
Concrete Example (BN254 curve):
BN254 pairing: e: G₁ × G₂ → Gₜ
G₁: Points on E(𝔽ₚ) where p ≈ 2^254
G₂: Points on E(𝔽ₚ²) (quadratic extension)
Gₜ: Elements in 𝔽ₚ¹² (12th degree extension)
Example computation:
P = (x₁, y₁) ∈ G₁
Q = (x₂, y₂) ∈ G₂
e(P, Q) computed using Miller's algorithm:
1. Build a function f with divisor [r]P - [P] - [r-1]O
2. Evaluate f at Q
3. Apply final exponentiation
Result: element in 𝔽ₚ¹²
Tate Pairing (simplified):
e(P, Q) = f_r,P(Q)^((p^k - 1)/r)
Where:
- r = order of the curve
- k = embedding degree
- f_r,P is a rational function
- Final exponentiation ensures result is in correct subgroup
Why BN254?
1. Efficient pairing computation (k=12 embedding degree)
2. 128-bit security level
3. Fast arithmetic in 𝔽ₚ
4. Widely supported (Ethereum precompiles)
5. Good balance between security and performance
Code reference:
// Ethereum precompile for BN254 pairing check
// Address: 0x08
function pairingCheck(
uint256[] memory input
) public view returns (bool) {
// Input: pairs of (G1_point, G2_point)
// Returns: true if e(P1,Q1) * e(P2,Q2) * ... = 1
// Used in Groth16 verification:
// e(A, B) = e(α, β) * e(L, γ) * e(C, δ)
// Rearranged: e(A, B) * e(-α, β) * e(-L, γ) * e(-C, δ) = 1
}Interview Questions:
-
Q: What makes pairings "bilinear"? A: Bilinearity means e(aP, bQ) = e(P, Q)^(ab). This property allows us to "extract" scalar multiplications from inside the pairing, enabling verification of polynomial equations on committed values without revealing the polynomials themselves.
-
Q: Why can't we just use regular elliptic curve operations? A: Regular EC operations only support addition: P + Q. We can't multiply two EC points. Pairings give us a way to check multiplicative relationships: e(aP, bQ) = e(P, Q)^(ab), which is essential for verifying A(s)×B(s)=C(s).
-
Q: What's the computational cost of pairings? A: Pairings are expensive! On BN254, one pairing takes ~10-20ms. This is why Groth16 verification uses only 3 pairings (or 1 multi-pairing), making it practical for on-chain verification (~300k gas on Ethereum).
-
Q: What's the difference between G₁ and G₂? A: G₁ is the base curve group (points in 𝔽ₚ), while G₂ is a twisted curve group (points in 𝔽ₚ²). G₁ operations are faster, so we put most commitments there. G₂ is used for verification keys and some proof elements.
-
Q: Why do we need a trusted setup for pairings? A: The trusted setup generates [s^i]G₁ and [s^i]G₂ for i=0,1,...,d where s is a secret "toxic waste". These powers of s allow the prover to evaluate polynomials at s without knowing s. If s is revealed, fake proofs can be created. The setup must be done once per circuit (Groth16) or once for all circuits up to size d (PLONK).
What it is: A trusted setup is a ceremony where cryptographic parameters are generated using a secret random value that must be destroyed. If the secret leaks, the security of the proof system is compromised.
The Math:
The Secret:
Generate random τ (tau) ∈ 𝔽ₚ
This is the "toxic waste" that must be destroyed
Powers of Tau:
Compute: [τ^0]G, [τ^1]G, [τ^2]G, ..., [τ^d]G
Where:
- [τ^i]G = τ^i × G (scalar multiplication on elliptic curve)
- G is the generator point
- d is the maximum degree of polynomials we can commit to
These are the Structured Reference String (SRS)
Why We Need This:
To commit to polynomial P(x) = Σ aᵢx^i:
Commitment: [P(τ)]G = Σ aᵢ[τ^i]G
The prover can compute this using the SRS without knowing τ!
[P(τ)]G = a₀[τ^0]G + a₁[τ^1]G + ... + aₐ[τ^d]G
The Danger:
If attacker knows τ:
1. They can compute [P(τ)]G for ANY polynomial P
2. They can create fake proofs that verify
3. They can break the binding property of commitments
Example attack:
- Claim to prove P(x) = x² + 2x + 3
- Actually use Q(x) = x² + 5x + 1
- Compute [Q(τ)]G and claim it's [P(τ)]G
- Verifier can't tell the difference!
Multi-Party Computation (MPC) Ceremony:
To prevent single point of failure:
Participant 1: Generates τ₁, computes [τ₁^i]G
Participant 2: Generates τ₂, computes [τ₂^i × τ₁^i]G
Participant 3: Generates τ₃, computes [τ₃^i × τ₂^i × τ₁^i]G
...
Participant n: Generates τₙ, computes [τₙ^i × ... × τ₁^i]G
Final τ = τ₁ × τ₂ × ... × τₙ
Security: As long as ONE participant destroys their τᵢ,
the final τ is unknown to everyone!
The Ceremony Protocol:
Phase 1: Powers of Tau (universal, circuit-independent)
1. Generate [τ^i]G₁ for i=0 to d
2. Generate [τ^i]G₂ for i=0 to d
3. Generate [α×τ^i]G₁ for i=0 to d (for zero-knowledge)
4. Generate [β×τ^i]G₁ for i=0 to d (for zero-knowledge)
Phase 2: Circuit-specific (for Groth16)
1. Evaluate QAP polynomials at τ
2. Generate proving key and verification key
3. Destroy τ and all intermediate values
For PLONK/KZG: Only Phase 1 needed (universal setup)
Verification of Ceremony:
Each participant must prove they contributed correctly:
Check: e([τ^i]G₁, G₂) = e(G₁, [τ^i]G₂)
This ensures [τ^i]G₁ and [τ^i]G₂ use the same τ^i
Participants also provide a hash of their contribution
for public verifiability
Real-World Ceremonies:
1. Zcash Sprout (2016): 6 participants
- Destroyed computers used in ceremony
- Ceremony took place in separate locations
2. Zcash Sapling (2018): 90+ participants
- More decentralized
- Used MPC protocol
3. Ethereum KZG Ceremony (2023): 140,000+ participants
- Largest ceremony ever
- For EIP-4844 (proto-danksharding)
- Universal setup for all future use
Code reference:
// Simplified Powers of Tau contribution
fn contribute_to_ceremony(
previous_params: &[G1Point],
tau: &Scalar,
) -> Vec<G1Point> {
let mut new_params = Vec::new();
// Multiply each previous parameter by tau^i
for (i, prev) in previous_params.iter().enumerate() {
let tau_power = tau.pow(i as u64);
new_params.push(prev * tau_power);
}
// CRITICAL: Securely delete tau after this!
// In practice, use secure memory wiping
new_params
}Interview Questions:
-
Q: Why is it called "toxic waste"? A: The secret τ is called toxic waste because if it's not destroyed, it can be used to create fake proofs. It's "toxic" to the security of the system, and must be treated like hazardous waste - carefully handled and completely destroyed.
-
Q: What's the difference between Groth16 and PLONK trusted setups? A: Groth16 requires a circuit-specific setup - you need a new ceremony for each different circuit. PLONK uses a universal setup - one ceremony can be reused for all circuits up to a maximum size d. This makes PLONK much more practical for applications with many different circuits.
-
Q: How do we know participants actually destroyed their secrets? A: We don't! That's why we use MPC with many participants. We only need ONE honest participant who destroys their secret for the entire ceremony to be secure. With 100+ participants, the probability that ALL of them are malicious or compromised is negligible.
-
Q: Can we avoid trusted setup entirely? A: Yes! STARKs (using FRI commitment) and Bulletproofs (using Pedersen commitments) don't require trusted setup. They're called "transparent" proof systems. The tradeoff is larger proof sizes or slower verification. WHIR (used in ProveKit) also avoids trusted setup.
-
Q: What happens if the toxic waste is leaked years later? A: All proofs created with that setup become potentially forgeable. This is a long-term security risk. That's why ceremonies use MPC with many participants, and why transparent proof systems (no trusted setup) are being developed.
What it is: A fundamental theorem in polynomial identity testing that says: if two different polynomials are evaluated at a random point, they're very unlikely to give the same answer. This is the mathematical foundation for why polynomial commitments are secure.
The Math:
The Lemma:
Let P(x₁, x₂, ..., xₙ) be a non-zero polynomial of degree d over field 𝔽
If we choose random values r₁, r₂, ..., rₙ from a finite set S ⊆ 𝔽:
Pr[P(r₁, r₂, ..., rₙ) = 0] ≤ d / |S|
Where |S| is the size of the set S
Intuition:
A polynomial of degree d has at most d roots
If you pick a random point, the probability of hitting a root
is at most d / (total number of possible points)
For large fields, this probability is negligible
Example:
Polynomial: P(x) = x² - 5x + 6 = (x-2)(x-3)
Degree: d = 2
Roots: x = 2 and x = 3
If we pick random x from {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}:
Pr[P(x) = 0] = 2/10 = 0.2 = d/|S|
If we pick from 𝔽ₚ where p ≈ 2^254:
Pr[P(x) = 0] ≈ 2/2^254 ≈ 0 (negligible!)
Application to Polynomial Identity Testing:
Problem: Check if P(x) = Q(x) for all x
Naive: Expand both polynomials, compare all coefficients
Cost: O(d) where d is degree
Schwartz-Zippel: Pick random r, check if P(r) = Q(r)
Cost: O(1) evaluation
If P ≠ Q, then P(x) - Q(x) is non-zero polynomial of degree ≤ d
Pr[P(r) - Q(r) = 0] ≤ d / |𝔽|
For 𝔽 = 𝔽ₚ with p ≈ 2^254:
Pr[false positive] ≈ d / 2^254 ≈ 0
Application to zkSNARKs:
QAP check: A(x) × B(x) - C(x) = H(x) × Z(x)
Instead of checking for all x, pick random s:
A(s) × B(s) - C(s) = H(s) × Z(s)
If the equation is false (prover is cheating):
LHS(x) - RHS(x) is non-zero polynomial
By Schwartz-Zippel:
Pr[LHS(s) = RHS(s) | LHS ≠ RHS] ≤ d / |𝔽ₚ| ≈ 0
So checking at one random point is sufficient!
Multivariate Version:
For P(x₁, ..., xₙ) with total degree d:
Pr[P(r₁, ..., rₙ) = 0] ≤ d / |S|
Example: P(x,y) = x²y + xy² - 6xy
Degree: d = 3 (highest total degree term is x²y)
Pick random (r₁, r₂) from S × S:
Pr[P(r₁, r₂) = 0] ≤ 3 / |S|
Security Parameter:
For λ-bit security, we need:
d / |𝔽| ≤ 2^(-λ)
Therefore: |𝔽| ≥ d × 2^λ
For 128-bit security and d ≈ 2^20:
|𝔽| ≥ 2^20 × 2^128 = 2^148
BN254 field: |𝔽| ≈ 2^254 ✓ (more than enough!)
Proof Sketch:
Base case (n=1): P(x) has degree d
- P(x) has at most d roots
- |S| possible values
- Pr[hit a root] = (# roots) / |S| ≤ d / |S|
Inductive case (n>1): P(x₁, ..., xₙ)
- Write P = Σ Pᵢ(x₂, ..., xₙ) × x₁^i
- Pick r₁: P(r₁, x₂, ..., xₙ) = Σ Pᵢ(x₂, ..., xₙ) × r₁^i
- Either P(r₁, x₂, ..., xₙ) ≡ 0 (probability ≤ d/|S|)
- Or apply induction on remaining variables
Code reference:
def polynomial_identity_test(P, Q, field_size):
"""
Test if P(x) = Q(x) using Schwartz-Zippel
"""
# Pick random point
r = random.randint(0, field_size - 1)
# Evaluate both polynomials
p_val = evaluate_polynomial(P, r)
q_val = evaluate_polynomial(Q, r)
# If equal at random point, likely equal everywhere
# Probability of false positive: degree(P-Q) / field_size
return p_val == q_valInterview Questions:
-
Q: Why is Schwartz-Zippel important for zkSNARKs? A: It allows us to check polynomial equations at a single random point instead of all points. This is what makes verification succinct - O(1) instead of O(d). Without it, we'd need to check the entire polynomial, defeating the purpose of SNARKs.
-
Q: What happens if the field is too small? A: If |𝔽| is small relative to d, the probability d/|𝔽| becomes non-negligible. An attacker could try many fake proofs until one passes by chance. For security, we need |𝔽| >> d, which is why we use 254-bit fields even for relatively small degree polynomials.
-
Q: Can the prover exploit knowledge of the random point? A: No, because the random point s is chosen during the trusted setup and kept secret (it's part of the toxic waste). The prover only gets [s^i]G (commitments to powers of s), not s itself. They can evaluate polynomials at s but can't choose s.
-
Q: How does this relate to the Fiat-Shamir heuristic? A: Fiat-Shamir uses hash functions to generate "random" challenges from the transcript. Schwartz-Zippel ensures that if the prover tries to cheat, the probability of the hash output landing on a point where their fake polynomial matches is negligible.
-
Q: What's the difference between Schwartz-Zippel and Fermat's Little Theorem? A: Schwartz-Zippel is about polynomial identity testing (checking if P=Q). Fermat's Little Theorem is about primality testing (checking if n is prime). Both use randomized algorithms with negligible error probability, but they solve different problems.
Q1: What's the difference between R1CS and arithmetic circuits? A: Arithmetic circuits are directed acyclic graphs (DAGs) where nodes are operations (+, ×) and edges are wires carrying values. R1CS is a flattened matrix representation of the same computation. Every gate in an arithmetic circuit becomes a constraint in R1CS. R1CS is preferred for proof systems because matrix operations are well-studied and efficient.
Q2: Why do we need three matrices (A, B, C) instead of just one? A: The three matrices allow us to express quadratic constraints: (A·w) × (B·w) = C·w. With just one matrix, we could only express linear constraints. The quadratic form is the minimal expressive power needed to represent arbitrary computations efficiently. Higher-degree constraints would be more expensive.
Q3: How do you convert a simple program like "y = x² + 3x + 5" to R1CS? A: Break it into steps:
- w[0] = 1 (constant)
- w[1] = y (public output)
- w[2] = x (private input)
- w[3] = x × x (intermediate: x²)
- w[4] = 3 × x (intermediate: 3x)
- w[5] = w[3] + w[4] + 5 (intermediate: x² + 3x + 5)
Constraints:
- w[3] = w[2] × w[2] (compute x²)
- w[4] = 3 × w[2] (compute 3x)
- w[5] = w[3] + w[4] + 5 (sum)
- w[1] = w[5] (output)
Q4: What's the computational complexity of verifying R1CS constraints? A: For m constraints and n witnesses, naive verification is O(m×n) because each constraint requires a dot product of size n. With sparse matrices (typical in practice), it's O(nnz) where nnz is the number of non-zero entries. For ProveKit's circuits, nnz << m×n, making verification much faster.
Q5: Can R1CS express division? A: Not directly. Division y = a/b is expressed as multiplication: b × y = a. We add a constraint that b × w[y] = w[a], and the witness solver computes w[y] = w[a] × w[b]^(-1). The inverse is computed using Fermat's Little Theorem: b^(-1) = b^(p-2) mod p.
Q1: Why use finite fields instead of regular integers or floating-point numbers? A: Finite fields have several critical properties: (1) All operations are deterministic and exact (no rounding errors), (2) Every non-zero element has a multiplicative inverse, (3) The field size is fixed, preventing overflow, (4) Cryptographic security relies on the discrete logarithm problem in finite fields, (5) Efficient implementation using modular arithmetic.
Q2: What's Montgomery form and why is it faster? A: Montgomery form represents x as x̃ = xR mod p where R = 2^256. Multiplication in Montgomery form: x̃ × ỹ = (x×y)R mod p, which can be computed using bit shifts instead of expensive divisions. The REDC (Montgomery reduction) algorithm replaces division by p with division by R (a power of 2), which is just a right shift. This gives ~2-3× speedup for field multiplication.
Q3: Explain Montgomery's batch inversion trick. Why is it faster? A: To invert n elements [a₁, a₂, ..., aₙ]:
- Compute products: p₁=a₁, p₂=a₁a₂, ..., pₙ=a₁a₂...aₙ (n-1 multiplications)
- Compute ONE inverse: pₙ^(-1) (1 inversion)
- Work backwards: aₙ^(-1)=pₙ^(-1)×pₙ₋₁, aₙ₋₁^(-1)=aₙ^(-1)×aₙ, ... (n multiplications) Total: 3n multiplications + 1 inversion vs n inversions. Since inversion costs ~100× multiplication, this is ~30× faster!
Q4: What's the BN254 curve and why is it used? A: BN254 (also called BN128 or alt_bn128) is a Barreto-Naehrig pairing-friendly elliptic curve with a 254-bit prime field. It's used because: (1) Efficient pairing computation (embedding degree k=12), (2) 128-bit security level, (3) Native support in Ethereum (precompiles at addresses 0x06-0x09), (4) Good balance between security and performance, (5) Widely adopted in zkSNARK libraries.
Q5: How do you handle overflow in field arithmetic? A: In software, use lazy reduction: allow intermediate results to grow beyond p, then reduce only when necessary (before overflow). For example, when adding n field elements, reduce only after every k additions where k = (2^256 - p) / p. In hardware/SIMD, use wider registers (e.g., 512-bit for 256-bit field) to hold intermediate results without overflow.
Q1: What's the difference between a witness and a proof? A: A witness is the complete set of values (public inputs + private inputs + intermediate values) that satisfy all R1CS constraints. A proof is a cryptographic object that convinces a verifier that such a witness exists, without revealing the private parts. Witness generation is deterministic (given inputs, compute all values), while proof generation involves randomness and cryptographic operations.
Q2: Why do we need layered witness solving? Can't we just solve all witnesses at once? A: Witnesses have dependencies: w[5] = w[2] × w[3] can't be computed until w[2] and w[3] are known. Layered solving respects these dependencies by organizing witnesses into layers where each layer only depends on previous layers. This is essentially topological sorting of the dependency graph. Solving out of order would require solving a system of polynomial equations, which is much harder.
Q3: What happens if a witness builder has a circular dependency? A: Circular dependencies indicate a bug in the circuit or compiler. For example, if w[5] depends on w[7] and w[7] depends on w[5], there's no valid solution. The witness scheduler should detect this during layer construction and reject the circuit. In practice, well-formed circuits from Noir compilation never have circular dependencies.
Q4: How do you handle non-deterministic operations like random number generation in circuits? A: Circuits are deterministic - same inputs always produce same outputs. For "randomness", we use: (1) Fiat-Shamir challenges derived from the transcript (deterministic but unpredictable to prover before commitment), (2) Private inputs provided by the prover (prover chooses "random" values outside the circuit), (3) Hash functions applied to inputs (deterministic but appears random).
Q5: What's the cost of witness generation compared to proof generation? A: Witness generation is typically 10-100× faster than proof generation. For a circuit with 1M constraints: witness solving takes ~100ms (mostly field arithmetic), while proof generation takes ~10s (polynomial commitments, FFTs, cryptographic operations). However, witness generation is sequential (hard to parallelize due to dependencies), while proof generation has more parallelism opportunities.
Q1: What's the difference between Merkle tree commitments and KZG commitments? A: Merkle trees commit to vectors by hashing: root = Hash(Hash(v₀,v₁), Hash(v₂,v₃), ...). Opening requires O(log n) hashes. KZG commits to polynomials using elliptic curves: C = [P(s)]G where s is from trusted setup. Opening requires 1 pairing. Merkle is transparent (no setup) but larger proofs. KZG has smaller proofs but needs trusted setup. WHIR uses Merkle trees for transparency.
Q2: Explain multilinear extension (MLE). Why is it useful? A: MLE extends a vector v of length 2^n to a polynomial f̃(x₁,...,xₙ) such that f̃(b₁,...,bₙ) = v[index] for all binary bᵢ ∈ {0,1}. It's unique and has degree 1 in each variable. Useful because: (1) Evaluation at random point binds to entire vector, (2) Sumcheck protocol works on MLEs, (3) Degree is low (n instead of 2^n), (4) Efficient evaluation using recursive formula.
Q3: What's a covector and how does it relate to polynomial evaluation? A: A covector α is a row vector that defines a linear form: α·v = Σ αᵢvᵢ. For MLE evaluation at point r, the covector is the Lagrange basis: α = [L₀(r), L₁(r), ..., L_{2^n-1}(r)]. Then f̃(r) = α·v. This converts polynomial evaluation to a dot product, which is easier to work with in proof systems.
Q4: Why use prefix covectors? What's the memory savings? A: Many covectors have trailing zeros: α = [a₀, a₁, ..., aₖ, 0, 0, ..., 0]. Prefix covectors store only [a₀, ..., aₖ] plus the full size. For k=1000, n=1M: naive storage is 32MB, prefix is 32KB (99.9% savings). This is common when evaluating MLEs at points with many leading zeros or when working with sparse polynomials.
Q5: How do you verify a Merkle tree opening? A: To prove v[i] is in the tree:
- Prover sends: v[i], sibling hashes on path to root
- Verifier recomputes: h₀ = Hash(v[i], sibling₀), h₁ = Hash(h₀, sibling₁), ..., root = Hash(hₖ, siblingₖ)
- Verifier checks: computed root == committed root Cost: O(log n) hashes. Security: collision resistance of hash function.
Q1: What's the difference between NTT and FFT? A: FFT (Fast Fourier Transform) works over complex numbers using complex roots of unity (e^(2πi/n)). NTT works over finite fields using field elements as roots of unity (ω^n = 1 mod p). NTT is preferred for cryptography because: (1) No floating-point errors, (2) Works in the same field as the rest of the computation, (3) Deterministic and exact, (4) Efficient modular arithmetic.
Q2: Why must the polynomial size be a power of 2 for NTT? A: NTT uses divide-and-conquer: split polynomial into even/odd coefficients recursively. This requires splitting in half at each level, which only works cleanly for powers of 2. Also, we need an n-th root of unity ω where ω^n = 1 and ω^(n/2) ≠ 1. For non-power-of-2 sizes, we'd need to pad with zeros or use more complex algorithms.
Q3: What's the butterfly operation in NTT? A: The core operation in NTT's divide-and-conquer:
temp = ω^k × a[j]
a[j] = a[i] - temp
a[i] = a[i] + temp
This combines results from even/odd halves. Called "butterfly" because the data flow diagram looks like butterfly wings. Each butterfly does 1 multiplication and 2 additions. Total butterflies: n log n.
Q4: How do you find a primitive n-th root of unity in a finite field? A: For field 𝔽ₚ, we need ω where ω^n = 1 and ω^k ≠ 1 for 0 < k < n. Requirements: (1) n divides p-1 (by Fermat's Little Theorem), (2) Find generator g of 𝔽ₚ*, (3) Compute ω = g^((p-1)/n). For BN254 with p-1 = 2^s × t, we can do NTT for sizes up to 2^s. BN254 supports NTT up to size 2^28.
Q5: What's the inverse NTT and why do we need it? A: Inverse NTT converts from evaluation representation back to coefficient representation. Formula: same as forward NTT but with ω^(-1) instead of ω, and divide by n at the end. Needed for polynomial multiplication: coefficients → NTT → pointwise multiply → inverse NTT → coefficients. The division by n comes from the normalization factor in the DFT formula.
Q1: Why is sumcheck called "succinct"? What's the communication complexity? A: For a sum over 2^n points, naive verification requires checking all 2^n terms. Sumcheck reduces this to n rounds, each sending a degree-d polynomial (d+1 field elements). Total communication: O(n×d) field elements, which is logarithmic in the domain size 2^n. For n=20, d=3: only 60 field elements instead of 1 million checks!
Q2: What happens if the prover sends an incorrect polynomial in round i? A: The verifier checks g_i(0) + g_i(1) = g_{i-1}(r_{i-1}). If the prover cheats, this check will fail with high probability. Even if it passes (prover got lucky), the final check f(r₁,...,rₙ) = gₙ(rₙ) will fail unless the prover can predict the random challenges, which is negligible probability by Schwartz-Zippel lemma.
Q3: How does sumcheck relate to the WHIR proof system? A: WHIR uses sumcheck to verify R1CS constraints. The sum is: Σ (A[i]·w) × (B[i]·w) - (C[i]·w) = 0 over all constraints i. This is a multilinear polynomial in the witness values. Sumcheck proves this sum is zero without revealing the witness. WHIR adds polynomial commitments and Fiat-Shamir to make it non-interactive and zero-knowledge.
Q4: Can sumcheck be parallelized? A: The rounds are inherently sequential (round i needs challenge from round i-1). However, within each round, computing the polynomial g_i can be parallelized: evaluate f at many points in parallel, then interpolate. Also, multiple sumcheck instances can run in parallel (useful for batching multiple proofs).
Q5: What's the verifier's computational cost in sumcheck? A: Per round: (1) Check g_i(0) + g_i(1) = claimed_sum (2 evaluations + 1 addition), (2) Generate random challenge (1 hash). Final: evaluate f(r₁,...,rₙ) (cost depends on f). Total: O(n) field operations + 1 evaluation of f. For R1CS, evaluating f requires O(nnz) operations where nnz is non-zeros in constraint matrices.
Q1: What's the security assumption behind Fiat-Shamir? A: The Random Oracle Model (ROM): the hash function behaves like a truly random function. In ROM, the prover cannot predict hash outputs before committing to inputs, so they can't cheat by choosing inputs that produce favorable challenges. In practice, we use cryptographic hash functions (SHA256, Poseidon2) that approximate random oracles.
Q2: What's the difference between a prover message and a prover hint? A: Prover messages are absorbed into the transcript and affect future challenges: transcript ← Hash(transcript || message). Prover hints are sent but NOT absorbed - they don't affect challenges. Hints are used for values that are independently verified (e.g., polynomial evaluations that are checked via commitments). Misusing hints for messages is a soundness bug!
Q3: Why do we need a domain separator? A: To prevent cross-protocol attacks. Without domain separator, a proof for protocol A might be valid for protocol B if they use the same transcript structure. Domain separator = Hash(protocol_name || version || parameters) ensures transcripts from different protocols/versions are incompatible. Example: "ProveKit-WHIR-v1" vs "ProveKit-WHIR-v2".
Q4: Can Fiat-Shamir be applied to any interactive proof? A: Only to public-coin protocols where the verifier's messages are random (not based on secret state). Private-coin protocols (where verifier has secrets) cannot be made non-interactive with Fiat-Shamir. Fortunately, most efficient proof systems (sumcheck, polynomial commitments) are public-coin.
Q5: What happens if the prover and verifier use different hash functions? A: They'll generate different challenges, and verification will fail. This is why the hash function must be part of the protocol specification and domain separator. ProveKit supports multiple hash functions (Skyscraper, SHA256, Keccak, Blake3) but both parties must agree on which one to use.
Q1: Why is Poseidon2 more efficient than SHA256 in ZK circuits? A: Poseidon2 is an algebraic hash designed for ZK: operations are field additions and multiplications, which map directly to R1CS constraints. SHA256 uses bitwise operations (XOR, AND, rotations) that are expensive in field arithmetic. Result: Poseidon2 needs ~100 constraints per hash, SHA256 needs ~25,000 constraints. That's 250× more efficient!
Q2: What's the security level of different hash functions? A: SHA256: 128-bit security (2^128 operations to find collision). Keccak-256: 128-bit security. Blake3: 128-bit security. Poseidon2 (BN254): ~128-bit security (depends on field size and number of rounds). Skyscraper: Custom design, security analysis ongoing. For cryptographic applications, 128-bit security is considered safe against all known attacks.
Q3: What's a sponge construction? A: A design pattern for hash functions: (1) Absorb phase: XOR input blocks into state, apply permutation, (2) Squeeze phase: output parts of state, apply permutation, repeat. State has capacity c (never output) and rate r (absorbed/output). Security: c/2 bits. Example: Keccak has 1600-bit state, 256-bit output, 1088-bit rate, 512-bit capacity → 256-bit security.
Q4: Why does ProveKit support multiple hash functions? A: Different use cases: (1) Skyscraper: fastest on mobile (SIMD-optimized), (2) SHA256: standard, widely trusted, (3) Keccak: Ethereum compatibility, (4) Blake3: modern, parallelizable, (5) Poseidon2: efficient in circuits. Users choose based on: performance needs, compatibility requirements, trust assumptions.
Q5: What's the Merkle tree depth for 1 million leaves? A: Depth = ⌈log₂(1,000,000)⌉ = 20. Each proof requires 20 sibling hashes. Verification: 20 hash computations. Storage: ~2 million nodes (2×1M - 1). For 32-byte hashes: ~64 MB for full tree. Sparse Merkle trees can reduce storage for sparse data.
Q1: What percentage of R1CS matrices are typically sparse? A: Very sparse! Typical circuits have 99-99.9% zeros. Example: 1M×1M matrix with 1M non-zeros = 99.9999% sparse. This is because each constraint only involves a few variables (usually 2-10), not all n variables. Sparse representation reduces 1TB (dense) to 1MB (sparse) - 1 million× savings!
Q2: What's CSR format and why is it better than COO (coordinate) format? A: COO stores (row, col, value) triples - simple but redundant. CSR (Compressed Sparse Row) stores: row_starts (where each row begins), col_indices, values. For m rows, n cols, nnz non-zeros: COO uses 3×nnz integers, CSR uses m+2×nnz integers. For m << nnz, CSR is more compact. Also, CSR enables faster row-wise operations (common in matrix-vector multiply).
Q3: Explain delta encoding. When does it help? A: Within each row, store column indices as differences: [5, 17, 100, 105] → [5, 12, 83, 5]. Smaller numbers compress better with variable-length encoding. Helps when: (1) Non-zeros are clustered (common in structured matrices), (2) Using compression (postcard, gzip), (3) Cache-friendly (smaller data fits in cache). Typical savings: 30-50% on column indices.
Q4: How do you transpose a sparse matrix efficiently? A: Can't just swap row/col in CSR (would need random access). Algorithm: (1) Collect all (row, col, value) entries, (2) Sort by (col, row) - this is the transposed order, (3) Rebuild CSR structure with swapped dimensions. Use parallel sort for large matrices. Complexity: O(nnz log nnz) for sort, O(nnz) for rebuild.
Q5: What's the cache performance of sparse matrix-vector multiply? A: Depends on sparsity pattern. Best case (sequential columns): cache-friendly, ~90% peak performance. Worst case (random columns): cache misses on every access, ~10% peak performance. Optimization: (1) Reorder columns for locality, (2) Block sparse format for better cache reuse, (3) Prefetching for predictable patterns. ProveKit uses parallel iteration to hide latency.
Q1: What's the peak memory usage during proving without optimization? A: For a large circuit: R1CS matrices (200 MB) + witness vectors (50 MB) + Merkle tree (100 MB) + intermediate values (50 MB) + proof data (50 MB) = 450 MB. With optimization (compression, streaming): peak is ~200 MB (the R1CS size during sumcheck). That's 2.25× reduction!
Q2: Why compress R1CS during proving? Doesn't decompression take time? A: Yes, but it's worth it! Compression: ~100ms, saves 180 MB. Decompression: ~100ms. Total overhead: 200ms. Benefit: enables proving on devices with limited RAM (phones with 2-4 GB). Without compression, the prover would crash due to out-of-memory. Time vs space tradeoff favors space for mobile.
Q3: What's postcard serialization and why use it? A: Postcard is a Rust serialization format: compact, fast, no schema. Compared to JSON: 10× smaller, 100× faster. Compared to bincode: similar size, slightly faster. Used for: (1) Compressing R1CS/witness data, (2) Proof serialization, (3) FFI data transfer. Alternative: protobuf (more portable but slower).
Q4: How do prefix covectors reduce memory in polynomial commitments? A: Full covector: n field elements × 32 bytes. Prefix covector: k field elements × 32 bytes (where k << n). For n=1M, k=1K: 32 MB → 32 KB (1000× savings). This is common because: (1) MLE evaluation at points with leading zeros, (2) Sparse polynomials, (3) Structured matrices. Enables committing to large polynomials on mobile.
Q5: What's the memory bandwidth bottleneck in proving? A: Modern CPUs: ~50 GB/s memory bandwidth. Field operations: ~10 cycles per operation. For 1M constraints: ~100M field operations = ~1 second of compute, but ~3 GB of data movement = ~60ms of bandwidth. Bandwidth is NOT the bottleneck for typical circuits. Bottleneck is: (1) Polynomial commitments (compute-bound), (2) FFTs (cache-bound), (3) Hash functions (compute-bound).
Q1: Why is recursive verification needed? Can't we just verify WHIR proofs on-chain? A: WHIR verification requires: (1) Merkle tree verification (~20 hashes), (2) Sumcheck verification (~20 rounds), (3) Field arithmetic in BN254. On Ethereum: ~10M gas, ~$100 at 50 gwei. Groth16 verification: 3 pairings, ~300K gas, ~$3. That's 30× cheaper! For applications with many proofs, this saves millions in gas costs.
Q2: What's the constraint count for verifying a WHIR proof in a circuit? A: Rough breakdown: (1) Merkle tree verification: 20 hashes × 25K constraints = 500K, (2) Sumcheck: 20 rounds × 10K constraints = 200K, (3) Field arithmetic: 100K, (4) Polynomial evaluation: 200K. Total: ~1M constraints. This becomes the Groth16 circuit size. Proving time: ~10 seconds. Verification: ~300K gas.
Q3: Why use Gnark (Go) instead of implementing Groth16 in Rust? A: Gnark is mature, well-tested, and optimized for Groth16. It has: (1) Efficient pairing implementation, (2) Ethereum-compatible output, (3) Trusted setup tools, (4) Active development. Reimplementing in Rust would take months and risk bugs. ProveKit uses FFI to call Gnark from Rust - best of both worlds.
Q4: What's the trusted setup size for recursive verification? A: For 1M constraint circuit: Proving key (PK) ~1 GB, Verification key (VK) ~1 KB. PK must be downloaded once by provers. VK is embedded in smart contracts. Setup ceremony: ~100 participants, ~1 week. This is circuit-specific - if the WHIR verifier changes, need new setup. PLONK would use universal setup (reusable).
Q5: Can you recursively verify Groth16 proofs? A: Yes! Verify Groth16 proof inside another Groth16 circuit. This creates proof composition: prove "I verified a proof that verified a proof that...". Used for: (1) Proof aggregation (combine many proofs into one), (2) Incrementally verifiable computation (IVC), (3) Blockchain scalability (rollups). Challenge: pairing verification in circuits is expensive (~10M constraints).
Q1: What's ACIR and how does it differ from R1CS? A: ACIR (Abstract Circuit Intermediate Representation) is Noir's IR: higher-level than R1CS, with opcodes like Mul, AssertZero, BlackBoxFuncCall. R1CS is lower-level: just matrices and constraints. ACIR → R1CS compilation: (1) Each ACIR opcode becomes one or more R1CS constraints, (2) Witness builders are generated for each operation, (3) Optimizations are applied (batching, deduplication).
Q2: What are black box functions and why are they special? A: Black box functions are optimized implementations of common operations: Poseidon2, SHA256, ECDSA, Schnorr, Pedersen. Instead of compiling to R1CS naively (expensive), they use specialized constraint systems. Example: SHA256 black box uses ~20K constraints vs ~100K for naive compilation. Implemented in the backend (ProveKit), not in Noir.
Q3: How does binop batching work? A: Without batching: each AND/XOR/OR operation is a separate constraint. With batching: (1) Collect all binops of same type, (2) Create lookup table for that operation, (3) Each binop becomes a table lookup (cheaper). Example: 100 AND operations: 100 constraints → 1 table + 100 lookups (~50 constraints). Savings: 50%.
Q4: What's the difference between public and private inputs in Noir?
A: Public inputs (marked pub): visible to verifier, part of the statement being proved. Private inputs: hidden from verifier, part of the witness. In R1CS: public inputs are at positions w[1..k], private inputs at w[k+1..n]. Verification checks that public inputs match the claimed values. Privacy: only private inputs are zero-knowledge.
Q5: How do you debug a Noir circuit that fails to compile?
A: Common issues: (1) Type errors: check field element vs integer types, (2) Array bounds: ensure indices are in range, (3) Constraint violations: check assertions are satisfiable, (4) Recursion: Noir doesn't support recursion (use loops instead), (5) Unsupported operations: some Rust features aren't available. Use nargo check for type checking, nargo test for unit tests, nargo prove to see constraint violations.
Q1: How does DKIM prevent email spoofing? A: DKIM adds a cryptographic signature to email headers. Sender's mail server signs: signature = RSA_sign(SHA256(headers + body_hash), private_key). Receiver verifies using public key from DNS. If signature is valid, email is authentic. Attacker can't forge signature without private key (which only sender has). This proves the email came from the claimed domain.
Q2: What are the trust assumptions in DKIM-based ZK proofs? A: Three trust points: (1) DNS: public key fetched from DNS must be correct (trust DNS infrastructure), (2) Sending mailserver: must not sign fake emails (trust domain owner), (3) Receiving mailserver: could modify email before user sees it (trust email provider). ZK Email proves "this email was signed by domain X" but doesn't prove "user actually sent this email".
Q3: Why use RSA instead of ECDSA for DKIM? A: Historical reasons: DKIM was designed in 2007 when RSA was standard. RSA is: (1) Widely supported, (2) Simple to implement, (3) Compatible with existing infrastructure. Downside: RSA signatures are large (2048 bits) and expensive to verify in ZK circuits (~10K constraints). ECDSA would be more efficient (~1.5K constraints) but requires updating DKIM standard.
Q4: How do you handle email headers with different encodings? A: DKIM specifies canonicalization: (1) Simple: use headers as-is, (2) Relaxed: normalize whitespace, lowercase header names, trim values. Circuit must implement same canonicalization. Challenge: variable-length headers, different character encodings (UTF-8, quoted-printable). Solution: parse headers in circuit, apply canonicalization rules, then hash.
Q5: What happens if the DKIM public key is rotated? A: Email providers rotate keys periodically (every 6-12 months). Old emails signed with old keys become unverifiable if old key is removed from DNS. Solutions: (1) Archive old public keys, (2) Use key selectors (multiple keys active simultaneously), (3) Timestamp proofs (prove email was valid at time of signing). ZK Email circuits must handle multiple keys.
Q1: What's the constraint count for a typical email regex? A: Depends on regex complexity. Simple regex (match "@gmail.com"): ~100 constraints. Complex regex (extract username, domain, validate format): ~1K-10K constraints. Each character position needs: (1) State transition logic (~5-10 constraints), (2) Character range checks (~5 constraints), (3) Capture group logic (~5 constraints). For 1000-character email: ~10K-100K constraints.
Q2: How do you handle variable-length strings in regex circuits? A: Use maximum length and padding. Circuit processes all positions up to max_length. For each position i: is_valid[i] = (i < actual_length). Constraints are conditional: if is_valid[i] then check_character[i]. Unused positions are filled with zeros. Tradeoff: larger max_length = more constraints but handles longer strings.
Q3: What's the difference between DFA and NFA for regex? A: NFA (Non-deterministic Finite Automaton): can be in multiple states simultaneously, epsilon transitions allowed. DFA (Deterministic): exactly one state at a time, no epsilon transitions. For circuits, DFA is preferred: (1) Simpler logic (one state variable), (2) Fewer constraints, (3) Easier to optimize. NFA → DFA conversion is done offline (exponential blowup in worst case, but usually manageable).
Q4: How do you extract captured groups from regex matches? A: Use reveal masks. For each position i: reveal[i] = (state in capturing_states) ? 1 : 0. Revealed character: revealed[i] = reveal[i] × input[i]. Public output: revealed array (zeros except captured parts). Verifier sees only captured text, not full input. Multiple capture groups: use separate reveal masks for each group.
Q5: Can you use lookahead/lookbehind in ZK regex? A: Lookahead/lookbehind require backtracking or multiple passes. In circuits: (1) Positive lookahead: check condition without consuming input (add state for lookahead), (2) Negative lookahead: check condition fails (add negation logic), (3) Lookbehind: store previous characters (add memory). Cost: ~2-5× more constraints. Most ZK regex tools support limited lookahead/lookbehind.
Q1: Why is RSA verification expensive in ZK circuits? A: RSA uses 2048-bit numbers, but BN254 field is 254 bits. Must use big integer arithmetic: (1) Split into 16 chunks of 128 bits each, (2) Each multiplication: 16×16 = 256 chunk multiplications, (3) Modular reduction: ~256 operations, (4) Exponentiation: ~17 multiplications (for e=65537). Total: ~5K-10K constraints. Compare to ECDSA: ~1.5K constraints (native field operations).
Q2: What's PKCS#1 padding and why must it be verified? A: PKCS#1 v1.5 padding format: 0x0001 || 0xFF...FF || 0x00 || DigestInfo || hash. Prevents attacks: (1) Without padding, attacker could forge signatures for small messages, (2) DigestInfo specifies hash algorithm (prevents algorithm substitution), (3) 0xFF padding ensures signature is large (prevents low-exponent attacks). Circuit must check all padding bytes are correct.
Q3: How do you optimize RSA exponentiation for e=65537? A: 65537 = 2^16 + 1 (only 2 bits set). Use binary exponentiation: (1) Compute s^2, s^4, s^8, ..., s^(2^16) (16 squarings), (2) Multiply s^(2^16) × s (1 multiplication). Total: 17 operations instead of 65537. This is why e=65537 is standard - it's the largest Fermat prime (2^(2^k) + 1), giving good security with minimal operations.
Q4: What's Barrett reduction and why use it? A: Barrett reduction computes x mod n using precomputed μ = ⌊2^(2k)/n⌋. Algorithm: q ≈ ⌊x×μ / 2^(2k)⌋, r = x - q×n, adjust r if needed. Replaces expensive division with multiplication (which is faster). In circuits: division is very expensive (requires inverse), multiplication is cheap. Barrett reduction: ~256 constraints vs ~1000 for naive division.
Q5: Can you use smaller RSA keys (1024-bit) to reduce constraints? A: Technically yes, but 1024-bit RSA is considered weak (broken by nation-states). NIST recommends 2048-bit minimum. For ZK Email: must match DKIM standard (2048-bit). Reducing to 1024-bit would: (1) Halve constraint count (~2.5K-5K), (2) Compromise security (not acceptable). Better optimization: use ECDSA for new protocols (but can't change DKIM).
Q1: Why is SHA256 expensive in ZK circuits? A: SHA256 uses bitwise operations (XOR, AND, NOT, rotate) that don't map well to field arithmetic. Each bitwise operation requires: (1) Decompose field elements to bits (~8 constraints per byte), (2) Perform operation on bits (~1-3 constraints per bit), (3) Recompose to field elements (~8 constraints per byte). Result: ~25K constraints per SHA256 block (512 bits). Compare to Poseidon2: ~100 constraints.
Q2: Explain partial hashing. How much does it save? A: SHA256 processes 512-bit blocks sequentially: H₀ → H₁ → H₂ → ... → Hₙ. Partial hashing: compute H₀...Hₖ outside circuit (fast), only compute Hₖ₊₁...Hₙ inside circuit (slow). For 100KB email (1563 blocks), process last 63 blocks in circuit: 1563 blocks → 63 blocks = 96% reduction in constraints!
Q3: How do you handle variable-length messages in SHA256? A: SHA256 padding: message || 0x80 || 0x00...00 || length (64 bits). In circuit: (1) Use selector pattern: byte[i] = is_message[i]×message[i] + is_padding[i]×0x80 + is_zero[i]×0x00 + is_length[i]×length_byte[i], (2) Each selector is a constraint, (3) Process all blocks up to max_length. Cost: ~100 constraints per byte for selectors + ~25K per block for hashing.
Q4: What's the difference between SHA256 and SHA512 in circuits? A: SHA256: 32-bit words, 64 rounds, 256-bit output. SHA512: 64-bit words, 80 rounds, 512-bit output. In circuits: SHA512 is ~2× more expensive (larger words, more rounds). But SHA512 has advantage: can process 1024-bit blocks (vs 512-bit for SHA256), so fewer blocks for same message. Tradeoff: 2× per block but 0.5× blocks = similar total cost.
Q5: Can you use SHA256 hardware acceleration in ZK circuits? A: No! Hardware acceleration (Intel SHA extensions, ARM crypto) speeds up native SHA256 computation, but doesn't help with circuit constraints. Circuit must prove the computation was done correctly, which requires expressing every operation as constraints. Hardware acceleration helps: (1) Witness generation (compute hash outside circuit), (2) Verifier (check hash outside circuit), but not proof generation.
Q1: What's the relationship between R1CS and QAP? A: R1CS is the constraint representation, QAP is the polynomial representation. Conversion: (1) Each variable j gets 3 polynomials Aⱼ(x), Bⱼ(x), Cⱼ(x), (2) These polynomials interpolate the constraint matrices at points x=1,2,...,m, (3) Checking all m constraints reduces to checking one polynomial equation. QAP enables succinct proofs via polynomial commitments.
Q2: Why is polynomial division important in QAP? A: The QAP check is: A(x)×B(x) - C(x) = H(x)×Z(x). If all constraints are satisfied, the LHS is zero at x=1,2,...,m, so it's divisible by Z(x)=(x-1)(x-2)...(x-m). Computing H(x) = (A(x)×B(x) - C(x)) / Z(x) proves all constraints are satisfied. If any constraint fails, division leaves a remainder, and verification fails.
Q3: What's the degree of H(x) in QAP? A: A(x), B(x), C(x) have degree m-1 (interpolating m points). A(x)×B(x) has degree 2(m-1) = 2m-2. Z(x) has degree m. Therefore H(x) = (A(x)×B(x) - C(x)) / Z(x) has degree (2m-2) - m = m-2. This is important for trusted setup: need powers of τ up to degree 2m-2.
Q4: How does QAP enable zero-knowledge? A: Add random blinding factors: A'(x) = A(x) + r₁×Z(x), B'(x) = B(x) + r₂×Z(x), C'(x) = C(x) + r₃×Z(x) where r₁, r₂, r₃ are random. The equation still holds: A'(x)×B'(x) - C'(x) = H'(x)×Z(x) for some H'(x). But A'(x), B'(x), C'(x) are randomized, hiding the witness. This is the "ZK" in zkSNARK.
Q5: Can QAP be used with other proof systems besides Groth16? A: Yes! QAP is used in: (1) Groth16 (most efficient, circuit-specific setup), (2) Pinocchio (earlier SNARK), (3) GGPR (original QAP-based SNARK). But modern systems use different arithmetizations: (4) PLONK uses permutation arguments, (5) STARKs use AIR (Algebraic Intermediate Representation), (6) Bulletproofs use inner product arguments. QAP is specific to pairing-based SNARKs.
Q1: What makes a pairing "bilinear"? A: Bilinearity: e(aP, bQ) = e(P, Q)^(ab). This means: (1) Linear in first argument: e(P₁+P₂, Q) = e(P₁,Q)×e(P₂,Q), (2) Linear in second argument: e(P, Q₁+Q₂) = e(P,Q₁)×e(P,Q₂), (3) Scalar multiplication: e(aP, Q) = e(P, aQ) = e(P,Q)^a. This property enables checking polynomial equations on committed values.
Q2: Why can't we use pairings on any elliptic curve? A: Need pairing-friendly curves with: (1) Embedding degree k (smallest k where r divides p^k - 1), (2) Efficient pairing computation (k not too large), (3) Security (k large enough to prevent attacks). BN254 has k=12 (good balance). Secp256k1 (Bitcoin) has k > 10^70 (pairings infeasible). Only special curves work.
Q3: What's the computational cost of a pairing? A: On BN254: ~10-20ms per pairing (depends on implementation). Breakdown: (1) Miller loop: ~70% of time, (2) Final exponentiation: ~30%. Multi-pairing optimization: e(P₁,Q₁)×e(P₂,Q₂)×... can share final exponentiation, saving ~30% per additional pairing. Groth16 uses 3 pairings but computes as 1 multi-pairing.
Q4: What's the difference between Type-1, Type-2, and Type-3 pairings? A: Type-1: G₁ = G₂ (symmetric), e: G × G → Gₜ. Type-2: G₁ ≠ G₂ but efficient homomorphism G₂ → G₁. Type-3: G₁ ≠ G₂, no efficient homomorphism (asymmetric). BN254 is Type-3: most efficient but requires careful protocol design (can't move points between G₁ and G₂). Groth16 uses Type-3.
Q5: How do pairings enable succinct verification? A: Pairings check polynomial equations on committed values. Example: verify A(s)×B(s) = C(s) using e([A(s)]₁, [B(s)]₂) = e([C(s)]₁, G₂). Verifier doesn't need to know A(s), B(s), C(s) - just their commitments. This is O(1) verification regardless of polynomial degree. Without pairings, verifier would need to evaluate polynomials (O(degree) operations).
Q1: What exactly is the "toxic waste" in a trusted setup? A: The secret random value τ (tau) used to generate the SRS (Structured Reference String). The SRS contains [τ⁰]G, [τ¹]G, [τ²]G, ..., [τᵈ]G. If τ is known, an attacker can: (1) Compute [P(τ)]G for any polynomial P, (2) Create fake proofs that verify, (3) Break the binding property of commitments. τ must be destroyed after setup - hence "toxic waste".
Q2: How does multi-party computation (MPC) make trusted setup secure? A: Each participant i contributes τᵢ. Final τ = τ₁×τ₂×...×τₙ. Security: as long as ONE participant destroys their τᵢ, the final τ is unknown to everyone. With 100 participants, attacker must compromise ALL 100 (probability ≈ 0). This is why large ceremonies (Zcash: 90+, Ethereum: 140K+) are considered secure.
Q3: What's the difference between Groth16 and PLONK trusted setups? A: Groth16: circuit-specific setup. Each circuit needs its own ceremony. SRS depends on circuit structure. PLONK: universal setup. One ceremony for all circuits up to size d. SRS is just powers of τ, independent of circuit. Tradeoff: Groth16 has smaller proofs (~200 bytes) but needs new setup per circuit. PLONK has larger proofs (~400 bytes) but reuses setup.
Q4: Can you update a trusted setup without starting over? A: Yes, for universal setups (PLONK, KZG). To extend from degree d to degree 2d: (1) New participant generates τ', (2) Computes [τ'^i × τ^i]G for i=d+1 to 2d, (3) Appends to existing SRS. This is called "perpetual powers of tau". For circuit-specific setups (Groth16), must redo entire ceremony if circuit changes.
Q5: What happens if a trusted setup is compromised years later? A: All proofs created with that setup become potentially forgeable. Attacker can: (1) Create fake proofs for any statement, (2) Break privacy (extract witnesses from proofs), (3) Undermine trust in the system. This is a long-term security risk. Mitigations: (1) Use MPC with many participants, (2) Transparent proof systems (STARKs, Bulletproofs, WHIR), (3) Periodic re-setup with new ceremonies.
Q1: Why is Schwartz-Zippel fundamental to zkSNARKs? A: It enables succinct verification. Instead of checking polynomial P(x) = Q(x) for all x (infinite checks), check at one random point r. If P ≠ Q, probability of P(r) = Q(r) is ≤ d/|𝔽| ≈ 0 for large fields. This reduces verification from O(degree) to O(1). Without Schwartz-Zippel, we couldn't have "succinct" in SNARK.
Q2: What's the security parameter calculation for Schwartz-Zippel? A: For λ-bit security, need: Pr[false positive] ≤ 2^(-λ). By Schwartz-Zippel: d/|𝔽| ≤ 2^(-λ), so |𝔽| ≥ d×2^λ. For 128-bit security and d=2^20 (1M degree): |𝔽| ≥ 2^148. BN254 has |𝔽| ≈ 2^254, giving 128-bit security even for degree 2^126 polynomials!
Q3: Does Schwartz-Zippel work for multivariate polynomials? A: Yes! For P(x₁,...,xₙ) with total degree d, picking random (r₁,...,rₙ): Pr[P(r₁,...,rₙ) = 0] ≤ d/|S|. Total degree is the maximum sum of exponents in any term. Example: P(x,y) = x²y + xy² has total degree 3. This is crucial for sumcheck protocol which uses multivariate polynomials.
Q4: What if the random point is not truly random? A: Security breaks! If prover can predict or influence the random point, they can craft a fake polynomial that matches at that specific point. This is why: (1) Trusted setup uses secret τ (prover never knows it), (2) Fiat-Shamir uses cryptographic hashes (unpredictable), (3) Interactive protocols use verifier's randomness (prover can't control it).
Q5: How does Schwartz-Zippel relate to the Fiat-Shamir heuristic? A: Fiat-Shamir generates "random" challenges via hashing: r = Hash(transcript). Schwartz-Zippel ensures that if prover cheats (uses wrong polynomial), the probability that Hash output lands on a point where fake polynomial matches is ≤ d/|𝔽|. Combined: Fiat-Shamir provides unpredictability, Schwartz-Zippel provides soundness guarantee.
| Feature | Groth16 | PLONK | STARK | WHIR |
|---|---|---|---|---|
| Proof Size | ~200 bytes | ~400 bytes | ~100-200 KB | ~100 KB |
| Verification Time | ~2ms (3 pairings) | ~5ms (1 pairing) | ~10-50ms | ~50-100ms |
| Prover Time | Fast (~10s for 1M constraints) | Medium (~20s) | Slow (~60s) | Medium (~30s) |
| Trusted Setup | Yes (circuit-specific) | Yes (universal) | No (transparent) | No (transparent) |
| Post-Quantum | No (relies on pairings) | No (relies on pairings) | Yes (hash-based) | Yes (hash-based) |
| Recursion | Expensive (~10M constraints) | Efficient (~100K constraints) | Native | Efficient |
| Memory Usage | Low (~1 GB) | Medium (~2 GB) | High (~10 GB) | Low (~500 MB) |
| Mobile-Friendly | Yes | Moderate | No | Yes (ProveKit!) |
| Ethereum Gas | ~300K gas | ~400K gas | ~5M gas | ~5M gas (via Groth16 wrapper) |
When to use each:
- Groth16: Smallest proofs, fastest verification, on-chain verification, don't mind circuit-specific setup
- PLONK: Universal setup (reusable), custom gates, moderate proof size
- STARK: No trusted setup, post-quantum security, don't mind large proofs
- WHIR: Mobile devices, no trusted setup, memory-constrained environments
| Hash Function | Constraints per Hash | Security Level | Use Case |
|---|---|---|---|
| Poseidon2 | ~100 | 128-bit | In-circuit hashing (Merkle trees, commitments) |
| Rescue | ~150 | 128-bit | Alternative algebraic hash |
| SHA256 | ~25,000 | 128-bit | DKIM verification, Bitcoin compatibility |
| Keccak | ~30,000 | 128-bit | Ethereum compatibility |
| Blake2 | ~20,000 | 128-bit | General purpose |
| Blake3 | ~20,000 | 128-bit | Modern, parallelizable |
| Pedersen | ~500 | 128-bit | Commitments (not collision-resistant) |
| MiMC | ~200 | 128-bit | Older algebraic hash |
Key Insights:
- Algebraic hashes (Poseidon2, Rescue, MiMC) are 100-250× more efficient than traditional hashes
- Use algebraic hashes for in-circuit operations (Merkle trees, commitments)
- Use traditional hashes only when required for compatibility (DKIM, Ethereum)
- Poseidon2 is the current state-of-the-art for ZK circuits
| Signature Scheme | Constraints | Key Size | Signature Size | Use Case |
|---|---|---|---|---|
| ECDSA (secp256k1) | ~1,500 | 32 bytes | 64 bytes | Bitcoin, Ethereum |
| ECDSA (BN254) | ~500 | 32 bytes | 64 bytes | Native curve (most efficient) |
| EdDSA (Ed25519) | ~2,000 | 32 bytes | 64 bytes | Modern, deterministic |
| Schnorr | ~1,000 | 32 bytes | 64 bytes | Simple, provably secure |
| RSA-2048 | ~10,000 | 256 bytes | 256 bytes | DKIM (legacy) |
| RSA-4096 | ~40,000 | 512 bytes | 512 bytes | High security (overkill) |
| BLS | ~500 | 48 bytes | 96 bytes | Aggregatable signatures |
Key Insights:
- ECDSA on native curve (BN254) is most efficient: ~500 constraints
- ECDSA on foreign curves (secp256k1) requires non-native field arithmetic: ~1,500 constraints
- RSA is 10-20× more expensive than ECDSA but required for DKIM
- EdDSA is deterministic (no random nonce) but slightly more expensive than ECDSA
- BLS signatures can be aggregated (many signatures → one signature)
| Operation | Cycles (x86) | Cycles (ARM) | Notes |
|---|---|---|---|
| Addition | ~5 | ~8 | Cheapest operation |
| Subtraction | ~5 | ~8 | Same as addition |
| Multiplication | ~50 | ~80 | 10× more expensive than addition |
| Squaring | ~40 | ~65 | Slightly cheaper than multiplication |
| Inversion | ~5,000 | ~8,000 | 100× more expensive than multiplication |
| Batch Inversion (n=100) | ~200 per element | ~320 per element | 25× faster than individual inversions |
| Exponentiation | ~10,000 | ~16,000 | Depends on exponent size |
Optimization Strategies:
- Avoid inversions: Use batch inversion (Montgomery's trick) when possible
- Prefer squaring: x² is cheaper than x×y
- Lazy reduction: Don't reduce modulo p after every operation
- SIMD: Use vector instructions (AVX2, NEON) for parallel operations
- Montgomery form: Faster multiplication via bit shifts
Circuit: 1M constraints, 500K witnesses, 10K public inputs
| Component | Without Optimization | With Optimization | Savings |
|---|---|---|---|
| R1CS Matrices | 200 MB (dense) | 20 MB (sparse + compressed) | 90% |
| Witness Vector | 16 MB | 16 MB | 0% |
| Witness Builders | 50 MB | 5 MB (compressed) | 90% |
| Merkle Tree | 100 MB | 32 bytes (root only) | 99.99% |
| Polynomial Commitments | 80 MB | 8 MB (prefix covectors) | 90% |
| Intermediate Values | 50 MB | 10 MB (streaming) | 80% |
| Proof Data | 100 KB | 100 KB | 0% |
| Total Peak | 496 MB | 59 MB | 88% |
ProveKit Optimizations:
- Compress R1CS during witness solving (saves 180 MB)
- Free Merkle tree after commitment (saves 100 MB)
- Use prefix covectors for sparse polynomials (saves 72 MB)
- Stream intermediate values (saves 40 MB)
- Compress witness builders (saves 45 MB)
Result: Can prove 1M constraint circuits on phones with 2-4 GB RAM!
1. Unconstrained Witness Values
// BAD: x is not constrained
fn main(x: Field) {
let y = x * x;
// y is computed but never used
}
// GOOD: Constrain the output
fn main(x: Field) -> pub Field {
let y = x * x;
y // This creates a constraint
}2. Integer Overflow
// BAD: Can overflow field modulus
fn main(a: Field, b: Field) {
let c = a + b; // If a + b > p, wraps around
}
// GOOD: Add range checks
fn main(a: Field, b: Field) {
assert(a < 2.pow_32(16)); // a < 2^16
assert(b < 2.pow_32(16)); // b < 2^16
let c = a + b; // Now guaranteed < 2^17
}3. Division by Zero
// BAD: No check for zero divisor
fn main(a: Field, b: Field) {
let c = a / b; // Fails if b = 0
}
// GOOD: Check divisor is non-zero
fn main(a: Field, b: Field) {
assert(b != 0);
let c = a / b;
}4. Array Out of Bounds
// BAD: Index not validated
fn main(arr: [Field; 10], idx: Field) {
let val = arr[idx]; // Fails if idx >= 10
}
// GOOD: Validate index
fn main(arr: [Field; 10], idx: Field) {
assert(idx < 10);
let val = arr[idx];
}5. Incorrect Public Input Ordering
// BAD: Public inputs in wrong order
fn main(x: Field, y: pub Field, z: pub Field) {
// Verifier expects [y, z] but might provide [z, y]
}
// GOOD: Document expected order
// Public inputs: [y, z] in that order
fn main(x: Field, y: pub Field, z: pub Field) {
// Clear documentation prevents confusion
}1. Constraint Reduction
- Use lookup tables for repeated operations (AND, XOR, range checks)
- Batch similar operations (binop batching, range check batching)
- Use black box functions for expensive operations (SHA256, ECDSA)
- Minimize public inputs (each public input adds constraints)
2. Witness Generation Optimization
- Use batch inversion (Montgomery's trick) for multiple inversions
- Parallelize independent witness computations
- Cache intermediate values to avoid recomputation
- Use lazy evaluation for conditional branches
3. Memory Optimization
- Compress R1CS matrices during proving
- Use sparse matrix representation (CSR format)
- Stream large data structures instead of loading all at once
- Free memory immediately after use (Merkle trees, intermediate values)
4. Polynomial Commitment Optimization
- Use prefix covectors for sparse polynomials
- Batch multiple commitments together
- Use offset covectors for structured sparsity
- Precompute Lagrange basis polynomials
5. FFT/NTT Optimization
- Use radix-4 or radix-8 FFT for better cache locality
- Parallelize FFT butterflies across cores
- Use SIMD instructions (AVX2, NEON) for vector operations
- Precompute twiddle factors (roots of unity)
6. Hash Function Optimization
- Use algebraic hashes (Poseidon2) instead of SHA256 when possible
- Batch multiple hashes together
- Use SIMD-accelerated implementations (Skyscraper)
- Precompute round constants and MDS matrices
1. Soundness Bugs
- Unconstrained witnesses: Values computed but not constrained
- Missing range checks: Field elements treated as integers without bounds
- Incorrect constraint logic: Constraints don't enforce intended property
- Hint vs message confusion: Using hints instead of prover messages in Fiat-Shamir
2. Completeness Bugs
- Unsatisfiable constraints: Valid inputs rejected due to overly strict constraints
- Numerical instability: Constraints fail due to rounding or overflow
- Missing edge cases: Constraints don't handle boundary conditions
3. Zero-Knowledge Leaks
- Public input leakage: Private data accidentally marked as public
- Constraint leakage: Constraint structure reveals information about witness
- Timing attacks: Proving time depends on private inputs
- Memory access patterns: Cache timing reveals witness values
4. Trusted Setup Vulnerabilities
- Toxic waste exposure: Secret τ not properly destroyed
- Insufficient participants: MPC ceremony with too few participants
- Compromised participants: Attacker controls multiple ceremony participants
- Setup reuse: Using same setup for different circuits (Groth16)
5. Implementation Bugs
- Field arithmetic errors: Incorrect modular reduction or overflow
- Transcript inconsistency: Prover and verifier use different transcript construction
- Serialization bugs: Proof deserialization doesn't validate all fields
- Memory safety: Buffer overflows, use-after-free in FFI code
Best Practices:
- Audit circuits with formal verification tools
- Use well-tested libraries (arkworks, gnark, circom)
- Add comprehensive test cases including edge cases
- Use fuzzing to find unexpected inputs
- Have security experts review critical circuits
- Use transparent proof systems when possible (avoid trusted setup)
Exercise 1: R1CS Construction
Given the computation y = (x + 3) × (x - 2), construct the R1CS representation:
- Define witness vector w
- Write constraint matrices A, B, C
- Verify constraints are satisfied for x = 5
Exercise 2: Witness Solving Given R1CS constraints and public inputs, implement a witness solver:
- Handle dependencies between witnesses
- Implement batch inversion for efficiency
- Test with various input values
Exercise 3: Polynomial Commitment Implement a simple Merkle tree commitment scheme:
- Build tree from vector of field elements
- Generate opening proof for specific index
- Verify opening proof
Exercise 4: Sumcheck Protocol Implement a simple sumcheck protocol for univariate polynomial:
- Prover sends polynomial coefficients each round
- Verifier checks consistency
- Final evaluation check
Exercise 5: Fiat-Shamir Transformation Convert an interactive protocol to non-interactive:
- Implement transcript management
- Generate challenges via hashing
- Ensure deterministic challenge generation
Exercise 6: Circuit Optimization Optimize a Noir circuit:
- Identify unconstrained witnesses
- Add necessary range checks
- Use lookup tables for repeated operations
- Measure constraint count before/after
Foundational Papers:
- Groth16: "On the Size of Pairing-based Non-interactive Arguments" by Jens Groth (2016)
- PLONK: "PLONK: Permutations over Lagrange-bases for Oecumenical Noninteractive arguments of Knowledge" (2019)
- STARKs: "Scalable, transparent, and post-quantum secure computational integrity" by Ben-Sasson et al. (2018)
- Sumcheck: "Efficient Arguments without Short PCPs" by Kilian (1992)
- QAP: "Quadratic Span Programs and Succinct NIZKs without PCPs" by Gennaro et al. (2013)
Books:
- "Proofs, Arguments, and Zero-Knowledge" by Justin Thaler (2022) - Comprehensive modern treatment
- "A Graduate Course in Applied Cryptography" by Boneh and Shoup - Chapter on zero-knowledge
- "Mathematics of Public Key Cryptography" by Galbraith - Elliptic curves and pairings
Online Courses:
- ZK Whiteboard Sessions by ZK Hack - Video series on ZK fundamentals
- MIT 6.875 Cryptography - Lecture notes on zero-knowledge proofs
- Stanford CS355 Topics in Cryptography - Advanced ZK topics
Interactive Tools:
- zkREPL - Interactive Noir playground
- Circom Playground - Test Circom circuits online
- ZK Bug Tracker - Database of known ZK vulnerabilities
Community Resources:
- ZK Podcast - Interviews with researchers and practitioners
- ZK Research Forum - Technical discussions
- Awesome Zero-Knowledge - Curated list of ZK resources
- ZK Hack - Hackathons and workshops
Implementation References:
- arkworks - Rust library for ZK (used by ProveKit)
- gnark - Go library for ZK (used for Groth16 wrapper)
- circom - Circuit language and compiler
- libsnark - C++ library for zkSNARKs
ACIR: Abstract Circuit Intermediate Representation - Noir's intermediate language between source code and R1CS
Arithmetization: Converting a computation into a mathematical representation (R1CS, QAP, AIR, etc.)
Bilinear Map: A function e: G₁ × G₂ → Gₜ where e(aP, bQ) = e(P, Q)^(ab) - enables pairing-based cryptography
Commitment Scheme: Cryptographic primitive for committing to a value without revealing it
Constraint: An equation that must be satisfied by the witness in R1CS: (A·w) × (B·w) = (C·w)
Covector: A row vector that defines a linear form - used for polynomial evaluation
DFA: Deterministic Finite Automaton - state machine for regex matching
Domain Separator: Hash of protocol parameters to prevent cross-protocol attacks
Embedding Degree: Smallest k where r divides p^k - 1 for pairing-friendly curves
Fiat-Shamir: Technique to make interactive proofs non-interactive using hash functions
Field Element: Element of a finite field 𝔽ₚ - all arithmetic is modulo p
Lagrange Basis: Polynomial basis where Lᵢ(xⱼ) = δᵢⱼ (1 if i=j, 0 otherwise)
MLE: Multilinear Extension - extends vector to multilinear polynomial
Montgomery Form: Representation x̃ = xR mod p for faster multiplication
NTT: Number Theoretic Transform - FFT over finite fields
Pairing: Bilinear map on elliptic curves - enables checking equations on committed values
QAP: Quadratic Arithmetic Program - polynomial representation of R1CS
R1CS: Rank-1 Constraint System - matrix representation of computation
Schwartz-Zippel: Lemma about polynomial identity testing - foundation of succinct verification
SRS: Structured Reference String - public parameters from trusted setup
Sumcheck: Interactive proof protocol for verifying sums over large domains
Toxic Waste: Secret value τ from trusted setup that must be destroyed
Transcript: Running hash of all messages in Fiat-Shamir transformation
Witness: Complete set of values satisfying all constraints (public + private + intermediate)
zkSNARK: Zero-Knowledge Succinct Non-interactive Argument of Knowledge
For Technical Interviews at ZK Companies:
1. Understand the Fundamentals (Week 1-2)
- Master finite field arithmetic (BN254 specifically)
- Understand R1CS and constraint systems
- Learn polynomial commitments and evaluation
- Study Fiat-Shamir transformation
2. Deep Dive into Proof Systems (Week 3-4)
- Compare Groth16, PLONK, STARKs, WHIR
- Understand trusted setup and its implications
- Learn sumcheck protocol in detail
- Study recursive verification
3. Practical Implementation (Week 5-6)
- Write simple circuits in Noir or Circom
- Implement witness solver from scratch
- Build a toy Merkle tree commitment scheme
- Optimize circuits for constraint count
4. Domain-Specific Knowledge (Week 7-8)
- For ZK Email: DKIM, RSA in circuits, regex in ZK, SHA256 optimization
- For ZK Rollups: batch verification, data availability, sequencer design
- For ZK Identity: Merkle proofs, nullifiers, privacy-preserving credentials
- For ZK ML: quantization, lookup tables, activation functions in circuits
5. System Design (Week 9-10)
- Design a ZK application end-to-end
- Consider prover/verifier architecture
- Plan for key management and trusted setup
- Think about gas costs and on-chain verification
Common Interview Questions:
- "Explain zero-knowledge proofs to a non-technical person"
- "Why is RSA verification expensive in ZK circuits?"
- "How would you optimize a circuit with 1M constraints?"
- "What are the tradeoffs between Groth16 and PLONK?"
- "Design a ZK voting system"
- "How do you prevent double-spending in a ZK payment system?"
- "Explain the Schwartz-Zippel lemma and why it matters"
- "What's the difference between soundness and zero-knowledge?"
Red Flags to Avoid:
- Claiming to understand something you don't
- Not asking clarifying questions
- Ignoring security considerations
- Over-engineering solutions
- Not considering practical constraints (gas costs, proving time, memory)
This documentation covers 20 core mathematical topics in ProveKit and zero-knowledge proofs:
Core Concepts (1-6):
- R1CS - Constraint system representation
- Finite Field Arithmetic - BN254 field operations
- Witness Generation - Solving constraints
- Polynomial Commitments - Merkle trees and MLE
- NTT - Fast polynomial operations
- Sumcheck Protocol - Succinct verification
Cryptographic Primitives (7-9): 7. Fiat-Shamir - Non-interactive proofs 8. Hash Functions - Poseidon2, SHA256, Keccak 9. Sparse Matrices - Efficient storage
Optimization & Implementation (10-12): 10. Memory Optimization - Mobile-friendly proving 11. Recursive Verification - Groth16 wrapper 12. Noir Compilation - ACIR to R1CS
ZK Email Specific (13-16): 13. DKIM Verification - Email authentication 14. ZK-Regex - Pattern matching in circuits 15. RSA in ZK - Signature verification 16. SHA256 in ZK - Variable-length hashing
Advanced Theory (17-20): 17. QAP - Polynomial representation 18. Elliptic Curve Pairings - Bilinear maps 19. Trusted Setup - Powers of tau 20. Schwartz-Zippel - Polynomial identity testing
Each topic includes:
- Mathematical foundations with formulas and examples
- Code references from ProveKit codebase
- Interview questions with detailed answers
- Practical applications and optimizations
Total Content:
- 20 mathematical topics with full explanations
- 100+ interview questions with answers
- 8 flowcharts showing system architecture
- 5 comparison tables (proof systems, hashes, signatures, etc.)
- Security considerations and common bugs
- Performance optimization techniques
- Practical exercises and learning resources
This documentation is designed for interview preparation at ZK companies like zk.email, Aztec, zkSync, Polygon, and others working on zero-knowledge proof systems.