Instruction ReadRAF Sumcheck — Memory + Residency Cheat Sheet

Stage 1 (log_K address rounds)

Large vectors allocated during ReadRafSumcheckProver::gen:

| Object | Size driver | Lifetime | Residency |
| --- | --- | --- | --- |
| lookup_indices, lookup_indices_uninterleave | T packed keys / interleaved subsets | Needed until Stage‑2 start | GPU-only per shard once Stage‑2 materialization moves to GPU; host can drop them because ra never rematerializes on CPU |
| lookup_indices_identity | Up to T indices | Needed per phase and again in cache_openings (to sum RAF-flag EQs) | CPU master (required for final RAF flag) + GPU shard slices |
| lookup_indices_by_table | T total entries grouped by opcode | Used every phase in init_suffix_polys and later in cache_openings | CPU master (needed for lookup-table flags) + GPU shard buckets |
| lookup_tables, is_interleaved_operands | T entries | Needed for Stage‑1 suffixes and Stage‑2 value materialization | GPU-only per shard after Stage‑2 materialization; CPU only holds them if it needs to redo Stage‑2 locally |
| u_evals_rv, u_evals_raf | 2T field elems | Rescaled at each phase, dropped before Stage‑2 | GPU only |
| suffix_polys | NUM_TABLE × suffixes × M (256) | Rebuilt each phase, bound every round | GPU produces, CPU consumes (no GPU copy once sent) |
| Prefix-suffix Q buffers (left/right/identity_ps) | ORDER × M each | Rebuilt/bound every round | CPU only |
| Expanding tables v[phase] | 16 × M | Maintain prefix products, feed Stage‑2 | CPU only |
| GruenSplitEqPolynomial (eq_r_spartan, eq_r_branch) | O(T) | Used to form u_evals_* and later Gruen rounds | CPU master; per-phase slices can be cached on GPU if needed |
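To make the residency column concrete, here is a minimal Rust sketch of the four residency classes the table uses; the enum and its names are illustrative only and do not come from the prover codebase:

```rust
/// Illustrative residency classes for the Stage-1 buffers above (hypothetical names).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Residency {
    /// Lives only in per-shard GPU memory (e.g. lookup_indices once Stage-2
    /// materialization happens on device, u_evals_rv / u_evals_raf).
    GpuOnlyPerShard,
    /// CPU keeps the authoritative copy needed at proof finalization; GPUs hold
    /// shard slices (lookup_indices_identity, lookup_indices_by_table).
    CpuMasterPlusGpuShards,
    /// Never leaves the host (prefix-suffix Q buffers, expanding tables v[phase]).
    CpuOnly,
    /// GPU computes it each phase, CPU consumes it, nothing is retained on device
    /// (suffix_polys).
    GpuProducesCpuConsumes,
}
```

The rule of thumb encoded in the table: anything still needed in cache_openings keeps a CPU master copy; everything else lives on exactly one side.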

Per-phase flow (16 phases, 8 rounds each), sketched in code after the list:

  1. CPU sends the 8 challenges from the previous phase to the GPUs (8 field elements).
  2. Each GPU rescales its local u_evals_*, buckets cycles, and computes suffix polynomials (NUM_TABLE × 256 values per GPU).
  3. GPUs ship suffix polynomials to CPU; CPU stitches shards.
  4. CPU runs the 8 rounds: binds prefix/suffix structures, updates prefix_registry, and evolves v[phase].
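A minimal sketch of that loop, under the assumptions that shards partition the trace and that per-shard suffix contributions merge by pointwise summation; every type and function name below is a placeholder, not the prover's API:

```rust
type F = u64; // stand-in for a field element; the real prover uses modular arithmetic

const NUM_TABLE: usize = 16; // placeholder lookup-table count
const M: usize = 256;        // suffix-poly domain size per phase

struct GpuShard;
impl GpuShard {
    /// Rescale local u_evals_*, bucket cycles, and build this shard's
    /// NUM_TABLE x 256 suffix-polynomial contributions (placeholder body).
    fn compute_suffix_polys(&mut self, _challenges: &[F; 8]) -> Vec<Vec<F>> {
        vec![vec![0; M]; NUM_TABLE]
    }
}

struct CpuProver;
impl CpuProver {
    /// Stitch shards: sum per-shard suffix contributions pointwise
    /// (assumed merge rule; the real stitching depends on the sharding).
    fn stitch_shards(&self, shards: Vec<Vec<Vec<F>>>) -> Vec<Vec<F>> {
        let mut merged = vec![vec![0; M]; NUM_TABLE];
        for shard in shards {
            for (t, table) in shard.into_iter().enumerate() {
                for (i, v) in table.into_iter().enumerate() {
                    merged[t][i] += v;
                }
            }
        }
        merged
    }

    /// Run the 8 address rounds of this phase: bind prefix/suffix structures,
    /// update the prefix registry, evolve v[phase], and return the challenges.
    fn run_phase_rounds(&mut self, _suffix_polys: Vec<Vec<F>>) -> [F; 8] {
        [0; 8] // placeholder challenges
    }
}

/// 16 phases x 8 rounds = the log_K address rounds of Stage 1.
fn stage1(cpu: &mut CpuProver, gpus: &mut [GpuShard]) {
    let mut challenges: [F; 8] = [0; 8];
    for _phase in 0..16 {
        // (1) CPU -> GPU: the 8 previous-phase challenges.
        // (2) Each GPU builds its suffix polynomials locally.
        let shard_suffixes: Vec<_> = gpus
            .iter_mut()
            .map(|gpu| gpu.compute_suffix_polys(&challenges))
            .collect();
        // (3) GPU -> CPU: NUM_TABLE x 256 field elements per GPU; CPU stitches shards.
        let suffix_polys = cpu.stitch_shards(shard_suffixes);
        // (4) CPU runs the 8 rounds and produces the next phase's challenges.
        challenges = cpu.run_phase_rounds(suffix_polys);
    }
}

fn main() {
    let mut cpu = CpuProver;
    let mut gpus = vec![GpuShard, GpuShard];
    stage1(&mut cpu, &mut gpus);
    println!("Stage 1 complete (placeholder run)");
}
```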

Dual-resident objects (Stage‑1): only the data needed at proof finalization keeps CPU copies, namely lookup_indices_by_table and lookup_indices_identity. Everything else (e.g., lookup_indices, lookup_tables) can be GPU-only once Stage‑2 materializes on device.

Stage 2 (log_T cycle rounds)

The improved plan keeps large vectors on the GPUs until the domains shrink (a round-binding sketch follows the table):

| Object | Size driver | Stage‑2 lifetime | Residency |
| --- | --- | --- | --- |
| ra, combined_val, combined_raf | T each | Materialized at Stage‑2 start, bound every round | GPU-only while shard length stays a power of two; once small, download final slices to CPU and free GPU buffers |
| prefixes (Vec<PrefixEval>) | #prefixes (~tens) | Broadcast once | CPU storage, copied to GPUs as constants |
| prev_round_poly_spartan/branch | Degree‑3 round polynomials | Maintained per shard while GPUs run | GPU while active; CPU recomputes after fallback |
| eq_r_spartan, eq_r_branch | O(T) | Needed throughout log_T | CPU master copy; upload per-GPU slices as needed |
| eq_r_cycle_prime | T | Only used when caching lookup-table openings | CPU-only vector created in cache_openings; no GPU copy kept |
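For the "bound every round" column: a minimal sketch of one Stage‑2 binding step on a single shard, assuming the standard multilinear bind new[i] = old[2i] + r*(old[2i+1] - old[2i]); the struct and the integer stand-in for field elements are illustrative only:

```rust
type F = i128; // stand-in for a field element; real code uses modular arithmetic

/// One shard's Stage-2 state: ra, combined Val, and combined RafVal,
/// all of the same power-of-two length (hypothetical shape).
struct Stage2Shard {
    ra: Vec<F>,
    combined_val: Vec<F>,
    combined_raf: Vec<F>,
}

impl Stage2Shard {
    /// Bind the current variable to challenge `r`: each vector halves in length.
    fn bind(&mut self, r: F) {
        for vec in [&mut self.ra, &mut self.combined_val, &mut self.combined_raf] {
            let half = vec.len() / 2;
            for i in 0..half {
                let (lo, hi) = (vec[2 * i], vec[2 * i + 1]);
                vec[i] = lo + r * (hi - lo); // placeholder integer arithmetic
            }
            vec.truncate(half);
        }
    }
}

fn main() {
    let mut shard = Stage2Shard {
        ra: vec![1, 2, 3, 4],
        combined_val: vec![5, 6, 7, 8],
        combined_raf: vec![9, 10, 11, 12],
    };
    shard.bind(3); // bind one variable with a toy challenge
    assert_eq!(shard.ra.len(), 2);
}
```

Each GPU applies this step to its own shard every round until the fallback described next triggers.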

GPU→CPU fallback: once a shard length is no longer a power of two (or falls below a bandwidth threshold), copy the residual ra/Val/RafVal to the CPU, merge the shards, and run the few remaining rounds entirely on the CPU. After the copy, those vectors are CPU-only; the GPUs release the storage.
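A hedged sketch of that decision and handoff; the threshold, the contiguous-cycle-range sharding, and the concatenation merge are all assumptions rather than the prover's actual policy:

```rust
type F = i128; // stand-in for a field element

/// Residual Stage-2 buffers still resident on one GPU (hypothetical shape).
struct GpuShardBuffers {
    ra: Vec<F>,
    combined_val: Vec<F>,
    combined_raf: Vec<F>,
}

/// Fall back once any shard is no longer a power-of-two length or is below
/// a bandwidth-derived minimum (assumed policy).
fn should_fall_back(shards: &[GpuShardBuffers], min_len: usize) -> bool {
    shards.iter().any(|s| {
        let n = s.ra.len();
        !n.is_power_of_two() || n < min_len
    })
}

/// Download the residual vectors and merge shards into single CPU vectors.
/// Assumption: shards cover disjoint contiguous cycle ranges and low variables
/// were bound first, so merging is concatenation in shard order. After this,
/// the GPU buffers are freed and the remaining rounds run on the CPU only.
fn merge_to_cpu(shards: Vec<GpuShardBuffers>) -> (Vec<F>, Vec<F>, Vec<F>) {
    let (mut ra, mut val, mut raf) = (Vec::new(), Vec::new(), Vec::new());
    for s in shards {
        ra.extend(s.ra);
        val.extend(s.combined_val);
        raf.extend(s.combined_raf);
    }
    (ra, val, raf)
}

fn main() {
    let shards = vec![
        GpuShardBuffers { ra: vec![1, 2], combined_val: vec![3, 4], combined_raf: vec![5, 6] },
        GpuShardBuffers { ra: vec![7, 8], combined_val: vec![9, 10], combined_raf: vec![11, 12] },
    ];
    if should_fall_back(&shards, 4) {
        let (ra, _val, _raf) = merge_to_cpu(shards);
        println!("fell back to CPU with {} entries", ra.len());
    }
}
```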

Data Transfer Summary

  • Per-phase Stage‑1 traffic: 8 field elements CPU→GPU; NUM_TABLE × 256 field elements GPU→CPU (sized in the sketch below).
  • Stage‑2 handoff: only the final compact shards are copied once, when the GPU domain shrinks.
  • Dual-resident data is limited to lookup_indices_by_table and lookup_indices_identity (both needed for the final cache_openings); every other large vector lives on exactly one side at any moment.
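For a rough sense of scale, a runnable sketch of the per-phase Stage‑1 traffic in bytes; the 32-byte field-element size and the table count are assumptions, not values taken from the prover:

```rust
const FIELD_BYTES: usize = 32;  // assumption: 256-bit field elements, 32 bytes serialized
const NUM_TABLE: usize = 16;    // placeholder; substitute the real lookup-table count
const M: usize = 256;           // suffix-poly domain size from the cheat sheet

fn main() {
    // CPU -> GPU per phase: the 8 previous-phase challenges.
    let cpu_to_gpu = 8 * FIELD_BYTES;
    // GPU -> CPU per phase, per GPU: NUM_TABLE x 256 suffix-poly values.
    let gpu_to_cpu = NUM_TABLE * M * FIELD_BYTES;
    println!("CPU->GPU per phase: {cpu_to_gpu} bytes");
    println!("GPU->CPU per phase per GPU: {gpu_to_cpu} bytes (~{} KiB)", gpu_to_cpu / 1024);
}
```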