Accelerating PostgreSQL with ReadySet Caching

Disclaimer: The opinions expressed in this article are my own and do not necessarily reflect the views of my employer.



This tutorial demonstrates how to use ReadySet as a caching layer for PostgreSQL databases. ReadySet is a wire-compatible caching solution that sits between your application and database, providing sub-millisecond query performance without requiring code changes.


Table of Contents

  • Why Cache PostgreSQL?
  • What is ReadySet?
  • Why Pair PostgreSQL with ReadySet?
  • Prerequisites
  • Sample Schema
  • Baseline Performance (Direct to PostgreSQL)
  • Understanding ReadySet Cache Modes
  • Deep Cache Setup
  • Shallow Cache Setup
  • Creating Cached Queries
  • Verifying Automatic Cache Invalidation (Deep Cache)
  • Performance Impact & Latency Analysis
  • Troubleshooting
  • Useful Commands
  • Cleanup


Why Cache PostgreSQL?

PostgreSQL is the world's most advanced open-source relational database, widely used for transactional workloads. Whether you're running PostgreSQL on-premises, in the cloud, or using a managed service, every query incurs:

  1. Network round-trip latency – Depending on region and network conditions (e.g., 14ms from a client in New Jersey to a cluster in AWS us-east-1)
  2. Query processing overhead – Parsing, planning, and execution on the database server
  3. Result serialization – Packaging and transmitting results back to your application

For read-heavy workloads with repetitive queries (dashboards, APIs, reports), this latency adds up quickly—especially when the same query is executed hundreds or thousands of times per minute.




What is ReadySet?

ReadySet is a SQL caching engine that sits between your application and database. It's wire-compatible with PostgreSQL and MySQL, meaning your application connects to ReadySet exactly like it would connect to your database—no code changes required.
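Because ReadySet speaks the PostgreSQL wire protocol, adopting it is typically just a connection-string change. A minimal sketch, assuming placeholder credentials and hostnames:

# Before: the application connects directly to PostgreSQL
psql 'postgresql://app_user:app_pass@db.example.com:5432/postgres'

# After: the same client, pointed at ReadySet's listener instead
psql 'postgresql://app_user:app_pass@127.0.0.1:5433/postgres'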


ReadySet Cloud vs. Self-Hosted

ReadySet is available in several deployment options:

| Option | Description |
| --- | --- |
| ReadySet Cloud | Fully managed service with automatic scaling, monitoring, and maintenance. Ideal for production workloads. |
| Self-Hosted | Deploy ReadySet yourself using Docker or binaries. Full control over infrastructure. |

Note for this tutorial: We use a self-hosted ReadySet instance to demonstrate the setup process. For production workloads, consider using ReadySet Cloud for automatic scaling and maintenance.


How ReadySet Works

┌─────────────┐         ┌─────────────┐         ┌──────────────┐
│ Application │────────▶│  ReadySet   │────────▶│  PostgreSQL  │
│             │  :5433  │   (Cache)   │  :5432  │  (Upstream)  │
└─────────────┘         └─────────────┘         └──────────────┘
                              │
                              ▼
                     ┌─────────────────┐
                     │ In-Memory Cache │
                     │ (Dataflow Graph)│
                     └─────────────────┘

  1. Transparent Proxy – ReadySet intercepts SQL queries from your application
  2. Selective Caching – You explicitly choose which queries to cache using CREATE CACHE commands
  3. Incremental Maintenance – For deep cache mode, ReadySet subscribes to PostgreSQL's replication stream and automatically updates cached results when underlying data changes
  4. Cache Hits – Cached queries are served directly from ReadySet's in-memory dataflow graph, bypassing the upstream database entirely
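For example, a single statement opts a query into the cache; everything else continues to be proxied. The cache name below is illustrative (the full statements used in this tutorial appear in the Creating Cached Queries section):

-- Explicitly cache one query; all other queries still pass through
CREATE DEEP CACHE ALWAYS revenue_by_category FROM
SELECT p.category, SUM(p.price * o.quantity) AS revenue
FROM orders o
JOIN products p ON o.product_id = p.product_id
GROUP BY p.category;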

Key Differentiators

| Feature | Traditional Cache (Redis/Memcached) | ReadySet |
| --- | --- | --- |
| Query language | Key-value API | Standard SQL |
| Cache invalidation | Manual (application logic) | Automatic (via replication) |
| Code changes required | Yes (caching logic) | No (same connection string) |
| Result freshness | Depends on TTL/manual invalidation | Real-time (deep cache) or TTL-based (shallow cache) |



Why Pair PostgreSQL with ReadySet?

The combination addresses a specific architectural challenge: you want the convenience of a managed PostgreSQL service, but you can't tolerate the inherent latency of a cloud-hosted database for high-frequency reads.


Latency Impact

| Scenario | Typical Latency | Use Case |
| --- | --- | --- |
| Direct to PostgreSQL | 14-20 ms | Acceptable for occasional queries, batch jobs |
| Via ReadySet (uncached) | 15-21 ms | Passthrough with minimal overhead |
| Via ReadySet (cached) | 1-5 ms | Dashboards, APIs, real-time apps |

When This Pairing Makes Sense

Good fit:

  • Read-heavy workloads (80%+ reads)
  • Repetitive queries (same query patterns executed frequently)
  • Latency-sensitive applications (dashboards, APIs, search)
  • Teams wanting to reduce database load and improve response times

Not ideal for:

  • Write-heavy workloads (ReadySet caches reads, not writes)
  • Highly dynamic queries with unique parameters each time



Prerequisites

Before starting, ensure you have:

  • Docker installed and running
  • A PostgreSQL database with:
    • Tables: customers, orders, products
    • Logical replication enabled (for deep cache)
  • Database credentials with appropriate permissions
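If you want to confirm the sample tables are in place (they are created in the Sample Schema section below), a quick catalog check works:

-- Should return customers, orders, and products once the schema exists
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'public'
  AND table_name IN ('customers', 'orders', 'products');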

Enabling Logical Replication on PostgreSQL

For deep cache mode, ReadySet requires logical replication to be enabled on your PostgreSQL database. This allows ReadySet to subscribe to the Write-Ahead Log (WAL) and automatically update cached query results when data changes.


Step 1: Check Current Replication Setting

Connect to your PostgreSQL database and run:

SHOW wal_level;

If the result is replica or minimal, you need to change it to logical.


Step 2: Enable Logical Replication

Modify the wal_level parameter:

ALTER SYSTEM SET wal_level = 'logical';

Note: This change requires a database restart to take effect. In managed environments, this may be handled automatically or require a support request depending on your configuration.
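On a self-managed server, this is typically a service restart. A sketch, assuming a systemd-based Linux host (the service name varies by distribution and PostgreSQL version):

# Restart PostgreSQL so the new wal_level takes effect
sudo systemctl restart postgresql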


Step 3: Verify the Change

After the restart, verify logical replication is enabled:

SHOW wal_level;

Expected output:

 wal_level 
-----------
 logical
(1 row)

Required Database Permissions

The database user connecting to ReadySet needs the following permissions:

| Permission | Purpose |
| --- | --- |
| SELECT on cached tables | Read data for snapshotting and query execution |
| REPLICATION | Subscribe to the logical replication stream (deep cache) |
| CREATE on the database | Create the publication (or use --disable-create-publication if you create it manually) |
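A minimal sketch of granting these privileges, assuming a dedicated user named readyset_user (a placeholder name):

-- Allow reads for snapshotting and query execution
GRANT SELECT ON public.customers, public.orders, public.products TO readyset_user;

-- Allow subscribing to the logical replication stream (deep cache)
ALTER ROLE readyset_user WITH REPLICATION;

-- Allow creating the publication (skip if you create it manually)
GRANT CREATE ON DATABASE postgres TO readyset_user;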

If you cannot grant REPLICATION privileges (common in managed environments), you can still use shallow cache mode, which doesn't require logical replication.

Shallow Cache Alternative: If enabling logical replication isn't possible, skip to the Shallow Cache Setup section, which uses TTL-based caching instead of replication.




Sample Schema

This tutorial uses the following sample tables. Create these in your PostgreSQL database before starting:

-- Customers table
CREATE TABLE public.customers (
    customer_id SERIAL PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    email VARCHAR(100),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Products table
CREATE TABLE public.products (
    product_id SERIAL PRIMARY KEY,
    product_name VARCHAR(100) NOT NULL,
    category VARCHAR(50) NOT NULL,
    price DECIMAL(10, 2) NOT NULL
);

-- Orders table
CREATE TABLE public.orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INTEGER REFERENCES public.customers(customer_id),
    product_id INTEGER REFERENCES public.products(product_id),
    quantity INTEGER NOT NULL,
    order_date DATE DEFAULT CURRENT_DATE
);

-- Sample data: Customers
INSERT INTO public.customers (first_name, last_name, email) VALUES
    ('Alice', 'Smith', 'alice@example.com'),
    ('Bob', 'Johnson', 'bob@example.com'),
    ('Charlie', 'Brown', 'charlie@example.com'),
    ('Diana', 'Prince', 'diana@example.com'),
    ('Evan', 'Wright', 'evan@example.com'),
    ('Fiona', 'Gallagher', 'fiona@example.com'),
    ('George', 'Martin', 'george@example.com'),
    ('Hannah', 'Montana', 'hannah@example.com');

-- Sample data: Products
INSERT INTO public.products (product_name, category, price) VALUES
    ('Wireless Mouse', 'Electronics', 25.99),
    ('Gaming Monitor', 'Electronics', 300.00),
    ('Novel: The Great Gatsby', 'Books', 12.50),
    ('Mechanical Keyboard', 'Electronics', 120.50),
    ('Running Shoes', 'Footwear', 89.99),
    ('Coffee Maker', 'Kitchen', 45.00),
    ('Yoga Mat', 'Fitness', 20.00),
    ('Leather Jacket', 'Apparel', 150.00);

-- Sample data: Orders
INSERT INTO public.orders (customer_id, product_id, quantity, order_date) VALUES
    (1, 1, 1, '2023-10-01'),
    (2, 2, 1, '2023-10-02'),
    (1, 3, 2, '2023-10-03'),
    (3, 4, 1, '2023-10-04'),
    (4, 5, 1, '2023-10-05'),
    (5, 6, 1, '2023-10-06'),
    (2, 4, 1, '2023-10-07'),
    (6, 7, 3, '2023-10-08'),
    (7, 3, 1, '2023-10-09'),
    (8, 8, 1, '2023-10-10');



Baseline Performance (Direct to PostgreSQL)

Before implementing ReadySet, let's establish baseline query performance by connecting directly to PostgreSQL. This helps us measure the impact of caching.

Enable timing in psql: To see query execution times, enable timing in your psql session:

\timing on

This displays the elapsed time after each query, which we'll use throughout this tutorial to measure performance.


Query 1: Revenue by Product Category

SELECT 
    p.category,
    SUM(o.quantity) as items_sold,
    SUM(p.price * o.quantity) as revenue
FROM orders o
JOIN products p ON o.product_id = p.product_id
GROUP BY p.category
ORDER BY revenue DESC;

Result:

  category   | items_sold | revenue 
-------------+------------+---------
 Electronics |          4 |  566.99
 Apparel     |          1 |  150.00
 Footwear    |          1 |   89.99
 Fitness     |          3 |   60.00
 Kitchen     |          1 |   45.00
 Books       |          3 |   37.50
(6 rows)

Time: 19.402 ms

Query 2: Top Customers by Spending

SELECT 
    c.first_name,
    c.last_name,
    COUNT(o.order_id) as number_of_orders,
    SUM(p.price * o.quantity) as total_spent
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN products p ON o.product_id = p.product_id
GROUP BY c.first_name, c.last_name
ORDER BY total_spent DESC
LIMIT 3;

Result:

 first_name | last_name | number_of_orders | total_spent 
------------+-----------+------------------+-------------
 Bob        | Johnson   |                2 |      420.50
 Hannah     | Montana   |                1 |      150.00
 Charlie    | Brown     |                1 |      120.50
(3 rows)

Time: 14.595 ms

Query 3: Order Details with Customer Names

SELECT 
    o.order_id,
    o.order_date,
    c.first_name || ' ' || c.last_name AS customer_name,
    p.product_name,
    p.price,
    o.quantity,
    (p.price * o.quantity) AS total_cost
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN products p ON o.product_id = p.product_id;

Result:

 order_id | order_date |  customer_name  |      product_name       | price  | quantity | total_cost 
----------+------------+-----------------+-------------------------+--------+----------+------------
       11 | 2023-10-01 | Alice Smith     | Wireless Mouse          |  25.99 |        1 |      25.99
       12 | 2023-10-02 | Bob Johnson     | Gaming Monitor          | 300.00 |        1 |     300.00
       ...
(10 rows)

Time: 16.441 ms

Baseline Summary: Direct queries may take longer depending on network latency and the distance between your client application and the PostgreSQL server. Co-locating the client and database in the same region or availability zone isn't always feasible, making caching an attractive alternative.




Understanding ReadySet Cache Modes

ReadySet supports two caching strategies:

| Mode | Description | Use Case |
| --- | --- | --- |
| Deep Cache | Uses PostgreSQL logical replication to keep caches in sync in real time | When you need always-fresh data with automatic invalidation |
| Shallow Cache | Periodically refreshes cached data based on TTL and refresh intervals | When eventual consistency is acceptable and replication isn't available |



Deep Cache Setup

Deep cache leverages PostgreSQL's logical replication stream to automatically update cached query results whenever the underlying data changes.


Step 1: Create a Docker Volume for Cache Persistence

Create a named Docker volume to persist ReadySet's cache data across container restarts:

docker volume create readyset_cache

Step 2: Create a Publication (Manual Step)

If your PostgreSQL environment doesn't provide superuser access, you must manually create a publication for the tables you want to replicate:

-- Create publication for specific tables
CREATE PUBLICATION readyset FOR TABLE public.customers, public.products, public.orders;

Note: This step is required because the --disable-create-publication flag is used, which prevents ReadySet from attempting to create the publication itself (which would fail without superuser privileges).
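To confirm the publication was created correctly, you can query the catalog views:

-- Verify the publication exists and covers the expected tables
SELECT * FROM pg_publication WHERE pubname = 'readyset';
SELECT * FROM pg_publication_tables WHERE pubname = 'readyset';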


Step 3: Run ReadySet with Deep Cache Configuration

docker run -d -p 5433:5433 -p 6034:6034         \
--name readyset --rm                            \
-e UPSTREAM_DB_URL=postgres://<username>:<password>@<host>:5432/postgres \
-e REPLICATION_TABLES=public.customers,public.orders,public.products \
-e LISTEN_ADDRESS=0.0.0.0:5433                  \
-e STORAGE_DIR=/data/cache \
-v readyset_cache:/data/cache \
readysettech/readyset:latest \
  --verify-skip \
  --disable-upstream-ssl-verification \
  --disable-create-publication \
  --disable-telemetry

Docker Command Parameters Explained

Port Mappings (-p)

| Parameter | Description |
| --- | --- |
| -p 5433:5433 | Maps the ReadySet PostgreSQL-compatible port. Your application connects here instead of directly to your database |
| -p 6034:6034 | Maps the ReadySet metrics and health check HTTP endpoint |

Container Options

| Parameter | Description |
| --- | --- |
| -d | Runs the container in detached (background) mode |
| --name readyset | Assigns a friendly name to the container for easier management |
| --rm | Automatically removes the container when it stops (useful for development) |

Environment Variables (-e)

| Variable | Description |
| --- | --- |
| UPSTREAM_DB_URL | The full PostgreSQL connection string to your database. Format: postgres://user:password@host:port/database |
| REPLICATION_TABLES | Comma-separated list of tables to replicate (schema.table format). ReadySet will subscribe to changes on these tables |
| LISTEN_ADDRESS | The address and port ReadySet listens on for incoming connections. 0.0.0.0:5433 allows connections from any network interface |
| STORAGE_DIR | Directory path inside the container where ReadySet stores persistent cache data |

Volume Mount (-v)

| Parameter | Description |
| --- | --- |
| -v readyset_cache:/data/cache | Mounts the Docker named volume readyset_cache at /data/cache inside the container. This persists cache data across container restarts |

ReadySet Flags (Command Arguments)

| Flag | Description |
| --- | --- |
| --verify-skip | Skips verification of cached query results against the upstream database. Improves performance but reduces safety checks |
| --disable-upstream-ssl-verification | Disables SSL certificate verification when connecting to the upstream database. Required when the upstream uses self-signed certificates |
| --disable-create-publication | Prevents ReadySet from attempting to create a PostgreSQL publication. Use this when you don't have superuser privileges and have manually created the publication |
| --disable-telemetry | Disables sending anonymous usage statistics to ReadySet |

Step 4: Monitor Snapshot Progress

After starting ReadySet, monitor the Docker logs to verify that the snapshot completes successfully:

docker logs readyset --follow

Look for these INFO messages indicating successful table snapshotting:

2026-01-27T16:15:09.210564Z  INFO replicators::postgres_connector::snapshot: Snapshotting tables tables=3 max_parallel_snapshot_tables=7
2026-01-27T16:15:09.210589Z  INFO Snapshotting table{table="public"."customers"}: replicators::postgres_connector::snapshot: Snapshotting table
2026-01-27T16:15:09.222170Z  INFO Snapshotting table{table="public"."orders"}: replicators::postgres_connector::snapshot: Snapshotting table
2026-01-27T16:15:09.229690Z  INFO Snapshotting table{table="public"."products"}: replicators::postgres_connector::snapshot: Snapshotting table
...
2026-01-27T16:15:09.536299Z  INFO replicators::noria_adapter: Snapshot finished
2026-01-27T16:15:09.574068Z  INFO replicators::noria_adapter: Streaming replication started position=(0/D014690, 0/D014690)

Important: Wait for the "Snapshot finished" and "Streaming replication started" messages before creating caches. This ensures ReadySet has a complete copy of your data.
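If you prefer a non-interactive check, grep the logs for the two completion messages:

# Prints matching lines once snapshotting and replication are up
docker logs readyset 2>&1 | grep -E 'Snapshot finished|Streaming replication started'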




Shallow Cache Setup

Shallow cache is ideal when logical replication isn't available or when eventual consistency is acceptable. It uses TTL-based caching with periodic refreshes.

Switching from Deep Cache? If you previously ran ReadySet in deep cache mode, stop the existing container first:

docker stop readyset

Step 1: Create a Docker Volume for Cache Persistence

If you haven't already created the volume (from the deep cache setup), create it now:

docker volume create readyset_cache

Step 2: Run ReadySet with Shallow Cache Configuration

docker run -d -p 5433:5433 -p 6034:6034         \
--name readyset --rm                            \
-e UPSTREAM_DB_URL=postgres://<username>:<password>@<host>:5432/postgres \
-e LISTEN_ADDRESS=0.0.0.0:5433                  \
-e STORAGE_DIR=/data/cache \
-v readyset_cache:/data/cache \
readysettech/readyset:latest \
  --verify-skip \
  --disable-upstream-ssl-verification \
  --replication-tables-ignore '*.*' \
  --disable-telemetry

Additional Shallow Cache Parameters

| Flag | Description |
| --- | --- |
| --replication-tables-ignore '*.*' | Ignores all tables for replication. This effectively disables deep cache mode and enables shallow caching only |

Note: The REPLICATION_TABLES environment variable is not set in shallow cache mode since we're ignoring all tables for replication.




Creating Cached Queries

Step 1: Connect to ReadySet

Once ReadySet is running, connect to it using any PostgreSQL client:

psql 'postgresql://<username>:<password>@127.0.0.1:5433/postgres'

Important: Use the same username and password as your upstream PostgreSQL database. ReadySet authenticates against the upstream database using these credentials.

Note: The connection string uses 127.0.0.1:5433 (ReadySet) instead of your PostgreSQL host. ReadySet proxies your queries to the upstream database while caching results.


Step 2: Verify ReadySet Status

Before creating caches, verify that ReadySet is connected and ready:

SHOW READYSET STATUS;

Expected output:

            name            |          value           
----------------------------+--------------------------
 Database Connection        | Connected
 Connection Count           | 1
 Status                     | Online
 Maximum Replication Offset | (0/B3021CF8, 0/B3021D28)
 Minimum Replication Offset | (0/B3021CF8, 0/B3021D28)
 Last started Controller    | 2026-01-27 14:37:48 UTC
 Last completed snapshot    | 2026-01-27 14:37:51 UTC
 Last started replication   | 2026-01-27 14:37:51 UTC
 Enabled Features           | None
(9 rows)

Verify there are no caches yet:

SHOW CACHES;
 query id | cache name | query text | properties | count 
----------+------------+------------+------------+-------
(0 rows)

Step 3: Create Your Cached Queries

Deep Cache Queries (CREATE DEEP CACHE ALWAYS)

Deep cache queries use CREATE DEEP CACHE ALWAYS and are automatically kept in sync via replication:

Query 1: Revenue by Product Category

CREATE DEEP CACHE ALWAYS q1 FROM SELECT 
    p.category,
    SUM(o.quantity) as items_sold,
    SUM(p.price * o.quantity) as revenue
FROM orders o
JOIN products p ON o.product_id = p.product_id
GROUP BY p.category
ORDER BY revenue DESC;

Query 2: Top Customers by Spending

CREATE DEEP CACHE ALWAYS q2 FROM SELECT 
    c.first_name,
    c.last_name,
    COUNT(o.order_id) as number_of_orders,
    SUM(p.price * o.quantity) as total_spent
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN products p ON o.product_id = p.product_id
GROUP BY c.first_name, c.last_name
ORDER BY total_spent DESC
LIMIT 3;

Query 3: Order Details with Customer Names

CREATE DEEP CACHE ALWAYS q3 FROM SELECT 
    o.order_id,
    o.order_date,
    CONCAT(c.first_name, ' ', c.last_name) AS customer_name,
    p.product_name,
    p.price,
    o.quantity,
    (p.price * o.quantity) AS total_cost
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN products p ON o.product_id = p.product_id;

Shallow Cache Queries (CREATE SHALLOW CACHE)

Shallow cache queries include TTL and refresh policies:

CREATE SHALLOW CACHE POLICY TTL 3600 SECONDS REFRESH EVERY 600 SECONDS 
shallow_cache_q1 FROM SELECT 
    p.category,
    SUM(o.quantity) as items_sold,
    SUM(p.price * o.quantity) as revenue
FROM orders o
JOIN products p ON o.product_id = p.product_id
GROUP BY p.category
ORDER BY revenue DESC;

Shallow Cache Policy Parameters

| Parameter | Description |
| --- | --- |
| TTL 3600 SECONDS | Time-to-live: cached results expire after 3600 seconds (1 hour) |
| REFRESH EVERY 600 SECONDS | ReadySet proactively refreshes the cache every 600 seconds (10 minutes) |

Additional Shallow Cache Examples

-- Top customers with shallow caching
CREATE SHALLOW CACHE POLICY TTL 3600 SECONDS REFRESH EVERY 600 SECONDS 
shallow_cache_q2 FROM SELECT 
    c.first_name,
    c.last_name,
    COUNT(o.order_id) as number_of_orders,
    SUM(p.price * o.quantity) as total_spent
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN products p ON o.product_id = p.product_id
GROUP BY c.first_name, c.last_name
ORDER BY total_spent DESC
LIMIT 3;

-- Order details with shallow caching
CREATE SHALLOW CACHE POLICY TTL 3600 SECONDS REFRESH EVERY 600 SECONDS 
shallow_cache_q3 FROM SELECT 
    o.order_id,
    o.order_date,
    c.first_name || ' ' || c.last_name AS customer_name,
    p.product_name,
    p.price,
    o.quantity,
    (p.price * o.quantity) AS total_cost
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN products p ON o.product_id = p.product_id;

Note: The || string concatenation operator is supported in shallow cache mode but not in deep cache mode (see Troubleshooting).


Step 4: Verify Queries Are Being Cached

After creating caches, confirm that your queries are hitting the cache instead of the upstream database.


Check Cache Status

SHOW CACHES;

The count column shows how many times each cache has been hit:

      query id      | cache name |  query text (truncated)  | properties | count 
--------------------+------------+--------------------------+------------+-------
 q_663bfa3e73146832 | q1         | SELECT "p"."category"... | always     |     0
 q_272f720b813f03ee | q2         | SELECT "c"."first_name"..| always     |     0
 q_a1b2c3d4e5f67890 | q3         | SELECT "o"."order_id"... | always     |     0

Note: Immediately after creating caches, the count will be 0. Run your cached queries a few times, then check SHOW CACHES again to see the hit count increase.


View All Proxied Queries

To see which queries are being cached vs. proxied to upstream:

SHOW PROXIED QUERIES;

This displays all queries ReadySet has seen, their cache status, and whether they're supported for caching.


Confirm Cache Hit with Timing

The easiest way to verify caching is working: run the same query twice and compare times.

-- First run (may be uncached): ~15-20ms
SELECT p.category, SUM(o.quantity) as items_sold ...

-- Second run (cached): ~1-5ms
SELECT p.category, SUM(o.quantity) as items_sold ...

If the second query is significantly faster, it's being served from the cache.
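Recent ReadySet versions can also report this directly via EXPLAIN LAST STATEMENT, which shows whether the previous query was answered from the cache or proxied upstream:

-- Run immediately after the query you want to check
EXPLAIN LAST STATEMENT;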




Verifying Automatic Cache Invalidation (Deep Cache)

Deep cache automatically updates when underlying data changes. Let's demonstrate this with an INSERT.


Step 1: Query the Current Cache

First, run the revenue by category query and note the results:

SELECT 
    p.category,
    SUM(o.quantity) as items_sold,
    SUM(p.price * o.quantity) as revenue
FROM orders o
JOIN products p ON o.product_id = p.product_id
GROUP BY p.category
ORDER BY revenue DESC;

Result:

  category   | items_sold | revenue 
-------------+------------+---------
 Electronics |          4 |  566.99
 Apparel     |          1 |     150
 Footwear    |          1 |   89.99
 Fitness     |          3 |   60.00
 Kitchen     |          1 |      45
 Books       |          3 |   37.50
(6 rows)

Time: 4.344 ms

Step 2: Insert New Data

Add a new order for an Electronics product (two Mechanical Keyboards at $120.50 each, adding 2 items and $241.00 of revenue):

INSERT INTO orders (customer_id, product_id, quantity, order_date) 
VALUES (1, 4, 2, '2024-01-15');

Step 3: Verify Cache Updated Automatically

Re-run the same query—the cache should reflect the new data without any manual invalidation:

SELECT 
    p.category,
    SUM(o.quantity) as items_sold,
    SUM(p.price * o.quantity) as revenue
FROM orders o
JOIN products p ON o.product_id = p.product_id
GROUP BY p.category
ORDER BY revenue DESC;

Result:

  category   | items_sold | revenue 
-------------+------------+---------
 Electronics |          6 |  807.99
 Apparel     |          1 |     150
 Footwear    |          1 |   89.99
 Fitness     |          3 |   60.00
 Kitchen     |          1 |      45
 Books       |          3 |   37.50
(6 rows)

Time: 4.648 ms

How it works: ReadySet subscribes to PostgreSQL's replication stream. When you INSERT/UPDATE/DELETE data, the change flows through ReadySet's dataflow graph, automatically updating all affected cached query results in real-time.
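The same applies to updates and deletes. As an illustrative example (note that this modifies the sample data), a price change flows into every cached result that depends on it:

-- Cached revenue figures pick up the new price automatically
UPDATE products SET price = 27.99 WHERE product_id = 1;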




Performance Impact & Latency Analysis

As discussed in Why Cache PostgreSQL?, databases incur network latency per query due to network round-trips and processing overhead. ReadySet eliminates most of these costs by serving pre-computed results from local memory.


A Note on Data Size

The performance results in this tutorial use a small sample dataset (10 orders, 8 customers, 8 products). With small datasets, the improvements may seem modest—but this actually understates ReadySet's value in production:

| Factor | Small Dataset | Large Dataset (millions of rows) |
| --- | --- | --- |
| Upstream query time | Fast (data fits in memory) | Slower (disk I/O, larger joins, more aggregation work) |
| Cached query time | ~1-5 ms | ~1-5 ms (constant) |
| Relative improvement | 3-12x | 10-100x or more |

Key insight: As your data grows, upstream query latency increases due to larger table scans, more complex joins, and heavier aggregation workloads. However, cached query latency remains constant—ReadySet serves pre-computed results regardless of the underlying data size. This means the performance gap between cached and uncached queries widens significantly as your dataset scales.


Deep Cache Performance Results

| Query | Direct to PostgreSQL | Via ReadySet (uncached) | Via ReadySet (cached) | Improvement |
| --- | --- | --- | --- | --- |
| Q1: Revenue by Category | 19.4 ms | 18.5 ms | 3.7 ms | 5.3x faster |
| Q2: Top Customers | 14.6 ms | ~15 ms | 1.2 ms | 12.4x faster |
| Q3: Order Details | 16.4 ms | ~17 ms | 5.1 ms | 3.2x faster |

Key Observations

  • Uncached queries pass through ReadySet to the upstream database with minimal overhead (~1ms)
  • Cached queries are served directly from ReadySet's in-memory dataflow graph
  • Complex aggregations (Q2 with COUNT, SUM, GROUP BY, ORDER BY, LIMIT) see the largest improvement
  • Simple joins (Q3) still benefit significantly but have lower relative improvement

Shallow Cache Performance Results

| Query | Direct to PostgreSQL | Via ReadySet (shallow cached) | Improvement |
| --- | --- | --- | --- |
| Q1: Revenue by Category | 19.4 ms | 3.9 ms | 5.0x faster |
| Q2: Top Customers | 14.6 ms | 4.0 ms | 3.6x faster |
| Q3: Order Details | 16.4 ms | 4.2 ms | 3.9x faster |

When to Use Each Cache Mode

Tip: To verify your caches are working, see Step 4: Verify Queries Are Being Cached in the Creating Cached Queries section.

| Consideration | Deep Cache | Shallow Cache |
| --- | --- | --- |
| Data freshness | Real-time (milliseconds) | Eventual (TTL-based) |
| Setup complexity | Requires publication + replication | Simpler setup |
| Best for | OLTP, user-facing apps | Analytics, dashboards, reports |
| Latency reduction | 3-12x improvement | 3-5x improvement |
| Upstream load | Minimal (only replication stream) | Periodic refresh queries |

Latency Implications for Production

  1. User Experience: Reducing query latency from 15-20ms to 1-5ms dramatically improves perceived application responsiveness, especially for:

    • Dashboard loading times
    • Autocomplete/search-as-you-type features
    • Real-time analytics displays
  2. Throughput: Lower latency means each connection can handle more queries per second, improving overall system capacity

  3. Cost Optimization: Fewer queries hitting the upstream database can reduce compute costs

  4. Consistency Trade-offs:

    • Deep cache: Always consistent with upstream (via replication)
    • Shallow cache: May serve stale data within TTL window—acceptable for many read-heavy workloads

Comparing Query Execution Plans

To understand query execution behavior, let's compare EXPLAIN ANALYZE output when executed directly on PostgreSQL vs. through ReadySet.

Understanding the Timing Discrepancy: The EXPLAIN ANALYZE output below shows database-side execution time (0.1-0.15 ms), while the performance tables earlier show total query time (14-20 ms). The difference is network round-trip latency: the time for your query to travel to the database server and for the results to return. Taking Q2 as a worked example, only ~0.15 ms of its ~14.6 ms total is execution, so roughly 99% of the latency is the network hop and serialization. This is why caching locally with ReadySet provides such dramatic improvements: it eliminates the network hop entirely for cached queries.


Query 1: Revenue by Category (2-table join)

PostgreSQL Direct:

                                              QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=77.67..78.17 rows=200 width=158) (actual time=0.059..0.060 rows=6.00 loops=1)
   Sort Key: (sum((p.price * (o.quantity)::numeric))) DESC
   Sort Method: quicksort  Memory: 25kB
   Buffers: shared hit=2
   ->  HashAggregate  (cost=67.53..70.03 rows=200 width=158) (actual time=0.049..0.053 rows=6.00 loops=1)
         Group Key: p.category
         ->  Hash Join  (cost=14.72..46.28 rows=1700 width=138) (actual time=0.029..0.034 rows=10.00 loops=1)
               Hash Cond: (o.product_id = p.product_id)
               ->  Seq Scan on orders o
               ->  Hash -> Seq Scan on products p
 Planning Time: 0.137 ms
 Execution Time: 0.103 ms
(20 rows)

Via ReadySet (Proxied to Upstream):

                                              QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=77.67..78.17 rows=200 width=158) (actual time=0.057..0.059 rows=6.00 loops=1)
   Sort Key: (sum((p.price * (o.quantity)::numeric))) DESC
   Sort Method: quicksort  Memory: 25kB
   Buffers: shared hit=2
   ->  HashAggregate  (cost=67.53..70.03 rows=200 width=158) (actual time=0.048..0.052 rows=6.00 loops=1)
         Group Key: p.category
         ->  Hash Join  (cost=14.72..46.28 rows=1700 width=138) (actual time=0.028..0.032 rows=10.00 loops=1)
               Hash Cond: (o.product_id = p.product_id)
               ->  Seq Scan on orders o
               ->  Hash -> Seq Scan on products p
 Planning Time: 0.134 ms
 Execution Time: 0.102 ms
(20 rows)

Query 2: Top Customers (3-table join with aggregation)

PostgreSQL Direct:

                                              QUERY PLAN                                                                    
--------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=92.56..92.57 rows=3 width=276) (actual time=0.090..0.092 rows=3.00 loops=1)
   Buffers: shared hit=3
   ->  Sort  (cost=92.56..92.88 rows=130 width=276) (actual time=0.089..0.091 rows=3.00 loops=1)
         Sort Key: (sum((p.price * (o.quantity)::numeric))) DESC
         Sort Method: top-N heapsort  Memory: 25kB
         ->  HashAggregate  (cost=89.25..90.88 rows=130 width=276) (actual time=0.077..0.082 rows=8.00 loops=1)
               Group Key: c.first_name, c.last_name
               ->  Hash Join  (cost=27.65..63.75 rows=1700 width=260) (actual time=0.054..0.061 rows=10.00 loops=1)
                     Hash Cond: (o.product_id = p.product_id)
                     ->  Hash Join  (cost=12.93..44.48 rows=1700 width=248) (actual time=0.028..0.032 rows=10.00 loops=1)
                           Hash Cond: (o.customer_id = c.customer_id)
                           ->  Seq Scan on orders o
                           ->  Hash -> Seq Scan on customers c
                     ->  Hash -> Seq Scan on products p
 Planning Time: 0.189 ms
 Execution Time: 0.149 ms
(30 rows)

Via ReadySet (Proxied to Upstream):

                                              QUERY PLAN                                                                    
--------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=92.56..92.57 rows=3 width=276) (actual time=0.079..0.082 rows=3.00 loops=1)
   Buffers: shared hit=3
   ->  Sort  (cost=92.56..92.88 rows=130 width=276) (actual time=0.078..0.080 rows=3.00 loops=1)
         Sort Key: (sum((p.price * (o.quantity)::numeric))) DESC
         Sort Method: top-N heapsort  Memory: 25kB
         ->  HashAggregate  (cost=89.25..90.88 rows=130 width=276) (actual time=0.067..0.072 rows=8.00 loops=1)
               Group Key: c.first_name, c.last_name
               ->  Hash Join  (cost=27.65..63.75 rows=1700 width=260) (actual time=0.043..0.050 rows=10.00 loops=1)
                     Hash Cond: (o.product_id = p.product_id)
                     ->  Hash Join  (cost=12.93..44.48 rows=1700 width=248) (actual time=0.021..0.026 rows=10.00 loops=1)
                           Hash Cond: (o.customer_id = c.customer_id)
                           ->  Seq Scan on orders o
                           ->  Hash -> Seq Scan on customers c
                     ->  Hash -> Seq Scan on products p
 Planning Time: 0.187 ms
 Execution Time: 0.140 ms
(30 rows)

Execution Time Comparison Summary

| Query | Metric | PostgreSQL Direct | Via ReadySet | Difference |
| --- | --- | --- | --- | --- |
| Q1 | Planning Time | 0.14 ms | 0.13 ms | ~2% faster |
| Q1 | Execution Time | 0.10 ms | 0.10 ms | ~1% faster |
| Q2 | Planning Time | 0.19 ms | 0.19 ms | ~1% faster |
| Q2 | Execution Time | 0.15 ms | 0.14 ms | ~6% faster |

Key Insights

  1. Same Query Plan Structure: Both executions use identical query plans (Limit → Sort → HashAggregate → Hash Joins → Seq Scans). ReadySet doesn't modify how the upstream database executes queries.

  2. Marginal Difference on Proxied Queries: Even for uncached queries that pass through to the upstream database, database-side timings differ only slightly (~6% on Q2's execution time), likely within measurement noise. The proxy itself adds no meaningful database-side cost.

  3. The Real Win is Caching: The EXPLAIN output above shows uncached query behavior. For cached queries, ReadySet bypasses this entire execution pipeline and serves pre-computed results directly from memory—reducing latency from 14-20ms to 1-5ms (a 3-12x improvement).

Note: When a query is cached, ReadySet performs a simple key-value lookup in its in-memory dataflow graph rather than executing a traditional query plan against the database.
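This lookup model is also why parameterized queries cache especially well: each distinct value bound to a placeholder becomes its own cache key. A hypothetical sketch (orders_by_customer is an illustrative name):

-- Each customer_id value is cached and maintained independently
CREATE DEEP CACHE ALWAYS orders_by_customer FROM
SELECT o.order_id, o.order_date, o.quantity
FROM orders o
WHERE o.customer_id = $1;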




Troubleshooting


String Concatenation Error in Deep Cache

Problem: Using || for string concatenation in deep cache mode fails:

-- This FAILS in deep cache mode
CREATE DEEP CACHE ALWAYS q3 FROM SELECT 
    c.first_name || ' ' || c.last_name AS customer_name
    ...

Error:

ERROR: unknown: Error during RPC (extend_recipe): SQL SELECT query 'q3' couldn't be added: 
MIR node '20' couldn't be lowered to dataflow: The provided query is invalid: 
cannot invoke '||' on left-side operand type VarChar(50); expected JSONB

Solution: Use the CONCAT() function instead:

-- This WORKS in deep cache mode
CREATE DEEP CACHE ALWAYS q3 FROM SELECT 
    CONCAT(c.first_name, ' ', c.last_name) AS customer_name
    ...

Common Issues

| Issue | Solution |
| --- | --- |
| Cannot connect to ReadySet | Ensure port 5433 is not blocked and the container is running (docker ps) |
| Publication doesn't exist | Manually create the publication with CREATE PUBLICATION readyset FOR TABLE ... |
| SSL verification errors | Use the --disable-upstream-ssl-verification flag |
| Cache not updating | For deep cache, verify the publication includes all required tables |



Useful Commands


View Cached Queries

SHOW CACHES;

View Query Status

SHOW PROXIED QUERIES;

Drop a Cache

DROP CACHE q1;

Check ReadySet Status

SHOW READYSET STATUS;

Monitor Docker Logs

docker logs readyset --follow

Stop ReadySet

docker stop readyset

Remove Cache Volume (reset all caches)

docker volume rm readyset_cache



Cleanup

To completely remove ReadySet and start fresh:


Step 1: Stop the ReadySet Container

docker stop readyset

Note: If you used the --rm flag when starting the container, it will be automatically removed when stopped.


Step 2: Remove the Docker Volume

docker volume rm readyset_cache

Step 3: Drop the Publication (Deep Cache Only)

If you created a publication for deep cache mode, remove it from your PostgreSQL database:

DROP PUBLICATION IF EXISTS readyset;

Step 4: Verify Cleanup

Confirm everything is removed:

# Check no readyset container exists
docker ps -a | grep readyset

# Check volume is removed
docker volume ls | grep readyset_cache



Acknowledgments

Thanks to the ReadySet team for their support and guidance in getting this tutorial together—their responsiveness and deep product knowledge made the process smooth.



