Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Select an option

  • Save jacoblyles/2ce474e030dd2fb40dd2fc91ffbda894 to your computer and use it in GitHub Desktop.

Select an option

Save jacoblyles/2ce474e030dd2fb40dd2fc91ffbda894 to your computer and use it in GitHub Desktop.
<!DOCTYPE html>
<html lang="en" data-theme="dark">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Cheetah Center Data Sync &mdash; Design Document</title>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/@picocss/pico@2/css/pico.min.css">
<script type="module">
import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@11/dist/mermaid.esm.min.mjs';
mermaid.initialize({ startOnLoad: true, theme: 'dark', securityLevel: 'loose' });
</script>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/prismjs@1.29.0/themes/prism-tomorrow.min.css">
<script src="https://cdn.jsdelivr.net/npm/prismjs@1.29.0/prism.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/prismjs@1.29.0/components/prism-python.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/prismjs@1.29.0/components/prism-typescript.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/prismjs@1.29.0/components/prism-sql.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/prismjs@1.29.0/components/prism-bash.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/prismjs@1.29.0/components/prism-hcl.min.js"></script>
<style>
:root {
--gap: clamp(0.8rem, 2vw, 1.5rem);
}
.field-table td:nth-child(2),
.field-table td:nth-child(3),
.field-table td:nth-child(4) { text-align: center; }
.field-table .yes { color: #3fb950; }
.field-table .no { color: #6e7681; }
.field-table .new { color: #d29922; font-weight: 600; }
.badge {
display: inline-block; padding: 0.15em 0.5em; border-radius: 4px;
font-size: 0.75em; font-weight: 600; vertical-align: middle;
}
.badge-green { background: #23863633; color: #3fb950; border: 1px solid #3fb95044; }
.badge-yellow { background: #d2992233; color: #d29922; border: 1px solid #d2992244; }
.badge-blue { background: #58a6ff33; color: #58a6ff; border: 1px solid #58a6ff44; }
.badge-red { background: #f8514933; color: #f85149; border: 1px solid #f8514944; }
summary { cursor: pointer; font-weight: 600; }
.mermaid { background: transparent !important; }
.decision-box {
border-left: 3px solid #58a6ff; padding-left: 1rem; margin: 1rem 0;
}
.callout {
background: #161b2233; border: 1px solid #30363d;
border-radius: 8px; padding: 1rem 1.2rem; margin: 1rem 0;
}
.callout-warn {
border-color: #d29922; background: #d2992210;
}
h3 { margin-top: 2rem; }
article { margin-bottom: var(--gap); }
</style>
</head>
<body>
<main class="container">
<hgroup>
<h1>Cheetah Center Data Sync</h1>
<p>Design document &mdash; Ingesting legacy Bellum API center data into Connected Play</p>
</hgroup>
<hr>
<!-- ───────────────── PROBLEM ───────────────── -->
<section>
<h2>Problem</h2>
<p>The legacy Bellum API (<code>cheetah-api-proxy.bellum.ai</code>) exposes rich center data&mdash;management hierarchy, operating hours, maintenance windows, capacity, booking config, and more. Our system captures only a thin slice (name, location, contact) seeded once from Apollo and <strong>never refreshed</strong>.</p>
<div class="callout">
<strong>Current state:</strong> 382 centers in Postgres, seeded from Apollo. No ongoing sync. Snowflake <code>DIM_CENTER</code> exists but has no DDL or Dagster asset backing it. The <code>LUCKYSTRIKE_RAW.CHEETAH</code> schema exists in Terraform but is empty.
</div>
</section>
<!-- ───────────────── FIELD GAP ───────────────── -->
<section>
<h2>Field Gap Analysis</h2>
<p>The API returns ~45 fields per center. We currently store 12. Below is the complete mapping.</p>
<details open>
<summary>Identity &amp; Core Fields</summary>
<table class="field-table">
<thead><tr><th>API Field</th><th>Postgres</th><th>Snowflake</th><th>Action</th></tr></thead>
<tbody>
<tr><td><code>centerNumber</code></td><td class="yes">center_id</td><td class="yes">CENTER_ID</td><td><span class="badge badge-green">Exists</span></td></tr>
<tr><td><code>id</code> <small>(ULID)</small></td><td class="no">&mdash;</td><td class="no">&mdash;</td><td><span class="badge badge-yellow">Add</span></td></tr>
<tr><td><code>externalId</code> <small>(UUID)</small></td><td class="no">&mdash;</td><td class="no">&mdash;</td><td><span class="badge badge-yellow">Add</span></td></tr>
<tr><td><code>intercardCenterNumber</code></td><td class="no">&mdash;</td><td class="no">&mdash;</td><td><span class="badge badge-yellow">Add</span></td></tr>
<tr><td><code>centerName</code></td><td class="yes">name</td><td class="yes">NAME</td><td><span class="badge badge-green">Exists</span></td></tr>
<tr><td><code>brand</code></td><td class="yes">brand</td><td class="no">&mdash;</td><td><span class="badge badge-green">Exists</span></td></tr>
<tr><td><code>status</code> <small>(open/closed)</small></td><td class="no">&mdash;</td><td class="no">&mdash;</td><td><span class="badge badge-yellow">Add</span></td></tr>
<tr><td><code>archived</code></td><td class="no"><small>soft delete</small></td><td class="no">&mdash;</td><td><span class="badge badge-yellow">Add</span></td></tr>
<tr><td><code>numLanes</code></td><td class="no"><small>derived</small></td><td class="no">&mdash;</td><td><span class="badge badge-yellow">Add</span></td></tr>
<tr><td><code>maxCapacity</code></td><td class="no">&mdash;</td><td class="no">&mdash;</td><td><span class="badge badge-yellow">Add</span></td></tr>
</tbody>
</table>
</details>
<details>
<summary>Location &amp; Contact</summary>
<table class="field-table">
<thead><tr><th>API Field</th><th>Postgres</th><th>Snowflake</th><th>Action</th></tr></thead>
<tbody>
<tr><td><code>latitude</code> / <code>longitude</code></td><td class="yes">lat/lng</td><td class="no">&mdash;</td><td><span class="badge badge-green">Exists</span></td></tr>
<tr><td><code>streetAddress</code></td><td class="yes">address</td><td class="no">&mdash;</td><td><span class="badge badge-green">Exists</span></td></tr>
<tr><td><code>city</code> / <code>state</code></td><td class="yes">city/state</td><td class="no">&mdash;</td><td><span class="badge badge-green">Exists</span></td></tr>
<tr><td><code>zipCode</code> / <code>zip4</code></td><td class="yes">postal_code</td><td class="no">&mdash;</td><td><span class="badge badge-green">Exists</span> <small>+ zip4</small></td></tr>
<tr><td><code>country</code></td><td class="no">&mdash;</td><td class="no">&mdash;</td><td><span class="badge badge-yellow">Add</span></td></tr>
<tr><td><code>timezone</code></td><td class="yes">timezone</td><td class="yes">TIMEZONE</td><td><span class="badge badge-green">Exists</span></td></tr>
<tr><td><code>phone</code></td><td class="yes">phone</td><td class="no">&mdash;</td><td><span class="badge badge-green">Exists</span></td></tr>
<tr><td><code>centerEmail</code></td><td class="no">&mdash;</td><td class="no">&mdash;</td><td><span class="badge badge-yellow">Add</span></td></tr>
<tr><td><code>managerEmail</code></td><td class="no">&mdash;</td><td class="no">&mdash;</td><td><span class="badge badge-yellow">Add</span></td></tr>
<tr><td><code>websiteUrl</code></td><td class="no">&mdash;</td><td class="no">&mdash;</td><td><span class="badge badge-yellow">Add</span></td></tr>
</tbody>
</table>
</details>
<details>
<summary>Management &amp; Org Hierarchy</summary>
<table class="field-table">
<thead><tr><th>API Field</th><th>Postgres</th><th>Snowflake</th><th>Action</th></tr></thead>
<tbody>
<tr><td><code>centerManager</code></td><td class="no">&mdash;</td><td class="no">&mdash;</td><td><span class="badge badge-yellow">Add</span></td></tr>
<tr><td><code>kitchenManager</code></td><td class="no">&mdash;</td><td class="no">&mdash;</td><td><span class="badge badge-yellow">Add</span></td></tr>
<tr><td><code>eventManager</code></td><td class="no">&mdash;</td><td class="no">&mdash;</td><td><span class="badge badge-yellow">Add</span></td></tr>
<tr><td><code>areaManager</code></td><td class="no">&mdash;</td><td class="no">&mdash;</td><td><span class="badge badge-yellow">Add</span></td></tr>
<tr><td><code>districtManager</code></td><td class="no">&mdash;</td><td class="no">&mdash;</td><td><span class="badge badge-yellow">Add</span></td></tr>
<tr><td><code>directorOfSales</code></td><td class="no">&mdash;</td><td class="no">&mdash;</td><td><span class="badge badge-yellow">Add</span></td></tr>
<tr><td><code>regionalVicePresident</code></td><td class="no">&mdash;</td><td class="no">&mdash;</td><td><span class="badge badge-yellow">Add</span></td></tr>
<tr><td><code>district</code></td><td class="no">&mdash;</td><td class="no">&mdash;</td><td><span class="badge badge-yellow">Add</span></td></tr>
<tr><td><code>region</code></td><td class="no">&mdash;</td><td class="no">&mdash;</td><td><span class="badge badge-yellow">Add</span></td></tr>
<tr><td><code>salesRegion</code> / <code>salesTerritory</code></td><td class="no">&mdash;</td><td class="no">&mdash;</td><td><span class="badge badge-yellow">Add</span></td></tr>
</tbody>
</table>
</details>
<details>
<summary>Operations &amp; Scheduling (nested objects)</summary>
<table class="field-table">
<thead><tr><th>API Field</th><th>Type</th><th>Notes</th><th>Action</th></tr></thead>
<tbody>
<tr><td><code>maintenanceTimeStart</code> / <code>End</code></td><td>string</td><td>Daily maintenance window</td><td><span class="badge badge-yellow">Add</span></td></tr>
<tr><td><code>timetable</code></td><td>array</td><td>Weekly open/close hours per day</td><td><span class="badge badge-yellow">Add</span></td></tr>
<tr><td><code>specialHours</code></td><td>array</td><td>Holiday/override hours</td><td><span class="badge badge-yellow">Add</span></td></tr>
<tr><td><code>primeTimes</code></td><td>array</td><td>Peak pricing windows</td><td><span class="badge badge-yellow">Add</span></td></tr>
<tr><td><code>schedules</code></td><td>array</td><td>Scheduled activities</td><td><span class="badge badge-yellow">Add</span></td></tr>
<tr><td><code>eventTypeRoutingRules</code></td><td>array</td><td>Booking event routing config</td><td><span class="badge badge-yellow">Add</span></td></tr>
<tr><td><code>defaultResourceType</code></td><td>string</td><td>Booking resource type</td><td><span class="badge badge-yellow">Add</span></td></tr>
<tr><td><code>vendor</code></td><td>string</td><td></td><td><span class="badge badge-yellow">Add</span></td></tr>
<tr><td><code>tags</code></td><td>array|null</td><td>Metadata tags</td><td><span class="badge badge-yellow">Add</span></td></tr>
</tbody>
</table>
</details>
</section>
<!-- ───────────────── UPDATE FREQUENCY ───────────────── -->
<section>
<h2>Update Frequency</h2>
<p>The API's <code>updatedAt</code> field shows <code>2026-02-10 05:00:02 UTC</code> (today), suggesting a <strong>nightly batch update</strong>. Fields like management assignments, schedules, and center status are the most likely to change. A <strong>once-daily sync</strong> is sufficient, but this cadence should be treated as an assumption and validated over several days of run history.</p>
</section>
<!-- ───────────────── ARCHITECTURE ───────────────── -->
<section>
<h2>Proposed Architecture</h2>
<div class="callout">
<strong>Design principle:</strong> Dagster stays focused on Snowflake transforms (its existing domain). A separate TypeScript service handles the RDS Postgres sync, reusing the existing Kysely types and database tooling from <code>@connected-play/pg</code>. Both services fetch from the same API independently&mdash;no coupling.
</div>
<h3>Production</h3>
<pre class="mermaid">
flowchart LR
API["Cheetah API
/api/centers"]
subgraph CheetahSync ["cheetah-sync (ECS scheduled task)"]
TS["TypeScript
Kysely upsert"]
end
subgraph Dagster ["Dagster (daily @ 3 AM)"]
FETCH["cheetah_centers
asset"]
DIM["dim_center
asset"]
end
subgraph Snowflake
RAW["RAW.CHEETAH.CENTER
(full JSON as VARIANT)"]
REPORT["REPORT.DIM_CENTER
(structured dimension)"]
end
subgraph RDS ["Postgres (RDS)"]
CT["center table
(expanded)"]
end
API --> TS
TS -->|"upsert all fields"| CT
API --> FETCH
FETCH --> RAW
RAW --> DIM --> REPORT
</pre>
<h3>Local Development</h3>
<pre class="mermaid">
flowchart LR
API["Cheetah API
/api/centers"]
subgraph LocalStack ["localhost"]
direction TB
TS["npx tsx
apps/cheetah-sync/src/index.ts"]
PG["Postgres :5433"]
end
API --> TS
TS -->|"upsert all fields"| PG
</pre>
<div class="callout">
<strong>Why separate services?</strong>
<ul style="margin-bottom:0">
<li>Dagster currently has <strong>no RDS connection</strong>&mdash;adding one means a new resource class, <code>psycopg2</code>, RDS env vars in ECS, security group changes</li>
<li>The TS service gets Kysely types, <code>@connected-play/pg</code>, and the existing database patterns <strong>for free</strong></li>
<li>Matches the existing pattern: <code>pg-sync</code> writes events to RDS, Dagster transforms data in Snowflake</li>
<li>Simpler to test locally&mdash;just <code>npx tsx</code> against local Postgres, no Dagster needed</li>
</ul>
</div>
</section>
<!-- ───────────────── DATA MODELING ───────────────── -->
<section>
<h2>Data Modeling</h2>
<h3>Snowflake: Raw Landing</h3>
<p>Full API response as VARIANT in <code>LUCKYSTRIKE_RAW.CHEETAH.CENTER</code>. One row per center, full-replace daily.</p>
<pre><code class="language-sql">CREATE OR REPLACE TABLE LUCKYSTRIKE_RAW.CHEETAH.CENTER (
CENTER_NUMBER INTEGER NOT NULL, -- centerNumber from API
RAW_DATA VARIANT NOT NULL, -- full JSON object
SYNCED_AT TIMESTAMP_TZ DEFAULT CURRENT_TIMESTAMP(),
API_UPDATED_AT TIMESTAMP_TZ, -- updatedAt from API
PRIMARY KEY (CENTER_NUMBER)
);</code></pre>
<h3>Snowflake: DIM_CENTER (Dagster asset)</h3>
<p>Structured dimension table in <code>LUCKYSTRIKE_ANALYSIS.CONNECTED_PLAY.DIM_CENTER</code>, built from raw + existing Postgres data.</p>
<pre><code class="language-sql">CREATE OR REPLACE TABLE LUCKYSTRIKE_ANALYSIS.CONNECTED_PLAY.DIM_CENTER (
CENTER_NUMBER INTEGER PRIMARY KEY,
CHEETAH_ID VARCHAR, -- ULID from API
EXTERNAL_ID VARCHAR, -- UUID from API
QUBICA_ID INTEGER, -- from Postgres center table
NAME VARCHAR NOT NULL,
BRAND VARCHAR,
STATUS VARCHAR, -- open/closed
ARCHIVED BOOLEAN,
NUM_LANES INTEGER,
MAX_CAPACITY INTEGER,
-- Location
LATITUDE NUMBER(9,6),
LONGITUDE NUMBER(9,6),
STREET_ADDRESS VARCHAR,
CITY VARCHAR,
STATE VARCHAR,
ZIP_CODE VARCHAR,
ZIP4 VARCHAR,
COUNTRY VARCHAR,
TIMEZONE VARCHAR,
-- Contact
PHONE VARCHAR,
CENTER_EMAIL VARCHAR,
MANAGER_EMAIL VARCHAR,
WEBSITE_URL VARCHAR,
-- Management
CENTER_MANAGER VARCHAR,
KITCHEN_MANAGER VARCHAR,
EVENT_MANAGER VARCHAR,
AREA_MANAGER VARCHAR,
DISTRICT_MANAGER VARCHAR,
DIRECTOR_OF_SALES VARCHAR,
REGIONAL_VP VARCHAR,
-- Org hierarchy
DISTRICT INTEGER,
REGION INTEGER,
SALES_REGION INTEGER,
SALES_TERRITORY INTEGER,
INTERCARD_CENTER_NUMBER INTEGER,
-- Operations
MAINTENANCE_TIME_START VARCHAR,
MAINTENANCE_TIME_END VARCHAR,
DEFAULT_RESOURCE_TYPE VARCHAR,
VENDOR VARCHAR,
-- Nested (VARIANT)
TIMETABLE VARIANT,
SPECIAL_HOURS VARIANT,
PRIME_TIMES VARIANT,
SCHEDULES VARIANT,
EVENT_TYPE_ROUTING_RULES VARIANT,
TAGS VARIANT,
-- Metadata
API_UPDATED_AT TIMESTAMP_TZ,
SYNCED_AT TIMESTAMP_TZ,
REC_INS_TS TIMESTAMP_TZ DEFAULT CURRENT_TIMESTAMP()
);</code></pre>
<h3>Postgres: Expanded center table</h3>
<p>Add columns to the existing <code>center</code> table for operationally useful fields. Use JSONB for nested objects.</p>
<pre><code class="language-sql">-- Migration: expand_center_table
ALTER TABLE center ADD COLUMN cheetah_id varchar(30); -- ULID
ALTER TABLE center ADD COLUMN external_id uuid;
ALTER TABLE center ADD COLUMN intercard_center_number integer;
ALTER TABLE center ADD COLUMN status varchar(20); -- open/closed
ALTER TABLE center ADD COLUMN archived boolean DEFAULT false;
ALTER TABLE center ADD COLUMN num_lanes integer;
ALTER TABLE center ADD COLUMN max_capacity integer;
ALTER TABLE center ADD COLUMN country varchar(100);
ALTER TABLE center ADD COLUMN zip4 varchar(10);
ALTER TABLE center ADD COLUMN center_email varchar(200);
ALTER TABLE center ADD COLUMN manager_email varchar(200);
ALTER TABLE center ADD COLUMN website_url varchar(500);
-- Management
ALTER TABLE center ADD COLUMN center_manager varchar(200);
ALTER TABLE center ADD COLUMN kitchen_manager varchar(200);
ALTER TABLE center ADD COLUMN event_manager varchar(200);
ALTER TABLE center ADD COLUMN area_manager varchar(200);
ALTER TABLE center ADD COLUMN district_manager varchar(200);
ALTER TABLE center ADD COLUMN director_of_sales varchar(200);
ALTER TABLE center ADD COLUMN regional_vp varchar(200);
-- Org hierarchy
ALTER TABLE center ADD COLUMN district integer;
ALTER TABLE center ADD COLUMN region integer;
ALTER TABLE center ADD COLUMN sales_region integer;
ALTER TABLE center ADD COLUMN sales_territory integer;
-- Operations
ALTER TABLE center ADD COLUMN maintenance_time_start varchar(20);
ALTER TABLE center ADD COLUMN maintenance_time_end varchar(20);
ALTER TABLE center ADD COLUMN default_resource_type varchar(50);
ALTER TABLE center ADD COLUMN vendor varchar(200);
-- Nested as JSONB
ALTER TABLE center ADD COLUMN timetable jsonb;
ALTER TABLE center ADD COLUMN special_hours jsonb;
ALTER TABLE center ADD COLUMN prime_times jsonb;
ALTER TABLE center ADD COLUMN schedules jsonb;
ALTER TABLE center ADD COLUMN event_type_routing_rules jsonb;
ALTER TABLE center ADD COLUMN tags jsonb;
-- Sync metadata
ALTER TABLE center ADD COLUMN cheetah_synced_at timestamptz;
ALTER TABLE center ADD COLUMN cheetah_updated_at timestamptz;</code></pre>
</section>
<!-- ───────────────── JOIN KEY ───────────────── -->
<section>
<h2>Join Key: How Systems Connect</h2>
<pre class="mermaid">
flowchart LR
CHEETAH["Cheetah API
centerNumber: 13"] -->|"="| PG["Postgres
center_id: 13"]
PG -->|"="| SF["Snowflake
CENTER_NUMBER: 13"]
QUBICA["Qubica EventHub
center-10580"] -->|maps to| PG
PG -->|"qubica_id: 10580"| SF
</pre>
<div class="callout callout-warn">
<strong>Reconciliation needed:</strong> We have 382 centers in Postgres. The API may have a different count. 46 centers have no <code>qubica_id</code>&mdash;these are real bowling centers that simply don't have Qubica scoring enabled, so we don't receive real-time event data from them. The sync should handle centers that exist in one system but not the other gracefully (upsert, don't delete).
</div>
</section>
<!-- ───────────────── IMPLEMENTATION ───────────────── -->
<section>
<h2>Implementation Plan</h2>
<h3>Phase 1: Postgres Schema + cheetah-sync Service</h3>
<p><span class="badge badge-blue">RDS path</span> Testable locally without Dagster or Snowflake.</p>
<ol>
<li><strong>Postgres migration</strong> &mdash; Add Cheetah columns + indexes for <code>cheetah_id</code>, <code>external_id</code>, and <code>cheetah_updated_at</code></li>
<li><strong><code>apps/cheetah-sync</code></strong> &mdash; TypeScript service using Kysely + <code>@connected-play/pg</code>, with schema validation and pagination support</li>
<li><strong>Atomic write path</strong> &mdash; Load API response into a staging table, then upsert into <code>center</code> in one transaction</li>
<li><strong>Run metadata</strong> &mdash; Add <code>cheetah_sync_run</code> table (run_id, row counts, warnings, status, error) for observability and audits</li>
<li><strong>Retry policy</strong> &mdash; Add bounded retries with exponential backoff + jitter for API and transient DB errors</li>
<li><strong>Test locally</strong> &mdash; <code>npx tsx apps/cheetah-sync/src/index.ts</code> against <code>localhost:5433</code></li>
</ol>
<pre><code class="language-typescript">// apps/cheetah-sync/src/index.ts (sketch)
import { createDatabase } from "@connected-play/pg";
const API_URL = process.env.CHEETAH_API_URL
?? "https://cheetah-api-proxy.bellum.ai/api/centers";
async function main() {
const db = createDatabase();
const headers: Record&lt;string, string&gt; = {};
if (process.env.CHEETAH_API_KEY) {
headers["x-api-key"] = process.env.CHEETAH_API_KEY;
}
const centers = await fetchAllPagesWithRetry(API_URL, headers);
const validCenters = validateCentersPayload(centers); // fail fast on schema drift
console.log(`Fetched ${validCenters.length} centers from Cheetah API`);
const run = await startSyncRun(db, validCenters.length);
const { inserted, updated, warnings } = await db.transaction().execute(async (trx) =&gt; {
await loadCenterStaging(trx, validCenters); // batched insert, not row-by-row
const merge = await mergeStagingIntoCenter(trx); // single atomic upsert
await markMissingCenters(trx); // sets cheetah_missing_since after N misses
return merge;
});
await finishSyncRun(db, run.runId, { inserted, updated, warnings, status: "success" });
console.log(`Done: ${inserted} created, ${updated} updated`);
await db.destroy();
}</code></pre>
<h3>Phase 2: Dagster &rarr; Snowflake</h3>
<p><span class="badge badge-blue">Analytics path</span> Dagster stays purely Snowflake&mdash;no RDS connection.</p>
<ol>
<li><strong>Snowflake DDLs</strong> &mdash; <code>RAW.CHEETAH.CENTER</code> + <code>REPORT.DIM_CENTER</code></li>
<li><strong><code>cheetah_centers</code> asset</strong> &mdash; Fetch API, load staging table, then <code>SWAP</code> (or <code>MERGE</code>) into Snowflake RAW atomically</li>
<li><strong><code>dim_center</code> asset</strong> &mdash; MERGE RAW &rarr; REPORT</li>
<li>Add <code>requests</code> to Dagster <code>requirements.txt</code></li>
<li>Wire assets into existing <code>daily_transformations</code> job</li>
</ol>
<pre><code class="language-python"># dagster_code/assets/cheetah_centers.py (sketch)
@asset(
key=["CONNECTED_PLAY", "RAW", "cheetah_centers"],
group_name="snowflake_transformations",
)
def cheetah_centers(context, snowflake: SnowflakeResource):
headers = {}
api_key = os.environ.get("CHEETAH_API_KEY")
if api_key:
headers["x-api-key"] = api_key
resp = requests.get(CHEETAH_API_URL, headers=headers, timeout=30)
centers = resp.json()
context.log.info(f"Fetched {len(centers)} centers from Cheetah API")
# Snowflake only: staging load + atomic swap to avoid partial/empty table windows
sf_conn = snowflake.get_connection()
create_and_load_staging(sf_conn, centers)
sf_conn.cursor().execute("ALTER TABLE ... SWAP WITH ...")
return Output(None, metadata={"centers_synced": len(centers)})</code></pre>
<h3>Phase 3: Production Deployment</h3>
<ol>
<li><strong>cheetah-sync ECS infrastructure</strong> &mdash; Terraform for ECR, scheduled Fargate task (CloudWatch Events rule, daily)</li>
<li><strong>Dagster schedule</strong> &mdash; <code>cheetah_centers</code> runs as part of existing 3 AM daily job</li>
<li><strong>Data quality checks</strong> &mdash; Row count anomaly detection, zero-row failure, and max <code>api_updated_at</code> freshness checks</li>
<li><strong>Cross-system drift checks</strong> &mdash; Compare row count and max <code>api_updated_at</code> between Postgres sync and Snowflake asset runs</li>
<li><strong>Safe rollout</strong> &mdash; Dry-run diff in prod, then one canary scheduled run before enabling daily schedule broadly</li>
<li><strong>DIM_CENTER wiring</strong> &mdash; Replace hardcoded DIM_CENTER references in <code>center_status_interval</code> and <code>fact_game_bowler</code></li>
</ol>
</section>
<!-- ───────────────── OPERATIONAL GUARDRAILS ───────────────── -->
<section>
<h2>Operational Guardrails</h2>
<ul>
<li><strong>Idempotency + atomicity</strong> &mdash; No partial writes on failure; rerunning the same payload produces the same final state</li>
<li><strong>Run auditability</strong> &mdash; Every run is persisted with inputs, counts, warnings, and error details</li>
<li><strong>Schema drift detection</strong> &mdash; Unknown/new fields are surfaced in logs/metrics before silently dropping data</li>
<li><strong>Missing-center policy</strong> &mdash; Do not hard-delete; mark as stale with <code>cheetah_missing_since</code> after repeated misses</li>
<li><strong>API evolution readiness</strong> &mdash; Pagination loop + retry/backoff built in from day one</li>
</ul>
</section>
<!-- ───────────────── LOCAL DEV ───────────────── -->
<section>
<h2>Local Development Setup</h2>
<h3>Testing cheetah-sync (Postgres path)</h3>
<pre><code class="language-bash"># Start local Postgres
docker compose up -d postgres
# Run the migration
cd apps/pg && pnpm migrate:up && cd ../..
# Run the sync
npx tsx apps/cheetah-sync/src/index.ts
# Verify
psql -p 5433 -U connected_play -d connected_play \
-c "SELECT center_id, name, status, num_lanes, cheetah_synced_at
FROM center WHERE cheetah_synced_at IS NOT NULL LIMIT 5"</code></pre>
<h3>Testing Dagster assets (Snowflake path)</h3>
<pre><code class="language-bash"># Start Dagster stack (requires Snowflake credentials in env)
docker compose --profile dagster up -d
# Visit http://localhost:3070
# Materialize "cheetah_centers" then "dim_center"</code></pre>
<div class="callout">
The two paths are independent. You can test the Postgres sync without Dagster, and vice versa.
</div>
</section>
<!-- ───────────────── OPEN QUESTIONS ───────────────── -->
<section>
<h2>Open Questions</h2>
<div class="decision-box">
<p><strong>1. API Authentication</strong> <span class="badge badge-green">Resolved</span></p>
<p>The sync supports an optional <code>CHEETAH_API_KEY</code> env var, sent as an <code>x-api-key</code> header. No auth is required today, but the plumbing is in place for when it's added.</p>
</div>
<div class="decision-box">
<p><strong>2. Centers missing from the API</strong></p>
<p>If a center exists in Postgres but not in the API response, we do not delete it. We log a warning and track staleness with <code>cheetah_missing_since</code>; optional archival can be applied only after a configured miss threshold.</p>
</div>
<div class="decision-box">
<p><strong>3. New centers in the API</strong></p>
<p>If the API returns a center we don't have, we create the center row, record it in run metadata, and emit a high-visibility log for ops verification.</p>
</div>
<div class="decision-box">
<p><strong>4. Cheetah API pagination</strong> <span class="badge badge-green">Resolved</span></p>
<p>Implement a pagination loop now. Treat the current single-response behavior as incidental, not guaranteed.</p>
</div>
<div class="decision-box">
<p><strong>5. Dagster asset scheduling</strong></p>
<p>The <code>cheetah_centers</code> asset doesn't need daily partitioning (it's a full-replace dimension). It should be an <strong>unpartitioned asset</strong> that runs daily as part of the existing schedule, upstream of <code>dim_center</code>.</p>
</div>
</section>
<!-- ───────────────── FILE MAP ───────────────── -->
<section>
<h2>Files to Create/Modify</h2>
<h3>cheetah-sync Service (RDS path)</h3>
<table>
<thead><tr><th>File</th><th>Action</th><th>Description</th></tr></thead>
<tbody>
<tr><td><code>apps/pg/migrations/&lt;ts&gt;_expand_center_table.ts</code></td><td><span class="badge badge-yellow">Create</span></td><td>Add ~30 columns to center table, plus indexes and <code>cheetah_missing_since</code></td></tr>
<tr><td><code>apps/pg/migrations/&lt;ts&gt;_create_cheetah_sync_run.ts</code></td><td><span class="badge badge-yellow">Create</span></td><td>Run-audit table for row counts, warnings, status, and errors</td></tr>
<tr><td><code>apps/cheetah-sync/src/index.ts</code></td><td><span class="badge badge-yellow">Create</span></td><td>Main entry point: fetch API, upsert Postgres</td></tr>
<tr><td><code>apps/cheetah-sync/src/schema.ts</code></td><td><span class="badge badge-yellow">Create</span></td><td>Runtime payload validation and drift detection</td></tr>
<tr><td><code>apps/cheetah-sync/src/retry.ts</code></td><td><span class="badge badge-yellow">Create</span></td><td>Bounded retries with exponential backoff + jitter</td></tr>
<tr><td><code>apps/cheetah-sync/package.json</code></td><td><span class="badge badge-yellow">Create</span></td><td>Dependencies: <code>@connected-play/pg</code>, <code>dotenv</code></td></tr>
<tr><td><code>apps/cheetah-sync/Dockerfile</code></td><td><span class="badge badge-yellow">Create</span></td><td>Container for ECS scheduled task</td></tr>
<tr><td><code>apps/cheetah-sync/terraform/</code></td><td><span class="badge badge-yellow">Create</span></td><td>ECR, Fargate task def, CloudWatch Events rule</td></tr>
<tr><td><code>apps/cheetah-sync/Makefile</code></td><td><span class="badge badge-yellow">Create</span></td><td>ECR deploy targets (same pattern as other services)</td></tr>
</tbody>
</table>
<h3>Dagster Assets (Snowflake path)</h3>
<table>
<thead><tr><th>File</th><th>Action</th><th>Description</th></tr></thead>
<tbody>
<tr><td><code>apps/dagster/dagster_code/assets/cheetah_centers.py</code></td><td><span class="badge badge-yellow">Create</span></td><td>Dagster asset: API &rarr; Snowflake RAW with atomic staging/swap</td></tr>
<tr><td><code>apps/dagster/dagster_code/assets/dim_center.py</code></td><td><span class="badge badge-yellow">Create</span></td><td>Dagster asset: RAW &rarr; REPORT</td></tr>
<tr><td><code>apps/dagster/dagster_code/sql/ddl/cheetah_centers.ddl.sql</code></td><td><span class="badge badge-yellow">Create</span></td><td>RAW.CHEETAH.CENTER DDL</td></tr>
<tr><td><code>apps/dagster/dagster_code/sql/ddl/dim_center.ddl.sql</code></td><td><span class="badge badge-yellow">Create</span></td><td>REPORT.DIM_CENTER DDL</td></tr>
<tr><td><code>apps/dagster/dagster_code/sql/dim_center.jinja.sql</code></td><td><span class="badge badge-yellow">Create</span></td><td>RAW &rarr; REPORT transform SQL</td></tr>
<tr><td><code>apps/dagster/dagster_code/repository.py</code></td><td><span class="badge badge-blue">Modify</span></td><td>Register new assets</td></tr>
<tr><td><code>apps/dagster/requirements.txt</code></td><td><span class="badge badge-blue">Modify</span></td><td>Add <code>requests</code></td></tr>
</tbody>
</table>
</section>
<hr>
<footer>
<small>Connected Play &mdash; Cheetah Center Sync Design &mdash; Feb 2026</small>
</footer>
</main>
</body>
</html>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment