Last active
February 11, 2026 18:03
-
-
Save jacoblyles/2ce474e030dd2fb40dd2fc91ffbda894 to your computer and use it in GitHub Desktop.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| <!DOCTYPE html> | |
| <html lang="en" data-theme="dark"> | |
| <head> | |
| <meta charset="UTF-8"> | |
| <meta name="viewport" content="width=device-width, initial-scale=1.0"> | |
| <title>Cheetah Center Data Sync — Design Document</title> | |
| <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/@picocss/pico@2/css/pico.min.css"> | |
| <script type="module"> | |
| import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@11/dist/mermaid.esm.min.mjs'; | |
| mermaid.initialize({ startOnLoad: true, theme: 'dark', securityLevel: 'loose' }); | |
| </script> | |
| <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/prismjs@1.29.0/themes/prism-tomorrow.min.css"> | |
| <script src="https://cdn.jsdelivr.net/npm/prismjs@1.29.0/prism.min.js"></script> | |
| <script src="https://cdn.jsdelivr.net/npm/prismjs@1.29.0/components/prism-python.min.js"></script> | |
| <script src="https://cdn.jsdelivr.net/npm/prismjs@1.29.0/components/prism-typescript.min.js"></script> | |
| <script src="https://cdn.jsdelivr.net/npm/prismjs@1.29.0/components/prism-sql.min.js"></script> | |
| <script src="https://cdn.jsdelivr.net/npm/prismjs@1.29.0/components/prism-bash.min.js"></script> | |
| <script src="https://cdn.jsdelivr.net/npm/prismjs@1.29.0/components/prism-hcl.min.js"></script> | |
| <style> | |
| :root { | |
| --gap: clamp(0.8rem, 2vw, 1.5rem); | |
| } | |
| .field-table td:nth-child(2), | |
| .field-table td:nth-child(3), | |
| .field-table td:nth-child(4) { text-align: center; } | |
| .field-table .yes { color: #3fb950; } | |
| .field-table .no { color: #6e7681; } | |
| .field-table .new { color: #d29922; font-weight: 600; } | |
| .badge { | |
| display: inline-block; padding: 0.15em 0.5em; border-radius: 4px; | |
| font-size: 0.75em; font-weight: 600; vertical-align: middle; | |
| } | |
| .badge-green { background: #23863633; color: #3fb950; border: 1px solid #3fb95044; } | |
| .badge-yellow { background: #d2992233; color: #d29922; border: 1px solid #d2992244; } | |
| .badge-blue { background: #58a6ff33; color: #58a6ff; border: 1px solid #58a6ff44; } | |
| .badge-red { background: #f8514933; color: #f85149; border: 1px solid #f8514944; } | |
| summary { cursor: pointer; font-weight: 600; } | |
| .mermaid { background: transparent !important; } | |
| .decision-box { | |
| border-left: 3px solid #58a6ff; padding-left: 1rem; margin: 1rem 0; | |
| } | |
| .callout { | |
| background: #161b2233; border: 1px solid #30363d; | |
| border-radius: 8px; padding: 1rem 1.2rem; margin: 1rem 0; | |
| } | |
| .callout-warn { | |
| border-color: #d29922; background: #d2992210; | |
| } | |
| h3 { margin-top: 2rem; } | |
| article { margin-bottom: var(--gap); } | |
| </style> | |
| </head> | |
| <body> | |
| <main class="container"> | |
| <hgroup> | |
| <h1>Cheetah Center Data Sync</h1> | |
| <p>Design document — Ingesting legacy Bellum API center data into Connected Play</p> | |
| </hgroup> | |
| <hr> | |
| <!-- ───────────────── PROBLEM ───────────────── --> | |
| <section> | |
| <h2>Problem</h2> | |
| <p>The legacy Bellum API (<code>cheetah-api-proxy.bellum.ai</code>) exposes rich center data—management hierarchy, operating hours, maintenance windows, capacity, booking config, and more. Our system captures only a thin slice (name, location, contact) seeded once from Apollo and <strong>never refreshed</strong>.</p> | |
| <div class="callout"> | |
| <strong>Current state:</strong> 382 centers in Postgres, seeded from Apollo. No ongoing sync. Snowflake <code>DIM_CENTER</code> exists but has no DDL or Dagster asset backing it. The <code>LUCKYSTRIKE_RAW.CHEETAH</code> schema exists in Terraform but is empty. | |
| </div> | |
| </section> | |
| <!-- ───────────────── FIELD GAP ───────────────── --> | |
| <section> | |
| <h2>Field Gap Analysis</h2> | |
| <p>The API returns ~45 fields per center. We currently store 12. Below is the complete mapping.</p> | |
| <details open> | |
| <summary>Identity & Core Fields</summary> | |
| <table class="field-table"> | |
| <thead><tr><th>API Field</th><th>Postgres</th><th>Snowflake</th><th>Action</th></tr></thead> | |
| <tbody> | |
| <tr><td><code>centerNumber</code></td><td class="yes">center_id</td><td class="yes">CENTER_ID</td><td><span class="badge badge-green">Exists</span></td></tr> | |
| <tr><td><code>id</code> <small>(ULID)</small></td><td class="no">—</td><td class="no">—</td><td><span class="badge badge-yellow">Add</span></td></tr> | |
| <tr><td><code>externalId</code> <small>(UUID)</small></td><td class="no">—</td><td class="no">—</td><td><span class="badge badge-yellow">Add</span></td></tr> | |
| <tr><td><code>intercardCenterNumber</code></td><td class="no">—</td><td class="no">—</td><td><span class="badge badge-yellow">Add</span></td></tr> | |
| <tr><td><code>centerName</code></td><td class="yes">name</td><td class="yes">NAME</td><td><span class="badge badge-green">Exists</span></td></tr> | |
| <tr><td><code>brand</code></td><td class="yes">brand</td><td class="no">—</td><td><span class="badge badge-green">Exists</span></td></tr> | |
| <tr><td><code>status</code> <small>(open/closed)</small></td><td class="no">—</td><td class="no">—</td><td><span class="badge badge-yellow">Add</span></td></tr> | |
| <tr><td><code>archived</code></td><td class="no"><small>soft delete</small></td><td class="no">—</td><td><span class="badge badge-yellow">Add</span></td></tr> | |
| <tr><td><code>numLanes</code></td><td class="no"><small>derived</small></td><td class="no">—</td><td><span class="badge badge-yellow">Add</span></td></tr> | |
| <tr><td><code>maxCapacity</code></td><td class="no">—</td><td class="no">—</td><td><span class="badge badge-yellow">Add</span></td></tr> | |
| </tbody> | |
| </table> | |
| </details> | |
| <details> | |
| <summary>Location & Contact</summary> | |
| <table class="field-table"> | |
| <thead><tr><th>API Field</th><th>Postgres</th><th>Snowflake</th><th>Action</th></tr></thead> | |
| <tbody> | |
| <tr><td><code>latitude</code> / <code>longitude</code></td><td class="yes">lat/lng</td><td class="no">—</td><td><span class="badge badge-green">Exists</span></td></tr> | |
| <tr><td><code>streetAddress</code></td><td class="yes">address</td><td class="no">—</td><td><span class="badge badge-green">Exists</span></td></tr> | |
| <tr><td><code>city</code> / <code>state</code></td><td class="yes">city/state</td><td class="no">—</td><td><span class="badge badge-green">Exists</span></td></tr> | |
| <tr><td><code>zipCode</code> / <code>zip4</code></td><td class="yes">postal_code</td><td class="no">—</td><td><span class="badge badge-green">Exists</span> <small>+ zip4</small></td></tr> | |
| <tr><td><code>country</code></td><td class="no">—</td><td class="no">—</td><td><span class="badge badge-yellow">Add</span></td></tr> | |
| <tr><td><code>timezone</code></td><td class="yes">timezone</td><td class="yes">TIMEZONE</td><td><span class="badge badge-green">Exists</span></td></tr> | |
| <tr><td><code>phone</code></td><td class="yes">phone</td><td class="no">—</td><td><span class="badge badge-green">Exists</span></td></tr> | |
| <tr><td><code>centerEmail</code></td><td class="no">—</td><td class="no">—</td><td><span class="badge badge-yellow">Add</span></td></tr> | |
| <tr><td><code>managerEmail</code></td><td class="no">—</td><td class="no">—</td><td><span class="badge badge-yellow">Add</span></td></tr> | |
| <tr><td><code>websiteUrl</code></td><td class="no">—</td><td class="no">—</td><td><span class="badge badge-yellow">Add</span></td></tr> | |
| </tbody> | |
| </table> | |
| </details> | |
| <details> | |
| <summary>Management & Org Hierarchy</summary> | |
| <table class="field-table"> | |
| <thead><tr><th>API Field</th><th>Postgres</th><th>Snowflake</th><th>Action</th></tr></thead> | |
| <tbody> | |
| <tr><td><code>centerManager</code></td><td class="no">—</td><td class="no">—</td><td><span class="badge badge-yellow">Add</span></td></tr> | |
| <tr><td><code>kitchenManager</code></td><td class="no">—</td><td class="no">—</td><td><span class="badge badge-yellow">Add</span></td></tr> | |
| <tr><td><code>eventManager</code></td><td class="no">—</td><td class="no">—</td><td><span class="badge badge-yellow">Add</span></td></tr> | |
| <tr><td><code>areaManager</code></td><td class="no">—</td><td class="no">—</td><td><span class="badge badge-yellow">Add</span></td></tr> | |
| <tr><td><code>districtManager</code></td><td class="no">—</td><td class="no">—</td><td><span class="badge badge-yellow">Add</span></td></tr> | |
| <tr><td><code>directorOfSales</code></td><td class="no">—</td><td class="no">—</td><td><span class="badge badge-yellow">Add</span></td></tr> | |
| <tr><td><code>regionalVicePresident</code></td><td class="no">—</td><td class="no">—</td><td><span class="badge badge-yellow">Add</span></td></tr> | |
| <tr><td><code>district</code></td><td class="no">—</td><td class="no">—</td><td><span class="badge badge-yellow">Add</span></td></tr> | |
| <tr><td><code>region</code></td><td class="no">—</td><td class="no">—</td><td><span class="badge badge-yellow">Add</span></td></tr> | |
| <tr><td><code>salesRegion</code> / <code>salesTerritory</code></td><td class="no">—</td><td class="no">—</td><td><span class="badge badge-yellow">Add</span></td></tr> | |
| </tbody> | |
| </table> | |
| </details> | |
| <details> | |
| <summary>Operations & Scheduling (nested objects)</summary> | |
| <table class="field-table"> | |
| <thead><tr><th>API Field</th><th>Type</th><th>Notes</th><th>Action</th></tr></thead> | |
| <tbody> | |
| <tr><td><code>maintenanceTimeStart</code> / <code>End</code></td><td>string</td><td>Daily maintenance window</td><td><span class="badge badge-yellow">Add</span></td></tr> | |
| <tr><td><code>timetable</code></td><td>array</td><td>Weekly open/close hours per day</td><td><span class="badge badge-yellow">Add</span></td></tr> | |
| <tr><td><code>specialHours</code></td><td>array</td><td>Holiday/override hours</td><td><span class="badge badge-yellow">Add</span></td></tr> | |
| <tr><td><code>primeTimes</code></td><td>array</td><td>Peak pricing windows</td><td><span class="badge badge-yellow">Add</span></td></tr> | |
| <tr><td><code>schedules</code></td><td>array</td><td>Scheduled activities</td><td><span class="badge badge-yellow">Add</span></td></tr> | |
| <tr><td><code>eventTypeRoutingRules</code></td><td>array</td><td>Booking event routing config</td><td><span class="badge badge-yellow">Add</span></td></tr> | |
| <tr><td><code>defaultResourceType</code></td><td>string</td><td>Booking resource type</td><td><span class="badge badge-yellow">Add</span></td></tr> | |
| <tr><td><code>vendor</code></td><td>string</td><td></td><td><span class="badge badge-yellow">Add</span></td></tr> | |
| <tr><td><code>tags</code></td><td>array|null</td><td>Metadata tags</td><td><span class="badge badge-yellow">Add</span></td></tr> | |
| </tbody> | |
| </table> | |
| </details> | |
| </section> | |
| <!-- ───────────────── UPDATE FREQUENCY ───────────────── --> | |
| <section> | |
| <h2>Update Frequency</h2> | |
| <p>The API's <code>updatedAt</code> field shows <code>2026-02-10 05:00:02 UTC</code> (today), suggesting a <strong>nightly batch update</strong>. Fields like management assignments, schedules, and center status are the most likely to change. A <strong>once-daily sync</strong> is sufficient, but this cadence should be treated as an assumption and validated over several days of run history.</p> | |
| </section> | |
| <!-- ───────────────── ARCHITECTURE ───────────────── --> | |
| <section> | |
| <h2>Proposed Architecture</h2> | |
| <div class="callout"> | |
| <strong>Design principle:</strong> Dagster stays focused on Snowflake transforms (its existing domain). A separate TypeScript service handles the RDS Postgres sync, reusing the existing Kysely types and database tooling from <code>@connected-play/pg</code>. Both services fetch from the same API independently—no coupling. | |
| </div> | |
| <h3>Production</h3> | |
| <pre class="mermaid"> | |
| flowchart LR | |
| API["Cheetah API | |
| /api/centers"] | |
| subgraph CheetahSync ["cheetah-sync (ECS scheduled task)"] | |
| TS["TypeScript | |
| Kysely upsert"] | |
| end | |
| subgraph Dagster ["Dagster (daily @ 3 AM)"] | |
| FETCH["cheetah_centers | |
| asset"] | |
| DIM["dim_center | |
| asset"] | |
| end | |
| subgraph Snowflake | |
| RAW["RAW.CHEETAH.CENTER | |
| (full JSON as VARIANT)"] | |
| REPORT["REPORT.DIM_CENTER | |
| (structured dimension)"] | |
| end | |
| subgraph RDS ["Postgres (RDS)"] | |
| CT["center table | |
| (expanded)"] | |
| end | |
| API --> TS | |
| TS -->|"upsert all fields"| CT | |
| API --> FETCH | |
| FETCH --> RAW | |
| RAW --> DIM --> REPORT | |
| </pre> | |
| <h3>Local Development</h3> | |
| <pre class="mermaid"> | |
| flowchart LR | |
| API["Cheetah API | |
| /api/centers"] | |
| subgraph LocalStack ["localhost"] | |
| direction TB | |
| TS["npx tsx | |
| apps/cheetah-sync/src/index.ts"] | |
| PG["Postgres :5433"] | |
| end | |
| API --> TS | |
| TS -->|"upsert all fields"| PG | |
| </pre> | |
| <div class="callout"> | |
| <strong>Why separate services?</strong> | |
| <ul style="margin-bottom:0"> | |
| <li>Dagster currently has <strong>no RDS connection</strong>—adding one means a new resource class, <code>psycopg2</code>, RDS env vars in ECS, security group changes</li> | |
| <li>The TS service gets Kysely types, <code>@connected-play/pg</code>, and the existing database patterns <strong>for free</strong></li> | |
| <li>Matches the existing pattern: <code>pg-sync</code> writes events to RDS, Dagster transforms data in Snowflake</li> | |
| <li>Simpler to test locally—just <code>npx tsx</code> against local Postgres, no Dagster needed</li> | |
| </ul> | |
| </div> | |
| </section> | |
| <!-- ───────────────── DATA MODELING ───────────────── --> | |
| <section> | |
| <h2>Data Modeling</h2> | |
| <h3>Snowflake: Raw Landing</h3> | |
| <p>Full API response as VARIANT in <code>LUCKYSTRIKE_RAW.CHEETAH.CENTER</code>. One row per center, full-replace daily.</p> | |
| <pre><code class="language-sql">CREATE OR REPLACE TABLE LUCKYSTRIKE_RAW.CHEETAH.CENTER ( | |
| CENTER_NUMBER INTEGER NOT NULL, -- centerNumber from API | |
| RAW_DATA VARIANT NOT NULL, -- full JSON object | |
| SYNCED_AT TIMESTAMP_TZ DEFAULT CURRENT_TIMESTAMP(), | |
| API_UPDATED_AT TIMESTAMP_TZ, -- updatedAt from API | |
| PRIMARY KEY (CENTER_NUMBER) | |
| );</code></pre> | |
| <h3>Snowflake: DIM_CENTER (Dagster asset)</h3> | |
| <p>Structured dimension table in <code>LUCKYSTRIKE_ANALYSIS.CONNECTED_PLAY.DIM_CENTER</code>, built from raw + existing Postgres data.</p> | |
| <pre><code class="language-sql">CREATE OR REPLACE TABLE LUCKYSTRIKE_ANALYSIS.CONNECTED_PLAY.DIM_CENTER ( | |
| CENTER_NUMBER INTEGER PRIMARY KEY, | |
| CHEETAH_ID VARCHAR, -- ULID from API | |
| EXTERNAL_ID VARCHAR, -- UUID from API | |
| QUBICA_ID INTEGER, -- from Postgres center table | |
| NAME VARCHAR NOT NULL, | |
| BRAND VARCHAR, | |
| STATUS VARCHAR, -- open/closed | |
| ARCHIVED BOOLEAN, | |
| NUM_LANES INTEGER, | |
| MAX_CAPACITY INTEGER, | |
| -- Location | |
| LATITUDE NUMBER(9,6), | |
| LONGITUDE NUMBER(9,6), | |
| STREET_ADDRESS VARCHAR, | |
| CITY VARCHAR, | |
| STATE VARCHAR, | |
| ZIP_CODE VARCHAR, | |
| ZIP4 VARCHAR, | |
| COUNTRY VARCHAR, | |
| TIMEZONE VARCHAR, | |
| -- Contact | |
| PHONE VARCHAR, | |
| CENTER_EMAIL VARCHAR, | |
| MANAGER_EMAIL VARCHAR, | |
| WEBSITE_URL VARCHAR, | |
| -- Management | |
| CENTER_MANAGER VARCHAR, | |
| KITCHEN_MANAGER VARCHAR, | |
| EVENT_MANAGER VARCHAR, | |
| AREA_MANAGER VARCHAR, | |
| DISTRICT_MANAGER VARCHAR, | |
| DIRECTOR_OF_SALES VARCHAR, | |
| REGIONAL_VP VARCHAR, | |
| -- Org hierarchy | |
| DISTRICT INTEGER, | |
| REGION INTEGER, | |
| SALES_REGION INTEGER, | |
| SALES_TERRITORY INTEGER, | |
| INTERCARD_CENTER_NUMBER INTEGER, | |
| -- Operations | |
| MAINTENANCE_TIME_START VARCHAR, | |
| MAINTENANCE_TIME_END VARCHAR, | |
| DEFAULT_RESOURCE_TYPE VARCHAR, | |
| VENDOR VARCHAR, | |
| -- Nested (VARIANT) | |
| TIMETABLE VARIANT, | |
| SPECIAL_HOURS VARIANT, | |
| PRIME_TIMES VARIANT, | |
| SCHEDULES VARIANT, | |
| EVENT_TYPE_ROUTING_RULES VARIANT, | |
| TAGS VARIANT, | |
| -- Metadata | |
| API_UPDATED_AT TIMESTAMP_TZ, | |
| SYNCED_AT TIMESTAMP_TZ, | |
| REC_INS_TS TIMESTAMP_TZ DEFAULT CURRENT_TIMESTAMP() | |
| );</code></pre> | |
| <h3>Postgres: Expanded center table</h3> | |
| <p>Add columns to the existing <code>center</code> table for operationally useful fields. Use JSONB for nested objects.</p> | |
| <pre><code class="language-sql">-- Migration: expand_center_table | |
| ALTER TABLE center ADD COLUMN cheetah_id varchar(30); -- ULID | |
| ALTER TABLE center ADD COLUMN external_id uuid; | |
| ALTER TABLE center ADD COLUMN intercard_center_number integer; | |
| ALTER TABLE center ADD COLUMN status varchar(20); -- open/closed | |
| ALTER TABLE center ADD COLUMN archived boolean DEFAULT false; | |
| ALTER TABLE center ADD COLUMN num_lanes integer; | |
| ALTER TABLE center ADD COLUMN max_capacity integer; | |
| ALTER TABLE center ADD COLUMN country varchar(100); | |
| ALTER TABLE center ADD COLUMN zip4 varchar(10); | |
| ALTER TABLE center ADD COLUMN center_email varchar(200); | |
| ALTER TABLE center ADD COLUMN manager_email varchar(200); | |
| ALTER TABLE center ADD COLUMN website_url varchar(500); | |
| -- Management | |
| ALTER TABLE center ADD COLUMN center_manager varchar(200); | |
| ALTER TABLE center ADD COLUMN kitchen_manager varchar(200); | |
| ALTER TABLE center ADD COLUMN event_manager varchar(200); | |
| ALTER TABLE center ADD COLUMN area_manager varchar(200); | |
| ALTER TABLE center ADD COLUMN district_manager varchar(200); | |
| ALTER TABLE center ADD COLUMN director_of_sales varchar(200); | |
| ALTER TABLE center ADD COLUMN regional_vp varchar(200); | |
| -- Org hierarchy | |
| ALTER TABLE center ADD COLUMN district integer; | |
| ALTER TABLE center ADD COLUMN region integer; | |
| ALTER TABLE center ADD COLUMN sales_region integer; | |
| ALTER TABLE center ADD COLUMN sales_territory integer; | |
| -- Operations | |
| ALTER TABLE center ADD COLUMN maintenance_time_start varchar(20); | |
| ALTER TABLE center ADD COLUMN maintenance_time_end varchar(20); | |
| ALTER TABLE center ADD COLUMN default_resource_type varchar(50); | |
| ALTER TABLE center ADD COLUMN vendor varchar(200); | |
| -- Nested as JSONB | |
| ALTER TABLE center ADD COLUMN timetable jsonb; | |
| ALTER TABLE center ADD COLUMN special_hours jsonb; | |
| ALTER TABLE center ADD COLUMN prime_times jsonb; | |
| ALTER TABLE center ADD COLUMN schedules jsonb; | |
| ALTER TABLE center ADD COLUMN event_type_routing_rules jsonb; | |
| ALTER TABLE center ADD COLUMN tags jsonb; | |
| -- Sync metadata | |
| ALTER TABLE center ADD COLUMN cheetah_synced_at timestamptz; | |
| ALTER TABLE center ADD COLUMN cheetah_updated_at timestamptz;</code></pre> | |
| </section> | |
| <!-- ───────────────── JOIN KEY ───────────────── --> | |
| <section> | |
| <h2>Join Key: How Systems Connect</h2> | |
| <pre class="mermaid"> | |
| flowchart LR | |
| CHEETAH["Cheetah API | |
| centerNumber: 13"] -->|"="| PG["Postgres | |
| center_id: 13"] | |
| PG -->|"="| SF["Snowflake | |
| CENTER_NUMBER: 13"] | |
| QUBICA["Qubica EventHub | |
| center-10580"] -->|maps to| PG | |
| PG -->|"qubica_id: 10580"| SF | |
| </pre> | |
| <div class="callout callout-warn"> | |
| <strong>Reconciliation needed:</strong> We have 382 centers in Postgres. The API may have a different count. 46 centers have no <code>qubica_id</code>—these are real bowling centers that simply don't have Qubica scoring enabled, so we don't receive real-time event data from them. The sync should handle centers that exist in one system but not the other gracefully (upsert, don't delete). | |
| </div> | |
| </section> | |
| <!-- ───────────────── IMPLEMENTATION ───────────────── --> | |
| <section> | |
| <h2>Implementation Plan</h2> | |
| <h3>Phase 1: Postgres Schema + cheetah-sync Service</h3> | |
| <p><span class="badge badge-blue">RDS path</span> Testable locally without Dagster or Snowflake.</p> | |
| <ol> | |
| <li><strong>Postgres migration</strong> — Add Cheetah columns + indexes for <code>cheetah_id</code>, <code>external_id</code>, and <code>cheetah_updated_at</code></li> | |
| <li><strong><code>apps/cheetah-sync</code></strong> — TypeScript service using Kysely + <code>@connected-play/pg</code>, with schema validation and pagination support</li> | |
| <li><strong>Atomic write path</strong> — Load API response into a staging table, then upsert into <code>center</code> in one transaction</li> | |
| <li><strong>Run metadata</strong> — Add <code>cheetah_sync_run</code> table (run_id, row counts, warnings, status, error) for observability and audits</li> | |
| <li><strong>Retry policy</strong> — Add bounded retries with exponential backoff + jitter for API and transient DB errors</li> | |
| <li><strong>Test locally</strong> — <code>npx tsx apps/cheetah-sync/src/index.ts</code> against <code>localhost:5433</code></li> | |
| </ol> | |
| <pre><code class="language-typescript">// apps/cheetah-sync/src/index.ts (sketch) | |
| import { createDatabase } from "@connected-play/pg"; | |
| const API_URL = process.env.CHEETAH_API_URL | |
| ?? "https://cheetah-api-proxy.bellum.ai/api/centers"; | |
| async function main() { | |
| const db = createDatabase(); | |
| const headers: Record<string, string> = {}; | |
| if (process.env.CHEETAH_API_KEY) { | |
| headers["x-api-key"] = process.env.CHEETAH_API_KEY; | |
| } | |
| const centers = await fetchAllPagesWithRetry(API_URL, headers); | |
| const validCenters = validateCentersPayload(centers); // fail fast on schema drift | |
| console.log(`Fetched ${validCenters.length} centers from Cheetah API`); | |
| const run = await startSyncRun(db, validCenters.length); | |
| const { inserted, updated, warnings } = await db.transaction().execute(async (trx) => { | |
| await loadCenterStaging(trx, validCenters); // batched insert, not row-by-row | |
| const merge = await mergeStagingIntoCenter(trx); // single atomic upsert | |
| await markMissingCenters(trx); // sets cheetah_missing_since after N misses | |
| return merge; | |
| }); | |
| await finishSyncRun(db, run.runId, { inserted, updated, warnings, status: "success" }); | |
| console.log(`Done: ${inserted} created, ${updated} updated`); | |
| await db.destroy(); | |
| }</code></pre> | |
| <h3>Phase 2: Dagster → Snowflake</h3> | |
| <p><span class="badge badge-blue">Analytics path</span> Dagster stays purely Snowflake—no RDS connection.</p> | |
| <ol> | |
| <li><strong>Snowflake DDLs</strong> — <code>RAW.CHEETAH.CENTER</code> + <code>REPORT.DIM_CENTER</code></li> | |
| <li><strong><code>cheetah_centers</code> asset</strong> — Fetch API, load staging table, then <code>SWAP</code> (or <code>MERGE</code>) into Snowflake RAW atomically</li> | |
| <li><strong><code>dim_center</code> asset</strong> — MERGE RAW → REPORT</li> | |
| <li>Add <code>requests</code> to Dagster <code>requirements.txt</code></li> | |
| <li>Wire assets into existing <code>daily_transformations</code> job</li> | |
| </ol> | |
| <pre><code class="language-python"># dagster_code/assets/cheetah_centers.py (sketch) | |
| @asset( | |
| key=["CONNECTED_PLAY", "RAW", "cheetah_centers"], | |
| group_name="snowflake_transformations", | |
| ) | |
| def cheetah_centers(context, snowflake: SnowflakeResource): | |
| headers = {} | |
| api_key = os.environ.get("CHEETAH_API_KEY") | |
| if api_key: | |
| headers["x-api-key"] = api_key | |
| resp = requests.get(CHEETAH_API_URL, headers=headers, timeout=30) | |
| centers = resp.json() | |
| context.log.info(f"Fetched {len(centers)} centers from Cheetah API") | |
| # Snowflake only: staging load + atomic swap to avoid partial/empty table windows | |
| sf_conn = snowflake.get_connection() | |
| create_and_load_staging(sf_conn, centers) | |
| sf_conn.cursor().execute("ALTER TABLE ... SWAP WITH ...") | |
| return Output(None, metadata={"centers_synced": len(centers)})</code></pre> | |
| <h3>Phase 3: Production Deployment</h3> | |
| <ol> | |
| <li><strong>cheetah-sync ECS infrastructure</strong> — Terraform for ECR, scheduled Fargate task (CloudWatch Events rule, daily)</li> | |
| <li><strong>Dagster schedule</strong> — <code>cheetah_centers</code> runs as part of existing 3 AM daily job</li> | |
| <li><strong>Data quality checks</strong> — Row count anomaly detection, zero-row failure, and max <code>api_updated_at</code> freshness checks</li> | |
| <li><strong>Cross-system drift checks</strong> — Compare row count and max <code>api_updated_at</code> between Postgres sync and Snowflake asset runs</li> | |
| <li><strong>Safe rollout</strong> — Dry-run diff in prod, then one canary scheduled run before enabling daily schedule broadly</li> | |
| <li><strong>DIM_CENTER wiring</strong> — Replace hardcoded DIM_CENTER references in <code>center_status_interval</code> and <code>fact_game_bowler</code></li> | |
| </ol> | |
| </section> | |
| <!-- ───────────────── OPERATIONAL GUARDRAILS ───────────────── --> | |
| <section> | |
| <h2>Operational Guardrails</h2> | |
| <ul> | |
| <li><strong>Idempotency + atomicity</strong> — No partial writes on failure; rerunning the same payload produces the same final state</li> | |
| <li><strong>Run auditability</strong> — Every run is persisted with inputs, counts, warnings, and error details</li> | |
| <li><strong>Schema drift detection</strong> — Unknown/new fields are surfaced in logs/metrics before silently dropping data</li> | |
| <li><strong>Missing-center policy</strong> — Do not hard-delete; mark as stale with <code>cheetah_missing_since</code> after repeated misses</li> | |
| <li><strong>API evolution readiness</strong> — Pagination loop + retry/backoff built in from day one</li> | |
| </ul> | |
| </section> | |
| <!-- ───────────────── LOCAL DEV ───────────────── --> | |
| <section> | |
| <h2>Local Development Setup</h2> | |
| <h3>Testing cheetah-sync (Postgres path)</h3> | |
| <pre><code class="language-bash"># Start local Postgres | |
| docker compose up -d postgres | |
| # Run the migration | |
| cd apps/pg && pnpm migrate:up && cd ../.. | |
| # Run the sync | |
| npx tsx apps/cheetah-sync/src/index.ts | |
| # Verify | |
| psql -p 5433 -U connected_play -d connected_play \ | |
| -c "SELECT center_id, name, status, num_lanes, cheetah_synced_at | |
| FROM center WHERE cheetah_synced_at IS NOT NULL LIMIT 5"</code></pre> | |
| <h3>Testing Dagster assets (Snowflake path)</h3> | |
| <pre><code class="language-bash"># Start Dagster stack (requires Snowflake credentials in env) | |
| docker compose --profile dagster up -d | |
| # Visit http://localhost:3070 | |
| # Materialize "cheetah_centers" then "dim_center"</code></pre> | |
| <div class="callout"> | |
| The two paths are independent. You can test the Postgres sync without Dagster, and vice versa. | |
| </div> | |
| </section> | |
| <!-- ───────────────── OPEN QUESTIONS ───────────────── --> | |
| <section> | |
| <h2>Open Questions</h2> | |
| <div class="decision-box"> | |
| <p><strong>1. API Authentication</strong> <span class="badge badge-green">Resolved</span></p> | |
| <p>The sync supports an optional <code>CHEETAH_API_KEY</code> env var, sent as an <code>x-api-key</code> header. No auth is required today, but the plumbing is in place for when it's added.</p> | |
| </div> | |
| <div class="decision-box"> | |
| <p><strong>2. Centers missing from the API</strong></p> | |
| <p>If a center exists in Postgres but not in the API response, we do not delete it. We log a warning and track staleness with <code>cheetah_missing_since</code>; optional archival can be applied only after a configured miss threshold.</p> | |
| </div> | |
| <div class="decision-box"> | |
| <p><strong>3. New centers in the API</strong></p> | |
| <p>If the API returns a center we don't have, we create the center row, record it in run metadata, and emit a high-visibility log for ops verification.</p> | |
| </div> | |
| <div class="decision-box"> | |
| <p><strong>4. Cheetah API pagination</strong> <span class="badge badge-green">Resolved</span></p> | |
| <p>Implement a pagination loop now. Treat the current single-response behavior as incidental, not guaranteed.</p> | |
| </div> | |
| <div class="decision-box"> | |
| <p><strong>5. Dagster asset scheduling</strong></p> | |
| <p>The <code>cheetah_centers</code> asset doesn't need daily partitioning (it's a full-replace dimension). It should be an <strong>unpartitioned asset</strong> that runs daily as part of the existing schedule, upstream of <code>dim_center</code>.</p> | |
| </div> | |
| </section> | |
| <!-- ───────────────── FILE MAP ───────────────── --> | |
| <section> | |
| <h2>Files to Create/Modify</h2> | |
| <h3>cheetah-sync Service (RDS path)</h3> | |
| <table> | |
| <thead><tr><th>File</th><th>Action</th><th>Description</th></tr></thead> | |
| <tbody> | |
| <tr><td><code>apps/pg/migrations/<ts>_expand_center_table.ts</code></td><td><span class="badge badge-yellow">Create</span></td><td>Add ~30 columns to center table, plus indexes and <code>cheetah_missing_since</code></td></tr> | |
| <tr><td><code>apps/pg/migrations/<ts>_create_cheetah_sync_run.ts</code></td><td><span class="badge badge-yellow">Create</span></td><td>Run-audit table for row counts, warnings, status, and errors</td></tr> | |
| <tr><td><code>apps/cheetah-sync/src/index.ts</code></td><td><span class="badge badge-yellow">Create</span></td><td>Main entry point: fetch API, upsert Postgres</td></tr> | |
| <tr><td><code>apps/cheetah-sync/src/schema.ts</code></td><td><span class="badge badge-yellow">Create</span></td><td>Runtime payload validation and drift detection</td></tr> | |
| <tr><td><code>apps/cheetah-sync/src/retry.ts</code></td><td><span class="badge badge-yellow">Create</span></td><td>Bounded retries with exponential backoff + jitter</td></tr> | |
| <tr><td><code>apps/cheetah-sync/package.json</code></td><td><span class="badge badge-yellow">Create</span></td><td>Dependencies: <code>@connected-play/pg</code>, <code>dotenv</code></td></tr> | |
| <tr><td><code>apps/cheetah-sync/Dockerfile</code></td><td><span class="badge badge-yellow">Create</span></td><td>Container for ECS scheduled task</td></tr> | |
| <tr><td><code>apps/cheetah-sync/terraform/</code></td><td><span class="badge badge-yellow">Create</span></td><td>ECR, Fargate task def, CloudWatch Events rule</td></tr> | |
| <tr><td><code>apps/cheetah-sync/Makefile</code></td><td><span class="badge badge-yellow">Create</span></td><td>ECR deploy targets (same pattern as other services)</td></tr> | |
| </tbody> | |
| </table> | |
| <h3>Dagster Assets (Snowflake path)</h3> | |
| <table> | |
| <thead><tr><th>File</th><th>Action</th><th>Description</th></tr></thead> | |
| <tbody> | |
| <tr><td><code>apps/dagster/dagster_code/assets/cheetah_centers.py</code></td><td><span class="badge badge-yellow">Create</span></td><td>Dagster asset: API → Snowflake RAW with atomic staging/swap</td></tr> | |
| <tr><td><code>apps/dagster/dagster_code/assets/dim_center.py</code></td><td><span class="badge badge-yellow">Create</span></td><td>Dagster asset: RAW → REPORT</td></tr> | |
| <tr><td><code>apps/dagster/dagster_code/sql/ddl/cheetah_centers.ddl.sql</code></td><td><span class="badge badge-yellow">Create</span></td><td>RAW.CHEETAH.CENTER DDL</td></tr> | |
| <tr><td><code>apps/dagster/dagster_code/sql/ddl/dim_center.ddl.sql</code></td><td><span class="badge badge-yellow">Create</span></td><td>REPORT.DIM_CENTER DDL</td></tr> | |
| <tr><td><code>apps/dagster/dagster_code/sql/dim_center.jinja.sql</code></td><td><span class="badge badge-yellow">Create</span></td><td>RAW → REPORT transform SQL</td></tr> | |
| <tr><td><code>apps/dagster/dagster_code/repository.py</code></td><td><span class="badge badge-blue">Modify</span></td><td>Register new assets</td></tr> | |
| <tr><td><code>apps/dagster/requirements.txt</code></td><td><span class="badge badge-blue">Modify</span></td><td>Add <code>requests</code></td></tr> | |
| </tbody> | |
| </table> | |
| </section> | |
| <hr> | |
| <footer> | |
| <small>Connected Play — Cheetah Center Sync Design — Feb 2026</small> | |
| </footer> | |
| </main> | |
| </body> | |
| </html> |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment