Created
February 2, 2026 16:21
RAS (Recording Analytics Service) Deep Dive - Temporal Migration Analysis
| ╭─── Claude Code v2.1.20 ──────────────────────────────────────────────────────╮ | |
| │ │ Tips for getting │ | |
| │ Welcome back Deepak! │ started │ | |
| │ │ Run /init to create a … │ | |
| │ ▐▛███▜▌ │ ─────────────────────── │ | |
| │ ▝▜█████▛▘ │ Recent activity │ | |
| │ ▘▘ ▝▝ │ No recent activity │ | |
| │ │ │ | |
| │ Opus 4.5 · Claude Pro · deepak.jois@gmail.com's │ │ | |
| │ Organization │ │ | |
| │ ~/code/veo/recording-analytics │ │ | |
| ╰──────────────────────────────────────────────────────────────────────────────╯ | |
| ❯ do a deep dive into this codebase and describe the various parts of it. | |
| give a conceptual overview of the codebase with data flows and ascii | |
| diagrams whereever possible | |
| Here is the ticket where a dev from another infrastructure team will | |
| assist us with migrating the codebase over to use Temporal workflows. | |
| Organize the explanation of the codebase around themes in terms of the | |
| ticket overview below | |
| # Centralized orchestration > Move analytics orchestration from RAS to | |
| temporal | |
| Overall responsibilities of RAS: | |
| 1. Orchestrating AI jobs, i.e. sending requests to EMA and keeping track | |
| of the order of jobs. In this process some validation and a few | |
| transformations of the EMA output happens (e.g. changing a pts column to | |
| a timestamp column). | |
| 2. Delivering data to clients (Zola+MES) for features in the product. | |
| Here some transformation of the data happens on-the-fly. | |
| 3. Storing output files from AI jobs and storing the location of those | |
| files. | |
| --- | |
| Stages of how to roll out: | |
| 1. Move orchestration to temporal | |
| 2. Move delivery to MES | |
| 3. Move what remains of storage to recordings-api | |
| --- | |
| Migration plan: | |
| * RAS should be able to handle both ways of doing orchestration (temporal | |
| + RAS's internal orchestration) | |
| * Shadow the existing processing workflow. Send traffic through both | |
| flows and check that the temporal workflow produces the right stuff. | |
| * Send a percentage of "orchestration traffic" to temporal instead of | |
| RAS's internal orchestration | |
| --- | |
| Questions to figure out: | |
| * Should we completely reuse the storage in RAS in the first round? | |
| * Should temporal query RAS for a presigned URL of where to upload | |
| files (to the RAS S3 bucket), i.e. reuse storage in RAS completely? | |
| * Or should the temporal workflow store the files in veo-content-ii? | |
| * Should temporal update the job states of RAS? This will make sure | |
| that RAS can still serve its clients exactly as it does today. | |
| * Where does RAS "stop"? | |
| * What is the interface between RAS/MES, RAS/Zola? | |
| * Which transformations in RAS go to temporal and which go to MES? | |
| * Should the first part of the project exclude the transformations | |
| that RAS currently performs today? | |
| * RAS is producing files based on EMA output. These files are deleted | |
| after 7 days. So RAS is essentially caching files. Should those files be | |
| produced as part of the temporal workflow? | |
| * Figure out if we should add separate asset types for "raw AI output" | |
| and "RAS transformed" files? | |
| --- | |
| For the future project about moving storage out of RAS: | |
| * Figure out which clients of RAS use the file locations stored | |
| in RAS. Ideally we would like RAS not to store file locations. Can the | |
| clients simply look up file locations in recordings-api instead? | |
| * Make sure that RAS job states are not required by any client | |
| * When we start to orchestrate via temporal, RAS will not store the | |
| state of jobs. We should make sure that no client breaks if the job | |
| state is no longer updated. Also, L2/L3 supporters use the Zola admin | |
| page to see if RAS/analytics failed for some reason. We need to make | |
| sure that L2/L3 can still get information about whether or not analytics | |
| failed. Coordinate with L2/L3 about a solution for this. | |
| ## Metadata | |
| - URL: https://linear.app/veo-technologies/project/centralized-orchestration-move-analytics-orchestration-from-ras-to-2d39bcc34b44 | |
| - Status: Now | |
| - Lead: Tobias Slott Jensen | |
| - Members: Tobias Slott Jensen | |
| - Start date: Jan 12th | |
| - Target date: Not set | |
| ⏺ I'll do a thorough exploration of the codebase to build a comprehensive | |
| picture. Let me launch several exploration agents in parallel to cover | |
| different aspects. | |
| ⏺ 5 Explore agents finished (ctrl+o to expand) | |
| ├─ Explore top-level structure · 36 tool uses · 39.1k tokens | |
| │ ⎿ Done | |
| ├─ Explore orchestration/jobs system · 55 tool uses · 67.4k tokens | |
| │ ⎿ Done | |
| ├─ Explore API/delivery layer · 52 tool uses · 57.0k tokens | |
| │ ⎿ Done | |
| ├─ Explore storage layer · 60 tool uses · 68.4k tokens | |
| │ ⎿ Done | |
| └─ Explore external integrations · 59 tool uses · 61.3k tokens | |
| ⎿ Done | |
| ⏺ Now let me put together the comprehensive overview organized around the | |
| migration themes. | |
| --- | |
| RAS (Recording Analytics Service) — Deep Dive | |
| Tech Stack at a Glance | |
| - Language: TypeScript on Node.js 22 | |
| - Runtime: AWS Lambda (serverless) | |
| - IaC: AWS CDK | |
| - Storage: DynamoDB (state), S3 (files) | |
| - Messaging: SQS (job queues), EventBridge (domain events) | |
| - Architecture: Hexagonal (Ports & Adapters), Event Sourcing | |
| --- | |
| High-Level Architecture | |
| ┌─────────────────────────────────────────────────┐ | |
| │                     CLIENTS                     │ | |
| │                                                 │ | |
| │    Zola (Admin UI)       MES (via EventBridge)  │ | |
| └──────────┬──────────────────────┬───────────────┘ | |
| │ HTTP (X-Token) │ EventBridge | |
| ▼ ▲ | |
| ┌─────────────────────────┐ │ | |
| │ CloudFront CDN │ │ | |
| │ (7-14 day cache on GET) │ │ | |
| └───────────┬─────────────┘ │ | |
| │ │ | |
| ┌───────────▼─────────────┐ │ | |
| │ API Gateway │ │ | |
| │ (REST, token auth) │ │ | |
| └───────────┬─────────────┘ │ | |
| │ │ | |
| ┌───────────────────────────────┼────────────────────────┤ | |
| │ RAS (Lambda Functions) │ | |
| │ │ | |
| │ ┌──────────────────────────────────────────────┐ │ | |
| │ │ 1. ORCHESTRATION │ │ | |
| │ │ │ │ | |
| │ │ jobPrepper ──► jobScheduler ──► jobTrigger │ │ | |
| │ │ │ ▲ │ │ │ | |
| │ │ │ DynamoDB │ Streams │ │ │ | |
| │ │ ▼ │ ▼ │ │ | |
| │ │ [AnalyticsState] ◄──┘ EMA (AI) │ │ | |
| │ │ │ │ │ | |
| │ │ jobImporter ◄── jobCallback ◄─────┘ │ │ | |
| │ └──────────────────────────────────────────────┘ │ | |
| │ │ | |
| │ ┌──────────────────────────────────────────────┐ │ | |
| │ │ 2. DELIVERY (API Handlers) │ │ | |
| │ │ │ │ | |
| │ │ GET /player-tracking (normalize + merge) │ │ | |
| │ │ GET /events (filter by type) │ │ | |
| │ │ GET /match-details (serve from S3) │ │ | |
| │ │ GET /shot-details (serve from S3) │ │ | |
| │ │ GET /match-momentum (serve from DynamoDB)│ │ | |
| │ │ GET /match-report (serve from S3) │ │ | |
| │ │ GET /analytics (job state summary) │──────┘ | |
| │ └──────────────────────────────────────────────┘ publishes | |
| │ domain events | |
| │ ┌──────────────────────────────────────────────┐ | |
| │ │ 3. STORAGE │ | |
| │ │ │ | |
| │ │ S3: raw files (CSV, JSON, Parquet) │ | |
| │ │ S3: processed files (JSON) │ | |
| │ │ S3: cache bucket (7-day TTL) │ | |
| │ │ DynamoDB: analytics state + change log │ | |
| │ │ DynamoDB: recording metadata │ | |
| │ └──────────────────────────────────────────────┘ | |
| └────────────────────────────────────────────────────────┘ | |
| --- | |
| Responsibility 1: Orchestrating AI Jobs | |
| This is the core of what the Temporal migration targets. RAS manages a | |
| pipeline of 6 AI job types with dependency ordering, communicating with the | |
| EMA inference service. | |
| Job Types & Dependencies | |
| INDEPENDENT (run in parallel) DEPENDENT (need all 3 independent | |
| jobs) | |
| ┌──────────────────┐ ┌──────────────────┐ | |
| │ event-detection │──────────┐ │ match-details │ | |
| └──────────────────┘ │ └──────────────────┘ | |
| ┌──────────────────┐ ├─────►┌──────────────────┐ | |
| │ player-tracking │──────────┤ │ shot-details │ | |
| └──────────────────┘ │ └──────────────────┘ | |
| ┌──────────────────┐ │ ┌──────────────────┐ | |
| │ ball-tracking │──────────┘ │ match-report │ | |
| └──────────────────┘ └──────────────────┘ | |
| The three independent jobs can run concurrently. The three dependent jobs each | |
| require all three independent jobs to reach succeeded/completed before they | |
| can be triggered. | |
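As a sketch, this gating rule can be expressed as a pure predicate over the job-state map (the type and function names below are hypothetical illustrations, not taken from the codebase):

```typescript
// Hypothetical sketch of the dependency gate described above.
type JobStatus =
  | "pending" | "scheduled" | "starting" | "running"
  | "succeeded" | "completed" | "failed";

const INDEPENDENT_JOBS = ["event-detection", "player-tracking", "ball-tracking"];

// Dependent jobs (match-details, shot-details, match-report) are eligible
// only when every independent job is succeeded or completed.
function canTriggerDependentJobs(states: Record<string, JobStatus>): boolean {
  return INDEPENDENT_JOBS.every(
    (job) => states[job] === "succeeded" || states[job] === "completed",
  );
}
```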
| Job State Machine | |
| Each job goes through these states: | |
| null ──► pending ──► scheduled ──► starting ──► running ──► succeeded ──► completed | |
|                                                                  (import finishes)  | |
|                                                                                     | |
|            any non-terminal state ──► failed                                        | |
| - pending → job is validated and ready | |
| - scheduled → EMA has accepted the job | |
| - starting / running → EMA is processing | |
| - succeeded → EMA finished; raw output is on S3 | |
| - completed → RAS has imported and transformed the output | |
| - failed → error at any stage | |
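One way to read the diagram is as an allowed-transition table. This is a hypothetical sketch, not the actual model in src/core/models/analyticState.ts:

```typescript
// Hypothetical transition table for the job state machine above.
// The initial (null) state moves to "pending"; "failed" is reachable
// from any non-terminal state.
const TRANSITIONS: Record<string, string[]> = {
  pending: ["scheduled", "failed"],
  scheduled: ["starting", "failed"],
  starting: ["running", "failed"],
  running: ["succeeded", "failed"],
  succeeded: ["completed", "failed"],
  completed: [], // terminal
  failed: [],    // terminal
};

function isValidTransition(from: string, to: string): boolean {
  return (TRANSITIONS[from] ?? []).includes(to);
}
```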
| The 5-Lambda Orchestration Pipeline | |
| The orchestration is driven by SQS queues and DynamoDB Streams, spread across | |
| 5 Lambda functions: | |
| Zola (POST /jobs) | |
| │ | |
| ▼ | |
| ┌────────────┐ PrepJobQueue ┌────────────┐ | |
| │ API Handler │ ──── (SQS) ────► │ jobPrepper │ | |
| └────────────┘ └──────┬─────┘ | |
| │ Validates recording metadata | |
| │ Sets jobs to "pending" | |
| │ Writes AnalyticsChange to DynamoDB | |
| ▼ | |
| ┌──────────────┐ | |
| │ DynamoDB │ | |
| │ (analytics │ | |
| │ changes) │ | |
| └──────┬───────┘ | |
| │ DynamoDB Stream | |
| ▼ | |
| ┌──────────────┐ | |
| │ jobScheduler │ Compares old vs new state | |
| │ │ Evaluates dependencies | |
| └──┬───────┬──┘ | |
| │ │ | |
| ┌──────────────────────┘ └──────────────────┐ | |
| ▼ ▼ | |
| JobTriggerQueue JobImporterQueue | |
| (SQS) (SQS) | |
| │ │ | |
| ▼ ▼ | |
| ┌─────────────┐ ┌──────────────┐ | |
| │ jobTrigger │ │ jobImporter │ | |
| │ │ │ │ | |
| │ Calls EMA │ │ Reads raw S3 │ | |
| │ via HTTP │ │ Transforms │ | |
| └──────┬──────┘ │ Saves processed│ | |
| │ └──────────────┘ | |
| ▼ | |
| ┌─────────────┐ | |
| │ EMA │ (external AI inference service) | |
| │ /v2, /v3 │ | |
| └──────┬──────┘ | |
| │ HTTP webhook callback | |
| ▼ | |
| JobCallbackQueue (SQS) | |
| │ | |
| ▼ | |
| ┌─────────────┐ | |
| │ jobCallback │ | |
| │ │ | |
| │ Maps EMA │ | |
| │ status to │ | |
| │ Analytics │ | |
| │ Change │──────► DynamoDB ──► Stream ──► jobScheduler | |
| └─────────────┘ (loop back to evaluate next steps) | |
| Key design pattern: The jobScheduler is triggered by DynamoDB Streams on the | |
| analytics changes table. Every state change (from any source) triggers it to | |
| re-evaluate what should happen next. This is the central decision point that | |
| would be replaced by Temporal workflow logic. | |
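The essence of that re-evaluation step can be sketched as a pure diff function (names are hypothetical; the real logic lives in src/core/job/jobScheduler.ts):

```typescript
// Hypothetical sketch: on each stream record, diff old vs new status for a
// job and decide which SQS queue (if any) should receive work.
type Action =
  | { kind: "enqueue-trigger"; jobType: string }
  | { kind: "enqueue-import"; jobType: string }
  | null;

function evaluateChange(
  jobType: string,
  oldStatus: string | null,
  newStatus: string | null,
): Action {
  if (oldStatus === newStatus) return null;      // stream replay, no-op
  if (newStatus === "pending")
    return { kind: "enqueue-trigger", jobType }; // ready to call EMA
  if (newStatus === "succeeded")
    return { kind: "enqueue-import", jobType };  // raw output ready on S3
  return null; // intermediate states need no scheduling action
}
```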
| EMA Communication Details | |
| Triggering a job (src/integrations/ema/client.ts): | |
| - RAS generates presigned S3 PUT URLs where EMA should upload results | |
| - RAS sends HTTP POST to EMA with job inputs, metadata, and a callback URL | |
| - EMA versions: v2 for | |
| event-detection/shot-details/match-details/match-report, v3 for | |
| ball-tracking/player-tracking | |
| Callback from EMA (src/integrations/ema/callbackHandler.ts): | |
| - EMA POSTs to /recordings/{id}/jobs/callback when job state changes | |
| - The callback goes through SQS → jobCallback Lambda | |
| - RAS validates the job_id is still current (prevents stale callbacks) | |
| - Maps EMA status to an AnalyticsChange record in DynamoDB | |
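A minimal sketch of that stale-callback guard (the field names follow the callback payload fields mentioned in the ticket discussion; the helper itself is hypothetical):

```typescript
// Hypothetical guard: a callback is applied only if its job_id matches the
// job id currently recorded for that job type, so late callbacks from a
// superseded job run are discarded.
interface EmaCallback {
  job_kind: string;
  job_id: string;
  job_state: string;
}

function isCurrentCallback(
  callback: EmaCallback,
  currentJobIds: Record<string, string>,
): boolean {
  return currentJobIds[callback.job_kind] === callback.job_id;
}
```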
| Transformations During Import | |
| When a job succeeds, the importer reads raw EMA output from S3 and transforms | |
| it: | |
| Job Type: player-tracking | |
| Raw Format: CSV (v2/v3) | |
| Transformation: Normalize coordinates to [0,1] range using field dimensions; | |
| merge ball tracking data by PTS/timestamp; v2: convert | |
| frameId→timestamp via FPS | |
| ──────────────────────────────────────── | |
| Job Type: ball-tracking | |
| Raw Format: CSV | |
| Transformation: Stored raw; merged into player-tracking during that import | |
| ──────────────────────────────────────── | |
| Job Type: event-detection | |
| Raw Format: JSON | |
| Transformation: Validated against schema (v2, v2.1, v2.2, v3) | |
| ──────────────────────────────────────── | |
| Job Type: match-details | |
| Raw Format: CSV | |
| Transformation: Parsed and stored as JSON | |
| ──────────────────────────────────────── | |
| Job Type: shot-details | |
| Raw Format: JSON | |
| Transformation: Validated and stored | |
| ──────────────────────────────────────── | |
| Job Type: match-report | |
| Raw Format: JSON | |
| Transformation: Validated and stored | |
| ──────────────────────────────────────── | |
| Job Type: match-momentum | |
| Raw Format: CSV | |
| Transformation: Parsed and stored in DynamoDB | |
| The PTS-to-timestamp conversion mentioned in the ticket happens in the player | |
| tracking importer (src/core/playerTracking/index.ts): v2 format uses frameId / | |
| fps to get seconds, while v3 has timestamp_in_seconds directly. | |
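That conversion can be sketched as a small helper (the row shapes are assumptions based on the description above, not the actual importer types):

```typescript
// Hypothetical sketch of the pts-to-timestamp conversion: v2 rows carry a
// frame id that must be divided by the recording's FPS; v3 rows already
// carry seconds directly.
type TrackingRow =
  | { version: "v2"; frameId: number }
  | { version: "v3"; timestamp_in_seconds: number };

function toSeconds(row: TrackingRow, fps: number): number {
  return row.version === "v2" ? row.frameId / fps : row.timestamp_in_seconds;
}
```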
| --- | |
| Responsibility 2: Delivering Data to Clients | |
| RAS serves processed analytics data via REST API behind CloudFront. | |
| API Endpoint Map | |
| /recordings/{recording_id}/ | |
| │ | |
| ├── analytics GET (job state summary) No cache | |
| ├── analytics POST (replay state - Zola auth) No cache | |
| ├── analytics/logs GET (processing logs) No cache | |
| │ | |
| ├── player-tracking GET (merged + normalized data) 7-14 day cache | |
| ├── player-tracking/ | |
| │ └── jersey-numbers GET (jersey numbers by period) 7-14 day cache | |
| ├── player-moments GET (key player moments) 7-14 day cache | |
| ├── match-momentum GET (momentum timeline) 7-14 day cache | |
| ├── match-details GET (match stats) 7-14 day cache | |
| ├── match-details/ | |
| │ └── attacking-moments GET (sub-analysis) 7-14 day cache | |
| ├── events GET (event detections, ?type=) 7-14 day cache | |
| ├── shot-details GET (shot/goal data) 7-14 day cache | |
| ├── match-report GET (match summary) No cache | |
| │ | |
| ├── jobs POST (trigger jobs - Zola auth) | |
| ├── jobs DELETE (cancel jobs - Zola auth) | |
| ├── jobs/callback POST (EMA callback → SQS) | |
| │ | |
| ├── upload-url GET (presigned S3 URL - EMA auth) | |
| ├── raw-files GET (raw file links - Zola auth) 60s cache | |
| └── duplicate POST (copy analytics - Zola auth) | |
| On-the-Fly Transformations in Delivery | |
| The GET endpoints don't just serve stored files — several do real-time | |
| processing: | |
| /player-tracking — The most complex: | |
| Raw player tracking CSV (S3) | |
| + | |
| Raw ball tracking CSV (S3) ──► Merge by timestamp/PTS | |
| + │ | |
| Recording metadata (field dims) ▼ | |
| + Normalize coordinates | |
| Query params (?start, ?end, │ | |
| ?jersey, ?side, ?filters) ▼ | |
| Filter by time range, | |
| jersey, side, etc. | |
| │ | |
| ▼ | |
| JSON response with | |
| merged player+ball data | |
| /events — Filters raw event JSON by ?type= query parameter | |
| /jersey-numbers — Extracts and groups jersey numbers by team side and period | |
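The normalize-and-filter step for /player-tracking can be sketched roughly like this (the shapes and names are hypothetical illustrations of the flow above):

```typescript
// Hypothetical sketch: normalize pitch coordinates into the [0,1] range
// using field dimensions, then filter by the ?start/?end time window.
interface Position { t: number; x: number; y: number }

function normalizeAndFilter(
  rows: Position[],
  field: { width: number; height: number },
  start: number,
  end: number,
): Position[] {
  return rows
    .filter((r) => r.t >= start && r.t <= end)
    .map((r) => ({ t: r.t, x: r.x / field.width, y: r.y / field.height }));
}
```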
| Authentication | |
| ┌────────────┐  X-Token header    ┌───────────────────┐ | |
| │    Zola    │ ─────────────────► │   ZolaTokenAuth   │──► POST/DELETE endpoints | |
| └────────────┘                    │ (Secrets Manager) │ | |
|                                   └───────────────────┘ | |
|  | |
| ┌────────────┐  X-Token header    ┌───────────────────┐ | |
| │    EMA     │ ─────────────────► │   EmaTokenAuth    │──► /upload-url endpoint | |
| └────────────┘                    │ (Secrets Manager) │ | |
|                                   └───────────────────┘ | |
| GET endpoints (analytics data) ──► No auth (public, cached by CloudFront) | |
| Caching Architecture | |
| Client | |
| │ | |
| ▼ | |
| CloudFront (Layer 1) | |
| │ Cache-Control: public, max-age=604800 (7 days) | |
| │ Cache keys include query params: start, end, jersey, side, type, etc. | |
| │ | |
| ▼ | |
| API Gateway → Lambda (Layer 2) | |
| │ Application-level cache for player tracking (CacheRepository → S3) | |
| │ | |
| ▼ | |
| Recordings API calls (Layer 3) | |
| │ Undici HTTP cache interceptor: 5-minute TTL on GET 200s | |
| │ | |
| ▼ | |
| S3 / DynamoDB (data source) | |
| Domain Event Publishing | |
| When jobs complete, RAS publishes to EventBridge (this is how MES and other | |
| services learn about results): | |
| DynamoDB Stream (analytics state changes) | |
| │ | |
| ▼ | |
| OutboundEventPublisher Lambda | |
| │ | |
| ▼ | |
| DomainEventBroadcaster evaluates: | |
| │ | |
| ├── shot-details completed? ──► ShotDetailsProcessedV1 | |
| ├── event-detection completed? ──► EventDetectionProcessedV1 | |
| ├── match-report completed? ──► MatchReportProcessedV1 | |
| └── ALL jobs completed? ──► RecordingAnalysisProcessedV1 | |
| (includes links to all endpoints) | |
| Event format: Source: "com.veo.recording-analytics", published to shared | |
| EventBridge bus. | |
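The broadcaster's decision table can be sketched as a pure function (the helper is hypothetical; the event names are the ones listed above):

```typescript
// Hypothetical sketch of the DomainEventBroadcaster decision table.
const PER_JOB_EVENTS: Record<string, string> = {
  "shot-details": "ShotDetailsProcessedV1",
  "event-detection": "EventDetectionProcessedV1",
  "match-report": "MatchReportProcessedV1",
};

function eventsFor(completedJob: string, allJobsCompleted: boolean): string[] {
  const events: string[] = [];
  const perJob = PER_JOB_EVENTS[completedJob];
  if (perJob) events.push(perJob);
  // The aggregate event fires once every job for the recording is completed.
  if (allJobsCompleted) events.push("RecordingAnalysisProcessedV1");
  return events;
}
```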
| --- | |
| Responsibility 3: Storage | |
| S3 Bucket Layout | |
| ras-raw-analytics-bucket-{env}/ | |
| ├── ball-tracking/{recordingId}.csv | |
| ├── player-tracking/{recordingId}.csv | |
| ├── match-details/{recordingId}.csv | |
| ├── match-momentum/{recordingId}.csv | |
| ├── events/{recordingId}.json | |
| ├── shot-details/{recordingId}.json | |
| ├── match-report/{recordingId}.json | |
| └── physical-metrics/{recordingId}.parquet | |
| ras-events-bucket-{env}/ | |
| └── {recordingId}.json (processed events) | |
| ras-player-tracking-bucket-{env}/ | |
| └── {recordingId}.json (processed player tracking) | |
| ras-match-details-bucket-{env}/ | |
| └── {recordingId}.json (processed match details) | |
| ras-shot-details-bucket-{env}/ | |
| └── {recordingId}.json (processed shot details) | |
| ras-match-report-bucket-{env}/ | |
| └── {recordingId}.json (processed match report) | |
| ras-cache-bucket-{env}/ (7-day auto-expiry lifecycle rule) | |
| └── playerTracking:{recordingId} (cached transformed data) | |
| DynamoDB Tables | |
| Table: analyticsStates | |
| PK: id (recordingId) | |
| SK: — | |
| Purpose: Current job statuses (projected view) | |
| ──────────────────────────────────────── | |
| Table: analyticsStateChanges | |
| PK: id (recordingId) | |
| SK: version | |
| Purpose: Event sourcing log (immutable) | |
| ──────────────────────────────────────── | |
| Table: recordingMetadata | |
| PK: id (recordingId) | |
| SK: — | |
| Purpose: FPS, duration, pitch dims, camera URIs | |
| ──────────────────────────────────────── | |
| Table: matchMomentum | |
| PK: id (recordingId) | |
| SK: — | |
| Purpose: Momentum timeline data | |
| ──────────────────────────────────────── | |
| Table: dataDuplication | |
| PK: id (recordingId) | |
| SK: — | |
| Purpose: Tracks duplicated recordings | |
| The analyticsStateChanges table has DynamoDB Streams enabled — this is the | |
| backbone of the orchestration, triggering the jobScheduler on every state | |
| change. | |
| Presigned URL Flow | |
| EMA needs to upload results: | |
| EMA ──► GET /upload-url?jobType=ball-tracking | |
| │ | |
| ▼ | |
| RAS generates PutObject presigned URL (3-day expiry) | |
| for: ras-raw-analytics-bucket/{jobType}/{recordingId}.csv | |
| │ | |
| ▼ | |
| EMA uploads file directly to S3 | |
| Client needs to read raw files: | |
| Zola ──► GET /raw-files | |
| │ | |
| ▼ | |
| RAS generates GetObject presigned URLs (15-min expiry) | |
| for each raw file that exists | |
| │ | |
| ▼ | |
| Returns JSON with download links | |
| File Lifecycle | |
| - Raw files: Created by EMA upload, persist indefinitely (no lifecycle rule) | |
| - Processed files: Created by importers, persist indefinitely | |
| - Cache files: Auto-deleted after 7 days (S3 lifecycle rule on cache bucket) | |
| - Data duplication: DataDuplicator can copy all files from one recording to | |
| another | |
| --- | |
| Mapping to Temporal Migration Concerns | |
| Here's how the codebase maps to the questions in the ticket: | |
| What moves to Temporal? | |
| The entire orchestration pipeline (the 5 Lambdas + DynamoDB Streams trigger | |
| loop): | |
| TODAY (RAS internal orchestration): | |
| PrepJobQueue → jobPrepper → DynamoDB → Stream → jobScheduler | |
| → JobTriggerQueue → jobTrigger → EMA → callback → jobCallback | |
| → DynamoDB → Stream → jobScheduler → ImporterQueue → jobImporter | |
| FUTURE (Temporal): | |
| Temporal Workflow: | |
| 1. Validate recording (activity) | |
| 2. Trigger independent jobs in parallel (activities) | |
| 3. Wait for EMA callbacks (signals/activities) | |
| 4. When all 3 independent done → trigger dependent jobs (activities) | |
| 5. Wait for EMA callbacks | |
| 6. Import results (activities) | |
| 7. Publish domain events (activity) | |
| What stays in RAS? | |
| - Delivery layer: All GET API endpoints with on-the-fly transformations | |
| - Storage: S3 buckets, DynamoDB tables, presigned URL generation | |
| - Authentication: Token-based authorizers for Zola/EMA | |
| Shadow mode / dual orchestration | |
| The ticket mentions running both flows in parallel. Key integration points: | |
| - Job state updates: Temporal would need to write to the same | |
| analyticsStateChanges DynamoDB table (or RAS needs an API to accept state | |
| updates) | |
| - EMA callbacks: Currently go to /recordings/{id}/jobs/callback → SQS. | |
| Temporal could either receive these callbacks directly or RAS could forward | |
| them | |
| - Domain events: Currently published via DynamoDB Streams → EventBridge. If | |
| Temporal manages state, it needs to trigger these events too | |
| Transformations: Temporal vs MES vs RAS? | |
| ┌───────────────────────┬─────────────────────────────┬──────────────────────┐ | |
| │ Transformation │ Current Location │ Migration Target │ | |
| ├───────────────────────┼─────────────────────────────┼──────────────────────┤ | |
| │ PTS → timestamp │ jobImporter │ Temporal activity or │ | |
| │ │ (player-tracking) │ MES │ | |
| ├───────────────────────┼─────────────────────────────┼──────────────────────┤ | |
| │ Coordinate │ Delivery (player-tracking │ MES │ | |
| │ normalization │ GET) │ │ | |
| ├───────────────────────┼─────────────────────────────┼──────────────────────┤ | |
| │ Ball+player merge │ Delivery (player-tracking │ MES │ | |
| │ │ GET) │ │ | |
| ├───────────────────────┼─────────────────────────────┼──────────────────────┤ | |
| │ Event type filtering │ Delivery (events GET) │ MES │ | |
| ├───────────────────────┼─────────────────────────────┼──────────────────────┤ | |
| │ Jersey number │ Delivery (player-tracking │ MES │ | |
| │ extraction │ GET) │ │ | |
| ├───────────────────────┼─────────────────────────────┼──────────────────────┤ | |
| │ JSON schema │ jobImporter (events) │ Temporal activity │ | |
| │ validation │ │ │ | |
| └───────────────────────┴─────────────────────────────┴──────────────────────┘ | |
| File storage questions from ticket | |
| - "Should Temporal query RAS for presigned URLs?" — RAS already has GET | |
| /upload-url (EMA-auth protected). Temporal could use this, or generate URLs | |
| directly if it has S3 access. | |
| - "Raw AI output vs RAS transformed files?" — Today these are in separate S3 | |
| buckets (raw-analytics-bucket vs per-type processed buckets). Making them | |
| distinct asset types in recordings-api would be clean. | |
| - "7-day cached files" — Only the ras-cache-bucket has the 7-day lifecycle. | |
| Raw and processed files persist indefinitely. The cache stores pre-computed | |
| player-tracking responses. | |
| Client interfaces (where RAS "stops") | |
| Zola ◄──── RAS API (GET endpoints + POST /jobs, /analytics, /duplicate) | |
| │ | |
| ├── Zola uses: player-tracking, events, match-details, | |
| │ shot-details, match-report, match-momentum, | |
| │ player-moments, analytics state, raw-files | |
| │ | |
| MES ◄──── EventBridge domain events | |
| │ | |
| ├── MES listens for: RecordingAnalysisProcessedV1, | |
| │ EventDetectionProcessedV1, ShotDetailsProcessedV1, | |
| │ MatchReportProcessedV1 | |
| │ | |
| EMA ◄──── RAS (HTTP POST to trigger jobs) | |
| EMA ────► RAS (HTTP webhook callback + S3 upload) | |
| Key files for the migration team | |
| Concern: Job types & dependencies | |
| Key Files: src/core/models/types/index.ts | |
| ──────────────────────────────────────── | |
| Concern: Orchestration logic | |
| Key Files: src/core/job/jobScheduler.ts | |
| ──────────────────────────────────────── | |
| Concern: DynamoDB stream trigger | |
| Key Files: src/apps/jobScheduler/analyticsStateDynamoDBStreamHandler.ts | |
| ──────────────────────────────────────── | |
| Concern: EMA client | |
| Key Files: src/integrations/ema/client.ts | |
| ──────────────────────────────────────── | |
| Concern: EMA callback handling | |
| Key Files: src/integrations/ema/callbackHandler.ts | |
| ──────────────────────────────────────── | |
| Concern: Job state model | |
| Key Files: src/core/models/analyticState.ts | |
| ──────────────────────────────────────── | |
| Concern: Data importers | |
| Key Files: src/core/dataImporters/ (6 importers) | |
| ──────────────────────────────────────── | |
| Concern: Domain event publishing | |
| Key Files: src/core/domainEventBroadcaster/domainEventBroadcaster.ts | |
| ──────────────────────────────────────── | |
| Concern: API endpoints (delivery) | |
| Key Files: src/adapters/driving/AWSLambda/controllers/ | |
| ──────────────────────────────────────── | |
| Concern: Storage infrastructure | |
| Key Files: stack/recording-analytics-storage-stack.ts | |
| ──────────────────────────────────────── | |
| Concern: Queue definitions | |
| Key Files: stack/recordingAnalyticsQueue.ts | |
| ──────────────────────────────────────── | |
| Concern: Full stack wiring | |
| Key Files: stack/recording-analytics-stack.ts | |
| ✻ Crunched for 4m 19s | |
| ❯ /rename | |
| ⎿ Please provide a name for the session. Usage: /rename | |
| ❯ /rename ras-deep-dive | |
| ⎿ Session renamed to: ras-deep-dive | |
| ❯ /export | |
| ⎿ Conversation copied to clipboard | |
| ❯ ok let's focus on just the /jobs POST endpoint, and the best way to | |
| deprecate that with incorporating a Temporal workflow. For now, let's | |
| keep the rest of the GET endpoints as APIs in RAS. all the orchestration | |
| with Dynamodb streams needs to go away. Also explore the question of | |
| whether we need any of the SQS queues involved or do we handle that | |
| within temporal itself. | |
| ⏺ I'll explore the specific components involved in the /jobs POST endpoint and | |
| the orchestration flow to understand exactly what needs to change for | |
| Temporal. | |
| ⏺ 3 Explore agents finished (ctrl+o to expand) | |
| ├─ Explore /jobs POST endpoint · 35 tool uses · 39.7k tokens | |
| │ ⎿ Done | |
| ├─ Explore SQS queues and purposes · 57 tool uses · 54.3k tokens | |
| │ ⎿ Done | |
| └─ Explore EMA callback flow · 48 tool uses · 59.3k tokens | |
| ⎿ Done | |
| ⏺ Excellent - I now have a detailed understanding of the orchestration flow. Let | |
| me launch a Plan agent to design the migration approach. | |
| ⏺ Plan(Design Temporal migration plan) | |
| ⎿ Done (45 tool uses · 63.0k tokens · 2m 46s) | |
| ⏺ Let me write the plan to the plan file and then ask some clarifying questions. | |
| ⏺ Updated plan | |
| ⎿ /plan to preview | |
| ⏺ I've written the migration plan. Before finalizing, I have a few questions to | |
| clarify: | |
| ⏺ User answered Claude's questions: | |
| ⎿ · Where will Temporal be hosted? → Temporal Cloud (Recommended) | |
| · How should the Temporal worker be deployed? → TBD with infra team | |
| · For shadow mode, EMA will receive duplicate job requests. How should | |
| this be handled? → Skip shadow mode | |
| · How should EMA callbacks reach the Temporal workflow? → Keep SQS | |
| buffer initially | |
| ⏺ Thanks for the clarifications. Let me update the plan with these decisions. | |
| ⏺ Updated plan | |
| ⎿ /plan to preview | |
| ⏺ Updated plan | |
| ⎿ /plan to preview | |
| ⏺ Updated plan | |
| ⎿ /plan to preview | |
| ⏺ Updated plan | |
| ⎿ /plan to preview | |
| ⏺ Read 1 file (ctrl+o to expand) | |
| ⏺ Let me fix the verification section to remove the shadow mode reference since | |
| we're skipping it. | |
| ⏺ Updated plan | |
| ⎿ /plan to preview | |
| ⏺ The plan is complete and reflects your decisions: | |
| Summary: | |
| - Temporal Cloud for hosting (managed) | |
| - Keep SQS buffer for EMA callbacks (SQS → Lambda → Temporal signal) | |
| - Skip shadow mode - validate in staging, then cut over to production | |
| - Worker deployment TBD with infrastructure team | |
| Key changes: | |
| - POST /jobs will start a Temporal workflow instead of publishing to SQS | |
| - 4 Lambdas eliminated: jobPrepper, jobScheduler, jobTrigger, jobImporter | |
| - 3 SQS queues eliminated: PrepJobQueue, TriggerJobQueue, ImporterJobQueue | |
| - JobCallbackQueue kept as buffer for EMA callbacks | |
| - DynamoDB streams orchestration removed; workflow handles dependency logic | |
| - Activities wrap existing services (PrepareJobsService, TriggerJobService, | |
| importers) | |
| ⏺ User approved Claude's plan | |
| ⎿ Plan saved to: ~/.claude/plans/silly-questing-turtle.md · /plan to edit | |
| RAS Temporal Migration Plan | |
| Goal | |
| Replace the /jobs POST endpoint orchestration with Temporal workflows, | |
| removing DynamoDB streams-based orchestration and SQS queues while | |
| keeping GET endpoints unchanged. | |
| --- | |
| Current State | |
| POST /jobs → InitJobService → PrepJobQueue (SQS) | |
| ↓ | |
| jobPrepper Lambda | |
| ↓ | |
| DynamoDB (analyticsStateChanges) | |
| ↓ | |
| DynamoDB Stream | |
| ↓ | |
| jobScheduler Lambda | |
| ↓ ↓ | |
| TriggerJobQueue ImporterJobQueue | |
| ↓ ↓ | |
| jobTrigger jobImporter | |
| ↓ | |
| EMA (async) | |
| ↓ | |
| /jobs/callback → JobCallbackQueue → jobCallback | |
| ↓ | |
| DynamoDB → Stream → | |
| jobScheduler (loop) | |
| 3 SQS queues to eliminate: | |
| - PrepJobQueue | |
| - JobTriggerQueue | |
| - JobImporterQueue | |
| (JobCallbackQueue is kept as a durability buffer for EMA callbacks) | |
| 5 Lambdas to replace: | |
| - jobPrepper | |
| - jobScheduler | |
| - jobTrigger | |
| - jobImporter | |
| - jobCallback (partial - keeps HTTP endpoint) | |
| --- | |
| Target Architecture | |
| POST /jobs → CreateJobs Controller | |
| ↓ | |
| Temporal Client.start() | |
| ↓ | |
| RecordingAnalyticsWorkflow | |
| ↓ | |
| ┌───────────────┴───────────────┐ | |
| ↓ ↓ | |
| prepareJobs [parallel] triggerJob × 3 | |
| activity (player-tracking, ball-tracking, | |
| event-detection) | |
| ↓ ↓ | |
| DynamoDB EMA HTTP calls | |
| (metadata) ↓ | |
| await signal (emaJobCallback) | |
| ↓ | |
| [parallel] importers | |
| ↓ | |
| [parallel] triggerJob × 3 | |
| (match-details, shot-details, match-report) | |
| ↓ | |
| await signals + importers | |
| ↓ | |
| Workflow complete | |
| EMA Callbacks (keep SQS buffer): | |
| /jobs/callback → SQS → Lambda → Temporal signal to workflow | |
| --- | |
| Workflow Design | |
| Single Workflow per Recording | |
| recordingAnalyticsWorkflow(recordingId, jobTypes[], actor) | |
| │ | |
| ├── Activity: prepareJobs() | |
| │ └── Validates recording, fetches metadata, writes to DynamoDB | |
| │ | |
| ├── Phase 1: Independent Jobs (parallel) | |
| │ ├── triggerJob("player-tracking") → await signal → importPlayerTracking() | |
| │ ├── triggerJob("ball-tracking") → await signal | |
| │ └── triggerJob("event-detection") → await signal → importEvents() | |
| │ | |
| └── Phase 2: Dependent Jobs (parallel, after Phase 1 succeeds) | |
| ├── triggerJob("match-details") → await signal → importMatchDetails() | |
| ├── triggerJob("shot-details") → await signal → importShotDetails() | |
| └── triggerJob("match-report") → await signal → importMatchReport() | |
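The two-phase ordering above can be sketched with plain promises. This is illustrative only: a real Temporal workflow would express the same shape with `proxyActivities`, `defineSignal`, and `condition` from `@temporalio/workflow`, and the step functions here are placeholders injected for testability.

```typescript
type JobType = string;
type Step = (job: JobType) => Promise<void>;

// Run trigger → await callback → import for each job; jobs within a
// phase run in parallel, and the phase resolves only when all complete.
async function runPhase(
  jobs: JobType[],
  trigger: Step,
  awaitCallback: Step,
  importResult: Step,
): Promise<void> {
  await Promise.all(
    jobs.map(async (job) => {
      await trigger(job);
      await awaitCallback(job); // in Temporal: condition() on a signal flag
      await importResult(job);
    }),
  );
}

async function recordingAnalyticsWorkflow(
  trigger: Step,
  awaitCallback: Step,
  importResult: Step,
): Promise<void> {
  // Phase 1: independent jobs
  await runPhase(
    ['player-tracking', 'ball-tracking', 'event-detection'],
    trigger, awaitCallback, importResult,
  );
  // Phase 2 starts only after every Phase 1 job has finished
  await runPhase(
    ['match-details', 'shot-details', 'match-report'],
    trigger, awaitCallback, importResult,
  );
}
```

The `Promise.all` boundary between the two `runPhase` calls is what encodes the Phase 1 → Phase 2 dependency; there is no queue or stream handler involved.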
| EMA Callback → SQS → Temporal Signal | |
| Keep SQS buffer for durability during transition: | |
| EMA HTTP callback → API Gateway → JobCallbackQueue (SQS) | |
| ↓ | |
| jobCallback Lambda (modified) | |
| ↓ | |
| Temporal signal to workflow | |
| // Modified SQSJobCallbackConsumer | |
| // (assumes SQSEvent from 'aws-lambda', emaCallbackType and | |
| // getTemporalClient are already imported in this file) | |
| async processSQSEvent(event: SQSEvent): Promise<void> { | |
|   // SQS can deliver a batch: handle every record, not just the first. | |
|   for (const { body, messageAttributes } of event.Records) { | |
|     const recordingId = messageAttributes.recordingId?.stringValue; | |
|     if (!recordingId) { | |
|       throw new Error('SQS record is missing the recordingId attribute'); | |
|     } | |
|     const emaData = JSON.parse(body) as emaCallbackType; | |
|     // Send signal to Temporal workflow instead of writing to DynamoDB | |
|     const client = getTemporalClient(); | |
|     const handle = client.workflow.getHandle(`ras-${recordingId}`); | |
|     await handle.signal('emaJobCallback', { | |
|       jobType: emaData.job_kind, | |
|       jobId: emaData.job_id, | |
|       status: emaData.job_state, | |
|       exitCode: emaData.job_exit_code, | |
|       reason: emaData.job_reason, | |
|     }); | |
|   } | |
| } | |
| State Persistence (for GET endpoints) | |
| Activities write to DynamoDB so existing GET endpoints work unchanged: | |
| - analyticsStateChanges table (event sourcing) | |
| - analyticsStates table (projected view via existing stream handler) | |
| - recordingMetadata table | |
| --- | |
| Activities to Create | |
| Activity: prepareJobs | |
| Wraps: PrepareJobsService.prepareJobs() | |
| Purpose: Validation, metadata fetch | |
| ──────────────────────────────────────── | |
| Activity: triggerJob | |
| Wraps: TriggerJobService.triggerJob() | |
| Purpose: Call EMA to start job | |
| ──────────────────────────────────────── | |
| Activity: importEvents | |
| Wraps: EventsImporter | |
| Purpose: Import event detection results | |
| ──────────────────────────────────────── | |
| Activity: importPlayerTracking | |
| Wraps: PlayerTrackingImporter + MatchMomentumImporter | |
| Purpose: Import tracking data | |
| ──────────────────────────────────────── | |
| Activity: importMatchDetails | |
| Wraps: MatchDetailsImporter | |
| Purpose: Import match details | |
| ──────────────────────────────────────── | |
| Activity: importShotDetails | |
| Wraps: ShotDetailsImporter | |
| Purpose: Import shot details | |
| ──────────────────────────────────────── | |
| Activity: importMatchReport | |
| Wraps: MatchReportImporter | |
| Purpose: Import match report | |
| ──────────────────────────────────────── | |
| Activity: persistJobState | |
| Wraps: AnalyticsChangeRepository.put() | |
| Purpose: Write state to DynamoDB | |
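Each activity in the table is a plain async function that delegates to the existing RAS service; retries and timeouts come from the workflow's activity options, not from the wrapper itself. A hedged sketch for `triggerJob` (the `TriggerJobService` interface shape is assumed, and the service is injected so the wrapper stays testable without AWS or EMA):

```typescript
// Assumed shape of the existing service; the real class lives in RAS core.
interface TriggerJobServiceLike {
  triggerJob(recordingId: string, jobType: string): Promise<{ jobId: string }>;
}

// Factory so the Temporal worker can inject the real service at startup.
function makeTriggerJobActivity(service: TriggerJobServiceLike) {
  return async function triggerJob(input: {
    recordingId: string;
    jobType: string;
  }): Promise<{ jobId: string }> {
    // No retry/backoff here: the workflow's ActivityOptions own that policy.
    return service.triggerJob(input.recordingId, input.jobType);
  };
}
```

The same factory pattern would apply to the importer and persistJobState wrappers, keeping the activities thin and the existing services untouched.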
| --- | |
| Files to Create | |
| src/temporal/ | |
| ├── worker.ts # Temporal worker entry point | |
| ├── client.ts # Temporal client factory | |
| ├── workflows/ | |
| │ └── recordingAnalytics.ts # Main workflow | |
| ├── activities/ | |
| │ ├── index.ts # Activity exports | |
| │ ├── prepareJobs.ts # Wrap PrepareJobsService | |
| │ ├── triggerJob.ts # Wrap TriggerJobService | |
| │ ├── importers.ts # Wrap all data importers | |
| │ └── persistState.ts # DynamoDB state writes | |
| └── signals/ | |
| └── emaCallback.ts # Signal type definitions | |
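A sketch of what `signals/emaCallback.ts` might contain: the signal payload shape plus the snake_case to camelCase mapping from the raw EMA body. Field names are taken from the callback consumer above; the real file would additionally export the signal via `defineSignal` from `@temporalio/workflow`, which is omitted here to keep the sketch SDK-free.

```typescript
// Payload delivered with the 'emaJobCallback' signal (shape assumed).
interface EmaJobCallbackSignal {
  jobType: string;
  jobId: string;
  status: string;
  exitCode?: number;
  reason?: string;
}

// Map the raw EMA callback body (snake_case) to the signal payload, so the
// translation lives in one place instead of being inlined in the Lambda.
function toSignalPayload(emaData: {
  job_kind: string;
  job_id: string;
  job_state: string;
  job_exit_code?: number;
  job_reason?: string;
}): EmaJobCallbackSignal {
  return {
    jobType: emaData.job_kind,
    jobId: emaData.job_id,
    status: emaData.job_state,
    exitCode: emaData.job_exit_code,
    reason: emaData.job_reason,
  };
}
```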
| Files to Modify | |
| File: src/adapters/driving/AWSLambda/controllers/job/createJobs.ts | |
| Change: Add Temporal workflow start | |
| ──────────────────────────────────────── | |
| File: src/apps/jobCallback/SQSJobCallbackConsumer.ts | |
| Change: Replace SQS processing with Temporal signal | |
| ──────────────────────────────────────── | |
| File: stack/recording-analytics-stack.ts | |
| Change: Add Temporal worker infrastructure | |
| ──────────────────────────────────────── | |
| File: package.json | |
| Change: Add @temporalio/client, @temporalio/worker, | |
| @temporalio/workflow | |
| Files to Remove (after migration) | |
| - src/apps/jobPrepper/* | |
| - src/apps/jobScheduler/* | |
| - src/apps/jobTrigger/* | |
| - src/apps/jobImporter/* | |
| - src/core/job/jobScheduler.ts | |
| - src/core/job/initJobService.ts | |
| - src/core/publishers/prepJobQueue.ts | |
| - src/core/publishers/jobTriggerQueue.ts | |
| - src/core/publishers/jobImporterQueue.ts | |
| - Queue definitions in stack/recordingAnalyticsQueue.ts | |
| --- | |
| Migration Phases | |
| Phase 1: Foundation | |
| - Set up Temporal Cloud namespace | |
| - Create src/temporal/ directory structure | |
| - Implement activities wrapping existing services | |
| - Implement workflow with dependency logic | |
| - Unit test workflows with mocked activities | |
| Phase 2: Integration (Staging) | |
| - Deploy Temporal worker to staging | |
| - Modify callback handler to send Temporal signals (keep SQS buffer) | |
| - Add ORCHESTRATION_MODE env var (legacy | temporal) | |
| - Disable legacy flow in staging, enable Temporal | |
| - End-to-end testing: trigger jobs, verify callbacks, check GET | |
| endpoints | |
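The `ORCHESTRATION_MODE` switch could be as small as a validated env lookup in the createJobs controller. A sketch (the variable name and its two values come from the plan; the function name and failure behavior are assumptions):

```typescript
type OrchestrationMode = 'legacy' | 'temporal';

// Fail fast on typos so a bad deploy can't silently fall back to legacy.
function resolveOrchestrationMode(
  env: Record<string, string | undefined> = process.env,
): OrchestrationMode {
  const mode = env.ORCHESTRATION_MODE ?? 'legacy';
  if (mode !== 'legacy' && mode !== 'temporal') {
    throw new Error(`Unknown ORCHESTRATION_MODE: ${mode}`);
  }
  return mode;
}

// In the controller (sketch):
//   if (resolveOrchestrationMode() === 'temporal') { start the workflow }
//   else { publish to PrepJobQueue as today }
```

Defaulting to `legacy` when the variable is unset keeps existing environments unchanged until they are explicitly opted in.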
| Phase 3: Production Cutover | |
| - Deploy Temporal worker to production | |
| - Switch ORCHESTRATION_MODE to temporal | |
| - Monitor for issues | |
| - Keep legacy code available for rollback (2 weeks) | |
| Phase 4: Cleanup | |
| - Remove legacy Lambdas (jobPrepper, jobScheduler, jobTrigger, | |
| jobImporter) | |
| - Remove SQS queues (PrepJobQueue, TriggerJobQueue, ImporterJobQueue) | |
| - Keep JobCallbackQueue (still used for SQS → Signal flow) | |
| - Update CDK stack | |
| - Update documentation | |
| --- | |
| Verification | |
| 1. Unit tests: Mock activities, test workflow logic (dependency | |
| ordering, signal handling, timeout handling) | |
| 2. Integration tests (staging): | |
| - Trigger jobs via POST /jobs | |
| - Verify EMA receives requests and sends callbacks | |
| - Verify callbacks signal the correct workflow | |
| - Verify DynamoDB state is updated correctly | |
| 3. GET endpoint verification: Ensure all GET endpoints return same data | |
| after migration | |
| 4. Rollback test: Verify switching ORCHESTRATION_MODE back to legacy | |
| works | |
| --- | |
| Decisions | |
| - Temporal hosting: Temporal Cloud (managed) | |
| - Worker deployment: TBD with infrastructure team | |
| - Shadow mode: Skip - go directly from staging validation to production | |
| cutover | |
| - Callback approach: Keep SQS buffer initially (SQS → Lambda → Temporal | |
| signal) | |
| ⏺ The plan is approved. Let me create tasks to track the implementation. | |
| ⏺ Read 1 file (ctrl+o to expand) | |
| ⎿ Interrupted · What should Claude do instead? |