@deepakjois
Created February 2, 2026 16:21
RAS (Recording Analytics Service) Deep Dive - Temporal Migration Analysis
╭─── Claude Code v2.1.20 ──────────────────────────────────────────────────────╮
│ │ Tips for getting │
│ Welcome back Deepak! │ started │
│ │ Run /init to create a … │
│ ▐▛███▜▌ │ ─────────────────────── │
│ ▝▜█████▛▘ │ Recent activity │
│ ▘▘ ▝▝ │ No recent activity │
│ │ │
│ Opus 4.5 · Claude Pro · deepak.jois@gmail.com's │ │
│ Organization │ │
│ ~/code/veo/recording-analytics │ │
╰──────────────────────────────────────────────────────────────────────────────╯
❯ do a deep dive into this codebase and describe the various parts of it.
give a conceptual overview of the codebase with data flows and ascii
diagrams wherever possible
Here is the ticket where a dev from another infrastructure team will
assist us with migrating the codebase over to use Temporal workflows.
Organize the explanation of the codebase around themes in terms of the
ticket overview below
# Centralized orchestration > Move analytics orchestration from RAS to
temporal
Overall responsibilities of RAS:
1. Orchestrating AI jobs, i.e. sending requests to EMA and keeping track
of the order of jobs. In this process some validation and a few
transformations of the EMA output happens (e.g. changing a pts column to
a timestamp column).
2. Delivering data to clients (Zola+MES) for features in the product.
Here some transformation of the data happens on-the-fly.
3. Storing output files from AI jobs and storing the location of those
files.
---
Stages of how to roll out:
1. Move orchestration to temporal
2. Move delivery to MES
3. Move what remains of storage to recordings-api
---
Migration plan:
* RAS should be able to handle both ways of doing orchestration (temporal
+ RAS's internal orchestration)
* Shadow the existing processing workflow. Send traffic through both
flows and check that the temporal workflow produces the right stuff.
* Send a percentage of "orchestration traffic" to temporal instead of
RAS's internal orchestration
---
Questions to figure out:
* Should we completely reuse the storage in RAS in the first round?
* Should temporal query RAS for a presigned URL of where to upload
files (to the RAS S3 bucket), i.e. reuse storage in RAS completely?
* Or should the temporal workflow store the files in veo-content-ii?
* Should temporal update the job states of RAS? This will make sure
that RAS can still serve its clients exactly as it does today.
* Where does RAS "stop"?
* What is the interface between RAS/MES, RAS/Zola?
* Which transformations in RAS go to temporal and which go to MES?
* Should the first part of the project not include the transformations
that RAS currently performs today?
* RAS is producing files based on EMA output. These files are deleted
after 7 days. So RAS is essentially caching files. Should those files be
produced as part of the temporal workflow?
* Figure out if we should add separate asset types for "raw AI output"
and "RAS transformed" files?
---
For the future project about moving storage out of RAS:
* Figure out which clients of RAS use the file locations that are stored
in RAS. Ideally we would like RAS not to store file locations. Can the
clients simply look up file locations in recordings-api instead?
* Make sure that RAS job states are not required by any client
* When we start to orchestrate via temporal, RAS will not store the
state of jobs. We should make sure that no client breaks if the job
state is no longer updated. Also, L2/L3 supporters use the Zola admin
page to see if RAS/analytics failed for some reason. We need to make
sure that L2/L3 can still get information about whether or not analytics
failed. Coordinate with L2/L3 about a solution for this.
## Metadata
- URL: https://linear.app/veo-technologies/project/centralized-orchestration-move-analytics-orchestration-from-ras-to-2d39bcc34b44
- Status: Now
- Lead: Tobias Slott Jensen
- Members: Tobias Slott Jensen
- Start date: Jan 12th
- Target date: Not set
⏺ I'll do a thorough exploration of the codebase to build a comprehensive
picture. Let me launch several exploration agents in parallel to cover
different aspects.
⏺ 5 Explore agents finished (ctrl+o to expand)
├─ Explore top-level structure · 36 tool uses · 39.1k tokens
│ ⎿  Done
├─ Explore orchestration/jobs system · 55 tool uses · 67.4k tokens
│ ⎿  Done
├─ Explore API/delivery layer · 52 tool uses · 57.0k tokens
│ ⎿  Done
├─ Explore storage layer · 60 tool uses · 68.4k tokens
│ ⎿  Done
└─ Explore external integrations · 59 tool uses · 61.3k tokens
⎿  Done
⏺ Now let me put together the comprehensive overview organized around the
migration themes.
---
RAS (Recording Analytics Service) — Deep Dive
Tech Stack at a Glance
- Language: TypeScript on Node.js 22
- Runtime: AWS Lambda (serverless)
- IaC: AWS CDK
- Storage: DynamoDB (state), S3 (files)
- Messaging: SQS (job queues), EventBridge (domain events)
- Architecture: Hexagonal (Ports & Adapters), Event Sourcing
---
High-Level Architecture
┌─────────────────────────────────────────────────┐
│                     CLIENTS                     │
│    Zola (Admin UI)        MES (via EventBridge) │
└──────────┬──────────────────────┬───────────────┘
│ HTTP (X-Token) │ EventBridge
▼ ▲
┌─────────────────────────┐ │
│ CloudFront CDN │ │
│ (7-14 day cache on GET) │ │
└───────────┬─────────────┘ │
│ │
┌───────────▼─────────────┐ │
│ API Gateway │ │
│ (REST, token auth) │ │
└───────────┬─────────────┘ │
│ │
┌───────────────────────────────┼────────────────────────┤
│ RAS (Lambda Functions) │
│ │
│ ┌──────────────────────────────────────────────┐ │
│ │ 1. ORCHESTRATION │ │
│ │ │ │
│ │ jobPrepper ──► jobScheduler ──► jobTrigger │ │
│ │ │ ▲ │ │ │
│ │ │ DynamoDB │ Streams │ │ │
│ │ ▼ │ ▼ │ │
│ │ [AnalyticsState] ◄──┘ EMA (AI) │ │
│ │ │ │ │
│ │ jobImporter ◄── jobCallback ◄─────┘ │ │
│ └──────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────┐ │
│ │ 2. DELIVERY (API Handlers) │ │
│ │ │ │
│ │ GET /player-tracking (normalize + merge) │ │
│ │ GET /events (filter by type) │ │
│ │ GET /match-details (serve from S3) │ │
│ │ GET /shot-details (serve from S3) │ │
│ │ GET /match-momentum (serve from DynamoDB)│ │
│ │ GET /match-report (serve from S3) │ │
│ │ GET /analytics (job state summary) │──────┘
│ └──────────────────────────────────────────────┘ publishes
│ domain events
│ ┌──────────────────────────────────────────────┐
│ │ 3. STORAGE │
│ │ │
│ │ S3: raw files (CSV, JSON, Parquet) │
│ │ S3: processed files (JSON) │
│ │ S3: cache bucket (7-day TTL) │
│ │ DynamoDB: analytics state + change log │
│ │ DynamoDB: recording metadata │
│ └──────────────────────────────────────────────┘
└─────────────────────────────────────────────────────────┘
---
Responsibility 1: Orchestrating AI Jobs
This is the core of what the Temporal migration targets. RAS manages a
pipeline of 6 AI job types with dependency ordering, communicating with the
EMA inference service.
Job Types & Dependencies
INDEPENDENT (run in parallel) DEPENDENT (need all 3 independent
jobs)
┌──────────────────┐ ┌──────────────────┐
│ event-detection │──────────┐ │ match-details │
└──────────────────┘ │ └──────────────────┘
┌──────────────────┐ ├─────►┌──────────────────┐
│ player-tracking │──────────┤ │ shot-details │
└──────────────────┘ │ └──────────────────┘
┌──────────────────┐ │ ┌──────────────────┐
│ ball-tracking │──────────┘ │ match-report │
└──────────────────┘ └──────────────────┘
The three independent jobs can run concurrently. The three dependent jobs each
require all three independent jobs to reach succeeded/completed before they
can be triggered.
Job State Machine
Each job goes through these states:
null ──► pending ──► scheduled ──► starting ──► running ──► succeeded ──► completed
                                                            (import finishes)

(any stage) ──► failed
- pending → job is validated and ready
- scheduled → EMA has accepted the job
- starting / running → EMA is processing
- succeeded → EMA finished; raw output is on S3
- completed → RAS has imported and transformed the output
- failed → error at any stage
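Expressed as a type, a sketch only (the real model lives in src/core/models/analyticState.ts):

// The job lifecycle above as a union type (sketch, not the actual definition).
type JobStatus =
  | 'pending'    // validated and ready
  | 'scheduled'  // EMA has accepted the job
  | 'starting'
  | 'running'    // EMA is processing
  | 'succeeded'  // raw output is on S3
  | 'completed'  // RAS imported and transformed the output
  | 'failed';    // error at any stage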
The 5-Lambda Orchestration Pipeline
The orchestration is driven by SQS queues and DynamoDB Streams, spread across
5 Lambda functions:
Zola (POST /jobs)
┌────────────┐ PrepJobQueue ┌────────────┐
│ API Handler │ ──── (SQS) ────► │ jobPrepper │
└────────────┘ └──────┬─────┘
│ Validates recording metadata
│ Sets jobs to "pending"
│ Writes AnalyticsChange to DynamoDB
┌──────────────┐
│ DynamoDB │
│ (analytics │
│ changes) │
└──────┬───────┘
│ DynamoDB Stream
┌──────────────┐
│ jobScheduler │ Compares old vs new state
│ │ Evaluates dependencies
└──┬───────┬──┘
│ │
┌──────────────────────┘ └──────────────────┐
▼ ▼
JobTriggerQueue JobImporterQueue
(SQS) (SQS)
│ │
▼ ▼
┌─────────────┐ ┌──────────────┐
│ jobTrigger │ │ jobImporter │
│ │ │ │
│ Calls EMA │ │ Reads raw S3 │
│ via HTTP │ │ Transforms │
└──────┬──────┘ │ Saves processed│
│ └──────────────┘
┌─────────────┐
│ EMA │ (external AI inference service)
│ /v2, /v3 │
└──────┬──────┘
│ HTTP webhook callback
JobCallbackQueue (SQS)
┌─────────────┐
│ jobCallback │
│ │
│ Maps EMA │
│ status to │
│ Analytics │
│ Change │──────► DynamoDB ──► Stream ──► jobScheduler
└─────────────┘ (loop back to evaluate next steps)
Key design pattern: The jobScheduler is triggered by DynamoDB Streams on the
analytics changes table. Every state change (from any source) triggers it to
re-evaluate what should happen next. This is the central decision point that
would be replaced by Temporal workflow logic.
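A sketch of what that stream-driven decision point looks like, with illustrative names only (the real handler is src/apps/jobScheduler/analyticsStateDynamoDBStreamHandler.ts):

import type { DynamoDBStreamEvent } from 'aws-lambda';
import { unmarshall } from '@aws-sdk/util-dynamodb';
import type { AttributeValue } from '@aws-sdk/client-dynamodb';

export async function handler(event: DynamoDBStreamEvent): Promise<void> {
  for (const record of event.Records) {
    if (!record.dynamodb?.NewImage) continue;
    // Stream AttributeValue types differ slightly from the SDK's, hence the cast.
    const newState = unmarshall(record.dynamodb.NewImage as Record<string, AttributeValue>);
    const oldState = record.dynamodb.OldImage
      ? unmarshall(record.dynamodb.OldImage as Record<string, AttributeValue>)
      : undefined;
    // Re-evaluate dependencies: enqueue ready jobs on JobTriggerQueue,
    // succeeded jobs on JobImporterQueue (illustrative helper, not the real one).
    await evaluateNextSteps(oldState, newState);
  }
}

declare function evaluateNextSteps(
  oldState: Record<string, unknown> | undefined,
  newState: Record<string, unknown>,
): Promise<void>;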
EMA Communication Details
Triggering a job (src/integrations/ema/client.ts; a sketch follows this list):
- RAS generates presigned S3 PUT URLs where EMA should upload results
- RAS sends HTTP POST to EMA with job inputs, metadata, and a callback URL
- EMA versions: v2 for
event-detection/shot-details/match-details/match-report, v3 for
ball-tracking/player-tracking
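A minimal sketch of the trigger call shape, assuming hypothetical endpoint paths, env vars, and request body fields; the real contract lives in src/integrations/ema/client.ts:

import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';
import { getSignedUrl } from '@aws-sdk/s3-request-presigner';

const s3 = new S3Client({});

async function triggerEmaJob(recordingId: string, jobType: string): Promise<void> {
  // Presigned PUT URL where EMA uploads its raw output (3-day expiry, see the storage section).
  const uploadUrl = await getSignedUrl(
    s3,
    new PutObjectCommand({
      Bucket: process.env.RAW_ANALYTICS_BUCKET, // ras-raw-analytics-bucket-{env}
      Key: `${jobType}/${recordingId}.csv`,
    }),
    { expiresIn: 3 * 24 * 60 * 60 },
  );

  // POST job inputs, metadata, and the webhook callback URL to EMA.
  const res = await fetch(`${process.env.EMA_BASE_URL}/v3/jobs`, { // hypothetical path
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      job_kind: jobType, // field names assumed from the callback payload
      recording_id: recordingId,
      upload_url: uploadUrl,
      callback_url: `${process.env.RAS_BASE_URL}/recordings/${recordingId}/jobs/callback`,
    }),
  });
  if (!res.ok) throw new Error(`EMA trigger failed: ${res.status}`);
}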
Callback from EMA (src/integrations/ema/callbackHandler.ts):
- EMA POSTs to /recordings/{id}/jobs/callback when job state changes
- The callback goes through SQS → jobCallback Lambda
- RAS validates the job_id is still current (prevents stale callbacks)
- Maps EMA status to an AnalyticsChange record in DynamoDB
Transformations During Import
When a job succeeds, the importer reads raw EMA output from S3 and transforms
it:
Job Type: player-tracking
Raw Format: CSV (v2/v3)
Transformation: Normalize coordinates to [0,1] range using field dimensions;
merge ball tracking data by PTS/timestamp; v2: convert
frameId→timestamp via FPS
────────────────────────────────────────
Job Type: ball-tracking
Raw Format: CSV
Transformation: Stored raw; merged into player-tracking during that import
────────────────────────────────────────
Job Type: event-detection
Raw Format: JSON
Transformation: Validated against schema (v2, v2.1, v2.2, v3)
────────────────────────────────────────
Job Type: match-details
Raw Format: CSV
Transformation: Parsed and stored as JSON
────────────────────────────────────────
Job Type: shot-details
Raw Format: JSON
Transformation: Validated and stored
────────────────────────────────────────
Job Type: match-report
Raw Format: JSON
Transformation: Validated and stored
────────────────────────────────────────
Job Type: match-momentum
Raw Format: CSV
Transformation: Parsed and stored in DynamoDB
The PTS-to-timestamp conversion mentioned in the ticket happens in the player
tracking importer (src/core/playerTracking/index.ts): v2 format uses frameId /
fps to get seconds, while v3 has timestamp_in_seconds directly.
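A sketch of that conversion (the real logic is in src/core/playerTracking/index.ts; the row field names are assumptions, and fps comes from the recordingMetadata table):

interface TrackingRow { frameId?: number; timestamp_in_seconds?: number; }

function rowTimestampSeconds(row: TrackingRow, fps: number): number {
  // v3 rows carry a timestamp directly; v2 rows only have a frame index.
  if (row.timestamp_in_seconds !== undefined) return row.timestamp_in_seconds;
  if (row.frameId !== undefined) return row.frameId / fps;
  throw new Error('Row has neither timestamp_in_seconds nor frameId');
}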
---
Responsibility 2: Delivering Data to Clients
RAS serves processed analytics data via REST API behind CloudFront.
API Endpoint Map
/recordings/{recording_id}/
├── analytics GET (job state summary) No cache
├── analytics POST (replay state - Zola auth) No cache
├── analytics/logs GET (processing logs) No cache
├── player-tracking GET (merged + normalized data) 7-14 day cache
├── player-tracking/
│ └── jersey-numbers GET (jersey numbers by period) 7-14 day cache
├── player-moments GET (key player moments) 7-14 day cache
├── match-momentum GET (momentum timeline) 7-14 day cache
├── match-details GET (match stats) 7-14 day cache
├── match-details/
│ └── attacking-moments GET (sub-analysis) 7-14 day cache
├── events GET (event detections, ?type=) 7-14 day cache
├── shot-details GET (shot/goal data) 7-14 day cache
├── match-report GET (match summary) No cache
├── jobs POST (trigger jobs - Zola auth)
├── jobs DELETE (cancel jobs - Zola auth)
├── jobs/callback POST (EMA callback → SQS)
├── upload-url GET (presigned S3 URL - EMA auth)
├── raw-files GET (raw file links - Zola auth) 60s cache
└── duplicate POST (copy analytics - Zola auth)
On-the-Fly Transformations in Delivery
The GET endpoints don't just serve stored files — several do real-time
processing:
/player-tracking — The most complex; the normalization step is sketched after this list:

Raw player tracking CSV (S3)
  + Raw ball tracking CSV (S3)
  + Recording metadata (field dims)
  + Query params (?start, ?end, ?jersey, ?side, ?filters)
        │
        ▼
  Merge by timestamp/PTS
        │
        ▼
  Normalize coordinates to [0,1]
        │
        ▼
  Filter by time range, jersey, side, etc.
        │
        ▼
  JSON response with merged player+ball data
/events — Filters raw event JSON by ?type= query parameter
/jersey-numbers — Extracts and groups jersey numbers by team side and period
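A sketch of the normalization step from the /player-tracking flow above, assuming pitch dimensions come from recording metadata; the field names and units are assumptions:

interface PitchDimensions { length: number; width: number; }

function normalizePosition(x: number, y: number, pitch: PitchDimensions): { x: number; y: number } {
  // Map raw field coordinates into the [0,1] range that the API serves.
  return { x: x / pitch.length, y: y / pitch.width };
}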
Authentication
┌────────────┐  X-Token header   ┌───────────────────┐
│    Zola    │ ────────────────► │   ZolaTokenAuth   │ ──► POST/DELETE endpoints
└────────────┘                   │ (Secrets Manager) │
                                 └───────────────────┘

┌────────────┐  X-Token header   ┌───────────────────┐
│    EMA     │ ────────────────► │   EmaTokenAuth    │ ──► /upload-url endpoint
└────────────┘                   │ (Secrets Manager) │
                                 └───────────────────┘
GET endpoints (analytics data) ──► No auth (public, cached by CloudFront)
Caching Architecture
Client
  ▼
CloudFront (Layer 1)
  │ Cache-Control: public, max-age=604800 (7 days)
  │ Cache keys include query params: start, end, jersey, side, type, etc.
  ▼
API Gateway → Lambda (Layer 2)
  │ Application-level cache for player tracking (CacheRepository → S3)
  ▼
Recordings API calls (Layer 3)
  │ Undici HTTP cache interceptor: 5-minute TTL on GET 200s
  ▼
S3 / DynamoDB (data source)
Domain Event Publishing
When jobs complete, RAS publishes to EventBridge (this is how MES and other
services learn about results):
DynamoDB Stream (analytics state changes)
OutboundEventPublisher Lambda
DomainEventBroadcaster evaluates:
├── shot-details completed? ──► ShotDetailsProcessedV1
├── event-detection completed? ──► EventDetectionProcessedV1
├── match-report completed? ──► MatchReportProcessedV1
└── ALL jobs completed? ──► RecordingAnalysisProcessedV1
(includes links to all endpoints)
Event format: Source: "com.veo.recording-analytics", published to shared
EventBridge bus.
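A sketch of publishing one of these events; Source and DetailType come from above, while the detail payload shape and bus env var are assumptions:

import { EventBridgeClient, PutEventsCommand } from '@aws-sdk/client-eventbridge';

const eventBridge = new EventBridgeClient({});

async function publishShotDetailsProcessed(recordingId: string): Promise<void> {
  await eventBridge.send(new PutEventsCommand({
    Entries: [{
      EventBusName: process.env.SHARED_EVENT_BUS, // shared bus name (assumed env var)
      Source: 'com.veo.recording-analytics',
      DetailType: 'ShotDetailsProcessedV1',
      Detail: JSON.stringify({ recordingId }),
    }],
  }));
}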
---
Responsibility 3: Storage
S3 Bucket Layout
ras-raw-analytics-bucket-{env}/
├── ball-tracking/{recordingId}.csv
├── player-tracking/{recordingId}.csv
├── match-details/{recordingId}.csv
├── match-momentum/{recordingId}.csv
├── events/{recordingId}.json
├── shot-details/{recordingId}.json
├── match-report/{recordingId}.json
└── physical-metrics/{recordingId}.parquet
ras-events-bucket-{env}/
└── {recordingId}.json (processed events)
ras-player-tracking-bucket-{env}/
└── {recordingId}.json (processed player tracking)
ras-match-details-bucket-{env}/
└── {recordingId}.json (processed match details)
ras-shot-details-bucket-{env}/
└── {recordingId}.json (processed shot details)
ras-match-report-bucket-{env}/
└── {recordingId}.json (processed match report)
ras-cache-bucket-{env}/ (7-day auto-expiry lifecycle rule)
└── playerTracking:{recordingId} (cached transformed data)
DynamoDB Tables
Table: analyticsStates
PK: id (recordingId)
SK: —
Purpose: Current job statuses (projected view)
────────────────────────────────────────
Table: analyticsStateChanges
PK: id (recordingId)
SK: version
Purpose: Event sourcing log (immutable)
────────────────────────────────────────
Table: recordingMetadata
PK: id (recordingId)
SK: —
Purpose: FPS, duration, pitch dims, camera URIs
────────────────────────────────────────
Table: matchMomentum
PK: id (recordingId)
SK: —
Purpose: Momentum timeline data
────────────────────────────────────────
Table: dataDuplication
PK: id (recordingId)
SK: —
Purpose: Tracks duplicated recordings
The analyticsStateChanges table has DynamoDB Streams enabled — this is the
backbone of the orchestration, triggering the jobScheduler on every state
change.
Presigned URL Flow
EMA needs to upload results:
EMA ──► GET /upload-url?jobType=ball-tracking
RAS generates PutObject presigned URL (3-day expiry)
for: ras-raw-analytics-bucket/{jobType}/{recordingId}.csv
EMA uploads file directly to S3
Client needs to read raw files:
Zola ──► GET /raw-files
RAS generates GetObject presigned URLs (15-min expiry)
for each raw file that exists
Returns JSON with download links
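A sketch of generating the 15-minute GET links served by /raw-files; bucket and key follow the S3 layout above, and the handler around this is omitted:

import { S3Client, GetObjectCommand } from '@aws-sdk/client-s3';
import { getSignedUrl } from '@aws-sdk/s3-request-presigner';

const s3Client = new S3Client({});

async function rawFileDownloadUrl(recordingId: string, jobType: string): Promise<string> {
  return getSignedUrl(
    s3Client,
    new GetObjectCommand({
      Bucket: process.env.RAW_ANALYTICS_BUCKET, // ras-raw-analytics-bucket-{env}
      Key: `${jobType}/${recordingId}.csv`,
    }),
    { expiresIn: 15 * 60 }, // 15-minute expiry, per the flow above
  );
}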
File Lifecycle
- Raw files: Created by EMA upload, persist indefinitely (no lifecycle rule)
- Processed files: Created by importers, persist indefinitely
- Cache files: Auto-deleted after 7 days (S3 lifecycle rule on cache bucket)
- Data duplication: DataDuplicator can copy all files from one recording to
another
---
Mapping to Temporal Migration Concerns
Here's how the codebase maps to the questions in the ticket:
What moves to Temporal?
The entire orchestration pipeline (the 5 Lambdas + DynamoDB Streams trigger
loop):
TODAY (RAS internal orchestration):
PrepJobQueue → jobPrepper → DynamoDB → Stream → jobScheduler
→ JobTriggerQueue → jobTrigger → EMA → callback → jobCallback
→ DynamoDB → Stream → jobScheduler → ImporterQueue → jobImporter
FUTURE (Temporal):
Temporal Workflow:
1. Validate recording (activity)
2. Trigger independent jobs in parallel (activities)
3. Wait for EMA callbacks (signals/activities)
4. When all 3 independent done → trigger dependent jobs (activities)
5. Wait for EMA callbacks
6. Import results (activities)
7. Publish domain events (activity)
What stays in RAS?
- Delivery layer: All GET API endpoints with on-the-fly transformations
- Storage: S3 buckets, DynamoDB tables, presigned URL generation
- Authentication: Token-based authorizers for Zola/EMA
Shadow mode / dual orchestration
The ticket mentions running both flows in parallel. Key integration points:
- Job state updates: Temporal would need to write to the same
analyticsStateChanges DynamoDB table (or RAS needs an API to accept state
updates)
- EMA callbacks: Currently go to /recordings/{id}/jobs/callback → SQS.
Temporal could either receive these callbacks directly or RAS could forward
them
- Domain events: Currently published via DynamoDB Streams → EventBridge. If
Temporal manages state, it needs to trigger these events too
Transformations: Temporal vs MES vs RAS?
┌───────────────────────┬─────────────────────────────┬──────────────────────┐
│ Transformation │ Current Location │ Migration Target │
├───────────────────────┼─────────────────────────────┼──────────────────────┤
│ PTS → timestamp │ jobImporter │ Temporal activity or │
│ │ (player-tracking) │ MES │
├───────────────────────┼─────────────────────────────┼──────────────────────┤
│ Coordinate │ Delivery (player-tracking │ MES │
│ normalization │ GET) │ │
├───────────────────────┼─────────────────────────────┼──────────────────────┤
│ Ball+player merge │ Delivery (player-tracking │ MES │
│ │ GET) │ │
├───────────────────────┼─────────────────────────────┼──────────────────────┤
│ Event type filtering │ Delivery (events GET) │ MES │
├───────────────────────┼─────────────────────────────┼──────────────────────┤
│ Jersey number │ Delivery (player-tracking │ MES │
│ extraction │ GET) │ │
├───────────────────────┼─────────────────────────────┼──────────────────────┤
│ JSON schema │ jobImporter (events) │ Temporal activity │
│ validation │ │ │
└───────────────────────┴─────────────────────────────┴──────────────────────┘
File storage questions from ticket
- "Should Temporal query RAS for presigned URLs?" — RAS already has GET
/upload-url (EMA-auth protected). Temporal could use this, or generate URLs
directly if it has S3 access.
- "Raw AI output vs RAS transformed files?" — Today these are in separate S3
buckets (raw-analytics-bucket vs per-type processed buckets). Making them
distinct asset types in recordings-api would be clean.
- "7-day cached files" — Only the ras-cache-bucket has the 7-day lifecycle.
Raw and processed files persist indefinitely. The cache stores pre-computed
player-tracking responses.
Client interfaces (where RAS "stops")
Zola ◄──── RAS API (GET endpoints + POST /jobs, /analytics, /duplicate)
├── Zola uses: player-tracking, events, match-details,
│ shot-details, match-report, match-momentum,
│ player-moments, analytics state, raw-files
MES ◄──── EventBridge domain events
├── MES listens for: RecordingAnalysisProcessedV1,
│ EventDetectionProcessedV1, ShotDetailsProcessedV1,
│ MatchReportProcessedV1
EMA ◄──── RAS (HTTP POST to trigger jobs)
EMA ────► RAS (HTTP webhook callback + S3 upload)
Key files for the migration team
Concern: Job types & dependencies
Key Files: src/core/models/types/index.ts
────────────────────────────────────────
Concern: Orchestration logic
Key Files: src/core/job/jobScheduler.ts
────────────────────────────────────────
Concern: DynamoDB stream trigger
Key Files: src/apps/jobScheduler/analyticsStateDynamoDBStreamHandler.ts
────────────────────────────────────────
Concern: EMA client
Key Files: src/integrations/ema/client.ts
────────────────────────────────────────
Concern: EMA callback handling
Key Files: src/integrations/ema/callbackHandler.ts
────────────────────────────────────────
Concern: Job state model
Key Files: src/core/models/analyticState.ts
────────────────────────────────────────
Concern: Data importers
Key Files: src/core/dataImporters/ (6 importers)
────────────────────────────────────────
Concern: Domain event publishing
Key Files: src/core/domainEventBroadcaster/domainEventBroadcaster.ts
────────────────────────────────────────
Concern: API endpoints (delivery)
Key Files: src/adapters/driving/AWSLambda/controllers/
────────────────────────────────────────
Concern: Storage infrastructure
Key Files: stack/recording-analytics-storage-stack.ts
────────────────────────────────────────
Concern: Queue definitions
Key Files: stack/recordingAnalyticsQueue.ts
────────────────────────────────────────
Concern: Full stack wiring
Key Files: stack/recording-analytics-stack.ts
✻ Crunched for 4m 19s
❯ /rename
⎿ Please provide a name for the session. Usage: /rename
❯ /rename ras-deep-dive
⎿ Session renamed to: ras-deep-dive
❯ /export
⎿ Conversation copied to clipboard
❯ ok let's focus on just the /jobs POST endpoint, and the best way to
deprecate that by incorporating a Temporal workflow. For now, let's
keep the rest of the GET endpoints as APIs in RAS. All the orchestration
with DynamoDB streams needs to go away. Also explore the question of
whether we need any of the SQS queues involved, or whether we handle that
within Temporal itself.
⏺ I'll explore the specific components involved in the /jobs POST endpoint and
the orchestration flow to understand exactly what needs to change for
Temporal.
⏺ 3 Explore agents finished (ctrl+o to expand)
├─ Explore /jobs POST endpoint · 35 tool uses · 39.7k tokens
│ ⎿  Done
├─ Explore SQS queues and purposes · 57 tool uses · 54.3k tokens
│ ⎿  Done
└─ Explore EMA callback flow · 48 tool uses · 59.3k tokens
⎿  Done
⏺ Excellent - I now have a detailed understanding of the orchestration flow. Let
me launch a Plan agent to design the migration approach.
⏺ Plan(Design Temporal migration plan)
⎿  Done (45 tool uses · 63.0k tokens · 2m 46s)
⏺ Let me write the plan to the plan file and then ask some clarifying questions.
⏺ Updated plan
⎿  /plan to preview
⏺ I've written the migration plan. Before finalizing, I have a few questions to
clarify:
⏺ User answered Claude's questions:
⎿  · Where will Temporal be hosted? → Temporal Cloud (Recommended)
· How should the Temporal worker be deployed? → TBD with infra team
· For shadow mode, EMA will receive duplicate job requests. How should
this be handled? → Skip shadow mode
· How should EMA callbacks reach the Temporal workflow? → Keep SQS
buffer initially
⏺ Thanks for the clarifications. Let me update the plan with these decisions.
⏺ Updated plan
⎿  /plan to preview
⏺ Updated plan
⎿  /plan to preview
⏺ Updated plan
⎿  /plan to preview
⏺ Updated plan
⎿  /plan to preview
⏺ Read 1 file (ctrl+o to expand)
⏺ Let me fix the verification section to remove the shadow mode reference since
we're skipping it.
⏺ Updated plan
⎿  /plan to preview
⏺ The plan is complete and reflects your decisions:
Summary:
- Temporal Cloud for hosting (managed)
- Keep SQS buffer for EMA callbacks (SQS → Lambda → Temporal signal)
- Skip shadow mode - validate in staging, then cut over to production
- Worker deployment TBD with infrastructure team
Key changes:
- POST /jobs will start a Temporal workflow instead of publishing to SQS
- 4 Lambdas eliminated: jobPrepper, jobScheduler, jobTrigger, jobImporter
- 3 SQS queues eliminated: PrepJobQueue, JobTriggerQueue, JobImporterQueue
- JobCallbackQueue kept as buffer for EMA callbacks
- DynamoDB streams orchestration removed; workflow handles dependency logic
- Activities wrap existing services (PrepareJobsService, TriggerJobService,
importers)
⏺ User approved Claude's plan
⎿  Plan saved to: ~/.claude/plans/silly-questing-turtle.md · /plan to edit
RAS Temporal Migration Plan
Goal
Replace the /jobs POST endpoint orchestration with Temporal workflows,
removing DynamoDB streams-based orchestration and SQS queues while
keeping GET endpoints unchanged.
---
Current State
POST /jobs → InitJobService → PrepJobQueue (SQS)
                    ↓
             jobPrepper Lambda
                    ↓
      DynamoDB (analyticsStateChanges)
                    ↓
              DynamoDB Stream
                    ↓
            jobScheduler Lambda
              ↓           ↓
   JobTriggerQueue   JobImporterQueue
              ↓           ↓
        jobTrigger     jobImporter
              ↓
         EMA (async)
              ↓
/jobs/callback → JobCallbackQueue → jobCallback
                        ↓
      DynamoDB → Stream → jobScheduler (loop)
4 SQS queues involved:
- PrepJobQueue (to eliminate)
- JobTriggerQueue (to eliminate)
- JobImporterQueue (to eliminate)
- JobCallbackQueue (kept as buffer for EMA callbacks)
5 Lambdas to replace:
- jobPrepper
- jobScheduler
- jobTrigger
- jobImporter
- jobCallback (partial - keeps HTTP endpoint)
---
Target Architecture
POST /jobs → CreateJobs Controller
                 ↓
        Temporal Client.start()
                 ↓
     RecordingAnalyticsWorkflow
     ┌───────────┴──────────────────────────────┐
     ↓                                          ↓
prepareJobs activity           [parallel] triggerJob × 3
     ↓                         (player-tracking, ball-tracking,
DynamoDB (metadata)             event-detection)
                                                ↓
                                        EMA HTTP calls
                                                ↓
                               await signal (emaJobCallback)
                                                ↓
                                   [parallel] importers
                                                ↓
                               [parallel] triggerJob × 3
                               (match-details, shot-details, match-report)
                                                ↓
                                await signals + importers
                                                ↓
                                    Workflow complete
EMA Callbacks (keep SQS buffer):
/jobs/callback → SQS → Lambda → Temporal signal to workflow
---
Workflow Design
Single Workflow per Recording
recordingAnalyticsWorkflow(recordingId, jobTypes[], actor)
├── Activity: prepareJobs()
│   └── Validates recording, fetches metadata, writes to DynamoDB
├── Phase 1: Independent Jobs (parallel)
│   ├── triggerJob("player-tracking") → await signal → importPlayerTracking()
│   ├── triggerJob("ball-tracking")   → await signal
│   └── triggerJob("event-detection") → await signal → importEvents()
└── Phase 2: Dependent Jobs (parallel, after Phase 1 succeeds)
    ├── triggerJob("match-details")  → await signal → importMatchDetails()
    ├── triggerJob("shot-details")   → await signal → importShotDetails()
    └── triggerJob("match-report")   → await signal → importMatchReport()
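A minimal TypeScript sketch of this workflow shape, assuming the activity and signal names from this plan; retries, timeouts, and failure handling are deliberately elided:

import { proxyActivities, defineSignal, setHandler, condition } from '@temporalio/workflow';
import type * as activities from '../activities';

// Assumed signal payload shape, mirroring the EMA callback fields below.
interface EmaCallback { jobType: string; jobId: string; status: string; }

export const emaJobCallback = defineSignal<[EmaCallback]>('emaJobCallback');

const {
  prepareJobs, triggerJob, importPlayerTracking, importEvents,
  importMatchDetails, importShotDetails, importMatchReport,
} = proxyActivities<typeof activities>({ startToCloseTimeout: '10 minutes' });

export async function recordingAnalyticsWorkflow(recordingId: string): Promise<void> {
  // Latest EMA-reported status per job type, updated by incoming signals.
  const status = new Map<string, string>();
  setHandler(emaJobCallback, (cb) => {
    status.set(cb.jobType, cb.status);
  });

  await prepareJobs(recordingId);

  // Phase 1: independent jobs in parallel, then import once EMA reports success.
  const independent = ['player-tracking', 'ball-tracking', 'event-detection'];
  await Promise.all(independent.map((t) => triggerJob(recordingId, t)));
  await condition(() => independent.every((t) => status.get(t) === 'succeeded'));
  await Promise.all([importPlayerTracking(recordingId), importEvents(recordingId)]);

  // Phase 2: dependent jobs, gated on all of Phase 1 succeeding.
  const dependent = ['match-details', 'shot-details', 'match-report'];
  await Promise.all(dependent.map((t) => triggerJob(recordingId, t)));
  await condition(() => dependent.every((t) => status.get(t) === 'succeeded'));
  await Promise.all([
    importMatchDetails(recordingId),
    importShotDetails(recordingId),
    importMatchReport(recordingId),
  ]);
}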
EMA Callback → SQS → Temporal Signal
Keep SQS buffer for durability during transition:
EMA HTTP callback → API Gateway → JobCallbackQueue (SQS)
                                        ↓
                        jobCallback Lambda (modified)
                                        ↓
                        Temporal signal to workflow
// Modified SQSJobCallbackConsumer.processSQSEvent
// (imports at module top; the emaCallbackType import path is assumed)
import type { SQSEvent } from 'aws-lambda';
import { getTemporalClient } from '../../temporal/client';
import type { emaCallbackType } from '../../integrations/ema/types';

async processSQSEvent(event: SQSEvent): Promise<void> {
  // Handle every record in the batch, not just the first.
  for (const record of event.Records) {
    const recordingId = record.messageAttributes.recordingId?.stringValue;
    if (!recordingId) throw new Error('SQS message missing recordingId attribute');
    const emaData = JSON.parse(record.body) as emaCallbackType;

    // Send a signal to the Temporal workflow instead of writing to DynamoDB.
    const client = getTemporalClient();
    const handle = client.workflow.getHandle(`ras-${recordingId}`);
    await handle.signal('emaJobCallback', {
      jobType: emaData.job_kind,
      jobId: emaData.job_id,
      status: emaData.job_state,
      exitCode: emaData.job_exit_code,
      reason: emaData.job_reason,
    });
  }
}
State Persistence (for GET endpoints)
Activities write to DynamoDB so existing GET endpoints work unchanged:
- analyticsStateChanges table (event sourcing)
- analyticsStates table (projected view via existing stream handler)
- recordingMetadata table
---
Activities to Create
Activity: prepareJobs
Wraps: PrepareJobsService.prepareJobs()
Purpose: Validation, metadata fetch
────────────────────────────────────────
Activity: triggerJob
Wraps: TriggerJobService.triggerJob()
Purpose: Call EMA to start job
────────────────────────────────────────
Activity: importEvents
Wraps: EventsImporter
Purpose: Import event detection results
────────────────────────────────────────
Activity: importPlayerTracking
Wraps: PlayerTrackingImporter + MatchMomentumImporter
Purpose: Import tracking data
────────────────────────────────────────
Activity: importMatchDetails
Wraps: MatchDetailsImporter
Purpose: Import match details
────────────────────────────────────────
Activity: importShotDetails
Wraps: ShotDetailsImporter
Purpose: Import shot details
────────────────────────────────────────
Activity: importMatchReport
Wraps: MatchReportImporter
Purpose: Import match report
────────────────────────────────────────
Activity: persistJobState
Wraps: AnalyticsChangeRepository.put()
Purpose: Write state to DynamoDB
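For example, one activity is a thin wrapper over the existing service. A sketch only; the constructor wiring, import path, and exact service signature are assumptions:

// src/temporal/activities/prepareJobs.ts (sketch)
import type { PrepareJobsService } from '../../core/job/prepareJobsService'; // assumed path

export function makePrepareJobsActivity(service: PrepareJobsService) {
  return async function prepareJobs(recordingId: string, jobTypes: string[]): Promise<void> {
    // Delegate to the existing service so validation and DynamoDB writes
    // behave exactly as in the legacy path.
    await service.prepareJobs(recordingId, jobTypes);
  };
}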
---
Files to Create
src/temporal/
├── worker.ts # Temporal worker entry point
├── client.ts # Temporal client factory
├── workflows/
│ └── recordingAnalytics.ts # Main workflow
├── activities/
│ ├── index.ts # Activity exports
│ ├── prepareJobs.ts # Wrap PrepareJobsService
│ ├── triggerJob.ts # Wrap TriggerJobService
│ ├── importers.ts # Wrap all data importers
│ └── persistState.ts # DynamoDB state writes
└── signals/
└── emaCallback.ts # Signal type definitions
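A minimal sketch of src/temporal/worker.ts, assuming the layout above; Temporal Cloud connection details (namespace, mTLS certs) are elided:

import { Worker, NativeConnection } from '@temporalio/worker';
import * as activities from './activities';

async function run(): Promise<void> {
  const connection = await NativeConnection.connect({
    address: process.env.TEMPORAL_ADDRESS!, // e.g. <namespace>.tmprl.cloud:7233
  });
  const worker = await Worker.create({
    connection,
    namespace: process.env.TEMPORAL_NAMESPACE!,
    taskQueue: 'ras-orchestration', // hypothetical task queue name
    workflowsPath: require.resolve('./workflows/recordingAnalytics'),
    activities,
  });
  await worker.run();
}

run().catch((err) => {
  console.error(err);
  process.exit(1);
});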
Files to Modify
File: src/adapters/driving/AWSLambda/controllers/job/createJobs.ts
Change: Add Temporal workflow start
────────────────────────────────────────
File: src/apps/jobCallback/SQSJobCallbackConsumer.ts
Change: Replace SQS processing with Temporal signal
────────────────────────────────────────
File: stack/recording-analytics-stack.ts
Change: Add Temporal worker infrastructure
────────────────────────────────────────
File: package.json
Change: Add @temporalio/client, @temporalio/worker,
@temporalio/workflow
Files to Remove (after migration)
- src/apps/jobPrepper/*
- src/apps/jobScheduler/*
- src/apps/jobTrigger/*
- src/apps/jobImporter/*
- src/core/job/jobScheduler.ts
- src/core/job/initJobService.ts
- src/core/publishers/prepJobQueue.ts
- src/core/publishers/jobTriggerQueue.ts
- src/core/publishers/jobImporterQueue.ts
- Queue definitions in stack/recordingAnalyticsQueue.ts
---
Migration Phases
Phase 1: Foundation
- Set up Temporal Cloud namespace
- Create src/temporal/ directory structure
- Implement activities wrapping existing services
- Implement workflow with dependency logic
- Unit test workflows with mocked activities
Phase 2: Integration (Staging)
- Deploy Temporal worker to staging
- Modify callback handler to send Temporal signals (keep SQS buffer)
- Add ORCHESTRATION_MODE env var (legacy | temporal); see the sketch after this list
- Disable legacy flow in staging, enable Temporal
- End-to-end testing: trigger jobs, verify callbacks, check GET
endpoints
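A sketch of the ORCHESTRATION_MODE switch in the createJobs controller; startJobsViaSqs stands in for the existing InitJobService → PrepJobQueue path, and the task queue name is hypothetical:

import { Connection, Client } from '@temporalio/client';

// Stand-in for the legacy path (not the real function name).
declare function startJobsViaSqs(recordingId: string, jobTypes: string[]): Promise<void>;

export async function startOrchestration(recordingId: string, jobTypes: string[]): Promise<void> {
  if (process.env.ORCHESTRATION_MODE === 'temporal') {
    const connection = await Connection.connect(); // Temporal Cloud options elided
    const client = new Client({ connection });
    await client.workflow.start('recordingAnalyticsWorkflow', {
      taskQueue: 'ras-orchestration',
      workflowId: `ras-${recordingId}`, // matches the handle used by the callback Lambda
      args: [recordingId, jobTypes],
    });
  } else {
    await startJobsViaSqs(recordingId, jobTypes); // legacy flow unchanged
  }
}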
Phase 3: Production Cutover
- Deploy Temporal worker to production
- Switch ORCHESTRATION_MODE to temporal
- Monitor for issues
- Keep legacy code available for rollback (2 weeks)
Phase 4: Cleanup
- Remove legacy Lambdas (jobPrepper, jobScheduler, jobTrigger,
jobImporter)
- Remove SQS queues (PrepJobQueue, JobTriggerQueue, JobImporterQueue)
- Keep JobCallbackQueue (still used for SQS → Signal flow)
- Update CDK stack
- Update documentation
---
Verification
1. Unit tests: Mock activities, test workflow logic (dependency
ordering, signal handling, timeout handling)
2. Integration tests (staging):
- Trigger jobs via POST /jobs
- Verify EMA receives requests and sends callbacks
- Verify callbacks signal the correct workflow
- Verify DynamoDB state is updated correctly
3. GET endpoint verification: Ensure all GET endpoints return same data
after migration
4. Rollback test: Verify switching ORCHESTRATION_MODE back to legacy
works
---
Decisions
- Temporal hosting: Temporal Cloud (managed)
- Worker deployment: TBD with infrastructure team
- Shadow mode: Skip - go directly from staging validation to production
cutover
- Callback approach: Keep SQS buffer initially (SQS → Lambda → Temporal
signal)
⏺ The plan is approved. Let me create tasks to track the implementation.
⏺ Read 1 file (ctrl+o to expand)
⎿  Interrupted · What should Claude do instead?