Analytics Pipeline Status Summary - 2026-01-30

IMMEDIATE ISSUE - RESOLVED ✅

Problem: ClickHouse consumer stuck since 1/27, massive lag preventing event ingestion

Root Cause: 4 bad test messages blocking consumer across all 3 Kafka partitions:

  • Partition 0, Offset 58: {"test":"connection_test",...}
  • Partition 1, Offset 56: {"test":"direct_publish",...}
  • Partition 2, Offset 113: {"test":"message",...}
  • Error: Invalid DateTime parsing - messages missing required schema fields

Fix Applied (Raymond + Kenji):

  1. Stopped ClickHouse consumer
  2. Reset consumer group 'clickhouse-raw' offsets (one way to script this is sketched after this list):
    • Partition 0 → 59
    • Partition 1 → 57
    • Partition 2 → 114
  3. Restarted consumer
  4. Result: Lag reduced to 0, events flowing again
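
For reference, a minimal sketch of one way to do this kind of offset reset programmatically with the confluent-kafka Python client, assuming the blocked consumer is fully stopped so the group is empty; the broker address and topic name below are placeholders, not the real deployment values. The stock kafka-consumer-groups.sh --reset-offsets tool achieves the same thing.

```python
from confluent_kafka import Consumer, TopicPartition

# Placeholder broker address and topic name; substitute the real ones.
c = Consumer({
    "bootstrap.servers": "kafka:9092",
    "group.id": "clickhouse-raw",
    "enable.auto.commit": False,
})

# Commit the offset just past each bad test message so the consumer
# skips them on restart (values from the summary above).
c.commit(offsets=[
    TopicPartition("events_raw", 0, 59),
    TopicPartition("events_raw", 1, 57),
    TopicPartition("events_raw", 2, 114),
], asynchronous=False)
c.close()
```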

Frank's confirmation: "yay, events!"


ARCHITECTURAL GAPS IDENTIFIED

Current Issues:

  1. ❌ No schema validation - invalid messages can block the entire pipeline
  2. ❌ No DLQ (Dead Letter Queue) - failed messages have nowhere to go
  3. ❌ ClickHouse materialized views (MVs) can't easily handle conditional validation
  4. ❌ Consumer not resilient to malformed messages

Proposed Solutions:

Short-term:

  • Schema Registry implementation for pre-publish validation
  • DLQ topic setup for failed messages
  • Resilient consumer pattern: validate → DLQ on failure → commit offset
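
A minimal sketch of that validate → DLQ → commit pattern with the confluent-kafka Python client; the broker address, topic names, and required-field set are assumptions, and the validation stub should eventually be replaced by real JSON Schema checks.

```python
import json

from confluent_kafka import Consumer, Producer

BROKER = "kafka:9092"                            # placeholder
SOURCE_TOPIC = "events_raw"                      # placeholder
DLQ_TOPIC = "events_raw.dlq"                     # placeholder
REQUIRED_FIELDS = {"event_type", "created_at"}   # hypothetical schema fields

consumer = Consumer({"bootstrap.servers": BROKER,
                     "group.id": "clickhouse-raw",
                     "enable.auto.commit": False})
producer = Producer({"bootstrap.servers": BROKER})
consumer.subscribe([SOURCE_TOPIC])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    try:
        event = json.loads(msg.value())          # JSONDecodeError is a ValueError
        if not REQUIRED_FIELDS.issubset(event):
            raise ValueError("missing required schema fields")
        # ... insert the event into ClickHouse here ...
    except ValueError:
        # Bad message: park it on the DLQ instead of wedging the partition.
        producer.produce(DLQ_TOPIC, msg.value())
        producer.flush()
    # Commit either way, so a single malformed message can never block the pipeline.
    consumer.commit(msg)
```

The key change from the current consumer is that a failed message is copied to the DLQ and then committed past, rather than retried forever.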

Long-term (Will's recommendation):

  • Apache Spark or Kafka Streams for the enrichment layer
  • Advanced enrichment capabilities:
    • Text analysis (tone, topics)
    • Video analysis
    • IP to Geohash (simpler; can be done directly in ClickHouse)

Infrastructure Option (Kenji):

  • ClickHouse Kafka Connect Sink, with built-in:
    • Schema validation via Confluent Schema Registry
    • Automatic DLQ forwarding
  • Comes free with a Confluent Kafka deployment
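
For a concrete sense of this option, a sketch of registering such a sink through the Kafka Connect REST API from Python. The errors.* keys are standard Kafka Connect sink settings; the connector class name and ClickHouse connection keys are from memory and should be verified against the ClickHouse Kafka Connect Sink docs, and all host and topic names are placeholders.

```python
import requests

config = {
    # Verify the class name and connection keys against the connector docs.
    "connector.class": "com.clickhouse.kafka.connect.ClickHouseSinkConnector",
    "topics": "events_raw",
    "hostname": "clickhouse",
    "database": "analytics",
    # Deserialize and validate records against Confluent Schema Registry.
    "value.converter": "io.confluent.connect.json.JsonSchemaConverter",
    "value.converter.schema.registry.url": "http://schema-registry:8081",
    # Standard Kafka Connect error handling: keep going on bad records
    # and forward them to a DLQ topic with error context headers.
    "errors.tolerance": "all",
    "errors.deadletterqueue.topic.name": "events_raw.dlq",
    "errors.deadletterqueue.context.headers.enable": "true",
}

resp = requests.put(
    "http://connect:8083/connectors/clickhouse-sink/config", json=config
)
resp.raise_for_status()
```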

CURRENT WORK (Kenji - Overnight)

Schema Definition Phase:

  • Requesting JSON Schemas for all Kafka message types (toy validation example after this list)
  • Working on cdp.yaml - machine-readable schema config
  • Generated 105 tables from the schema so far
  • Covers ~30% of Will's original spreadsheet
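
As a toy illustration of what those schemas buy us, a hypothetical JSON Schema for one event type checked with the Python jsonschema package; the field names are placeholders, and the real definitions will come out of the cdp.yaml work.

```python
from jsonschema import ValidationError, validate

# Hypothetical schema for a single event type.
EVENT_SCHEMA = {
    "type": "object",
    "required": ["event_type", "subject_type", "created_at"],
    "properties": {
        "event_type": {"type": "string"},
        "subject_type": {"type": "string"},
        "created_at": {"type": "string", "format": "date-time"},
    },
}

good = {"event_type": "post_view", "subject_type": "post",
        "created_at": "2026-01-30T12:00:00Z"}
bad = {"test": "connection_test"}  # shaped like the messages that blocked the pipeline

validate(good, EVENT_SCHEMA)       # passes silently
try:
    validate(bad, EVENT_SCHEMA)    # rejected before it ever reaches Kafka
except ValidationError as exc:
    print("rejected:", exc.message)
```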

Analytics Architecture Components:

  1. Facts: Raw events (what we're ingesting now)
  2. Metrics: Aggregations (count, sum, distinct counts)
  3. Dimensions: Categorical values for slicing (GROUP BY fields)

Will's 200+ metrics need to be defined in terms of these components.
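
A self-contained toy example of how the three components relate, with hypothetical field names: facts are the raw event rows, the dimension is the field we slice on, and the metric is the aggregation over that slice. In ClickHouse this maps onto a GROUP BY over the raw events table.

```python
from collections import Counter

# Facts: raw events as ingested (hypothetical field names).
facts = [
    {"event_type": "post_view", "country": "US"},
    {"event_type": "post_view", "country": "DE"},
    {"event_type": "comment",   "country": "US"},
]

# Dimension: the categorical field we slice on (country).
# Metric: the aggregation over that slice (a simple count).
views_by_country = Counter(
    f["country"] for f in facts if f["event_type"] == "post_view"
)
print(views_by_country)  # Counter({'US': 1, 'DE': 1})
```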


INFRASTRUCTURE (jsadowyj)

New documentation for exposing NodePort services in the manually deployed cluster: https://github.com/parler-tech/customer-data-platform/blob/production/docs/adding_nodeport_services.md


CONNECTION TO PR #313

Our work validated:

  • ✅ PR #313 events ARE flowing correctly to Kafka
  • ✅ New schema with rich context fields working as designed
  • ✅ SubjectType changes not causing issues
  • ✅ Pipeline blockage was an infrastructure issue (test messages), NOT PR #313

Next: Kenji's schema work will formalize the event structure we've been validating, so that proper schema validation prevents future pipeline blockages.


PR #313 Status

Ready for merge to development after:

  1. Kenji confirms the pipeline can handle the subjectType change (the infrastructure issue is now resolved)
  2. R4wm approval

The application is correctly sending events with the new schema. All app-side testing is complete.
