Problem: ClickHouse consumer stuck since 1/27, massive lag preventing event ingestion
Root Cause: 4 bad test messages blocking the consumer across all 3 Kafka partitions:
- Partition 0, Offset 58: {"test":"connection_test",...}
- Partition 1, Offset 56: {"test":"direct_publish",...}
- Partition 2, Offset 113: {"test":"message",...}
- Error: Invalid DateTime parsing - messages missing required schema fields
Fix Applied (Raymond + Kenji):
- Stopped ClickHouse consumer
- Reset consumer group 'clickhouse-raw' offsets:
- Partition 0 → 59
- Partition 1 → 57
- Partition 2 → 114
- Restarted consumer
- Result: Lag reduced to 0, events flowing again
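For reference, the offset reset above can also be scripted. Below is a minimal sketch using the confluent-kafka Python client; the broker address and topic name are placeholders (the actual topic isn't named in these notes), and the group must have no active members when the offsets are committed, which is why the consumer was stopped first.

```python
# Minimal sketch: commit explicit offsets for the 'clickhouse-raw' group so the
# consumer resumes one past each bad test message. Broker and topic names are
# placeholders; run this only while the consumer is stopped (empty group).
from confluent_kafka import Consumer, TopicPartition

BOOTSTRAP = "kafka:9092"      # placeholder broker address
TOPIC = "cdp-events"          # placeholder topic name
GROUP_ID = "clickhouse-raw"   # consumer group from this incident

# One past each bad message: 58 -> 59, 56 -> 57, 113 -> 114.
new_offsets = [
    TopicPartition(TOPIC, 0, 59),
    TopicPartition(TOPIC, 1, 57),
    TopicPartition(TOPIC, 2, 114),
]

consumer = Consumer({
    "bootstrap.servers": BOOTSTRAP,
    "group.id": GROUP_ID,
    "enable.auto.commit": False,
})
consumer.commit(offsets=new_offsets, asynchronous=False)
consumer.close()
```

The same thing can be done with `kafka-consumer-groups --reset-offsets --to-offset ... --execute`, run once per partition.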
Frank's confirmation: "yay, events!"
Current Issues:
- ❌ No schema validation - invalid messages can block entire pipeline
- ❌ No DLQ (Dead Letter Queue) - failed messages have nowhere to go
- ❌ ClickHouse materialized views (MVs) can't easily handle conditional validation
- ❌ Consumer not resilient to malformed messages
Proposed Solutions:
Short-term:
- Schema Registry implementation for pre-publish validation
- DLQ topic setup for failed messages
- Resilient consumer pattern: validate → DLQ on failure → commit offset (see the sketch after this list)
Long-term (Will's recommendation):
- Apache Spark or Kafka Streams for enrichment layer
- Advanced enrichment capabilities:
- Text analysis (tone, topics)
- Video analysis
- IP to Geohash (simpler - can do in ClickHouse)
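A minimal sketch of the resilient consumer pattern from the short-term list, assuming the confluent-kafka and jsonschema Python libraries; the topic names, broker address, and schema fields are placeholders rather than the real message contract:

```python
# Sketch of: validate -> send to DLQ on failure -> commit offset, so a single
# malformed message can no longer block a partition. All names are placeholders.
import json
from confluent_kafka import Consumer, Producer
from jsonschema import FormatChecker, ValidationError, validate

SOURCE_TOPIC = "cdp-events"    # placeholder
DLQ_TOPIC = "cdp-events-dlq"   # placeholder DLQ topic
EVENT_SCHEMA = {               # placeholder; the real schema comes from the schema work below
    "type": "object",
    "required": ["eventType", "subjectType", "createdAt"],
    "properties": {"createdAt": {"type": "string", "format": "date-time"}},
}

consumer = Consumer({
    "bootstrap.servers": "kafka:9092",   # placeholder
    "group.id": "clickhouse-raw",
    "enable.auto.commit": False,         # commit only after the message is handled
})
producer = Producer({"bootstrap.servers": "kafka:9092"})
consumer.subscribe([SOURCE_TOPIC])

def insert_into_clickhouse(event: dict) -> None:
    """Placeholder for the real ClickHouse insert."""

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    try:
        event = json.loads(msg.value())
        validate(event, EVENT_SCHEMA, format_checker=FormatChecker())
        insert_into_clickhouse(event)
    except (ValueError, ValidationError) as exc:
        # Bad message: park it on the DLQ with the error attached instead of blocking.
        producer.produce(DLQ_TOPIC, msg.value(), headers={"error": str(exc)})
        producer.flush()
    # Commit either way so the consumer never gets stuck on this offset again.
    consumer.commit(msg, asynchronous=False)
```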
Infrastructure Option (Kenji):
- ClickHouse Kafka Connect Sink with built-in:
- Schema validation via Confluent Schema Registry
- Automatic DLQ forwarding
- Comes free with Confluent Kafka deployment
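As a concrete but hedged illustration, such a sink could be registered through the Kafka Connect REST API roughly like this. The `errors.*` keys are standard Kafka Connect error-handling options that provide the DLQ forwarding, and the JSON Schema converter points at a Confluent Schema Registry; hostnames, the topic, and credentials are placeholders, and the ClickHouse connection keys should be checked against the connector version in use.

```python
# Sketch: register the ClickHouse Kafka Connect Sink with Schema Registry
# validation and a DLQ. Hosts, topic, and credentials are placeholders; verify
# the connection keys against the connector docs for the deployed version.
import requests

connector = {
    "name": "clickhouse-raw-sink",
    "config": {
        "connector.class": "com.clickhouse.kafka.connect.ClickHouseSinkConnector",
        "topics": "cdp-events",                 # placeholder topic
        "hostname": "clickhouse.internal",      # placeholder ClickHouse host
        "port": "8443",
        "database": "default",
        "username": "default",
        "password": "***",
        # Validate record values against the registry instead of trusting raw JSON.
        "value.converter": "io.confluent.connect.json.JsonSchemaConverter",
        "value.converter.schema.registry.url": "http://schema-registry:8081",
        # Standard Kafka Connect error handling: tolerate bad records and
        # forward them, with context headers, to a DLQ topic.
        "errors.tolerance": "all",
        "errors.deadletterqueue.topic.name": "cdp-events-dlq",
        "errors.deadletterqueue.context.headers.enable": "true",
    },
}

resp = requests.post("http://kafka-connect:8083/connectors", json=connector, timeout=30)
resp.raise_for_status()
```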
Schema Definition Phase:
- Requesting JSON Schema for all Kafka message types
- Working on cdp.yaml - machine-readable schema config
- Generated 105 tables from schema so far
- Covers ~30% of Will's original spreadsheet
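Once those JSON Schemas exist, they can be registered so messages are validated before publish. A hedged sketch against the Confluent Schema Registry REST API; the registry URL, subject name, and example fields are hypothetical and would come out of the cdp.yaml work:

```python
# Sketch: register a JSON Schema for one message type with a Confluent Schema
# Registry so it can be enforced pre-publish. URL, subject, and fields are
# placeholders; the real schemas come from the cdp.yaml work.
import json
import requests

REGISTRY_URL = "http://schema-registry:8081"   # placeholder
SUBJECT = "cdp-events-value"                   # placeholder: <topic>-value naming

event_schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "required": ["eventType", "subjectType", "createdAt"],   # hypothetical fields
    "properties": {
        "eventType": {"type": "string"},
        "subjectType": {"type": "string"},
        "createdAt": {"type": "string", "format": "date-time"},
    },
    "additionalProperties": True,
}

resp = requests.post(
    f"{REGISTRY_URL}/subjects/{SUBJECT}/versions",
    headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    json={"schemaType": "JSON", "schema": json.dumps(event_schema)},
    timeout=30,
)
resp.raise_for_status()
print("registered schema id:", resp.json()["id"])
```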
Analytics Architecture Components:
- Facts: Raw events (what we're ingesting now)
- Metrics: Aggregations (count, sum, distinct counts)
- Dimensions: Categorical values for slicing (GROUP BY fields)
Will's 200+ metrics need to be defined in terms of these components.
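For illustration only, here is how one hypothetical metric would map onto those components as a ClickHouse query (via the clickhouse-connect Python client); the table and column names are placeholders, not the real schema:

```python
# Illustration of facts / metrics / dimensions on a hypothetical raw-events table.
# Table and column names are placeholders; assumes the clickhouse-connect client.
import clickhouse_connect

client = clickhouse_connect.get_client(host="clickhouse.internal")  # placeholder host

# Facts:      rows in raw_events (the events being ingested now)
# Metric:     uniqExact(user_id)   -- a distinct-count aggregation
# Dimensions: event_date, platform -- the GROUP BY fields used for slicing
result = client.query(
    """
    SELECT
        toDate(created_at) AS event_date,
        platform,
        uniqExact(user_id) AS daily_active_users
    FROM raw_events
    GROUP BY event_date, platform
    ORDER BY event_date
    """
)
for row in result.result_rows:
    print(row)
```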
New documentation for exposing NodePort services in the manually deployed cluster: https://github.com/parler-tech/customer-data-platform/blob/production/docs/adding_nodeport_services.md
Our work validated:
- ✅ PR #313 events ARE flowing correctly to Kafka
- ✅ New schema with rich context fields working as designed
- ✅ SubjectType changes not causing issues
- ✅ Pipeline blockage was infrastructure issue (test messages), NOT PR #313
Next: Kenji's schema work will formalize the event structure we've been validating, so that proper schema validation prevents future pipeline blockages.
Ready for merge to development after:
- Kenji confirms the pipeline can handle the subjectType change (the infrastructure issue is now resolved)
- R4wm approval
The application is correctly sending events with the new schema. All app-side testing is complete.