Here’s a concise summary + work items focused on implementation‑relevant points from the Kafka meeting notes.
Summary (implementation‑relevant highlights)
- The events endpoint currently requires auth, but there’s consensus that unauthenticated (guest) events are needed. This is a potential DDoS vector, so any unauthenticated mode needs rate‑limiting or an API key strategy.
- Clients were hitting 500s on the events endpoint. The root cause is likely in validation or header handling: once the correct request and headers were used, responses became sane (401 or validation errors). A missing Accept header can trigger an HTML response instead of JSON.
- Logging quality is important for debugging bad payloads; logs should be useful in dev.
- Feature flag is required on mobile/web so event sending can be toggled off by default, especially in production builds.
- Batching events is expected; the endpoint should accept arrays and clients can send a batch every 5–10 seconds.
- Brand/tenant identification should be present in event payloads; API cannot infer PlayTV vs Parlor.
- Kafka/ClickHouse infra discussion: retention, storage sizing, and recovery windows need clearer targets; 1‑day Kafka retention is likely too short (weekend recovery concern).
- There’s a proposal to rename “event_type” → “action_type” and “subject” → “entity” to avoid overload. This would require coordinated schema update (and possibly temporary dual‑field support).
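To make the batching, brand, and rename points concrete, here is a minimal sketch of server-side validation for an array payload during a dual-field transition. It is not the actual endpoint code: the field name `brand`, the helper names `normalize_event` and `validate_batch`, and the choice to treat the new names as canonical are all assumptions for illustration.

```python
from typing import Any

# Rename map from the proposal: legacy field name -> proposed field name.
RENAMES = {"event_type": "action_type", "subject": "entity"}
# Assumed name for the brand/tenant field; the notes do not specify one.
BRAND_FIELD = "brand"


def normalize_event(raw: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Return (event using the proposed field names, validation errors)."""
    event = dict(raw)
    errors: list[str] = []
    for old, new in RENAMES.items():
        if old in event:
            old_value = event.pop(old)
            # If a client sends both names, they must agree.
            if new in event and event[new] != old_value:
                errors.append(f"{old} and {new} disagree")
            event.setdefault(new, old_value)
        if new not in event:
            errors.append(f"missing {new} (or legacy {old})")
    # The API cannot infer the brand, so it must be present and non-empty.
    if not event.get(BRAND_FIELD):
        errors.append(f"missing {BRAND_FIELD}")
    return event, errors


def validate_batch(payload: Any) -> tuple[list[dict[str, Any]], dict[int, list[str]]]:
    """Validate an array payload; errors are keyed by position in the batch."""
    if not isinstance(payload, list):
        return [], {-1: ["payload must be a JSON array of events"]}
    accepted: list[dict[str, Any]] = []
    rejected: dict[int, list[str]] = {}
    for index, raw in enumerate(payload):
        if not isinstance(raw, dict):
            rejected[index] = ["event must be a JSON object"]
            continue
        event, errors = normalize_event(raw)
        if errors:
            rejected[index] = errors
        else:
            accepted.append(event)
    return accepted, rejected
```

A non-empty `rejected` map is the kind of result that would back a 422 response with a JSON body rather than a 500. Whether a batch with some bad events is rejected whole or partially accepted is a separate decision this sketch does not make.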
Work items (implementation‑focused)
- Unauthenticated events plan
  - Decide on approach (API key vs rate limiting vs both).
  - Implement rate limiting (avoid pure IP‑based limits due to NAT/shared networks).
  - Create ticket for this change and align with security expectations.
- Fix 500s / header sensitivity
  - Verify Accept header handling (ensure JSON response even without Accept: application/json).
  - Confirm missing‑fields validation yields 422 with clear error, not 500.
  - Add/adjust logging for invalid payloads and unexpected failures.
- Feature flag gating
  - Add a client‑side feature flag so events are off by default in prod builds.
  - Ensure that when the flag is off, no requests are sent and no local queue grows.
- Batching enforcement
  - Ensure endpoint accepts array payloads (done in current plan).
  - Align client batching strategy (5–10s cadence).
- Payload completeness
  - Ensure payload includes brand/tenant identifier.
  - Confirm required fields match the latest contract in the ticket.
- Event schema naming decision
  - Decide whether to rename event_type → action_type and subject → entity.
  - If renaming, decide on dual‑field transition vs hard cutover to avoid data gaps.
- Infra coordination
  - Align on Kafka retention (1 day likely too short).
  - Estimate event volume to size Kafka + ClickHouse storage.
  - Validate PVC/DirectPV allocation strategy and limits.
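For the volume estimate, a back-of-the-envelope calculation may help frame the retention discussion. The sketch below is only arithmetic under stated assumptions: the function name `estimate_retained_bytes` is hypothetical, and the example figures (200 events/s, 500 bytes per event, replication factor 3) are placeholders, not numbers from the meeting.

```python
def estimate_retained_bytes(
    events_per_second: float,
    avg_event_bytes: int,
    retention_days: float,
    replication_factor: int = 1,
) -> int:
    """Rough uncompressed bytes held in Kafka for a given retention window."""
    seconds = retention_days * 24 * 60 * 60
    return int(events_per_second * avg_event_bytes * seconds * replication_factor)


# Placeholder inputs: compare a 1-day window against a 3-day window.
one_day = estimate_retained_bytes(200, 500, retention_days=1, replication_factor=3)
three_days = estimate_retained_bytes(200, 500, retention_days=3, replication_factor=3)
print(f"1 day:  {one_day / 1e9:.2f} GB")
print(f"3 days: {three_days / 1e9:.2f} GB")
```

This ignores compression and per-message overhead, and it does not model ClickHouse, where columnar compression makes on-disk size differ from raw event size. Real traffic measurements should replace the placeholders before anyone sizes PVCs from it.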