Date: February 8, 2026
Runtime: Bun 1.3.9 (darwin arm64)
Method: Custom stress test suite using Bun's built-in --cpu-prof-md and --heap-prof-md profiling tools
We profiled the SSE (Server-Sent Events) pipeline end-to-end across 5 dimensions to identify bottlenecks and validate the architecture under load. The in-process pipeline is exceptionally well-optimized — capable of sustaining 1.9M events/sec with zero memory leaks. We identified 3 concrete improvements in the I/O layer that would improve reconnection latency and reduce redundant work during fan-out.
```
Publish path:
  eventsService.publish()
    → PostgreSQL storage + Redis PUBLISH
    → Fan-out: publishToUserChannel() → Promise.all(users.map(redis.publish))

Client connection:
  → getTopicObservable(topic)              # 1 Redis subscription per topic
  → formatSSEMessage()                     # Format ONCE before share()
  → share({ resetOnRefCountZero: true })   # Fan to N subscribers
  → merge(eventStream$, heartbeat$)
  → createObservableSSEResponse()          # ReadableStream for HTTP
```
Key architectural wins already in place (sketched below):

- Format-before-share: `formatSSEMessage` runs once per event, not once per subscriber
- Shared heartbeat: a single `interval(2000)` timer serves all connections (not N timers)
- Automatic cleanup: `share({ resetOnRefCountZero: true })` prevents observable leaks
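A minimal sketch of how these three pieces fit together, using RxJS 7. The stream and formatter names below are illustrative stand-ins, not the real module's exports:

```ts
import { Subject, interval, map, merge, share } from "rxjs";

// Illustrative stand-in for the per-topic Redis message stream.
const redisMessages$ = new Subject<{ id: string; data: unknown }>();

// Illustrative formatter; the real one lives in the SSE utilities.
const formatSSEMessage = (event: { id: string; data: unknown }): string =>
  `id: ${event.id}\ndata: ${JSON.stringify(event.data)}\n\n`;

// Format-before-share: serialize once per event, upstream of share(), so every
// subscriber receives the already-built frame instead of re-formatting it.
const eventStream$ = redisMessages$.pipe(
  map(formatSSEMessage),
  share({ resetOnRefCountZero: true }), // tear down the upstream subscription when the last client leaves
);

// Shared heartbeat: one interval(2000) timer for all connections, not N timers.
const heartbeat$ = interval(2000).pipe(
  map(() => ":heartbeat\n\n"),
  share({ resetOnRefCountZero: true }),
);

// Each HTTP connection subscribes to this merged stream via a ReadableStream.
export const sseFrames$ = merge(eventStream$, heartbeat$);
```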
This benchmark measures event delivery through the RxJS observable pipeline using a mock Redis client, isolating the in-process cost.
| Subscribers | Events/sec (per sub) | Total Deliveries/sec | Latency/event |
|---|---|---|---|
| 1 | 1,894,388 | 1,894,388 | 0.53 us |
| 10 | 1,767,696 | 17,676,964 | 0.57 us |
| 100 | 963,121 | 96,312,113 | 1.04 us |
| 1,000 | 63,629 | 63,629,422 | 15.72 us |
Finding: Per-subscriber throughput holds steady up to 100 subscribers, confirming the format-before-share optimization works. At 1,000 subscribers, per-event cost rises to ~16us — still far below the ~1ms Redis RTT that dominates in production.
This benchmark simulates the `publishToUserChannel` pattern: one event published to N user channels.
| Users | Avg Time/Event | Events/sec |
|---|---|---|
| 1 | 0.002 ms | 518,360 |
| 10 | 0.003 ms | 420,389 |
| 50 | 0.007 ms | 147,956 |
| 100 | 0.013 ms | 77,029 |
| 500 | 0.057 ms | 17,620 |
| 1,000 | 0.103 ms | 9,744 |
Finding: Fan-out scales linearly with user count. At 500 users, each event takes ~57us in-process. With real Redis (~0.5ms RTT per PUBLISH), the network round trips become the dominant cost; batching the publishes into a Redis pipeline would cut that to a single round trip.
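A hedged sketch of what pipelined fan-out could look like, assuming an ioredis-style client; the channel naming is illustrative and the actual client wiring in the codebase may differ:

```ts
import Redis from "ioredis";

const redisPublisher = new Redis();

// Batch all per-user PUBLISH commands into one pipeline so the fan-out costs a
// single network round trip instead of one RTT per user.
async function fanOutToUsers(userIds: string[], message: string): Promise<void> {
  const pipeline = redisPublisher.pipeline();
  for (const userId of userIds) {
    pipeline.publish(`user:${userId}:events`, message); // illustrative channel name
  }
  await pipeline.exec();
}
```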
This benchmark measures `SSEConnectionRegistry` register/unregister throughput (5 Redis commands per register, 4 per unregister).
| Operation | Throughput |
|---|---|
| Sequential register | 349,508 ops/sec |
| Sequential unregister | 461,823 ops/sec |
| Concurrent register (100) | 373,948 ops/sec |
| Churn cycle (register + unregister) | 288,781 cycles/sec |
Finding: Connection management is not a bottleneck. Even with 5 pipelined Redis commands per registration, throughput exceeds 280K ops/sec.
This benchmark tests the RxJS observable lifecycle for memory leaks under sustained load.
| Scenario | Heap Growth |
|---|---|
| 500 subscribe/unsubscribe cycles | 0.50 MB |
| 100 subscribers x 10,000 events | 0.06 MB |
Finding: No memory leaks. share({ resetOnRefCountZero: true }) correctly cleans up Redis subscriptions when the last subscriber disconnects. Heap growth is negligible even after 500 full lifecycle cycles.
This benchmark measures `JSON.stringify` plus SSE framing cost at different payload sizes.
| Payload Size | Events/sec (10K batch) | Avg/event |
|---|---|---|
| Small (100B) | 4,574,128 | 0.22 us |
| Medium (1KB) | 2,688,082 | 0.37 us |
| Large (10KB) | 580,663 | 1.72 us |
Finding: JSON.stringify is the cost, not SSE framing. At 10KB payloads, throughput drops ~8x. For large payloads, consider pre-serializing at the publish site to avoid redundant stringify calls.
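As a rough illustration of the pre-serialization idea, the framing step can accept an already-serialized string so a large payload is stringified exactly once at the publish site. The function and field names here are illustrative, not the actual `formatSSEMessage` signature:

```ts
// Frame an SSE message from a payload that was serialized once upstream.
function frameSSE(id: string, eventName: string, serializedData: string): string {
  return `id: ${id}\nevent: ${eventName}\ndata: ${serializedData}\n\n`;
}

// At the publish site: stringify the large payload once...
const serialized = JSON.stringify({ kind: "report", rows: new Array(1000).fill(0) });

// ...and reuse the same string for storage, Redis PUBLISH, and SSE framing.
const frame = frameSSE("42", "report.updated", serialized);
```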
File: apps/api/src/redis/events.service.ts:722-727
```ts
// Current — no LIMIT clause
const events = await db
  .select()
  .from(schema.events)
  .where(and(...conditions))
  .orderBy(schema.events.pk)
```

If a client reconnects with a stale `lastEventId`, this query returns every event since that ID with no upper bound. A client offline for hours could trigger a query returning thousands of rows, causing latency spikes on reconnect.
Fix: Add `.limit(100)` (or a configurable cap). Clients that miss more than 100 events should full-refresh via the `/hydrate` endpoint anyway.
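A sketch of the capped query, reusing the identifiers from the snippet above; `REPLAY_LIMIT` is a hypothetical constant standing in for whatever configuration mechanism the team prefers:

```ts
// Hypothetical cap; 100 matches the suggested default above.
const REPLAY_LIMIT = 100;

const events = await db
  .select()
  .from(schema.events)
  .where(and(...conditions))
  .orderBy(schema.events.pk)
  .limit(REPLAY_LIMIT); // bound the replay; clients further behind should re-hydrate
```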
File: apps/api/src/redis/events.service.ts:480-491
```ts
// Current — stringify happens inside the loop (once per user)
usersWithAccess.flatMap((userId: string) => {
  const message = JSON.stringify(eventData) // ← repeated N times
  return [redisPublisher.publish(userChannel, message), ...]
})
```

`JSON.stringify(eventData)` is called once per user inside the `flatMap`. The payload is identical for all users. Hoisting it above the loop eliminates N-1 redundant serializations.

Fix: Move `const message = JSON.stringify(eventData)` above the `flatMap`.
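A sketch of the hoisted version, reusing the identifiers from the snippet above (the derivation of `userChannel` and the elided array entries are unchanged and not shown here):

```ts
// Fixed — stringify once, publish to every user channel with the same string.
const message = JSON.stringify(eventData)
const publishPromises = usersWithAccess.flatMap((userId: string) => {
  // userChannel is derived from userId exactly as in the original (elided above)
  return [redisPublisher.publish(userChannel, message)]
})
await Promise.all(publishPromises)
```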
File: apps/api/src/utils/sse.manager.ts:259-264
```ts
// Current — sequential database queries
for (const topicItem of topics) {
  const recentEvents = await eventsService.getRecentEvents(topicItem, lastEventId)
  ...
}
```

When a client subscribes to multiple topics, the replay queries execute sequentially. These are independent database queries that could run in parallel.

Fix: Use `Promise.all` to parallelize the replay queries.
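A sketch of the parallel version, reusing `eventsService.getRecentEvents` from the snippet above; the per-topic handling inside the loop stays the same:

```ts
// Fixed — issue all replay queries at once, then process results in topic order.
const replayResults = await Promise.all(
  topics.map((topicItem) => eventsService.getRecentEvents(topicItem, lastEventId)),
)
for (const recentEvents of replayResults) {
  // ...same per-topic handling as the sequential version
}
```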
Bun 1.3.7+ includes built-in profiling that outputs human-readable markdown:
```bash
# CPU profile — hot functions, call tree, file breakdown
bun --cpu-prof-md apps/api/src/utils/sse.stress-test.ts

# Heap profile — top types by retained size, largest objects
bun --heap-prof-md apps/api/src/utils/sse.stress-test.ts

# Programmatic profiling (node:inspector/promises API)
# Outputs Chrome DevTools-compatible .cpuprofile files
```

The stress test file runs as both a bun:test suite (CI assertions) and a standalone script (profiling). Key discovery: `import.meta.main` is true in both `bun test` and `bun run`, so we use a try/catch around `describe()`, which throws synchronously outside the test runner, to detect the mode.
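A minimal sketch of that mode-detection trick, assuming (as noted above) that `describe()` throws when called outside the test runner; the suite body and the `runStressSuite` helper are illustrative:

```ts
import { describe, test, expect } from "bun:test";

// Illustrative stand-in for the actual benchmark entry point.
async function runStressSuite(): Promise<{ eventsPerSec: number }> {
  return { eventsPerSec: 1_900_000 };
}

let underTestRunner = true;
try {
  // Under `bun test` this registers the suite; under `bun run` the test runner
  // is not active and describe() throws synchronously, so we fall through.
  describe("SSE stress tests", () => {
    test("pipeline sustains target throughput", async () => {
      const { eventsPerSec } = await runStressSuite();
      expect(eventsPerSec).toBeGreaterThan(1_000_000);
    });
  });
} catch {
  underTestRunner = false;
}

if (!underTestRunner) {
  // Profiling mode: run the same workload as a plain script so
  // --cpu-prof-md / --heap-prof-md can capture it.
  await runStressSuite();
}
```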
```bash
# CI mode (assertions)
cd apps/api && bun test ./src/utils/sse.stress-test.ts

# Profiling mode (markdown output)
bun --cpu-prof-md apps/api/src/utils/sse.stress-test.ts
```

The SSE pipeline architecture is sound. The format-before-share pattern, the shared heartbeat, and automatic cleanup via RxJS `share()` are all working correctly and performing well. The three improvements identified are in the I/O layer: an unbounded database query, a redundant serialization in the hot path, and serial queries that could run in parallel. None are critical, but the replay query limit (#1) is worth prioritizing as a safety measure against reconnection storms.