
SSE Event Stream Performance Analysis

Date: February 8, 2026
Runtime: Bun 1.3.9 (darwin arm64)
Method: Custom stress test suite using Bun's built-in --cpu-prof-md and --heap-prof-md profiling tools


Executive Summary

We profiled the SSE (Server-Sent Events) pipeline end to end across five dimensions to identify bottlenecks and validate the architecture under load. The in-process pipeline is exceptionally well optimized, sustaining 1.9M events/sec per subscriber with no detectable memory leaks. We identified three concrete improvements in the I/O layer that would reduce reconnection latency and eliminate redundant work during fan-out.


Architecture Under Test

eventsService.publish()
  → PostgreSQL storage + Redis PUBLISH
  → Fan-out: publishToUserChannel() → Promise.all(users.map(redis.publish))

Client connection:
  → getTopicObservable(topic)      # 1 Redis subscription per topic
    → formatSSEMessage()           # Format ONCE before share()
    → share({ resetOnRefCountZero: true })  # Fan to N subscribers
  → merge(eventStream$, heartbeat$)
  → createObservableSSEResponse()  # ReadableStream for HTTP
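
A minimal sketch of that last step, assuming Bun's fetch-style Response and an RxJS observable of pre-formatted SSE frames; createObservableSSEResponse is the name used above, but the body here is illustrative rather than the production implementation.

import type { Observable, Subscription } from "rxjs"

// Bridge an Observable<string> of pre-formatted SSE frames into an HTTP response.
function createObservableSSEResponse(stream$: Observable<string>): Response {
  const encoder = new TextEncoder()
  let subscription: Subscription | undefined

  const body = new ReadableStream<Uint8Array>({
    start(controller) {
      subscription = stream$.subscribe({
        next: (frame) => controller.enqueue(encoder.encode(frame)),
        error: (err) => controller.error(err),
        complete: () => controller.close(),
      })
    },
    // Runs when the client disconnects; dropping the subscriber here lets
    // share({ resetOnRefCountZero: true }) tear down the Redis subscription.
    cancel() {
      subscription?.unsubscribe()
    },
  })

  return new Response(body, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
    },
  })
}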

Key architectural wins already in place (sketched below):

  • Format-before-share: formatSSEMessage runs once per event, not once per subscriber
  • Shared heartbeat: Single interval(2000) timer for all connections (not N timers)
  • Automatic cleanup: share({ resetOnRefCountZero: true }) prevents observable leaks
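
A minimal sketch of the first two patterns, assuming an RxJS pipeline; the function names mirror the architecture above, but the Redis source is stubbed, and the event shape and topic name are assumptions.

import { Observable, interval, map, merge, share } from "rxjs"

// Assumed event shape; the real payload arrives via Redis PUBLISH.
interface TopicEvent { id: string; type: string; data: unknown }

// SSE framing: one stringify per event, shared by every subscriber downstream.
function formatSSEMessage(event: TopicEvent): string {
  return `id: ${event.id}\nevent: ${event.type}\ndata: ${JSON.stringify(event.data)}\n\n`
}

// Stand-in for the single per-topic Redis subscription.
function redisTopicSource(topic: string): Observable<TopicEvent> {
  return new Observable<TopicEvent>((subscriber) => {
    // subscribe to the Redis channel here and call subscriber.next(event) per message
    return () => { /* unsubscribe from Redis when the refcount hits zero */ }
  })
}

function getTopicObservable(topic: string): Observable<string> {
  return redisTopicSource(topic).pipe(
    map(formatSSEMessage),                 // format ONCE per event, upstream of share()
    share({ resetOnRefCountZero: true }),  // fan out to N subscribers, auto-teardown at zero
  )
}

// One shared 2-second heartbeat timer for all connections, not one per connection.
const heartbeat$ = interval(2000).pipe(map(() => ": heartbeat\n\n"), share())

// Per-connection stream: topic events merged with the shared heartbeat.
const connection$ = merge(getTopicObservable("topic:example"), heartbeat$)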

Benchmark Results

1. Pipeline Throughput

Measures event delivery through the RxJS observable pipeline (mock Redis, isolating in-process cost).

Subscribers    Events/sec (per sub)    Total Deliveries/sec    Latency/event
1              1,894,388               1,894,388               0.53 us
10             1,767,696               17,676,964              0.57 us
100            963,121                 96,312,113              1.04 us
1,000          63,629                  63,629,422              15.72 us

Finding: Per-subscriber throughput stays near the single-subscriber rate through 10 subscribers and within roughly 2x of it at 100, confirming the format-before-share optimization works. At 1,000 subscribers, per-event cost rises to ~16 us, still far below the ~1 ms Redis RTT that dominates in production.

2. Fan-Out Scaling (Redis PUBLISH to N users)

Simulates the publishToUserChannel pattern: one event published to N user channels.

Users    Avg Time/Event    Events/sec
1        0.002 ms          518,360
10       0.003 ms          420,389
50       0.007 ms          147,956
100      0.013 ms          77,029
500      0.057 ms          17,620
1,000    0.103 ms          9,744

Finding: Fan-out cost scales linearly with the number of users. At 500 users, each event takes ~57 us in-process. With real Redis (~0.5 ms RTT per PUBLISH), the network round trips become the dominant cost; batching the PUBLISH commands with Redis pipelining could reduce this, as sketched below.
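
A hedged sketch of the pipelined version, assuming an ioredis-style client; the helper name and the channel naming are illustrative, not the actual publishToUserChannel signature.

import Redis from "ioredis"

const redisPublisher = new Redis()

// One network round trip for N PUBLISH commands instead of N round trips.
async function publishToUserChannelsBatched(userIds: string[], eventData: unknown): Promise<void> {
  const message = JSON.stringify(eventData)   // serialize once, reuse for every user
  const pipeline = redisPublisher.pipeline()
  for (const userId of userIds) {
    pipeline.publish(`user:${userId}:events`, message)   // illustrative channel name
  }
  await pipeline.exec()
}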

3. Connection Churn (Registry Operations)

Measures SSEConnectionRegistry register/unregister throughput (5 Redis commands per register, 4 per unregister).

Operation                              Throughput
Sequential register                    349,508 ops/sec
Sequential unregister                  461,823 ops/sec
Concurrent register (100)              373,948 ops/sec
Churn cycle (register + unregister)    288,781 cycles/sec

Finding: Connection management is not a bottleneck. Even with 5 pipelined Redis commands per registration, throughput exceeds 280K ops/sec.

4. Memory Leak Detection

Tests the RxJS observable lifecycle for leaks under sustained load.

Scenario                            Heap Growth
500 subscribe/unsubscribe cycles    0.50 MB
100 subscribers x 10,000 events     0.06 MB

Finding: No memory leaks. share({ resetOnRefCountZero: true }) correctly cleans up Redis subscriptions when the last subscriber disconnects. Heap growth is negligible even after 500 full lifecycle cycles.

5. Formatting Performance (formatSSEMessage)

Measures JSON.stringify + SSE framing at different payload sizes.

Payload Size    Events/sec (10K batch)    Avg/event
Small (100B)    4,574,128                 0.22 us
Medium (1KB)    2,688,082                 0.37 us
Large (10KB)    580,663                   1.72 us

Finding: JSON.stringify is the cost, not SSE framing. At 10KB payloads, throughput drops ~8x. For large payloads, consider pre-serializing at the publish site to avoid redundant stringify calls.


Actionable Improvements Found

1. Unbounded replay query (High Impact)

File: apps/api/src/redis/events.service.ts:722-727

// Current — no LIMIT clause
const events = await db
  .select()
  .from(schema.events)
  .where(and(...conditions))
  .orderBy(schema.events.pk)

If a client reconnects with a stale lastEventId, this query returns every event since that ID with no upper bound. A client offline for hours could trigger a query returning thousands of rows, causing latency spikes on reconnect.

Fix: Add .limit(100) (or a configurable cap). Clients that miss more than 100 events should full-refresh via the /hydrate endpoint anyway.
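
A sketch of the capped query, extending the Drizzle snippet above; REPLAY_LIMIT is an assumed constant name.

// Fixed: cap the replay window; clients further behind should full-refresh via /hydrate
const REPLAY_LIMIT = 100

const events = await db
  .select()
  .from(schema.events)
  .where(and(...conditions))
  .orderBy(schema.events.pk)
  .limit(REPLAY_LIMIT)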

2. Redundant JSON.stringify in fan-out (Medium Impact)

File: apps/api/src/redis/events.service.ts:480-491

// Current — stringify happens inside the loop (once per user)
usersWithAccess.flatMap((userId: string) => {
  const message = JSON.stringify(eventData)  // ← repeated N times
  return [redisPublisher.publish(userChannel, message), ...]
})

JSON.stringify(eventData) is called once per user inside flatMap. The payload is identical for all users. Hoisting it above the loop eliminates N-1 redundant serializations.

Fix: Move const message = JSON.stringify(eventData) before the flatMap.
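
The hoisted version would look roughly like this (the same snippet with the stringify moved out; the elided publish targets are unchanged):

// Fixed: stringify once, reuse the identical string for every user
const message = JSON.stringify(eventData)
usersWithAccess.flatMap((userId: string) => {
  return [redisPublisher.publish(userChannel, message), ...]
})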

3. Serial replay across topics (Low Impact)

File: apps/api/src/utils/sse.manager.ts:259-264

// Current — sequential database queries
for (const topicItem of topics) {
  const recentEvents = await eventsService.getRecentEvents(topicItem, lastEventId)
  ...
}

When a client subscribes to multiple topics, replay queries execute sequentially. These are independent database queries that could run in parallel.

Fix: Use Promise.all to parallelize the replay queries.
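
A sketch of the parallel version, reusing the names from the snippet above:

// Fixed: independent replay queries run concurrently
const replayResults = await Promise.all(
  topics.map((topicItem) => eventsService.getRecentEvents(topicItem, lastEventId))
)
for (const recentEvents of replayResults) {
  ...
}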


Profiling Tools Reference

Bun 1.3.7+ includes built-in profiling that outputs human-readable markdown:

# CPU profile — hot functions, call tree, file breakdown
bun --cpu-prof-md apps/api/src/utils/sse.stress-test.ts

# Heap profile — top types by retained size, largest objects
bun --heap-prof-md apps/api/src/utils/sse.stress-test.ts

# Programmatic profiling (node:inspector/promises API)
# Outputs Chrome DevTools-compatible .cpuprofile files
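
A minimal sketch of the programmatic path using the node:inspector/promises Session API; the output filename and the workload function are illustrative stand-ins.

import { Session } from "node:inspector/promises"
import { writeFileSync } from "node:fs"

// Stand-in workload; in practice this would drive the SSE pipeline under test.
async function runWorkload(): Promise<void> {
  for (let i = 0; i < 1_000_000; i++) JSON.stringify({ i })
}

const session = new Session()
session.connect()
await session.post("Profiler.enable")
await session.post("Profiler.start")

await runWorkload()

const { profile } = await session.post("Profiler.stop")
writeFileSync("sse-stress.cpuprofile", JSON.stringify(profile))   // open in Chrome DevTools
session.disconnect()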

Dual-Mode Test Pattern

The stress test file runs as both a bun:test suite (CI assertions) and a standalone script (profiling). Key discovery: import.meta.main is true in both bun test and bun run, so we use a try/catch on describe() — which throws synchronously outside the test runner — to detect mode.
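
A sketch of the detection trick described above; the suite contents and variable names are placeholders.

import { describe, test } from "bun:test"

// describe() throws synchronously outside the test runner, so a try/catch
// distinguishes `bun test` (CI assertions) from `bun run` / profiling mode.
let runningUnderTestRunner = true
try {
  describe("sse stress suite", () => {
    test("placeholder assertion", () => {
      // real suite asserts on benchmark thresholds here
    })
  })
} catch {
  runningUnderTestRunner = false
}

if (!runningUnderTestRunner && import.meta.main) {
  // Standalone profiling mode: invoke the benchmark functions directly
  // and print the markdown tables.
  console.log("standalone mode")
}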

# CI mode (assertions)
cd apps/api && bun test ./src/utils/sse.stress-test.ts

# Profiling mode (markdown output)
bun --cpu-prof-md apps/api/src/utils/sse.stress-test.ts

Conclusion

The SSE pipeline architecture is sound. The format-before-share pattern, the shared heartbeat, and automatic cleanup via RxJS share() all work correctly and perform well under load. The three improvements identified sit in the I/O layer: an unbounded database query, a redundant serialization in the hot path, and serial queries that could run in parallel. None are critical, but the replay query limit (#1) is worth prioritizing as a safety measure against reconnection storms.
