@vedesh-padal
Created December 30, 2025 16:34

Stateful vs. Stateless Systems in Design

elaborated by Grok

In system design, a stateful system maintains client-specific data (state) across requests on the server, like user sessions or shopping carts. This state is often stored in memory or local storage, making the server "remember" interactions. A stateless system treats each request independently, with no retained state on the server—any needed state is passed by the client (e.g., via tokens or headers) or fetched from external shared storage.

Statelessness is key for scalability, reliability, and simplicity, especially in distributed environments like microservices or cloud setups.

Problem: State Sticking to One Server in Horizontal Scaling

Horizontal scaling adds more servers (instances) behind a load balancer to handle increased load. In stateful systems, state "sticks" to a specific server because:

  • Session Affinity (Sticky Sessions): The load balancer routes all requests from a client to the same server where the state was initially created. This uses techniques like:
    • IP Hashing: Hash the client's IP to pick a server (breaks when clients sit behind shared NAT or change IPs on mobile networks).
    • Cookie-Based: Load balancer sets a cookie (e.g., AWS ALB's AWSALB) identifying the target server.
    • URL Parameter: Embed server ID in URL (rare, insecure).
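The IP-hashing variant above can be sketched in a few lines. This is a simplified illustration, not a real load balancer: the server names and the modulo-based mapping are assumptions for the example.

```python
import hashlib

def pick_server(client_ip: str, servers: list) -> str:
    """Deterministically map a client IP to one server (sticky routing)."""
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

servers = ["app-1", "app-2", "app-3"]

# The same IP always lands on the same server...
assert pick_server("203.0.113.7", servers) == pick_server("203.0.113.7", servers)
# ...but if the client's IP changes (NAT, mobile handoff), affinity is lost.
```

Note that adding or removing a server changes `len(servers)` and remaps most clients, which is exactly the rebalancing problem described below.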

This "stickiness" causes issues:

  • Single Point of Failure: If the sticky server crashes, state is lost, forcing user re-authentication or data loss.
  • Uneven Load: Bursts from sticky clients overload one server while others idle.
  • Scaling Challenges: Adding/removing servers requires rebalancing sticky sessions, potentially disrupting users.
  • Complexity: Load balancers must track affinities, increasing overhead.

Types of Sticky State

State that causes stickiness can be categorized by type and storage:

  1. Session State:

    • User login info, preferences, or temporary data (e.g., auth tokens, shopping cart in memory).
    • Stuck because sessions are often stored in server RAM (e.g., Java HttpSession, Node.js in-memory stores).
    • Types: Transient (short-lived, like CSRF tokens) vs. Persistent (longer, like user profiles).
  2. Application State:

    • Runtime data like counters, caches, or workflow status (e.g., multi-step form progress).
    • Stuck in local caches (e.g., Guava Cache) or server-specific databases.
  3. Connection State:

    • Long-lived connections (e.g., WebSockets for chat apps) tied to a server.
    • Stuck due to TCP/HTTP persistent connections.
  4. Data State:

    • Local files or databases (e.g., SQLite on server disk).
    • Stuck if not replicated across nodes.
  5. Computational State:

    • In-progress computations or queues (e.g., background jobs tied to a worker).

These lead to "affinity" requirements, limiting true horizontal scaling where any server can handle any request.

Solutions: Making Systems Stateless

To solve stickiness, redesign to offload state from individual servers, enabling any server to process any request. Core principle: "Share nothing" across servers; externalize state.

1. Externalize State Storage

  • Move state to shared, distributed stores accessible by all servers.
    • Databases: Use centralized or replicated DBs (e.g., PostgreSQL with replication, MongoDB sharding).
      • Example: Store session data in Redis (key-value store) with keys like session:user123 containing JSON-serialized state.
    • Caching Layers: Redis or Memcached for fast access.
      • Flow: Client sends request → Server fetches state from Redis using a client-provided ID (e.g., a session ID or JWT subject) → Processes → Updates Redis → Responds.
    • Object Storage: For files (e.g., S3 for user uploads).
  • Benefits: Fault-tolerant (state survives server failure); Scalable (add servers without migration).
  • Drawbacks: Latency from network calls; Ensure consistency (e.g., via transactions).
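The fetch–process–update flow above can be sketched as follows. A plain in-memory class stands in for a shared store like Redis here (in production you would use a Redis client with `SETEX`-style TTLs); `SessionStore`, `handle_request`, and the key format `session:<id>` are illustrative assumptions.

```python
import json
import time
import uuid

class SessionStore:
    """Stand-in for a shared store (e.g., Redis) reachable by every app server."""
    def __init__(self):
        self._data = {}  # key -> (expires_at, JSON-serialized value)

    def set(self, key, value, ttl=3600):
        self._data[key] = (time.time() + ttl, json.dumps(value))

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or entry[0] < time.time():
            return None  # missing or expired
        return json.loads(entry[1])

store = SessionStore()  # shared by all servers

def handle_request(session_id, server_name, add_item=None):
    """Any server can serve the request: state lives in the store, not in RAM."""
    session = store.get(f"session:{session_id}") or {"cart": []}
    if add_item:
        session["cart"].append(add_item)
    session["last_server"] = server_name
    store.set(f"session:{session_id}", session)
    return session

sid = uuid.uuid4().hex
handle_request(sid, "app-1", add_item="book")  # first request hits app-1
state = handle_request(sid, "app-2")           # next request hits app-2
# The cart added on app-1 is visible from app-2 because state is external.
```

Because no server keeps the session in local memory, the load balancer is free to route each request anywhere, and killing one instance loses nothing.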

2. Client-Side State Management

  • Push state to the client, reducing server load.
    • Cookies: Store small state (e.g., preferences) in signed/encrypted cookies.
    • Tokens: Use JWT (JSON Web Tokens) for auth/session info—client sends token with each request; server verifies but doesn't store.
      • Example: Login → Server generates JWT with user data → Client stores in localStorage → Subsequent requests include JWT in headers → Server validates signature and extracts state.
    • URL Parameters: For stateless navigation (e.g., pagination tokens).
  • Benefits: No server-side session storage; scales horizontally without a session-capacity bottleneck.
  • Drawbacks: Size limits (cookies are capped at roughly 4KB each); state must be signed or encrypted so clients can't tamper with it.
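The verify-but-don't-store idea can be illustrated with a minimal HMAC-signed token built from the standard library. This is a sketch of the principle only: real JWTs follow RFC 7519 (header, registered claims, expiry) and are best produced with a dedicated library; `SECRET` and the `payload.signature` format here are assumptions for the example.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"server-side-secret"  # hypothetical signing key, kept only on servers

def issue_token(claims):
    """Encode claims and sign them; the server keeps no copy."""
    payload = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode()
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{sig}"

def verify_token(token):
    """Recompute the signature; return the claims only if it matches."""
    payload, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None  # tampered or forged
    return json.loads(base64.urlsafe_b64decode(payload))

token = issue_token({"user": "alice", "role": "admin"})
assert verify_token(token) == {"user": "alice", "role": "admin"}
assert verify_token(token[:-1] + "X") is None  # tampering is detected
```

Any server holding `SECRET` can validate the token, so no session lookup (and no stickiness) is needed; the trade-off is that issued tokens can't easily be revoked before they expire.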

3. Stateless Protocols and Designs

  • Adopt HTTP/REST principles: Each request self-contained.
    • Idempotent Operations: Ensure retries don't duplicate effects (e.g., attach a unique ID to each transaction so repeats are detected and ignored).
    • Microservices: Design services as stateless (e.g., Kubernetes pods auto-scale without state migration).
  • For connections: Use message brokers (e.g., Kafka, RabbitMQ) to decouple; Clients reconnect to any server.
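The idempotency idea above can be sketched with a client-supplied unique ID. A dict stands in for a shared store here; `charge`, `payment_id`, and the result shape are hypothetical names for illustration.

```python
processed = {}  # stand-in for a shared store keyed by idempotency ID

def charge(payment_id, amount):
    """Retrying with the same payment_id returns the stored result
    instead of charging again."""
    if payment_id in processed:
        return processed[payment_id]
    result = {"payment_id": payment_id, "amount": amount, "status": "charged"}
    processed[payment_id] = result  # record before acknowledging
    return result

first = charge("pay-123", 500)
retry = charge("pay-123", 500)  # e.g., a client retry after a network timeout
assert first is retry           # no duplicate charge
```

For this to work across servers, the dedupe record must live in the shared store, not in one server's memory; that is the same externalization move as with sessions.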

4. Load Balancer Enhancements

  • Even in semi-stateful setups, use advanced balancers:
    • Session Replication: Servers sync state via multicast or DB (e.g., Tomcat clustering)—but adds overhead, not truly stateless.
    • Sticky with Failover: Balancers detect failures and redirect to replicas with synced state.

5. Implementation Steps in System Design

  • Assess State: Audit app for stateful components (e.g., in-memory maps, files).
  • Refactor: Serialize state to external stores; Use libraries like Spring Session (Java) or express-session with Redis (Node.js).
  • Testing: Simulate failures (e.g., kill a server) to verify statelessness.
  • Monitoring: Track metrics like session hits/misses in external stores.
  • Example Architecture: API Gateway → Load Balancer → Stateless App Servers → Shared Redis/DB.

Trade-offs

  • Stateless: Easier scaling and failover, but potentially higher latency and cost from calls to external stores.
  • Hybrid: Some state local for perf (e.g., read-only caches), synced periodically.
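The hybrid approach can be sketched as a small local read-through cache with a TTL, so staleness is bounded while most reads avoid the network. The class, the backing dict (standing in for a shared DB/Redis), and the 5-second TTL are assumptions for the example.

```python
import time

class ReadThroughCache:
    """Local cache in front of a shared store; entries expire after `ttl`
    seconds, bounding how stale a server's local copy can get."""
    def __init__(self, backing, ttl=5.0):
        self.backing = backing   # stand-in for the shared store
        self.ttl = ttl
        self._local = {}         # key -> (expires_at, value)

    def get(self, key):
        entry = self._local.get(key)
        if entry and entry[0] > time.time():
            return entry[1]                    # fast local hit
        value = self.backing.get(key)          # fall back to shared store
        self._local[key] = (time.time() + self.ttl, value)
        return value

shared = {"feature_flags": {"dark_mode": True}}  # shared, rarely-changing data
cache = ReadThroughCache(shared, ttl=5.0)
flags = cache.get("feature_flags")  # first read populates the local copy
```

This keeps the server "mostly stateless": the local copy is disposable and is rebuilt from the shared store on any cache miss or restart.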

In practice, aim for "mostly stateless" in cloud-native apps (e.g., 12-Factor App principles). Tools like Docker/Kubernetes enforce this by treating containers as ephemeral.
