DAW Problem Statement – Multi-Region Ride-Hailing Platform
Context
You are tasked with designing a multi-tenant, multi-region ride-hailing system (think
Uber/Ola).
The platform must handle driver–rider matching, dynamic surge pricing, trip
lifecycle management, and payments at scale with strict latency requirements.
Functional Requirements:
Real-time driver location ingestion: each online driver sends 1–2 updates per second
Multi-region: region-local writes, failover handling, no cross-region sync on hot path
Compliance: PCI, PII encryption, GDPR/DPDP
Constraints:
Use Kafka/Pulsar for events, Redis for hot KV, Postgres/CockroachDB for transactions
Clients are mobile with flaky networks; all APIs must be idempotent
Payments through external PSPs; their latency is outside your control
Deliverables
HLD: components, data flow, scaling, storage, trade-offs
LLD: deep dive into either Dispatch/Matching, Surge Pricing, or Trip Service
APIs & Events: request/response schemas and event topics
Data Model: ERD for chosen component
Resilience plan: retries, backpressure, circuit breakers, failure modes
Designing a multi-tenant, multi-region ride-hailing system.
The platform must handle driver–rider matching, dynamic surge pricing, trip
lifecycle management, and payments at scale with strict latency requirements.
Actors
Customer
Driver
Workflows
Customers
1. Customer Opens the App
Trigger: Customer opens the app
Actions:
Check and wait for location permissions
Check if the service is available in the region
Check if a payment is pending from a previous trip
Get a list of nearby locations (up to 5 km from the user)
Show a list of available vehicles and, optionally, nearby events or POIs
Pre-condition:
Authenticated
Location permissions enabled
Trips never cross regions at runtime
Success:
Launch screen shows nearby supply, fare estimates, and any pending-payment notice
Failure:
Customer sees a service-unavailable message or a location-permission prompt
2. Customer Intends to Book Ride
Trigger: Searches and selects a location to ride to
Action:
Check if the target location is within geo-boundaries
Check which vehicle types can serve the destination
System fetches current surge multiplier for pickup geo-cell
An estimated price can be calculated (base + distance + surge); the base component can be cached as well
Success:
Customer sees rides available in the area
Customer sees estimated prices and approximate durations for rides (from Google Maps or internal tracking)
Additional metadata (e.g. whether tolls are included or paid separately) can be shown.
Failure:
Customer sees appropriate message, ride not available
Price estimation failed, but booking can still proceed.
3. Customer Sends a Booking Request
Trigger: User selects a ride type and clicks "Book"
Action:
Create a Booking Request with status = pending and search radius = 5 km
Publish it to the queue (see the sketch after this workflow)
Set a TTL of 15 s
Success:
Driver accepts the booking, status = 'assigned'
Driver can also cancel after accepting (status = 'cancelled'), in which case a new booking request is created
Failure:
Retry 2x with increasing radius.
REQUEST_EXPIRED, REQUEST_ABORTED, REQUEST_RETRY
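A minimal sketch of publishing the booking request, assuming a Kafka topic named booking.created and an illustrative payload; reusing booking_id as both the message key and idempotency key means retries with a wider radius do not create duplicate bookings:

```python
# Sketch only: topic name, payload fields, and radius schedule are assumptions.
import json
import uuid
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "kafka:9092"})

def publish_booking_request(booking_id: str, rider_id: str, pickup: dict,
                            attempt: int = 0, base_radius_km: int = 5) -> None:
    """Publish a booking request; each retry widens the search radius."""
    payload = {
        "booking_id": booking_id,                   # same id on every retry (idempotency)
        "rider_id": rider_id,
        "pickup": pickup,                           # {"lat": ..., "lng": ...}
        "radius_km": base_radius_km + 5 * attempt,  # 5 km, 10 km, 15 km ...
        "attempt": attempt,
        "ttl_sec": 15,
        "status": "pending",
    }
    # Keying by booking_id keeps all events for one booking on one partition (ordering).
    producer.produce("booking.created", key=booking_id, value=json.dumps(payload))
    producer.flush()

publish_booking_request(str(uuid.uuid4()), "usr_456", {"lat": 12.9345, "lng": 77.6102})
```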
4. Real Time ETAs
Trigger: App polls the driver's location and the booking status every 5 s (via the booking handshake)
Pre-condition:
A Driver–Booking–User relationship must exist and status = assigned
Action:
System tracks driver location in context of the Booking
5. Customer Payments
Trigger: Trip ends (triggered by driver), Booking status = completed
Action:
Booking status changes to completed
Calculate actual amount for ride
Create Payment Request for the Booking (Payment.status = pending)
Redirect to the appropriate PSP page and wait for completion
Enqueue a reconciliation process as a delayed job (runs after 5 minutes)
On PSP callback, enqueue an update of the Payment status and timestamps (see the sketch below)
Success:
Update Booking.status = paid
Failure:
Payment stays pending/failed; the reconciliation job retries, and new bookings are blocked until the payment is resolved
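A minimal sketch of an idempotent PSP-callback handler, assuming payments/bookings tables shaped like the data model below; the status = 'pending' guard makes duplicate or replayed callbacks a no-op:

```python
# Sketch only: table and column names are assumptions based on the data model below.
import psycopg2

def handle_psp_callback(conn, payment_id: str, psp_txn_id: str, psp_status: str) -> None:
    """Apply a PSP callback at most once: only a pending payment can transition."""
    new_status = "completed" if psp_status == "SUCCESS" else "failed"
    with conn, conn.cursor() as cur:
        # The status = 'pending' predicate is what makes replays harmless.
        cur.execute(
            """
            UPDATE payments
               SET status = %s, psp_transaction_id = %s, completed_at = NOW()
             WHERE payment_id = %s AND status = 'pending'
            """,
            (new_status, psp_txn_id, payment_id),
        )
        if cur.rowcount == 1 and new_status == "completed":
            cur.execute(
                """
                UPDATE bookings SET status = 'paid'
                 WHERE booking_id = (SELECT booking_id FROM payments WHERE payment_id = %s)
                """,
                (payment_id,),
            )

# conn = psycopg2.connect(...); handle_psp_callback(conn, "pay_001", "psp_789", "SUCCESS")
```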
Driver
1. Driver toggles duty status
Trigger: Driver taps Go Online / Go Offline
Pre-condition:
Driver has registered and KYC verified
Actions:
Update Driver.status = online/offline
If online: start sending location pings (batch them on the client side to reduce ping rate)
If offline: stop location stream, clear from matching pool
2. Driver receives & accepts ride
Trigger: New BookingRequest assigned to driver
Actions:
Push notification sent (booking details)
Driver has 10s to accept
On accept:
Booking.status = accepted
Driver.status = on_trip
Start navigation
Clear other offers
On decline/timeout:
Add the driver id to the offered set (so the next candidate driver can be selected)
3. Driver views hotspot map
Trigger: Driver opens "Hotspots" tab
Data: Aggregated DemandHeatMap (geo-cells with high booking rate)
erDiagram
Trip ||--|| Booking : "belongs_to"
Trip ||--|| User : "belongs_to (rider)"
Trip ||--|| User : "belongs_to (driver)"
Trip ||--o| Fare : "belongs_to"
Trip ||--o| Payment : "belongs_to"
Trip ||--o| Vehicle : "belongs_to"
Trip ||--o{ TripStateTransition : "has_many"
Trip ||--o{ TripLocation : "has_many"
Trip ||--o{ TripEvent : "has_many"
Booking ||--|| User : "belongs_to (rider)"
Booking ||--o| Trip : "has_one"
Booking ||--o{ Offer : "has_many"
Offer ||--|| Booking : "belongs_to"
Offer ||--|| User : "belongs_to (driver)"
User ||--o{ Trip : "has_many (as rider)"
User ||--o{ Trip : "has_many (as driver)"
User ||--o{ Booking : "has_many"
User ||--o{ Offer : "has_many"
User ||--o{ Vehicle : "has_many"
TripStateTransition ||--|| Trip : "belongs_to"
TripStateTransition ||--o| User : "belongs_to (actor)"
TripLocation ||--|| Trip : "belongs_to"
TripEvent ||--|| Trip : "belongs_to"
Fare ||--|| Booking : "belongs_to"
Fare ||--o| Trip : "has_one"
Payment ||--|| Booking : "belongs_to"
Payment ||--|| User : "belongs_to (rider)"
Payment ||--o| Trip : "has_one"
Vehicle ||--|| User : "belongs_to (driver)"
Vehicle ||--o{ Trip : "has_many"
Trip {
uuid trip_id PK
varchar booking_id UK "FK to Booking"
varchar rider_id "FK to User"
varchar driver_id "FK to User"
varchar vehicle_id "FK to Vehicle"
decimal pickup_lat
decimal pickup_lng
text pickup_address
decimal dropoff_lat
decimal dropoff_lng
text dropoff_address
varchar vehicle_type
varchar city_id
varchar status "enum"
timestamptz assigned_at
timestamptz started_at
timestamptz paused_at
timestamptz resumed_at
timestamptz ended_at
decimal distance_km
int duration_sec
varchar fare_id "FK to Fare"
varchar payment_id "FK to Payment"
varchar cancelled_by "enum"
text cancellation_reason
decimal cancellation_fee
varchar otp
timestamptz created_at
timestamptz updated_at
}
Booking {
varchar booking_id PK
varchar rider_id "FK to User"
decimal pickup_lat
decimal pickup_lng
text pickup_address
decimal destination_lat
decimal destination_lng
text destination_address
varchar vehicle_type "enum"
varchar city_id
varchar status "enum"
decimal estimated_fare
decimal surge_multiplier
varchar payment_method
timestamptz created_at
timestamptz updated_at
}
User {
varchar user_id PK
varchar name
varchar email
varchar phone
varchar type "RIDER, DRIVER"
decimal rating
int total_trips
varchar city_id
varchar status "enum"
timestamptz created_at
}
Offer {
varchar offer_id PK
varchar booking_id "FK to Booking"
varchar driver_id "FK to User"
varchar status "enum"
int distance_m
int eta_sec
int rank
decimal score
timestamptz created_at
timestamptz expires_at
}
TripStateTransition {
bigint id PK
uuid trip_id "FK to Trip"
varchar from_status
varchar to_status
varchar changed_by "enum"
varchar actor_id "FK to User"
text reason
jsonb metadata
timestamptz created_at
}
TripLocation {
bigint id PK
uuid trip_id "FK to Trip"
decimal lat
decimal lng
int accuracy_m
decimal speed_kmh
int bearing
timestamptz recorded_at
timestamptz received_at
}
TripEvent {
bigint id PK
uuid trip_id "FK to Trip"
varchar booking_id
varchar event_type
jsonb event_data
int sequence_number
timestamptz created_at
}
Fare {
varchar fare_id PK
varchar booking_id "FK to Booking"
decimal base_fare
decimal distance_fare
decimal time_fare
decimal surge_multiplier
decimal subtotal
decimal tax
decimal total
varchar currency
timestamptz created_at
}
Payment {
varchar payment_id PK
varchar booking_id "FK to Booking"
varchar rider_id "FK to User"
decimal amount
varchar currency
varchar status "enum"
varchar payment_method
varchar psp_transaction_id
timestamptz created_at
timestamptz completed_at
}
Vehicle {
varchar vehicle_id PK
varchar driver_id "FK to User"
varchar vehicle_type "enum"
varchar make
varchar model
varchar plate_number
varchar color
int year
varchar status "enum"
timestamptz created_at
}
P.S. This is a generated diagram, from the blobs of text below
Scaling, Consistency and Availability
Scale assumptions:
Each online driver sends 1–2 updates per second
300k concurrent drivers globally
60k ride requests/minute peak
500k location updates/second globally
Dispatch decision: p95 ≤ 1s
End-to-end request -> acceptance: p95 ≤ 3s
(hmmm, not sure about this: a human driver is in the loop, and incremental search and retry can run up to 1 minute)
Average: 20s
Cancel semantics (emit events to stop broadcasting)
Final trip state
Central coordinator to avoid race conditions
HA (Critical)
Notification Service
Responsibilities:
SMS
Push Notifications
Emails for invoices (if needed)
Most of the system is event driven
Idempotency Handling
Traffic Patterns:
The endpoint is both read- and write-heavy, because of the Transactional Outbox
Should be HA, but shouldn't affect core booking and payment.
At-most-once delivery guarantee
Reconciliation Jobs
Responsibilities:
It would be nice to keep the ActiveBookings and ActivePayments tables separate and small, to reduce data size.
For databases, this also means smaller indexes. Recon jobs can clean up/move this data across partitions.
Cleanup notifications from Outbox
Rebuild in-memory locks when the system crashes
Payment recon workers validate transaction status and catch missed or double charges
Experiment Service (control plane)
Responsibilities:
Flags
Experiments
Kill switches
Audits
Reads: HA, should not block, eventually consistent
Writes: strongly consistent
Localisation Engine
Flow Diagrams
Ride Discovery:
sequenceDiagram
participant Rider
participant Gateway
participant Identity
participant Policy
participant Pricing
participant Supply
participant Maps
Rider->>Gateway: App Launch (lat, lng)
Gateway->>Identity: Validate session
Identity-->>Gateway: OK
par Parallel Reads
Gateway->>Policy: Serviceable? city rules
Gateway->>Supply: Nearby vehicle supply snapshot
Gateway->>Pricing: Fare estimate request
end
Pricing->>Maps: Distance + ETA (approx)
Maps-->>Pricing: distance, duration
Pricing->>Policy: Surge multiplier
Policy-->>Pricing: surge
Pricing-->>Gateway: Fare ranges per vehicle type
Supply-->>Gateway: Vehicle availability
Policy-->>Gateway: Service flags
Gateway-->>Rider: Launch screen payload
MatchMaking:
sequenceDiagram
Kafka->>Matchmaking: booking.created
Matchmaking->>Redis: drivers:cell + neighbors
Matchmaking->>Matchmaking: rank & filter
loop Candidates
Matchmaking->>Redis: SET lock:driver:{id} TTL
Matchmaking->>Redis: SET offer:{offer_id} TTL
end
Matchmaking->>Kafka: offers.created
Partition key: booking_id is hashed to choose one of the partitions.
Using booking_id as the partition key gives ordered events per booking.
Including a city_id helps to prevent hotspots.
For reconstruction, the Postgres outbox/snapshot table dispatch_attempts can be used to recover ordering.
Given the traffic, we need to benchmark whether pgpool would handle it, or treat Kafka as the durable store and
make database writes async.
Another option is to use only the append-only log, or an LSM-tree-backed database such as RocksDB, which handles
writes better at the cost of compaction. (A partition-key sketch follows.)
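A toy illustration of how the key choice maps events to partitions; this is not the real Kafka partitioner (clients use murmur2/CRC32), it only shows the ordering property:

```python
# Illustration only: real Kafka clients hash with murmur2/CRC32, not Python's hash().
def partition_for(key: str, num_partitions: int = 12) -> int:
    return hash(key) % num_partitions

# Same booking id -> same partition -> per-booking ordering.
assert partition_for("bkg_123") == partition_for("bkg_123")
# A composite key such as "{city_id}:{booking_id}" still orders per booking
# while making the city dimension explicit in the key space.
print(partition_for("blr:bkg_123"), partition_for("del:bkg_456"))
```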
Caches
Geo-spatial index using H3 cells, updated by the Driver Supply Service
Key: drivers:h3:{city_id}:{h3_cell}
Type: ZSET
Members: driver_id
Score: last_seen_timestamp_ms
TTL: None (members auto-prune based on score in queries)
Example:
ZADD drivers:h3:blr:89283082837ffff 1703001234000 drv_001
ZADD drivers:h3:blr:89283082837ffff 1703001235000 drv_002
Query:
ZRANGEBYSCORE drivers:h3:blr:89283082837ffff
(now_ms - 30000) now_ms # Only fresh entries
This can be reconstructed by replaying the Kafka events
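A minimal sketch of querying this index for fresh drivers around a pickup point, assuming the h3 v3 Python API (v4 renames geo_to_h3 to latlng_to_cell and k_ring to grid_disk) and redis-py:

```python
# Sketch only: key layout follows the text above; ring size and freshness window are tunable.
import time
import h3
import redis

r = redis.Redis(decode_responses=True)

def fresh_drivers(city_id: str, lat: float, lng: float,
                  res: int = 8, ring: int = 1, max_age_ms: int = 30_000) -> set:
    """Drivers seen in the last 30 s in the pickup cell and its ring-1 neighbours."""
    now_ms = int(time.time() * 1000)
    cells = h3.k_ring(h3.geo_to_h3(lat, lng, res), ring)   # pickup cell + neighbours
    drivers = set()
    for cell in cells:
        drivers.update(
            r.zrangebyscore(f"drivers:h3:{city_id}:{cell}", now_ms - max_age_ms, now_ms)
        )
    return drivers

# fresh_drivers("blr", 12.934523, 77.610234)
```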
Locking
Intermediate lock to prevent multiple ride assignments to a driver
Key: lock:driver:{city_id}:{driver_id}
Type: STRING
Value: booking_id (which booking locked this driver)
TTL: 15 seconds (offer acceptance window)
SET lock:driver:blr:drv_001 bkg_123 EX 15
Lock to prevent multiple Matchmaking consumers from processing the same booking.
Similar to the caching used for the Transactional Outbox.
SET lock:dispatch:{booking_id} {worker_id} NX EX 60
If lock acquired:
- Process dispatch
- Release lock after completion
Else:
- Skip (another worker handling it)
This can be reconstructed from the database and events
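A minimal sketch of that per-booking dispatch lock with redis-py; run_dispatch is a placeholder for the real matchmaking routine, and a production version would release the lock with a compare-and-delete Lua script rather than GET/DELETE:

```python
# Sketch only: run_dispatch is a stand-in; TTL and key names follow the text above.
import redis

r = redis.Redis(decode_responses=True)

def run_dispatch(booking_id: str) -> None:
    ...  # placeholder: rank candidates, create offers, publish offers.created

def try_dispatch(booking_id: str, worker_id: str, ttl_sec: int = 60) -> bool:
    """Ensure only one matchmaking worker processes a given booking."""
    lock_key = f"lock:dispatch:{booking_id}"
    if not r.set(lock_key, worker_id, nx=True, ex=ttl_sec):
        return False                      # another worker holds the lock -> skip
    try:
        run_dispatch(booking_id)
    finally:
        # Non-atomic check-then-delete: fine for a sketch, use a Lua script in production.
        if r.get(lock_key) == worker_id:
            r.delete(lock_key)
    return True
```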
Hotspot Cache
Key: hotspot:{city_id}:{vehicle_type}
Type: SORTED SET
Members: h3_cell_id
Score: demand_score (0.0 to 1.0)
TTL: 120 seconds (refreshed by stream processor)
Example:
ZADD hotspot:blr:AUTO 0.85 89283082837ffff 0.72 89283082838ffff
EXPIRE hotspot:blr:AUTO 120
# Top 10 hotspots
ZREVRANGE hotspot:blr:AUTO 0 9 WITHSCORES
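A minimal sketch of how the stream processor could refresh this ZSET each window; the booking counts and the normalisation to a 0–1 demand_score are illustrative assumptions:

```python
# Sketch only: scoring (count / peak) and the input aggregation are assumptions.
import redis

r = redis.Redis(decode_responses=True)

def refresh_hotspots(city_id: str, vehicle_type: str, bookings_per_cell: dict) -> None:
    """Rewrite the hotspot ZSET for one city/vehicle type from a windowed aggregate."""
    if not bookings_per_cell:
        return
    key = f"hotspot:{city_id}:{vehicle_type}"
    peak = max(bookings_per_cell.values(), default=1)
    pipe = r.pipeline()
    pipe.delete(key)
    pipe.zadd(key, {cell: count / peak for cell, count in bookings_per_cell.items()})
    pipe.expire(key, 120)   # matches the 120 s TTL above
    pipe.execute()

refresh_hotspots("blr", "AUTO", {"89283082837ffff": 17, "89283082838ffff": 12})
```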
sequenceDiagram
participant Kafka as Kafka<br/>(Event Log)
participant Worker as Matchmaking<br/>Worker
participant Postgres as Postgres<br/>(Snapshots)
participant Redis as Redis<br/>(Ephemeral)
Note over Kafka,Redis: Normal Flow - Events + Snapshots
Kafka->>Worker: booking.created<br/>(bkg_123, 10:00:00)
Worker->>Worker: state = {status: SEARCHING}
Worker->>Postgres: INSERT dispatch_attempts<br/>{booking_id: bkg_123,<br/>status: SEARCHING,<br/>attempt: 1,<br/>started_at: 10:00:00}
Worker->>Redis: SET dispatch:state:blr:bkg_123<br/>(TTL 300s)
Note over Worker: Snapshot #1 saved to Postgres
Kafka->>Worker: offers.created<br/>(5 drivers, 10:00:01)
Worker->>Worker: state = {status: OFFERS_SENT,<br/>offers_pending: 5}
Worker->>Postgres: UPDATE dispatch_attempts<br/>SET status=OFFERS_SENT,<br/>offers_pending=5,<br/>last_updated=10:00:01
Worker->>Redis: HSET dispatch:state:blr:bkg_123<br/>status OFFERS_SENT
Note over Worker: Snapshot #2 updated in Postgres
Kafka->>Worker: offer.declined<br/>(drv_001, 10:00:08)
Worker->>Worker: state.offers_pending--
Worker->>Postgres: UPDATE dispatch_attempts<br/>SET offers_pending=4,<br/>offers_declined=1,<br/>last_updated=10:00:08
Worker->>Redis: HSET offers_pending 4
Note over Worker: Snapshot #3 updated in Postgres
rect rgb(255, 200, 200)
Note over Worker: WORKER CRASHES at 10:00:09
end
Note over Kafka,Redis: Worker Recovery - Using Snapshots
Worker->>Worker: New worker starts (10:00:15)
Worker->>Postgres: SELECT * FROM dispatch_attempts<br/>WHERE booking_id='bkg_123'
Postgres-->>Worker: Latest snapshot:<br/>{status: OFFERS_SENT,<br/>offers_pending: 4,<br/>offers_declined: 1,<br/>last_updated: 10:00:08}
Note over Worker: State restored instantly<br/>from snapshot!
Worker->>Kafka: Fetch events AFTER 10:00:08<br/>(optional - for freshness)
Kafka-->>Worker: [offer.declined at 10:00:09,<br/>trip.created at 10:00:11]
Worker->>Worker: Apply recent events:<br/>state.offers_pending = 3<br/>state.status = ASSIGNED
Worker->>Redis: Rebuild cache:<br/>SET dispatch:state:blr:bkg_123
Note over Worker: Resume processing from<br/>current state
rect rgb(200, 255, 200)
Note over Worker: Recovery complete<br/>Total time: <1 second
end
Note over Kafka,Redis: Without Snapshots (Pure Event Sourcing)
Worker->>Kafka: Must replay ALL events<br/>from topic beginning
Kafka-->>Worker: Event 1: booking.created<br/>Event 2: offers.created<br/>Event 3-100: offer.declined...<br/>(replay 100s of events)
Note over Worker: ❌ Slow reconstruction<br/>Total time: 10-30 seconds
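The recovery path above, condensed into a small pure function: start from the latest dispatch_attempts snapshot and fold in only the events newer than it (the event shapes here are simplified stand-ins for the Kafka payloads):

```python
# Sketch only: event dicts are illustrative; a real worker would filter by offset/timestamp.
def recover_dispatch_state(snapshot: dict, events_after_snapshot: list) -> dict:
    """snapshot: latest dispatch_attempts row; events: Kafka events newer than it."""
    state = dict(snapshot)
    for event in events_after_snapshot:
        if event["type"] == "offer.declined":
            state["offers_pending"] -= 1
            state["offers_declined"] = state.get("offers_declined", 0) + 1
        elif event["type"] == "trip.created":
            state["status"] = "ASSIGNED"
    return state

# Matches the diagram: snapshot at 10:00:08, two newer events replayed on recovery.
state = recover_dispatch_state(
    {"status": "OFFERS_SENT", "offers_pending": 4, "offers_declined": 1},
    [{"type": "offer.declined"}, {"type": "trip.created"}],
)
assert state == {"status": "ASSIGNED", "offers_pending": 3, "offers_declined": 2}
```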
Trip Service
Database Tables:
trips (
trip_id UUID,
booking_id VARCHAR(64) NOT NULL UNIQUE,
-- Participants
rider_id NOT NULL,
driver_id NOT NULL,
-- Location
pickup_lat DECIMAL(10, 8) NOT NULL,
pickup_lng DECIMAL(11, 8) NOT NULL,
dropoff_lat <same>,
dropoff_lng <same>,
pickup_address TEXT,
dropoff_address TEXT,
vehicle_type ENUM/VARCHAR,
city_id VARCHAR NOT NULL,
status VARCHAR NOT NULL,
-- DRIVER_ASSIGNED, IN_PROGRESS, PAUSED, COMPLETED, CANCELLED, PAID
-- Timestamps
assigned_at TIMESTAMPTZ NOT NULL,
started_at TIMESTAMPTZ,
paused_at TIMESTAMPTZ,
resumed_at TIMESTAMPTZ,
ended_at TIMESTAMPTZ,
-- Trip metrics (filled on completion)
distance_km DECIMAL(6, 2),
duration_sec INT,
-- References
fare_id VARCHAR(64), -- FK to pricing service
payment_id VARCHAR(64), -- FK to payment service
-- Cancellation
cancelled_by VARCHAR(16), -- DRIVER, RIDER, SYSTEM
cancellation_reason TEXT,
cancellation_fee DECIMAL(10, 2),
-- Audit
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
-- Indexes
INDEX idx_booking_id (booking_id),
INDEX idx_rider_id (rider_id, created_at DESC),
INDEX idx_driver_id (driver_id, created_at DESC),
INDEX idx_status (status, created_at DESC),
INDEX idx_city_created (city_id, created_at DESC)
);
-- Composite index for active trips query
CREATE INDEX idx_active_trips ON trips(status, rider_id)
WHERE status IN ('DRIVER_ASSIGNED', 'IN_PROGRESS', 'PAUSED');
-- Partition by created_at (monthly) for scalability
CREATE TABLE trips_2025_12 PARTITION OF trips
FOR VALUES FROM ('2025-12-01') TO ('2026-01-01');
trip_state_transitions (
id BIGSERIAL PRIMARY KEY,
trip_id UUID NOT NULL REFERENCES trips(trip_id),
from_status VARCHAR(32),
to_status VARCHAR(32) NOT NULL,
changed_by VARCHAR(16) NOT NULL,
actor_id VARCHAR(64),
reason TEXT,
metadata JSONB,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
INDEX idx_trip_id (trip_id, created_at DESC)
);
-- Partition by created_at (monthly)
-- Retention: 90 days
trip_locations
trip_events (although present in Kafka, these could be written asynchronously to the database)
Key: trip:location:{trip_id}
Type: GEO set (Redis geospatial index, backed by a sorted set)
Members: timestamped location points
TTL: 1 hour after trip ends
Purpose: Fast location queries for ETA
Example:
GEOADD trip:location:trp_001 77.610234 12.934523 "loc_1703001234"
Key: lock:trip:{trip_id}
Type: STRING
Value: {operation}:{worker_id}
TTL: 30 seconds
Purpose: Prevent concurrent state transitions
caused by the driver and rider modifying the same trip record
Example:
SET lock:trip:trp_001 "start:worker-3" NX EX 30
States
SEARCHING -> ASSIGNED -> IN_PROGRESS -> COMPLETED
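A minimal guard for these transitions, extended with the PAUSED/CANCELLED/PAID states from the trips schema (the exact transition table beyond the happy path above is an assumption):

```python
# Sketch only: the full allowed-transition table is an assumption beyond the happy path.
ALLOWED = {
    "SEARCHING":   {"ASSIGNED", "CANCELLED"},
    "ASSIGNED":    {"IN_PROGRESS", "CANCELLED"},
    "IN_PROGRESS": {"PAUSED", "COMPLETED", "CANCELLED"},
    "PAUSED":      {"IN_PROGRESS", "COMPLETED", "CANCELLED"},
    "COMPLETED":   {"PAID"},
}

def transition(current: str, target: str) -> str:
    """Raise on illegal transitions so trip state can only move along allowed edges."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {target}")
    return target

transition("SEARCHING", "ASSIGNED")          # ok
# transition("COMPLETED", "IN_PROGRESS")     # would raise ValueError
```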
Events
trip.created
Consumed by:
Matchmaking Service (cancel other offers)
Notification Service (notify rider "Driver assigned")
Analytics Service
trip.started
Consumers:
Push Notification/SMS Service (notify rider "Trip started")
Analytics Service (start tracking metrics)
Driver Supply Service (mark driver as on_trip)
trip.location_updated
Consumers:
Rider App (via a WebSocket server that subscribes to this topic, or HTTP polling)
Analytics Service (tracking, heatmaps)
Trip Service itself (async write to trip_locations table)
trip.completed
Consumers:
Payment Service (initiate payment)
Notification Service (send receipt, rating prompt)
Driver Supply Service (mark driver available)
Analytics Service
trip.cancelled:
Consumers:
Payment Service (charge cancellation fee if applicable)
GET /trips/active
Headers:
X-User-ID: usr_456
Response: 200 OK
{
"trip": {
"trip_id": "trp_001",
"status": "IN_PROGRESS",
...
}
}
Response: 404 Not Found (if no active trip)
{
"error": "NO_ACTIVE_TRIP"
}
Accept Offer & Create Trip
Driver clicks "Accept" in app
|
POST /trips/accept {offer_id}
|
Trip Service:
1. Validate offer exists in Redis
2. Check offer not expired
3. Atomic Redis operation (Lua):
- Check driver not locked
- Check offer status = PENDING
- Set offer = ACCEPTED
- Lock driver
4. Insert trip record in Postgres (idempotent)
5. Write to Redis caches (active trip bindings)
6. Publish trip.created event to Kafka
7. Return 201 with trip details
↓
Matchmaking Service consumes trip.created:
- Cancel other pending offers
- Release other driver locks
|
Notification Service:
- Send push to rider "Driver assigned"
- Send SMS with driver details
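A sketch of step 3 (the atomic Redis operation) as a Lua script via redis-py; the offer is assumed to be stored as a hash with a status field, and the key names follow the lock/offer layout above:

```python
# Sketch only: offer-as-hash and the post-acceptance lock TTL are assumptions.
import redis

r = redis.Redis(decode_responses=True)

ACCEPT_OFFER = r.register_script("""
-- KEYS[1] = offer:{offer_id}    KEYS[2] = lock:driver:{city_id}:{driver_id}
-- ARGV[1] = booking_id          ARGV[2] = lock TTL after acceptance (sec)
local owner = redis.call('GET', KEYS[2])
if owner and owner ~= ARGV[1] then return 0 end               -- locked by another booking
if redis.call('HGET', KEYS[1], 'status') ~= 'PENDING' then return 0 end
redis.call('HSET', KEYS[1], 'status', 'ACCEPTED')
redis.call('SET', KEYS[2], ARGV[1], 'EX', ARGV[2])            -- keep driver locked for the trip
return 1
""")

def accept_offer(offer_id: str, city_id: str, driver_id: str, booking_id: str) -> bool:
    ok = ACCEPT_OFFER(
        keys=[f"offer:{offer_id}", f"lock:driver:{city_id}:{driver_id}"],
        args=[booking_id, 900],
    )
    return ok == 1
```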
Request: GET /trips/{trip_id}
Layer 1: Redis (Hot Cache)
├─ Check: trip:active:{city}:{trip_id}
├─ Hit: Return immediately (5ms)
└─ Miss: Go to Layer 2
Layer 2: Postgres (Source of Truth)
├─ Query: SELECT * FROM trips WHERE trip_id = ?
├─ Hit: Return + Write to Redis (50ms)
└─ Miss: 404 Not Found
Cache Invalidation:
├─ On state change: Update Redis + Postgres
├─ On trip end: TTL expires in 1 hour
└─ On trip cancel: Delete from Redis
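A minimal read-through sketch of the two layers above; the cache key format and 1-hour TTL follow the text, while the row_to_json projection and the pg_cur psycopg2 cursor are illustrative:

```python
# Sketch only: Redis-first read with Postgres as source of truth on a miss.
import json
import redis

r = redis.Redis(decode_responses=True)

def get_trip(pg_cur, city_id: str, trip_id: str):
    key = f"trip:active:{city_id}:{trip_id}"
    cached = r.get(key)
    if cached:
        return json.loads(cached)                       # Layer 1 hit (~5 ms)
    pg_cur.execute("SELECT row_to_json(t) FROM trips t WHERE trip_id = %s", (trip_id,))
    row = pg_cur.fetchone()
    if row is None:
        return None                                     # 404 / NO_ACTIVE_TRIP
    trip = row[0]
    r.set(key, json.dumps(trip), ex=3600)               # repopulate cache, 1 h TTL
    return trip                                         # Layer 2 hit (~50 ms)
```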
State Change (e.g., trip started):
1. Acquire lock
2. Write to Postgres (source of truth)
3. Write to Redis (invalidate + update)
4. Publish event to Kafka
5. Release lock
6. Return to client
If Redis write fails:
- Log error
- Continue (Postgres is source of truth)
- Next read will cache miss -> rebuild from Postgres
Prioritising good customers and drivers
The platform should be able to control how it rewards good customers and drivers.
There are multiple ways to control this:
Tiered queues with differing consumption rates
Better feature controls to improve the search
Here I am using the second option; it is easier and probably cheaper if tuned.
Feature Controls:
Search Radius deltas
More Results in Candidates
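A minimal sketch of these per-tier feature deltas; tier names and values are assumptions and would realistically come from the Experiment Service rather than being hard-coded:

```python
# Sketch only: tiers, deltas, and defaults are assumptions (config-driven in practice).
TIER_FEATURES = {
    "standard": {"radius_delta_km": 0, "extra_candidates": 0},
    "gold":     {"radius_delta_km": 2, "extra_candidates": 5},
    "platinum": {"radius_delta_km": 4, "extra_candidates": 10},
}

def search_params(rider_tier: str, base_radius_km: int = 5, base_candidates: int = 10):
    """Return (search radius, candidate count) widened by the rider's tier."""
    f = TIER_FEATURES.get(rider_tier, TIER_FEATURES["standard"])
    return base_radius_km + f["radius_delta_km"], base_candidates + f["extra_candidates"]

print(search_params("gold"))   # (7, 15)
```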
Matchmaking must never scan the world.
It must only touch pre-shaped memory that already reflects reality.
Driver Supply maintains:
Spatial locality
Temporal freshness
Driver eligibility
Pre-aggregation
Shard alignment
Matchmaking consumes only the already curated view + some real time data
Partitioning for location data
Region -> City -> Vehicle Type -> H3 Cell -> Drivers
| Level | Why |
| --- | --- |
| Region | Legal + latency + isolation |
| City | Surge, policy, ops ownership |
| Vehicle | Matching correctness |
| H3 cell | Spatial pruning |
| Driver | Unit of supply |
GEO strategy
Using H3 because it's easier and cheaper, with a small accuracy trade-off.
Two resolutions, minimum:
| Purpose | H3 Resolution |
| --- | --- |
| Matching | r8 (~460 m) |
| Pre-aggregation | r7 (~1.2 km) |
r8 is small enough to avoid false positives
r7 is stable enough to compute trends and load
There are more granular ways to calculate precise locations,
to determine things like which side of the road a pickup is on, or finer positions at airports.
OpenStreetMap XML can be converted to nodes and edges, and we can triangulate based on
location data and OSM node values.
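A minimal sketch of indexing the same point at both resolutions with the h3 library (v3 API names; v4 renames geo_to_h3 to latlng_to_cell and k_ring to grid_disk):

```python
# Sketch only: a point in Bengaluru indexed at the matching and aggregation resolutions.
import h3

lat, lng = 12.934523, 77.610234
match_cell = h3.geo_to_h3(lat, lng, 8)        # r8, ~460 m: candidate lookup
agg_cell   = h3.geo_to_h3(lat, lng, 7)        # r7, ~1.2 km: demand/hotspot trends
neighbours = h3.k_ring(match_cell, 1)         # pickup cell + its 6 ring-1 neighbours
print(match_cell, agg_cell, len(neighbours))  # len == 7
```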
Multi-region: region-local writes, failover handling, no cross-region sync on hot path
Compliance: PCI, PII encryption, GDPR/DPDP
Assumptions:
Most rides will be within a 60-80km radius.
Instead of OTP, we use static PINs
Drivers are charged platform fees per ride
Fleet management is not considered, how a driver gets a cab is out of scope.
Discount codes are not considered
Chat can be considered (but real-time is out of scope)
Assign drivers within <1s p95, wut!! (from the scaling requirement) assuming this is the time to dispatch the "Book" request from the user to nearby drivers.
We can't really have a driver assigned within <1s unless they are self-driving cars, in which case maybe 3s.
The search radius used to broadcast the ride booking can also be extended.
Assuming a 1-minute TTL for a ride request, the broadcast radius expands by 5 km every 5/15/30 s.
Build the workflows, think about points of entry, from the UI
Identify entities (stateful nouns)
Data modelling based on Access patterns
Consistency Evaluation and Concurrency support
Scale and Durability
Tech decisions
Observability
All users of the platform have to be logged in from their devices.
All endpoints are authenticated, requiring some form of token.
Riders:
Users can search for destination locations (within the max radius, and intercity/interstate travel policies and boundaries)
User sees a list of ride types, with estimated prices or price-ranges
Ride types can be sorted by most available, cheapest, or user preference (this can be controlled per country/culture)
Pricing can vary based on surge (supply demand, peak hours in specific regions)
Ride requests are sent to drivers in the area nearby
Ride request gets accepted or times-out or user cancels it.
User can look at ETA of the ride to the location
User can also look at ETA to destination while in transit
Users are charged when the ride ends. All bookings are blocked if payment is pending from a previous trip.
Users will have payment options similar to an order payment flow: cards, UPI, and other PSP integrations with callback mechanisms and retries
User can share their trip location updates with Friends/Point of Contacts.
Drivers:
Drivers can go on/off duty
Drivers get booking notifications containing (to-from-estimated fare)
Drivers can look at hotspots within X km radius
Drivers can see ride history and amount received - platform fees
Drivers can be banned based on policies and behaviours.
Others:
Notifications for logins
Send notifications to users on ride booking (Booked, Retry, Ended)
Send notifications for payment completions, payment status
Send notifications to Point of Contacts, periodically sending user location and trip details.
Send notifications to drivers for ride bookings, nearby hotspots, and POIs active at certain times (movie theatres, stations, etc.)
Send notifications to users based on travel history.
Alert drivers or users to confirm safety based on certain factors (booking in progress too long, route and destination not aligning, other patterns drivers use to game the system)
Fair scoring policies, based on identified factors, for drivers.
32G RAM, 1Gbps NIC, 100B messages, assuming a 20% drop from 130K msg/s = 50K msg/s per partition across 6 partitions
Ride requests: 1K RPS; assuming 3 events per booking, that is 3K msgs/sec; with 12 partitions, 250 msgs/sec per partition
Kafka tail latency is around 200 ms at p95 (on i3en.large: 16 GB RAM, 25 Gbps, 2 vCPU), given leader re-election or broker crashes on high-throughput systems.
API requests Round Trip - 100ms
2 DB Writes - 200ms
Since the application is global and requires no cross-region syncs on the hot path, the per-region
numbers will be lower because of region-specific deployments.
Matching Engine:
Caching Opts:
Per driver location update (worst case)
Typical pipeline:
HGET driver:state:{id} last_cell
ZREM driver:geo:{old_cell} (only if cell changed)
ZADD driver:geo:{new_cell}
HSET driver:state:{id}
(Optional) EXPIRE driver:state:{id}
500,000 pings/sec × 4 ops = 2M Ops (Globally)
When split across regions this number comes down, depending on partitioning.
Assuming 10 major regions and 10 hotspots: ~200K ops/sec per region
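A minimal sketch of that per-ping pipeline with redis-py, using the driver:state / driver:geo key names above; the exact fields stored in the state hash and the 300 s expiry are assumptions:

```python
# Sketch only: HSET fields and the optional expiry are assumptions.
import time
import redis

r = redis.Redis(decode_responses=True)

def update_driver_location(driver_id: str, new_cell: str, lat: float, lng: float) -> None:
    """One location ping: move the driver between cells and refresh their state."""
    old_cell = r.hget(f"driver:state:{driver_id}", "last_cell")        # op 1
    now_ms = int(time.time() * 1000)
    pipe = r.pipeline()
    if old_cell and old_cell != new_cell:
        pipe.zrem(f"driver:geo:{old_cell}", driver_id)                 # op 2 (cell changed)
    pipe.zadd(f"driver:geo:{new_cell}", {driver_id: now_ms})           # op 3
    pipe.hset(f"driver:state:{driver_id}",
              mapping={"last_cell": new_cell, "lat": lat, "lng": lng}) # op 4
    pipe.expire(f"driver:state:{driver_id}", 300)                      # op 5 (optional)
    pipe.execute()
```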
Quoted from the CockroachDB 2.0 benchmark write-up:
Next, we examined an apples-to-apples comparison of latency under the same 850-warehouse workload between CockroachDB 2.0 and CockroachDB 1.1 with a three-node (16 vCPUs/node) setup:
| Metric | CRDB 1.1 (ms) | CRDB 2.0 (ms) | % Improvement |
| --- | --- | --- | --- |
| Average Latency (p50) | 201 | 67 | 67% |
| 95% Latency (p95) | 671 | 151 | 77% |
| 99% Latency (p99) | 1,140 | 210 | 82% |
These results were achieved at the same isolation level (i.e., serializable), number of nodes (i.e. 3), number of warehouses (i.e., 850).
For the same load, CockroachDB 2.0 reduces latency by as much as 82% when compared to CockroachDB 1.1. Viewed another way, CockroachDB 2.0 improves response time by 544% (CockroachDB 1.1 p99/CockroachDB 2.0 p99) when compared to 1.1.
Isolation Levels
Most databases present a choice of several transaction isolation levels, offering a tradeoff between correctness and performance. CockroachDB provides strong (“SERIALIZABLE”) isolation by default to ensure that your application always sees the data it expects.
With years of improvement in databases, if SERIALIZABLE is the default: thank you for your service, RoachDB.