- Alphabet: 0-9a-z (36 characters)
- Length: 15 characters
- Bits per character: log₂(36) ≈ 5.17 bits
- Total entropy: 77.5 bits
- Total possible IDs: 221,073,919,720,733,357,899,776 (~2.21 × 10²³)
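The figures above follow directly from the alphabet size and the ID length. A minimal sketch in plain JavaScript that reproduces them (the helper name `idEntropy` is illustrative and not part of Nano ID):

```javascript
// Entropy of a uniformly random ID: length × log2(alphabet size).
// `idEntropy` is a hypothetical helper, not a Nano ID API.
function idEntropy(alphabetSize, length) {
  const bitsPerChar = Math.log2(alphabetSize)
  return {
    bitsPerChar,
    totalBits: bitsPerChar * length,
    // BigInt keeps the count exact; 36^15 is far beyond Number.MAX_SAFE_INTEGER.
    totalIds: BigInt(alphabetSize) ** BigInt(length),
  }
}

const config = idEntropy(36, 15)
console.log(config.bitsPerChar.toFixed(2)) // "5.17"
console.log(config.totalBits.toFixed(1))   // "77.5"
console.log(config.totalIds.toString())    // "221073919720733357899776"
```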
The key metric is the total number of IDs that will ever exist over your system's lifetime. Probabilities below use the birthday approximation P ≈ n² / (2 × 36¹⁵):
| Total IDs (lifetime) | Collision Probability | Risk Assessment |
|---|---|---|
| < 10M | < 2.3 × 10⁻¹⁰ | Safe - collision extremely unlikely |
| 10M - 100M | 2.3 × 10⁻¹⁰ - 2.3 × 10⁻⁸ | Acceptable - at most about 1 in 44 million |
| 100M - 500M | 2.3 × 10⁻⁸ - 5.7 × 10⁻⁷ | Moderate - at most about 1 in 1.8 million |
| 500M - 1B | 5.7 × 10⁻⁷ - 2.3 × 10⁻⁶ | Elevated - crosses the 10⁻⁶ threshold near 665M |
| 1B - 10B | 2.3 × 10⁻⁶ - 2.3 × 10⁻⁴ | High risk - up to about 1 in 4,400 |
| > 10B | > 2.3 × 10⁻⁴ | Unacceptable - exceeds the 0.01% tolerance |
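Collision probabilities for any lifetime total can be recomputed rather than read from a table. The sketch below assumes IDs are drawn uniformly and independently from the 36¹⁵ space and uses the standard birthday approximation; `collisionProbability` is an illustrative name, not a library function.

```javascript
// Birthday approximation: P(at least one collision) ≈ 1 - exp(-n² / (2N)),
// where N is the size of the ID space. Assumes uniform, independent IDs.
const ID_SPACE = 36 ** 15 // ≈ 2.21e23; double precision is enough for estimates

function collisionProbability(totalIds, idSpace = ID_SPACE) {
  const exponent = (totalIds * totalIds) / (2 * idSpace)
  // -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
  return -Math.expm1(-exponent)
}

// Print the probability at each boundary used in the table.
for (const n of [1e7, 1e8, 5e8, 1e9, 1e10]) {
  console.log(n.toExponential(0), collisionProbability(n).toExponential(2))
}
```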
"Moderate-scale apps" = systems generating 10M-100M total IDs over their lifetime.
Examples:
- SaaS with 100K users × 100 records each = 10M IDs ✅
- E-commerce with 10M orders/year × 10 years = 100M IDs ⚠️ (borderline)
- Social media with 100M posts/day = 100M in 1 day ❌
Does your system have collision detection/handling?
With uniqueness constraints (DB unique index, etc.):
- Risk tolerance: < 1% collision probability is acceptable
- Your config (77.5 bits): Safe up to ~600M IDs
Without collision handling (blind inserts):
- Risk tolerance: < 0.01% collision probability required
- Your config: Safe up to ~50M IDs
Why this matters: With DB constraints, collisions cause retries (performance hit). Without constraints, collisions cause data corruption.
Generation rate determines how quickly the lifetime total is reached (the collision probability itself depends on the total count, not on the rate):
| Scenario | IDs/day | Time to 100M | Risk Assessment |
|---|---|---|---|
| Small app | 1,000 | 274 years | Safe ✅ |
| Growing startup | 100,000 | 2.7 years | Monitor ⚠️ |
| High-volume API | 1,000,000 | 100 days | Risky ❌ |
| Distributed system | 10M+ | 10 days or less | Dangerous ❌ |
Defensible criterion: If you'll reach 100M IDs within 5 years, reconsider this config.
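The five-year criterion is simple arithmetic on a projected generation rate. A small sketch, assuming a constant rate (real traffic usually grows, so treat the result as an upper bound on the time available); `daysToReach` is a hypothetical helper:

```javascript
// Days until a cumulative ID count is reached at a constant generation rate.
function daysToReach(targetIds, idsPerDay) {
  return targetIds / idsPerDay
}

const TARGET = 100e6 // the 100M review threshold from the criterion above
for (const rate of [1_000, 100_000, 1_000_000, 10_000_000]) {
  const days = daysToReach(TARGET, rate)
  const reconsider = days <= 5 * 365 // reaches 100M within 5 years
  console.log(`${rate}/day: ${(days / 365).toFixed(1)} years, reconsider config: ${reconsider}`)
}
```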
Multiple servers generating IDs simultaneously call for extra safety margin:
Centralized generation:
- Birthday problem applies normally
- 77.5 bits safe for ~50M IDs @ <0.01% collision
- Single point of coordination
Distributed generation (N nodes):
- Problem: Each node creates independent ID pools that can collide with each other
- Amplification: With independent, well-seeded generators the birthday bound depends only on the total ID count, so the extra risk comes from correlated generators (cloned VM images, shared seeds, weak boot-time entropy)
- Conservative adjustment: Reduce safe capacity by factor of √N, OR add log₂(N) bits
Defensible criteria by node count:
| Nodes | Safe ID Count (0.01% risk) | Recommended Length | Total Bits |
|---|---|---|---|
| 1 (centralized) | 50M | 15 chars | 77.5 |
| 2-5 nodes | 25M | 16 chars | 82.7 |
| 5-10 nodes | 15M | 16-17 chars | 82.7-87.9 |
| 10-50 nodes | 10M | 17-18 chars | 87.9-93.1 |
| 50+ nodes | < 5M | 19+ chars | 98.2+ |
Additional distributed concerns:
- Uneven load distribution (hotspot nodes)
- Synchronized bursts (batch jobs)
- Clock skew in time-based generation
- No coordination between nodes
Formula for distributed systems:
bits_needed = 77.5 + log₂(N)
For 10 nodes: 77.5 + 3.3 ≈ 81 bits → 16 characters
For 100 nodes: 77.5 + 6.6 ≈ 84 bits → 17 characters
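The rule of thumb above can be turned into a length recommendation by rounding the extra bits up to whole characters. This is a sketch of the document's heuristic, not a Nano ID feature; `distributedRequirement` and its parameters are illustrative names.

```javascript
// Heuristic from this analysis: add log2(N) bits for N uncoordinated nodes,
// then round the extra bits up to whole characters of the 36-symbol alphabet.
const BITS_PER_CHAR = Math.log2(36) // ≈ 5.17

function distributedRequirement(nodes, baseLength = 15) {
  const extraBits = Math.log2(nodes)
  return {
    bitsNeeded: baseLength * BITS_PER_CHAR + extraBits,
    chars: baseLength + Math.ceil(extraBits / BITS_PER_CHAR),
  }
}

console.log(distributedRequirement(10))  // bitsNeeded ≈ 80.9, chars: 16
console.log(distributedRequirement(100)) // bitsNeeded ≈ 84.2, chars: 17
```

Rounding up means any node count from 2 to 36 maps to one extra character, and 37 to 1,296 nodes map to two.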
What happens if collision occurs?
Low consequence (collision acceptable):
- Temporary cart IDs (retry on conflict)
- Analytics event IDs (duplication tolerable)
- Cache keys (overwrite acceptable)
- Risk tolerance: < 1%
Medium consequence (collision causes errors):
- Order IDs (customer confusion)
- Invoice numbers (accounting issues)
- URL slugs (SEO/user experience)
- Risk tolerance: < 0.01%
High consequence (collision causes corruption):
- Financial transaction IDs
- Medical record identifiers
- Authentication tokens
- Risk tolerance: < 0.0001% (use 128+ bits)
NIST/FIPS standards:
- Security-sensitive: minimum 128 bits entropy
- Your 77.5 bits: Not compliant for cryptographic use
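To see how far the current length is from a given entropy target, the required length can be computed from the alphabet size. A minimal sketch; `lengthForBits` is a hypothetical helper, and the targets passed in are the 128-bit and 80-bit figures quoted in this section.

```javascript
// Smallest ID length whose entropy meets a target, for a given alphabet size.
function lengthForBits(targetBits, alphabetSize) {
  return Math.ceil(targetBits / Math.log2(alphabetSize))
}

console.log(lengthForBits(128, 36)) // 25
console.log(lengthForBits(80, 36))  // 16
console.log(lengthForBits(128, 64)) // 22
```

With the 36-character alphabet, 25 characters clear 128 bits and 16 characters clear 80 bits. The default 21-character Nano ID over a 64-symbol alphabet carries 126 bits, so a strict 128-bit requirement needs 22 characters.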
PCI-DSS, HIPAA, SOC2:
- Unpredictable identifiers required
- Must use secure variant (not non-secure)
- Minimum 80 bits recommended (you're at 77.5 ⚠️)
Industry standard: < 10⁻⁶ collision probability
For your config (36¹⁵ possibilities):
Probability per ID pair: p = 1 / 36¹⁵ ≈ 4.5 × 10⁻²⁴
Birthday problem formula:
P(collision) ≈ n² / (2 × 36¹⁵)
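Safe ID count where P(any collision) < 10⁻⁶:
n ≈ √(2 × 36¹⁵ × 10⁻⁶) ≈ 665 million IDs
Defensible thresholds:
- < 10⁻⁶ risk: Stay under ~665M IDs
- < 10⁻⁵ risk: Stay under ~2.1B IDs
- < 10⁻⁴ risk: Stay under ~6.6B IDs
The 20M-50M working limits used in the rest of this analysis sit well inside these bounds, which leaves margin for imperfect randomness and unplanned growth.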
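Evaluating the birthday formula numerically is a useful cross-check on hand-computed thresholds. The sketch below inverts P ≈ n² / (2N) to solve for n, assuming uniform and independent IDs; `maxIdsForRisk` is an illustrative name.

```javascript
// Invert the birthday approximation P ≈ n² / (2N) to get n ≈ sqrt(2 · N · P).
const ID_SPACE = 36 ** 15 // ≈ 2.21e23

function maxIdsForRisk(probability, idSpace = ID_SPACE) {
  return Math.sqrt(2 * idSpace * probability)
}

// Print the ID budget for each target collision probability.
for (const p of [1e-6, 1e-5, 1e-4]) {
  console.log(p, Math.round(maxIdsForRisk(p)).toLocaleString('en-US'))
}
```

For the 36¹⁵ space this gives roughly 665 million IDs at 10⁻⁶, 2.1 billion at 10⁻⁵, and 6.6 billion at 10⁻⁴.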
```
IF is_security_sensitive OR no_unique_constraint:
    ❌ NOT RECOMMENDED
ELIF centralized AND total_ids < 20M:
    ✅ SAFE
ELIF centralized AND total_ids < 50M:
    ⚠️ ACCEPTABLE (monitor collision rate)
ELIF distributed AND nodes < 10 AND total_ids < 15M:
    ⚠️ ACCEPTABLE (consider adding characters)
ELIF distributed AND nodes >= 10:
    ❌ INCREASE TO 17+ CHARACTERS
ELIF total_ids < 500M AND low_consequence:
    ⚠️ RISKY (plan migration path)
ELIF total_ids > 1B:
    ❌ UNACCEPTABLE
ELSE:
    ⚠️ EVALUATE (consider cost of collision vs. migration)
```
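The decision tree can also be expressed as a small function for use in a design review checklist or a CI check. This is a sketch only: `assessConfig` and its option names are hypothetical, and it evaluates the security and unique-constraint condition first so that a small security-sensitive system is never reported as safe.

```javascript
// Sketch of the decision tree; all option names are illustrative.
function assessConfig({
  distributed = false,
  nodes = 1,
  totalIds,
  hasUniqueConstraint,
  securitySensitive = false,
  lowConsequence = false,
}) {
  // Checked first so a volume-based branch cannot mask it.
  if (securitySensitive || !hasUniqueConstraint) return 'NOT RECOMMENDED'
  if (!distributed && totalIds < 20e6) return 'SAFE'
  if (!distributed && totalIds < 50e6) return 'ACCEPTABLE (monitor collision rate)'
  if (distributed && nodes < 10 && totalIds < 15e6) return 'ACCEPTABLE (consider adding characters)'
  if (distributed && nodes >= 10) return 'INCREASE TO 17+ CHARACTERS'
  if (totalIds < 500e6 && lowConsequence) return 'RISKY (plan migration path)'
  if (totalIds > 1e9) return 'UNACCEPTABLE'
  return 'EVALUATE (consider cost of collision vs. migration)'
}

console.log(assessConfig({ totalIds: 5e6, hasUniqueConstraint: true })) // SAFE
console.log(assessConfig({ distributed: true, nodes: 20, totalIds: 1e6, hasUniqueConstraint: true })) // INCREASE TO 17+ CHARACTERS
```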
Centralized, low-volume:
- Startup SaaS with 10K users (10M records max over lifetime)
- Internal tool with 50K entities/year
- Blog with 1M posts over 10 years
- E-commerce with < 5M orders lifetime
- Mobile app with offline-first sync (single user device)
Centralized, growing volume:
- Growing platform: 1M users → 100M records
- API serving 10K requests/sec (864M/day - passes 100M IDs in under 3 hours)
- Multi-year project with uncertain growth trajectory
Distributed, low-volume:
- 3-5 microservices generating < 20M total IDs
- Small distributed system with < 10M IDs
High volume:
- Twitter-scale (500M tweets/day)
- Distributed logging system (billions of events)
- Any system expecting > 100M IDs
Distributed systems:
- 10+ nodes without coordination
- Cloud auto-scaling (unknown node count)
- Multi-region deployments
Compliance/security:
- Payment processor (PCI-DSS)
- Session tokens (security requirement: 128+ bits)
- Any HIPAA/PCI regulated identifier
- API keys or authentication tokens
No collision handling:
- Blind inserts without unique constraints
- Append-only systems without validation
- Legacy systems that can't handle retry logic
Maximum safe ID count (for <0.01% collision risk, p = 10⁻⁴):
max_ids ≈ √(2 × p × alphabet_size ^ length)
≈ √(2 × 10⁻⁴ × 36¹⁵)
≈ 6.6 billion IDs
This analysis uses a far more conservative centralized working limit of ~50M IDs.
Conservative adjustment (for N nodes), applied to that working limit:
max_ids_distributed ≈ max_ids_centralized / √N
For 10 nodes: 50M / √10 ≈ 15.8M, rounded down to 15M IDs
For 100 nodes: 50M / √100 = 5M IDs
Or increase length to compensate:
chars_needed = 15 + (log₂(N) / 5.17), rounded up
For 10 nodes: 15 + (3.3 / 5.17) ≈ 15.6 → 16 characters
For 100 nodes: 15 + (6.6 / 5.17) ≈ 16.3 → 17 characters
Your config (15 chars, 36-char alphabet) is defensibly safe if ALL of:
- Generation is centralized (single server/process), OR distributed across < 10 nodes with total IDs < 15M (with 10+ nodes, move to 17+ chars instead)
- DB unique constraints exist (retry on collision)
- Total lifetime IDs < 50M (centralized) OR < 15M (distributed)
- Not security-sensitive (use secure variant if borderline)
- Low-medium consequence of collision (not financial/medical)
- No compliance requirements (PCI/HIPAA/SOC2)
Needs stronger config if ANY of:
- Growth trajectory unclear or aggressive
- Will scale to distributed system
- No unique constraints / can't handle collisions
- Security-sensitive identifiers
- Cost of collision > cost of longer IDs
- Regulatory compliance required
For centralized systems:

```javascript
import { customAlphabet } from 'nanoid'

// 82.7 bits - safe for 100M IDs
customAlphabet('0123456789abcdefghijklmnopqrstuvwxyz', 16)
// 93 bits - safe for 500M IDs
customAlphabet('0123456789abcdefghijklmnopqrstuvwxyz', 18)
// 103 bits - safe for 5B IDs
customAlphabet('0123456789abcdefghijklmnopqrstuvwxyz', 20)
// 129 bits - UUID-level safety
customAlphabet('0123456789abcdefghijklmnopqrstuvwxyz', 25)
```

For distributed systems (10+ nodes):

```javascript
// 87.9 bits - safe for 15M IDs across 10 nodes
customAlphabet('0123456789abcdefghijklmnopqrstuvwxyz', 17)
// 93 bits - safe for 50M IDs across 10 nodes
customAlphabet('0123456789abcdefghijklmnopqrstuvwxyz', 18)
// 98.2 bits - safe for 150M IDs across 50 nodes
customAlphabet('0123456789abcdefghijklmnopqrstuvwxyz', 19)
```

```javascript
// 89.3 bits - safe for 1B IDs centralized, 100M distributed (10 nodes)
customAlphabet('0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz', 15)
```

```javascript
// 126 bits - industry standard, handles billions of IDs
import { nanoid } from 'nanoid'
nanoid() // 21 chars, URL-safe alphabet (A-Za-z0-9_-)
```

```javascript
// Add 3 chars for ~15 bits safety margin
customAlphabet('0123456789abcdefghijklmnopqrstuvwxyz', 18) // 93 bits
// Handles growth: 50M → 500M IDs without migration
```

```javascript
// With DB unique constraint: retry with a fresh ID on collision
const MAX_ATTEMPTS = 3
for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
  try {
    await db.insert({ id: nanoid(), /* ... */ })
    break
  } catch (error) {
    // 'UNIQUE_VIOLATION' is a placeholder; use your driver's code (e.g. '23505' in PostgreSQL)
    if (error.code !== 'UNIQUE_VIOLATION' || attempt === MAX_ATTEMPTS) throw error
    // Log collision event for monitoring
    logger.warn('ID collision detected', { attempts: attempt })
  }
}
```

```javascript
// Track collision frequency
const collisionRate = collisions / totalGenerated
// Alert thresholds
if (collisionRate > 0.0001) {
  alert('Collision rate exceeds 0.01% - consider longer IDs')
}
```

If approaching limits:
- Add new ID column with longer config
- Dual-write to both columns temporarily
- Backfill old records asynchronously
- Switch reads to new column
- Drop old column after validation
| Architecture | Status | Recommendation |
|---|---|---|
| Centralized, < 50M IDs | ✅ Current config safe | Monitor growth |
| Centralized, 50-500M IDs | ⚠️ Borderline | Consider 18+ chars |
| Distributed < 10 nodes, < 15M IDs | ⚠️ Borderline | Consider 16+ chars |
| Distributed 10+ nodes | ❌ Insufficient | Use 17-19+ chars |
| Security-sensitive | ❌ Insufficient | Use 21+ chars (126+ bits) |
| Billions of IDs | ❌ Insufficient | Use default nanoid (21 chars) |
Most important factors:
- Total lifetime ID count (not just current)
- Distributed vs. centralized generation
- Unique constraints (collision handling)
- Consequence severity (retry cost vs. data corruption)
- Growth trajectory (can you migrate later if needed?)
Analysis Date: 2025-12-09
Configuration: customAlphabet('0123456789abcdefghijklmnopqrstuvwxyz', 15)
Verdict: Safe for centralized systems < 50M IDs with DB constraints. Add 1-2 characters per 10x node increase in distributed systems.