Collision Risk Analysis: `customAlphabet('0123456789abcdefghijklmnopqrstuvwxyz', 15)`

Configuration Summary

Alphabet: 0-9a-z (36 characters)
Length: 15 characters
Bits per character: log₂(36) ≈ 5.17 bits
Total entropy: 15 × log₂(36) ≈ 77.5 bits
Total possible IDs: 221,073,919,720,733,357,899,776 (~2.21 × 10²³)
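
As a quick check on the figures above, the following sketch recomputes the ID space and entropy in plain JavaScript. It assumes only the alphabet and length from the configuration; the variable names are illustrative and not part of Nano ID.

```javascript
// Size of the ID space and its entropy for a given alphabet and length.
const alphabet = '0123456789abcdefghijklmnopqrstuvwxyz'
const length = 15

// BigInt keeps the count exact; a Number loses integer precision above 2^53.
const totalIds = BigInt(alphabet.length) ** BigInt(length)

// Entropy in bits = length × log2(alphabet size).
const bitsPerChar = Math.log2(alphabet.length)
const totalBits = length * bitsPerChar

console.log(totalIds.toString())     // 221073919720733357899776
console.log(bitsPerChar.toFixed(2))  // 5.17
console.log(totalBits.toFixed(1))    // 77.5
```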

Defensible Risk Assessment Framework

1. Volume-Based Thresholds

The key metric is total IDs that will ever exist in your system's lifetime:

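| Total IDs (lifetime) | Probability of at least one collision | Risk Assessment |
|---|---|---|
| < 10M | < 2.3 × 10⁻¹⁰ | Safe - collision extremely unlikely |
| 10M - 100M | 2.3 × 10⁻¹⁰ to 2.3 × 10⁻⁸ | Safe - far below 1 in a million |
| 100M - 500M | 2.3 × 10⁻⁸ to 5.7 × 10⁻⁷ | Safe - still below 1 in a million |
| 500M - 1B | 5.7 × 10⁻⁷ to 2.3 × 10⁻⁶ | Acceptable - crosses 1 in a million near 665M |
| 1B - 10B | 2.3 × 10⁻⁶ to 2.3 × 10⁻⁴ (0.023%) | Moderate - collision possible but rare |
| > 10B | > 0.023%, reaching about 2.2% at 100B | Elevated - rely on a unique constraint and plan for longer IDs |

Probabilities use the birthday approximation P ≈ n² / (2 × 36¹⁵).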

"Moderate-scale apps" = systems generating 10M-100M total IDs over their lifetime.

Examples:

SaaS with 100K users × 100 records each = 10M IDs ✅
E-commerce with 10M orders/year × 10 years = 100M IDs ✅ (collision probability ≈ 2 × 10⁻⁸)
Social media with 100M posts/day = 36.5B IDs in one year ❌ (≈ 0.3% collision probability after the first year)
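
The probability for any volume can be recomputed from the birthday approximation P ≈ 1 − exp(−n² / (2N)), with N = 36¹⁵. The sketch below is a minimal version that assumes uniformly random, independent IDs; `collisionProbability` is a hypothetical helper name, not a Nano ID function.

```javascript
// Probability of at least one collision among n uniformly random IDs
// drawn from `space` possible values (birthday approximation).
function collisionProbability(n, space) {
  // 1 - exp(-n² / (2·space)); expm1 stays accurate when the result is tiny.
  return -Math.expm1(-(n * n) / (2 * space))
}

// 36^15 ≈ 2.21e23. A double is not exact here, but is precise enough
// for probability estimates.
const SPACE = 36 ** 15

for (const n of [1e7, 1e8, 5e8, 1e9, 1e10]) {
  const p = collisionProbability(n, SPACE)
  console.log(`${n.toExponential(0)} IDs -> P ≈ ${p.toExponential(1)}`)
}
```

Because the probability grows with n², ten times more IDs means roughly one hundred times more risk while P stays small.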

2. Collision Handling Capability

Does your system have collision detection/handling?

With uniqueness constraints (DB unique index, etc.):

Risk tolerance: < 1% collision probability is acceptable
Your config (77.5 bits): Safe up to ~66B IDs

Without collision handling (blind inserts):

Risk tolerance: < 0.01% collision probability required
Your config: Safe up to ~6.6B IDs

Both limits follow from n ≈ √(2 × 36¹⁵ × P).

Why this matters: With DB constraints, a collision surfaces as a constraint violation and costs one retry (a performance hit). Without constraints, a collision silently overwrites or merges records, which is data corruption.

3. Generation Rate & Time Window

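Collision probability depends on the total number of IDs generated, not on how quickly they are produced. The generation rate only determines how soon a given total is reached:

| Scenario | IDs/day | Time to 100M | Time to ~665M (10⁻⁶ risk) | Risk Assessment |
|---|---|---|---|---|
| Small app | 1,000 | 274 years | ~1,800 years | Safe ✅ |
| Growing startup | 100,000 | 2.7 years | ~18 years | Safe ✅ |
| High-volume API | 1,000,000 | 100 days | ~1.8 years | Monitor ⚠️ |
| Distributed system | 10M+ | 10 days | ~67 days | Risky ❌ |

Defensible criterion: If you'll reach ~665M IDs (the point where collision probability passes 10⁻⁶) within 5 years, reconsider this config.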
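
The time-to-threshold arithmetic is simple enough to script for your own traffic numbers. This is a rough sketch that assumes a constant daily rate; `daysToReach` and the scenario names are illustrative only.

```javascript
// Days until a lifetime total is reached at a constant generation rate.
function daysToReach(targetIds, idsPerDay) {
  return targetIds / idsPerDay
}

const scenarios = {
  'Small app': 1e3,
  'Growing startup': 1e5,
  'High-volume API': 1e6,
  'Distributed system': 1e7,
}

for (const [name, idsPerDay] of Object.entries(scenarios)) {
  const days = daysToReach(1e8, idsPerDay) // target: 100M IDs
  console.log(`${name}: ${days} days (${(days / 365).toFixed(1)} years)`)
}
```

Swapping `1e8` for another total shows how long the configuration stays under whichever threshold you choose.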

4. Distributed Generation Risk

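When every node draws from an independent, properly seeded random source, the birthday bound depends only on the total number of IDs across all nodes, not on how many nodes generate them. Distribution adds operational uncertainty instead (no shared coordination, harder capacity forecasting), so this analysis applies a conservative safety margin as node count grows:

Centralized Generation

Birthday problem applies to the single generator's total
77.5 bits safe for ~6.6B IDs @ <0.01% collision
Single point of coordination

Distributed Generation (N independent nodes)

Problem: IDs from all nodes share one ID space, so the relevant total is the sum over every node, which is harder to forecast and monitor
Amplification: None mathematically, provided nodes do not share RNG state; the practical risk is underestimating the combined total
Conservative adjustment (a heuristic margin, not derived from the birthday bound): Reduce safe capacity by factor of √N, OR add log₂(N) bits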

Defensible criteria by node count:

| Nodes | Safe ID Count at 15 chars (0.01% risk, ÷ √N) | Recommended Length | Total Bits |
|---|---|---|---|
| 1 (centralized) | 6.6B | 15 chars | 77.5 |
| 2-5 nodes | 3.0B | 16 chars | 82.7 |
| 5-10 nodes | 2.1B | 16-17 chars | 82.7-87.9 |
| 10-50 nodes | 0.9B | 17-18 chars | 87.9-93.1 |
| 50+ nodes | < 0.9B | 19+ chars | 98.2+ |

Additional distributed concerns:

Uneven load distribution (hotspot nodes)
Synchronized bursts (batch jobs)
Clock skew in time-based generation
No coordination between nodes

Formula for distributed systems:

bits_needed = 77.5 + log₂(N)

For 10 nodes: 77.5 + 3.3 ≈ 81 bits → 16 characters
For 100 nodes: 77.5 + 6.6 ≈ 84 bits → 17 characters
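
The formula above can be turned into a small helper. This is a sketch of the document's heuristic only (add log₂(N) bits, then round up to whole characters); `lengthForNodes` is a hypothetical name, and the base length of 15 and the 36-character alphabet are taken from the configuration under analysis.

```javascript
const BASE_LENGTH = 15
const BITS_PER_CHAR = Math.log2(36) // ≈ 5.17 bits per character

// Characters needed once log2(nodes) extra bits are added as a margin.
function lengthForNodes(nodes) {
  const extraBits = Math.log2(nodes)
  // Characters come in whole units, so round the extra length up.
  return BASE_LENGTH + Math.ceil(extraBits / BITS_PER_CHAR)
}

console.log(lengthForNodes(1))   // 15
console.log(lengthForNodes(10))  // 16
console.log(lengthForNodes(100)) // 17
```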

5. Consequence Severity

What happens if collision occurs?

Low consequence (collision acceptable):

Temporary cart IDs (retry on conflict)
Analytics event IDs (duplication tolerable)
Cache keys (overwrite acceptable)
Risk tolerance: < 1%

Medium consequence (collision causes errors):

Order IDs (customer confusion)
Invoice numbers (accounting issues)
URL slugs (SEO/user experience)
Risk tolerance: < 0.01%

High consequence (collision causes corruption):

Financial transaction IDs
Medical record identifiers
Authentication tokens
Risk tolerance: < 0.0001% (use 128+ bits)

6. Regulatory/Compliance Requirements

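NIST/FIPS standards:

Security-sensitive values (keys, tokens): 128 bits of entropy is the usual target
Your 77.5 bits: Below that target, so not suitable for cryptographic use

PCI-DSS, HIPAA, SOC2:

Unpredictable identifiers are generally expected (the frameworks themselves do not set a specific ID length)
Use the secure generator (the default `nanoid` import), not `nanoid/non-secure`
Rule of thumb used in this analysis: at least 80 bits (you're at 77.5 ⚠️)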

7. Mathematical Safety Threshold

Target used here: < 10⁻⁶ collision probability (a common conservative choice)

For your config (36¹⁵ possibilities):

Probability per ID pair: p = 1 / 36¹⁵ ≈ 4.5 × 10⁻²⁴

Birthday problem formula (valid while P is small):
P(collision) ≈ n² / (2 × 36¹⁵)

Safe ID count where P(any collision) < 10⁻⁶:
n ≈ √(2 × 36¹⁵ × 10⁻⁶) ≈ 665 million IDs

Defensible thresholds:

< 10⁻⁶ risk: Stay under ~665M IDs
< 10⁻⁵ risk: Stay under ~2.1B IDs
< 10⁻⁴ risk: Stay under ~6.6B IDs
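
To reproduce thresholds like these for any risk level, the birthday formula can be inverted for n. The following is a minimal sketch assuming uniformly random IDs; `maxIdsForRisk` is an illustrative name, not a library function.

```javascript
// Largest n for which P(any collision) stays at or below `risk`,
// obtained by inverting P = 1 - exp(-n² / (2·space)).
function maxIdsForRisk(risk, space) {
  // log1p(-risk) = ln(1 - risk), accurate for very small risk values.
  return Math.sqrt(-2 * space * Math.log1p(-risk))
}

const SPACE_36_15 = 36 ** 15 // ≈ 2.21e23

for (const risk of [1e-6, 1e-5, 1e-4]) {
  const n = maxIdsForRisk(risk, SPACE_36_15)
  console.log(`risk ${risk}: about ${(n / 1e6).toFixed(0)} million IDs`)
}
```

For small risk values this matches the simpler n ≈ √(2 × N × P) to several digits; the logarithm only matters once P approaches a few percent.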

Concrete Decision Matrix

IF is_security_sensitive OR no_unique_constraint:
    ❌ NOT RECOMMENDED

ELIF centralized AND total_ids < 665M:
    ✅ SAFE (collision probability < 10⁻⁶)

ELIF centralized AND total_ids < 6.6B:
    ⚠️ ACCEPTABLE (monitor collision rate)

ELIF distributed AND nodes < 10 AND total_ids < 2.1B:
    ⚠️ ACCEPTABLE (consider adding characters)

ELIF distributed AND nodes >= 10:
    ❌ INCREASE TO 17+ CHARACTERS

ELIF total_ids < 66B AND low_consequence:
    ⚠️ RISKY (plan migration path)

ELIF total_ids > 66B:
    ❌ UNACCEPTABLE

ELSE:
    ⚠️ EVALUATE (consider cost of collision vs. migration)

Specific Examples

✅ Safe Use Cases

Centralized, low-volume:

Startup SaaS with 10K users (10M records max over lifetime)
Internal tool with 50K entities/year
Blog with 1M posts over 10 years
E-commerce with < 5M orders lifetime
Mobile app with offline-first sync (single user device)

⚠️ Borderline (Monitor Carefully)

Centralized, growing volume:

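Growing platform: 1M users → 100M records (safe today; borderline only if growth continues toward ~665M)
API serving 10K requests/sec (864M IDs/day - passes the 10⁻⁶ threshold within a day and 1% in under three months)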
Multi-year project with uncertain growth trajectory

Distributed, low-volume:

3-5 microservices generating < 20M total IDs
Small distributed system with < 10M IDs

❌ Not Recommended

High volume:

Twitter-scale (500M tweets/day)
Distributed logging system (billions of events)
Any system expecting > 6.6B IDs (above the 0.01% threshold)

Distributed systems:

10+ nodes without coordination
Cloud auto-scaling (unknown node count)
Multi-region deployments

Compliance/security:

Payment processor (PCI-DSS)
Session tokens (security requirement: 128+ bits)
Any HIPAA/PCI regulated identifier
API keys or authentication tokens

No collision handling:

Blind inserts without unique constraints
Append-only systems without validation
Legacy systems that can't handle retry logic

Bottom Line Formulas

Centralized Generation

Maximum safe ID count (for <0.01% collision risk, P = 10⁻⁴):

max_ids ≈ √(2 × P × alphabet_size ^ length)
       ≈ √(2 × 10⁻⁴ × 36¹⁵)
       ≈ 6.6 billion IDs

Distributed Generation

Conservative adjustment (for N nodes):

max_ids_distributed ≈ max_ids_centralized / √N

For 10 nodes: 6.6B / √10 ≈ 2.1B IDs
For 100 nodes: 6.6B / √100 ≈ 665M IDs

Or increase length to compensate:

chars_needed = ⌈15 + (log₂(N) / 5.17)⌉

For 10 nodes: 15 + (3.3 / 5.17) ≈ 15.6 → 16 characters
For 100 nodes: 15 + (6.6 / 5.17) ≈ 16.3 → 17 characters
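
Going the other way, you can solve for the shortest length that keeps a planned volume under a chosen risk. The sketch below assumes the birthday approximation and, for multiple nodes, this section's heuristic √N margin (equivalent to adding log₂(N) bits); `minLength` and its parameters are hypothetical names.

```javascript
// Shortest ID length such that `totalIds` random IDs stay under `risk`.
// `nodes` applies the heuristic margin: capacity / √N, i.e. space × N.
function minLength(alphabetSize, totalIds, risk, nodes = 1) {
  // From P ≈ n² / (2·space): required space = n² · N / (2·P).
  const requiredSpace = (totalIds * totalIds * nodes) / (2 * risk)
  // Length = log base alphabetSize of the required space, rounded up.
  return Math.ceil(Math.log(requiredSpace) / Math.log(alphabetSize))
}

console.log(minLength(36, 6e9, 1e-4))      // 15
console.log(minLength(36, 1e10, 1e-4))     // 16
console.log(minLength(36, 6e9, 1e-4, 10))  // 16
```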

Safety Checklist

Your config (15 chars, 36-char alphabet) is defensibly safe if ALL of:

Generation is centralized (single server/process), OR distributed across < 10 nodes
DB unique constraints exist (retry on collision)
Total lifetime IDs < 6.6B (centralized) OR < 2.1B (distributed)
Not security-sensitive (use secure variant if borderline)
Low-medium consequence of collision (not financial/medical)
No compliance requirements (PCI/HIPAA/SOC2)

With 10+ nodes, increase to 17+ chars instead of relying on this config.

Needs stronger config if ANY of:

Growth trajectory unclear or aggressive
Will scale to distributed system
No unique constraints / can't handle collisions
Security-sensitive identifiers
Cost of collision > cost of longer IDs
Regulatory compliance required

Configuration Alternatives

Option 1: Increase Length (keep alphabet)

For centralized systems:

// Capacities below are for < 0.01% collision risk; each extra character multiplies capacity by 6.

// 82.7 bits - safe for ~40B IDs
customAlphabet('0123456789abcdefghijklmnopqrstuvwxyz', 16)

// 93.1 bits - safe for ~1.4T IDs
customAlphabet('0123456789abcdefghijklmnopqrstuvwxyz', 18)

// 103.4 bits - safe for ~52T IDs
customAlphabet('0123456789abcdefghijklmnopqrstuvwxyz', 20)

// 129.2 bits - more than UUID v4's 122 random bits
customAlphabet('0123456789abcdefghijklmnopqrstuvwxyz', 25)

For distributed systems (10+ nodes), with the √N margin applied:

// 87.9 bits - safe for ~76B IDs across 10 nodes
customAlphabet('0123456789abcdefghijklmnopqrstuvwxyz', 17)

// 93.1 bits - safe for ~450B IDs across 10 nodes
customAlphabet('0123456789abcdefghijklmnopqrstuvwxyz', 18)

// 98.2 bits - safe for ~1.2T IDs across 50 nodes
customAlphabet('0123456789abcdefghijklmnopqrstuvwxyz', 19)

Option 2: Add Uppercase (keep length)

// 89.3 bits - safe (< 0.01% risk) for ~390B IDs centralized, ~120B distributed (10 nodes)
customAlphabet('0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz', 15)

Option 3: Default Nanoid (recommended)

// 126 bits - industry standard, handles billions of IDs
import { nanoid } from 'nanoid'
nanoid() // 21 chars, URL-safe alphabet (A-Za-z0-9_-)

Option 4: Custom with Safety Margin

// Add 3 chars for ~15.5 bits safety margin
customAlphabet('0123456789abcdefghijklmnopqrstuvwxyz', 18) // 93.1 bits

// Handles growth: raises the 0.01% threshold from ~6.6B to ~1.4T IDs without migration

Monitoring & Mitigation

Detect Collisions

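// With DB unique constraint. `db`, `logger`, and the 'UNIQUE_VIOLATION' code are
// placeholders; use your driver's real error code (e.g. '23505' in PostgreSQL).
async function insertWithRetry(record, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await db.insert({ id: nanoid(), ...record })
    } catch (error) {
      // Rethrow anything that is not a uniqueness conflict
      if (error.code !== 'UNIQUE_VIOLATION') throw error
      // Log collision event for monitoring, then retry with new ID
      logger.warn('ID collision detected', { attempt })
    }
  }
  throw new Error(`ID collision persisted after ${maxAttempts} attempts`)
}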

Monitor Collision Rate

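// Track collision frequency (collisions and totalGenerated are counters you maintain)
const collisionRate = collisions / totalGenerated

// Alert threshold. At 77.5 bits, chance alone predicts a far lower rate than 0.01%,
// so any observed collision is worth investigating (duplicate submits, bad RNG seeding).
if (collisionRate > 0.0001) {
  logger.error('Collision rate exceeds 0.01% - consider longer IDs')
}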
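
The two counters used above need to live somewhere. One possible shape is a small in-process monitor like the sketch below; `CollisionMonitor` and its method names are hypothetical, and a production setup would more likely export these counts to an existing metrics system.

```javascript
// Minimal in-memory collision counter with an alert threshold.
class CollisionMonitor {
  constructor(alertRate = 0.0001) {
    this.alertRate = alertRate // 0.0001 = 0.01% of generated IDs
    this.generated = 0
    this.collisions = 0
  }

  recordGenerated() {
    this.generated += 1
  }

  recordCollision() {
    this.collisions += 1
  }

  // Observed collisions per generated ID (0 before any ID is generated).
  get rate() {
    return this.generated === 0 ? 0 : this.collisions / this.generated
  }

  shouldAlert() {
    return this.rate > this.alertRate
  }
}

const monitor = new CollisionMonitor()
for (let i = 0; i < 10000; i++) monitor.recordGenerated()
monitor.recordCollision()
monitor.recordCollision()
console.log(monitor.rate, monitor.shouldAlert()) // 0.0002 true
```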

Migration Strategy

If approaching limits:

1. Add new ID column with longer config
2. Dual-write to both columns temporarily
3. Backfill old records asynchronously
4. Switch reads to new column
5. Drop old column after validation

References

Nano ID Collision Calculator - Interactive probability calculator
Birthday Problem (Wikipedia) - Mathematical foundation
NIST SP 800-90A - Random Number Generation standards
Nano ID GitHub Repository - Source code and documentation
UUID Collision Probability - UUID v4 comparison baseline

Summary

| Architecture | Verdict | Recommendation |
|---|---|---|
| Centralized, < 6.6B IDs | ✅ Current config safe | Monitor growth |
| Centralized, 6.6B-66B IDs | ⚠️ Acceptable with constraints | Consider 18+ chars |
| Distributed < 10 nodes, < 2.1B IDs | ⚠️ Acceptable | Consider 16+ chars |
| Distributed 10+ nodes | ❌ Insufficient | Use 17-19+ chars |
| Security-sensitive | ❌ Insufficient | Use 21+ chars (126+ bits) |
| Tens of billions of IDs or more | ❌ Insufficient | Use default nanoid (21 chars) |

Most important factors:

Total lifetime ID count (not just current)
Distributed vs. centralized generation
Unique constraints (collision handling)
Consequence severity (retry cost vs. data corruption)
Growth trajectory (can you migrate later if needed?)

Analysis Date: 2025-12-09
Configuration: customAlphabet('0123456789abcdefghijklmnopqrstuvwxyz', 15)
Verdict: Safe for centralized systems with DB constraints up to ~665M IDs (collision probability < 10⁻⁶), and acceptable up to ~6.6B IDs (< 0.01%). In distributed systems, add 1 character per 10x node increase.

loganlinn/nanoid-collision-analysis.md
