HashMap Internals: Deep Dive with Diagrams

Introduction
Internal Structure
Hashing Mechanism
Collision Resolution
Put Operation
Get Operation
Resizing and Rehashing
Performance Analysis

Introduction

HashMap is a hash table-based implementation of the Map interface in Java. It stores key-value pairs and provides constant-time performance O(1) for basic operations (get and put), assuming the hash function disperses elements properly.

Key Characteristics

Allows null: One null key and multiple null values
Not synchronized: Not thread-safe (use ConcurrentHashMap for thread safety)
No ordering: Does not maintain insertion order (use LinkedHashMap)
Initial capacity: Default 16
Load factor: Default 0.75

Internal Structure

HashMap internally uses an array of Node<K,V> objects. Each node represents a key-value pair.

Node Structure

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;      // Cached hash value
    final K key;         // The key
    V value;             // The value
    Node<K,V> next;      // Reference to next node (for collision handling)
}

Internal Array Representation

HashMap Internal Structure:
┌─────────────────────────────────────────┐
│         Node<K,V>[] table               │
├─────────────────────────────────────────┤
│ Index 0  │ → [Node] → null              │
├──────────┼──────────────────────────────┤
│ Index 1  │ → [Node] → [Node] → null     │  (Collision chain)
├──────────┼──────────────────────────────┤
│ Index 2  │ → null                       │
├──────────┼──────────────────────────────┤
│ Index 3  │ → [Node] → null              │
├──────────┼──────────────────────────────┤
│   ...    │   ...                        │
├──────────┼──────────────────────────────┤
│ Index 15 │ → [Node] → [Node] → [Node]   │  (Longer chain)
└──────────┴──────────────────────────────┘

Each Node contains:
┌──────────────────┐
│ hash: int        │
│ key: K           │
│ value: V         │
│ next: Node<K,V>  │
└──────────────────┘

Hashing Mechanism

HashMap uses a two-step hashing process to determine the bucket index.

Step 1: Hash Code Generation

// Get hashCode from the key
int hashCode = key.hashCode();

// Spread bits (Java 8+)
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

Step 2: Index Calculation

// Calculate bucket index
int index = (n - 1) & hash;  // n is the array length (always power of 2)

Visual Representation

Hashing Process:
─────────────────────────────────────────────────────────────

Key: "Apple"
    │
    ▼
┌─────────────────────┐
│  key.hashCode()     │ → hashCode = 63476538
└─────────────────────┘
    │
    ▼
┌─────────────────────┐
│  hash(key)          │ → Spread bits to reduce collision
│  h ^ (h >>> 16)     │ → finalHash = 63479898
└─────────────────────┘
    │
    ▼
┌─────────────────────┐
│  (n-1) & hash       │ → index = 10 (if n=16, then 15 & hash)
└─────────────────────┘
    │
    ▼
  Place in bucket[10]

Why spread bits?
Original:  1111 0001 1100 0101 0110 1010 (high 16 bits)
           0011 1000 1101 1010 (low 16 bits)
XOR:       ──────────────────────────────
Result:    Better distribution in lower bits

Collision Resolution

When two keys hash to the same index, HashMap uses separate chaining (linked list) and tree-based collision resolution (Java 8+).

Collision Handling Strategy

Collision Resolution Evolution:

Before Java 8:
─────────────────────────────────────────
Bucket[5] → [Node1] → [Node2] → [Node3] → null
            (Linked List only)

Java 8+ (Treeification):
─────────────────────────────────────────
If chain length > 8 AND table size >= 64:
Convert to Red-Black Tree

Bucket[5] → TreeNode (Root)
            ├── Left subtree
            └── Right subtree

Benefits:
- Linked List: O(n) worst case
- Red-Black Tree: O(log n) worst case

Visual: Collision Chain

Example: Keys "cat", "dog", "rat" all hash to index 3

┌────────────────────────────────────────────────────────┐
│ Index 3                                                │
└────────────────────────────────────────────────────────┘
    │
    ▼
┌──────────────────┐      ┌──────────────────┐      ┌──────────────────┐
│ hash: 12345      │      │ hash: 12389      │      │ hash: 12367      │
│ key: "cat"       │ ───► │ key: "dog"       │ ───► │ key: "rat"       │
│ value: 1         │      │ value: 2         │      │ value: 3         │
│ next: ───────────┘      │ next: ───────────┘      │ next: null       │
└──────────────────┘      └──────────────────┘      └──────────────────┘

Put Operation

The put operation inserts or updates a key-value pair.

Algorithm Flow

put(K key, V value) Flow:
──────────────────────────────────────────

1. Calculate hash
   ├── hash = hash(key)
   └── index = (n-1) & hash

2. Check if bucket is empty
   ├── Yes: Create new node → Done
   └── No: Handle collision

3. Collision handling
   ├── Key exists?
   │   ├── Yes: Update value → Return old value
   │   └── No: Continue
   │
   ├── Is TreeNode?
   │   ├── Yes: Insert in tree
   │   └── No: Traverse linked list
   │
   └── Add to end of chain
       └── Check if treeify needed (size > 8)

4. Increment size counter

5. Check if resize needed
   └── if (size > threshold) resize()

Visual: Put Operation

Example: map.put("key1", "value1")

Step 1: Hash Calculation
────────────────────────
hash("key1") → 1234
index = 1234 & 15 = 10

Step 2: Check Bucket[10]
────────────────────────
┌─────────────────┐
│ Bucket[10]: null │  ← Empty
└─────────────────┘

Step 3: Insert New Node
────────────────────────
┌─────────────────┐
│ Bucket[10]:     │
└─────────────────┘
       │
       ▼
┌──────────────────┐
│ hash: 1234       │
│ key: "key1"      │
│ value: "value1"  │
│ next: null       │
└──────────────────┘

Pseudocode:
───────────
if (bucket[index] == null) {
    bucket[index] = new Node(hash, key, value, null);
} else {
    // Handle collision (traverse chain or tree)
    for (Node e = bucket[index]; e != null; e = e.next) {
        if (e.hash == hash && e.key.equals(key)) {
            e.value = value; // Update
            return;
        }
    }
    // Add to end of chain
    addToChain(bucket[index], new Node(hash, key, value));
}

Get Operation

Retrieves the value associated with a key.

Algorithm Flow

get(K key) Flow:
──────────────────────────────────────────

1. Calculate hash
   ├── hash = hash(key)
   └── index = (n-1) & hash

2. Get first node at bucket[index]
   └── if (first == null) return null

3. Check first node
   └── if (first.key.equals(key)) return first.value

4. Traverse chain/tree
   ├── Is TreeNode?
   │   └── Search in Red-Black Tree: O(log n)
   │
   └── Is LinkedList?
       └── Linear search: O(n)
           for (Node e = first.next; e != null; e = e.next)
               if (e.key.equals(key)) return e.value

5. Not found
   └── return null

Visual: Get Operation

Example: map.get("key2")

Step 1: Calculate Index
────────────────────────
hash("key2") → 5678
index = 5678 & 15 = 6

Step 2: Access Bucket[6]
────────────────────────
┌─────────────────┐
│ Bucket[6]:      │
└─────────────────┘
       │
       ▼
┌──────────────────┐      ┌──────────────────┐      ┌──────────────────┐
│ hash: 5600       │      │ hash: 5678       │      │ hash: 5690       │
│ key: "key1"      │ ───► │ key: "key2"      │ ───► │ key: "key3"      │
│ value: "val1"    │      │ value: "val2"    │  ✓   │ value: "val3"    │
│ next: ───────────┘      │ next: ───────────┘      │ next: null       │
└──────────────────┘      └──────────────────┘      └──────────────────┘
                                   │
                                   └── Found! Return "val2"

Step 3: Compare Keys
────────────────────────
1. Check node1: "key1" != "key2" → Continue
2. Check node2: "key2" == "key2" → Match! Return "val2"

Resizing and Rehashing

When HashMap reaches its threshold (capacity × load factor), it doubles in size.

Resize Process

Resize Trigger:
───────────────────────────────────────────
if (size > capacity * loadFactor) {
    resize();
}

Default: capacity = 16, loadFactor = 0.75
Threshold = 16 * 0.75 = 12
When 13th element added → resize to 32

Visual: Resize Operation

Before Resize (capacity = 4, size = 3):
────────────────────────────────────────
┌─────────┬───────────────────────┐
│ Index 0 │ → [A] → null          │
├─────────┼───────────────────────┤
│ Index 1 │ → null                │
├─────────┼───────────────────────┤
│ Index 2 │ → [B] → [C] → null    │
├─────────┼───────────────────────┤
│ Index 3 │ → null                │
└─────────┴───────────────────────┘

After Resize (capacity = 8):
────────────────────────────────────────
┌─────────┬───────────────────────┐
│ Index 0 │ → [A] → null          │  (stayed)
├─────────┼───────────────────────┤
│ Index 1 │ → null                │
├─────────┼───────────────────────┤
│ Index 2 │ → [B] → null          │  (stayed)
├─────────┼───────────────────────┤
│ Index 3 │ → null                │
├─────────┼───────────────────────┤
│ Index 4 │ → null                │
├─────────┼───────────────────────┤
│ Index 5 │ → null                │
├─────────┼───────────────────────┤
│ Index 6 │ → [C] → null          │  (moved)
├─────────┼───────────────────────┤
│ Index 7 │ → null                │
└─────────┴───────────────────────┘

Rehashing Logic:
────────────────────────────────────────
Old capacity: n
New capacity: 2n

For each node:
  newIndex = hash & (2n - 1)
  
Because n is power of 2:
  Node either stays at same index
  OR moves to (oldIndex + oldCapacity)

Optimized Resize (Java 8+)

Bit Optimization:
─────────────────────────────────────────

Old capacity = 16 (10000 in binary)
New capacity = 32 (100000 in binary)

For a hash value:
  Old index: hash & 15  (0b01111)
  New index: hash & 31  (0b11111)
                           ▲
                           Extra bit

If extra bit is 0: Stay at old index
If extra bit is 1: Move to old index + 16

Example:
  hash = 0b10110101 (181)
  Old: 181 & 15 = 5
  New: 181 & 31 = 21 (5 + 16)
  
  hash = 0b10100101 (165)
  Old: 165 & 15 = 5
  New: 165 & 31 = 5 (stays)

Performance Analysis

Time Complexity

Operation	Average Case	Worst Case (before Java 8)	Worst Case (Java 8+)
get()	O(1)	O(n)	O(log n)
put()	O(1)	O(n)	O(log n)
remove()	O(1)	O(n)	O(log n)
containsKey()	O(1)	O(n)	O(log n)

Space Complexity

Space = O(n) where n is number of entries

Actual space = capacity * sizeof(Node)
Minimum capacity ≥ n / loadFactor

Example:
  1000 entries, loadFactor = 0.75
  Required capacity ≥ 1000 / 0.75 ≈ 1334
  Actual capacity = 2048 (next power of 2)

Load Factor Impact

Load Factor vs Performance:
────────────────────────────────────────

Load Factor = 0.5
  ├── More memory usage
  ├── Fewer collisions
  └── Faster operations

Load Factor = 0.75 (Default)
  ├── Balanced trade-off
  └── Good for most cases

Load Factor = 1.0
  ├── Less memory usage
  ├── More collisions
  └── Slower operations

Collision probability increases exponentially
as load factor approaches 1.0

Performance Visualization

Collision Chain Distribution (Good Hash):
──────────────────────────────────────────
Bucket 0:  ████ (4 items)
Bucket 1:  ███ (3 items)
Bucket 2:  █████ (5 items)
Bucket 3:  ██ (2 items)
...
Average chain length ≈ loadFactor

Collision Chain Distribution (Bad Hash):
──────────────────────────────────────────
Bucket 0:  
Bucket 1:  ████████████████████ (20 items)
Bucket 2:  
Bucket 3:  ███████████████████████ (23 items)
...
Performance degrades significantly

Key Takeaways

Design Decisions

Power of 2 Capacity: Enables fast bitwise operations for index calculation
Separate Chaining: Simple and effective collision resolution
Treeification: Prevents worst-case O(n) performance in Java 8+
Load Factor 0.75: Optimal balance between time and space

Best Practices

// 1. Initialize with expected size
Map<String, Integer> map = new HashMap<>(expectedSize);

// 2. Override hashCode() and equals() properly
@Override
public int hashCode() {
    return Objects.hash(field1, field2);
}

@Override
public boolean equals(Object obj) {
    // Implement proper equality check
}

// 3. Use immutable keys
final class Key {
    private final String id;
    private final int code;
    // ... immutable implementation
}

// 4. Avoid excessive resizing
int initialCapacity = (int) (expectedSize / 0.75f) + 1;
Map<K, V> map = new HashMap<>(initialCapacity);

Common Pitfalls

❌ Mutable keys
   └── hashCode changes → can't find entry

❌ Poor hashCode implementation
   └── All keys hash to same bucket → O(n) performance

❌ Not overriding equals with hashCode
   └── Violates contract → unpredictable behavior

❌ Using default capacity for large datasets
   └── Multiple resize operations → performance hit

✓ Use immutable keys
✓ Implement good hash function
✓ Override both hashCode and equals
✓ Size appropriately from start

Conclusion

HashMap is a highly optimized data structure that balances performance and memory usage. Understanding its internals helps in writing efficient code and debugging performance issues. The evolution from linked lists to red-black trees in Java 8 showcases continuous optimization for real-world scenarios.

Key Points:

Uses array + linked list/tree hybrid structure
O(1) average case performance for basic operations
Automatic resizing maintains performance as size grows
Proper hash function and key design are critical for performance

carefree-ladka/HashMap Internals: Deep Dive with Diagrams.mdx

Select an option

No results found

Select an option

No results found

HashMap Internals: Deep Dive with Diagrams

Table of Contents

Introduction

Key Characteristics

Internal Structure

Node Structure

Internal Array Representation

Hashing Mechanism

Step 1: Hash Code Generation

Step 2: Index Calculation

Visual Representation

Collision Resolution

Collision Handling Strategy

Visual: Collision Chain

Put Operation

Algorithm Flow

Visual: Put Operation

Get Operation

Algorithm Flow

Visual: Get Operation

Resizing and Rehashing

Resize Process

Visual: Resize Operation

Optimized Resize (Java 8+)

Performance Analysis

Time Complexity

Space Complexity

Load Factor Impact

Performance Visualization

Key Takeaways

Design Decisions

Best Practices

Common Pitfalls

Conclusion