Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Select an option

  • Save carefree-ladka/0ec7288a2641f1fb6837ddd3eeaf204b to your computer and use it in GitHub Desktop.

Select an option

Save carefree-ladka/0ec7288a2641f1fb6837ddd3eeaf204b to your computer and use it in GitHub Desktop.
HashMap Internals: Deep Dive with Diagrams

HashMap Internals: Deep Dive with Diagrams

Table of Contents

  1. Introduction
  2. Internal Structure
  3. Hashing Mechanism
  4. Collision Resolution
  5. Put Operation
  6. Get Operation
  7. Resizing and Rehashing
  8. Performance Analysis

Introduction

HashMap is a hash table-based implementation of the Map interface in Java. It stores key-value pairs and provides constant-time performance O(1) for basic operations (get and put), assuming the hash function disperses elements properly.

Key Characteristics

  • Allows null: One null key and multiple null values
  • Not synchronized: Not thread-safe (use ConcurrentHashMap for thread safety)
  • No ordering: Does not maintain insertion order (use LinkedHashMap)
  • Initial capacity: Default 16
  • Load factor: Default 0.75

Internal Structure

HashMap internally uses an array of Node<K,V> objects. Each node represents a key-value pair.

Node Structure

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;      // Cached hash value
    final K key;         // The key
    V value;             // The value
    Node<K,V> next;      // Reference to next node (for collision handling)
}

Internal Array Representation

HashMap Internal Structure:
┌─────────────────────────────────────────┐
│         Node<K,V>[] table               │
├─────────────────────────────────────────┤
│ Index 0  │ → [Node] → null              │
├──────────┼──────────────────────────────┤
│ Index 1  │ → [Node] → [Node] → null     │  (Collision chain)
├──────────┼──────────────────────────────┤
│ Index 2  │ → null                       │
├──────────┼──────────────────────────────┤
│ Index 3  │ → [Node] → null              │
├──────────┼──────────────────────────────┤
│   ...    │   ...                        │
├──────────┼──────────────────────────────┤
│ Index 15 │ → [Node] → [Node] → [Node]   │  (Longer chain)
└──────────┴──────────────────────────────┘

Each Node contains:
┌──────────────────┐
│ hash: int        │
│ key: K           │
│ value: V         │
│ next: Node<K,V>  │
└──────────────────┘

Hashing Mechanism

HashMap uses a two-step hashing process to determine the bucket index.

Step 1: Hash Code Generation

// Get hashCode from the key
int hashCode = key.hashCode();

// Spread bits (Java 8+)
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

Step 2: Index Calculation

// Calculate bucket index
int index = (n - 1) & hash;  // n is the array length (always power of 2)

Visual Representation

Hashing Process:
─────────────────────────────────────────────────────────────

Key: "Apple"
    │
    ▼
┌─────────────────────┐
│  key.hashCode()     │ → hashCode = 63476538
└─────────────────────┘
    │
    ▼
┌─────────────────────┐
│  hash(key)          │ → Spread bits to reduce collision
│  h ^ (h >>> 16)     │ → finalHash = 63479898
└─────────────────────┘
    │
    ▼
┌─────────────────────┐
│  (n-1) & hash       │ → index = 10 (if n=16, then 15 & hash)
└─────────────────────┘
    │
    ▼
  Place in bucket[10]

Why spread bits?
Original:  1111 0001 1100 0101 0110 1010 (high 16 bits)
           0011 1000 1101 1010 (low 16 bits)
XOR:       ──────────────────────────────
Result:    Better distribution in lower bits

Collision Resolution

When two keys hash to the same index, HashMap uses separate chaining (linked list) and tree-based collision resolution (Java 8+).

Collision Handling Strategy

Collision Resolution Evolution:

Before Java 8:
─────────────────────────────────────────
Bucket[5] → [Node1] → [Node2] → [Node3] → null
            (Linked List only)

Java 8+ (Treeification):
─────────────────────────────────────────
If chain length > 8 AND table size >= 64:
Convert to Red-Black Tree

Bucket[5] → TreeNode (Root)
            ├── Left subtree
            └── Right subtree

Benefits:
- Linked List: O(n) worst case
- Red-Black Tree: O(log n) worst case

Visual: Collision Chain

Example: Keys "cat", "dog", "rat" all hash to index 3

┌────────────────────────────────────────────────────────┐
│ Index 3                                                │
└────────────────────────────────────────────────────────┘
    │
    ▼
┌──────────────────┐      ┌──────────────────┐      ┌──────────────────┐
│ hash: 12345      │      │ hash: 12389      │      │ hash: 12367      │
│ key: "cat"       │ ───► │ key: "dog"       │ ───► │ key: "rat"       │
│ value: 1         │      │ value: 2         │      │ value: 3         │
│ next: ───────────┘      │ next: ───────────┘      │ next: null       │
└──────────────────┘      └──────────────────┘      └──────────────────┘

Put Operation

The put operation inserts or updates a key-value pair.

Algorithm Flow

put(K key, V value) Flow:
──────────────────────────────────────────

1. Calculate hash
   ├── hash = hash(key)
   └── index = (n-1) & hash

2. Check if bucket is empty
   ├── Yes: Create new node → Done
   └── No: Handle collision

3. Collision handling
   ├── Key exists?
   │   ├── Yes: Update value → Return old value
   │   └── No: Continue
   │
   ├── Is TreeNode?
   │   ├── Yes: Insert in tree
   │   └── No: Traverse linked list
   │
   └── Add to end of chain
       └── Check if treeify needed (size > 8)

4. Increment size counter

5. Check if resize needed
   └── if (size > threshold) resize()

Visual: Put Operation

Example: map.put("key1", "value1")

Step 1: Hash Calculation
────────────────────────
hash("key1") → 1234
index = 1234 & 15 = 10

Step 2: Check Bucket[10]
────────────────────────
┌─────────────────┐
│ Bucket[10]: null │  ← Empty
└─────────────────┘

Step 3: Insert New Node
────────────────────────
┌─────────────────┐
│ Bucket[10]:     │
└─────────────────┘
       │
       ▼
┌──────────────────┐
│ hash: 1234       │
│ key: "key1"      │
│ value: "value1"  │
│ next: null       │
└──────────────────┘

Pseudocode:
───────────
if (bucket[index] == null) {
    bucket[index] = new Node(hash, key, value, null);
} else {
    // Handle collision (traverse chain or tree)
    for (Node e = bucket[index]; e != null; e = e.next) {
        if (e.hash == hash && e.key.equals(key)) {
            e.value = value; // Update
            return;
        }
    }
    // Add to end of chain
    addToChain(bucket[index], new Node(hash, key, value));
}

Get Operation

Retrieves the value associated with a key.

Algorithm Flow

get(K key) Flow:
──────────────────────────────────────────

1. Calculate hash
   ├── hash = hash(key)
   └── index = (n-1) & hash

2. Get first node at bucket[index]
   └── if (first == null) return null

3. Check first node
   └── if (first.key.equals(key)) return first.value

4. Traverse chain/tree
   ├── Is TreeNode?
   │   └── Search in Red-Black Tree: O(log n)
   │
   └── Is LinkedList?
       └── Linear search: O(n)
           for (Node e = first.next; e != null; e = e.next)
               if (e.key.equals(key)) return e.value

5. Not found
   └── return null

Visual: Get Operation

Example: map.get("key2")

Step 1: Calculate Index
────────────────────────
hash("key2") → 5678
index = 5678 & 15 = 6

Step 2: Access Bucket[6]
────────────────────────
┌─────────────────┐
│ Bucket[6]:      │
└─────────────────┘
       │
       ▼
┌──────────────────┐      ┌──────────────────┐      ┌──────────────────┐
│ hash: 5600       │      │ hash: 5678       │      │ hash: 5690       │
│ key: "key1"      │ ───► │ key: "key2"      │ ───► │ key: "key3"      │
│ value: "val1"    │      │ value: "val2"    │  ✓   │ value: "val3"    │
│ next: ───────────┘      │ next: ───────────┘      │ next: null       │
└──────────────────┘      └──────────────────┘      └──────────────────┘
                                   │
                                   └── Found! Return "val2"

Step 3: Compare Keys
────────────────────────
1. Check node1: "key1" != "key2" → Continue
2. Check node2: "key2" == "key2" → Match! Return "val2"

Resizing and Rehashing

When HashMap reaches its threshold (capacity × load factor), it doubles in size.

Resize Process

Resize Trigger:
───────────────────────────────────────────
if (size > capacity * loadFactor) {
    resize();
}

Default: capacity = 16, loadFactor = 0.75
Threshold = 16 * 0.75 = 12
When 13th element added → resize to 32

Visual: Resize Operation

Before Resize (capacity = 4, size = 3):
────────────────────────────────────────
┌─────────┬───────────────────────┐
│ Index 0 │ → [A] → null          │
├─────────┼───────────────────────┤
│ Index 1 │ → null                │
├─────────┼───────────────────────┤
│ Index 2 │ → [B] → [C] → null    │
├─────────┼───────────────────────┤
│ Index 3 │ → null                │
└─────────┴───────────────────────┘

After Resize (capacity = 8):
────────────────────────────────────────
┌─────────┬───────────────────────┐
│ Index 0 │ → [A] → null          │  (stayed)
├─────────┼───────────────────────┤
│ Index 1 │ → null                │
├─────────┼───────────────────────┤
│ Index 2 │ → [B] → null          │  (stayed)
├─────────┼───────────────────────┤
│ Index 3 │ → null                │
├─────────┼───────────────────────┤
│ Index 4 │ → null                │
├─────────┼───────────────────────┤
│ Index 5 │ → null                │
├─────────┼───────────────────────┤
│ Index 6 │ → [C] → null          │  (moved)
├─────────┼───────────────────────┤
│ Index 7 │ → null                │
└─────────┴───────────────────────┘

Rehashing Logic:
────────────────────────────────────────
Old capacity: n
New capacity: 2n

For each node:
  newIndex = hash & (2n - 1)
  
Because n is power of 2:
  Node either stays at same index
  OR moves to (oldIndex + oldCapacity)

Optimized Resize (Java 8+)

Bit Optimization:
─────────────────────────────────────────

Old capacity = 16 (10000 in binary)
New capacity = 32 (100000 in binary)

For a hash value:
  Old index: hash & 15  (0b01111)
  New index: hash & 31  (0b11111)
                           ▲
                           Extra bit

If extra bit is 0: Stay at old index
If extra bit is 1: Move to old index + 16

Example:
  hash = 0b10110101 (181)
  Old: 181 & 15 = 5
  New: 181 & 31 = 21 (5 + 16)
  
  hash = 0b10100101 (165)
  Old: 165 & 15 = 5
  New: 165 & 31 = 5 (stays)

Performance Analysis

Time Complexity

Operation Average Case Worst Case (before Java 8) Worst Case (Java 8+)
get() O(1) O(n) O(log n)
put() O(1) O(n) O(log n)
remove() O(1) O(n) O(log n)
containsKey() O(1) O(n) O(log n)

Space Complexity

Space = O(n) where n is number of entries

Actual space = capacity * sizeof(Node)
Minimum capacity ≥ n / loadFactor

Example:
  1000 entries, loadFactor = 0.75
  Required capacity ≥ 1000 / 0.75 ≈ 1334
  Actual capacity = 2048 (next power of 2)

Load Factor Impact

Load Factor vs Performance:
────────────────────────────────────────

Load Factor = 0.5
  ├── More memory usage
  ├── Fewer collisions
  └── Faster operations

Load Factor = 0.75 (Default)
  ├── Balanced trade-off
  └── Good for most cases

Load Factor = 1.0
  ├── Less memory usage
  ├── More collisions
  └── Slower operations

Collision probability increases exponentially
as load factor approaches 1.0

Performance Visualization

Collision Chain Distribution (Good Hash):
──────────────────────────────────────────
Bucket 0:  ████ (4 items)
Bucket 1:  ███ (3 items)
Bucket 2:  █████ (5 items)
Bucket 3:  ██ (2 items)
...
Average chain length ≈ loadFactor

Collision Chain Distribution (Bad Hash):
──────────────────────────────────────────
Bucket 0:  
Bucket 1:  ████████████████████ (20 items)
Bucket 2:  
Bucket 3:  ███████████████████████ (23 items)
...
Performance degrades significantly

Key Takeaways

Design Decisions

  1. Power of 2 Capacity: Enables fast bitwise operations for index calculation
  2. Separate Chaining: Simple and effective collision resolution
  3. Treeification: Prevents worst-case O(n) performance in Java 8+
  4. Load Factor 0.75: Optimal balance between time and space

Best Practices

// 1. Initialize with expected size
Map<String, Integer> map = new HashMap<>(expectedSize);

// 2. Override hashCode() and equals() properly
@Override
public int hashCode() {
    return Objects.hash(field1, field2);
}

@Override
public boolean equals(Object obj) {
    // Implement proper equality check
}

// 3. Use immutable keys
final class Key {
    private final String id;
    private final int code;
    // ... immutable implementation
}

// 4. Avoid excessive resizing
int initialCapacity = (int) (expectedSize / 0.75f) + 1;
Map<K, V> map = new HashMap<>(initialCapacity);

Common Pitfalls

❌ Mutable keys
   └── hashCode changes → can't find entry

❌ Poor hashCode implementation
   └── All keys hash to same bucket → O(n) performance

❌ Not overriding equals with hashCode
   └── Violates contract → unpredictable behavior

❌ Using default capacity for large datasets
   └── Multiple resize operations → performance hit

✓ Use immutable keys
✓ Implement good hash function
✓ Override both hashCode and equals
✓ Size appropriately from start

Conclusion

HashMap is a highly optimized data structure that balances performance and memory usage. Understanding its internals helps in writing efficient code and debugging performance issues. The evolution from linked lists to red-black trees in Java 8 showcases continuous optimization for real-world scenarios.

Key Points:

  • Uses array + linked list/tree hybrid structure
  • O(1) average case performance for basic operations
  • Automatic resizing maintains performance as size grows
  • Proper hash function and key design are critical for performance
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment