Hash map Binary Tree hybrid.md

I've come up with an interesting design for a hash map that instead of bucket lists, recursivly embeds more hash maps, but with a new key for each level.

Each hash map is super simple with two slots and hashe keys are 1 bit. However, the initial value is hashed to a 32-bit integer where each bit in the number is the key for each level in the recursive hash map.

In C, my structure looks like:

struct hash_tree_node {
  void *key;
  void *value;
  struct hash_tree_node *left;
  struct hash_tree_node *right;
}

In other words, this is shaped exactly like a binary tree.

Algorithm for setting a value, starting at the root node.

If the slot is free or matches our key, use it.
otherwise:
- if the hash ends in 1, recurse on left (shift the hash right 1 bit to get new final bit)
- else recurse on right (shift the same here)

As you can see, finding a free slot is O(log n). Each hash hey gives you a unique pseudo-random search path to traverse the binary tree.

I optimize a little with a top node that's special and records a running total of all value nodes for quick length checks.

struct hash_tree {
  struct hash_tree_node *root;
  uintptr_t count;
}

Reading a value from the hash is also O(log n) cost as we must walk the tree using the hashed bits.

Deleting a value has similar cost. Once found, we simply set key to null and the set algorithm knows it can reuse this slot instead of looking deeper.

Iterating over this can be done without recursion or a stack by using morris traversal.

Is there a name for this structure?

How does it compare to traditional hash maps and balanced binary trees?

Note that while this is shaped like a binary tree, there is no balancing. Nodes can be empty and reused later. Assuming a good integer hash for the bitfield used for the path, the tree should naturally stay balanced.

Since my bitpath is 32 bits long, if you happened to store more than 4 billion unique keys in this, the logic will turn into linear chains. But there will be 4,294,967,296 of them so it shouldn't matter unless you went way past this value.

Of course if you're storing that many keys in my simple hash/tree, you're probably using it wrong. You could simply use a 64-bit hash key instead to prevent the poor performance for trees deeper than 32 levels. Most likely you need to rethink your solution a further back.

For example, in one system, I have a tiny embedded scripting language that has exactly one native data structure. The cons pair. It's a fixed tuple of length two that can hold two arbitrary values. Using this I can construct these nodes in the following manner:

node = ((key,value),(left,right))

I found this -- very smart idea. I would implement it the same way with cons pairs.

However, there is a problem when you want to delete elements. Picture what happens if you have an empty hashmap, then set an element, it will always be the root node. Then picture what happens if you add another element with hash X, then delete the root node (by setting the key and value to nil) and then try to insert X again. It will be put into the root node, but the X's original child node still exists -- but the search algorithm will never find it, so it doesn't appear to be a problem right now. Except when you try to delete X by finding its node and setting key and value to nil, it will cause the search algorithm to again find the original X node, but X was supposed to be deleted.

So, when either setting or deleting an element from the map, add this -- you cannot return once the first target node is found, because there may be shadow nodes below it in the hash search sequence, and these need to be cleared as well.

creationix/Hash map Binary Tree hybrid.md

Select an option

No results found

Select an option

No results found

dragoncoder047 commented Apr 2, 2024

Uh oh!