Hash tables
How hash tables map keys to buckets for fast average-case lookup, and how they cope with collisions.
12 cards · 7 quiz questions · 8 min read
Imagine a library where, instead of walking the shelves, you could compute the exact shelf for any book from its title alone. That is the promise of a hash table: a data structure that stores key-value pairs and finds them in roughly constant time, no matter how many entries it holds. It powers dictionaries, sets, caches, and database indexes across nearly every language and system.
The core idea: keys to buckets
A hash table is built on an array of slots called buckets. To store a value under a key, the table feeds the key to a hash function, which turns it into an integer. That integer is reduced — typically with a modulo against the number of buckets — into a bucket index. The value is then placed at that index.
Looking the value up again repeats the same computation: hash the key, find the bucket, read the entry. Because the bucket index comes straight from the key, the table never scans its whole contents. This is why lookup, insertion, and deletion are all average O(1) — constant time that does not grow with the number of stored items.
Collisions are inevitable
There are vastly more possible keys than buckets, so sooner or later two different keys produce the same bucket index. This is a collision, and no hash table can avoid them entirely. The interesting engineering is in how they are resolved. Two strategies dominate.
Separate chaining stores a small list at each bucket. When keys collide, they are simply appended to that bucket’s list. A lookup hashes to the bucket and then scans its short list for the matching key. As long as lists stay short, this stays fast.
Open addressing keeps every entry directly in the bucket array. On a collision, the table probes for another slot following a fixed rule — linear probing checks the next slot, quadratic probing jumps by growing offsets — until it finds an empty or matching position. This avoids the overhead of separate lists and is cache-friendly, but it grows fragile as the array fills.
Load factor: the dial that controls speed
The single most important number for performance is the load factor: the count of stored entries divided by the number of buckets.
load factor = entries / buckets
A low load factor means buckets are mostly empty, collisions are rare, and operations are fast. A high load factor means crowded buckets, frequent collisions, and longer scans. To keep this dial in a healthy range, hash tables resize: once the load factor crosses a threshold (often around 0.7), the table allocates a larger bucket array — usually double the size — and rehashes every existing entry into it. Rehashing is expensive, but it happens rarely, so the average cost of insertion stays constant.
A good hash function is the unsung hero here. It must spread keys uniformly, be fast to compute, and be deterministic. Poor distribution clusters keys into a few buckets and quietly destroys performance.
When constant time breaks down
The O(1) guarantee is an average, not a promise. In the worst case, a hash table degrades to O(n). If a bad hash function — or an adversary deliberately crafting keys — sends every entry into the same bucket, that bucket becomes one long list and lookups must scan all n items. This is more than theoretical: hash-flooding attacks exploit exactly this to slow down servers, which is why many runtimes now use randomized or cryptographically strong hash functions.
Trade-offs to remember
Two properties shape when you reach for a hash table. First, keys must be stable — the bucket index is derived from the key’s value, so a key that mutates after insertion would hash to a different slot and become unfindable. This is why most languages require keys to be immutable or hashable.
Second, hash tables are unordered. Entries sit wherever their hash sends them, so iterating yields keys in no meaningful sequence. If you need keys kept in sorted order, a balanced binary search tree is the better tool, trading hashing’s constant-time access for logarithmic-time access plus guaranteed ordering.
Used within these limits, the hash table is one of the most rewarding structures in computing: a small amount of arithmetic buys you near-instant access to data of almost any size.
What is the average-case time complexity of a hash-table lookup?
Sources
- Cormen, Leiserson, Rivest & Stein — Introduction to Algorithms book CLRS; Chapter 11 covers hash tables, hash functions, and collision resolution.
- Donald E. Knuth — The Art of Computer Programming, Volume 3: Sorting and Searching book Classic analysis of hashing and collision handling.