Hash tables are used to store key-value pairs.
They are like arrays, but the keys are not ordered.
Unlike arrays, hash tables are fast for all of the following operations: finding values, adding new values, and removing values!
Nearly every programming language has some sort of hash table data structure.
Because of their speed, hash tables are very commonly used!
HASH TABLES IN THE WILD
Python has Dictionaries
JS has Objects and Maps*
Java, Go, & Scala have Maps
Ruby has...Hashes
* Objects have some restrictions, but are basically hash tables
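To make the JS entry concrete, here is a minimal sketch of one such restriction. The variable names are made up for this example. It shows that a plain object coerces its keys to strings, while a Map keeps keys as they were given.

```javascript
// Plain object used as a hash table: every key becomes a string.
const colorsObject = {};
colorsObject["pink"] = "#ff69b4";
colorsObject[1] = "one";
console.log(colorsObject["1"]); // "one" (the number 1 was stored as the string "1")

// Map used as a hash table: keys keep their original type.
const colorsMap = new Map();
colorsMap.set("pink", "#ff69b4");
colorsMap.set(1, "one");
console.log(colorsMap.get(1)); // "one"
console.log(colorsMap.get("1")); // undefined (the string "1" is a different key)
console.log(colorsMap.size); // 2
```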
LET'S PRETEND...
Existing implementations mysteriously disappear
How would we implement our own version???
Introductory Example
Imagine we want to store some colors.
We could just use an array/list:
[ "#ff69b4","#ff4500","#00ffff" ]
Not super readable! What do these colors correspond to?
It would be nice if instead of using indices to access the colors, we could use more human-readable keys.
pink → #ff69b4
orangered → #ff4500
cyan → #00ffff
How can we get human-readability and computer readability?
Computers don't know how to find an element at index pink!
Hash tables to the rescue!
To implement a hash table, we'll be using an array.
In order to look up values by key, we need a way to convert keys into valid array indices.
A function that performs this task is called a hash function.
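[Figure: a 10-slot array with indices 0 to 9. The hash function maps pink to index 0, cyan to index 3, and orangered to index 7, so the array stores ["pink", "#ff69b4"] at index 0, ["cyan", "#00ffff"] at index 3, and ["orangered", "#ff4500"] at index 7.]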
What makes a good hash? (not a cryptographically secure one)
Fast
Non-Example
function slowHash(key) {
  // Pointless busy work: a good hash should not waste time like this.
  for (var i = 0; i < 10000; i++) {
    console.log("everyday i'm hashing");
  }
  return key[0].charCodeAt(0);
}
Uniformly Distributes Values
Non-Example
function sameHashedValue(key) {
  // Every key lands on index 0, so nothing is spread out.
  return 0;
}
Deterministic
Non-Example
function randomHash(key) {
  // The same key can give a different index on every call.
  return Math.floor(Math.random() * 1000);
}
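The non-examples above can be probed with a couple of small checks. This is only a sketch: looksDeterministic and distinctIndices are hypothetical helper names, and firstCharHash is a stand-in hash invented for the comparison.

```javascript
// Returns true if hashFn gave the same index for the same key on every trial.
// (This can only suggest determinism; it cannot prove it.)
function looksDeterministic(hashFn, key, trials = 20) {
  const first = hashFn(key);
  for (let i = 0; i < trials; i++) {
    if (hashFn(key) !== first) return false;
  }
  return true;
}

// Counts how many different indices hashFn produces for a list of keys.
function distinctIndices(hashFn, keys) {
  return new Set(keys.map((key) => hashFn(key))).size;
}

const sameHashedValue = (key) => 0; // the non-uniform non-example
const firstCharHash = (key) => key.charCodeAt(0); // stand-in for comparison
const sampleKeys = ["pink", "orangered", "cyan"];

console.log(looksDeterministic(sameHashedValue, "pink")); // true
console.log(distinctIndices(sameHashedValue, sampleKeys)); // 1
console.log(distinctIndices(firstCharHash, sampleKeys)); // 3
```

Running looksDeterministic on a hash built on Math.random() would almost certainly return false.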
Simple Hash Example
Here's a hash that works on strings only:
function hash(key, arrayLen) {
  let total = 0;
  for (let char of key) {
    // map "a" to 1, "b" to 2, "c" to 3, etc.
    let value = char.charCodeAt(0) - 96;
    total = (total + value) % arrayLen;
  }
  return total;
}
hash("pink", 10); // 0
hash("orangered", 10); // 7
hash("cyan", 10); // 3
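With those indices in hand, storing and finding a color is just array access. The sketch below assumes lowercase keys and no collisions; slots and lookup are names invented for this example.

```javascript
// Same simple hash as above (lowercase a-z keys assumed).
function hash(key, arrayLen) {
  let total = 0;
  for (const char of key) {
    const value = char.charCodeAt(0) - 96;
    total = (total + value) % arrayLen;
  }
  return total;
}

// A 10-slot array that will hold [key, value] pairs.
const slots = new Array(10);
const colors = [
  ["pink", "#ff69b4"],
  ["orangered", "#ff4500"],
  ["cyan", "#00ffff"],
];

// Store each pair at the index its key hashes to.
for (const [key, value] of colors) {
  slots[hash(key, slots.length)] = [key, value];
}

// Hash the key again to jump straight to the right slot.
function lookup(key) {
  const entry = slots[hash(key, slots.length)];
  return entry && entry[0] === key ? entry[1] : undefined;
}

console.log(lookup("pink")); // "#ff69b4"
console.log(lookup("teal")); // undefined (nothing stored for this key)
```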
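Problems with our current hash
It only works on strings, its running time grows with the length of the key, and it could spread keys out more evenly.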
function hash(key, arrayLen) {
  let total = 0;
  for (let i = 0; i < key.length; i++) {
    let char = key[i];
    let value = char.charCodeAt(0) - 96;
    total = (total + value) % arrayLen;
  }
  return total;
}
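function hash(key, arrayLen) {
  let total = 0;
  let WEIRD_PRIME = 31;
  // Look at no more than 100 characters so long keys don't slow the hash down.
  for (let i = 0; i < Math.min(key.length, 100); i++) {
    let char = key[i];
    // Still maps "a" to 1, "b" to 2, etc., so lowercase keys are assumed.
    let value = char.charCodeAt(0) - 96;
    // Multiplying by a prime before adding makes character order matter.
    total = (total * WEIRD_PRIME + value) % arrayLen;
  }
  return total;
}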
The prime number in the hash is helpful in spreading out the keys more uniformly.
It's also helpful if the array that you're putting values into has a prime length.
You don't need to know why. (Math is complicated!) But here are some links if you're curious.
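One visible effect of the prime multiplier: anagrams no longer have to collide. The sketch below compares the two hashes from the slides under the names additiveHash and primeHash (names chosen here for clarity), using a prime array length of 53.

```javascript
// The first hash: just adds up letter values (lowercase a-z keys assumed).
function additiveHash(key, arrayLen) {
  let total = 0;
  for (let i = 0; i < key.length; i++) {
    const value = key.charCodeAt(i) - 96;
    total = (total + value) % arrayLen;
  }
  return total;
}

// The improved hash: multiplies the running total by a prime at each step.
function primeHash(key, arrayLen) {
  let total = 0;
  const WEIRD_PRIME = 31;
  for (let i = 0; i < Math.min(key.length, 100); i++) {
    const value = key.charCodeAt(i) - 96;
    total = (total * WEIRD_PRIME + value) % arrayLen;
  }
  return total;
}

// "tea" and "eat" contain the same letters, so adding them up always collides.
console.log(additiveHash("tea", 53), additiveHash("eat", 53)); // 26 26
// With the prime multiplier, letter order changes the result.
console.log(primeHash("tea", 53), primeHash("eat", 53)); // 31 33
```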
Even with a large array and a great hash function, collisions are inevitable.
There are many strategies for dealing with collisions, but we'll focus on two: separate chaining and linear probing.
With separate chaining, at each index in our array we store values using a more sophisticated data structure (e.g. an array or a linked list).
This allows us to store multiple key-value pairs at the same index.
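Example
[Figure: a 10-slot array with indices 0 to 9. Both darkblue and salmon hash to index 4. After darkblue is inserted, index 4 holds [ ["darkblue", "#00008b"] ]; after salmon is inserted, it holds [ ["darkblue", "#00008b"], ["salmon", "#fa8072"] ].]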
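Separate chaining can be sketched in a few lines. This assumes the simple additive hash from earlier and a 10-slot array, under which both keys land on index 4; chainSet and chainGet are hypothetical names, and the sketch does not handle overwriting an existing key.

```javascript
// Simple additive hash from earlier (lowercase a-z keys assumed).
function hash(key, arrayLen) {
  let total = 0;
  for (const char of key) {
    total = (total + char.charCodeAt(0) - 96) % arrayLen;
  }
  return total;
}

const buckets = new Array(10);

// Each slot holds an array (a "chain") of [key, value] pairs.
function chainSet(key, value) {
  const index = hash(key, buckets.length);
  if (!buckets[index]) buckets[index] = [];
  buckets[index].push([key, value]);
}

// Hash to the slot, then scan its chain for the matching key.
function chainGet(key) {
  const bucket = buckets[hash(key, buckets.length)] || [];
  const pair = bucket.find(([storedKey]) => storedKey === key);
  return pair ? pair[1] : undefined;
}

chainSet("darkblue", "#00008b");
chainSet("salmon", "#fa8072");

console.log(buckets[4]); // [ ["darkblue", "#00008b"], ["salmon", "#fa8072"] ]
console.log(chainGet("salmon")); // "#fa8072"
```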
With linear probing, when we find a collision, we search through the array to find the next empty slot.
Unlike separate chaining, this stores only a single key-value pair at each index.
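Example
[Figure: a 10-slot array with indices 0 to 9. darkblue, salmon, and tomato all hash to index 4. ["darkblue", "#00008b"] is stored at index 4, and ["salmon", "#fa8072"] and ["tomato", "#ff6347"] are placed in the next empty slots.]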
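Here is a rough sketch of linear probing, again assuming the simple additive hash and a 10-slot array. probeSet and probeGet are hypothetical names. Wrapping around to index 0 at the end of the array is a design choice of this sketch, and removing keys is not handled.

```javascript
// Simple additive hash from earlier (lowercase a-z keys assumed).
function hash(key, arrayLen) {
  let total = 0;
  for (const char of key) {
    total = (total + char.charCodeAt(0) - 96) % arrayLen;
  }
  return total;
}

const slots = new Array(10);

// Start at the hashed index and step forward until a free slot
// (or the same key) is found. Returns the index that was used.
function probeSet(key, value) {
  const start = hash(key, slots.length);
  for (let step = 0; step < slots.length; step++) {
    const index = (start + step) % slots.length;
    if (slots[index] === undefined || slots[index][0] === key) {
      slots[index] = [key, value];
      return index;
    }
  }
  throw new Error("table is full");
}

// Follow the same path; an empty slot means the key was never stored.
function probeGet(key) {
  const start = hash(key, slots.length);
  for (let step = 0; step < slots.length; step++) {
    const entry = slots[(start + step) % slots.length];
    if (entry === undefined) return undefined;
    if (entry[0] === key) return entry[1];
  }
  return undefined;
}

console.log(probeSet("darkblue", "#00008b")); // 4
console.log(probeSet("salmon", "#fa8072")); // 5 (index 4 was taken)
console.log(probeSet("tomato", "#ff6347")); // 6 (indices 4 and 5 were taken)
console.log(probeGet("tomato")); // "#ff6347"
```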
class HashTable {
  constructor(size=53){
    this.keyMap = new Array(size);
  }
  _hash(key) {
    let total = 0;
    let WEIRD_PRIME = 31;
    for (let i = 0; i < Math.min(key.length, 100); i++) {
      let char = key[i];
      let value = char.charCodeAt(0) - 96;
      total = (total * WEIRD_PRIME + value) % this.keyMap.length;
    }
    return total;
  }
}
Methods to implement: set, get (returns undefined when the key is not found), keys, values
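One possible way to fill in these methods is sketched below. It assumes separate chaining for collisions and lowercase string keys; the slides do not fix these choices, so treat this as one option rather than the reference implementation.

```javascript
class HashTable {
  constructor(size = 53) {
    this.keyMap = new Array(size);
  }

  _hash(key) {
    let total = 0;
    const WEIRD_PRIME = 31;
    for (let i = 0; i < Math.min(key.length, 100); i++) {
      const value = key.charCodeAt(i) - 96;
      total = (total * WEIRD_PRIME + value) % this.keyMap.length;
    }
    return total;
  }

  // Store the pair in the bucket for its index, overwriting an existing key.
  set(key, value) {
    const index = this._hash(key);
    if (!this.keyMap[index]) this.keyMap[index] = [];
    const existing = this.keyMap[index].find((pair) => pair[0] === key);
    if (existing) existing[1] = value;
    else this.keyMap[index].push([key, value]);
  }

  // Return the value stored for key, or undefined if it is not present.
  get(key) {
    const bucket = this.keyMap[this._hash(key)];
    if (!bucket) return undefined;
    const pair = bucket.find((entry) => entry[0] === key);
    return pair ? pair[1] : undefined;
  }

  // Collect every key by walking all non-empty buckets.
  keys() {
    const result = [];
    for (const bucket of this.keyMap) {
      if (!bucket) continue;
      for (const [key] of bucket) result.push(key);
    }
    return result;
  }

  // Collect every value the same way (duplicates are kept in this sketch).
  values() {
    const result = [];
    for (const bucket of this.keyMap) {
      if (!bucket) continue;
      for (const [, value] of bucket) result.push(value);
    }
    return result;
  }
}

const table = new HashTable();
table.set("pink", "#ff69b4");
table.set("cyan", "#00ffff");
console.log(table.get("pink")); // "#ff69b4"
console.log(table.get("maroon")); // undefined
```

The order of keys() and values() follows the array slots, not insertion order.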
Time complexity (average case)
A good hash function: O(1)
With the world's worst hash function...: O(n)
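The gap between O(1) and O(n) shows up in how crowded the fullest slot gets. The sketch below is an illustration under assumptions: 100 made-up two-letter keys, an array of length 53, and the hypothetical names goodHash, worstHash, and longestBucket.

```javascript
// Prime-multiplier hash from the slides (lowercase a-z keys assumed).
function goodHash(key, arrayLen) {
  let total = 0;
  const WEIRD_PRIME = 31;
  for (let i = 0; i < Math.min(key.length, 100); i++) {
    const value = key.charCodeAt(i) - 96;
    total = (total * WEIRD_PRIME + value) % arrayLen;
  }
  return total;
}

// The world's worst hash: every key goes to index 0.
function worstHash(key, arrayLen) {
  return 0;
}

// Size of the fullest slot, i.e. the most pairs one lookup might scan.
function longestBucket(hashFn, keys, arrayLen) {
  const counts = new Array(arrayLen).fill(0);
  for (const key of keys) counts[hashFn(key, arrayLen)]++;
  return Math.max(...counts);
}

// 100 made-up two-letter keys: "aa", "ab", ..., "dv".
const sampleKeys = [];
for (let i = 0; i < 100; i++) {
  const first = String.fromCharCode(97 + Math.floor(i / 26));
  const second = String.fromCharCode(97 + (i % 26));
  sampleKeys.push(first + second);
}

const goodLongest = longestBucket(goodHash, sampleKeys, 53);
const worstLongest = longestBucket(worstHash, sampleKeys, 53);
console.log(goodLongest); // a handful of keys at most
console.log(worstLongest); // 100 (every key shares one slot)
```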
Hash tables are collections of key-value pairs
Hash tables can find values quickly given a key
Hash tables can add new key-value pairs quickly
Hash tables store data in a large array, and work by hashing the keys
A good hash should be fast, distribute keys uniformly, and be deterministic
Separate chaining and linear probing are two strategies used to deal with two keys that hash to the same index
When in doubt, use a hash table!