Cuckoo Hashing is a method for implementing a hash table. It achieves constant time worst-case complexity for lookups, unlike most other hash table techniques.
Collisions are handled by evicting existing keys and moving them from one array to another. This technique resembles how a cuckoo chick pushes out an egg from the nest to make room for itself, hence the name Cuckoo Hashing.
Problem Statement
Cuckoo Hashing is a method used to resolve the problem when a collision occurs in a Hash Table. There are high chances of collisions between two hash values of a hash function in a table. It occurs when two hash values for the same key occur in the hash function of a table.
The name, Cuckoo Hashing, is derived from some characteristic of a cuckoo, as a cuckoo chick shoves or pushes the other eggs or the young ones out of the nest to make a place for its own. In Cuckoo Hashing, we try to insert a new key into the hash table; we push the older key to the new place. And that is how Cuckoo Hashing is implemented.
Cuckoo Hashing Operations
Insertion
In the initial hash table, a new element is always added. In the event of a collision, the existing element is removed and added to the second hash table. The second existing element will be removed and added to the first hash table if that results in a collision, and so on. This continues until a bucket is discovered to be empty. The Cuckoo Hash table below has the key as an example. Keys are pushed around until a vacant slot is discovered. Rehashing occurs if the number of displacements reaches a specific limit (for instance, due to a cycle among the inserted keys).
Rehashing is a linear operation, so the worst-case complexity is O(n). However, as with other hashing techniques, this method can show the amortized run time to be O(1). The proof for this is non-trivial. The performance degrades as the load factor surpasses 50%.
Lookup
If a key exists, it will be stored in its original bucket, either in the first or second array. In other words, at most, two lookups are needed to figure out if the key exists or not. Thus the worst-case complexity is O(1).
Removal
A value is removed simply by clearing the bucket and storing the key. Again, the worst-case complexity is O(1). There are no chains and therefore, no need to use deleted markings or so-called tombstones.
Stashing
There is a little chance that a cycle will develop among the initial few added parts. Even with a low load factor, this can cause a rehash. The stash, a constant-sized array, can be used as a defence against this.
When a key has to be inserted, but there isn't a free bucket available, the key is kept in the stash. In addition to the two arrays, the lookup technique is also updated to search in the stash. Rehashing is done when hashing cannot add a key and the stash is full.
D-Cuckoo Hashing
Cuckoo hashing uses a random but fixed number of internal hash tables.
As with other techniques leveraging multiple hash functions, more hash functions increase the spread and reduce the likelihood of rehash for a given number of insertions. The improvement going from two layers to three (and beyond) is smaller than the improvement gained from one to two.
#include<bits/stdc++.h>
#define MAX_NUMBER 11
#define ver 2
int hashtable[ver][MAX_NUMBER];
// Array to store positions for a key
int idx[ver];
/* function to fill hash_it table with dummy value*/
void initialize_table()
{
for (int j=0; j<MAX_NUMBER; j++)
for (int i=0; i<ver; i++)
hashtable[i][j] = INT_MIN;
}
/* return hashed value for a key */
int hash_it(int function, int key)
{
switch (function)
{
case 1: return key%MAX_NUMBER;
case 2: return (key/MAX_NUMBER)%MAX_NUMBER;
}
}
/* function that places a key in one of its possible positions
* ID_value: table in which key has to be placed.
* count: number of times function has already been called
* n: maximum number of times function can be recursively
called before stopping of cycle */
void place_it(int key, int ID_value, int count, int n)
{
/* if function has been recursively called max number
of times, stop and declare cycle. Rehash. */
if (count==n){
printf("%d unpositioned\n", key);
printf("Cycle present. REHASH.\n");
return;
}
for (int i=0; i<ver; i++){
idx[i] = hash_it(i+1, key);
if (hashtable[i][idx[i]] == key)
return;
}
/* check if another key is already present or not
* If YES: place the new key in its position
* and the older key in a different position
for it in the next table */
if (hashtable[ID_value][idx[ID_value]]!=INT_MIN)
{
int dis = hashtable[ID_value][idx[ID_value]];
hashtable[ID_value][idx[ID_value]] = key;
place_it(dis, (ID_value+1)%ver, count+1, n);
}
else
hashtable[ID_value][idx[ID_value]] = key;
}
//prints table contents
void print_hash()
{
printf("Final hash_it tables:\n");
for (int i=0; i<ver; i++, printf("\n"))
for (int j=0; j<MAX_NUMBER; j++)
(hashtable[i][j]==INT_MIN)? printf("- "):
printf("%d ", hashtable[i][j]);
printf("\n");
}
/* function for Cuckoo-hashing the keys */
void cuckoo_hash(int keys[], int n)
{
// initialize hash_it tables to a dummy value (INT-MIN)
// indicating empty position
initialize_table();
// start with placing every key at its position in
// the first hash_it table according to first hash_it
// function
for (int i=0, count=0; i<n; i++, count=0)
place_it(keys[i], 0, count, n);
//print the final hash_it tables
print_hash();
}
/* driver function */
int main()
{
int keys_one[] = {20, 50, 53, 75, 100, 67, 105, 3, 36, 39};
int n = sizeof(keys_one)/sizeof(int);
cuckoo_hash(keys_one, n);
/* This array has a cycle and hence we will have to rehash */
int keys_two[] = {20, 50, 53, 75, 100, 67, 105, 3, 36, 39, 6};
int m = sizeof(keys_two)/sizeof(int);
cuckoo_hash(keys_two, m);
return 0;
}
You can also try this code with Online C++ Compiler
Cuckoo hashing helps in achieving the worst-case time complexity of O(1)
What is multiple-choice hashing?
Multiple choice hashing includes giving each element multiple choices for positions where it can reside in the hash table.
What is relocation hashing?
Unlike multiple-choice hashing, relocation hashing allows elements in the hash table to move after being placed.
What is the open addressing?
In open addressing, it is allowed for elements to overflow out of their target bucket and into other spaces.
What is the closed addressing?
In closed addressing, the colliding elements are stored in an auxiliary data structure like a linked list or a binary search tree.
Conclusion
In this article, we have extensively discussed Cuckoo Hashing, its time and space complexities, and its implementation.
You can also check our previous blogs on JAR files, Azure Cognitive Search, and View and View Controllers. If you are eager to learn advanced front-end web development, Coding Ninjas is here with one of the best courses available, which you can find here.