Introduction
Hashing is one of the elementary concepts in computer programming and is heavily used by developers for many use cases. Hashing is used to index and retrieve information from the database quickly.
Hashing is the process of mapping a key to a value using hash functions. Hash functions are mathematical functions that help us hash key-value pairs.
Consider a hash function: Y = K % N, where ‘K’ is a key, ‘N’ is the size of the hash table, and ‘Y’ is the generated value.
Let us suppose the size of the hash table is 10. In that case, the values will be generated in the following ways.
(Also see, Index Mapping)
Key |
Function |
Value |
17 |
17 % 10 = 7 |
7 |
23 |
23 % 10 = 3 |
3 |
34 |
34 % 10 = 4 |
4 |
51 |
51 % 10 = 1 |
1 |
67 |
67 % 10 = 7 |
7 |
Here is a catch. The hash value for both 17 and 67 is 7. Should they be mapped to the same value? Definitely not. This situation, where one or more keys map to the same value, is called a collision. There are several methods to combat such problems. Let us briefly discuss them here.
Collision Handling Techniques
There are several collision handling techniques. We will discuss a few of them here.
Linear Probing
Whenever a collision occurs, we try to find a new empty location for the key down the hash table. Whenever we find an empty location, we store the key on that location.
Notice that there is a collision between 17 and 67. At first, 17 is hashed using 17 % 5 to generate the value as 2. Now, for the key 67, value = 67 % 5, which is again 2. Hence, hashing of 17 and 67 results in a collision. Here we will use the concept of linear probing. We will search for the first empty location down the hash table. The first empty location after 2 is 3. Therefore, we insert 67 at location 3.
Now comes 39, which is hashed to 39 % 5 = 4. Since location 4 is empty, we insert 39 at location 4.
Then for key 43, there is another collision. Since 43 % 5 = 3, it should be inserted at location 3, we can see that location 3 is already occupied by key 67. Therefore, we will now search for the first empty location in the hash table. So, we found location 0 as an empty location. Thus, 43 is inserted at location 0.
In the end, the key 51 is hashed to 51 % 5 = 1. Since location 1 is empty, 51 is inserted there.
Now when we want to look up the table for a key, we just have to find a hash of the key.
For example, if we want to search 17 in the above hash table, we calculate its hash. So, value = 17 % 5 = 2. Therefore, we look up at location 2. If we don’t find it at that particular location, we need to look down the hash table until we find it. If the entire table is looked up and the key is not found, it means the value does not exist in the hashtable.
Chaining
Chaining is a simple technique to remove collisions. In this technique, we implement the hash table as an array of linked lists. We find the hash of a key, as usual, using the hash function. If a collision occurs, we attach the given key at the end of the linked list in that location.
In this example, there are three collisions. Firstly, 17 is hashed to 2. Then 67 is again hashed to 2, and finally, 32 is also hashed to 2. So, as per chaining, they are linked together in a linked list.
Key |
Value |
Linked List Representation |
17, 67, 32 |
Value = 2 |
2 -> 17 -> 67 -> 32 |
43 |
Value = 3 |
3 -> 43 |
59 |
Value = 4 |
4 -> 59 |
When we want to look up a key, we find its hash. Then if we want to find an element, we have to traverse the linked list.
In this case, insertion, search, and deletion take O(n) time in the worst case.
However, the amortized or the average complexity is still O(1) for insertion, deletion, and search.
Double Hashing
This collision resolution technique is quite efficient. As the name suggests, we apply two hash functions to an element. We apply the second hash only if a collision occurs after applying the first hash.
Double hashing is performed in the following way:
Hash1(key) = key % size of hash table.
Hash2(key) = prime - (prime - key % prime), where ‘prime’ is the prime number just smaller than the size of the hash table.
Value = (Hash1 + i * Hash2) % size of the hash table, where ‘i’ is initialized with one and incremented if there is a collision.
Let us consider an example to understand double hashing.
Suppose we want to insert the numbers: 17, 67, 32, 45, and 98 in the hash table. Therefore, using double hashing, we can generate the values corresponding to every key in the following way:
Key |
Function |
Value |
17 |
Hash1 = 17 % 5 = 2. |
2 |
67 |
Hash1 = 67 % 5 = 2, Hash2 = 3 - 67 % 3 = 2, Value = (2 + 1 *1) % 5 = 3. |
3 |
32 |
Hash1 = 32 % 5 = 2, Hash2 = 3 - 32 % 3 = 1, Value = (2 + 1 *1) % 5 = 3, Value = (2 + 2 *1) % 5 = 4. |
4 |
45 |
Hash1 = 45 % 5 =0 |
0 |
98 |
Hash1 = 98 % 5 = 3, Hash2 = 3 - 98 % 3 = 1, Value = (3 + 1 * 1) % 5 = 4, Value = (3 + 2 * 1) % 5 = 0. Value = (3 + 3 * 1) % 5= 1. |
1 |
Similar to the above case, the time complexity for insertion, deletion, and search is O(N). However, the amortized time complexity is still O(1) for all three operations.