Collision Resolution Techniques
When hash collisions occur in Java hash tables, two main techniques are used to resolve them: separate chaining and open addressing. Let's discuss each technique in detail.
Separate Chaining
Separate chaining is a collision resolution technique where each index in the hash table is associated with a linked list or a similar data structure. When a collision occurs, instead of overwriting the existing entry, the new key-value pair is appended to the linked list at that index.
For example :
class HashNode<K, V> {
K key;
V value;
HashNode<K, V> next;
public HashNode(K key, V value) {
this.key = key;
this.value = value;
this.next = null;
}
}
class HashTable<K, V> {
private ArrayList<HashNode<K, V>> bucketArray;
private int numBuckets;
private int size;
public HashTable() {
bucketArray = new ArrayList<>();
numBuckets = 10;
size = 0;
for (int i = 0; i < numBuckets; i++) {
bucketArray.add(null);
}
}
private int getBucketIndex(K key) {
int hashCode = key.hashCode();
return hashCode % numBuckets;
}
public void put(K key, V value) {
int bucketIndex = getBucketIndex(key);
HashNode<K, V> head = bucketArray.get(bucketIndex);
while (head != null) {
if (head.key.equals(key)) {
head.value = value;
return;
}
head = head.next;
}
size++;
head = bucketArray.get(bucketIndex);
HashNode<K, V> newNode = new HashNode<>(key, value);
newNode.next = head;
bucketArray.set(bucketIndex, newNode);
}
// Other methods (get, remove, etc.) omitted for brevity
}
In this example, each index in the `bucketArray` represents a bucket, and each bucket is a linked list of `HashNode` objects. When a key-value pair is inserted using the `put` method, the bucket index is calculated using the `getBucketIndex` method, which applies the hash function to the key.
If a collision occurs (i.e., there are already nodes in the linked list at that index), the new node is added at the beginning of the linked list. This allows multiple key-value pairs with the same hash code to coexist in the hash table.
Separate chaining provides good performance for hash tables with a low to moderate number of collisions. However, if the number of collisions is high and the linked lists become long, the search time for a specific key can degrade to O(n) in the worst case, where n is the number of elements in the linked list.
Open Addressing
Open addressing is another collision resolution technique used in Java hash tables. Unlike separate chaining, open addressing does not use linked lists to handle collisions. Instead, when a collision occurs, the hash table probes for the next empty slot in the array to store the key-value pair.
There are different variations of open addressing, but we'll focus on one common approach called linear probing.
Linear Probing
In linear probing, when a collision occurs, the hash table linearly searches for the next empty slot in the array. The probing sequence starts from the original hash index and increments by a fixed step size (usually 1) until an empty slot is found.
For example:
import java.lang.reflect.Array;
class HashTable<K, V> {
private class Entry<K, V> {
K key;
V value;
public Entry(K key, V value) {
this.key = key;
this.value = value;
}
}
private Entry<K, V>[] table;
private int capacity;
private int size;
public HashTable(int capacity) {
this.capacity = capacity;
this.size = 0;
table = (Entry<K, V>[]) Array.newInstance(Entry.class, capacity);
}
private int hash(K key) {
return Math.abs(key.hashCode()) % capacity;
}
public void put(K key, V value) {
int index = hash(key);
while (table[index] != null && !table[index].key.equals(key)) {
index = (index + 1) % capacity;
}
if (table[index] == null) {
size++;
}
table[index] = new Entry<>(key, value);
if ((float) size / capacity > 0.7) {
rehash();
}
}
public V get(K key) {
int index = hash(key);
while (table[index] != null) {
if (table[index].key.equals(key)) {
return table[index].value;
}
index = (index + 1) % capacity;
}
return null;
}
private void rehash() {
capacity = capacity * 2;
Entry<K, V>[] oldTable = table;
table = (Entry<K, V>[]) Array.newInstance(Entry.class, capacity);
size = 0;
for (Entry<K, V> entry : oldTable) {
if (entry != null) {
put(entry.key, entry.value);
}
}
}
public int size() {
return size;
}
public boolean isEmpty() {
return size == 0;
}
}
In this example, the `table` array stores the key-value pairs as `Entry` objects. The `hash` method calculates the initial index using the key's hash code modulo the table's capacity.
When inserting a key-value pair using the `put` method, if a collision occurs (i.e., the slot at the calculated index is already occupied), linear probing is applied. The code increments the index by 1 (modulo the capacity to wrap around) until an empty slot is found or the key is already present in the table.
Linear probing has the advantage of maintaining good cache locality, as the probed slots are contiguous in memory. However, it can suffer from clustering, where collisions tend to cluster together, leading to longer probe sequences and reduced performance.
Other open-addressing techniques, like quadratic probing and double hashing, can reduce the effects of clustering. These techniques use different probing sequences to distribute the collisions more evenly across the table.
Note: Always remember that open addressing requires the hash table to have sufficient empty slots to accommodate the collisions. If the load factor (ratio of filled slots to total slots) becomes too high, the hash table's performance degrades significantly. In such cases, the hash table must be resized and rehashed to a larger capacity.
Frequently Asked Questions
What is the time complexity of inserting a key-value pair in a hash table with linear probing?
The average case time complexity of inserting a key-value pair in a hash table with linear probing is O(1). However, in the worst case, when there are many collisions and the hash table becomes full, the time complexity can degrade to O(n), where n is the size of the hash table.
How does the load factor affect the performance of a hash table with linear probing?
The load factor, which is the ratio of the number of filled slots to the total number of slots in the hash table, plays a crucial role in the performance of a hash table with linear probing. As the load factor increases, the likelihood of collisions also increases, leading to longer probe sequences and degraded performance.
Can linear probing handle duplicate keys in a hash table?
Yes, linear probing can handle duplicate keys in a hash table. When a duplicate key is inserted, the linear probing technique will search for the next empty slot to store the new key-value pair. However, it's important to note that the behavior of handling duplicate keys depends on the specific implementation and requirements of the hash table.
Conclusion
In this article, we discussed the concept of hash collisions in Java and different techniques to resolve them, which are separate chaining & linear probing. We learned that collisions occur when multiple keys map to the same hash code and saw how separate chaining uses linked lists to handle collisions while linear probing linearly searches for the next empty slot.
You can also check out our other blogs on Code360.