Table of contents
1.
Introduction
2.
Understanding Collisions in Java Hash Tables
3.
Collision Resolution Techniques
3.1.
Separate Chaining
3.2.
Open Addressing
3.2.1.
Linear Probing
4.
Frequently Asked Questions
4.1.
What is the time complexity of inserting a key-value pair in a hash table with linear probing?
4.2.
How does the load factor affect the performance of a hash table with linear probing?
4.3.
Can linear probing handle duplicate keys in a hash table?
5.
Conclusion
Last Updated: Nov 11, 2024
Easy

Hash Collisions in Java

Author Rinki Deka
0 upvote
Career growth poll
Do you think IIT Guwahati certified course can help you in your career?

Introduction

Hash tables are an essential data structure in Java, which offers fast and efficient storage and retrieval of key-value pairs. They use a hash function to map keys to specific indices in an array, which gives quick access to the associated values. However, when two or more keys generate the same hash code, a hash collision occurs. This means that multiple key-value pairs are mapped to the same index in the hash table, which leads to potential conflicts. Handling hash collisions effectively is crucial to maintaining the performance and integrity of the hash table. In Java, there are several collision resolution techniques, like separate chaining and open addressing, that help resolve collisions and ensure the proper functioning of the hash table.

Hash Collisions in Java

In this article, we will discuss the concept of hash collisions in Java and discuss different collision resolution techniques, with the help of examples to show their implementation.

Understanding Collisions in Java Hash Tables

Hash tables in Java, like HashMap and HashSet, use a hash function to map keys to indices in an underlying array. The hash function takes a key and returns an integer value, known as the hash code. The hash code is then used to calculate the index where the key-value pair should be stored in the array.

Ideally, each key would have a unique hash code, allowing for direct mapping to distinct indices. However, in practice, different keys can generate the same hash code, a situation called a hash collision.

When a collision occurs, multiple key-value pairs are mapped to the same index in the hash table. If not handled properly, collisions can lead to incorrect data retrieval or overwriting of existing values.

For example : 

Suppose we have a simple hash function that calculates the hash code by summing the ASCII values of the characters in a string key:

public int hashCode(String key) {
    int sum = 0;
    for (char c : key.toCharArray()) {
        sum += (int) c;
    }
    return sum % 10; // Modulo 10 to fit within array size
}


Now, let's say we have two different keys, "apple" and "papa", that we want to store in a hash table with a size of 10. Using the above hash function, both keys generate the same hash code:

"apple" => 97 + 112 + 112 + 108 + 101 = 530 % 10 = 0

"papa"  => 112 + 97 + 112 + 97 = 418 % 10 = 0


In this case, both "apple" and "papa" would be mapped to index 0 in the hash table, resulting in a collision.

Collisions are a common occurrence in hash tables, especially when the number of keys is large compared to the size of the array. Handling collisions efficiently is essential to maintain the performance and correctness of hash table operations.

Collision Resolution Techniques

When hash collisions occur in Java hash tables, two main techniques are used to resolve them: separate chaining and open addressing. Let's discuss each technique in detail.

Separate Chaining

Separate chaining is a collision resolution technique where each index in the hash table is associated with a linked list or a similar data structure. When a collision occurs, instead of overwriting the existing entry, the new key-value pair is appended to the linked list at that index.

For example : 

class HashNode<K, V> {
    K key;
    V value;
    HashNode<K, V> next;

    public HashNode(K key, V value) {
        this.key = key;
        this.value = value;
        this.next = null;
    }
}

class HashTable<K, V> {
    private ArrayList<HashNode<K, V>> bucketArray;
    private int numBuckets;
    private int size;

    public HashTable() {
        bucketArray = new ArrayList<>();
        numBuckets = 10;
        size = 0;


        for (int i = 0; i < numBuckets; i++) {
            bucketArray.add(null);
        }
    }

    private int getBucketIndex(K key) {
        int hashCode = key.hashCode();
        return hashCode % numBuckets;
    }

    public void put(K key, V value) {
        int bucketIndex = getBucketIndex(key);
        HashNode<K, V> head = bucketArray.get(bucketIndex);

        while (head != null) {
            if (head.key.equals(key)) {
                head.value = value;
                return;
            }
            head = head.next;
        }


        size++;
        head = bucketArray.get(bucketIndex);
        HashNode<K, V> newNode = new HashNode<>(key, value);
        newNode.next = head;
        bucketArray.set(bucketIndex, newNode);
    }


    // Other methods (get, remove, etc.) omitted for brevity
}


In this example, each index in the `bucketArray` represents a bucket, and each bucket is a linked list of `HashNode` objects. When a key-value pair is inserted using the `put` method, the bucket index is calculated using the `getBucketIndex` method, which applies the hash function to the key.

If a collision occurs (i.e., there are already nodes in the linked list at that index), the new node is added at the beginning of the linked list. This allows multiple key-value pairs with the same hash code to coexist in the hash table.

Separate chaining provides good performance for hash tables with a low to moderate number of collisions. However, if the number of collisions is high and the linked lists become long, the search time for a specific key can degrade to O(n) in the worst case, where n is the number of elements in the linked list.

Open Addressing

Open addressing is another collision resolution technique used in Java hash tables. Unlike separate chaining, open addressing does not use linked lists to handle collisions. Instead, when a collision occurs, the hash table probes for the next empty slot in the array to store the key-value pair.

There are different variations of open addressing, but we'll focus on one common approach called linear probing.

Linear Probing

In linear probing, when a collision occurs, the hash table linearly searches for the next empty slot in the array. The probing sequence starts from the original hash index and increments by a fixed step size (usually 1) until an empty slot is found.

For example: 

import java.lang.reflect.Array;

class HashTable<K, V> {
    private class Entry<K, V> {
        K key;
        V value;

        public Entry(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    private Entry<K, V>[] table;
    private int capacity;
    private int size;

    public HashTable(int capacity) {
        this.capacity = capacity;
        this.size = 0;
        table = (Entry<K, V>[]) Array.newInstance(Entry.class, capacity);
    }

    private int hash(K key) {
        return Math.abs(key.hashCode()) % capacity;
    }

    public void put(K key, V value) {
        int index = hash(key);

        while (table[index] != null && !table[index].key.equals(key)) {
            index = (index + 1) % capacity;
        }

        if (table[index] == null) {
            size++;
        }

        table[index] = new Entry<>(key, value);

        if ((float) size / capacity > 0.7) {
            rehash();
        }
    }

    public V get(K key) {
        int index = hash(key);
        while (table[index] != null) {
            if (table[index].key.equals(key)) {
                return table[index].value;
            }
            index = (index + 1) % capacity;
        }
        return null;
    }

    private void rehash() {
        capacity = capacity * 2;
        Entry<K, V>[] oldTable = table;
        table = (Entry<K, V>[]) Array.newInstance(Entry.class, capacity);
        size = 0;

        for (Entry<K, V> entry : oldTable) {
            if (entry != null) {
                put(entry.key, entry.value);
            }
        }
    }

    public int size() {
        return size;
    }

    public boolean isEmpty() {
        return size == 0;
    }
}


In this example, the `table` array stores the key-value pairs as `Entry` objects. The `hash` method calculates the initial index using the key's hash code modulo the table's capacity.

When inserting a key-value pair using the `put` method, if a collision occurs (i.e., the slot at the calculated index is already occupied), linear probing is applied. The code increments the index by 1 (modulo the capacity to wrap around) until an empty slot is found or the key is already present in the table.

Linear probing has the advantage of maintaining good cache locality, as the probed slots are contiguous in memory. However, it can suffer from clustering, where collisions tend to cluster together, leading to longer probe sequences and reduced performance.

Other open-addressing techniques, like quadratic probing and double hashing, can reduce the effects of clustering. These techniques use different probing sequences to distribute the collisions more evenly across the table.

Note: Always remember that open addressing requires the hash table to have sufficient empty slots to accommodate the collisions. If the load factor (ratio of filled slots to total slots) becomes too high, the hash table's performance degrades significantly. In such cases, the hash table must be resized and rehashed to a larger capacity.

Frequently Asked Questions

What is the time complexity of inserting a key-value pair in a hash table with linear probing?

The average case time complexity of inserting a key-value pair in a hash table with linear probing is O(1). However, in the worst case, when there are many collisions and the hash table becomes full, the time complexity can degrade to O(n), where n is the size of the hash table.

How does the load factor affect the performance of a hash table with linear probing?

The load factor, which is the ratio of the number of filled slots to the total number of slots in the hash table, plays a crucial role in the performance of a hash table with linear probing. As the load factor increases, the likelihood of collisions also increases, leading to longer probe sequences and degraded performance. 

Can linear probing handle duplicate keys in a hash table?

Yes, linear probing can handle duplicate keys in a hash table. When a duplicate key is inserted, the linear probing technique will search for the next empty slot to store the new key-value pair. However, it's important to note that the behavior of handling duplicate keys depends on the specific implementation and requirements of the hash table.

Conclusion

In this article, we discussed the concept of hash collisions in Java and different techniques to resolve them, which are separate chaining & linear probing. We learned that collisions occur when multiple keys map to the same hash code and saw how separate chaining uses linked lists to handle collisions while linear probing linearly searches for the next empty slot. 

You can also check out our other blogs on Code360.

Live masterclass