How does Hashing Work?
Hashing works by using a hash function to convert an input key into a compact numerical value, which serves as an index in the hash table. Let’s look at a step-by-step process:
1. Hash Function: The hash function takes the input key & applies a mathematical algorithm to generate a hash value. The hash value is typically an integer that falls within the range of the hash table's size.
2. Compression: The hash value is then compressed or truncated to fit within the size of the hash table. This is done using the modulo operator (%), which returns the remainder of the hash value divided by the table size. For example, if the hash value is 1234 & the table size is 100, the compressed index would be 34 (1234 % 100).
3. Indexing: The compressed hash value is used as an index to access the corresponding "bucket" or slot in the hash table. Each bucket can store one or more key-value pairs that have the same hash value.
4. Collision Resolution: When multiple keys map to the same hash value (known as a collision), a collision resolution technique is employed. Two common methods are:
- Chaining: Each bucket in the hash table is a linked list. When a collision occurs, the new key-value pair is appended to the linked list at that index.
- Open Addressing: If a collision occurs, the hash table probes for the next empty slot using a predefined sequence (e.g., linear probing, quadratic probing).
5. Retrieval: To retrieve a value associated with a key, the key is hashed again to obtain the index. The corresponding bucket is then searched to find the matching key-value pair.
Let’s see a visual representation of the hashing process:
[Input Key] -> [Hash Function] -> [Hash Value] -> [Compression] -> [Hash Table Index]
For example :
Map<String, Integer> hashMap = new HashMap<>();
hashMap.put("apple", 10);
hashMap.put("banana", 20);
hashMap.put("orange", 30);
int value = hashMap.get("banana");
System.out.println("Value for 'banana': " + value);
You can also try this code with Online Java Compiler
Run Code
In this example, the strings "apple", "banana" & "orange" are used as keys, & their corresponding values are stored in the `HashMap`. When `get("banana")` is called, the hash function calculates the hash value for "banana", compresses it to an index & retrieves the associated value 20.
Methods to implement Hashing in Java
Java provides several built-in classes that implement the hashing concept. Each class has its own characteristics and use cases. Let's discuss them one by one.
1. HashTable (A synchronized implementation of hashing):
- `Hashtable` is a synchronized implementation of a hash table, meaning multiple threads can safely access it concurrently.
- It stores key-value pairs & does not allow null keys or values.
- Hashtable is slower compared to `HashMap` due to the overhead of synchronization.
For example :
Hashtable<String, Integer> hashtable = new Hashtable<>();
hashtable.put("key1", 10);
hashtable.put("key2", 20);
int value = hashtable.get("key1");
2. HashMap (A non-synchronized faster implementation of hashing):
- `HashMap` is a non-synchronized implementation of a hash table, offering better performance than `Hashtable`.
- It allows one null key & multiple null values.
- HashMap is not thread-safe & should be used in a single-threaded environment or with manual synchronization.
For example :
HashMap<String, Integer> hashMap = new HashMap<>();
hashMap.put("key1", 10);
hashMap.put("key2", 20);
int value = hashMap.get("key1");
3. LinkedHashMap (Similar to HashMap, but keeps order of elements):
- `LinkedHashMap` is an extension of `HashMap` that maintains the insertion order of elements.
- It has slightly lower performance than `HashMap` due to the additional overhead of maintaining the order.
- LinkedHashMap is useful when the order of elements needs to be preserved.
For example :
LinkedHashMap<String, Integer> linkedHashMap = new LinkedHashMap<>();
linkedHashMap.put("key1", 10);
linkedHashMap.put("key2", 20);
for (Map.Entry<String, Integer> entry : linkedHashMap.entrySet()) {
System.out.println(entry.getKey() + ": " + entry.getValue());
}
4. ConcurrentHashMap (Similar to Hashtable, Synchronized, but faster as multiple locks are used):
- `ConcurrentHashMap` is a thread-safe implementation of a hash table designed for concurrent access.
- It achieves better performance than `Hashtable` by using multiple locks, allowing concurrent reads & segmented writes.
- ConcurrentHashMap does not allow null keys or values.
For example :
ConcurrentHashMap<String, Integer> concurrentHashMap = new ConcurrentHashMap<>();
concurrentHashMap.put("key1", 10);
concurrentHashMap.put("key2", 20);
int value = concurrentHashMap.get("key1");
With the help of HashSet (Similar to HashMap, but maintains only keys, not pair): `HashSet` is a class in Java that implements the Set interface using a hash table. It is similar to `HashMap` but only stores unique keys, not key-value pairs.
Main characteristics of `HashSet`are
- Stores unique elements: `HashSet` does not allow duplicate elements. If you try to add an element that already exists, it will not be added again.
- No guaranteed order: The elements in a `HashSet` are not stored in any particular order. The order may change over time as elements are added or removed.
- Null value allowed: `HashSet` allows storing one null value.
- Non-synchronized: `HashSet` is not synchronized, meaning multiple threads can access it simultaneously without explicit synchronization.
For example:
import java.util.HashSet;
public class Main {
public static void main(String[] args) {
HashSet<String> hashSet = new HashSet<>();
hashSet.add("apple");
hashSet.add("banana");
hashSet.add("apple"); // This will not be added as "apple" already exists
hashSet.add(null); // Null value is allowed
System.out.println("Contains 'apple': " + hashSet.contains("apple"));
System.out.println("Size: " + hashSet.size());
for (String element : hashSet) {
System.out.println(element);
}
}
}
You can also try this code with Online Java Compiler
Run Code
In this example:
1. We create a `HashSet` called `hashSet` to store strings.
2. We add elements "apple" & "banana" to the set using the `add()` method.
3. We try to add "apple" again, but it will not be added as it already exists in the set.
4. We add a null value, which is allowed in `HashSet`.
5. We check if the set contains the element "apple" using the `contains()` method.
6. We print the size of the set using the `size()` method.
7. We iterate over the elements of the set using a for-each loop & print each element.
Output:
Contains 'apple': true
Size: 3
null
banana
apple
As you can see, `HashSet` maintains only unique elements & provides methods to add, check existence & iterate over the elements.
With the help of LinkedHashSet (Similar to LinkedHashMap, but maintains only keys, not pair):
`LinkedHashSet` is a class in Java that extends `HashSet` & implements the Set interface. It maintains a linked list of the entries in the set, in the order they were inserted.
Key characteristics of `LinkedHashSet`
- Maintains insertion order: Unlike `HashSet`, `LinkedHashSet` maintains the order in which elements were inserted into the set.
- Unique elements: Like `HashSet`, `LinkedHashSet` does not allow duplicate elements.
- Performance: `LinkedHashSet` provides constant-time performance for basic operations (add, remove, contains, size), assuming the hash function disperses elements properly among the buckets.
For example :
import java.util.LinkedHashSet;
public class Main {
public static void main(String[] args) {
LinkedHashSet<String> linkedHashSet = new LinkedHashSet<>();
linkedHashSet.add("apple");
linkedHashSet.add("banana");
linkedHashSet.add("cherry");
linkedHashSet.add("banana"); // This will not be added as "banana" already exists
System.out.println("Contains 'banana': " + linkedHashSet.contains("banana"));
System.out.println("Size: " + linkedHashSet.size());
for (String element : linkedHashSet) {
System.out.println(element);
}
}
}
You can also try this code with Online Java Compiler
Run Code
In this example:
1. We create a `LinkedHashSet` called `linkedHashSet` to store strings.
2. We add elements "apple", "banana", & "cherry" to the set using the `add()` method.
3. We try to add "banana" again, but it will not be added as it already exists in the set.
4. We check if the set contains the element "banana" using the `contains()` method.
5. We print the size of the set using the `size()` method.
6. We iterate over the elements of the set using a for-each loop & print each element.
Output:
Contains 'banana': true
Size: 3
apple
banana
cherry
As you can see, `LinkedHashSet` maintains the insertion order of the elements while ensuring uniqueness.
Let’s look at a comparison with `HashSet`, how they are different:
- `LinkedHashSet` maintains the insertion order of elements, whereas `HashSet` does not provide any guarantee on the order.
- `LinkedHashSet` has slightly higher memory overhead compared to `HashSet` due to the additional linked list structure.
- `LinkedHashSet` is useful when you need to maintain the insertion order of elements in the set.
With the help of TreeSet (Implements the SortedSet interface, Objects are stored in a sorted & ascending order):
`TreeSet` is a class in Java that implements the `SortedSet` interface & stores elements in a sorted & ascending order. It uses a self-balancing binary search tree (Red-Black tree) internally to maintain the order of elements.
Key characteristics of `TreeSet`
- Sorted order: Elements in a `TreeSet` are stored in their natural ordering (ascending order) or by a custom comparator provided at the time of creation.
- Unique elements: `TreeSet` does not allow duplicate elements. If you try to add an element that already exists, it will not be added again.
- Implements `SortedSet`: `TreeSet` provides additional methods from the `SortedSet` interface, such as `first()`, `last()`, `headSet()`, `tailSet()`, & `subSet()`.
- Performance: `TreeSet` provides guaranteed log(n) time for basic operations like add, remove & contains.
For example:
import java.util.TreeSet;
public class Main {
public static void main(String[] args) {
TreeSet<Integer> treeSet = new TreeSet<>();
treeSet.add(5);
treeSet.add(2);
treeSet.add(8);
treeSet.add(1);
treeSet.add(5); // This will not be added as 5 already exists
System.out.println("Contains 5: " + treeSet.contains(5));
System.out.println("Size: " + treeSet.size());
System.out.println("First element: " + treeSet.first());
System.out.println("Last element: " + treeSet.last());
for (Integer element : treeSet) {
System.out.println(element);
}
}
}
You can also try this code with Online Java Compiler
Run Code
In this example:
1. We create a `TreeSet` called `treeSet` to store integers.
2. We add elements 5, 2, 8, & 1 to the set using the `add()` method.
3. We try to add 5 again, but it will not be added as it already exists in the set.
4. We check if the set contains the element 5 using the `contains()` method.
5. We print the size of the set using the `size()` method.
6. We retrieve & print the first & last elements of the set using `first()` & `last()` methods.
7. We iterate over the elements of the set using a for-each loop & print each element.
Output:
Contains 5: true
Size: 4
First element: 1
Last element: 8
1
2
5
8
As you can see, `TreeSet` stores the elements in a sorted & ascending order, maintains uniqueness & provides additional methods for retrieving elements based on their order.
Let’s have a basic comparison of `HashSet` with `LinkedHashSet`:
- `TreeSet` stores elements in a sorted order, whereas `HashSet` & `LinkedHashSet` do not provide any ordering guarantees.
`TreeSet` has a higher memory overhead and slightly slower performance compared to `HashSet` and `LinkedHashSet` due to the maintenance of the self-balancing binary search tree.
- `TreeSet` is useful when you need to maintain elements in a sorted order & perform range-based operations efficiently.
Use Cases of Hashing in Java
1. Efficient data retrieval: Hashing provides a fast & efficient way to retrieve data from a collection based on a unique key. It allows for constant-time (O(1)) lookups, making it ideal for scenarios where quick data access is crucial, such as in databases, caches, & symbol tables.
2. Detecting duplicates: Hashing can quickly detect duplicate elements in a collection. By storing elements in a hash table, you can easily check if an element already exists by calculating its hash code and comparing it with the existing elements. This is useful in scenarios like removing duplicates from a list or finding unique elements in a dataset.
3. Indexing: Hashing is commonly used in indexing mechanisms to facilitate fast data searching and retrieval. By creating a hash table index based on a key, you can quickly locate the corresponding data without scanning the entire dataset. This is particularly useful in large databases or search engines, where efficient indexing is essential for optimal performance.
4. Caching: Hashing is often used in caching systems to store frequently accessed data in memory. By using a hash table, you can quickly retrieve cached data based on a key, avoiding the need to fetch it from a slower storage medium. This improves application performance by reducing the latency of data access.
5. Counting frequencies: Hashing can be employed to count the frequency of elements in a collection. By using a hash table to store elements as keys and their frequencies as values, you can efficiently update and retrieve the count of each element. This is handy in scenarios like word frequency analysis, network traffic monitoring, or market basket analysis.
6. Memoization: Memoization is a technique used to optimize recursive algorithms by caching the results of expensive function calls. Hashing plays a role in memoization by allowing you to store the computed results in a hash table, keyed by the input parameters. Subsequent calls with the same parameters can retrieve the cached result instead of recomputing it, improving the algorithm's efficiency.
7. Load balancing: Hashing is utilized in load balancing techniques to distribute data or tasks across multiple servers or nodes. By applying a hash function to a key (e.g., user ID, request ID), you can determine which server or node should handle the data or request. This helps achieve an even distribution of load and optimize resource utilization in distributed systems.
Benefits of Hashing in Java
1. Fast data retrieval: Hashing provides constant-time (O(1)) average-case complexity for data retrieval operations. By using a hash function to map keys to specific indices in the hash table, accessing elements becomes much faster compared to linear search or other search algorithms. This efficiency is particularly beneficial when dealing with large datasets or frequent lookup operations.
2. Efficient memory utilization: Hashing helps efficiently utilize memory by storing data in a compact hash table. The hash function distributes the elements across the table, minimizing collisions and ensuring optimal space utilization. This is especially advantageous when working with limited memory resources or when storing a large number of key-value pairs.
3. Improved performance: Hashing algorithms are designed to be fast and efficient. The average time complexity of basic operations like insertion, deletion, and lookup is O(1), which remains constant regardless of the size of the dataset. This performance advantage makes hashing suitable for scenarios that require quick data access and manipulation, such as real-time applications or high-traffic systems.
4. Scalability: Hashing techniques are highly scalable & can handle large amounts of data efficiently. As the dataset grows, the hash table can be resized or rehashed to accommodate more elements while maintaining its performance characteristics. This scalability is crucial in applications that need to handle increasing data volumes without compromising speed or efficiency.
5. Simplified data organization: Hashing provides a convenient way to organize and structure data based on unique keys. By using keys as indices in the hash table, elements can be easily stored, retrieved, and managed. This simplifies data organization and allows for quick access to specific elements without the need for complex data structures or search algorithms.
6. Flexibility in key types: Hashing algorithms can work with various types of keys, including integers, strings, or even objects. As long as the keys are unique and can be mapped to a numeric hash code, they can be used effectively in hashing. This flexibility enables the use of hashing in a wide range of applications and domains, regardless of the nature of the data being stored.
7. Enabling unique data structures: Hashing forms the foundation for implementing unique data structures like sets, maps, and caches. These structures rely on hashing's properties to ensure element uniqueness, fast retrieval, and efficient memory usage.
Frequently Asked Questions
What happens if two different keys generate the same hash value (collision)?
Collisions are resolved using techniques like chaining or open addressing to ensure data integrity.
Can hashing guarantee constant-time performance in all cases?
While hashing provides O(1) average-case complexity, worst-case performance may degrade to O(n) if collisions are not handled properly.
Is hashing suitable for sorting elements?
Hashing itself does not provide sorting capabilities, but it can be used in conjunction with other algorithms for efficient sorting.
Conclusion
In this article, we discussed the concept of hashing in Java and its different implementations. We learned about the characteristics of hashing algorithms, how hashing works, and the different classes available in Java for hashing. We also discussed the use cases and benefits of hashing, showcasing its importance in efficient data storage, retrieval, and manipulation.
You can also check out our other blogs on Code360.