Get a skill gap analysis, personalised roadmap, and AI-powered resume optimisation.
Introduction
Databases are essential for storing and managing large amounts of data efficiently. To quickly access and retrieve specific information from a database, we use two important techniques: indexing and hashing. These methods help optimize database performance by reducing the time it takes to search for and locate desired data.
In this article, we will talk about the concepts of indexing and hashing in database management systems (DBMS) and understand their differences with the help of proper examples.
1. Indexing
Indexing is a technique used in databases to improve the speed of data retrieval. It works by creating a separate data structure, called an index, which contains a copy of selected columns from a table along with a pointer to the corresponding row in the original table. This allows the database to quickly locate the desired data without having to scan the entire table.
Imagine you have a library with thousands of books. To find a specific book, you could either search through each shelf one by one, or you could use a catalog that lists all the books along with their locations. The catalog acts as an index, making it faster to find the book you need.
In a database, an index is created on one or more columns of a table. For example, let's say we have a table called "students" with columns "id", "name", and "age". We can create an index on the "name" column to speed up searches based on student names.
Let’s see an example of how to create an index in SQL
CREATE INDEX idx_student_name ON students (name);
This statement creates an index named "idx_student_name" on the "name" column of the "students" table.
When we execute a query that includes a condition on the indexed column, like
SELECT * FROM students WHERE name = 'Rahul';
The database can use the index to quickly locate the rows that match the condition, rather than scanning the entire table.
Indexing is particularly useful for tables with a large number of rows and for queries that frequently search based on specific columns. However, it's important to note that indexes also consume additional storage space and can slow down insert, update, and delete operations, as the index needs to be updated along with the table.
2. Hashing
Hashing is another technique used in databases to enable fast data retrieval. It involves using a hash function to convert a key value (such as a primary key) into a unique index or memory address where the corresponding data is stored. This allows for constant-time (O(1)) lookup, regardless of the size of the dataset.
A hash function takes an input value and produces a fixed-size output, called a hash code or hash value. The hash function should be deterministic, meaning that for the same input, it always generates the same hash code. The goal is to distribute the keys evenly across the available memory addresses to minimize collisions (when multiple keys map to the same hash code).
For example :
Python
Python
class HashTable:
def __init__(self, size):
self.size = size
self.table = [[] for _ in range(size)]
def _hash_function(self, key):
return hash(key) % self.size
def insert(self, key, value):
hash_index = self._hash_function(key)
self.table[hash_index].append((key, value))
def search(self, key):
hash_index = self._hash_function(key)
for item in self.table[hash_index]:
if item[0] == key:
return item[1]
return None
# Example usage
hash_table = HashTable(10)
hash_table.insert("Rahul", 25)
hash_table.insert("Rinki", 30)
hash_table.insert("Harsh", 27)
print(hash_table.search("Rahul"))
print(hash_table.search("Sinki"))
You can also try this code with Online Python Compiler
In this example, we create a simple hash table class with a fixed size. The `_hash_function` method takes a key and returns the corresponding index in the table using the built-in `hash` function and modulo operation. The `insert` method adds a key-value pair to the table, while the `search` method retrieves the value associated with a given key.
When we search for a key, the hash function quickly computes the index where the data should be stored, allowing for fast retrieval. If there are collisions (multiple keys mapping to the same index), we can handle them using techniques like chaining (storing multiple key-value pairs in a linked list at the same index) or open addressing (probing for the next empty slot).
Hashing is commonly used in databases for indexing, as well as in other data structures like dictionaries and sets. It provides excellent performance for lookups, especially when the dataset is large. However, we need to remember that it may not be suitable for range queries or scenarios where the keys are not known in advance.
Difference between Indexing and Hashing in DBMS:
Feature
Indexing
Hashing
Data Structure
Balanced tree (e.g., B-tree)
Hash table
Key Distribution
Keys are stored in sorted order
Keys are distributed uniformly across buckets
Search Complexity
O(log n) for balanced trees
O(1) on average, O(n) in worst case (collisions)
Range Queries
Efficient for range queries
Not suitable for range queries
Collision Handling
Not applicable
Chaining or open addressing
Insertion/Deletion
Requires rebalancing the tree
Requires rehashing when load factor exceeds threshold
Ordering
Preserves key order
Does not preserve key order
Uniqueness
Allows duplicate keys
Requires unique keys or additional handling for duplicates
Size Overhead
Moderate (depends on tree height)
High (depends on load factor and collision resolution)
Suitability
Suitable for range queries and ordered access
Suitable for fast lookups and unordered access
.
Frequently Asked Questions
Can indexing and hashing be used together in a database?
Yes, indexing and hashing can be used together in a database. Indexing can be used for columns that require range queries or ordered access, while hashing can be used for columns that need fast point lookups.
How do I choose between indexing and hashing for a specific database table?
The choice between indexing and hashing depends on the query patterns and requirements of your application. If you need to perform range queries or retrieve data in sorted order, indexing is a good choice. If you primarily need fast lookups based on unique keys, hashing is more suitable.
What are the trade-offs of using indexing and hashing in terms of storage space?
Indexing and hashing both require additional storage space compared to storing the data without any indexing. Indexing overhead depends on the size of the index and the chosen index structure, while hashing overhead depends on the load factor and collision resolution mechanism used. It's important to consider the storage trade-offs when designing your database schema.
Conclusion
In this article, we explained the concepts of indexing and hashing in database management systems. We learned that indexing is a technique that uses balanced tree structures to facilitate efficient data retrieval and is suitable for range queries and ordered access. On the other hand, hashing uses a hash function to distribute keys across a hash table, that provides fast lookups but not supporting range queries.
You can also check out our other blogs on Code360.