Table of contents
1.
Introduction
2.
1. Indexing
3.
2. Hashing
3.1.
Python
4.
Difference between Indexing and Hashing in DBMS:
5.
 
6.
 
7.
 
8.
 
9.
 
10.
 
11.
Frequently Asked Questions
11.1.
Can indexing and hashing be used together in a database?
11.2.
How do I choose between indexing and hashing for a specific database table?
11.3.
What are the trade-offs of using indexing and hashing in terms of storage space?
12.
Conclusion
Last Updated: Aug 18, 2024
Easy

Indexing and Hashing in DBMS

Author Ravi Khorwal
0 upvote

Introduction

Databases are essential for storing and managing large amounts of data efficiently. To quickly access and retrieve specific information from a database, we use two important techniques: indexing and hashing. These methods help optimize database performance by reducing the time it takes to search for and locate desired data. 

Indexing and Hashing in DBMS

In this article, we will talk about the concepts of indexing and hashing in database management systems (DBMS) and understand their differences with the help of proper examples.

1. Indexing

Indexing is a technique used in databases to improve the speed of data retrieval. It works by creating a separate data structure, called an index, which contains a copy of selected columns from a table along with a pointer to the corresponding row in the original table. This allows the database to quickly locate the desired data without having to scan the entire table.

Imagine you have a library with thousands of books. To find a specific book, you could either search through each shelf one by one, or you could use a catalog that lists all the books along with their locations. The catalog acts as an index, making it faster to find the book you need.

In a database, an index is created on one or more columns of a table. For example, let's say we have a table called "students" with columns "id", "name", and "age". We can create an index on the "name" column to speed up searches based on student names.

Let’s see an example of how to create an index in SQL

CREATE INDEX idx_student_name ON students (name);


This statement creates an index named "idx_student_name" on the "name" column of the "students" table.

When we execute a query that includes a condition on the indexed column, like

SELECT * FROM students WHERE name = 'Rahul';


The database can use the index to quickly locate the rows that match the condition, rather than scanning the entire table.

Indexing is particularly useful for tables with a large number of rows and for queries that frequently search based on specific columns. However, it's important to note that indexes also consume additional storage space and can slow down insert, update, and delete operations, as the index needs to be updated along with the table.

2. Hashing

Hashing is another technique used in databases to enable fast data retrieval. It involves using a hash function to convert a key value (such as a primary key) into a unique index or memory address where the corresponding data is stored. This allows for constant-time (O(1)) lookup, regardless of the size of the dataset.

A hash function takes an input value and produces a fixed-size output, called a hash code or hash value. The hash function should be deterministic, meaning that for the same input, it always generates the same hash code. The goal is to distribute the keys evenly across the available memory addresses to minimize collisions (when multiple keys map to the same hash code).

For example : 

  • Python

Python

class HashTable:

   def __init__(self, size):

       self.size = size

       self.table = [[] for _ in range(size)]



   def _hash_function(self, key):

       return hash(key) % self.size


   def insert(self, key, value):

       hash_index = self._hash_function(key)

       self.table[hash_index].append((key, value))


   def search(self, key):

       hash_index = self._hash_function(key)

       for item in self.table[hash_index]:

           if item[0] == key:

               return item[1]

       return None


# Example usage

hash_table = HashTable(10)

hash_table.insert("Rahul", 25)

hash_table.insert("Rinki", 30)

hash_table.insert("Harsh", 27)

print(hash_table.search("Rahul")) 

print(hash_table.search("Sinki"))
You can also try this code with Online Python Compiler
Run Code

 

Output

25
None


In this example, we create a simple hash table class with a fixed size. The `_hash_function` method takes a key and returns the corresponding index in the table using the built-in `hash` function and modulo operation. The `insert` method adds a key-value pair to the table, while the `search` method retrieves the value associated with a given key.

When we search for a key, the hash function quickly computes the index where the data should be stored, allowing for fast retrieval. If there are collisions (multiple keys mapping to the same index), we can handle them using techniques like chaining (storing multiple key-value pairs in a linked list at the same index) or open addressing (probing for the next empty slot).

Hashing is commonly used in databases for indexing, as well as in other data structures like dictionaries and sets. It provides excellent performance for lookups, especially when the dataset is large. However, we need to remember that it may not be suitable for range queries or scenarios where the keys are not known in advance.

Difference between Indexing and Hashing in DBMS:

Feature

Indexing

Hashing

Data StructureBalanced tree (e.g., B-tree)Hash table
Key DistributionKeys are stored in sorted orderKeys are distributed uniformly across buckets
Search ComplexityO(log n) for balanced treesO(1) on average, O(n) in worst case (collisions)
Range QueriesEfficient for range queriesNot suitable for range queries
Collision HandlingNot applicableChaining or open addressing
Insertion/DeletionRequires rebalancing the treeRequires rehashing when load factor exceeds threshold
OrderingPreserves key orderDoes not preserve key order
UniquenessAllows duplicate keysRequires unique keys or additional handling for duplicates
Size OverheadModerate (depends on tree height)High (depends on load factor and collision resolution)
SuitabilitySuitable for range queries and ordered accessSuitable for fast lookups and unordered access

 

 

 

 

 

 

 

.

Frequently Asked Questions

Can indexing and hashing be used together in a database?

Yes, indexing and hashing can be used together in a database. Indexing can be used for columns that require range queries or ordered access, while hashing can be used for columns that need fast point lookups.

How do I choose between indexing and hashing for a specific database table?

The choice between indexing and hashing depends on the query patterns and requirements of your application. If you need to perform range queries or retrieve data in sorted order, indexing is a good choice. If you primarily need fast lookups based on unique keys, hashing is more suitable.

What are the trade-offs of using indexing and hashing in terms of storage space?

Indexing and hashing both require additional storage space compared to storing the data without any indexing. Indexing overhead depends on the size of the index and the chosen index structure, while hashing overhead depends on the load factor and collision resolution mechanism used. It's important to consider the storage trade-offs when designing your database schema.

Conclusion

In this article, we explained the concepts of indexing and hashing in database management systems. We learned that indexing is a technique that uses balanced tree structures to facilitate efficient data retrieval and is suitable for range queries and ordered access. On the other hand, hashing uses a hash function to distribute keys across a hash table, that provides fast lookups but not supporting range queries.

You can also check out our other blogs on Code360.

Live masterclass