Introduction
Let us start this article by performing an activity. Try to find your name in this blog in the next three minutes!
Did you find your name? No? Maybe you did not read the blog thoroughly, so check it again. Did you find it this time? Still no? Your name is not in this blog, yet you had to go through the whole text to be sure, and that took a lot of time. Now imagine a tool into which you type the word you are looking for, and it immediately tells you where that word occurs. Such a tool is an index, and the kind that maps each word to its locations is called an inverted index.
We meet this situation often when working with databases: without an index, finding a particular word or group of words means scanning all the stored data, which is very slow. With indexing, we store extra information in an index data structure so that whenever we need some data, it can be located quickly.
Indexing also reduces the input/output (I/O) cost of moving data between the disk and main memory.
So let us dive deeper into indexing without wasting any more time; you have already spent enough of it searching for your name.
An inverted index is a data structure similar to a hash map: it maps each piece of content, such as a word, to the locations where it occurs, so users can access it quickly.
It is handy for finding a particular item in a database together with its locations. Let us now see how it helps.
As you saw in the introduction, searching for a word without help is expensive, while an inverted index makes finding it easy. That is the first use of indexing. The second is reducing the I/O cost of bringing data to the CPU. Let us see how.
In a computer, the CPU works only on data held in main memory (RAM), while the database itself lives on secondary memory (the hard disk), which is far slower than the CPU. So whenever the CPU needs some data, the required blocks are read from the disk into main memory, and blocks that are no longer needed are evicted to make room. Each of these disk transfers is costly. Without an index, finding a record may require reading every block of a table; with an index, the system can go straight to the blocks that hold the required data and read far fewer of them.
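To make the hash map analogy concrete, here is a minimal Python sketch (Python 3.9+). It assumes each "document" is a short string identified by an integer id and that words are separated by whitespace; the names `build_inverted_index` and `lookup` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each lowercase word to the set of document ids that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return dict(index)


def lookup(index: dict[str, set[int]], word: str) -> list[int]:
    """Return the sorted ids of documents containing `word` (a single hash lookup)."""
    return sorted(index.get(word.lower(), set()))


docs = {
    1: "indexing speeds up search",
    2: "an inverted index maps words to locations",
    3: "search the index instead of the whole text",
}
index = build_inverted_index(docs)
print(lookup(index, "search"))  # [1, 3]
print(lookup(index, "index"))   # [2, 3]
```

The index is built once by reading every document; after that, each query is a dictionary lookup instead of a pass over all the text.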
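The effect on disk reads can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not how a real DBMS works internally: the table is a Python list split into hypothetical fixed-size blocks of `BLOCK_SIZE` rows, each block brought into memory counts as one disk read, and the index is a plain dictionary assumed to fit in main memory (real systems usually keep indexes such as B+ trees on disk as well, which costs a few extra reads).

```python
BLOCK_SIZE = 4  # hypothetical: number of rows that fit in one disk block


def to_blocks(rows):
    """Split the table into fixed-size blocks, as if it were stored on disk."""
    return [rows[i:i + BLOCK_SIZE] for i in range(0, len(rows), BLOCK_SIZE)]


def full_scan(blocks, key):
    """Read blocks one by one until the row is found; return (value, blocks_read)."""
    reads = 0
    for block in blocks:
        reads += 1  # each block brought into main memory costs one disk read
        for row_key, value in block:
            if row_key == key:
                return value, reads
    return None, reads


def build_index(blocks):
    """Map each key to the number of the block that holds its row."""
    return {row_key: b for b, block in enumerate(blocks) for row_key, _ in block}


def indexed_lookup(blocks, index, key):
    """Use the (in-memory) index to read only the block that holds the row."""
    if key not in index:
        return None, 0
    for row_key, value in blocks[index[key]]:  # exactly one block read
        if row_key == key:
            return value, 1
    return None, 1


rows = [(k, f"row-{k}") for k in range(1000)]
blocks = to_blocks(rows)
index = build_index(blocks)
print(full_scan(blocks, 997))              # ('row-997', 250)
print(indexed_lookup(blocks, index, 997))  # ('row-997', 1)
```

For this 1,000-row table with 4 rows per block, the scan reads all 250 blocks to reach key 997, while the index lookup reads just one.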
Now, as we have seen the utility of indexing, let us understand it with an example.
Example
Consider the following table, Flowers, in a database.
| Flower_id | Flower_name | Flower_color |
|---|---|---|
| 7 | Rose | Red |
| 9 | Lotus | White, Pink |
| 11 | China rose | Red, Orange |
Suppose we want to search for the word "rose" in this table. The traditional way is to scan every row, which takes a long time on a large table. If we instead build an inverted index like the following, the search becomes easy.
Inverted index for the Flowers table
| Term | Flower_id |
|---|---|
| Rose | 7, 11 |
| Lotus | 9 |
| China | 11 |
| Red | 7, 11 |
| White | 9 |
| Pink | 9 |
| Orange | 11 |
With this inverted index, a search for "rose" is a single lookup: the Rose entry immediately gives Flower_ids 7 and 11, with no need to scan the Flowers table.
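The inverted index above can be built mechanically from the Flowers table. Below is a minimal Python sketch, assuming terms are taken from both Flower_name and Flower_color, split on spaces and commas, and lowercased (so the term "Rose" appears as "rose"); the helper names are hypothetical. The `scan` function shows the traditional row-by-row search for comparison.

```python
from collections import defaultdict

# The Flowers table from the example: (Flower_id, Flower_name, Flower_color)
FLOWERS = [
    (7, "Rose", "Red"),
    (9, "Lotus", "White, Pink"),
    (11, "China rose", "Red, Orange"),
]


def terms(text: str) -> list[str]:
    """Split a cell such as 'White, Pink' into lowercase words."""
    return text.replace(",", " ").lower().split()


def build_index(rows) -> dict[str, list[int]]:
    """Map every term in Flower_name and Flower_color to the Flower_ids containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for flower_id, name, color in rows:
        for term in terms(name) + terms(color):
            index[term].add(flower_id)
    return {term: sorted(ids) for term, ids in index.items()}


def scan(rows, word: str) -> list[int]:
    """The traditional way: inspect every row of the table."""
    word = word.lower()
    return [fid for fid, name, color in rows if word in terms(name) + terms(color)]


index = build_index(FLOWERS)
print(index["rose"])           # [7, 11]
print(index["red"])            # [7, 11]
print(scan(FLOWERS, "rose"))   # [7, 11], but only after checking all three rows
```

Both approaches return the same ids; the difference is that the index answers from one dictionary entry, while the scan touches every row, which matters once the table grows.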
There are two types of inverted index.
Record level inverted index
For each word, it contains a list of references to the documents that contain it. It supports fewer kinds of queries than the word-level type, but it takes less time to build and needs less maintenance.
Word level inverted index
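In addition to the document references, it stores the position of each word within each document. This supports more kinds of queries, such as phrase searches, but takes more time and space to build and more effort to maintain.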
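The difference between the two types shows up in phrase queries. The sketch below is a hypothetical word-level index in Python: for each word it stores, per document, the positions where the word occurs. A record-level index would store only the document ids, so it could tell that both sample documents contain "china" and "rose", but not which one contains the exact phrase "china rose".

```python
from collections import defaultdict


def build_word_level_index(docs: dict[int, str]) -> dict[str, dict[int, list[int]]]:
    """Map each word to {doc_id: [positions of the word in that document]}."""
    index: defaultdict = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    # Convert to plain dicts so later lookups never create empty entries.
    return {word: dict(postings) for word, postings in index.items()}


def phrase_search(index, phrase: str) -> list[int]:
    """Return ids of documents where the words of `phrase` appear consecutively."""
    words = phrase.lower().split()
    if not words or any(w not in index for w in words):
        return []
    # Candidate documents must contain every word of the phrase.
    candidates = set(index[words[0]])
    for w in words[1:]:
        candidates &= set(index[w])
    matches = []
    for doc_id in sorted(candidates):
        for start in index[words[0]][doc_id]:
            # The phrase matches at `start` if word i occurs at position start + i.
            if all(start + i in index[w][doc_id] for i, w in enumerate(words)):
                matches.append(doc_id)
                break
    return matches


docs = {1: "china rose is red", 2: "the rose from china is red"}
index = build_word_level_index(docs)
print(index["rose"])                       # {1: [1], 2: [1]}
print(phrase_search(index, "china rose"))  # [1]
print(phrase_search(index, "is red"))      # [1, 2]
```

The positions are what make the phrase check possible, and they are also why a word-level index takes more space and work to build than a record-level one.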
Let us now see some frequently asked questions on this topic.
Is there any disadvantage of indexing?
Yes. Despite its many advantages, an inverted index has limitations: it can take up a large amount of storage space, and building it is itself an expensive process.
How can we transform the data before searching and saving it?
Two common methods are:
- Dropping stop words: we may drop the most common words in the data, such as "I", "in", and "is".
- Stemming: transforming a word into its root form by clipping its ending.
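A minimal sketch of both transformations in Python. The stop-word list and suffix rules are hypothetical and deliberately tiny; real systems use curated stop-word lists and established stemmers such as the Porter stemmer. This naive rule-based version also shows a typical weakness of simply clipping endings: "running" becomes "runn".

```python
STOP_WORDS = {"i", "in", "is", "a", "an", "the", "of"}  # hypothetical, tiny list
SUFFIXES = ("ing", "ed", "es", "s")  # hypothetical suffix rules, longest first


def stem(word: str) -> str:
    """Naive suffix stripping: clip the first matching ending, keeping >= 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text: str) -> list[str]:
    """Lowercase, drop stop words, then stem the remaining words."""
    return [stem(w) for w in text.lower().split() if w not in STOP_WORDS]


print(normalize("searching the indexed flowers in a garden"))
# ['search', 'index', 'flower', 'garden']
print(stem("running"))  # runn
```

Applying the same `normalize` step both when building the index and when processing a query makes "searching" and "search" land on the same index entry.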
What is lemmatization?
Lemmatization is the process of changing a word into its dictionary form. For example, "running" becomes "run" after lemmatization.
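Lemmatization usually relies on a dictionary rather than suffix rules. A naive rule that only clips endings would turn "running" into "runn" and cannot connect the irregular form "ran" to "run". The sketch below assumes a small, hypothetical hand-written lemma table; real lemmatizers use a full vocabulary and often the word's part of speech.

```python
# Hypothetical, hand-written lemma dictionary; real lemmatizers use a full
# vocabulary plus part-of-speech information.
LEMMAS = {"running": "run", "ran": "run", "runs": "run", "better": "good", "flowers": "flower"}


def lemmatize(word: str) -> str:
    """Return the dictionary form of `word`, or the word itself if it is unknown."""
    word = word.lower()
    return LEMMAS.get(word, word)


print([lemmatize(w) for w in ["Running", "ran", "better", "lotus"]])
# ['run', 'run', 'good', 'lotus']
```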
Key takeaways
In this article, we started with a fun but time-consuming activity and used it to see why indexing is useful in a DBMS. We then worked through an example, looked at the two types of inverted index, and answered some frequently asked questions on the topic. To keep practicing, try the Top 100 SQL problems on Coding Ninjas Studio.