To locate the presence of a pattern in the text, use the following formula., the string matching algorithm proved to be an effective approach, but it was quite costly. So in 1987, two computer scientists, Richard M. Karp and Michael O. Rabin, created an algorithm called the Rabin-Karp algorithm. Let us study more about this algorithm and the problem it solves.
Problem Statement
Two strings, a pattern 'P' and a text 'T' are given. The task is to determine if the pattern occurs in the text, and if it does, print all of its occurrences; else, print -1.
Example 1
Input
P = “res”, T = “pres tyu res”
Output
1 9
Explanation: P occurs at indices “1” and “9” in the given text T.
Example 2
Input
P = “rock”, T = “computer science”
Output
-1
Explanation: “rock” cannot be found in the text.
Rabin-Karp Algorithm
In the string matching algorithm, we simply check for each substring of the text of length |P|, if it is equal to the pattern string, then print the occurrence, else continue. In the Rabin-Karp algorithm, we calculate the hash for the pattern 'P' and then calculate the hash values for all the prefixes of the text 'T' such that we may compare the hash value of a substring of length |P| with the hash value of pattern in constant time.
Let's discuss step by step explanation through an example:
Step 1 : Calculate the length of string(sizeT) and pattern(sizeP), So,
sizeT = 10 and sizeP = 4
Step 2: Create multiplying factor 'mfact' using formula : mfact[i] = (mfact[i-1] * val)%mod, Where value of val = 31(taken here) and mod = 1000000009. The mfact table for the above string will be :
Index
Value
0
1
1
31
2
961
3
29791
4
923521
5
28629151
6
887503681
7
512623868
8
891029773
9
621922720
10
0
Step 3: Calculate the hash value for all the characters of string using formula : hashValue[i+1] = (hashValue[i] + (T[i] - ‘a’ + 1) * mfact[i]) % mod, where T represent string and value of mfact is taken from above table. So, the corresponding hash values will be :
Character
Hash Value
0
0
1
-7
2
458
3
18717
4
644328
5
23732353
6
453169618
7
640761445
8
329969393
9
605713520
10
934554239
Step 4: Calculate the hash value of the whole pattern using the formula :
hashofP = (hashofP + (P[i] - ‘ a’ + 1)* mfact[i]) % mod. According to which hash value of pattern P = 471380.
Step 5: Check the occurrence of the pattern in a given string. By comparing the hash values of current substring which is calculated using : currHash = (hashValue[i + sizeP] + mod - hashValue[i]) % mod and hash value of pattern. If the hash value of the pattern is equal to the hash value of the current substring then add the string index of that substring in the answer vector.
Step 6: Add indexes 4 and 6 to the answer vector and return it.
Hash Function in Rabin-Karp
Hash Function in the Rabin-Karp algorithm calculates the hash values for the pattern string 'P' as well as the substrings of the text strings. The hash function in this algorithm is required to calculate a unique hash with the help of the previous prefix string in constant time. In every iteration, we can calculate the hash value for the current substring by subtracting the hash value of the most significant character and adding the hash value of the least significant character as follows:
hashValue (T[i + 1, i + 2, … i + l + 1]) = hashValue (T[i, i + 1, i +2,...i + l]) - hashValue (T[i]) + hashValue(T[i + l + 1])
Where hashValue() is the resultant hashed value of the corresponding substrings. Let’s learn about how the hashValue() needs to be calculated for each substring of length |P|.
Hash Value Calculation
The hashing of the substrings is to be done in such a way that every pair of distinct substrings must have a different hash value. So we can assign very large values to each substring so that the chances for repetition of the hash values become very less. So we can have a multiplying factor 'mfact' as defined below:
mfact[i + 1] = mfact[i] * val
Where 'val' is an arbitrary integer greater than 1.
But as multiplying 'Val' continuously will exceed the integer limit at a time, so we can introduce a prime number as modulo factor 'mod' to have it in the integer limit.
Thus hash value for the text will be calculated as follows:
And the hashVaue for the current substring of text size |P| will be calculated as:
currHash = (hashValue[i + |P|] + mod - hashValue[i]) % mod
This currHash can be compared to the hash value of the pattern in constant time and conclude if the pattern is matched.
Below the program is the implementation of the Rabin-Karp Algorithm.
Implementation
// Implementation of the Rabin-Karp Algorithm
#include <iostream>
#include <vector>
using namespace std;
// Required constants for hashValue calculation.
const int mod = 1e9 + 9;
const int val = 31;
// Function to return the vector of occurrences.
vector<int> rabinKarp(string &P, string &T)
{
// Storing sizes of pattern and text.
int sizeP = P.length(), sizeT = T.length();
// Multiplying factors required for hashValue calculation.
vector<long long> mfact(sizeT + 1);
mfact[0] = 1;
for (int i = 1; i < sizeT; i++)
mfact[i] = (mfact[i - 1] * val) % mod;
// Calculating hashValue for text.
vector<long long> hashValue(sizeT + 1, 0);
for (int i = 0; i < sizeT; i++)
hashValue[i + 1] = (hashValue[i] + (T[i] - 'a' + 1) * mfact[i]) % mod;
// Calculating hashValue for whole pattern string.
long long hashOfP = 0;
for (int i = 0; i < sizeP; i++)
hashOfP = (hashOfP + (P[i] - 'a' + 1) * mfact[i]) % mod;
// Checking for the occurrences of pattern in text.
vector<int> indx;
for (int i = 0; i <= sizeT - sizeP; i++)
{
// Hash value for the current substring of size equal to pattern string size.
long long currHash = (hashValue[i + sizeP] + mod - hashValue[i]) % mod;
// If hash value for pattern equal to current substring of text.
if (currHash == (hashOfP * mfact[i]) % mod)
indx.push_back(i);
}
return indx;
}
// Main function
int main()
{
// Input of the string pattern and text.
string P, T;
cin >> P >> T;
// Occurrences of pattern in text.
vector<int> indx = rabinKarp(P, T);
// If no occurrence, print -1.
if (indx.empty())
cout << -1;
else
{
// Printing all the occurrence indices in text.
for (int i = 0; i < indx.size(); i++)
cout << indx[i] << " ";
}
return 0;
}
Input
yoyo
Yosuyoyoyo
Output
4 6
Complexity Analysis
Time Complexity
O(|P| + |T|), where |P| is the size of string 'P' and |T| is the size of string 'T.'
Explanation: Calculating the hashValue of string 'P' consumes a total of O(|P|) time, and comparing hash values of each substring of length |P| in 'T' with the hash value of 'P' takes a total time of O(|T|) time. So collectively, O(|P| + |T|) time will be consumed.
Space Complexity
O(|T|), where |T| is the size of string ‘T’.
Explanation: Storing all the hash values and multiplication factors consumes a total of O(|T|) space.
Frequently Asked Questions
When two keys have the same hashCode, how does HashMap's get() method work?
This is a follow-up question to previous equals and hashcode interview questions; in fact, it may lead to a discussion of the earlier issue. When two keys yield the same hashcode, they are grouped together. You used the keys.equals() method to compare with the key stored in each entry of the linked list to discover the proper value. Keep in mind the keys.equals() method is what the interviewer is searching for. You may also find a complete set of Java HashMap interview questions here.
In your project, where did we use equals() and hashCode?
This is to determine whether or not the developer has developed these methods. Of course, practically every Java programmer is aware of this; you can point to value objects and Hibernate entities from your domain when equals and hashCode have been altered. Always give examples from your domain and project rather than a simple example from a test program because if the interviewer is asking this question, he is interested in examples from your domain.
Should you include an Id field in equals() if your Class has one? Why?
This topic was submitted by one of my readers as a Hibernate Interview question. Including id in the equals() function is not a good idea because this method should check equality based on content and business rules. Also included is id, which is primarily a database identifier that isn't accessible to temporary objects until they are put into the database.
What's the difference between testing type inside equals with instanceof and getClass()?
This question was asked several times, and it was often in the context of your equals() and hashCode implementations. The essential distinction is that the instance of the operator returns true even when compared to a subclass, for example. InstanceofSuperclass is true in Subclass, but false in getClass().
What happens if the equals() function differs from the compareTo() method?
This is an intriguing subject that was posted in conjunction with the equals() and hashCode() contracts—some of the java.util. For example, if you want to implement a set of rules, you can do so. SortedSet or a concrete implementation of SortedSet The compareTo() method is used by TreeSet to compare objects. If compareTo() does not return zero, and the equals() method returns true, the Set contract may be broken, which is not to avoid any duplication.
Conclusion
The above blog covers an important algorithm that efficiently solves the problem of finding the occurrences of patterns in the text called the Rabin Karp Algorithm.
If you find the blog helpful, then you can explore our other blogs as well: