Code360 powered by Coding Ninjas X Naukri.com
Table of contents
1. Introduction
2. What is the Rabin-Karp Algorithm?
3. How Rabin-Karp Algorithm Works?
4. How is Hash Value calculated in Rabin-Karp Algorithm?
5. Implementation
5.1. C++ Code
5.2. Python Code
5.3. Java Code
6. Rabin-Karp Algorithm Complexity
6.1. Time Complexity
6.2. Space Complexity
7. Rabin-Karp Algorithm Applications
8. Advantages and Disadvantages Of Rabin-Karp Algorithm
8.1. Advantages
8.2. Disadvantages
9. Frequently asked questions
9.1. What is the difference between KMP and Rabin-Karp algorithm?
9.2. What is Horner's rule in Rabin-Karp algorithm?
9.3. How do you use Rabin-Karp algorithm?
9.4. What is the Rabin-Karp algorithm for numbers?
10. Conclusion
Last Updated: Mar 27, 2024

Rabin-Karp Algorithm

Author Riya

Introduction

This blog discusses the Rabin-Karp algorithm, a string-searching algorithm named after its authors, Richard M. Karp and Michael O. Rabin.

This algorithm finds all occurrences of a given pattern ‘P’ in a given string ‘S’ in O(Ns + Np) expected time, where ‘Ns’ and ‘Np’ are the lengths of ‘S’ and ‘P’, respectively.

Let’s take an example to make this clearer.

Assume the given string is S = “cxyzghxyzvjkxyz” and the pattern is P = “xyz”, and we have to find all the occurrences of ‘P’ in ‘S’.

We can see that “xyz” occurs in “cxyzghxyzvjkxyz” at three positions. So, we have to print that pattern ‘P’ occurs in string ‘S’ at the 0-based indices 1, 6, and 12.
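
As a quick sanity check of these indices, here is a minimal brute-force sketch in Python. It is only a baseline for comparison and does not use hashing; the function name brute_force_occurrences is a hypothetical one chosen for this illustration.

```python
def brute_force_occurrences(text, pattern):
    """Return every 0-based index at which pattern starts in text."""
    n, m = len(text), len(pattern)
    # Compare the pattern directly against each window of length m.
    return [i for i in range(n - m + 1) if text[i:i + m] == pattern]


print(brute_force_occurrences("cxyzghxyzvjkxyz", "xyz"))  # [1, 6, 12]
```

This baseline compares up to Np characters at every index, which is the cost that the hashing approach tries to avoid.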

What is the Rabin-Karp Algorithm?

The Rabin-Karp Algorithm is a string-searching algorithm that efficiently locates a substring within a larger string by using hashing to compare the hash values of substrings with the hash value of the target substring. This algorithm is particularly useful for finding multiple occurrences of a pattern within a text.

How Rabin-Karp Algorithm Works?

For each index of the text, the algorithm computes the hash value of the substring that starts at that index and has the same length as the pattern. If that hash value equals the hash value of the pattern, it performs a full character-by-character comparison at that index, because two different strings can share the same hash value.
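
The idea above can be sketched in Python as follows. This is a minimal illustration, not the only way to write it: the function name rabin_karp and the default values of base and mod are assumptions made for this example.

```python
def rabin_karp(text, pattern, base=256, mod=1_000_000_007):
    """Return all 0-based start indices of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # base^(m-1) % mod, used to remove the leading character of a window.
    high = pow(base, m - 1, mod)

    pattern_hash = 0
    window_hash = 0
    for i in range(m):
        pattern_hash = (pattern_hash * base + ord(pattern[i])) % mod
        window_hash = (window_hash * base + ord(text[i])) % mod

    matches = []
    for i in range(n - m + 1):
        # Equal hashes only suggest a match; confirm by comparing characters.
        if window_hash == pattern_hash and text[i:i + m] == pattern:
            matches.append(i)
        if i + m < n:
            # Slide the window: drop text[i], then append text[i + m].
            window_hash = (window_hash - ord(text[i]) * high) % mod
            window_hash = (window_hash * base + ord(text[i + m])) % mod
    return matches


print(rabin_karp("cxyzghxyzvjkxyz", "xyz"))  # [1, 6, 12]
```

Each window's hash is derived from the previous one in constant time, so the full comparison runs only at indices where the hashes agree.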

How is Hash Value calculated in Rabin-Karp Algorithm?

Step 1. Create a function ‘rabinKarpSearch()’ that implements the Rabin-Karp algorithm and accepts two parameters: the given string ‘S’ and the given pattern ‘P’. First, calculate the lengths of ‘S’ and ‘P’.

Step 2. Choose a prime number to use as the base of the hash and a modulus for computing the hash values. To minimize hash collisions, pick a prime close to the number of distinct characters used in the string and pattern. Assuming that ‘S’ and ‘P’ consist only of lowercase letters, there are 26 possible characters, so take prime = 31. The modulus should be a very large prime, so take mod = 1e9 + 9.

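Step 3. The hash function that we have used here is:

     hash(S) = (Σ((S[i] - ‘a’ + 1) * (prime^i))) % mod

Here the sum runs over every index i of the string being hashed, and ‘prime’ is the prime number chosen in Step 2 (not the pattern ‘P’).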

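Step 4. Create a vector to store the powers of “prime” and store (prime ^ 0) to (prime ^ (Ns - 1)). Then calculate the hash value of the given pattern and the prefix hashes of the string ‘S’, where the i-th prefix hash is the hash of the first i characters of ‘S’.

Step 5. Now slide the pattern over the string one position at a time. The hash of the window starting at index i is the difference of two prefix hashes, which carries an extra factor of (prime ^ i), so compare it with the hash value of the pattern multiplied by (prime ^ i). If the two are equal, print the occurrence of the pattern at that index, ideally after a direct character comparison, since different strings can share a hash value.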
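
The steps above can be sketched in Python as follows. This is a minimal illustration that assumes lowercase input, as in Step 2; the names poly_hash and rabin_karp_prefix are hypothetical and chosen only for this example.

```python
PRIME = 31
MOD = 10**9 + 9


def poly_hash(s):
    """Step 3: sum of (s[i] - 'a' + 1) * PRIME^i, taken modulo MOD."""
    value = 0
    power = 1  # PRIME^i for the current position i
    for ch in s:
        value = (value + (ord(ch) - ord('a') + 1) * power) % MOD
        power = (power * PRIME) % MOD
    return value


def rabin_karp_prefix(s, p):
    """Return all 0-based indices where p occurs in s (lowercase letters only)."""
    ns, np_ = len(s), len(p)
    if np_ == 0 or np_ > ns:
        return []

    # Step 4: powers of the prime, PRIME^0 .. PRIME^(ns - 1).
    p_pow = [1] * ns
    for i in range(1, ns):
        p_pow[i] = (p_pow[i - 1] * PRIME) % MOD

    # prefix[i] is the hash of the first i characters of s.
    prefix = [0] * (ns + 1)
    for i, ch in enumerate(s):
        prefix[i + 1] = (prefix[i] + (ord(ch) - ord('a') + 1) * p_pow[i]) % MOD

    hash_p = poly_hash(p)

    # Step 5: the window hash from prefix hashes carries a factor PRIME^i,
    # so the pattern hash is scaled by the same factor before comparing.
    result = []
    for i in range(ns - np_ + 1):
        window = (prefix[i + np_] - prefix[i]) % MOD
        if window == (hash_p * p_pow[i]) % MOD and s[i:i + np_] == p:
            result.append(i)
    return result


# 'x' -> 24, 'y' -> 25, 'z' -> 26, so the hash is 24*1 + 25*31 + 26*961.
print(poly_hash("xyz"))                              # 25785
print(rabin_karp_prefix("cxyzghxyzvjkxyz", "xyz"))   # [1, 6, 12]
```

Scaling the pattern hash by the power of the prime avoids a modular division, which is why this formulation compares scaled hashes instead of normalizing each window.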
 

Implementation

C++ Code

// C++ code for implementation of the Rabin-Karp algorithm
#include <iostream>
#include <string>
#include <vector>
using namespace std;

// Function for searching a pattern in a string using the Rabin-Karp algorithm.
// Assumes S and P contain only lowercase letters 'a'..'z'.
void rabinKarpSearch(const string &S, const string &P)
{
    // Calculating the length of S and P
    int Ns = S.length();
    int Np = P.length();

    // An empty pattern or a pattern longer than the string cannot be matched
    if (Np == 0 || Np > Ns)
        return;

    // Initialize the value of prime number and mod for calculating hash values
    const int prime = 31;
    const int mod = 1e9 + 9;

    // Precompute the powers of the prime: p_pow[i] = prime^i % mod
    vector<long long> p_pow(Ns);
    p_pow[0] = 1;
    for (int i = 1; i < Ns; i++)
    {
        p_pow[i] = (p_pow[i - 1] * prime) % mod;
    }

    // Prefix hashes of S: h[i] is the hash of the first i characters
    vector<long long> h(Ns + 1, 0);
    for (int i = 0; i < Ns; i++)
    {
        h[i + 1] = (h[i] + (S[i] - 'a' + 1) * p_pow[i]) % mod;
    }

    // Calculating the hash value of P
    long long hash_P = 0;
    for (int i = 0; i < Np; i++)
    {
        hash_P = (hash_P + (P[i] - 'a' + 1) * p_pow[i]) % mod;
    }

    /*
    Now slide the pattern by one character and check for the corresponding
    hash value to match with the hash value of the given pattern. The window
    hash h[i + Np] - h[i] carries an extra factor prime^i, so hash_P is
    multiplied by p_pow[i] before the comparison.
    */
    for (int i = 0; i + Np - 1 < Ns; i++)
    {
        long long curr_hash = (h[i + Np] + mod - h[i]) % mod;
        // Compare the characters as well, since different strings can collide
        if (curr_hash == hash_P * p_pow[i] % mod && S.compare(i, Np, P) == 0)
            cout << "The given pattern occurs in the given string at index " << i << endl;
    }
}

int main()
{
    string S = "cxyzghxyzvjkxyz";
    string P = "xyz";

    // Call the function for the Rabin-Karp algorithm
    rabinKarpSearch(S, P);

    return 0;
}

Output:

The given pattern occurs in the given string at index 1
The given pattern occurs in the given string at index 6
The given pattern occurs in the given string at index 12

Python Code

class RabinKarp:
    def __init__(self, text, pattern):
        # The constructor for the class takes in the main string and the target string,
        # and also sets an arbitrary prime number used for hash calculation.
        self.text = text
        self.pattern = pattern
        self.prime = 101

    def search_pattern(self):
        # This is the main function to search the pattern in the given text string.
        pattern_length = len(self.pattern)
        text_length = len(self.text)

        # Calculate the hash value for the pattern, and the hash value for the first window of text.
        pattern_hash = self.create_hash(self.pattern, pattern_length - 1)
        text_hash = self.create_hash(self.text, pattern_length - 1)
        for i in range(1, text_length - pattern_length + 2):
            # If the hash value of the pattern matches the hash value of the current window of text
            # then only check individual characters for matching.
            if pattern_hash == text_hash:
                if self.check_equal(self.text[i - 1:i + pattern_length - 1], self.pattern[0:]):
                    return i - 1

            # Calculate hash value for next window of text: Remove leading digit,
            # add trailing digit from remaining string.
            if i < text_length - pattern_length + 1:
                text_hash = self.recalculate_hash(self.text, i - 1, i + pattern_length - 1, text_hash, pattern_length)
        return -1

    def create_hash(self, text, end):
        # This function calculates the initial rolling hash value.
        # No modulus is applied; Python integers grow as needed.
        hash_value = 0
        for i in range(end + 1):
            hash_value = hash_value + ord(text[i]) * pow(self.prime, i)
        return hash_value

    def recalculate_hash(self, text, old_index, new_index, old_hash, pattern_length):
        # This function calculates hash value for next window of text.
        new_hash = old_hash - ord(text[old_index])
        new_hash = new_hash // self.prime
        new_hash += ord(text[new_index]) * pow(self.prime, pattern_length - 1)
        return new_hash

    def check_equal(self, text1, text2):
        # This function checks if individual characters are equal in case of hash match.
        return text1 == text2


# Test the code
txt = "this is a test text"
pat = "test"
# Create a RabinKarp object with the text and pattern
rabin_karp = RabinKarp(txt, pat)
# Search for the pattern in the text
start = rabin_karp.search_pattern()
if start != -1:
    print("Pattern found at position: ", start)
else:
    print("Pattern not found")

Output:

Pattern found at position:  10

Java Code

import java.math.BigInteger;
import java.util.Random;

public class RabinKarp {
    private String pattern;   // the pattern string
    private long patternHash; // hash value of pattern string
    private int m;            // pattern length
    private long q;           // a large prime for computing hash
    private int R;            // radix
    private long RM;          // R^(m-1) % q

    public RabinKarp(String pattern) {
        // Save pattern (needed for Las Vegas)
        this.pattern = pattern;
        // Choose a large prime number q and a radix R
        R = 256;
        m = pattern.length();
        q = longRandomPrime();
        // Pre-compute R^(m-1) % q for use in removing leading digit
        RM = 1;
        for (int i = 1; i <= m - 1; i++)
            RM = (R * RM) % q;
        patternHash = hash(pattern, m);
    }

    // Compute hash for pattern and initial text window (Horner's rule)
    private long hash(String key, int m) {
        long h = 0;
        for (int j = 0; j < m; j++)
            h = (R * h + key.charAt(j)) % q;
        return h;
    }

    // Las Vegas check: compare the characters themselves, so that a hash
    // collision is never reported as a match
    private boolean check(String txt, int i) {
        return txt.regionMatches(i, pattern, 0, m);
    }

    // Returns a random 31-bit prime number
    private static long longRandomPrime() {
        BigInteger prime = BigInteger.probablePrime(31, new Random());
        return prime.longValue();
    }

    // Search for the pattern string in the text string.
    // Returns the offset of the first match, or txt.length() if there is none.
    public int search(String txt) {
        int n = txt.length();
        if (n < m) return n;
        long txtHash = hash(txt, m);
        // Check for match at offset 0
        if (patternHash == txtHash && check(txt, 0))
            return 0;
        // Check for hash match; if hash match, check for exact match
        for (int i = m; i < n; i++) {
            // Remove leading digit, add trailing digit, and check for match
            txtHash = (txtHash + q - RM * txt.charAt(i - m) % q) % q;
            txtHash = (txtHash * R + txt.charAt(i)) % q;
            // match
            int offset = i - m + 1;
            if (patternHash == txtHash && check(txt, offset))
                return offset;
        }
        // No match found
        return n;
    }

    // Test the code
    public static void main(String[] args) {
        String pattern = "test";
        String txt = "this is a test text";

        RabinKarp searcher = new RabinKarp(pattern);
        int offset = searcher.search(txt);

        if (offset != txt.length()) {
            System.out.println("Pattern found at position: " + offset);
        } else {
            System.out.println("Pattern not found");
        }
    }
}

Output:

Pattern found at position: 10

Rabin-Karp Algorithm Complexity

We will now discuss the time and space complexity of the Rabin-Karp algorithm:

Time Complexity

  • In the Rabin-Karp algorithm, we calculate the hash value of the pattern in O(Np) time, then traverse the given string once, computing each window's hash value and comparing it with that of the pattern, in O(Ns) time.
     
  • So, the expected time complexity is O(Ns + Np), where ‘Ns’ and ‘Np’ are the lengths of the given string and pattern, respectively. When every hash match is verified character by character and many windows match or collide, the worst case is O(Ns * Np).
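
To get a feel for how verification can dominate the running time, the following Python sketch counts the character comparisons spent on verifying true occurrences alone. It is an illustrative lower bound under the assumption that every hash match is checked character by character; the function name count_verification_comparisons is hypothetical.

```python
def count_verification_comparisons(text, pattern):
    """Count character comparisons needed to verify every true occurrence."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for i in range(n - m + 1):
        # A correct hash always matches on a true occurrence, so each
        # occurrence triggers a full m-character verification.
        if text[i:i + m] == pattern:
            comparisons += m
    return comparisons


# 501 windows match, and each verification costs 500 comparisons.
print(count_verification_comparisons("a" * 1000, "a" * 500))  # 250500
```

For this input the verification work grows like Ns * Np, even though the hash computations themselves stay linear.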

Space Complexity

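  • The C++ implementation above stores the powers of the prime and the prefix hashes of ‘S’, so it uses O(Ns) extra space. A rolling-hash variant that keeps only the current window's hash, like the Java implementation, uses constant space, so its space complexity is O(1).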

Rabin-Karp Algorithm Applications

The Rabin-Karp algorithm is a string search algorithm. It efficiently finds occurrences of patterns within a given text. Its primary use is pattern matching and string searching, and it has several advantages that make it a valuable tool in many areas. Here are some of its uses:

  • Word Processing: Used by search engines and text editors to find and highlight occurrences of keywords and phrases within large bodies of text.
     
  • Plagiarism Detection: Used to identify instances of copied content within a document, website, or scholarly article.
     
  • Biological Sequence Analysis: Used in bioinformatics to search and match DNA, RNA, or protein sequences in genomic databases. 
     
  • Data Mining: It is used for pattern matching and similarity search on large datasets.
     
  • Computer Security: Implemented in intrusion detection systems and antivirus software to identify and block malicious patterns and signatures.
     
  • Compression Algorithm: Used to find repeating patterns and substrings so that the data can be compressed more effectively.
     
  • Image Processing: It has been adapted for image recognition tasks, e.g., finding specific patterns in images.
     
  • Network Packet Inspection: Used in network security to identify specific patterns or signatures in network packets. 
     
  • Spelling Correction: Used by spell checking systems to suggest corrections based on similar patterns in the text.
     
  • Data Deduplication: Used in data storage systems to eliminate duplicate data and optimize storage capacity. 

Advantages and Disadvantages Of Rabin-Karp Algorithm

Advantages

  1. Rabin-Karp algorithm is best suited to find multiple patterns in the same text. 
     
  2. The Rabin-Karp algorithm works on different kinds of input, such as texts with many repeated characters or searches for multiple substrings. 
     
  3. Rabin-Karp algorithm helps in detecting plagiarism for large datasets. 
     
  4. The rolling-hash idea behind the algorithm can also be reused in other string-matching problems. 
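
The multiple-pattern case from the first point can be sketched in Python as follows. This is a minimal illustration that assumes all patterns have the same length; the function name find_any and the default values of base and mod are hypothetical choices for this example.

```python
def find_any(text, patterns, base=256, mod=1_000_000_007):
    """Return (index, pattern) pairs for equal-length patterns found in text."""
    patterns = list(patterns)
    if not patterns:
        return []
    m = len(patterns[0])
    if any(len(p) != m for p in patterns):
        raise ValueError("this sketch assumes patterns of equal length")
    n = len(text)
    if m == 0 or m > n:
        return []

    def full_hash(s):
        h = 0
        for ch in s:
            h = (h * base + ord(ch)) % mod
        return h

    # Map each hash value to the set of patterns that produce it.
    by_hash = {}
    for p in patterns:
        by_hash.setdefault(full_hash(p), set()).add(p)

    high = pow(base, m - 1, mod)
    window_hash = full_hash(text[:m])
    found = []
    for i in range(n - m + 1):
        # One dictionary lookup covers all patterns; verify before reporting.
        candidates = by_hash.get(window_hash)
        if candidates and text[i:i + m] in candidates:
            found.append((i, text[i:i + m]))
        if i + m < n:
            window_hash = (window_hash - ord(text[i]) * high) % mod
            window_hash = (window_hash * base + ord(text[i + m])) % mod
    return found


print(find_any("cxyzghxyzvjkxyz", ["xyz", "ghx", "vjk"]))
# [(1, 'xyz'), (4, 'ghx'), (6, 'xyz'), (9, 'vjk'), (12, 'xyz')]
```

The text is scanned once regardless of how many patterns there are, because each window's hash is looked up in a dictionary instead of being compared with every pattern in turn.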

Disadvantages 

  1. The Rabin-Karp algorithm has poor worst-case time complexity when hash collisions are frequent. The complexity can degrade to O(Ns * Np), which is worse than that of other string-matching algorithms such as KMP. 
     
  2. Rabin-Karp algorithm uses extra space to store hash value data. 
     
  3. The Rabin-Karp algorithm relies on a predictable hash function, which is a security concern: an attacker who knows the hash parameters can craft inputs that cause many collisions.
     
  4. Cryptographic applications do not prefer to use the Rabin-Karp algorithm, because of safety reasons. 

Frequently asked questions

What is the difference between KMP and Rabin-Karp algorithm?

The Knuth-Morris-Pratt (KMP) algorithm uses prefix matching, while the Rabin-Karp algorithm employs hashing. KMP focuses on efficiently comparing characters, whereas Rabin-Karp leverages hashing for pattern matching in strings.

What is Horner's rule in Rabin-Karp algorithm?

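Horner's rule evaluates the polynomial hash of a string with one multiplication and one addition per character: h = (h * base + next character) % mod. The Rabin-Karp algorithm uses it to compute the initial hash values of the pattern and the first window, and the rolling-hash update applies the same step each time the window slides.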
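
A small Python sketch can make this concrete by comparing Horner's rule with a term-by-term evaluation of the same polynomial. The function names horner_hash and direct_hash, and the default values of base and mod, are assumptions made for this illustration.

```python
def horner_hash(s, base=256, mod=1_000_000_007):
    """Evaluate s[0]*base^(m-1) + ... + s[m-1] modulo mod via Horner's rule."""
    h = 0
    for ch in s:
        # One multiplication and one addition per character.
        h = (h * base + ord(ch)) % mod
    return h


def direct_hash(s, base=256, mod=1_000_000_007):
    """Same polynomial, evaluated term by term with explicit powers."""
    m = len(s)
    return sum(ord(ch) * pow(base, m - 1 - i, mod) for i, ch in enumerate(s)) % mod


print(horner_hash("test") == direct_hash("test"))  # True
```

Both functions return the same value, but Horner's rule never needs to compute a power of the base explicitly.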

How do you use Rabin-Karp algorithm?

The Rabin-Karp algorithm searches for a pattern in a text using hashing. It slides a window along the text, calculates hash values for the pattern and window, and compares them. If hashes match, it checks the substrings for an exact match.

What is the Rabin-Karp algorithm for numbers?

The Rabin-Karp algorithm for numbers operates similarly to text, but uses number sequences instead. It employs rolling hash functions to identify a specific sequence of numbers within a larger numeric dataset.
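
As an illustration, the following Python sketch treats a string of decimal digits as a number in base 10 and uses a deliberately tiny modulus of 13, so that a spurious hash hit becomes visible. The function name digit_search and the sample digits are choices made for this example only.

```python
def digit_search(digits, pattern, mod=13):
    """Return (hash_hits, matches) for a digit-string pattern, using base 10."""
    n, m = len(digits), len(pattern)
    if m == 0 or m > n:
        return [], []

    # 10^(m-1) % mod, used to remove the leading digit of a window.
    high = pow(10, m - 1, mod)
    p_hash = int(pattern) % mod
    w_hash = int(digits[:m]) % mod

    hash_hits, matches = [], []
    for i in range(n - m + 1):
        if w_hash == p_hash:
            hash_hits.append(i)
            # Only a direct comparison tells a real match from a collision.
            if digits[i:i + m] == pattern:
                matches.append(i)
        if i + m < n:
            # Drop the leading digit, shift by one place, append the next digit.
            w_hash = ((w_hash - int(digits[i]) * high) * 10 + int(digits[i + m])) % mod
    return hash_hits, matches


print(digit_search("2359023141526739921", "31415"))  # ([6, 12], [6])
```

Both 31415 and 67399 leave remainder 7 when divided by 13, so index 12 is a hash hit that the character comparison rejects. A large modulus makes such spurious hits rare.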

Conclusion

In this article, we discussed the Rabin-Karp algorithm for finding a given pattern in a given string, its Java, Python, and C++ implementations, and its time and space complexities. If you want to check out more articles and solve similar problems for practice, you can visit Coding Ninjas Studio.

If you think that this blog helped you, then share it with your friends!

Until then, all the best for your future endeavors, and keep coding.
