This blog will discuss the Rabin-Karp Algorithm. It is a string searching algorithm that is named after its authors Richard M. Carp and Michael O. Rabin.
This algorithm is used to find all the occurrences of a given pattern ‘P’’ in a given string ‘S’ in O(Ns + Np) time, where ‘Ns’ and ‘Np’ are the lengths of ‘S’’ and ‘P’, respectively.
Let’s take an example to make it more clear.
Assume the given string S = “cxyzghxyzvjkxyz” and pattern P = “xyz” and we have to find all the occurrences of ‘P’ in ‘S’.
We can see that “xyz” is occurring in “cxyzghxyzvjkxyz” at three positions. So, we have to print that pattern ‘P’ is occurring in string ‘S’ at indices 1, 6, and 12.
What is the Rabin-Karp Algorithm?
The Rabin-Karp Algorithm is a string-searching algorithm that efficiently locates a substring within a larger string by using hashing to compare the hash values of substrings with the hash value of the target substring. This algorithm is particularly useful for finding multiple occurrences of a pattern within a text.
How Rabin-Karp Algorithm Works?
The algorithm starts by computing, at each index of the text, the hash value of the string starting at that particular index with the same length as the pattern. If the hash value of that equals the hash value of the given pattern, then it does a full match at that particular index.
How is Hash Value calculated in Rabin-Karp Algorithm?
Step 1. Create a function “rabinKarpSearch()’ for implementing the Rabin Karp algorithm, that will accept the two parameters - the given string ‘S’ and the given pattern ‘P’. First, calculate the lengths of ‘S’ and ‘P’.
Step 2. Now, we have to choose a prime number and a value for taking modulus while calculating the hash values. For minimizing the hashing collision, we have to take the value of the prime number close to the number of characters used in the string and pattern. Assuming that the given ‘S’ and ‘P’ consist of only lower alphabets, the total number of characters will be 26, so take prime = 31. Now a value for taking modulus should be very large and prime so, take mod = 1e +9.
Step 3. The hash function that we have used here is:
hash(S) = (Σ((S[i] - ‘a’ + 1) * (P^(i)))) % mod
Step 4. Create a vector to store the powers of “prime” and store (prime ^ 0) to (prime ^ Ns). Now calculate the hash value of the given pattern and the first window of the string ‘S’.
Step 5. Now one by one, slide the given pattern and calculate the hash value of the corresponding substring and compare it with the hash value of the pattern. If found the same, print the occurrence of the pattern at that index.
Implementation
C++ Code
C++ Code
//C++ code for implementation of Rabin Karp algorithm #include <bits/stdc++.h> using namespace std;
// Function for searching a pattern in a string using Rabin Karp algorithm void rabinKarpSearch(string S, string P) {
// Calculating the length of S and P int Ns = S.length(); int Np = P.length();
// Initialize the value of prime number and mod for calculating hash values int prime = 31; int mod = 1e9 + 9;
// Calculating the power raise to the taken prime vector<long long> p_pow(Ns); p_pow[0] = 1; for (int i = 1; i < Ns; i++) { p_pow[i] = (p_pow[i-1] * prime) % mod; }
vector<long long> h(Ns + 1, 0); for (int i = 0; i < Ns; i++) { h[i+1] = (h[i] + (S[i] - 'a' + 1) * p_pow[i]) % mod; }
// Calculating the hash value of P long long hash_P = 0; for (int i = 0; i < Np; i++) { hash_P = (hash_P + (P[i] - 'a' + 1) * p_pow[i]) % mod; }
/* Now slide the pattern by one character and check for the corresponding hash value to match with the hash value of the given pattern */ for (int i = 0; i + Np - 1 < Ns; i++) { long long curr_hash = (h[i+Np] + mod - h[i]) % mod; if (curr_hash == hash_P * p_pow[i] % mod) cout<<"The given pattern occurs in the given string at index "<<i<<endl; } }
int main() { string S = "cxyzghxyzvjkxyz"; string P = "xyz";
// Call the function for rabin karp algorithm rabinKarpSearch(S,P);
return 0; }
You can also try this code with Online C++ Compiler
The given pattern occurs in the given string at index 1
The given pattern occurs in the given string at index 6
The given pattern occurs in the given string at index 12
“Rabin Karp” algorithm is a string searching algorithm used to find all the occurrences of a given pattern ‘P’’ in a given string ‘S’. You can check out this video for conceptual knowledge and implementation of code.
class RabinKarp: def __init__(self, text, pattern): # The constructor for the class takes in the main string and the target string, # and also sets an arbitrary prime number used for hash calculation. self.text = text self.pattern = pattern self.prime = 101 def search_pattern(self): # This is the main function to search the pattern in the given text string. pattern_length = len(self.pattern) text_length = len(self.text)
# Calculate the hash value for the pattern, and the hash value for the first window of text. pattern_hash = self.create_hash(self.pattern, pattern_length - 1) text_hash = self.create_hash(self.text, pattern_length - 1) for i in range(1, text_length - pattern_length + 2): # If the hash value of the pattern matches the hash value of the current window of text # then only check individual characters for matching. if pattern_hash == text_hash: if self.check_equal(self.text[i - 1:i + pattern_length - 1], self.pattern[0:]): return i - 1
# Calculate hash value for next window of text: Remove leading digit, # add trailing digit from remaining string. if i < text_length - pattern_length + 1: text_hash = self.recalculate_hash(self.text, i - 1, i + pattern_length - 1, text_hash, pattern_length) return -1 def create_hash(self, text, end): # This function calculates the initial rolling hash value. hash = 0 for i in range(end + 1): hash = hash + ord(text[i]) * pow(self.prime, i) return hash def recalculate_hash(self, text, old_index, new_index, old_hash, pattern_length): # This function calculates hash value for next window of text. new_hash = old_hash - ord(text[old_index]) new_hash = new_hash // self.prime new_hash += ord(text[new_index]) * pow(self.prime, pattern_length - 1) return new_hash def check_equal(self, text1, text2): # This function checks if individual characters are equal in case of hash match. if text1 == text2: return True else: return False # Test the code txt = "this is a test text" pat = "test" # Create a RabinKarp object with the text and pattern rabin_karp = RabinKarp(txt, pat) # Search for the pattern in the text start = rabin_karp.search_pattern() if start != -1: print("Pattern found at position: ", start) else: print("Pattern not found")
You can also try this code with Online Python Compiler
import java.math.BigInteger; import java.util.Random; public class RabinKarp { private String pattern; // the pattern string private long patternHash; // hash value of pattern string private int m; // pattern length private long q; // a large prime for computing hash private int R; // radix private long RM; // R^(m-1) % q public RabinKarp(String pattern) { // Save pattern (needed for Las Vegas) this.pattern = pattern; // Choose a large prime number q and a radix R R = 256; m = pattern.length(); q = longRandomPrime(); // Pre-compute R^(m-1) % q for use in removing leading digit RM = 1; for (int i = 1; i <= m - 1; i++) RM = (R * RM) % q; patternHash = hash(pattern, m); } // Compute hash for pattern and initial text window private long hash(String key, int m) { long h = 0; for (int j = 0; j < m; j++) h = (R * h + key.charAt(j)) % q; return h; } // Check for pattern match private boolean check(String txt, int i) { return hash(txt.substring(i, i + m), m) == patternHash; } // Returns a random 31-bit prime number private static long longRandomPrime() { BigInteger prime = BigInteger.probablePrime(31, new Random()); return prime.longValue(); } // Search for the pattern string in the text string public int search(String txt) { int n = txt.length(); if (n < m) return n; long txtHash = hash(txt, m); // Check for match at offset 0 if (patternHash == txtHash && check(txt, 0)) return 0; // Check for hash match; if hash match, check for exact match for (int i = m; i < n; i++) { // Remove leading digit, add trailing digit, and check for match txtHash = (txtHash + q - RM * txt.charAt(i - m) % q) % q; txtHash = (txtHash * R + txt.charAt(i)) % q; // match int offset = i - m + 1; if (patternHash == txtHash && check(txt, offset)) return offset; } // No match found return n; } // Test the code public static void main(String[] args) { String pattern = "test"; String txt = "this is a test text";
RabinKarp searcher = new RabinKarp(pattern); int offset = searcher.search(txt);
if (offset != txt.length()) { System.out.println("Pattern found at position: " + offset); } else { System.out.println("Pattern not found"); } } }
You can also try this code with Online Java Compiler
We will now discuss the time and space complexity of Rabin-Karp Alogrithm:
Time Complexity
In the Rabin Karp algorithm, we have calculated the hash value of the pattern in O(Np) time and traversed the given string for calculating the hash value and comparing the corresponding hash value with that of the pattern in O(Ns) time.
So, the time complexity is O(Ns + Np), where ‘Ns’ and ‘Np’ are the lengths of the given string and pattern respectively.
Space Complexity
We have used constant space. So, the space complexity is O(1).
Rabin-Karp Algorithm Applications
The Rabin-Karp algorithm is a string search algorithm. It efficiently finds occurrences of patterns within a given text. Its primary use is pattern matching and string searching, and it has several advantages that make it a valuable tool in many areas. Here are some of its uses:
Word Processing: Used by search engines and text editors to find and highlight occurrences of keywords and phrases within large bodies of text.
Plagiarism Detection: Used to identify instances of copied content within a document, website, or scholarly article.
Biological Sequence Analysis: Used in bioinformatics to search and match DNA, RNA, or protein sequences in genomic databases.
Data Mining: It is used for pattern matching and similarity search on large datasets.
Computer Security: Implemented in intrusion detection systems and antivirus software to identify and block malicious patterns and signatures.
Compression Algorithm: It is used to search for repeating patterns and substrings and can be compressed more effectively.
Image Processing: It has been adapted for image recognition tasks. B. Find specific patterns in images.
Network Packet Inspection: Used in network security to identify specific patterns or signatures in network packets.
Spelling Correction: Used by spell checking systems to suggest corrections based on similar patterns in the text.
Data Deduplication: Used in data storage systems to eliminate duplicate data and optimize storage capacity.
Advantages and Disadvantages Of Rabin-Karp Algorithm
Advantages
Rabin-Karp algorithm is best suited to find multiple patterns in the same text.
Rabin-Karp algorithm can work with various types of data like common characters in the same input, multiple substrings, etc.
Rabin-Karp algorithm helps in detecting plagiarism for large datasets.
The algorithm can also be used in string-matching questions when used with hash functions.
Disadvantages
The Rabin-Karp algorithm can have the worst time complexity when frequent hash collisions occur. The complexity can go to O(M*N) which is not an optimized complexity when compared with different strung matching algorithms.
Rabin-Karp algorithm uses extra space to store hash value data.
Rabin-Karp algorithm uses predictability of the hash function, which is a security concern.
Cryptographic applications do not prefer to use the Rabin-Karp algorithm, because of safety reasons.
Frequently asked questions
What is the difference between KMP and Rabin-Karp algorithm?
The Knuth-Morris-Pratt (KMP) algorithm uses prefix matching, while the Rabin-Karp algorithm employs hashing. KMP focuses on efficiently comparing characters, whereas Rabin-Karp leverages hashing for pattern matching in strings.
What is Horner's rule in Rabin-Karp algorithm?
Horner's rule, applied in Rabin-Karp algorithm, efficiently calculates hash values of substrings by iteratively updating the hash using a rolling hash function, enabling quick comparison with the target pattern's hash.
How do you use Rabin-Karp algorithm?
The Rabin-Karp algorithm searches for a pattern in a text using hashing. It slides a window along the text, calculates hash values for the pattern and window, and compares them. If hashes match, it checks the substrings for an exact match.
What is the Rabin-Karp algorithm for numbers?
The Rabin-Karp algorithm for numbers operates similarly to text, but uses number sequences instead. It employs rolling hash functions to identify a specific sequence of numbers within a larger numeric dataset.
Conclusion
In this article, we discussed the Rabin-Karp Algorithm for finding a given pattern in a given string, the Java, Python & C++ implementation of the algorithm, and its time and space complexities. If you want to check out more articles and solve similar problems for practice, then you can visit Coding Ninjas Studio