Last Updated: 4 Feb, 2021

Search Pattern (Rabin-Karp Algorithm)

Moderate
Asked in companies
Celebal TechnologiesNirmataHexaview Technologies

Problem statement

You’re given two strings, 'text' of length 'n' and 'pattern' of length 'm', consisting of lowercase characters.


Find all the occurrences of the string ‘pattern’ in ‘text’.


For each occurrence, print the index from where it starts in the string ‘text’ (1 - indexed).


Example:
Input: ‘text’ = “cxyzghxyzvjkxyz” and ‘pattern’ = “xyz”

Output: 2 7 13

Explanation: The pattern ‘pattern’ = “xyz” appears at 3 positions in ‘text’.
Input Format:
The first line contains the string ‘text’.
The second line contains the string ‘pattern’.


Output format:
The first line contains an integer ‘count’, the number of occurrences.
The first line contains ‘count’ integers, the positions of occurrence of ‘pattern’ in ‘text’ in sorted order.


Note:
You need not print anything. Return the list of positions in any order. They will be sorted and printed.

Approaches

01 Approach

Check each substring in ‘text’ of length ‘m’ and see if they are equal to ‘pattern’.

If we find a match we add the current index (1 - based) in our list ‘ans’.

The steps are as follows:

List stringMatch(string ‘text’, string ‘pattern’)

  1. Let ‘n’ = ‘text.length’, ‘m’ = ‘pattern.length’
  2. Let ‘ans’ be a list storing the indices of occurrences.
  3. For ‘i’ in the range from 0 to ‘n’ - ‘m’ (inclusive):
    • Check whether the substring of ‘text’ from index ‘i’ to ‘i’ + ‘m’ is equal to ‘pattern’ or not.
    • If the substring is equal to ‘pattern’:
      • Add ‘i’ + 1 in ‘ans’.
  4. Return ‘ans’.

02 Approach

Rabin-Karp algorithm is a pattern-matching algorithm that uses rolling hash. In the naive approach, we compare each substring of length equal to that of the pattern string. But we can improve it by comparing the hash value of the substring and then comparing the substring only if the hash value matches.

The hash function we are using calculates the hash value in the following way:

Since we are dealing only with lowercase English letters, we have at most 26 different characters. So we assign a value from 0 to 25 to each character. Let the values assigned be:

‘a’ = 0

‘b’ = 1

‘c’ = 2

‘z’ = 25

As we can see, the hash code for each character = Their ASCII code - 97 (ASCII code of ‘a’)

If the string is “code”, its hash value = 2 * 26^3 + 14 * 26^2 + 3 * 26^1 + 4 * 26^0 = 44698.

In the case of longer strings, hash values can be bigger, so we will use modular arithmetic.

For rolling hash function:

If our string is “codes” and we are considering substrings of length 4:

The hash value of:

“code” = 2 * 26^3 + 14 * 26^2 + 3 * 26^1 + 4 * 26^0 = 44698

“odes” = 14 * 26^3 + 3 * 26^2 + 4 * 26^1 + 18 * 26^0 = 248214

As we can see, the difference between them is:

  • The first character is removed.
  • The last character is added.
  • All other characters have 1 extra power of 26.

So the hash value of “odes” = (Hash value of “code” - 2 * 26^3) * 26 + 18

Generalizing this idea, the hash value of the current substring = (Hash value of previous substring - Hash code of the first character in previous substring * 26^(length of substring - 1)) * 26 + Hash code of last character in the current substring

Our algorithm computes the hash value of each substring of ‘text’ of length equal to that of ‘pattern’ and compares it to the hash value of ‘pattern’. If both the hash values are equal, then it does a full match at that particular index.

The steps are as follows (please note that all arithmetic operations are modulo 10^9 + 7):

Int searchPatternRabinKarp(string ‘text’, string ‘pattern’)

  1. Let ‘n’ = ‘text.length’, ‘m’ = ‘pattern.length’
  2. Let ‘ans’ be an empty list of integers.
  3. Let ‘maxPow’ = 26 ^ (‘m’ - 1)
  4. Let ‘hashPattern’ = 0, ‘hashText’ = 0.
  5. For ‘i’ in the range from 0 to ‘m’ - 1 (both inclusive):
    • ‘hashPattern’ = ‘hashPattern’ * 26 + (‘pattern[i]’ - 97)
  6. For ‘i’ in the range from 0 to ‘m’ - 2:
    • ‘hashText’ = ‘hashText’ * 26 + (‘text[i]’ - 97)
  7. For ‘i’ in the range from 0 to ‘n’ - ‘m’:
    • ‘hashText’ = ‘hashText’ * 26 + (‘text[i + ‘pattern.length’ - 1]’ - 97)
    • If the hash values ‘hashText’ and ‘hashPattern’ are equal:
      • Check whether the substring of ‘text’ from index ‘i’ to ‘i’ + ‘m’ is equal to ‘pattern’ or not.
      • If the substring is equal to ‘pattern’:
        • Add ‘i’ + 1 in ‘ans’.
    • ‘hashText’ = ‘hashText’ - ‘maxPow’ * (‘text[i]’ - 97)
  8. Return ‘ans’.