Boyer Moore Algorithm For Pattern Searching

Moderate
0/80
Average time to solve is 30m
profile
Contributed by
5 upvotes
Asked in companies
GoogleGoldman SachsRxLogix

Problem statement

You are given a string ‘text’ and a string ‘pattern’, your task is to find all occurrences of pattern in the string ‘text’ and return an array of indexes of all those occurrences. You may assume that the length of ‘text’ is always greater than the length of ‘pattern’.

For example :
Let text = “this is a good place to have a good start”, pattern = “good” so you have to return {10, 31} because at 10 and 31 index pattern is present in the text.
Note :
If there is no such index in the text then just return an array containing -1.
Detailed explanation ( Input/output format, Notes, Images )
Input format :
The first line of input contains an integer ‘T’ denoting the number of test cases.

The first line of each test case contains a string ‘text’ in which the pattern will be searched for.

The second line of each test case contains a string ‘pattern’ of which you have to find all occurrences in ’text’.
Output format :
For each test case, return an array representing the indices where the pattern is found in the text. The output of each test case is printed in a separate line. 
Note :
You don’t have to print anything; it has already been taken care of. Just implement the given function. 
Constraints :
1 <= T <= 5
1 <= M < 100
1 <= N < M

Time Limit: 1 sec
Sample Input 1 :
2
XYXXXYZW
XYZ
ABAAAABAACD
ABA
Sample Output 1 :
4
0 5   
Explanation For Sample Input 1 :
For the first test case:
The given text is  “XYXXXYZW” and we have to find the pattern “XYZ” as we can clearly see that this pattern only exists once in the text at index 4 to index 6 so we will return an array containing just a single element that is 4. 

For the second test case:
The given text is  “ABAAAABAACD” and we have to find the pattern “ABA” as we can clearly see that this pattern exists at two places in the given text that is from index 0 to 2 and from index 5 to 7, so clearly we will return both the starting indices where the pattern is found therefore we will return 0 and 5.

Sample Input 2 :

3
DDDVVVBHHUT
HUT
SSYTPQSSYGVBFSSY
SSY
PABCDABCDABCD
ABCDE    

Sample Output 2 :

8
0 6 13
-1
Approaches (2)
Bad Character Heuristic

The idea is to find out which character mismatches in the text from the pattern that is the character of the text that doesn't match with the current character of the pattern and we call that character a bad character.

 

  • One thing here needed to be observed is that unlike other algorithms for pattern searching boyer-Moore algorithm starts matching from the last character of the pattern.
  • Therefore on finding a mismatch seen from the last, we will shift the pattern until the mismatch becomes a match and the pattern moves past the mismatched character i.e. pattern crosses the bad character. We will have two cases here as discussed-
  • While implementing this we preprocess the pattern and store the last occurrence of every possible character in an array of size equal to alphabet size.
  • In the first case when the mismatch becomes a match through shifting, We will find the position of the last occurrence of the mismatching character in the pattern and if the mismatching character exists in the pattern then we’ll shift the pattern such that it gets aligned to the mismatching character in the given text. For example ‘text’ = “ABCAFDBACA” and ‘pattern’ = “BACA”.
    • We can clearly see that we have our first mismatch at position 1 and the mismatched character is “A” so we will find the last occurrence of “A” in a pattern that is present at position 1 as we check from last.
    • Now we will shift the pattern two times so that “A” in the pattern gets aligned with the “A” in text.
  • Now for the second case where we have to move the pattern past the mismatched character because if the bad character does not exist in the pattern there is no point in searching again so we move the pattern past it like in the above example after moving the pattern two times ahead we get a mismatch at position 4 and the mismatching character “F” does not exist in the pattern before position 4 so we’ll shift pattern past to the position 4 and eventually in above example we have got a perfect match of pattern.
Time Complexity

O(M *  N), where, M denotes the length of the string text and N denotes the length of string pattern.

 

If the string text has all the characters of string pattern for example ‘text’ = “XXXXXXXX” and ‘pattern’ = “XXXX” here all the characters of pattern matches text at every position that means at each sliding we will get a match therefore we will repeat the search for each index in pattern as well as in text so this will be the worst case and we will search each character at every shift of pattern so we will get a complexity of O(M * N).

 

One thing to be noted here that if the character is not in the whole string text then it will result in the jump of length N every time so the best case time complexity will be O(M / N).

Space Complexity

O(M + N), where ‘M’ denotes the length of the string text and N denotes the length of string pattern.

 

Since we are using an extra character array to store all the elements of the text and the pattern.

Code Solution
(100% EXP penalty)
Boyer Moore Algorithm For Pattern Searching
Full screen
Console