Last Updated: 28 Jan, 2021

Find Pattern in String - KMP Algorithm

Easy
Asked in companies
MicrosoftMakeMyTripWalmart

Problem statement

You are given two strings 'S' and 'P' consisting of lowercase English alphabets. Your task is to find whether the 'P' is present in 'S' as a substring or not.

Note
1. There may be more than one occurrence of 'P' in 'S'.
2. Some alphabets in the strings may be repeated.
Input Format:
The first line of input contains a single integer 'T', representing the number of test cases

Then the 'T' test cases follow.

The first line of each test case contains two space-separated strings 'P' and 'S' respectively.
Output Format:
For each test case, print a single line containing “YES” if string 'P' is present in string 'S' as a substring, otherwise print “NO”.

The output for each test case will be printed in a separate line.

Note

You don’t have to print anything, it has already been taken care of. Just implement the given function. 
Constraints:
1 <= T <= 100
1 <= |S| <= 10000
1 <= |P| < |S|

Where |S| and |P| represents the length of the string 'S' and 'P' respectively.

Time limit: 1 sec.

Approaches

01 Approach

The idea is to generate all the substrings of ‘S’ of size equal to the size of ‘P’ and match each of them with the ‘P’.

 

  • Let ‘M’ is the size of ‘P’ and ‘N’ is the size of ‘S’.
  • Iterate over S[i] for each 0 <= i <= N - M and do:
    • Iterate over P[j] for each 0 <= j < M and do:
      • If P[j] == S[i+j] then do:
        • Continue iterating
      • If P[j] is not equal to S[j] then do:
        • Exit from the inner loop.
    • If j == i + M then it means we have matched the substring with P so:
      • Return TRUE
  • Return FALSE(if we still have not matched a substring).

02 Approach

The basic idea of KMP is to generate and use LPS array here LPS means longest proper prefix which is also a suffix. For example, if a string is ‘aada’ then :- 

Proper prefixes are :[a, aa, aad] 

Proper suffix are :[a,da,ada].

The LPS array helps us to skip characters while matching.

Basically this algorithm works in two parts where the first part is to generate the LPS array and the second part is to use this array to skip characters while matching P with S.

 

Part 1: Generate the LPS array.

  • Let ‘M’ is the size of ‘P’ and ‘N’ is the size of ‘S’.
  • Initialize an integer array LPS, initialized to zero.
  • Initialize an integer variable LEN=0 which is the length of the previous longest prefix suffix.
  • Initialize an integer variable IDX=0, to iterate over ‘P’.
  • Iterate while IDX<M :
    • If P[IDX]==P[LEN] then do:
      • Increase LEN by 1
      • LPS[IDX]=LEN
      • Increase IDX by 1
    • Else :
      • If LEN is equal to zero then do:
        • LPS[IDX]=0
        • Increase IDX by 1.
      • Else
        • LEN = LPS[IDX-1]

Part 2: Using the LPS array to find ‘P’ in ‘S’ :

  • Initialize an integer variable IDX1=0, to iterate over P.
  • Initialize an integer variable IDX2=0, to iterate over S.
  • Iterate while IDX2<N (iterating over the whole S):
    • If we found a match or P[IDX1] == S[IDX2] then do :
      • Increase IDX1 by 1.
      • Increase IDX2 by 1.
      • If IDX2 == M then we can say we have found ‘P’ so do:
        • Return TRUE.
    • If IDX1 is less than N and P[IDX1] != S[IDX2](found mismatch) then do:
      • If IDX2!=0 then do (Here comes the role of LPS we will simply skip LPS[IDX2-1] characters):
        • IDX2 = LPS[IDX2-1]
      • Otherwise do :
        • Increase IDX1 by 1.
  • Return FALSE(if we still have not matched a substring).