Last Updated: 28 Jan, 2021

Z Algorithm

Hard
Asked in companies
BarclaysFlipkart limited

Problem statement

You’re given a string S of length N and a string P of length M. Your task is to find the number of occurrences of P in S in linear time.

For example:
If S = "ababa", and P = "ab", then "ab" occurs twice in "ababa".
Note:
The string only consists of lowercase English alphabets.
Input Format:
The first line of input contains T, the number of test cases.

The first line of each test case contains two integers N and M, the length of string S and P respectively.

The second line of each test case contains the string S.

The third line of each test case contains the string P.
Output Format:
The only line of output of each test case should contain an integer denoting the number of occurrences of P in S.
Note:
You don’t need to print anything. It has already been taken care of. Just implement the given function.
Constraints:
1 <= T <= 100
1 <= N, M <= 10^4

Time Limit: 1 sec

Approaches

01 Approach

A basic approach is to check all substrings of string ‘S’ and see if that substring is equal to ‘P’ or not. Let’s see how we’ll do this.

 

  • Initialize an integer variable count = 0.
  • Start by running an outer loop(loop variable i) from 0 to N - M.
  • Then run a nested loop(loop variable j) from 0 to 'M'.
  • For index ‘i’, check if S[i + j] != P[j], if this condition is true, we simply break the inner loop and move to the next i. We do this to check every substring of length ‘M’ if it is equal to 'P' or not.
  • Later on, we check if ‘j’ is equal to 'M' or not, if this condition is true, we increment our count by 1 because we have found a substring of length M which is equal to 'P'.
  • Finally, we return the count which is our final answer.

 

For example-  String S = ababa and String P = ab

Here N = 5 and M = 2

 

  • We first run a loop from 0 to 3(N - M) with loop variable ‘i’.
  • In this loop, we run another loop with a loop variable ‘j’. First, we see if S[i + j] = P[j] or not, and since it is true(S[0] = P[0]), we move to the next ‘j’ and check if S[1] = P[1] or not and since it is also true we move further.
  • But now on moving further ‘j’ becomes 2 and we exit this inner loop but we see that since j == M(2), we increment our count by 1. Now we move further to our next ‘i’ and see if S[1 + 0] = P[0] or not and since this condition doesn’t hold true, we simply break the inner loop and move further and see if S[2 + 0] = P[0] is true or not and this is true since both the characters are ‘a’, then we check if S[2 + 1] = P[1] and again we see that this condition is true and again j = 2 and we increment our count once again by 1.
  • Now for the last ‘i’ we check if S[3 + 0] = P[0] or not and since this condition isn’t true we come out of this nested loop. We also see that the outer loop variable has also reached N - M(3), therefore we exit this loop also and return our final count which is 2.

02 Approach

As we are supposed to find the number of occurrences of 'P' in ‘S’ in linear time. We can do this by maintaining a Z array. The Z array for any string 'S' of length 'N' is an array of size 'N' where the ith element is equal to the greatest number of characters starting from the position ‘i’ that coincide with the first characters of ‘S’.

 

The idea here is to concatenate both 'S' and 'P' and make a string “P#S”, where ‘#’ is a special character that we are using. Now, for this concatenated string we will make a Z array. In this Z array, if the value at any index(z[i]) is equal to ‘M', we increment our answer(initially=0) by 1 because this indicates that ‘P’ is present at that index.

 

Let’s understand how we will reach our final answer using the following points. 

 

  • Start by making a concatenated string ‘C’, where C = P + # + S and initialize an integer variable count=0.
  • Make an array(Z array) of size 'K', where 'K' is the length of string ‘C’.
  • Now initialize two integer values L=0 and R=0. We will be using these variables to create an interval to check if the string inside this interval matches 'P' or not.
  • Now run a loop(loop variable i) from 1 till K and check if i > ‘R’ or not.
    • If i > 'R'. This means that there is no substring that is starting before i and ending after i. So we reset the values of ‘L’ and ‘R’ and calculate this new interval of [L, R] by using the Z array method where we use a while loop and we simply check if C[R - L] = C[R] and ‘R’ < ‘K’(to check if we are still inside the string C), and till the time both of these conditions are wet me simply increment our 'R' by 1 and finally obtain the Z array element(z[i]) by using z[i] = R - L and then we decrease ‘R’ by 1.
    • If i <= 'R'. Start by making another integer variable pos = i - ‘L’.
      • We first check if z[pos] is less than the remaining interval(R - i + 1) or not. If it is so then we simply make z[i] = z[pos] because this means that there is no prefix substring that starts at C[i] because if it was so then z[pos] would have been larger than the remaining interval(R-i+1).
      • If z[pos] was greater than or equal to the remaining interval (R-i+1). This condition implies that we can still extend our [L, R] interval. To do so we set L = i and from there starting with ‘R’ we check manually as we did in the case where i > 'R' where we calculated z[i] = R-L and then we decrease 'R' by 1.
  • Since our Z array is created, we just traverse our array once and check if the current value of the array is equal to 'M' or not. If it is equal to 'M', we increment the count by 1.
  • Finally, we return count, which is our final answer.

 

For example- String S = abba and String P = ab

Here N = 4 and M = 2

 

  • We start by making a string C = ab + “#” + abba = ab#abba
  • Make a Z array, whose size is K, where K is the length of the string C(here, K = 7).
  • Make two integer variables L = 0 and R = 0.
  • Run a loop from 1 till K(loop variable i) and start by checking if i less R or not.
  • Since our i > R we reset L = i and R = i. While R < K(7) and C[R-L] = C[R] we keep incrementing R and after coming out of this while loop we calculate z[1] and since this while loop is never executed because of C[0] != C[1]. Hence,  z[1] = R - L = 1 - 1 = 0.
  • Next for index 2 we see that i > R just like index 1.  Here also since C[0] != C[2] we find z[2] = R - L = 2 - 2 = 0
  • For index 3 still i < R. But here we see that C[0] = C[3], hence we increment R by 1 and still C[1] = C[4], hence we again increment by 1 and finally since C[2] != C[5] we get out of the while loop and calculate z[3] = R - L = 5 - 3 = 2 and change R to 4.
  • For index 4 we see that still i >= R. We make another variable pos = 4 - 3 = 1. And since  z[1] < R - i + 1, z[4] = z[1] = 0.
  • Similarly for the next index also z[5] = 0.
  • For the final index z[6]. Since our i >= R. We make another variable pos = 6 - 5 = 1. And since  z[1] < R - i + 1, z[6] = z[1] = 0.
  • Our final Z array is [0, 0, 0, 2, 0, 0, 1] and in this array, there is only one element equal to M(2), which is our answer.