Do you think IIT Guwahati certified course can help you in your career?
No
Introduction
This blog will discuss a Z Algorithm that is used in pattern searching. Pattern searching is a crucial topic and is widely asked in various coding contests and multiple interviews. This blog will show how the Z algorithm is used in pattern searching and the advantage over usual searching.
What is Z algorithm?
This algorithm is used to find all the pattern occurrences in the text. This algorithm is also known as linear time pattern searching algorithm because it searches any pattern in the text in linear time. Let the length of the pattern be ‘M’, and the size of the text is ‘N’, then this algorithm finds whether the pattern is present in the text or not in O(M + N) time complexity. Before discussing the algorithm, let's discuss what the Z array is.
Analysis of Z Algorithm
The Z algorithm is a string-matching algorithm used to find all occurrences of a pattern within a text efficiently. It preprocesses the pattern to construct a Z array, which stores the longest common prefix between the pattern and each suffix of the pattern.
Working of Z Algorithm
Preprocessing: Construct the Z array for the pattern. Initialize Z[0] as 0 and compute Z[i] for each character in the pattern.
Searching: Slide a window of length equal to the pattern size over the text. Compare characters of the text with the corresponding characters of the pattern using the Z array.
Matching: If Z value equals the pattern size at any position, it indicates a match, and the index denotes the starting position of the match.
Examples of Z Algorithm
Consider pattern "abc" and text "ababcabcabababd":
Preprocessing: Z array for pattern "abc" is [0, 0, 0].
Searching: Slide the pattern over the text.
Matching: At index 2, Z value equals 3, indicating a match at position 2 in the text. Repeat until the end of the text for all occurrences.
Z array
For any string ‘S’, the length of the Z array is the same as the length of the string ‘S’. Any element of the array that says Z[i] stores the size of the longest substring starting from S[i], which is also the prefix of ‘S’. The first value of the Z array is meaningless because the complete string is always a prefix of itself. Hence the first value of the Z array is always 0.
Example
The Z array of string “aabaaab” is shown below:
Index
0
1
2
3
4
5
6
string
a
a
b
a
a
a
b
Z value
0
1
0
2
3
1
0
How to search patterns using the Z array?
We can search the pattern by concatenating the pattern string and the text string like “P#T”, where ‘P’ is the pattern string, ‘#’ is any special character that is not present in pattern string as well as a text string and ‘T’ is the text string. After that, generate the Z array of the above string formed. In Z array, if the value of any index is equal to the length of the pattern string, then the pattern is present in the text; otherwise, not.
Example
Suppose pattern string P = “aab” and text string T = “abaaab”.
The concatenated string is “aab#abaaab”. So we need to generate the Z array for the concatenated string. The Z array is shown below:
Index
0
1
2
3
4
5
6
7
8
9
String
a
a
b
#
a
b
a
a
a
b
Z value
0
1
0
0
1
0
2
3
1
0
Here, we have seen that the value of the 7th index is three and the size of the pattern string is also 3. So the pattern is present in the text string.
Algorithm to generate Z array
The algorithm creates an interval [L, R] that represents the substring and the prefix. The algorithm to maintain this interval and compute the Z array is discussed below:
If i > R, no prefix substring starts before the ith index and ends after the ith index. So, in this case, calculate Z[i] by comparing the elements after the ith index with the starting index of concatenated string consecutively and update the value of ‘L’ and ‘R’ when the elements are not equal.
If i <= R, then suppose k= i- L, now there are two cases:
If Z[K] < R-i+1 then there is no prefix substring that starts with S[i] so Z[i] = Z[k].
Else it is possible to extend the [L, R] interval; thus, we will select ‘L’ as ‘i’ and start matching from S[R] onwards and get new ‘R’ then we need to update [L, R] and Z[i] = R-L+1.
Implementation
C++
C++
#include <bits/stdc++.h> using namespace std;
/*Function to check whether pattern is present in text or not.*/ bool check(string pat, vector<int> z) { for (int i = 0; i < z.size(); i++) {
/*If the size of pattern string is equal to the ith index of z- array*/ if (z[i] == pat.size()) return true; } return false; }
/*Function to create Z array.*/ void createZArray(string concatenated, vector<int> &z) {
/*Variables to store left bound and right bound.*/ int left = 0; int right = 0; for (int i = 1; i < concatenated.size(); i++) { if (i > right) { left = right = i; while (right < concatenated.size() and concatenated[right] == concatenated[right - left]) right++; z[i] = right - left; } else if (z[i - left] < right - i + 1) { z[i] = z[i - left]; } else { left = i; while (right < concatenated.size() and concatenated[right - left] == concatenated[right]) right++; z[right] = right - left; right--; } } } int main() {
/*Taking pattern and text string as an input.*/ string pat, text; cin >> text >> pat;
/*creating array to store values of Z array.*/ vector<int> z(text.size() + pat.size() + 1); string concatenated = pat + "#" + text;
Z algorithm is very useful in searching any pattern in the string. It is also useful in getting the number of times any pattern occurs in a string.
Why is the Z algorithm known as a linear search pattern?
Z algorithm is known as linear search pattern because it searches the pattern in linear time. Its time complexity for searching is O(M + N).
Are the KMP algorithm and Z algorithm the same?
No, KMP and Z algorithm is different. KMP algorithm is a little bit hard to implement and understand compared to the Z algorithm.
Is there any difference between the time complexity of the KMP and Z algorithm?
No, both the Z and KMP algorithms have the same time complexity. Both have linear time complexity but the Z algorithm is quite easy to implement.
Is there any difference between the space complexity of the KMP and Z algorithms?
No, both the Z and KMP algorithms have the same space complexity. Both have linear space complexity. Both create an extra array to store the length.
Conclusion
In this blog, we learned about an interesting topic based on a Search pattern. We have discussed the Z algorithm and learned how it is implemented using the Z array. Another topic involving pattern search is: here.