Code360 powered by Coding Ninjas X Naukri.com. Code360 powered by Coding Ninjas X Naukri.com
Table of contents
1.
Introduction
2.
Problem Statement
3.
Naive Approach
3.1.
Code in C++
4.
Efficient Approach
4.1.
Z-Algorithm
5.
Code in C++
6.
Frequently Asked Questions
7.
Key Takeaways
Last Updated: Mar 27, 2024

Find the indices of all the occurrences of a pattern in a string

Author Gaurish Anand
0 upvote
Master Python: Predicting weather forecasts
Speaker
Ashwin Goyal
Product Manager @

Introduction

Problems related to substring and subsequences are commonly asked in product-based companies like Amazon, Microsoft, and Adobe. In this article, we will discuss one of those problems. 

Before beginning with this question, let’s recap what a substring of a string is. A substring is a contiguous part of a string placed between two specified indices. For example, non-empty substrings of the string “abc” are “a”, “b”, “c”, ”ab”, “bc”, and “abc”.  

Problem Statement

Given a string S and a pattern P, find the indices of all the occurrences of P in the string S. Example: 

Input: S = “abaxabab”     P = “ba” 
Output: 1 5
Explanation: We have the pattern P = “ba” starting at the index 1 and 5 in the string S = “abaxabab”.
 

Input: S = “xabcabzabc”     P = “abc” 
Output: 1 7
Explanation: We have the pattern P = “abc” starting at the index 1 and 7 in the string S = “xabcabzabc”.

Get the tech career you deserve, faster!
Connect with our expert counsellors to understand how to hack your way to success
User rating 4.7/5
1:1 doubt support
95% placement record
Akash Pal
Senior Software Engineer
326% Hike After Job Bootcamp
Himanshu Gusain
Programmer Analyst
32 LPA After Job Bootcamp
After Job
Bootcamp

Naive Approach

The naive approach is to check for each substring of size equal to that of the pattern if that is equal to pattern S or not. Therefore, if the size of the string S is s and the pattern P is p, the time complexity of this approach will be O(p*s). 

Code in C++

#include<iostream>
#include<string>
using namespace std;

void findPattern(string S,string P){
	for(int i=0;i<S.size();i++){
		bool flag = true;
		for(int j=0;j<P.size();j++){
			if(S[i+j]!=P[j]){
				flag = false;
				break;
			}
		}
		if(flag){
			cout<<i<<" ";
		}
	}
}

int main(){
	string S = "abaxabab";
	string P = "ba";
	cout<<"Pattern "<<P<<" occurs at the indices: ";
	findPattern(S,P);
}


Output

Pattern ba occurs at the indices: 1 5 

Efficient Approach

We will solve this using the Z-algorithm where time and space complexity is O(s+p) where s = size of the given string and p is the size of the pattern. 

Note - We can solve it using KMP with the same time complexity. But since Z-algorithm is easier to understand and implement, we will solve this problem using Z-algorithm in this article. To learn more about the KMP Algorithm, you can go through this article. 

Z-Algorithm

  1. In this algorithm, we will be constructing a Z-array. 
    What is a Z-Array?
    Z[i] in Z-array will contain the length of the longest substring starting from the index i, which is also the prefix of the string. For example, Let’s construct the Z-array for the string abaxabab”.

    The explanation for some indices  - 
    1. We have Z[4] = 3 because we have a substring of size 3 (“aba”) starting at the 4th index, which is also a prefix of the string (abaxabab). 
    2. We have Z[6] = 2 because we have a substring of size 2 (“ab”) starting at the 6th index, which is also a prefix of the string (abaxabab). 
       
  2. How is the Z-Array constructed for a string S?
    We can construct the Z-array in linear time complexity. The main logic is to maintain an interval [L, R] with maximum R such that the substring S[L … R] is also the prefix of the string S. 

    Note - Substrings that are prefix too are also known as prefix substrings.

    Now, let’s assume we have already computed all the Z-values till the (i-1)th index, and we know the correct interval [L, R] for the index (i-1). We will maintain [L, R] and find Z[i] by the following steps: 
    1. Case 1: (i>R)
      It means no prefix-substring exists before i that ends at i or after i. So we now compute the new [L, R] by comparing S[0 ..] to S[i..]. Therefore, in this case Z[i] = R-L+1.
       
    2. Case 2: (i<R) 
      We know S[0…(R-L)] matches with S[L...R] where i<R. Therefore, from this, we can also conclude that S[i..R] matches with S[(i-L)..(R-L)]. Let K = i-L. Now there are 2 cases - 
      1. Z[K] < (R-i+1) - It means there is no prefix substring starting at the index i which would extend beyond R. So Z[i] = Z[k].
      2. Z[K] > (R-i+1) - It means we have prefix substring starting at the index i, which can go beyond the index > R. Thus, we need to update [L, R] by updating L=i and match from R+1 onwards.
         
  3. How is Z-Array useful in the searching of patterns? 
    We will be constructing a Z-array for the string M = Pattern + $ + String. In the Z-array, if the value at any index is equal to the length of the pattern, then we know that we have an occurrence of the pattern starting at the (index - pattern.size() - 1) in the original string. For example, let’s construct the Z-array for the string M = “ba” + “$” + “abaxabab” where S = “abaxabab” and P = “ba”.

    Since Z[4] and Z[8] equals 2, the pattern occurs at the (4-2-1)th and (8-2-1)th index. 

Code in C++

#include<iostream>
#include<string>
#include<vector>
using namespace std;

void updateInterval(int& L,int& R,string& M){
	while(R<M.size() && M[R]==M[R-L]){
		R++;
	}
	R--;
}

vector<int> getZArray(string M){
	vector<int> zArray(M.size(),0);
	int L = 0,R = 0;
	for(int i=1;i<M.size();i++){
		if(i>R){
			L = R = i;
			updateInterval(L,R,M);
			zArray[i] = R-L+1;
		}else{
			int K = i - L;
			if(zArray[K]<(R-i+1)){
				zArray[i] = zArray[K];
			}else{
				L = K;
				updateInterval(L,R,M);
				zArray[i] = R-L+1;
			}
		}
	}
	return zArray;
}

void findPattern(string S,string P){
	string M = P + "$" + S;
	vector<int> zArray = getZArray(M);
	for(int i=P.size();i<zArray.size();i++){
		if(zArray[i]==P.size()){
			// pattern occurs at the current index
			cout<<(i-P.size()-1)<<" ";
		}
	}
}
int main(){
	string S = "abaxabab";
	string P = "ba";
	cout<<"Pattern "<<P<<" occurs at the indices: ";
	findPattern(S,P);	
}


Output

Pattern ba occurs at the indices: 1 5 

Also check out - Substr C++

Frequently Asked Questions

1.What is the difference between a subsequence and a subarray (or substring in the case of a string)?

A subsequence is formed by deriving 0 or more elements from a sequence without changing the relative order of the elements in the original sequence. In comparison, a subarray is a contiguous part of a sequence. Therefore all the subarray will be subsequences too of a sequence, but vice versa is not true. For example: [1,2,4] is a subsequence for the sequence [1,2,3,4], but it is not the subarray.

2.What is the time complexity of the Z-Algorithm to search the pattern indices?

The time complexity is O(s + p), where s = size of the string and p = size of the pattern.

3.Where is the Z-algorithm used?

This algorithm is used for string searching in Gmail, Microsoft Word, Google Docs, Google Sheets, browsers, and other text editors and databases.

4.Are there more problems on Coding Ninjas Studio that deal with Data Structures and Algorithms?

Yes, Coding Ninjas Studio allows you to practice coding and offers frequently asked interview questions. The more we practice, the more our chances of landing a job at our ideal organization.

Key Takeaways

In this article, we learned to find the indices of all pattern occurrences in a string in the most efficient way using the Z-Algorithm. Since questions related to strings are frequently asked in interviews, we recommend you practice more problems based on strings on Coding Ninjas Studio.

Recommended Problems:

Are you planning to ace the interviews of reputed product-based companies like Amazon, Google, Microsoft, and more? Attempt our Online Mock Test Series on Coding Ninjas Studio now!

Previous article
Suffix Trees (Implementation - Brute Force)
Next article
Longest Palindromic Substring using a suffix tree
Live masterclass