


Input: ‘text’ = “cxyzghxyzvjkxyz” and ‘pattern’ = “xyz”
Output: Occurrences = [2, 7, 13]
Explanation: The pattern ‘pattern’ = “xyz” appears at 3 positions in ‘text’.
The first line contains the string ‘text’.
The second line contains the string ‘pattern’.
The first line contains an integer ‘count’, the number of occurrences.
The first line contains ‘count’ integers, the positions of occurrence of ‘pattern’ in ‘text’ in sorted order.
You need not print anything. Return the list of positions in any order. They will be sorted and printed.
Check each substring in ‘text’ of length ‘m’ and see if they are equal to ‘pattern’.
If we find a match we add the current index (1 - based) in our list ‘ans’.
The basic idea of the Knuth-Morris-Pratt algorithm is that in case of a mismatch, we need not check the entire ‘pattern’ string from the beginning. For this, we will create an LPS array of ‘pattern’.
LPS(Longest Prefix Suffix) is an array data structure that captures the length of the longest proper prefix which is also a proper suffix for every substring. Proper prefix means a prefix that is not equal to the actual string.
For example: Let ‘pattern’ = “abaabc”
For “a”, there is no proper prefix. Therefore LPS of “a” is 0.
For “ab”, proper prefix = [“a”] and proper suffix = [“b”]. Therefore LPS of “ab” is also 0.
For “aba”, proper prefix = [“a”, “ab”] and proper suffix = [“a”, “ba”]. Therefore LPS of “aba” is 1 because “a” is common.
For “abaa”, proper prefix = [“a”, “ab”, “aba”] and proper suffix = [“a”, “aa”, “baa”]. Therefore LPS of “abaa” is also 1.
For “abaab”, proper prefix = [“a”, “ab”, “aba”, “abaa”] and proper suffix = [“b”, “ab”, “aab”, “baab”]. Therefore LPS of “abaab” is 2 because “ab” is the longest common prefix that is also suffix.
For “abaabc”, proper prefix = [“a”, “ab”, “aba”, “abaa”, “abaab”] and proper suffix = [“c”, “bc”, “abc”, “aabc”, “baabc”]. Therefore LPS of “abaabc” is 0.
So our final lps array is: [0, 0, 1, 1, 2, 0].
Now when we will compare our strings, whenever we find a mismatch, instead of assigning the current variable pointer of ‘pattern’ as 0, we will assign it to the previous lps value, if possible. We will never decrement the variable pointer of ‘text’, so effectively the time complexity of this algorithm is almost linear.