Last Updated: 8 Jul, 2021

Ninja's Frustrating Homework

Ninja
Asked in companies
CodenationIncedo Inc.

Problem statement

Ninja got a summer vacation homework in which he got a booklet containing a very long string and some set of words written in his diary for which he had to search all these words in that string booklet.

His teacher asked him to write all the starting indices for all the words written in the diary after searching that from the string booklet.

Ninja finds this work very frustrating. He tries to find some help from his neighbor and currently, you are the one who is his neighbor.

It is very time consuming to find every word in the string booklet manually. So you decide to design a code for that. Help Ninja!

Note :

Follow 0 based indexing. 
Print the indices in sorted order.

Input Format :

The first line of input contains an integer ‘T’, which denotes the number of test cases. Then each test case follows.

The first line of each test case contains a string ‘S’ representing the string in the booklet.

The second line of each test case contains an integer N denoting the number of words in the diary.

The third line of each test case contains an array of strings of size ‘N’ representing the words in the diary.

Output Format :

The output contains ‘N’ space-separated integers representing starting indices of ‘N’ number of words present in the diary.

Note:

You don't need to print anything, it has already been taken care of. Just implement the given function.

Constraints :

1 <= T <= 10
1 <= |S| <= 1000
1 <= N <= 30

Where |S| denotes the length of the given string ‘S’. 

Approaches

01 Approach

The idea is to slide every word in ‘diary’ over ‘booklet’ string one by one and check for a match. If a match is found, then slide by 1 to check for subsequent matches.

Approach :

  • First store the length of ‘booklet’ in a variable‘bookletLength’.
  • Iterate the loop for all words of ‘diary’ with the help of an iterator pointer ‘i’.
    • Store the length of each word of ‘diary’ in a variable ‘wordLength’.
    • Make an iteration to slide the word of ‘diary’ one by one with the help of an iterator pointer ‘j’ from 0 to ‘bookletLength’ - ‘wordLength’.
      • For the current index ‘j’ check for pattern matching with an iteration from 0 to ‘wordLength’ with an iterator pointer ‘k’.
        • Check if booklet[j + k] is not equal to diary[i][k]. If this condition is true then exit the iteration.
      • Check if k = ‘wordLength’ then push the index ‘i’ into the vector ‘ans’.
  • Return ‘ans’ as the final answer.

02 Approach

Let all the words in the diary represented by an array of strings, say ‘DIARY[]’ and the string given in the booklet is represented by a string, say ‘BOOKLET’.

To implement the Aho - Corosick algorithm, We need to first build a Trie for all the words present in ‘DIARY[]’. After this there will be preprocessing which consists of three following functions:

  1. Go To :   This function simply follows the edges of the Trie of all words in DIARY[]. It is represented as a 2D array ‘GOTO[][]’ where we store the next state for the current state and character.
  2. Failure : This function stores all edges that are followed when the current character doesn't have an edge in Trie.  It is represented as a 1D array ‘FAILURE[]’ where we store the next state for the current state.
  3. Output :  Stores indexes of all words that end in the current state. It is represented as a 1D array ‘OUTPUT[]’ where we store indices of all matching words as a bitmap for the current state.

Building Trie for all words [“a”, “cab”, “abca”]

This part fills entries in goto GOTO[][] and output OUTPUT[].

Next we extend Trie into an automaton to support linear time matching.

Blue Arrows represent failed transactions and green arrows represent Go To Transactions.

This part fills entries in failure ‘fail[]’ and output ‘output[]’.

Go to : 

We build Trie. And for all characters which don’t have an edge at root, we add an edge back to root.

Failure : 

For a state s, we find the longest proper suffix which is a proper prefix of some pattern. This is done using Breadth First Traversal of Trie.

Output : 

For a state s, indexes of all words ending at s are stored. These indexes are stored as bitwise maps (by doing bitwise OR of values). This is also computing using Breadth First Traversal with Failure.

Approach :

  • First make a ‘buildMachine()’ function which will take an array of string ‘diary[]’ and size of array ‘N’ as its arguments.
  • For Go To function: Initialize all values in the output functions ‘out[]’ with 0 and Go To function ‘goTo[][]’ with -1.
    • Initialize a variable ‘state’ to 1 which will store the states as initially we have the 0 state.
    • Now construct the values for ‘goTo[][]’ function, the procedure for which will be the same as that for building a Trie for ‘diary[]’.
    • For that, iterate for the values of ‘diary[]’ with the help of iterator pointer ‘j’.
    • Store the value of ‘diary[j]’ in a string, say ‘word’ and initialize a variable, say ‘currentState’ to 0 which will represent the current state.
    • Now store all the characters of ‘word’ in ‘goTo[][]’ .
    • For this, iterate over all the characters of ‘word’ with the help of iterator pointer ‘i’.
      • Store the character of ‘word’ in the variable, say ‘character’.
      • Now check if nodes for ‘character’ exist or not. If the node does not exist in ‘goTo[][]’ then allocate a new node.
      • Make ‘currentState’ to goTo[currentState][character].
      • Now add current word in output function, ‘output[]’ as output[currentState] = output[currentState] | 1 << i.
    • For all the characters which don't have an edge from the root or state 0 in Trie, then add a ‘goTo[][]’ edge to state 0 itself.
    • For this, iterate over the size of ‘goTo[][]’ with an iterator pointer ‘i’.
      • if goTo[0][i] == -1 set goTo[0][i] = 0.
    • For Fail function:
      • Initialize all values in ‘fail[]’ to -1.
      • Failure function will be calculated using Breadth First Search using a queue.
      • For that make a queue of integer type, say ‘q’.
      • Iterate over every possible input with the help of iterator pointer ‘i’ .
        • As all nodes with depth = 1 have failure function value 0. So set the function value for them as 0 (fail[goTo[0][i]] = 0).
        • Push that ‘goTo[0][i]’ value in the queue.
      • Iterate while q does not become empty.
        • Store the value of q’s front into a variable, say ’state’ and pop the element.
        • Now iterate over all the possible values for ‘goTo[][]’ with the iterator pointer ‘i’.
          • If the ‘goTo’ function is defined for character 'i' and 'state' i.e. if goTo[state][i] != -1 then :
            • Now, find the failure state of the removed state by storing ‘fail[state]’ in a variable, say ‘failure’. Find the deepest node labeled by proper suffix of string from root to current state and store the value in ‘failure’.
            • Now store the value stored in ‘failure’ in failure function ‘fail[]’ as fail[goTo[state][i]] = failure.
            • Merge the output values as out[goTo[state][i]] = out[goTo[state][i]] | out[failure].
            • Now push the next value node in the queue ‘q’.
      • Return ‘states’ to the function.
  • Now make a function, say ‘nextState()’ taking the integer variable ‘currentState’ denoting the current state of the machine. (Must be between 0 and the number of states - 1, inclusive) and character variable ‘nextInput’ denoting the next character that enters into the machine as arguments to return the next state the machine will transition to using goto and failure functions.
  • Inside the ‘nextState()’ function:
    • Store the ‘currentState’ in a variable, say ‘answer’ and ‘nextInput’ as an integer in a variable, say ‘character’.
    • If ‘goTo [currentState][character]’ is not defined then use the failure function.
    • Return goTo[currentState][character] to the function.
  • Inside the given function:
    • Call the function ‘matchingMachine()’.
    • Initialize the variable ‘currentState’ to 0.
    • Traverse the ‘booklet’ through the built machine to find all occurrences of words in ‘diary[]’.
    • For this, iterate throughout the ‘booklet’ with the help of iterator pointer ‘i’.
      • Call the function ‘nextState()’ passing ‘currentState’ and ‘booklet[i]’ as arguments and store the output in ‘currentState’.
      • If a match is not found then move to the next state.
      • if (out[currentState] == 0) continue.
      • If a match is found then push the starting and ending indices of matching words from ‘diary’ using the output function.
      • For this, iterate over all values ‘diary’ with the help of iterator pointer ‘j’.
        • if (out[currentState] & (1 << j))
        • ans.push_back(i - diary[j].size() + 1); For starting index.
  • Return the array ‘ans’ as the final answer.