## Introduction

The Aho-Corasick technique may be used to search for various patterns in a vast blob of text, making it a valuable tool in data science and many other fields. Alfred Aho and Margaret Corasick proposed this algorithm.

Letâ€™s Dig Deeper into the Algorithm Itself.

### Example

Given an input text and an array containing some words, arr[], find occurrences of all words in the input text.

#### Input

```
text = "codingninjas"
arr[] = {"coding", "ninja", "as", "ding"}
```

#### Output

```
Word coding appears from 0 to 5
Word ding appears from 2 to 5
Word ninja appears from 6 to 10
Word as appears from 10 to 11
```

## Algorithm

We can solve this problem efficiently by using a Tree-like data structure to store our strings called â€˜Trieâ€™. Trie can efficiently store multiple strings and make string matching accessible. If you want to know more about Tries and their implementation, look at __Using Trie in Data Structures__. But here is a catch. We easily match patterns using a trie, but it will cost us a significant time and space complexity. Building a Trie takes linear time on average. For the above-given parameters, time complexity using a simple trie is O(k_{max}*n) time complexity, where kmax is the maximum pattern length, which can improve this complexity.

**C++ implementation**

```
/* C++ program for implementation of Aho-Corasick algorithm */
#include <bits/stdc++.h>
using namespace std;
/* Maximum number of characters in input alphabet */
const int MAXC = 26;
/* Maximum number of states in the matching machine, which is equal to the sum of the length of all keywords. */
const int MAXS = 500;
/* OUTPUT FUNCTION IS IMPLEMENTED USING out[] */
int out[MAXS];
/* GOTO FUNCTION IS IMPLEMENTED USING g[][] */
int g[MAXS][MAXC];
/* FAILURE FUNCTION IS IMPLEMENTED USING f[] */
int f[MAXS];
/* arr - array of words.
It will return the number of states that the built machine has. */
int buildMatchingMachine(string arr[], int k)
{
/* Initialize values to 0 in output function. */
memset(out, 0, sizeof out);
/* Initialize values to -1 in goto function. */
memset(g, -1, sizeof g);
int states = 1;
/* Construct values for goto function */
for (int i = 0; i < k; ++i)
{
const string &word = arr[i];
int currentState = 0;
/* Insert all characters in arr[] */
int j;
for (j = 0; j < word.size(); ++j)
{
int ch = word[j] - 'a';
/* Create a new node if a node for ch doesn't exist. */
if (g[currentState][ch] == -1)
g[currentState][ch] = states++;
currentState = g[currentState][ch];
}
out[currentState] |= (1 << i);
}
/* Characters which does not have an edge, add a goto edge to state 0 */
for (int ch = 0; ch < MAXC; ++ch)
if (g[0][ch] == -1)
g[0][ch] = 0;
memset(f, -1, sizeof f);
queue<int> q;
for (int ch = 0; ch < MAXC; ++ch)
{
/* All nodes with depth 1 have failure function value as 0. */
if (g[0][ch] != 0)
{
f[g[0][ch]] = 0;
q.push(g[0][ch]);
}
}
/* Now queue has states 1 and 3 */
while (q.size())
{
/* Now, remove the front state from queue */
int state = q.front();
q.pop();
/* Find failure function for those characters for which goto function is not defined. */
for (int ch = 0; ch <= MAXC; ++ch)
{
if (g[state][ch] != -1)
{
/* Find failure state of removed state */
int failure = f[state];
while (g[failure][ch] == -1)
failure = f[failure];
failure = g[failure][ch];
f[g[state][ch]] = failure;
out[g[state][ch]] |= out[failure];
/* Insert the next level node in Queue */
q.push(g[state][ch]);
}
}
}
return states;
}
int findNextState(int currentState, char nextInput)
{
int answer = currentState;
int ch = nextInput - 'a';
while (g[answer][ch] == -1)
answer = f[answer];
return g[answer][ch];
}
void searchWords(string arr[], int k, string text)
{
buildMatchingMachine(arr, k);
int currentState = 0;
/* Traverse the text using built machine to find all occurrences of words in arr[] */
int i;
for (i = 0; i < text.size(); ++i)
{
currentState = findNextState(currentState, text[i]);
/* If match does not found, move to next state */
if (out[currentState] == 0)
continue;
/* if match found, print all matching words of arr[] */
for (int j = 0; j < k; ++j)
{
if (out[currentState] & (1 << j))
{
cout << "Word " << arr[j] << " appears from "
<< i - arr[j].size() + 1 << " to " << i << endl;
}
}
}
}
/* Main code */
int main()
{
string arr[] = {"coding", "ninja", "as", "ding"};
string text = "codingninjas";
int k = sizeof(arr)/sizeof(arr[0]);
searchWords(arr, k, text);
return 0;
}
```

#### Output

```
Word coding appears from 0 to 5
Word ding appears from 2 to 5
Word ninja appears from 6 to 10
Word as appears from 10 to 11
```

### Complexities

#### Time complexity

**O(n+m+z), **where n is the length of the text, m is the total number of characters in all words, z is the total number of occurrences of words in the text.**Reason: **The Aho-corasick Algorithm formed the basis of the original Unix command fgrep.

#### Space complexity

**O(alphabet_size * average key length * N), **where N is the number of words in the trie.**Reason: **Space complexity is used to create a trie data structure in Aho-corasick Algorithm.

Read More - __Time Complexity of Sorting Algorithms__