Code:
/*
Ukkonen's Suffix Tree Construction is implemented in C++
We create a generalised suffix tree for two strings in this example
The longest common substring of the two input strings is then found
*/
#include <bits/stdc++.h>
using namespace std;
#define MAX_CHAR 256
struct SuffixTreeNode {
struct SuffixTreeNode *children[MAX_CHAR];
//pointer to other node via suffix link
struct SuffixTreeNode *suffixLink;
/*
The (start, end) interval indicates the edge that connects the node to its parent node.
Each edge will connect two nodes, one parent and one child, and the child node will record
the (start, end) interval of each edge. If two nodes X and Y are joined by an edge with the
indices (5, 8) stored in node Y, then the indices (5, 8) will be stored in node Y.
*/
int start;
int *end;
// It saves the suffix index for the path from root to leaf for leaf nodes.
int suffixIndex;
};
typedef struct SuffixTreeNode Node;
// Input string
char text[100];
// Root node pointer
Node *root = NULL;
/*
lastNewNode will point to a newly generated internal node that is awaiting the setting of its suffix link,
which may receive a new suffix link (other than root) in the following phase extension. When the previous
newly generated internal node (if any) has its suffix link reset to a new internal node created in the
following extension of the same phase, lastNewNode will be set to NULL.
*/
Node *lastNewNode = NULL;
Node *activeNode = NULL;
// activeEdge is represented as a character index in the input string (not the character itself)
int activeEdge = -1;
int activeLength = 0;
// remainingSuffixCount indicates how many suffixes in the tree have yet to be added.
int remainingSuffixCount = 0;
int leafEnd = -1;
int *rootEnd = NULL;
int *splitEnd = NULL;
//Length of input string
int size = -1;
//String 1 size that is x
int size1 = 0;
Node *newNode(int start, int *end)
{
Node *node =(Node*) malloc(sizeof(Node));
int i;
for (i = 0; i < MAX_CHAR; i++)
node->children[i] = NULL;
/*
The suffixLink property will be set to NULL for the root node. In the current extension,
suffixLink will be set to root by default for internal nodes, however this may change in
the future.
*/
node->suffixLink = root;
node->start = start;
node->end = end;
/*
By default, suffixIndex is set to -1, and the actual suffix index is set afterwards for
leaves at the end of all phases.
*/
node->suffixIndex = -1;
return node;
}
int edgeLength(Node *n) {
if(n == root)
return 0;
return *(n->end) - (n->start) + 1;
}
int walkDown(Node *currNode)
{
/*
Using the Skip/Count Trick, alter activePoint for a walk down (APCFWD) (Trick 1). Set
next internal node as activeNode and change activeEdge and activeLength to represent same
activePoint if activeLength is higher than current edge length.
*/
if (activeLength >= edgeLength(currNode))
{
activeEdge += edgeLength(currNode);
activeLength -= edgeLength(currNode);
activeNode = currNode;
return 1;
}
return 0;
}
void extendSuffixTree(int pos)
{
/*
Extension Rule 1 is responsible for extending all of the leaves that have been formed so far
in the tree.
*/
leafEnd = pos;
/*
Increase the amount of time left. SuffixCount indicates that a new suffix has been added to the
list of suffixes that have not yet been added to the tree.
*/
remainingSuffixCount++;
/*
When entering a new phase, set lastNewNode to NULL to indicate that no internal node is waiting
for its suffix link to be reset in the current phase.
*/
lastNewNode = NULL;
//Add all suffixes (yet to be added) one by one in tree
while(remainingSuffixCount > 0) {
if (activeLength == 0)
activeEdge = pos; //APCFALZ
// Important: Add all suffixes (that haven't been added yet) to the tree one by one.
if (activeNode->children] == NULL)
{
//Starting with activeEdge from activeNode, there is no outgoing edge.
activeNode->children] = newNode(pos, &leafEnd);
/*
In the above line, a new leaf edge is generated, starting from an existing node (the current
activeNode), and if any internal nodes are waiting for their suffix links to be reset, point
the suffix link from that last internal node to the current activeNode. Then set lastNewNode
to NULL to indicate that there are no more nodes waiting for the suffix link to be reset.
*/
if (lastNewNode != NULL)
{
lastNewNode->suffixLink = activeNode;
lastNewNode = NULL;
}
}
// From activeNode, there is an outgoing edge called activeEdge.
else
{
// Start with activeEdge to get the next node at the end of the edge.
Node *next = activeNode->children];
// Go down
if (walkDown(next))
{
// Begin at the next node (the new activeNode)
continue;
}
// Extension Rule 3 (the currently processed character is already on the edge)
if (text[next->start + activeLength] == text[pos])
{
/*
Set the suffix link of a freshly generated node to the current active node
if it is waiting for its suffix link to be set.
*/
if(lastNewNode != NULL && activeNode != root)
{
lastNewNode->suffixLink = activeNode;
lastNewNode = NULL;
}
// APCFER3
activeLength++;
// STOP ALL ADDITIONAL PROCESSING IN THIS PHASE AND GO TO THE NEXT PHASE
break;
}
/*
make sure When activePoint is in the middle or mid way of the edge being traversed and the current
character being processed is not on the edge (we fall off the tree), we will be
here. In this scenario, we're going to add a new internal node and a new leaf edge
that comes out of it. This is Extension Rule 2, which creates a new leaf edge
and an internal node.
*/
splitEnd = (int*) malloc(sizeof(int));
*splitEnd = next->start + activeLength - 1;
// New node (internal node)
Node *split = newNode(next->start, splitEnd);
activeNode->children] = split;
// A new leaf emerges from a new internal node.
split->children] = newNode(pos, &leafEnd);
next->start += activeLength;
split->children] = next;
/*
We've added a new internal node to the mix. If any internal nodes generated in
previous extensions of the same phase have yet to have their suffix links reset,
do so now.
*/
if (lastNewNode != NULL)
{
// lastNewNode's suffixLink links to the current freshly generated internal node.
lastNewNode->suffixLink = split;
}
/*
Make the current newly constructed internal node wait for its suffix link to be reset
(which is currently pointing to root). If we may come across any other internal node (existing
or freshly formed) in the next extension of the same phase, suffixLink of this node will
point to that internal node when a new leaf edge is added (i.e. when Extension Rule 2 applies
in any of the next extensions of the same phase).
*/
lastNewNode = split;
}
// One suffix was added to the tree, reducing the number of suffixes still to be added.
remainingSuffixCount--;
// check for active node
if (activeNode == root && activeLength > 0) //APCFER2C1
{
activeLength--;
activeEdge = pos - remainingSuffixCount + 1;
}
else if (activeNode != root) //APCFER2C2
{
activeNode = activeNode->suffixLink;
}
}
}
void print(int i, int j)
{
int k;
for (k=i; k<=j && text[k] != '#'; k++)
printf("%c", text[k]);
if(k<=j)
printf("#");
}
/*
As well as printing the suffix tree, you may also set the suffix index.
As a result, the tree will be printed in DFS format.
Each edge will be printed together with its suffix index.
*/
void setSuffixIndexByDFS(Node *n, int labelHeight)
{
if (n == NULL) return;
if (n->start != -1) // A non-root node
{
}
int leaf = 1;
int i;
for (i = 0; i < MAX_CHAR; i++)
{
if (n->children[i] != NULL)
{
// Because it has outgoing /edges, the current node is not a leaf.
leaf = 0;
setSuffixIndexByDFS(n->children[i], labelHeight +
edgeLength(n->children[i]));
}
}
// check for leaf
if (leaf == 1)
{
// Iteration
for(i= n->start; i<= *(n->end); i++)
{
if(text[i] == '#')
{
n->end = (int*) malloc(sizeof(int));
*(n->end) = i;
}
}
n->suffixIndex = size - labelHeight;
}
}
void freeSuffixTreeByPostOrder(Node *n)
{
if (n == NULL)
return;
int i;
for (i = 0; i < MAX_CHAR; i++)
{
if (n->children[i] != NULL)
{
freeSuffixTreeByPostOrder(n->children[i]);
}
}
if (n->suffixIndex == -1)
free(n->end);
free(n);
}
/*
Create the suffix tree and output the edge labels, as well as suffixIndex.
For leaf edges, suffixIndex will be >= 0 while for non-leaf edges, suffixIndex
will be -1.
*/
void buildSuffixTree()
{
size = strlen(text);
int i;
rootEnd = (int*) malloc(sizeof(int));
*rootEnd = - 1;
/*
Because it has no parent from which an edge comes to root, root is a special
node with start and end indices of -1.
*/
root = newNode(-1, rootEnd);
// root will be first active node
activeNode = root;
for (i=0; i<size; i++)
extendSuffixTree(i);
int labelHeight = 0;
setSuffixIndexByDFS(root, labelHeight);
}
int doTraversal(Node *n, int labelHeight, int* maxHeight,
int* substringStartIndex)
{
if(n == NULL)
{
return;
}
int i=0;
int ret = -1;
// only if it is internal node
if(n->suffixIndex < 0)
{
for (i = 0; i < MAX_CHAR; i++)
{
if(n->children[i] != NULL)
{
ret = doTraversal(n->children[i], labelHeight +
edgeLength(n->children[i]),
maxHeight, substringStartIndex);
if(n->suffixIndex == -1)
n->suffixIndex = ret;
else if((n->suffixIndex == -2 && ret == -3) ||
(n->suffixIndex == -3 && ret == -2) ||
n->suffixIndex == -4)
{
// Mark this given node as XY
n->suffixIndex = -4;
// always keep the track of the deepest node
if(*maxHeight < labelHeight)
{
*maxHeight = labelHeight;
*substringStartIndex = *(n->end) -
labelHeight + 1;
}
}
}
}
}
// suffix of X
else if(n->suffixIndex > -1 && n->suffixIndex < size1)
// Mark this node as X
return -2;
// suffix of Y
else if(n->suffixIndex >= size1)
// Mark this node as Y
return -3;
return n->suffixIndex;
}
void getLongestCommonSubstring()
{
int maxHeight = 0;
int substringStartIndex = 0;
doTraversal(root, 0, &maxHeight, &substringStartIndex);
int k;
for (k=0; k<maxHeight; k++)
printf("%c", text[k + substringStartIndex]);
if(k == 0)
printf("No common substring");
else
printf(", of length: %d",maxHeight);
printf("\n");
}
// main function which will test all the given functions
int main(int argc, char *argv[])
{
size1 = 7;
printf("Longest Common Substring in xabxac and abcabxabcd is: ");
strcpy(text, "xabxac#abcabxabcd$"); buildSuffixTree();
getLongestCommonSubstring();
//Free the dynamically allocated memory
freeSuffixTreeByPostOrder(root);
size1 = 10;
printf("Longest Common Substring in xabxaabxa and babxba is: ");
strcpy(text, "xabxaabxa#babxba$"); buildSuffixTree();
getLongestCommonSubstring();
//Free the dynamically allocated memory
freeSuffixTreeByPostOrder(root);
size1 = 6;
printf("Longest Common Substring in abcde and fghie is: ");
strcpy(text, "abcde#fghie$"); buildSuffixTree();
getLongestCommonSubstring();
//Free the dynamically allocated memory
freeSuffixTreeByPostOrder(root);
size1 = 6;
printf("Longest Common Substring in pqrst and uvwxyz is: ");
strcpy(text, "pqrst#uvwxyz$"); buildSuffixTree();
getLongestCommonSubstring();
//Free the dynamically allocated memory
freeSuffixTreeByPostOrder(root);
return 0;
}

You can also try this code with Online C++ Compiler
Run Code
Output:
Longest Common Substring in xabxac and abcabxabcd is: abxa, of length: 4
Longest Common Substring in xabxaabxa and babxba is: abx, of length: 3
Longest Common Substring in abcde and fghie is: e, of length: 1
Longest Common Substring in pqrst and uvwxyz is: No common substring

You can also try this code with Online C++ Compiler
Run Code
Time Complexity
O(n)
The time complexity to find the Longest common Substring using the suffix tree method will cost only linear time complexity and is the most optimized approach for finding the Longest Common Substring. The first construction of the entire suffix tree will cost o(n) time complexity, where n is the size of the given string. So in the worst-case iteration on the entire tree may cost o(n) time complexity, so the total time complexity to solve the entire problem will be o(n) time.
Space Complexity
O(n)
The space complexity to find the Longest common Substring using the suffix tree method will cost only linear space complexity and is the most optimized approach for finding the Longest common Substring. The entire construction of the suffix tree will cost o(n) space complexity, where n is the size of the given string.
FAQs
-
What is the longest common string?
The longest common substring problem in computer science is to identify the longest string that is a substring of two or more strings.
-
What are suffix trees?
A suffix tree (also known as a PAT tree or, in an earlier form, a position tree) is a compressed trie that contains all the suffixes of a given text as keys and text positions as values. Many key string operations can be implemented very quickly using suffix trees.
Key Takeaways
In this blog, we discussed the most optimized solution to find the Longest common Substring using suffix trees. Along with the solution, We also explored the time and space complexity of the solution.
Check out this problem - Longest Common Prefix
If you want to learn more about such hidden number theory algorithms and want to practice some quality questions which require you to excel your preparation s a notch higher, then you can visit our Guided Path for these number theory algorithms on Coding Ninjas Studio.To be more confident in data structures and algorithms, try out our DS and Algo Course. Until then, All the best for your future endeavours, and Keep Coding.