Last Updated: 25 Dec, 2020

Count Distinct Substrings

Moderate
Asked in companies
AdobeAmazonIntuit

Problem statement

Given a string 'S', you are supposed to return the number of distinct substrings(including empty substring) of the given string. You should implement the program using a trie.

Note :
A string ‘B’ is a substring of a string ‘A’ if ‘B’ that can be obtained by deletion of, several characters(possibly none) from the start of ‘A’ and several characters(possibly none) from the end of ‘A’. 

Two strings ‘X’ and ‘Y’ are considered different if there is at least one index ‘i’  such that the character of ‘X’ at index ‘i’ is different from the character of ‘Y’ at index ‘i’(X[i]!=Y[i]).
Input Format :
The first line contains a single integer ‘T’ denoting the number of test cases.

Then, the ‘T’ test cases follow.

The first line of each test case contains a string.
Output Format :
For each test case, print an integer denoting the number of distinct substrings in the given string.

The output for each test case will be printed in a separate line.
Note :
You don’t need to print anything, It has already been taken care of. Just implement the given function.
Constraints :
1 <= T <= 5
1 <= |S| <= 10^3

‘S’ contains only lowercase English letters.

Time Limit: 1 sec

Approaches

01 Approach

The idea is to find all substrings and insert each substring into a HashSet. Since there will be no duplicates possible into the HashSet, after we have inserted all the substrings in the HashSet the size of the HashSet will be equal to the number of distinct substrings.

 

The steps are as follows :

  1. Iterate through the current string and pivot each index, let’s say the pivoted index is ‘i’.
    • Iterate from the ‘i’ till the end of the loop, let’s say the current index is ‘j’.
      • Insert the substring from index ‘i’ to index ‘j’ into the HashSet.
  2. Return the size of the HashSet+1(extra 1 for empty substring), as that will be the number of distinct substrings in the given string.

02 Approach

The idea is to build a trie for all suffixes of the given string. We will use the fact that every substring of ‘S’ can be represented as a prefix of some suffix string of ‘S’. Hence, if we create a trie of all suffixes of ‘S’ then the number of distinct substrings will be equal to the nodes in the trie. This is because for every substring there will be only one path from the root node of the trie to the last character of the substring, hence no duplicate substrings will be present in the trie.

 

The trie class structure is as follows:

class TrieNode
{
public:
    unordered_map<char, TrieNode*> children;
};

 

The steps are as follows :

  1. Iterate through the given string, let’s say our current index is ‘i’.
    1. Insert the suffix starting at index ‘i’ of the given string into the trie.
  2. Now to count the number of nodes in the trie we will use recursion. Let’s define a function countNodes(root) where “root” denotes the current node of the trie. This function will return the number of nodes in the subtree of the “root”.
    1. Initialize a variable “subtreeNodes” to 1, we will use this variable to count the number of nodes in the subtree of the current node. We are initializing the variable to 1 because the current node will also be counted in the subtree.
    2. Iterate through the children nodes of the current node, let’s say we are currently at the ‘i’ th child.
      1. If the current child node is not null then recursively call the function for all children of the current node i.e. countNodes(root->child[i]) and add the value returned by this function call to “subtreeNodes”.
    3. Return the value of “subtreeNodes
  3. Return the number of nodes of the trie as this will be the number of distinct substrings in the given string.

03 Approach

The approach is very similar to the previous approach, however, in this approach, we will use an array to store the children of a node to improve the access time of a node. Ofcourse we will be worst in space but we can avoid the collision in and Hashing which takes more time than the random access of array’s elements.

 

Each node will have a maximum of 26 children nodes as there are a maximum of 26 distinct characters possibles. Hence we can use an array to store the children nodes.

 

The idea is to build a trie for all suffixes of the given string. We will use the fact that every substring of ‘S’ can be represented as a prefix of some suffix string of ‘S’. Hence, if we create a trie of all suffixes of ‘S’ then the number of distinct substrings will be equal to the nodes in the trie. This is because for every substring there will be only one path from the root node of the trie to the last character of the substring, hence no duplicate substrings will be present in the trie.

The trie class structure is as follows:

class TrieNode
{
public:
    TrieNode *children[26];
    
    TrieNode()
    {
        
        for (int i = 0; i < 26; i++)
        {
            children[i] = NULL;
        }
    }
};

 

The steps are as follows :

  1. Iterate through the given string, let’s say our current index is ‘i’.
    1. Insert the suffix starting at index ‘i’ of the given string into the trie.
  2. Now to count the number of nodes in the trie we will use recursion. Let’s define a function countNodes(root) where “root” denotes the current node of the trie. This function will return the number of nodes in the subtree of the “root”.
    1. Initialize a variable “subtreeNodes” to 1, we will use this variable to count the number of nodes in the subtree of the current node. We are initializing the variable to 1 because the current node will also be counted in the subtree.
    2. Iterate through the children nodes of the current node, let’s say we are currently at the ‘i’ th child.
      1. If the current child node is not null then recursively call the function for all children of the current node i.e. countNodes(root->child[i]) and add the value returned by this function call to “subtreeNodes”.
    3. Return the value of “subtreeNodes
  3. Return the number of nodes of the trie as this will be the number of distinct substrings in the given string.