Table of contents
1.
Introduction
2.
Problem Statement
2.1.
Example1:
2.2.
Example 2:
3.
Approach 1: Naive
4.
Implementation
5.
Complexity Analysis
6.
Approach 2: 
7.
Implementation
8.
Complexity Analysis
9.
Frequently Asked Questions
10.
Key Takeaways
Last Updated: Mar 27, 2024

Count Different Substrings in a String

Career growth poll
Do you think IIT Guwahati certified course can help you in your career?

 

Introduction

 

Data Structures play a vital role in our problem-solving skills. Consistency with an efficient learning approach can raise your bar in the programming world.

This problem is about counting the different substrings in a string. 

Before heading towards the problem, certain words should be clear in your head for thinking it in that way.

 

What do you mean by a substring?

A substring is a contiguous sequence of characters in a string. For example, "is the" is a substring of "This is the programming language."

 

For questions related to each data structure, refer to the blog Must-Do Coding Interview Questions for Product Based Companies.

 

Let's move to our problem statement for more clarity.

 

Problem Statement

We are provided with a string' s,' and we need to find the distinct substrings of 's.' We will be implementing different approaches in the Java language.

 

Let's understand this with some examples:

 

Example1:

 

Input: s= “abc” 

Output: 6

 

Explanation: a, b, c, ab, bc, abc. These will be the total distinct substrings of string 'abc.'

 

Example 2:

 

Input: s=”abcd”

Output: 10

 

Explanation: a, b, c, d, ab, bc, cd, abc,bcd, abcd. These will be the total distinct substrings of string' abcd.'

 

Before stepping into the solution, you must give it a shot on "Count Distinct Substrings."

 

Approach 1: Naive

The first naive approach would be generating all the substrings of the string and counting the different numbers of substrings. The idea is to use HashSet data structure to count the number of distinct substrings in a string.

 

Implementation

 

import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

public class Solution {

    public static int distinct(String s)
    {
        // HashSet for distinct substrings
        Set<String> ans = new HashSet<String>();


        for (int i = 0; i <= s.length(); i++) {
            for (int j = i + 1; j <= s.length(); j++) {

                 // To add the substrings in the set.
                // It will eliminate the duplicates
                 ans.add(s.substring(i, j));
            }
        }

       // This will return the count i.e
       // the size of the hashset
      return ans.size();
   }

  // Driver Code
  public static void main(String[] args)
  {
     String s = "abc";
     System.out.println(distinct(s));
  }
}
You can also try this code with Online Java Compiler
Run Code

 

Output

 

6
You can also try this code with Online Java Compiler
Run Code

 

Complexity Analysis

 

Time Complexity: The overall complexity of this approach becomes O(n^3log(n2)).

Space Complexity: O(n)

Also see,  Euclid GCD Algorithm

Approach 2: 

 

The idea is to create a Trie of all suffixes of a given string. Once the Trie is constricted, our answer is the total number of nodes in the constructed Trie. The idea is to insert the character not already present in the Trie. For example, the below diagram represents Trie of all suffixes for "ababa." The total number of nodes is 10, which is our answer.

 

 

Source-tutorialspoint

 

Each root represents a prefix of words existing in the Trie to node path of a Trie. We are suffixes in this case. As a result, each node corresponds to a prefix of suffixes.

A prefix of a suffix of "str" is every substring of the string "str."

 

Implementation

 

import java.io.*;

class Solution {
    static class TrieNode{
       TrieNode child[];
       boolean end;

       TrieNode(){
       this.child = new TrieNode[26]; 
       this.end = false;
      }
  }

    static TrieNode root = new TrieNode();

    static void insert(String s){ 
        TrieNode curr = root; 
        for(char ch : s.toCharArray()){
            int i = ch - 'a';

            if(curr.child[i] == null)
                curr.child[i] = new TrieNode();

            curr = curr.child[i];
         }
         curr.end = true;
     }

   public static int substring(String s){
        // To count the total number of strings
        int count = 0;
        for (int i = 0; i <= s.length(); i++){

             TrieNode temp = root;
             for (int j = i; j < s.length(); j++){
                    char ch = s.charAt(j);
                    // Add it in trie, if character is not present
                   if(temp.child[ch - 'a'] == null){
                       temp.child[ch - 'a'] = new TrieNode();
                       temp.end = true;
                       count++;
                    }
                   // Move to the next character
                   temp = temp.child[ch - 'a'];
             }
       }
    return count;
  }

   // Main function
   public static void main (String[] args) {
       int count = substring("abc");
       System.out.println(count);
   }
}
You can also try this code with Online Java Compiler
Run Code

 

Output

6
You can also try this code with Online Java Compiler
Run Code

Complexity Analysis

 

Time Complexity: The time complexity is O(n^2)

Space Complexity: O(n)*26. T is the reduced space complexity.

Check out this problem - Find Duplicate In Array

Frequently Asked Questions

  1. What do you mean by HashSet data structure?
    HashSet is a data structure in which duplicate values are not allowed. It is mainly used for its less time complexity consuming in insert and delete operations.
     
  2. What do you mean by a substring?
    It is a contiguous sequence of characters in a string. For example, "is the" is a substring of "This is the programming language."
     
  3. How many approaches can be there to compute the different substrings in a string?
    There can be many approaches to finding the distinct substrings in a string. But we have implemented two approaches in the Java language.

 

Key Takeaways

 

This blog has covered the following approaches:

 

  • The first approach is the naive approach to compute the different substrings in a string, where we were using a HashSet data structure to store the substrings.

 

  • The second approach is the efficient one where we are using a unique data structure called Suffix Arrays/ Trie, and you can learn about it in great detail here.

 

Also, you can try more problems based on similar concepts or questions contributed by creative programmers like Ninja and substringsFind all distinct palindromic substrings of a given string, and many more.

Recommended problems -

 

Coding Ninjas Studio is a one-stop destination for various DSA questions typically asked in interviews to practice more such problems.

 

Keep Learning, Keep Growing!!!

Live masterclass