Table of contents
1.
Introduction
2.
Problem Statement
3.
Approach
4.
Implementation
5.
Time Complexity
6.
Space Complexity
7.
Frequently Asked Questions
7.1.
Can this problem be solved using a trie data structure?
7.2.
Can the BFS solution handle significant inputs?
7.3.
Can the hash set optimization in the BFS solution cause false positives?
8.
Conclusion
Last Updated: Mar 27, 2024
Medium

Minimum Size Lexicographically Smallest String Which is not a Substring of ‘S’

Career growth poll
Do you think IIT Guwahati certified course can help you in your career?

Introduction

Array problems involve manipulating arrays, which are collections of elements stored in a contiguous memory location. Arrays are an essential data structure in computer programming used to store and process large amounts of data efficiently. In array problems, tasks may include searching, sorting, inserting, deleting, and modifying elements in the array. Other operations, such as merging, splitting, and resizing, may also be required.

Minimum size lexicographically smallest string that is not a substring of a given string

We're working on an array problem with a graph-like feel today. In this blog, we’ll solve a fascinating array problem, “Minimum size lexicographically smallest string which is not a substring of the given string. 

Problem Statement

We are given a string ‘S.’ We are assigned to find the lexicographically smallest string, which is not a substring of S. In other words, we must find a minimum size string, not a substring of ‘S,’ and ensure it is the lexicographically smallest possible string. In this blog, we will solve this problem efficiently.

Now, what is lexicographical order?

Lexicographical order is a way of ordering words or symbols based on their alphabetical or numerical sequence. The first element is compared to the first element of another word or symbol. If they are the same, the comparison moves to the next element until a difference is found, determining the order.

For example:

  • Consider the string “abc.” Among all the possible strings, the lexicographically smallest string, which is not a substring of “abc,” is “d.” Our task is to compute this string efficiently.
     
  • Consider the string “abcdefghijklmnopqrstuvwxyz.” The Minimum size lexicographically smallest string that is not a substring of this string is “aa.

Approach


This problem can be solved using one of the most commonly used algorithms: Breadth First Search(BFS). The idea is to generate all the strings in lexicographical order and check if the current string is a substring of the given string “S.” To do this efficiently, we precompute all S substrings and store them in a hashmap. The first string, not the substring of S, is the required answer.

The steps to solve this problem are

  1. Create a set of all possible substrings of the given string S.
     
  2. Initialize a queue with all single-character strings.
     
  3. While the queue is not empty:
              
    • Dequeue the first string in the queue.
       
    • If the dequeued string is not in the set of possible substrings:
      • Return the dequeued string (the minimum size lexicographically smallest string, not a substring of S).
         
    • Otherwise, for each possible character that can be appended to the dequeued string:
      • Enqueue the new string (formed by appending the character) to the back of the queue.
         
  4. Return an empty string if the queue becomes empty, and no such string is found.

Also check out - Substr C++

Implementation

The implementation involves a simple BFS following a proper lexicographical order. The first string, which fails to be a substring of S, is the required answer.

#include <bits/stdc++.h>
using namespace std;
string smallestNonSubstring(string s) 
{
    // Create a set of all possible substrings
    map<string,int> substrings;
    for (int i = 0; i < s.length(); i++) 
    {
        for (int j = i; j < s.length(); j++) 
        {
            substrings[s.substr(i,j-i+1)]++;
        }
    }
    
    // BFS to find the smallest non-substring
    queue<string> q;
    for (char c = 'a'; c <= 'z'; c++) 
    {
        string str(1, c);
        q.push(str);
    }
    while (!q.empty()) 
    {
        string curr = q.front();
        q.pop();
        if (substrings.find(curr) == substrings.end()) 
        {
            return curr;
        }
        // Push characters from [a-z]
        // to the back of the string curr
        // and push into the queue.
        for (int i = 0; i < 26; ++i) 
        {
            curr.push_back(i + 'a');
            q.push(curr);
            curr.pop_back();
        }
    }
    // If no non-substring is found, return an empty string
    return " ";
}
int main() 
{
    string s ;
    cout << "Enter any string" << "\n";
    cin >> s ;
    string result = smallestNonSubstring(s);
    cout << "lexicographically smallest string that is not a substring of a given string: " << result << endl;
    return 0;
}
You can also try this code with Online C++ Compiler
Run Code

Example:

Input: “abcde”.
On entering this string as an input, we receive the following output.

Output: 

Output

Time Complexity

The time complexity is O(N2 * log N). The nested loops generate the factor of N2 we ran to precompute all the substrings, and the map contributes log N.

Space Complexity

The space complexity is O(N2).

Frequently Asked Questions

Can this problem be solved using a trie data structure?

Yes, a trie data structure can efficiently store all the substrings of the given string and check if a given string is a substring.

Can the BFS solution handle significant inputs?

No, the BFS solution is unsuitable for significant inputs as the space complexity is exponential in the length of the smallest non-substring string.

Can the hash set optimization in the BFS solution cause false positives?

No, the hash set optimization does not cause false positives as it only stores substrings of the given string and not any other arbitrary strings.

Conclusion

In this blog, we discussed a standard, tricky problem of strings and came across a creative BFS solution to solve such problems. We hope this article helped increase 

Transparency related to strings and BFS. You may also refer to the following:

Interview Question for DSA

The Elliptic Curve DSA

Roadmap to learn DSA


Recommended problems -

You may refer to our Guided Path on Code Studios to enhance your skill set on DSA and many more. Check out essential interview questions, practice our available mock tests, and more!

Happy Coding!
 

Live masterclass