Do you think IIT Guwahati certified course can help you in your career?
No
Introduction
Array problems involve manipulating arrays, which are collections of elements stored in a contiguous memory location. Arrays are an essential data structure in computer programming used to store and process large amounts of data efficiently. In array problems, tasks may include searching, sorting, inserting, deleting, and modifying elements in the array. Other operations, such as merging, splitting, and resizing, may also be required.
We're working on an array problem with a graph-like feel today. In this blog, we’ll solve a fascinating array problem, “Minimum size lexicographically smallest string which is not a substring of the given string.
Problem Statement
We are given a string ‘S.’ We are assigned to find the lexicographically smallest string, which is not a substring of S. In other words, we must find a minimum size string, not a substring of ‘S,’ and ensure it is the lexicographically smallest possible string. In this blog, we will solve this problem efficiently.
Now, what is lexicographical order?
Lexicographical order is a way of ordering words or symbols based on their alphabetical or numerical sequence. The first element is compared to the first element of another word or symbol. If they are the same, the comparison moves to the next element until a difference is found, determining the order.
For example:
Consider the string “abc.” Among all the possible strings, the lexicographically smallest string, which is not a substring of “abc,” is “d.” Our task is to compute this string efficiently.
Consider the string “abcdefghijklmnopqrstuvwxyz.” The Minimum size lexicographically smallest string that is not a substring of this string is “aa.”
Approach
This problem can be solved using one of the most commonly used algorithms: Breadth First Search(BFS). The idea is to generate all the strings in lexicographical order and check if the current string is a substring of the given string “S.” To do this efficiently, we precompute all S substrings and store them in a hashmap. The first string, not the substring of S, is the required answer.
The steps to solve this problem are
Create a set of all possible substrings of the given string S.
Initialize a queue with all single-character strings.
While the queue is not empty:
Dequeue the first string in the queue.
If the dequeued string is not in the set of possible substrings:
Return the dequeued string (the minimum size lexicographically smallest string, not a substring of S).
Otherwise, for each possible character that can be appended to the dequeued string:
Enqueue the new string (formed by appending the character) to the back of the queue.
Return an empty string if the queue becomes empty, and no such string is found.
The implementation involves a simple BFS following a proper lexicographical order. The first string, which fails to be a substring of S, is the required answer.
#include <bits/stdc++.h>
using namespace std;
string smallestNonSubstring(string s)
{
// Create a set of all possible substrings
map<string,int> substrings;
for (int i = 0; i < s.length(); i++)
{
for (int j = i; j < s.length(); j++)
{
substrings[s.substr(i,j-i+1)]++;
}
}
// BFS to find the smallest non-substring
queue<string> q;
for (char c = 'a'; c <= 'z'; c++)
{
string str(1, c);
q.push(str);
}
while (!q.empty())
{
string curr = q.front();
q.pop();
if (substrings.find(curr) == substrings.end())
{
return curr;
}
// Push characters from [a-z]
// to the back of the string curr
// and push into the queue.
for (int i = 0; i < 26; ++i)
{
curr.push_back(i + 'a');
q.push(curr);
curr.pop_back();
}
}
// If no non-substring is found, return an empty string
return " ";
}
int main()
{
string s ;
cout << "Enter any string" << "\n";
cin >> s ;
string result = smallestNonSubstring(s);
cout << "lexicographically smallest string that is not a substring of a given string: " << result << endl;
return 0;
}
You can also try this code with Online C++ Compiler
Input: “abcde”. On entering this string as an input, we receive the following output.
Output:
Time Complexity
The time complexity is O(N2 * log N). The nested loops generate the factor of N2 we ran to precompute all the substrings, and the map contributes log N.
Space Complexity
The space complexity is O(N2).
Frequently Asked Questions
Can this problem be solved using a trie data structure?
Yes, a trie data structure can efficiently store all the substrings of the given string and check if a given string is a substring.
Can the BFS solution handle significant inputs?
No, the BFS solution is unsuitable for significant inputs as the space complexity is exponential in the length of the smallest non-substring string.
Can the hash set optimization in the BFS solution cause false positives?
No, the hash set optimization does not cause false positives as it only stores substrings of the given string and not any other arbitrary strings.
Conclusion
In this blog, we discussed a standard, tricky problem of strings and came across a creative BFS solution to solve such problems. We hope this article helped increase
Transparency related to strings and BFS. You may also refer to the following: