Last Updated: 4 Dec, 2020

Shortest Substring with all characters

Easy
Asked in companies
ZSAmazonUnthinkable Solutions

Problem statement

You have been given a string 'S' which only consists of lowercase English-Alphabet letters. Your task is to find the shortest (minimum length) substring from 'S' which contains all the characters of 'S' at least once.

Note:

If there are more than one substring with the shortest length, then find one which appears earlier in the string ‘S’ i.e. substring whose starting index is lowest.
For example:
If the given string ‘S’ = "abcba", then the possible substrings having all the characters of ‘S’ at least once and of minimum length are "abc" and "cba". 

As "abc" starts with a lower index (i.e. 0, "cba" starts with index 2), we will return the string "abc" as our shortest substring that contains all characters of 'S'.
Input Format :
The first line of input contains an integer 'T' which denotes the number of test cases or queries to be run. Then each test case follows.

The first and the only line of each test case contains a single string 'S'.
Output Format :
For each test case, print a single line containing a string i.e. the shortest substring of 'S' which contains all the characters of 'S' at least once.

Note:

You don’t need to print the output, it has already been taken care of. Just implement the given function.

Constraints:

1 <= T <= 10 ^ 2
1 <= N <= 5 * (10 ^ 3)
S[i] = [a - z]

Where 'T' denotes the number of test cases and ‘N’ is the length of the given string ‘S’.
The given string 'S' consists of only English lowercase alphabets.

Time Limit: 1 sec 

Approaches

01 Approach

The problem boils down to counting distinct characters that are present in the string and then finding the minimum length substring that contains distinct characters at least once.

 

We can check for all substrings with two nested iterations and maintain a 'COUNT' for distinct characters for each substring with the help of a hashmap. There are many methods for maintaining this 'COUNT' with a hashmap, like the size of the hashmap or maintaining the 'COUNT' variable, etc. 

 

Whenever the count of distinct characters for a substring equals the count of distinct characters in the given string, we process this substring. We will choose the minimum length substring out of all these substrings.

 

Approach :

 

  • First, we will take an unordered set 'HASHSET' to store the distinct characters of the string ‘S.’
  • Make an integer variable say 'STARTINDEX', which will store the starting index of the substring with minimum length and having all the distinct characters of string ‘S.’
  • Make two nested iterations, one with iterator pointer ‘i’, which will iterate throughout the length of the string ‘S’ and another for nested iteration with iterator pointer ‘j' iterates from ‘i’ till the length of string ‘S'
  • Make a counter 'COUNT' which will store the distinct characters (with the help of size another unordered set 'VISITED')  from the substring starting from ‘ith’ index and ending at ‘jth’ index.
  • When ‘COUNT’ becomes equal to count of all distinct characters of string ‘S’ then store the length of this substring in integer variable say, 'MINLENGTH' and updating 'STARTINDEX' with ‘i’.
  • Every time we encounter a 'COUNT' less than the previous ‘COUNT’, we will update 'MINLENGTH'.
  • After the iterations, return the substring from string ‘S’  starting from 'STARTINDEX' and with the length of 'MINLENGTH'.

02 Approach

The idea is to use two pointers technique with a sliding window of variable length. The current window will be our current processing substring. 

 

If the count of distinct characters of the current window is equal to that of the given string, it makes no sense in expanding the window (because the count will remain the same in doing so).

 

Similarly, if the count of the distinct characters is less than that of the given string, there is no benefit in shrinking the window.

 

We will keep track of the minimum length substring obtained so far and only update it when we find a substring of smaller length.

 

Approach :

 

  • Add all the characters of the given string to an unordered set, say 'HASHSET'.
  • Initialize integer variable 'DISTINCTCHAR' to the size of this 'HASHSET' which will store the count of all distinct characters of string ‘S’.
  • Initialize ‘START’ and ‘END’ pointers to 0, as currently, our window has size of 1.
  • Initialize 'COUNT' to 0, which will keep track of the number of distinct characters in the current window/substring.
  • Declare a hashmap, say ‘FREQUENCY’ to store current window characters.
  • Initialize 'MINLENGTH' to the length of the given string and 'STARTINDEX' to 0.
  • Iterate ‘end’ till the last element of string ‘S’.
    • Add S[end] to the window i.e. increment FREQUENCY[S[END]] by 1.
      • If after this FREQUENCY[S[END]] == 1, increment  'COUNT' by 1.
    • Shrink window till 'COUNT' is equal to 'DISTINCTCHAR'
      • Update ‘minimumLength’ and 'STARTINDEX' if the current window is shorter.
      • Remove S[START] from the window i.e. decrement FREQUENCY[S[START]] by 1.
      • If after this FREQUENCY[S[START]] == 0, decrement 'COUNT' by 1.
    • Shrink window size i.e. Increment ‘START’ pointer by 1.
    • Expand window size i.e. Increment ‘END’ pointer by 1.
  • Return 'MINLENGTH'.