Last Updated: 16 Sep, 2020

Shortest substring with all characters

Moderate
Asked in companies
Natwest GroupIntuitChegg Inc.

Problem statement

You have been given a string 'S' which only consists of lowercase English-Alphabet letters.

Your task is to find the shortest(minimum length) substring of 'S' which contains all the characters of 'S' at least once. If there are many substrings with the shortest length, then find one which appears earlier in the string i.e. substring whose starting index is lowest.

For example-
If the given string is S = "abcba", then the possible substrings are "abc" and "cba". As "abc" starts with a lower index (i.e. 0, "cba" start with index 2), we will print "abc" as our shortest substring that contains all characters of 'S'.
Input Format:
The only line of input contains a string 'S' i.e. the given string.
Output Format:
The only line of output contains a string i.e. the shortest substring of 'S' which contains all the characters of 'S' at least once.

Note:

You are not required to print the expected output, it has already been taken care of. Just implement the function.
Constraints:
1 <= N <= 10^5
'S' only contains lowercase English-Alphabet letters.

Where 'S' is the given string and ‘N’ is the length of ‘S’.

Time Limit: 1 sec 

Approaches

01 Approach

The problem boils down to counting distinct characters that are present in the string and then finding the minimum length substring that contains this many distinct characters at least once.

 

We can naively check all substrings by two nested loops and maintain a count of distinct characters for each substring with the help of a hashmap. There are many methods for maintaining this count with a hashmap like the size of the hashmap, or maintaining the ‘count’ variable, etc. 

 

Whenever the count of distinct characters for a substring equals the count of distinct characters in the given string, we process this substring. We will choose the minimum length substring out of all these substrings.

02 Approach

The idea is to use two pointers technique with a sliding window of variable length. The current window will be our current processing substring. 

 

  • If the count of distinct characters of the current window is equal to that of the given string, it makes no sense in expanding the window (because the count will only increase in doing so).
  • Similarly, if the count of the distinct characters is less than that of the given string, there is no benefit in shrinking the window.
  •  

We will keep track of the minimum length substring obtained so far and only update it when we find a substring of a smaller length.

Here, is the complete algorithm-

  • Add all the characters of the given string to a HashSet ‘M’.
  • Initialize ‘DISTINTCT_CHAR’ to the size of this HashSet ‘M’.
  • Initialize ‘START’ and ‘END’ pointers to 0, as currently, our window has size 1.
  • Initialize ‘COUNT’ to 0, which will keep track of the number of distinct characters in the current window/substring.
  • Declare a HashMap ‘FREQUENCY’ to store current window characters.
  • Initialize ‘MIN_LENGTH’ to the length of the given string and ‘START_INDEX’ to 0.
  • Loop till ‘END’ < length of the string
    • Add S[END] to the window i.e. increment FREQUENCY[S[END]] by 1.
      • If after this FREQUENCY[s[END]] == 1, increment  ‘COUNT’ by 1.
    • Shrink window till ‘COUNT’ is equal to ‘DISTINCT_CHAR’
      • Update ‘MIN_LENGTH’ and ‘START_INDEX’ if the current window is shorter.
      • Remove S[START] from the window i.e. decrement FREQUENCY[S[START]] by 1.
        • If after this FREQUENCY[S[START]] == 0, decrement  ‘COUNT’ by 1.
      • Shrink window size i.e. increment ‘START’ pointer by 1.
    • Expand window size i.e. increment ‘END’ pointer by 1.
  • Return ‘MIN_LENGTH’.