Last Updated: 15 Dec, 2020

Smallest Window

Moderate
Asked in companies
HSBCSAP LabsExpedia Group

Problem statement

You are given two strings S and X containing random characters. Your task is to find the smallest substring in S which contains all the characters present in X.

Example:

Let S = “abdd” and X = “bd”.

The windows in S which contain all the characters in X are: 'abdd', 'abd', 'bdd', 'bd'. 
Out of these, the smallest substring in S which contains all the characters present in X is 'bd'. 
All the other substring have a length larger than 'bd'.
Input format:
The very first line of input contains an integer T denoting the number of test cases. 

The first line of every test case contains the string S.

The second line of every test case contains the string X.
Output format:
For each test case, print the smallest window in S which contains all the characters present in X, in a separate line.
Note:
There is always a valid window in S which contains all the characters of X.  

You do not need to print anything, it has already been taken care of. Just implement the given function.
Note:
In case of multiple answers, print only the substring that occurs first.

For example: for the string S = 'cbbbc' and X = 'bc', there are 2 possible answers i.e. 'cb' and 'bc', your code should print 'cb'.
Constraints:
1 <= T <= 10 
1 <= |S|, |X| <= 10^5

Here, |S| denotes the length of string S and |X| denotes the length of string X.
Time Limit: 1 sec

Approaches

01 Approach

A simple and intuitive approach could be to generate all the possible substrings of string S and choose the smallest substring which contains all the characters present in X.

This can be done by the following approach:

  1. Firstly, store the count corresponding to each character, occurring in string X, into a hashtable.
  2. Now, the next step is to generate all the possible substrings of string S.
  3. Maintain a variable to keep track of the smallest substring, which is valid. Update it every time we find a smaller valid substring.
  4. Starting from each index in string S, generate all the possible substrings.
  5. Store the count corresponding to each character, occurring in the substring, into a new hashtable.
  6. Compare the count of each character in string X to the count of the same character in the substring. Let the character being compared be denoted by ‘ch’, then two cases arise:
    • If HashTable_X[ch] <= HashTable_SUBSTRING[ch]: The substring (window) being considered contains the character ‘ch’ and the number of occurrences of ‘ch’ in substring is greater than or equal to the number of occurrences of ‘ch’ in string X. It is a valid substring, continue iterating.
    • Otherwise, the number of occurrences of ‘ch’ in substring (window) is less than the number of occurrences of ‘ch’ in string T. Hence the substring (window) being considered cannot be valid.
  7. If the substring (window) being considered is valid, and the length of this substring is smaller than the previously stored valid substring, make the current substring as the new smallest valid substring.
  8. Repeat the process from step 5 until all the substrings of string S have been compared.
  9. Return the smallest valid substring.

02 Approach

In this approach, we use the concept of two pointers along with a hashtable. The two pointers - start and end, represent the current substring  (window) which is in consideration. We move the end pointer to increase the size of the window and move the start pointer to decrease the size of the window. When a valid window is found, shrink the window if possible, then record its width. The smallest valid window gives us the answer to the problem.

 

The idea is to consider the windows of the string S and check for validity; the window(S) is valid if all the characters of X are present in S. 
The basic variables used are: 
Start, stores the starting index.
End stores the ending index.

FinalStart and finalEnd storing the indices of the final string to be returned.

Counter, storing the count of characters in X which are not yet found in the window(S).

 

The algorithm for the approach is as follows:

  1. Initialize a hashtable, HASH, to store the counts of each character occurring in string X.
  2. Initialize start = 0, end = 0.
  3. Initialize counter = |X| and finalLength= |S|
  4. Loop 1: while end < S.length:
    • New Character say end_ch = S[ end ]
    • HASH[ end_ch ] = HASH[ end_ch ] - 1.
    • If HASH[ end_ch ] >= 0 : decrement counter.
    • Loop 2: while the counter remains 0:
      • If, new window’s width is less than finalLength: update finalLength= end - start + 1, also keep a note of start and end.
      • New Character, start_ch = S[start]
      • HASH[start_ch ] = HASH[start_ch] + 1.
      • If HASH[start_ch] > 0 : increment counter.
      • Shrink the window: Increment start pointer.
    • Expand the window: Increment end pointer to check for more valid strings.
  5. Return the smallest valid substring.

 

Let us understand this by an example:

Let's assume that string S = “aaabca” and string X = “ac”.

  1. Initialising, HASH[ a ]  = 1 and HASH[ c ]  = 1, rest all as 0.
  2. Start = 0, end = 0 and counter = 2
  3. Initially, the end pointer points to the first character of string S, i.e., ‘a’, and the window is “a”. 
    HASH[ a ] = HASH[ a ] - 1, thus, HASH[ a ] becomes 0. 
    As the letter ‘a’ also occurs in string T (the reason being hash[ a ] is >= 0) hence, the counter is also decremented. Counter becomes 1.
  4. The substring window is expanded by incrementing the end pointer. The window now becomes “aa”. 
    Repeating the same steps as before, HASH[ a ] = HASH[ a ] - 1. So, HASH[ a ] becomes -1. This indicates that ‘a’ is no more a required character; hence, the counter remains the same.
  5. The end pointer is again incremented, the window now becomes “aaa”.
    Now, for the window from “aaa” to “aaab”, the previous steps are repeated, and Hash[ a ] becomes -2, and Hash[ b ] becomes -1.
  6. In the next iteration, our window becomes “aaabc”, 
    HASH[ c ] is decremented, so it becomes 0, 
    The counter is also decremented. 
    Here, the counter becomes 0.
  7. counter = 0 indicates that a valid substring (or window) has been found.

    While loop starts
    Incase of a valid window:
    finalLength = end - start + 1. So, finalLength = 4 - 0 + 1 = 5.
    Store the starting and ending position of the substring by as finalStart = 0, finalEnd = 4.
    Now, we start shrinking the window by incrementing the start pointer. 

    Window: “aaabc”, HASH[ a ] is incremented by 1, so it becomes -1. HASH[ a ] being negative indicates that the letter ‘a’ in the substring was not useful. So, the counter remains 0. And start is incremented, start = 1.

    finalStart = 1, finalEnd = 4, finalLength = 4.
    Window: “aabc”, HASH[ a ] becomes 0, (-1 + 1). The counter remains 0, start = 2.

    finalStart = 2, finalEnd = 4, finalLength = 3.
    Window: “abc”, HASH[ a ] becomes 1, ( 0 + 1 ), this means it is a useful character to us. Thus, the counter becomes 1, start = 3
    The while loop stops. 
  8. The end pointer is incremented and it becomes 5, new window = “bca”, 
    HASH[ S[end] ] i.e. HASH[ a ] becomes 0, counter becomes 0.
  9. Now, again when counter = 0, it indicates that a valid substring (or window) has been found.
    Incase of a valid window:
    finalLength = end - start + 1, 5 - 3 + 1 = 3. 
    finalLength remains 3.
    finalStart = 2, finalEnd =4. (remains same)

    Window: “bca”, HASH[ b ] becomes -1. This being negative was unnecessary; thus, the counter remains 0, start = 4.

    Updation: finalStart = 4, finalEnd = 5, finalLength = 2.
    Window: “ca”, HASH[ c ] becomes -1. The counter becomes 1, start = 5.

    The while loop stops. 
     
  10. The end pointer is incremented, the main loop ends.

As a part of the summary, we considered substrings:

  • “a”, “aa”, “aaa”, “aaab” and “aaabc”.
  • “aaabc” contains all the characters of X, i.e., “ac”, now, we removed unnecessary characters.
  • Then, we considered the substrings “aabc”, “abc”, “bc”. Note: “abc” is the shortest and valid yet.
  • When “bc” came, it was invalid, so we stopped shrinking the string and increment our end pointer.
  • In the next step, we considered the substrings “bca”. As “bca” was also valid, we removed unnecessary characters again.
  • This time, “ca” was the end result.