Last Updated: 8 Jan, 2021

Most Frequent Word

Easy
Asked in companies
SalesforceIBMGoldman Sachs

Problem statement

You are given a paragraph that may have letters both in lowercase and uppercase, spaces, and punctuation. You have also given a list of banned words. Now your task is to find the most frequent word which is not in the list of banned words. There will always be a solution, and the solution will be unique.

While comparing words, you can ignore whether the letter is lowercase or uppercase. For example, ‘AsK’ and ‘aSK’ are the same. The words will always contain only alphabets or we can say words will be separated by spaces or punctuation. The answer will be in uppercase letters.

The words in the banned list will always be in upper letters and free from punctuation and spaces.

For example :

Paragraph = ‘It's a square SqUare. It's a FLAT flat.’ 
Banned =[FLAT, IT, S]. 
So we can see these words [IT, S, SQUARE, FLAT ]  are most frequent.
Now we will look at to banned list and we can see 3 of the words are banned.
So we have a unique answer SQUARE which has a frequency of 2 and not on the banned list.

Input Format:

The first line of input contains a string PARAGRAPH.
The second line of input contains an integer 'N' representing the size of the banned words list.
The third line of input contains 'N' space-separated words BANNEDWORDS.

Output format:

Print the word (in UPPERCASE) which is most frequent and is not present in the list of banned words.

Note :

You don’t have to print anything. It has already been taken care of. Just implement the given function. 

Constraints:

1<= N <=10^6
1<=banned.size<=100
Where ‘N’ is the size of paragraph.

Time limit: 1 second

Approaches

01 Approach

  1. To find the frequency of each word we need to extract the words from the paragraph.
  2. So we will convert all the punctuation to space in the paragraph and lowercase alphabets to uppercase.
  3. Now we can use the library function (e.g sstream in C++) to extract words from the paragraph in an array of strings say WORDS.
  4. We will iterate over WORDS and do:
    1. Iterate over the banned array and check if the current word is present in banned or not.
    2. If not present in banned we will run another loop on WORDS to find its frequency.
    3. If the frequency of the current word is greater than the max frequency then update the max frequency and answer(the string that function will return).
  5. After the 4th step, we will have a word that is not banned and have the maximum frequency.

02 Approach

First of all, we need to extract all words into an array of strings. We can do this the same way as we did in the earlier approach. But this time we will use Hashmap to store the frequency of each word and a Set to store the banned words so that we can check if the word is banned or not in constant time.

  1. Let's say we have given a paragraph P.
  2. Let’s say the array of words is WORDS.
  3. We are using an unordered_map called MYMAP as Hashmap.
  4. We are also using a Set called MYSET to store banned words.
  5. Let’s say MXCOUNT=0 and ANS is an empty string.
  6. Iterate over WORDS[i] for each 0 <= i < WORDS.size() and do:
    1. If WORDS[i] is present in MYSET go to step 6.
    2. MYMAP[ WORDS[i] ] += 1.
    3. Initialize an integer variable COUNT=.MYMAP[ WORDS[i] ].
    4. If COUNT > MXCOUNT then do :
      1. MXCOUNT = COUNT
      2. ANS = WORDS[i]
  7. Return ANS.

03 Approach

We can solve this problem without extracting words into extra space and without storing Banned into Set. We will only be using a Hashmap to store the frequency of words and will initialize the Banned words with a minimum value of INTEGER say INT_MIN. And from the given constraint we know once initialized with INT_MIN Banned words frequency can never reach a positive value. And the algorithm for processing words is given below:

  1. Let's say we have given a paragraph P.
  2. We are using an unordered_map called MYMAP as Hashmap.
  3. Set the MYMAP value for banned words as INT_MIN.
  4. Iterate over P[i] for each 0 <= i < N and do:
    1. If P[i] is alphabet:
      1. Convert P[i] to uppercase and append to TEMP.
    2. If P[i] is a space or punctuation :
      1. MYMAP[TEMP] += 1.
      2. Set TEMP=””.
  5. Return the word with max frequency.