Most Frequent Word

Easy
0/40
1 upvote
Asked in companies
SalesforceIBMGoldman Sachs

Problem statement

You are given a paragraph that may have letters both in lowercase and uppercase, spaces, and punctuation. You have also given a list of banned words. Now your task is to find the most frequent word which is not in the list of banned words. There will always be a solution, and the solution will be unique.

While comparing words, you can ignore whether the letter is lowercase or uppercase. For example, ‘AsK’ and ‘aSK’ are the same. The words will always contain only alphabets or we can say words will be separated by spaces or punctuation. The answer will be in uppercase letters.

The words in the banned list will always be in upper letters and free from punctuation and spaces.

For example :

Paragraph = ‘It's a square SqUare. It's a FLAT flat.’ 
Banned =[FLAT, IT, S]. 
So we can see these words [IT, S, SQUARE, FLAT ]  are most frequent.
Now we will look at to banned list and we can see 3 of the words are banned.
So we have a unique answer SQUARE which has a frequency of 2 and not on the banned list.
Detailed explanation ( Input/output format, Notes, Images )

Input Format:

The first line of input contains a string PARAGRAPH.
The second line of input contains an integer 'N' representing the size of the banned words list.
The third line of input contains 'N' space-separated words BANNEDWORDS.

Output format:

Print the word (in UPPERCASE) which is most frequent and is not present in the list of banned words.

Note :

You don’t have to print anything. It has already been taken care of. Just implement the given function. 

Constraints:

1<= N <=10^6
1<=banned.size<=100
Where ‘N’ is the size of paragraph.

Time limit: 1 second

Sample input 1:

Smile, smile, smile at your mind as often as possible.
3
YOUR MIND POSSIBLE

Sample output 1:

SMILE

Explanation for sample output 1:

We can see ‘SMILE’ occurs 3 times and all other words occur 1 time except ‘AS’ which appears 2 times. So ‘SMILE’ is the most frequent word and it is not on the banned list.

Sample input 2:

The sad truth is that the truth is sad.
4
THE THAT IS TRUTH

Sample output 2:

SAD

Explanation for sample output 2:

In the paragraph, we have a total of 9 words and 4 of them appear twice which are [SAD, THE, IS, TRUTH]. So ‘SAD’ is the only word that has a frequency of 2 and not in the list of banned words.
Hint

Can we find the frequency of each word with two nested loops.

Approaches (3)
Brute Force
  1. To find the frequency of each word we need to extract the words from the paragraph.
  2. So we will convert all the punctuation to space in the paragraph and lowercase alphabets to uppercase.
  3. Now we can use the library function (e.g sstream in C++) to extract words from the paragraph in an array of strings say WORDS.
  4. We will iterate over WORDS and do:
    1. Iterate over the banned array and check if the current word is present in banned or not.
    2. If not present in banned we will run another loop on WORDS to find its frequency.
    3. If the frequency of the current word is greater than the max frequency then update the max frequency and answer(the string that function will return).
  5. After the 4th step, we will have a word that is not banned and have the maximum frequency.
Time Complexity

O(N^2) where N is the length of the paragraph.

As we have converted the paragraph to an array of strings. And for each word in the array we are traversing the whole array making it N^2.

Space Complexity

O(N) where N is the length of the paragraph.

As we are storing the paragraph in an array of strings, which will take O(N) space.

Code Solution
(100% EXP penalty)
Most Frequent Word
Full screen
Console