Solution Approach
Approach 1: Naive Approach
The brute force solution idea is very straightforward, and we will discuss that here. The idea is to traverse first for each line in the text provided and then search for each word from the inappropriate word if it is present as the substring in the text or not.
If the word is found, then the line and position are printed, and if not, then "Passed" is printed.
The steps of implementation are:
- Take the inputs of n, n number of inappropriate words in words_list, m, m lines of text in the text list.
- Traverse over each line or each element from the text list
- For each word from the words_list, check for its presence in the text line.
- Call search function to check whether the word is present or not in the text line.
- The search function will take two arguments - pattern and text, declare m as the length of pattern and n as the length of text.
- Now implement a for loop to slide the word over the text line; this loop will run from 0 to n-m+1
- For each index in the outer loop, run another while loop to traverse the pattern length and search for the pattern in the text.
- Return the index value if the word is found or return -1 if not found
- Check whether the return value is not equal to -1 and print the index+1 and return value of the search function.
- If the return value is -1, then print 'Passed.'
Python Implementation
def search(pat, text):
pl = len(pat)
tl = len(text)
#loop to slide pat over the text one by one
for i in range(tl-pl+1):
j = 0
#check for the pattern at the current index i
while(j<pl):
if pat[j] != text[i+j] :
break
j += 1
#return the i if complete pattern is found in text at i
if j == pl:
return(i)
#return -1 if not found
return -1
def main():
#creating an empty list of inappropriate words and text
words_list = []
text = []
#Taking the value of n as input
n = int(input())
#taking input of n words
for i in range(n):
words_list.append(input())
#Taking the value of m
m = int(input())
#Taking input of text
for i in range(m):
text.append(input())
#loop for each line in text
for index, text_line in enumerate(text):
#loop for each pattern in words_list
for pat in words_list:
tmp = search(pat, text_line)
if tmp != -1:
#printing the line and position
print(index+1, tmp+1, sep = ' ')
return
# if no inappropriate word is found, print "Passed."
print("Passed")
#execute main
main()

You can also try this code with Online Python Compiler
Run Code
Input
4
dear
dream
Baby
lovingly
5
I was feeling sleepy, so I went to bed early
I did my toothbrush, and the next thing I
remember, is being in a spaceship,
which I was dreaming about floating in
It was a good dream and a great sleep.
Output
4 13
Complexities
Time Complexity
O(m*n*(tl-pl+1)*pl), where m is the number of lines, n is the number of words, tl is the length of a line in the search function, and pl is the length of a word in the search function.
Reason: In the search function, we have two loops, so the complexity becomes O((tl-pl+1)*pl), then in the main function, first of all, we are traversing for the number of lines and then for the number of words, so that gives the overall complexity of O(m*n*(tl-pl+1)*pl)
Space Complexity
O(n+m), where n is the number of words and m is the number lines in the text
Reason: To store the n number of words, we have created the words_list, and to keep the m lines of text, we have created the text list. Both the list depends on the value of n and m.
Approach 2: KMP Pattern Searching algorithm
We have seen that the brute force approach is not good enough for large values of m and n. KMP is the abbreviation for Knuth Morris Pratt. It makes use of the degenerating property, which reduces the algorithm's time complexity and optimizes the program. The algorithm takes into consideration the fact that while looking for a pattern, we also look at part of the characters in the following window, so this algorithm uses this information to avoid checking the windows which we already know are not going to match.
The implementation of this algorithm has two parts- the first is the creation of a temporary list for each word, and the second is the search part.
The Steps of implementation for preprocessing part:
- For each word in the words_list, do the following.
- Create a list of lengths same as the length of the word with all elements as 0 (say, temp)
- The value at the first index is always 0
- Loop over the word with i and initialize j =0; if word[i] is equal to the word[j], increment both i and j and store the value of j to the temp[i]
- If word[i] is not equal to the word[j] and j is not equal to 0, then change j to the value of temp[j-1] and continue the process
-
Else If j is 0 then assign temp[i] to 0 and increment i.
This implementation will create a temporary list that gives the length of the suffix, which is also a prefix for each position in the word.
The steps of implementation for the search part:
- Assign the length of the word to m and the size of the line of text to n
- Initialize both i and j with 0 to traverse the text line and word respectively and then loop over i while i is less than n
- Check if word[j] is equal to text[i], if equal then increment both i and j
- Now, if j is equal to m, then the word is found, and we return the index and exit the function
- Now, if word[j] is not equal to text[i] and i is still less than n, then we reassign j the value of temp[j-1] if j is not equal to 0. Otherwise, i is incremented.
Python Implementation
def create_temp(pat, temp):
m = len(pat)
j, i = 0, 1
#loop to traverse over the text line
while i < m:
#if the value is equal increment both i and j and stores the value at j in temp[i]
if pat[i]== pat[j]:
j += 1
temp[i] = j
i += 1
else:
if j == 0:
temp[i] = 0
i += 1
else:
j = temp[j-1]
def KMPsearch(pat, text,temp):
m = len(pat)
n = len(text)
#Initialize i and j with 0 to traverse the text and pat respectively
j, i = 0, 0
#loop over to traverse the text
while i < n:
if pat[j] == text[i]:
i += 1
j += 1
#check if pattern is found or not and return index if found
if j == m:
return i-j
elif pat[j] != text[i] and i < n:
if j != 0:
j = temp[j-1]
else:
i += 1
return -1
def main():
#creating an empty list of inappropriate words and text
words_list = []
text = []
#Taking the Inputs
n = int(input())
for i in range(n):
words_list.append(input())
m = int(input())
for i in range(m):
text.append(input())
#To create temporary list for each word
temp_list =[]
for pat in words_list:
temp = [0] * len(pat)
create_temp(pat, temp)
temp_list.append(temp)
# Loop over each line in the text
for index, text_line in enumerate(text):
#loop over each word to check for its presence in the text line
for index2, pat in enumerate(words_list):
temp = temp_list[index2]
res = KMPsearch(pat,text_line,temp)
if res != -1:
print(index+1, res+1, sep = ' ')
return
print("Passed")
#execute main
main()

You can also try this code with Online Python Compiler
Run Code
Input
4
dear
dream
Baby
lovingly
5
I was feeling sleepy, so I went to bed early
I did my toothbrush, and the next thing I
remember, is being in a spaceship,
which I was dreaming about floating in
It was a good dream and a great sleep.
Output
4 13
Complexities
Time Complexity
O(n*m*l), where n is the number of words and m is the number of lines, and l is the length of the longest line of text.
Reason: We have to traverse for each line and then for each word and then traverse the line of text; hence there are three loops; therefore, the complexity of this algorithm is O(n*m*l)
Space Complexity
O(n+m+n), where n is the number of words and m is the number lines in the text
Reason: To store the n number of words, we have created the words_list, and to keep the m lines of text, we have created the text list. Also, the temporary list is created, so another n. Both the list depends on the value of n and m.
Approach 3: Rabin Karp Algorithm
The Rabin Karp approach to pattern search is based on the hash function notion. The hash value of the pattern is compared to the hash value of the substrings of the text to be checked. For all the substrings whose hash value matches the pattern, individual characters are traversed and checked.
The steps of implementation are:
- Take the input for the number of words n, n number of words, the number of lines in text m, and m lines of text.
- Loop over the text line to take into account each line and then loop over the words_list to consider each word
- Take the value of d, the number of characters equal to 256, the value of q, any prime number (say101)
- Calculate the value of h
- calculate the hash value of the pattern as well as the first window
- Slide over the text and check if the hash value of the window matches the hash value of the pattern. If the value matches, then check the individual characters to see if the pattern matches the window
- If the pattern matches the window, return the index and exit loop.
- Otherwise, calculate the hash value of the next window by adding the leading character and removing the trailing character.
- Repeat the step until a pattern matches or all the windows are traversed
Python Implementation
def RKsearch(pat, text):
d = 256 #number of characters in alphabet generally 256
q = 101 #any prime number which will control integer overflow
m = len(pat)
n = len(text)
i, j = 0, 0 # to traverse text and pattern
phash = 0 # to store the hash value for pattern
thash = 0 # to store the hash value for txt
h = 1 # initialize
#calculate the value of hash h
for i in range(m-1):
h = (h*d)%q
#calculate the hash value of pattern
for i in range(m):
phash = (d*phash + ord(pat[i]))%q
#calculate the hash value of first window
for i in range(m):
thash = (d*thash + ord(text[i]))%q
#Loop over to traverse the text
for i in range(n-m+1):
#first check if the hash values of the current window match with the hash value of the pattern
if thash == phash:
#now check for the individual characters
for j in range(m):
if text[i+j] != pat[j]:
break
else: j+=1
if j == m:
return(i) #if the pattern is found return index
#now we calculate the hash of next window
if i < n-m:
thash = (d*(thash-ord(text[i])*h) + ord(text[i+m]))%q #add the leading character and remove the trailing character
#check if the current hash is negative
if thash < 0:
thash += q #use modular ability to change it to positive
return -1
def main():
#creating an empty list of inappropriate words and text
words_list = []
text = []
#Taking the Inputs
n = int(input())
for i in range(n):
words_list.append(input())
m = int(input())
for i in range(m):
text.append(input())
#loop for each line in text
for index, text_line in enumerate(text):
#loop for each pattern in words_list
for pat in words_list:
tmp = RKsearch(pat, text_line)
if tmp != -1:
#printing the line and position
print(index+1, tmp+1, sep = ' ')
return
# if no inappropriate word is found, print "Passed."
print("Passed")
#execute main
main()

You can also try this code with Online Python Compiler
Run Code
Input
4
dear
dream
Baby
lovingly
5
I was feeling sleepy, so I went to bed early
I did my toothbrush, and the next thing I
remember, is being in a spaceship,
which I was dreaming about floating in
It was a good dream and a great sleep.
Output
4 13
Complexity
Time Complexity
O(n*m*lw*lt), where n is the number of words, m is the number of lines in the text, lw is the length of the longest word, and lt is the length of the longest line of text.
Reason: We have to traverse for each line and then for each word, so that gives the n*m complexity. Now each line and word, we have to traverse the text to slide the window and get substrings, and then when the hash values match, the word is crossed to check for individual characters.
Space Complexity
O(n+m), where n is the number of words and m is the number lines in the text
Reason: To store the n number of words, we have created the words_list, and to keep the m lines of text, we have created the text list. Both the list depends on the value of n and m.
FAQs
-
What is hashing?
The process of converting a given key into another value is known as hashing. Here we have converted the string into numeric value to reduce the computation cost and reduce the run time.
-
What is String matching algorithms?
It is a type of algorithm where we find one, several, or all occurrences of a pattern in the given string. Knuth Morris Pratt and Rabin Karp are examples of such algorithms.
-
How is the hash value of the next window calculated?
To calculate the hash value of the next window, the hash of the trailing character is removed, and the hash of the leading character is added to the current hash value.
-
What are spurious hits in Rabin-Karp?
Spurious hits in Rabin Karp occur when the hash values of the pattern and the window match, but the actual window does not match the pattern.
-
What are the worst case and best case complexity of the Rabin Karp algorithm?
The best-case complexity of the Rabin Karp algorithm is O(m+n), and the worst-case complexity is O(nm).
Key Takeaways
In this article, we learned how to solve the problem of "Inappropriate Words filtering". The brute force approach is not suitable for larger values, so we use optimized algorithms. The KMP and Rabin Karp are the two algorithms that optimize the problem and give a better and more efficient solution.
Are you planning to ace the interviews of reputed product-based companies like Amazon, Google, Microsoft, and more?
Attempt our Online Mock Test Series on Coding Ninjas Studio now!
Happy Coding!