Last Updated: 23 Jan, 2021

Regular Expression Matching

Hard
Asked in companies
FacebookGrowwSAP Labs

Problem statement

Given an input string 'S' and a pattern 'P', implement a regular expression matching with the support of two special characters ‘.’ (dot) and ‘*’(asterisk) where

1. ‘.’ matches to any single character.
2. ‘*’ matches to zero or more of the preceding element.

If the input string 'S' matches the pattern 'P', then return true else, return false.

Note:
1. You have to match the entire string with the pattern given.

2. Both the strings, 'S' and 'P' contain only lower-case alphabets.

3. Only the pattern will contain additional characters ‘*’ and ‘.’ along with alphabets.
Input Format:
The first line contains a single integer T representing the number of test cases.

The one and only line of each test case contains two space-separated strings 'S' and 'P' respectively. 
Output Format :
For each test case, print a single line containing “true” if the string matches the pattern, else print ”false”.

The output of every test case is printed in a separate line.
Note:
You do not need to print anything. It has already been taken care of. Just implement the given function.
Constraints:
1 <= T <= 10
1 <= N <= 1000
1 <= M <= 1000    

Where 'T' is the number of test cases, ‘N’ is the length of the input string to be matched, and ‘M’ is the length of the pattern 'P'.

Time limit: 1 sec.

Approaches

01 Approach

  • Since there will be a lot of possibilities to explore, the best way is to try to solve the problem recursively.
  • Let isMatch be the function which takes the string S and pattern P as input and returns true if the string matches the pattern else returns false.
  • The base case of this recursion will be if the length of P is 0, then the length of S should also be 0.
  • One thing to note is that ‘*’ will be only present in the pattern if there are some previously occurring characters in the pattern, i.e. ‘*’ cannot be the first character.
  • So if the second character is ‘*’ in the pattern, we can just remove the first character from the string and check from the second character onwards, i.e return isMatch(S.substr(1), P.substr(2)).
  • Another case which is possible is that if the first character in the pattern is ‘.’ or both the characters are equal, we can match it with the character in the string. So we can just return isMatch(S.substr(1), P), where substr is a function used to find the substring from 2nd to last character of the string.
  • If the second character is not ‘*’, so we cannot copy any preceding character. Hence we just need to check whether the characters are equal in both or not, or the first character in the pattern is ‘.’. So we can return isMatch(S.substr(1), P.substr(1)).

02 Approach

  • Since the recursion is too slow and inefficient, we can try to memoize the previously calculated results so that we don’t calculate them again and again.
  • Let memo[N + 1][M + 1] be an integer memoization table which will store the states of the recursion. Initially all the values in this table are -1.
  • memo[i][j] will represent the state whether S[0...i - 1] matches P[0… j - 1] or not.
  • Let isMatch() be the boolean function which takes the string S and pattern P as input and returns true if the string matches the pattern else returns false.
  • isMatch() takes i , j, memo, S, P as parameters. Where i is the pointer on S and j is the pointer on P.
  • Let N = length of S and M = length of P
  • The base case will be if j = M, then i should also be equal to N. On memoizing it we get, dp[i][j] = (i == N).
  • The memoization case will be if memo[i][j] != -1 which means that we have already found the value, and we can directly return it.
  • We will have two cases:
    • If P[j + 1] = ‘*’, that means we can match it with preceding character, so we will have
      • Either we can match S[i] with P[j + 2]
      • or if P[j +1] = ‘.’ then we can match S[i + 1] with P[j + 1].
      • In other words,  if(isMatch(i,S,j+2,S,memo) || i<N && (P[j] == '.' || S[i] == P[j]) && isMatch(i+1,S,j,P,memo))

  return memo[i][j] = 1;

  • If this condition is not true, we can just return memo[i][j] = 0.

03 Approach

  • Since we are considering a bottom up approach, we will have to consider small cases first and then use the previously calculated values to find the answer.
  • Let DP[N + 1][M + 1] be the boolean table where DP[i][j] represents whether S[0...i - 1] matches P[0… j - 1] or not.
  • So for all the lengths greater than 0 of string S, DP[i][0] will be false.
  • We can also observe that p[0.., j - 3, j - 2, j - 1] matches empty iff p[j - 1] is '*' and p[0..j - 3] matches empty.
  • Now we run two nested loops as follows :
    • L1: For i = 1 to N, considering all the indices of string S.
    • L2: For j = 1 to M, considering all the indices of pattern P.
      • Now if P[j - 1] != ‘*’, that means we cannot match it to a preceding index, so DP[i][j] will be equal to DP[i - 1][j - 1] && (s[i - 1] == p[j - 1] || '.' == p[j - 1]);
      • Else as shown in the recursive approach we just have to check whether S[i - 1] is equal to  P[j - 2]  or not, i.e.  DP[i][j] = dp[i][j - 2] || (s[i - 1] == p[j - 2] || '.' == p[j - 2]) && dp[i - 1][j]
  • After this loop ends, we will have our answer at DP[N][M].