Table of contents
1.
Introduction
2.
Problem Statement
2.1.
Example
3.
How to Approach?
4.
Dynamic Programming Approach
5.
C++ Implementation
5.1.
Time Complexity
5.2.
Space Complexity 
6.
Frequently Asked Questions
7.
Key Takeaways
Last Updated: Mar 27, 2024

x-Prime Substring

Author Yukti Kumari
0 upvote
Career growth poll
Do you think IIT Guwahati certified course can help you in your career?

Introduction

In this article, we will learn to solve the problem of the x-prime substring. We will see the problem from different angles and analyze the given constraints to reach the optimal solution eventually. The main algorithm we will be using is Aho-Corasick Algorithm. So, it is recommended to brush up on your knowledge of the Aho-Corasick Algorithm as a prerequisite for this blog.


Let’s get started with the problem statement.

Problem Statement

You are given an integer x and a string s consisting of digits from 1 to 9 inclusive. 

A substring of a string is a contiguous subsequence of that string.

Let f(l,r) be the sum of digits of a substring s[l..r].

Let's call substring s[l1..r1] x-prime if - 

  • f(l1,r1)=x 
  • there are no values l2,r2 such that
    • l1≤l2≤r2≤r1
    • f(l2,r2)≠x
    • x is divisible by f(l2,r2)

You can erase some characters from the string. After erasing a character of the string, the two resulting parts of the string are concatenated.


Find the minimum number of characters you should erase from the string so that there are no x-prime substrings in it if no such x-prime substrings exist in the given string s, print 0. 

Constraints: The length of string s lies in the range [1,1000], and the integer x lies in the range [1,20].


Let’s understand the problem statement with an example.

Example

Input

string s= “116285317” and x=8


Output

2

Explanation 

Since x=8, we first have to find an 8-prime substring in s.

According to the definition of an x-prime substring, there are two 8-prime substrings,  “8” and “53” in 116285317 because the sum of the digits of both of these substrings is equal to 8, and there are no substrings in the two strings such that their sum is not 8 and it divides 8.


Now, the following substrings also have their sum of digits as 8, but they are not 8-prime substrings:

  • “62” - Because it has the substring “2”, such that 2 is not equal to 8, and it divides 8. 
  • “17” - Because it has the substring “1”, such that 1 is not equal to 8, and it divides 8. 
  • “116” - Because it has the substring “1”, such that 1 is not equal to 8, and it divides 8.


Next, we have to erase the minimum number of characters so that none of the 8-prime substrings remains. 

One of the optimal choices is to erase 2 characters: 116285317, after which string s will be 1162317, and we can observe that after the above operation, no 8-prime substrings exists in s. 

Note that there can be more than one optimal choice. Like we can also erase the characters 8 and 3 to get rid of all 8-prime substrings as 116285317 to get 1162517. 

Hence, the output is 2.

Please try to solve this problem on your own before moving on to further discussion here.

How to Approach?

Our first step should be to find all the x-prime substrings because after that, only we can proceed to erase some characters to eliminate all the x-prime substrings.


How to find all the x-prime substrings?
We will see the brute force approach to do this.

The steps are as follows:

  • Iterate over all the substrings of the string s.
  • For each substring s[i,..j], check the sum of its digits. 
  • If the sum is equal to x, then proceed to check all of its substrings to find if there exists a substring of s[i,..j] such that its sum of digits is not equal to x, but it divides x. If so, then this means the current substring s[i,..j] is not x-prime. 


There are some important points that we can deduce by observing the constraints of the given problem, which is 1<=x<=20 and 1<=|s|<=1000 - 

  • Since x can not be more than 20, the total count of x-prime strings of any given string will be relatively small. 
  • If we consider the sum of the lengths of all the x-prime strings, it will also be small and can be up to a maximum value of 5000.

Following the above two points, now our problem is - 

  • We have a string s having a maximum length of 1000 and a set of x-prime strings whose total length is up to 5000. 
  • We have to erase a minimum number of characters from that string s such that none of the strings from the set of x-prime strings appears as a substring of the resulting string s.


Now, how to solve this problem?

Whenever in any problem, we have to make some choices, and out of all possible choices we need to select the most optimal choice then this clearly points to using “Dynamic Programming”.

Let’s see the DP approach in detail in the next section.

Dynamic Programming Approach

The steps of implementation of the DP approach are as follows:

  • Generate all the x-prime strings for the integer x.
     
  • Build a trie of all the x-prime strings to match their prefixes according to the aho-corasick algorithm.
     
  • Each vertex of the trie can be interpreted as a position in one or more strings from the x-prime strings set.
     
  • Define a 2-D dp[][] array with the number of rows=length of string s + 1 and the number of columns = size of trie constructed.
     
  • Initialize the dp array with INT_MAX.
     
  • The value dp[x][y] represents the minimum number of characters to be erased if we consider the first x characters of string s, and it does not consist of any of the strings from the set.
     
  • y is the length of the longest suffix of the current string that coincides with some prefix of a string from the set.
     
  • If there exists a string from the set, which is a suffix of y, then there is an occurrence of an x-prime string. 
     
  • Else, we don't have any x-prime string ending in the last character because the longest possible suffix that matches some prefix from the set has been taken into account. 
     
  • So, summing up the transitions in dp, we have two choices: to take the next character and recompute or ignore the next character and increment the current number of characters to be erased by 1.


Note: After reading the Algorithm part, it is recommended to try writing the code on your own before reading the solution code.


Let’s see the code implementation and the time and space complexity analysis in the next section.

Also check out - Substr C++

C++ Implementation

/*
   C++ Program to Find the minimum number of characters you should
   erase from the string so that there are no x-prime substrings in it
*/


#include <bits/stdc++.h>
using namespace std;
const int AL = 9; // as the string only contains integers from 1 to 9
const int N = 5000;
const int INF = 1e9;
// defining node structure for the trie for Aho-corasick algorithm
struct node
{
    int next[AL];
    int p; // state
    char pch;
    int link;
    int go[AL];
    // bool term;
    bool leaf = false;
    node()
    {
        memset(next, -1, sizeof(next));
        memset(go, -1, sizeof(go));
        link = p = -1;
        leaf = false;
    }
    int &operator[](int x)
    {
        return next[x];
    }
};


string s;
string t;
int x;


/* Implementing Aho-corasick algorithm */
// creating the trie as a vector of the struture "nodes"
vector<node> trie;


// function to add a string s to the trie
void add_string(string s)
{
    int v = 0;
    for (char it : s)
    {
        int c = it - '1'; // '1'<=it<='9 so 0<=c<=8
        if (trie[v][c] == -1)
        {
            trie.push_back(node());
            trie[trie.size() - 1].p = v;
            trie[trie.size() - 1].pch = c;
            trie[v][c] = trie.size() - 1;
        }
        v = trie[v][c];
    }
    trie[v].leaf = true;
}
int go(int v, int c);
int get_link(int v)
{
    if (trie[v].link == -1)
    {
        if (v == 0 || trie[v].p == 0)
            trie[v].link = 0;
        else
            trie[v].link = go(get_link(trie[v].p), trie[v].pch);
    }
    return trie[v].link;
}
int go(int v, int c)
{
    if (trie[v].go[c] == -1)
    {
        if (trie[v][c] != -1)
            trie[v].go[c] = trie[v][c];
        else
            trie[v].go[c] = (v == 0 ? 0 : go(get_link(v), c));
    }
    return trie[v].go[c];
}


// function to generate all x-prime strings
void generateAll_X_Prime_Strings(int i, int sum)
{
    if (sum == x)
    {
        bool is_x_prime = true;
        for (int l = 0; l < int(t.size()); ++l)
        {
            int cur = 0;
            for (int r = l; r < int(t.size()); ++r)
            {
                cur += (t[r] - '0');
                if (x % cur == 0 && cur != x)
                    is_x_prime = false;
            }
        }
        if (is_x_prime)
        {
            add_string(t);
        }
        return;
    }
    for (int j = 1; j <= min(x - sum, 9); ++j)
    {
        t += '0' + j;
        generateAll_X_Prime_Strings(i + 1, sum + j);
        t.pop_back();
    }
}


int main()
{
    s = "116285317";
    x = 8;
    int minNumCharacters;
    trie.push_back(node());
    generateAll_X_Prime_Strings(0, 0);
    vector<vector<int>> dp(s.size() + 1, vector<int>(trie.size(), INF));
    dp[0][0] = 0;
    for (int i = 0; i < s.size(); i++)
    {
        for (int j = 0; j < trie.size(); j++)
        {
            if (dp[i][j] != INF)
            {
                dp[i + 1][j] = min(dp[i + 1][j], dp[i][j] + 1);
                int next = go(j, s[i] - '1');
    //if by adding the next character we dont get a x-prime string
                if (!trie[next].leaf)
                    dp[i + 1][next] = min(dp[i + 1][next], dp[i][j]);
            }
        }
    }
    minNumCharacters = *min_element(dp[s.size()].begin(), dp[s.size()].end());
    cout << "The minimum number of characters to be erased from the string = " << minNumCharacters << endl;
    return 0;
}
You can also try this code with Online C++ Compiler
Run Code

Output

The minimum number of characters to be erased from the string = 2
You can also try this code with Online C++ Compiler
Run Code

Time Complexity

O(n*number_of_xPrime_strings), where n-length of the string s.

This is because we are using two nested loops, one of length n and the other of length equal to trie size, which in the worst case can be equal to the number of x-Prime strings. Hence, the total time complexity is O(n*number_of_xPrime strings).

Space Complexity 

We use a 2D dp array of size=(n*number_of_xPrime_strings) and a trie for the aho-corasick algorithm which uses O(alphabet_size * average key length * number_of_xPrime strings) space. Here alphabet_size is equal to 9 because the string only contains numbers from 1 to 9. Thus the total space complexity is O(n*number_of_xPrime_strings) + O(9 * average key length * number_of_xPrime strings).

Check out this problem - Frog Jump

Frequently Asked Questions

  1. What is the proper prefix of a string?
    The proper prefix of a string is a substring that is not equal to the length of the string itself.
     
  2. What is a trie?
    A trie is an ordered data structure, a search tree used to store associative data structures. It is also referred to as a Radix tree or Prefix tree.
     
  3. Where can you use a trie?
    You use a trie to take a partial value and return a set of possible complete values.
     
  4. What is the Aho-Corasick algorithm?
    The Aho-Corasick algorithm uses a trie data structure to match multiple patterns against a large sequence of text efficiently.
     
  5. What is a string matching problem?
    The problem of finding occurrence(s) of a pattern string within another string or body of text. 

Key Takeaways

In this article, we learnt to solve the problem of an x-prime substring by carefully observing the constraints given and crafting the approach to reach the optimal solution.

We used Trie, dynamic programming and the Aho-Corasick algorithm to reach the solution.

To learn the concepts used in this article, check out the following articles- 

 

Recommended problems -

 

Are you planning to ace the interviews of reputed product-based companies like Amazon, Google, Microsoft, and more? Attempt our Online Mock Test Series on Coding Ninjas Studio now!

Live masterclass