Code360 powered by Coding Ninjas X Naukri.com. Code360 powered by Coding Ninjas X Naukri.com
Table of contents
1.
Problem statement
2.
Brute Force Approach
2.1.
Algorithm
2.2.
C++ code
2.3.
Time Complexity 
2.4.
Space Complexity 
3.
Efficient Approach
3.1.
Implementation
3.2.
Time Complexity
3.3.
Space Complexity
4.
FAQs
5.
Key takeaways
Last Updated: Mar 27, 2024

Ninja and his Strings

Problem statement

Ninja has stored his strings in an array ‘A’ containing ‘N’ strings and their prices in an array ‘B’ containing ‘N’ integers such that the price of string ‘A[i]’ is equal to ‘B[i]’. Also, it is guaranteed that |A[i]| <= 26

For a given string ‘S’, he calculates its value using the above strings and their prices using the formula - 

              value(S) = ∑0N-1(Fi(‘a’) * B[i])                                                         (1)

Here, fi(S) = Number of substrings of S that are equal to string A[i]

          B[i] = Price of string A[i]

We have been given two strings ‘X’ and ‘Y’ and we can create a string ‘S’ by - 

S = substr(X) + substr(Y)                                                                              (2)

Here, substr(X) = Any substring of X (possibly empty)

          substr(Y) = Any substring of Y (possibly empty)

          And ‘+’ denotes the concatenation of two strings.

The task is to find the maximum value calculated using the given equation (1) with any string ‘S’ created by using equation (2).

Let’s take an example to understand the problem.

Suppose N = 3 and the strings of Ninja and their prices are given b the following arrays: 

A = {“ab”, “a”, “b”}

B = {2, 3, 1}

Also, given X = “bab” and Y = “ab”

We have to find the maximum value given by equation -

    value(S) = ∑0N-1(Fi(‘a’) * B[i])

We know that we can create string S by using the substrings of X and Y.

All the substrings of X = {“b”, “ba”, “bab”, “a”, “ab” }

All the substrings of Y = {“a”, “ab”, “b”}

S = substr(X) + substr(Y), where substr(str) = Any substring of ‘str’ (possibly empty)

So, all the possible value of S = {“”, “a”, “ab”, “b”, “ba”, “bab”, “bb”, “baa”, “baab”, “bab”, “baba”, “babab”, “babb”, “aa”, “aab”, “aba”, “abab”, “abb”}

Now, we have to find the value of each of these possible ‘S’ and find the maximum value among them.

Let’s calculate the value of each of the strings of the array -

S = {“”, “a”, “ab”, “b”, “ba”, “bab”, “bb”, “baa”, “baab”, “bab”, “baba”, “babab”, “babb”, “aa”, “aab”, “aba”, “abab”, “abb”}

We can calculate value of string “a” in the following way:

value(“a”) = ∑0N-1(Fi(‘a’) * B[i]) 

= ∑0N-1(Number of substrings of ‘a’ that are equal to A[i] * B[i])

= ((Number of substrings of “a” equal to A[0]) * B[0]) + (Number of substrings of “a” equal to A[1]) * B[1]) + (Number of substrings of “a” equal to A[2]) * B[2]))

= ((Number of substrings of “a” equal to “ab”) * B[0]) + (Number of substrings of “a” equal to “a”) * B[1]) + (Number of substrings of “a” equal to “b”) * B[2]))
= ((0 * 2) + (1 * 3) + (0 * 1))

= 3

Similarly, calculate the values of each of the possible strings ‘S’ and find the maximum among them.

Now that we understand the problem let's discuss the approach to solve it in the next section.

Also see, Euclid GCD Algorithm

Brute Force Approach

This section will discuss the brute force approach to find the maximum value of a string generated by concatenating any substring of X with any substring of Y. The simple approach is to first generate all the possible strings ‘S’ by concatenating each substring of X with each substring of Y. Then calculate the value of each of these strings using the given equation (1) and find the maximum among them. 

Algorithm

Step 1. First, the main function. Inside the main function, create variables and store the given values of ‘N’ , arrays ‘A’ and ‘B’ , and strings ‘X’ and ‘Y’. Then calculate the lengths of X and Y and call the function to find all the substrings of X and Y. Then create a variable ‘maxValue’ to store the maximum value of a string generated by concatenating any substring of X with any substring of Y. Then concatenate each substring of ‘X’ with each substring of ‘Y’ to get ‘S’ and call the function to calculate the value of ‘S’ and keep updating the value of variable ‘maxValue’.

Step 2. Next, create the function “findSubstrings()”, which takes a string and its length as inputs and returns the vector of strings containing all the substrings of the given string. Inside the function, create a “for loop” which runs for the length of the substring from 1 to N and then create two more “for loops” inside it to take the value of starting and ending indices of the substring. Generate each substring and store it into a vector and finally return the vector containing all the substrings.

Step 3. Next, create the function “findValue()” to find the value of a string which takes the following inputs:

  1. A : array of strings that Ninja have
  2. B: array containing prices of strings that Ninja have
  3. N: number of strings that Ninja have
  4. S: string whose value is to be calculated

Inside the function, run a “for loop” from i = 0 to i = N - 1 to calculate the value given by the equation - 

                               value(S) = i = 0N-1(fi(S) * B[i])

To get the value of fi(S), call the function “f()” which takes ‘S’ and ‘A[i]’ as input.

Step 4.  Finally, create the function “f()” which takes strings ‘S’ and ‘A[i]’ as input and returns the count of substrings of ‘S’ that are equal to A[i]. Inside the function, first call the function “findSubstrings()” to find all the substrings of ‘S’.Then create a variable ‘count = 0’ , traverse through all the substrings of ‘S’ and increase the value of count if any substring is equal to A[i]. After traversing through all the substrings of ‘S’, return the variable ‘count’.

C++ code


/*
   C++ program to find the  find the maximum value of a string generated by concatenating any 
   substring of X with any substring of Y
*/
#include <bits/stdc++.h>
using namespace std;


// Function to return all the substrings f str
vector<string> findSubstrings(string str, int n)
{
vector<string> ans;

for(int len = 1; len <= n; len++)
{
  for(int i = 0; i <= n-len; i++)
  {
    int j = i + len - 1;  
    string s;
            for (int k = i; k <= j; k++)
              {
                s.push_back(str[k]);
              }
            ans.push_back(s);
  }
}
 
    return ans;
}


// Function to return the number of substrings of S that are equal to Ai
int f(string S, string Ai)
{
int lenS = S.length();
vector<string> substrS = findSubstrings(S, lenS);

int count = 0;
for(int i=0; i<substrS.size(); i++)
  {
   if(substrS[i] == Ai) {
     count++;
   }
  }
    return count;
}


// Function to return the maximum value of the string S Calculated using the formula given in the question
int findValue(string A[], int B[], int N, string S)
{
   int val = 0;
   
   for(int i=0; i<N; i++)
     {
      val = val + (f(S, A[i]) * B[i]); 
     }
     
   return val;
}


// Main function
int main() 
{


// Number of strings that Ninja have
int N = 3;

// Given Ninja's strings
string A[3] = {"ab", "a", "b"};

// Given Prices of strings of Ninja
int B[3] = {2, 3, 1};

// Given strings whose substrings are used to generate string S
string X = "bab";
string Y = "ab";

// Calculating the lengths of X and Y
int lenX = X.length();
int lenY = Y.length();

// Call the function to find all the substrings of X 
vector<string> substrX = findSubstrings(X, lenX);

// Call the function to find all the substrings of Y 
vector<string> substrY = findSubstrings(Y, lenY);

// Variable to store the maximum value
int maxValue = 0;

// Concatenate every substring of X with every substring of Y to get S and calculate its value
for(int i = 0; i < substrX.size(); i++)
{
  for(int j = 0; j < substrY.size(); j++)
    {
     string S = substrX[i] + substrY[j];
      
     // Call the function to calculate value of S
     int val = findValue(A, B, N, S);
      
     // Update the maximum value
     maxValue = max(maxValue, val);
    }
}
 
// Print the calculated maximum value
cout<<"The maximum possible value is: "<<maxValue<<endl;
 
 }

Time Complexity 

In the above program to to find the  find the maximum value of a string generated by concatenating any substring of X with any substring of Y, the following steps have taken time:

  1. Function call to find all the substrings of X - Time Complexity = O(Nx ^ 3)
  2. Function call to find all the substrings of X - Time Complexity = O(NY ^ 3)

The function to find all the substrings of a string is taking cubic polynomial time complexity as three nested loops are used inside the function

  1. For concatenating every substring of X with every substring of Y, two nested “for loops” are used - Time Complexity = O(Number of substrings of X * Number of substrings of Y).

Number of substrings of a string of length ‘n’ = (n*(n+1))/2

So, here time complexity of these two nested loops is O(Nx^2 * NY ^ 2)

Inside these two nested loops, the function “findValue()” is called. Inside the function a “for loop” is used inside which function “f()” is called. Inside the function “f()”, “findSubstrings()” function is called for ‘S’. The longest length of ‘S’ can be NX + NY. So, the worst case time complexity for “findValue()” function will be O (N * ( ((NX + NY) ^ 3) + ((NX + NY) ^ 2)))

Thus the  time complexity = O(Nx ^ 3) +  O(NY ^ 3) + O((Nx^2 * NY ^ 2) * ((N * ( ((NX + NY) ^ 3) + ((NX + NY) ^ 2))))

= O(Nx ^ 3) +  O(NY ^ 3) + O((Nx^2 * NY ^ 2) * (N * ((NX + NY) ^ 3))

Where NX and NY are the length of strings X and Y respectively and N is the total number of strings that Ninja have.

Space Complexity 

In the above program to find the maximum value of a string generated by concatenating any substring of X with any substring of Y, we have created vectors to store the total number of substrings of X, Y and S which takes O(NX ^ 2), O (NY ^ 2), and O((NX + NY) ^ 2 ) space respectively. So, the overall space complexity is O((NX + NY) ^ 2 ).

Efficient Approach

Prerequisites: Aho-corasick

Let us make a few observations regarding the optimal choice of S to arrive at an efficient approach:

  1. S will consist of some prefix of X and some suffix of Y. Why? Let’s try to prove it by contradiction. Say the optimal string S contains a substring of X which is not a prefix. Let us denote S as Xix, Xix+1, … Xjx, Yiy, Yiy+1, … Yjy. In other words, S is formed by concatenating the substring of X from index ‘ix’ to ‘jx’ and the substring of Y from index ‘iy’ to ‘jy’. Let V be the value of S.
    Now we can construct another string S’ = X1, X2, … Xjx, Yiy, Yiy+1, …, YNy. In other words S’ is formed by concatenating prefixes of X from index 1 to ‘jx’ and suffix of Y from index ‘iy’ till the end. We can see that S is a substring of S’ and thus, all the substrings of S will also be substrings of S’. Therefore, the value of S’ will be greater than or equal to the value of S.
  2. For a string S = X[0…i] +Y[j…Ny-1], we can divide its substrings into three categories:
    a) The substrings that lie completely in X[0…i].
    b) The substrings that lie completely in Y[j…Ny-1]
    c) The substrings that start in X[0…i] and end in Y[j…Ny-1].

There can be total |X| prefixes of X and |Y| suffixes of Y. We can iterate over all these |X| * |Y| possibilities to calculate the answer. To efficiently calculate the value of each resulting substring, we will use the following:

  1. We will build an automaton on all the given strings in A using Aho-Corasick algorithm. For each node u, we define sum[u] as the sum of the values of all strings whose nodes are ancestors of u in the suffix link tree.
  2. Denote prefixXSum[i] as the sum of the values of all the occurrences of the strings in S in the prefix X[0…i] and prefixXNode[i] as the automaton node corresponding to the prefix X[0…i]. We can calculate prefixXSum and prefixXNode by iterating over A and moving on the automaton character by character.
  3. Similar to step-2, we build suffixYSum as the sum of values of all occurrences of strings from S in Y that end in Y[j…|Y|].
  4. To calculate the values of the substrings that lie in X[0…i] + Y[j…|Y|], we will start at prefixXNode[i] and move character by character in the automaton of the substring Y[j…j+24] and add the sum[node] of each node we visit. This way, we will get the sum of values of all the substrings that start in X[0…i] and end in Y[j…j+24]. Node that the length of strings in A is <= 26. So all the strings that start in X and end in Y will be covered.

Implementation

#include <bits/stdc++.h>

using namespace std;

template <int ALPHABET_SIZE, unsigned char MINIMAL_CHAR>
struct AhoCorasick
{
    static constexpr int NON_EXISTENT_NODE_ID = -1;
    static constexpr int FAKE_NODE_ID = 0;
    static constexpr int ROOT_ID = 1;

    int currentNode;

    vector<array<int, ALPHABET_SIZE>> edges;
    vector<int> suffixLink;

    vector<long long> sum;

    explicit AhoCorasick(const vector<pair<string, int>> &a) : currentNode(ROOT_ID), edges(2),
                                                               suffixLink(2, FAKE_NODE_ID), sum(2, 0)
    {
        edges[FAKE_NODE_ID].fill(ROOT_ID);
        edges[ROOT_ID].fill(NON_EXISTENT_NODE_ID);

        for (const auto &p : a)
        {
            int node = ROOT_ID;
            for (unsigned char c : p.first)
            {
                c -= MINIMAL_CHAR;
                if (edges[node][c] == -1)
                {
                    edges[node][c] = edges.size();
                    edges.emplace_back();
                    edges.back().fill(NON_EXISTENT_NODE_ID);
                    suffixLink.push_back(NON_EXISTENT_NODE_ID);
                    sum.push_back(0);
                }
                node = edges[node][c];
            }
            sum[node] += p.second;
        }

        queue<int> q;
        q.push(ROOT_ID);
        while (!q.empty())
        {
            int node = q.front();
            if (suffixLink[node] != NON_EXISTENT_NODE_ID)
            {
                sum[node] += sum[suffixLink[node]];
            }
            q.pop();
            for (int i = 0; i < ALPHABET_SIZE; i++)
            {
                int child = edges[node][i];
                if (child == NON_EXISTENT_NODE_ID)
                {
                    edges[node][i] = edges[suffixLink[node]][i];
                }
                else
                {
                    suffixLink[child] = edges[suffixLink[node]][i];
                    q.push(child);
                }
            }
        }
    }

    void setNode(int node)
    {
        currentNode = node;
    }

    void resetNode()
    {
        setNode(ROOT_ID);
    }

    long long getCurrentNodeSum()
    {
        return sum[currentNode];
    }

    void move(unsigned char c)
    {
        c -= MINIMAL_CHAR;
        currentNode = edges[currentNode][c];
    }
};

void solve(string X, string Y, vector<pair<string, int>> s)
{

    // Initializing AhoCorasick
    AhoCorasick<26, 'a'> ahoCorasick(s);

    // Initializing prefixXSum and prefixXNode
    vector<long long> prefixXSum(X.size());
    vector<int> prefixXNode(X.size());

    // Iterating over X
    for (int i = 0; i < X.size(); i++)
    {

        // Moving character by character in automaton
        ahoCorasick.move(X[i]);

        // Calculating prefixXSum
        prefixXSum[i] = ahoCorasick.getCurrentNodeSum();
        if (i != 0)
        {
            prefixXSum[i] += prefixXSum[i - 1];
        }

        // Updating prefixXNode
        prefixXNode[i] = ahoCorasick.currentNode;
    }

    ahoCorasick.resetNode();

    // Initializing suffixBSum
    vector<long long> suffixBSum(Y.size());
    for (int i = 0; i < Y.size(); i++)
    {
        // Moving character by character on automaton
        ahoCorasick.move(Y[i]);
        suffixBSum[i] = ahoCorasick.getCurrentNodeSum();
    }
    for (int i = (int)Y.size() - 2; i >= 0; i--)
    {
        suffixBSum[i] += suffixBSum[i + 1];
    }

    long long ans = 0;

    // Iterating over all |X| * |Y| possible strings
    for (int i = 0; i < X.size(); i++)
    {
        for (int j = 0; j < Y.size(); j++)
        {

            // Value of all substrings that lie in X
            long long cur = prefixXSum[i];

            ahoCorasick.setNode(prefixXNode[i]);

            // Calculating value of all strings that start in X and end in Y
            for (int k = j; k <= j + 24 && k < Y.size(); k++)
            {
                ahoCorasick.move(Y[k]);
                cur += ahoCorasick.getCurrentNodeSum();
            }

            // Calculating value of all substrings that lie in Y
            if (j + 25 < Y.size())
            {
                cur += suffixBSum[j + 25];
            }
           
            // Updating the final answer
            ans = max(ans, cur);
        }
    }
    cout << ans << endl;
}

// Driver Code
int main()
{
    int N = 3;

    // Given Ninja's strings
    string A[3] = {"ab", "a", "b"};

    // Given Prices of strings of Ninja
    int B[3] = {2, 3, 1};

    // Given strings whose substrings are used to generate string S
    string X = "bab";
    string Y = "ab";

    // Combining string and value
    vector<pair<string, int>>s(N);
    for (int i=0; i<N; i++){
        s[i].first = A[i];
        s[i].second = B[i];
    }
    solve(X, Y, s);
    return 0;
}

Output:

13

Time Complexity

Building an automation will take O(sum( |A[i]| )). Calculating prefix and suffix arrays will take O(|X| + |Y|) time and finally calculating the value for each prefix and suffix combination will take O(|X| * |Y| * max(|A[i]|)) time. Therefore, the total time complexity of the above approach is O(|X| * |Y| * max(|A[i]|) + sum(|A[i]|)).

Space Complexity

Creating the prefix and suffix arrays will take O(|X| + |Y|) space and aho-corasick algorithm will take O(sum(|A[i]|)) space. Therefore, the total space complexity is O(|X| + |Y| + sum(|A[i]|)).

Also check out - Substr C++

FAQs

  1. What is the Aho-Corasick algorithm?
    Aho-corasick algorithm is a string algorithm that uses a trie-like data structure with some additional links. It then constructs a finite automaton to match multiple patterns against a large sequence of text efficiently.
  2. What is a trie?
    A trie is a search tree used to store associative data structures. It is also referred to as a Radix tree or Prefix tree. It is used to locate specific keys from within a large data set.
  3. What is a prefix sum array?
    A prefix sum array is a data structure whare prefixSum[i] stores the sum of some array till index i. It can be used to answer queries such as sum in a given range, number of elements in a range in constant time.           

Key takeaways

This article discussed the problem “Ninja and his Strings”, the solution approach to the problem, its C++ implementation, and its time and space complexity. If you want to solve more problems for practice, you can visit Coding Ninjas Studio.

If you think that this blog helped you, then share it with your friends!. Refer to the DSA C++ course for more information.

Check out this article - C++ String Concatenation

Until then, All the best for your future endeavors, and Keep Coding.

Live masterclass