String Frequency - Naukri Code 360

Problem statement

Rakesh and Mukesh are best friends. Rakesh has done a competitive programming course from Coding Ninjas. Mukesh has challenged Rakesh with a problem statement and has given 1 hour of time to come up with the solution but somehow Rakesh is not able to solve the problem. So he decided to ask the problem from the teaching assistant in the course. You are one of the teaching assistants and have received Rakesh's query. So please help Rakesh so that he can prove his friend Mukesh. So the problem statement is explained below.

You are given a string 's'. You have to answer Q queries. The i-th query consists of integer ‘vali’ and string ‘xi’. To answer this query we have to find the minimum length of such a string y such that y is a substring of s and 'xi' and has at least valid occurrences as a substring in y. A substring is a contiguous series of characters within a string. The first line contains the string ‘s’ followed by Q queries in each query we are given with integer and string. For each and every query output the answer for it in a new separate line.If a string 'xi' occurs in s

less than 'xi' times the return the answer as -1.

It is for sure guaranteed that for any two queries the strings 'xi' from these queries are different.

Example 1

Input

aaaaa

3 a

3 aa

2 aaa

3 aaaa

1 aaaaa

Output

-1

Explanation

For the first query, we have to find a substring in our main string 'aaaaa', which should be of a minimum length such that the string passed in the query' a' has at least 3 occurrences in that as a substring. So here, the answer can be 'aaa' because for this, all the constraints are satisfied, and this is of minimum length. In 'aaa' we have 3 occurrences of 'a' as a substring also. So the length of 'aaa' for this query is 3.

For the second query, we have to find a substring in our main string 'aaaaa', which should be of a minimum length such that the string passed in the query' aa' has at least 3 occurrences in that as a substring. So here, the answer can be 'aaaa' because, for this, all the constraints are satisfied, and this is of minimum length. In 'aaaa' we have 3 occurrences of 'aa' as a substring also. So the length of 'aaaa' for this query is 4.

For the third query, we have to find a substring in our main string 'aaaaa', which should be of a minimum length such that the string passed in the query' aaa' has at least 2 occurrences in that as a substring. So here, the answer can be 'aaaa' because, for this, all the constraints are satisfied, and this is of minimum length. In 'aaaa' we have 2 occurrences of 'aaa' as a substring also. So the length of 'aaaa' for this query is 4.

For the fourth query, we have to find a substring in our main string 'aaaaa', which should be of a minimum length such that the string passed in the query' aaaa' has at least 3 occurrences in that as a substring. So here the answer is -1 because there is no such substring in our main string that has the occurrence of 3 times as a substring 'aaaa'.

For the last query, we have to mind a substring in our main string 'aaaaa' which should be of a minimum length such that the string passed in the query' aaaaa' has at least 1 occurrence in that as a substring. So here the answer can be 'aaaaa' because for this, all the constraints are satisfied, and this is of minimum length. In 'aaaaa' we have 1 occurrence of 'aaaaa' as a substring also. So the length of 'aaaaa' for this query is 5.

DFS Approach

For string s, we build a suffix automaton(sample), and for each node in sample, we can also know its matching position in string s. For each query, we find the node x in sample which represent string m. By dfs the fail tree of sample, we can get a sequence of the positions of each node in the fail tree of node x. We sort this sequence and traversal it to get the answer. You could use an O(n) linear sorting algorithm for an array whose elements are not very large. It could get rid of the "log" time complexity to construct a more fast solution. Below is the code for the same.

Code:


import java.io.OutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PrintWriter;
import java.io.PrintStream;
import java.util.Arrays;
import java.io.BufferedWriter;
import java.util.InputMismatchException;
import java.io.IOException;
import java.io.File;
import java.util.ArrayList;
import java.io.Writer;
import java.io.OutputStreamWriter;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.InputStream;

public class Main 
{
   public static void main(String[] args) {
       InputStream inputStream = System.in;
       OutputStream outputStream = System.out;
       InputReader in = new InputReader(inputStream);
       OutputWriter out = new OutputWriter(outputStream);
       TaskD solver = new TaskD();
       solver.solve(1, in, out);
       out.close();
   }

   static class TaskD {
       public static int[] k;
       public static String[] tt;
       public AhoCorasick ac;
       public int[] pword;

       int get(int x) {
           if (pword[x] != -1) return pword[x];
           return pword[x] = (ac.nodes[x].leaf ? x : get(ac.suffLink(x)));
       }

       public void solve(int testNumber, InputReader in, OutputWriter out) {
           char[] s = in.next().toCharArray();
           int n = in.nextInt();
           k = new int[n];
           tt = new String[n];
           for (int i = 0; i < n; i++) {
               k[i] = in.nextInt();
               tt[i] = in.next();
           }
           int maxn = 101010;
           ac = new AhoCorasick(maxn);
           int[] end = new int[n];
           for (int i = 0; i < n; i++) {
               end[i] = ac.addString(tt[i]);
           }
           int[] idd = new int[ac.nodeCount];
           for (int i = 0; i < n; i++) {
               idd[end[i]] = i;
           }
           pword = new int[ac.nodeCount];
           Arrays.fill(pword, -1);
           pword[0] = 0;
           int[] count = new int[n];
           int cnode = 0;
           int p = 0;
           for (char c : s) {
               cnode = ac.transition(cnode, c);
               int w = get(cnode);
               while (w != 0) {
                   count[idd[w]]++;
                   w = get(ac.suffLink(w));
               }
               p++;
           }
           Debug.print(count);
           int[][] pos = new int[n][];
           for (int i = 0; i < n; i++) pos[i] = new int[count[i]];
           int[] pidx = new int[n];
           p = 0;
           cnode = 0;
           for (char c : s) {
               cnode = ac.transition(cnode, c);
               int w = get(cnode);
               while (w != 0) {
                   pos[idd[w]][pidx[idd[w]]++] = p;
                   w = get(ac.suffLink(w));
               }
               p++;
           }

           for (int i = 0; i < n; i++) {
               int r = k[i];
               if (count[i] < r) {
                   out.println(-1);
                   continue;
               }
               int ww = 10000001;
               for (int j = 0; j + r - 1< count[i]; j++) {
                   ww = Math.min(ww, pos[i][j + r - 1] - pos[i][j]);
               }
               out.println(ww + tt[i].length());
           }
       }

   }

   static class Debug {
       public static boolean DEBUG;

       static {
           try {
               DEBUG = System.getProperty("user.dir").contains("Dropbox");
           } catch (Exception e) {
               DEBUG = false;
           }
       }

       private static ArrayList<String> getParams() {
           StackTraceElement[] t = Thread.currentThread().getStackTrace();
           StackTraceElement up = t[3];

           ArrayList<String> ret = new ArrayList<>();
           String qqq = up.toString();
           ret.add("." + up.getMethodName() + qqq.substring(qqq.indexOf("("), qqq.length()));
           try {
               BufferedReader br = new BufferedReader(new FileReader(
                       new File("src/" + up.getClassName().replaceAll("\\.", "/") + ".java")));
               int g = up.getLineNumber();
               while (--g > 0) br.readLine();
               String q = br.readLine();
               String[] ss = q.split("Debug\\.print\\(");
               String[] qq = ss[1].substring(0, ss[1].lastIndexOf(")")).split(",");
               for (int i = 0; i < qq.length; i++) {
                   ret.add(qq[i].trim());
               }
           } catch (Exception e) {
           }
           for (int i = 0; i < 100; i++) ret.add("???");
           return ret;
       }

       public static void print(int[] x) {
           if (!DEBUG) return;
           ArrayList<String> s = getParams();
           System.out.println(s.get(0) + ": ");
           for (int i = 0; i < x.length; i++) {
               System.out.println(s.get(1) + "[" + i + "] = " + x[i]);
           }
           System.out.println("***");
       }

   }

   static class OutputWriter {
       private final PrintWriter writer;

       public OutputWriter(OutputStream outputStream) {
           writer = new PrintWriter(new BufferedWriter(new OutputStreamWriter(outputStream)));
       }

       public OutputWriter(Writer writer) {
           this.writer = new PrintWriter(writer);
       }

       public void close() {
           writer.close();
       }

       public void println(int i) {
           writer.println(i);
       }

   }

   static class AhoCorasick {
       static final int ALPHABET_SIZE = 26;
       public AhoCorasick.Node[] nodes;
       public int nodeCount;

       public AhoCorasick(int maxNodes) {
           nodes = new AhoCorasick.Node[maxNodes];
           // create root
           nodes[0] = new AhoCorasick.Node();
           nodes[0].suffLink = 0;
           nodes[0].parent = -1;
           nodes[0].depth = 0;
           nodeCount = 1;
       }

       public int addString(String s) {
           int cur = 0;
           for (char ch : s.toCharArray()) {
               int c = ch - 'a';
               if (nodes[cur].children[c] == -1) {
                   nodes[nodeCount] = new AhoCorasick.Node();
                   nodes[nodeCount].parent = cur;
                   nodes[nodeCount].charFromParent = ch;
                   nodes[nodeCount].depth = nodes[cur].depth + 1;
                   nodes[cur].children[c] = nodeCount++;
               }
               cur = nodes[cur].children[c];
           }
           nodes[cur].leaf = true;
           return cur;
       }

       public int suffLink(int nodeIndex) {
           AhoCorasick.Node node = nodes[nodeIndex];
           if (node.suffLink == -1)
               node.suffLink = node.parent == 0 ? 0 : transition(suffLink(node.parent), node.charFromParent);
           return node.suffLink;
       }

       public int transition(int nodeIndex, char ch) {
           int c = ch - 'a';
           AhoCorasick.Node node = nodes[nodeIndex];
           if (node.transitions[c] == -1)
               node.transitions[c] = node.children[c] != -1 ? node.children[c] : (nodeIndex == 0 ? 0 : transition(suffLink(nodeIndex), ch));
           return node.transitions[c];
       }

       public static class Node {
           public int parent;
           public char charFromParent;
           public int suffLink = -1;
           public int depth;
           public int[] children = new int[ALPHABET_SIZE];
           public int[] transitions = new int[ALPHABET_SIZE];
           public boolean leaf;

           {
               Arrays.fill(children, -1);
               Arrays.fill(transitions, -1);
           }
       }

   }

   static class InputReader {
       private InputStream stream;
       private byte[] buf = new byte[1 << 16];
       private int curChar;
       private int numChars;

       public InputReader(InputStream stream) {
           this.stream = stream;
       }

       public int read() {
           if (this.numChars == -1) {
               throw new InputMismatchException();
           } else {
               if (this.curChar >= this.numChars) {
                   this.curChar = 0;

                   try {
                       this.numChars = this.stream.read(this.buf);
                   } catch (IOException var2) {
                       throw new InputMismatchException();
                   }

                   if (this.numChars <= 0) {
                       return -1;
                   }
               }

               return this.buf[this.curChar++];
           }
       }

       public int nextInt() {
           int c;
           for (c = this.read(); isSpaceChar(c); c = this.read()) {
               ;
           }

           byte sgn = 1;
           if (c == 45) {
               sgn = -1;
               c = this.read();
           }

           int res = 0;

           while (c >= 48 && c <= 57) {
               res *= 10;
               res += c - 48;
               c = this.read();
               if (isSpaceChar(c)) {
                   return res * sgn;
               }
           }

           throw new InputMismatchException();
       }

       public String next() {
           int c;
           while (isSpaceChar(c = this.read())) {
               ;
           }

           StringBuilder result = new StringBuilder();
           result.appendCodePoint(c);

           while (!isSpaceChar(c = this.read())) {
               result.appendCodePoint(c);
           }

           return result.toString();
       }

       public static boolean isSpaceChar(int c) {
           return c == 32 || c == 10 || c == 13 || c == 9 || c == -1;
       }

   }
}

You can also try this code with Online Java Compiler

Run Code

Input
aaaaa
5
3 a
3 aa
2 aaa
3 aaaa
1 aaaaa

Output
3
4
4
-1
5

Time Complexity

O(N·sqrt(N)·log(N))

The time complexity to solve this problem Frequency of String is O(N·sqrt(N)·log(N)) where ‘N’ is the size of given string. IN the worst case, the complexity of this solution is O(N·sqrt(N)·log(N)), Since we are using dfs which will explore all the paths and in each path we are iterating on the string and we have to perform the same task for all the queries. You could use an O(n) linear sorting algorithm for an array whose elements are not very large. It could get rid of the "log" time complexity to construct a more fast solution.

Space Complexity

O(sum of lengths of words)

The space complexity to solve this problem Frequency of String is O(sum of lengths of words). Since we are creating an extra array to store the occurrence of these characters.

Also check out - Substr C++

Burrows-Wheeler Data Transform Approach

To solve this problem, "Frequency of String," you should have a great grasp of the string matching algorithms. So we can solve this problem using the suffix arrays and Burrows-Wheeler Data Transform Algorithm. In a nutshell, the challenge is to match billions of tiny strings (about 50-100 characters each) to a 3 billion-character-long text. The 3 billion character string (also known as the reference) is known ahead of time and is set in stone (at least for a species). As a result of an experiment, shorter strings (also known as reads) are formed. We need an algorithm that allows us to search a text as many times as possible as quickly as feasible. We are allowed to preprocess the text once if it will help us in achieving this goal. One such algorithm is the FM index. It necessitates a one-time preprocessing of the reference to create an index, after which the query time is proportional to the query length (instead of the reference).

So coming to the point, the Burrows-Wheeler transform is a reversible string transformation that has been widely used in data compression. you can refer here to get more insights about this algorithm. First, you should be aware of how to do this with the suffix arrays.For that, we can use binary search to get the first and last entry. After that, we can get all the occurrences of a word in logarithmic time per word.This can be sped up by using BWT, which takes O(|w|) time for each word w (so the total is O(sum of word lengths) rather than O((number of words) * log(n)). The core notion is that you can keep an interval on the suffix array that contains a pattern suffix, and you can lower the size of this interval in constant time as you add more letters.

Code:

// Java implementation for  Frequency of String using Bwt
// Importing important libraries
import java.util.*;
import java.lang.*;
import java.io.*;


// Java implementation for  Frequency of String using Bwt
public class Main {
    public static void main(String[] args) {


        // Initialization
        InputStream inputStream = System.in;
        OutputStream outputStream = System.out;
        InputReader in = new InputReader(inputStream);
        OutputWriter out = new OutputWriter(outputStream);
        TaskD solver = new TaskD();
        solver.solve(1, in, out);
        out.close();
    }
 
    static class TaskD {
        public static final int K = 26;
        
        // Function for  implementation for  Frequency of String using Bwt
        public void solve(int testNumber, InputReader in, OutputWriter out) {
            String s = in.next() + "$";
            int n = s.length();
 
            int[] suffixArray = SuffixArray.suffixArray(s);
            int[][] occ = new int[n + 1][K];
            int[] totalCount = new int[K + 1];
            for (int i = 0; i < n; i++) {
                int t = s.charAt(suffixArray [i] == 0 ? n - 1 : suffixArray [i] - 1) - 'a';
                if (t >= 0) {
                    occ[i][t]++;
                    totalCount[t + 1]++;
                }
                System.arraycopy(occ[i], 0, occ[i + 1], 0, K);
            }
            for (int i = 1; i < totalCount.length; i++) {
                totalCount[i] += totalCount[i - 1];
            }
 
            int q = in.nextInt();
            while (q-- > 0) {
                int k = in.nextInt();
                char[] dumyArray = in.next().toCharArray();
                int begin = 0, end = n - 1;
                for (int j = dumyArray.length - 1; end - begin + 1 >= k && j >= 0; j--) {
                    int let = dumyArray[j] - 'a';
                    int start = totalCount[let] + (begin == 0 ? 0 : occ[begin - 1][let]) + 1;
                    int nend = totalCount[let] + occ[end][let];
                    begin = start;
                    end = nend;
                }
                if (end - begin + 1 < k) {
                    out.println(-1);
                    continue;
                }
                int[] preComputation = new int[end - begin + 1];
                for (int j = begin; j <= end; j++) preComputation[j - begin] = suffixArray [j];
                Arrays.sort(preComputation);
                int maxValue = Integer.MAX_VALUE;
                for (int j = 0; j + k - 1 < preComputation.length; j++) {
                    maxValue  = Math.min(maxValue , preComputation[j + k - 1] - preComputation[j]);
                }
                out.println(maxValue  + dumyArray.length);
            }
 
        }
 
    }


     //  suffix array for  Frequency of String
    static class SuffixArray {
        public static int[] suffixArray(CharSequence Seq) {
            int seq_size = Seq.length();
 
            // Using lambda function stable sort of characters
            int[] suffixArray  = IntStream.range(0, seq_size).mapToObj(i -> seq_size - 1 - i).
                    sorted((a, b) -> Character.compare(Seq.charAt(a), Seq.charAt(b))).
                    mapToInt(Integer::intValue).toArray();
 
            int[] newclas = Seq.chars().toArray();


            // suffixArray [i] - 
            // suffix on i'th position after sorting by the first 'length' chars
            // newclas[i] - 
            // equivalence class of the i'th suffix after sorting by first length characters
 
            for (int length = 1; length < seq_size; length *= 2) {
                int[] cloneArray = newclas .clone();
                for (int i = 0; i < seq_size; i++) {


                    // condition suffixArray [i - 1] + length < seq_size simulates 0-symbol 
                    // at the end of the string
                    // a separate class is created for each suffix followed by simulated 0-symbol
                    newclas [suffixArray [i]] = i > 0 && cloneArray[suffixArray [i - 1]] == c[suffixArray [i]] 
                    && suffixArray [i - 1] + length < seq_size && cloneArray[suffixArray [i - 1] + len / 2] == 
                    cloneArray[suffixArray [i] + len / 2] ? newclas [suffixArray [i - 1]] : i;
                }


                // Suffixes here are sorted before head by first length characters
                // Now we have to sort the suffixes by first length * 2 characters
                int[] cnt = IntStream.range(0, seq_size).toArray();
                int[] suffixCloneArray = suffixArray .clone();
                for (int i = 0; i < seq_size; i++) {
                    int s1 = suffixCloneArray[i] - length;


                    // Sort only suffixes with a length greater than length; the rest have already
                    // been sorted.
                    if (s1 >= 0){
                        suffixArray [cnt[classes[s1]]++] = s1;
                    }
                }
            }
            return suffixArray ;
        }
 
    }
 
    static class InputReader {
        private InputStream stream;
        private byte[] buf = new byte[1 << 16];
        private int curChar;
        private int numChars;
 
        public InputReader(InputStream stream) {
            this.stream = stream;
        }
 
        public int read() {
            if (this.numChars == -1) {
                throw new InputMismatchException();
            } else {
                if (this.curChar >= this.numChars) {
                    this.curChar = 0;
 
                    try {
                        this.numChars = this.stream.read(this.buf);
                    } catch (IOException var2) {
                        throw new InputMismatchException();
                    }
 
                    if (this.numChars <= 0) {
                        return -1;
                    }
                }
 
                return this.buf[this.curChar++];
            }
        }
 
        public int nextInt() {
            int c;
            for (c = this.read(); isSpaceChar(c); c = this.read()) {
                ;
            }
 
            byte sgn = 1;
            if (c == 45) {
                sgn = -1;
                c = this.read();
            }
 
            int res = 0;
 
            while (c >= 48 && c <= 57) {
                res *= 10;
                res += c - 48;
                c = this.read();
                if (isSpaceChar(c)) {
                    return res * sgn;
                }
            }
 
            throw new InputMismatchException();
        }
 
        public String next() {
            int chara;
            while (isSpaceChar(chara = this.read())) {
                ;
            }
 
            StringBuilder result = new StringBuilder();
            result.appendCodePoint(chara);
 
            while (!isSpaceChar(chara = this.read())) {
                result.appendCodePoint(chara);
            }
 
            return result.toString();
        }
 
        public static boolean isSpaceChar(int chara) {
            return chara == 32 || chara == 10 || chara == 13 || chara == 9 || chara == -1;
        }
 
    }
 
    static class OutputWriter {
        private final PrintWriter writer;
 
        public OutputWriter(OutputStream outputStream) {
            writer = new PrintWriter(new BufferedWriter(new OutputStreamWriter(outputStream)));
        }
 
        public OutputWriter(Writer writer) {
            this.writer = new PrintWriter(writer);
        }
 
        public void close() {
            writer.close();
        }
 
        public void println(int i) {
            writer.println(i);
        }
 
    }
}

You can also try this code with Online Java Compiler

Run Code

Input

aaaaa
5
3 a
3 aa
2 aaa
3 aaaa
1 aaaaa

You can also try this code with Online Java Compiler

Run Code

Output

You can also try this code with Online Java Compiler

Run Code

Time Complexity

O(sum of lengths of words)

The time complexity to solve this problem Frequency of String is O(sum of lengths of words), Since we have used BWT for the optimization. Our approach is speeded up by using BWT, which takes O(|w|) time for each word w (so the total is O(sum of word lengths) rather than O((number of words) * log(n)). The core notion is that you can keep an interval on the suffix array that contains a pattern suffix, and you can lower the size of this interval in constant time as you add more letters.

Space Complexity

O(sum of lengths of words)

The space complexity to solve this problem Frequency of String is O(sum of lengths of words). Since we are creating an extra array to store the occurrence of these characters.

Check out this article - Converting NFA to DFA

FAQs

1. What is a Binary search?

Ans: Binary Search is a sorted array searching algorithm that divides the search period in half regularly. The goal behind binary search is to take use of the fact that the array is sorted to reduce the time complexity to O(Log n).

2. Can we solve this problem without using Burrows-Wheeler Data Transform Algorithm?

Ans: We can do it using only suffix arrays too. No need for optimization using Burrows-Wheeler Data Transform Algorithm. It's the only thing if we optimise then the solution becomes faster.

Key Takeaways

In this article, we have extensively discussed the solution of the problem "Frequency of String," in which we answered some queries according to the given constraints in the problem statement. Along with the detailed solution, the article discussed the time and space complexity of the solution.

Until then, All the best for your future endeavours, and Keep Coding. "We hope that this blog has helped you enhance your knowledge regarding this problem, and if you would like to learn more, check out our articles on Coding Ninjas Studio. Do upvote our blog to help other ninjas grow. Happy Coding!”