Code360 powered by Coding Ninjas X Naukri.com. Code360 powered by Coding Ninjas X Naukri.com
Table of contents
1.
Introduction
2.
What is a Suffix Array?
2.1.
What is a suffix of a string?
3.
Why Suffix array?
4.
Construction of Suffix Array
5.
Pseudo-Code to construct the suffix array
6.
C++ Implementation
6.1.
Time Complexity
6.2.
Space Complexity 
7.
Applications of Suffix Array
8.
Frequently Asked Questions
9.
Key Takeaways
Last Updated: Mar 27, 2024

Suffix Array Construction (Brute Force N ^ 2 LogN)

Author Yukti Kumari
1 upvote
Roadmap to SDE career at Amazon
Speaker
Anubhav Sinha
SDE-2 @
25 Jun, 2024 @ 01:30 PM

Introduction

Have you come across any problem related to string matching? Most Probably Yes! Because problems dealing with strings and substrings are widely asked in tech interviews, and most importantly, strings have many real-life applications.

In this article, we will explore a very interesting concept suffix array, why they are important, their applications, along with examples and construction of Suffix arrays.

Suffix trees can also be used for pattern searching. If you are not familiar with this, you must check out this amazing blog, Pattern Searching using Trie to appreciate and understand the motivation behind learning suffix arrays. 

What is a Suffix Array?

As the name suggests, it is an array of integers representing the starting indexes of all the suffixes of a given string. The order is decided after sorting all the suffixes in ascending order alphabetically. 

What is a suffix of a string?

For a given string s having length n, the i’th suffix of s is the substring s[i,...n-1].

So, in total, there can be n suffixes of any string of length n.

An example will make it easier to understand.

Example 

Suppose we are given a string s = “Coding Ninjas Studio”. All the suffixes and the corresponding suffix array of this string is shown in the picture below:

Get the tech career you deserve, faster!
Connect with our expert counsellors to understand how to hack your way to success
User rating 4.7/5
1:1 doubt support
95% placement record
Akash Pal
Senior Software Engineer
326% Hike After Job Bootcamp
Himanshu Gusain
Programmer Analyst
32 LPA After Job Bootcamp
After Job
Bootcamp

Why Suffix array?

When learning any new data structure, one should be curious why another data structure? You must have thought about this while deciding to learn suffix arrays, right? 

Suffix arrays are a very useful Data Structure and extremely easy to implement. Moreover, it allows for efficient string matching. Some of the applications of suffix arrays are in data compression algorithms, bioinformatics and full-text indexes. From the point of view of problem-solving, they can be used in any area dealing with string and pattern finding problems.

It is quite related to the suffix tree data structure. 

Now that you know what a suffix array is and why is it important, it's time to dive into constructing a suffix array.

Construction of Suffix Array

We will use the most naive approach, and hence the implementation is quite intuitive and simple.

Steps:

  • Store all the suffixes of the given string in an array along with their starting indexes.
  • Sort the array in ascending order.
  • Finally, create an array of integers and simply store the indexes of the sorted suffixes as obtained above, which is the required suffix array.

Note: After reading the Algorithm part, it is recommended to try writing the code on your own before reading the solution code.

Pseudo-Code to construct the suffix array

This program constructs the suffix array of a given string.
In the main function
Input the string s
Create a vector of pairs and sore the suffixes of string s along with its index
sort the suffixes array
Print the suffix array 


Let’s see the code implementation along with the analysis of time and space complexity in the next section.

C++ Implementation

/*
C++ Program to construct the suffix array of a given string
*/
#include <bits/stdc++.h>
using namespace std;


int main()
{
    string s = "Coding Ninjas Studio";
    int len = s.length();
    vector<pair<string, int>> suffixes;
    for (int i = 0; i < len; i++)
    {
        suffixes.push_back({s.substr(i), i});
    }


    sort(suffixes.begin(), suffixes.end());
    cout << "The suffix array of the string " << s << " is:" << endl;
    for (int i = 0; i < len; i++)
    {
        cout << suffixes[i].second << " ";
    }
    cout << endl;


    cout << "The sorted suffixes of the string are:\n";
    for (int i = 0; i < len; i++)
    {
        cout << suffixes[i].second << "-->" << suffixes[i].first << endl;
    }
}

Output

The suffix array of the string Coding Ninjas Studio is:
0 2 7 3 8 9 1 4 5 6
The sorted suffixes of the string are:
0-->Coding Ninjas Studio
2-->destudio
7-->dio
3-->estudio
8-->io
9-->o
1-->odestudio
4-->studio
5-->tudio
6-->udio

Time Complexity

O(N2logN), where N = length of the given string.

There are N suffixes in total, and sorting them takes O(NlogN) time, but as we use comparison-based sorting(quicksort or mergesort) algorithms, comparing two strings takes O(N) time in the worst case. Hence, the total time complexity turns out to be O(N*NlogN) = O(N2logN).

Space Complexity 

O(N), where N = length of the given string. Since, we use an additional space of size N to store all the suffixes of the given string.

Also check out - Substr C++ and  Euclid GCD Algorithm

Applications of Suffix Array

  • Searching a string x in another string s can be done easily using a suffix array. Since the suffix array contains the order of the suffixes of the string s, we can do a binary search over the suffix array. Thus, the worst-case time complexity of the search will be O(|x|log(|s|)).
     
  • It can be used for comparing two substrings of a string.
     
  • It can be used to find the longest common prefix of two substrings.

Frequently Asked Questions

  1. What is the disadvantage of suffix trees over suffix array search?
    The disadvantage of suffix trees over suffix arrays is that they generally require more space to store all the internal pointers in the tree.
     
  2. What are the pros and cons of suffix array over suffix trees?
    The suffix tree is lighter and faster than the trie and is used to index DNA or optimize large web search engines. The suffix array is slower in some pattern searches than the suffix tree but uses less space and is more widely used than the Suffix tree.
     
  3. How to construct a suffix array from a suffix tree?
    The suffix array can be constructed by doing a depth-first traversal of the suffix tree.

Key Takeaways

In this article, we learnt about suffix array, its introduction and saw its applications in problem-solving. We also saw the naive algorithm to construct the suffix array for any given string and the pseudocode and C++ implementation.

Recommended Problem - K Closest Points To Origin

There are many other efficient algorithms to construct suffix arrays. We will cover all those in our future blogs. So, stay tuned.


Are you planning to ace the interviews of reputed product-based companies like Amazon, Google, Microsoft, and more? Attempt our Online Mock Test Series on Coding Ninjas Studio now!
 

Previous article
Smallest Difference Triplet from Three arrays
Next article
Rearrange Positive and Negative Numbers With Constant Extra Space | Part 1
Live masterclass