Code360 powered by Coding Ninjas X Naukri.com. Code360 powered by Coding Ninjas X Naukri.com
Table of contents
1.
Introduction
2.
Different approaches
2.1.
Naive Approach
2.2.
Optimized Approach
3.
Visualization of a Suffix Tree
4.
Applications of Suffix Array
5.
Frequently Asked Questions
5.1.
What is suffix array construction?
5.2.
What is the disadvantage of a suffix tree over a suffix array?
5.3.
How do you create a suffix array of strings?
5.4.
What is the suffix sum array?
6.
Conclusion
Last Updated: Mar 27, 2024
Hard

Suffix Array Optimal Construction

Author GAZAL ARORA
0 upvote
Roadmap to SDE career at Amazon
Speaker
Anubhav Sinha
SDE-2 @
25 Jun, 2024 @ 01:30 PM

Introduction

What is a Suffix Array?

Suffix Array: It is a sorted array of all suffixes of a given string. It is a type of data structure used in full-text indices and data compression algorithms, and generally, it is used in any area that deals with strings and string matching problems.

The suffix array contains integers representing the starting indexes of a given string's suffixes after the suffixes are sorted.

Different approaches

Two widely used approaches for constructing a Suffix Array are the Naive Approach and the Optimized Approach. We will be discussing the Optimized Approach in depth.

Naive Approach

It is also known as the O(n2log⁡n) approach. In this approach, we get all the suffixes, sort them using quicksort or mergesort, and simultaneously retain their original indices.

Code for obtaining a suffix array in Java:

import java.util.Arrays;
import java.util.ArrayList;
 
class suffixArray {
    public static void main(String[] args) throws Exception
    {
		ArrayList<Integer> suffixIndex  = new ArrayList<Integer>();
        String w = "CodingNinjas";
        String ar1[] = new String[w.length()];
        String ar2[] = new String[w.length()];
        int suffixArray[] = new int[w.length()];
		
		int i;
        for (i = 0; i < w.length(); i++) {
            ar1[i] = w.substring(i);
        }
        
        ar2 = ar1.clone();
        Arrays.sort(ar1);
        
        for (String s : ar1) {
            String p = s;
            int index = new suffixArray().index(ar2, p);
            suffixIndex.add(index);
        }
        
        for (i = 0; i < suffixArray.length; i++) {
            suffixArray[i] = suffixIndex.get(i);
        }
        
        System.out.println("The suffix array for CodingNinjas");
        for (var c : suffixArray) {
            System.out.print(c + " ");
        }
    }
     
    //A function that returns the index of item from array a[]
    int index(String a[], String item)
    {
        for (int i = 0; i < a.length; i++) {
            if (item == a[i])
                return i;
        }
        return -1;
    }
}

 

Output

The suffix array for CodingNinjas
0 6 10 2 5 3 7 9 4 8 1 11 

Optimized Approach

This approach is also known as O(nlog⁡n) because our aim in the approach is to reduce the time from O(n2log⁡n) to O(nlog⁡n).

  • In this approach, first, we sort all the suffixes according to the starting (first) character; after that, we sort them according to the first two characters, followed by the first four characters, and till the number of characters is less than 2n.
  • Once we have sorted all the suffixes according to the first 2i characters, we can then sort suffixes according to the first 2i+1 characters in O(n log n) time using a sorting algorithm of the type (n Log n) like Merge Sort.
     

First, by giving each suffix a rank using the first character's ASCII value, we will initially sort by the first two characters.

Index

Suffix

Rank

0

banana

1

1

anana

0

2

nana

13

3

ana

0

4

na

13

5

a

0

For each character, we will also store the rank of the next adjacent character (It is required to sort the suffixes according to the first two characters). If the character is the last, we store the next rank as -1.

Index

Suffix

Rank

Next Rank

0

banana

1

0

1

anana

0

13

2

nana

13

0

3

ana

0

13

4

na

13

0

5

a

0

-1

 

Then, all Suffixes will be sorted by rank and adjacent rank. The first digit, or MSD, is regarded as rank, and the second digit is considered as the adjacent rank.

Index

Suffix

Rank

Adjacent Rank

5

a

0

-1

1

anana

0

13

3

ana

0

13

0

banana

1

0

2

nana

13

0

4

na

13

0

Now sorting according to the first four characters. 

All of the suffixes will be assigned new ranks. We will consider the sorted suffixes while allocating new rankings. We'll start by giving the first Suffix the new rank of zero. When allocating ranks to the remaining suffixes, we will consider the rank pair of suffixes immediately before the current Suffix. Give a suffix the same rank if its previous rank pair matches the previous rank of the Suffix that came before it. If not, give the previous Suffix's rank plus one.

Index

Suffix

Rank

 

5

a

0

Assigning 0 to first

1

anana

1

(0,13) is different from previous

3

ana

1

(0, 13) is the same as previous

0

banana

2

(1, 0) is different from previous

2

nana

3

(13, 0) is different from previous

4

na

3

(13, 0) is the same as previous

For every suffix str[i], we also store the rank of the next suffix at str[i + 2]. If no next suffix exists at i + 2, we store the next rank as -1.

Index

Suffix

Rank

Next Rank

5

a

0

-1

1

anana

1

1

3

ana

1

0

0

banana

2

3

2

nana

3

3

4

na

3

-1

We sort all the suffixes according to the rank and the next rank. 

Index

Suffix

Rank

Next Rank

5

a

0

-1

1

anana

1

0

3

ana

1

1

0

banana

2

3

2

nana

3

-1

4

na

3

3

Code for Optimized Approach:

import java.util.*;
class suffixArray
{
    // Class to store information of a suffix
    public static class Suffix implements Comparable<Suffix>
    {
        int index;
        int rank;
        int next;
 
        public Suffix(int ind, int r, int nr)
        {
            index = ind;
            rank = r;
            next = nr;
        }
         
        // A comparison function used by sort()
        // to compare two suffixes.
        // Compares two pairs, returns 1
        // if first pair is smaller
        public int compareTo(Suffix s)
        {
            if (rank != s.rank) return Integer.compare(rank, s.rank);
            return Integer.compare(next, s.next);
        }
    }
     
	// This function takes a string 'txt' of size n as an argument
	//builds and returns the suffix array for the given string

    public static int[] suffixArray(String s)
    {
        int n = s.length();
        Suffix[] su = new Suffix[n];
         
        // Store suffixes and their indexes in
        // an array of classes. The class is needed
        // to sort the suffixes alphabetically and
        // maintain their old indexes while sorting
        for (int i = 0; i < n; i++)
        {
            su[i] = new Suffix(i, s.charAt(i) - '$', 0);
        }
        for (int i = 0; i < n; i++)
            su[i].next = (i + 1 < n ? su[i + 1].rank : -1);
 
        // Sort the suffixes using the comparison function
        // defined above.
        Arrays.sort(su);
 
        // Here, all suffixes are sorted 
        //according to their first 2 characters.
        // Let us now sort suffixes according to first 4
        // characters, then first 8 and so on..

 
        // characters, then first 8 and so on
        int[] ind = new int[n];
         
        for (int length = 4; length < 2 * n; length <<= 1)
        {
             
            // Assigning rank and index values to first suffix
            int rank = 0, prev = su[0].rank;
            su[0].rank = rank;
            ind[su[0].index] = 0;
            for (int i = 1; i < n; i++)
            {
                // If first rank and next ranks are same as
                // that of previous suffix in array,
                // assign the same new rank to this suffix
                if (su[i].rank == prev &&
                    su[i].next == su[i - 1].next)
                {
                    prev = su[i].rank;
                    su[i].rank = rank;
                }
                else
                {
                    // Otherwise increment rank and assign
                    prev = su[i].rank;
                    su[i].rank = ++rank;
                }
                ind[su[i].index] = i;
            }
             
            // Assign next rank to every suffix
            for (int i = 0; i < n; i++)
            {
                int nextP = su[i].index + length / 2;
                su[i].next = nextP < n ?
                   su[ind[nextP]].rank : -1;
            }
             
            // Sort the suffixes according
            // to first k characters
            Arrays.sort(su);
        }
 
        // Store indexes of all sorted
        // suffixes in the suffix array
        int[] suf = new int[n];
         
        for (int i = 0; i < n; i++)
            suf[i] = su[i].index;
 
        // Return the suffix array
        return suf;
    }   
     
    static void printArr(int arr[], int n)
    {
        for (int i = 0; i < n; i++)
            System.out.print(arr[i] + " ");
        System.out.println();
    }
     
    public static void main(String[] args)
    {
        String txt = "banana";
        int n = txt.length();
        int[] suff_arr = suffixArray(txt);
        System.out.println("The suffix array for banana: ");
        printArr(suff_arr, n);
    }
}

 

Output

The suffix array for banana: 5 3 1 0 4 2 
Get the tech career you deserve, faster!
Connect with our expert counsellors to understand how to hack your way to success
User rating 4.7/5
1:1 doubt support
95% placement record
Akash Pal
Senior Software Engineer
326% Hike After Job Bootcamp
Himanshu Gusain
Programmer Analyst
32 LPA After Job Bootcamp
After Job
Bootcamp

Visualization of a Suffix Tree

Visualization of a Suffix Tree

Applications of Suffix Array

  1. Finding the smallest cyclic shift
  2. Finding a substring in a string
  3. Comparing two substrings of a string
  4. The longest common prefix of two substrings with additional memory
  5. The longest common prefix of two substrings without additional memory
  6. Number of different substrings

Also see, Euclid GCD Algorithm

Frequently Asked Questions

What is suffix array construction?

The construction of a suffix array simply involves sorting the set of all suffixes.

What is the disadvantage of a suffix tree over a suffix array?

Suffix trees are less advantageous than suffix arrays since they often take up more space to store all of the internal references in the tree. This additional space can be too expensive if we index a big text (or group of texts).

How do you create a suffix array of strings?

By performing a DFS traverse of the suffix tree, a suffix array can be created from the suffix tree. The suffix array and suffix tree can be created in linear time from one another. Making an array of all suffixes and sorting it is an easy way to create a suffix array.

What is the suffix sum array?

A precomputation technique called suffix sum computes the sum of all the elements in the original array starting at index i and continuing until the end of the array.

Conclusion

This article taught us about suffix arrays and how to create a suffix array in Computer Programming. We also learned about the advantages of using suffix arrays.

Start Programming now!

Check out the following problems - 

 

You can use Coding Ninjas Studio to practice important DSA questions asked in different interviews. It will help you master effective coding techniques and get interview experiences from people working in big companies.

Do upvote this blog if you find it helpful and engaging!

Happy Learning!

Previous article
Single Number
Next article
Intersection of Two Arrays
Live masterclass