Table of contents
1.
Introduction
2.
Problem Statement 
2.1.
Input
2.2.
Output
2.3.
Explanation
3.
Solution Approach
3.1.
Algorithm
3.2.
Dry Run
3.3.
Implementation in C++
3.3.1.
Output
3.4.
Implementation in Java
3.4.1.
Output
3.5.
Time Complexity
3.6.
Space Complexity
4.
Frequently Asked Questions
4.1.
What is the role of Disjoint Sets in data structures?
4.2.
What is the time complexity of merging two sets in a Disjoint Set Union?
4.3.
In the worst case, What is the time complexity of finding the parent of some node in a Disjoint Set Union?
4.4.
What are the applications of Disjoint Set Union Data Structure?
4.5.
What differentiates a string from an array?
5.
Conclusion
Last Updated: Mar 27, 2024
Medium

Count of Distinct Groups of Strings Formed after Performing an Equivalent Operation

Author Sahil Setia
0 upvote
Career growth poll
Do you think IIT Guwahati certified course can help you in your career?

Introduction

As the title of this article suggests, we will discuss the problem based on the famous data structure Strings. Strings are one of the most popular and essential Data Structures and the most basic ones. Strings are a sequence of characters like “xyz” which is a string containing ‘x’, ‘y’, and ‘z’, which is the sequence of characters.

Title image

This blog will discuss the problem of counting the distinct group of strings formed after applying the equivalent operation. Before diving into the solution, let’s briefly examine the problem statement.

Problem Statement 

Given an array of strings where each string contains lowercase English letters. The task is to find the number of distinct groups of strings formed after performing an equivalent operation.

Note: If the same character is present in both strings, then the two strings are said to be equivalent, and if another string matches one of the strings in a group of equivalent strings, then that string is also equivalent to that group.

Input

Total number of strings (N) = 5

The array of strings = [“abc”, “cd”, “d”, “e”, “efg”]

Output

2

Explanation

The two distinct groups of strings formed after performing the equivalent operation are-

  1. The first group contains strings [“abc”, “cd”, “d”]. Strings “abc” and “cd” have a common character ‘c’, whereas strings “cd” and “d” have a common character ‘cd’. The first and second strings have a common character, and the second and third strings have a common character. Hence, they all lie in a single group.
     
  2. The second group contains strings [“e”, “efg”]. Strings “e” and “efg” have a common character ‘e’. Hence, they all lie in a single group.

Solution Approach

In this approach, we will use the Disjoint Set Union approach to solve the problem. Each character in the string is mapped to the first character in the string. Simply put, all the string characters are merged with the string's first character, making it the corresponding parent of all other characters. The reason for doing so is simple. Suppose some character is mapped to the first character of its string so that the character’s parent is the first character of its corresponding string. 

Now, let’s assume that that character occurs in some other string. But that string has some other character as its parent. Upon calling the ‘merge()’ function, these two strings will be in a single group due to having some common characteristics. This way, the ultimate task is to find the number of distinct parents possible in the given characters appearing in some string. Let’s take a look at the algorithmic part of the approach. 

Algorithm

  1. Call the ‘count_groups()’ function for the given array of strings to count the distinct groups of strings after applying an equivalent operation.
     
  2. Initialize the DSU(Disjoint Set Union) by calling the ‘initialization()’ function and set the parent of each node to itself and size as one.
     
  3. Iterate through each of the given strings, and iterate through each character in each string.
     
  4. Update the ‘freq’ array, which stores the frequency of each character in the array of strings.
     
  5. Merge each character of the string with the 1st character of the string using the ‘merge()’ function of DSU(Disjoint Set Union).
     
  6. In ‘merge()’ function, find the corresponding parents of the two character indexes using the ‘find_set()’ function and then perform the merge operation on ‘parent’ and ‘siz’ arrays.
     
  7. After iterating through each character in every string, calculate the ‘count’ of total characters with their parent equal to itself and have a frequency overall greater than zero. 
     
  8. Return the corresponding ‘count’, which stores the number of distinct groups of strings after applying an equivalent operation and is displayed as the output.

Dry Run

Given string 

Dry run image 1

As the characters present in the string are ‘a’, ‘b,’ ‘c’, ‘d’, ‘e,’ ‘f’, and ‘g’. So, we will focus on these values only in the ‘parent’ and ‘siz’ arrays.
 

Initially, the first string, “abc” is traversed, and each character is merged with the first character of the string “abc”. In this case, the parents of characters ‘b’ and ‘c’ are initialized to value 0(as the first character is ‘a’).

Dry run image 2

The second string, “cd” is traversed, and each character is merged with the first character of the string “cd”. In this case, the parent of character ‘d’ is initialized to 0 instead of 2(as the first character is ‘c’). This is because the character ‘c’ parent has been initialized to 0(in the above figure).

Dry run image 3

The third string, “d”, is traversed, and each character is merged with the first character of the string “d”. In this case, as there is a single character, the parent of ‘d’ will remain as before in the previous iteration.

Dry run image 4

The fourth string, “e” is traversed, and each character is merged with the first character of the string, “e”. In this case, as there is a single character, the parent of ‘e’ will remain as before in the previous iteration.

Dry run image 4

The last string, “efg” is traversed, and each character is merged with the first character of the string “efg”. In this case, the parents of characters ‘f’ and ‘g’ are initialized to value 4(as the first character is ‘e’).

Dry run image 5

Hence, among the characters which occur in any string at least once and have a value of its parent equal to itself are characters ‘a’(with value 0) and ‘e’(with value 4). Hence, after performing the equivalent operation, the total number of distinct groups of strings is 2.

Implementation in C++

#include <vector>
#include <iostream>

using namespace std;

// Parent and size arrays of DSU
vector <int>parent,siz;

// Initialization of DSU
void initialize(int n){
    
    for(int i=0;i<n;i++){
        parent[i] = i;
        siz[i] = 1;
    }
}

// Function for finding the parent of the current index
int find_set(int n){
    
    if(parent[n]==n)
    return n;
    return parent[n]=find_set(parent[n]);
}

// Function to merge two indexes
void merge(int x,int y){
    
    // Evaluating parents for both
    x=find_set(x);
    y=find_set(y);
    if(x!=y){
        
        // If size of x is less than y
        if(siz[x]<siz[y])
        swap(x,y);
        
        // Updating parent and size arrays
        parent[y]=x;
        siz[x]+=siz[y];
    }
}

// Function to count the distinct groups
int count_groups(vector <string> &arr, int n){
    
    // Assigning sizes of DSU arrays
    parent.assign(26,0);
    siz.assign(26,0);
    initialize(26);
    
    // Array for storing the frequency of characters
    vector <int> freq(26,0);
    
    // Iterating through each string
    for(int i=0;i<n;i++){
        
        // Iterating through each character
        for(char j:arr[i]){
            
            // Updating the frequency
            ++freq[j-'a'];
            
            // Merging the characters to the first character
            merge(arr[i][0]-'a',j-'a');
        }
    }
    
    // Count for storing distinct groups
    int count = 0;
    
    // Iterate through each character
    for(int i=0;i<26;i++){
        
        // Finding parent and checking frequency
        if(find_set(i)==i && freq[i]){
            ++count;
        }
    }
    
    // Returning the count
    return count;
}

int main() {
    
    // Total number of the strings given
    int N = 5;
    
    // Given strings
    vector <string>arr(N);
    arr[0] = "abc";
    arr[1] = "cd";
    arr[2] = "d";
    arr[3] = "e";
    arr[4] = "efg";
    
    // Calling the function and displaying the output
    cout<<count_groups(arr,N);
}
You can also try this code with Online C++ Compiler
Run Code


Output

Output image

Implementation in Java

public class MyClass {
    
    // Parent and size arrays of DSU
    static int parent[],siz[];
    
    // Initilization of DSU
    static void initialize(int n){
        
        for(int i=0;i<n;i++){
            parent[i] = i;
            siz[i] = 1;
        }
    }
    
    // Function for finding the parent of current index
    static int find_set(int n){
        
        if(parent[n]==n)
        return n;
        return parent[n]=find_set(parent[n]);
    }
    
    // Function to merge two indexes
    static void merge(int x,int y){
        
        // Evaluating parents for both
        x=find_set(x);
        y=find_set(y);
        if(x!=y){
            
            // If size of x is less than y
            if(siz[x]<siz[y]){
                int temp = x;
                x = y;
                y = temp;
            }
            
            // Updating parent and size arrays
            parent[y]=x;
            siz[x]+=siz[y];
        }
    }
    
    // Function to count the distinct groups
    static int count_groups(String arr[], int n){
        
        // Assigning sizes of DSU arrays
        parent = new int [26];
        siz = new int [26];
        initialize(26);
        
        // Array for storing the frequency of characters
        int freq[] = new int [26];
        
        // Iterating through each string
        for(int i=0;i<n;i++){
            
            // Iterating through each character
            for(int j=0;j<arr[i].length();j++){
                
                // Updating the frequency
                ++freq[arr[i].charAt(j)-'a'];
                
                // Merging the characters to first character
                merge(arr[i].charAt(0)-'a',arr[i].charAt(j)-'a');
            }
        }
        
        // Count for storing distinct groups
        int count = 0;
        
        // Iterate through each character
        for(int i=0;i<26;i++){
            
            // Finding parent and checking frequency
            if(find_set(i)==i && freq[i]>0){
                ++count;
            }
        }
        
        // Returning the count
        return count;
    }
    
    // Driver code
    public static void main(String args[]) {
        
        // Total number of the strings given
        int N = 5;
        
        // Given strings
        String arr[] = new String[N];
        arr[0] = "abc";
        arr[1] = "cd";
        arr[2] = "d";
        arr[3] = "e";
        arr[4] = "efg";
        
        // Calling the function and displaying the output
        System.out.print(count_groups(arr,N));
    }
}
You can also try this code with Online Java Compiler
Run Code


Output

Output image

Time Complexity

O(Sum of the length of all strings)

Explanation: Iterating through all the characters of each string takes up O(Sum of the length of all strings) and the merging operation takes a time of O(log(26)). Hence, the time complexity of the approach mentioned above is O(Sum of the length of all strings * log(26)). log(26) is a constant, so we can ignore it. Overall, the algorithm's time complexity is O(Sum of the length of all strings).

Space Complexity

O(1)

Explanation: We used two arrays, ‘parent’ and ‘siz’ of fixed length 26, which take up a constant space. Hence, the space complexity of the algorithm is O(1).

Frequently Asked Questions

What is the role of Disjoint Sets in data structures?

A disjoint set finds the number of disconnected components in a graph or joins the node satisfying a particular condition.

What is the time complexity of merging two sets in a Disjoint Set Union?

The time complexity of combining two sets in a DSU(Disjoint set Union) is O(logN).

In the worst case, What is the time complexity of finding the parent of some node in a Disjoint Set Union?

The worst time complexity of finding the parent of some node in a DSU(Disjoint set Union) is O(logN) and the best time complexity of finding the parent of some node in a DSU(Disjoint set Union) is O(1).

What are the applications of Disjoint Set Union Data Structure?

It is used in various algorithms like detecting a cycle in a graph, counting the number of connected components in a graph, and finding the minimum spanning tree.

What differentiates a string from an array?

The main difference is that an array is a data structure while a string is an object. The array can hold any data type, while strings only have character data types. Arrays can be changed, which means they are mutable, while strings are not.

Conclusion

In this blog, we discussed the problem of counting the number of distinct groups of strings formed after performing an equivalent operation. We saw the solution approach to solving this problem programmatically and saw the implementation of the solution approach in Java and C++. We also discussed the time and space complexity of the solution approach. 

Recommended Reading:

Recommended problems -

 

Do check out The Interview guide for Product Based Companies, as well as some of the popular interview problems, asked in top companies like Amazon, Adobe, Google, Uber, Microsoft, etc. on Coding Ninjas Studio.

Also, check out some of the Guided Paths on topics such as Data Structure and Algorithms, Competitive Programming, Operating Systems, Computer Networks, DBMS, and System Design, etc. as well as some Contests, Test Series, Interview Bundles, and some Interview Experiences curated by top Industry Experts only on Coding Ninjas Studio.

Happy Coding!

Live masterclass