Do you think IIT Guwahati certified course can help you in your career?
No
Introduction
As the title of this article suggests, we will discuss the problem based on the famous data structure Strings. Strings are one of the most popular and essential Data Structures and the most basic ones. Strings are a sequence of characters like “xyz” which is a string containing ‘x’, ‘y’, and ‘z’, which is the sequence of characters.
This blog will discuss the problem of counting the distinct group of strings formed after applying the equivalent operation. Before diving into the solution, let’s briefly examine the problem statement.
Problem Statement
Given an array of strings where each string contains lowercase English letters. The task is to find the number of distinct groups of strings formed after performing an equivalent operation.
Note: If the same character is present in both strings, then the two strings are said to be equivalent, and if another string matches one of the strings in a group of equivalent strings, then that string is also equivalent to that group.
Input
Total number of strings (N) = 5
The array of strings = [“abc”, “cd”, “d”, “e”, “efg”]
Output
2
Explanation
The two distinct groups of strings formed after performing the equivalent operation are-
The first group contains strings [“abc”, “cd”, “d”]. Strings “abc” and “cd” have a common character ‘c’, whereas strings “cd” and “d” have a common character ‘cd’. The first and second strings have a common character, and the second and third strings have a common character. Hence, they all lie in a single group.
The second group contains strings [“e”, “efg”]. Strings “e” and “efg” have a common character ‘e’. Hence, they all lie in a single group.
Solution Approach
In this approach, we will use the Disjoint Set Union approach to solve the problem. Each character in the string is mapped to the first character in the string. Simply put, all the string characters are merged with the string's first character, making it the corresponding parent of all other characters. The reason for doing so is simple. Suppose some character is mapped to the first character of its string so that the character’s parent is the first character of its corresponding string.
Now, let’s assume that that character occurs in some other string. But that string has some other character as its parent. Upon calling the ‘merge()’ function, these two strings will be in a single group due to having some common characteristics. This way, the ultimate task is to find the number of distinct parents possible in the given characters appearing in some string. Let’s take a look at the algorithmic part of the approach.
Algorithm
Call the ‘count_groups()’ function for the given array of strings to count the distinct groups of strings after applying an equivalent operation.
Initialize the DSU(Disjoint Set Union) by calling the ‘initialization()’ function and set the parent of each node to itself and size as one.
Iterate through each of the given strings, and iterate through each character in each string.
Update the ‘freq’ array, which stores the frequency of each character in the array of strings.
Merge each character of the string with the 1st character of the string using the ‘merge()’ function of DSU(Disjoint Set Union).
In ‘merge()’ function, find the corresponding parents of the two character indexes using the ‘find_set()’ function and then perform the merge operation on ‘parent’ and ‘siz’ arrays.
After iterating through each character in every string, calculate the ‘count’ of total characters with their parent equal to itself and have a frequency overall greater than zero.
Return the corresponding ‘count’, which stores the number of distinct groups of strings after applying an equivalent operation and is displayed as the output.
Dry Run
Given string
As the characters present in the string are ‘a’, ‘b,’ ‘c’, ‘d’, ‘e,’ ‘f’, and ‘g’. So, we will focus on these values only in the ‘parent’ and ‘siz’ arrays.
Initially, the first string, “abc” is traversed, and each character is merged with the first character of the string “abc”. In this case, the parents of characters ‘b’ and ‘c’ are initialized to value 0(as the first character is ‘a’).
The second string, “cd” is traversed, and each character is merged with the first character of the string “cd”. In this case, the parent of character ‘d’ is initialized to 0 instead of 2(as the first character is ‘c’). This is because the character ‘c’ parent has been initialized to 0(in the above figure).
The third string, “d”, is traversed, and each character is merged with the first character of the string “d”. In this case, as there is a single character, the parent of ‘d’ will remain as before in the previous iteration.
The fourth string, “e” is traversed, and each character is merged with the first character of the string, “e”. In this case, as there is a single character, the parent of ‘e’ will remain as before in the previous iteration.
The last string, “efg” is traversed, and each character is merged with the first character of the string “efg”. In this case, the parents of characters ‘f’ and ‘g’ are initialized to value 4(as the first character is ‘e’).
Hence, among the characters which occur in any string at least once and have a value of its parent equal to itself are characters ‘a’(with value 0) and ‘e’(with value 4). Hence, after performing the equivalent operation, the total number of distinct groups of strings is 2.
Implementation in C++
#include <vector>
#include <iostream>
using namespace std;
// Parent and size arrays of DSU
vector <int>parent,siz;
// Initialization of DSU
void initialize(int n){
for(int i=0;i<n;i++){
parent[i] = i;
siz[i] = 1;
}
}
// Function for finding the parent of the current index
int find_set(int n){
if(parent[n]==n)
return n;
return parent[n]=find_set(parent[n]);
}
// Function to merge two indexes
void merge(int x,int y){
// Evaluating parents for both
x=find_set(x);
y=find_set(y);
if(x!=y){
// If size of x is less than y
if(siz[x]<siz[y])
swap(x,y);
// Updating parent and size arrays
parent[y]=x;
siz[x]+=siz[y];
}
}
// Function to count the distinct groups
int count_groups(vector <string> &arr, int n){
// Assigning sizes of DSU arrays
parent.assign(26,0);
siz.assign(26,0);
initialize(26);
// Array for storing the frequency of characters
vector <int> freq(26,0);
// Iterating through each string
for(int i=0;i<n;i++){
// Iterating through each character
for(char j:arr[i]){
// Updating the frequency
++freq[j-'a'];
// Merging the characters to the first character
merge(arr[i][0]-'a',j-'a');
}
}
// Count for storing distinct groups
int count = 0;
// Iterate through each character
for(int i=0;i<26;i++){
// Finding parent and checking frequency
if(find_set(i)==i && freq[i]){
++count;
}
}
// Returning the count
return count;
}
int main() {
// Total number of the strings given
int N = 5;
// Given strings
vector <string>arr(N);
arr[0] = "abc";
arr[1] = "cd";
arr[2] = "d";
arr[3] = "e";
arr[4] = "efg";
// Calling the function and displaying the output
cout<<count_groups(arr,N);
}
You can also try this code with Online C++ Compiler
public class MyClass {
// Parent and size arrays of DSU
static int parent[],siz[];
// Initilization of DSU
static void initialize(int n){
for(int i=0;i<n;i++){
parent[i] = i;
siz[i] = 1;
}
}
// Function for finding the parent of current index
static int find_set(int n){
if(parent[n]==n)
return n;
return parent[n]=find_set(parent[n]);
}
// Function to merge two indexes
static void merge(int x,int y){
// Evaluating parents for both
x=find_set(x);
y=find_set(y);
if(x!=y){
// If size of x is less than y
if(siz[x]<siz[y]){
int temp = x;
x = y;
y = temp;
}
// Updating parent and size arrays
parent[y]=x;
siz[x]+=siz[y];
}
}
// Function to count the distinct groups
static int count_groups(String arr[], int n){
// Assigning sizes of DSU arrays
parent = new int [26];
siz = new int [26];
initialize(26);
// Array for storing the frequency of characters
int freq[] = new int [26];
// Iterating through each string
for(int i=0;i<n;i++){
// Iterating through each character
for(int j=0;j<arr[i].length();j++){
// Updating the frequency
++freq[arr[i].charAt(j)-'a'];
// Merging the characters to first character
merge(arr[i].charAt(0)-'a',arr[i].charAt(j)-'a');
}
}
// Count for storing distinct groups
int count = 0;
// Iterate through each character
for(int i=0;i<26;i++){
// Finding parent and checking frequency
if(find_set(i)==i && freq[i]>0){
++count;
}
}
// Returning the count
return count;
}
// Driver code
public static void main(String args[]) {
// Total number of the strings given
int N = 5;
// Given strings
String arr[] = new String[N];
arr[0] = "abc";
arr[1] = "cd";
arr[2] = "d";
arr[3] = "e";
arr[4] = "efg";
// Calling the function and displaying the output
System.out.print(count_groups(arr,N));
}
}
You can also try this code with Online Java Compiler
Explanation: Iterating through all the characters of each string takes up O(Sum of the length of all strings) and the merging operation takes a time of O(log(26)). Hence, the time complexity of the approach mentioned above is O(Sum of the length of all strings * log(26)). log(26) is a constant, so we can ignore it. Overall, the algorithm's time complexity is O(Sum of the length of all strings).
Space Complexity
O(1)
Explanation: We used two arrays, ‘parent’ and ‘siz’ of fixed length 26, which take up a constant space. Hence, the space complexity of the algorithm is O(1).
Frequently Asked Questions
What is the role of Disjoint Sets in data structures?
A disjoint set finds the number of disconnected components in a graph or joins the node satisfying a particular condition.
What is the time complexity of merging two sets in a Disjoint Set Union?
The time complexity of combining two sets in a DSU(Disjoint set Union) is O(logN).
In the worst case, What is the time complexity of finding the parent of some node in a Disjoint Set Union?
The worst time complexity of finding the parent of some node in a DSU(Disjoint set Union) is O(logN) and the best time complexity of finding the parent of some node in a DSU(Disjoint set Union) is O(1).
What are the applications of Disjoint Set Union Data Structure?
It is used in various algorithms like detecting a cycle in a graph, counting the number of connected components in a graph, and finding the minimum spanning tree.
What differentiates a string from an array?
The main difference is that an array is a data structure while a string is an object. The array can hold any data type, while strings only have character data types. Arrays can be changed, which means they are mutable, while strings are not.
Conclusion
In this blog, we discussed the problem of counting the number of distinct groups of strings formed after performing an equivalent operation. We saw the solution approach to solving this problem programmatically and saw the implementation of the solution approach in Java and C++. We also discussed the time and space complexity of the solution approach.