Last Updated: 11 Jan, 2021

Accounts Merge

Hard
Asked in companies
FacebookPhonePeAmazon

Problem statement

You have been given an array/list 'accounts' where each element, i.e. 'accounts'[i] contains a list of strings.


In 'accounts'[i], the first element is the name of the account holder, and the rest of the strings are emails representing the emails of the account.


Now, you are supposed to merge the accounts. Two accounts definitely belong to the same person if there is some email that is common to both accounts. Note that it may be possible that two accounts belong to the same name, but they may belong to different people, as people could have the same name.


A person could have any number of accounts initially, but all their accounts definitely have the same name.


After merging the accounts, you have to return an array/list of merged accounts where each account, the first element is the person's name, and the rest elements are the email addresses in sorted order (non-decreasing). Accounts themselves can be in any order.


Example:
 Input:  'n' = 4,
 'accounts' = [
    ["Rohan", "rohan123@gmail.com", "1279ro@gmail.com"],
    ["Rohit", "rohit101@yahoo.com", "hitman30487@gmail.com"],
    ["Rohan", "1279ro@gmail.com",  "niemann01@gmail.com"],
    ["Rohan", "kaushik@outlook.com"],
 ]
 Output: [
    ["Rohan", "rohan123@gmail.com", "1279ro@gmail.com", "niemann01@gmail.com"],
    ["Rohit", "rohit101@yahoo.com", "hitman30487@gmail.com"],
    ["Rohan", "kaushik@outlook.com"],
 ]

 Explanation: The first and third "Rohan" are the same person as they have a shared email address, “1279ro@gmail.com”. The rest of the accounts are of different persons, as they don’t share any shared email addresses. So, we merge the first and third accounts.
Input Format :
The first input line contains an integer ‘n’ denoting the number of accounts.

Each of the next ‘n’ lines contains the length of 'accounts[i]' followed by space-separated strings where the first string denotes the account holder’s name, and the rest of the strings represent the email addresses.
Output Format :
In the first line, print the number of accounts after merging. 

Print the space-separated name and email address associated with each account, each in a separate line.
Note:
You are not required to print anything; it has already been taken care of. Just implement the function.

Approaches

01 Approach

The basic idea of this approach is to model this problem as a graph problem. Let us create a graph of emails with an edge between two emails if they occur in the same account. 

 

The figure below shows the graph constructed for the first test case of sample test case 1.

 

 

We can observe that the set of accounts which needs to be merged will be a part of connected components. So the problem is now reduced to finding the connected components in an undirected graph. We can easily get the connected component by doing a depth-first search of the graph. You can refer to this tutorial of cp-algorithms for a more detailed explanation. Search for connected components in a graph - https://cp-algorithms.com/graph/search-for-connected-components.html#:~:text=Given%20an%20undirected%20graph%20G,path%20exists%20between%20different%20groups.

 

Let unordered_map<string, string> ‘EMAIL_TO_NAME’ be HashMap to store the mapping of emails to the name of the person having it. This mapping will help us in getting the name of the person while creating the merged accounts.

 

And also let unorderd_map<string, vector<string>> ‘GRAPH’ stores the adjacency list of the constructed graph.

 

Now the consider the following steps :

  1. Iterate through each ‘ACCOUNT’ in the array/list of ‘ACCOUNTS'. And for each ‘ACCOUNT’:
    • Store the mapping of email to the name for each email address.
    • And also create an edge between the first email address and the rest of the email address.
  2. Now create a set<string> ‘VISITED’ which stores the email address which is visited during the depth-first search. Also, create a vector<vector<string>> ‘MERGED_ACCOUNTS’ to store the name and email addresses of the merged accounts.
  3. Now, start doing the depth-first search from each non-visited email address.
    • And store the email addresses belonging to the current connected component in a new array/list.
    • Insert the name of the person having the email addresses in a new array/list ‘ACCOUNT’ and then insert all the addresses. Then, insert the current account details into the ‘MERGED_ACCOUNTS’.

02 Approach

The basic idea of this approach is to use a Disjoint Set Union data structure to solve this task. We have seen in approach 1 that the problem is finally reduced to finding the connected component in an undirected graph. So we can use a DSU to do this task very efficiently. You can refer to this article from cp-algorithms to read more about DSU. https://cp-algorithms.com/data_structures/disjoint_set_union.html

We will use the union by size for optimization. 

 

Let us create an unorderd_map<string, int> ‘EMAIL_TO_ID’ a HashMap to store the mapping of emails to an integer. It will help us in using Disjoint Set Union.

 

Let unordered_map<string, string> "EMAIL_TO_NAME' be HashMap to store the mapping of emails to the name of the person having it. This mapping will help us in getting the name of the person while creating the merged accounts.

 

Let us also create an unordered_map<string, vector<string>> ‘COMPS’ to store the mapping of a person’s name and the email addresses belonging to him. 

 

Now consider the following steps:

  1. Create a DSU data structure and initialize it using ‘MAKE_SET()’.
  2. Iterate through each account in the array/list of accounts. And for each account:
    • Store the mapping of email to the name for each email address.
    • And also create a unique mapping from each email address to an integer.
    • Merge the set belonging to the first email address to the rest of the email addresses using ‘UNION_SETS()’. Because as mentioned in the approach1 the emails belonging to the same account will be connected using an edge.
  3. Now, iterate through each email address.
    • Find the parent of it using ‘FIND_PARENT()’. Get the name of the person having the parent’s email address.
    • Now, store the current email address in the set of email addresses belonging to that person.
  4. Also, create a vector<vector<string>> ‘MERGED_ACCOUNTS’ to store the name and email addresses of the merged accounts.
  5. Now, start traversing through each component in ‘COMPS’
    • Insert the name of the person having the email addresses in a new array/list ‘ACCOUNT’ and then insert all the addresses. Then, insert the current account details into the ‘MERGED_ACCOUNTS’.