Table of contents
1.
Introduction
1.1.
Problem Statement
1.2.
Example
2.
Depth First Search - Approach
2.1.
Algorithm
2.2.
Implementation
2.2.1.
Program
2.2.2.
Output
2.3.
Time Complexity
2.4.
Space Complexity
3.
Key Takeaways
Last Updated: Mar 27, 2024

Accounts Merge

Author Ishita Chawla
0 upvote
Career growth poll
Do you think IIT Guwahati certified course can help you in your career?

Introduction

Many real-life problems are solved using graphs. Networks like paths in a city, telephone networks, and circuits are all represented using graphs. We generally use DFS Algorithm to produce a minimum spanning tree, find paths, or even detect cycles in a graph. This blog will discuss one such problem, "Accounts Merge" that can easily be solved using Depth First Search.

Problem Statement

You are given a list of accounts, where each element ACCOUNTS[i] is a list of strings, with the first element ACCOUNTS[i][0] indicating the name and the rest of the elements representing the account's emails.

Now, your task is to combine these accounts. If both the accounts have the same email address, they are most likely the same person. It's important to note that even if two accounts have the same name, they could belong to different people. A person can have any number of accounts, but they must all have the same name.

You have to return the accounts in the following manner after merging them: the first element of each account is the name, and the remaining elements are emails in sorted order. However, the accounts can be returned in any order.

Let us look at the following examples to get a better understanding of the problem:

Example

  1.  

ACCOUNTS = [["John","johnsmith@mail.com", "john_newyork@mail.com"],

               ["John","johnsmith@mail.com", "john00@mail.com"],

               ["Mary", "mary@mail.com"],

               ["John","johnnybravo@mail.com"]] 

    

    [["John","john00@mail.com", "john_newyork@mail.com", "johnsmith@mail.com"],

    ["Mary","mary@mail.com"],

     ["John","johnnybravo@mail.com"]]

2. 

ACCOUNTS = [["Gabe","Gabe0@m.co","Gabe3@m.co","Gabe1@m.co"],

              ["Kevin","Kevin3@m.co","Kevin5@m.co","Kevin0@m.co"],

                        ["Ethan","Ethan5@m.co","Ethan4@m.co","Ethan0@m.co"],

             ["Hanzo","Hanzo3@m.co","Hanzo1@m.co","Hanzo0@m.co"],

   ["Fern","Fern5@m.co","Fern1@m.co","Fern0@m.co"]]

   

[["Ethan","Ethan0@m.co","Ethan4@m.co","Ethan5@m.co"],

["Gabe","Gabe0@m.co","Gabe1@m.co","Gabe3@m.co"],

["Hanzo","Hanzo0@m.co","Hanzo1@m.co","Hanzo3@m.co"],

["Kevin","Kevin0@m.co","Kevin3@m.co","Kevin5@m.co"],

["Fern","Fern0@m.co","Fern1@m.co","Fern5@m.co"]]
 

Our goal is to identify all of the emails that belong to each person. As a result, whenever we find two accounts with the same email address, we will consolidate them into one.

We should always consider visualizing our data as a graph whenever we must interact with a set of elements (emails) that are connected (belong to the same person). Converting the information into a graph will make "merging" two accounts easier in this case.

Emails can be represented as nodes, with a connecting edge indicating that they belong to the same person. We can connect all of the emails in an account with edges because they all belong to the same individual. As a result, a connected component can be used to represent each account. What if two accounts share the same email address? Then, we could add an edge between the two connected components, essentially merging them into one connected component.

Let’s dive in deeper and understand how to solve this problem.

Depth First Search - Approach

In this method, emails will be represented as nodes, with an edge indicating that two emails are connected and belong to the same individual. This implies that any two emails connected by a path of edges must be from the same individual. We are given accounts at first, with each account's emails forming a connected component.

The initial step should be to check that all of the nodes for each account are connected. Let's say an account has K emails that we'd like to connect. We can create an edge between each pair of emails because all emails in an account are related. This will result in a complete subgraph, which will necessitate the addition of K/2 edges.

But we know that two emails belong to the same account if a trail of edges connects them. We can design an acyclic graph with only K - 1 edges instead of a whole subgraph for each account. Remember that the smallest number of edges needed to connect K nodes is K - 1. We'll connect emails in an account in a star pattern, with the initial email serving as the internal node and all additional emails serving as the leaves.

The beauty of connecting the emails in each account in this way is that once an email is connected to a second account, it will have one edge going to an email in the first account and one edge going to an email in the second account. As a result, the two accounts are merged instantly.

We'll have one or more connected components after iterating over each account and linking the emails as discussed above. The nodes of each connected component will represent the individual's emails, and each connected component will represent one person. Now it's up to us to investigate each associated component in order to locate all of the emails that belong to each individual. We'll do a depth-first search on each connected component (person) to identify all the connected emails because DFS is guaranteed to explore every node in a connected component.

Algorithm

  • We first create an adjacency list by adding an edge between the account's first email (accountFirstEmail) and each of the account's other emails.
  • Then we check each account's initial email (accountFirstEmail) to see if it has already been visited. If that's the case, don't start a new DFS. Otherwise, use this email as the source node for DFS.
  • Store the traversed emails in an array mergedAccount during each DFS, and mark all of these emails as visited.
  • Sort the emails after the DFS traversal is complete, then add the account name (accountName) to the front of the vector mergedAccount.
  • In the answer list mergedAccounts, store the vector mergedAccount.

Implementation

Program

#include <iostream>
#include <vector>
#include <unordered_set>
#include <unordered_map>
#include <algorithm>
using namespace std;

// C++ function for the problem "Accounts Merge"
unordered_set<string> visited;
unordered_map<string, vector<string>> adjacent;

// To find all the connected emails.
void DFS(vector<string> &mergedAccount, string &email)
{
    visited.insert(email);

    // Adding the email vector that contains the current component's emails.
    mergedAccount.push_back(email);

    for (string &neighbor : adjacent[email])
    {
        if (visited.find(neighbor) == visited.end())
        {
            DFS(mergedAccount, neighbor);
        }
    }
}

vector<vector<string>> accountsMerge(vector<vector<string>> &accountList)
{
    int accountListSize = accountList.size();

    for (vector<string> &account : accountList)
    {
        int accountSize = account.size();

        // Preparing the adjacency list.
        // Adding an edge between the first email and all other emails in that account.
        string accountFirstEmail = account[1];
        for (int j = 2; j < accountSize; j++)
        {
            string email = account[j];
            adjacent[accountFirstEmail].push_back(email);
            adjacent[email].push_back(accountFirstEmail);
        }
    }

    // Traversing over all the vectors to store the components.
    vector<vector<string>> mergedAccounts;
    for (vector<string> &account : accountList)
    {
        string accountName = account[0];
        string accountFirstEmail = account[1];

        // We perform DFS only if the email is still not visited, otherwise it is part of a different component.
        if (visited.find(accountFirstEmail) == visited.end())
        {
            vector<string> mergedAccount;

            // The name of the person always occupies the first index.
            mergedAccount.push_back(accountName);
            DFS(mergedAccount, accountFirstEmail);

            // Skipping the first element since it is the name.
            // We only sort the emails, as name always occupies the first index.
            sort(mergedAccount.begin() + 1, mergedAccount.end());
            mergedAccounts.push_back(mergedAccount);
        }
    }

    // Returning the final answer.
    return mergedAccounts;
}

int main()
{
    vector<vector<string>> accounts, answer;

    accounts = {{"John", "johnsmith@mail.com", "john_newyork@mail.com"},
                {"John", "johnsmith@mail.com", "john00@mail.com"},
                {"Mary", "mary@mail.com"},
                {"John", "johnnybravo@mail.com"}};

    // Storing the answer.
    answer = accountsMerge(accounts);

    // Printing the answer.
    for (int i = 0; i < answer.size(); i++)
    {
        for (int j = 0; j < answer[i].size(); j++)
        {
            cout << answer[i][j] << " ";
        }
        cout << endl;
    }
    return 0;
}

You can also try this code with Online C++ Compiler
Run Code

Output

Time Complexity

The time complexity is given by O(NK log(NK)), where is the number of accounts and K is the maximum length of an account.

In the worst case, all the emails would belong to a single person. Then the total number of emails will be N * K, and they will also have to be sorted. DFS traversal will take N * K operations since no email will be traversed more than once.

Space Complexity

The space complexity is given by O(N * K), where is the number of accounts and K is the maximum length of an account.

It will require O(N * K) space to build the adjacency list. In the end, visited will include all of the emails, requiring O(N * K) space. In the worst-case scenario, the DFS call stack will occupy O(N * K) space.

Read More - Time Complexity of Sorting Algorithms

Key Takeaways

So, this blog discussed the problem of “Accounts Merge” using Depth First Search (DFS), discussing its time and space complexity.

To learn more, head over right now to Coding Ninjas Studio to practice problems on topics like DFSGraphs and crack your interviews like a Ninja!
Check out this problem - No of Spanning Trees in a Graph

Practicing a bunch of questions is not enough in this competitive world. So go check out where you stand among your peers by taking our mock tests and see which areas need improvement.

In case of any comments or suggestions, feel free to post them in the comments section.

 

Live masterclass