Last Updated: 11 Aug, 2020

Minimum Spanning Tree

Moderate
Asked in companies
AmazonWells FargoMicrosoft

Problem statement

You are given an undirected, connected and weighted graph G(V, E), consisting of V number of vertices (numbered from 0 to V-1) and E number of edges.

Find and print the total weight of the Minimum Spanning Tree (MST) using Kruskal's algorithm.

By definition, a minimum weight spanning tree is a subset of the edges of a connected, edge-weighted undirected graph that connects all the vertices together, without any cycles and with the minimum possible total edge weight.

Input Format :
The first input contains two integers, N and M, the number of vertices and edges in the graph respectively.

The next M input lines contains three integers X, Y and W each, representing each edge of the graph.

The edge X Y W represents an edge between vertices X and Y, having weight W.
Note:
The edges will be passed to the function as a array of arrays. Each array will contain 3 integers, X, Y, and W in that order.
Output Format :
Print the total weight of the minimum spanning tree.
Note:
You don't explicitly have to print anything, just return the total weight.
Constraints :
2 <= V <= 10^5
1 <= E <= 3 * 10^5
0 <= X < N
0 <= Y < N
1 <= W <= 10^4

where V and E represent the number of vertices and edges respectively.
X and Y represent the vertices between which there is an edge.
W is the weight of the edge.

Time limit: 1sec

Approaches

01 Approach

The algorithm first puts all vertices of the graph as a collection of single-vertex trees, separate from each other. Then, they are gradually combined together to form a single tree. Before the algorithm is executed, we first sort all edges of the graph in order of non-decreasing weights. Then we start selecting edges in the sorted order. If the ends of the currently selected edge belong to different subtrees, we will connect those subtrees, and that edge is added in the pool of edges of the MST. 

 

After we've iterated through all the edges of the graph, all vertices will belong to the same subtree, which will be the MST. 

 

Following are the steps to implement this algorithm:

 

  1. Sort all edges of the graph in order of non-decreasing weights.
  2. Put each vertex of the graph in their own tree, i.e. set the parent of each vertex to itself.
  3. Iterate through all edges in the sorted order; if the two vertices of the edge belong to different subtrees, i.e. have different parents, add that edge to the MST and make the parents of each vertex in the two connected subtrees equal.
  4. After the iterating through all the edges, you will end up having a collection of edges that will make up the MST.

 

How to find whether two vertices belong to different or the same subtrees?

 

There exists a beautiful data structure called Disjoint Set Union (DSU), which can manage sets efficiently. Using a DSU, we can find which element exists in which set and merge two sets as well. Initially, we assign each vertex a new set, represented by a set number, equal to the value of the vertex itself.

 

To find which subtree a vertex lies in, we simply check for the set number corresponding to that element in the DSU. If the set number does not match the vertex value of the element, we recursively check for the ‘set number’ itself, and we do this until (current value) ≠ (set number). When we terminate this algorithm, we actually reach the ‘head’ of the set.

 

To merge two sets, we simply point the head of the smaller set to the head of the larger set (Union By Rank). Hence, this is an O(1) operation.

 

If you notice, the worst case for a ‘find’ operation will be O(N). To avoid this scenario, we use an algorithm called Path Compression. Putting it briefly, suppose the heads of the sets point in this fashion:

 

1→2→3→4

 

If we want to look for the set number of 1, we have to reach 4. After we know that the set head is 4, we simply point all values encountered so far to 3, so it would look something like:

 

1→4←3

      |

      2

 

This is Path Compression in essence. It has an amortized (per operation analysis) constant time complexity.