Kruskal's algorithm is a method for finding the minimum spanning tree (MST) in a weighted, connected graph. An MST is a subset of edges that connects all the vertices in the graph with the minimum total edge weight. Kruskal's algorithm builds the MST by greedily selecting the edges with the lowest weights, avoiding cycles.

In this article, we will learn how Kruskal's algorithm works, provide illustrations & implementations in C llanguages, analyze its time & space complexity, and discuss its advantages & disadvantages.

How to Find MST Using Kruskalâ€™s Algorithm?

Kruskal's algorithm finds the minimum spanning tree in a graph by following these steps:

Sort all the edges: Begin by sorting all the edges in the graph in non-decreasing order of their weight. This step is crucial as it allows the algorithm to consider the smallest edges first, which is key to finding the MST efficiently.

Initialize the forest: A forest is a collection of trees where each vertex in the graph is considered as an individual tree. At the start, the forest has as many trees as there are vertices in the graph.

Add edges to the MST: Go through the sorted list of edges and add each edge to the MST, provided it doesn't form a cycle. To check for cycles, use a union-find data structure. If adding an edge would connect two vertices in the same tree, a cycle would be formed, & hence, that edge is skipped.

Check if the MST is complete: The algorithm stops when the MST has Vâˆ’1 edges, where

V is the number of vertices in the graph. If all vertices are connected before reaching Vâˆ’1 edges, the remaining edges are not needed.

This method ensures that the added edges are the ones that connect disjoint parts of the graph at the lowest possible cost without closing any loops, thus effectively finding the MST.

Illustration of Kruskal's Algorithm

To better understand how Kruskal's algorithm works, let's consider an example. Suppose we have the following weighted, connected graph:

Now, let's apply Kruskal's algorithm to find the minimum spanning tree:

Sort the edges by weight: (1, AB), (2, AC), (3, BD), (4, AD), (5, BE), (6, CD), (7, DE).

Create a forest F with five separate trees: {A}, {B}, {C}, {D}, {E}.

Iterate through the sorted edges:

(1, AB): A & B are in different trees, so add (AB) to the MST & merge {A} & {B}.

(2, AC): A & C are in different trees, so add (AC) to the MST & merge {A, B} & {C}.

(3, BD): B & D are in different trees, so add (BD) to the MST & merge {A, B, C} & {D}.

(4, AD): A & D are already in the same tree, so discard (AD).

(5, BE): B & E are in different trees, so add (BE) to the MST & merge {A, B, C, D} & {E}.

The MST now contains 4 edges (V-1), so the algorithm terminates.

The resulting minimum spanning tree looks like this:

Implementation of Kruskalâ€™s Algorithm in C

Implementing Kruskal's algorithm in the C programming language involves several key steps: sorting edges, managing a forest of trees using the union-find data structure, & adding edges without forming cycles. Below is a simple yet comprehensive code example that demonstrates these steps.

Step 1: Define a Structure for Edges

#include <stdio.h>
#include <stdlib.h>
// Structure to represent an edge
typedef struct {
int src, dest, weight;
} Edge;
// Structure to represent a subset for union-find
typedef struct {
int parent;
int rank;
} subset;

Step 2: Union-Find Operations

These functions help manage the sets of vertices to ensure no cycles are formed.

// Find set of an element i (uses path compression technique)
int find(subset subsets[], int i) {
if (subsets[i].parent != i)
subsets[i].parent = find(subsets, subsets[i].parent);
return subsets[i].parent;
}
// Function to do union of two subsets
void Union(subset subsets[], int x, int y) {
int rootX = find(subsets, x);
int rootY = find(subsets, y);
if (subsets[rootX].rank < subsets[rootY].rank)
subsets[rootX].parent = rootY;
else if (subsets[rootX].rank > subsets[rootY].rank)
subsets[rootY].parent = rootX;
else {
subsets[rootY].parent = rootX;
subsets[rootX].rank++;
}
}

Step 3: The Kruskalâ€™s Algorithm Function

This function puts everything together: sorting edges & building the MST.

// Function to compare two edges according to their weights
int compareEdges(const void* a, const void* b) {
Edge* a1 = (Edge*)a;
Edge* b1 = (Edge*)b;
return a1->weight > b1->weight;
}
// Function to construct MST using Kruskal's algorithm
void KruskalMST(Edge edges[], int V, int E) {
Edge* result = malloc(V * sizeof(Edge)); // Tnis will store the resultant MST
int e = 0; // An index variable, used for result[]
int i = 0; // An index variable, used for sorted edges
// Step 1: Sort all the edges in non-decreasing order of their weight
qsort(edges, E, sizeof(Edge), compareEdges);
// Allocate memory for creating V subsets
subset *subsets = malloc(V * sizeof(subset));
// Create V subsets with single elements
for (int v = 0; v < V; ++v) {
subsets[v].parent = v;
subsets[v].rank = 0;
}
// Number of edges to be taken is equal to V-1
while (e < V - 1 && i < E) {
// Step 2: Pick the smallest edge. And increment the index for next iteration
Edge next_edge = edges[i++];
int x = find(subsets, next_edge.src);
int y = find(subsets, next_edge.dest);
// If including this edge does't cause cycle, include it in result
// and increment the index of result for next edge
if (x != y) {
result[e++] = next_edge;
Union(subsets, x, y);
}
}
// print the contents of result[] to display the built MST
printf("Following are the edges in the constructed MST\n");
for (i = 0; i < e; ++i)
printf("%d -- %d == %d\n", result[i].src, result[i].dest, result[i].weight);
free(result);
free(subsets);
}

Usage Example

To use this function, define your edges array, number of vertices (V), and number of edges (E), and call KruskalMST().

Time & Space Complexity of Kruskalâ€™s Algorithm

Time Complexity

The time complexity of Kruskal's algorithm is mostly influenced by the sorting step and the union-find operations. Hereâ€™s how it breaks down:

Sorting Edges: Sorting all the edges of the graph in non-decreasing order based on their weight is the most time-consuming part. If there are E edges, the sorting process typically takes O(ElogE) time.

Union-Find Operations: Each union and find operation takes nearly constant time, thanks to path compression and union by rank optimizations. For E edges, the total union-find operations would ideally take O(ElogV) time, where V is the number of vertices.

Combining these, the overall time complexity of Kruskal's algorithm can be represented as

O(ElogE) or O(ElogV). Since in most practical scenarios the number of edges E can be much larger than the logarithm of the vertices V,O(ElogE) is the more common notation used to represent its time complexity.

Space Complexity

The space complexity of Kruskalâ€™s algorithm is O(V+E):

Edge List: Space to hold all the edges, which is O(E).

Union-Find Structure: Space for the union-find data structure to keep track of subsets, which is O(V).

These complexities suggest that Kruskalâ€™s algorithm is quite efficient with sparse graphs where E is not significantly larger than V However, for dense graphs, where E approaches ^{v2}

, the time complexity can become a limiting factor due to the sorting step.

Advantages of Kruskalâ€™s Algorithm

Simplicity: Kruskal's algorithm is straightforward to understand & implement. The process of sorting edges & using a union-find data structure to build the MST is relatively simple, making it accessible for beginners learning about graph algorithms.

Flexibility: It works well with disconnected graphs too. If the graph is not connected, Kruskalâ€™s algorithm will produce a minimum spanning forest, which is the MST for each connected component.

Optimal for Sparse Graphs: This algorithm is very efficient for sparse graphs where the number of edges is much lower than the square of the number of vertices. This efficiency comes from the sorting step which is manageable with fewer edges.

Edge Weight Sensitivity: Kruskalâ€™s algorithm can handle graphs where weights might be repeated or very large without any modification in the logic or performance degradation.

No Need for Complete Graph: Unlike some algorithms that require a complete graph (where every pair of vertices is connected by an edge), Kruskal's works on graphs that are not fully connected.

Parallelism: The initial sorting of edges provides an opportunity for parallel computation. Multiple processes can work on different segments of the graph simultaneously, potentially reducing overall processing time.

Disadvantages of Kruskalâ€™s Algorithm

Sorting Cost: The need to sort all the edges of the graph can become computationally expensive, especially with a high number of edges, as the time complexity is dominated by theO(ElogE) sorting step.

Space Requirement: It requires additional memory for maintaining the list of edges and the union-find data structure, which isO(V+E). This can be significant for large graphs.

Slower with Dense Graphs: For dense graphs, where the number of edges is close to v^{2}, Kruskalâ€™s algorithm might not be the most time-efficient due to the sorting of a large number of edges.

Sensitive to Edge Sorting: The performance of Kruskalâ€™s algorithm can significantly depend on the method used for sorting the edges. Inefficient sorting can lead to increased overall complexity.

Requires Union-Find Data Structure: To efficiently check for cycles and manage connected components, Kruskalâ€™s algorithm relies heavily on the union-find data structure. Implementing and managing this structure can be complex and error-prone.

Not Incremental: If a new edge or vertex is added to the graph, Kruskalâ€™s algorithm cannot be easily adapted to update the MST. The entire algorithm needs to be rerun, making it less suitable for dynamic graphs where updates are frequent.

Frequently Asked Questions

What happens if there are edges with the same weight?

If multiple edges have the same weight, you can choose any of them without affecting the final minimum spanning tree (MST). The algorithm remains correct as long as no cycles are formed.

Can Kruskal's algorithm be used for directed graphs?

Kruskal's algorithm is designed for undirected graphs because the concept of a minimum spanning tree does not apply in the same way to directed graphs. For directed graphs, you would look into algorithms designed for finding minimum spanning arborescences, like Edmonds' algorithm.

How does Kruskal's algorithm handle negative weight edges?

Kruskalâ€™s algorithm can handle negative weight edges without any issues. It treats them the same as positive weights and includes them in the MST as long as they do not form a cycle, ensuring the total weight is minimized.

Conclusion

In this article, we have learned the basics of Kruskal's algorithm and its application in finding the minimum spanning tree for a graph. We explored the step-by-step process of the algorithm, from sorting the edges based on their weight, through handling the forest of trees with union-find structure, to building the MST effectively. We also talked about both the advantages, like its simplicity and efficiency with sparse graphs, and its disadvantages, such as the high computational cost with dense graphs and the reliance on a good sorting mechanism.