Training Material

### Kruskal's algorithm

Recall our example for spanning trees.

```         G            Tree 1        Tree 2        Tree 3    ...

o               o             o             o
7 / \ 8         7 /           7 /               \ 8
/ 6 \           /             / 6             6 \
o-----o         o     o       o-----o       o-----o   ...
\   /           \   /             /         \
2 \ / 5         2 \ / 5           / 5       2 \
o               o             o             o

Cost = 14    Cost = 18     Cost = 16
```

Here is Kruskal's algorithm to compute the minimum cost spanning tree.

Sort weights in ascending order—in the example above, 2,5,6,7,8.

Include the edges into the spanning tree in ascending order. Start with 2, add 5. Next is 6. This creates a cycle, so we cannot add it. (To detect whether an edge creates a cycle, mark each vertex that has been included in the tree. If the next edge has both end points marked, it will form a cycle). So we skip 6, add 7. Now we have m-1 edges and we stop with the spanning tree Tree 1 above.

This may not always work so smoothly. When we pick up the next smallest edge, neither end point may be marked already. Thus, we have an intermediate stage that is a forest, rather than a tree.

```         o
7 / \ 8
/ 5 \   3
o-----o-----o
\   /
2 \ / 6
o
```

Here the edges in sorted order are {2,3,5,6,7,8}. We start with 2 and add 3. These two edges form disjoint components (a forest). If we now consider 5, we can add it though both edges are marked. We use different "colours" to mark each component. An edge that connects marked nodes of different colour can be added --- the components get merged, so convert one colour to the other to signify a new single component.

``` After adding {2,3}             After adding {2,3,5}

3                          5     3
o     o-----o                    o-----o-----o
BLUE \      RED                       \  BLUE
2 \                              2 \
o                                o
```

Implementing this algorithm requires a good data structure to keep track of components, merge them etc. This can be done efficiently using the disjoint-set data structure.

Complexity

1. Sorting takes O(m log m) time
2. We have to examine, in the worst case, all m edges. To stay within the sorting cost, the set operations should work in log m time.

### Video material

Video lecture on Kruskal's algorithm, NPTEL online course on Design and Analysis of Algorithms (Lecture slides)