Indian Computing Olympiad

Training Material


Sorting

Mergesort

Mergesort is an example of a technique called divide-and-conquer.

  • Split the problem into two smaller parts.
  • Solve each subproblem separately.
  • Combine the solved subproblems efficiently.

In mergesort, we split the array arbitrarily at the midpoint. We sort each half separately. We then have to combine these two halves, but we are not guaranteed that all values in one half are uniformly smaller than those in the other half.

Let N = 2M be the size of the array. Suppose, after sorting the two parts, we have the array in the form i1 i2 ... iM j1 j2 ... jM where i1 i2 ... iM and j1 j2 ... jM are both sorted half arrays.

We can then merge the two sorted subarrays as follows. Compare i1 and j1 and extract the minimum value into the first position of a new array. Suppose i1 is selected. In the next step, compare i2 and j1 and extract the minimum value into the next position of the new array. Proceed in this fashion till we exhaust both halves. (If we find that the elements are equal, we take the value from the left.)

This merge operation takes time N—each comparison adds one element to the merged list and there are M+M=N items to be added in all.

To compute the overall complexity of mergesort, let T(N) denote the time taken to sort an array of N elements. The divide-and-conquer approach requires us to first sort two arrays of N/2 elements and then combine them using N steps. So

T(N) = 2*T(N/2) + N

We recursively solve the two halves using the same procedure. Thus, the same equation applies to T(N/2), so we can unravel this equation as

T(N)= 2*(2*T(N/4) + N/2) + N = 4*T(N/4) + 2N
= 4*(2*T(N/8) + N/4) + 2N = 8*T(N/8) + 3N
= 8*(2*T(N/16) + N/8) + 3N = ...

After expanding this equation K times, we get

T(N) = 2K * T(N/2K) + KN

Eventually, when K = log N, we get

T(N) = 2log N * T(1) + (log N)*N

T(1), the time to sort a 1 element array, is 1 so we can simplify the equation above to get

T(N) = N + N*(log N)

We ignore the smaller term and get T(N) to be proportional to N*(log N), which is considerably better than N2.

The problem with this algorithm is that we need an extra array of size N to merge the two halves.

Video material

Video lectures on Mergesort, NPTEL online course on Design and Analysis of Algorithms