Training Material

## Sorting

We want to sort an array with N elements. Naive algorithms such as bubblesort and insertion sort take N2 operations to sort the array. Better sorting algorithms work in time proportional to N log N. Is this improvement worth it?

Consider the problem of searching for a value in an array of size 10000. We can either sort the array and then use binary search (that takes time log N) or do an exhaustive linear search.

Exhaustive search takes time N, in general, since we may have to scan the entire array. For our example, we can thus search for a single value in 10000 steps.

Binary search takes time log N (all our logs are to the base 2). Recall that 210 = 1032 so log 1000 is approximately 10. Since 10000 = 1000 * 10, log 10000 = log 1000 + log 10 = 10 + 4 = 14.

If we are searching for a single value, we can use exhaustive search and do the job in 10000 steps. If we sort the array using an N2 sorting algorithm and then use binary search, we spend 10000*10000 = 108 steps sorting it, followed by 14 steps to search.

If we repeat this search 10000 times, we would spend 10000*10000 operations doing exhaustive search and 10000*10000 + 14*10000 = 10014*10000 operations doing sorting plus binary search. For any value above 10014, sorting will start to make itself effective---if we check when 10000*M = 10000*10000 + 14*M, we get M a little above 10014.

On the other hand, suppose we use an N log N sorting algorithm. Then, the sort phase takes time 14*10000. So, to see when sorting+binary search beats exhaustive search, we need to check when 10000*M = 14*10000 + 14*M, which is approximately when M = 14. Thus, if we do even 15 searches, it helps to sort.

What if the array grows from 10000 to 100000? Since log 100000 is about 17, the equations we have are

• N2 sorting: 100000*M = 100000*100000 + 17*M
• N log N sorting: 100000*M = 100000*17 + 17*M

Thus, we need to about 100017 exhaustive searches to justify justify an N2 sorting algorithm, but only 17 searches to justify an N log N sorting algorithm.

More importantly, if we use an N log N sorting algorithm, for values much less than the size of the array, we can use sorting to significantly speed up the search process.

The reason why bubblesort etc are inefficient is that they duplicate comparisons. One way to avoid this is to break up the array into non overlapping segments, sort these and then combine these portions efficiently.

For instance, suppose we have a box with slips of paper with numbers between 100 and 300 to be sorted. We could first separate the slips into two bunches: those with values below 200 and those with values above 200. We can sort these bunches separately. Since all the values in the second bunch dominate those in the first bunch, we can combine them easily into a single sorted bunch.