Indian Computing Olympiad

Training Material


Greedy AlgorithmsHeaps→ Backup (APIO 2007)

You need to back up computer data for offices situated along single street. You decide to pair off the offices. For each pair of offices you run a network cable between the two buildings so that they can back up each others' data.

However, your local telecommunications company will only give you K network cables, which means you can only arrange backups for K pairs of offices (2K offices in total). No office may belong to more than one pair (that is, these 2K offices must all be different). Furthermore, the telecommunications company charges by the kilometre. This means that you need to choose these K pairs of offices so that you use as little cable as possible.

As an example, suppose you had five offices on a street, situated at distance 1, 3, 4, 6 and 12 from the beginning of the street, and the telecommunications company will provides you with K = 2 cables. The best pairing in this example is created by linking the first and second offices together, and linking the third and fourth offices together. This uses K = 2 cables as required, where the first cable has length 3 – 1 = 2, and the second cable has length 6 – 4 = 2. This pairing requires a total of 4 of network cables, which is the smallest total possible.

Given the locations of N offices and the number of cables K that are made available, compute the cost in cable length of the best K way pairing. (N≤100,000).

Solution

Observe that in the optimal solution any cable must run from an office to its neighbour. If a cable spans an office (or another connection), we can reconnect it in a shorter way.

          __________
         |          |
         w    x     y              =>   w---x     y

          _________________
         |                 |
         w    x-----y      z       =>   w---x     y-----z
      

This provides us an immediate dynamic programming solution.

Let Cable(j,m) be the optimal way of connecting offices 1..j using m cables. Then

  • Cable(j+1,m) =
                  min{
                     Cable(j,m), // Do not connect office j
                     Cable(j-1,m-1) + (dist(j) - dist(j-1)) // Connect j to j-1
                  }

This yield an N*K algorithm, which is good enough for 60% of the test cases, but not the full range.

Another approach is to greedily assign pairs according to shortest distance. This is wrong, but from the wrong solution, we can extract a better solution than the dynamic programming above.

Suppose the offices are arranged as follows

	A <-11-> B <-5-> C <-3-> D <-6->E <-14-> F <-10-> G <-9-> H <-10-> I
      

and we have K = 4 cables.

A greedy solution would first pick CD = 3. This rules out BC and DE. The next best is GH = 9. This rules out FG = 10 and HI = 10, so we have to finish with AB = 11 and EF = 14 for a total length of 3+9+11+14=37.

On the other hand, we could have picked {BC=5, DE=6, FG=10, HI=10} for a total of 31.

Notice that if the minimum edge (CD, in this case) is not in the optimal solution, both its adjacent edges must be in the optimal solution (BC and DE, in this case). Suppose this is not the case --- that is, CD is minimum, but one of BC or DE is not used in the optimum solution. Then, either C or D is still free and we can replace BC by CD or DE by CD, depending on which one is available, and get a shorter solution.

This yields the following approach. We greedily add the edge CD = 3. Then we compensate for a possible mistake by adding a "virtual" edge "BE". The edge "BE" corresponds to undoing CD and adding instead the pair {BC,DE}. The cost of BC+CD = 11. The cost of CD is 3. If we say that adding CD and then adding "BE" is the same as BC+CD, we need the cost of CD+"DE" to be 11, so the cost of "BE" is 11-3 = 8.

We now have a partial greedy solution {CD} and a new problem involving A,B,E,F,G,H,I that looks like this.

	 A <-11-> B     C **3** D     E <-14-> F <-10-> G <-9-> H <-10-> I
	          |___________________|
	                    8
     

In this problem, the virtual edge "BE" is shortest, so we add this to our greedy solution and get a new virtual edge "AF" of length 11+14-8 = 17.

        Soln = {CD,"BE"}

        A <-11-> B     C **3** D     E <-14-> F <-10-> G <-9-> H <-10-> I
        |        *                  *         |
        |         ******** 8 *******          |
         ----------------17-------------------
      

Using greedy, the next best edge is GH, so we add it and put in a virtual edge "FI" of length 10+10-9 = 11.

	 Soln = {CD,"BE",GH}

	 A <-11-> B     C **3** D     E <-14-> F <-10-> G **9** H <-10-> I
	 |        *                  *         |                         |
	 |         ******** 8 *******          |                         |
	  ----------------17------------------- -----------11------------
      

For the fourth edge, we choose the smaller of the two virtual edges "FI", and we have a solution for K = 4 as follows:

	Soln = {CD,"BE",GH,"FI"}
      

This corresponds to doing CD, undoing it and replacing it by {BC,DE}, doing GH, undoing it and replacing it by {FG,HI}, for a final solution of {BC,DE,FG,HI}, which matches the solution we pointed out earlier.

Why does this work? The idea is an extension of what we argued about the overall minimum edge and its two neighbours. For each edge we add, we preserve the possibility of undoing it in the virtual edge, using the fact that undoing a greedy choice requires adding both neighbours to the solution.

How do we implement this?

  1. . Maintain the current cost of edges in a heap. Also maintain a boolean array Used[1..N] where Used[i] is true if i has been used up already.
  2. . At each step, delete the minimum cost edge (i,j) from the heap.
    • If either i or j has already been used, we discard this edge and pick the next minimum edge from the heap.
    • If both i and j are unused, we add (i,j) to our solution, mark Used[i] = Used[j] = True and add a virtual edge (i-1,j+1) with the appropriate cost to the heap.

Thus, we spend O(N) time building the initial heap and make K updates to the heap, so the overall time is N + K log N.