Solution to Problem 2: The Bickering Task Force
It is convenient to model this problem as a graph. In general, any problem that involves a set of entities and relationships between pairs of them suggests graphs as a possible method to model and solve the problem. Let us recall, from the solution to the "The Great Escape Problem" of October 2004, the basic definitions regarding graphs: A graph, say G, consists of two sets, a set of vertices or nodes (V) and a set of edges (E). The set of vertices could be given finite set. Each edge is a set consisting of two vertices. We may interpret the set of edges as describing which pairs of vertices are "neighbours". Example: Here is a graph G = (V,E) with 7 vertices and 9 edges: V = { 1,2,3,4,5,6,7 } E = { {1,2},{1,3},{2,3},{3,4},{5,6},{5,7},{6,7} } It is often convenient to draw graphs with boxes or circles to represent the nodes and lines/curves connecting boxes/circles to denote edges: --- --- --- | 1 |---------| 2 |---------------| 7 | --- /--- /---\ | / / \ | /-----/ / \ | / / \ ---/ --- ---/ \ --- | 3 |--------| 4 |--------| 5 |------------| 6 | --- --- --- --- Note that the placement of the boxes or the style of drawing the edges are irrelevant. The "real" graph is given by the two sets V and E and the picture is just to help us think. For the purposes of our problem, the vertices of our graph are the nobles (1, 2, ... N). Two vertices i and j are connected by a (undirected edge) if they hate each other. Thus, the degree of a vertex i (i.e. the number of edges incident on vertex i) is the number of nobles that i hates. For example, in the above graph, noble 3 hates three other nobles, namely 1, 2 and 4. We say that a graph G'=(V',E') is a subgraph of a graph G = (V,E) if V' is a subset of V and E' is a subset of E. For example, --- --- --- --- | 3 |--------| 4 | | 5 |------------| 6 | --- --- --- --- is a subgraph of the graph G described above. The king wants us to identify a collection of nobles such that everyone in the collection hates at least K others in the collection. In a graph G = (V, E) any subset V_1 of V identifies a special subgraph G_1 of G containing the elements of V_1 as vertices with the edge set E_1 containing all the edges in E with both endpoints in V_1. G_1 is called the "induced subgraph" on V_1. The above example of subgrpahs is not an induced subgraph since the edge connecting vertices 4 and 5 has been omitted. The induced subgraph on V_1 = { 3, 4, 5, 6 } is --- --- --- --- | 3 |--------| 4 |--------| 5 |------------| 6 | --- --- --- --- The degree of a vertex v in G_1 is the number of elements in V_1 that are neighbours of v in G (or equivalently G_1). Thus, our task is to identify the largest subset V_1 such that in the induced subgraph on V_1 every vertex has degree at least K. Let us call a graph to be K-good if every vertex in the graph has degree at least K. The naive idea would be to consider each subset of V and such an algorithm would take time proportional to 2^N. We can do much better. Suppose, every noble hated at least K other nobles, then the minister could simply appoint every noble to the committee. Otherwise, there is a nonempty set D_0 containing all the nobles who hate fewer than K nobles each. Clearly they cannot be part of any valid committee that the minister will constitute. So we need to drop them and get V_1 = V - D_0 and he might as well restrict his attention to the nobles in this set in trying to constitute his committee. If every noble in V_1 hates at least K other nobles in V_1 then the minister may choose the set V_1 to be his committee. Otherwise, the subset D_1 of V_1, consisting of all the nobles in V_1 which hate fewer than K nobles in G_1 is not empty. Clearly any member of D_1 cannot belong to any committee (since no member of D_0 can, and once the members of D_0 are not in the committee no member of D_1 can have at least K members he hates in the committee). Thus, the minister should drop all the nobles in D_1 from V_1 and focus on the set of nobles V_2 = V_1 - D_1 to constitute his committee. Continuing in this manner will lead us to our solution. In other words, we come up with two sequences of sets of vertices V = V_0, V_1, ... V_k and D_0, D_1, ... D_k such that 1) D_i is the set of vertices in G_i (the induced subgraph on V_i) whose degree is less than K and 2) V_{i+1} is V_i - D_i. Eventually we will find a k with D_k being the empty set (since there are only finitely many elements in V_0). Then, every vertex in V_k has at least K neighbours in G_k (i.e.) G_k is a K-good graph. (Thus, the nobles corresponding to the vertices in V_k can be picked to form a committee meeting the requirement.) Is this the largest set with this property? We first show, by induction on j, that if v is in D_j then it cannot appear in any K-good induced subgraph of G. No member of D_0 can appear in any K-good induced subgraph of G_0 by definition. Suppose this is true for D_0, ... D_i. We exclude all the vertices in D_0 U ... U D_i from V to get V_{i+1} and by the definition of D_{i+1} there is no K-good induced subgraph of G_{i+1} that contains any vertex from D_{i+1}. Any K-good induced subgraph of G is also an induced subgraph of G_{i+1} (by the induction hypothesis) and this means that it cannot contain any vertex from D_{i+1}. Thus, any K-good induced subgraph of G is also a (K-good) induced subgraph of G_k. But G_k itself is a K-good subgraph and thus it is the largest K-good subgraph of G. How efficiently can we compute G_k ? The direct method is to compute G_0, G_1, ... G_k. Computing G_{i+1} from G_i involves deleting all the vertices whose degree is less than K from G_i. In time proportional to E one can determine the degree of each vertex in G_0. In time proportional to N one can determine the vertices to be deleted and update the degrees of the undeleted vertices in time proportional to E. Thus the computation of the entire sequence takes time proportional to E*N (since k is bounded by N). However, we now show that the computation of G_k can be done in time proportional to E. The idea is to maintain a queue where all the vertices that have been identifed as "to be deleted" are placed before they are actually "deleted". Initially the queue is initialized with all the vertices of G whose degree is below K. In each iteration a vertex is picked up from the queue, it is marked as deleted and the degree of all its neighbours is reduced by 1. If this results in the degree of any of its neighbours dropping below K, then add that vertex to the queue. The procedure ends when the queue is empty. The undeleted vertices will constitute V_k. Here are the details: ----------------------------------------------------------------------- /* Determine the degree of all the vertices. If the graph is stored as an adjacency matrix, then this will take time proportional to N*N, since counting the number of neighbours will need a scan of a row of the matrix. If the graph is stored as an adjacency list, then it takes time proportional to E. */ for i=1 to N { Deg[i] = number of neighbours of i. } /* Head and tail are set to ensure that the Q is emtpy */ H = 1; T = 0; /* Initialise queue with all vertices with degree < K */ for i=1 to N { if (Deg[i] < K) { T=T+1; Q[T]=i; InQ[i] = True; /* Remember that i is in the Q*/ } } While (H ≤ T) { /* while the queue is not empty */ v = Q[H]; H = H+1; /* delete v from the Queue */ /* Move v from Q to deleted status */ InQ[v] = False; /* v is removed from Q*/ Deleted[v] = True; /* Mark v as deleted */ for each neighbour w of v { /* For as yet unqueued/undeleted neighbours*/ if ((InQ[w]=False) and (Deleted[w]=False)) { Deg[w]=Deg[w]-1; /* reduce degree */ /* Move it to Q if necessary */ if (Deg[w] < K) { T=T+1; Q[T]=w; InQ[w]=True; } } } } Print the number of undeleted vertices --------------------------------------------------------------------------- Notice that each vertex enters and leaves the queue at most once. Thus, the body of the while loop is executed at most once for each vertex. Moreover, the inner for loop takes at most time proportional to N (if the graph is stored as an adjacency matrix and hence requires a scan of the row). Thus, this algorithm takes time bounded by N*N. We now show that if the graph is stored using what are called adjacency lists then this algorithm would run in time proportional to |E|. In the adjacency matrix representation, it takes time proportional to N to examine all the neighbours of a given vertex v. It turns out that there is an alternative representation that allows us to examine the neighbours of a vertex v in time proportional to the degree of v. This turns out to be extremely useful in many algorithms. What is an adjacency list? Let the vertices of the graph be 1 ... N. For each vertex i, we maintain an array Ai whose elements are the neighbours of the vertex i. In effect, we maintain a two dimensional array A[][] where the elements of the ith row are the neighbours of i. Observe that not all rows will have equal number of elements. To keep track of the number of elements in each row, we maintain a array D where D[i] is the degree of i (i.e. the number of neighbours of i). For example, the adjacency lists corresponding the example graph given at the beginning of this section is the follwoing: A 1 2 3 4 5 6 i D[i] --------------------- --------- 1 | 2 3 1 2 2 | 2 7 1 2 3 3 | 1 2 4 3 3 4 | 3 5 4 2 5 | 4 6 7 5 3 6 | 5 7 6 2 7 | 2 5 6 7 3 Notice that the number ``used'' entries in the matrix A is just two times the number of edges in the graph. However, when we use a 2 dimensional matrix to store the adjacency lists, we need to define A as a N x N-1 matrix as it is possible that some vertex may have upto N-1 neighbours. (It is possible to use pointers and store these much more efficiently. However, we shall not concern ourselves with pointers in this section.) Once a graph is stored in the adjacency list form, in order to process the neighbours of a vertex v it suffices to do for i=1 to D[v] process(A[v][i]) which clearly takes time proportional to D[v]. However note that if we need to determine whether there is an edge between v and a vertex w then we need to scan the adjacency lists of v (or w) to see if w (or v) appears there. Thus, checking whether an edge exists takes upto N steps if we use the adjacency lists representation. On the other hand, in the adjacency matrix representation this can be done in one step by examining the A[v][w] entry. So, the choice of representation should depend on the needs of the algorithm. We now modify the above algorithm to work with adjacency lists. The queue stored the ``candidates'' for deletion as usual. The array Deg stores the number of neighbours of a vertex that have not been marked as deleted. ----------------------------------------------------------------------- /* We assume that the graph is stored in the adjacency lists notation using arrays A[][] and D[]. */ /* Deg will store the number of neighbours of i that have not been marked as deleted. Initial Deg[i] = D[i] */ for i=1 to N { Deg[i] = D[i] } /* Head and tail are set to ensure that the Q is emtpy */ H = 1; T = 0; /* Initialise queue with all vertices with degree < K */ for i=1 to N { if (Deg[i] < K) { T=T+1; Q[T]=i; InQ[i] = True; /* Remember that i is in the Q*/ } } While (H ≤ T) { /* while the queue is not empty */ v = Q[H]; H = H+1; /* delete v from the Queue */ /* Move v from Q to deleted status */ InQ[v] = False; /* v is removed from Q*/ Deleted[v] = True; /* Mark v as deleted */ /* for each neighbour of v in G do */ for i=1 to D[v] do { w = A[v][i]; /* If it is still unqueued and not marked as deleted */ if ((InQ[w]=False) and (Deleted[w]=False)) { Deg[w]=Deg[w]-1; /* reduce degree */ /* Move it to Q if necessary */ if (Deg[w] < K) { T=T+1; Q[T]=w; InQ[w]=True; } } } } Print the number of undeleted vertices --------------------------------------------------------------------------- As observed earlier the outer while loop executes at the most N times as in each iteration at least one new vertex is marked as deleted. In each iteration, of the while loop we examine the neighbours of exactly one vertex, the newly deleted vertex, in the inner for loop. Each vertex is deleted at the most once. Thus across all the iterations of the while loop the inner for loop is executed at the most D[1] + D[2]+ ... D[N] times. But this is just 2*|E|. Thus, this algorithm runs in time proportional to |E|. Notice that if E has say 10 * N elements this is a considerable improvement over N*N. Thus, if there are not too many edges in the graph the algorithm using adjacency lists does significantly better. However, if the graph is complete, then it has of the order of N*(N-1)/2 edges and then its performance is not significantly better than that of the one using adjacency matrix. Adjacency lists can be used to improve the running time of the DFS and BFS algorithms describe earlier to run in time proportional to |E| instead of N*N. We encourage the reader to work out the details of these modifications and convince themselves that the modified algorithms do run in time proportional to |E|. We shall be using these facts in some of the algorithms that follow.