Indian National Olympiad in Informatics

Related:
Online Programming Contest, 2-3 October 2004

IARCS home > OLYMPIAD

Advanced Division

Solution to Problem 2: The Great Escape
With this problem begins our first foray into what is called
Graph Theory. More about that later.

We say that a pair of buildings B_1 and B_2 are adjacent if the
director has determined that it is possible to jump from B_1 to
B_2 (and vice versa).

We know the building the hero starts his search from. Call this
building B and set S_0 = { B } and clearly every building in S_0
is reachable in 0 steps by the hero.

The set of buildings that can be reached in a single step can
computed as follows. It is the set of buildings that are adjacent
to B. Call this set S_1.

What about buildings that can be reached in 2 jumps (and not in
fewer than 2 jumps)?  Every such building must be adjacent to
some building in S_1. However, not all buildings adjacent to
buildings in S_1 will be in S_2 (since they may already be in S1
or S0).  For example, if we had 4 buildings B, C, D and E and the
following pairs are adjacent:

B C
B D
C D
D E 

Then, S_0 = { B } and S_1 = { C,D }. Now, the buildings C, D, B,
and E have buildings in S_1 adjacent to them. However, since B, C
and D are already present in S_0 U S_1 (where U denotes the union
of sets), S_2 is the set { E }.

In general let S_i be the set of buildings that can be reached
from B by i jumps but not in i-1 or fewer jumps. Then,

S_0 = { B }

S_i = { C |  C is adjacent to some building in S_{i-1} and
               C is not in S_0, S_1, ... S_{i-1}               }


So, we can compute S_0, S_1, ... in that order. When do we stop?
Since S_i contains only buildings that do not belong to S_0, S_1,
... S_{i-1}, we will soon run out of buildings and for some k,
S_k will be empty.  As a matter of fact, k <= n if n is the
number of buildings.  If S_k is empty then there are no buildings
that are reachable in k steps but not fewer.  This in turn means
that there are no buildings that are reachable in k+1 steps and
not fewer and so on (Why?)

To illustrate this, consider the set of buildings B,C,D,E,F,G,H
with adjacency given by:

B C
B D
C D
D E
F G
H G
F H 

Then S_0 = { B }
     S_1 = { C,D }
     S_2 = { E }
     S_3 = { }

Thus, the buildings F,G and H are not reachable from B. C and D
are reachable in 1 step and E in two steps.

Now, how do we program this? Let the buildings be 1 ... n. We
keep an array D with D[i]=j indicating that "i belongs to S_j" or
in other words indicating that building i is at distance j from
the starting building.

We also assume that there are e adjacency relations and they are
stored in a two dimensional array E. That is, E is a two
dimensional array and The buildings E[i][1] and E[i][2], for i=1,
2, ... e are deemed to be close by the director.

So our above example with buildings B,C,D,E,F,G and H would now
consist of buildings 1,2,3,4,5,6,7 and the array E looks like:

    1 2
    1 3 
    2 3
    3 4
    5 6
    7 6
    5 7 

where the first entry on row i is E[i][1] and the second entry is
E[i][2].

We now describe the algorithm to compute the shortest distance
from building b to all the other vertices.

     for (i = 1 to n) 
          D[i] = MAXINT       /* we use MAXINT > n  
                                 to denote "undefined" */

     D[b]=0               /* We start at building b and it belongs to 
                             S0 */

     for (i=1 to n)   {     /* compute S_1,S_2,...,S_n */

           for (j=1 to e) {
           /* Check if each adjacency relation connects a building
              in S_{i-1} to a building that is not in S_0,...S_{i-1} 
              Update S_i appropriately */
                 
                 if ((D[E[j][1]] == (i-1) ) && (D[E[j][2]] == MAXINT))
                        D[E[j][2]]=i
                 /* You have found an as yet unvisited building that
                    is adjacent to an building in S{i-1}, so put that
                    in Si */
                
                 if ((D[E[j][2]] == (i-1) ) && (D[E[j][1]] == MAXINT))
                        D[E[j][1]]=i
                 /* You have found an as yet unvisited building that
                    is adjacent to an building in S{i-1}, so put that
                    in Si */

                  }

       }

If the target is to reach building t, then D[t] gives us the
length of the shortest sequence of jumps that takes the hero to
building t. If D[t] is MAXINT then there is no route by which the
hero can reach the target.

Observations: 

1) The above algorithm determines the shortest path from b to
   every building and not just t!!

2) It also determines which buildings are reachable eventually
   and which are not.

The structure of the above algorithm suggests that it takes
roughly n * e steps where n is the number if buildings and e is
the number of edges.  We can do significantly better, in
pariticular this problem can be solved in roughly e steps. Before
we describe that solution, we want to introduce the terminology
of graph theory for future reference.

A graph, say G, consists of two sets, a set of vertices or nodes
(V) and a set of edges (E). The set of vertices could be given
finite set.  Each edge is a set consisting of two vertices. We
may interpret the set of edges as describing which pairs of
vertices are "neighbours".

The buildings and close pairs define a graph. We take the set of
buildings to be the set of vertices and the set of edges consist
of all those pairs of vertices that the director has deemed fit.
The example with 7 vertices (1,2,...,7) is denoted by the graph

  G = (V,E)

where 

  V = { 1,2,3,4,5,6,7 }

  E = { {1,2},{1,3},{2,3},{3,4},{5,6},{5,7},{6,7} }

It is often convenient to draw graphs with boxes or circles to
represent the nodes and lines/curves connecting boxes/circles to
denote edges:

       ---           ---                 ---
      | 1 |---------| 2 |               | 7 |
       ---          /---                /---\
        |          /                   /     \
        |   /-----/                   /       \
        |  /                         /         \
       ---/         ---          ---/           \ ---
      | 3 |--------| 4 |        | 5 |------------| 6 |
       ---          ---          ---              ---
       
Note that the placement of the boxes or the style of drawing the
edges are irrelevant. The "real" graph is given by the two sets V
and E and the picture is just to help us think.


A path in a graph is a sequence of vertices v1,v2,...,vk such
that {v1,v2}, {v2,v3} ... {v{k-1}vk} are edges. For example,
1,2,3,4 is a path in the above graph. The length of a path is the
number of edges in the path. For example, the length of the path
1,2,3,4 is 3.

The solution to the hero problem described above is actually a
solution to the following general problem on graphs: given a
vertex u, find the length of the shortest path from u to every
other vertex v in the graph.  And the solution recast in terms of
arbitrary graphs reads as follows (and see that all I have to do
replace the word "building" by vertex and so on ..)

We assume that there are n vertices 1...n and e edges. We assume
further that the edges are stored in a two dimensional array E,
so that if E[i][1]=a and E[i][2]=b then {a,b} is an edge in the
graph.

     for (i = 1 to n) 
          D[i] = MAXINT       /* we use MAXINT > n  
                                 to denote "undefined" */

     D[b]=0               /* We start at vertex b and it belongs to 
                             S0 */

     for (i=1 to n)   {     /* compute S_1,S_2,...,S_n */

           for (j=1 to e) {
           /* Check if any edge connects a vertex 
              in S_{i-1} to a vertex that is not in S_0,...S_{i-1} 
              Update Si appropriately */
                 
                 if ((D[E[j][1]] == (i-1) ) && (D[E[j][2]] == MAXINT))
                        D[E[j][2]]=i
                 /* You have found an as yet unvisited vertex that
                    is adjacent to a verted in S_{i-1}, so put that
                    in S_i */
                
                 if ((D[E[j][2]] == (i-1) ) && (D[E[j][1]] == MAXINT))
                        D[E[j][1]]=i
                 /* You have found an as yet unvisited vertex that
                    is adjacent to a vertex in S_{i-1}, so put that
                    in S_i */

                  }

       }

A vertex i is reachable from the vertex b (i.e. there is a path
starting at vertex b and ending at vertex i) if and only if D[i]
is not MAXINT. And if it is reachable then D[i] is the length of
the shortest path from b to i.

Note: The above algorithm is called a Breadth-First Search
algorithm (or BFS) as unravels the graph level by level, where
level i consisting of all the vertices at distance i from b.
This is in contrast to a different algorithm called Depth-First
Search.

Is the above algorithm the best we can do?  The inner loop which
constructs S_i from S_{i-1} examines every edge. Is that
necessary? After all, elements of S_i are those unvisited
vertices (i.e. vertices that do no belong to S0, ... S{i-1}) that
are connected by an edge to some vertex in S_{i-1}.  So it should
suffice to examine all the vertices that are connected by edges
to vertices in S_{i-1}. In other words, it should be enough to
consider only those edges that connect element of S_{i-1} to
other vertices.

If edges are stored in the array E as assumed in the earlier
solution, then it is not possible to examine the just this subset
without scanning through the entire array E.

The idea is to store the edges in what is called an "Adjacency
Matrix".  Given a graph G with vertices 1,...,n, the edges of the
graph can be represented by a 2 dimensional array (or
equivalently a (n x n)-matrix) A defined as follows:

     A[i][j] = 1   if vertex i and vertex j are connected by an
                   edge
     A[i][j] = 0   otherwise.

Observation: If A[i][j] is 1 then A[j][i] will also be 1 (Since
our edges are subsets are undirected. i.e. an edge {i,j} and we
can use this to go from i to j as well as from j to i).

With this we rewrite the above algorithm as follows:

     for (i=1 to n)
          for (j=1 to n)
          A[i][j]=0          /* Initialize the adjacency matrix*/

     for (i=1 to e) {          /* set the adjacency matrix based on E
         A[E[i][1],E[i][2]]=1;    if {a,b} is an edge set A[a][b]=1  
         A[E[i][2],E[i][1]]=1;    and A[b][a]=1 */
         }

     for (i = 1 to n) 
          D[i] = MAXINT       /* we use MAXINT > n  
                                 to denote "undefined" */

     D[b]=0               /* We start at vertex b and it belongs to 
                             S0 */

     for (i=1 to n)   {     /* compute S1,S2,...,Sn */

             for (j=1 to n) {        /* scan vertices to find vertices
                  if (D[j] is i-1)      in S{i-1} */
                       {          /* found one. look for its unvisited
                                     neighbours*/
                         for (k=1 to n) {  
                              if (A[j][k] == 1) and (D[k]==MAXINT)
                                  /* Found one, so put it in Si */
                                  D[k]=i
                                  }
                        }
                }

      }

How much time does this take? The initialization takes time
roughly proportional to e + n*n. And now we have three levels of
nested loops and this is going to take roughly n*n*n steps and so
we get e + n*n + n*n*n which is roughly n*n*n (since e is not
more than n*n). Well, that is clearly not better than n*e which
can in the worst case be n*n*n!!!

What we needed to do was to directly identify the elements of
S_{i-1} (WITHOUT scanning the entire list of vertices) and then
for each such vertex find the list of neighbours (WITHOUT
scanning through the entire list of edges).  The adjacency matrix
offers a solution to the second problem but not to the first.

Consider the following graph with 7 vertices:

 G = (V,E)

where 

 V = { 1,2,3,4,5,6,7 }

 E = { {1,2},{1,3},{2,3},{3,4},{3,6},{4,7},{6,5} }

Here, starting at vertex 1 we get
   S0 = { 1 }
   S1 = { 2,3 }
   S2 = { 4,6 }
   S3 = { 7,5 }
   S4 = { }

The order in which we "visit" the vertices is <1 2 3 4 6 7 5> and
this sequence breaks nicely into S0,S1,... as {1}{2,3}{4,6}{7,5}.
This is not an accident and will always happen since we visit all
the elements of S_{i-1} before visiting the elements of S_i.
Moreover when we intend to explore the neighbours of a vertex v
in this sequence, we would have already explored all the
neighbours of all the vertices that appear before it in the
sequence.

Thus, if we keep track of the sequence in which vertices have
been visited and the last vertex whose neighbours have been
explored we can determine the next vertex whose neighbours need
to be explored.

Intially we have the sequence <1> and nothing is explored. So we
examine the neighbours of 1 and find that 2 and 3 are its
neighbours and we add them to the list to get <1.2 3> where I
have placed a .  after 1 to indicate that 1 is the last vertex
whose neighbours have been explored. We examine the neighbours of
2 next and get <1 2.3 > (since the only neighbour of 2 is 3 and
it is already in the list.)

Considering 3 next we get <1 2 3.4 6> and then <1 2 3 4.6 7> and
then exploring neighbours of 6 we get <1 2 3 4 6.7> and finally
exploring the neighbours of 7 we get <1 2 3 4 6 7 5.> and now
there is nothing more to explore.

Observe that we need not carry along the vertices to the left of
the . as they play no further role. Thus, we can simply drop the
vertices to the left of the . in all these sequences. Thus, we
begin with <1>, exploring the neighbours of 1 we get <2 3> then
exploring the neighbours of 2 we get <3> and then exploring the
neighbours of 3 we get <4 6> and then <6 7> followed by <7> and
finally <>.

The way updated the sequences above has the following feature: we
add elements to the right and delete elements from the left.
This similar to what happens at the queue at a railway counter or
a cinema ticket counter. You enter at one end and leave from the
other.  Thus, we need to keep the vertices in a queue, pick the
one at the head of the queue, explore its neighbours and and add
any unvisited ones to the tail of the queue and then delete the
head of the queue.

We can use an array Q and two integer variables H (for head) and
T (for tail) to do this as follows:

For instance the queue <2 7 6 3> would be represented by

   i :  1  2  3  4  
 Q[i]:  2  7  6  3 

 H = 1 and T = 4
 
Now, if we add 4 to this queue, we will store it position T+1 (5)
and increment T to 5 to get:

   i :  1  2  3  4  5
 Q[i]:  2  7  6  3  4

 H = 1 and T = 5.

Now if we delete an element from the head of this queue, the
resulting queue would look like:

   i :  1  2  3  4  5
 Q[i]:  2  7  6  3  4

 H = 2 and T = 5.

Now if we add an element say 1, we would get:

   i :  1  2  3  4  5  6
 Q[i]:  2  7  6  3  4  1

 H = 2 and T = 6.


At any point the queue consists of the elements Q[H],Q[H+1]
...Q[T].  Q[H] is at the head of the queue and Q[T] is at the
tail.

How is the empty queue represented? Simple, set T n  
                                 to denote "undefined" */

     H = 1
     T = 0                /* Initialize the queue to be empty */

     D[b]=0               /* We start at vertex b and it belongs to 
                             S_0 */
     Q[T+1]=b
     T=T+1               /* Insert b as the first element of the Queue*/
    
     While ( T >= H ) {  /* As long as the queue is not empty */
             j = Q[H]    /* Let us pick the first vertex in the queue
                          i.e. the one at the head of the queue */
                 for (k=1 to n) {  /* See if it has unvisited neighbours */
                      if (A[j][k] == 1) and (D[k]==MAXINT) { /* Found one */
                      /* Since j belongs to S_{D[j]} k must belong
                         to S_{D[j]+1}. Update D[k] to indicate that*/
                           D[k]=D[j]+1  
                          /* and insert it at the tail end of the queue*/
                           Q[T+1]=k
                           T=T+1
                           }
                      }

             H = H+1  /* Since the neighbours of j have been considered
                         delete j from the queue
         }

 
How long does this algorithm take? Notice that the inner "for"
loop is executed for a vertex j only when it is at the head of
the queue. Each vertex in the graph can be at the head at the
most once (If it ever enters the queue it will be at the head
exactly once and then it will be deleted from the queue).  Thus
the inner for loop is executed at the most once for each vertex
in the graph and so the number of steps taken by this algorithm
is roughly n*n steps.  This is a significant improvement on n*n*n
as well as n*e (since e can be as large as n(n-1)/2).

Can we do better? One place where we could optimize is the inner
for loop. To explore the neighbours of j we look at every vertex
k and check if it is a neighbour of j and then if it has been
visited.  If we could somehow keep track of the neighbours of j
then we only need to examine whether each neighbour has been
visited. It turns out that this can be done and the resulting
algorithm takes roughly e steps. We postpone a description of
that idea to a different day.