Indian Computing Olympiad

Training Material


Directed Acyclic Graphs

Basic definitions

In a directed graph, each edge has a direction: one end point is the source and the other end point is the target. If G is directed, we could have an edge from v to w and another from w to v. This does not violate our assumption of simple graphs. For directed graphs, we separately talk of the "outdegree" and the "indegree" of a vertex, instead of just the degree. The outdegree of v is the number of directed edges leaving v while the indegree of v is the number of directed edges entering v.

A directed graph that does not have any (directed) cycles is called a directed acyclic graph (dag).

Topological sort

A topological sort of a dag involves listing out the vertices in a sequence so that there is no dag edge going from any vertex in the sequence to an earlier vertex in the sequence.

For instance, consider this dag, in which all edges are oriented from left to right.

        -->d-------->c
       /        /     \
      a-------->b      -->g
       \     /   \    /
         -->e------->f
      

Here is a sorted sequence of the type we want:

        a e d b c f g
      

In general, there will be more than one such sequence that is compatible with the dag. Another topologically sorted sequence for the graph above is:

        a d e b f c g
      

How do we compute a topologically sorted sequence?

In a dag, there must be at least one vertex with no incoming edge, or a vertext with indegree 0. (Suppose not. Then for each vertex, we can move backwards through an incoming edge. There are only n vertices overall. If we walk backwards n times, we must hit a vertex we have seen before, which means there is a cycle, so the graph is not a dag.)

We could have multiple source vertices. We classify all these as level 0. Level 1 consists of all vertices whose incoming neighbours are of level 0. In general, level i+1 consists of all vertices whose incoming neigbhours have a maximum level of i.

In other words, we are ranking vertices as follows:

  • rank(u) = 0 if u is a source vertex
  • rank(u) = i if the longest path from some source vertex to u is of length i

In a topological sort, every vertex of rank i has at least one incoming edge from a vertex of rank i-1, so vertices of rank i should be listed only after listing vertices of rank i-1. Topological sort thus involves listing out vertices in order of rank, where different vertices of the same rank may be listed in any order.

Computing the rank is similar to doing a simultaneous BFS from all the source vertices. we maintain a queue, as in BFS. Here is the algorithm.

      compute indegree(v) for all v;

      /* First process source vertices */
      for each vertex v with indegree(v) = 0{
        rank(v) = 0;
        add v to the tail of in a queue;
      }

      /* Process remaining vertices */
      until the queue is empty{
        remove the vertex v at the head of the queue;
        enumerate v and delete it from the graph;

        for each remaining vertex w{
          recompute indegree(w);
          /* Check if we have found a vertex all of whose
             predecessors have been processed */
          if indegree(w) becomes 0{
             rank(w) = rank(v) + 1;
             add w to the tail of the queue;
          }
        }
      }
      

How do we (re)compute indegree?

Initially, we can compute indegree in time O(n2) if we have an adjacency matrix representation of the dag: indegree(v) is the sum of all 1's in the column for v. With an adjacency list, we can do this it time O(m): for each edge (w,v), increment indegree(v).

After enumerating a vertex v, we have to decrement indegree(w) for each outgoing neighbour of v. So, overall, we make an indegree update once for edge in the original dag.

Hence, a more direct way of coding "delete a vertex and update indegrees of all remaining vertices" is as follows:

      /* Process remaining vertices */
      until the queue is empty{
        remove the vertex v at the head of the queue;
        enumerate v and delete it from the graph;

        for each edge (v,w){
          decrease indegree(w) by 1;
          if indegree(w) becomes 0{
            rank(w) = rank(v) + 1;
            add w to the tail of the queue
          }
        }
      }
      

Longest paths in dags

Edit ladder

Given a sequence of words w1, w2, …, wn, two words u and v are neighbours if they have edit distance 1 (that is, u can be transformed to v by changing one letter, adding one letter or deleting one letter).

      e.g. cat--cot
           rat--rot--rob
      

We orient these edges in dictionary order, so can go from cat to cot but not reverse. This defines a directed graph.

      cat-->cot

      rob---
            \ 
      rat------>rot
      

We want to find the longest path in this graph.

Solution

Construct a graph whose nodes are the words w1, … wn. There is an edge from wi to wj if wi is less than wj in dictionary order and wi and wj are neighbours. Note that this requires computing the so-called edit distance for n(n-1)/2 pairs of words.

We want the length of the longest path in this graph. In general, finding the longest path in a graph efficiently is difficult. No algorithm short of brute force enumeration of all paths is known for general graphs.

However, observe that this graph is a dag. There are no cycles because any edge goes respects the dictionary order in the list of words. For a dag, we can calculate the longest path by doing a topological sort and finding the maximum rank.

Why does this give a longest path? As we have observed, any vertex v at rank k+1 must have at least one incoming edge from a vertex at rank k&emdash;otherwise, this vertex v would have got a lower rank. Thus, for every vertex at rank n, we can walk backwards in the graph edge by edge, decreasing the rank by one each time, till we reach a vertex of indegree zero.

Video material

Video lectures on Directed acyclic graphs, NPTEL online course on Design and Analysis of Algorithms