Indian National Olympiad in Informatics

Related:
Online Programming Contest, 5-6 February 2005

IARCS home > OLYMPIAD

Advanced Division

Solution to Problem 1: Find the longest palindrome
The naive idea is to consider every subword of the given word,
check if it is a palindrome and finally find the longest.  There
are about N*N subwords for a word of length N and checking
whether a word is a palindrome can done in time proportional to
the length of the word. So this problem can be solved easily in
about N*N*N steps.

However, this problem can be solved in N*N steps. In order to do
this, we look at a related problem:

Given two words w and x (of length M and N respectively), we say
y is a common subword of w and x if y is a subword of x and y is
a subword of w. How do you find the longest common subword of two
words?  We can do this in time proportional to N*M using dynamic
programming.  Here is how:

Let the two words be w_1 w_2 ... w_M and x_1 x_2 ... x_N.  Let
L(i,j) denote the length of the longest subword of w beginning at
position i that is also a subword of x beginning at position j.

Example:
--------

If w is abbcda and x is ababcad then

  L(1,1) = 2 (since w_1 w_2 = x_1 x_2 = ab and w_3 ≠ x_3)
  L(1,2) = 0 (since w_1 = a and x_2 = b)
  L(3,2) = 1 (since w_3 = b, x_2 = b and w_4 ≠ x_3) 

  and so on.

How quickly can we compute L(i,j) for 1 ≤ i ≤ M, 1 ≤ j
≤ N ?  The simplest idea would to find L(i,j) by scanning the
two words (starting at positions i and j in w and x
respectively).  This would take time proportional to the length
of the common subword and thus, the overall algorithm would take
time proportional to N*M*(minimum(M,N)).

But a couple of simple observations will let us do much better.

First of all, if w_i ≠ x_j then L(i,j) = 0.  

If w_i = x_j then  

  1) if y_1 y_2 ... y_k is a common subword beginning at 
     position i in w and j in x then y_2 y_3 ... y_k is  a 
     common subword beginning at position i+1 in w and j+1 in x. 
  2) if y_2 y_3 ... y_k is a common subword beginning at positions
     i+1 and j+1 (in w and x respectively) then 
     w_i y_2 ... y_k = x_j y_2 ... y_k  is a common subword
     beginning at positions i and j (in w and x respectively).

Thus if w_i = x_j then L(i,j) = L(i+1,j+1) + 1.


So we have the following recursive equations for L(i,j)

   L(i,j) = 0    w_i ≠ x_j

   L(i,j) = L(i+1,j+1) + 1   otherwise.

We would like to use these equations to compute L(i,j).  From the
dependency expressed by the above equations it is clear that we
need to have the values of L(i,j) for higher values of i and j
before computing it for lower values.


    ----------------------------------------------------------------
   |  L(1,1)  |  L(1,2)  |  ...                          |  L(1,N)  |
   |----------|----------|-------------------------------|----------|
   |  L(2,1)  |  L(2,2)  |  ...                          |  L(2,N)  |
   |----------|----------|-------------------------------|----------|
   |     .    |      .   |                               |     .    |
   |     .    |      .   |                               |     .    |
   |     .    |      .   |                               |     .    |
   |          |          |                               |          |
   |          |          |                               |          |
   |----------|----------|-------------------------------|----------|
   | L(M-1,1) | L(M-1,2) |  ...                          | L(M-1,N) |
   |----------|----------|-------------------------------|----------|
   |  L(M,1)  |  L(M,2)  |  ...                          |  L(M,N)  |
    ----------------------------------------------------------------

In attempting to compute L(i,j) we might need L(i+1,j+1) which in
turn needs L(i+2,j+2) and so on till either the first coordinate
reaches M or the second coordinate reaches N (i.e. we move along
a line parallel to the main diagonal till we either reach the
right most column or the bottom most row).

Entries on the rightmost column are of the form L(i,N). But this
can take only two values --- L(i,N) is 0 if w_i ≠ x_N and is 1
otherwise.  Similarly, the last row contains elements of the form
L(M,j) and these values are determined as --- L(M,j) is 0 if w_M
≠ x_j and is 1 otherwise.

Now that L(i,j) depends only on the value of L(i+1,j+1), if
either all the the entries on either row i+1 or coloumn j+1 are
computed then L(i,j) can be computed.

Here is the algorithm to compute all the L(i,j)'s using this idea.

--------------------------------------------------------------------------

/* We assume that we have two arrays W and X with w stored as 
   W[1]W[2]... W[M] and X stored as X[1]X[2]...X[N] */

   for j=1 to N {   /* Fill the last row */
       
          L[M][j] = 0  if W[M] ≠ X[j]
                  = 1  if W[M] = X[j]

         }

   for i=1 to M { /* Fill the last column */

          L[i][N] = 0  if W[i] ≠ X[N]
                  = 1  if W[i] = X[N]

         }

   for j=N-1 to 1 { /* Fill column j 
                       from bottom to top */
      
        for  i=M-1 to 1 { 
          
                 L[i][j] = 0              if W[i] ≠ X[j]
                         = L[i+1][j+1]+1  if W[i] = W[j]

                }

        }

   
   Find the maximum value of L[i,j] and print it.
   (that is the length of the longest common subsequence of w and x)

-------------------------------------------------------------------
The nested for loops take upto about N*N steps. (Finding the maximum
also takes and additional N*N steps, but it can be done as we compute
the L[i][j]s.) 

Exercise: 
--------

A subsequence of a word w  is any word obtained by deleting some
letters from w. For example, the subsequences of the word abca are, 
abca, abc, bca, aca, aba, ca, ab, aa, bc, ba, ac, a, b, c, a and the empty
word.  We say that y is a common subsequence of two words w and x if
it is a subsequence of w as well as x. How do you find the length of the
longest common subsequence of two given words?

----------------------------------------------------------------------

We now return back to the longest palindrome problem.  If w is the word
w_1 w_2 ... W_M, we write Rev(w) for the reverse of w:

    Rev(w) = w_M w_{M-1} ... w_1

Notice, that any subword y of w that is a palidrome would also be a subword
of Rev(w) and is hence a common subword of w and Rev(w).

For example, consider the word w = ababbabbaa. babbab is subword of this word
( w = a babbab baa) and it is also a palindrome. 

     Rev(w) = aabbabbaba  (which can be broken up as  aab babbab a).

However, the longest palindrome need not be the longest common subword of
w and Rev(w). If w = aabcdabdcbadc then Rev(w) = cdabcdbadcbaa and the longest
common subword of w and Rev(w) is abcd (also cdab,dcba) and  has length 4.  
However, the longest palindrome in w is aa and has length 2.  

The point is that if y is a palindrome subword of w starting at position i
and ending at position j then it is a subword of Rev(w) starting at position
N-j+1 and ending at position N-i+1.  This is because, if u is the subword
between positions i and j in x then Rev(u) is the subword between positions
N-j+1 and N-i+1 in Rev(w).  In the above example, N is 13. In w,
aa appears between positions 1 and 2 and in Rev(w) it appears between positions
13-2+1 = 12 and 13-1+1 = 13.

Conversely, if the subword u appearing between positions i and j in w  is
the same as the subword appearing between positions N-j+1 and N-i+1 in Rev(w)
then u is a palindrome subword (the word appearing between N-j+1 and
N-i+1 in Rev(w) is Rev(u), u=Rev(u) means u is a palindrome).

But the subword between positions i and j in w and positions N-j+1 and N-i+1
in Rev(w) are equal if and only if the length of the longest common subword
of w and Rev(w) starting at positions i and N-j+1 respectively is at least 
j-i+1.

Thus, to check if the subword between positions i and j in w is a palindrome
it suffices to check whether the length of the longest common subword
of w and Rev(w) beginning at position i and N-j+1 respectively is at least
j-i+1.  

How do we do this?

We first run the dynamic programming algorithm for the longest common
subword of w and Rev(w) to compute the matrix L.

Now, to determine if the subword of w between positions i and j is
a palindrome it suffices to check whether L(i,N-j+1) ≥ j-i+1. This
can be done in just one step. Thus, after computing the array L,
it takes only one unit of time for each i, j to check if the subword
between positions i and j is a palindrome. We can find the maximum
but examining all paris i, j and thus, once the array L is computed,
we may compute the maximum length palindrome subword in time proportional
to N*N. Computation of the array itself takes time proportional to N*N.
Thus the overall time taken is still proportional to N*N.

Here is the detailed algorithm:

----------------------------------------------------------------------
/* We assume that we have an array W, with w stored as
   W[1]W[2]... W[N]                                   */
   
   for i=1 to N
       R[i] = W[N-i+1]    /* Store Rev(w) in X */

   Compute the Array L[i][j] 1≤i,j≤N, as described in the
      algorithm to compute the longest common subword of the words
      in W and X (w and Rev(w) respectively). L[i,j] is the length
      of the longest common subword of w and Rev(w) beginning at
      position i in w and j in Rev(w).


   max = 0

   for i = 1 to N
      for j = N  to i {  /* check if the subword between i to j is a palidrome
       
          if (L[i,N-j+1] $ge; j-i+1) { /* a palindrome subword is found */

               if (max < j-i+1) max=j-i+1
   
             }
        }
   print(max)

--------------------------------------------------------------------

Exercise: Modify that above algorithm to eliminate the variable max.
Search through the subwords so that the first palindrome subword found
will be of maximum possible length.

----------------------------------------------------------------------

Having convinced you to read all this about finding longest common subwords,
here is a much easier solution that works in time proportional to N*K, where K
is the length of the longest palindrome subword.  This solution was suggested
by Saatvik Agarwal.  Here are the details:

How to find the longest odd length palindrome subword?

Simple. For each position i, find the length of the longest odd palindrome
subword with position i as the centre. That can be done by first comparing the
letter W[i-1] with W[i+1], W[i-2] with W[i+2] and so on till you either reach
the end of the word, or encounter a mismatch:


        Longest_at(i) {

                length=0;

                k=1;

                while (W[i-k] = W[i+k])  k=k+1;

                /* We assume that W[0] and W[N+1] have been set
		in such a way that they will never match each
		other or any of the letters in the word */

                return (k-1);

        }

Notice that this procedure takes time proportional to the K_i where K_i is the
length of the longest odd palindrome with center at i.  Thus by calling
Longest_at(i) we can determine the length of the longest odd length palindrome
subword in time bounded by N*K where K is the length of the longest odd length
palindrome subword in W.

For even length palindromes, we consider those positions i such that
W[i] = W[i+1], and scan for the smallest k such that W[i-k] ≠ W[i+1+k].  That
gives the length of the longest even length palindrome with it centers at
positions i, i+1. The details are left as an exercise.  Thus, the length of the
longest even length palindrome can be determined in time bounded by N*K where K
is the length of the longest even length palindrome subword.

-------------------------------------------------------------------------------