Solution to Problem 1: Find the longest palindrome

The naive idea is to consider every subword of the given word, check if it is a palindrome and finally find the longest. There are about N*N subwords for a word of length N and checking whether a word is a palindrome can done in time proportional to the length of the word. So this problem can be solved easily in about N*N*N steps. However, this problem can be solved in N*N steps. In order to do this, we look at a related problem: Given two words w and x (of length M and N respectively), we say y is a common subword of w and x if y is a subword of x and y is a subword of w. How do you find the longest common subword of two words? We can do this in time proportional to N*M using dynamic programming. Here is how: Let the two words be w_1 w_2 ... w_M and x_1 x_2 ... x_N. Let L(i,j) denote the length of the longest subword of w beginning at position i that is also a subword of x beginning at position j. Example: -------- If w is abbcda and x is ababcad then L(1,1) = 2 (since w_1 w_2 = x_1 x_2 = ab and w_3 ≠ x_3) L(1,2) = 0 (since w_1 = a and x_2 = b) L(3,2) = 1 (since w_3 = b, x_2 = b and w_4 ≠ x_3) and so on. How quickly can we compute L(i,j) for 1 ≤ i ≤ M, 1 ≤ j ≤ N ? The simplest idea would to find L(i,j) by scanning the two words (starting at positions i and j in w and x respectively). This would take time proportional to the length of the common subword and thus, the overall algorithm would take time proportional to N*M*(minimum(M,N)). But a couple of simple observations will let us do much better. First of all, if w_i ≠ x_j then L(i,j) = 0. If w_i = x_j then 1) if y_1 y_2 ... y_k is a common subword beginning at position i in w and j in x then y_2 y_3 ... y_k is a common subword beginning at position i+1 in w and j+1 in x. 2) if y_2 y_3 ... y_k is a common subword beginning at positions i+1 and j+1 (in w and x respectively) then w_i y_2 ... y_k = x_j y_2 ... y_k is a common subword beginning at positions i and j (in w and x respectively). Thus if w_i = x_j then L(i,j) = L(i+1,j+1) + 1. So we have the following recursive equations for L(i,j) L(i,j) = 0 w_i ≠ x_j L(i,j) = L(i+1,j+1) + 1 otherwise. We would like to use these equations to compute L(i,j). From the dependency expressed by the above equations it is clear that we need to have the values of L(i,j) for higher values of i and j before computing it for lower values. ---------------------------------------------------------------- | L(1,1) | L(1,2) | ... | L(1,N) | |----------|----------|-------------------------------|----------| | L(2,1) | L(2,2) | ... | L(2,N) | |----------|----------|-------------------------------|----------| | . | . | | . | | . | . | | . | | . | . | | . | | | | | | | | | | | |----------|----------|-------------------------------|----------| | L(M-1,1) | L(M-1,2) | ... | L(M-1,N) | |----------|----------|-------------------------------|----------| | L(M,1) | L(M,2) | ... | L(M,N) | ---------------------------------------------------------------- In attempting to compute L(i,j) we might need L(i+1,j+1) which in turn needs L(i+2,j+2) and so on till either the first coordinate reaches M or the second coordinate reaches N (i.e. we move along a line parallel to the main diagonal till we either reach the right most column or the bottom most row). Entries on the rightmost column are of the form L(i,N). But this can take only two values --- L(i,N) is 0 if w_i ≠ x_N and is 1 otherwise. Similarly, the last row contains elements of the form L(M,j) and these values are determined as --- L(M,j) is 0 if w_M ≠ x_j and is 1 otherwise. Now that L(i,j) depends only on the value of L(i+1,j+1), if either all the the entries on either row i+1 or coloumn j+1 are computed then L(i,j) can be computed. Here is the algorithm to compute all the L(i,j)'s using this idea. -------------------------------------------------------------------------- /* We assume that we have two arrays W and X with w stored as W[1]W[2]... W[M] and X stored as X[1]X[2]...X[N] */ for j=1 to N { /* Fill the last row */ L[M][j] = 0 if W[M] ≠ X[j] = 1 if W[M] = X[j] } for i=1 to M { /* Fill the last column */ L[i][N] = 0 if W[i] ≠ X[N] = 1 if W[i] = X[N] } for j=N-1 to 1 { /* Fill column j from bottom to top */ for i=M-1 to 1 { L[i][j] = 0 if W[i] ≠ X[j] = L[i+1][j+1]+1 if W[i] = W[j] } } Find the maximum value of L[i,j] and print it. (that is the length of the longest common subsequence of w and x) ------------------------------------------------------------------- The nested for loops take upto about N*N steps. (Finding the maximum also takes and additional N*N steps, but it can be done as we compute the L[i][j]s.) Exercise: -------- A subsequence of a word w is any word obtained by deleting some letters from w. For example, the subsequences of the word abca are, abca, abc, bca, aca, aba, ca, ab, aa, bc, ba, ac, a, b, c, a and the empty word. We say that y is a common subsequence of two words w and x if it is a subsequence of w as well as x. How do you find the length of the longest common subsequence of two given words? ---------------------------------------------------------------------- We now return back to the longest palindrome problem. If w is the word w_1 w_2 ... W_M, we write Rev(w) for the reverse of w: Rev(w) = w_M w_{M-1} ... w_1 Notice, that any subword y of w that is a palidrome would also be a subword of Rev(w) and is hence a common subword of w and Rev(w). For example, consider the word w = ababbabbaa. babbab is subword of this word ( w = a babbab baa) and it is also a palindrome. Rev(w) = aabbabbaba (which can be broken up as aab babbab a). However, the longest palindrome need not be the longest common subword of w and Rev(w). If w = aabcdabdcbadc then Rev(w) = cdabcdbadcbaa and the longest common subword of w and Rev(w) is abcd (also cdab,dcba) and has length 4. However, the longest palindrome in w is aa and has length 2. The point is that if y is a palindrome subword of w starting at position i and ending at position j then it is a subword of Rev(w) starting at position N-j+1 and ending at position N-i+1. This is because, if u is the subword between positions i and j in x then Rev(u) is the subword between positions N-j+1 and N-i+1 in Rev(w). In the above example, N is 13. In w, aa appears between positions 1 and 2 and in Rev(w) it appears between positions 13-2+1 = 12 and 13-1+1 = 13. Conversely, if the subword u appearing between positions i and j in w is the same as the subword appearing between positions N-j+1 and N-i+1 in Rev(w) then u is a palindrome subword (the word appearing between N-j+1 and N-i+1 in Rev(w) is Rev(u), u=Rev(u) means u is a palindrome). But the subword between positions i and j in w and positions N-j+1 and N-i+1 in Rev(w) are equal if and only if the length of the longest common subword of w and Rev(w) starting at positions i and N-j+1 respectively is at least j-i+1. Thus, to check if the subword between positions i and j in w is a palindrome it suffices to check whether the length of the longest common subword of w and Rev(w) beginning at position i and N-j+1 respectively is at least j-i+1. How do we do this? We first run the dynamic programming algorithm for the longest common subword of w and Rev(w) to compute the matrix L. Now, to determine if the subword of w between positions i and j is a palindrome it suffices to check whether L(i,N-j+1) ≥ j-i+1. This can be done in just one step. Thus, after computing the array L, it takes only one unit of time for each i, j to check if the subword between positions i and j is a palindrome. We can find the maximum but examining all paris i, j and thus, once the array L is computed, we may compute the maximum length palindrome subword in time proportional to N*N. Computation of the array itself takes time proportional to N*N. Thus the overall time taken is still proportional to N*N. Here is the detailed algorithm: ---------------------------------------------------------------------- /* We assume that we have an array W, with w stored as W[1]W[2]... W[N] */ for i=1 to N R[i] = W[N-i+1] /* Store Rev(w) in X */ Compute the Array L[i][j] 1≤i,j≤N, as described in the algorithm to compute the longest common subword of the words in W and X (w and Rev(w) respectively). L[i,j] is the length of the longest common subword of w and Rev(w) beginning at position i in w and j in Rev(w). max = 0 for i = 1 to N for j = N to i { /* check if the subword between i to j is a palidrome if (L[i,N-j+1] $ge; j-i+1) { /* a palindrome subword is found */ if (max < j-i+1) max=j-i+1 } } print(max) -------------------------------------------------------------------- Exercise: Modify that above algorithm to eliminate the variable max. Search through the subwords so that the first palindrome subword found will be of maximum possible length. ---------------------------------------------------------------------- Having convinced you to read all this about finding longest common subwords, here is a much easier solution that works in time proportional to N*K, where K is the length of the longest palindrome subword. This solution was suggested by Saatvik Agarwal. Here are the details: How to find the longest odd length palindrome subword? Simple. For each position i, find the length of the longest odd palindrome subword with position i as the centre. That can be done by first comparing the letter W[i-1] with W[i+1], W[i-2] with W[i+2] and so on till you either reach the end of the word, or encounter a mismatch: Longest_at(i) { length=0; k=1; while (W[i-k] = W[i+k]) k=k+1; /* We assume that W[0] and W[N+1] have been set in such a way that they will never match each other or any of the letters in the word */ return (k-1); } Notice that this procedure takes time proportional to the K_i where K_i is the length of the longest odd palindrome with center at i. Thus by calling Longest_at(i) we can determine the length of the longest odd length palindrome subword in time bounded by N*K where K is the length of the longest odd length palindrome subword in W. For even length palindromes, we consider those positions i such that W[i] = W[i+1], and scan for the smallest k such that W[i-k] ≠ W[i+1+k]. That gives the length of the longest even length palindrome with it centers at positions i, i+1. The details are left as an exercise. Thus, the length of the longest even length palindrome can be determined in time bounded by N*K where K is the length of the longest even length palindrome subword. -------------------------------------------------------------------------------

Copyright (c) IARCS 2003-2018; Last Updated: 25 Jul 2005