The second puzzle from last time was the following: Puzzle 2 Are there Nondeterministic Finite Automata (NFA) with $n$ states that accept all strings of length less than or equal to $n$ but not all of $\Sigma^*$?
A.A. Diwan supplies the counter example on the right. All states in this NFA are final states. The shortest string which this NFA does not accept is $abab$. By cascading this NFA, it is possible to show a general superlinear lower bound.
For figure, see postscript file.
However, the situation is actually much worse. Eric Allender pointed out that automata that work over computation sequences of space-bounded Turing Machines suffice to solve this problem. The simplification (?) described here was arrived at by K. Narayan Kumar during discussions with Meena Mahajan.
It turns out that there is a constant $c$ such that there are NFAs of size $cn$ that accept all strings of length less than or equal to $2^{n}$ but not all of $\Sigma^*$. Any DFA of $n$ states that accepts all strings of length $\leq n$ must accept $\Sigma^*$. The determinisation of an NFA produces a DFA of size at most $2^n$ and so any NFA of size $n$ accepting all strings of length less than or equal to $2^n$ must necessarily accept $\Sigma^*$. Hence, the situation for NFAs is almost as bad as it can get.
We first describe the result over alphabets of size $2^n$ and then show how the proof can be modified to work over constant size alphabets. Consider $\Sigma = \{0,1\}^n$. Let $b_i$ denote the $n$-bit binary representation of the number $i$. Then $\Sigma = \{b_0,b_1, ... ,b_{2^n -1}\}$. Consider a finite automaton that rejects an input if and only if it is the $2^n$ length sequence $b_0 b_1 ... b_{2^n -1}$.
The automaton operates as follows: It checks whether the first letter is $b_0$ and accepts otherwise. It then nondeterministically checks whether the $i+1$st letter of the input is obtained from the $i$th letter by adding one. If not, it accepts. If the last letter is not $b_{2^n -1}$ it accepts. It does not accept otherwise. The only string that this automaton rejects is $b_0 b_1 ... b_{2^n -1}$.
Unfortunately this automaton has size $2^n$: to verify that the $i+1$st letter is one more than the $i$th one, it has to remember the $i$th letter and the alphabet is of size $2^n$. Hence there is a need for an efficient technique for detecting whether the $i+1$st letter should follow $i$th letter (in the fixed sequence we have in mind).
We solve this problem by ``expanding'' our sequence and using nondeterminism to compare just small parts of these strings (constant sized substrings) in order to verify that the $i+1$st letter is allowed to follow the $i$th.
We now consider sequences over $\Sigma = (\{0,1\} \times \{0,1\})^n$.
A typical element looks like
0100000...
1010101...
The idea is to expand the original sequence with the details of how to add $1$ to the bit vector $b_i$ to obtain $b_{i+1}$. The bottom row holds the number and the top row the information about carry bits. For instance, for $n=3$, in the sequence $b_0 b_1 ... b_{2^n -1}$ the letter $b_4 = 100$ would immediately follow the string $b_3 = 011$. Over the new alphabet the sequence $b_3 b_4$ would be replaced by following string:
000 001 010 100 000
011 011 010 000 100
We refer to the letter which contains all $0$s in the top row and $b_i$ in the bottom row as $d_i$. The letters that appear between $d_i$ and $d_{i+1}$ describe the process of incrementing $b_i$ to $b_{i+1}$ using carries. We want our new automaton to reject the string $d_0 ... d_1 ... ... d_{2^n -1}$ corresponding to the original string $b_0 b_1 ... b_{2^n -1}$.
Our new finite automaton works as follows: It verifies that the
first letter is indeed $d_0$. It then skips some letters (going
to the accepting state whenever a letter with more than one 1 in
the top row is encountered). While skipping a letter it also
nondeterministically chooses a $j \in \{1, ... ,n\}$ and
remembers the block
a b
c d
located at the $j$th position.
1 2 3 4 ... j ~... n
0 1 0 0 ... ab ... 0
1 1 0 1 ... cd ... 1
(At any point the position $j$ and the corresponding entries $a,b,c$
and $d$ in the letter last read are stored in the state.)
The automaton then nondeterministically goes to a state where it verifies the correctness of the $j$th element of the next letter---the correct value can be determined from the values of $a,b,c$ and $d$ of the previous letter. If the value is incorrect then it accepts. Finally, it also accepts if the last letter is not $d_{2^n -1}$.
The only sequence which this automaton does not accept is the one which starts with $d_0$ and incrementally leads up to $d_{2^n -1}$.
The part of the automaton that checks that the first letter is $d_0$
has size 1. The second part that skips over its input while
remembering the index $j$ as well as the block
a b
c d
at position $j$ has size $O(n)$. The last part that verifies the
addition from one step to the next has constant size. In all the
automaton has size $O(n)$ and the length of the only string it
rejects is more than $2^n$.
How can we modify the construction to work over a constant sized alphabet? Simple: just use the fact that each letter in $\Sigma$ can be treated as a word of length $n$ over the alphabet $\{0,1\} \times \{0,1\}$. Take $\Sigma' = \{\{0,1\} \times \{0,1\}\} \cup \{\#\}$. Any string over this alphabet can be considered as a sequence of strings over $\{\{0,1\} \times \{0,1\}\}$ separated by $\#$s. The new automaton works over $\Sigma'$ and rejects exactly the string that represents the sequence of strings $d_0 ... d_1 ... ... d_{2^n -1}$.
Like the earlier automaton, this has an inital part that verifies that
the first string (the string up to the first $\#$) is indeed $d_0$.
This part is of size $O(cn)$. The second part skips
nondeterministically through a number of strings, accepting whenever
more than one 1 is encountered in the first component between any pair
of $\#$'s. It also guesses and remembers the position $j$ together
with the block
a b
c d
at position $j$ in the last string it encountered. At some point
it nonderministically moves in to a third component that verifies
that the $j$th letter of the next string can follow the ones in
its memory. If not it accepts. The automaton also accepts if the
last string is not $d_{2^n -1}$. It is not difficult to check
that the second and third components have size $O(n)$ so that the
entire automaton is of size $cn$ for some $c$. Further the
construction is uniform in that for each $n$ the constant $c$
does not change.
Thus, there are automata of size $cn$ for each $n$ which accept all of $\Sigma^*$ except for one string of length $> 2^n$, where $\Sigma$ is of size $5$.