The Schützenberger theorem shows which languages are definable in first-order logic. What about less expressive logics? One direction is to study formulas where the quantification pattern is somehow limited. This idea has seen a lot of work, of which we describe here only the simplest quantification patterns.
Existential formulas. As a warmup, we will talk about existential formulas. Define an existential formula, also known as a Σ₁ formula, to be a first-order formula of the form

∃x₁ ⋯ ∃xₙ φ(x₁, …, xₙ)

where φ is quantifier-free and uses predicates as in the Schützenberger theorem, namely unary predicates a(x) for the labels and a binary predicate x < y for the order. For example, the existential formula

∃x ∃y  a(x) ∧ b(y) ∧ x < y

defines the words where some a is followed by some b.
When talking about full first-order logic, the successor predicate does not need to be added to the vocabulary, because it can be defined in terms of the order predicate. However, for existential formulas this is not the case, and therefore existential formulas which allow both order and successor have different power than existential formulas with only order. We will not talk about the successor here, although that variant has also been studied.
Define the Higman ordering on words to be the ordering where w ≤ v holds if one can remove some letters from v, not necessarily in a connected way, to obtain w. This is also known as the subsequence ordering. For example, “ila” is a subsequence of “Mikolaj”. We will use the following result:
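To make the subsequence ordering concrete, here is a small membership check. This is our own sketch; the function name is not from the text, and the iterator trick relies on `c in it` consuming the iterator up to the first match.

```python
def is_subsequence(w: str, v: str) -> bool:
    """Check w <= v in the Higman (subsequence) ordering:
    w can be obtained from v by deleting letters."""
    it = iter(v)
    # Every letter of w must be found, in order, somewhere in v.
    return all(c in it for c in w)
```

For example, `is_subsequence("ila", "Mikolaj")` holds, because the letters i, l, a appear in "Mikolaj" in that order.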
Lemma (Higman’s Lemma) For a fixed alphabet, the Higman ordering is a well quasi-order, which means that it is well founded and it does not have infinite antichains.
Proof. For a finite or infinite sequence of words w₁, w₂, …, define a growth to be a pair of positions i < j such that the word wᵢ is a subsequence of wⱼ. Define a growth-free sequence to be a sequence of words without a growth. Antichains and violations of well-foundedness are examples of growth-free sequences, and therefore to prove the lemma it suffices to show that there is no infinite growth-free sequence.
Define the radix ordering on words as follows: shorter words come before longer ones, and words of the same length are ordered lexicographically. To prove the lemma, suppose that an infinite growth-free sequence exists. Define a sequence w₁, w₂, … by induction as follows. The word w₁ is the least word, in the radix ordering (which is well founded, so it makes sense to talk about least words), which is the beginning of some infinite growth-free sequence. For i > 1, define wᵢ to be the least word in the radix ordering such that w₁, …, wᵢ is the beginning of some infinite growth-free sequence; in particular, w₁, …, wᵢ is a finite growth-free sequence. Growth-free sequences are closed under limits, because a growth concerns only two positions and would already appear in some finite prefix, and therefore w₁, w₂, … is an infinite growth-free sequence.
Consider the sequence w₁, w₂, … defined in the previous paragraph. Because the alphabet is finite, there must be some letter, call it a, such that infinitely many words in the sequence, say with indexes i₁ < i₂ < ⋯, begin with the letter a. Define a new sequence as follows. Let i₁ be the first index such that the word at position i₁ begins with a. Remove from the sequence all words after position i₁ that do not begin with a, and from all words that do begin with a, remove the first letter. It is not difficult to show that the new sequence is also growth-free, but it contradicts the construction of w₁, w₂, …, because we have just shortened the word at position i₁, and therefore decreased its position in the radix ordering.
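The radix ordering used in the proof (also called the shortlex ordering) is easy to realize as a sort key: compare lengths first, then compare lexicographically. A Python sketch with assumed names:

```python
def radix_key(w: str):
    """Radix (shortlex) ordering: shorter words come first,
    words of the same length are compared lexicographically."""
    return (len(w), w)

# Sorting by this key puts words in radix order.
words = ["ba", "abc", "a", "b", "ab"]
in_radix_order = sorted(words, key=radix_key)
```

Well-foundedness is visible from the key: only finitely many words over a fixed finite alphabet precede a given word, since they must be no longer than it.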
From Higman’s Lemma, we obtain the following result.
Theorem. A language is definable by an existential formula if and only if it is upward closed with respect to the Higman ordering.
Proof. The left-to-right implication is immediate, because adding letters can only make an existential formula more true. Let us focus on the right-to-left implication. For a set of words L, let us write L↑ for its upward closure with respect to the Higman ordering. Suppose that L is upward closed. Define min L to be the set of elements of L that are minimal with respect to the Higman ordering. Clearly min L is an antichain, and therefore by the Higman Lemma it is finite. Finally, observe that

L = (min L)↑.

The ⊇ inclusion is a consequence of the assumption that L is upward closed, while for the ⊆ inclusion we use well-foundedness of the Higman ordering to prove that for every word in L, there is some minimal word of L that is smaller than it. It is easy to see that for every finite set, its upward closure is definable by an existential formula, and therefore L is definable by an existential formula.
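The theorem gives a concrete finite representation of an upward-closed language: the set of its minimal words. Membership then reduces to subsequence tests, which is exactly what the corresponding existential formula expresses. A Python sketch, with names of our own choosing:

```python
def is_subsequence(w: str, v: str) -> bool:
    it = iter(v)
    return all(c in it for c in w)

def minimal_elements(words: list) -> list:
    """Keep only the Higman-minimal words of a finite set:
    those with no strictly smaller word in the set."""
    return [w for w in words
            if not any(v != w and is_subsequence(v, w) for v in words)]

def in_upward_closure(minimal_words, v: str) -> bool:
    """v belongs to (min L)-upward-closure iff some minimal word
    embeds into v as a subsequence."""
    return any(is_subsequence(w, v) for w in minimal_words)
```

Each minimal word w = a₁⋯aₙ contributes one disjunct ∃x₁ ⋯ ∃xₙ (x₁ < ⋯ < xₙ ∧ a₁(x₁) ∧ ⋯ ∧ aₙ(xₙ)), and the whole language is the disjunction over the finitely many minimal words.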
This completes our study of languages defined by existential formulas. Let us move to a more interesting case, namely Boolean combinations of existential formulas.
Boolean combinations of existential formulas. Boolean combinations of existential formulas, sometimes also known as piecewise testable languages, are a much more interesting class. In particular, every finite language is a Boolean combination of existential formulas. For example, the language which contains only the word abaa is equal to the set of words which contain abaa as a subsequence, but which do not contain any subsequence of length 5. A beautiful result by Imre Simon characterises piecewise testable languages as follows.
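The abaa example can be checked mechanically: containing abaa as a subsequence is an existential property, and containing some subsequence of length 5 is equivalent to having length at least 5 (also an existential property, asserting five ordered positions). Their Boolean combination pins down exactly the one word. A Python sketch:

```python
def is_subsequence(w: str, v: str) -> bool:
    it = iter(v)
    return all(c in it for c in w)

def only_abaa(v: str) -> bool:
    # "contains abaa as a subsequence"  AND
    # "contains no subsequence of length 5", i.e. len(v) < 5.
    # Together these force len(v) == 4 and v == "abaa".
    return is_subsequence("abaa", v) and len(v) < 5
```

Indeed, a word satisfying both conditions has length at least 4 (to embed abaa) and at most 4, so it must be abaa itself.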
Theorem (Piecewise is 𝒥-trivial). A language is piecewise testable if and only if its syntactic monoid is 𝒥-trivial, i.e. all 𝒥-classes are singletons.
A corollary of the above theorem is that one can decide if a language is piecewise testable (whatever the use of that could be), by computing the syntactic monoid, and then computing the 𝒥-classes.
We will not prove this theorem, but a very similar one, which uses separation. We say that two languages L₁ and L₂ are separated by a language K if K contains L₁ but is disjoint from L₂. We will prove the following result.
Theorem (Piecewise Separation). It is decidable if two regular languages can be separated by a piecewise testable one.
A corollary of the above theorem is the same as the corollary of the theorem on piecewise being 𝒥-trivial, namely one can decide if a language is piecewise testable. This is because a language is piecewise testable if and only if it can be separated from its complement by a piecewise testable language.
The proof of the Piecewise Separation Theorem will consist of two parts. In part 1, we will show that piecewise separability can be characterised in terms of something called zigzags. In part 2, we will show that zigzags can be characterised by a certain game.