Define a data word to be a word over an alphabet of the form , where is a finite set, and is an infinite set. The first coordinate is called the label and the second coordinate is called the data value. The idea is we will be able to test labels explicitly by asking questions like “does the second letter have as its label?” but we will only be allowed to test the data value for equality e.g. ask “do the third and fifth letters have the same data value?”
By abuse of notation, we assume that a word over the alphabet is also a data word, here there are no labels. Here are some examples of languages of data words, in all of these examples we use no labels:
In this lecture, we introduce automata models for data words that capture the properties above. We begin with a nondeterministic register automaton.
Definition. A nondeterministic register automaton consists of:
• a finite alphabet for the labels;
• a finite set control states;
• a finite set of register names;
• an initial state and a set of accepting states ;
• a transition relation
subject to the equivariance condition described below.
The automaton is used to accept or reject data words where the alphabet is . After processing part of the input, the automaton keeps track of a configuration, which consists of a control state and a register valuation (i.e. a partial function from register names to atoms). Initially, the configuration consists of the initial state and a completely undefined register valuation. The configuration is then updated according to the transition relation , and the automaton accepts if at the end of the word the control state belongs to the accepting set.
The only issue is how to describe the transition relation. Since the space of configurations is infinite, the transition relation cannot be any relation. We choose the following restriction, called equivariance: the transition relation can only compare data values with respect to equality. Equivariance can be formalized in two different ways below:
The advantage of semantic equivariance is that the definition is short. The advantage of syntactic equivariance is that it shows how the transition relation can be represented. The following straightforward lemma is given without proof.
Lemma. Semantics and syntactic equivariance are the same.
This completes the definition of nondeterministic register automata: the transition relation is required to be equivariant in either of the two equivalent senses defined above. The transition relation is called deterministic if the source configuration and the input letter determine uniquely the target configuration.
Recall the five example languages. Deterministic register automata can 1,4,5. Nondeterministic register automata can do 1,2,4,5. In particular, deterministic register automata are strictly weaker than nondeterministic ones, and nondeterministic ones are not closed under complement.
Theorem. Emptiness is decidable for nondeterministic register automata, but universality is not.
Proof. When talking about decidability, we assume that the transition function is represented according to the syntactic equivariance condition.
Let us begin with the decidability argument. Define the equality type of a configuration to be the following information: the control state, which registers are defined, and which registers store the same data value. It is not difficult to see that if a configuration of some equality type is reachable in the automaton, then all configurations of this equality type are also reachable. The algorithm for nonemptiness computes the equality types of reachable configurations. Intitially, we have the equality type of the unique initial configuration, which can be easily computed. If we have the equality type of some configuration, we can easily compute the equality types of all configurations reachable from it in one step; thus finishing the description of the algorithm.
Let us do the undecidability. We reduce from the halting problem. Suppose that we have a Turing machine which is an instance of the halting problem. We encode a run of a Turing machine as a data word according to the following following picture:
Each letter encodes a single cell in a single configuration. The word represents a sequence of configurations, padded with blanks so that they all have the same length, and separated by a letter #. The labels are used to store the contents of the cell (light blue), plus the control state (dark blue) of the head if the head happens to be over that cell. Finally, each cell gets a unique identifier, a data value (red). We claim that there is a nondeterministic register automaton which accepts a data word if and only if it is not an encoding of an accepting run of the Turing machine, and therefore solving universality for this automaton also solves the halting problem. To prove the claim, we list the mistakes that can happen in a word that does not encode an accepting run of a Turing machine:
1. The data values identifying the cells are chosen wrong. This means that either:
• the separator # is used with more than one data value; or
• some data value appears in two different places, with successor positions having data values .
The first condition can be tested using one register, the second condition using two registers.
2. There is a mistake between two consecutive configurations. Assuming the identifiers are chosen correctly, this can be tested using only one register, to tell which cells correspond to which ones in the following configuration.
3. The first configuration is not initial, or the last configuration is not accepting. For this, no registers are needed.
Leave a Reply