## State Complexity of Tree Automata

##### Abstract

Modern applications of XML use automata operating on unranked trees. A common definition of tree automata operating on unranked trees uses a set of vertical states that define the bottom-up computation, and the transitions on vertical states are determined by so-called horizontal languages recognized by finite automata on strings. The bottom-up computation of an unranked tree automaton may be either deterministic or nondeterministic, and further variants arise depending on whether the horizontal string languages defining the transitions are represented by DFAs or NFAs. There is also an alternative syntactic definition of determinism introduced by Cristau et al.
It is known that a deterministic tree automaton with the smallest total number of states need not be unique, nor need it have the smallest possible number of vertical states. We consider the question of how much the total number of states can be reduced by introducing additional vertical states. We give an upper bound for the state trade-off for deterministic tree automata where the horizontal languages are defined by DFAs, and a lower bound construction that, for variable-sized alphabets, is close to the upper bound.
We establish upper and lower bounds for the state complexity of conversions between different types of deterministic and nondeterministic unranked tree automata. The bounds are usually tight for the numbers of vertical states. Because a minimal deterministic unranked tree automaton need not be unique, establishing lower bounds for the number of horizontal states, that is, the combined size of the DFAs used to define the horizontal languages, is challenging. Based on existing lower bound results for unambiguous finite automata, we develop a lower bound criterion for the number of horizontal states.
We consider the state complexity of operations on regular unranked tree languages. The concatenation of trees can be defined either as a sequential or a parallel operation. Furthermore, there are two essentially different ways to iterate sequential concatenation. We establish tight state complexity bounds for concatenation-like operations. In particular, for sequential concatenation and bottom-up iterated concatenation the bounds differ by an order of magnitude from the corresponding state complexity bounds for regular string languages.
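
To make the vertical/horizontal terminology above concrete, the following is a minimal illustrative sketch of a deterministic bottom-up computation on unranked trees, with each horizontal language given by a DFA over the vertical states. Everything in it is an assumption made for illustration and is not taken from the work itself: the Boolean-formula tree language, the names `DFA`, `HORIZONTAL`, `run`, and `total_states`, the use of partial DFAs (a missing transition rejects), and the convention of counting one DFA per pair of vertical state and label.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    """Partial DFA over the alphabet of vertical states (hypothetical helper)."""
    start: str
    accepting: set
    delta: dict  # (horizontal state, vertical state) -> horizontal state

    def accepts(self, word):
        state = self.start
        for symbol in word:
            state = self.delta.get((state, symbol))
            if state is None:  # missing transition: reject
                return False
        return state in self.accepting

    def num_states(self):
        # Count every state mentioned by the DFA (no implicit sink state).
        states = {self.start} | set(self.accepting)
        for (source, _), target in self.delta.items():
            states |= {source, target}
        return len(states)


def only(symbol):
    """Nonempty words consisting solely of `symbol`."""
    return DFA("h0", {"h1"}, {("h0", symbol): "h1", ("h1", symbol): "h1"})


def contains(symbol, other):
    """Words over {symbol, other} with at least one occurrence of `symbol`."""
    return DFA("h0", {"h1"}, {
        ("h0", other): "h0", ("h0", symbol): "h1",
        ("h1", other): "h1", ("h1", symbol): "h1",
    })


EMPTY = DFA("h0", {"h0"}, {})  # accepts only the empty word, i.e. a leaf

# Illustrative automaton: trees are Boolean formulas with unranked
# "and"/"or" nodes; a tree is accepted when it evaluates to true.
VERTICAL_STATES = {"t", "f"}
FINAL_STATES = {"t"}
HORIZONTAL = {  # (vertical state q, node label a) -> DFA for L(q, a)
    ("t", "1"): EMPTY,
    ("f", "0"): EMPTY,
    ("t", "and"): only("t"),
    ("f", "and"): contains("f", "t"),
    ("t", "or"): contains("t", "f"),
    ("f", "or"): only("f"),
}


def run(tree):
    """Return the vertical state reached at the root, or None if stuck."""
    label, children = tree
    word = [run(child) for child in children]
    if None in word:
        return None
    # The horizontal languages for a fixed label are pairwise disjoint here,
    # so at most one vertical state matches (deterministic computation).
    matches = [q for (q, a), dfa in HORIZONTAL.items()
               if a == label and dfa.accepts(word)]
    return matches[0] if len(matches) == 1 else None


def accepts(tree):
    return run(tree) in FINAL_STATES


def total_states():
    """Vertical states plus the combined size of all horizontal DFAs."""
    return len(VERTICAL_STATES) + sum(d.num_states() for d in HORIZONTAL.values())
```

For instance, `accepts(("or", [("0", []), ("and", [("1", []), ("1", [])])]))` evaluates the children first and then feeds the resulting word of vertical states to the horizontal DFAs of the root label. Under the counting convention assumed in this sketch, `total_states()` adds the two vertical states to the sizes of the six horizontal DFAs; a different convention (complete DFAs, or a single DFA per label) would give a different total.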