Distances Between Languages: Algorithms and Descriptional Complexity
We consider the descriptional complexity of neighbourhoods of regular languages and the computational complexity of computing the distance between languages. Distance measures are defined on words to describe their similarity. These measures can be extended to languages, which are sets of words. We consider several different notions of distance measures on languages. The neighbourhood of a language L is the set of words within some fixed distance of a word in L. We show that we can construct weighted finite automata for neighbourhoods of regular languages with respect to additive quasi-distances. We consider the state complexity of the neighbourhoods and the state complexity of approximate pattern matching. We establish a tight state complexity lower bound for the edit distance neighbourhood of a language recognized by a deterministic finite automaton. We consider the deterministic and nondeterministic state complexity of prefix, suffix, and subword distance neighbourhoods. We establish tight deterministic state complexity bounds for the prefix and suffix distance neighbourhoods of regular languages and of finite languages. For prefix distance neighbourhoods, we consider prefix-convex subclasses of regular languages. We also establish the state complexity of suffix neighbourhoods of suffix-closed languages. We give an upper bound for the state complexity of the subword distance neighbourhood of a regular language. Based on the edit distance algorithms of Mohri and Han and co-authors, we give algorithms for computing the prefix distance between two regular languages, a context-free language and a regular language, and two visibly pushdown languages. We extend these algorithms to compute the inner prefix distance of regular languages and visibly pushdown languages. We give algorithms to compute the relative prefix distance of two regular languages. For a given integer k, we show that for a deterministic context-free language and a regular language or for two visibly pushdown languages, it is decidable whether their relative prefix distance is at most k. This is very close to the borderline between decidable and undecidable since, on the other hand, we show that deciding the same question for a deterministic context-free language and a visibly pushdown language is undecidable.