A Computer Language Transformation System Capable of Generalized Context-Dependent Parsing

Thumbnail Image
Thurston, Adrian
parsing , transformation , programming language , computer language , generalized parsing , context-dependent parsing
Source transformation systems are special-purpose programming languages, or in some cases suites of languages, that are designed for the analysis and transformation of computer languages. They enable rapid prototyping of programming languages, source code renovation, language-to-language translation, design recovery, and other custom analysis techniques. With the emergence of these systems a serious problem is evident: expressing a parser for common computer languages is sometimes very difficult. Source transformation systems employ generalized parsing algorithms, and while these are well suited for the kind of agile parsing techniques in use by transformation practitioners, they are not well suited for parsing languages that are context-dependent. Traditional deterministic parser generators do not stumble in this area, but they sacrifice the generalized parsing abilities that transformation systems depend on. When it is hard to get the input into the system as a correct and accurate parse tree the utility of the unified transformation environment is degraded and more ad hoc approaches become attractive for processing input. This thesis is about the design of a new computer language transformation system with a focus on enhancing the parsing system to support generalized context-dependent parsing. We argue for the use of backtracking LR as the generalized parsing algorithm. We present an enhancement to backtracking LR that allows us to control the parsing of an ambiguous grammar by ordering the productions of the grammar definitions. We add a grammar-dependent lexical solution and integrate it with our ordered choice parsing strategy. We design a transformation language that is closer to general-purpose programming languages, yet enables common transformation techniques. We add semantic actions to our backtracking LR parsing engine and encourage the modification of global state in support of context-dependent parsing. We introduce semantic undo actions for reverting changes to global state during backtracking, thereby enabling generalized context-dependent parsing. Finally, we free the user from having to write undo actions by employing automatic reverse execution. The resulting system allows a wider variety of computer languages to be analyzed. By focusing on improving parsing abilities and moving to a transformation language that resembles general-purpose languages, we aim to extend the transformation paradigm to allow greater use by practitioners who face an immediate need to parse, analyze and transform computer languages.
External DOI