Type-safe Computation with Heterogeneous Data
Huang, Freeman Yufei
Computation with large-scale heterogeneous data typically requires universal traversal: a search for all occurrences of a substructure that matches a possibly complex search pattern, and whose context may differ from place to place within the data. Both aspects are difficult for existing general-purpose programming languages, because these languages are designed for homogeneous data and have trouble typing both the different substructures in heterogeneous data and the complex patterns to be matched against them. Programmers must either hard-code the structures and search patterns, which prevents programs from being reusable and scalable, or resort to low-level untyped programming or special-purpose query languages, which opens the door to type mismatches that put program correctness and security at risk. This thesis introduces the concept of pattern structures and proposes a general solution to these problems: a programming technique based on pattern structures. In this solution, well-typed pattern structures are defined to represent complex search patterns, and pattern searching over heterogeneous data is programmed with pattern parameters in a statically typed language that supports first-class typing of structures and patterns. The resulting programs are statically typed, highly reusable across different data structures and patterns, and highly scalable in the complexity of those data structures and patterns. Adding new kinds of patterns for an application no longer requires changing the language in use or creating a new one; it becomes an ordinary programming task. The thesis demonstrates the application of this approach, and its advantages, in two important examples of computation with heterogeneous data: XML data processing and Java bytecode analysis.
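To make the idea of searching with a pattern parameter concrete, the following is a minimal sketch in Python with type hints. It is not the thesis's language or implementation. All names here (`Pattern`, `children`, `find_all`, `price_in`, and the `Price`, `Book` and `Shelf` records) are hypothetical and chosen only for illustration. Python's hints are checked by external tools rather than by the language itself, so this only approximates the static typing the abstract describes.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Callable, Generic, Iterator, List, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Pattern(Generic[T]):
    """A first-class search pattern: yields a T on a match, None otherwise."""
    match: Callable[[Any], Optional[T]]


def children(node: Any) -> Iterator[Any]:
    """Yield the immediate substructures of a node, whatever its shape."""
    if is_dataclass(node) and not isinstance(node, type):
        for f in fields(node):
            yield getattr(node, f.name)
    elif isinstance(node, dict):
        yield from node.values()
    elif isinstance(node, (list, tuple)):
        yield from node


def find_all(pattern: Pattern[T], data: Any) -> List[T]:
    """Universal traversal: test the pattern at every substructure of data."""
    results: List[T] = []
    hit = pattern.match(data)
    if hit is not None:
        results.append(hit)
    for child in children(data):
        results.extend(find_all(pattern, child))
    return results


# Hypothetical heterogeneous data: records of different types nested together.
@dataclass
class Price:
    amount: float
    currency: str


@dataclass
class Book:
    title: str
    price: Price


@dataclass
class Shelf:
    label: str
    items: List[Any]


def price_in(currency: str) -> Pattern[float]:
    """Build a pattern that matches any Price in the given currency."""
    def match(node: Any) -> Optional[float]:
        if isinstance(node, Price) and node.currency == currency:
            return node.amount
        return None
    return Pattern(match)


catalogue = Shelf("A", [Book("Types", Price(30.0, "AUD")),
                        {"gift card": Price(20.0, "AUD")},
                        Price(5.0, "USD")])

# The same traversal is reused; only the pattern argument changes.
aud_amounts = find_all(price_in("AUD"), catalogue)
```

In this sketch, `find_all` never mentions `Book`, `Shelf` or `Price`, so the traversal can be reused for other data shapes, and a new kind of search is added by writing another function that returns a `Pattern`, not by changing the traversal.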