Automated Generation of Language Use Vector Extractors from TXL Grammars

Authors

Vishwambar, Karthik

Type

thesis

Language

eng

Keyword

LUV, RCLUV, Rich-Contextualized Language Use Vectors, Language Use Vectors, language feature, Static Analysis, TXL

Abstract

Software development continues to demand advanced features that current programming languages do not offer. Language developers constantly evolve their languages to simplify common tasks, make the language more natural to use, and extend its features in response to the demands of software development. However, language features are often not used efficiently in practice, for a variety of reasons. Existing software source code provides vital information on how a programming language is used and how frequently developers use each of its features, which has encouraged many researchers to study language feature use. TXL is a domain-specific language used in research on software analysis, and it has proven to be an excellent option for constructing language parsers with which researchers can efficiently extract information on language use. Language use vectors, which encode language use data, can be a crucial feature in language use studies and software analysis because they represent language use statistics precisely, in a compact mathematical form. Because language use vectors can be derived directly from the grammar of the target programming language, they have an advantage over other metrics employed in language use studies. In all prior studies, researchers have manually constructed TXL feature analyzers for each target language, a process that significantly delays analysis projects. This thesis explores automating the construction of TXL language use extractors from a given language grammar, so that language feature analyzers for new programming languages can be built rapidly and accurately with little or no manual effort. We present an automated process for generating TXL-based extractors directly from programming language grammars, and demonstrate its use in analyzing language use in large corpora of three programming languages: Java, Ruby, and C.
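To make the idea of a grammar-derived language use vector more concrete, the Python sketch below is offered as an illustration only; it is not the thesis's TXL-based generator. It assumes that each vector component simply counts the occurrences of one nonterminal defined in the grammar. It collects nonterminal names from TXL `define` headers with a regular expression, then counts those nonterminals in a toy parse tree. The names `grammar_features`, `language_use_vector`, and `Node`, along with the small grammar and tree, are hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass, field

# Assumption: each feature is a nonterminal introduced by a TXL "define <name>" line.
# The ^\s* anchor skips "redefine ..." and "end define" lines.
DEFINE_RE = re.compile(r"^\s*define\s+(\w+)", re.MULTILINE)


def grammar_features(txl_grammar: str) -> list[str]:
    """Return the nonterminal names defined in a TXL grammar, in order, without duplicates."""
    names: list[str] = []
    for name in DEFINE_RE.findall(txl_grammar):
        if name not in names:
            names.append(name)
    return names


@dataclass
class Node:
    """Hypothetical parse-tree node labelled with the nonterminal it was parsed as."""
    kind: str
    children: list["Node"] = field(default_factory=list)


def language_use_vector(tree: Node, features: list[str]) -> list[int]:
    """Count how often each grammar nonterminal occurs in the tree (iterative traversal)."""
    counts: Counter = Counter()
    stack = [tree]
    while stack:
        node = stack.pop()
        counts[node.kind] += 1
        stack.extend(node.children)
    # One component per grammar feature, in grammar order; unused features count as 0.
    return [counts[name] for name in features]


# Toy TXL-style grammar (hypothetical, for illustration only).
GRAMMAR = """
define program
    [repeat statement]
end define

define statement
    [if_statement] | [while_statement]
end define

define if_statement
    'if [expression] [statement]
end define

define while_statement
    'while [expression] [statement]
end define
"""

# Toy parse tree: an if-statement wrapping a while-loop, followed by another while-loop.
TREE = Node("program", [
    Node("statement", [Node("if_statement", [Node("statement", [Node("while_statement")])])]),
    Node("statement", [Node("while_statement")]),
])

if __name__ == "__main__":
    features = grammar_features(GRAMMAR)
    print(features)                             # ['program', 'statement', 'if_statement', 'while_statement']
    print(language_use_vector(TREE, features))  # [1, 3, 1, 2]
```

Because the feature list is read from the grammar rather than written by hand, pointing the same code at a different grammar yields a different vector layout without manual changes. That grammar-driven design is the property the abstract highlights. Rich-contextualized vectors (RCLUVs) would require more context than this flat count captures.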

License

Queen's University's Thesis/Dissertation Non-Exclusive License for Deposit to QSpace and Library and Archives Canada
ProQuest PhD and Master's Theses International Dissemination Agreement
Intellectual Property Guidelines at Queen's University
Copying and Preserving Your Thesis
This publication is made available by the authority of the copyright owner solely for the purpose of private study and research and may not be copied or reproduced except as permitted by the copyright laws without written authority from the copyright owner.