Improving Code Search Using Learning-to-Rank and Query Reformulation Techniques

Authors

Niu, Haoran

Date

2015-07-24

Type

thesis

Language

eng

Keyword

Code Search

Abstract

During software development, developers often encounter unfamiliar programming tasks. Online Q&A forums, such as StackOverflow, are one of the resources that developers consult for answers to their programming questions. Automatically recommending a working code example can help answer developers’ programming questions. However, existing code search engines mainly support keyword-based queries and do not accommodate natural-language code search queries well. Specifically, natural-language queries contain fewer technical keywords (i.e., class or method names), which reduces the effectiveness of existing code search engines. In addition, a code search engine requires a ranking schema to place relevant code examples at the top of the result list. However, existing ranking schemas are hand-crafted heuristics whose configurations are hard to determine, which makes them difficult to apply to new languages or frameworks. In this paper, we propose an approach that uses query reformulation techniques to improve the search effectiveness of existing code search engines for natural-language queries. The approach automatically reformulates natural-language queries using class names with semantic relations. We also propose an approach to automatically train a ranking schema for code example search using the learning-to-rank technique. We evaluate the proposed approaches using a large-scale corpus of code examples. The evaluation results show that our approaches can effectively recommend semantically related class names to reformulate natural-language queries, and that the improvement in search effectiveness over existing query reformulation approaches is statistically significant. The automatically trained ranking schema can effectively rank code examples, and outperforms the existing ranking schemas by 35.65% and 48.42% in terms of normalized discounted cumulative gain (NDCG) and expected reciprocal rank (ERR), respectively.
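
For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of how NDCG and ERR are commonly computed for a single ranked result list. It assumes graded relevance labels (non-negative integers) and the widely used exponential gain 2^g - 1; the thesis may use a different gain function or cutoff, and the function names here are illustrative only, not taken from the thesis.

```python
import math


def dcg(grades, k=None):
    """Discounted cumulative gain of relevance grades listed in ranked order."""
    if k is not None:
        grades = grades[:k]
    # Exponential gain, logarithmic position discount (assumed formulation).
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(grades))


def ndcg(grades, k=None):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(grades, reverse=True), k)
    return dcg(grades, k) / ideal if ideal > 0 else 0.0


def err(grades, max_grade):
    """Expected reciprocal rank under the cascade user model."""
    p_reach = 1.0  # probability that the user reaches the current rank
    score = 0.0
    for rank, g in enumerate(grades, start=1):
        p_satisfied = (2 ** g - 1) / 2 ** max_grade
        score += p_reach * p_satisfied / rank
        p_reach *= 1.0 - p_satisfied
    return score


# Hypothetical relevance grades for the top four results of one query.
ranked_grades = [2, 3, 0, 1]
print(ndcg(ranked_grades), err(ranked_grades, max_grade=3))
```

Both metrics reward placing highly relevant code examples near the top of the list; ERR additionally discounts a result when an earlier result is likely to have already satisfied the user.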

Description

Thesis (Master, Electrical & Computer Engineering) -- Queen's University, 2015-07-23 16:44:28.655

License

Queen's University's Thesis/Dissertation Non-Exclusive License for Deposit to QSpace and Library and Archives Canada
ProQuest PhD and Master's Theses International Dissemination Agreement
Intellectual Property Guidelines at Queen's University
Copying and Preserving Your Thesis
Creative Commons - Attribution - CC BY
This publication is made available by the authority of the copyright owner solely for the purpose of private study and research and may not be copied or reproduced except as permitted by the copyright laws without written authority from the copyright owner.
