Improving Code Search Using Learning-to-Rank and Query Reformulation Techniques
Authors
Niu, Haoran
Date
2015-07-24
Type
thesis
Language
eng
Keyword
Code Search
Alternative Title
Abstract
During software development, developers often encounter unfamiliar programming tasks. Online Q&A forums, such as StackOverflow, are among the resources where developers can seek answers to their programming questions. Automatically recommending a working code example can help answer developers’ programming questions. However, existing code search engines mainly support keyword-based queries and do not accommodate natural-language code search queries well. Specifically, natural-language queries contain fewer technical keywords, i.e., class or method names, which reduces the success of searches on existing code search engines. In addition, a code search engine requires a ranking schema to place relevant code examples at the top of the result list. However, existing ranking schemas are hand-crafted heuristics whose configurations are hard to determine, which makes them difficult to apply to new languages or frameworks.
In this thesis, we propose an approach that uses query reformulation techniques to improve the search effectiveness of existing code search engines for natural-language queries. The approach automatically reformulates natural-language queries using semantically related class names. We also propose an approach that automatically trains a ranking schema for code example search using the learning-to-rank technique. We evaluate the proposed approaches using a large-scale corpus of code examples. The evaluation results show that our approaches can effectively recommend semantically related class names to reformulate natural-language queries, and that the improvement in search effectiveness over existing query reformulation approaches is statistically significant. The automatically trained ranking schema can effectively rank code examples, and it outperforms the existing ranking schemas by 35.65% and 48.42% in terms of normalized discounted cumulative gain (NDCG) and expected reciprocal rank (ERR), respectively.
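For readers unfamiliar with the two ranking metrics named above, the following is a minimal Python sketch of their commonly used definitions. It is not taken from the thesis: the function names, the integer relevance grades, and the exponential gain `2**g - 1` are assumptions chosen for illustration, and the thesis may use a different grading scale or cutoff.

```python
import math


def dcg(grades, k=None):
    """Discounted cumulative gain of a ranked list of relevance grades."""
    grades = grades[:k] if k is not None else grades
    # Exponential gain with a log2 position discount (rank 1 -> log2(2) = 1).
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(grades))


def ndcg(grades, k=None):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(grades, reverse=True), k)
    return dcg(grades, k) / ideal if ideal > 0 else 0.0


def err(grades, max_grade, k=None):
    """Expected reciprocal rank under the cascade user model."""
    grades = grades[:k] if k is not None else grades
    score = 0.0
    p_continue = 1.0  # probability the user reaches the current rank
    for rank, g in enumerate(grades, start=1):
        # Assumed mapping from grade to probability of being satisfied.
        p_stop = (2 ** g - 1) / (2 ** max_grade)
        score += p_continue * p_stop / rank
        p_continue *= 1 - p_stop
    return score


# Hypothetical grades for a ranked list of five code examples (0 = irrelevant).
ranked = [3, 0, 2, 1, 0]
print(f"NDCG = {ndcg(ranked):.4f}, ERR = {err(ranked, max_grade=3):.4f}")
```

Both metrics reward placing highly relevant code examples near the top of the list. NDCG compares the ranking against its ideal ordering, while ERR models a user who stops at the first satisfying result.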
Description
Thesis (Master, Electrical & Computer Engineering) -- Queen's University, 2015-07-23 16:44:28.655
Citation
Publisher
License
Queen's University's Thesis/Dissertation Non-Exclusive License for Deposit to QSpace and Library and Archives Canada
ProQuest PhD and Master's Theses International Dissemination Agreement
Intellectual Property Guidelines at Queen's University
Copying and Preserving Your Thesis
Creative Commons - Attribution - CC BY
This publication is made available by the authority of the copyright owner solely for the purpose of private study and research and may not be copied or reproduced except as permitted by the copyright laws without written authority from the copyright owner.