NLQ INTO SQL TRANSLATION USING COMPUTATIONAL LINGUISTICS
This research presents an approach for automatically translating an unstructured Natural Language Question (NLQ) into a Structured Query Language (SQL) statement. SQL is a computer-executable artificial language with its own query syntax, used to interact with Relational DataBases (RDBs). With this translation, DataBase (DB) administrators or general users with little or no SQL querying ability can query RDBs interactively. The Human-Computer Interaction (HCI) takes place through users’ NLQs, which are written in English in the proposed research. Users do not need to know any RDB schema elements or structures such as table names, relationships, formats, attributes or data types. The RDB schema is a brief description of how the RDB elements are organized, excluding any RDB values. In this work, a lightweight NLQ into SQL translation approach is implemented by utilizing an RDB MetaTable. The main goal is to exploit a manually written, rule-based mapping constraints algorithm. This algorithm maps the semantic and syntactic information of NLQ tokens onto the semantic roles of RDB elements (i.e., value, attribute) by pairing and matching. The matching RDB elements, called “identified lexica”, are then mapped consistently into SQL clauses for SQL generation and execution. The matching process uses a computational linguistic analysis mapping algorithm, represented in the MetaTables. This mapping algorithm performs best on small RDBs, where it reaches an accuracy of 95%, and is about 93% accurate on larger RDBs.
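The mapping idea summarized above can be illustrated with a minimal Python sketch. It is not the implementation evaluated in this research. It only assumes that a MetaTable can be approximated by a lookup from lowercase NLQ words to RDB elements tagged with a semantic role. The `META_TABLE` contents, the `employee` table, its attributes, and the helper names `tokenize`, `identify_lexica` and `build_sql` are all hypothetical and chosen for illustration. The sketch handles a single table and equality conditions only.

```python
import re

# Hypothetical MetaTable (an assumption for illustration): maps lowercase NLQ
# words to (semantic role, table, attribute, literal). The literal is the
# stored RDB value for "value" entries and None for "attribute" entries.
META_TABLE = {
    "name": ("attribute", "employee", "name", None),
    "names": ("attribute", "employee", "name", None),
    "salary": ("attribute", "employee", "salary", None),
    "salaries": ("attribute", "employee", "salary", None),
    "sales": ("value", "employee", "department", "Sales"),
    "london": ("value", "employee", "city", "London"),
}


def tokenize(nlq):
    """Split an NLQ into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", nlq.lower())


def identify_lexica(nlq):
    """Return the MetaTable entries matched by NLQ tokens, without duplicates."""
    lexica = []
    for token in tokenize(nlq):
        entry = META_TABLE.get(token)
        if entry is not None and entry not in lexica:
            lexica.append(entry)
    return lexica


def build_sql(nlq):
    """Map identified lexica into SELECT / FROM / WHERE clauses.

    Returns the SQL text with '?' placeholders and the list of parameters.
    """
    lexica = identify_lexica(nlq)
    if not lexica:
        raise ValueError("no NLQ token matched the MetaTable")

    tables = sorted({table for _, table, _, _ in lexica})
    if len(tables) != 1:
        raise ValueError("this sketch does not handle joins")

    # Attribute-role lexica become the SELECT list.
    columns = [attr for role, _, attr, _ in lexica if role == "attribute"]
    # Value-role lexica become equality conditions in the WHERE clause.
    conditions = [(attr, lit) for role, _, attr, lit in lexica if role == "value"]

    sql = f"SELECT {', '.join(columns) if columns else '*'} FROM {tables[0]}"
    params = []
    if conditions:
        sql += " WHERE " + " AND ".join(f"{attr} = ?" for attr, _ in conditions)
        params = [lit for _, lit in conditions]
    return sql, params


if __name__ == "__main__":
    print(build_sql("What are the names and salaries of employees in Sales?"))
```

In this sketch, tokens that match nothing in the lookup (such as "what" or "of") are simply ignored, attribute matches fill the SELECT list, and value matches become parameterized WHERE conditions, so the returned pair could be passed to a DB driver that accepts `?` placeholders.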