Index Structures for XML Databases
MetadataShow full item record
Extensible Markup Language (XML) is a de facto standard for data exchange in the World Wide Web. Indexing plays a key role in improving the execution of XML queries over that data. In this thesis we discuss the three main categories of indexes proposed in the literature to handle the XML semistructured data model, and identify limitations and open problems related to these indexing schemes. Based on our findings, we propose two novel XML index structures to overcome most of these limitations: a native index structure called Level-based Tree Index for XML databases (LTIX) and a relational index structure called Universal Index Structure for XML data (UISX). A proper labeling scheme is an essential part of a well-built XML index structure. We found that existing labeling schemes are not suitable for our index structures and therefore propose a novel labeling scheme, Level-based Labeling Scheme (LLS), which has the advantages of most popular types of labeling schemes while eliminating the main disadvantages. We then combine our LLS labeling scheme with our index structures. An evaluation shows that LLS performs well in comparison to existing labeling schemes using different mappings to relational tables. We propose the LTIX to minimize the number of joins and matches required to evaluate twig queries, and also to facilitate effective query optimization through early pruning of the space search. Our experimental results show that this approach performs well in comparison to existing state-of-the-art approaches. We propose the UISX to overcome the key problem with the state-of-the-art approaches, namely that they cannot support efficient processing of twig queries without requiring significant storage. We use a light-weight native XML engine on top of an SQL engine to perform the optimization related to the structure of the XML data prior to shredding. Experimental results show that our approach achieves lower response times than other similar approaches while using less space to store XML data.