Please use this identifier to cite or link to this item: http://hdl.handle.net/1974/6242

Authors: NAHLAWI, Layan

Files in This Item:

File: Layan_Nahlawi_201012_Masters.pdf
Size: 899.13 kB
Format: Adobe PDF
Keywords: SNP Selection
Fast Orthogonal Search
Independent Component Analysis
Genetic Data Analysis
Issue Date: 2010
Series/Report no.: Canadian theses
Abstract: The past decade has witnessed great advances in microarray and genotyping technologies, which allow genome-wide single nucleotide polymorphism (SNP) data to be captured on a single chip. As a consequence, genome-wide association studies require algorithms capable of handling ultra-large-scale SNP datasets. Towards this goal, this thesis proposes two SNP selection methods: the first uses Independent Component Analysis (ICA), and the second is based on a modified version of Fast Orthogonal Search (mFOS). The first technique, based on ICA, is a filtering technique; it reduces the number of SNPs in a dataset without the need for any class labels. The second technique, orthogonal search based SNP selection, is a multivariate regression approach; it selects the most informative features in SNP data to accurately model the entire dataset. The proposed methods are evaluated by applying them to publicly available gene SNP datasets and comparing the accuracy of each method in reconstructing the datasets. In addition, the selection results are compared with those of another SNP selection method, based on Principal Component Analysis (PCA), which was applied to the same datasets. The results demonstrate that orthogonal search captures more information than the ICA SNP selection approach while using a smaller number of SNPs. Furthermore, the SNP reconstruction accuracies obtained with the proposed ICA methodology show that it summarizes a greater or equivalent amount of information compared with the PCA-based technique reported in the literature. The execution time of the second methodology, mFOS, paves the way for its application to large-scale genome-wide datasets.
Description: Thesis (Master, Computing) -- Queen's University, 2010-12-15 18:03:00.208
URI: http://hdl.handle.net/1974/6242
Appears in Collections: Queen's Graduate Theses and Dissertations
School of Computing Graduate Theses

Items in QSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

