Prioritizing SNPs for Disease-Gene Association Studies: Algorithms and Systems
Lee, Phil Hyoun
Identifying single nucleotide polymorphisms (SNPs) that are involved in common and complex diseases, such as cancer, is a major challenge in current molecular epidemiology. Knowledge of such SNPs is expected to enable timely diagnosis, effective treatment, and, ultimately, prevention of human disease. However, the number of SNPs in the human genome, estimated at more than eleven million, makes it difficult to obtain and analyze information on all of them. In this thesis we address the problem of selecting representative SNP markers to support effective disease-gene association studies. Our goal is to facilitate the genotyping and analysis procedures associated with such studies by providing effective methods for prioritizing SNP markers based on both their allele information and their functional significance. The SNP selection problem has been proven to be NP-hard in general, and current selection methods impose certain restrictions and rely on heuristics to reduce its complexity. We therefore aim to develop new heuristic algorithms and systems that advance the state of the art while relaxing these restrictions. To this end, we formulate several SNP selection problems and present novel algorithms and a database system based on the two major SNP selection approaches: tag SNP selection and functional SNP selection. Furthermore, we describe an innovative approach that combines tag SNP selection and functional SNP selection into one unified selection process. We demonstrate the improved performance of all the proposed methods in comparative studies.
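To make the idea of heuristic tag SNP selection concrete, the following is a minimal sketch of a generic greedy heuristic, not the algorithms proposed in the thesis. It assumes genotypes are coded as allele counts (0, 1, 2) per individual and that a SNP "tags" another when their squared correlation r² reaches a threshold (0.8 here, an assumed value). The function names and the SNP identifiers are hypothetical.

```python
from itertools import combinations


def r_squared(a, b):
    """Squared Pearson correlation between two equal-length genotype vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A monomorphic SNP carries no correlation information.
        return 0.0
    return cov * cov / (var_a * var_b)


def greedy_tag_snps(genotypes, threshold=0.8):
    """Pick tag SNPs greedily (illustrative heuristic, an assumption).

    genotypes: dict mapping a SNP id to a list of allele counts,
    one entry per individual, in the same individual order for all SNPs.
    """
    # Every SNP covers itself plus all SNPs correlated with it above the threshold.
    covers = {snp: {snp} for snp in genotypes}
    for s, t in combinations(genotypes, 2):
        if r_squared(genotypes[s], genotypes[t]) >= threshold:
            covers[s].add(t)
            covers[t].add(s)

    uncovered = set(genotypes)
    tags = []
    while uncovered:
        # Choose the SNP that covers the most still-uncovered SNPs.
        best = max(genotypes, key=lambda snp: len(covers[snp] & uncovered))
        tags.append(best)
        uncovered -= covers[best]
    return tags


# Hypothetical toy data: four SNPs genotyped in four individuals.
toy = {
    "snp1": [0, 1, 2, 1],
    "snp2": [0, 1, 2, 1],  # identical to snp1
    "snp3": [2, 1, 0, 1],  # perfectly anti-correlated with snp1
    "snp4": [0, 0, 1, 2],  # weakly correlated with the others
}
selected = greedy_tag_snps(toy)
```

On this toy input, one SNP stands in for the three mutually correlated markers and `snp4` must be kept separately, so two tags are selected instead of four. A greedy choice like this gives no optimality guarantee, which is consistent with the NP-hardness noted above.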