Data Mining the Genetics of Leukemia
Acute Lymphoblastic Leukemia (ALL) is the most common cancer in children under the age of 15. At present, diagnosis, prognosis and treatment decisions are based on blood and bone marrow laboratory testing. With advances in microarray technology, it is becoming more feasible to perform genetic assessment of individual patients as well. We applied Singular Value Decomposition (SVD) to Illumina SNP, Affymetrix and cDNA gene-expression data and performed aggressive attribute selection using random forests to reduce the number of attributes to a manageable size. We then explored clustering and prediction of patient-specific properties such as disease sub-classification and, especially, clinical outcome. We determined that integrating multiple types of data can provide more meaningful information than individual datasets, if combined properly. This method is able to capture the correlation between the attributes. The most striking result is an apparent connection between genetic background and patient mortality under existing treatment regimes. We find that we can cluster well using the mortality label of the patients. In addition, using a Support Vector Machine (SVM), we can predict clinical outcome with high accuracy. This thesis discusses the data-mining methods used and their application to biomedical research, as well as our results and how they may affect the diagnosis and treatment of ALL in the future.
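The kind of pipeline described above (random-forest attribute selection, SVD, then an SVM for outcome prediction) can be illustrated with a minimal sketch in Python using scikit-learn. This is not the thesis code: the data are synthetic, the order of the steps, the number of retained attributes (`n_keep`), the number of SVD components, the linear kernel, and the function name `build_outcome_model` are all assumptions made purely for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_outcome_model(n_keep=50, n_components=5, seed=0):
    """Hypothetical pipeline: forest-based selection -> SVD -> SVM."""
    # Keep the n_keep attributes with the highest random-forest importance.
    # threshold=-inf makes max_features the only selection criterion.
    selector = SelectFromModel(
        RandomForestClassifier(n_estimators=100, random_state=seed),
        threshold=-np.inf,
        max_features=n_keep,
    )
    return make_pipeline(
        selector,
        StandardScaler(),
        # Project the retained attributes onto a few singular directions.
        TruncatedSVD(n_components=n_components, random_state=seed),
        # Linear kernel is an assumption; the abstract does not name one.
        SVC(kernel="linear"),
    )


# Synthetic stand-in for a patients x attributes matrix with a binary
# outcome label; the real SNP and expression data are not reproduced here.
X, y = make_classification(
    n_samples=120, n_features=500, n_informative=10, random_state=0
)

model = build_outcome_model()
# Selection is refit inside each fold, so held-out patients never
# influence which attributes are kept.
scores = cross_val_score(model, X, y, cv=5)
print("mean cross-validated accuracy:", scores.mean())
```

Placing the selection step inside the pipeline, rather than selecting attributes once on the full dataset, is a design choice of this sketch: with far more attributes than patients, selecting on all samples before cross-validation tends to give optimistic accuracy estimates.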