Integrative Analysis of Transcriptomic Data Applied to Prostate Cancer Metastasis
Abstract
Prostate cancer is the most common non-dermatological cancer among men in the developed world. The disease is manageable if detected early, and treatment is becoming highly individualized, placing emphasis on early detection and accurate prognosis. This thesis is concerned with the integrative analysis of gene expression data from prostate cancer, with the aim of revealing molecular signatures of metastasis and the mechanisms of disease progression.
Meta-analytic procedures are used to integrate three datasets and compare primary tumors based on metastatic outcome. Four datasets are also integrated to compare primary and metastatic tumor tissue types. This statistical integration provides a more robust and accurate characterization of gene expression signatures, and helps minimize microarray noise and study-specific effects. Multiple methods of integration are explored. Once the data are integrated, a subset of significantly differentially expressed transcripts is selected to form a tentative expression signature. A support vector machine is used to construct a predictive model of metastatic outcome based on the identified expression signature. Its performance is assessed using a nested cross-validation procedure and out-of-sample testing.
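As an illustration of how per-study evidence can be combined for a single transcript, the following minimal sketch implements Fisher's combined probability test. This is an assumption made for illustration only: the abstract does not state which integration methods the thesis explores, and the function name and example p-values are hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method (illustrative sketch)."""
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    # Fisher's statistic X = -2 * sum(ln p) is chi-square distributed with
    # 2k degrees of freedom under the null; we keep X / 2 for convenience.
    half_stat = -sum(math.log(p) for p in p_values)
    # For even degrees of freedom the chi-square upper tail has a closed form:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


# Hypothetical p-values for one transcript across three datasets.
combined = fisher_combined_p([0.04, 0.20, 0.01])
print(f"combined p-value: {combined:.4g}")
```

Fisher's method assumes the studies are independent and ignores the direction of the effect, so in practice it would be paired with a check that the transcript changes in the same direction in each dataset.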
Data integration and network analysis have proved to be useful tools in providing context to the complex systems studied in systems biology. This thesis makes use of a number of such techniques. Pathway enrichment analysis with DAVID and PathDIP was used to identify which biological pathways and related functions are most strongly associated with the signatures. Pathways related to the extracellular matrix were found to be significantly enriched in the metastatic outcome comparison. Integrating the signature gene lists with heterogeneous disease-gene interaction networks, using iCTnet and Cytoscape, revealed the relationships between the expression signatures and other cancers. Integrating the data with protein interaction datasets, using the I2D database and the Navigator network application, allowed a more robust comparison of the various integration methodologies and expression signatures than simple intersection of gene lists. The results of the comparisons agree well with previous findings.
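The idea behind pathway enrichment can be sketched with a plain hypergeometric over-representation test, which asks how likely it is to see at least the observed overlap between a signature and a pathway by chance. This is a generic illustration and not the exact statistic computed by DAVID or PathDIP; the function name and the gene counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_pathway, n_signature, n_overlap):
    """Upper-tail hypergeometric p-value, P(overlap >= n_overlap)."""
    if not 0 <= n_pathway <= n_background:
        raise ValueError("pathway size must be within the background size")
    if not 0 <= n_signature <= n_background:
        raise ValueError("signature size must be within the background size")
    max_overlap = min(n_pathway, n_signature)
    if not 0 <= n_overlap <= max_overlap:
        raise ValueError("overlap is larger than the pathway or the signature")
    # Number of ways to draw a signature of this size from the background.
    total = comb(n_background, n_signature)
    # Sum the probability of every overlap at least as large as the observed one.
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_signature - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 20000 background genes, a 150-gene pathway,
# a 300-gene signature, and 12 genes shared between them.
p = enrichment_p_value(20000, 150, 300, 12)
print(f"enrichment p-value: {p:.3g}")
```

When many pathways are tested at once, the resulting p-values would also need a multiple-testing correction before any pathway is called significantly enriched.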