Deep Cancer Classifier: Exploring microRNA Regulation of Cancer with Deep Learning

Authors

Pyman, Blake

Type

thesis

Language

eng

Keyword

Artificial Intelligence , Deep learning , miRNA , Cancer diagnosis , Bioinformatics , Transcriptomics

Abstract

Background: MicroRNAs (miRNAs) are small, non-coding RNAs that negatively regulate gene expression. Differential expression observed in miRNAs, combined with advancements in deep learning (DL), has the potential to improve cancer classification by modelling non-linear miRNA-phenotype associations. We propose a novel miRNA-based deep cancer classifier (DCC) incorporating genomic and hierarchical tissue annotation, capable of accurately predicting the presence of cancer in a wide range of human tissues.

Methods: miRNA expression profiles were analyzed for 2530 neoplastic and 5184 non-neoplastic samples, across 38 organ types involving 78 organ sub-structures and 173 cell types. Specificity of miRNA expression was explored in relation to tissue type and neoplasticity, adjusting for sampling bias using three levels of hierarchical annotation. A DL architecture composed of stacked autoencoders (AE) and a multi-layer perceptron (MLP) was trained to predict neoplasticity using 845 high-confidence miRNAs. Additional DCCs were trained using expression of miRNA cistrons and sequence families, and combined as a diagnostic ensemble. Predictive importance of miRNAs was measured using backpropagation, and the top miRNAs were analyzed in Cytoscape using iCTNet and BiNGO.

Results: Performance of the DCC was tested on an unseen, randomly selected set of 1511 samples. The model classified cancer with 94.73% accuracy, 98.6% AUC/ROC, 95.1% sensitivity, and 94.3% specificity. A concise assay of the 20 most predictive miRNAs achieved 85.0% accuracy, 93.3% AUC/ROC, 92.3% sensitivity, and 74.9% specificity.

Conclusion: Deep autoencoder networks are a powerful tool for modelling complex miRNA-phenotype associations in cancer. The proposed DCC improves classification accuracy by learning from the biological context of both samples and miRNAs, using genomic and anatomic annotation. Analyzing the trained DCC also provides estimates of miRNA importance for cancer prediction, which can be used for feature selection and for biological discovery by performing gene ontology searches on the most significant features.
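
The abstract reports accuracy, sensitivity, and specificity for the classifier. As a minimal sketch of how these three figures relate to one another, assuming binary labels in which 1 marks a neoplastic sample and 0 a non-neoplastic one, the Python below computes them from confusion-matrix counts. The function name `binary_metrics` and the toy labels are hypothetical and are not taken from the thesis; this is not the author's evaluation code.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, sensitivity, and specificity for 0/1 labels.

    Assumption for this sketch: 1 = neoplastic, 0 = non-neoplastic.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cancer, called cancer
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal, called normal
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal, called cancer
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cancer, called normal

    nan = float("nan")
    return {
        # Fraction of all samples classified correctly.
        "accuracy": (tp + tn) / len(pairs),
        # Fraction of neoplastic samples that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        # Fraction of non-neoplastic samples that were correctly cleared.
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


# Toy labels invented for illustration only (not data from the thesis).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Because accuracy pools both classes, it depends on the class balance of the test set, whereas sensitivity and specificity are each computed within one class. This is why a model can show high sensitivity alongside noticeably lower specificity, as in the 20-miRNA assay figures reported above.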

License

Queen's University's Thesis/Dissertation Non-Exclusive License for Deposit to QSpace and Library and Archives Canada
ProQuest PhD and Master's Theses International Dissemination Agreement
Intellectual Property Guidelines at Queen's University
Copying and Preserving Your Thesis
This publication is made available by the authority of the copyright owner solely for the purpose of private study and research and may not be copied or reproduced except as permitted by the copyright laws without written authority from the copyright owner.
Attribution-NonCommercial 3.0 United States
