Deep Cancer Classifier: Exploring microRNA Regulation of Cancer with Deep Learning
Abstract
Background: MicroRNAs (miRNAs) are small, non-coding RNAs that negatively regulate gene expression. Differential miRNA expression, combined with advances in deep learning (DL), has the potential to improve cancer classification by modelling non-linear miRNA-phenotype associations. We propose a novel miRNA-based deep cancer classifier (DCC) that incorporates genomic and hierarchical tissue annotation and accurately predicts the presence of cancer in a wide range of human tissues.
Methods: miRNA expression profiles were analyzed for 2530 neoplastic and 5184 non-neoplastic samples, spanning 38 organ types, 78 organ sub-structures, and 173 cell types. The specificity of miRNA expression was explored in relation to tissue type and neoplasticity, adjusting for sampling bias using three levels of hierarchical annotation. A DL architecture composed of stacked autoencoders (AE) and a multi-layer perceptron (MLP) was trained to predict neoplasticity from 845 high-confidence miRNAs. Additional DCCs were trained on the expression of miRNA cistrons and sequence families, and combined with the first into a diagnostic ensemble. The predictive importance of each miRNA was measured using backpropagation, and the top-ranked miRNAs were analyzed in Cytoscape using iCTNet and BiNGO.
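The abstract does not state exactly how backpropagation is turned into a per-miRNA importance score. One common choice, assumed here, is gradient saliency: backpropagate the predicted cancer probability to the input layer and average the absolute gradient for each input feature over a set of samples. The sketch below is a minimal pure-Python illustration of that idea. It replaces the stacked autoencoder and MLP with a single tanh hidden layer and a sigmoid output, and the weights, the miRNA names, and all function names are hypothetical.

```python
import math


def tanh_mlp_forward(x, W1, b1, w2, b2):
    """Forward pass of a toy one-hidden-layer network (stand-in for the DCC)."""
    # Hidden activations h_j = tanh(sum_i W1[j][i] * x[i] + b1[j]).
    h = [math.tanh(sum(wji * xi for wji, xi in zip(row, x)) + bj)
         for row, bj in zip(W1, b1)]
    # Output probability p = sigmoid(sum_j w2[j] * h_j + b2).
    logit = sum(wj * hj for wj, hj in zip(w2, h)) + b2
    p = 1.0 / (1.0 + math.exp(-logit))
    return h, p


def input_gradient(x, W1, b1, w2, b2):
    """Backpropagate p to the inputs: dp/dx_i = p(1-p) * sum_j w2[j](1-h_j^2)W1[j][i]."""
    h, p = tanh_mlp_forward(x, W1, b1, w2, b2)
    dp_dlogit = p * (1.0 - p)  # derivative of the sigmoid
    grad = [0.0] * len(x)
    for j, row in enumerate(W1):
        back = dp_dlogit * w2[j] * (1.0 - h[j] ** 2)  # derivative of tanh
        for i, wji in enumerate(row):
            grad[i] += back * wji
    return grad


def feature_importance(samples, W1, b1, w2, b2):
    """Mean absolute input gradient per feature over the given samples."""
    totals = [0.0] * len(samples[0])
    for x in samples:
        for i, gi in enumerate(input_gradient(x, W1, b1, w2, b2)):
            totals[i] += abs(gi)
    return [t / len(samples) for t in totals]


def top_k(importance, names, k):
    """Names of the k features with the largest importance scores."""
    order = sorted(range(len(names)), key=lambda i: importance[i], reverse=True)
    return [names[i] for i in order[:k]]


# Hypothetical toy weights and miRNA names (not from the thesis).
TOY_W1 = [[2.0, -0.5, 0.0], [1.0, 0.3, 0.0]]
TOY_B1 = [0.0, 0.0]
TOY_W2 = [1.5, -1.0]
TOY_B2 = 0.0
TOY_NAMES = ["miR-A", "miR-B", "miR-C"]

toy_importance = feature_importance([[0.0, 0.0, 0.0]], TOY_W1, TOY_B1, TOY_W2, TOY_B2)
print(top_k(toy_importance, TOY_NAMES, 2))  # prints ['miR-A', 'miR-B']
```

Ranking by mean absolute gradient is what makes a reduced panel possible: the highest-scoring features can be kept as a concise assay and the rest dropped. "miR-C" has zero incoming weights, so its importance is exactly zero.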
Results: Performance of the DCC was tested on an unseen, randomly selected set of 1511 samples. The model classified cancer with 94.73% accuracy, 98.6% AUC/ROC, 95.1% sensitivity, and 94.3% specificity. A concise assay of the 20 most predictive miRNAs achieved 85.0% accuracy, 93.3% AUC/ROC, 92.3% sensitivity and 74.9% specificity.
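The four reported metrics follow from comparing predictions with true labels on the held-out set. Accuracy, sensitivity, and specificity come from the confusion matrix at a decision threshold, while AUC depends only on how the predicted scores rank positive samples against negative ones. The following is a minimal pure-Python sketch. The 0.5 threshold, the toy labels and scores, and the function names are assumptions for illustration, not values or code from the thesis.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels where 1 = neoplastic."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def roc_auc(y_true, scores):
    """AUC as the Mann-Whitney statistic: the probability that a random
    positive outscores a random negative, with ties counted as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def classification_metrics(y_true, scores, threshold=0.5):
    """Accuracy, sensitivity, specificity, and AUC (assumes both classes present)."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "auc": roc_auc(y_true, scores),
    }


# Hypothetical toy labels and predicted cancer probabilities.
TOY_LABELS = [1, 1, 1, 0, 0, 0]
TOY_SCORES = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
print(classification_metrics(TOY_LABELS, TOY_SCORES))
```

On the toy data, one positive falls below the threshold and one negative falls above it, so accuracy is 4/6 and sensitivity and specificity are both 2/3. AUC is 8/9 because only the 0.4 positive is outranked, by the 0.6 negative. The pairwise loop is quadratic, which is acceptable at the scale of a 1511-sample test set.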
Conclusion: Deep autoencoder networks are a powerful tool for modelling complex miRNA-phenotype associations in cancer. The proposed DCC improves classification accuracy by learning from the biological context of both samples and miRNAs, using genomic and anatomic annotation. Analyzing the trained DCC also yields estimates of each miRNA's importance for cancer prediction. These estimates can guide feature selection and support biological discovery through gene ontology searches on the most significant features.
URI for this record
http://hdl.handle.net/1974/25856