Classifying and Understanding Cancer Through microRNA-Based Deep Learning
Deep Learning, microRNA, Cancer Classification
Accurate cancer classification is essential for improved mechanistic understanding, prognostication, and treatment selection. Advances in next-generation sequencing have allowed large molecular datasets to be acquired rapidly. These datasets can be used to increase our understanding of disease through careful extraction and interpretation of molecular information. MicroRNAs (miRNAs) are small RNA molecules that negatively regulate gene expression, and their dysregulation is a common disease mechanism in many cancers. While a number of mRNA and protein biomarkers have been suggested for certain cancers, there has been little translation of miRNA biomarkers to a clinical setting. A clearer understanding of miRNA dysregulation in cancer could lead to improved mechanistic knowledge and better treatments. However, there is currently a paucity of computational methods that can handle large next-generation sequencing datasets for improved cancer classification and understanding. In this work, we present three distinct deep learning architectures designed to classify cancer and extract meaningful information from miRNA profiles: a Deep Cancer Classifier (DCC), a Deep Neural Map (DNM), and a Graph Transformer Network (GTN). First, we use the DCC to distinguish neoplastic from non-neoplastic tissue of the same origin. The DCC is compared to machine learning and feature selection algorithms to demonstrate the advantages of miRNA-based deep learning. Through our feature selection algorithm, we also identify miRNA cancer biomarkers that differ distinctly between neoplastic and non-neoplastic tissue (e.g., miR-375 in breast cancer). Next, this work is extended to unsupervised classification with the DNM, which uses only similarities between miRNA profiles to stratify cancer by neoplasticity status and tissue-of-origin.
Activation gradients are used to identify miRNA biomarkers and to interpret sample misclassifications, revealing clusters of cancer subtypes and potential higher-grade tumours. Finally, the GTN is employed to analyze the targeting relationships between miRNAs and messenger RNAs (mRNAs). The network's attention identifies both miRNA and mRNA biomarkers, as well as important targeting pathways in cancer. By achieving high classification accuracy and identifying both well-known and novel biomarkers, we show the importance of analyzing molecular data through deep learning.