Supervised Dimensionality Reduction of Mass Spectrometry Images
Authors
Ritcey, Emma
Date
Type
thesis
Language
eng
Keyword
machine learning, dimensionality reduction, supervised learning, mass spectrometry, mcml
Alternative Title
Abstract
In machine learning, high-dimensional data can contain hundreds or thousands of features. Within such data, a small amount of meaningful information may be interspersed among an abundance of potentially irrelevant information, which raises the question of how the important information can be extracted. Dimensionality reduction reduces the number of features, or dimensions, in a dataset to those determined to be important, and can be an essential step in extracting meaningful insights from high-dimensional data. One way to achieve this is to project the data onto a low-dimensional subspace that represents a combination of the important features.
Dimensionality reduction is commonly performed using unsupervised methods, but these methods may not project the data so that relationships of interest are preserved. Supervised dimensionality reduction uses the data labels to guide the algorithm and project the data to a low-dimensional subspace while maintaining relationships that distinguish groups of interest.
In this work, we performed supervised dimensionality reduction of high-dimensional mass-spectrometry data for maximal separation of data that had distinct labels. We assessed the quality of the dimensionality reduction by measuring the separability of the projected data using statistical tests and two methods of supervised classification. We compared the performance of our proposed methods to a conventional unsupervised dimensionality-reduction method. We found that supervised dimensionality reduction produced projections that were more linearly separable than the conventional projection. Our results suggested that supervised dimensionality reduction can be a useful projection step, prior to supervised classification, for machine learning of high-dimensional data.
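The contrast between unsupervised and supervised projection can be illustrated with a minimal sketch. It is not the method used in the thesis: Fisher's linear discriminant stands in as a standard label-guided projection, and the leading principal axis stands in for a conventional unsupervised projection, which the abstract does not name. The toy two-dimensional data, the function names, and the separation score (squared difference of projected class means divided by the summed within-class scatter) are all assumptions made for illustration.

```python
import math

def mean(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def scatter(points, center):
    """Entries (sxx, sxy, syy) of the symmetric 2x2 scatter matrix about center."""
    sxx = sum((p[0] - center[0]) ** 2 for p in points)
    sxy = sum((p[0] - center[0]) * (p[1] - center[1]) for p in points)
    syy = sum((p[1] - center[1]) ** 2 for p in points)
    return sxx, sxy, syy

def pca_direction(points):
    """Unsupervised: leading eigenvector of the pooled scatter matrix (labels ignored)."""
    sxx, sxy, syy = scatter(points, mean(points))
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)  # major-axis angle of a symmetric 2x2 matrix
    return (math.cos(theta), math.sin(theta))

def fisher_direction(class0, class1):
    """Supervised: w proportional to inverse(S_w) @ (m1 - m0), S_w = within-class scatter."""
    m0, m1 = mean(class0), mean(class1)
    a0, b0, c0 = scatter(class0, m0)
    a1, b1, c1 = scatter(class1, m1)
    a, b, c = a0 + a1, b0 + b1, c0 + c1
    det = a * c - b * b  # assumed non-zero for this toy data
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    wx = (c * dx - b * dy) / det
    wy = (-b * dx + a * dy) / det
    norm = math.hypot(wx, wy)
    return (wx / norm, wy / norm)

def project(points, w):
    """Project 2-D points onto the 1-D subspace spanned by unit vector w."""
    return [p[0] * w[0] + p[1] * w[1] for p in points]

def separation(class0, class1, w):
    """Squared gap between projected class means over summed within-class scatter."""
    z0, z1 = project(class0, w), project(class1, w)
    mu0, mu1 = sum(z0) / len(z0), sum(z1) / len(z1)
    s0 = sum((z - mu0) ** 2 for z in z0)
    s1 = sum((z - mu1) ** 2 for z in z1)
    return (mu1 - mu0) ** 2 / (s0 + s1)

# Hypothetical toy data: large spread along x, but the labels differ along y.
class0 = [(-4, 0.1), (-2, -0.1), (0, 0.0), (2, 0.1), (4, -0.1)]
class1 = [(-4, 1.1), (-2, 0.9), (0, 1.0), (2, 1.1), (4, 0.9)]

w_pca = pca_direction(class0 + class1)
w_lda = fisher_direction(class0, class1)
score_pca = separation(class0, class1, w_pca)
score_lda = separation(class0, class1, w_lda)

print("unsupervised direction:", w_pca, "separation:", score_pca)
print("supervised direction:  ", w_lda, "separation:", score_lda)
```

In this toy setting the unsupervised direction follows the axis of greatest variance, which carries almost no label information, while the label-guided direction aligns with the axis along which the two groups differ. Real mass-spectrometry data has far more dimensions and would need a numerical linear algebra library rather than closed-form 2x2 formulas.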
Description
Citation
Publisher
License
Queen's University's Thesis/Dissertation Non-Exclusive License for Deposit to QSpace and Library and Archives Canada
ProQuest PhD and Master's Theses International Dissemination Agreement
Intellectual Property Guidelines at Queen's University
Copying and Preserving Your Thesis
This publication is made available by the authority of the copyright owner solely for the purpose of private study and research and may not be copied or reproduced except as permitted by the copyright laws without written authority from the copyright owner.