School of Computing Graduate Theses
Recent Submissions
Item: Towards Interpretable Feature Maps for Visualizing Prostate Cancer in Ultrasound Data
Computing; Mousavi, Parvin
Prostate cancer is the second most common cancer worldwide, but has a high long-term survival rate if detected early. The clinical standard for definitive diagnosis is histopathological analysis of prostate tissue obtained during trans-rectal ultrasound-guided core biopsy. Conventional ultrasound (US) has low sensitivity, so prostate biopsies are extracted systematically from pre-defined anatomical locations. As a result, this procedure has a high false negative rate. A targeted biopsy method, where US would be used for detection in addition to navigation, would be beneficial as this modality is safe, inexpensive, and accessible to clinicians. Micro-US has recently been proposed to improve US-based detection of prostate cancer. It has a higher frequency than conventional US and can visualize histological architectures associated with prostate cancer at a much higher resolution. Deep learning methods have also been proposed for prostate cancer detection with promising results. However, the underlying decisions these models make to arrive at a prediction are unclear, and interpretability of these models is required to foster clinical trust. To contribute to the overarching goal of creating an accurate US-based prostate cancer detection method, we propose using deep learning to create interpretable feature maps for visualizing prostate cancer in micro-US data. To do this, we extracted high-dimensional features from a deep learning model trained on multi-center data to classify cancer in micro-US images. We then conducted experiments using different dimensionality reduction techniques to visualize the classifier latent space in a way that is interpretable to humans. The resulting feature maps were compared and examined to analyze patterns occurring within the latent space.
After comparing the feature maps resulting from the different dimensionality reduction techniques, it was determined that the low-dimensional embeddings generated by UMAP could successfully distinguish between benign and cancer micro-US features. Most significantly, the UMAP embedding identified distinct clusters associated with Gleason grades, a measure of prostate cancer severity, despite the original classifier being trained on binary labels. These maps suggest that the model may be learning histopathologically informative representations in the latent space, similar to those observed by pathologists when diagnosing prostate cancer.

Item: Self-supervised learning and uncertainty estimation for surgical margin detection with mass spectrometry
Syeda, Ayesha; Computing; Mousavi, Parvin; Fichtinger, Gabor
Breast cancer represents 25% of all new cancer cases and is the second leading cause of death from cancer in Canadian women. The preferred treatment for breast cancer patients is breast conserving surgery, which aims to minimize the benign tissue removed while removing all of the tumor. The iKnife, which uses rapid evaporative ionization mass spectrometry (REIMS) to provide real-time feedback on tissue type during surgery, has shown promise in reducing the likelihood of incomplete resection. However, the heterogeneity of cancer tissue, small dataset size, and coarse labels for the REIMS data present challenges for machine learning models. This thesis aims to develop robust, uncertainty-aware, and generalizable machine learning cancer classification models for the iKnife. To address the challenges of heterogeneity and coarse labels, the thesis explores uncertainty estimation and self-supervised learning. We apply uncertainty estimation to REIMS data and analyze the uncertainty calibration of the models as well as their computational cost.
We also pre-train self-supervised deep networks on Basal Cell Carcinoma data and fine-tune the network on breast data, combining self-supervised learning with uncertainty estimation.

Item: Swin PoseFormer, Efficient Skeleton Based Human Activity Recognition
Qi, Haoran; Computing; Zulkernine, Farhana
Human Activity Recognition (HAR) has undergone significant advancements lately. HAR involves identifying a person's movements by analyzing video or sensor data. Deep learning techniques, including convolutional neural networks, recurrent neural networks, and graph convolutional neural networks, have demonstrated the ability to automatically learn features and achieve state-of-the-art results. Dynamic skeletal data, represented as the 2D/3D coordinates of human joints, has been widely studied for human action recognition due to its high-level semantic information and environmental robustness. Many skeleton-based action recognition methods adopt graph convolutional networks (GCN) to extract features from human skeleton data. Despite the positive results achieved by existing work in the literature, GCN-based methods are subject to limitations in robustness, interoperability, and scalability. Meanwhile, Transformer models have shown great success in modeling long-range interactions. Existing research has shown that the self-attention mechanism is promising for solving video processing tasks. In this thesis, we propose a novel network, "Swin PoseFormer", which relies on a Swin Transformer backbone to process a 3D heatmap stack instead of a graph sequence as the base representation of human skeletons. Moreover, we propose a novel human pose generation pipeline, "Skeletrack", which takes RGB video as input and produces 3-dimensional skeleton data. Our Skeletrack pipeline is able to handle fast-moving objects with blurring issues by filling in the missing detections with estimated bounding boxes based on the target's spatial-temporal information.
We test our Swin PoseFormer model on the FineGym dataset and demonstrate that our model outperforms the state-of-the-art TSM, with top-1 and top-5 accuracy of 84.32% and 98.45% respectively.

Item: Fuzzing Self-Described Structures
Abols, Kathleen; Computing; Dean, Thomas
Legacy formats are pervasive in digital spaces due to the need to read older data. Fuzzing offers a way to proactively identify errors and vulnerabilities but can be computationally expensive when undirected. A method of directing fuzzing is to generate or mutate data based on a grammar to narrow the scope of inputs. In this thesis, we present our approach for parsing and generating data for self-defining data formats that include elements of their own grammar using a mixed data-type file format. Our research focuses on maritime cyber security, specifically S-57 naval charts built on the self-defining file specification ISO/IEC 8211. We define an approach to parse ISO/IEC 8211 and leverage generic parsing tools to create a framework for mutating S-57 charts. Our framework, ParseENC, makes both low-level syntactic and high-level semantic mutations to chart files to cause erroneous behaviour in maritime navigation software. As opposed to causing crashes, our focus is on generating malformed charts that are syntactically correct, but incorrect on a semantic level that is harder for the target system to automatically detect. Our research explores mutating charts at both the syntactic level and the higher semantic level. The results include two instances where we triggered program crashes and found a bug in OpenCPN. Another low-level change caused unexpected rendering behaviour. Of the high-level changes, we explored various ways of breaking semantic rules without preventing the charts from being loaded.
We additionally implemented fuzzing for geometric data, which allowed us to add a level of randomness to our experiments while adhering to desired semantic rules and other chosen constraints.

Item: Coordination Practices for Software Quality Assurance Activities in Open-source Software Ecosystems
Lin, Jia-Huei; Computing; Hassan, Ahmed E. Jr; Adams, Bram Jr
Open-source software ecosystems continue to gain popularity and significant importance. A software ecosystem consists of tens of thousands of software projects with complex relations among them. Users can install these projects in any combination. Due to the complex relations among the projects and the diversity of combined installations of such projects, coordination between developers is necessary to ensure the quality of both their own projects and the entire ecosystem. However, a software ecosystem usually lacks guidelines that help developers coordinate software quality assurance activities and ensure the quality of each software project. In this thesis, we leverage data from large-scale software ecosystems, i.e., Linux and WordPress, in an effort to gain a better understanding of the current coordination practices for software quality assurance. In particular, we examine four areas of coordination activities for software quality assurance in software ecosystems: upstream bug coordination, vulnerability coordination, vulnerability fixing and disclosure coordination, and release coordination of co-evolving software projects, all within an ecosystem. We discuss the motivation and approach for studying these four areas of coordination activities and perform empirical studies on the software ecosystems. Our results suggest the need for automated tools to track upstream bug coordination to facilitate in-depth investigation. Developers across software ecosystems coordinate to develop a vulnerability fix but work in parallel afterward.
Co-evolving software projects need coordination mechanisms as they interfere with each other due to shared resources.