School of Computing Graduate Theses
Recent Submissions
Proximity Effects in Multiplexed Immunohistochemistry Images of Breast Cancer
(2024-09-05) Yach, Evelyn; Computing; Ellis, Randy
The tumor-immune microenvironment is an intricate ecosystem of cells with complex interactions that are poorly understood. Mass-cytometry imaging is an immunohistochemistry technique that captures high-resolution images of protein concentrations at a sub-cellular level. Despite being a rich data source, mass-cytometry imaging has not been deeply researched, due in part to computational and interpretability hurdles. In this work, our goal was to identify the role that physical proximity plays in the interactions between immune and tumor cells by means of protein expression. Using a publicly available dataset acquired by mass-cytometry imaging of tumor samples collected from patients with an aggressive form of breast cancer, we used dimensionality reduction and linear classification to distinguish between binary groups of cells labelled by their proximity to each other. On a patient-specific basis, we found significant differences in protein expression between immune and tumor cells, as well as differences between contact and non-contact cells within the immune and tumor groups. The population-based differences were smaller, consistent with cancer being a heterogeneous disease. In both cases, we achieved high data explainability with a minimal set of protein information. Our findings suggested that there are differences in protein expression for these cell types, which could contribute to the characterization of interactions between cancer and immune cells within our cohort.

DepthPulse: A Passive Liveness Detection Framework for Face Presentation Attacks
(2024-09-05) Sadman, Nafiz; Computing; Alaca, Furkan; Zulkernine, Farhana
Face Presentation Attacks (FPA) are a growing concern for face authentication systems. In an FPA, attackers present a representation of the authorized user's face to the camera for authentication.
FPAs can be mounted using different media, such as printed photos (photo attacks), images or videos displayed on a device (video attacks), or face masks (mask attacks). The media used to implement these attacks are called Face Presentation Attack Instruments (F-PAI). There are numerous Face Presentation Attack Detection (F-PAD) methods, each individually designed to defend against most or all types of F-PAI. In this thesis, we first review a few existing F-PAD methods and perform a qualitative evaluation based on the published literature to create a taxonomic mapping of F-PAD to F-PAI. Then, we propose DepthPulse, an ensemble framework that combines two F-PAD methods: depth estimation and remote photoplethysmography (rPPG) signal processing. Our contributions are three-fold: i) we identify preprocessing techniques that enhance the depth-based liveness detection method; ii) we apply discrete Fourier transform methods to rPPG-based liveness detection; and iii) with DepthPulse, we reduce the Average Classification Error Rate (ACER) by 5% for Protocol 1, 14% for Protocol 2, 8% for Protocol 3, and 0.7% for Protocol 4 of the OULU-NPU dataset compared to the results from the state-of-the-art F-PAD method.

Classification and Analysis of Osteoarthritis Using Unstructured Chart Notes and Structured Data from EMRs
(2024-08-29) Cai, Jiahao; Computing; Zulkernine, Farhana
Osteoarthritis (OA) is a very common musculoskeletal condition defined by the progressive degradation of joint cartilage and the underlying bone, resulting in pain and functional limitations. Timely intervention and effective management of OA rely heavily on efficient and accurate diagnosis. Electronic Medical Records (EMR) in primary care settings contain patients' structured historical data as well as unstructured text in the encounter chart notes.
The unstructured notes are often very long, compiled from multiple patient-physician encounters, and contain medical jargon and personal data. These data pose a variety of computational challenges but contain valuable information for disease diagnosis, especially for detecting hip or knee OA. We applied methods including information extraction, feature selection, and word embeddings, and developed analytical pipelines using statistical machine learning (ML) and deep learning (DL) algorithms to detect the OA-affected joints (knee and hip OA, knee OA, hip OA, and other OA). Our methodology incorporated a range of text encodings, including TF-IDF, Word2Vec, GloVe, FastText, and Bidirectional Encoder Representations from Transformers (BERT), to represent textual data as vectors. We presented a variety of statistical ML and DL models, including Extreme Gradient Boosting (XGBoost), Random Forest (RF), Support Vector Machines (SVM), Multilayer Perceptrons (MLP), Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM) networks, Bidirectional LSTMs (Bi-LSTM), CNN-Bi-LSTM, and BERT. We trained these models with encoded text data and evaluated whether the integration of paragraph extraction, duplicate removal, and negation tagging could improve OA classification performance. We first trained and evaluated a Bi-LSTM model with FastText on the gold-labelled data; it achieved the highest F1 score and accuracy among all models, 87.13% and 87.97%, respectively. The trained Bi-LSTM model then predicted OA labels for unlabelled data to produce pseudo-labels, and we combined the pseudo-labelled and gold-labelled data to train a new Bi-LSTM model.
With this Bi-LSTM model, our proposed method achieved an F1 score of 88.23% and an accuracy of 88.84%.

On the Biases, Privacy Implications and Guardrail Viability of Large Language Models
(2024-08-29) Dada, Muhammed Yusuf; Computing; Stinson, Catherine
The advent of Generative AI, particularly marked by the release of OpenAI’s ChatGPT in November 2022, has ushered in a new era of AI development, with Large Language Models (LLMs) becoming increasingly prominent in various domains, including Natural Language Processing, Understanding, and Generation. While LLMs have been rapidly adopted by the general public, industry, and academia for tasks such as question-answering, text generation, and machine translation, they also pose significant challenges, including privacy leakage and biased outputs. This thesis addresses these challenges through a series of novel studies aimed at evaluating and mitigating undesirable behaviors in LLMs. The first study assesses nationality bias in several recent open-source LLMs, a relatively unexplored area compared to the well-studied racial biases. This investigation underscores the importance of addressing biases related to country or nationality, which is a common data attribute in many applications and workflows. The second study presents an empirical evaluation and comparison of multiple jailbreaking prompts across various LLMs, focusing on inducing privacy leakage and bypassing guardrails. This research highlights the vulnerabilities of these models and the effectiveness of different prompting techniques in steering the models to generate sensitive information. Lastly, the third study explores name-based biases within the context of crime association, providing new insights into how LLMs associate racially associated names with criminal activities, thereby exposing potential risks in decision-making applications involving LLMs.
Through these contributions, this thesis offers an examination and evaluation of biases, privacy-leaking tendencies, and guardrail effectiveness in LLMs, situating these findings within the broader context of AI ethics research. The results emphasize the need for continued efforts to improve the safety and fairness of AI systems as they become increasingly integrated into everyday applications.

Machine Learning for Natural-Product Discovery
(2024-08-29) Reed, Georgia Helena Brown; Computing; Ellis, Randy E
Antimicrobial drug-resistant pathogens pose a serious global threat, with millions projected to die each year by 2050. Natural products, namely secondary metabolites, play a crucial role in developing antimicrobial drugs. There is an urgent need for new antimicrobial drugs, which can be discovered by cultivating fungi under various conditions to stimulate the production of such secondary metabolites. This study examined mass-spectrometry data of Penicillium fungi grown under 13 different sub-conditions, using Principal Component Analysis (PCA) and sparse PCA to find sub-condition differences. Sparse PCA was used to select mass-to-charge (m/z) bins that corresponded to potentially significant molecules, including secondary metabolites. Both PCA and sparse PCA effectively separated the growth sub-conditions, with sparse PCA providing some insights into the m/z bins that differentiated growth sub-conditions. This study demonstrated that growth sub-conditions such as the type of light may have induced the fungi to produce interesting secondary metabolites, and that changes in nitrogen concentration significantly affected the m/z bin selections. Sparse PCA revealed notable trends in the selected m/z bins. Future research will focus on investigating other uses of machine learning with mass spectrometry for potential biological applications.
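
The idea of using sparse PCA to select m/z bins can be illustrated with a minimal sketch. The abstract does not say which sparse PCA algorithm or software was used, so this example substitutes truncated power iteration, one simple way to obtain a leading component with a fixed number of non-zero loadings. The data are synthetic, and every name (`covariance`, `sparse_leading_loading`, `spectra`, `N_BINS`) is hypothetical.

```python
import math
import random


def covariance(rows):
    """Sample covariance matrix of rows (samples x features)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def sparse_leading_loading(cov, k, iters=100):
    """Truncated power iteration: a unit loading vector with at most k non-zeros.

    This is a stand-in for sparse PCA, not the method used in the thesis.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d  # start from a uniform unit vector
    for _ in range(iters):
        # Power step: multiply by the covariance matrix.
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        # Truncation step: keep only the k largest-magnitude entries.
        keep = set(sorted(range(d), key=lambda i: abs(w[i]), reverse=True)[:k])
        w = [w[i] if i in keep else 0.0 for i in range(d)]
        # Renormalise to unit length.
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Synthetic "spectra" (an assumption, not thesis data): 8 m/z bins,
# two growth conditions that differ only in bins 2 and 5.
rng = random.Random(0)
N_BINS = 8
spectra = []
for condition in (0, 1):
    for _ in range(20):
        row = [rng.gauss(0.0, 0.5) for _ in range(N_BINS)]
        if condition == 1:
            row[2] += 5.0
            row[5] += 5.0
        spectra.append(row)

loading = sparse_leading_loading(covariance(spectra), k=2)
selected_bins = [i for i, x in enumerate(loading) if x != 0.0]
print("selected bins:", selected_bins)
```

In this toy setting the non-zero loadings fall on the two bins that separate the conditions, which is the sense in which a sparse component "selects" m/z bins. Ordinary PCA would instead assign a non-zero weight to every bin.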