Automatic Detection of Metastatic Diseases from Radiology Reports Using Pre-trained Large Language Models
Authors
Ashofteh Barabadi, Maede
Date
2024-10-02
Type
thesis
Language
eng
Keyword
Natural Language Processing, Large Language Models, Generative Data Augmentation
Abstract
Artificial Intelligence (AI) has been instrumental in automating processes across various domains, resulting in increased productivity, fewer human errors, and reduced labour costs. In healthcare specifically, AI-driven automation holds particular promise: it can ease staff shortages and improve patient outcomes. To bring these benefits into practice, AI systems must be effective, reliable, and tailored to the specific challenges of healthcare applications. This dissertation focuses on automating the identification of metastatic disease in cancer patients by applying recent advances in Natural Language Processing (NLP) while carefully identifying and addressing the practical challenges and limitations of clinical settings.
Our proposed solution uses a pre-trained Language Model (LM), fine-tuned on radiology reports that human experts have annotated for the presence of metastatic disease in specific organs. The effectiveness of this approach is initially constrained by the limited availability of labelled data, and we address this data scarcity in three ways. First, a large unlabelled dataset is automatically annotated to expand the training corpus. Despite the inherent noise introduced by automated labelling, our experimental results demonstrate the substantial benefits of this expanded dataset. Second, Parameter-Efficient Fine-Tuning (PEFT) techniques are applied; compared to traditional fine-tuning, they improve the LM’s performance in low-data scenarios while also being more computationally efficient. Finally, synthetic data generation is used for data augmentation: an instruction-tuned Large Language Model (LLM) is prompted to generate high-quality clinical text similar to the existing samples, without any task-specific or domain-specific training.
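As an illustration of the prompt-based augmentation step, the following minimal sketch assembles a few-shot prompt from existing labelled reports. It is not the prompt used in the thesis: the function name `build_augmentation_prompt`, the prompt wording, and the label string are all hypothetical, and the example texts are placeholders rather than clinical data. Sending the prompt to an instruction-tuned LLM is left out.

```python
from typing import List


def build_augmentation_prompt(examples: List[str], label: str, n_new: int = 1) -> str:
    """Assemble a few-shot prompt asking an LLM for new reports with a given label.

    Hypothetical helper: the wording below is an assumption, not the thesis's prompt.
    """
    if not examples:
        raise ValueError("at least one example report is required")
    lines = [
        f"Below are {len(examples)} radiology reports labelled '{label}'.",
        f"Write {n_new} new report(s) in the same style and with the same label.",
        "",
    ]
    # Number each existing sample so the model can see where one report ends.
    for i, report in enumerate(examples, start=1):
        lines.append(f"Example {i}:")
        lines.append(report.strip())
        lines.append("")
    lines.append("New report:")
    return "\n".join(lines)


# Placeholder texts only; no real patient data.
prompt = build_augmentation_prompt(
    ["Placeholder report text A.", "Placeholder report text B."],
    label="liver metastasis: present",
)
```

The resulting string would be passed to whichever instruction-tuned LLM is available. Generated reports inherit the label named in the prompt, so they can be added to the training set without further manual annotation, subject to quality checks.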
Additionally, we explore the crucial role of patient history in accurately detecting metastatic disease, as radiology reports often emphasize changes relative to previous findings rather than listing all observations explicitly. The model architecture is modified to incorporate historical radiology reports, enabling a more context-aware prediction process. Experimental findings underscore the importance of integrating historical information, demonstrating its positive impact on annotation accuracy. Overall, this research presents a cost-effective, high-performance solution for identifying metastatic sites in cancer patients through the analysis of their radiology reports, which enables large-scale, spatiotemporal analyses of cancer progression. Our methods have the potential to extend to other clinical tasks with similar settings.
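To make the idea of history-aware input concrete, the sketch below combines a patient's most recent prior reports with the current report into a single text input. This is a simple input-level approach chosen for illustration; the abstract describes a modification to the model architecture, which may work differently. The function name `build_history_input`, the `PRIOR`/`CURRENT` tags, the separator string, and the default of two prior reports are all assumptions.

```python
from typing import List, Tuple


def build_history_input(
    current: str,
    history: List[Tuple[str, str]],
    max_prior: int = 2,
    sep: str = " [SEP] ",
) -> str:
    """Join the most recent prior reports (oldest first) with the current report.

    `history` holds (ISO date, report text) pairs; ISO dates sort chronologically
    as plain strings. Hypothetical helper, not the thesis's architecture.
    """
    ordered = sorted(history, key=lambda item: item[0])
    # Guard against max_prior == 0, since ordered[-0:] would keep everything.
    recent = ordered[-max_prior:] if max_prior > 0 else []
    parts = [f"PRIOR ({date}): {text.strip()}" for date, text in recent]
    parts.append(f"CURRENT: {current.strip()}")
    return sep.join(parts)


# Placeholder report texts only.
combined = build_history_input(
    "D",
    [("2023-05-01", "B"), ("2023-01-10", "A"), ("2023-09-15", "C")],
)
```

With the placeholder data above, the two most recent prior reports (B and C) precede the current report D, so a classifier reading the combined text can interpret phrases such as "unchanged from prior" against the earlier findings.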
License
Queen's University's Thesis/Dissertation Non-Exclusive License for Deposit to QSpace and Library and Archives Canada
ProQuest PhD and Master's Theses International Dissemination Agreement
Intellectual Property Guidelines at Queen's University
Copying and Preserving Your Thesis
This publication is made available by the authority of the copyright owner solely for the purpose of private study and research and may not be copied or reproduced except as permitted by the copyright laws without written authority from the copyright owner.
