A Privacy-Preserving Analytics Pipeline for De-Identified Primary Care Data
Authors
Pepin, Ian
Date
Type
thesis
Language
eng
Keyword
De-identification, Natural Language Processing, Protected Health Information, PHI, Secure Multi-Party Computation, MPC, Secret Sharing, Private Set Intersection, PSI, Privacy
Alternative Title
Abstract
Data breaches in the healthcare industry are at an all-time high. The average cost of a healthcare data breach reached US$10.10 million in 2022, the highest among all industries for the 12th consecutive year [106]. Although healthcare is one of the more highly regulated industries, initial attack vectors such as phishing, compromised credentials, and insider threats are at the root of many breaches today. Vulnerability to these attack vectors can be dangerous for individuals or organizations that share medical data with others. This research addresses the challenges in the secure sharing and processing of clinical text data. The research objectives include the evaluation and comparison of de-identification tools for clinical notes, and the assessment of Secure Multi-Party Computation (MPC) protocols and frameworks for performing computations on encrypted medical data.
The thesis makes several contributions in the area of secure analytics of sensitive data. First, we compare the features and performance of five state-of-the-art de-identification tools for free-text clinical notes, highlighting the strengths and weaknesses of each. Next, we propose a de-identification pipeline that removes most of the manual work associated with this type of task. Finally, we build a solution based on MPC, specifically Secret Sharing, that allows multiple parties to jointly evaluate functions on their encrypted inputs without revealing the unencrypted data to anyone. We evaluate the performance of this framework against the same framework applied to the analysis of unencrypted medical data. The contributions of this thesis benefit researchers and medical professionals by demonstrating the feasibility of our proposed methods in privacy-preserving, secure data analytics.
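As a rough illustration of the Secret Sharing idea mentioned in the abstract, the sketch below shows additive secret sharing used to compute a joint sum. It is not the framework evaluated in the thesis; the modulus `PRIME`, the helper names `share`, `reconstruct`, and `secure_sum`, and the example clinic counts are all assumptions made for illustration. All parties are simulated in a single process, with no networking and no protection against malicious parties.

```python
import secrets

# Assumed modulus for the illustration: a Mersenne prime comfortably larger
# than any value being summed.
PRIME = 2**61 - 1


def share(value, n_parties):
    """Split `value` into n additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares):
    """Recombine additive shares into the original value."""
    return sum(shares) % PRIME


def secure_sum(private_inputs):
    """Simulate n parties computing the sum of their private inputs."""
    n = len(private_inputs)
    # Each party splits its own input into n shares, one destined for each party.
    distributed = [share(v, n) for v in private_inputs]
    # Party j locally adds up the j-th share it received from every party.
    partial_sums = [sum(row[j] for row in distributed) % PRIME for j in range(n)]
    # Only the partial sums are combined; no raw input is ever pooled.
    return reconstruct(partial_sums)


# Hypothetical example: three clinics each hold a private patient count.
clinic_counts = [120, 75, 310]
total = secure_sum(clinic_counts)
```

Each individual share is a uniformly random value modulo `PRIME`, so a party holding fewer than all shares of an input learns nothing about it; only the combined partial sums reveal the total.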
Description
Citation
Publisher
License
Queen's University's Thesis/Dissertation Non-Exclusive License for Deposit to QSpace and Library and Archives Canada
ProQuest PhD and Master's Theses International Dissemination Agreement
Intellectual Property Guidelines at Queen's University
Copying and Preserving Your Thesis
This publication is made available by the authority of the copyright owner solely for the purpose of private study and research and may not be copied or reproduced except as permitted by the copyright laws without written authority from the copyright owner.
Attribution 4.0 International