Toward Self-Supervised and Privacy-Preserving Remote Heart Rate Estimation from Facial Videos

Gupta, Divij
Deep Learning, Remote Heart Rate, Self-supervised Learning, Privacy-preserving, Photoplethysmogram, RPPG, Contrastive Learning, Smart Environments
Advances in deep learning have made remote heart rate (HR) estimation increasingly feasible in recent years. A popular approach is remote photoplethysmography (rPPG), which uses computer vision techniques to measure volumetric changes in blood flow, from which HR can then be estimated. Although modern deep learning solutions for rPPG estimation face several challenges, this thesis focuses on two major problems. The first is the reliance on large amounts of labeled data for effective training. The second is the privacy concern raised by remote HR estimation, which stems from its use of facial videos. To reduce the reliance of video representation learning on labeled data and to improve performance, we introduce a solution based on self-supervised contrastive learning for remote HR monitoring, which uses various augmentations of the original input videos to learn robust spatiotemporal video representations. We propose using three spatial and three temporal augmentations to train an encoder through our contrastive framework, followed by fine-tuning of the encoder for rPPG and HR estimation. Our experiments on two publicly available datasets, COHFACE and PURE, show that the proposed approach improves over several related works as well as supervised learning baselines, with results approaching the state of the art. We also perform thorough experiments on the effects of different design choices, such as the video representation learning method and the augmentations used in the pre-training stage. Finally, we demonstrate that the proposed method is more robust than supervised learning approaches when the amount of labeled data is reduced.
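As a rough illustration of the contrastive pre-training idea described above, the following sketch pairs two augmented views of each video clip and scores them with an NT-Xent (normalized temperature-scaled cross-entropy) loss. This is a sketch under stated assumptions, not the thesis implementation: the abstract does not name the specific loss or the six augmentations, so `horizontal_flip`, `temporal_shift`, `nt_xent_loss`, and the `temperature` value are all hypothetical choices made for illustration.

```python
import numpy as np


def horizontal_flip(clip):
    """Illustrative spatial augmentation: mirror each frame left to right.

    clip has shape (T, H, W, C). The thesis's actual augmentations are not
    listed in the abstract; this one is an assumption.
    """
    return clip[:, :, ::-1, :]


def temporal_shift(clip, offset):
    """Illustrative temporal augmentation: cyclically shift the frame order."""
    return np.roll(clip, offset, axis=0)


def nt_xent_loss(z_a, z_b, temperature=0.1):
    """NT-Xent loss over N positive pairs of embeddings, each of shape (N, D).

    Row i of z_a and row i of z_b are embeddings of two augmented views of
    the same clip; every other row in the batch serves as a negative.
    """
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = (z @ z.T) / temperature
    np.fill_diagonal(sim, -np.inf)  # a sample is never its own negative

    # Index of the positive partner for each of the 2N rows.
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(
        np.exp(sim - row_max).sum(axis=1, keepdims=True)
    )
    return -log_prob[np.arange(2 * n), positives].mean()
```

In a full pipeline, an encoder network would map each augmented clip to an embedding before the loss is computed, and the pre-trained encoder would then be fine-tuned for rPPG and HR estimation; the encoder is omitted here because the abstract does not describe its architecture.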
To address the second problem (privacy), we propose a data perturbation method that extracts certain areas of the face carrying less identity-related information and then applies pixel shuffling and blurring. Our experiments on two rPPG datasets (PURE and UBFC) show that our approach reduces the accuracy of facial recognition algorithms by over 60%, with minimal impact on rPPG extraction. We also test our method on three facial recognition datasets (LFW, CALFW, and AgeDB), where our approach reduces performance by nearly 50%. Our findings demonstrate the potential of our approach as an effective privacy-preserving solution for rPPG estimation.
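The three perturbation steps named above (region extraction, pixel shuffling, blurring) can be sketched per frame as follows. This is a minimal sketch under assumptions rather than the thesis's method: the abstract does not say which facial areas are used, how the shuffle is parameterized, or which blur kernel is applied, so the bounding box, the uniform random permutation, and the box blur with kernel size `k` are all hypothetical stand-ins.

```python
import numpy as np


def crop_region(frame, top, left, height, width):
    """Extract a rectangular region from a frame of shape (H, W, C).

    Which facial areas carry less identity information is decided in the
    thesis; the box coordinates here are placeholders.
    """
    return frame[top:top + height, left:left + width]


def shuffle_pixels(patch, rng):
    """Randomly permute pixel positions while keeping each pixel's colour."""
    h, w, c = patch.shape
    flat = patch.reshape(-1, c)
    return flat[rng.permutation(h * w)].reshape(h, w, c)


def box_blur(patch, k=3):
    """Simple k x k mean filter with edge padding (assumed blur type)."""
    pad = k // 2
    padded = np.pad(
        patch.astype(np.float64),
        ((pad, pad), (pad, pad), (0, 0)),
        mode="edge",
    )
    h, w, _ = patch.shape
    out = np.zeros(patch.shape, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)


def perturb_frame(frame, box, rng, k=3):
    """Apply crop, shuffle, and blur to one frame; box = (top, left, h, w)."""
    patch = crop_region(frame, *box)
    return box_blur(shuffle_pixels(patch, rng), k)
```

One reason such a scheme could plausibly leave the pulse signal largely intact is that shuffling only rearranges pixels, so the region's spatially averaged colour, which many rPPG methods rely on, is unchanged by that step, while the spatial layout used for face recognition is destroyed.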