Attention and Depth Hallucination for RGB-D Face Recognition with Deep Learning
Authors
Uppal, Hardik
Type
thesis
Language
eng
Keyword
Computer Vision, Deep learning, Biometrics, Depth estimation
Abstract
Face recognition approaches based purely on RGB images rely solely on intensity information, and are therefore more sensitive to facial variations, notably pose, occlusions, and environmental changes such as illumination and background. These approaches also tend to process the whole image uniformly, weighing distinctive and non-distinctive regions equally. To extract more representative facial features, we first propose two fusion techniques for the RGB and depth modalities based on attention mechanisms. The first fusion technique uses an LSTM network to selectively focus on feature maps, followed by a convolution layer that generates spatial attention weights. This method achieves competitive results on the CurtinFaces and IIIT-D RGB-D datasets, with classification accuracies of over 98.2% and 99.3%, respectively.

Our second fusion method is a novel attention mechanism that directs the deep network "where to look" for visual features in the RGB image, by generating an attention map from depth features extracted with a CNN. This solution achieves notable improvements over the current state-of-the-art on four public datasets, namely Lock3DFace, CurtinFaces, IIIT-D RGB-D, and KaspAROV, with average accuracies (and gains) of 87.3% (+5.0%), 99.1% (+0.9%), 99.7% (+0.6%), and 95.3% (+0.5%), respectively.

Although depth data can provide useful information for face recognition, acquiring depth data in the wild remains a challenge. To address this problem, we present the Teacher-Student Generative Adversarial Network (TS-GAN), which generates depth images from single RGB images in order to boost recognition accuracy where depth images are not available. The teacher learns a latent mapping between RGB images and paired depth images in a supervised fashion, which the student then generalizes to new RGB data for which no paired depth information is available. The fully trained shared generator can then be used at runtime to hallucinate depth from RGB for downstream applications such as face recognition. We demonstrate that the hallucinated depth, used alongside the input RGB images, boosts performance across various architectures compared to RGB alone, by average margins of +1.2%, +2.6%, and +2.6% on the IIIT-D, EURECOM, and LFW datasets, respectively.
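As a rough illustration of the second fusion scheme described above, the minimal PyTorch sketch below shows one way depth-guided attention could be wired up: features from a depth CNN stream are collapsed into a single-channel spatial map that reweights the RGB feature maps, telling the network "where to look". The specific layer choices (a single 1x1 convolution with sigmoid gating), the channel count, and all names are illustrative assumptions, not the architecture from the thesis.

import torch
import torch.nn as nn

class DepthGuidedAttention(nn.Module):
    # Sketch: depth features generate a spatial attention map for the RGB stream.
    def __init__(self, channels: int = 256):
        super().__init__()
        # Assumed design: a 1x1 convolution collapses depth features to one channel.
        self.attn = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        # rgb_feat, depth_feat: (B, C, H, W) feature maps from two CNN streams
        weights = torch.sigmoid(self.attn(depth_feat))  # (B, 1, H, W), values in [0, 1]
        return rgb_feat * weights                       # spatially reweighted RGB features

# Example with random tensors standing in for CNN feature maps:
rgb = torch.randn(2, 256, 14, 14)
depth = torch.randn(2, 256, 14, 14)
fused = DepthGuidedAttention()(rgb, depth)
print(fused.shape)  # torch.Size([2, 256, 14, 14])

In a full pipeline the reweighted features would feed a standard recognition head; the thesis itself specifies the actual backbones, losses, and training details.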
License
Queen's University's Thesis/Dissertation Non-Exclusive License for Deposit to QSpace and Library and Archives Canada
ProQuest PhD and Master's Theses International Dissemination Agreement
Intellectual Property Guidelines at Queen's University
Copying and Preserving Your Thesis
This publication is made available by the authority of the copyright owner solely for the purpose of private study and research and may not be copied or reproduced except as permitted by the copyright laws without written authority from the copyright owner.
Attribution-NonCommercial 3.0 United States