Attention and Depth Hallucination for RGB-D Face Recognition with Deep Learning


Authors

Uppal, Hardik

Type

thesis

Language

eng

Keyword

Computer Vision, Deep learning, Biometrics, Depth estimation

Abstract

Face recognition approaches based purely on RGB images rely solely on intensity information and are therefore more sensitive to facial variations, notably pose and occlusion, and to environmental changes such as illumination and background. These approaches also tend to process the whole image uniformly, weighing distinctive and non-distinctive regions equally. To extract more representative facial features, we first propose two fusion techniques for the RGB and depth modalities based on attention mechanisms. The first uses an LSTM network to selectively focus on feature maps, followed by a convolution layer that generates spatial attention weights; it achieves competitive classification accuracies of over 98.2% and 99.3% on the CurtinFaces and IIIT-D RGB-D datasets respectively.

The second is a novel attention mechanism that directs the deep network "where to look" for visual features in the RGB image by generating an attention map from depth features extracted by a CNN. This approach achieves notable improvements over the current state of the art on four public datasets, namely Lock3DFace, CurtinFaces, IIIT-D RGB-D, and KaspAROV, with average accuracies of 87.3% (+5.0%), 99.1% (+0.9%), 99.7% (+0.6%), and 95.3% (+0.5%) respectively.

Although depth data can provide useful information for face recognition, acquiring it in the wild remains a challenge. To address this problem, we present the Teacher-Student Generative Adversarial Network (TS-GAN), which generates a depth image from a single RGB image to boost recognition accuracy in settings where depth images are not available. The teacher learns a latent mapping between RGB and paired depth images in a supervised fashion, which the student then generalizes to new RGB data with no paired depth information. The fully trained shared generator can then be used at runtime to hallucinate depth from RGB for downstream applications such as face recognition. We demonstrate that the hallucinated depth, together with the input RGB images, boosts performance across various architectures compared to RGB alone, by average margins of +1.2%, +2.6%, and +2.6% on the IIIT-D, EURECOM, and LFW datasets respectively.
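To make the first fusion technique concrete, the following is a minimal PyTorch sketch of LSTM-driven channel attention over stacked RGB and depth feature maps, followed by a convolutional spatial attention. The module name, layer sizes, and the use of global-average-pooled descriptors as the LSTM sequence are illustrative assumptions, not the thesis implementation.

```python
import torch
import torch.nn as nn

class LSTMAttentionFusion(nn.Module):
    """Sketch: an LSTM scans per-channel descriptors of the concatenated
    RGB/depth feature maps to reweight channels, then a 1x1 convolution
    produces a spatial attention map over the fused features."""

    def __init__(self, channels=256, hidden=128):
        super().__init__()
        # Each channel's pooled descriptor is treated as one LSTM time step.
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.channel_fc = nn.Linear(hidden, 1)            # per-channel weight
        self.spatial_conv = nn.Conv2d(2 * channels, 1, kernel_size=1)

    def forward(self, rgb_feat, depth_feat):
        # rgb_feat, depth_feat: (B, C, H, W) from modality-specific CNNs.
        x = torch.cat([rgb_feat, depth_feat], dim=1)      # (B, 2C, H, W)
        # Global-average-pool each feature map to a scalar descriptor.
        desc = x.mean(dim=(2, 3)).unsqueeze(-1)           # (B, 2C, 1)
        seq, _ = self.lstm(desc)                          # (B, 2C, hidden)
        ch_w = torch.sigmoid(self.channel_fc(seq))        # (B, 2C, 1)
        x = x * ch_w.unsqueeze(-1)                        # channel reweighting
        sp_w = torch.sigmoid(self.spatial_conv(x))        # (B, 1, H, W)
        return x * sp_w                                   # attended fusion
```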
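The second, depth-guided mechanism can be sketched even more compactly: a small convolutional head turns depth features into a single-channel attention map that modulates the RGB features, telling the RGB stream "where to look". The layer shapes and sigmoid gating below are assumptions for illustration.

```python
import torch.nn as nn

class DepthGuidedAttention(nn.Module):
    """Sketch: depth features produce a spatial attention map in [0, 1]
    that is applied multiplicatively to the RGB feature maps."""

    def __init__(self, channels=256):
        super().__init__()
        self.attn_head = nn.Sequential(
            nn.Conv2d(channels, channels // 4, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat, depth_feat):
        # Both inputs: (B, C, H, W) from modality-specific CNN backbones.
        attn = self.attn_head(depth_feat)   # (B, 1, H, W) attention map
        return rgb_feat * attn              # depth decides where RGB looks
```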
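Finally, the two-stage TS-GAN training scheme might be reduced to the sketch below: a shared generator is first fitted on paired RGB-depth data (teacher stage), then refined on unpaired RGB with only an adversarial signal (student stage), and used alone at inference to hallucinate depth. TinyGenerator, the optimizer settings, and the loss forms are placeholders, not the thesis architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyGenerator(nn.Module):
    """Placeholder encoder-decoder: RGB (B, 3, H, W) -> depth (B, 1, H, W)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

generator = TinyGenerator()   # shared between teacher and student stages
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)

def teacher_step(rgb, depth):
    """Teacher stage: supervised RGB->depth mapping on paired data."""
    loss = F.l1_loss(generator(rgb), depth)   # pixel-wise reconstruction
    opt_g.zero_grad()
    loss.backward()
    opt_g.step()

def student_step(rgb_unpaired, discriminator):
    """Student stage: no paired depth; only an adversarial signal
    (here a simple 'fool the discriminator' loss) refines the generator."""
    loss = -discriminator(generator(rgb_unpaired)).mean()
    opt_g.zero_grad()
    loss.backward()
    opt_g.step()

# At runtime, the shared generator hallucinates depth for recognition,
# e.g. depth_hat is paired with the RGB input for a downstream face network.
depth_hat = generator(torch.rand(1, 3, 112, 112))
```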

License

Queen's University's Thesis/Dissertation Non-Exclusive License for Deposit to QSpace and Library and Archives Canada
ProQuest PhD and Master's Theses International Dissemination Agreement
This publication is made available by the authority of the copyright owner solely for the purpose of private study and research and may not be copied or reproduced except as permitted by the copyright laws without written authority from the copyright owner.
Attribution-NonCommercial 3.0 United States
