Attentive Cross-Modal Connections for Learning Multimodal Representations from Wearable Signals for Affect Recognition
Authors
Bhatti, Anubhav
Date
Type
thesis
Language
eng
Keyword
Multimodal, Representation Learning, Fusion, Affect Recognition, Wearable Signals, Emotion Recognition, Cognitive Load Assessment
Alternative Title
Abstract
We propose cross-modal attentive connections, a new dynamic and effective technique for multimodal representation learning from wearable data. Our solution can be integrated into any stage of the pipeline, i.e., after any convolutional layer or block, to create intermediate connections between the individual streams responsible for processing each modality. Additionally, our method benefits from two properties. First, it can share information uni-directionally (from one modality to the other) or bi-directionally. Second, it can be integrated into multiple stages at the same time, allowing network gradients to be exchanged at several touch-points. We perform extensive experiments on three public multimodal wearable datasets, WESAD, SWELL-KW, and CASE, and demonstrate that our method can effectively regulate and share information between different modalities to learn better representations. Our experiments further demonstrate that, once integrated into simple CNN-based multimodal solutions (2, 3, or 4 modalities), our method achieves performance superior or comparable to the state of the art and outperforms a variety of baseline uni-modal and classical multimodal methods. Further, we investigate 'cognitive load' classification to study multimodal representation learning and our proposed solution in an area of affective computing beyond 'emotion recognition.' To this end, given the lack of widely adopted datasets in this area, we introduce a new dataset called Cognitive Load Assessment in REaltime (CLARE), with which we evaluate our proposed method. In this dataset, we collect a number of wearable modalities from 24 participants. We use MATB-II (Multi-Attribute Task Battery) to induce different levels of cognitive load in participants by varying task complexity during the experiment. Unlike other datasets in this domain, we record subjective cognitive load ratings in real time at 10-second intervals during the experiment. We then show that our proposed solution results in effective multimodal representation learning, outperforming baseline uni-modal and classical multimodal methods (feature-level fusion and score-level fusion) in classifying cognitive load.
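To make the idea of an intermediate cross-modal connection concrete, the following is a minimal sketch of one possible bi-directional attentive connection inserted between two 1D-CNN streams (e.g., an ECG stream and an EDA stream). The module name, the sigmoid gating mechanism, and the residual combination are illustrative assumptions for this sketch and are not taken from the thesis's published implementation.

```python
# Hypothetical sketch: a bi-directional cross-modal attentive connection
# placed after matching convolutional blocks of two modality streams.
# All names and the exact gating form are assumptions for illustration.
import torch
import torch.nn as nn


class CrossModalAttentiveConnection(nn.Module):
    """Lets each stream attend to the other's intermediate features at a
    chosen touch-point; set bidirectional=False for one-way sharing."""

    def __init__(self, channels_a, channels_b, bidirectional=True):
        super().__init__()
        self.bidirectional = bidirectional
        # Attention weights for modality B, conditioned on modality A's features.
        self.gate_a_to_b = nn.Sequential(
            nn.Conv1d(channels_a, channels_b, kernel_size=1),
            nn.Sigmoid(),
        )
        if bidirectional:
            # Symmetric gate for modality A, conditioned on modality B.
            self.gate_b_to_a = nn.Sequential(
                nn.Conv1d(channels_b, channels_a, kernel_size=1),
                nn.Sigmoid(),
            )

    def forward(self, feat_a, feat_b):
        # feat_a: (batch, channels_a, time), feat_b: (batch, channels_b, time)
        # Modulate each stream with attention computed from the other; the
        # residual term keeps gradients flowing through both streams.
        attended_b = feat_b + feat_b * self.gate_a_to_b(feat_a)
        if self.bidirectional:
            attended_a = feat_a + feat_a * self.gate_b_to_a(feat_b)
        else:
            attended_a = feat_a
        return attended_a, attended_b


# Usage: insert the connection between intermediate feature maps of two streams.
if __name__ == "__main__":
    ecg_feat = torch.randn(8, 32, 128)  # hypothetical ECG feature maps
    eda_feat = torch.randn(8, 16, 128)  # hypothetical EDA feature maps
    xmodal = CrossModalAttentiveConnection(channels_a=32, channels_b=16)
    ecg_out, eda_out = xmodal(ecg_feat, eda_feat)
    print(ecg_out.shape, eda_out.shape)
```

In this sketch, several such connections could be inserted at different depths of the two streams to realize the multi-stage, multi-touch-point sharing described in the abstract.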
Description
Citation
Publisher
License
Queen's University's Thesis/Dissertation Non-Exclusive License for Deposit to QSpace and Library and Archives Canada
ProQuest PhD and Master's Theses International Dissemination Agreement
Intellectual Property Guidelines at Queen's University
Copying and Preserving Your Thesis
This publication is made available by the authority of the copyright owner solely for the purpose of private study and research and may not be copied or reproduced except as permitted by the copyright laws without written authority from the copyright owner.