Unsupervised Multi-Modal Representation Learning using Convolutional Autoencoders for Affective Computing with Wearable Data
With recent developments in smart technologies, there has been a growing focus on the use of artificial intelligence and machine learning for affective computing to further enhance the user experience through emotion recognition. Typically, machine learning models used for affective computing are trained using manually extracted features from biological signals. Such features may not generalize well across large datasets and may be sub-optimal in capturing the information available in the raw input data. One approach to address this issue is to use fully supervised deep learning methods to learn latent representations of the biosignals. However, this method requires human supervision to label the data, which may be unavailable or difficult to obtain. In this work, we propose an unsupervised framework for representation learning that reduces the reliance on human supervision. The proposed framework utilizes two stacked convolutional autoencoders to learn latent representations from wearable electrocardiogram (ECG) and electrodermal activity (EDA) signals. These representations are then used by a random forest model to classify arousal into high and low classes. This approach reduces the need for human supervision and enables the aggregation of datasets, allowing for higher generalizability. To validate the framework, we create a multi-corpus dataset comprising four datasets: AMIGOS, ASCERTAIN, CLEAS, and MAHNOB-HCI. The results of the proposed method are compared with those of several other methods, including convolutional neural networks as well as methods that rely on manually extracted hand-crafted features. We also investigate strategies for fusing the two data modalities (ECG and EDA). Lastly, we show that our method outperforms other works that have performed multi-modal arousal detection on the same datasets, achieving new state-of-the-art results on all of them.
These results demonstrate the widespread applicability of stacked convolutional autoencoders combined with machine learning for affective computing.
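To make the pipeline concrete, the following is a minimal sketch of the overall flow (unsupervised per-modality encoding, feature-level fusion by concatenation, then random forest classification). It uses synthetic arrays in place of real ECG/EDA windows, and an `MLPRegressor` trained as a dense autoencoder as a stand-in for the paper's stacked convolutional autoencoders; all shapes, sizes, and hyperparameters here are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Synthetic stand-ins for windowed ECG and EDA segments (shapes are arbitrary).
n, ecg_len, eda_len = 200, 128, 64
ecg = rng.normal(size=(n, ecg_len))
eda = rng.normal(size=(n, eda_len))
y = rng.integers(0, 2, size=n)  # high/low arousal labels, used only by the classifier

def autoencoder_features(X, dim=16):
    """Train an autoencoder (input -> bottleneck -> input) without labels,
    then return the bottleneck activations as learned representations.
    A dense MLP stands in here for the paper's convolutional encoder."""
    ae = MLPRegressor(hidden_layer_sizes=(dim,), max_iter=300, random_state=0)
    ae.fit(X, X)  # reconstruction objective: target equals input
    # Encode: apply the first layer's weights and ReLU activation manually.
    return np.maximum(X @ ae.coefs_[0] + ae.intercepts_[0], 0)

# Learn a latent representation per modality, unsupervised.
z_ecg = autoencoder_features(ecg)
z_eda = autoencoder_features(eda)

# Feature-level fusion: concatenate the per-modality latent codes.
z = np.concatenate([z_ecg, z_eda], axis=1)

# Supervision enters only at this final, lightweight classification stage.
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(z, y)
print(z.shape, clf.predict(z[:5]))
```

In the actual framework the encoders are convolutional and operate on raw wearable signals; the key point this sketch illustrates is that the representation learning stage needs no labels, so heterogeneous corpora can be pooled before the small labeled classification step.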