EXPLORATION OF 3D CNN BASED DEEP LEARNING MODELS FOR REAL-TIME HUMAN ACTIVITY RECOGNITION
HAR, Deep Learning
Human Activity Recognition (HAR) has attracted considerable attention owing to advances in computer hardware and deep learning, the availability of image and video data on the web, and its growing role in daily life through applications such as security surveillance and monitoring. However, the high computational cost of video-based activity recognition remains a challenge for real-time applications. Inflated 3D ConvNet (I3D) is a two-stream method that achieves state-of-the-art performance in RGB (red, green, and blue) video-based HAR. It uses two parallel processing branches to extract useful features: an RGB branch and an optical flow branch.

In this thesis, we propose improvements to the design of real-time I3D-based models in two respects: accuracy and speed, and we explore multiple strategies to improve model performance. Computing optical flow is expensive, which prevents its use in real-time applications. To reduce this cost, we propose a simple motion information branch to replace the optical flow branch. We call the resulting model two-stream I3D Light; its motion information branch takes 128 frames of 112x112 images as input. The low spatial resolution and long temporal range of this input reduce the spatial information and enhance the motion information. Experiments show that two-stream I3D Light improves the accuracy of the original RGB-only I3D by 4.13% on the Kinetics-400 dataset.

I3D is based on 2D Inception-v1, an image classifier. YOLOv5, an efficient object detector, preserves more object position information than an image classifier does. We propose YOLO-I3D, which replaces the first half of the RGB-only I3D model with the first half of YOLOv5. Experiments show that this change improves the accuracy of RGB-only I3D by 1.42% on Kinetics-400. Finally, to improve two-stream I3D Light further, we apply YOLO-I3D to it; we call this new model two-stream YOLO-I3D Light. Experiments show that two-stream YOLO-I3D Light improves the accuracy of two-stream I3D Light by 0.41%.
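As an illustrative sketch only (not the thesis implementation), the two-stream idea described above can be reduced to late fusion: each branch produces per-class scores, and the final prediction averages the two branches' class probabilities. The function names and the three-class toy scores below are hypothetical.

```python
import math

def softmax(logits):
    # Numerically stable softmax: convert raw scores to probabilities.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def two_stream_fusion(rgb_logits, motion_logits):
    """Late fusion of a two-stream model (sketch).

    rgb_logits    -- class scores from the RGB branch
    motion_logits -- class scores from the motion information branch
                     (in two-stream I3D Light: 128 frames of 112x112 input)
    """
    rgb_p = softmax(rgb_logits)
    motion_p = softmax(motion_logits)
    # Average the per-class probabilities of the two branches.
    return [0.5 * (a + b) for a, b in zip(rgb_p, motion_p)]

# Toy example with 3 classes; real models would use 400 (Kinetics-400).
fused = two_stream_fusion([2.0, 1.0, 0.0], [0.0, 1.0, 2.0])
predicted_class = max(range(len(fused)), key=lambda i: fused[i])
```

Averaging probabilities (rather than raw logits) keeps each branch's contribution on a comparable scale regardless of how confident its scores are.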