TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals

DOI: 10.48550/arxiv.2404.09474 Publication Date: 2024-04-15
ABSTRACT
Engagement analysis finds various applications in healthcare, education, advertising, and services. Deep Neural Networks, used for this analysis, have complex architectures and require large amounts of input data, computational power, and inference time. These constraints make it challenging to embed such systems into devices for real-time use. To address these limitations, we present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture. To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer. In parallel, to efficiently extract rich patterns from the temporal-frequency domain and boost processing speed, we introduce a "TC" stream that uses the Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form. Evaluated on the EngageNet dataset, the proposed method outperforms existing baselines while utilizing only two behavioral features (head pose rotations), compared to the 98 used in baseline models. Furthermore, comparative analysis shows that TCCT-Net's architecture offers an order-of-magnitude improvement in inference speed compared to state-of-the-art image-based Recurrent Neural Network (RNN) methods. The code will be released at https://github.com/vedernikovphoto/TCCT_Net.
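As a rough illustration of the "TC" stream's input representation, the sketch below turns a 1D behavioral signal into a 2D scale-by-time tensor with a continuous wavelet transform. It is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: the real-valued Morlet wavelet, the scale set, the truncation of the wavelet support, and the synthetic sinusoid standing in for a head pose rotation signal are all illustrative choices that the abstract does not specify.

```python
import math

def morlet(t, w0=5.0):
    # Real part of a Morlet wavelet. The wavelet family and w0 are
    # assumptions made for illustration; the abstract names neither.
    return math.exp(-0.5 * t * t) * math.cos(w0 * t)

def cwt(signal, scales, w0=5.0):
    """Return a 2D list of shape (len(scales), len(signal)).

    Each row holds the signal filtered by the wavelet stretched to one scale.
    Samples outside the signal are treated as zero.
    """
    n = len(signal)
    rows = []
    for s in scales:
        half = int(math.ceil(4 * s))  # truncate support at +/- 4 scale units
        kernel = [morlet(k / s, w0) / math.sqrt(s) for k in range(-half, half + 1)]
        row = []
        for i in range(n):
            acc = 0.0
            for j, kv in enumerate(kernel):
                idx = i + j - half
                if 0 <= idx < n:
                    acc += signal[idx] * kv
            row.append(acc)
        rows.append(row)
    return rows

# Synthetic stand-in for one head pose rotation angle over 128 frames
# (hypothetical data, 0.05 cycles per frame).
signal = [math.sin(2 * math.pi * 0.05 * t) for t in range(128)]
scales = [2, 4, 8, 16]

tensor = cwt(signal, scales)  # shape: (4 scales, 128 time steps)
energies = [sum(v * v for v in row) for row in tensor]
```

With these assumed settings, the wavelet at scale `s` responds most strongly near `w0 / s` radians per frame, so the row whose scale best matches the oscillation in the signal carries the most energy. A 2D tensor of this kind can then be fed to an ordinary 2D convolutional layer, which is the general idea the abstract describes for the "TC" stream.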