Evaluation of MoViNet Streaming Models for Real-Time Action Recognition in the Thermal Domain
Abstract
This paper investigates the potential of Mobile Video Networks (MoViNet) for real-time human action recognition in the thermal domain. Although MoViNet models have demonstrated strong performance on RGB video datasets, their effectiveness on thermal imagery remains underexplored, even though thermal imagery is robust to low lighting and occlusions and is better at preserving privacy. To address this gap, we evaluate three MoViNet variants (A0, A1, A2) on a custom single-person thermal video dataset with three action classes. Because the dataset is small, we fine-tune the models rather than train them from scratch, and we adapt the thermal inputs through GMM-based normalization and channel replication. Data augmentation, including brightness adjustment, contrast enhancement, and spatial flips, is used to improve generalization. MoViNet A2-stream achieves the highest accuracy (88.33%), with A0 and A1 also showing competitive performance. Real-time visualizations show that predictions converge early and remain highly confident throughout each clip. These findings demonstrate that MoViNet models can be fine-tuned effectively for thermal action recognition with minimal modifications, which makes them promising for real-time deployment in resource-constrained or low-visibility environments.
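As an illustration of the input adaptation and augmentation steps named above, the following is a minimal NumPy sketch and not the pipeline used in this study. It assumes a thermal clip stored as a float array of shape (frames, height, width) already scaled to [0, 1]. The function names `replicate_channels` and `augment` and their parameters are hypothetical, and the GMM-based normalization is omitted because its details are not given here.

```python
import numpy as np


def replicate_channels(clip):
    """Copy a single-channel clip (T, H, W) into three identical channels (T, H, W, 3).

    This lets a network that expects RGB input consume thermal frames.
    """
    return np.repeat(clip[..., np.newaxis], 3, axis=-1)


def augment(clip, brightness=0.0, contrast=1.0, flip=False):
    """Apply simple augmentations to a clip of shape (T, H, W, C) with values in [0, 1].

    brightness: additive offset applied to every pixel (assumed parameterization).
    contrast:   scale factor applied around the clip mean (assumed parameterization).
    flip:       if True, mirror every frame horizontally.
    """
    mean = clip.mean()
    out = (clip - mean) * contrast + mean + brightness
    if flip:
        out = out[:, :, ::-1, :]  # reverse the width axis
    return np.clip(out, 0.0, 1.0)  # keep values in the valid range


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    thermal = rng.random((8, 64, 64), dtype=np.float32)  # synthetic stand-in clip
    rgb_like = replicate_channels(thermal)
    augmented = augment(rgb_like, brightness=0.05, contrast=1.2, flip=True)
    print(rgb_like.shape, augmented.shape)
```

Applying the same augmentation parameters to every frame of a clip, as this sketch does, keeps the clip temporally consistent. Sampling new parameters per frame would introduce flicker that is absent from real thermal video.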
