Evaluating CNN, RNN, and Vision Transformer for Emotion Recognition: Strengths and Weaknesses
Date
2025
Author
Yushchenko, Artur
Smelyakov, Kirill
Chupryna, Anastasiya
Abstract
This paper examines three prominent deep learning architectures for emotion recognition, Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Vision Transformers (ViTs), assessing their strengths and weaknesses under varying conditions. It discusses how each architecture captures spatial, temporal, or global features in emotional data, highlighting differences in feature extraction, representational capacity, and scalability. New solutions are proposed to enhance accuracy and adaptability, integrating design principles that address recognized challenges in real-world deployments. Insights are offered on aligning model selection with specific application demands, such as the nature of the input signals, the available computational resources, and the required real-time performance. While the comparative analysis remains broad to accommodate diverse use cases, it underscores the importance of carefully balancing accuracy against efficiency. The investigation concludes with recommendations on when each architecture is most advantageous, providing a flexible framework for researchers and practitioners navigating these trade-offs. These findings inform the development of adaptive emotion recognition systems that leverage state-of-the-art deep learning techniques across multiple contexts.
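The model-selection guidance summarized in the abstract (matching the architecture family to the input signal, compute budget, and latency needs) could be sketched as a simple decision heuristic. The function, argument names, and thresholds below are hypothetical illustrations of that idea, not the paper's actual procedure:

```python
def select_architecture(input_type: str, sequential: bool,
                        compute_budget: str) -> str:
    """Hypothetical heuristic mapping application demands to an
    architecture family, following the trade-offs in the abstract."""
    # Sequential signals (speech, physiological time series) call for
    # temporal modelling, which RNNs provide at modest cost.
    if sequential:
        return "RNN"
    # Static images: ViTs capture global context but typically demand
    # large datasets and compute, so CNNs are the efficient default.
    if input_type == "image":
        return "ViT" if compute_budget == "high" else "CNN"
    # Fallback for other non-sequential inputs.
    return "CNN"

print(select_architecture("audio", sequential=True, compute_budget="low"))    # RNN
print(select_architecture("image", sequential=False, compute_budget="high"))  # ViT
print(select_architecture("image", sequential=False, compute_budget="low"))   # CNN
```

In practice such a rule would be refined empirically, but it makes concrete the kind of flexible framework the paper proposes for navigating accuracy-versus-efficiency trade-offs.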
