To improve automatic emotion tracking, which aims to recognize and follow emotional changes over time, a team of human-computer interaction researchers focused on internal emotional transitions rather than only interpreting external emotional expressions.
Drawing on psychological theories, they built an evolutionary mental state transition model that includes a mental state transition network. Tested on two multimodal emotion datasets, the model produced significantly more accurate results than existing methods. Their research was published on April 8, 2024, in the journal Intelligent Computing.
In addition to accuracy, another advantage of the evolutionary mental state transition model for emotion tracking is its reduced computation time and smaller size. The model uses fewer parameters than other published models, making it “suitable for use on mobile devices and robots,” according to the researchers.
Emotion tracking can be used in areas such as public opinion monitoring, marketing communications, mental health monitoring, and online learning. Future extensions of this model may enable personalized emotion tracking that accounts for individual differences in emotional variation. This would rely on a psychologically realistic modeling approach designed to capture “the natural dynamics of emotions and their effects on mental states.”
The researchers’ emotion-tracking system consists of several stages:
- Multimodal feature extraction from speech, visual, and acoustic data.
- Feature fusion in a transformer.
- Pooling to calculate “external emotional energy” (the apparent emotion).
- Determination of the actual emotion using a unique mental state transition network.
In the evolutionary mental state transition model, features are first extracted from the speech, visual, and acoustic data and encoded in a way that preserves their chronological order. Second, multi-head cross-attention blocks integrate these features at each time step; this is the most computationally demanding stage. Third, max pooling and mean pooling reduce the dimensionality, converting the features into external emotional energy at each time step. Finally, the mental state transition network combines the patterns of change in the subject’s emotions over time with the external emotional energy to determine the actual emotional state at any given moment.
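The fusion and pooling stages can be illustrated with a small sketch. It is not the researchers' implementation: it uses a single attention head instead of multi-head blocks, has no learned weights, and assumes that the max-pooled and mean-pooled vectors are simply concatenated. The function names (`cross_attention`, `pool_to_energy`) and the toy feature values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention (a simplification).

    Each query row (one modality) attends over all key rows (another
    modality) and returns a weighted average of the value rows.
    """
    scale = math.sqrt(len(keys[0]))
    fused = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        fused.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return fused


def pool_to_energy(fused):
    """Max-pool and mean-pool over rows, then concatenate both results.

    Concatenation is an assumption made for this sketch; the article only
    says that both pooling operations are used.
    """
    dims = range(len(fused[0]))
    max_pooled = [max(row[d] for row in fused) for d in dims]
    mean_pooled = [sum(row[d] for row in fused) / len(fused) for d in dims]
    return max_pooled + mean_pooled


# Toy features for one time step (hypothetical values).
speech = [[1.0, 0.0], [0.0, 1.0]]
visual = [[0.5, 0.5], [1.0, 1.0], [0.0, 2.0]]

fused = cross_attention(speech, visual, visual)
energy = pool_to_energy(fused)
```

In this sketch `energy` holds twice as many values as there are feature dimensions, because the two pooled vectors are placed side by side. A real model would also learn projection weights for the queries, keys, and values.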
The network was developed using probabilities derived from data previously collected from 200 participants on the relationships between different pairs of emotions. It predicts emotional states by assessing multiple simultaneous emotions rather than assuming that the subject is experiencing only one.
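One way to picture how pairwise emotion probabilities and external emotional energy could be combined is shown below. This is a minimal sketch under stated assumptions, not the published network: the three-emotion list, the values in `TRANSITIONS`, the equal-weight blend controlled by `mix`, and the 0.3 threshold are all invented for illustration.

```python
EMOTIONS = ["happiness", "sadness", "anger"]

# Hypothetical transition probabilities (not the study's data):
# row = current emotion, column = next emotion. Each row sums to 1.
TRANSITIONS = [
    [0.8, 0.1, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
]


def step(previous_state, external_energy, mix=0.5):
    """Blend the transition-based expectation with the observed energy.

    `mix` weights the expectation from the previous state; the remainder
    weights the external emotional energy. The blend is an assumption.
    """
    n = len(previous_state)
    expected = [
        sum(previous_state[i] * TRANSITIONS[i][j] for i in range(n))
        for j in range(n)
    ]
    blended = [mix * e + (1.0 - mix) * x
               for e, x in zip(expected, external_energy)]
    total = sum(blended)
    return [b / total for b in blended] if total > 0 else blended


def active_emotions(state, threshold=0.3):
    """Return every emotion at or above the threshold (multi-label)."""
    return [name for name, value in zip(EMOTIONS, state) if value >= threshold]


def track(initial_state, energies):
    """Apply `step` once per time step and collect the resulting states."""
    state = initial_state
    history = []
    for energy in energies:
        state = step(state, energy)
        history.append(state)
    return history


# A subject who starts out happy but then shows sadness outwardly.
history = track([1.0, 0.0, 0.0], [[0.0, 1.0, 0.0]])
labels = active_emotions(history[-1])
```

Because the previous state still carries weight, `labels` here contains both "happiness" and "sadness" after a single sad-looking time step, which mirrors the idea of assessing several simultaneous emotions instead of forcing a single label.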
The performance of the evolutionary mental state transition model was compared with several baseline methods on classification tasks using two large datasets: the CMU Multimodal Opinion Sentiment and Emotion Intensity (CMU-MOSEI) dataset and the Ren-CECps Chinese emotion corpus. The CMU-MOSEI dataset, which consists of recorded monologues in English, is labeled with emotions such as happiness, sadness, anger, disgust, surprise, and fear. The Chinese corpus contains blog texts and was used to test the mental state transition network component.