Speech Emotion Recognition Incorporating
Relative Difficulty and Labeling Reliability

Guess the emotional state of each utterance-level speech segment


Youngdo Ahn, Sangwook Han, Seonggyu Lee, and Jong Won Shin

Gwangju Institute of Science and Technology, Korea


Most speech emotion recognition studies aim to classify emotion labels on audio-visual datasets.
This demonstration illustrates how difficult speech emotion recognition can be on datasets in which human evaluators annotated emotions for conversational audio-visual data.
This page is for research demonstration purposes only.

IEMOCAP examples

Each example is annotated by three evaluators.
The evaluators watched the conversational video and chose a label from "Anger, Disgust, Frustration, Happy (Excited), Neutral, Sad, Surprise, Fear, and Other".
The "True label" is the majority-vote result of the evaluators' annotations.
Please guess the emotion label after listening to "Speech only", watching "Video", and reading "Context", respectively.
Then, compare your guesses with the "True label".
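The majority-vote labeling described above can be sketched in a few lines of Python. This is a minimal illustration, not the datasets' exact protocol; in particular, how ties and no-majority cases are resolved varies between corpora, so the tie handling below (returning no label) is an assumption.

```python
from collections import Counter

def majority_vote(labels):
    """Return the majority-vote emotion label for one utterance,
    or None when no single label wins.

    Tie handling is an assumption for illustration: utterances
    without evaluator agreement are often excluded from training.
    """
    counts = Counter(labels).most_common()
    # A tie for first place means there is no majority label.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]

# Example: three evaluators annotate one utterance.
print(majority_vote(["Happy", "Happy", "Neutral"]))  # -> Happy
print(majority_vote(["Anger", "Sad", "Neutral"]))    # -> None (no agreement)
```

Utterances on which evaluators disagree (the second call above) are exactly the ambiguous cases this demonstration highlights.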

Test number | Speech only | Your guess based on speech | Video | Your guess based on video | Context | Your guess based on context | True label
1
2
3
4

MSP-IMPROV examples

Each example is annotated by at least five evaluators.
The evaluators watched the conversational videos in random order and chose a label from "Angry, Happy, Neutral, Sad, and Other".
The "True label" is the majority-vote result of the evaluators' annotations.
Please guess the emotion label after listening to "Speech only" and watching "Video", respectively.
Then, compare your guesses with the "True label".

Test number | Speech only | Your guess based on speech | Video | Your guess based on video | True label
1
2
3
4