Speech Emotion Recognition Incorporating
Relative Difficulty and Labeling Reliability
Guess the emotional state of each utterance-level speech sample
Youngdo Ahn, Sangwook Han, Seonggyu Lee, and Jong Won Shin
Gwangju Institute of Science and Technology, Korea
Most speech emotion recognition studies aim to classify emotion labels on audio-visual datasets.
This demonstration illustrates how difficult speech emotion recognition is on datasets in which humans annotated audio-visual data in conversation.
This page is for research demonstration purposes only.
IEMOCAP examples
Each example is annotated by three evaluators.
Evaluators watched the conversational video and chose a label from "Anger, Disgust, Frustration, Happy (Excited), Neutral, Sad, Surprise, Fear, and Other".
The "True label" is the majority-vote result of the evaluations.
Please guess the emotion label after listening to "Speech only", watching "Video", and reading "Context", respectively.
Then, compare your labels with the "True label".
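The "True label" described above is a majority vote over the evaluators' annotations. A minimal sketch of that aggregation (the function name and tie handling are illustrative assumptions, not part of the original annotation tooling):

```python
from collections import Counter

def majority_vote(labels):
    """Return the label chosen by most evaluators, or None when no single label wins."""
    counts = Counter(labels)
    (top, n), *rest = counts.most_common()
    if rest and rest[0][1] == n:
        return None  # tie: no majority label for this utterance
    return top

# Three IEMOCAP-style evaluations for one utterance
print(majority_vote(["anger", "anger", "frustration"]))  # anger
print(majority_vote(["happy", "sad", "neutral"]))        # None (no majority)
```

Utterances without a majority label are the ones typically excluded from classification experiments on these corpora.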
Test number | Speech only | Your guess based on speech | Video | Your guess based on video | Context | Your guess based on context | True label
---|---|---|---|---|---|---|---
1 | (audio clip) | | (video clip) | | ... | | anger
2 | (audio clip) | | (video clip) | | ... | | happy (excited)
3 | (audio clip) | | (video clip) | | ... | | frustration
4 | (audio clip) | | (video clip) | | ... | | happy (excited)
MSP-IMPROV examples
Each example is annotated by over five evaluators.
Evaluators watched the conversational videos in random order and chose a label from "Angry, Happy, Neutral, Sad, and Other".
The "True label" is the majority-vote result of the evaluations.
Please guess the emotion label after listening to "Speech only" and watching "Video", respectively.
Then, compare your labels with the "True label".
Test number | Speech only | Your guess based on speech | Video | Your guess based on video | True label
---|---|---|---|---|---
1 | (audio clip) | | (video clip) | | sad
2 | (audio clip) | | (video clip) | | happy
3 | (audio clip) | | (video clip) | | happy
4 | (audio clip) | | (video clip) | | angry