Speech Emotion Recognition Incorporating
Relative Difficulty and Labeling Reliability

Guess the emotional state of utterance-level speech

Youngdo Ahn, Sangwook Han, Seonggyu Lee, and Jong Won Shin

Gwangju Institute of Science and Technology, Korea

Most speech emotion recognition studies aim to classify emotion labels taken from audio-visual datasets.
This demonstration shows how difficult speech emotion recognition is on datasets whose labels were assigned by humans observing audio-visual recordings of conversations.
This page is for research demonstration purposes only.

IEMOCAP examples

Each example is annotated by three evaluators.
Evaluators watched the conversational video and chose among "Anger, Disgust, Frustration, Happy (excited), Neutral, Sad, Surprise, Fear, and Other".
"True label" is the majority vote of the evaluations.
Please guess the emotion label three times: after listening to "Speech only", after watching "Video", and after reading "Context".
Then compare your guesses with the "True label".

Test number	Context	True label
1	... [M] ...you can't get a job and refuse to have a resume. [F] Who do you think you are? [M] I don't think I'm anyone. [F] You are so high and mighty. I just--	anger (A/D/F/H/N/S) (3/0/0/0/0/0)
2	... [M] That's a good idea. [F] Let's go tomorrow. [M] Tomorrow, tomorrow, tomorrow- [F] yeah.	happy (excited) (A/D/F/H/N/S) (0/0/0/3/0/0)
3	... [F] This? What is this? This isn't even anything. [M] Yes, it is. [F] Sure, this is standing on the beach. This is waiting. This is fighting. [M] Right.	frustration (A/D/F/H/N/S) (1/0/2/0/0/0)
4	... [F] I'm so excited for you. That's great. I'm so- This isn't even anything. [M] Yeah, yeah. She was actually -- she was going to tell you but I told her that I wanted to tell you and she had to work today anyway, so I decided to- [F] Well I'm glad you told me. [M] Yeah, yeah.	happy (excited) (A/D/F/H/N/S) (0/0/0/2/1/0)
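
Each "True label" cell above pairs a label with per-category vote counts, for example "frustration (A/D/F/H/N/S) (1/0/2/0/0/0)". As a minimal sketch of how such a cell could be reduced to a majority-vote label, the Python below parses the counts and picks the category with the most votes. The function names parse_votes and majority_vote are hypothetical, and two things are assumptions rather than the datasets' documented rules: that the letters are category initials, and that "majority" means a single category holding the top vote count (ties return None).

```python
import re
from typing import Dict, Optional


def parse_votes(cell: str) -> Dict[str, int]:
    """Parse a cell such as 'happy (excited) (A/D/F/H/N/S) (0/0/0/2/1/0)'.

    The last two parenthesized groups are taken as category keys and
    vote counts, so an earlier group like '(excited)' is ignored.
    """
    groups = re.findall(r"\(([^()]*)\)", cell)
    if len(groups) < 2:
        raise ValueError(f"no vote counts found in: {cell!r}")
    keys = groups[-2].split("/")
    counts = [int(c) for c in groups[-1].split("/")]
    if len(keys) != len(counts):
        raise ValueError(f"keys and counts differ in length: {cell!r}")
    return dict(zip(keys, counts))


def majority_vote(votes: Dict[str, int]) -> Optional[str]:
    """Return the category with the most votes, or None on a tie.

    Assumption: a unique top count is enough to call it the majority.
    """
    top = max(votes.values())
    winners = [key for key, count in votes.items() if count == top]
    return winners[0] if len(winners) == 1 else None


# Examples 3 and 4 from the IEMOCAP table above.
print(majority_vote(parse_votes("frustration (A/D/F/H/N/S) (1/0/2/0/0/0)")))  # F
print(majority_vote(parse_votes("happy (excited) (A/D/F/H/N/S) (0/0/0/2/1/0)")))  # H
```

Examples 3 and 4 are not unanimous, which is one reason a listener's guess can reasonably differ from the "True label".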

MSP-IMPROV examples

Each example is annotated by at least five evaluators.
Evaluators watched the conversational videos in random order and chose among "Angry, Happy, Neutral, Sad, and Other".
"True label" is the majority vote of the evaluations.
Please guess the emotion label twice: after listening to "Speech only" and after watching "Video".
Then compare your guesses with the "True label".

Test number	Speech only	Your guess based on speech	Video	Your guess based on video	True label
1					sad (A/H/N/S/O) (0/0/1/4/0)
2					happy (A/H/N/S/O) (0/6/1/0/0)
3					happy (A/H/N/S/O) (0/4/1/1/0)
4					angry (A/H/N/S/O) (3/0/2/0/0)
