Facial expression recognition
Facial expression recognition is an important research field in computer vision. Although reading facial expressions is an easy task for humans, it remains difficult for computers. Factors such as interpersonal variation (gender, skin color), intrapersonal variation (pose, expression) and different recording conditions (image resolution, lighting) add to the complexity of the problem. This is particularly relevant in the context of emotion recognition, where systems should be able to automatically recognize which emotional state a person is in.
Facial Action Coding System (FACS)
On human faces, emotional expression relies heavily on the activation of individual facial muscles. A classical approach to describing facial expressions at the muscular level is the Facial Action Coding System (FACS) proposed by Ekman (1978). In this framework, movements of specific facial regions are described as Action Units (AUs), which essentially describe deviations from a neutral expression. AUs are specific to facial regions (corner of the mouth, corner of the eye, etc.). Although there are 69 AUs in the FACS theory, 28 of them are the most useful for emotion recognition. We have focused on 12 of them: 1 (Inner Brow Raiser), 2 (Outer Brow Raiser), 4 (Brow Lowerer), 6 (Cheek Raiser), 7 (Lid Tightener), 10 (Upper Lip Raiser), 12 (Lip Corner Puller), 14 (Dimpler), 15 (Lip Corner Depressor), 17 (Chin Raiser), 23 (Lip Tightener), 24 (Lip Pressor).
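For reference, the 12 selected AUs can be collected in a small lookup table. The Python snippet below simply restates the list above (AU number to name) and is used only for illustration in the later sketches.

```python
# The 12 Action Units targeted in this work, indexed by their FACS number.
ACTION_UNITS = {
    1:  "Inner Brow Raiser",
    2:  "Outer Brow Raiser",
    4:  "Brow Lowerer",
    6:  "Cheek Raiser",
    7:  "Lid Tightener",
    10: "Upper Lip Raiser",
    12: "Lip Corner Puller",
    14: "Dimpler",
    15: "Lip Corner Depressor",
    17: "Chin Raiser",
    23: "Lip Tightener",
    24: "Lip Pressor",
}
```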
There are several FACS-annotated datasets generally available to the community, containing various numbers of images and different sets of annotated AUs: CK+, MMI, UNBC-McMaster PAIN, DISFA, BP4D, SEMAINE, etc. The 12 selected AUs correspond to the AUs annotated in BP4D, which is the largest of these datasets. The main interest of these AUs is that they are mostly sufficient to predict the occurrence of the 6 basic emotions using the EMFACS correspondence table (a rule-based lookup sketch follows the table):
| Emotion | Action Units |
|---|---|
| Happiness | 6, 12 |
| Sadness | 1, 4, 15 |
| Surprise | 1, 2, 5B, 26 |
| Fear | 1, 2, 4, 5, 7, 20, 26 |
| Anger | 4, 5, 7, 23 |
| Disgust | 9, 15, 16 |
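One simple way to go from a set of detected AUs to an emotion label is to score each emotion by the fraction of its EMFACS AUs that are active. The sketch below is a hypothetical illustration of such a lookup, not necessarily the exact rule used in the demo video; intensity qualifiers such as 5B are reduced to their plain AU number, and the `threshold` value is an arbitrary choice.

```python
# EMFACS correspondence table (intensity qualifiers such as "5B" reduced to the AU number).
EMFACS = {
    "Happiness": {6, 12},
    "Sadness":   {1, 4, 15},
    "Surprise":  {1, 2, 5, 26},
    "Fear":      {1, 2, 4, 5, 7, 20, 26},
    "Anger":     {4, 5, 7, 23},
    "Disgust":   {9, 15, 16},
}

def recognize_emotion(active_aus, threshold=0.6):
    """Return the emotion whose AU pattern best matches the active AUs.

    active_aus -- set of AU numbers predicted as present (e.g. {6, 12})
    threshold  -- minimal fraction of an emotion's AUs that must be active
    """
    scores = {
        emotion: len(active_aus & pattern) / len(pattern)
        for emotion, pattern in EMFACS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else "Neutral"

# Example: AUs 6 and 12 active -> "Happiness"
print(recognize_emotion({6, 12}))
```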
FACS prediction using deep neural networks
After investigating various architectures to automatically predict AU occurrence on faces, we converged towards a neural network architecture inspired by VGG-16:
It consists of 4 convolutional blocks, each composed of 2 convolutional layers (kernel size 3x3, ReLU activation function) and a max-pooling layer (2x2). A dropout layer with p=0.2 is added after the max-pooling. After 4 such convolutional blocks with increasing numbers of feature maps (32, 64, 128 and 256), the last tensor (6x6x256) is flattened into a vector of 9216 elements and projected onto a fully connected layer of 500 neurons. The output layer has 12 neurons with a sigmoid activation function, each representing one of the 12 AUs present in the combined dataset. The network has a total of 5,786,192 trainable parameters (weights and biases), which makes it a middle-sized deep network that fits into the GPUs available at the lab.

The model was trained over 120 epochs using Stochastic Gradient Descent (SGD) on minibatches of 128 samples, with a learning rate of 0.01 and a Nesterov momentum of 0.9. The network successfully learned the training data (final loss of 0.02) with only very slight overfitting. F1 scores for each AU on the test set are well above 0.9.
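A minimal Keras sketch of this architecture is given below. The 96x96 grayscale input and the "same" convolution padding are assumptions: they are consistent with the 6x6x256 tensor and the 5,786,192-parameter count reported above, but are not stated explicitly in the text. The ReLU activation of the 500-unit layer and the binary cross-entropy loss (the standard choice for multi-label sigmoid outputs) are likewise assumptions.

```python
from tensorflow.keras import layers, models, optimizers

def build_au_net(input_shape=(96, 96, 1), n_aus=12):
    """VGG-inspired AU detector: 4 convolutional blocks followed by a dense head."""
    inputs = layers.Input(shape=input_shape)
    x = inputs
    for n_filters in (32, 64, 128, 256):
        # Two 3x3 convolutions, 2x2 max-pooling, dropout p=0.2 per block.
        x = layers.Conv2D(n_filters, 3, padding="same", activation="relu")(x)
        x = layers.Conv2D(n_filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(2)(x)
        x = layers.Dropout(0.2)(x)
    x = layers.Flatten()(x)                                   # 6x6x256 -> 9216
    x = layers.Dense(500, activation="relu")(x)               # assumed ReLU
    outputs = layers.Dense(n_aus, activation="sigmoid")(x)    # one unit per AU

    model = models.Model(inputs, outputs)
    model.compile(
        optimizer=optimizers.SGD(learning_rate=0.01, momentum=0.9, nesterov=True),
        loss="binary_crossentropy",                           # assumed multi-label loss
        metrics=["binary_accuracy"],
    )
    return model

# Training as described above (X: face crops, Y: binary AU labels):
# model = build_au_net()
# model.fit(X_train, Y_train, epochs=120, batch_size=128, validation_data=(X_val, Y_val))
```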
The video below shows the performance of the network under real-world conditions. The detected AUs are displayed in the top-left corner, the recognized emotion in the bottom-left one.
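For illustration, a hypothetical real-time loop in the same spirit as the demo could look like the sketch below. It reuses the `build_au_net`, `ACTION_UNITS` and `recognize_emotion` helpers defined earlier, uses OpenCV's bundled Haar cascade for face detection, and assumes trained weights are loaded beforehand; the 0.5 threshold and the preprocessing are placeholders, not the exact settings used for the video.

```python
import cv2
import numpy as np

model = build_au_net()             # assumed: weights loaded, e.g. model.load_weights(...)
face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
AU_NUMBERS = sorted(ACTION_UNITS)  # output index -> AU number

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_detector.detectMultiScale(gray, 1.3, 5):
        # Crop the face, resize to the network input and normalize to [0, 1].
        face = cv2.resize(gray[y:y + h, x:x + w], (96, 96)) / 255.0
        probs = model.predict(face[np.newaxis, ..., np.newaxis], verbose=0)[0]
        active = {AU_NUMBERS[i] for i, p in enumerate(probs) if p > 0.5}
        emotion = recognize_emotion(active)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, emotion, (x, y - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
    cv2.imshow("AU / emotion demo", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```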
Associated projects
BMBF project StayCentered - Methodenbasis eines Assistenzsystems für Centerlotsen - MACeLot. BMBF 16SV7260 (2015-2018).