Facial expression recognition (FER) is an essential task in both computer vision and human-computer interaction (HCI). It is widely used in applications such as autonomous driving, robotics, and e-learning enhancement, where emotion is recognized from facial expressions. Despite its practicality, convolutional neural network (CNN)-based FER suffers from overfitting because of the small number of samples available in FER datasets.
To address this issue, we propose a few-shot learning (FSL) method for FER. FSL is a training mechanism that can predict new categories from only a few samples. It learns the relations between samples through similarity learning and uses the learned similarity to classify test data at inference. In this way, FSL can help to mitigate the overfitting problem in FER.
This thesis proposes a method built on the relationNet, which learns relational similarity between samples. On top of the relationNet, we design a channel selection module and construct additional spatial data. To make the best use of the few available samples, we compute a representative feature as the average of the sample features. Each channel of this representative feature is then compared with the corresponding channel of every sample feature to find the sample whose channel is most similar. The channel of the selected sample is extracted as the optimal channel for that position, so the module produces one reconstructed feature composed of channels drawn from the individual samples. Focusing on fine-grained features, we observe that facial expressions carry significant information in the eye and lip areas. We therefore generate eye and lip image patches and use this additional data in the support and query sets.
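The channel selection step described above can be illustrated with a minimal NumPy sketch. It is not the thesis implementation: the function name `select_channels`, the feature layout `(K, C, H, W)` for the K support samples of one class, and the use of cosine similarity as the per-channel comparison are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def select_channels(support_feats, eps=1e-8):
    """Build one reconstructed feature from K sample features.

    support_feats: array of shape (K, C, H, W), the features of the K
    support samples of a single class (layout is an assumption).
    Returns (reconstructed, best), where reconstructed has shape
    (C, H, W) and best[c] is the index of the sample chosen for channel c.
    """
    k, c, h, w = support_feats.shape

    # Representative feature: the average of the sample features.
    representative = support_feats.mean(axis=0)            # (C, H, W)

    # Flatten the spatial dimensions so each channel is a vector.
    flat = support_feats.reshape(k, c, h * w)               # (K, C, H*W)
    rep_flat = representative.reshape(c, h * w)             # (C, H*W)

    # Cosine similarity between each sample channel and the matching
    # representative channel (the similarity measure is an assumption).
    dots = np.einsum("kcd,cd->kc", flat, rep_flat)          # (K, C)
    norms = (np.linalg.norm(flat, axis=2)
             * np.linalg.norm(rep_flat, axis=1)[None, :])   # (K, C)
    sim = dots / (norms + eps)

    # For every channel, pick the sample whose channel is most similar.
    best = sim.argmax(axis=0)                               # (C,)

    # Assemble the reconstructed feature channel by channel.
    reconstructed = support_feats[best, np.arange(c)]       # (C, H, W)
    return reconstructed, best


# Example: 5 support samples, 8 channels, 4x4 feature maps.
rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 8, 4, 4))
recon, chosen = select_channels(feats)
```

In this sketch the reconstructed feature keeps the shape of a single sample feature, so it could stand in for a class-level feature wherever the relation module expects one. How the thesis actually feeds it into the relationNet is not stated in the abstract.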
We demonstrate that the selected optimal feature and the additional spatial information improve generalization performance. Compared with the existing method, the average accuracy on the RAFDB, FER2013, SFEW, and AFEW datasets increases by 3.5%, 3.68%, 5.58%, and 2.31%, respectively.