Efficient Few-shot Learning based on Channel Selective Spatial Relation Network for Facial Expression Recognition = 얼굴감정 인식을 위한 효율적 Few-shot 학습기법 기반 선택적 채널 공간관계 네트워크

Multilingual Abstract

Facial expression recognition (FER) is an essential task in both computer vision and human-computer interaction (HCI). By recognizing emotion from facial expressions, it is widely used in applications such as autonomous driving, robotics, and e-learning enhancement. Despite this practicality, convolutional neural network (CNN)-based FER methods tend to overfit because FER datasets contain few samples.

To address this issue, we propose a few-shot learning (FSL) method for FER. FSL is a training mechanism that can predict samples from new categories using only a few examples. It learns the relations between data through similarity learning and performs inference on test data in the same way it learns. In this way, FSL can help mitigate the overfitting problem in FER.
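The few-shot setup described above can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes the common N-way K-shot episodic protocol, and the names `sample_episode`, `classify`, and `toy_relation` are hypothetical. In a relation network the relation function is learned; here a fixed stand-in on scalar toy features takes its place, and per-class scores are simply averaged.

```python
import random


def sample_episode(data_by_class, n_way, k_shot, n_query, rng):
    """Draw one N-way K-shot episode as (support, query) lists of (sample, label)."""
    classes = rng.sample(sorted(data_by_class), n_way)
    support, query = [], []
    for label in classes:
        picked = rng.sample(data_by_class[label], k_shot + n_query)
        support += [(x, label) for x in picked[:k_shot]]
        query += [(x, label) for x in picked[k_shot:]]
    return support, query


def classify(query_x, support, relation):
    """Return the label whose support samples have the highest mean relation score."""
    scores = {}
    for x, label in support:
        scores.setdefault(label, []).append(relation(query_x, x))
    return max(scores, key=lambda lbl: sum(scores[lbl]) / len(scores[lbl]))


def toy_relation(a, b):
    """Hypothetical stand-in for a learned relation module: a score in (0, 1]."""
    return 1.0 / (1.0 + abs(a - b))


# Toy data: scalar "features" per emotion class (illustrative values only).
data = {
    "happy": [1.0, 1.1, 0.9, 1.2],
    "sad": [5.0, 5.1, 4.9, 5.2],
    "angry": [9.0, 9.1, 8.9, 9.2],
}
support, query = sample_episode(data, n_way=2, k_shot=2, n_query=1, rng=random.Random(0))
predictions = [classify(x, support, toy_relation) for x, _ in query]
```

Training and inference use the same episode structure, which is one reading of the statement that test data is inferred "in the same way it learns".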

This thesis proposes a method built on RelationNet, which learns relation similarity among data. On top of RelationNet, we design a channel selection module and construct additional spatial data. To exploit a small amount of data effectively, we form a representative feature by averaging the sample features. Each channel of this representative feature is then compared with the corresponding channel of every sample feature to find which sample's channel is most similar. The channel of the selected sample is extracted as the optimal channel, so the module yields a single reconstructed feature composed of channels drawn from different samples. Turning to fine-grained features, we find that the eye and lip regions carry significant information about facial expressions. We therefore generate eye and lip image patches and add them to the support and query sets.
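The channel selection step can be sketched as follows. This is an illustration under stated assumptions rather than the thesis's module: the abstract does not name the similarity measure, so cosine similarity is assumed, and the function names and the nested-list feature layout are hypothetical. Plain Python lists keep the sketch dependency-free; a real implementation would operate on tensors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two flat vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)


def select_channels(samples):
    """Rebuild one feature by picking, per channel, the sample closest to the mean.

    samples: K features, each a list of C channels, each channel a flat
    list of H*W activations (a hypothetical layout chosen for this sketch).
    Returns (reconstructed_feature, chosen_sample_index_per_channel).
    """
    k = len(samples)
    reconstructed, chosen = [], []
    for c in range(len(samples[0])):
        channel_maps = [sample[c] for sample in samples]
        # Representative channel: element-wise average over the K samples.
        representative = [sum(vals) / k for vals in zip(*channel_maps)]
        scores = [cosine(m, representative) for m in channel_maps]
        best = max(range(k), key=scores.__getitem__)
        reconstructed.append(list(channel_maps[best]))
        chosen.append(best)
    return reconstructed, chosen


# Toy support set: 3 samples, 2 channels, 2 activations per channel.
samples = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.2], [1.0, 0.0]],
    [[0.0, 1.0], [0.1, 1.0]],
]
feature, chosen = select_channels(samples)
print(chosen)
```

In this toy input, channel 0 is taken from the second sample and channel 1 from the third, so the reconstructed feature mixes channels from different samples, as the abstract describes.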

We show that the selected optimal features and the additional spatial information improve generalization performance. Compared with the existing method, average accuracy on the RAFDB, FER2013, SFEW, and AFEW datasets increases by 3.5%, 3.68%, 5.58%, and 2.31%, respectively.

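Korean Abstract (English translation)

Facial expression recognition (FER) is one of the important tasks in computer vision and human-computer interaction (HCI). Because it recognizes emotion through facial expressions, FER is widely used in applications such as autonomous driving, robotics, and e-learning enhancement. Despite this practicality, existing convolutional neural network (CNN)-based FER faces an overfitting problem caused by the limited number of samples in FER datasets.

To solve this problem, we proposed a few-shot learning (FSL) method for FER. FSL is a learning mechanism that can predict samples from new categories with only a few data points. It learns the relations between data through similarity learning and infers test data in the same manner in which it learns. In doing so, FSL can help resolve the overfitting problem in FER.

This study proposes a method that uses RelationNet, which learns relation similarity among data. Based on RelationNet, we designed a channel selection module and an additional spatial data construction. To make effective use of the best information from a small amount of data, we took the average of the sample features as the representative feature. Next, each channel of the representative feature is compared with each channel of the sample features to find which sample's channel carries the most similar channel information. Through this comparison, the channel of the selected sample is regarded as the optimal channel and extracted. The designed module therefore builds one reconstructed feature from the channel information of the individual samples. In addition, focusing on fine-grained features, we found that facial expressions carry important information in the eye and lip regions. We therefore generated eye and lip image patches and set this additional data as support and query sets.

We demonstrated that the selected optimal features and the additional spatial information can improve generalization performance. Compared with the existing method, average accuracy on the RAFDB, FER2013, SFEW, and AFEW datasets improves by 3.5%, 3.68%, 5.58%, and 2.31%, respectively.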

Table of Contents

I Introduction = 1
1.1 Motivation = 1
1.2 Related Works = 5
1.2.1 Relation Network = 6
1.3 Contributions = 7
1.4 Outline = 8
II Preliminaries = 10
2.1 Overview of FER = 10
2.1.1 FER issues related to methods and datasets = 13
2.2 Overview of FER with FSL = 14
2.2.1 Generalization on novel data = 17
2.2.2 Domain adaptation = 18
III Proposed Method = 20
3.1 Data Preprocessing = 20
3.2 Network architecture = 23
3.2.1 Feature embedding = 25
3.2.2 Channel Selection = 27
3.2.3 Emotion similarity learning = 31
3.3 Overall training process = 32
IV Experimental Results and Discussion = 35
4.1 Dataset = 35
4.1.1 Dataset division = 35
4.1.2 Dataset Construction = 37
4.2 Training Details = 39
4.3 Performance Analysis = 40
4.4 Ablation study = 46
4.4.1 Ablation study comparing the channel information of sample feature and average feature = 46
4.4.2 Ablation study on the impact of feature size in CS module = 48
4.4.3 Ablation study on fine-tuning weights on individual modules = 49
V Conclusion = 52
References = 54
