RISS (Research Information Sharing Service)

      An Enhanced Multimodal Transformer with Hyper Attention for Real-Time and Robust Facial Emotion Analysis

      https://www.riss.kr/link?id=T17373997

      Multilingual Abstract

      Facial expression analysis is an essential component of affective computing, as it allows intelligent systems to understand human emotional reactions from visual cues. Despite the progress achieved through modern deep learning approaches, many existing solutions still suffer performance drops when exposed to occlusions, illumination changes, or subtle and ambiguous facial movements. To address these challenges, this thesis introduces FERONet, a multimodal transformer-based framework designed for reliable and real-time facial expression recognition. The architecture incorporates a hyper-attentive feature extraction strategy that jointly leverages spatial, channel, and cross-region attention to capture detailed local patterns as well as broader structural relationships within the face. Furthermore, a hierarchical transformer equipped with token-reduction stages enhances computational efficiency, while a temporal decoder with cross-attention enables the system to model the progression of expressions in video sequences.
      The proposed method combines information from multiple sources (RGB images, motion cues derived from optical flow, and geometric features extracted from depth or facial landmarks), resulting in improved robustness across diverse recording conditions. Extensive evaluations on five widely used benchmarks (FER-2013, RAF-DB, CK+, BU-3DFE, and AFEW) show that FERONet achieves accuracy competitive with state-of-the-art methods, reaching up to 97.3%, while maintaining real-time inference at under 16 milliseconds per frame. These findings highlight the model’s suitability for deployment in practical environments such as driver monitoring systems, healthcare-related emotion assessment, and intelligent learning technologies.
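
The abstract mentions a temporal decoder with cross-attention but gives no implementation details. As a rough illustration only, the sketch below shows generic scaled dot-product cross-attention in plain Python, where a query token attends over a short sequence of per-frame feature vectors. The function names, the toy inputs, and the single-head, projection-free form are all assumptions made for this example; they are not taken from the thesis and do not represent FERONet's actual decoder.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Generic scaled dot-product cross-attention (single head, no projections).

    queries: list of d-dimensional vectors (e.g. decoder tokens)
    keys, values: equal-length lists of vectors (e.g. per-frame features)
    Returns one output vector per query and the attention weights used.
    """
    d = len(keys[0])
    out_dim = len(values[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        # Output is the attention-weighted sum of the value vectors.
        out = [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(out_dim)]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Hypothetical toy input: three per-frame feature vectors and one query token.
frame_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query_tokens = [[2.0, 0.0]]
outputs, weights = cross_attention(query_tokens, frame_features, frame_features)
```

In this toy case the query aligns equally with the first and third frames, so those two frames receive the same weight and the second frame receives less. A practical model would add learned query, key, and value projections and multiple heads.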

      Table of Contents

      • I. Introduction
      • 1.1 Research Background
      • 1.2 Research Motivation
      • 1.3 Main Contributions
      • 1.4 Composition of the Thesis
      • II. Literature Review
      • 2.1 Deep Learning and CNN-Based Models
      • 2.2 Limitations of CNNs and the Emergence of Transformer-Based FER
      • 2.3 Multimodal and Cross-Domain FER
      • III. Proposed Method and Model Architecture
      • 3.1 Multimodal Feature Encoder
      • 3.2 Triple Attention Block
      • 3.3 Hierarchical Transformer with Token Merging
      • 3.4 Temporal Decoder with Cross-Attention
      • IV. Implementation Results and Evaluation
      • 4.1 Robustness Strategies
      • 4.2 Experimental Setup
      • 4.3 Experimental Results
      • 4.4 Comparison with SOTA Models
      • 4.5 Discussions
      • VI. Conclusion and Future Direction
      • References
      • Acknowledgments
