Recent advances in natural language processing have opened new possibilities for machines to understand and respond to human affect. As artificial intelligence systems become increasingly integrated into daily human activities, the ability to recognize, interpret, and appropriately react to users' affective states has emerged as a critical capability for creating natural and effective human-computer interactions. Affective computing addresses this need by developing computational methods that bridge the gap between human emotional expression and machine understanding. This field encompasses two complementary directions: affective understanding and affective generation. Affective understanding focuses on recognizing and interpreting human emotions from text and other modalities, while affective generation involves producing emotionally appropriate responses.
In this dissertation, we propose computational approaches to modeling human affect with language models, encompassing both affective understanding and affective generation. By investigating sentiment analysis and empathetic dialogue generation, we address key challenges at each stage of affective computing.
In the first study, we address the challenge of textual affective understanding in data-scarce settings. We propose a sentiment lexicon-integrated meta-training framework that optimizes pre-trained language models for few-shot sentiment analysis. This study demonstrates that language models can effectively learn to interpret textual sentiment without heavy reliance on large-scale datasets.
The second study advances from textual to multimodal affective understanding, specifically addressing the issues of visual noise and modality misalignment in multimodal sentiment analysis. We propose a confidence-guided dual-adapter fusion framework that dynamically integrates the predictions from different views based on model confidence. This study emphasizes the importance of selective perception in multimodal affective understanding.
The third study expands the scope from affective understanding to affective generation, addressing the challenge of producing empathetic responses that go beyond surface-level empathy. We propose a simulation framework based on Motivational Interviewing (MI) theory and use it to construct a synthetic dataset of MI dialogues. By grounding dialogue generation in established counseling principles, this study explores how language models can generate structured and procedurally coherent empathetic interactions.
Taken together, these three studies constitute a comprehensive investigation into how language models can be designed to model human affect more reliably and meaningfully across diverse settings. This dissertation presents novel frameworks that bridge recent advances in language modeling with the requirements of real-world affective applications, contributing to the development of more capable and practical affective computing systems.