A Study on the Korean Multi-class Emotion Analysis Based on KoBERT in Text Messages of Radio Listeners

Abstract (Korean)

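With recent advances in deep learning research, a wide range of studies on emotion analysis is under way. Early work in natural language processing largely had artificial intelligence classify human emotion or sentiment into a simple positive/negative polarity. More recently, research has moved beyond binary sentiment analysis, which labels emotional polarity as positive or negative, to multi-class emotion analysis, a more complex and difficult task. Combined with the broadcasting field, multi-class emotion analysis technology could produce new results. Despite strong interest, however, emotion analysis research in broadcasting remains scarce. In particular, radio listeners' text messages are text data that carry the many emotions real people can feel, yet few studies use them, and studies of Korean multi-class emotion analysis on sentences that people actually write are also lacking. This study therefore proposes a system that performs emotion analysis on radio listeners' text messages collected in a real environment and uses it to investigate Korean multi-class emotion analysis.
This thesis investigates ways to improve Korean multi-class emotion analysis performance using radio listeners' text messages collected in a real environment. It differs from prior work in that, instead of the open datasets commonly used in emotion analysis studies, it directly collects listeners' text messages from actual radio broadcasts and uses them as a Korean dataset for emotion analysis. Analyzing these messages, the study examines the linguistic characteristics that make Korean emotion analysis difficult. A survey and experiments were also carried out to examine how a dataset should be composed to raise the accuracy of Korean multi-class emotion analysis. Before the experiments, a survey was conducted to define a common standard for emotion labeling so that a Korean corpus could be built for the experiments. A Korean multi-class emotion analysis system was then built on the KoBERT language model, which is specialized for Korean and written-style text, and two experiments were run. In the first, refined and unrefined data were each fed to the emotion analysis model as test data and compared, to examine how ungrammatical elements affect the performance of the KoBERT-based system. In the second, an open dataset and the self-built Korean corpus were compared and analyzed, and a way of composing a transfer-learning dataset to improve the accuracy of the Korean multi-class emotion analysis system was proposed.
This study built a multi-class emotion analysis system on the KoBERT language model, which has been validated as accurate for Korean emotion analysis, analyzed why Korean multi-class emotion analysis is difficult by observing the analysis process, and proposed a direction for dataset construction. The results are meant to serve as material for improving the accuracy of Korean text emotion analysis and to support the use of emotion analysis technology in broadcasting.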
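The abstract says a survey was used to set a common emotion-labeling standard before building the corpus, but it does not say how individual judgments become one label per message. One plausible way, shown only as a minimal sketch, is majority voting over annotators with ties and weak agreement discarded. The label set `EMOTIONS`, the helpers `aggregate_label` and `build_corpus`, and the 50% agreement threshold are all hypothetical, not the thesis's labeling standard.

```python
from collections import Counter

# Hypothetical label set for illustration; the thesis's emotion categories
# are not listed in the abstract.
EMOTIONS = ("joy", "sadness", "anger", "fear", "surprise", "neutral")


def aggregate_label(votes, min_agreement=0.5):
    """Majority label for one message, or None if annotators disagree too much.

    votes: labels chosen by individual annotators for the same message.
    min_agreement: share of votes the winning label must strictly exceed
    (a hypothetical threshold).
    """
    if not votes:
        return None
    unknown = set(votes) - set(EMOTIONS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    ranked = Counter(votes).most_common()
    top_label, top_count = ranked[0]
    # A tie between the two most frequent labels is treated as ambiguous.
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None
    if top_count / len(votes) <= min_agreement:
        return None
    return top_label


def build_corpus(annotated):
    """Keep (text, label) pairs only for messages with a clear majority label."""
    corpus = []
    for text, votes in annotated:
        label = aggregate_label(votes)
        if label is not None:
            corpus.append((text, label))
    return corpus


if __name__ == "__main__":
    annotated = [
        ("Passed my exam today!", ["joy", "joy", "surprise"]),
        ("Missing my late grandmother.", ["sadness", "anger"]),  # tie, dropped
    ]
    print(build_corpus(annotated))  # [('Passed my exam today!', 'joy')]
```

Dropping ambiguous messages trades corpus size for label consistency; a real pipeline might instead send them back for adjudication.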

Multilingual Abstract

With recent advances in deep learning research, various studies on emotion analysis are being conducted. In the early natural language processing field, many studies had artificial intelligence classify human emotion or sentiment into a simple positive/negative polarity. Recently, however, research has evolved beyond binary sentiment analysis, which classifies emotional polarity as positive or negative, toward multi-class emotion analysis, a more complex and difficult task. Combined with the broadcasting field, such multi-class emotion analysis technology can be expected to generate new results. Despite high interest, however, research on emotion analysis in the broadcasting field is still insufficient. In particular, although radio listeners' text messages are textual data containing the various emotions that humans can have, related studies are scarce, and so are studies of Korean multi-class emotion analysis on sentences written by real people. Accordingly, a system that performs emotion analysis on radio listeners' text messages collected in a real environment was proposed, and through it a study of Korean multi-class emotion analysis was conducted.
In this paper, a method of improving the performance of Korean multi-class emotion analysis was studied using radio listeners' text messages collected in a real environment. The study differs from existing emotion analysis studies in that, rather than relying on the open datasets they commonly use, it directly collects listeners' text messages from actual radio broadcasts and uses them as a Korean dataset for emotion analysis. By analyzing these messages, the linguistic characteristics that make Korean emotion analysis difficult were examined. In addition, a survey and experiments were conducted to examine and analyze a dataset composition that can increase the accuracy of Korean multi-class emotion analysis. Prior to the experiments, a survey was conducted to define a universal standard for emotion labeling in order to build a Korean corpus for the experiments. Two experiments were then conducted with a Korean multi-class emotion analysis system built on the KoBERT language model, which is specialized for Korean and written-style text. We investigated how ungrammatical elements affect the performance of the KoBERT-based system by feeding refined and unrefined data into the emotion analysis model as separate test data, and, by comparing an open dataset with the self-built Korean corpus, proposed a method of composing a fine-tuning (transfer learning) dataset to improve the accuracy of the Korean multi-class emotion analysis system.
In this study, a multi-class emotion analysis system was built on the KoBERT language model, which has been shown to be accurate for Korean emotion analysis, in order to analyze why Korean multi-class emotion analysis is difficult and to present a direction for dataset construction. The results are intended to serve as material for improving the accuracy of Korean text emotion analysis and to help apply emotion analysis technology in the broadcasting field.
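The abstract contrasts refined and unrefined test data but does not list the refinement rules. The sketch below assumes a few typical clean-up steps for informal Korean messages: dropping standalone jamo such as ㅋㅋ (laughter) or ㅠㅠ (crying), removing simple emoticons, shortening runs of repeated characters, and collapsing whitespace. The function `refine` and every rule in it are hypothetical.

```python
import re

# Hypothetical refinement rules; the thesis's actual preprocessing is not
# described in the abstract.

# Runs of Hangul compatibility jamo (U+3131-U+318E) written on their own,
# e.g. "ㅋㅋㅋ" (laughter) or "ㅠㅠ" (crying).
JAMO_RUN = re.compile(r"[\u3131-\u318E]+")
# Simple ASCII emoticons: ^^, ^_^, :) and :(
EMOTICON = re.compile(r"\^_?\^|:\)|:\(")
# Any character repeated three or more times in a row, e.g. "!!!!" or "...".
REPEATED = re.compile(r"(.)\1{2,}")
WHITESPACE = re.compile(r"\s+")


def refine(text: str) -> str:
    """Return a 'refined' copy of a raw listener message."""
    text = EMOTICON.sub(" ", text)
    text = JAMO_RUN.sub(" ", text)
    text = REPEATED.sub(r"\1\1", text)  # keep at most two repeats
    return WHITESPACE.sub(" ", text).strip()


if __name__ == "__main__":
    raw = "오늘 너무 좋아요ㅋㅋㅋㅋ!!!! ^^"  # "I'm so happy today" + laughter + emoticon
    print(refine(raw))  # 오늘 너무 좋아요 !!
```

Evaluating one trained model on both the raw and the `refine`d versions of the same test messages isolates the effect of such ungrammatical elements, since the model and labels stay fixed.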
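Comparing the system's performance across test sets (for example, refined versus unrefined messages) requires a metric computed per set. The sketch below implements accuracy and macro-averaged F1 in plain Python. The abstract does not state which metrics the thesis reports, so the choice of macro-F1, the helper names, and the toy label lists are assumptions, not results.

```python
from collections import defaultdict


def accuracy(gold, pred):
    """Fraction of messages whose predicted emotion matches the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted p wrongly
            fn[g] += 1  # missed the true label g
    labels = set(gold) | set(pred)
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy labels for illustration only, not results from the thesis.
    gold = ["joy", "sadness", "anger", "joy"]
    pred_refined = ["joy", "sadness", "anger", "sadness"]
    pred_unrefined = ["joy", "joy", "anger", "sadness"]
    for name, pred in (("refined", pred_refined), ("unrefined", pred_unrefined)):
        print(name, accuracy(gold, pred), round(macro_f1(gold, pred), 3))
    # refined 0.75 0.778
    # unrefined 0.5 0.5
```

Macro-F1 weights every emotion class equally, which matters when some emotions are much rarer than others in listener messages; accuracy alone can hide poor performance on those classes.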

Table of Contents

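Abstract (Korean)
List of Tables
List of Figures
I. Introduction
1. Background of the Study
2. Purpose of the Study
3. Scope of the Study
4. Organization of the Thesis
II. Theoretical Background
1. Related Work
2. Sentiment Analysis / Emotion Analysis
3. KoBERT
4. Transformer Architecture
5. Linguistic Characteristics of Korean
III. Data Preprocessing and Analysis for Emotion Analysis
1. Survey
2. Survey Results and Discussion
3. Dataset Preprocessing
4. Dataset Analysis
IV. Emotion Analysis Model
1. Model Implementation
2. Experimental Dataset Construction
3. Experimental Method
V. Experimental Results and Discussion
1. Comparison of Refined and Unrefined Data
2. Comparison of Datasets for Transfer Learning
VI. Conclusion
References
English Abstract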