RISS Search - Thesis Details

Korean Abstract

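With the advance of ICT, text data on the web has grown steadily. More recently, circumstances such as COVID-19 have strengthened the shift toward contact-free interaction across society, and as multi-party communication moves online, dialogue-style text data is being produced in many forms. For readers to extract information efficiently from text generated in both everyday and non-everyday settings, text summarization models are needed. However, general summarization models have been developed with a focus on single-speaker text such as news, papers, or prose, and they make no special provision for dialogue-style text involving multiple speakers.
This study aims to develop an abstractive document summarization model that can effectively summarize dialogue-style text data. To this end, we use the BART model, which underlies state-of-the-art results on many summary generation tasks. We extend BART with an utterance-level attention layer that can learn the flow of utterances, and we use an LDA-based topic model to compensate for the limitations of transformer-based text summarization models.
We ran comparative experiments on DialSUMM, a dialogue-style text dataset, to test whether the proposed model achieves higher performance. The results demonstrate that, in terms of ROUGE F1 scores, the proposed model summarizes dialogue data better than sequence-to-sequence and autoregressive baseline models. Comparative experiments over a range of LDA parameters, aimed at optimizing the LDA model, are left for future work.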
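
The ROUGE F1 score used as the evaluation metric above can be illustrated with a minimal sketch. This is not the evaluation code used in the thesis. The function name `rouge_n_f1`, the whitespace tokenization, and the lowercasing are assumptions made for illustration, and published ROUGE implementations typically add stemming and other preprocessing.

```python
from collections import Counter


def rouge_n_f1(candidate: str, reference: str, n: int = 1) -> float:
    """ROUGE-N F1 from clipped n-gram overlap.

    Assumption: tokens are lowercased and split on whitespace.
    """
    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    # Counter intersection keeps the minimum count per n-gram (clipped matches).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())  # matched share of the candidate
    recall = overlap / sum(ref.values())      # matched share of the reference
    return 2 * precision * recall / (precision + recall)


generated = "the cat sat on the mat"
gold = "the cat is on the mat"
rouge1 = rouge_n_f1(generated, gold, n=1)
rouge2 = rouge_n_f1(generated, gold, n=2)
```

F1 is the harmonic mean of n-gram precision and recall, so a summary is penalized both for missing reference content and for adding unmatched content.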

Multilingual Abstract

As ICT develops, text data on the web has increased significantly. In
recent years in particular, situations such as COVID-19 have reinforced a
non-face-to-face trend throughout society, and various forms of
discourse-type text are being mass-produced as multi-party communication
takes place online. Readers need text summarization models to take in
information effectively from text produced in both everyday and unusual
settings. However, general summarization models have been developed with a
focus on single-speaker text such as news, papers, or prose, and they make
no distinction when handling discourse-type text featuring multiple
speakers.
This study aims to develop an abstractive document summarization model
that can effectively summarize discourse-type text. To achieve this goal,
we used the BART model, which is the basis of state-of-the-art performance
on numerous summary generation tasks. We advanced the model by adding to
BART an utterance-level attention layer that can learn the flow of
utterances, and by using an LDA-based topic model to compensate for the
limitations of transformer-based text summarization models.
In this study, a comparative experiment was conducted to see whether the
proposed model shows higher performance on DialSUMM, a discourse-type text
dataset. The results demonstrate that, in terms of ROUGE F1 scores, the
proposed model summarizes discourse data better than sequence-to-sequence
and autoregressive baseline models. However, comparative experiments on
various LDA hyperparameters, with a view to optimizing the LDA model,
remain for follow-up studies.
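
The idea of attending over utterances rather than individual words can be pictured with a toy sketch. This is not the architecture from the thesis, whose layer sits inside BART and is learned. The sketch assumes mean pooling of token vectors into utterance vectors and identity query/key/value projections, and the names `mean_pool` and `utterance_self_attention` are hypothetical.

```python
import math


def mean_pool(token_vectors):
    """Average equal-length token vectors into one utterance vector (assumed pooling)."""
    dim = len(token_vectors[0])
    return [sum(vec[d] for vec in token_vectors) / len(token_vectors) for d in range(dim)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def utterance_self_attention(utterances):
    """Scaled dot-product self-attention over utterance vectors.

    Assumption: queries, keys, and values are the utterance vectors themselves;
    a trained layer would apply learned projections first.
    """
    dim = len(utterances[0])
    contextualized, weights = [], []
    for query in utterances:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in utterances
        ]
        attn = softmax(scores)
        weights.append(attn)
        # Each output is the attention-weighted sum of all utterance vectors.
        contextualized.append(
            [sum(a * u[d] for a, u in zip(attn, utterances)) for d in range(dim)]
        )
    return contextualized, weights


# Toy dialogue: three utterances with 2, 1, and 3 two-dimensional token vectors.
dialogue = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 2.0]],
    [[0.0, 1.0], [0.0, 3.0], [3.0, 2.0]],
]
utterance_vectors = [mean_pool(tokens) for tokens in dialogue]
context, attention = utterance_self_attention(utterance_vectors)
```

Each row of `attention` indicates how strongly one utterance draws on every other utterance, which is one way a model could represent the flow of a conversation across speakers.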

Table of Contents

List of Figures
List of Tables
Korean Abstract
1. Introduction
2. Theoretical Background
2.1 Transformer
2.2 LDA (Latent Dirichlet Allocation)
3. Related Work
3.1 Abstractive Text Summarization
3.1.1 RNN-Based Sequence-to-Sequence Models and Pointer-Generator Networks
3.1.2 BART
3.2 Use of Topic Models
3.3 Hierarchical Learning
4. Proposed Model
4.1 Topic Representation
4.2 Word-Level Encoder
4.2.1 Positional Encoding and Embedding Layer
4.2.2 Multi-Head Self-Attention
4.2.3 Position-wise Feed-Forward Networks
4.2.4 Topic Attention Layer
4.3 Utterance-Level Encoder
4.4 Decoder
5. Experiments and Results
5.1 Experimental Data
5.2 Experimental Setup
5.2.1 BART Model Setup
5.2.2 LDA Model Setup
5.2.3 Experimental Equipment
5.2.4 Scope of Experiments
5.3 Comparison Models
5.3.1 Sequence-to-Sequence Models
5.3.2 Autoregressive Models
5.4 Experimental Results
6. Examples of Generated Summaries
7. Conclusion
References
English Abstract

토픽 모델과 트랜스포머를 활용한 담화형 문서의 생성형 요약에 대한 연구 = Abstractive summarization of dialogue documents using topic model and transformer
