Layer-wise Contrast Decoding for Enhancing Contextual Integration in Large Language Models (대형 언어모델의 문맥 반영 향상을 위한 계층별 Contrast Decoding 연구)

Multilingual Abstract

Large language models (LLMs) have achieved impressive performance across a wide range of NLP tasks, yet they often fail to fully incorporate newly provided context and instead rely excessively on prior knowledge acquired during pre-training. This imbalance frequently leads to hallucinated or non-factual outputs, which is especially problematic in domains such as question answering, summarization, law, and medicine, where factual correctness and evidence-based reasoning are essential.

In this thesis, we propose Dynamic Layer-Contrast Decoding (DLCD), a decoding strategy designed to improve contextual integration while preserving useful prior knowledge. DLCD analyzes the difference between context and no-context predictions at multiple transformer layers, and dynamically rebalances the contribution of context and pre-trained knowledge. Concretely, DLCD constructs a contrastive distribution at each layer by reweighting tokens whose probabilities increase under contextual input, and then uses Jensen–Shannon divergence to automatically select the layer at which contextual signals are most salient. The selected intermediate layer is then contrastively combined with the final layer through log-domain reweighting controlled by two hyperparameters, the layer weight λℓ and the context weight λc, so that context-supported tokens are amplified and context-agnostic tokens are suppressed.
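The procedure described above can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the Jensen–Shannon divergence is taken between the context and no-context distributions at each layer, the combined score adds a context contrast at the final layer (scaled by `lam_ctx`, standing in for λc) and at the selected layer (scaled by `lam_layer`, standing in for λℓ), and the function names and toy probabilities are hypothetical. It should not be read as the thesis's actual implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    def kl(a, b):
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)
    mix = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)


def select_layer(ctx_layers, noctx_layers, start_layer=0):
    """Assumed rule: pick the intermediate layer (final layer excluded)
    whose context and no-context distributions differ most under JSD."""
    candidates = range(start_layer, len(ctx_layers) - 1)
    return max(candidates,
               key=lambda i: js_divergence(ctx_layers[i], noctx_layers[i]))


def dlcd_distribution(ctx_layers, noctx_layers, lam_layer, lam_ctx,
                      start_layer=0, eps=1e-12):
    """Assumed log-domain combination of the final and selected layers."""
    k = select_layer(ctx_layers, noctx_layers, start_layer)
    final_ctx, final_noctx = ctx_layers[-1], noctx_layers[-1]
    scores = []
    for pf, pn, lc, ln in zip(final_ctx, final_noctx,
                              ctx_layers[k], noctx_layers[k]):
        log_pf = math.log(pf + eps)
        # Context contrast at the final layer.
        final_term = lam_ctx * (log_pf - math.log(pn + eps))
        # Context contrast at the selected intermediate layer.
        layer_term = lam_layer * (math.log(lc + eps) - math.log(ln + eps))
        scores.append(log_pf + final_term + layer_term)
    return k, softmax(scores)


# Toy example: 3 layers (last one is final), vocabulary of 3 tokens.
ctx = [[0.40, 0.30, 0.30], [0.20, 0.60, 0.20], [0.45, 0.40, 0.15]]
noctx = [[0.40, 0.30, 0.30], [0.50, 0.30, 0.20], [0.60, 0.25, 0.15]]
layer, probs = dlcd_distribution(ctx, noctx, lam_layer=0.5, lam_ctx=1.0)
```

In this toy setup, plain decoding from the final layer with context would pick token 0, while the contrastive score shifts the preference to token 1, the token whose probability rises when the context is present.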

We evaluate DLCD on two open-domain QA benchmarks, HotPotQA and SQuAD v1.1, using LLaMA-based models without any additional fine-tuning. Experimental results show that DLCD improves exact match (EM) by up to 2.2 percentage points and F1 score by up to 3.9 points over simple context injection, and achieves performance comparable to or better than existing methods such as Context-Aware Decoding (CAD) and DoLa. Sensitivity analyses on the start layer and the contrast coefficients indicate that contextual and prior knowledge are most stably balanced at intermediate layers (around the 16th layer), and that DLCD remains robust over a broad range of values of the layer weight λℓ and the context weight λc.
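For readers unfamiliar with the two reported metrics, the following sketch shows how exact match and token-level F1 are commonly computed for extractive QA. It assumes SQuAD-style answer normalization (lowercasing, removing punctuation and English articles); the abstract does not state which scoring script was used, and the function names here are illustrative.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction, gold):
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


em = exact_match("The Eiffel Tower.", "eiffel tower")
f1 = f1_score("Eiffel Tower in Paris", "Eiffel Tower")
```

Dataset-level scores are typically the mean of these per-example values, taking the maximum over the gold answers when a question has several.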

Overall, DLCD provides a lightweight decoding-time approach that mitigates hallucinations and enhances factual consistency without modifying model parameters. Future work will extend DLCD to other tasks such as summarization, dialogue, and multi-hop reasoning, and explore adaptive schemes that automatically tune the contrast strength per query, thereby further improving the reliability of LLM-based systems.

Korean Abstract

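Large language models (LLMs) have recently shown strong performance across natural language processing and are spreading to a wide range of applications. However, hallucination, in which a model generates non-factual information because it fails to reflect the given context sufficiently or relies excessively on outdated knowledge embedded in its training data, continues to be reported. This problem undermines model reliability in applications where factual accuracy and evidence-based reasoning are essential, such as question answering (QA), summarization, and the analysis of legal and medical documents.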

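To address this, this study proposes Dynamic Layer-wise Contrast Decoding (DLCD), a layer-wise contrast decoding method for improving the ability of large language models to reflect context. DLCD analyzes the difference between the probability distributions of the intermediate layers and the final layer inside the transformer, and dynamically adjusts the relative weight of contextual information and pre-trained knowledge. Specifically, it contrasts the probability distributions obtained with and without the context to give more weight to tokens that depend strongly on the context, and uses Jensen–Shannon divergence (JSD) to automatically select the intermediate layer in which the contextual signal is most clearly reflected. It then performs a contrast-based combination of the probabilities of the selected layer and the final layer, guiding the model to reflect contextual information more clearly.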

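Experiments were conducted on two public question answering datasets, HotPotQA and SQuAD. The proposed DLCD method improved exact match (EM) by about 2.2 points and F1 by about 3.9 points over simple context injection, and achieved performance equal to or better than the existing Context-Aware Decoding (CAD) and Decoding by Contrasting Layers (DoLa) methods. In addition, sensitivity analyses on the start layer and the contrast coefficients confirmed that the balance between context and knowledge forms most stably at intermediate layers (around the 16th layer).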

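Because it can be applied by modifying only the decoding stage, without fine-tuning, this lightweight approach has both academic and practical significance as a new method for mitigating hallucination in large language models and improving their factuality and reliability. Future work will extend DLCD to other tasks such as summarization, dialogue, and multi-hop reasoning, and develop it into an adaptive decoding framework that automatically adjusts the optimal layer for each query. This is expected to improve both the context understanding and the generation quality of large language models, and to contribute to trustworthy language intelligence systems.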

Table of Contents

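Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Scope and Organization
Chapter 2 Overview of Large Language Models and the Hallucination Problem
2.1 Architecture and Characteristics of Large Language Models
2.2 Definition and Types of Hallucination
2.2.1 Main Causes of Hallucination
2.2.2 Prior Work and Its Limitations
2.3 Interaction Between Context Integration and Pre-trained Knowledge
Chapter 3 Proposed Dynamic Layer-Contrast Decoding (DLCD) Method
3.1 Overview
3.2 Conflict Between Context and Pre-trained Knowledge
3.3 Computing Probabilities from Intermediate Layers
3.4 Layer-wise Contrastive Decoding Procedure
Chapter 4 Experiments
4.1 Datasets
4.2 Experimental Setup
4.3 Prompt Construction
Chapter 5 Results and Analysis
5.1 Main Results
5.2 Start Layer Selection Experiments
5.3 Analysis of the Layer Weight λℓ
5.4 Comparison of Layer Selection Strategies
5.5 Analysis of the Context Weight λc and Latency
Chapter 6 Discussion
Chapter 7 Conclusion
References
ABSTRACT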
