Enhancing Mathematical Reasoning in Mid-Scale Language Models for Korean Problems through RAG and CoT Prompting: Focusing on Embedding Model

Korean Abstract

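Large language models (LLMs) have recently demonstrated outstanding performance on a wide range of natural language processing tasks such as machine translation and text generation, and mathematical reasoning ability has emerged as a key indicator of language model performance. Mathematical reasoning, however, differs fundamentally from typical NLP tasks: it goes beyond recognizing linguistic patterns and requires multi-step logical development and precise calculation.
Against this background, various methods have been used to improve the mathematical reasoning performance of LLMs, including prompt engineering, retrieval-augmented generation (RAG), and fine-tuning. In particular, Chain-of-Thought (CoT) prompting, which supplies exemplars containing step-by-step reasoning, has brought meaningful gains in mathematical reasoning and remains widely used. However, CoT is most effective with large-scale models, which carries the practical limitation of high operating costs.
To improve Korean math problem solving in a small-to-mid-scale language model setting, this study proposes using RAG to retrieve external information related to the problem and integrating it into the CoT prompt. Because RAG performance depends heavily on embedding quality, the study focuses on finding an embedding model suited to Korean math problems.
The proposed method was validated using a model of roughly 8B parameters and the Korean math problem dataset from AI-Hub. The proposed method, which dynamically retrieves relevant examples through RAG, outperformed previously studied prompting methods. In addition, the experiments confirmed that RAG-based mathematical reasoning performance was highest when a general-purpose embedding model was applied, exceeding that of math-specialized and code-specialized models.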
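The retrieve-then-prompt idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the example pool, the function names (`embed`, `retrieve`, `build_cot_prompt`), and the prompt wording are hypothetical, and the character n-gram `embed` function is only a toy stand-in for a real embedding model, which is the component the study actually compares.

```python
import math
from collections import Counter

# Hypothetical pool of (problem, step-by-step solution) pairs to retrieve from.
EXAMPLES = [
    ("A box has 3 red and 5 blue balls. How many balls are there in total?",
     "There are 3 red balls and 5 blue balls. 3 + 5 = 8. The answer is 8."),
    ("A rectangle is 4 cm wide and 6 cm long. What is its area?",
     "Area is width times length. 4 * 6 = 24. The answer is 24."),
    ("Tom reads 12 pages a day. How many pages does he read in 5 days?",
     "He reads 12 pages each day for 5 days. 12 * 5 = 60. The answer is 60."),
]

def embed(text, n=3):
    """Toy stand-in for an embedding model: sparse character n-gram counts."""
    s = text.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, examples, k=2):
    """Return the k examples whose problem text is most similar to the question."""
    q_vec = embed(question)
    ranked = sorted(examples, key=lambda ex: cosine(q_vec, embed(ex[0])), reverse=True)
    return ranked[:k]

def build_cot_prompt(question, retrieved):
    """Place retrieved worked examples before the question as few-shot CoT exemplars."""
    parts = ["Solve the problem step by step."]
    for problem, solution in retrieved:
        parts.append(f"Q: {problem}\nA: {solution}")
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

question = "A rectangle is 3 cm wide and 7 cm long. What is its area?"
top = retrieve(question, EXAMPLES, k=2)
prompt = build_cot_prompt(question, top)
print(prompt)
```

In a real pipeline, `embed` would call an actual embedding model and the resulting prompt would be sent to the language model. Swapping the embedding function while keeping the rest fixed is one way the effect of the embedding model on retrieval could be isolated.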

Multilingual Abstract

Large language models have lately demonstrated remarkable performance in natural language processing tasks such as text generation, and mathematical reasoning has also emerged as a critical benchmark of their capability. Unlike conventional NLP tasks, however, mathematical reasoning requires multi-step logical inference and precise calculation, highlighting the need for specialized approaches such as prompt engineering, retrieval-augmented generation, and fine-tuning. Among these, Chain-of-Thought prompting has shown significant improvements by incorporating step-by-step exemplars, but its effectiveness is most pronounced in large-scale models, which involve substantial computational and operational costs.
This study proposes an approach to improve the performance of mid-scale LLMs in solving Korean mathematical problems by leveraging RAG to dynamically retrieve external information and integrate it into CoT prompts. Since the effectiveness of RAG critically depends on embedding quality, we focus on identifying embedding models suitable for Korean mathematical problems. The proposed method was evaluated using an approximately 8B-parameter model and the Korean mathematical problem dataset provided by AI-Hub. Experimental results demonstrate that the RAG-CoT approach outperforms standard prompting methods, and that applying a general-purpose embedding model yielded the highest performance, surpassing both math-specific and code-specific embedding models.

Table of Contents

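1. Introduction
1.1. Research Background
1.2. Research Objectives
2. Related Work
2.1. Research Trends in Mathematical Reasoning with LLMs
2.2. Prompt Engineering
2.2.1. Overview of Prompt Engineering
2.2.2. Few-Shot Prompting Techniques for Mathematical Reasoning
2.2.3. Zero-Shot Prompting Techniques for Mathematical Reasoning
2.3. Retrieval-Augmented Generation
2.3.1. Overview of RAG
2.3.2. How the RAG Framework Operates
2.3.3. Embedding Model
2.3.4. Using RAG for Mathematical Reasoning
2.4. Related Research in Korea and Abroad
3. Proposed Approach
3.1. Structure of the Proposed Framework
3.2. Indexing
3.3. Embedding Model Selection
3.4. Retrieval
3.5. Generation
3.6. Prompt Design
3.6.1. Types of Experimental Prompts
3.6.2. System Prompt Composition
3.6.3. User Prompt Composition
3.6.4. Self-Consistency
4. Experiments and Results
4.1. Dataset
4.2. Experimental Design
4.3. Evaluation Metrics
4.4. Models Used
4.5. Experimental Results and Discussion
4.5.1. Performance Comparison by Technique
4.5.2. Performance Comparison by Embedding Model
4.5.3. Error Type Analysis by Embedding Model
4.5.4. Analysis of Common Error Cases
5. Conclusion
References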
