RISS Search - Domestic Journal Article Details

Korean Abstract (Abstract)

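This study empirically analyzes the performance of large language models (LLMs) in solving elementary school mathematics problems and, to explore their educational potential and limitations, compares the accuracy, answer consistency, and error types of OpenAI o1, Claude 3.5 Sonnet, and Gemini 1.5 Pro. The problems studied come from the 'Number and Operations' domain of 4th-grade elementary mathematics under the 2022 revised curriculum. OpenAI o1 achieved the highest accuracy at 0.93, followed by Gemini 1.5 Pro (0.88) and Claude 3.5 Sonnet (0.54). On problems where a table or image carried decisive information, all three models recorded low accuracy, revealing limits in visual information processing. Differences by unit were also observed: accuracy was high in the 'Multiplication and Division' unit and low in the 'Large Numbers' unit. In terms of consistency, OpenAI o1 was the most stable, whereas Claude 3.5 Sonnet tended to give inconsistent answers when solving the same problem repeatedly. Error-type analysis identified information recognition errors, errors in interpreting structural relationships, and failures to grasp context. These results suggest that LLMs show potential for use in elementary mathematics learning, but that they should be applied cautiously because of performance differences across problem types and limitations in interpreting visual information and in logical reasoning. Teachers should therefore recognize how LLM performance differs by problem type and design learning support strategies accordingly. It is also necessary to develop teaching and learning strategies that make strategic use of the varied solution representations LLMs produce and turn their error characteristics into educational opportunities. Future research should extend the scope to other grades and mathematical domains and empirically examine prompt design strategies and classroom applications that can compensate for LLM limitations on the basis of error analysis.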


Multilingual Abstract

This study investigates the performance of large language models (LLMs) in solving elementary mathematics problems by comparing accuracy, consistency, and error types across OpenAI o1, Claude 3.5 Sonnet, and Gemini 1.5 Pro. The dataset consists of problems from the “Number and Operations” domain within 4th-grade mathematics of the 2022 revised Korean national curriculum. Results show OpenAI o1 achieved the highest accuracy (0.93), followed by Gemini 1.5 Pro (0.88) and Claude 3.5 Sonnet (0.54). All models demonstrated lower performance on problems containing critical information in tables or images, revealing limitations in visual information processing. Performance was high in “Multiplication and Division” but low in “Large Numbers,” indicating topic-based differences. OpenAI o1 was most consistent, while Claude 3.5 Sonnet showed inconsistent answers when solving identical problems repeatedly. Error analysis revealed information recognition errors, structural relationship misinterpretation, and contextual understanding failures. These findings suggest that while LLMs show potential for elementary math learning, cautious application is needed due to performance differences across problem types and limitations in visual information interpretation and logical reasoning. Teachers should recognize LLMs' performance differences and design appropriate learning support strategies. Future research should expand to various grade levels and mathematical domains, and empirically examine prompt design strategies based on error analysis.
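
The abstract reports accuracy and answer consistency over repeated attempts but does not give the formulas used. As a minimal sketch only, assuming accuracy is the share of correct responses over all attempts and consistency is the average share of attempts that agree with the most frequent answer to each problem, the two measures could be computed as follows. The function names, problem IDs, and responses are hypothetical illustrations, not data or definitions from the study.

```python
from collections import Counter


def accuracy(trials, answer_key):
    """Share of all (problem, attempt) responses that match the answer key.

    Assumed definition; the study's exact formula is not stated in the abstract.
    """
    total = 0
    correct = 0
    for problem_id, responses in trials.items():
        for response in responses:
            total += 1
            if response == answer_key[problem_id]:
                correct += 1
    return correct / total if total else 0.0


def consistency(trials):
    """Mean share of attempts agreeing with the most frequent answer per problem.

    Assumed definition; a model can be consistent and still wrong.
    """
    shares = []
    for responses in trials.values():
        if responses:
            top_count = Counter(responses).most_common(1)[0][1]
            shares.append(top_count / len(responses))
    return sum(shares) / len(shares) if shares else 0.0


# Hypothetical responses from one model: three attempts per problem.
answer_key = {"q1": "24", "q2": "305", "q3": "7"}
trials = {
    "q1": ["24", "24", "24"],     # always correct
    "q2": ["305", "350", "305"],  # correct twice, inconsistent once
    "q3": ["8", "8", "8"],        # consistently wrong
}

print(round(accuracy(trials, answer_key), 3))
print(round(consistency(trials), 3))
```

Keeping the two measures separate matters for the comparison in the abstract: a model can give the same wrong answer every time (high consistency, low accuracy), which is a different failure from answering differently on each attempt.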
