Design and Construction of a Dataset for Dialogue-Based Math Tutoring Systems (대화형 수학 튜터링 시스템을 위한 데이터셋 설계 및 구축)

Multilingual Abstract

A Dialogue-Based Tutoring System (DBTS) is a system that teaches learners through conversation. A DBTS helps learners improve their abilities and can provide more practice opportunities in fields such as language, science, and mathematics. Recent advances in artificial intelligence and natural language processing have significantly improved the conversational quality of DBTSs built on generative language models. Implementing such a DBTS requires a diverse supply of dialogue-based tutoring data in the target field. However, crowdsourcing such dialogue datasets is costly, and training annotators also takes considerable time and money. Moreover, insufficiently trained annotators can lower data quality. This paper therefore proposes an automated method for constructing datasets for dialogue-based math tutoring systems, designs a benchmark for evaluating such systems on the automatically constructed data, and reports baseline performance. Compared with traditional manual dataset construction, the automated approach not only saves time and cost but also shows advantages in data diversity and consistency. The data follow a 1:1 dialogue-based guided practice setting: dialogue scenarios are designed so that, after the teacher teaches new knowledge, the teacher and student solve problems collaboratively. The math problems are Math Word Problems drawn from GSM8K (Grade School Math 8K), so the dialogue dataset targets elementary-school difficulty.


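Korean Abstract (translated)

A Dialogue-Based Tutoring System (DBTS) is a system that educates learners through dialogue. A DBTS helps learners improve their abilities and can offer more practice opportunities in fields such as language, science, and mathematics. With recent progress in artificial intelligence and natural language processing, the dialogue quality of DBTSs based on generative language models has improved considerably. Building such a DBTS requires a wide range of dialogue-based tutoring data in the relevant field. However, crowdsourcing such dialogue datasets is expensive, and training annotators takes a great deal of time. In addition, inadequate annotator training can degrade data quality. This paper therefore proposes a method for automatically constructing a dataset for dialogue-based math tutoring systems, designs a benchmark that can evaluate such systems using the automatically constructed data, and presents baseline performance. Compared with existing manual dataset construction, the automated approach reduces time and cost and also proves superior in data diversity and consistency. The data setting is a 1:1 dialogue-based guided practice setting, and the dialogue scenarios were designed so that the student and teacher solve problems collaboratively after the teacher has taught new knowledge. The math problems are taken from GSM8K (Grade School Math 8K), a Math Word Problem dataset, so the dialogue dataset targets the difficulty of elementary-school mathematics.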


Table of Contents