LSTM 기반 Skeleton 시퀀스 모델링을 활용한 유아 개별 행동 분석 시스템 = An Individual Child Behavior Analysis System Using LSTM-Based Skeleton Sequence Modeling

Multilingual Abstract

In modern society, the safety and development of young children are recognized as important concerns across home, childcare, and educational environments. Because young children have not yet reached sufficient physical and cognitive maturity, they frequently exhibit unpredictable behaviors, which increases their exposure to various types of accidents in daily life. Domestic child safety statistics from the past three years indicate that nearly half of all accidents occur within the home, emphasizing the need for continuous behavior monitoring in everyday living environments. However, long-term direct observation by caregivers or teachers is difficult in practice, which makes automated behavior analysis necessary.
With advances in deep learning and computer vision, research on analyzing human behavior from video data has progressed rapidly. Nevertheless, most existing behavior and facial expression recognition studies have been developed primarily on adult datasets, which limits their applicability to children. In addition, many previous approaches focus on single-person scenarios, which makes it difficult to analyze individual behaviors in multi-person environments such as real childcare settings.
To address these limitations, this study developed a child behavior analysis system based on skeleton sequence modeling with an LSTM network. The system consists of a MediaPipe-based skeleton extraction module, a SimpleLSTM-based behavior classification module, and face recognition and FER-based emotion analysis modules. Each module operates independently, and the modules are integrated into a single pipeline designed to collect and process data stably in multi-person indoor environments. In particular, the face recognition module uses known face encoding and ID tracking to prevent duplicate identification of the same person and manages a separate sequence buffer for each person, which improved the accuracy of behavior recognition and emotion analysis.
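The per-person sequence buffers described above can be illustrated with a minimal sketch. It is not the thesis implementation: the class name `SequenceBuffers`, the window length of 30 frames, and the string IDs are all assumptions made for illustration. It only shows the idea of keeping one sliding window of skeleton frames per tracked ID, so that frames from different children are never mixed before classification.

```python
from collections import defaultdict, deque

WINDOW = 30      # frames per classification window (assumed value)
NUM_JOINTS = 33  # MediaPipe Pose reports 33 landmarks per person


class SequenceBuffers:
    """Hypothetical helper: one fixed-length skeleton window per person ID."""

    def __init__(self, window=WINDOW):
        self.window = window
        # Each ID gets its own deque; old frames fall off automatically.
        self._buffers = defaultdict(lambda: deque(maxlen=window))

    def push(self, person_id, skeleton):
        """Append one frame for person_id.

        Returns the current window as a list once it holds `window`
        frames, otherwise None.
        """
        buf = self._buffers[person_id]
        buf.append(skeleton)
        if len(buf) == self.window:
            return list(buf)
        return None

    def drop(self, person_id):
        """Forget a person who has left the scene."""
        self._buffers.pop(person_id, None)
```

In a pipeline of this kind, each full window returned by `push` would be passed to the behavior classifier for that ID, while the face recognition step decides which ID a detected skeleton belongs to.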
Experimental results confirmed that the system operated stably in both single-person and multi-person video scenarios. Performance was particularly high in indoor environments where movement was relatively constrained. Using a SimpleLSTM model trained on the NTU-RGB+D 120 dataset, the system tracked and classified individual behaviors in real time. FER-based emotion analysis tracked facial expression changes in real time and extracted the top emotion, which allowed dominant emotional states to be analyzed quantitatively. Recording the analysis results in CSV format made further evaluations possible, including behavior frequency analysis, emotion change patterns, and behavior–emotion correlations.
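As a rough illustration of the CSV-based analysis mentioned above, the following sketch writes a small log and derives per-person behavior frequencies, a dominant emotion, and behavior/emotion co-occurrence counts. The column names, the sample rows, and the use of simple co-occurrence counts as a stand-in for "correlation" are assumptions; the abstract does not specify the log schema or the statistical method.

```python
import csv
import io
from collections import Counter, defaultdict

# Assumed log schema; the real column names are not given in the abstract.
FIELDS = ["timestamp", "person_id", "behavior", "emotion"]


def write_log(rows, fh):
    """Write one record per classified window to a CSV file object."""
    writer = csv.DictWriter(fh, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def summarize(fh):
    """Aggregate a CSV log into per-person summaries."""
    behaviors = defaultdict(Counter)
    emotions = defaultdict(Counter)
    pairs = defaultdict(Counter)
    for row in csv.DictReader(fh):
        pid = row["person_id"]
        behaviors[pid][row["behavior"]] += 1
        emotions[pid][row["emotion"]] += 1
        pairs[pid][(row["behavior"], row["emotion"])] += 1
    return {
        pid: {
            "behavior_frequency": dict(behaviors[pid]),
            "dominant_emotion": emotions[pid].most_common(1)[0][0],
            "behavior_emotion_pairs": dict(pairs[pid]),
        }
        for pid in behaviors
    }


# Hypothetical sample data, for illustration only.
rows = [
    {"timestamp": "0.0", "person_id": "child_A", "behavior": "sitting", "emotion": "neutral"},
    {"timestamp": "1.0", "person_id": "child_A", "behavior": "clapping", "emotion": "happy"},
    {"timestamp": "2.0", "person_id": "child_A", "behavior": "clapping", "emotion": "happy"},
    {"timestamp": "2.0", "person_id": "child_B", "behavior": "walking", "emotion": "neutral"},
]
log = io.StringIO()
write_log(rows, log)
log.seek(0)
summary = summarize(log)
```

An in-memory `io.StringIO` is used here so the sketch runs without touching the file system; a real deployment would presumably append to a file on disk.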
Based on these results, the behavioral and emotional characteristics of individual children were quantitatively represented, demonstrating the system’s applicability to practical domains including child observation, education, and psychological assessment. The proposed system also proved suitable for analyzing indoor group activities and interactions involving multiple children. By integrating and visualizing behavior and emotion data, the system provided an intuitive means for caregivers and teachers to better understand children’s tendencies and to offer informed feedback. Overall, this study validated the feasibility of an integrated indoor child behavior and emotion analysis system and provided a foundational framework for future research on real-time observation support and educational or psychological evaluation systems. Based on the findings of this study, future research could focus on improving system reliability and generalizability through real-time processing optimization, multi-camera integration, and expansion to diverse environments and age groups.

Korean Abstract

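In modern society, the importance of young children's safety and development is continually emphasized across home, childcare, and educational environments. Because young children are physically and cognitively immature, they often show sudden, hard-to-predict behaviors and are easily exposed to a variety of accident risks in daily life. Domestic child safety accident statistics for the past three years also report that nearly half of all accidents occur at home, which raises the need for continuous behavior monitoring in living environments. However, there are practical limits to how long caregivers or teachers can directly observe young children.
With recent advances in deep learning and computer vision, research on analyzing human behavior from video data is being actively pursued. Existing deep learning-based behavior and facial expression recognition studies, however, are based mainly on adult data and are therefore difficult to apply directly to young children. In addition, many studies are designed around single-person settings, which constrains their ability to separate and analyze the behavior of individual children in multi-person situations such as real childcare environments.
To address these problems, this thesis proposes a system that recognizes and analyzes the individual behavior of young children by applying an LSTM model to skeleton sequence data. The proposed system consists of a MediaPipe-based skeleton extraction module, a SimpleLSTM-based behavior classification module, and face recognition and FER-based emotion analysis modules. Each module operates independently, and the integrated pipeline is designed to collect and process data stably even in multi-person environments. In particular, the face recognition module uses Known Face Encoding and an ID Tracking algorithm to prevent duplicate recognition of the same person and manages a separate sequence buffer for each person, which improved the accuracy of behavior recognition and emotion analysis.
Experiments confirmed that the system operates stably on both single-person and multi-person videos. It performed particularly well in indoor conditions where movement is relatively limited. For skeleton-based behavior recognition, a SimpleLSTM model trained on the NTU-RGB+D 120 dataset made it possible to track and classify each person's behavior in real time. Emotion analysis using FER tracked changes in facial expression in real time and extracted the Top-Emotion, which allowed emotion patterns to be analyzed quantitatively. In addition, the experiments confirmed that CSV-based data logging supports a range of analyses, including per-person behavior frequency, emotion change patterns, and behavior-emotion correlations.
Based on the analysis results, the tendencies of each person could be quantified, showing that the system can be used in practical settings such as child observation, education, and psychological assessment. In particular, because the behavior and emotion of each individual can be analyzed simultaneously even in multi-person environments, the system was also found suitable for observing indoor activities and analyzing interactions involving many children. Integrating and visualizing the behavior and emotion data also laid a foundation for teachers and parents to understand a child's tendencies more intuitively and provide feedback.
This study verified the practical feasibility of an integrated system for child behavior and emotion analysis through its design, implementation, and experimental evaluation. It confirmed that the integrated pipeline can analyze multi-person behavior and emotion simultaneously and that children's tendencies can be quantified on that basis. The system and analysis methods presented here provide a foundation for developing educational and psychological assessment systems and real-time observation support systems. Future work is expected to improve the generality and reliability of the system through real-time processing optimization, multi-camera integration, and extension to diverse environments and age groups.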

Table of Contents