In modern society, the safety and development of young children are recognized as important concerns across home, childcare, and educational environments. Because young children have not yet reached sufficient physical and cognitive maturity, they frequently exhibit unpredictable behaviors, which increases their exposure to accidents in daily life. Domestic child safety statistics from the past three years indicate that a substantial proportion of accidents occur within the home, underscoring the need for continuous behavior monitoring in everyday living environments. However, sustained direct observation by caregivers or teachers is difficult in practice, which motivates automated behavior analysis.
With advances in deep learning and computer vision, research on analyzing human behavior from video data has progressed rapidly. Nevertheless, most existing behavior and facial expression recognition studies have been developed primarily on adult datasets, which limits their applicability to children. In addition, many previous approaches focus on single-person scenarios, making it difficult to analyze individual behaviors in multi-person environments such as real childcare settings.
To address these limitations, this study developed a child behavior analysis system based on skeleton sequence modeling with an LSTM network. The system consisted of a MediaPipe-based skeleton extraction module, a SimpleLSTM-based behavior classification module, a face recognition module, and an emotion analysis module based on facial expression recognition (FER). Each module operated independently but was integrated into a unified pipeline designed to collect and process data stably in multi-person indoor environments. In particular, the face recognition module used known-face encodings and ID tracking to prevent duplicate identification and to maintain a separate sequence buffer for each individual, thereby improving the accuracy of behavior recognition and emotion analysis.
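The per-individual sequence buffers described above can be illustrated with a minimal sketch. It is not the study's implementation: the class name `SequenceBuffers`, the window length `SEQ_LEN`, and the ID strings are assumptions made for illustration. The only external fact used is that MediaPipe Pose reports 33 landmarks per person.

```python
from collections import defaultdict, deque

SEQ_LEN = 30      # assumed window length in frames (not stated in the source)
NUM_JOINTS = 33   # MediaPipe Pose reports 33 landmarks per person


class SequenceBuffers:
    """Hypothetical helper: one sliding window of skeleton frames per tracked ID."""

    def __init__(self, seq_len=SEQ_LEN):
        self.seq_len = seq_len
        # deque(maxlen=...) discards the oldest frame automatically when full.
        self._buffers = defaultdict(lambda: deque(maxlen=seq_len))

    def push(self, person_id, keypoints):
        """Append one frame for person_id; return the window once it is full."""
        buf = self._buffers[person_id]
        buf.append(keypoints)
        if len(buf) == self.seq_len:
            return list(buf)  # ready to pass to a sequence classifier
        return None

    def drop(self, person_id):
        """Forget a person whose track was lost, so stale frames are not reused."""
        self._buffers.pop(person_id, None)
```

Keying the buffers by recognized ID keeps the frames of different children from being mixed into a single sequence, which is the property the multi-person pipeline relies on.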
Experimental results confirmed that the system operated stably in both single-person and multi-person video scenarios. Performance was particularly high in indoor environments where movement was relatively constrained. Using a SimpleLSTM model trained on the NTU RGB+D 120 dataset, the system tracked and classified individual behaviors in real time. Furthermore, FER-based emotion analysis enabled real-time tracking of facial expression changes and quantitative analysis of dominant emotional states. Analysis results were recorded in CSV format, which enabled additional evaluations such as behavior frequency analysis, emotion change patterns, and behavior–emotion correlations.
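As a rough illustration of the CSV-based evaluations mentioned above, the sketch below counts behaviors per individual, picks each individual's most frequent emotion, and tallies behavior and emotion co-occurrences (a simple stand-in for correlation analysis). The column names `timestamp`, `person_id`, `behavior`, and `emotion`, the function `summarize`, and the sample rows are all hypothetical; the source does not specify the log schema.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical log contents; the real schema is not given in the source.
SAMPLE_LOG = """timestamp,person_id,behavior,emotion
0.0,child_a,sitting,neutral
0.5,child_a,sitting,happy
1.0,child_a,standing,happy
0.0,child_b,walking,neutral
"""


def summarize(csv_file):
    """Return per-person behavior counts, dominant emotions, and pair counts."""
    behavior_counts = defaultdict(Counter)
    emotion_counts = defaultdict(Counter)
    pair_counts = Counter()
    for row in csv.DictReader(csv_file):
        pid = row["person_id"]
        behavior_counts[pid][row["behavior"]] += 1
        emotion_counts[pid][row["emotion"]] += 1
        pair_counts[(row["behavior"], row["emotion"])] += 1
    # Dominant emotion = most frequently logged label for that person.
    dominant = {pid: c.most_common(1)[0][0] for pid, c in emotion_counts.items()}
    return behavior_counts, dominant, pair_counts


behaviors, dominant, pairs = summarize(io.StringIO(SAMPLE_LOG))
```

With a log file on disk, the same function could be called as `summarize(open(path, newline=""))`.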
Based on these results, the behavioral and emotional characteristics of individual children were represented quantitatively, demonstrating the system’s applicability to practical domains including child observation, education, and psychological assessment. The proposed system also proved suitable for analyzing indoor group activities and interactions involving multiple children. By integrating and visualizing behavior and emotion data, the system gave caregivers and teachers an intuitive means to understand children’s tendencies and to offer informed feedback. Overall, this study validated the feasibility of an integrated indoor child behavior and emotion analysis system and provided a foundational framework for future research on real-time observation support and educational or psychological evaluation systems. Future research could improve system reliability and generalizability through real-time processing optimization, multi-camera integration, and expansion to diverse environments and age groups.