Deep Learning-Based Multi-View Tracking and Face Recognition for Advanced Intelligent Cattle Monitoring Framework

Multilingual Abstract

Precision Livestock Farming (PLF) has emerged as a transformative paradigm in modern animal husbandry, aiming to improve both productivity and animal welfare through the integration of sensing and data-driven technologies. For dairy cattle, continuous and fine-grained monitoring brings substantial benefits for estrus detection, health scoring, mastitis and lameness diagnosis, feeding behavior assessment, and body weight estimation. A prerequisite for these applications is reliable individual identification and long-term tracking. Traditional approaches often rely on contact-based devices such as Radio Frequency Identification (RFID) tags, pedometers, and accelerometers. While effective, these devices require manual installation and maintenance, which limits scalability and may cause stress to animals. In contrast, non-contact methods based on imaging or infrared sensing provide unobtrusive and scalable solutions. Among these, camera-based systems are becoming increasingly favored due to their ability to capture rich behavioral and biometric information without disturbing the natural environment of the animals.

Recent advances in computer vision and artificial intelligence have opened new opportunities for livestock monitoring. Techniques in object detection, semantic segmentation, pose estimation, and multi-object tracking have achieved state-of-the-art performance across various benchmarks, and their adaptation to PLF offers a cost-effective and highly efficient alternative to conventional sensor-based solutions. Vision-based systems enable automated and continuous observation of cattle herds, providing unprecedented insights into health and behavior. However, the deployment of such systems in barn environments is not trivial. Challenges such as frequent occlusion among animals, the difficulty of maintaining consistent identity across long time spans, and variations in illumination or background conditions significantly reduce robustness. Increasing the number of cameras, diversifying viewpoints, and utilizing large-scale datasets are effective strategies to mitigate these issues. Motivated by this, we propose a cross-camera multi-view monitoring framework that strengthens robustness within each individual view and simultaneously leverages complementary information across multiple cameras to address occlusion and identity fragmentation.

At the action-view level, we address the challenges of multi-cattle tracking under scale deformation, unexpected motion, and mild occlusion. For robust feature representation, we adopt an enhanced Spatial Pyramid Pooling (SPP) layer that improves multi-scale perception and spatial encoding. In parallel, we employ an ensemble Kalman filter that models the dynamic state of each animal with a five-dimensional feature set: position (two coordinates), width, height, and orientation. These two components operate independently but complement each other in ensuring accurate detection and consistent tracking. Furthermore, we design a bench matching strategy to preserve identity continuity when standard association fails due to rapid movements or mild occlusion. Together, these methods substantially improve tracking stability and trajectory integrity within a single camera view.
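The abstract does not spell out how bench matching works, so the following is only a minimal sketch of one plausible reading: tracks that fail standard association are parked on a "bench" for a limited number of frames and get a second, looser matching pass before any new identity is created. The class name `BenchTracker`, the IoU thresholds, the greedy matcher, and the `max_bench_age` limit are all illustrative assumptions, not the thesis implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Track:
    track_id: int
    box: Box
    missed: int = 0  # consecutive frames without a matched detection


def greedy_match(tracks: List[Track], boxes: List[Box], threshold: float) -> Dict[int, int]:
    """Pair tracks with boxes by descending IoU; returns {track index: box index}."""
    scored = sorted(
        ((iou(t.box, b), ti, bi) for ti, t in enumerate(tracks) for bi, b in enumerate(boxes)),
        reverse=True,
    )
    pairs: Dict[int, int] = {}
    used_boxes = set()
    for score, ti, bi in scored:
        if score < threshold:
            break
        if ti not in pairs and bi not in used_boxes:
            pairs[ti] = bi
            used_boxes.add(bi)
    return pairs


class BenchTracker:
    """Hypothetical tracker: all thresholds below are assumed values."""

    def __init__(self, iou_active: float = 0.3, iou_bench: float = 0.1, max_bench_age: int = 30):
        self.iou_active = iou_active
        self.iou_bench = iou_bench
        self.max_bench_age = max_bench_age
        self.active: List[Track] = []
        self.bench: List[Track] = []
        self._next_id = 1

    def update(self, detections: List[Box]) -> Dict[int, Box]:
        # Stage 1: standard association between active tracks and detections.
        pairs = greedy_match(self.active, detections, self.iou_active)
        matched = []
        for ti, di in pairs.items():
            self.active[ti].box = detections[di]
            matched.append(self.active[ti])
        lost = [t for ti, t in enumerate(self.active) if ti not in pairs]
        used = set(pairs.values())
        leftover = [d for di, d in enumerate(detections) if di not in used]

        # Stage 2: benched and newly lost tracks get a looser second chance.
        candidates = self.bench + lost
        pairs = greedy_match(candidates, leftover, self.iou_bench)
        self.bench = []
        for ci, track in enumerate(candidates):
            if ci in pairs:
                track.box, track.missed = leftover[pairs[ci]], 0
                matched.append(track)
            else:
                track.missed += 1
                if track.missed <= self.max_bench_age:
                    self.bench.append(track)  # stay benched, keep the identity

        # Stage 3: detections that matched nothing start new identities.
        used = set(pairs.values())
        for di, det in enumerate(leftover):
            if di not in used:
                matched.append(Track(self._next_id, det))
                self._next_id += 1
        self.active = matched
        return {t.track_id: t.box for t in self.active}
```

Calling `update` once per frame with that frame's detections returns the current identity-to-box map. In this sketch, an animal that disappears for a few frames or jumps further than the stricter threshold allows keeps its original ID as long as it reappears near its last known box before the bench entry expires.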

At the face-view level, we focus on cattle face recognition, which is crucial for identity confirmation and linking behavioral records to individuals. However, natural head movements cause pose variation, illumination changes across dates, and the monotonous background of barn environments reduces discriminative power. To address these challenges, we introduce a pose filtering mechanism during the inference stage to ensure the quality of input facial images. In addition, we employ illumination-aware data augmentation to enhance generalization under diverse lighting conditions, and apply instance segmentation to further concentrate on the facial region while minimizing background interference. Experimental results demonstrate that these improvements significantly enhance the stability and reliability of cattle face recognition in natural farm environments.
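As a rough illustration of two of these ideas, the sketch below shows an inference-time pose gate and a training-time illumination augmentation. Neither is taken from the thesis: the yaw and pitch angles are assumed to come from some upstream head-pose estimator, the angle limits are made-up values, and the gain/gamma transform is just one common way to simulate lighting changes on a grayscale image stored as nested lists.

```python
import random
from typing import List

Image = List[List[int]]  # grayscale rows, intensities in [0, 255]


def is_frontal(yaw_deg: float, pitch_deg: float,
               max_yaw: float = 20.0, max_pitch: float = 15.0) -> bool:
    """Keep a face crop only if the estimated head pose is close to frontal.

    The angle limits are assumed values chosen for illustration.
    """
    return abs(yaw_deg) <= max_yaw and abs(pitch_deg) <= max_pitch


def adjust_illumination(image: Image, gain: float, gamma: float) -> Image:
    """Apply out = 255 * clip(gain * (p / 255) ** gamma, 0, 1) to every pixel."""
    return [
        [round(255.0 * min(1.0, max(0.0, gain * (p / 255.0) ** gamma))) for p in row]
        for row in image
    ]


def random_illumination(image: Image, rng: random.Random,
                        gain_range=(0.6, 1.4), gamma_range=(0.5, 2.0)) -> Image:
    """Sample a random gain and gamma to mimic brighter or darker barn lighting."""
    gain = rng.uniform(*gain_range)
    gamma = rng.uniform(*gamma_range)
    return adjust_illumination(image, gain, gamma)
```

At inference, crops failing `is_frontal` would simply be skipped rather than sent to the identity classifier; during training, `random_illumination` would be applied to each face crop with a seeded `random.Random` for reproducibility.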

At the multi-view level, we extend our framework to cross-camera multi-cattle tracking, where severe occlusion, appearance drift, and inaccuracies in identity back-updating pose major challenges. To achieve consistency, we introduce a center point estimation approach that provides robust position anchors across cameras. These anchors are projected into a unified Bird’s-Eye-View (BEV) grid, where trajectories from different viewpoints are aligned and matched. By integrating information from overlapping fields of view, the system reduces ambiguity caused by severe occlusion and appearance changes. Furthermore, we implement a cross-view identity propagation mechanism, which ensures that identities are updated reliably when cattle move between cameras or when visibility is temporarily lost. This design substantially decreases identity fragmentation, producing coherent long-term identity trajectories at the herd scale.
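The projection-and-match step can be sketched as follows, under the assumption that each camera has a known 3x3 ground-plane homography into the shared BEV frame (the abstract does not say how the projection is calibrated). The function names, the greedy nearest-neighbour matcher, and the `max_dist` gate are illustrative choices rather than the thesis method.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]
Homography = Sequence[Sequence[float]]  # 3x3 matrix, image plane -> BEV ground plane


def to_bev(H: Homography, point: Point) -> Point:
    """Project an image-plane center point into BEV coordinates."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    return (u / w, v / w)


def match_across_views(bev_a: Dict[int, Point], bev_b: Dict[int, Point],
                       max_dist: float) -> Dict[int, int]:
    """Greedily pair track IDs from two cameras by BEV distance.

    Returns {id in camera A: id in camera B} for pairs closer than max_dist.
    """
    scored = sorted(
        (math.dist(pa, pb), ia, ib)
        for ia, pa in bev_a.items()
        for ib, pb in bev_b.items()
    )
    pairs: Dict[int, int] = {}
    used_b = set()
    for dist, ia, ib in scored:
        if dist > max_dist:
            break
        if ia not in pairs and ib not in used_b:
            pairs[ia] = ib
            used_b.add(ib)
    return pairs
```

Pairs returned by `match_across_views` are the natural place to propagate an identity from one view to another, for example copying a face-confirmed ID from the camera where the animal is visible to the camera where it is occluded.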

In conclusion, our cross-camera multi-view cattle monitoring framework offers a holistic solution for long-term individual identification and tracking in PLF. By combining enhanced single-view multi-cattle tracking, robust cattle face recognition, and effective cross-view identity integration, the proposed system directly addresses the key challenges in vision-based livestock monitoring. The results demonstrate improved robustness to occlusion, environmental variation, and identity drift, thereby supporting accurate and scalable behavioral analysis. This work represents a step forward in intelligent, non-contact cattle monitoring and provides a practical foundation for advancing animal welfare and farm management in precision livestock farming.


Table of Contents

1 Introduction
1.1 Motivation
1.2 Challenges
1.3 Objectives
1.4 Organization
2 Related Work
2.1 Vision-Based Methods in PLF
2.1.1 Applications
2.1.2 Model Usage Statistics
2.2 Principles of Reported Models
2.2.1 R-CNN series
2.2.2 YOLO series
2.2.3 SORT series
3 Farm Setup and Experimental Settings
3.1 Farm Setup
3.2 Model Selection
3.3 Evaluation Metrics
4 Enhanced Appearance and Nonlinear Motion with Bench Matching SORT
4.1 Action-View Challenges
4.2 Contributions
4.3 Proposed Method
4.4 Implementation Details
4.4.1 Appearance Model (AM)
4.4.2 Nonlinear Motion Model (NM)
4.4.3 Bench Matching Mechanism (BM)
4.5 Action-View Dataset
4.6 Experimental Setup
4.7 Qualitative Results
4.8 Quantitative Results
4.9 Ablation Study
4.10 Discussion of Single-View Tracking
5 Spatio-Temporal Filtering and Illumination-Robust Cattle Face Recognition
5.1 Face-View Challenges
5.2 Contributions
5.3 Proposed Method
5.4 Implementation Details
5.4.1 Spatio-Temporal Information
5.4.2 Illumination Variation
5.4.3 Background Interference
5.5 Face-View Dataset
5.6 Experimental Setup
5.6.1 Detection Modules
5.6.2 Identification Module
5.7 Qualitative Results
5.7.1 Detection Performance
5.7.2 Recognition Performance
5.8 Quantitative Results
5.8.1 Detection Metrics
5.8.2 Pose Alignment Feasibility
5.8.3 ID Classifier Performance
5.9 Ablation Study
5.9.1 Module Analysis
5.9.2 Generalization Analysis
5.10 Discussion of Single-View Face Recognition
6 Face-Assisted BEV Representation for Identity-Preserved Multi-View Cattle Tracking
6.1 Multi-View Challenges
6.2 Contributions
6.3 Proposed Method
6.4 Implementation Details
6.4.1 Action-View Trajectory Matching
6.4.2 Face-View ID Assignment
6.4.3 Cross-View ID Update
6.5 Multi-View Dataset
6.6 Experimental Setup
6.7 Qualitative Results
6.7.1 Detection Performance
6.7.2 Multi-View Monitoring
6.8 Quantitative Results
6.8.1 Detection Performance
6.8.2 Face Recognition
6.8.3 Multi-View Tracking
6.9 Trajectory Analysis
6.10 Discussion of Multi-View Monitoring
7 Conclusion and Future Work
7.1 Conclusion
7.2 Future Trend
Bibliography
Abstract (Korean)
Acknowledgments
