Precision Livestock Farming (PLF) has emerged as a transformative paradigm in modern animal husbandry, aiming to improve both productivity and animal welfare through the integration of sensing and data-driven technologies. For dairy cattle, continuous and fine-grained monitoring brings substantial benefits for estrus detection, health scoring, mastitis and lameness diagnosis, feeding behavior assessment, and body weight estimation. A prerequisite for these applications is reliable individual identification and long-term tracking. Traditional approaches often rely on contact-based devices such as Radio Frequency Identification (RFID) tags, pedometers, and accelerometers. While effective, these devices require manual installation and maintenance, which limits scalability and may cause stress to animals. In contrast, non-contact methods based on imaging or infrared sensing provide unobtrusive and scalable solutions. Among these, camera-based systems are becoming increasingly favored due to their ability to capture rich behavioral and biometric information without disturbing the natural environment of the animals.
Recent advances in computer vision and artificial intelligence have opened new opportunities for livestock monitoring. Techniques in object detection, semantic segmentation, pose estimation, and multi-object tracking have achieved state-of-the-art performance across various benchmarks, and their adaptation to PLF offers a cost-effective and highly efficient alternative to conventional sensor-based solutions. Vision-based systems enable automated and continuous observation of cattle herds, providing unprecedented insights into health and behavior. However, the deployment of such systems in barn environments is not trivial. Challenges such as frequent occlusion among animals, the difficulty of maintaining consistent identity across long time spans, and variations in illumination or background conditions significantly reduce robustness. Increasing the number of cameras, diversifying viewpoints, and utilizing large-scale datasets are effective strategies to mitigate these issues. Motivated by this, we propose a cross-camera multi-view monitoring framework that strengthens robustness within each individual view and simultaneously leverages complementary information across multiple cameras to address occlusion and identity fragmentation.
At the action-view level, we address the challenges of multi-cattle tracking under scale deformation, unexpected motion, and mild occlusion. For robust feature representation, we adopt an enhanced Spatial Pyramid Pooling (SPP) layer that improves multi-scale perception and spatial encoding. In parallel, we employ an ensemble Kalman filter that models the dynamic state of each animal with a five-dimensional feature set comprising position, width, height, and orientation. These two components operate independently but complement each other: the SPP layer supports accurate detection, and the filter supports consistent tracking. Furthermore, we design a bench matching strategy to preserve identity continuity when standard association fails due to rapid movements or mild occlusion. Together, these methods substantially improve tracking stability and trajectory integrity within a single camera view.
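As an illustration of the filtering step described above, the following is a minimal sketch of a stochastic ensemble Kalman filter over a five-dimensional box state. It is not the implementation used in this work. The state layout (cx, cy, w, h, theta), the random-walk motion model, the noise levels, and the per-dimension (diagonal covariance) gain are all assumptions made to keep the example short and dependency-free. Orientation wrap-around is ignored.

```python
import random
import statistics

# Assumed state layout (not specified in the text):
# centre x, centre y, width, height, orientation in radians.
STATE = ("cx", "cy", "w", "h", "theta")


def init_ensemble(box, spread, size, rng):
    """Draw `size` ensemble members around an initial detection."""
    return [[rng.gauss(v, s) for v, s in zip(box, spread)] for _ in range(size)]


def predict(ensemble, process_std, rng):
    """Assumed random-walk motion model: add Gaussian process noise to each member."""
    return [[x + rng.gauss(0.0, q) for x, q in zip(member, process_std)]
            for member in ensemble]


def update(ensemble, measurement, meas_std, rng):
    """Stochastic EnKF update with perturbed observations.

    Simplification: the gain is computed per state dimension from the
    sample variance, i.e. cross-covariances between dimensions are dropped.
    """
    dims = range(len(measurement))
    prior_var = [statistics.variance([m[d] for m in ensemble]) for d in dims]
    gains = [prior_var[d] / (prior_var[d] + meas_std[d] ** 2) for d in dims]
    updated = []
    for member in ensemble:
        new_member = []
        for d in dims:
            # Each member sees its own noisy copy of the detection.
            perturbed = measurement[d] + rng.gauss(0.0, meas_std[d])
            new_member.append(member[d] + gains[d] * (perturbed - member[d]))
        updated.append(new_member)
    return updated


def ensemble_mean(ensemble):
    """State estimate: the mean of the ensemble in each dimension."""
    dims = range(len(ensemble[0]))
    return [statistics.mean([m[d] for m in ensemble]) for d in dims]


if __name__ == "__main__":
    rng = random.Random(0)
    ens = init_ensemble([90.0, 45.0, 38.0, 22.0, 0.0],
                        (5.0, 5.0, 5.0, 5.0, 0.2), 50, rng)
    detection = [100.0, 50.0, 40.0, 20.0, 0.3]  # hypothetical detector output
    for _ in range(20):
        ens = predict(ens, (1.0, 1.0, 1.0, 1.0, 0.05), rng)
        ens = update(ens, detection, (2.0, 2.0, 2.0, 2.0, 0.1), rng)
    print(dict(zip(STATE, ensemble_mean(ens))))
```

A full ensemble Kalman filter would use the complete sample covariance matrix rather than its diagonal, and a tracker would normally add velocity terms to the motion model. Both are omitted here for brevity.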
At the face-view level, we focus on cattle face recognition, which is crucial for identity confirmation and linking behavioral records to individuals. However, three factors reduce discriminative power: natural head movements cause pose variation, illumination changes from day to day, and the barn background is monotonous. To address these challenges, we introduce a pose filtering mechanism during the inference stage to ensure the quality of input facial images. In addition, we employ illumination-aware data augmentation to enhance generalization under diverse lighting conditions, and apply instance segmentation to concentrate on the facial region while minimizing background interference. Experimental results demonstrate that these improvements significantly enhance the stability and reliability of cattle face recognition in natural farm environments.
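The three face-view steps can be sketched in a few lines, under loudly stated assumptions. The function names, the yaw and pitch thresholds, and the use of gamma correction as a stand-in for illumination-aware augmentation are all hypothetical and are not taken from this work. Images are represented as nested lists of 8-bit grayscale values so that the example runs without external libraries.

```python
def is_frontal(yaw_deg, pitch_deg, max_yaw=30.0, max_pitch=20.0):
    """Pose filter: keep a face crop only if the head is roughly frontal.

    The thresholds are illustrative assumptions, not values from the text.
    """
    return abs(yaw_deg) <= max_yaw and abs(pitch_deg) <= max_pitch


def adjust_gamma(image, gamma):
    """Simulate a lighting change with gamma correction.

    gamma < 1 brightens the image, gamma > 1 darkens it.
    """
    return [[round(255 * (p / 255) ** gamma) for p in row] for row in image]


def apply_mask(image, mask, fill=0):
    """Keep pixels inside the (hypothetical) face mask, blank out the rest."""
    return [[p if keep else fill for p, keep in zip(row, mask_row)]
            for row, mask_row in zip(image, mask)]


if __name__ == "__main__":
    face = [[0, 64], [128, 255]]   # toy 2x2 grayscale crop
    mask = [[1, 0], [0, 1]]        # toy segmentation mask
    if is_frontal(yaw_deg=10.0, pitch_deg=-5.0):
        augmented = adjust_gamma(face, 0.5)
        print(apply_mask(augmented, mask))
```

In this sketch the pose filter runs at inference time and the gamma change would be applied at training time, matching the split described above. A real pipeline would obtain the pose angles and the mask from dedicated models.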
At the multi-view level, we extend our framework to cross-camera multi-cattle tracking, where severe occlusion, appearance drift, and inaccuracies in identity back-updating pose major challenges. To achieve consistency, we introduce a center point estimation approach that provides robust position anchors across cameras. These anchors are projected into a unified Bird’s-Eye-View (BEV) grid, where trajectories from different viewpoints are aligned and matched. By integrating information from overlapping fields of view, the system reduces ambiguity caused by severe occlusion and appearance changes. Furthermore, we implement a cross-view identity propagation mechanism, which ensures that identities are updated reliably when cattle move between cameras or when visibility is temporarily lost. This design substantially decreases identity fragmentation, producing coherent long-term identity trajectories at the herd scale.
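The projection and matching steps can be illustrated with a short sketch. It assumes that each camera has a known 3x3 ground-plane homography into a shared BEV frame and that a center point per animal has already been estimated in image coordinates. The greedy nearest-neighbor matching, the distance threshold, and all function names are illustrative assumptions rather than the method used in this work.

```python
import math


def project_to_bev(homography, point):
    """Map an image point to BEV coordinates with a 3x3 homography."""
    x, y = point
    h = homography
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (u / w, v / w)


def to_grid_cell(bev_point, cell_size):
    """Snap a BEV point to the index of the grid cell that contains it."""
    return (math.floor(bev_point[0] / cell_size),
            math.floor(bev_point[1] / cell_size))


def match_across_views(points_a, points_b, max_dist):
    """Greedily pair BEV points from two cameras, closest pairs first."""
    candidates = sorted(
        (math.dist(pa, pb), i, j)
        for i, pa in enumerate(points_a)
        for j, pb in enumerate(points_b)
    )
    used_a, used_b, pairs = set(), set(), []
    for dist, i, j in candidates:
        if dist > max_dist:
            break  # remaining candidates are even farther apart
        if i in used_a or j in used_b:
            continue
        used_a.add(i)
        used_b.add(j)
        pairs.append((i, j))
    return sorted(pairs)


if __name__ == "__main__":
    # Hypothetical calibrations: camera A is already aligned with the BEV
    # frame, camera B is shifted by 10 units along x.
    H_A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    H_B = [[1.0, 0.0, 10.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    bev_a = [project_to_bev(H_A, p) for p in [(5.0, 5.0), (20.0, 20.0)]]
    bev_b = [project_to_bev(H_B, p) for p in [(10.2, 19.8), (-4.9, 5.1)]]
    print(match_across_views(bev_a, bev_b, max_dist=1.0))
```

Matched pairs identify detections of the same animal seen from two cameras, so an identity confirmed in one view can be propagated to the other. An optimal assignment method, such as the Hungarian algorithm, could replace the greedy loop when animals stand close together.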
In conclusion, our cross-camera multi-view cattle monitoring framework offers a holistic solution for long-term individual identification and tracking in PLF. By combining enhanced single-view multi-cattle tracking, robust cattle face recognition, and effective cross-view identity integration, the proposed system directly addresses the key challenges in vision-based livestock monitoring. The results demonstrate improved robustness to occlusion, environmental variation, and identity drift, thereby supporting accurate and scalable behavioral analysis. This work represents a step forward in intelligent, non-contact cattle monitoring and provides a practical foundation for advancing animal welfare and farm management in precision livestock farming.