Video-based person re-identification (PReID) requires matching individuals across non-overlapping camera views using temporal information from video sequences. While large vision transformer (ViT) models achieve state-of-the-art accuracy, their substantial computational requirements limit deployment on resource-constrained devices such as edge platforms and embedded surveillance cameras. Knowledge distillation offers a solution by transferring knowledge from a large teacher model to a compact student model. However, traditional distillation methods apply full teacher guidance uniformly from the start of training, which creates gradient conflicts and hinders independent student learning while the student's representations are still poorly initialized. This paper proposes a novel three-phase progressive knowledge distillation strategy that dynamically adjusts the intensity of teacher guidance during training: (1) a Foundation phase, in which the student trains independently without distillation; (2) an Introduction phase, in which partial distillation is introduced gradually; and (3) a Refinement phase with full knowledge transfer.
Our approach achieves 73.8% Rank-1 accuracy and 68.5% mAP. This work demonstrates that scheduled progressive knowledge distillation enables efficient video-based PReID deployment on resource-constrained platforms while maintaining competitive accuracy.
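The three-phase schedule described above can be illustrated with a minimal sketch. The abstract does not specify the phase boundaries, the shape of the ramp, or how the losses are combined, so everything below is an assumption made for illustration: the function names `distillation_weight` and `combined_loss`, the parameters `foundation_epochs`, `introduction_epochs`, and `max_weight`, the linear ramp, and the additive loss.

```python
def distillation_weight(epoch, foundation_epochs=20, introduction_epochs=30,
                        max_weight=1.0):
    """Return the weight applied to the distillation loss at a given epoch.

    All parameter names and default values are hypothetical; the source
    only states that guidance goes from none, to partial, to full.
    """
    if epoch < 0 or foundation_epochs < 0 or introduction_epochs < 0:
        raise ValueError("epoch and phase lengths must be non-negative")

    introduction_end = foundation_epochs + introduction_epochs

    # Phase 1 (Foundation): the student trains alone, no teacher signal.
    if epoch < foundation_epochs:
        return 0.0

    # Phase 2 (Introduction): guidance grows gradually.
    # A linear ramp is assumed here; other ramps are equally plausible.
    if epoch < introduction_end:
        progress = (epoch - foundation_epochs) / introduction_epochs
        return max_weight * progress

    # Phase 3 (Refinement): full knowledge transfer.
    return max_weight


def combined_loss(task_loss, distill_loss, epoch, **schedule_kwargs):
    """Assumed additive combination of the task and distillation losses."""
    weight = distillation_weight(epoch, **schedule_kwargs)
    return task_loss + weight * distill_loss
```

With the assumed defaults, the weight stays at zero for the first 20 epochs, rises linearly over the next 30, and remains at `max_weight` afterwards. In a training loop, `task_loss` and `distill_loss` would typically be scalar tensors from the chosen framework; plain floats work equally well for checking the schedule.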