In recent years, large-scale events and festivals have faced the potential for critical incidents. To help prevent such incidents, highly accurate CNN-based crowd-counting systems, such as CSRNet, have been proposed. However, their large parameter size limits their application on mobile and edge devices. To solve this problem, RTL-based AI accelerators, which implement processing engines optimized for AI models, are attracting attention as an alternative platform to GPUs due to their low power consumption and low cost. This paper proposes a CNN-based crowd-counting system that implements CSRNet on an FPGA platform. For algorithm optimization, we apply pruning and quantization to CSRNet to reduce the parameter size. For hardware design, we apply loop unrolling and dataflow optimization to parallelize operations, and we base the design on data reuse patterns. Implemented on a Xilinx UltraScale+ MPSoC ZCU102, the proposed IP uses only 24.92% of LUTs, 2.88% of FFs, and 3.17% of DSPs, while offering low power consumption and cost-effectiveness compared to GPUs.
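
To make the algorithm-side optimization concrete, the following is a minimal sketch of pruning followed by quantization on a flat list of weights. It is illustrative only: the abstract does not state the pruning criterion, the sparsity level, or the bit width, so the magnitude-based unstructured pruning, the 50% sparsity, the symmetric 8-bit scheme, and all function names here are assumptions rather than the method used in this work.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights.

    Assumption: unstructured magnitude pruning; the paper's actual
    pruning criterion is not given in the abstract.
    """
    k = int(len(weights) * sparsity)  # number of weights to zero
    if k == 0:
        return list(weights)
    # Indices of the k weights with the smallest absolute value
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric per-tensor quantization to signed 8-bit integers.

    Assumption: 8-bit width and a single scale per tensor.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [v * scale for v in q]


# Hypothetical weights, for illustration only
weights = [0.50, -0.02, 0.31, 0.01, -1.27, 0.08]
pruned = prune_by_magnitude(weights, sparsity=0.5)
q, scale = quantize_int8(pruned)
restored = dequantize(q, scale)
```

In this sketch, pruning removes the three smallest-magnitude weights, and quantization then stores each remaining weight as one signed byte plus a shared scale factor. Zeroed weights and narrow integer storage are one way the parameter footprint can shrink before a model is mapped to FPGA resources.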