Wildlife pose estimation based on the SCD-HRNet model and its application in biodiversity monitoring: A case study of the Saihanwula Region, Inner Mongolia

doi:10.17520/biods.2025287

Abstract


Aims: Conserving wild animals in the Saihanwula region of Inner Mongolia is important for maintaining regional biodiversity. Behavioral analysis strengthens the scientific basis of biodiversity conservation and supports its intelligent management, and pose estimation is the prerequisite and core component of behavioral analysis.
Methods: Illumination changes, fast animal movement, and occlusion in complex environments all reduce pose estimation accuracy in wildlife monitoring. To address this, this study proposed SCD-HRNet (selective coordinate-enhanced decoupling-HRNet), a wildlife pose estimation method that combines attention mechanisms with dynamic confidence suppression. First, a squeeze-and-excitation (SE) attention mechanism extracted channel-level context features through global average pooling, strengthening the network's ability to discriminate species-specific morphological features and mitigating the feature distortion caused by illumination changes. Second, to reduce the localization error caused by fast animal movement, a coordinate attention (CA) mechanism decomposed the two-dimensional spatial coordinates into horizontal and vertical components and used attention along both directions to capture long-range dependencies across them, improving joint localization accuracy under motion blur. Finally, a dynamic confidence suppression (DCS) module was proposed that sets an adaptive threshold as a function of model inference accuracy, enabling robust detection of occluded keypoints.
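The SE and CA modules named above are standard building blocks (Hu et al., 2018; Hou et al., 2021). The NumPy sketch below illustrates their core data flow on a single (C, H, W) feature map; it is not the SCD-HRNet implementation. Assumptions: the weight matrices stand in for learned fully connected or 1×1 convolution layers, biases and batch normalization are omitted, ReLU is used as the non-linearity in both modules for simplicity, the reduction ratio `r` is a free parameter, and the placement of the modules inside HRNet-W48 is not modelled.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def se_block(x, w1, w2):
    """Squeeze-and-excitation on a (C, H, W) feature map.

    w1: (C // r, C) and w2: (C, C // r) stand in for the two learned FC layers.
    """
    z = x.mean(axis=(1, 2))                     # squeeze: global average pooling -> (C,)
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))   # excitation: FC -> ReLU -> FC -> sigmoid
    return x * s[:, None, None]                 # reweight every channel by its gate


def coordinate_attention(x, w_shared, w_h, w_w):
    """Coordinate attention on a (C, H, W) feature map.

    w_shared: (C // r, C); w_h, w_w: (C, C // r). Biases and batch norm are omitted.
    """
    _, h, _ = x.shape
    z_h = x.mean(axis=2)                        # pool along width  -> (C, H), one value per row
    z_w = x.mean(axis=1)                        # pool along height -> (C, W), one value per column
    y = np.maximum(w_shared @ np.concatenate([z_h, z_w], axis=1), 0.0)  # shared transform
    a_h = sigmoid(w_h @ y[:, :h])               # (C, H): attention along the vertical axis
    a_w = sigmoid(w_w @ y[:, h:])               # (C, W): attention along the horizontal axis
    return x * a_h[:, :, None] * a_w[:, None, :]


rng = np.random.default_rng(0)
C, H, W, r = 8, 16, 12, 4
x = rng.standard_normal((C, H, W))
out_se = se_block(x, rng.standard_normal((C // r, C)), rng.standard_normal((C, C // r)))
out_ca = coordinate_attention(x, rng.standard_normal((C // r, C)),
                              rng.standard_normal((C, C // r)), rng.standard_normal((C, C // r)))
print(out_se.shape, out_ca.shape)  # (8, 16, 12) (8, 16, 12)
```

Both modules only rescale their input, so the output keeps the input shape. SE produces one gate per channel, whereas CA produces gates per channel and per row and column, which is how it retains positional information along each axis.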
Results: Comparative experiments were conducted to evaluate the model. SCD-HRNet achieved a mean average precision (mAP) of 82.61% on the wildlife dataset collected and annotated in the Saihanwula region and 69.79% on the public AP-10K animal pose dataset, outperforming all compared methods.
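The result tables use the column headings of the COCO keypoint evaluation protocol (AP, AR, AP at OKS thresholds 0.50 and 0.75, AP for medium and large objects). The AP-10K benchmark also uses this protocol, scoring predictions by object keypoint similarity (OKS). Assuming the Saihanwula dataset was evaluated the same way, the sketch below shows how OKS and an OKS-based AP can be computed. It simplifies the official COCO evaluation (a single image, no crowd regions, no cap on detections per image), and the per-keypoint `sigmas` in the usage example are hypothetical placeholders, since real values are dataset-specific.

```python
import numpy as np


def oks(pred, gt, vis, area, sigmas):
    """Object keypoint similarity between one predicted and one ground-truth pose.

    pred, gt: (K, 2) keypoint coordinates; vis: (K,) visibility flags (0 = unlabelled);
    area: ground-truth object area in pixels; sigmas: (K,) per-keypoint constants.
    """
    d2 = np.sum((pred - gt) ** 2, axis=1)       # squared pixel distances
    k2 = (2.0 * sigmas) ** 2                    # COCO convention: k_i = 2 * sigma_i
    e = np.exp(-d2 / (2.0 * area * k2 + np.finfo(float).eps))
    labelled = vis > 0
    return float(e[labelled].mean()) if labelled.any() else 0.0


def average_precision(scores, oks_matrix, threshold):
    """101-point interpolated AP at one OKS threshold.

    scores: (D,) detection confidences; oks_matrix: (D, G) OKS of each detection vs each ground truth.
    """
    order = np.argsort(-scores)                 # highest-confidence detections first
    matched = np.zeros(oks_matrix.shape[1], dtype=bool)
    tp = np.zeros(len(order))
    for rank, d in enumerate(order):
        cand = np.flatnonzero(~matched & (oks_matrix[d] >= threshold))
        if cand.size:                           # greedy match to the best unmatched ground truth
            matched[cand[np.argmax(oks_matrix[d, cand])]] = True
            tp[rank] = 1.0
    cum_tp = np.cumsum(tp)
    recall = cum_tp / oks_matrix.shape[1]
    precision = cum_tp / np.arange(1, len(order) + 1)
    precision = np.maximum.accumulate(precision[::-1])[::-1]  # monotone precision envelope
    idx = np.searchsorted(recall, np.linspace(0.0, 1.0, 101), side="left")
    return float(np.mean([precision[i] if i < len(precision) else 0.0 for i in idx]))


def coco_style_ap(scores, oks_matrix):
    """The 'AP' column: mean of AP over OKS thresholds 0.50, 0.55, ..., 0.95."""
    return float(np.mean([average_precision(scores, oks_matrix, t)
                          for t in np.linspace(0.50, 0.95, 10)]))


gt = np.array([[10.0, 10.0], [20.0, 15.0], [30.0, 30.0]])
sigmas = np.array([0.05, 0.05, 0.05])           # hypothetical per-keypoint constants
print(oks(gt, gt, np.ones(3), 900.0, sigmas))   # 1.0
scores = np.array([0.9, 0.8])
oks_matrix = np.array([[0.92, 0.10], [0.20, 0.77]])
print(coco_style_ap(scores, oks_matrix))
```

In this scheme, the AP⁵⁰ and AP⁷⁵ columns correspond to `average_precision` at a single threshold (0.50 or 0.75), while the headline AP averages the ten thresholds, so it rewards precise keypoint placement more than AP⁵⁰ does.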
Conclusion: SCD-HRNet improves pose estimation accuracy on wildlife images from complex ecological scenes, raising AP over the HRNet-W48 baseline by 1.14 percentage points on the Saihanwula dataset and 1.91 percentage points on AP-10K, and provides reliable technical support for wildlife behavior analysis in ecological monitoring.

Key words: wild animals, pose estimation, HRNet, squeeze-and-excitation attention mechanism, coordinate attention mechanism, dynamic confidence suppression, infrared camera monitoring, biodiversity conservation

Ziyi Kong, Degang Wang, Jiantao Wang, Zhiyong Pei, Jing Sun, Changchun Zhang, Junguo Zhang. Wildlife pose estimation based on the SCD-HRNet model and its application in biodiversity monitoring: A case study of the Saihanwula Region, Inner Mongolia[J]. Biodiv Sci, 2026, 34(4): 25287.


URL: https://www.biodiversity-science.net/EN/10.17520/biods.2025287

https://www.biodiversity-science.net/EN/Y2026/V34/I4/25287


References

[1]	An L, Ren JL, Yu T, Hai T, Jia YC, Liu YB (2023) Three-dimensional surface motion capture of multiple freely moving pigs using MAMMAL. Nature Communications, 14, 7727.
[2]	Barney S, Dlay S, Crowe A, Kyriazakis I, Leach M (2023) Deep learning pose estimation for multi-cattle lameness detection. Scientific Reports, 13, 4499.
[3]	Cai JM, He PY, Yang ZP, Li LY, Zhao QJ, Pan F (2023) A deep feature fusion-based method for bird sound recognition and its interpretability analysis. Biodiversity Science, 31, 23087. (in Chinese with English abstract)
[4]	Cai YH, Wang ZC, Luo ZX, Yin BY, Du AG, Wang HQ, Zhang XY, Zhou XY, Zhou EJ, Sun J (2020) Learning delicate local representations for multi-person pose estimation. arXiv, doi: 10.48550/arXiv.2003.04030.
[5]	Cao JK, Tang HY, Fang HS, Shen XY, Tai YW, Lu CW (2019) Cross-domain adaptation for animal pose estimation. arXiv, doi: 10.48550/arXiv.1908.05806.
[6]	Cheng G, Yuan X, Yao XW, Yan KB, Zeng QH, Xie XX, Han JW (2023) Towards large-scale small object detection: Survey and benchmarks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45, 13467-13488.
[7]	Ding YH, Liang J, Jiang B, Zheng AH, He R (2024) MAPS: A noise-robust progressive learning approach for source-free domain adaptive keypoint detection. IEEE Transactions on Circuits and Systems for Video Technology, 34, 1376-1387.
[8]	Han YN, Chen K, Wang YK, Liu WH, Wang ZW, Wang XJ, Han CL, Liao JH, Huang K, Cai SY, Huang YT, Wang N, Li JX, Song Y, Li J, Wang GD, Wang LP, Zhang YP, Wei PF (2024) Multi-animal 3D social pose estimation, identification and behaviour embedding with a few-shot learning framework. Nature Machine Intelligence, 6, 48-61.
[9]	Hou QB, Zhou DQ, Feng JS (2021) Coordinate attention for efficient mobile network design. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Nashville, TN.
[10]	Hu B, Seybold B, Yang S, Sud A, Liu Y, Barron K, Cha P, Cosino M, Karlsson E, Kite J, Kolumam G, Preciado J, Zavala-Solorio J, Zhang CL, Zhang XM, Voorbach M, Tovcimak AE, Ruby JG, Ross DA (2023) 3D mouse pose from single-view video and a new dataset. Scientific Reports, 13, 13554.
[11]	Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018). IEEE, New York.
[12]	Huang K, Han YN, Chen K, Pan HL, Zhao GY, Yi WL, Li XX, Liu SY, Wei PF, Wang LP (2021) A hierarchical 3D-motion learning framework for animal spontaneous behavior mapping. Nature Communications, 12, 2784.
[13]	Jiang T, Lu P, Zhang L, Ma NS, Han R, Lyu CQ, Li YN, Chen K (2023) RTMPose: Real-time multi-person pose estimation based on MMPose. arXiv, doi: 10.48550/arXiv.2303.07399.
[14]	Kirillov A, Wu YX, He KM, Girshick R (2020) PointRend: Image segmentation as rendering. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Seattle, WA.
[15]	Lauer J, Zhou M, Ye SK, Menegas W, Schneider S, Nath T, Rahman MM, Di Santo V, Soberanes D, Feng GP, Murthy VN, Lauder G, Dulac C, Mathis MW, Mathis A (2022) Multi-animal pose estimation, identification and tracking with DeepLabCut. Nature Methods, 19, 496-504.
[16]	Li SY, Liu K, Wang H, Yang R, Li XZ, Sun YQ, Zhong RT, Wang W, Li Y, Sun YJ, Wang GH (2025) Pose estimation and tracking dataset for multi-animal behavior analysis on the China Space Station. Scientific Data, 12, 766.
[17]	Liu H, Pan SG, Wu PB, Yu KG, Gao W, Yu BG (2024) Uncertainty-aware UWB/LiDAR/INS tightly coupled fusion pose estimation via filtering approach. IEEE Sensors Journal, 24, 11113-11126.
[18]	Ma NN, Zhang XY, Zheng HT, Sun J (2018) ShuffleNet V2: Practical guidelines for efficient CNN architecture design. arXiv, doi: 10.48550/arXiv.1807.11164.
[19]	Mokany K, Ware C, Harwood TD, Schmidt RK, Ferrier S (2022) Habitat-based biodiversity assessment for ecosystem accounting in the Murray-Darling Basin. Conservation Biology, 36, e13915.
[20]	Sagar ASMS, Islam MZ, Tanveer J, Kim HS (2025) Uncertainty-aware adaptive multiscale U-Net for low-contrast cardiac image segmentation. Applied Sciences, 15, 2222.
[21]	Sandler M, Howard A, Zhu ML, Zhmoginov A, Chen LC (2018) MobileNetV2: Inverted residuals and linear bottlenecks. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, Salt Lake City, UT.
[22]	Sun K, Xiao B, Liu D, Wang JD (2019) Deep high-resolution representation learning for human pose estimation. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Long Beach, CA.
[23]	Xiao B, Wu HP, Wei YC (2018) Simple baselines for human pose estimation and tracking. In: Proceedings of the European Conference on Computer Vision (ECCV) (eds Ferrari V, Hebert M, Sminchisescu C, Weiss Y). Springer, Cham.
[24]	Xu GD, Xu Y, Deng H, Mo H (2023) Research on multi-target animal pose estimation based on improved high resolution network. Computer Engineering and Applications, 59(22), 182-192. (in Chinese with English abstract)
[25]	Xu Y, Zhang J, Zhang Q, Tao D (2022) ViTPose: Simple vision transformer baselines for human pose estimation. Advances in Neural Information Processing Systems, 35, 38571-38584.
[26]	Ye SK, Filippova A, Lauer J, Schneider S, Vidal M, Qiu T, Mathis A, Mathis MW (2024) SuperAnimal pretrained pose estimation models for behavioral analysis. Nature Communications, 15, 5165.
[27]	Yin ZX, Zhao YQ, Xu ZH, Yu QP (2024) Automatic detection of stereotypical behaviors of captive wild animals based on surveillance videos of zoos and animal reserves. Ecological Informatics, 79, 102450.
[28]	Yu H, Xu YF, Zhang J, Zhao W, Guan ZY, Tao DC (2021) AP-10K: A benchmark for animal pose estimation in the wild. arXiv, doi: 10.48550/arXiv.2108.12617.
[29]	Yuan YH, Fu R, Huang L, Lin WH, Zhang C, Chen XL, Wang JD (2021) HRFormer: High-resolution transformer for dense prediction. arXiv, doi: 10.48550/arXiv.2110.09408.
[30]	Zhang JG, Cheng ZA, Hu CH, Chen C, Bao WD (2018) Adaptive image enhancement algorithm for wild animal monitoring based on Retinex theory. Transactions of the Chinese Society of Agricultural Engineering, 34(15), 183-189. (in Chinese with English abstract)
[31]	Zhang WW, Xu Y, Bai R, Chen N (2023) Animal pose estimation based on improved stacked hourglass network. Computer Engineering, 49(2), 263-270. (in Chinese with English abstract)
[32]	Zhang XY, Zhou XY, Lin MX, Sun J (2018) ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, Salt Lake City, UT.

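Ablation results on the Saihanwula wildlife dataset (full dataset and low-light, motion blur and artifact, and occlusion subsets; √, module included; -, module not included)
Type of experiment	SE	CA	DCS	AP (%)	AP⁵⁰ (%)	AP⁷⁵ (%)	APᴹ (%)	APᴸ (%)	AR (%)
Ablation study on the full dataset	-	-	-	81.47	85.11	83.24	86.64	78.74	83.43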
	√	-	-	81.93	85.33	83.86	87.20	79.23	84.02
	-	√	-	82.06	84.96	83.95	87.03	80.46	84.00
	-	-	√	81.95	84.90	83.47	87.06	79.55	83.72
	√	√	-	82.40	85.26	83.91	87.65	80.15	84.25
	√	-	√	82.22	85.22	83.86	86.34	79.74	84.03
	-	√	√	82.25	85.80	84.10	87.57	80.05	84.14
	√	√	√	82.61	85.19	84.39	88.15	80.48	84.68
Ablation study on the low-light subset	-	-	-	26.56	28.05	27.67	20.59	24.02	26.39
	√	-	-	26.78	28.03	28.03	20.36	24.43	26.59
	-	√	-	26.75	28.05	27.72	19.94	24.50	26.59
	-	-	√	26.68	28.05	28.05	19.77	24.45	26.52
	√	√	√	26.85	28.05	28.05	19.77	24.56	26.52
Ablation study on the motion blur and artifact subset	-	-	-	25.57	28.55	28.55	0.00	27.10	25.41
	√	-	-	25.77	31.68	27.72	0.00	27.23	25.72
	-	√	-	26.24	31.02	27.08	0.00	27.63	26.15
	-	-	√	26.03	32.39	27.72	0.00	27.48	26.04
	√	√	√	26.47	31.85	27.08	0.00	27.98	26.25
Ablation study on the occlusion subset	-	-	-	21.31	27.21	20.58	26.54	20.68	22.56
	√	-	-	21.53	26.97	23.10	27.48	21.00	22.54
	-	√	-	21.67	27.19	23.62	26.61	21.23	22.88
	-	-	√	22.10	27.52	23.29	27.18	21.21	23.23
	√	√	√	22.13	27.36	23.86	25.21	21.16	23.12
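The full-dataset rows of the ablation table can be read as AP gains over the baseline without any module (81.47 AP, the same value reported for HRNet-W48 in the method comparison). The short calculation below uses only those AP values. It shows that each module on its own adds between 0.46 and 0.59 AP, but the gains are not additive: all three together add 1.14 AP, less than the 1.53 AP obtained by summing the single-module gains. The dictionary layout is simply a convenient encoding of the table rows.

```python
# AP (%) on the full Saihanwula dataset, copied from the ablation table above.
ablation_ap = {
    (): 81.47,
    ("SE",): 81.93,
    ("CA",): 82.06,
    ("DCS",): 81.95,
    ("SE", "CA"): 82.40,
    ("SE", "DCS"): 82.22,
    ("CA", "DCS"): 82.25,
    ("SE", "CA", "DCS"): 82.61,
}

baseline = ablation_ap[()]
for modules, ap in ablation_ap.items():
    label = " + ".join(modules) or "HRNet-W48 baseline"
    print(f"{label:<20} AP {ap:.2f}  gain {ap - baseline:+.2f}")

# Compare the summed single-module gains with the gain of the full model.
single_sum = sum(ablation_ap[(m,)] - baseline for m in ("SE", "CA", "DCS"))
full_gain = ablation_ap[("SE", "CA", "DCS")] - baseline
print(f"sum of single-module gains {single_sum:+.2f}, full-model gain {full_gain:+.2f}")
```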

Comparison of SCD-HRNet with other pose estimation methods on the Saihanwula wildlife dataset
Method	Backbone network	AP (%)	AP⁵⁰ (%)	AP⁷⁵ (%)	APᴹ (%)	APᴸ (%)	AR (%)	Reference
SCD-HRNet	HRNet-W48	82.61	85.19	84.39	88.15	80.48	84.68	This study
HRNet	HRNet-W48	81.47	85.11	83.24	86.64	78.74	83.43	Sun et al., 2019
Simple Baseline	ResNet-50	81.45	85.66	83.26	87.60	78.58	83.14	Xiao et al., 2018
Simple Baseline	ResNet-101	80.81	85.28	83.40	87.08	77.87	82.54	Xiao et al., 2018
HRFormer	HRFormer	80.72	86.95	85.17	85.52	77.97	82.82	Yuan et al., 2021
RSN	RSN-18	80.62	86.56	84.38	85.68	77.78	83.57	Cai et al., 2020
ViTPose	ViT-base	78.08	86.92	83.89	85.50	74.47	80.58	Xu et al., 2022
MobileNet	MobileNetV2	77.47	87.26	84.09	83.75	73.54	80.11	Sandler et al., 2018
ShuffleNet	ShuffleNetV1	76.98	87.00	83.83	84.25	72.68	79.84	Zhang et al., 2018
ShuffleNet	ShuffleNetV2	75.82	87.10	82.24	83.64	71.30	78.66	Ma et al., 2018
RTMPose	CSPNeXt	74.01	87.25	82.64	81.06	69.41	76.40	Jiang et al., 2023

Comparison of SCD-HRNet with other pose estimation methods on the AP-10K dataset
Method	Backbone network	AP (%)	AP⁵⁰ (%)	AP⁷⁵ (%)	APᴹ (%)	APᴸ (%)	AR (%)
SCD-HRNet	HRNet-W48	69.79	93.55	77.70	51.98	70.13	73.17
HRNet	HRNet-W48	67.88	92.10	72.97	51.26	68.33	72.04
Simple Baseline	ResNet-50	64.24	91.44	69.83	47.60	64.53	68.03
Simple Baseline	ResNet-101	62.42	90.03	65.61	44.38	62.74	66.53
HRFormer	HRFormer	58.95	89.39	61.51	40.93	59.35	63.62
RSN	RSN-18	58.74	88.28	61.44	47.14	59.01	62.88
RTMPose	CSPNeXt	54.75	88.04	55.61	44.20	54.99	58.90
MobileNet	MobileNetV2	50.49	83.70	49.94	39.11	50.73	54.98
ShuffleNet	ShuffleNetV2	46.68	82.49	43.64	30.37	47.02	51.67
ShuffleNet	ShuffleNetV1	46.59	81.90	45.80	34.82	46.83	51.61
ViTPose	ViT-base	46.47	79.92	46.18	31.81	46.64	50.35