Wildlife pose estimation based on the SCD-HRNet model and its application in biodiversity monitoring: A case study of the Saihanwula Region, Inner Mongolia

Biodiversity Science, 2026, Vol. 34, Issue 4: 25287. DOI: 10.17520/biods.2025287. CSTR: 32101.14.biods.2025287

Ziyi Kong¹,²,³, Degang Wang¹,²,³, Jiantao Wang⁴, Zhiyong Pei⁵, Jing Sun⁶, Changchun Zhang¹,²,³,*, Junguo Zhang¹,²,³,*

¹ College of Engineering, Beijing Forestry University, Beijing 100083, China
² National Key Laboratory of Efficient Production of Forest Resources, Beijing 100083, China
³ Research Center for Intelligent Biodiversity Monitoring, Beijing Forestry University, Beijing 100083, China
⁴ Administration of Ulanba National Nature Reserve, Chifeng, Inner Mongolia 025450, China
⁵ College of Energy and Transportation Engineering, Inner Mongolia Agricultural University, Hohhot 010018, China
⁶ Wulanhe Local Nature Reserve Administration, Hinggan League, Ulanhot, Inner Mongolia 137400, China

Received: 2025-07-20 Accepted: 2025-10-22 Online: 2026-04-20 Published: 2026-05-27
Corresponding authors (*): E-mail: zhangchangchun@bjfu.edu.cn; zhangjunguo@bjfu.edu.cn
Supported by: National Natural Science Foundation of China (32371874, 32401569); Beijing Natural Science Foundation (6244053); Science and Technology Program of Shaanxi Academy of Sciences (2025K-32)

Supplementary material: Appendix.pdf (719 KB)

Abstract

Aims: Conserving wild animals in the Saihanwula region of Inner Mongolia is important for maintaining regional biodiversity. Behavioral analysis strengthens the scientific basis of biodiversity conservation and supports its intelligent management, and pose estimation is the prerequisite and core support for behavioral analysis.
Methods: To address the loss of pose estimation accuracy caused by illumination changes, high-speed animal movement, and occlusion in complex environments during wildlife monitoring, this paper proposes a wildlife pose estimation method that combines attention mechanisms with dynamic confidence suppression (selective coordinate-enhanced decoupling-HRNet, SCD-HRNet). First, a squeeze-and-excitation (SE) attention mechanism extracts channel-level context features through global average pooling, which strengthens the network's ability to discriminate species morphological features and counters the feature distortion caused by illumination changes. Second, to reduce the localization error caused by high-speed animal movement, a coordinate attention (CA) mechanism decomposes the two-dimensional coordinates into horizontal and vertical components and uses bidirectional attention to establish long-range dependencies across the two directions, improving keypoint localization under motion blur. Finally, a dynamic confidence suppression (DCS) module establishes an adaptive threshold function based on model inference accuracy to detect keypoints on occluded body parts robustly.
Results: Comparative experiments were carried out to verify the performance of the model. SCD-HRNet reached a mean average precision of 82.61% on the collected and annotated Saihanwula wildlife dataset and 69.79% on the public AP-10K animal dataset, outperforming the existing methods on both.
Conclusion: The proposed SCD-HRNet method significantly improves pose estimation accuracy for wildlife images in complex ecological scenes and provides reliable technical support for wildlife behavior analysis in biodiversity monitoring.

Key words: wild animals, pose estimation, HRNet, squeeze-and-excitation attention mechanism, coordinate attention mechanism, dynamic confidence suppression, infrared camera monitoring, biodiversity conservation

Cite this article: Ziyi Kong, Degang Wang, Jiantao Wang, Zhiyong Pei, Jing Sun, Changchun Zhang, Junguo Zhang (2026) Wildlife pose estimation based on the SCD-HRNet model and its application in biodiversity monitoring: A case study of the Saihanwula Region, Inner Mongolia. Biodiversity Science, 34, 25287. DOI: 10.17520/biods.2025287.

Link to this article: https://www.biodiversity-science.net/CN/10.17520/biods.2025287

https://www.biodiversity-science.net/CN/Y2026/V34/I4/25287

Figures and tables (11)

Fig. 1 Data annotation diagram. Seventeen keypoints covering the head, limbs, and torso are annotated to support the identification of behavioral patterns such as feeding, moving, and resting. For clearer visualization, all displayed keypoint markers have been enlarged.

Fig. 2 Structure of the SCD-HRNet model. Final layer, final output layer; SEBlock, squeeze-and-excitation block; Conv, convolution; Deconv, transposed convolution (deconvolution). Built on HRNet, the network adds a squeeze-and-excitation (SE) attention mechanism, a coordinate attention (CA) mechanism, and a dynamic confidence suppression (DCS) module to improve keypoint detection accuracy for pose estimation against complex backgrounds.

Fig. 3 Structure of the squeeze-and-excitation (SE) attention mechanism. FC layer, fully connected layer; ReLU, rectified linear unit activation; Sigmoid, sigmoid function. The module applies global average pooling to squeeze spatial information, learns channel-wise importance weights through fully connected layers, and recalibrates the original feature map channel by channel, thereby improving robustness to illumination changes.
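
The squeeze, excitation, and recalibration steps described in the caption can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the two-layer bottleneck with reduction ratio `r`, the omission of bias terms, and the random weights `w1` and `w2` are all assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(x, w1, w2):
    """Channel recalibration for one feature map.

    x  : (C, H, W) feature map
    w1 : (C // r, C) weights of the first fully connected layer (assumed shape)
    w2 : (C, C // r) weights of the second fully connected layer (assumed shape)
    """
    squeezed = x.mean(axis=(1, 2))            # squeeze: global average pooling -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)   # excitation, step 1: FC + ReLU
    weights = sigmoid(w2 @ hidden)            # excitation, step 2: FC + sigmoid -> (C,)
    return x * weights[:, None, None], weights  # rescale every channel by its weight

# Illustrative sizes and random weights (hypothetical values).
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 4
x = rng.normal(size=(C, H, W))
w1 = rng.normal(size=(C // r, C)) * 0.1
w2 = rng.normal(size=(C, C // r)) * 0.1
y, weights = se_block(x, w1, w2)
```

Each channel is multiplied by a single scalar in (0, 1), so the block changes how strongly channels contribute without altering their spatial layout.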

Fig. 4 Structure of the coordinate attention (CA) mechanism. X Avg Pool, average pooling along the x-axis; Y Avg Pool, average pooling along the y-axis; Concat, concatenation; Conv2d, 2D convolution; BatchNorm, batch normalization; Non-linear, non-linear activation function; Sigmoid, sigmoid function. The module decomposes channel attention into two 1D feature-encoding paths along the horizontal and vertical spatial directions, so that spatial position information is preserved within the channel attention.
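
The two directional encoding paths in the caption can be sketched as follows. This is a simplified illustration under stated assumptions: 1×1 convolutions are written as matrix products, batch normalization is omitted, ReLU stands in for the non-linear activation, and the weight matrices `w_shared`, `w_h`, and `w_w` are hypothetical random values.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coordinate_attention(x, w_shared, w_h, w_w):
    """x: (C, H, W); w_shared: (C // r, C); w_h, w_w: (C, C // r)."""
    C, H, W = x.shape
    pooled_h = x.mean(axis=2)                              # (C, H): average over the width
    pooled_w = x.mean(axis=1)                              # (C, W): average over the height
    joint = np.concatenate([pooled_h, pooled_w], axis=1)   # (C, H + W)
    mid = np.maximum(w_shared @ joint, 0.0)                # shared 1x1 conv + ReLU
    mid_h, mid_w = mid[:, :H], mid[:, H:]                  # split back into the two directions
    att_h = sigmoid(w_h @ mid_h)                           # (C, H): one weight per row
    att_w = sigmoid(w_w @ mid_w)                           # (C, W): one weight per column
    return x * att_h[:, :, None] * att_w[:, None, :]       # position-aware reweighting

# Illustrative sizes and random weights (hypothetical values).
rng = np.random.default_rng(1)
C, H, W, r = 8, 6, 5, 4
x = rng.normal(size=(C, H, W))
w_shared = rng.normal(size=(C // r, C)) * 0.1
w_h = rng.normal(size=(C, C // r)) * 0.1
w_w = rng.normal(size=(C, C // r)) * 0.1
y = coordinate_attention(x, w_shared, w_h, w_w)
```

Unlike a single scalar per channel, the weight applied at position (i, j) is the product of a row term and a column term, which is how the location of a keypoint can influence the attention.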

Fig. 5 Flowchart of the dynamic confidence suppression (DCS) module
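
The page does not reproduce the DCS equations, so the sketch below only illustrates the general idea of suppressing low-confidence keypoint heatmaps with a threshold that adapts to model accuracy. Everything in it is an assumption: the linear threshold rule, the `base` and `gain` constants, and the `recent_accuracy` input are hypothetical, and the published module (which adds decoder layers and parameters, as Fig. 2 and Table 6 indicate) is more elaborate than this parameter-free version.

```python
import numpy as np

def adaptive_threshold(recent_accuracy, base=0.2, gain=0.3):
    """Hypothetical rule: a more accurate model gets a stricter confidence threshold."""
    return base + gain * float(np.clip(recent_accuracy, 0.0, 1.0))

def suppress_heatmaps(heatmaps, recent_accuracy):
    """heatmaps: (K, H, W) with values in [0, 1], one map per keypoint.

    Returns the suppressed heatmaps and a boolean mask of keypoints that were kept.
    """
    threshold = adaptive_threshold(recent_accuracy)
    peaks = heatmaps.reshape(heatmaps.shape[0], -1).max(axis=1)  # peak confidence per keypoint
    keep = peaks >= threshold
    suppressed = heatmaps * keep[:, None, None]                  # zero out unreliable maps
    return suppressed, keep

# Two toy keypoints: one confident peak, one weak peak (for example, an occluded limb).
heatmaps = np.zeros((2, 4, 4))
heatmaps[0, 1, 1] = 0.9
heatmaps[1, 2, 2] = 0.1
suppressed, keep = suppress_heatmaps(heatmaps, recent_accuracy=0.8)
```

In this toy case the weak second keypoint is dropped rather than reported at an unreliable location.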

Table 1 Ablation study results on different subsets. Under the same training/test split and training strategy, the squeeze-and-excitation (SE) attention mechanism, the coordinate attention (CA) mechanism, and the dynamic confidence suppression (DCS) module were progressively added to the baseline HRNet-W48, and the resulting performance changes are reported. In the table, “√” and “-” indicate that a module is enabled or disabled, respectively; all other settings follow the baseline. AP, average precision; AP50, average precision calculated at an object keypoint similarity (OKS) threshold of 0.50; AP75, average precision calculated at an OKS threshold of 0.75; APM, average precision for medium objects; APL, average precision for large objects; AR, average recall.

Type of experiment	SE	CA	DCS	AP (%)	AP50 (%)	AP75 (%)	APM (%)	APL (%)	AR (%)
Ablation study on the full dataset	-	-	-	81.47	85.11	83.24	86.64	78.74	83.43
	√	-	-	81.93	85.33	83.86	87.20	79.23	84.02
	-	√	-	82.06	84.96	83.95	87.03	80.46	84.00
	-	-	√	81.95	84.90	83.47	87.06	79.55	83.72
	√	√	-	82.40	85.26	83.91	87.65	80.15	84.25
	√	-	√	82.22	85.22	83.86	86.34	79.74	84.03
	-	√	√	82.25	85.80	84.10	87.57	80.05	84.14
	√	√	√	82.61	85.19	84.39	88.15	80.48	84.68
Ablation study on the low-light subset	-	-	-	26.56	28.05	27.67	20.59	24.02	26.39
	√	-	-	26.78	28.03	28.03	20.36	24.43	26.59
	-	√	-	26.75	28.05	27.72	19.94	24.50	26.59
	-	-	√	26.68	28.05	28.05	19.77	24.45	26.52
	√	√	√	26.85	28.05	28.05	19.77	24.56	26.52
Ablation study on the motion blur and artifact subset	-	-	-	25.57	28.55	28.55	0.00	27.10	25.41
	√	-	-	25.77	31.68	27.72	0.00	27.23	25.72
	-	√	-	26.24	31.02	27.08	0.00	27.63	26.15
	-	-	√	26.03	32.39	27.72	0.00	27.48	26.04
	√	√	√	26.47	31.85	27.08	0.00	27.98	26.25
Ablation study on the occlusion subset	-	-	-	21.31	27.21	20.58	26.54	20.68	22.56
	√	-	-	21.53	26.97	23.10	27.48	21.00	22.54
	-	√	-	21.67	27.19	23.62	26.61	21.23	22.88
	-	-	√	22.10	27.52	23.29	27.18	21.21	23.23
	√	√	√	22.13	27.36	23.86	25.21	21.16	23.12
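
The AP50 and AP75 columns depend on object keypoint similarity (OKS). A minimal sketch of the COCO-style OKS formula follows; the per-keypoint `sigmas`, the coordinates, and the object area are hypothetical example values, not the constants used in this study.

```python
import math

def oks(pred, gt, visible, area, sigmas):
    """COCO-style object keypoint similarity for one animal instance.

    pred, gt : lists of (x, y) keypoint coordinates
    visible  : visibility flags; keypoints with flag <= 0 are ignored
    area     : object area in pixels squared (the scale term)
    sigmas   : per-keypoint falloff constants (hypothetical values here)
    """
    total, count = 0.0, 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visible, sigmas):
        if v <= 0:
            continue
        d2 = (px - gx) ** 2 + (py - gy) ** 2      # squared pixel error
        k2 = (2.0 * sigma) ** 2                   # per-keypoint tolerance
        total += math.exp(-d2 / (2.0 * area * k2))
        count += 1
    return total / count if count else 0.0

gt = [(10.0, 10.0), (20.0, 20.0), (30.0, 30.0)]
pred = [(10.0, 10.0), (20.0, 20.0), (33.0, 34.0)]   # last keypoint is 5 px off
score = oks(pred, gt, visible=[2, 2, 2], area=1000.0, sigmas=[0.05, 0.05, 0.05])
perfect = oks(gt, gt, visible=[2, 2, 2], area=1000.0, sigmas=[0.05, 0.05, 0.05])
```

A prediction counts as correct for AP50 when its OKS is at least 0.50 and for AP75 when it is at least 0.75, which is why AP75 is the stricter of the two columns.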

Table 2 Comparison of detection results of classical models on the Saihanwula wildlife dataset. All models adopt a top-down pose estimation framework and are evaluated on the test set. Evaluation follows the COCO Keypoint protocol. AP, average precision; AP50, average precision calculated at an object keypoint similarity (OKS) threshold of 0.50; AP75, average precision calculated at an OKS threshold of 0.75; APM, average precision for medium objects; APL, average precision for large objects; AR, average recall.

Method	Backbone network	AP (%)	AP50 (%)	AP75 (%)	APM (%)	APL (%)	AR (%)	Reference
SCD-HRNet	HRNet-W48	82.61	85.19	84.39	88.15	80.48	84.68	This study
HRNet	HRNet-W48	81.47	85.11	83.24	86.64	78.74	83.43	Sun et al., 2019
Simple Baseline	ResNet-50	81.45	85.66	83.26	87.60	78.58	83.14	Xiao et al., 2018
Simple Baseline	ResNet-101	80.81	85.28	83.40	87.08	77.87	82.54	Xiao et al., 2018
HRFormer	HRFormer	80.72	86.95	85.17	85.52	77.97	82.82	Yuan et al., 2021
RSN	RSN-18	80.62	86.56	84.38	85.68	77.78	83.57	Cai et al., 2020
ViTPose	ViT-base	78.08	86.92	83.89	85.50	74.47	80.58	Xu et al., 2022
MobileNet	MobileNetV2	77.47	87.26	84.09	83.75	73.54	80.11	Sandler et al., 2018
ShuffleNet	ShuffleNetV1	76.98	87.00	83.83	84.25	72.68	79.84	Zhang et al., 2018
ShuffleNet	ShuffleNetV2	75.82	87.10	82.24	83.64	71.30	78.66	Ma et al., 2018
RTMPose	CSPNeXt	74.01	87.25	82.64	81.06	69.41	76.40	Jiang et al., 2023

Table 3 Generalization test results on the AP-10K dataset. Generalization was evaluated on AP-10K, with every method trained from scratch on the official AP-10K train/val split. AP, average precision; AP50, average precision calculated at an object keypoint similarity (OKS) threshold of 0.50; AP75, average precision calculated at an OKS threshold of 0.75; APM, average precision for medium objects; APL, average precision for large objects; AR, average recall.

Method	Backbone network	AP (%)	AP50 (%)	AP75 (%)	APM (%)	APL (%)	AR (%)
SCD-HRNet	HRNet-W48	69.79	93.55	77.70	51.98	70.13	73.17
HRNet	HRNet-W48	67.88	92.10	72.97	51.26	68.33	72.04
Simple Baseline	ResNet-50	64.24	91.44	69.83	47.60	64.53	68.03
Simple Baseline	ResNet-101	62.42	90.03	65.61	44.38	62.74	66.53
HRFormer	HRFormer	58.95	89.39	61.51	40.93	59.35	63.62
RSN	RSN-18	58.74	88.28	61.44	47.14	59.01	62.88
RTMPose	CSPNeXt	54.75	88.04	55.61	44.20	54.99	58.90
MobileNet	MobileNetV2	50.49	83.70	49.94	39.11	50.73	54.98
ShuffleNet	ShuffleNetV2	46.68	82.49	43.64	30.37	47.02	51.67
ShuffleNet	ShuffleNetV1	46.59	81.90	45.80	34.82	46.83	51.61
ViTPose	ViT-base	46.47	79.92	46.18	31.81	46.64	50.35

Table 4 Random-seed experiment results of different methods on the self-constructed dataset and the AP-10K dataset (mean±SD). The table reports the mean performance and standard deviation of each network over five random-seed runs on the two wildlife pose estimation datasets, reflecting the stability and variability of model performance. AP, average precision; AP50, average precision calculated at an object keypoint similarity (OKS) threshold of 0.50; AP75, average precision calculated at an OKS threshold of 0.75; APM, average precision for medium objects; APL, average precision for large objects; AR, average recall.

Type of experiment	Method	Backbone network	AP (%)	AP50 (%)	AP75 (%)	APM (%)	APL (%)	AR (%)
Self-constructed dataset	SCD-HRNet	HRNet-W48	82.63±0.07	85.21±0.27	84.16±0.17	88.10±0.11	80.51±0.50	84.47±0.14
	HRNet	HRNet-W48	81.59±0.16	85.50±0.23	83.38±0.14	86.96±0.59	78.87±0.35	83.45±0.12
	Simple Baseline	ResNet-50	81.23±0.19	85.29±0.22	83.30±0.28	87.32±0.34	78.62±0.27	83.03±0.10
	Simple Baseline	ResNet-101	80.48±0.10	85.20±0.15	82.90±0.26	87.09±0.26	77.22±0.19	82.41±0.08
	HRFormer	HRFormer	80.54±0.18	87.22±0.22	85.21±0.19	86.13±0.28	77.35±0.28	82.72±0.11
AP-10K dataset	SCD-HRNet	HRNet-W48	69.87±0.41	94.09±0.42	75.71±1.01	54.72±2.60	70.16±0.45	73.31±0.38
	HRNet	HRNet-W48	67.53±0.31	93.03±0.72	72.64±0.66	53.14±4.00	67.81±0.28	71.33±0.27
	Simple Baseline	ResNet-50	63.92±0.19	90.78±0.34	69.53±0.91	46.86±3.87	64.24±0.15	67.83±0.14
	Simple Baseline	ResNet-101	62.62±0.30	90.19±0.53	66.87±1.22	47.75±3.55	62.91±0.25	66.67±0.30
	HRFormer	HRFormer	59.27±0.44	89.42±0.58	61.76±0.32	47.71±1.35	59.58±0.47	63.84±0.36
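
Each mean±SD cell summarizes five runs that differ only in the random seed. The snippet below shows how such a cell could be formed with Python's standard library. The per-seed AP values are hypothetical (the page does not list them), and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the page does not say which estimator was used.

```python
from statistics import mean, stdev

# Hypothetical AP values (%) from five random-seed runs of one model.
seed_ap = [70.1, 70.4, 69.8, 70.3, 70.0]

ap_mean = mean(seed_ap)
ap_sd = stdev(seed_ap)   # sample standard deviation (n - 1 denominator), an assumption
summary = f"{ap_mean:.2f}±{ap_sd:.2f}"
```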

Table 5 Statistical significance and effect size analysis of different methods relative to HRNet on the self-constructed dataset and the AP-10K dataset. ΔAP is the difference in average precision (AP) from the baseline HRNet. The 95% confidence interval (CI) is the confidence interval of this difference; an interval that excludes zero indicates a significant difference. The P-value assesses the statistical significance of the difference, with commonly used thresholds of P < 0.05 (significant) and P < 0.001 (highly significant). d_n is the paired effect size (Cohen's d_n), which measures the magnitude of the improvement; values of 0.2, 0.5, and 0.8 correspond to small, medium, and large effects, respectively. Positive values indicate a performance gain and negative values a performance loss.

Type of experiment	Method	ΔAP (%)	95% CI of ΔAP	P	d_n
Self-constructed dataset vs. HRNet	HRNet	-	-	-	-
	ResNet-50	-0.366	[-0.794, +0.062]	0.077	-1.06
	ResNet-101	-1.112	[-1.344, -0.880]	< 0.001	-5.95
	HRFormer	-1.056	[-1.257, -0.855]	< 0.001	-6.51
	SCD-HRNet	+1.040	[+0.855, +1.225]	< 0.001	+6.99
AP-10K dataset vs. HRNet	HRNet	-	-	-	-
	ResNet-50	-3.618	[-4.029, -3.207]	< 0.001	-10.94
	ResNet-101	-4.912	[-5.482, -4.342]	< 0.001	-10.70
	HRFormer	-8.266	[-9.057, -7.475]	< 0.001	-12.98
	SCD-HRNet	+2.334	[+1.744, +2.924]	< 0.001	+4.91
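
The quantities in the table can be related through a standard paired analysis over the five seeds. The sketch below is an assumption about the procedure, not a statement of the authors' code: it pairs runs by seed, uses a Student's t interval with 4 degrees of freedom, and takes the effect size as the mean difference divided by the standard deviation of the differences. The per-seed AP values are hypothetical. As a plausibility check, the reported SCD-HRNet row on the self-constructed dataset appears consistent with this reading: ΔAP = 1.040 and d_n = 6.99 imply a standard deviation of about 0.149, and 2.776 × 0.149 / √5 ≈ 0.185 matches the half-width of [+0.855, +1.225].

```python
import math
from statistics import mean, stdev

T_CRIT_DF4 = 2.776  # two-sided 95% critical value of Student's t with 4 degrees of freedom

def paired_summary(model_ap, baseline_ap, t_crit=T_CRIT_DF4):
    """Mean difference, 95% CI, and paired effect size for seed-matched runs."""
    diffs = [m - b for m, b in zip(model_ap, baseline_ap)]
    n = len(diffs)
    d_mean = mean(diffs)
    d_sd = stdev(diffs)                          # sample SD of the paired differences
    half_width = t_crit * d_sd / math.sqrt(n)    # t-based confidence half-width
    return {
        "delta_ap": d_mean,
        "ci95": (d_mean - half_width, d_mean + half_width),
        "effect_size": d_mean / d_sd,
    }

# Hypothetical per-seed AP values (%), paired by seed.
baseline = [70.0, 70.2, 69.9, 70.1, 70.3]
model = [71.0, 71.4, 70.8, 71.2, 71.1]
result = paired_summary(model, baseline)
```

Because the effect size divides by the spread of the paired differences, very stable runs produce large values such as those in the table even when the absolute AP gain is about one percentage point. Computing the P-value additionally needs the t-distribution CDF, which is outside the standard library and is left out here.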

Table 6 Comparison of giga floating-point operations (GFLOPs), parameter count (Params), and frames per second (FPS) across methods. GFLOPs reflects the theoretical computational cost of a single forward pass; it is an operation count, not a per-second rate. Params is the number of trainable parameters, which generally relates to storage requirements and training difficulty. FPS is the average processing frame rate in the test environment and gauges running speed and deployment feasibility. Lower GFLOPs and fewer parameters typically indicate a lighter model, while a higher FPS suggests that the model is better suited to real-time applications.

Method	Backbone network	GFLOPs	Params	FPS
HRNet	HRNet-W48	20.99 G	63.60 M	25.97
Simple Baseline	ResNet-50	7.27 G	34.00 M	146.15
Simple Baseline	ResNet-101	12.13 G	52.99 M	83.91
MobileNet	MobileNetV2	2.11 G	9.57 M	163.86
RSN	RSN-18	3.02 G	9.15 M	61.75
ShuffleNet	ShuffleNetV1	1.80 G	6.94 M	138.52
ShuffleNet	ShuffleNetV2	1.82 G	7.55 M	141.11
RTMPose	CSPNeXt	2.57 G	13.62 M	106.11
HRFormer	HRFormer	19.50 G	43.22 M	16.10
ViTPose	ViT-base	25.03 G	89.99 M	37.38
HRNet+SE	HRNet-W48	21.00 G	63.60 M	27.73
HRNet+CA	HRNet-W48	21.00 G	63.60 M	28.60
HRNet+DCS	HRNet-W48	24.03 G	64.34 M	28.99
HRNet+SE+CA	HRNet-W48	21.00 G	63.60 M	27.24
HRNet+SE+DCS	HRNet-W48	24.03 G	64.35 M	29.01
HRNet+CA+DCS	HRNet-W48	24.03 G	64.34 M	29.38
SCD-HRNet	HRNet-W48	24.03 G	64.35 M	27.56
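
An average frame rate like the FPS column is usually obtained by timing repeated forward passes after a short warm-up. The sketch below shows that pattern with the standard library only. The `dummy_infer` function stands in for a real model, and the warm-up count and frame list are hypothetical; the page does not describe the authors' measurement protocol.

```python
import time

def measure_fps(infer, frames, warmup=3):
    """Average frames per second of `infer` over `frames`, after a short warm-up."""
    for frame in frames[:warmup]:        # warm-up runs are excluded from the timing
        infer(frame)
    start = time.perf_counter()
    for frame in frames:
        infer(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed

def dummy_infer(frame):
    """Placeholder for a model forward pass (hypothetical workload)."""
    return sum(v * v for v in frame)

frames = [list(range(2000)) for _ in range(50)]
fps = measure_fps(dummy_infer, frames)
```

Measured FPS depends on hardware, batch size, and runtime, so it does not have to track GFLOPs exactly, as the table itself shows.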

References (32)

[1]	An L, Ren JL, Yu T, Hai T, Jia YC, Liu YB (2023) Three-dimensional surface motion capture of multiple freely moving pigs using MAMMAL. Nature Communications, 14, 7727.
[2]	Barney S, Dlay S, Crowe A, Kyriazakis I, Leach M (2023) Deep learning pose estimation for multi-cattle lameness detection. Scientific Reports, 13, 4499.
[3]	Cai JM, He PY, Yang ZP, Li LY, Zhao QJ, Pan F (2023) A deep feature fusion-based method for bird sound recognition and its interpretability analysis. Biodiversity Science, 31, 23087. (in Chinese with English abstract)
[4]	Cai YH, Wang ZC, Luo ZX, Yin BY, Du AG, Wang HQ, Zhang XY, Zhou XY, Zhou EJ, Sun J (2020) Learning delicate local representations for multi-person pose estimation. arXiv, doi: 10.48550/arXiv.2003.04030.
[5]	Cao JK, Tang HY, Fang HS, Shen XY, Tai YW, Lu CW (2019) Cross-domain adaptation for animal pose estimation. arXiv, doi: 10.48550/arXiv.1908.05806.
[6]	Cheng G, Yuan X, Yao XW, Yan KB, Zeng QH, Xie XX, Han JW (2023) Towards large-scale small object detection: Survey and benchmarks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45, 13467-13488.
[7]	Ding YH, Liang J, Jiang B, Zheng AH, He R (2024) MAPS: A noise-robust progressive learning approach for source-free domain adaptive keypoint detection. IEEE Transactions on Circuits and Systems for Video Technology, 34, 1376-1387.
[8]	Han YN, Chen K, Wang YK, Liu WH, Wang ZW, Wang XJ, Han CL, Liao JH, Huang K, Cai SY, Huang YT, Wang N, Li JX, Song Y, Li J, Wang GD, Wang LP, Zhang YP, Wei PF (2024) Multi-animal 3D social pose estimation, identification and behaviour embedding with a few-shot learning framework. Nature Machine Intelligence, 6, 48-61.
[9]	Hou QB, Zhou DQ, Feng JS (2021) Coordinate attention for efficient mobile network design. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Nashville, TN.
[10]	Hu B, Seybold B, Yang S, Sud A, Liu Y, Barron K, Cha P, Cosino M, Karlsson E, Kite J, Kolumam G, Preciado J, Zavala-Solorio J, Zhang CL, Zhang XM, Voorbach M, Tovcimak AE, Ruby JG, Ross DA (2023) 3D mouse pose from single-view video and a new dataset. Scientific Reports, 13, 13554.
[11]	Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018). IEEE, New York.
[12]	Huang K, Han YN, Chen K, Pan HL, Zhao GY, Yi WL, Li XX, Liu SY, Wei PF, Wang LP (2021) A hierarchical 3D-motion learning framework for animal spontaneous behavior mapping. Nature Communications, 12, 2784.
[13]	Jiang T, Lu P, Zhang L, Ma NS, Han R, Lyu CQ, Li YN, Chen K (2023) RTMPose: Real-time multi-person pose estimation based on MMPose. arXiv, doi: 10.48550/arXiv.2303.07399.
[14]	Kirillov A, Wu YX, He KM, Girshick R (2020) PointRend: Image segmentation as rendering. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Seattle, WA.
[15]	Lauer J, Zhou M, Ye SK, Menegas W, Schneider S, Nath T, Rahman MM, Di Santo V, Soberanes D, Feng GP, Murthy VN, Lauder G, Dulac C, Mathis MW, Mathis A (2022) Multi-animal pose estimation, identification and tracking with DeepLabCut. Nature Methods, 19, 496-504.
[16]	Li SY, Liu K, Wang H, Yang R, Li XZ, Sun YQ, Zhong RT, Wang W, Li Y, Sun YJ, Wang GH (2025) Pose estimation and tracking dataset for multi-animal behavior analysis on the China Space Station. Scientific Data, 12, 766.
[17]	Liu H, Pan SG, Wu PB, Yu KG, Gao W, Yu BG (2024) Uncertainty-aware UWB/LiDAR/INS tightly coupled fusion pose estimation via filtering approach. IEEE Sensors Journal, 24, 11113-11126.
[18]	Ma NN, Zhang XY, Zheng HT, Sun J (2018) ShuffleNet V2: Practical guidelines for efficient CNN architecture design. arXiv, doi: 10.48550/arXiv.1807.11164.
[19]	Mokany K, Ware C, Harwood TD, Schmidt RK, Ferrier S (2022) Habitat-based biodiversity assessment for ecosystem accounting in the Murray-Darling Basin. Conservation Biology, 36, e13915.
[20]	Sagar ASMS, Islam MZ, Tanveer J, Kim HS (2025) Uncertainty-aware adaptive multiscale U-Net for low-contrast cardiac image segmentation. Applied Sciences, 15, 2222.
[21]	Sandler M, Howard A, Zhu ML, Zhmoginov A, Chen LC (2018) MobileNetV2: Inverted residuals and linear bottlenecks. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, Salt Lake City, UT.
[22]	Sun K, Xiao B, Liu D, Wang JD (2019) Deep high-resolution representation learning for human pose estimation. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Long Beach, CA.
[23]	Xiao B, Wu HP, Wei YC (2018) Simple baselines for human pose estimation and tracking. In: Proceedings of the European Conference on Computer Vision (ECCV) (eds Ferrari V, Hebert M, Sminchisescu C, Weiss Y). Springer, Cham.
[24]	Xu GD, Xu Y, Deng H, Mo H (2023) Research on multi-target animal pose estimation based on improved high resolution network. Computer Engineering and Applications, 59(22), 182-192. (in Chinese with English abstract)
[25]	Xu Y, Zhang J, Zhang Q, Tao D (2022) ViTPose: Simple vision transformer baselines for human pose estimation. Advances in Neural Information Processing Systems, 35, 38571-38584.
[26]	Ye SK, Filippova A, Lauer J, Schneider S, Vidal M, Qiu T, Mathis A, Mathis MW (2024) SuperAnimal pretrained pose estimation models for behavioral analysis. Nature Communications, 15, 5165.
[27]	Yin ZX, Zhao YQ, Xu ZH, Yu QP (2024) Automatic detection of stereotypical behaviors of captive wild animals based on surveillance videos of zoos and animal reserves. Ecological Informatics, 79, 102450.
[28]	Yu H, Xu YF, Zhang J, Zhao W, Guan ZY, Tao DC (2021) AP-10K: A benchmark for animal pose estimation in the wild. arXiv, doi: 10.48550/arXiv.2108.12617.
[29]	Yuan YH, Fu R, Huang L, Lin WH, Zhang C, Chen XL, Wang JD (2021) HRFormer: High-resolution transformer for dense prediction. arXiv, doi: 10.48550/arXiv.2110.09408.
[30]	Zhang JG, Cheng ZA, Hu CH, Chen C, Bao WD (2018) Adaptive image enhancement algorithm for wild animal monitoring based on Retinex theory. Transactions of the Chinese Society of Agricultural Engineering, 34(15), 183-189. (in Chinese with English abstract)
[31]	Zhang WW, Xu Y, Bai R, Chen N (2023) Animal pose estimation based on improved stacked hourglass network. Computer Engineering, 49(2), 263-270. (in Chinese with English abstract)
[32]	Zhang XY, Zhou XY, Lin MX, Sun J (2018) ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, Salt Lake City, UT.

