Biodiversity Science, 2025, Vol. 33, Issue (9): 25237. DOI: 10.17520/biods.2025237. CSTR: 32101.14.biods.2025237

• Techniques and Methods •

Wetland waterbird detection method based on the YOLO-DAS model: A case study of the Nanhaizi Wetland in Inner Mongolia

Jilun Sun1,2,3, Jiangjian Xie1,2,3, Changchun Zhang1,2,3, Junguo Zhang1,2,3,*

  1 School of Technology, Beijing Forestry University, Beijing 100083, China
  2 State Key Laboratory of Efficient Production of Forest Resources, Beijing 100083, China
  3 Research Center for Biodiversity Intelligent Monitoring, Beijing Forestry University, Beijing 100083, China
  • Received: 2025-06-20; Accepted: 2025-08-08; Online: 2025-09-20; Published: 2025-10-31
  • Corresponding author: *E-mail: zhangjunguo@bjfu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (32371874); National Natural Science Foundation of China (32401569); Beijing Natural Science Foundation (5252014); Beijing Forestry University Science and Technology Innovation Program (2024XY-G002)

Abstract

Aims: Wetland waterbird monitoring is of great significance for biodiversity and wetland conservation. With the widespread application of computer-vision techniques, bird image detection using deep-learning models has become an important tool for bird conservation. However, real wetland monitoring images often contain complex backgrounds, inter-class feature similarity, foreground occlusion, and large target-scale variation, which impair detection performance.

Methods: To address these limitations, we built our own dataset (Bird111) comprising 27,030 images of 111 waterbird species from the Nanhaizi Wetland, Inner Mongolia, and propose a wetland waterbird detection algorithm based on YOLO-DAS. First, a deformable attention (DAT) mechanism is integrated to adaptively focus on important image regions, improving the network's feature-extraction ability and reducing the influence of complex backgrounds and similar features. Second, adaptively spatial feature fusion (ASFF) is applied to filter conflicting information among features extracted at different scales, enhancing scale invariance and the model's responsiveness to multi-scale bird targets. Finally, the SlideLoss function is introduced to place more emphasis on difficult samples during training, improving detection of small and occluded targets.
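The SlideLoss step can be illustrated with a minimal Python sketch of the Slide weighting function proposed in YOLO-FaceV2, in the form commonly ported to YOLOv8 code bases. The abstract does not give the paper's settings, so the following are assumptions: the threshold `mu` is the mean IoU over matched (foreground) samples, it is clamped to at least 0.2, and the weight scales a plain binary cross-entropy term. The helper names `slide_weight`, `bce`, and `slide_bce` are hypothetical.

```python
import math


def slide_weight(iou: float, mu: float) -> float:
    """Slide-style modulating weight for one sample (hypothetical helper).

    iou: IoU between a prediction and its matched ground-truth box (0..1);
         use 0.0 for background samples.
    mu:  adaptive threshold; assumed to be a batch mean IoU, clamped to
         at least 0.2 as in common YOLOv8 ports.
    """
    mu = max(mu, 0.2)
    if iou <= mu - 0.1:
        return 1.0                   # easy negatives: weight left at 1
    if iou < mu:
        return math.exp(1.0 - mu)    # band just below the threshold: boosted
    return math.exp(1.0 - iou)       # positives: boost decays to 1 as IoU -> 1


def bce(p: float, y: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one predicted probability p and target y."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def slide_bce(preds, targets, ious):
    """Mean BCE where each term is scaled by its slide weight."""
    positives = [i for i in ious if i > 0.0]
    # Assumption: threshold = mean IoU over matched (foreground) samples.
    mu = sum(positives) / len(positives) if positives else 0.0
    weighted = [slide_weight(i, mu) * bce(p, y)
                for p, y, i in zip(preds, targets, ious)]
    return sum(weighted) / len(weighted)
```

For example, `slide_bce([0.2, 0.7], [0.0, 1.0], [0.0, 0.8])` leaves the background term at weight 1 and scales the positive term by e^(1 - 0.8). Samples in the band just below the threshold, and positives at the threshold, receive the largest weight e^(1 - mu), while well-localized positives decay toward 1; this is how the loss shifts attention toward hard, ambiguous samples.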

Results: Experiments show that the YOLO-DAS model achieves the best detection performance on the Bird111 dataset among the mainstream methods compared. Relative to the baseline model, its precision, recall, and mean average precision improved by 4.0%, 2.4%, and 2.9%, respectively. The model also generalized well to the public CUB-200-2011, Birdsnap, and NABirds datasets.
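The three reported metrics can be made concrete with a small Python sketch. Precision is TP/(TP + FP), recall is TP/(TP + FN), and mean average precision is the mean of per-class average precision (AP). The AP function below uses VOC-style all-point interpolation over detections ranked by confidence and assumes detections were already matched to ground truth at some IoU threshold. This is an assumption: the abstract states neither the IoU threshold nor the interpolation scheme, and COCO-style evaluation uses a 101-point variant, so values from this sketch need not match the paper's. All function names are hypothetical.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(scores, is_tp, num_gt):
    """All-point interpolated AP for one class (VOC-style).

    scores: confidence of each detection; is_tp: whether that detection was
    matched to a ground-truth box at the chosen IoU threshold (matching is
    assumed done); num_gt: number of ground-truth boxes of this class.
    """
    order = sorted(range(len(scores)), key=lambda k: -scores[k])
    tp = fp = 0
    precisions, recalls = [], []
    for k in order:
        if is_tp[k]:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    # Precision envelope: best precision at this or any higher recall.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p   # area of each recall step
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap):
    """mAP: unweighted mean of per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)
```

With three ranked detections (hit, miss, hit) against two ground-truth boxes, the first hit covers half the recall at precision 1 and the second covers the rest at precision 2/3, giving an AP of 5/6.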

Conclusion: The YOLO-DAS model proposed here can effectively improve detection of small or occluded birds in complex backgrounds, and provides a practical technical approach for multi-scale bird detection in wetland monitoring.

Key words: wetland waterbird detection, YOLOv8, deformable attention, adaptively spatial feature fusion, loss function