Biodiversity Science


Wetland waterbird detection method based on the YOLO-DAS model: A case study of the Nanhaizi Region in Inner Mongolia

Jilun Sun1,2,3, Jiangjian Xie1,2,3, Changchun Zhang1,2,3, Junguo Zhang1,2,3*   

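  1. School of Technology, Beijing Forestry University, Beijing 100083, China; 2. State Key Laboratory of Efficient Production of Forest Resources, Beijing 100083, China; 3. Research Center for Biodiversity Intelligent Monitoring, Beijing Forestry University, Beijing 100083, China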
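  • Received: 2025-06-20 Revised: 2025-07-29 Accepted: 2025-10-08
  • Corresponding author: Junguo Zhang
  • Supported by:
    Research on Incremental Learning Recognition Mechanisms and Methods for Monitoring Images of Wild Animals in Open Environments (32371874); Theory and method for wild bird object detection by audio-visual multimodal feature fusion (5252014); Common key technologies for multimodal intelligent monitoring of terrestrial wildlife and their habitats (2024XY-G002); Open-set domain adaptation recognition mechanism and method for wetland waterbird monitoring images (32401569)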

Wetland waterbird detection method based on the YOLO-DAS model: A case study of the Nanhaizi Region in Inner Mongolia

Jilun Sun1,2,3, Jiangjian Xie1,2,3, Changchun Zhang1,2,3, Junguo Zhang1,2,3*   

  1. School of Technology, Beijing Forestry University, Beijing 100083, China; 

    2. State Key Laboratory of Efficient Production of Forest Resources, Beijing 100083, China; 

    3. Research Center for Biodiversity Intelligent Monitoring, Beijing Forestry University, Beijing 100083, China

  • Received: 2025-06-20 Revised: 2025-07-29 Accepted: 2025-10-08
  • Contact: Junguo Zhang
  • Supported by:
    Research on Incremental Learning Recognition Mechanisms and Methods for Monitoring Images of Wild Animals in Open Environments (32371874); Theory and method for wild bird object detection by audio-visual multimodal feature fusion (5252014); Common key technologies for multimodal intelligent monitoring of terrestrial wildlife and their habitats (2024XY-G002); Open-set domain adaptation recognition mechanism and method for wetland waterbird monitoring images (32401569)

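Abstract: Monitoring and protecting wetland waterbirds is important for conserving both biodiversity and wetlands. With the widespread application of computer vision, detecting birds in images with deep-learning models has become an important tool for bird conservation. Real wetland waterbird monitoring images, however, contain complex backgrounds, similar features across classes, foreground occlusion, and large differences in target scale, all of which degrade detection performance. To address these problems, we built Bird111, a dataset of 27,030 images of 111 waterbird species from the Nanhaizi Wetland in Inner Mongolia, and propose a wetland waterbird detection algorithm based on YOLO-DAS. First, a deformable attention mechanism (DAT) is integrated to adaptively focus on important image regions, strengthening the network's feature extraction and suppressing the influence of complex backgrounds and similar features. Second, adaptive spatial feature fusion (ASFF) filters conflicting information among the extracted features of different scales to enhance scale invariance and improve the model's response to multi-scale bird targets. Finally, the SlideLoss loss function is introduced to increase attention to hard samples during training, improving detection of small and occluded targets. Experiments show that YOLO-DAS achieves the best detection performance among the mainstream methods compared on Bird111: its precision, recall, and mean average precision (mAP) improve on the baseline model by 4%, 2.4%, and 2.9%, respectively, and it generalizes well to the public CUB-200-2011, Birdsnap, and NABirds datasets. The proposed YOLO-DAS model effectively improves detection of small or occluded birds in complex backgrounds and provides an effective technical approach for detecting birds at different target scales in wetland waterbird monitoring.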
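As a rough illustration of the ASFF step, the sketch below follows the general ASFF formulation: after the three feature levels are rescaled to a common resolution, they are combined as y = α·x1 + β·x2 + γ·x3, where α, β, γ come from a softmax over three per-pixel weight logits and therefore sum to 1 at every location. A level whose response conflicts with the others can thus be down-weighted locally. This is a simplified single-channel sketch in plain Python, not the YOLO-DAS implementation: the helper names (`upsample_nearest`, `softmax3`, `asff_fuse`) are hypothetical, the scalar parameters `conv_w`/`conv_b` are assumed stand-ins for the 1×1 convolutions that produce the weight logits, and nearest-neighbour upsampling is assumed as the rescaling method.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]  # a single-channel H x W feature map


def upsample_nearest(grid: Grid, factor: int) -> Grid:
    """Nearest-neighbour upsampling (assumed rescaling method for coarser levels)."""
    return [[v for v in row for _ in range(factor)] for row in grid for _ in range(factor)]


def softmax3(a: float, b: float, c: float) -> Tuple[float, float, float]:
    """Numerically stable softmax over three logits."""
    m = max(a, b, c)
    ea, eb, ec = math.exp(a - m), math.exp(b - m), math.exp(c - m)
    s = ea + eb + ec
    return ea / s, eb / s, ec / s


def asff_fuse(features: Sequence[Grid], conv_w: Sequence[float], conv_b: Sequence[float]) -> Grid:
    """Per-pixel weighted sum of three same-size maps.

    conv_w / conv_b are hypothetical scalar stand-ins for the 1x1 convolutions
    that map each rescaled feature to one weight logit.
    """
    if len(features) != 3:
        raise ValueError("this sketch fuses exactly three levels")
    h, w = len(features[0]), len(features[0][0])
    fused = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            logits = [conv_w[k] * features[k][i][j] + conv_b[k] for k in range(3)]
            alpha, beta, gamma = softmax3(*logits)  # sums to 1 (up to rounding)
            fused[i][j] = (alpha * features[0][i][j]
                           + beta * features[1][i][j]
                           + gamma * features[2][i][j])
    return fused


if __name__ == "__main__":
    level0 = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]  # finest level, 2 x 4
    level1 = upsample_nearest([[5.0, 6.0]], 2)             # 1 x 2 -> 2 x 4
    level2 = [[0.0] * 4, [0.0] * 4]                         # stand-in third level
    print(asff_fuse([level0, level1, level2], conv_w=[1.0, 1.0, 0.0], conv_b=[0.0, 0.0, -5.0]))
```

Because the weights are computed per pixel rather than per level, the same level can dominate in one region of the image and be nearly ignored in another, which is what allows scale-specific conflicts to be filtered.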

Key words: wetland waterbird detection, YOLOv8, deformable attention, adaptive spatial feature fusion, loss function

Abstract

Aims: Monitoring and conserving wetland waterbirds is crucial for biodiversity and wetland protection. With the widespread application of computer-vision techniques, bird image detection using deep-learning models has become an important tool for bird conservation. However, real wetland monitoring images often contain complex backgrounds, inter-class feature similarity, foreground occlusion, and large target-scale variation, which impair detection performance. 

Methods: To address these challenges, we built a dataset (Bird111) comprising 27,030 images of 111 waterbird species from the Nanhaizi Wetland, Inner Mongolia, and propose a wetland waterbird detection algorithm based on YOLO-DAS. First, a deformable attention (DAT) mechanism is integrated to adaptively focus on important image regions, improving the network's feature-extraction ability and reducing the influence of complex backgrounds and similar features. Second, adaptive spatial feature fusion (ASFF) is applied to filter conflicting information among features of different scales, enhancing scale invariance and the model's responsiveness to multi-scale bird targets. Finally, the SlideLoss function is introduced to place greater emphasis on hard samples during training, improving detection of small and occluded targets. 
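The abstract does not give the exact form of SlideLoss used in YOLO-DAS. The sketch below assumes the Slide weighting function popularized by YOLO-FaceV2, with x the IoU of a prediction and μ a threshold separating easy from hard samples (often the mean IoU of the positive predictions): f(x) = 1 for x ≤ μ − 0.1, f(x) = e^(1−μ) for μ − 0.1 < x < μ, and f(x) = e^(1−x) for x ≥ μ. Samples near the boundary therefore receive the largest weight, while clearly easy ones keep weight 1. Multiplying f into a per-sample binary cross-entropy whose target is the IoU, and flooring μ at 0.2, follow common open-source YOLOv8 variants and are assumptions here; the names `slide_weight`, `bce`, and `slide_loss` are hypothetical.

```python
import math
from typing import Optional, Sequence


def slide_weight(iou: float, mu: float) -> float:
    """Slide weighting function f(x) in the YOLO-FaceV2 form (assumed)."""
    if iou <= mu - 0.1:
        return 1.0                  # clearly below the threshold: weight unchanged
    if iou < mu:
        return math.exp(1.0 - mu)   # boundary band just below mu: largest weight
    return math.exp(1.0 - iou)      # at or above mu: decays toward 1 as IoU -> 1


def bce(p: float, t: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy for predicted probability p and soft target t."""
    p = min(max(p, eps), 1.0 - eps)
    return -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))


def slide_loss(probs: Sequence[float], ious: Sequence[float], mu: Optional[float] = None) -> float:
    """Mean slide-weighted BCE; the IoU doubles as the classification target (assumption)."""
    if not probs or len(probs) != len(ious):
        raise ValueError("probs and ious must be non-empty and of equal length")
    if mu is None:
        mu = sum(ious) / len(ious)  # adaptive threshold: mean IoU of the samples
    mu = max(mu, 0.2)               # floor used in common implementations (assumption)
    total = sum(slide_weight(x, mu) * bce(p, x) for p, x in zip(probs, ious))
    return total / len(probs)


if __name__ == "__main__":
    for x in (0.2, 0.45, 0.7, 1.0):
        print(x, round(slide_weight(x, mu=0.5), 3))
    print(slide_loss([0.9, 0.3, 0.6], [0.85, 0.35, 0.55]))
```

Compared with focal-style reweighting, which keys on the predicted probability, this scheme keys on localization quality (IoU), which is one plausible reason it helps with occluded birds whose boxes overlap their ground truth only partially.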

Results: Experiments show that YOLO-DAS achieves the best detection performance on Bird111 among the mainstream methods compared. Relative to the baseline model, precision, recall, and mean average precision (mAP) improved by 4.0%, 2.4%, and 2.9%, respectively. The model also generalized well to the public CUB-200-2011, Birdsnap, and NABirds datasets. 
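For reference, the reported metrics are conventionally computed per class as precision = TP/(TP+FP) and recall = TP/(TP+FN), with AP defined as the area under the precision-recall curve after precision is made monotonically non-increasing (all-point interpolation); mAP is the mean of AP over classes. The abstract does not state the IoU threshold used to decide whether a detection is a true positive (for example mAP@0.5 versus mAP@0.5:0.95), so this sketch assumes box matching has already been done and each detection carries a TP/FP flag. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(detections: Sequence[Tuple[float, bool]], num_gt: int) -> float:
    """All-point interpolated AP for one class.

    detections: (confidence, is_true_positive) pairs, IoU matching assumed done beforehand.
    num_gt: number of ground-truth boxes of this class.
    """
    if num_gt <= 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls: List[float] = [0.0]
    precisions: List[float] = [1.0]
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Precision envelope: each point takes the best precision at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum rectangles wherever recall increases.
    return sum((recalls[i] - recalls[i - 1]) * precisions[i] for i in range(1, len(recalls)))


def mean_average_precision(per_class_ap: Sequence[float]) -> float:
    return sum(per_class_ap) / len(per_class_ap) if per_class_ap else 0.0


if __name__ == "__main__":
    dets = [(0.9, True), (0.8, False), (0.7, True)]  # 2 ground-truth boxes
    print(precision_recall(tp=2, fp=1, fn=0))
    print(average_precision(dets, num_gt=2))  # 0.5 * 1.0 + 0.5 * (2/3), about 0.833
```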

Conclusion: The YOLO-DAS model proposed here can effectively improve detection of small or occluded birds in complex backgrounds, and provides a practical technical approach for multi-scale bird detection in wetland monitoring.

Key words: wetland waterbird detection, YOLOv8, deformable attention, adaptive spatial feature fusion, loss function