Biodiv Sci

Wetland waterbird detection method based on the YOLO-DAS model: A case study of the Nanhaizi Region in Inner Mongolia

Jilun Sun1,2,3, Jiangjian Xie1,2,3, Changchun Zhang1,2,3, Junguo Zhang1,2,3*   

  1. School of Technology, Beijing Forestry University, Beijing 100083, China;

  2. State Key Laboratory of Efficient Production of Forest Resources, Beijing 100083, China;

  3. Research Center for Biodiversity Intelligent Monitoring, Beijing Forestry University, Beijing 100083, China

  • Received: 2025-06-20; Revised: 2025-07-29; Accepted: 2025-10-08
  • Contact: Junguo Zhang
  • Supported by:
    Research on Incremental Learning Recognition Mechanisms and Methods for Monitoring Images of Wild Animals in Open Environments (32371874); Theory and method for wild bird object detection by audio-visual multimodal feature fusion (5252014); Common key technologies for multimodal intelligent monitoring of terrestrial wildlife and their habitats (2024XY-G002); Adaptive recognition mechanism and method for open image domain monitoring of wetland waterbirds (32401569)

Abstract:

Aims: Monitoring and conserving wetland waterbirds is crucial for biodiversity and wetland protection. With the widespread application of computer-vision techniques, bird image detection using deep-learning models has become an important tool for bird conservation. However, real wetland monitoring images often contain complex backgrounds, inter-class feature similarity, foreground occlusion, and large target-scale variation, which impair detection performance. 

Methods: To address these challenges, we built a dataset (Bird111) comprising 27,030 images of 111 waterbird species from the Nanhaizi Wetland, Inner Mongolia, and propose a wetland waterbird detection algorithm based on the YOLO-DAS model. First, a deformable attention (DAT) mechanism is integrated to adaptively focus on important image regions, improving the network’s feature-extraction ability and reducing the influence of complex backgrounds and inter-class feature similarity. Second, adaptive spatial feature fusion (ASFF) is applied to filter conflicting multi-scale feature information, enhancing scale invariance and the model’s responsiveness to multi-scale bird targets. Finally, the SlideLoss loss function is introduced to place greater emphasis on difficult samples during training and to improve detection of small and occluded targets. 
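
The abstract does not state the SlideLoss formula. As a minimal sketch, assuming the piecewise weighting commonly associated with Slide Loss (introduced with YOLO-FaceV2), each sample's loss is scaled by a weight that depends on its IoU with the ground truth relative to the mean IoU over all samples. Samples near that mean, which are the ambiguous ones, receive the largest weight. The function names and the 0.1 margin below are illustrative assumptions and are not taken from the authors' code.

```python
import math


def slide_weight(iou: float, mu: float) -> float:
    """Assumed Slide-style weight for one sample.

    iou: IoU between the predicted box and its ground-truth box.
    mu:  mean IoU over all samples, used as the easy/hard threshold.
    """
    if iou <= mu - 0.1:
        # Clearly below the threshold: weight left unchanged.
        return 1.0
    if iou < mu:
        # Just below the threshold: constant boosted weight.
        return math.exp(1.0 - mu)
    # At or above the threshold: weight decays toward 1 as IoU approaches 1.
    return math.exp(1.0 - iou)


def slide_weighted_loss(losses, ious):
    """Mean of per-sample losses after applying the assumed Slide weights."""
    mu = sum(ious) / len(ious)
    weights = [slide_weight(x, mu) for x in ious]
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)
```

Under this assumed form, a sample with IoU just below or at the mean is weighted by a factor of e^(1 - mu), while a perfectly localized sample (IoU = 1) keeps a weight of 1.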

Results: Experiments show that the YOLO-DAS model outperforms the other mainstream methods compared on the Bird111 dataset. Mean precision, recall, and mean average precision improved by 4.0%, 2.4%, and 2.9%, respectively, relative to the baseline model. The model also generalized well to the public datasets CUB-200-2011, Birdsnap, and NABirds. 
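
For readers less familiar with the reported metrics, the following is a minimal sketch of how precision and recall are conventionally computed from detection counts. The function name and the counts in the usage note are hypothetical and are not results from this study.

```python
def precision_recall(tp: int, fp: int, fn: int):
    """Return (precision, recall) from detection counts.

    tp: detections matched to a ground-truth bird (true positives)
    fp: detections with no matching ground truth (false positives)
    fn: ground-truth birds that were missed (false negatives)
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

With hypothetical counts of 8 true positives, 2 false positives, and 2 missed birds, both precision and recall are 0.8. Mean average precision additionally averages the area under the precision-recall curve over all classes.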

Conclusion: The YOLO-DAS model proposed here can effectively improve detection of small or occluded birds in complex backgrounds, and provides a practical technical approach for multi-scale bird detection in wetland monitoring.

Key words: wetland waterbird detection, YOLOv8, deformable attention, adaptive spatial feature fusion, loss function