Aim & Summary: Bird population dynamics and distribution are essential components of ecosystems and are critical to maintaining ecological balance. The rapid development of acoustic monitoring technologies has made passive acoustic bird recognition an efficient, non-invasive method for bird monitoring. In practice, however, collecting and annotating bird sound data is difficult, and the resulting data imbalance and sample scarcity severely limit recognition accuracy. We apply ensemble learning to bird recognition in order to identify rare bird species under imbalanced data while improving the generalization ability and training efficiency of the model. Specifically, we design a cost-sensitive ensemble learning strategy that overcomes the limitations of imbalanced and scarce bird sound data and thereby improves the recognition accuracy of rare bird species. By integrating transfer learning, self-attention mechanisms, and sensitive regularization terms, we construct an efficient and accurate passive acoustic bird recognition system that supports the precise conservation of avian environments.
Methods: We propose an improved cost-sensitive stacking ensemble learning strategy (cost-sensitive stacking ensemble for bird sound recognition, CSE-BSR). The method has four steps: (1) preprocessing the collected bird sound data, including noise reduction, feature extraction, and spectrogram analysis, to improve model performance and reduce training time; (2) selecting deep learning models pre-trained on large bird sound datasets as base learners and fine-tuning them through transfer learning to adapt them to the new recognition task; (3) designing a feature fusion method based on self-attention mechanisms to integrate the heterogeneous features that the base learners extract from the same source data, enhancing feature representation and model generalization; (4) performing classification with sensitive regularization terms incorporated into the loss function of the ensemble model, with weights adjusted dynamically according to the rarity coefficients of the bird species, to guide the model toward a globally optimal solution.
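As an illustration of the weighting idea in step (4), the following is a minimal sketch of a cost-sensitive cross-entropy loss in plain Python. It is not the CSE-BSR loss itself: the abstract does not define the rarity coefficient or the regularization term, so the sketch assumes a hypothetical coefficient equal to the inverse class frequency, scaled so that the weights average to 1. The function names `rarity_weights` and `cost_sensitive_loss` are likewise illustrative.

```python
import math
from collections import Counter


def rarity_weights(labels, num_classes):
    """Hypothetical rarity coefficient (an assumption, not the paper's
    definition): inverse class frequency, scaled to average 1 over classes."""
    counts = Counter(labels)
    # Classes absent from the labels are treated as having one sample
    # to avoid division by zero.
    inverse = [1.0 / max(counts.get(c, 0), 1) for c in range(num_classes)]
    scale = num_classes / sum(inverse)
    return [w * scale for w in inverse]


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cost_sensitive_loss(batch_logits, batch_labels, weights):
    """Weighted cross-entropy: each sample's loss is scaled by the weight
    of its true class, then normalized by the sum of the applied weights."""
    weighted_sum = 0.0
    weight_total = 0.0
    for logits, label in zip(batch_logits, batch_labels):
        probs = softmax(logits)
        weighted_sum += -weights[label] * math.log(probs[label])
        weight_total += weights[label]
    return weighted_sum / weight_total


# Toy training set: species 0 is common (8 clips), species 1 is rare (2 clips).
train_labels = [0] * 8 + [1] * 2
weights = rarity_weights(train_labels, num_classes=2)

# One common and one rare sample; the model either misses the rare species
# or misses the common one with the same confidence.
labels = [0, 1]
loss_miss_rare = cost_sensitive_loss([[2.0, 0.0], [2.0, 0.0]], labels, weights)
loss_miss_common = cost_sensitive_loss([[0.0, 2.0], [0.0, 2.0]], labels, weights)
```

Under these assumptions, missing the rare species is penalized more heavily than missing the common one, which is the behavior a cost-sensitive term is meant to induce. In a deep learning framework the same effect is usually obtained by passing per-class weights to the built-in cross-entropy loss.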
Results: To verify the effectiveness of the proposed method, we constructed a proprietary dataset from samples of ten bird species in Laoshan Forest Park, Nanjing, and also ran experiments on the publicly available BirdCLEF 2023 dataset. The proposed method achieved overall classification accuracies of 95.29% on the imbalanced proprietary dataset and 90.17% on the BirdCLEF 2023 dataset, significantly outperforming mainstream ensemble learning methods. In particular, it exhibited higher sensitivity and generalization capability in recognizing rare bird species.
Conclusion: We address data imbalance and sample scarcity in bird sound recognition by proposing a cost-sensitive ensemble learning strategy. Transfer learning, self-attention mechanisms, and sensitive regularization terms together improve recognition accuracy and generalization for rare bird species. Compared with mainstream ensemble learning methods, the proposed approach demonstrates superior performance and scalability in practical applications. Despite this strong recognition performance, training and inference remain time-consuming and resource-intensive. Future work will focus on optimizing the model structure, reducing computational cost, and enhancing model interpretability to better serve the precise conservation of avian environments.