Research progress of birdsong recognition algorithms based on machine learning

doi:10.17520/biods.2023272

Biodiversity Science, 2023, Vol. 31, Issue 11: 23272. DOI: 10.17520/biods.2023272, CSTR: 32101.14.biods.2023272


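申小虎 (Xiaohu Shen)¹,²,*, 朱翔宇 (Xiangyu Zhu)¹, 史洪飞 (Hongfei Shi)², 王传之 (Chuanzhi Wang)³

1. Department of Forensic Science and Technology, Jiangsu Police Institute, Nanjing 210031
2. National Forestry and Grassland Administration, Key Laboratory of Wildlife Evidence Technology of the National Forestry and Grassland Administration, Nanjing 210023
3. iFLYTEK Co., Ltd., Hefei 230088

Received: 2023-07-31; Accepted: 2023-10-12; Online: 2023-11-20; Published: 2023-12-08
Corresponding author: * E-mail: shenxiaohu@jspi.cn
Funding:
Open Project of the Key Laboratory of Wildlife Evidence Technology, National Forestry and Grassland Administration (KLNPC2102); platform funding from the Jiangsu Engineering Research Center of Intelligent Connected Traffic Safety Facilities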

Research progress of birdsong recognition algorithms based on machine learning

Xiaohu Shen¹,²,*, Xiangyu Zhu¹, Hongfei Shi², Chuanzhi Wang³

1 Department of Forensic Science and Technology, Jiangsu Police Institute, Nanjing 210031
2 National Forestry and Grassland Administration, Key Laboratory of State Forest and Grassland Administration on Wildlife Evidence Technology, Nanjing 210023
3 iFLYTEK CO. LTD., Hefei 230088

Received: 2023-07-31; Accepted: 2023-10-12; Online: 2023-11-20; Published: 2023-12-08
Contact: * E-mail: shenxiaohu@jspi.cn

Abstract (Chinese version)

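Monitoring the status and trends of bird diversity in ecosystems is a major challenge that calls for broadly applicable machine learning-based birdsong recognition algorithms. To give an accurate picture of the current state and development trends of machine learning-based bird sound recognition, this paper introduces the basic concepts of the birdsong recognition task and surveys machine learning-based birdsong recognition algorithms from the perspective of model structure design. Given the interdisciplinary nature of machine learning-based birdsong recognition, the algorithms are grouped by research direction into probabilistic model, template matching, time series analysis, transfer learning, data fusion, ensemble learning, metric learning, and unsupervised clustering approaches. This paper reviews how these methods have developed technically for bird sound recognition, describes their characteristics and limitations, and compares their effectiveness in birdsong recognition. It also discusses commonly used standardized open-source bird sound datasets and evaluation metrics. Finally, it identifies the challenges facing current methods and potential future research directions in the field. The review aims to give researchers and developers working on bird sound recognition a comprehensive reference framework for understanding existing techniques and their likely development.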
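The abstract mentions evaluation metrics without naming them at this point. As an illustration only, the sketch below computes two metrics that are common for species classification, overall accuracy and macro-averaged F1. The choice of these two metrics, the function names, and the toy labels are assumptions for the example, not taken from the review.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of clips whose predicted species equals the reference label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-species F1, so rare species weigh as much as common ones."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # species p was predicted, but wrongly
            fn[t] += 1  # true species t was missed
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        precision = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        recall = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical labels for four clips of two species, A and B.
    truth = ["A", "A", "A", "B"]
    pred = ["A", "A", "B", "B"]
    print(accuracy(truth, pred))  # 0.75
    print(round(macro_f1(truth, pred), 4))  # 0.7333
```

Macro averaging is shown because it keeps rare species from being swamped by common ones when recordings are imbalanced; a plain accuracy score would hide that effect.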

Keywords: bird sound recognition, machine learning, deep learning, bird diversity, bird sound datasets, evaluation metrics

Abstract

Background & Aim: Birds, which occupy a high position in the ecological food chain, are important indicators of environmental quality and pollution. However, monitoring the status and trends of bird diversity in ecosystems remains a significant challenge, and establishing an all-weather bird diversity monitoring system requires broadly applicable machine learning-based birdsong recognition algorithms. To give a precise picture of the research status and development trends of these algorithms, we introduce the fundamental concepts of birdsong recognition and provide an overview of machine learning-based bird sound recognition algorithms from the perspective of model structure design.

Summary: Given the interdisciplinary nature of machine learning-based birdsong recognition, we classify the algorithms by research direction into the following categories: probabilistic model, template matching, time series analysis, transfer learning, data fusion, ensemble learning, metric learning, and unsupervised clustering. We review the technical development of each category for birdsong recognition tasks, analyze the characteristics and limitations of these algorithms, and compare their effectiveness in birdsong recognition. We also discuss commonly used standardized open-source birdsong datasets and evaluation metrics. Finally, we outline the challenges faced by existing methods and identify potential future research directions in this field.
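Metric learning, one of the categories listed above, trains an embedding in which calls of the same species lie close together and calls of different species lie far apart. Triplet loss is one common objective for this; the reference list includes BirdCLEF 2020 work using triplet loss (Clementino & Colonna, 2020) and a dynamic triplet loss for bioacoustic classification (Thakur et al., 2019). The minimal sketch below only evaluates the standard triplet margin loss on precomputed embeddings. The Euclidean distance, the margin of 0.2, the function names, and the toy embeddings are assumptions; no embedding network is trained here.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet margin loss max(0, d(a, p) - d(a, n) + margin).

    anchor and positive are embeddings of calls from the same species;
    negative is an embedding of a call from a different species.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def mean_triplet_loss(triplets, margin=0.2):
    """Average the loss over a batch of (anchor, positive, negative) tuples."""
    losses = [triplet_loss(a, p, n, margin) for a, p, n in triplets]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Hypothetical 3-D embeddings: two clips of species A, one clip of species B.
    a_clip1 = [0.0, 0.0, 1.0]
    a_clip2 = [0.0, 0.1, 1.0]
    b_clip = [1.0, 0.0, 0.0]
    # Already separated by more than the margin, so the loss is zero.
    print(triplet_loss(a_clip1, a_clip2, b_clip))
    # Roles swapped: the "positive" is now far away, so the loss is positive.
    print(triplet_loss(a_clip1, b_clip, a_clip2))
```

Only triplets that violate the margin contribute a gradient during training, which is why triplet-based methods often invest in choosing informative (hard) triplets.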

Perspectives: We aim to give scholars and developers working on birdsong recognition a comprehensive reference framework for better understanding existing technologies and potential development trends. The accuracy and robustness of machine learning-based birdsong recognition methods still need to be improved, especially for large-scale data samples, and several challenges must be resolved before these methods can be widely promoted and applied. Future investigations should focus on: (1) optimizing and improving models; (2) integrating multimodal data; (3) applying transfer learning; (4) expanding application scenarios; and (5) establishing and standardizing databases.

Key words: birdsong recognition, machine learning, deep learning, bird diversity, birdsong datasets, evaluation metrics

申小虎, 朱翔宇, 史洪飞, 王传之 (2023) 基于机器学习鸟声识别算法研究进展. 生物多样性, 31, 23272. DOI: 10.17520/biods.2023272.

Xiaohu Shen, Xiangyu Zhu, Hongfei Shi, Chuanzhi Wang (2023) Research progress of birdsong recognition algorithms based on machine learning. Biodiversity Science, 31, 23272. DOI: 10.17520/biods.2023272.


Link to this article: https://www.biodiversity-science.net/CN/10.17520/biods.2023272

https://www.biodiversity-science.net/CN/Y2023/V31/I11/23272


References

[1]	Acconcjaioco M, Ntalampiras S (2021) One-shot learning for acoustic identification of bird species in non-stationary environments. In: 2020 25th International Conference on Pattern Recognition (ICPR), pp. 755-762. Milan, Italy.
[2]	Acevedo MA, Corrada-Bravo CJ, Corrada-Bravo H, Villanueva-Rivera LJ, Aide TM (2009) Automated classification of bird and amphibian calls using machine learning: A comparison of methods. Ecological Informatics, 4, 206-214.
[3]	Adavanne S, Drossos K, Çakir E, Virtanen T (2017) Stacked convolutional and recurrent neural networks for bird audio detection. In: 2017 25th European Signal Processing Conference (EUSIPCO), pp. 1729-1733. Kos, Greece.
[4]	Anderson SE, Dave AS, Margoliash D (1996) Template-based automatic recognition of birdsong syllables from continuous recordings. Journal of the Acoustical Society of America, 100, 1209-1219.
[5]	Bai J, Chen C, Chen J (2020) Xception based system for bird sound detection. In: CLEF Working Notes 2020, CLEF: Conference and Labs of the Evaluation Forum. Thessaloniki, Greece.
[6]	Bai J, Wang B, Chen C, Fu Z, Chen J (2019) Inception-v3 based method of LifeCLEF 2019 bird recognition. In: CLEF Working Notes 2019, pp. 9-12. Lugano, Switzerland.
[7]	Bai J, Wu R, Wang M (2018) CIAIC-BAD system for DCASE 2018 challenge task3. In: Detection and Classification of Acoustic Scenes and Events 2018. Woking, Surrey, UK.
[8]	Bendale A, Boult TE (2016) Towards open set deep networks. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1563-1572. Las Vegas, NV, USA.
[9]	Bold N, Zhang C, Akashi T (2019) Cross-domain deep feature combination for bird species classification with audio-visual data. IEICE Transactions on Information and Systems, E102.D, 2033-2042.
[10]	Boulmaiz A, Messadeg D, Doghmane N, Taleb-Ahmed A (2017) Design and implementation of a robust acoustic recognition system for waterbird species using TMS320C6713 DSK. International Journal of Ambient Computing & Intelligence, 8, 98-118.
[11]	Briggs F, Raich R, Fern XZ (2009) Audio classification of bird species: A statistical manifold approach. In: 2009 Ninth IEEE International Conference on Data Mining, pp. 51-60. Miami Beach, Florida, USA.
[12]	Brock A, De S, Smith SL (2021) High-performance large-scale image recognition without normalization. In: 38th International Conference on Machine Learning (PMLR), 139, 1059-1071.
[13]	Carvalho S, Gomes EF (2023) Automatic classification of bird sounds: Using MFCC and Mel spectrogram features with deep learning. Vietnam Journal of Computer Science, 10, 39-54.
[14]	Chandu B, Munikoti A, Murthy KS, Murthy VG, Nagaraj C (2020) Automated bird species identification using audio signal processing and neural networks. In: 2020 International Conference on Artificial Intelligence and Signal Processing (AISP), pp. 1-5. Amaravati, India.
[15]	Chen SS, Li Y (2014) Applying random forest classifier combined with time-frequency texture features to bird sounds recognition. Computer Applications and Software, 31, 154-157, 161. (in Chinese with English abstract)
	[陈莎莎, 李应 (2014) 结合时-频纹理特征的随机森林分类器应用于鸟声识别. 计算机应用与软件, 31, 154-157, 161.]
[16]	Chou CH, Ko HY (2011) Automatic birdsong recognition with MFCC based syllable feature extraction. In: International Conference on Ubiquitous Intelligence and Computing, pp. 185-196. Banff, Canada.
[17]	Clementino T, Colonna JG (2020) Using triplet loss for bird species recognition on BirdCLEF 2020. In: Conference and Labs of the Evaluation Forum, pp. 22-25. Thessaloniki, Greece.
[18]	Conde MV, Shubham K, Agnihotri P, Movva ND, Bessenyei S (2021) Weakly-supervised classification and detection of bird sounds in the wild. A BirdCLEF 2021 solution. arXiv: 2107.04878. https://arxiv.org/abs/2107.04878.
[19]	Dai YS, Yang J, Dong YW, Zou HP, Hu MZ, Wang B (2021) Blind source separation-based IVA-Xception model for bird sound recognition in complex acoustic environments. Electronics Letters, 57, 454-456.
[20]	Das N, Mondal A, Chaki J, Padhy N, Dey N (2020) Machine learning models for bird species recognition based on vocalization: A succinct review. In: Information Technology and Intelligent Transportation Systems (ITITS), pp. 1-9. Xi'an, China.
[21]	De Oliveira AG, Ventura TM, Ganchev TD, Silva LNS, Marques MI, Schuchmann KL (2020) Speeding up training of automated bird recognizers by data reduction of audio features. PeerJ, 8, e8407.
[22]	Fritzler A, Koitka S, Friedrich CM (2017) Recognizing bird species in audio files using transfer learning. In: Conference and Labs of the Evaluation Forum, 1866-pp. 1882. Dublin, Ireland.
[23]	Ghani B, Hallerberg S (2021) A randomized bag-of-birds approach to study robustness of automated audio based bird species classification. Applied Sciences, 11, 9226-9242.
[24]	Gupta G, Kshirsagar M, Zhong M, Gholami S, Ferres JL (2021) Comparing recurrent convolutional neural networks for large scale bird species classification. Scientific Reports, 11, 17085.
[25]	Han PF, Chen X (2022) Bird sound recognition based on MFCC-IMFCC and GA-SVM. Computer Systems and Applications, 31(11), 393-399. (in Chinese with English abstract)
	[韩鹏飞, 陈晓 (2022) 基于MFCC-IMFCC和GA-SVM的鸟声识别. 计算机系统应用, 31(11), 393-399.]
[26]	Han X, Mu Y, Sheng GM (2023) Research on the CCPSO optimized SVM based bird sound recognition technology. Technical Acoustics, 42, 118-126. (in Chinese with English abstract)
	[韩雪, 慕昱, 盛桂敏 (2023) CCPSO优化支持向量机的鸟声识别技术研究. 声学技术, 42, 118-126.]
[27]	Hong TY, Zabidi MM (2021) Bird sound detection with convolutional neural networks using raw waveforms and spectrograms. In: International Symposium on Applied Science and Engineering, pp. 242-248. Erzurum, Turkey.
[28]	Incze Á, Jancsó HB, Szilágyi Z, Farkas A, Sulyok C (2018) Bird sound recognition using a convolutional neural network. In: 2018 IEEE 16th International Symposium on Intelligent Systems and Informatics (SISY), pp. 295-300. Subotica, Serbia.
[29]	Jancovic P, Kokuer (2011) Automatic Detection and Recognition of Tonal Bird Sounds in Noisy Environments. Eurasip Journal on Advances in Signal Processing, 2011, 982936.
[30]	Joly A, Champ J, Buisson O (2014) Instance-based bird species identification with undiscriminant features pruning. In: Cross-Language Evaluation Forum, pp. 625-633. Sheffield, UK.
[31]	Joly A, Leveau V, Champ J, Buisson O (2015) Shared nearest neighbors match kernel for bird songs identification. In: Cross-Language Evaluation Forum 2015, hal-01182784. Toulouse, France.
[32]	Jung T, Jeon H, Jeon C, Cook A, Weiss A, Lee M, Smith AH (2019) Deep learning-based bird sound recognition system with data pre-processing. In: Academic Conference of Korea Electronics Engineering Association, pp. 756-759. Jeju Island, Korea.
[33]	Kaewtip K, Alwan A, Reilly C, Taylor CE (2016) A robust automatic birdsong phrase classification: A template-based approach. Journal of the Acoustical Society of America, 140, 3691-3701.
[34]	Kaewtip K, Tan LN, Taylor CE, Alwan A (2015) Bird-phrase segmentation and verification: A noise-robust template-based approach. In: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 758-762. South Brisbane, QLD, Australia.
[35]	Kahl S, Stöter FR, Goëau H, Glotin H, Planqué B, Vellinga WP, Joly A (2019) Overview of BirdCLEF 2019: Large-scale bird recognition in soundscapes. In: Cross-Language Evaluation Forum 2019. Lugano, Switzerland.
[36]	Kahl S, Wilhelm-Stein T, Hussein H (2017) Large-scale bird sound classification using convolutional neural networks. In: Cross-Language Evaluation Forum. Amsterdam, Netherlands.
[37]	Kahl S, Wood CM, Eibl M, Klinck H (2021) BirdNET: A deep learning solution for avian diversity monitoring. Ecological Informatics, 61, 101236.
[38]	Koh CY, Chang JY, Tai CL, Huang DY, Hsieh HH (2019) Bird sound classification using convolutional neural networks. In: Cross-Language Evaluation Forum. Lugano, Switzerland.
[39]	Kong Q, Iqbal T, Yong X (2018) DCASE 2018 challenge surrey cross-task convolutional neural network baseline. In: Detection and Classification of Acoustic Scenes and Events, pp. 217-221. Woking, Surrey, UK.
[40]	Lakshminarayanan B, Raich R, Fern X (2010) A syllable-level probabilistic framework for bird species identification. In: 2009 International Conference on Machine Learning and Applications, pp. 53-59. Miami, Florida, USA.
[41]	Lasseck M (2015) Improved automatic bird identification through decision tree based feature selection and bagging. In: Cross-Language Evaluation Forum 2015. Toulouse, France.
[42]	Lasseck M (2018) Acoustic bird detection with deep convolutional neural networks. In: Detection and Classification of Acoustic Scenes and Events 2018. Woking, Surrey, UK.
[43]	Lasseck M (2019) Bird species identification in soundscapes. In: Conference and Labs of the Evaluation Forum 2019. Lugano, Switzerland.
[44]	LeBien J, Zhong M, Campos-Cerqueira M, Velev JP, Dodhia R, Ferres JL, Aide TM (2020) A pipeline for identification of bird and frog species in tropical soundscape recordings using a convolutional neural network. Ecological Informatics, 59, 101113.
[45]	Li DP, Zhou XY, Ye R, Xia Y, Xu HN (2022) Bird sound recognition algorithm based on feature selection and GWO-KELM. Technical Acoustics, 41, 782-788. (in Chinese with English abstract)
	[李大鹏, 周晓彦, 叶如, 夏煜, 徐华南 (2022) 基于特征选择和GWO-KELM的鸟声识别算法. 声学技术, 41, 782-788.]
[46]	Li Y, Zhao M, Xu MY, Liu YF, Qian YC (2019) A survey of research on multi-source information fusion technology. Intelligent Computer and Applications, 9(5), 186-189. (in Chinese with English abstract)
	[李洋, 赵鸣, 徐梦瑶, 刘云飞, 钱雨辰 (2019) 多源信息融合技术研究综述. 智能计算机与应用, 9(5), 186-189.]
[47]	Liu HT, Jiang HY, Shu X, Xu Y, Wu YL, Guo XQ (2017) Recognition of multiple bird species in audio recordings based on feature transfer. Journal of Data Acquisition and Processing, 32, 1239-1247. (in Chinese with English abstract)
	[刘昊天, 姜海燕, 舒欣, 徐彦, 伍艳莲, 郭小清 (2017) 基于特征迁移的多物种鸟声识别方法. 数据采集与处理, 32, 1239-1247.]
[48]	Liu Z, Mao H, Wu CY (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition, arXiv: 2201.03545. https://arxiv.org/abs/2201.03545.
[49]	Liu Z, Zhang YC, Hu HL (2017) Bird sound classification simulation in noisy environment based on random forests and large scale acoustic features. System Simulation Technology, 13, 359-362. (in Chinese with English abstract)
	[刘钊, 张宇琛, 胡海龙 (2017) 随机森林和大规模声学特征的噪声环境鸟声识别仿真. 系统仿真技术, 13, 359-362.]
[50]	Liu ZH, Chen WJ, Chen AB (2022) Homologous spectrogram feature fusion with self-attention mechanism for bird sound classification. Journal of Computer Applications, 42, 1260-1268. (in Chinese with English abstract)
	[刘志华, 陈文洁, 陈爱斌 (2022) 基于自注意力机制时频谱同源特征融合的鸟鸣声分类. 计算机应用, 42, 1260-1268.]
[51]	Lostanlen V, Salamon J, Farnsworth A, Kelling S, Bello JP (2018) Birdvox-full-night: A dataset and benchmark for avian flight call detection. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 266-270. Calgary, Canada.
[52]	Marin-Cudraz T, Muffat-Joly B, Novoa C, Aubry P, Desmet JF, Mahamoud-Issa M, Nicolè F, Van Niekerk MH, Mathevon N, Sèbe F (2019) Acoustic monitoring of rock ptarmigan: A multi-year comparison with point-count protocol. Ecological Indicators, 101, 710-719.
[53]	Mehyadin AE, Abdulazeez AM, Hasan DA, Saeed JN (2021) Birds sound classification based on machine learning algorithms. Asian Journal of Research in Computer Science, 9(4), 1-11.
[54]	Mohanty R, Mallik BK, Solanki SS (2020) Automatic bird species recognition system using neural network based on spike. Applied Acoustics, 161, 107177.
[55]	Morgan MM, Braasch J (2022) Open set classification strategies for long-term environmental field recordings for bird species recognition. The Journal of the Acoustical Society of America, 151, 4028-4038.
[56]	Muhling M, Franz J, Korfhage N, Freisleben B (2020) Bird species recognition via neural architecture search. In: Conference and Labs of the Evaluation Forum. Thessaloniki, Greece.
[57]	Murugaiya R, Abas PE, De Silva LC (2022) Probability enhanced entropy (PEE) novel feature for improved bird sound classification. Machine Intelligence Research, 19, 52-62.
[58]	Nanni L, Maguolo G, Brahnam S, Paci M (2021) An ensemble of convolutional neural networks for audio classification. Applied Sciences, 11, 5796.
[59]	Narasimhan R, Fern XZ, Raich R (2017) Simultaneous segmentation and classification of bird song using CNN. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 146-150. New Orleans, LA, USA.
[60]	Ntalampiras S (2018) Bird species identification via transfer learning from music genres. Ecological Informatics, 44, 76-81.
[61]	Ntalampiras S, Potamitis I (2021) Acoustic detection of unknown bird species and individuals. CAAI Transactions on Intelligence Technology, 6, 291-300.
[62]	Nugroho H, Widodo W, Rachman A (2019) Pattern recognition bird sounds based on their type using discreate cosine transform (DCT) and Gaussian methods. Kinetik Game Technology Information System Computer Network Computing Electronics and Control, 4, 233-240.
[63]	Ou Y, Zhou XY, Li DP (2022) Experimental design of birdsong recognition based on CNN. Research and Exploration in Laboratory, 41(4), 99-102, 112. (in Chinese with English abstract)
	[欧昀, 周晓彦, 李大鹏 (2022) 基于卷积神经网络的鸟声识别实验设计. 实验室研究与探索, 41(4), 99-102, 112.]
[64]	Pankajakshan A, Thakur A, Thapar D (2018) All-conv net for bird activity detection: Significance of learned pooling. In: Interspeech 2018, pp. 2122-2126. Hyderabad, India.
[65]	Permana SDH, Saputra G, Arifitama B, Caesarendra W, Rahim R (2022) Classification of bird sounds as an early warning method of forest fires using convolutional neural network (CNN) algorithm. Journal of King Saud University - Computer and Information Sciences, 34, 4345-4357.
[66]	Petrusková T, Pišvejcová I, Kinštová A, Brinke T, Petrusek A (2016) Repertoire-based individual acoustic monitoring of a migratory passerine bird with complex song as an efficient tool for tracking territorial dynamics and annual return rates. Methods in Ecology and Evolution, 7, 274-284.
[67]	Phaye S, Benetos E, Wang Y (2019) SubSpectralNet—Using sub-spectrogram based convolutional neural networks for acoustic scene classification. In: 2019 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 825-829. Brighton, UK.
[68]	Priyadarshani N, Marsland S, Castro I (2018) Automated birdsong recognition in complex acoustic environments: A review. Journal of Avian Biology, 49, e01447.
[69]	Ptacek L, Machlica L, Linhart P, Jaska P, Muller L (2016) Automatic recognition of bird individuals on an open set using as-is recordings. Bioacoustics, 25, 55-73.
[70]	Qiao Y, Qian K, Zhao ZP (2020) A survey on Chinese literature for bird sound recognition based on machine listening. Journal of Fudan University (Natural Science), 59, 375-380. (in Chinese with English abstract)
	[乔玉, 钱昆, 赵子平 (2020) 基于机器听觉的鸟声识别的中文研究综述. 复旦学报(自然科学版), 59, 375-380.]
[71]	Qiao Y, Qian K, Zhao ZP (2020) Learning higher representations from bioacoustics: A sequence-to-sequence deep learning approach for bird sound classification. In: International Conference on Neural Information Processing, pp. 130-138. Bangkok, Thailand.
[72]	Qiu ZB, Lu ZW, Wang HX, Kuang YJ (2022) Recognition of bird sounds related to power grid faults based on Mel spectrogram and convolutional neural network. Journal of South China University of Technology (Natural Science Edition), 50, 129-136. (in Chinese with English abstract)
	[邱志斌, 卢祖文, 王海祥, 况燕军 (2022) 基于Mel频谱图和CNN的电网涉鸟故障鸟声识别. 华南理工大学学报(自然科学版), 50, 129-136.]
[73]	Rauch L, Schwinger R, Wirth M, Sick B, Tomforde S, Scholz C (2023) Active Bird2Vec: Towards end-to-end bird sound monitoring with transformers. arXiv: 2308.07121. https://arxiv.org/abs/2308.07121.
[74]	Salamon J, Bello JP, Farnsworth A, Kelling S (2017) Fusing shallow and deep learning for bioacoustic bird species classification. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 141-145. New Orleans, LA, USA.
[75]	Salamon J, Bello JP, Farnsworth A, Robbins M, Keen S, Klinck H, Kelling S (2016) Towards the automatic classification of avian flight calls for bioacoustic monitoring. PLoS ONE, 11, e0166866.
[76]	Sprengel E, Jaggi M, Kilcher Y, Hofmann T (2016) Audio based bird species identification using deep learning techniques. In: Conference and Labs of the Evaluation Forum (CLEF), pp. 547-559. Évora, Portugal.
[77]	Stastny J, Munk M, Juranek L (2018) Automatic bird species recognition based on birds vocalization. EURASIP Journal on Audio, Speech, and Music Processing, 19, 1-7.
[78]	Stowell D, Plumbley MD (2014) Automatic large-scale classification of bird sounds is strongly improved by unsupervised feature learning. PeerJ, 2, e488.
[79]	Sun B, Wan PW, Tao D, Zhao YX (2015) Identification of birds based on adaptive optimal kernel time-frequency distribution. Journal of Data Acquisition and Processing, 30, 1187-1195. (in Chinese with English abstract)
	[孙斌, 万鹏威, 陶达, 赵玉晓 (2015) 基于自适应最优核时频分布的鸟类识别. 数据采集与处理, 30, 1187-1195.]
[80]	Tang Q, Xu LM, Zheng BC, He CL (2023) Transound: Hyper-head attention transformer for birds sound recognition. Ecological Informatics, 75, 102001.
[81]	Thakur A, Thapar D, Rajan P, Nigam A (2019) Deep metric learning for bioacoustic classification: Overcoming training data scarcity using dynamic triplet loss. The Journal of the Acoustical Society of America, 146, 534-547.
[82]	Vidaña-Vila E, Navarro J, Alsina-Pagès RM, Ramírez Á (2020) A two-stage approach to automatically detect and classify woodpecker (Fam. Picidae) sounds. Applied Acoustics, 166, 107312.
[83]	Vilches E, Escobar IA, Vallejo EE, Taylor CE (2006) Data mining applied to acoustic bird species recognition. In: 18th International Conference on Pattern Recognition (ICPR'06), pp. 400-403. Hong Kong, China.
[84]	Voelker AR, Kajic I, Eliasmith C (2019) Legendre memory units: Continuous-time representation in recurrent neural networks. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems (2019 NIPS), pp. 15570-15579. Vancouver, Canada.
[85]	Wang EZ, He DJ (2014) Bird recognition based on MFCC and dual-GMM. Computer Engineering and Design, 35, 1868-1871, 1881. (in Chinese with English abstract)
	[王恩泽, 何东健 (2014) 基于MFCC和双重GMM的鸟类识别方法. 计算机工程与设计, 35, 1868-1871, 1881.]
[86]	Wang JH, Zhou XY, Han ZC, Wang LL (2023) Small sample optimized bird sound recognition network based on bridging transformer. Journal of Applied Acoustics (published online 2023-08-16). (in Chinese with English abstract)
	[王基豪, 周晓彦, 韩智超, 王丽丽 (2023) 基于桥接Transformer的小样本优化鸟声识别网络. 应用声学, (网络首发时间: 2023-08-16)] http://kns.cnki.net/kcms/detail/11.2121.O4.20230815.1700.004.html.
[87]	Wei JM, Li Y (2015) Rapid bird sound recognition using anti-noise texture features. Acta Electronica Sinica, 43, 185-190. (in Chinese with English abstract)
	[魏静明, 李应 (2015) 利用抗噪纹理特征的快速鸟鸣声识别. 电子学报, 43, 185-190.]
[88]	Wu KY, Ruan WD, Zhou DF, Chen QC, Zhang CY, Pan XY, Yu S, Liu Y, Xiao RB (2023) Syllable clustering analysis-based passive acoustic monitoring technology and its application in bird monitoring. Biodiversity Science, 31, 22370. (in Chinese with English abstract)
	[吴科毅, 阮文达, 周棣锋, 陈庆春, 张承云, 潘新园, 余上, 刘阳, 肖荣波 (2023) 基于音节聚类分析的被动声学监测技术及其在鸟类监测中的应用. 生物多样性, 31, 22370.]
[89]	Xie J, Hu K, Zhu MY, Yu JH, Zhu QB (2019) Investigation of different CNN-based models for improved bird sound classification. IEEE Access, 7, 175353-175361.
[90]	Xie JJ, Li WB, Zhang JG, Ding CQ (2018) Bird species recognition method based on Chirplet spectrogram feature and deep learning. Journal of Beijing Forestry University, 40(3), 122-127. (in Chinese with English abstract)
	[谢将剑, 李文彬, 张军国, 丁长青 (2018) 基于Chirplet语图特征和深度学习的鸟类物种识别方法. 北京林业大学学报, 40(3), 122-127.]
[91]	Xie JJ, Yang J, Xing ZL, Zhang Z, Chen X (2020) Bird species recognition method based on multi-feature fusion. Journal of Applied Acoustics, 39, 199-206. (in Chinese with English abstract)
	[谢将剑, 杨俊, 邢照亮, 张卓, 陈新 (2020) 多特征融合的鸟类物种识别方法. 应用声学, 39, 199-206.]
[92]	Xie JJ, Zhong YJ, Zhang JG, Liu S, Ding CQ, Triantafyllopoulos A (2023) A review of automatic recognition technology for bird vocalizations in the deep learning era. Ecological Informatics, 73, 101927.
[93]	Xie SN, Girshick R, Dollár P, Tu ZW, He KM (2017) Aggregated residual transformations for deep neural networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5987-5995. Honolulu, HI, USA.
[94]	Xie SS, Xu HF, Liu J, Zhang Y, Lv DJ (2021) Research on bird songs recognition based on MFCC-HMM. In: 2021 International Conference on Computer, Control and Robotics (ICCCR), pp. 262-266. Shanghai, China.
[95]	Xie ZF, Li DZ, Sun HX, Zhang AM (2023) Deep learning techniques for bird chirp recognition task. Biodiversity Science, 31, 22308. (in Chinese with English abstract)
	[谢卓钒, 李鼎昭, 孙海信, 张安民 (2023) 面向鸟鸣声识别任务的深度学习技术. 生物多样性, 31, 22308.]
[96]	Xing ZL, Wu WY, Zhang ZX, Chen QL, Ni DM (2021) Bird song recognition method based on C-LSTM. Technology Innovation and Application, 11(15), 15-18. (in Chinese with English abstract)
	[邢照亮, 吴伟银, 张正晓, 陈麒麟, 倪东明 (2021) 基于C-LSTM的鸟鸣声识别方法. 科技创新与应用, 11(15), 15-18.]
[97]	Xu JW, Yang Y (2018) A survey of ensemble learning approaches. Journal of Yunnan University (Natural Sciences Edition), 40, 1082-1092. (in Chinese with English abstract)
	[徐继伟, 杨云 (2018) 集成学习方法: 研究综述. 云南大学学报(自然科学版), 40, 1082-1092.]
[98]	Xu SZ, Sun YN, Huangfu LY, Fang WQ (2018) Design of synthesized bird sounds classifier based on multi feature extraction classifiers and time-frequency chat. Research and Exploration in Laboratory, 37(9), 81-86, 91. (in Chinese with English abstract)
	[徐淑正, 孙忆南, 皇甫丽英, 方玮骐 (2018) 基于MFCC和时频图等多种特征的综合鸟声识别分类器设计. 实验室研究与探索, 37(9), 81-86, 91.]
[99]	Yan X, Li Y (2013) Anti-noise power normalized cepstral coefficients in bird sounds recognition. Acta Electronica Sinica, 41, 295-300. (in Chinese with English abstract)
	[颜鑫, 李应 (2013) 利用抗噪幂归一化倒谱系数的鸟类声音识别. 电子学报, 41, 295-300.]
[100]	Yang CY, Qi HD, Peng YQ, Yin B, Hou J, Shu ZY, Chen SP (2020) Research on the application of energy spectrum with voiceprint information in bird recognition. Journal of Applied Acoustics, 39, 453-463. (in Chinese with English abstract)
	[杨春勇, 祁宏达, 彭焱秋, 尹滨, 侯金, 舒振宇, 陈少平 (2020) 融合声纹信息的能量谱图在鸟类识别中的研究. 应用声学, 39, 453-463.]
[101]	Yin CC, Xu F, Zhang C (2022) Bird song recognition based on ERB loudness feature and deep learning. Network New Media Technology, 11(2), 25-32. (in Chinese with English abstract)
	[尹晨畅, 许枫, 张纯 (2022) 基于ERB响度特征的深度学习鸟鸣声识别. 网络新媒体技术, 11(2), 25-32.]
[102]	Zabidi MM, Wong KL, Sheikh UU, Abdul Manan SS, Hamzah MAN (2022) Bird sound detection with binarized neural networks. Journal of Electrical Technology, 21, 48-53.
[103]	Zhang C, Jiang WS, Zhao Q (2021) Semantic segmentation of aerial imagery via split-attention networks with disentangled nonlocal and edge supervision. Remote Sensing, 13, 1176.
[104]	Zhang CY, Chen YH, Hao ZZ, Gao XH (2022) An efficient time-domain end-to-end single-channel bird sound separation network. Animals, 12, 3117.
[105]	Zhang FY, Zhang LY, Chen HX, Xie JJ (2021) Bird species identification using spectrogram based on multi-channel fusion of DCNNs. Entropy, 23, 1507-1518.
[106]	Zhang SH, Zhao Z, Xu ZY, Zhang Y (2017) Automatic bird vocalization identification based on Mel-subband parameterized feature. Journal of Computer Applications, 37, 1111-1115. (in Chinese with English abstract)
	[张赛花, 赵兆, 许志勇, 张怡 (2017) 基于Mel子带参数化特征的自动鸟鸣识别. 计算机应用, 37, 1111-1115.]

Literature	Category	Input	Base model	Advantage	Disadvantage	Specific issue
Yan & Li, 2013	Probabilistic model	Anti-noise power normalized cepstral coefficients (APNCC)	SVM	Two-stage denoising yields a better noise-robust representation	Filters out part of the foreground information, so the recognition rate drops under clean (noise-free) conditions	Non-stationary environmental noise
Joly et al, 2014	Probabilistic model	Mel-frequency cepstral coefficients (MFCC)	KNN	A semantic filtering scheme removes irrelevant information	Weakly supervised learning easily introduces label noise	-
Lasseck, 2015	Probabilistic model	Low-level descriptors (LLDs)	DT (decision tree)	Feature regions of interest shorten template-matching time and improve generality	The feature selection process is relatively complex	-
Yang et al, 2020	Probabilistic model	Local binary pattern (LBP), histogram of oriented gradients (HOG)	KNN	Edge features of the call energy spectrogram fit bird sound information well	The high dimensionality of HOG features hinders large-scale computation	-
Han & Chen, 2022	Probabilistic model	Mel-frequency cepstral coefficients (MFCC), inverted Mel-frequency cepstral coefficients (IMFCC)	GA-SVM	IMFCC represents the sparse high-frequency information	Feature weights are subject to randomness	-
Kaewtip et al, 2015	Template matching	Spectrogram	SVM	DTW and high-energy time-frequency regions yield noise-robust templates	When noise energy masks the high-energy regions, performance degrades	Limited training data
Sun et al, 2015	Template matching	Adaptive optimal kernel (AOK) time-frequency distribution	-	The small data volume of time-frequency template features reduces matching computation	Extracting features with the gray-level co-occurrence method is computationally expensive, giving high algorithmic complexity	-
Gupta et al, 2021	Time series analysis	Spectrogram	CNN-LSTM, CNN-GRU, CNN-LMU	The LMU uses an orthogonalized memory mechanism that strengthens long-term dependency modeling while reducing model parameters	-	Large-scale bird species prediction
Qiao et al, 2020	Time series analysis	Spectrogram	BiRNN-BiRNN	An unsupervised sequence-to-sequence model learns higher-level representations and captures contextual information effectively	Lacks a mechanism for handling missing values	-
Carvalho & Gomes, 2023	Time series analysis	MFCC, Mel spectrogram	LSTM, GRU, CRNN, etc.	CNN-GRU lowers computational complexity and converges more easily	Still does not fully solve the vanishing-gradient problem	-
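In the template-matching rows above, Kaewtip et al. (2015) build noise-robust templates from dynamic time warping (DTW) combined with high-energy time-frequency regions. The sketch below shows only the generic DTW part: classifying a query sequence of feature frames by its nearest template. It omits the high-energy region selection, and the template names, 1-D toy frames, and Euclidean frame distance are hypothetical choices for illustration.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two sequences of feature frames.

    Each frame is a list of floats (e.g. one spectrogram column or one MFCC
    vector). Uses Euclidean frame distance and the classic three-way recursion
    (match, insertion, deletion).
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify_by_template(query, templates):
    """Return the label whose template has the smallest DTW distance to the query."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))


if __name__ == "__main__":
    # Hypothetical phrase templates with one feature value per frame.
    templates = {
        "phrase_A": [[0.0], [1.0], [2.0]],
        "phrase_B": [[2.0], [1.0], [0.0]],
    }
    # A time-stretched rendition of phrase_A (its first frame lasts twice as long).
    query = [[0.0], [0.0], [1.0], [2.0]]
    print(classify_by_template(query, templates))  # phrase_A
```

Because DTW aligns frames before comparing them, a phrase sung slightly faster or slower still matches its template with zero or low cost, which is what makes it attractive when training data are limited.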
Literature	Category	Input	Base model	Advantage	Disadvantage	Specific issue
Lasseck, 2019	Transfer learning	Spectrogram	Inception-ResNet	Diverse data augmentation techniques improve accuracy and generality	Long training time	Multi-label classification
Ntalampiras, 2018	Transfer learning	MFCC	HMM	Knowledge for bird sound classification is obtained from the probability density distributions learned for music genre classification	Non-stationary noise degrades knowledge transfer	Transfer learning within a traditional (non-deep) machine learning framework
LeBien et al, 2020	Transfer learning	Spectrogram	ResNet50	Training on false-positive detections integrates relevant missing information for each class	-	Cross-species knowledge transfer
Xie et al, 2020	Data fusion	Short-time Fourier transform (STFT), Mel-frequency cepstral transform (MFCT), Chirplet transform (CT)	VGG	Feature weighting keeps the feature dimensionality from growing during fusion	The model structure is not adapted to the different spectrogram types	-
Xie et al, 2019	Data fusion	Mel spectrogram, harmonic-component spectrogram, percussive-component spectrogram	VGG	The three spectrograms represent different components of birdsong, and training on them separately avoids interference between feature components	Low training efficiency	-
Salamon et al, 2017	Data fusion	Spectrogram	SKM, CNN	Fully exploits the complementary predictions of models built on different features	Decisions are prone to bias	-
Bold et al, 2019	Data fusion	Bird images, spectrograms	CaffeNet	Late-stage fusion in a dual-stream multimodal CNN makes bird images an effective complement to bird sound recognition	The high dimensionality of the fused features can hurt real-time performance and overall system performance	-
Xie et al, 2023	Data fusion	MFCC fusion feature map	DenseNet121	Low model space complexity	Training consumes excessive memory, which makes it unsuitable for large-scale training	-
Conde et al, 2021		集成学习 Ensemble learning		时频图 Spectrogram		ResNeSt-50 EfficientNet DenseNet 121		使用多标签来提升鸟类种类的预测概率 Using multi- lables to improve the prediction probability of bird species			模型堆叠泛用性不高 Low universality of model stacking				弱监督鸟声分类问题 Weak supervised birdsong classification problem
文献方法 Literature	所属类别 Category		输入 Input		基础网络 Basic network				优点 Advantage			缺点 Disadvantage	特定问题 Specific issue
Morgan & Braasch, 2022	度量学习 Metric learning		时频图 Spectrogram		VGG16				分层网络在无标记数据条件下实现显著的性能提升 Layered networks achieve significant performance improvement under unlabeled data conditions			过多依赖数据假设 Excessive reliance on data assumptions	开放数据集 Open dataset
Acconcjai-oco & Ntalampir-as, 2021	度量学习 Metric learning		时频图 Spectrogram		SNN				同时对未知鸟类与已知鸟类之间的相似性和差异性进行度量 Simultaneously measuring the similarity and difference between unknown and known birds			训练中对未标记数据的验证增加了算法复杂性 The validation of unlabeled data during training increases algorithm complexity	开放数据集Open dataset
吴科毅等, 2023	无监督聚类 Unsupervised clustering		时频图 Spectrogram		VAE				过零率与能量的辅助判定, 可避免特征提取过程中产生的漏检 Assisted determination of zero crossing rate and energy to avoid missed detections during feature extraction process			需要推断聚类数量 Need to infer the number of clusters	多物种鸟鸣混叠音节Mixed syllables of bird songs from multiple species
Kahl et al, 2021	传统深度学习 radition -al deep learning		时频图 Spectrogram		ResNet-157				多标签分类与混合训练提高了识别任务的整体性能 Multi label classification and mixed training improve the overall performance of recognition tasks			对训练和推理计算能力要求较高 High requirements for training and reasoning and computing abilities	?
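The template-matching rows above (e.g. Kaewtip et al, 2015) rely on dynamic time warping (DTW) to compare a query call with stored templates despite differences in tempo. The following Python sketch shows plain DTW with nearest-template classification. It is not the cited authors' method: the Euclidean frame distance, the absence of a warping-band constraint, the toy one-dimensional frames and the helper names `dtw_distance` and `classify_by_template` are illustrative assumptions. In practice each frame would be a spectrogram column or an MFCC vector.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]  # one feature vector per time frame (assumed representation)


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(query: List[Frame], template: List[Frame]) -> float:
    """Accumulated cost of the best monotonic alignment of two frame sequences.

    cost[i][j] is the minimal cost of aligning query[:i] with template[:j];
    no warping-band constraint is applied, so runtime is O(len(query) * len(template)).
    """
    n, m = len(query), len(template)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(query[i - 1], template[j - 1])
            # Extend the cheapest predecessor: vertical, horizontal or diagonal step.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify_by_template(query: List[Frame], templates: Dict[str, List[List[Frame]]]) -> str:
    """Return the label of the template with the lowest DTW cost to the query."""
    best_label, best_cost = "", math.inf
    for label, examples in templates.items():
        for template in examples:
            c = dtw_distance(query, template)
            if c < best_cost:
                best_label, best_cost = label, c
    return best_label


if __name__ == "__main__":
    templates = {
        "species_a": [[[0.0], [1.0], [2.0]]],  # rising toy contour (hypothetical)
        "species_b": [[[2.0], [1.0], [0.0]]],  # falling toy contour (hypothetical)
    }
    query = [[0.0], [0.0], [1.0], [2.0]]  # time-stretched rising contour
    print(classify_by_template(query, templates))  # species_a
```

Because DTW may repeat a template frame, the stretched query aligns with the rising template at zero cost, which is the tempo invariance that makes DTW attractive for template matching of variable-length calls.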


方法文献 Literature	数据增强 Augmentation	评价标准 Evaluation criteria	实验结果 Test result (%)	鸟类种数 Number of bird species	测试数据集 Test dataset
韩雪等, 2023	否 No	识别平均精度 c-mAP	95.31	11	Macaulay Library
颜鑫和李应, 2013	否 No	识别平均精度 c-mAP	94.12	34	Freesound
Joly et al, 2014	否 No	识别平均精度 c-mAP	36.5	501	Xeno-canto
Zabidi et al, 2022	否 No	识别平均精度 c-mAP	94.08	10	Xeno-canto
吴科毅等, 2023	否 No	识别平均精度 c-mAP	89.6	10	白云山数据集 Baiyunshan dataset
Kahl et al, 2021	是 Yes	识别平均精度 c-mAP	79.1	84	Xeno-canto
Ntalampiras, 2018	是 Yes	识别平均精度 c-mAP	92.5	10	Xeno-canto
Lasseck, 2019	是 Yes	识别平均精度 c-mAP	35.6	659	Xeno-canto
Carvalho & Gomes, 2023	否 No	识别平均精度 c-mAP	44.3	91	自建库 Self-building database
LeBien et al, 2020	否 No	识别平均精度 c-mAP	89.3	24	El Yunque National Forest
Xie et al, 2023	否 No	识别平均精度 c-mAP	96.9	10	Xeno-canto
孙斌等, 2015	否 No	识别平均精度 c-mAP	96.0	40	自建库 Self-building database
Salamon et al, 2017	是 Yes	识别平均精度 c-mAP	96.0	43	CLO-43DS
Xie et al, 2019	否 No	识别平均精度 c-mAP	86.3	43	CLO-43DS
谢将剑等, 2020	否 No	识别平均精度 c-mAP	89.4	35	ICML4B
Morgan & Braasch, 2022	否 No	准确率 Accuracy	92.4	12	自建库 Self-building database
Acconcjaioco & Ntalampiras, 2021	否 No	准确率 Accuracy	97.4	6	Xeno-canto
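Most rows above report c-mAP (class-wise mean average precision). The Python sketch below assumes the common non-interpolated definition: for each species, rank all test clips by predicted score, average the precision at each rank where a true positive occurs, then average over the species that have at least one positive clip. The cited papers and challenge scorers may differ in detail (tie handling, padding, treatment of absent species), so the function names and these choices are assumptions rather than any official metric implementation.

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP for one species.

    Clips are ranked by descending score (ties keep input order because
    Python's sort is stable, also with reverse=True); precision is averaged
    over the ranks at which true positives occur.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def class_wise_map(score_matrix: List[List[float]], label_matrix: List[List[int]]) -> float:
    """c-mAP: AP per species (column), averaged over species with >= 1 positive clip."""
    aps = []
    for c in range(len(score_matrix[0])):
        scores = [row[c] for row in score_matrix]
        labels = [row[c] for row in label_matrix]
        if any(labels):  # species absent from the test labels are skipped (assumption)
            aps.append(average_precision(scores, labels))
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    scores = [[0.9, 0.2], [0.6, 0.7], [0.1, 0.4]]  # 3 clips x 2 species (toy data)
    labels = [[1, 0], [0, 1], [1, 0]]
    print(round(class_wise_map(scores, labels), 4))  # 0.9167
```

In the toy data the first species has AP = (1/1 + 2/3)/2 ≈ 0.833 and the second has AP = 1, giving c-mAP ≈ 0.917. Because each species is averaged with equal weight, a few poorly ranked rare species can lower c-mAP sharply, which partly explains the low scores for the 500+ species Xeno-canto evaluations in the table.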


挑战赛 Challenge	排名 Rank	采用模型 Network adopted	评估得分 Scores	物种数 No. of species	相关文献 Related literature	多标签 Multi-label	备注 Comments
BirdCLEF2023	1	NFNet/ConvNeXt/ConvNeXtV2	c-mAP: 0.7639	264	-	是 Yes	集成学习 Ensemble learning
	2	EfficientNetV2/ResNet-34/EfficientNet-B0/EfficientNet-B3	c-mAP: 0.7637		-	是 Yes	集成学习 Ensemble learning
	3	EfficientNet-B0/EfficientNetV2	c-mAP: 0.7631		-		集成学习 Ensemble learning
BirdCLEF2022	1	EfficientNet-B3/NFNet	macro F1: 0.8527	113	-	是 Yes	集成学习 Ensemble learning
	2	ResNeXt50/EfficientNet-B0/EfficientNetV2/NFNet	macro F1: 0.8438		-	是 Yes	集成学习 Ensemble learning
BirdCLEF2021	1	ResNeSt	micro-averaged F1: 0.6932	397	-	是 Yes	-
	2	ResNet-34/EfficientNetV2	micro-averaged F1: 0.6893		-		集成学习 Ensemble learning
	3	ResNet-50/EfficientNet-B2~B7	micro-averaged F1: 0.6891		-		集成学习 Ensemble learning
	10	ResNeSt-50/EfficientNet/DenseNet-121	micro-averaged F1: 0.6738		Conde et al, 2021		集成学习 Ensemble learning
BirdCLEF2020	1	由NAS定义 Built by NAS	c-mAP: 0.128	960	Voelker et al, 2019	是 Yes	-
	2	Xception	c-mAP: 0.042		Bai et al, 2020		-
	3	AlexNet	c-mAP: 0.063		Muhling et al, 2020		-
BirdCLEF2019	1	ResNet/Inception	c-mAP: 0.356	659	Lasseck, 2019	是 Yes	-
	2	ResNet/Inception	c-mAP: 0.160		Koh et al, 2019		-
	3	Inception-v3	c-mAP: 0.054		Bai et al, 2019		-
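The challenge table mixes macro F1 (BirdCLEF2022) and micro-averaged F1 (BirdCLEF2021), which weigh rare species very differently. The Python sketch below contrasts the two on multi-label predictions. It is a simplified illustration under stated assumptions: the 0.5 decision threshold, the toy data and the helper names `binarize`, `class_counts`, `macro_f1` and `micro_f1` are hypothetical, and the official challenge scorers (for example row-wise averaging or the handling of "no call" segments) may differ in detail.

```python
from typing import List, Tuple

Matrix = List[List[int]]  # rows = clips, columns = species


def binarize(scores: List[List[float]], threshold: float = 0.5) -> Matrix:
    """Turn per-species probabilities into multi-label 0/1 predictions (assumed threshold)."""
    return [[int(s >= threshold) for s in row] for row in scores]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def class_counts(pred: Matrix, true: Matrix, c: int) -> Tuple[int, int, int]:
    """True positives, false positives and false negatives for species column c."""
    tp = sum(1 for p, t in zip(pred, true) if p[c] and t[c])
    fp = sum(1 for p, t in zip(pred, true) if p[c] and not t[c])
    fn = sum(1 for p, t in zip(pred, true) if not p[c] and t[c])
    return tp, fp, fn


def macro_f1(pred: Matrix, true: Matrix) -> float:
    """Unweighted mean of per-species F1: every species counts equally."""
    n = len(true[0])
    return sum(f1(*class_counts(pred, true, c)) for c in range(n)) / n


def micro_f1(pred: Matrix, true: Matrix) -> float:
    """F1 over TP/FP/FN pooled across species: frequent species dominate."""
    totals = [0, 0, 0]
    for c in range(len(true[0])):
        for k, v in enumerate(class_counts(pred, true, c)):
            totals[k] += v
    return f1(*totals)


if __name__ == "__main__":
    scores = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]
    true = [[1, 0], [1, 0], [1, 0], [0, 1]]
    pred = binarize(scores)  # [[1, 0], [1, 0], [1, 0], [1, 0]]
    print(round(macro_f1(pred, true), 4))  # 0.4286
    print(round(micro_f1(pred, true), 4))  # 0.75
```

In the toy example the rare second species is never predicted, so macro F1 falls to about 0.43 while micro F1 stays at 0.75. Scores computed under different averaging schemes, or under c-mAP, are therefore not directly comparable across the years listed in the table.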
