Biodiversity Science, 2023, Vol. 31, Issue (11): 23272. DOI: 10.17520/biods.2023272
Xiaohu Shen1,2,*, Xiangyu Zhu1, Hongfei Shi2, Chuanzhi Wang3
Received: 2023-07-31
Accepted: 2023-10-12
Online: 2023-11-20
Published: 2023-12-08
Abstract:
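Monitoring the status and trends of bird diversity in ecosystems is a major challenge and calls for broadly applicable birdsong recognition algorithms based on machine learning. To give an accurate picture of the current state and development trends of machine-learning-based birdsong recognition, this paper introduces the basic concepts of the birdsong recognition task and surveys the algorithms from the perspective of model architecture design. Given the interdisciplinary nature of the field, the algorithms are grouped by research direction into probabilistic models, template matching, time series analysis, transfer learning, data fusion, ensemble learning, metric learning, and unsupervised clustering. The paper reviews the technical lineage of these methods for birdsong recognition, describes their characteristics and limitations, and compares their effectiveness. It also discusses commonly used standardized open-source birdsong datasets and evaluation metrics. Finally, it identifies the challenges facing current methods and potential future research directions. The review aims to give researchers and developers working on birdsong recognition a comprehensive reference framework for understanding existing techniques and likely developments.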
申小虎, 朱翔宇, 史洪飞, 王传之 (2023) 基于机器学习鸟声识别算法研究进展. 生物多样性, 31, 23272. DOI: 10.17520/biods.2023272.
Xiaohu Shen, Xiangyu Zhu, Hongfei Shi, Chuanzhi Wang (2023) Research progress of birdsong recognition algorithms based on machine learning. Biodiversity Science, 31, 23272. DOI: 10.17520/biods.2023272.
Fig. 1 Development history of birdsong recognition algorithms. DTW, dynamic time warping; SVM, support vector machine; CNN, convolutional neural network; DNN, deep neural network.
Fig. 4 Comparison of the layer structures of long short-term memory (LSTM), gated recurrent unit (GRU), and Legendre memory unit (LMU)
Literature | Category | Input | Basic network | Advantage | Disadvantage | Specific issue
---|---|---|---|---|---|---
颜鑫 & 李应 | Probabilistic model | Anti-noise power normalized cepstral coefficients (APNCC) | SVM | Two-stage denoising yields a better noise-robust representation | Filters out part of the foreground information, lowering the recognition rate under clean conditions | Non-stationary environmental noise
Joly et al. | Probabilistic model | Mel frequency cepstral coefficients (MFCC) | KNN | Semantic filtering removes irrelevant information | Weakly supervised learning easily introduces label noise | -
Lasseck | Probabilistic model | Low-level descriptors (LLDs) | DT | Feature focus regions reduce template matching time and improve generality | Feature selection process is relatively complex | -
杨春勇 et al. | Probabilistic model | Local binary pattern (LBP), histogram of oriented gradients (HOG) | KNN | Edge features of the call energy spectrogram fit birdsong information well | HOG features are high-dimensional, which hinders large-scale computation | -
韩鹏飞 & 陈晓 | Probabilistic model | MFCC, inverted Mel frequency cepstral coefficients (IMFCC) | GA-SVM | IMFCC represents the sparse high-frequency information | Feature weights are subject to randomness | -
Kaewtip et al. | Template matching | Spectrogram | SVM | DTW and high-energy time-frequency regions yield noise-robust templates | Noise energy can mask the high-energy regions, degrading performance | Limited training data
孙斌 et al. | Template matching | Adaptive optimal kernel (AOK) time-frequency distribution | - | Small time-frequency template features reduce the matching computation | Gray-level co-occurrence feature extraction is computationally expensive, making the algorithm complex | -
Gupta et al. | Time series analysis | Spectrogram | CNN-LSTM, CNN-GRU, CNN-LMU | The LMU's differentiated, orthogonalized memory mechanism improves long-term dependency modeling and reduces model parameters | - | Large-scale bird species prediction
Qiao et al. | Time series analysis | Spectrogram | BiRNN-BiRNN | Unsupervised sequence-to-sequence model learns higher-level representations and captures contextual information effectively | Lacks a sound mechanism for handling missing values | -
Carvalho & Gomes | Time series analysis | MFCC, Mel spectrogram | LSTM, GRU, CRNN, etc. | CNN-GRU reduces computational complexity and converges more easily | Still cannot fully solve the vanishing gradient problem | -
Lasseck, 2019 | Transfer learning | Spectrogram | Inception ResNet | Different data augmentation techniques improve accuracy and model generality | Long training time | Multi-label classification
Ntalampiras, 2018 | Transfer learning | MFCC | HMM | Acquires birdsong classification knowledge from the probability density distributions of music classification | Non-stationary noise degrades knowledge transfer | Transfer learning in non-deep (traditional machine learning) frameworks
LeBien et al. | Transfer learning | Spectrogram | ResNet50 | Training with false-positive detections integrates the relevant missing information for each class | - | Cross-species knowledge transfer
谢将剑 et al. | Data fusion | Short-time Fourier transform (STFT), Mel frequency cepstral transform (MFCT), chirplet transform (CT) | VGG | Feature weighting keeps the feature dimension from growing under fusion | Does not consider model structures suited to different spectrogram types | -
Xie et al., 2019 | Data fusion | Mel spectrogram, harmonic-component spectrogram, percussive-component spectrogram | VGG | The three spectrograms represent different components of birdsong and are trained separately to avoid interference between feature components | Low training efficiency | -
Salamon et al. | Data fusion | Spectrogram | SKM, CNN | Fully exploits the complementarity of model predictions on different features | Prone to bias in the fused decision | -
Bold et al., 2019 | Data fusion | Bird images, spectrograms | CaffeNet | Late fusion in a two-stream multimodal CNN makes raw bird images an effective supplement to birdsong recognition | High dimensionality of the fused features can lead to poor real-time performance and reduced system performance | -
Xie et al. | Data fusion | MFCC fusion feature map | DenseNet 121 | Low model space complexity | Training consumes excessive memory; unsuitable for large-scale training | -
Conde et al. | Ensemble learning | Spectrogram | ResNeSt-50, EfficientNet, DenseNet 121 | Multi-label prediction improves the predicted probabilities of bird species | Model stacking generalizes poorly | Weakly supervised birdsong classification
Morgan & Braasch | Metric learning | Spectrogram | VGG16 | Layered network achieves significant performance gains with unlabeled data | Relies heavily on assumptions about the data | Open dataset
Acconcjaioco & Ntalampiras | Metric learning | Spectrogram | SNN | Measures both the similarity and the difference between unknown and known bird species | Validation on unlabeled data during training adds algorithmic complexity | Open dataset
吴科毅 et al. | Unsupervised clustering | Spectrogram | VAE | Auxiliary checks on zero-crossing rate and energy avoid missed detections during feature extraction | The number of clusters must be inferred | Overlapping syllables from multiple bird species
Kahl et al. | Traditional deep learning | Spectrogram | ResNet-157 | Multi-label classification and mixed training improve overall recognition performance | High computational requirements for training and inference | -
Table 1 Comparison of typical machine-learning-based birdsong recognition methods
Literature | Augmentation | Evaluation criterion | Test result (%) | Number of bird species | Test dataset
---|---|---|---|---|---
韩雪 et al. | No | c-mAP | 95.31 | 11 | Macaulay Library
颜鑫 & 李应 | No | c-mAP | 94.12 | 34 | Freesound
Joly et al. | No | c-mAP | 36.5 | 501 | Xeno-canto
Zabidi et al. | No | c-mAP | 94.08 | 10 | Xeno-canto
吴科毅 et al. | No | c-mAP | 89.6 | 10 | Baiyunshan dataset
Kahl et al. | Yes | c-mAP | 79.1 | 84 | Xeno-canto
Ntalampiras, 2018 | Yes | c-mAP | 92.5 | 10 | Xeno-canto
Lasseck, 2019 | Yes | c-mAP | 35.6 | 659 | Xeno-canto
Carvalho & Gomes | No | c-mAP | 44.3 | 91 | Self-built database
LeBien et al. | No | c-mAP | 89.3 | 24 | El Yunque National Forest
Xie et al. | No | c-mAP | 96.9 | 10 | Xeno-canto
孙斌 et al. | No | c-mAP | 96.0 | 40 | Self-built database
Salamon et al. | Yes | c-mAP | 96.0 | 43 | CLO-43DS
Xie et al., 2019 | No | c-mAP | 86.3 | 43 | CLO-43DS
谢将剑 et al. | No | c-mAP | 89.4 | 35 | ICML4B
Morgan & Braasch | No | Accuracy | 92.4 | 12 | Self-built database
Acconcjaioco & Ntalampiras | No | Accuracy | 97.4 | 6 | Xeno-canto
Table 2 Comparison of experimental results of birdsong recognition algorithms
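Most results in Table 2 are reported as c-mAP (class-wise mean average precision). As a rough illustration only, the sketch below computes the textbook form of this metric: average precision per species over clips ranked by score, then the mean over species. The function names, the dictionary layout, and the choice to skip species with no positive clips are assumptions made for this example; individual papers and challenge editions may define c-mAP differently.

```python
from typing import Dict, List


def average_precision(ranked_labels: List[int]) -> float:
    """Average precision for one class.

    ranked_labels holds 0/1 ground-truth labels, already sorted by
    descending prediction score.
    """
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def class_mean_average_precision(
    scores: Dict[str, List[float]],
    labels: Dict[str, List[int]],
) -> float:
    """Mean of per-class average precision (hypothetical helper).

    scores[s][i] is the model score for species s on clip i;
    labels[s][i] is 1 if species s is present in clip i, else 0.
    """
    per_class = []
    for species, class_scores in scores.items():
        order = sorted(
            range(len(class_scores)),
            key=lambda i: class_scores[i],
            reverse=True,
        )
        ranked = [labels[species][i] for i in order]
        if sum(ranked) == 0:
            # Assumption: species without any positive clip are skipped.
            continue
        per_class.append(average_precision(ranked))
    return sum(per_class) / len(per_class) if per_class else 0.0


if __name__ == "__main__":
    demo_scores = {"sp_a": [0.9, 0.2, 0.6], "sp_b": [0.1, 0.8, 0.7]}
    demo_labels = {"sp_a": [1, 0, 1], "sp_b": [1, 0, 1]}
    print(class_mean_average_precision(demo_scores, demo_labels))
```

Because the average is taken over species rather than over clips, a rare species weighs as much as a common one, which is one reason values measured on datasets with very different species counts are hard to compare directly.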
Challenge | Rank | Network adopted | Score | No. of species | Related literature | Multi-label | Comments
---|---|---|---|---|---|---|---
BirdCLEF2023 | 1 | NFNet/ConvNeXt/ConvNeXtV2 | c-mAP: 0.7639 | 264 | - | Yes | Ensemble learning
BirdCLEF2023 | 2 | EfficientNetV2/ResNet-34/EfficientNet-B0/EfficientNet-B3 | c-mAP: 0.7637 | 264 | - | Yes | Ensemble learning
BirdCLEF2023 | 3 | EfficientNet-B0/EfficientNetV2 | c-mAP: 0.7631 | 264 | - | Yes | Ensemble learning
BirdCLEF2022 | 1 | EfficientNet-B3/NFNet | macro F1: 0.8527 | 113 | - | Yes | Ensemble learning
BirdCLEF2022 | 2 | ResNeXt50/EfficientNet-B0/EfficientNetV2/NFNet | macro F1: 0.8438 | 113 | - | Yes | Ensemble learning
BirdCLEF2021 | 1 | ResNeSt | micro-averaged F1: 0.6932 | 397 | - | Yes | -
BirdCLEF2021 | 2 | ResNet-34/EfficientNetV2 | micro-averaged F1: 0.6893 | 397 | - | Yes | Ensemble learning
BirdCLEF2021 | 3 | ResNet-50/EfficientNet-B2~B7 | micro-averaged F1: 0.6891 | 397 | - | Yes | Ensemble learning
BirdCLEF2021 | 10 | ResNeSt-50/EfficientNet/DenseNet121 | micro-averaged F1: 0.6738 | 397 | Conde et al. | Yes | Ensemble learning
BirdCLEF2020 | 1 | Built by NAS | c-mAP: 0.128 | 960 | Voelker et al., 2019 | Yes | -
BirdCLEF2020 | 2 | Xception | c-mAP: 0.042 | 960 | Bai et al. | Yes | -
BirdCLEF2020 | 3 | AlexNet | c-mAP: 0.063 | 960 | Muhling et al. | Yes | -
BirdCLEF2019 | 1 | ResNet/Inception | c-mAP: 0.356 | 659 | Lasseck, 2019 | Yes | -
BirdCLEF2019 | 2 | ResNet/Inception | c-mAP: 0.160 | 659 | Koh et al., 2019 | Yes | -
BirdCLEF2019 | 3 | Inception-v3 | c-mAP: 0.054 | 659 | Bai et al., 2019 | Yes | -
Table 3 Comparison of BirdCLEF results over the past five years
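The scores in Table 3 use different metrics from year to year (c-mAP, macro F1, micro-averaged F1), so they are not directly comparable across editions. The sketch below illustrates, under simplified assumptions, how macro and micro averaging of F1 can diverge on the same predictions. The function names, the per-species `(tp, fp, fn)` tuple layout, and the example counts are all hypothetical and are not taken from any BirdCLEF evaluation code.

```python
from typing import Dict, Tuple

Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(counts: Dict[str, Counts]) -> float:
    """Unweighted mean of per-species F1: every species counts equally."""
    if not counts:
        return 0.0
    return sum(f1_from_counts(*c) for c in counts.values()) / len(counts)


def micro_f1(counts: Dict[str, Counts]) -> float:
    """F1 over pooled counts: frequent species dominate the result."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1_from_counts(tp, fp, fn)


if __name__ == "__main__":
    # Made-up counts: one well-recognized common species, one poorly
    # recognized rare species.
    demo = {"common": (90, 10, 10), "rare": (1, 1, 3)}
    print("macro F1:", macro_f1(demo))
    print("micro F1:", micro_f1(demo))
```

With these made-up counts the macro value is pulled down by the rare species while the micro value stays close to the F1 of the common one, which illustrates why a score under one averaging scheme cannot be read against a score under the other.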
[1] Acconcjaioco M, Ntalampiras S (2021) One-shot learning for acoustic identification of bird species in non-stationary environments. In: 2020 25th International Conference on Pattern Recognition (ICPR), pp. 755-762. Milan, Italy.
[2] Acevedo MA, Corrada-Bravo CJ, Corrada-Bravo H, Villanueva-Rivera LJ, Aide TM (2009) Automated classification of bird and amphibian calls using machine learning: A comparison of methods. Ecological Informatics, 4, 206-214.
[3] Adavanne S, Drossos K, Çakir E, Virtanen T (2017) Stacked convolutional and recurrent neural networks for bird audio detection. In: 2017 25th European Signal Processing Conference (EUSIPCO), pp. 1729-1733. Kos, Greece.
[4] Anderson SE, Dave AS, Margoliash D (1996) Template-based automatic recognition of birdsong syllables from continuous recordings. Journal of the Acoustical Society of America, 100, 1209-1219.
[5] Bai J, Chen C, Chen J (2020) Xception based system for bird sound detection. In: CLEF Working Notes 2020, CLEF: Conference and Labs of the Evaluation Forum. Thessaloniki, Greece.
[6] Bai J, Wang B, Chen C, Fu Z, Chen J (2019) Inception-v3 based method of LifeCLEF 2019 bird recognition. In: CLEF Working Notes 2019, pp. 9-12. Lugano, Switzerland.
[7] Bai J, Wu R, Wang M (2018) CIAIC-BAD system for DCASE 2018 challenge task3. In: Detection and Classification of Acoustic Scenes and Events 2018. Woking, Surrey, UK.
[8] Bendale A, Boult TE (2016) Towards open set deep networks. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1563-1572. Las Vegas, NV, USA.
[9] Bold N, Zhang C, Akashi T (2019) Cross-domain deep feature combination for bird species classification with audio-visual data. IEICE Transactions on Information and Systems, E102.D, 2033-2042.
[10] Boulmaiz A, Messadeg D, Doghmane N, Taleb-Ahmed A (2017) Design and implementation of a robust acoustic recognition system for waterbird species using TMS320C6713 DSK. International Journal of Ambient Computing & Intelligence, 8, 98-118.
[11] Briggs F, Raich R, Fern XZ (2009) Audio classification of bird species: A statistical manifold approach. In: 2009 Ninth IEEE International Conference on Data Mining, pp. 51-60. Miami Beach, Florida, USA.
[12] Brock A, De S, Smith SL (2021) High-performance large-scale image recognition without normalization. In: 38th International Conference on Machine Learning (PMLR), 139, 1059-1071.
[13] Carvalho S, Gomes EF (2023) Automatic classification of bird sounds: Using MFCC and Mel spectrogram features with deep learning. Vietnam Journal of Computer Science, 10, 39-54.
[14] Chandu B, Munikoti A, Murthy KS, Murthy VG, Nagaraj C (2020) Automated bird species identification using audio signal processing and neural networks. In: 2020 International Conference on Artificial Intelligence and Signal Processing (AISP), pp. 1-5. Amaravati, India.
[15] Chen SS, Li Y (2014) Applying random forest classifier combined with time-frequency texture features to bird sounds recognition. Computer Applications and Software, 31, 154-157, 161. (in Chinese with English abstract)
[陈莎莎, 李应 (2014) 结合时-频纹理特征的随机森林分类器应用于鸟声识别. 计算机应用与软件, 31, 154-157, 161.]
[16] Chou CH, Ko HY (2011) Automatic birdsong recognition with MFCC based syllable feature extraction. In: International Conference on Ubiquitous Intelligence and Computing, pp. 185-196. Banff, Canada.
[17] Clementino T, Colonna JG (2020) Using triplet loss for bird species recognition on BirdCLEF 2020. In: Conference and Labs of the Evaluation Forum, pp. 22-25. Thessaloniki, Greece.
[18] Conde MV, Shubham K, Agnihotri P, Movva ND, Bessenyei S (2021) Weakly-supervised classification and detection of bird sounds in the wild. A BirdCLEF 2021 solution. arXiv: 2107.04878. https://arxiv.org/abs/2107.04878.
[19] Dai YS, Yang J, Dong YW, Zou HP, Hu MZ, Wang B (2021) Blind source separation-based IVA-Xception model for bird sound recognition in complex acoustic environments. Electronics Letters, 57, 454-456.
[20] Das N, Mondal A, Chaki J, Padhy N, Dey N (2020) Machine learning models for bird species recognition based on vocalization: A succinct review. In: Information Technology and Intelligent Transportation Systems (ITITS), pp. 1-9. Xi’an, China.
[21] De Oliveira AG, Ventura TM, Ganchev TD, Silva LNS, Marques MI, Schuchmann KL (2020) Speeding up training of automated bird recognizers by data reduction of audio features. PeerJ, 8, e8407.
[22] Fritzler A, Koitka S, Friedrich CM (2017) Recognizing bird species in audio files using transfer learning. In: Conference and Labs of the Evaluation Forum, 1866-pp. 1882. Dublin, Ireland.
[23] Ghani B, Hallerberg S (2021) A randomized bag-of-birds approach to study robustness of automated audio based bird species classification. Applied Sciences, 11, 9226-9242.
[24] Gupta G, Kshirsagar M, Zhong M, Gholami S, Ferres JL (2021) Comparing recurrent convolutional neural networks for large scale bird species classification. Scientific Reports, 11, 17085.
[25] Han PF, Chen X (2022) Bird sound recognition based on MFCC-IMFCC and GA-SVM. Computer Systems and Applications, 31(11), 393-399. (in Chinese with English abstract)
[韩鹏飞, 陈晓 (2022) 基于MFCC-IMFCC和GA-SVM的鸟声识别. 计算机系统应用, 31(11), 393-399.]
[26] Han X, Mu Y, Sheng GM (2023) Research on the CCPSO optimized SVM based bird sound recognition technology. Technical Acoustics, 42, 118-126. (in Chinese with English abstract)
[韩雪, 慕昱, 盛桂敏 (2023) CCPSO优化支持向量机的鸟声识别技术研究. 声学技术, 42, 118-126.]
[27] Hong TY, Zabidi MM (2021) Bird sound detection with convolutional neural networks using raw waveforms and spectrograms. In: International Symposium on Applied Science and Engineering, pp. 242-248. Erzurum, Turkey.
[28] Incze Á, Jancsó HB, Szilágyi Z, Farkas A, Sulyok C (2018) Bird sound recognition using a convolutional neural network. In: 2018 IEEE 16th International Symposium on Intelligent Systems and Informatics (SISY), pp. 295-300. Subotica, Serbia.
[29] Jancovic P, Kokuer (2011) Automatic detection and recognition of tonal bird sounds in noisy environments. EURASIP Journal on Advances in Signal Processing, 2011, 982936.
[30] Joly A, Champ J, Buisson O (2014) Instance-based bird species identification with undiscriminant features pruning. In: Cross-Language Evaluation Forum, pp. 625-633. Sheffield, UK.
[31] Joly A, Leveau V, Champ J, Buisson O (2015) Shared nearest neighbors match kernel for bird songs identification. In: Cross-Language Evaluation Forum 2015, hal-01182784. Toulouse, France.
[32] Jung T, Jeon H, Jeon C, Cook A, Weiss A, Lee M, Smith AH (2019) Deep learning-based bird sound recognition system with data pre-processing. In: Academic Conference of Korea Electronics Engineering Association, pp. 756-759. Jeju Island, Korea.
[33] |
Kaewtip K, Alwan A, Reilly C, Taylor CE (2016) A robust automatic birdsong phrase classification: A template-based approach. Journal of the Acoustical Society of America, 140, 3691-3701.
PMID |
[34] | Kaewtip K, Tan LN, Taylor CE, Alwan A (2015) Bird-phrase segmentation and verification: A noise-robust template-based approach. In: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 758-762. South Brisbane, QLD, Australia. |
[35] | Kahl S, Stter FR, Goau H, Glotin H, Planqué B, Vellinga WP, Joly A (2019) Overview of BirdCLEF 2019:Large-scale bird recognition in soundscapes. In: Cross-LanguageEvaluation Forum 2019. Lugano, Switzerland. |
[36] | Kahl S, Wilhelm-Stein T, Hussein H (2017) Large-scale bird sound classification using convolutional neural networks. In: Cross-LanguageEvaluation Forum. Amsterdam, Netherlands. |
[37] |
Kahl S, Wood CM, Eibl M, Klinck H (2021) BirdNET: A deep learning solution for avian diversity monitoring. Ecological Informatics, 61, 101236.
DOI URL |
[38] | Koh CY, Chang JY, Tai CL, Huang DY, Hsieh HH (2019) Bird sound classification using convolutional neural networks. In: Cross-LanguageEvaluation Forum. Lugano, Switzerland. |
[39] | Kong Q, Iqbal T, Yong X (2018) DCASE 2018 challenge surrey cross-task convolutional neural network baseline. In: Detectionand Classification of Acoustic Scenes and Events,pp.217-221. Woking, Surrey, UK. |
[40] | Lakshminarayanan B, Raich R, Fern X (2010) A syllable-level probabilistic framework for bird species identification. In: 2009International Conference on Machine Learning and Applications,pp.53-59. Miami, Florida, USA. |
[41] | Lasseck M (2015) Improved automatic bird identification through decision tree based feature selection and bagging. In: Cross-LanguageEvaluation Forum 2015. Toulouse, France. |
[42] | Lasseck M (2018) Acoustic bird detection with deep convolutional neural networks. In:Detection and Classification of Acoustic Scenes and Events 2018. Woking, Surrey, UK. |
[43] | Lasseck M (2019) Bird species identification in soundscapes. In:Conference and Labs of the Evaluation Forum 2019. Lugano, Switzerland. |
[44] | LeBien J, Zhong M, Campos-Cerqueira M, Velev JP, Dodhia R, Ferres JL, Aide TM (2020) A pipeline for identification of bird and frog species in tropical soundscape recordings using a convolutional neural network. Ecological Informatics, 59, 101113. |
[45] | Li DP, Zhou XY, Ye R, Xia Y, Xu HN (2022) Bird sound recognition algorithm based on feature selection and GWO-KELM. Technical Acoustics, 41, 782-788. (in Chinese with English abstract) |
[李大鹏, 周晓彦, 叶如, 夏煜, 徐华南 (2022) 基于特征选择和GWO-KELM的鸟声识别算法. 声学技术, 41, 782-788.] | |
[46] | Li Y, Zhao M, Xu MY, Liu YF, Qian YC (2019) A survey of research on multi-source information fusion technology. Intelligent Computer and Applications, 9(5), 186-189. (in Chinese with English abstract) |
[李洋, 赵鸣, 徐梦瑶, 刘云飞, 钱雨辰 (2019) 多源信息融合技术研究综述. 智能计算机与应用, 9(5), 186-189.] | |
[47] | Liu HT, Jiang HY, Shu X, Xu Y, Wu YL, Guo XQ (2017) Recognition of multiple bird species in audio recordings based on feature transfer. Journal of Data Acquisition and Processing, 32, 1239-1247. (in Chinese with English abstract) |
[刘昊天, 姜海燕, 舒欣, 徐彦, 伍艳莲, 郭小清 (2017) 基于特征迁移的多物种鸟声识别方法. 数据采集与处理, 32, 1239-1247.] | |
[48] | Liu Z, Mao H, Wu CY (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition, arXiv: 2201.03545. https://arxiv.org/abs/2201.03545. |
[49] | Liu Z, Zhang YC, Hu HL (2017) Bird sound classification simulation in noisy environment based on random forests and large scale acoustic features. System Simulation Technology, 13, 359-362. (in Chinese with English abstract) |
[刘钊, 张宇琛, 胡海龙 (2017) 随机森林和大规模声学特征的噪声环境鸟声识别仿真. 系统仿真技术, 13, 359-362.] | |
[50] | Liu ZH, Chen WJ, Chen AB (2022) Homologous spectrogram feature fusion with self-attention mechanism for bird sound classification. Journal of Computer Applications, 42, 1260-1268. (in Chinese with English abstract) |
[刘志华, 陈文洁, 陈爱斌 (2022) 基于自注意力机制时频谱同源特征融合的鸟鸣声分类. 计算机应用, 42, 1260-1268.] | |
[51] | Lostanlen V, Salamon J, Farnsworth A, Kelling S, Bello JP (2018) Birdvox-full-night: A dataset and benchmark for avian flight call detection. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 266-270. Calgary, Canada. |
[52] | Marin-Cudraz T, Muffat-Joly B, Novoa C, Aubry P, Desmet JF, Mahamoud-Issa M, Nicolè F, Van Niekerk MH, Mathevon N, Sèbe F (2019) Acoustic monitoring of rock ptarmigan: A multi-year comparison with point-count protocol. Ecological Indicators, 101, 710-719. |
[53] | Mehyadin AE, Abdulazeez AM, Hasan DA, Saeed JN (2021) Birds sound classification based on machine learning algorithms. Asian Journal of Research in Computer Science, 9(4), 1-11. |
[54] | Mohanty R, Mallik BK, Solanki SS (2020) Automatic bird species recognition system using neural network based on spike. Applied Acoustics, 161, 107177. |
[55] | Morgan MM, Braasch J (2022) Open set classification strategies for long-term environmental field recordings for bird species recognition. The Journal of the Acoustical Society of America, 151, 4028-4038. |
[56] | Muhling M, Franz J, Korfhage N, Freisleben B (2020) Bird species recognition via neural architecture search. In: Conference and Labs of the Evaluation Forum. Thessaloniki, Greece. |
[57] | Murugaiya R, Abas PE, De Silva LC (2022) Probability enhanced entropy (PEE) novel feature for improved bird sound classification. Machine Intelligence Research, 19, 52-62. |
[58] | Nanni L, Maguolo G, Brahnam S, Paci M (2021) An ensemble of convolutional neural networks for audio classification. Applied Sciences, 11, 5796. |
[59] | Narasimhan R, Fern XZ, Raich R (2017) Simultaneous segmentation and classification of bird song using CNN. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 146-150. New Orleans, LA, USA. |
[60] | Ntalampiras S (2018) Bird species identification via transfer learning from music genres. Ecological Informatics, 44, 76-81. |
[61] | Ntalampiras S, Potamitis I (2021) Acoustic detection of unknown bird species and individuals. CAAI Transactions on Intelligence Technology, 6, 291-300. |
[62] | Nugroho H, Widodo W, Rachman A (2019) Pattern recognition bird sounds based on their type using discrete cosine transform (DCT) and Gaussian methods. Kinetik Game Technology Information System Computer Network Computing Electronics and Control, 4, 233-240. |
[63] | Ou Y, Zhou XY, Li DP (2022) Experimental design of birdsong recognition based on CNN. Research and Exploration in Laboratory, 41(4), 99-102, 112. (in Chinese with English abstract) |
[欧昀, 周晓彦, 李大鹏 (2022) 基于卷积神经网络的鸟声识别实验设计. 实验室研究与探索, 41(4), 99-102, 112.] | |
[64] | Pankajakshan A, Thakur A, Thapar D (2018) All-conv net for bird activity detection: Significance of learned pooling. In: Interspeech 2018, pp. 2122-2126. Hyderabad, India. |
[65] | Permana SDH, Saputra G, Arifitama B, Caesarendra W, Rahim R (2022) Classification of bird sounds as an early warning method of forest fires using convolutional neural network (CNN) algorithm. Journal of King Saud University - Computer and Information Sciences, 34, 4345-4357. |
[66] | Petrusková T, Pišvejcová I, Kinštová A, Brinke T, Petrusek A (2016) Repertoire-based individual acoustic monitoring of a migratory passerine bird with complex song as an efficient tool for tracking territorial dynamics and annual return rates. Methods in Ecology and Evolution, 7, 274-284. |
[67] | Phaye S, Benetos E, Wang Y (2019) SubSpectralNet—Using sub-spectrogram based convolutional neural networks for acoustic scene classification. In: 2019 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 825-829. Brighton, UK. |
[68] | Priyadarshani N, Marsland S, Castro I (2018) Automated birdsong recognition in complex acoustic environments: A review. Journal of Avian Biology, 49, e01447. |
[69] | Ptacek L, Machlica L, Linhart P, Jaska P, Muller L (2016) Automatic recognition of bird individuals on an open set using as-is recordings. Bioacoustics, 25, 55-73. |
[70] | Qiao Y, Qian K, Zhao ZP (2020) A survey on Chinese literature for bird sound recognition based on machine listening. Journal of Fudan University (Natural Science), 59, 375-380. (in Chinese with English abstract) |
[乔玉, 钱昆, 赵子平 (2020) 基于机器听觉的鸟声识别的中文研究综述. 复旦学报(自然科学版), 59, 375-380.] | |
[71] | Qiao Y, Qian K, Zhao ZP (2020) Learning higher representations from bioacoustics: A sequence-to-sequence deep learning approach for bird sound classification. In: International Conference on Neural Information Processing, pp. 130-138. Bangkok, Thailand. |
[72] | Qiu ZB, Lu ZW, Wang HX, Kuang YJ (2022) Recognition of bird sounds related to power grid faults based on Mel spectrogram and convolutional neural network. Journal of South China University of Technology (Natural Science Edition), 50, 129-136. (in Chinese with English abstract) |
[邱志斌, 卢祖文, 王海祥, 况燕军 (2022) 基于Mel频谱图和CNN的电网涉鸟故障鸟声识别. 华南理工大学学报(自然科学版), 50, 129-136.] | |
[73] | Rauch L, Schwinger R, Wirth M, Sick B, Tomforde S, Scholz C (2023) Active Bird2Vec: Towards end-to-end bird sound monitoring with transformers. arXiv:2308.07121. https://arxiv.org/abs/2308.07121. |
[74] | Salamon J, Bello JP, Farnsworth A, Kelling S (2017) Fusing shallow and deep learning for bioacoustic bird species classification. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 141-145. New Orleans, LA, USA. |
[75] | Salamon J, Bello JP, Farnsworth A, Robbins M, Keen S, Klinck H, Kelling S (2016) Towards the automatic classification of avian flight calls for bioacoustic monitoring. PLoS ONE, 11, e0166866. |
[76] | Sprengel E, Jaggi M, Kilcher Y, Hofmann T (2016) Audio based bird species identification using deep learning techniques. In: Conference and Labs of the Evaluation Forum CLEF, pp. 547-559. Évora, Portugal. |
[77] | Stastny J, Munk M, Juranek L (2018) Automatic bird species recognition based on birds vocalization. EURASIP Journal on Audio, Speech, and Music Processing, 19, 1-7. |
[78] | Stowell D, Plumbley MD (2014) Automatic large-scale classification of bird sounds is strongly improved by unsupervised feature learning. PeerJ, 2, e488. |
[79] | Sun B, Wan PW, Tao D, Zhao YX (2015) Identification of birds based on adaptive optimal kernel time-frequency distribution. Journal of Data Acquisition and Processing, 30, 1187-1195. (in Chinese with English abstract) |
[孙斌, 万鹏威, 陶达, 赵玉晓 (2015) 基于自适应最优核时频分布的鸟类识别. 数据采集与处理, 30, 1187-1195.] | |
[80] | Tang Q, Xu LM, Zheng BC, He CL (2023) Transound: Hyper-head attention transformer for birds sound recognition. Ecological Informatics, 75, 102001. |
[81] | Thakur A, Thapar D, Rajan P, Nigam A (2019) Deep metric learning for bioacoustic classification: Overcoming training data scarcity using dynamic triplet loss. The Journal of the Acoustical Society of America, 146, 534-547. |
[82] | Vidaña-Vila E, Navarro J, Alsina-Pagès RM, Ramírez Á (2020) A two-stage approach to automatically detect and classify woodpecker (Fam. Picidae) sounds. Applied Acoustics, 166, 107312. |
[83] | Vilches E, Escobar IA, Vallejo EE, Taylor CE (2006) Data mining applied to acoustic bird species recognition. In: 18th International Conference on Pattern Recognition (ICPR'06), pp. 400-403. Hong Kong, China. |
[84] | Voelker AR, Kajic I, Eliasmith C (2019) Legendre memory units: Continuous-time representation in recurrent neural networks. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems (2019 NIPS), pp. 15570-15579. Vancouver, Canada. |
[85] | Wang EZ, He DJ (2014) Bird recognition based on MFCC and dual-GMM. Computer Engineering and Design, 35, 1868-1871, 1881. (in Chinese with English abstract) |
[王恩泽, 何东健 (2014) 基于MFCC和双重GMM的鸟类识别方法. 计算机工程与设计, 35, 1868-1871, 1881.] | |
[86] | Wang JH, Zhou XY, Han ZC, Wang LL (2023) Small sample optimized bird sound recognition network based on bridging transformer. Journal of Applied Acoustics (accessed on 2023-08-16) (in Chinese with English abstract) |
[王基豪, 周晓彦, 韩智超, 王丽丽 (2023) 基于桥接Transformer的小样本优化鸟声识别网络. 应用声学, (网络首发时间: 2023-08-16)] http://kns.cnki.net/kcms/detail/11.2121.O4.20230815.1700.004.html. | |
[87] | Wei JM, Li Y (2015) Rapid bird sound recognition using anti-noise texture features. Acta Electronica Sinica, 43, 185-190. (in Chinese with English abstract) |
[魏静明, 李应 (2015) 利用抗噪纹理特征的快速鸟鸣声识别. 电子学报, 43, 185-190.] | |
[88] | Wu KY, Ruan WD, Zhou DF, Chen QC, Zhang CY, Pan XY, Yu S, Liu Y, Xiao RB (2023) Syllable clustering analysis-based passive acoustic monitoring technology and its application in bird monitoring. Biodiversity Science, 31, 22370. (in Chinese with English abstract) |
[吴科毅, 阮文达, 周棣锋, 陈庆春, 张承云, 潘新园, 余上, 刘阳, 肖荣波 (2023) 基于音节聚类分析的被动声学监测技术及其在鸟类监测中的应用. 生物多样性, 31, 22370.] | |
[89] | Xie J, Hu K, Zhu MY, Yu JH, Zhu QB (2019) Investigation of different CNN-based models for improved bird sound classification. IEEE Access, 7, 175353-175361. |
[90] | Xie JJ, Li WB, Zhang JG, Ding CQ (2018) Bird species recognition method based on Chirplet spectrogram feature and deep learning. Journal of Beijing Forestry University, 40(3), 122-127. (in Chinese with English abstract) |
[谢将剑, 李文彬, 张军国, 丁长青 (2018) 基于Chirplet语图特征和深度学习的鸟类物种识别方法. 北京林业大学学报, 40(3), 122-127.] | |
[91] | Xie JJ, Yang J, Xing ZL, Zhang Z, Chen X (2020) Bird species recognition method based on multi-feature fusion. Journal of Applied Acoustics, 39, 199-206. (in Chinese with English abstract) |
[谢将剑, 杨俊, 邢照亮, 张卓, 陈新 (2020) 多特征融合的鸟类物种识别方法. 应用声学, 39, 199-206.] | |
[92] | Xie JJ, Zhong YJ, Zhang JG, Liu S, Ding CQ, Triantafyllopoulos A (2023) A review of automatic recognition technology for bird vocalizations in the deep learning era. Ecological Informatics, 73, 101927. |
[93] | Xie SN, Girshick R, Dollár P, Tu ZW, He KM (2017) Aggregated residual transformations for deep neural networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5987-5995. Honolulu, HI, USA. |
[94] | Xie SS, Xu HF, Liu J, Zhang Y, Lv DJ (2021) Research on bird songs recognition based on MFCC-HMM. In: 2021 International Conference on Computer, Control and Robotics (ICCCR), pp. 262-266. Shanghai, China. |
[95] | Xie ZF, Li DZ, Sun HX, Zhang AM (2023) Deep learning techniques for bird chirp recognition task. Biodiversity Science, 31, 22308. (in Chinese with English abstract) |
[谢卓钒, 李鼎昭, 孙海信, 张安民 (2023) 面向鸟鸣声识别任务的深度学习技术. 生物多样性, 31, 22308.] | |
[96] | Xing ZL, Wu WY, Zhang ZX, Chen QL, Ni DM (2021) Bird song recognition method based on C-LSTM. Technology Innovation and Application, 11(15), 15-18. (in Chinese with English abstract) |
[邢照亮, 吴伟银, 张正晓, 陈麒麟, 倪东明 (2021) 基于C-LSTM的鸟鸣声识别方法. 科技创新与应用, 11(15), 15-18.] | |
[97] | Xu JW, Yang Y (2018) A survey of ensemble learning approaches. Journal of Yunnan University (Natural Sciences Edition), 40, 1082-1092. (in Chinese with English abstract) |
[徐继伟, 杨云 (2018) 集成学习方法: 研究综述. 云南大学学报(自然科学版), 40, 1082-1092.] | |
[98] | Xu SZ, Sun YN, Huangfu LY, Fang WQ (2018) Design of synthesized bird sounds classifier based on multi feature extraction classifiers and time-frequency chart. Research and Exploration in Laboratory, 37(9), 81-86, 91. (in Chinese with English abstract) |
[徐淑正, 孙忆南, 皇甫丽英, 方玮骐 (2018) 基于MFCC和时频图等多种特征的综合鸟声识别分类器设计. 实验室研究与探索, 37(9), 81-86, 91.] | |
[99] | Yan X, Li Y (2013) Anti-noise power normalized cepstral coefficients in bird sounds recognition. Acta Electronica Sinica, 41, 295-300. (in Chinese with English abstract) |
[颜鑫, 李应 (2013) 利用抗噪幂归一化倒谱系数的鸟类声音识别. 电子学报, 41, 295-300.] | |
[100] | Yang CY, Qi HD, Peng YQ, Yin B, Hou J, Shu ZY, Chen SP (2020) Research on the application of energy spectrum with voiceprint information in bird recognition. Journal of Applied Acoustics, 39, 453-463. (in Chinese with English abstract) |
[杨春勇, 祁宏达, 彭焱秋, 尹滨, 侯金, 舒振宇, 陈少平 (2020) 融合声纹信息的能量谱图在鸟类识别中的研究. 应用声学, 39, 453-463.] | |
[101] | Yin CC, Xu F, Zhang C (2022) Bird song recognition based on ERB loudness feature and deep learning. Network New Media Technology, 11(2), 25-32. (in Chinese with English abstract) |
[尹晨畅, 许枫, 张纯 (2022) 基于ERB响度特征的深度学习鸟鸣声识别. 网络新媒体技术, 11(2), 25-32.] | |
[102] | Zabidi MM, Wong KL, Sheikh UU, Abdul Manan SS, Hamzah MAN (2022) Bird sound detection with binarized neural networks. Journal of Electrical Technology, 21, 48-53. |
[103] | Zhang C, Jiang WS, Zhao Q (2021) Semantic segmentation of aerial imagery via split-attention networks with disentangled nonlocal and edge supervision. Remote Sensing, 13, 1176. |
[104] | Zhang CY, Chen YH, Hao ZZ, Gao XH (2022) An efficient time-domain end-to-end single-channel bird sound separation network. Animals, 12, 3117. |
[105] | Zhang FY, Zhang LY, Chen HX, Xie JJ (2021) Bird species identification using spectrogram based on multi-channel fusion of DCNNs. Entropy, 23, 1507-1518. |
[106] | Zhang SH, Zhao Z, Xu ZY, Zhang Y (2017) Automatic bird vocalization identification based on Mel-subband parameterized feature. Journal of Computer Applications, 37, 1111-1115. (in Chinese with English abstract) |
[张赛花, 赵兆, 许志勇, 张怡 (2017) 基于Mel子带参数化特征的自动鸟鸣识别. 计算机应用, 37, 1111-1115.] | |