Deep learning techniques for bird chirp recognition task

doi:10.17520/biods.2022308

Biodiversity Science, 2023, Vol. 31, Issue 1: 22308. DOI: 10.17520/biods.2022308 CSTR: 32101.14.biods.2022308

Special Feature: Vocalization Monitoring and Bioacoustics Research on Wild Vertebrates in China


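Funding: National Natural Science Foundation of China (61971362); Fujian Provincial Science and Technology Innovation Project of Natural Resources (KY-080000-04-2021-030)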

Deep learning techniques for bird chirp recognition task

Zhuofan Xie (1, 2, 3), Dingzhao Li (2, 3), Haixin Sun (2, 3, *), Anmin Zhang (4)

1. School of Electronic Science and Engineering (National Model Microelectronics College), Xiamen University, Xiamen, Fujian 361005
2. School of Informatics, Xiamen University, Xiamen, Fujian 361000
3. Key Laboratory of Southeast Coast Marine Information Intelligent Perception and Application, Ministry of Natural Resources, Xiamen, Fujian 361005
4. School of Marine Science and Technology, Tianjin University, Tianjin 300072

Received: 2022-06-08 Accepted: 2022-07-28 Published: 2023-01-20 Online: 2022-09-22
Corresponding author: *Haixin Sun, E-mail: hisensessun@163.com

Abstract

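Birds are an important component of ecosystems and are essential for regulating the ecological environment and monitoring biodiversity. Monitoring the movements of bird flocks and listening for their abnormal calls can even help predict and guard against natural disasters such as earthquakes and tsunamis. Bird song recognition and abnormal call monitoring have therefore become active research topics. However, traditional bird song recognition methods suffer from insufficient feature extraction, which leads to low recognition rates. In this paper, we combine a fused-feature approach with deep learning to extract bird sound features. The fused feature is obtained by concatenating improved log-Mel spectrogram difference parameters with the original signal parameters. The deep learning model is based on the DenseNet121 architecture and incorporates a self-attention module and a center loss function for bird song recognition. The self-attention module improves the feature representation of key channels, and the center loss function addresses the lack of compactness in intra-class features. Validated through ablation experiments, the method reaches an accuracy of 96.9% in recognizing the sounds of 10 bird species selected from the Xeno-Canto public dataset of wild bird sounds from around the world. The code is open source on GitHub: https://github.com/CarrieX6/-Xeno-Canto-.git.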
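As a rough illustration of the fused feature described above, the NumPy sketch below computes a log-Mel spectrogram and stacks it with its first- and second-order time differences (deltas) as three input channels. Everything here is an assumption rather than the authors' code: the HTK mel formula, 1024-point FFT, 512-sample hop and 64 mel bands are illustrative choices; the static log-Mel spectrogram stands in for the "original signal parameters"; and plain regression deltas stand in for the paper's unspecified "improved" difference parameters. `fused_features`, `delta` and the other helpers are hypothetical names.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (assumed; the paper does not state which variant it uses)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters with shape (n_mels, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(y, sr, n_fft=1024, hop=512, n_mels=64, eps=1e-10):
    """Log-mel spectrogram in dB with shape (n_mels, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, n_bins)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T      # (n_mels, n_frames)
    return 10.0 * np.log10(mel + eps)

def delta(feat, width=2):
    """Regression-based difference along the time axis (edges padded by repetition)."""
    n = feat.shape[1]
    padded = np.pad(feat, ((0, 0), (width, width)), mode="edge")
    denom = 2.0 * sum(k * k for k in range(1, width + 1))
    out = np.zeros_like(feat, dtype=float)
    for k in range(1, width + 1):
        out += k * (padded[:, width + k:width + k + n] - padded[:, width - k:width - k + n])
    return out / denom

def fused_features(y, sr):
    """Hypothetical fusion: static log-mel, delta and delta-delta stacked as 3 channels."""
    static = log_mel_spectrogram(y, sr)
    d1 = delta(static)
    d2 = delta(d1)
    return np.stack([static, d1, d2], axis=0)              # (3, n_mels, n_frames)

if __name__ == "__main__":
    sr = 22050
    t = np.arange(sr) / sr
    y = np.sin(2 * np.pi * 2000.0 * t)   # 1 s synthetic tone standing in for a recording
    print(fused_features(y, sr).shape)   # (3, 64, 42)
```

A three-channel layout matches the RGB-shaped input that ImageNet-style backbones such as DenseNet121 expect, which makes stacking static and delta features as channels a common convenience; whether the paper arranges its fused feature exactly this way is not stated.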


Abstract

Background: Birds are an important component of ecosystems and are crucial for regulating the ecological environment and monitoring biodiversity. Monitoring bird movements and listening for abnormal calls can even help predict natural disasters such as earthquakes and tsunamis, so bird sound recognition and abnormal call detection have become popular research directions. However, traditional bird sound recognition methods extract features insufficiently, which results in low recognition rates.

Method: We combined a fused-feature method with deep learning to extract bird sound features. The fused features were obtained by concatenating the original signal parameters with modified log-Mel spectrogram difference parameters. The deep learning model was based on the DenseNet121 network and incorporated a self-attention module and a center loss function for bird sound recognition. The self-attention module improved the feature representation of key channels, and the center loss function addressed the lack of compactness in intra-class features. We evaluated recognition accuracy on recordings of 10 bird species from the Xeno-Canto public dataset of wild bird sounds from around the world.
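The abstract does not specify the internals of the self-attention module. One common way to let a network emphasize key channels is channel self-attention, sketched below in NumPy: channel-to-channel similarities are turned into softmax weights, each channel is replaced by a weighted mix of all channels, and a residual term keeps the original features. This resembles the channel attention used in dual attention networks but is not claimed to be the authors' design; the fixed `gamma=0.5` stands in for what would normally be a learnable scalar, and `channel_self_attention` is a hypothetical name.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def channel_self_attention(feat, gamma=0.5):
    """Channel self-attention over a (C, H, W) feature map.

    Returns (output, attention) where output has the same shape as feat and
    attention is the (C, C) matrix of channel-mixing weights.
    """
    c, h, w = feat.shape
    a = feat.reshape(c, h * w)              # one row per channel
    energy = a @ a.T                        # (C, C) channel-to-channel similarity
    attn = softmax(energy, axis=-1)         # each row sums to 1
    mixed = (attn @ a).reshape(c, h, w)     # each channel becomes a weighted mix of all channels
    return gamma * mixed + feat, attn       # residual keeps the original features
```

In a DenseNet121 backbone such a block would typically sit after a dense block, where the feature map has many channels; with `gamma=0` it reduces to the identity, which is why the scalar is often initialized at zero and learned.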

Conclusion: We propose a neural network that combines a self-attention mechanism with a center loss function for bird song recognition. Its validation accuracy reaches 96.9%. The code is open source on GitHub: https://github.com/CarrieX6/-Xeno-Canto-.git.
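For reference, the standard center loss (Wen et al. 2016) penalizes the squared distance between each deep feature x_i and the center c_{y_i} of its class, L_C = (1/2) Σ_i ‖x_i − c_{y_i}‖², and is usually added to the softmax cross-entropy as L = L_S + λ·L_C. The NumPy sketch below implements that loss (averaged over the batch, a common variant) and the per-batch center update c_j ← c_j − α·Δc_j with Δc_j = Σ_{i: y_i = j} (c_j − x_i) / (1 + n_j). Whether the authors modify this formulation, and which λ and α they use, is not stated here; `alpha=0.5` is illustrative and the function names are hypothetical.

```python
import numpy as np

def center_loss(features, labels, centers):
    """Center loss 0.5 * sum_i ||x_i - c_{y_i}||^2, averaged over the batch.

    features: (N, D) deep features, labels: (N,) integer class ids,
    centers: (K, D) current class centers.
    """
    diff = features - centers[labels]
    return 0.5 * float(np.sum(diff ** 2)) / len(labels)

def update_centers(features, labels, centers, alpha=0.5):
    """Per-batch center update c_j <- c_j - alpha * delta_c_j."""
    new_centers = centers.copy()
    for j in np.unique(labels):
        mask = labels == j
        # delta_c_j = sum over class-j samples of (c_j - x_i), divided by (1 + n_j)
        delta_c = np.sum(centers[j] - features[mask], axis=0) / (1.0 + mask.sum())
        new_centers[j] = centers[j] - alpha * delta_c
    return new_centers
```

Pulling features toward their class center is what makes intra-class features more compact, while the cross-entropy term keeps the classes separable.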

Key words: bird chirp recognition, feature fusion, self-attention module, center loss function


Zhuofan Xie, Dingzhao Li, Haixin Sun, Anmin Zhang (2023) Deep learning techniques for bird chirp recognition task. Biodiversity Science, 31, 22308. DOI: 10.17520/biods.2022308.


Link to this article: https://www.biodiversity-science.net/CN/10.17520/biods.2022308

https://www.biodiversity-science.net/CN/Y2023/V31/I1/22308


Chinese name (English common name)	Scientific name	Database label	Sample duration (s)	Data source
美洲麻鳽 (American Bittern)	Botaurus lentiginosus	amebit	23,960.7	https://xeno-canto.org/species/Botaurus-lentiginosus
白头海雕 (Bald Eagle)	Haliaeetus leucocephalus	baleag	22,744.8	https://xeno-canto.org/species/Haliaeetus-leucocephalus
布氏雀鹀 (Brewer's Sparrow)	Spizella breweri	brespa	23,880.6	https://xeno-canto.org/species/Spizella-breweri
普通拟八哥 (Common Grackle)	Quiscalus quiscula	comgra	23,517.5	https://xeno-canto.org/species/Quiscalus-quiscula
角鸬鹚 (Double-crested Cormorant)	Phalacrocorax auritus	doccor	22,183.7	https://xeno-canto.org/species/Phalacrocorax-auritus
灰斑鸠 (Eurasian Collared-Dove)	Streptopelia decaocto	eucdov	22,837.2	https://xeno-canto.org/species/Streptopelia-decaocto
长嘴啄木鸟 (Hairy Woodpecker)	Leuconotopicus villosus	haiwoo	23,009.2	https://xeno-canto.org/species/Leuconotopicus-villosus
暗背金翅雀 (Lesser Goldfinch)	Spinus psaltria	lesgol	22,521.4	https://xeno-canto.org/species/Spinus-psaltria
环颈潜鸭 (Ring-necked Duck)	Aythya collaris	rinduc	21,653.2	https://xeno-canto.org/species/Aythya-collaris
白喉雨燕 (White-throated Swift)	Aeronautes saxatalis	whtswi	22,491.7	https://xeno-canto.org/species/Aeronautes-saxatalis
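A quick check of how much audio the 10 classes contain and how balanced they are, using only the durations in the species table (the dictionary keys are the table's database labels):

```python
# Per-class total recording durations in seconds, transcribed from the species table.
DURATIONS_S = {
    "amebit": 23960.7, "baleag": 22744.8, "brespa": 23880.6, "comgra": 23517.5,
    "doccor": 22183.7, "eucdov": 22837.2, "haiwoo": 23009.2, "lesgol": 22521.4,
    "rinduc": 21653.2, "whtswi": 22491.7,
}

def duration_summary(durations):
    """Return total hours and the ratio of the longest to the shortest class duration."""
    total_s = sum(durations.values())
    imbalance = max(durations.values()) / min(durations.values())
    return total_s / 3600.0, imbalance

hours, imbalance = duration_summary(DURATIONS_S)
print(f"{hours:.1f} h in total; longest/shortest class ratio {imbalance:.3f}")
```

This gives about 63.6 h of audio, and the longest class is only about 11% longer than the shortest, so the classes are close to balanced by duration and overall accuracy is not dominated by any single species.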


Parameter	Value
Batch size	256
Epochs	50
Learning rate	0.001
Optimizer	Adam (adaptive moment estimation)
Loss function	Categorical cross-entropy
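The table lists only the batch size, epoch count, learning rate, optimizer and loss. The sketch below collects those values in a config object and shows the categorical cross-entropy and a single Adam update they refer to. The Adam β₁ = 0.9, β₂ = 0.999 and ε = 1e-8 are the usual defaults from the original Adam paper and are assumed here because the table does not list them; `TrainConfig`, `categorical_cross_entropy` and `adam_step` are hypothetical names.

```python
from dataclasses import dataclass
import numpy as np

@dataclass(frozen=True)
class TrainConfig:
    # Values from the hyperparameter table; beta1, beta2 and eps are assumed Adam defaults.
    batch_size: int = 256
    epochs: int = 50
    learning_rate: float = 0.001
    beta1: float = 0.9
    beta2: float = 0.999
    eps: float = 1e-8

def categorical_cross_entropy(logits, one_hot):
    """Mean categorical cross-entropy of (batch, classes) logits against one-hot targets."""
    z = logits - logits.max(axis=1, keepdims=True)              # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-np.mean(np.sum(one_hot * log_probs, axis=1)))

def adam_step(param, grad, m, v, t, cfg=TrainConfig()):
    """One Adam update at step t (starting from 1); returns (new_param, m, v)."""
    m = cfg.beta1 * m + (1 - cfg.beta1) * grad
    v = cfg.beta2 * v + (1 - cfg.beta2) * grad ** 2
    m_hat = m / (1 - cfg.beta1 ** t)                             # bias correction
    v_hat = v / (1 - cfg.beta2 ** t)
    return param - cfg.learning_rate * m_hat / (np.sqrt(v_hat) + cfg.eps), m, v
```

For 10 classes, a model that outputs uniform logits has a cross-entropy of ln 10 ≈ 2.30, a useful sanity check at the start of training; and on its first step Adam moves each parameter by roughly the learning rate (0.001) in the direction opposite the gradient's sign.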


Network + feature	Accuracy	No. of parameters
VGG11 + original feature	0.906	1.38e8
VGG11 + log-Mel spectrogram difference feature	0.926	1.38e8
VGG11 + fusion feature	0.935	1.38e8
ResNet18 + original feature	0.896	1.11e7
ResNet18 + log-Mel spectrogram difference feature	0.912	1.11e7
ResNet18 + fusion feature	0.933	1.11e7
DenseNet121 + original feature	0.901	6.94e6
DenseNet121 + log-Mel spectrogram difference feature	0.932	6.96e6
DenseNet121 + fusion feature	0.939	6.96e6
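To make the feature comparison easier to read, the short script below recomputes, from the table values only, how much the fused feature improves on the original feature for each network, alongside each network's parameter count with the fused feature.

```python
# Accuracies and parameter counts transcribed from the feature comparison table.
ACCURACY = {
    "VGG11": {"original": 0.906, "log_mel_diff": 0.926, "fusion": 0.935},
    "ResNet18": {"original": 0.896, "log_mel_diff": 0.912, "fusion": 0.933},
    "DenseNet121": {"original": 0.901, "log_mel_diff": 0.932, "fusion": 0.939},
}
PARAMS_FUSION = {"VGG11": 1.38e8, "ResNet18": 1.11e7, "DenseNet121": 6.96e6}

def fusion_gains(acc=ACCURACY):
    """Accuracy gain of the fused feature over the original feature, per network."""
    return {net: round(a["fusion"] - a["original"], 3) for net, a in acc.items()}

for net, gain in fusion_gains().items():
    print(f"{net}: +{gain:.3f} accuracy, {PARAMS_FUSION[net] / 1e6:.2f} M parameters")
```

The gains are 0.029 (VGG11), 0.037 (ResNet18) and 0.038 (DenseNet121): DenseNet121 benefits most from feature fusion and reaches the highest fused-feature accuracy while having the fewest parameters of the three.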


Model	Accuracy
DenseNet121 + fusion feature	0.939
DenseNet121 + fusion feature + attention	0.953
DenseNet121 + fusion feature + attention + center loss	0.969
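Because accuracies close to 1 compress differences, the ablation can also be read as error rates. The script below converts the table's accuracies to error rates (1 − accuracy) and reports the relative error reduction at each step; it uses only the numbers in the table, and the stage labels are shorthand.

```python
# Accuracies transcribed from the ablation table (each stage adds to the previous one).
STAGES = [
    ("DenseNet121 + fusion feature", 0.939),
    ("+ attention", 0.953),
    ("+ attention + center loss", 0.969),
]

def error_reduction(acc_before, acc_after):
    """Relative drop in error rate (1 - accuracy) from one model to the next."""
    err_before, err_after = 1.0 - acc_before, 1.0 - acc_after
    return (err_before - err_after) / err_before

for (name_a, acc_a), (name_b, acc_b) in zip(STAGES, STAGES[1:]):
    print(f"{name_a} -> {name_b}: error {1 - acc_a:.3f} -> {1 - acc_b:.3f}, "
          f"{error_reduction(acc_a, acc_b):.1%} fewer errors")
print(f"overall: {error_reduction(STAGES[0][1], STAGES[-1][1]):.1%} fewer errors")
```

The error rate falls from 0.061 to 0.047 with attention and to 0.031 with the center loss, so the full model removes about 49% of the baseline's errors, with the center loss contributing the larger share.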
