Biodiversity Science ›› 2023, Vol. 31 ›› Issue (1): 22308.  DOI: 10.17520/biods.2022308

• Special Feature: Vocalization Monitoring and Bioacoustics Research on Wild Vertebrates in China •

Deep learning techniques for bird chirp recognition task

谢卓钒1,2,3, 李鼎昭2,3, 孙海信2,3,*, 张安民4

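  1. School of Electronic Science and Engineering (National Model Microelectronics College), Xiamen University, Xiamen, Fujian 361005
    2. School of Informatics, Xiamen University, Xiamen, Fujian 361000
    3. Key Laboratory of Southeast Coast Marine Information Intelligent Perception and Application, Ministry of Natural Resources, Xiamen, Fujian 361005
    4. School of Marine Science and Technology, Tianjin University, Tianjin 300072
  • Received: 2022-06-08 Accepted: 2022-07-28 Publication date: 2023-01-20 Release date: 2022-09-22
  • Corresponding author: *孙海信, E-mail: hisensessun@163.com
  • Funding:
    National Natural Science Foundation of China (61971362); Fujian Provincial Natural Resources Science and Technology Innovation Project (KY-080000-04-2021-030)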

Deep learning techniques for bird chirp recognition task

Zhuofan Xie1,2,3, Dingzhao Li2,3, Haixin Sun2,3,*, Anmin Zhang4

  1. School of Electronic Science and Engineering (National Model Microelectronics College), Xiamen University, Xiamen, Fujian 361005
    2. School of Informatics, Xiamen University, Xiamen, Fujian 361000
    3. Key Laboratory of Southeast Coast Marine Information Intelligent Perception and Application, Ministry of Natural Resources, Xiamen, Fujian 361005
    4. School of Marine Science and Technology, Tianjin University, Tianjin 300072
  • Received: 2022-06-08 Accepted: 2022-07-28 Published: 2023-01-20 Online: 2022-09-22
  • Contact: *Haixin Sun, E-mail: hisensessun@163.com

Abstract:

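Birds are an important part of ecosystems and are essential for regulating the ecological environment and monitoring biodiversity. Monitoring flock movements and listening for abnormal calls can even help predict and guard against natural disasters such as earthquakes and tsunamis, so bird sound recognition and abnormal call monitoring have become popular research directions. However, traditional bird sound recognition methods suffer from problems such as insufficient feature extraction, which leads to low recognition rates. This paper extracts bird sound features by combining a feature fusion method with deep learning. The fused feature is obtained by concatenating the modified log-Mel spectrogram difference parameters with the original signal parameters. The deep learning method is based on the DenseNet121 network structure and incorporates a self-attention module and a center loss function for bird sound recognition. The self-attention module improves the feature representation of key channels, and the center loss function addresses the problem of insufficiently compact intra-class features. Validated through comparative ablation experiments, the method recognizes the sounds of 10 bird species selected from the Xeno-Canto public dataset of wild bird sounds from around the world with an accuracy of 96.9%. The code is open-sourced on GitHub: https://github.com/CarrieX6/-Xeno-Canto-.git.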

Key words: bird chirp recognition, feature fusion, self-attention module, center loss function

Abstract

Background: Birds are an important component of ecosystems and are crucial for regulating the ecological environment and monitoring biodiversity. Monitoring bird movements and listening for abnormal calls can even assist in predicting and guarding against natural disasters such as earthquakes and tsunamis. Bird sound recognition and abnormal call detection have therefore become popular research directions. However, traditional bird sound recognition methods suffer from problems such as insufficient feature extraction, which leads to low recognition rates.

Method: In this paper, we combined a feature fusion method with deep learning to extract bird sound features. The fused features were obtained by concatenating the original signal parameters with the modified log-Mel spectrogram difference parameters. The deep learning method was based on the DenseNet121 network structure and incorporated a self-attention module and a center loss function for bird sound recognition. The self-attention module improved the feature representation of key channels; the center loss function was used to address the problem of insufficiently compact intra-class features. We tested recognition accuracy on the sounds of 10 bird species selected from the Xeno-Canto public dataset of wild bird sounds from around the world, and validated the method through comparative ablation experiments.
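The feature fusion step can be illustrated with a minimal Python sketch. It assumes that the "difference parameters" are first-order deltas computed with the standard regression formula, and that they are concatenated frame by frame with the static log-Mel values. The abstract does not specify the authors' modification, so the function names `delta` and `fuse`, the window `width`, and the toy input are all hypothetical and are not taken from the paper's repository.

```python
from typing import List

Matrix = List[List[float]]  # rows = time frames, columns = Mel bands


def delta(features: Matrix, width: int = 2) -> Matrix:
    """First-order delta using the standard regression formula.

    d[t] = sum_{n=1..N} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..N} n^2)
    Frames past either end are replaced by the nearest edge frame.
    """
    n_frames = len(features)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out: Matrix = []
    for t in range(n_frames):
        row = []
        for b in range(len(features[t])):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = features[min(t + n, n_frames - 1)][b]
                behind = features[max(t - n, 0)][b]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def fuse(log_mel: Matrix) -> Matrix:
    """Concatenate static log-Mel values with their deltas, frame by frame."""
    return [static + d for static, d in zip(log_mel, delta(log_mel))]


# Toy log-Mel spectrogram (assumed input): 4 frames x 2 bands, rising linearly.
log_mel = [[0.0, 10.0], [1.0, 11.0], [2.0, 12.0], [3.0, 13.0]]
fused = fuse(log_mel)  # 4 frames x 4 values: 2 static + 2 delta per frame
```

In this sketch each frame doubles in width, so a network such as DenseNet121 would receive the static and dynamic information as a single input. How the paper actually stacks or reshapes the two parts is not stated in the abstract.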

Conclusion: This paper proposes a neural network structure incorporating a self-attention mechanism and a center loss function for bird sound recognition. Its validation accuracy reaches 96.9%. The code is open-sourced on GitHub: https://github.com/CarrieX6/-Xeno-Canto-.git.
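To make the role of the center loss concrete, here is a small pure-Python sketch. It follows the widely used formulation, which may differ from the exact variant in this paper: half the squared distance between each embedding and its class center, averaged over the batch, typically added to the softmax loss with a weighting factor. The names `center_loss` and `update_centers`, the update rate `alpha`, and the toy data are illustrative assumptions.

```python
from typing import Dict, List

Vector = List[float]


def center_loss(embeddings: List[Vector], labels: List[int],
                centers: Dict[int, Vector]) -> float:
    """Mean over the batch of 0.5 * ||x_i - c_{y_i}||^2."""
    total = 0.0
    for x, y in zip(embeddings, labels):
        total += sum((xi - ci) ** 2 for xi, ci in zip(x, centers[y]))
    return 0.5 * total / len(embeddings)


def update_centers(embeddings: List[Vector], labels: List[int],
                   centers: Dict[int, Vector],
                   alpha: float = 0.5) -> Dict[int, Vector]:
    """Return new centers, each moved toward the batch members of its class.

    c_j <- c_j - alpha * sum_i (c_j - x_i) / (1 + n_j),
    where the sum runs over the n_j batch samples labeled j.
    """
    new_centers: Dict[int, Vector] = {}
    for j, c in centers.items():
        members = [x for x, y in zip(embeddings, labels) if y == j]
        step = [sum(ci - x[k] for x in members) / (1 + len(members))
                for k, ci in enumerate(c)]
        new_centers[j] = [ci - alpha * s for ci, s in zip(c, step)]
    return new_centers


# Toy batch (assumed data): two samples of class 0, one of class 1.
embeddings = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]
labels = [0, 0, 1]
centers = {0: [2.0, 0.0], 1: [0.0, 0.0]}

loss_before = center_loss(embeddings, labels, centers)
centers_after = update_centers(embeddings, labels, centers)
loss_after = center_loss(embeddings, labels, centers_after)
```

Pulling each embedding toward its class center is what makes intra-class features more compact. In a real training loop the embeddings would also change at each step, which this sketch does not model.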

Key words: bird chirp recognition, feature fusion, self-attention module, center loss function