Biodiversity Science, 2023, Vol. 31, Issue (1): 22370. DOI: 10.17520/biods.2022370
• Special Feature: Vocalization Monitoring and Bioacoustics Research of Wild Vertebrates in China •
Keyi Wu1, Wenda Ruan1, Difeng Zhou1, Qingchen Chen1,*, Chengyun Zhang1, Xinyuan Pan2, Shang Yu3, Yang Liu4, Rongbo Xiao5
Received:
2022-06-30
Accepted:
2022-11-24
Published:
2023-01-20
Online:
2022-12-02
Contact:
*Qingchen Chen, E-mail: qcchen@gzhu.edu.cn
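Abstract:
Passive acoustic monitoring identifies species by analyzing the information in bird vocalizations, and offers a practical technical approach to monitoring bird diversity. Because bird vocalizations are complex and variable, voiceprint-based bird diversity monitoring faces several technical challenges: distinguishing species quickly and accurately from their voiceprints, analyzing bird abundance, and reducing the need for manual work. This paper proposes a bird vocalization monitoring framework based on syllable clustering. Syllables are first extracted from the voiceprint data using audio features such as pitch and spectral flatness. The syllables are then clustered by deep unsupervised training that combines unsupervised representation learning with a Dirichlet process mixture model, which completes both the syllable clustering and the automatic inference of syllable types. The results show that the proposed framework achieves a clustering accuracy close to 90% on the song repertoires of the white-rumped munia (Lonchura striata) from an open-source dataset. On this basis, we applied unsupervised syllable clustering to the vocalizations of 10 bird species recorded at a fixed monitoring site in Baiyun Mountain Park, Guangzhou, from April to May 2022, which validated the effectiveness of the framework: the technique not only supports rapid bird species identification, but also makes it possible to count and analyze changes in the timing, frequency, and number of vocalizations of different species. These results indicate that the syllable clustering-based framework can substantially reduce the need for manually annotated training data and overcome the difficulty that traditional bird species identification frameworks have in recognizing multiple species in overlapping vocalizations. It provides an integrated solution for rapid species identification, syllable sequence analysis, and fine-grained population abundance analysis in bird diversity monitoring based on passive acoustic monitoring.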
Keyi Wu, Wenda Ruan, Difeng Zhou, Qingchen Chen, Chengyun Zhang, Xinyuan Pan, Shang Yu, Yang Liu, Rongbo Xiao (2023) Syllable clustering analysis-based passive acoustic monitoring technology and its application in bird monitoring. Biodiversity Science, 31, 22370. DOI: 10.17520/biods.2022370.
Fig. 1 Overall structure of deep unsupervised syllable clustering. Sound is collected by the pickup terminal, each bird syllable is located and extracted by the syllable extraction algorithm, and the syllable classifier is obtained by alternating between representation learning and the Dirichlet process mixture model (DPMM).
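1: Input: raw audio X1:N containing N samples, number of signal samples per frame T, offset between adjacent frames H, pitch detection threshold θpitch
2-16: [algorithm steps not recoverable from the source; only the keywords "for" (step 4), "end for" (step 12), "Mask" (step 15), and "return" (step 16) survive]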
Table 1 The multi-feature syllable detection and extraction algorithm proposed in this paper (multi-feature syllable extraction)
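The body of the algorithm in Table 1 is not available here, so the following is only a minimal sketch of how a frame-wise syllable mask could be built from the inputs the table does list (frame length T, frame offset H, pitch threshold θpitch). The autocorrelation-based periodicity score used as a pitch cue, the flatness threshold `theta_flat`, the `min_frames` filter, and all function names are assumptions made for illustration, not the authors' method.

```python
import numpy as np

def frame_signal(x, T, H):
    """Split x into overlapping frames of length T with offset H."""
    n_frames = 1 + (len(x) - T) // H
    idx = np.arange(T)[None, :] + H * np.arange(n_frames)[:, None]
    return x[idx]

def spectral_flatness(frames, eps=1e-12):
    """Geometric mean over arithmetic mean of the power spectrum, per frame."""
    windowed = frames * np.hanning(frames.shape[1])
    power = np.abs(np.fft.rfft(windowed, axis=1)) ** 2 + eps
    return np.exp(np.mean(np.log(power), axis=1)) / np.mean(power, axis=1)

def periodicity(frames, min_lag=2, eps=1e-12):
    """Largest normalised autocorrelation away from lag 0 (a crude pitch cue)."""
    T = frames.shape[1]
    centred = frames - frames.mean(axis=1, keepdims=True)
    spec = np.fft.rfft(centred, n=2 * T, axis=1)        # zero-padded FFT
    acf = np.fft.irfft(np.abs(spec) ** 2, axis=1)[:, :T]
    acf = acf / (acf[:, :1] + eps)                      # normalise by lag 0
    return acf[:, min_lag:T // 2].max(axis=1)

def extract_syllables(x, T=512, H=128, theta_pitch=0.5, theta_flat=0.2, min_frames=3):
    """Return (start, end) sample ranges of tonal, non-noisy regions."""
    frames = frame_signal(x, T, H)
    # Hypothetical rule: a frame is voiced if it is periodic AND not noise-like.
    mask = (periodicity(frames) > theta_pitch) & (spectral_flatness(frames) < theta_flat)
    padded = np.concatenate(([False], mask, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])   # rising/falling edges
    starts, ends = edges[::2], edges[1::2]              # ends are exclusive
    return [(s * H, (e - 1) * H + T) for s, e in zip(starts, ends) if e - s >= min_frames]

# Synthetic check: faint noise with one 3 kHz tone between samples 8000 and 12000.
sr = 22050
rng = np.random.default_rng(0)
x = 0.01 * rng.standard_normal(sr)
t = np.arange(4000) / sr
x[8000:12000] += np.sin(2 * np.pi * 3000 * t)
segments = extract_syllables(x)
print(segments)
```

Real recordings would likely need a proper pitch tracker and tuned thresholds; the synthetic tone only shows that the mask isolates a tonal burst from background noise.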
Fig. 2 Syllables of Lonchura striata (top) and of multiple bird species at Baiyun Mountain (bottom), detected and automatically labeled. The regions enclosed by white boxes in the spectrograms are syllables.
Fig. 3 Schematic diagram of bird song syllable clustering based on a variational encoder. The encoder projects the syllables into a low-dimensional latent space, and the decoder reconstructs the syllables using the mean and variance vectors.
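The caption mentions mean and variance vectors, which suggests a variational autoencoder-style latent space. As a hedged illustration, the sketch below shows the standard reparameterization step and the KL term of such a model in NumPy. The function names, the batch size, and the latent size of 64 (borrowed from the 64-dimensional features mentioned for Fig. 5) are assumptions; the encoder and decoder networks themselves are not reproduced.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), one value per sample."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=1)

rng = np.random.default_rng(0)
# Hypothetical encoder outputs for a batch of 4 syllables, 64 latent dimensions.
mu = np.zeros((4, 64))
log_var = np.zeros((4, 64))
z = reparameterize(mu, log_var, rng)     # latent codes passed to the decoder
kl = kl_to_standard_normal(mu, log_var)  # zero when the posterior equals the prior
```

In a full model the KL term would be added to a reconstruction loss, and the latent codes `z` would be the features handed to the clustering stage.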
Fig. 4 Syllable detection and annotation results after unsupervised clustering for the Lonchura striata repertoire (Bird 032312). White boxes mark the extent of each syllable; the number above each syllable is the pseudo-label assigned by unsupervised clustering, and the number below is the label assigned by an expert.
Fig. 5 Visualization of the 10 syllable clusters of Bird 032312 in the Lonchura striata repertoire; the 64-dimensional features are reduced to 3 and 2 dimensions using t-SNE.
Fig. 6 Clustering performance of traditional clustering algorithms and the proposed deep learning-based method on the Lonchura striata repertoire. The horizontal axis is the ID of each bird, the left vertical axis is the clustering accuracy, and the right vertical axis is the number of syllable types. The red line shows the number of syllable types; birds are ordered by increasing number of syllable types.
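The page does not say how clustering accuracy is computed. A common convention for unsupervised clustering, assumed here, is to find the one-to-one mapping between pseudo-labels and expert labels that maximizes agreement (Hungarian matching) and report the fraction of syllables that agree under that mapping. The function name and the toy labels below are illustrative only.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(true_labels, pseudo_labels):
    """Best-match accuracy between cluster pseudo-labels and expert labels."""
    true_ids, t = np.unique(true_labels, return_inverse=True)
    pred_ids, p = np.unique(pseudo_labels, return_inverse=True)
    # Contingency table: rows are clusters, columns are expert classes.
    counts = np.zeros((len(pred_ids), len(true_ids)), dtype=int)
    np.add.at(counts, (p, t), 1)
    # One-to-one assignment that maximises the number of matched syllables.
    rows, cols = linear_sum_assignment(counts, maximize=True)
    return counts[rows, cols].sum() / len(t)

expert = [0, 0, 0, 1, 1, 2, 2, 2]   # hypothetical expert syllable labels
pseudo = [5, 5, 1, 1, 1, 9, 9, 9]   # hypothetical cluster pseudo-labels
acc = clustering_accuracy(expert, pseudo)
print(acc)  # 0.875: seven of eight syllables agree under the best mapping
```

Because the matching is one-to-one, a method that infers more clusters than there are expert classes is penalized for the extra clusters, which is one reason accuracy can fall as the number of syllable types grows.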
Fig. 7 Number (left) and types (right) of bird syllables at a monitoring site in Baiyun Mountain Park. The number and types of bird syllables detected from April 1 to May 9, 2022 are counted every 6 days. The morning period is 5:00-9:00 and the evening period is 16:00-21:00.