Biodiversity Science, 2026, Vol. 34, Issue (4): 25228.  DOI: 10.17520/biods2025228  CSTR: 32101.14.biods.2025228

Classification of low-frequency marine mammal sounds based on improved spectral subtraction and stacking ensemble learning

曹莉凌, 金朝阳, 张铮, 曹守启*   

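  1. College of Engineering Science and Technology, Shanghai Ocean University, 201306
  • Received: 2025-06-17 Revised: 2025-12-25 Accepted: 2026-04-20 Published: 2026-04-20
  • Corresponding author: Shouqi Cao (曹守启)
  • Funding:
    National Key Research and Development Program of China (2023YFD2401302)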

Low-frequency marine mammal sound classification using improved spectral subtraction and stacking ensemble learning

Liling Cao, Zhaoyang Jin, Zheng Zhang, Shouqi Cao*   

  1. College of Engineering Science and Technology, Shanghai Ocean University
  • Received: 2025-06-17 Revised: 2025-12-25 Accepted: 2026-04-20 Online: 2026-04-20
  • Contact: Shouqi Cao

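Abstract: The rapid growth of China's distant-water fisheries has had a serious negative impact on the marine ecological environment and on the survival of marine mammals. Marine mammal sound recognition can help monitor changes in populations and habitats, and it plays an important role in monitoring, conservation, and ecological research. Because mammal sound signals in the marine environment are easily corrupted by background noise, which lowers feature extraction and classification accuracy, this paper proposes a classification method based on improved spectral subtraction and Stacking ensemble learning. First, variational mode decomposition (VMD) decomposes the noisy audio into frequency bands, the Pearson correlation coefficient is used to screen out the noise modes, and spectral subtraction is then applied for targeted noise reduction. Second, for feature extraction, a feature fusion scheme combines time-domain and frequency-domain statistical features of the audio with deep features extracted from Mel spectrograms by a convolutional neural network; linear discriminant analysis (LDA) is used for dimensionality reduction, and the fused features form a multidimensional feature vector carrying comprehensive discriminative information. Finally, a Stacking ensemble learning model classifies the sounds, fusing the predictions of five base learners (SVM, KNN, XGBoost, MLP, and GNB) through a LightGBM meta-learner. Experimental results show that, on the low-frequency marine mammal sound classification task, the method improves accuracy by an average of 8.04% over traditional machine learning methods.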

Keywords: marine mammal sound recognition, stacking ensemble learning, spectral subtraction, variational mode decomposition, acoustic recognition

Abstract

Aims: The rapid development of China's distant-water fisheries has exerted significant negative impacts on the marine ecological environment and the survival of marine mammals. Acoustic recognition of marine mammals can facilitate monitoring of their population dynamics and habitat changes, playing a crucial role in ecological monitoring, conservation, and research. To address the challenges of background noise interference and low accuracy in feature extraction and classification of marine mammal vocalizations, this paper proposes a classification method based on an improved spectral subtraction technique combined with Stacking ensemble learning. 

Methods: (1) Variational mode decomposition (VMD) decomposes the noisy audio signal into modes occupying separate frequency bands. Noise-dominant modes are identified using the Pearson correlation coefficient and then suppressed through targeted spectral subtraction. (2) For feature extraction, a fusion strategy combines time-domain and frequency-domain statistical features with deep representations extracted from Mel spectrograms by a convolutional neural network (CNN). Linear discriminant analysis (LDA) is applied to reduce dimensionality and enhance class separability, producing a compact and discriminative feature set. (3) In the classification phase, a Stacking ensemble model integrates five base learners (SVM, KNN, XGBoost, MLP, and GNB), whose predictions are aggregated by a LightGBM meta-learner.
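The denoising step in (1) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The VMD stage is skipped, and two synthetic arrays stand in for its modes. The rule that a mode weakly correlated with the input is noise-dominant (threshold 0.3) is an assumption, since the abstract does not state the criterion. The helper names `pearson_screen` and `spectral_subtract`, the non-overlapping 256-sample frames, the spectral floor, and the noise estimate taken from a noise-only lead-in are likewise hypothetical choices made for illustration.

```python
import numpy as np


def pearson_screen(signal, modes, threshold=0.3):
    """Return indices of modes whose |Pearson r| with the signal is below threshold.

    Assumption: weakly correlated modes are treated as noise-dominant.
    """
    noisy = []
    for k, mode in enumerate(modes):
        r = np.corrcoef(signal, mode)[0, 1]
        if abs(r) < threshold:
            noisy.append(k)
    return noisy


def spectral_subtract(x, noise_mag, frame=256, floor=0.02):
    """Subtract a noise magnitude spectrum frame by frame, keeping the noisy phase.

    Uses non-overlapping rectangular frames for brevity; samples after the
    last full frame are passed through unchanged.
    """
    out = np.array(x, dtype=float)
    for start in range(0, len(x) - frame + 1, frame):
        spec = np.fft.rfft(x[start:start + frame])
        mag = np.abs(spec)
        # Never drop below a small fraction of the original magnitude.
        clean = np.maximum(mag - noise_mag, floor * mag)
        out[start:start + frame] = np.fft.irfft(
            clean * np.exp(1j * np.angle(spec)), n=frame
        )
    return out


# Synthetic example: a 50 Hz tone that starts after a noise-only lead-in.
rng = np.random.default_rng(0)
fs, n = 2000, 4096
t = np.arange(n) / fs
tone = np.sin(2 * np.pi * 50 * t)
tone[:1024] = 0.0
noise = 0.1 * rng.standard_normal(n)
signal = tone + noise
modes = [tone, noise]  # stand-ins for VMD output, not a real decomposition

noisy_idx = pearson_screen(signal, modes)

# Noise estimate: mean magnitude spectrum over the noise-only lead-in frames.
lead = noise[:1024].reshape(-1, 256)
noise_mag = np.abs(np.fft.rfft(lead, axis=1)).mean(axis=0)

# Apply spectral subtraction only to the modes flagged as noise-dominant.
cleaned = [
    spectral_subtract(m, noise_mag) if k in noisy_idx else m
    for k, m in enumerate(modes)
]
denoised = np.sum(cleaned, axis=0)
```

Restricting the subtraction to the flagged modes leaves the signal-dominant modes untouched, which is one plausible reading of "targeted" here. A practical version would probably use overlapping windowed frames with overlap-add rather than the rectangular frames used above.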

Results: Experimental results demonstrate that the proposed method significantly enhances classification performance in low-frequency marine mammal sound recognition. The improved spectral subtraction effectively reduces background noise while preserving critical acoustic features. The fusion of Mel-spectrogram deep features with statistical features, followed by LDA dimensionality reduction, produces highly discriminative feature vectors. The Stacking ensemble model, integrating five diverse base learners with LightGBM as the meta-learner, achieves a classification accuracy of 94.78%, surpassing the best-performing individual model by 5.12% and the worst-performing by 9.89%. Additionally, the model exhibits robust performance across imbalanced classes, maintaining high precision and recall even for underrepresented species. 
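The Stacking arrangement described above, five base learners feeding one meta-learner, can be sketched with scikit-learn's `StackingClassifier`. This is an illustrative sketch, not the authors' code. The data are synthetic and stand in for the fused feature vectors. To avoid extra dependencies, scikit-learn's `GradientBoostingClassifier` replaces both XGBoost (as a base learner) and LightGBM (as the meta-learner). All hyperparameters are arbitrary assumptions, and the resulting accuracy says nothing about the figures reported in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the fused feature vectors (not the paper's data).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Five base learners; scale-sensitive models get their own StandardScaler.
base_learners = [
    ("svm", make_pipeline(StandardScaler(), SVC(random_state=0))),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
    # Stand-in for XGBoost.
    ("gbdt", GradientBoostingClassifier(n_estimators=50, random_state=0)),
    ("mlp", make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
    )),
    ("gnb", GaussianNB()),
]

# The meta-learner is trained on out-of-fold predictions of the base learners
# (cv=5), so it never sees predictions made on a learner's own training rows.
stack = StackingClassifier(
    estimators=base_learners,
    # Stand-in for LightGBM.
    final_estimator=GradientBoostingClassifier(n_estimators=50, random_state=0),
    cv=5,
)
stack.fit(X_train, y_train)
accuracy = stack.score(X_test, y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.3f}")
```

If the `xgboost` and `lightgbm` packages are installed, their scikit-learn-compatible classifiers could be substituted for the two stand-ins without changing the rest of the sketch.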

Conclusion: This study presents an effective framework for low-frequency marine mammal acoustic classification under complex oceanic noise conditions. By integrating VMD-based spectral subtraction for noise suppression, multi-domain feature extraction, and a Stacking ensemble model, the proposed method achieves superior classification accuracy and generalization ability. The results validate that combining domain knowledge in signal processing with ensemble learning strategies can significantly improve the robustness and precision of marine bioacoustic monitoring systems. This approach holds promise for real-time ecological surveillance and conservation applications in noisy marine environments.

Key words: marine mammal sound recognition, stacking ensemble learning, spectral subtraction, variational mode decomposition (VMD), acoustic recognition