Biodiv Sci ›› 2026, Vol. 34 ›› Issue (4): 25228. DOI: 10.17520/biods2025228 cstr: 32101.14.biods.2025228
Liling Cao, Zhaoyang Jin, Zheng Zhang, Shouqi Cao*
Abstract:
Aims: The rapid development of China's distant-water fisheries has exerted significant negative impacts on the marine ecological environment and the survival of marine mammals. Acoustic recognition of marine mammals can facilitate monitoring of their population dynamics and habitat changes, playing a crucial role in ecological monitoring, conservation, and research. To address the challenges of background noise interference and low accuracy in feature extraction and classification of marine mammal vocalizations, this paper proposes a classification method based on an improved spectral subtraction technique combined with Stacking ensemble learning.
Methods: (1) Variational Mode Decomposition (VMD) is used to decompose noisy audio signals into multiple band-limited modes. Noise-dominant modes are identified using the Pearson correlation coefficient and then suppressed through targeted spectral subtraction. (2) For feature extraction, a fusion strategy combines time-domain and frequency-domain statistical features with deep representations extracted from Mel spectrograms by a convolutional neural network (CNN). Linear Discriminant Analysis (LDA) is then applied to enhance class separability and reduce dimensionality, producing a compact and discriminative feature set. (3) In the classification phase, a Stacking ensemble model is built from five base learners: support vector machine (SVM), k-nearest neighbors (KNN), XGBoost, multilayer perceptron (MLP), and Gaussian naive Bayes (GNB). Their predictions are aggregated by a LightGBM meta-learner.
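As a rough illustration of step (1), the sketch below screens modes by their Pearson correlation with the noisy signal and then applies a basic magnitude spectral subtraction with a spectral floor. It is a minimal sketch under stated assumptions, not the authors' implementation: the modes are assumed to come from some VMD routine (not implemented here), and the function names, the rule that low correlation marks a noise-dominant mode, the threshold of 0.3, and the parameters `alpha` and `floor` are all hypothetical choices made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant sequence carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def split_modes(modes, noisy, threshold=0.3):
    """Return (kept, noise_dominant) lists of mode indices.

    ASSUMPTION: a mode whose |correlation| with the noisy signal falls
    below `threshold` is treated as noise-dominant. The abstract does not
    state the actual criterion or threshold.
    """
    kept, noise_dominant = [], []
    for i, mode in enumerate(modes):
        target = kept if abs(pearson(mode, noisy)) >= threshold else noise_dominant
        target.append(i)
    return kept, noise_dominant


def spectral_subtract(mag, noise_mag, alpha=1.0, floor=0.02):
    """Basic magnitude spectral subtraction for one frame.

    Subtracts `alpha` times the noise magnitude estimate from each bin and
    clamps the result to `floor` times the original magnitude, so no bin
    goes negative. ASSUMPTION: `alpha` and `floor` are illustrative values.
    """
    return [max(m - alpha * n, floor * m) for m, n in zip(mag, noise_mag)]


# Synthetic stand-in for VMD output: one strong low-frequency mode and one
# weak high-frequency mode (hypothetical data, not from the paper).
N = 200
low = [math.sin(2 * math.pi * 5 * t / N) for t in range(N)]
high = [0.1 * math.sin(2 * math.pi * 40 * t / N) for t in range(N)]
noisy = [a + b for a, b in zip(low, high)]

kept, noise_dominant = split_modes([low, high], noisy)

# One frame of magnitudes for a noise-dominant mode and its noise estimate.
cleaned = spectral_subtract([1.0, 0.5, 0.1], [0.2, 0.2, 0.2])
```

On this synthetic input the strong low-frequency mode is kept and the weak high-frequency mode is flagged as noise-dominant. In a fuller pipeline the subtraction would presumably be applied frame by frame to the short-time spectrum of each flagged mode before the modes are recombined.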
Results: Experimental results show that the proposed method improves classification performance in low-frequency marine mammal sound recognition. The improved spectral subtraction reduces background noise while preserving critical acoustic features. Fusing Mel-spectrogram deep features with statistical features, followed by LDA dimensionality reduction, produces highly discriminative feature vectors. The Stacking ensemble model achieves a classification accuracy of 94.78%, surpassing the best-performing individual model by 5.12% and the worst-performing by 9.89%. The model also performs robustly across imbalanced classes, maintaining high precision and recall even for underrepresented species.
Conclusion: This study presents an effective framework for low-frequency marine mammal acoustic classification under complex oceanic noise conditions. By integrating VMD-based spectral subtraction for noise suppression, multi-domain feature extraction, and a Stacking ensemble model, the proposed method achieves superior classification accuracy and generalization ability. The results validate that combining domain knowledge in signal processing with ensemble learning strategies can significantly improve the robustness and precision of marine bioacoustic monitoring systems. This approach holds promise for real-time ecological surveillance and conservation applications in noisy marine environments.
Key words: marine mammal sound recognition, stacking ensemble learning, spectral subtraction, variational mode decomposition (VMD), acoustic recognition
Liling Cao, Zhaoyang Jin, Zheng Zhang, Shouqi Cao. Low-frequency marine mammal sound classification using improved spectral subtraction and stacking ensemble learning[J]. Biodiv Sci, 2026, 34(4): 25228.
URL: https://www.biodiversity-science.net/EN/10.17520/biods2025228
https://www.biodiversity-science.net/EN/Y2026/V34/I4/25228