Biodiversity Science ›› 2026, Vol. 34 ›› Issue (1): 25210.  DOI: 10.17520/biods.2025210  cstr: 32101.14.biods.2025210

• Special Topic: Methods for Ecological Data Analysis •

  • Funding: Effects of sand-fixation duration and precipitation change on natural regeneration of the Artemisia ordosica soil seed bank (3230131058)

Interpretable machine learning and its applications in ecology

Yafei Shi1*, Furong Niu2, Xiaomin Huang3, Xing Hong1, Xiangwen Gong4, Yanli Wang2, Dong Lin1, Xiaoni Liu1   

    1 Pratacultural College, Gansu Agricultural University, Lanzhou 730070, China 

    2 College of Forestry, Gansu Agricultural University, Lanzhou 730070, China 

    3 Agricultural College, Yangzhou University, Yangzhou, Jiangsu 225009, China 

    4 School of Geographical Sciences, Southwest University, Chongqing 400715, China

  • Received:2025-06-05 Revised:2025-09-02 Online:2026-01-20 Published:2026-01-21
  • Contact: Yafei Shi


Abstract

Aims: The increasing adoption of machine learning in ecological research has enabled the modeling of complex, nonlinear ecological patterns. However, the “black-box” nature of many machine learning models limits their interpretability, hindering the extraction of ecological insights. This review aims to introduce the core concepts, methods, and practical tools of interpretable machine learning (IML), and to demonstrate how these techniques can enhance ecological understanding from predictive models. 

Methods: We first clarify the key distinctions between white-box and black-box models, between global and local interpretability, and between intrinsically interpretable and post-hoc interpretation approaches. Using a simulated dataset representing plant diversity and environmental variables (e.g., elevation, temperature, soil moisture), we apply both white-box models (e.g., linear regression, decision trees) and black-box models (e.g., random forest) to illustrate major interpretability techniques, including regression coefficients, permutation importance, partial dependence plots (PDP), accumulated local effects (ALE), Shapley additive explanations (SHAP), and local interpretable model-agnostic explanations (LIME). 
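The workflow described above can be sketched with a minimal, numpy-only example: fitting a white-box linear model to a simulated plant-diversity dataset and then computing model-agnostic permutation feature importance as a post-hoc explanation. The variable names, effect sizes, and noise level here are illustrative assumptions, not the study's actual data or code.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
elevation = rng.uniform(500, 3000, n)     # m
temperature = rng.normal(15, 3, n)        # degC
soil_moisture = rng.uniform(5, 40, n)     # %
X = np.column_stack([elevation, temperature, soil_moisture])
# Simulated species richness: soil moisture matters most, temperature least
y = 0.002 * elevation + 0.1 * temperature + 0.8 * soil_moisture + rng.normal(0, 2, n)

# White-box model: ordinary least squares; the coefficients ARE the explanation
X1 = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
predict = lambda M: np.column_stack([np.ones(len(M)), M]) @ beta

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot

# Post-hoc, model-agnostic explanation: permutation feature importance,
# i.e. the drop in R^2 when one feature's values are shuffled
base_r2 = r2(y, predict(X))
importance = []
for j in range(X.shape[1]):
    drops = []
    for _ in range(20):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])
        drops.append(base_r2 - r2(y, predict(Xp)))
    importance.append(np.mean(drops))

for name, b, imp in zip(["elevation", "temperature", "soil_moisture"],
                        beta[1:], importance):
    print(f"{name}: coef={b:.4f}, perm_importance={imp:.3f}")
```

Because permutation importance only needs a `predict` function, the same loop applies unchanged to a black-box model such as a random forest.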

Results: White-box models offer direct and transparent interpretability through their model structure, while black-box models require additional tools to derive explanations. Our case study shows that both model types can yield consistent insights into variable importance and ecological relationships. Furthermore, methods such as ALE and SHAP effectively address common limitations of conventional approaches such as PDP by accounting for feature interactions and dependencies. 
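One property that makes SHAP well suited to such comparisons is additivity: a prediction's SHAP values sum exactly to the difference between the model's output and its average output. For a linear model with independent features this has a closed form, coef_i * (x_i - mean(x_i)), which the following numpy-only sketch verifies without the `shap` library (the data and coefficients are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # three standardized environmental variables
coef = np.array([1.5, -0.8, 0.3])      # assumed linear effects on richness
intercept = 10.0
predict = lambda M: M @ coef + intercept

baseline = predict(X).mean()           # expected model output E[f(X)]
x = X[0]                               # one observation to explain
shap_values = coef * (x - X.mean(axis=0))  # exact SHAP values for a linear model

# Additivity (efficiency): SHAP values sum to prediction minus baseline
print(np.isclose(shap_values.sum(), predict(x) - baseline))  # True
```

For black-box models no such closed form exists, which is why practical SHAP implementations approximate the Shapley values by sampling or by exploiting model structure (e.g., for trees).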

Conclusion: IML provides a valuable toolkit for improving model transparency and interpretability in ecological research. It serves as a crucial complement to traditional statistical modeling, enabling researchers to extract meaningful ecological interpretations from complex models. As ecological data and modeling complexity continue to grow, the integration of IML techniques will become increasingly important for hypothesis generation and ecological decision-making.

Key words: machine learning, ecological interpretation, random forest, black-box model, plant diversity