Biodiversity Science



Interpretable Machine Learning and Its Applications in Ecology

Yafei Shi1*, Furong Niu2, Xiaomin Huang3, Xing Hong1, Xiangwen Gong4, Yanli Wang2, Dong Lin1, Xiaoni Liu1   

  1. Pratacultural College, Gansu Agricultural University, Lanzhou 730070, China
  2. College of Forestry, Gansu Agricultural University, Lanzhou 730070, China
  3. Agricultural College, Yangzhou University, Yangzhou, Jiangsu 225009, China
  4. College of Geographical Sciences, Southwest University, Chongqing 400715, China

  • Received: 2025-06-05    Revised: 2025-09-02
  • Contact: Yafei Shi
  • Funding: Effects of sand-fixation duration and precipitation change on natural regeneration from the soil seed bank of Artemisia ordosica (3230131058)

Abstract: In recent years, machine learning has been applied ever more widely in ecology, showing particular strength in modeling complex, nonlinear data. However, the "black-box" nature of machine learning makes it difficult to obtain clear interpretations of model results, which limits its range of application. Interpretable machine learning (IML) has emerged to address this opacity, aiming to improve model transparency and enhance the interpretability of results. This paper systematically reviews the basic concepts of IML, including white-box versus black-box models, global versus local interpretation, and intrinsic versus post-hoc explanation. Using a case dataset, we apply linear regression, decision tree, and random forest models to demonstrate the implementation and ecological interpretive power of mainstream IML methods, including regression coefficients, feature importance ranking, partial dependence plots, accumulated local effects plots, SHapley Additive exPlanations (SHAP), and Local Interpretable Model-agnostic Explanations (LIME). We show that, although the interpretation of white-box models also falls within the scope of IML, IML in its current form is mainly a collection of post-hoc explanation methods for black-box models. Moreover, different methods have distinct strengths in their level of interpretation, the models they apply to, and their visualization. IML can, to a degree, bridge the gap between the predictive performance of complex models and the need for ecological interpretation, but methods must be chosen according to the data at hand and the research question. This paper offers ecologists an operational IML analysis framework and argues that IML should serve as an important complement to mainstream statistical modeling, with broad application prospects in future ecological research.
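As a concrete illustration of the intrinsic (white-box) interpretation mentioned above, the sketch below fits a linear regression to simulated data and reads off its coefficients. The predictor names (elevation, temperature, soil_moisture) and the data-generating process are assumptions for illustration only, not the study's case data.

    # Minimal sketch (simulated, hypothetical data): a linear model is interpretable
    # by its structure; the fitted coefficients are the explanation.
    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(42)
    n = 500
    X = pd.DataFrame({
        "elevation": rng.uniform(1000, 3500, n),   # m
        "temperature": rng.normal(8, 3, n),        # degrees C
        "soil_moisture": rng.uniform(5, 40, n),    # %
    })
    # Assumed response: species richness driven by the three predictors plus noise
    y = (20 + 0.004 * X["elevation"] + 1.5 * X["temperature"]
         + 0.3 * X["soil_moisture"] + rng.normal(0, 2, n))

    lm = LinearRegression().fit(X, y)
    for name, coef in zip(X.columns, lm.coef_):
        print(f"{name}: {coef:+.3f}")  # change in richness per unit change, others held constant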


Abstract

Aims: The increasing adoption of machine learning (ML) in ecological research has enabled the modeling of complex, nonlinear ecological patterns. However, the "black-box" nature of many ML models limits their interpretability, hindering the extraction of ecological insights. This review aims to introduce the core concepts, methods, and practical tools of interpretable machine learning (IML), and to demonstrate how these techniques can enhance ecological understanding from predictive models. 

Methods: We first clarify the key distinctions between white-box and black-box models, between global and local interpretability, and between intrinsic and post-hoc explanation frameworks. Using a simulated dataset representing plant diversity and environmental variables (e.g., elevation, temperature, soil moisture), we apply both white-box models (e.g., linear regression, decision trees) and black-box models (e.g., random forests) to illustrate major interpretability techniques, including regression coefficients, permutation importance, partial dependence plots (PDP), accumulated local effects (ALE) plots, SHapley Additive exPlanations (SHAP) values, and Local Interpretable Model-agnostic Explanations (LIME).
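A hedged sketch of the black-box part of this workflow is shown below, using scikit-learn; the simulated predictors and response mirror the hypothetical example above and are not the authors' dataset. The random forest is fitted first, and permutation importance and partial dependence plots are then derived post hoc.

    # Minimal sketch (simulated, hypothetical data): post-hoc global explanations
    # for a random forest via permutation importance and partial dependence plots.
    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.inspection import PartialDependenceDisplay, permutation_importance

    rng = np.random.default_rng(0)
    n = 500
    X = pd.DataFrame({"elevation": rng.uniform(1000, 3500, n),
                      "temperature": rng.normal(8, 3, n),
                      "soil_moisture": rng.uniform(5, 40, n)})
    y = (20 + 0.004 * X["elevation"] + 1.5 * X["temperature"]
         + 0.3 * X["soil_moisture"] + rng.normal(0, 2, n))

    rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)

    # Permutation importance: performance drop when one feature is shuffled (global, model-agnostic)
    imp = permutation_importance(rf, X, y, n_repeats=20, random_state=0)
    for name, mean in zip(X.columns, imp.importances_mean):
        print(f"{name}: {mean:.3f}")

    # Partial dependence of the predicted response on each predictor (global)
    PartialDependenceDisplay.from_estimator(rf, X, features=list(X.columns))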

Results: White-box models offer direct and transparent interpretability through their model structure, while black-box models require additional tools to derive explanations. Our case study shows that both model types can yield consistent insights about variable importance and ecological relationships. Furthermore, methods such as ALE and SHAP effectively address common limitations in conventional approaches like PDP by accounting for feature interactions and dependencies. 
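As an illustration of the SHAP decomposition referred to above, the sketch below continues from the random forest (rf) and predictors (X) of the previous example and uses the shap package; the choice of package is an assumption, since the text does not prescribe a specific implementation. ALE also has Python implementations (e.g., the alibi and PyALE packages), which are not shown here.

    # Continues from the previous sketch (rf, X). SHAP attributes each prediction
    # additively to the features; TreeExplainer computes exact values for tree ensembles.
    import shap

    explainer = shap.TreeExplainer(rf)
    shap_values = explainer.shap_values(X)   # array of shape (n_samples, n_features)

    # Global view: mean absolute SHAP value per predictor
    shap.summary_plot(shap_values, X, plot_type="bar")

    # Local view: additive contributions for a single observation
    shap.force_plot(explainer.expected_value, shap_values[0], X.iloc[0], matplotlib=True)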

Conclusion: IML provides a valuable toolkit for improving model transparency and interpretability in ecological research. It serves as a crucial complement to traditional statistical modeling, enabling researchers to extract meaningful ecological interpretations from complex models. As ecological data and modeling complexity continue to grow, the integration of IML techniques will become increasingly important for hypothesis generation and ecological decision-making.

Key words: machine learning, ecological interpretation, random forest, black-box model, plant diversity