Biodiversity Science



Interpretable Machine Learning and Its Applications in Ecology

Yafei Shi1*, Furong Niu2, Xiaomin Huang3, Xing Hong1, Xiangwen Gong4, Yanli Wang2, Dong Lin1, Xiaoni Liu1   

  1. Pratacultural College, Gansu Agricultural University, Lanzhou 730070, China
  2. College of Forestry, Gansu Agricultural University, Lanzhou 730070, China
  3. Agricultural College, Yangzhou University, Yangzhou, Jiangsu 225009, China
  4. College of Geographical Sciences, Southwest University, Chongqing 400715, China

  • Received: 2025-06-05    Revised: 2025-09-02
  • Contact: Yafei Shi
  • Funding: Effects of sand-fixation duration and precipitation change on natural regeneration from the soil seed bank of Artemisia ordosica (3230131058)

Abstract: In recent years, machine learning has been applied ever more widely in ecology, showing particular strength in modeling complex, nonlinear data. However, the "black-box" nature of machine learning makes it difficult to obtain clear interpretations of model results, which limits its range of application. Interpretable machine learning (IML) has emerged to address this opacity, aiming to improve model transparency and enhance the interpretability of results. This paper systematically reviews the basic concepts of IML, including white-box versus black-box models, global versus local interpretation, and intrinsic versus post-hoc explanation. Using a case dataset, we apply linear regression, decision tree, and random forest models to demonstrate the implementation and ecological interpretive power of mainstream IML methods, including regression coefficients, feature importance ranking, partial dependence plots, accumulated local effects plots, SHapley Additive exPlanations (SHAP), and Local Interpretable Model-agnostic Explanations (LIME). We show that, although the interpretation of white-box models also falls within the scope of IML, IML in its current form is mainly a collection of post-hoc explanation methods for black-box models. Moreover, different methods have distinct strengths in their level of interpretation, the models they apply to, and their visualization. IML can, to a degree, bridge the gap between the predictive performance of complex models and the need for ecological interpretation, but methods must be chosen according to the data at hand and the research question. This paper offers ecologists an operational IML analysis framework and argues that IML should serve as an important complement to mainstream statistical modeling, with broad application prospects in future ecological research.
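As a concrete illustration of the intrinsic (white-box) interpretation mentioned above, the sketch below fits a linear regression to simulated data and reads off its coefficients. The predictor names (elevation, temperature, soil_moisture) and the data-generating process are assumptions for illustration only, not the study's case data.

    # Minimal sketch (simulated, hypothetical data): a linear model is interpretable
    # by its structure; the fitted coefficients are the explanation.
    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(42)
    n = 500
    X = pd.DataFrame({
        "elevation": rng.uniform(1000, 3500, n),   # m
        "temperature": rng.normal(8, 3, n),        # degrees C
        "soil_moisture": rng.uniform(5, 40, n),    # %
    })
    # Assumed response: species richness driven by the three predictors plus noise
    y = (20 + 0.004 * X["elevation"] + 1.5 * X["temperature"]
         + 0.3 * X["soil_moisture"] + rng.normal(0, 2, n))

    lm = LinearRegression().fit(X, y)
    for name, coef in zip(X.columns, lm.coef_):
        print(f"{name}: {coef:+.3f}")  # change in richness per unit change, others held constant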


Abstract

Aims: The increasing adoption of machine learning (ML) in ecological research has enabled the modeling of complex, nonlinear ecological patterns. However, the "black-box" nature of many ML models limits their interpretability, hindering the extraction of ecological insights. This review aims to introduce the core concepts, methods, and practical tools of interpretable machine learning (IML), and to demonstrate how these techniques can enhance ecological understanding from predictive models. 

Methods: We first clarify the key distinctions between white-box and black-box models, between global and local interpretability, and between intrinsic and post-hoc explanation frameworks. Using a simulated dataset representing plant diversity and environmental variables (e.g., elevation, temperature, soil moisture), we apply both white-box models (e.g., linear regression, decision trees) and black-box models (e.g., random forests) to illustrate major interpretability techniques, including regression coefficients, permutation importance, partial dependence plots (PDP), accumulated local effects (ALE) plots, SHapley Additive exPlanations (SHAP) values, and Local Interpretable Model-agnostic Explanations (LIME).
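A hedged sketch of the black-box part of this workflow is shown below, using scikit-learn; the simulated predictors and response mirror the hypothetical example above and are not the authors' dataset. The random forest is fitted first, and permutation importance and partial dependence plots are then derived post hoc.

    # Minimal sketch (simulated, hypothetical data): post-hoc global explanations
    # for a random forest via permutation importance and partial dependence plots.
    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.inspection import PartialDependenceDisplay, permutation_importance

    rng = np.random.default_rng(0)
    n = 500
    X = pd.DataFrame({"elevation": rng.uniform(1000, 3500, n),
                      "temperature": rng.normal(8, 3, n),
                      "soil_moisture": rng.uniform(5, 40, n)})
    y = (20 + 0.004 * X["elevation"] + 1.5 * X["temperature"]
         + 0.3 * X["soil_moisture"] + rng.normal(0, 2, n))

    rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)

    # Permutation importance: performance drop when one feature is shuffled (global, model-agnostic)
    imp = permutation_importance(rf, X, y, n_repeats=20, random_state=0)
    for name, mean in zip(X.columns, imp.importances_mean):
        print(f"{name}: {mean:.3f}")

    # Partial dependence of the predicted response on each predictor (global)
    PartialDependenceDisplay.from_estimator(rf, X, features=list(X.columns))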

Results: White-box models offer direct and transparent interpretability through their model structure, while black-box models require additional tools to derive explanations. Our case study shows that both model types can yield consistent insights about variable importance and ecological relationships. Furthermore, methods such as ALE and SHAP effectively address common limitations in conventional approaches like PDP by accounting for feature interactions and dependencies. 
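As an illustration of the SHAP decomposition referred to above, the sketch below continues from the random forest (rf) and predictors (X) of the previous example and uses the shap package; the choice of package is an assumption, since the text does not prescribe a specific implementation. ALE also has Python implementations (e.g., the alibi and PyALE packages), which are not shown here.

    # Continues from the previous sketch (rf, X). SHAP attributes each prediction
    # additively to the features; TreeExplainer computes exact values for tree ensembles.
    import shap

    explainer = shap.TreeExplainer(rf)
    shap_values = explainer.shap_values(X)   # array of shape (n_samples, n_features)

    # Global view: mean absolute SHAP value per predictor
    shap.summary_plot(shap_values, X, plot_type="bar")

    # Local view: additive contributions for a single observation
    shap.force_plot(explainer.expected_value, shap_values[0], X.iloc[0], matplotlib=True)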

Conclusion: IML provides a valuable toolkit for improving model transparency and interpretability in ecological research. It serves as a crucial complement to traditional statistical modeling, enabling researchers to extract meaningful ecological interpretations from complex models. As ecological data and modeling complexity continue to grow, the integration of IML techniques will become increasingly important for hypothesis generation and ecological decision-making.

Key words: machine learning, ecological interpretation, random forest, black-box model, plant diversity