Biodiversity Science, 2014, Vol. 22, Issue 3: 277-284. doi: 10.3724/SP.J.1003.2014.13267

Special topic: Biodiversity Informatics (II)


Process-oriented ecological modeling approach and scientific workflow system

Huijie Qiao, Congtian Lin, Jiangning Wang, Liqiang Ji*

  1. Institute of Zoology, Chinese Academy of Sciences, Beijing 100101
  • Received: 2013-12-26 Accepted: 2014-05-16 Online: 2014-05-20
  • Corresponding author: Liqiang Ji, E-mail: ji@ioz.ac.cn
  • Funding: National Natural Science Foundation of China (31100390)

A scientific workflow system is a workflow management system with three roles. It takes a series of specially designed data analysis and management steps, organizes them according to a defined logic, and executes them in a given runtime environment to carry out a specific scientific study. The vision for scientific workflow systems is a single easy-to-use platform. On it, scientists around the world could exchange ideas, collaborate on designing global-scale experiments, and share data sets, experimental procedures, and results. Each scientist can independently create and execute their own workflows and view the results in real time, and other scientists can easily share and reuse those workflows. This paper uses two projects, the Kepler system and the Biodiversity Virtual e-Laboratory (BioVeL), as case studies to introduce the history, background, existing projects, and applications of scientific workflows. An ecological niche modeling workflow, available in both the Kepler system and BioVeL, illustrates the process and features of scientific workflows. Finally, based on an analysis of existing scientific workflows, we offer our views on their future directions and current problems.

Key words: scientific workflow, ecological modeling, ecological niche model, Kepler system, Biodiversity Virtual e-Laboratory (BioVeL)
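The abstract describes a scientific workflow as data analysis and management steps arranged in a defined logic and executed in a runtime environment. The following Python sketch illustrates that idea under simplifying assumptions. The `Workflow` class, the step functions and the sample records are all hypothetical; they are not the API of the Kepler system or BioVeL. Each step reads a shared context and returns new values. The run also records which step produced which outputs, as a rough stand-in for the provenance recording that workflow systems offer.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

# A step receives the shared context and returns a dict of new or updated values.
Step = Callable[[Dict[str, Any]], Dict[str, Any]]


@dataclass
class Workflow:
    """Hypothetical minimal workflow: named steps executed in a fixed order."""
    name: str
    steps: List[Tuple[str, Step]] = field(default_factory=list)

    def add_step(self, step_name: str, func: Step) -> "Workflow":
        self.steps.append((step_name, func))
        return self  # returning self allows chained definitions

    def run(self, inputs: Dict[str, Any]) -> Tuple[Dict[str, Any], List[Tuple[str, List[str]]]]:
        context = dict(inputs)  # copy so the caller's inputs are not mutated
        provenance = []  # which step produced which keys, in execution order
        for step_name, func in self.steps:
            outputs = func(context)
            context.update(outputs)
            provenance.append((step_name, sorted(outputs)))
        return context, provenance


def load_records(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical (longitude, latitude) records; None marks a missing coordinate.
    return {"records": [(116.4, 39.9), (None, 40.1), (121.5, 31.2)]}


def drop_incomplete(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # Keep only records whose coordinates are both present.
    return {"records": [r for r in ctx["records"] if None not in r]}


def count_records(ctx: Dict[str, Any]) -> Dict[str, Any]:
    return {"n_records": len(ctx["records"])}


workflow = (
    Workflow("occurrence-cleaning")
    .add_step("load", load_records)
    .add_step("clean", drop_incomplete)
    .add_step("count", count_records)
)
result, provenance = workflow.run({})
print(result["n_records"])  # 2
print(provenance)
```

The steps are plain functions, so a second workflow could reuse `drop_incomplete` unchanged. This loosely mirrors the reuse the abstract describes.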

Fig. 1 Organizational structure and working principle of BioVeL

Table 1 Comparison of the features of two existing scientific workflow systems

Feature | Kepler system | BioVeL
Operating mode | Stand-alone | Online
Components can be freely combined | Yes | No
Workflows can be reused | Yes | Yes
Complexity | Complex and varied; components can be freely combined | Depends on the services provided
Way of sharing | Via files | Via online services
Ease of use | Complex | Simple
Number of existing workflows | Many | Limited
Number of users | Many | Unknown (still in testing)
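Table 1 notes that Kepler workflows are shared as files, while BioVeL provides them as online services. A hypothetical Python sketch can illustrate the file-based route. Only the workflow definition (step names and parameters) is written to a JSON file. Another user then rebuilds and reruns the workflow from a registry of step functions installed on their own machine. The JSON layout, the registry and the `scale`/`clip` steps are assumptions made for illustration; this is not Kepler's actual file format.

```python
import json
import os
import tempfile
from typing import Any, Callable, Dict, List


# Hypothetical reusable steps; every user is assumed to have the same registry.
def scale(values: List[float], factor: float) -> List[float]:
    return [v * factor for v in values]


def clip(values: List[float], lower: float, upper: float) -> List[float]:
    return [min(max(v, lower), upper) for v in values]


REGISTRY: Dict[str, Callable[..., List[float]]] = {"scale": scale, "clip": clip}


def save_workflow(steps: List[Dict[str, Any]], path: str) -> None:
    """Share a workflow by writing only its definition (step names + parameters)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"steps": steps}, fh, indent=2)


def load_and_run(path: str, data: List[float]) -> List[float]:
    """Reuse a shared workflow: look up each step in the registry and apply it in order."""
    with open(path, encoding="utf-8") as fh:
        definition = json.load(fh)
    for step in definition["steps"]:
        data = REGISTRY[step["name"]](data, **step["params"])
    return data


workflow = [
    {"name": "scale", "params": {"factor": 2.0}},
    {"name": "clip", "params": {"lower": 0.0, "upper": 10.0}},
]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "shared_workflow.json")
    save_workflow(workflow, path)                 # one scientist saves the file
    result = load_and_run(path, [1.0, 3.5, 7.0])  # another reruns it
print(result)  # [2.0, 7.0, 10.0]
```

This design keeps the shared file small and readable. It works only if every user has the same step implementations installed.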

Fig. 2 Ecological niche modeling workflow in the Kepler system
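An ecological niche modeling workflow typically chains three steps. It samples environmental layers at species occurrence points, fits a model, and projects that model back onto the landscape. The Python sketch below illustrates this chain under stated assumptions and is not the workflow shown in Fig. 2. The layers and occurrence cells are made-up values. The model is a simplified min-max environmental envelope in the spirit of BIOCLIM, whereas BIOCLIM itself typically uses percentile-based ranges.

```python
from typing import Dict, List, Tuple

Grid = List[List[float]]  # rows x columns of one environmental variable

# Hypothetical 3 x 4 environmental layers (illustrative values, not real data).
LAYERS: Dict[str, Grid] = {
    "temperature": [
        [10.0, 12.0, 14.0, 16.0],
        [11.0, 13.0, 15.0, 17.0],
        [12.0, 14.0, 16.0, 18.0],
    ],
    "precipitation": [
        [800.0, 750.0, 700.0, 650.0],
        [820.0, 760.0, 710.0, 600.0],
        [840.0, 740.0, 720.0, 550.0],
    ],
}

# Hypothetical occurrence records given as (row, column) grid cells.
OCCURRENCES: List[Tuple[int, int]] = [(0, 1), (1, 1), (1, 2)]


def extract_values(layers: Dict[str, Grid], points: List[Tuple[int, int]]) -> Dict[str, List[float]]:
    """Step 1: sample every layer at each occurrence cell."""
    return {name: [grid[r][c] for r, c in points] for name, grid in layers.items()}


def fit_envelope(samples: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Step 2: the envelope is the min-max range of each variable at the occurrences."""
    return {name: (min(vals), max(vals)) for name, vals in samples.items()}


def predict(layers: Dict[str, Grid], envelope: Dict[str, Tuple[float, float]]) -> List[List[int]]:
    """Step 3: a cell is suitable (1) only if every variable lies inside its range."""
    first = next(iter(layers.values()))
    n_rows, n_cols = len(first), len(first[0])
    return [
        [
            int(all(lo <= layers[name][r][c] <= hi for name, (lo, hi) in envelope.items()))
            for c in range(n_cols)
        ]
        for r in range(n_rows)
    ]


samples = extract_values(LAYERS, OCCURRENCES)
envelope = fit_envelope(samples)
suitability = predict(LAYERS, envelope)
for row in suitability:
    print(row)
```

The cell at row 2, column 1 is predicted suitable even though it has no record. Both of its values fall inside the envelope fitted from the other cells, which is how a niche model extrapolates to unsampled areas.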
