Biodiversity Science ›› 2022, Vol. 30 ›› Issue (11): 22356.  DOI: 10.17520/biods.2022356

• Technology and Methods •

Construction of the Chinese biodiversity online data processing platform

Jinshui Qiu1, Yanan Wang2, Huifu Zhuang1,*

  1. Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201
  2. Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650201
  • Received: 2022-06-29 Accepted: 2022-09-17 Published: 2022-11-20 Released online: 2022-10-22
  • Corresponding author: Huifu Zhuang
  • E-mail: zhuanghuifu@mail.kib.ac.cn
  • Funding:
    National Science and Technology Resource Sharing Service Platform (National Wild Plant Germplasm Resource Center, NWPGRC-21); Digital Development and Application of Biological Resources in Yunnan Province (202002AA100007); Special Project on Network Security and Informatization of the Chinese Academy of Sciences (CAS-WX2022SDC-SJ01)

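Abstract:

High-quality biodiversity data provide data support for biodiversity research and conservation. Researchers have developed a large number of biodiversity data processing software packages and tools, including workflow systems, R packages, Python packages, and Excel tools, but using them requires users to install the corresponding client software and to have some knowledge of and skill in programming languages, software development, and complex Excel formulas. To lower users' learning costs and the barrier to use, this paper adopts a Browser/Server design, web technology, visualization technology, responsive development technology, web crawler technology, data processing technology, and Solr intelligent search technology. It designs and develops data processing modules for different dimensions of biodiversity data and builds the Chinese biodiversity online data processing platform (http://dp.iflora.cn/). The platform helps researchers process data such as species names, geographic locations, times and dates, and longitude and latitude, and provides auxiliary functions such as data format conversion, data quality evaluation, and resource statistical analysis. It lets researchers process biodiversity data with zero code and a low barrier to entry, offers convenient, efficient, and simple channels for data cleaning, correction, conversion, and integration, and provides information technology support and services for biodiversity research and conservation.

Keywords: species name processing, geographic location processing, longitude and latitude processing, time and date processing, data format processing, data quality evaluation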

Abstract

Aims: Biodiversity provides the most basic living environment and material conditions for human beings and is the foundation of human survival and social development. However, natural environmental change and excessive human interference have caused a gradual loss of biodiversity. High-quality biodiversity data can support the biodiversity research and conservation needed to mitigate these losses. Researchers have developed many biodiversity data processing tools, including workflow systems, R packages, Python packages, and Excel tools. However, using these tools requires users to install the corresponding client software and to acquire knowledge of and skills in programming languages, software development, and complex Excel formulas. This imposes a high learning cost and barrier to entry, making these tools inaccessible to some users. This paper therefore describes the Chinese biodiversity online data processing platform (CBODPP), which helps researchers process biodiversity data with zero code and a low barrier to entry.

Methods: The CBODPP is designed in Browser/Server mode and is accessed through a web-based client. The platform pages are built with responsive development technology and are compatible with both desktop and mobile browsers. Service functions such as scientific name correction, geographic location analysis, and reverse geocoding are implemented using web crawler technology, data processing technology, and Solr intelligent search technology. In addition, the platform provides dedicated data processing modules for different dimensions of biodiversity data. Users can process the data in a specified column individually, which keeps data processing highly flexible.
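
The column-wise processing described in the Methods can be illustrated with a minimal Python sketch. The abstract does not describe the platform's implementation, so everything here is an assumption made for illustration: the function names `process_column` and `normalize_name` are hypothetical, and the name cleaner only tidies whitespace and letter case rather than correcting names against a taxonomic reference.

```python
import csv
import io
from typing import Callable


def process_column(csv_text: str, column: str, fn: Callable[[str], str]) -> str:
    """Apply fn to every value of one named column; other columns are untouched."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column not found: {column}")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Short rows yield None for missing cells; treat them as empty strings.
        row[column] = fn(row[column] or "")
        writer.writerow(row)
    return out.getvalue()


def normalize_name(name: str) -> str:
    """Hypothetical cleaner: collapse whitespace, capitalize the genus,
    and lowercase the remaining words."""
    parts = name.split()
    if not parts:
        return ""
    return " ".join([parts[0].capitalize()] + [p.lower() for p in parts[1:]])


if __name__ == "__main__":
    raw = "name,site\n  rosa   CHINENSIS ,Kunming\n"
    print(process_column(raw, "name", normalize_name))
```

Passing the cleaning function as a parameter mirrors the idea that each data dimension (names, locations, dates, coordinates) gets its own module while the column selection stays the same.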

Results: To process biodiversity data, users do not need to install a workflow management system and create workflows, nor do they need to master a programming language such as Python or R. By accessing the CBODPP (http://dp.iflora.cn/), users can process biodiversity data such as species names, geographical locations, times and dates, and longitude and latitude online in a visual manner. The platform also provides auxiliary functions such as data format conversion, data quality evaluation, and resource statistical analysis.
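
As one concrete example of longitude and latitude processing, the sketch below converts a degrees-minutes-seconds string into decimal degrees. It is an illustrative assumption about what such a step might involve, not the platform's code; the function name `dms_to_decimal` and the accepted input format are hypothetical.

```python
import re

# Accepts forms such as 25°08′30″N, 102°44'E, or 33°S (minutes and seconds optional).
_DMS = re.compile(
    r"""^\s*(?P<deg>\d{1,3})\s*°\s*
        (?:(?P<min>\d{1,2}(?:\.\d+)?)\s*[′']\s*)?
        (?:(?P<sec>\d{1,2}(?:\.\d+)?)\s*[″"]\s*)?
        (?P<hem>[NSEW])\s*$""",
    re.VERBOSE | re.IGNORECASE,
)


def dms_to_decimal(text: str) -> float:
    """Convert a DMS coordinate with a hemisphere letter to signed decimal degrees."""
    m = _DMS.match(text)
    if m is None:
        raise ValueError(f"unrecognized coordinate: {text!r}")
    degrees = int(m["deg"])
    minutes = float(m["min"] or 0)
    seconds = float(m["sec"] or 0)
    if minutes >= 60 or seconds >= 60:
        raise ValueError(f"minutes or seconds out of range: {text!r}")
    value = degrees + minutes / 60 + seconds / 3600
    hemisphere = m["hem"].upper()
    # Latitude is bounded by 90 degrees, longitude by 180 degrees.
    limit = 90 if hemisphere in "NS" else 180
    if value > limit:
        raise ValueError(f"coordinate out of range: {text!r}")
    # South and west are negative by convention.
    return -value if hemisphere in "SW" else value


if __name__ == "__main__":
    print(dms_to_decimal("25°08′30″N"))
    print(dms_to_decimal("102°44′E"))
```

Rejecting out-of-range values instead of silently clamping them keeps questionable records visible for a later data quality check.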

Conclusion: The CBODPP helps researchers process biodiversity data with zero code and a low barrier to entry, providing a convenient, efficient, and simple platform for data cleaning, correction, conversion, and integration. It thereby supports a wide range of scientists in the field of biodiversity informatics, allowing them to focus on research in their specialized areas of biodiversity rather than on learning to use software.

Key words: species name processing, geographical location processing, longitude and latitude processing, time and date processing, data format processing, data quality evaluation