Biodiversity Science ›› 2023, Vol. 31 ›› Issue (1): 22094.  DOI: 10.17520/biods.2022094

• Techniques and Methods •

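A method for generating pseudo-absence samples of invasive species in the Yangtze River Economic Belt based on geographic environment similarity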

肖巍峰1,2,3, 左绿荇1, 杨文涛1,3,*, 李朝奎3

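  1. School of Earth Sciences and Spatial Information Engineering, Hunan University of Science and Technology, Xiangtan, Hunan 411201
    2. Hunan Geological Disaster Monitoring, Early Warning and Emergency Rescue Engineering Technology Research Center, Changsha 410004
    3. Hunan Provincial Key Laboratory of Geo-Information Engineering in Surveying, Mapping and Remote Sensing, Hunan University of Science and Technology, Xiangtan, Hunan 411201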
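  • Received: 2022-03-01 Accepted: 2022-06-01 Published: 2023-01-20 Online: 2022-06-23
  • Corresponding author: 杨文涛 (Wentao Yang)
  • About the author: *E-mail: yangwentao8868@126.com
  • Funding:
    Innovative Research Group Project of the Natural Science Foundation of Hunan Province (2020JJ1003); Scientific Research Project of the Education Department of Hunan Province (20C0805)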

Generating pseudo-absence samples of invasive species based on the similarity of geographical environment in the Yangtze River Economic Belt

Weifeng Xiao1,2,3, Lüxing Zuo1, Wentao Yang1,3,*, Chaokui Li3

  1. School of Earth Sciences and Spatial Information Engineering, Hunan University of Science and Technology, Xiangtan, Hunan 411201
    2. Hunan Geological Disaster Monitoring, Early Warning and Emergency Rescue Engineering Technology Research Center, Changsha 410004
    3. Hunan Provincial Key Laboratory of Geo-Information Engineering in Surveying, Mapping and Remote Sensing, Hunan University of Science and Technology, Xiangtan, Hunan 411201
  • Received: 2022-03-01 Accepted: 2022-06-01 Published: 2023-01-20 Online: 2022-06-23
  • Contact: Wentao Yang

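Abstract:

The core data for modelling the spatial distribution of invasive species come from species diversity sampling (presence points and absence points). However, most invasive species specimen databases record only presence samples and lack records of absence (negative-sample) locations. Generating effective pseudo-absence samples is therefore key to building species spatial distribution models. This paper proposes a pseudo-absence sample generation method based on geographic environment similarity. First, principal component analysis (PCA) is used to model the linear correlation among the original geographic environment variables; based on the extracted principal components, the K-means algorithm is used to cluster the invasive species samples and to calculate the geographic environment similarity of each sample. On this basis, pseudo-absence samples are identified by establishing a principal-component-based framework for similarity measurement and confidence calculation. Taking the dataset of the invasive species Erigeron annuus in the Yangtze River Economic Belt as an example, the confidence of pseudo-absence samples across the whole region was analysed. The results show that, compared with spatial random sampling and one-class support vector machine sampling, logistic regression and support vector machine models built on samples generated by the proposed method give better predictions, which verifies the feasibility and effectiveness of the method. The pseudo-absence sampling strategy based on geographic environment similarity helps address the problem of mistakenly sampling potential invasion sites under random sampling, and the confidence of the pseudo-absence samples helps identify areas with different levels of suitability for invasive species.

Key words: invasive species, geographic environment, confidence level, Erigeron annuus, Yangtze River Economic Belt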

Abstract

Aims: Species spatial distribution models for invasive alien species depend on diversity samples that include both presence and absence records. However, most invasive species datasets lack explicit spatial information for absence samples. Generating effective pseudo-absence samples is therefore a key step in constructing species spatial distribution models. This paper proposes a pseudo-absence sampling method for invasive species based on the similarity of geographical environments.

Methods: First, principal component analysis (PCA) was used to model the linear correlation among the original geographic environment variables. The K-means algorithm was then used to cluster the invasive species samples on the extracted principal components and to calculate the geographic environment similarity of each sample. Second, pseudo-absence samples were generated by establishing a framework that measures similarity on the principal components and calculates a confidence level for each pseudo-absence sample. Finally, logistic regression and support vector machine (SVM) models were built on samples generated by random sampling, one-class support vector machine (OCSVM) sampling, and the proposed approach, and their accuracy was evaluated using the dataset of the invasive species Erigeron annuus in the Yangtze River Economic Belt.
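
The workflow in the Methods paragraph (PCA, then K-means on the presence samples, then a similarity-based confidence for candidate absence points) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the similarity or confidence formulas, so the exponential-decay similarity, the 0.8 threshold, the function names, and the synthetic environmental values (temperature, precipitation, elevation) are all hypothetical.

```python
import math
import random


def dist(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def fit_scaler(rows):
    """Column means and standard deviations used for z-scoring."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def zscore(rows, means, stds):
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def principal_axes(X, k, iters=300, seed=0):
    """Top-k covariance eigenvectors via power iteration with deflation.

    X is assumed to be column-centred (z-scored data is).
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    cov = [[sum(r[i] * r[j] for r in X) / n for j in range(d)] for i in range(d)]
    axes = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        lam = 0.0
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            lam = math.sqrt(sum(x * x for x in w))
            if lam == 0.0:
                break
            v = [x / lam for x in w]
        axes.append(v)
        # Remove the component just found so the next pass finds the next axis.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return axes


def project(X, axes):
    return [[sum(a * b for a, b in zip(r, ax)) for ax in axes] for r in X]


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns the cluster centres."""
    centers = random.Random(seed).sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda c: dist(p, centers[c]))].append(p)
        centers = [[sum(col) / len(g) for col in zip(*g)] if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers


def absence_confidence(presence, candidates, n_components=2, n_clusters=2):
    """Confidence in [0, 1] that each candidate is a true absence point."""
    means, stds = fit_scaler(presence)
    pres_z = zscore(presence, means, stds)
    axes = principal_axes(pres_z, n_components)
    pres_pc = project(pres_z, axes)
    cand_pc = project(zscore(candidates, means, stds), axes)
    centers = kmeans(pres_pc, n_clusters)

    def nearest(p):
        return min(dist(p, c) for c in centers)

    spread = sum(nearest(p) for p in pres_pc) / len(pres_pc) or 1.0
    # ASSUMPTION: similarity decays exponentially with the distance to the
    # nearest presence cluster; confidence of absence is 1 - similarity.
    return [1.0 - math.exp(-nearest(p) / spread) for p in cand_pc]


# Synthetic presence records: [temperature, precipitation, elevation].
rng = random.Random(42)
presence = (
    [[16 + rng.gauss(0, 0.5), 1200 + rng.gauss(0, 40), 300 + rng.gauss(0, 30)]
     for _ in range(20)]
    + [[18 + rng.gauss(0, 0.5), 1500 + rng.gauss(0, 40), 100 + rng.gauss(0, 30)]
       for _ in range(20)]
)
candidates = [[16.2, 1210, 310], [5.0, 400, 3000]]
conf = absence_confidence(presence, candidates)
# Keep only candidates whose absence confidence passes the (assumed) threshold.
pseudo_absences = [c for c, s in zip(candidates, conf) if s >= 0.8]
```

In this sketch, a candidate whose environment resembles a presence cluster receives a low absence confidence, while a candidate in a very different environment receives a confidence near 1. Graded confidence values of this kind are what would allow different suitability levels to be separated, as the Conclusions suggest.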

Results: Compared with random sampling and OCSVM, the proposed sampling approach yielded better predictions from both logistic regression and SVM, which validates its feasibility and effectiveness.

Conclusions: Generating pseudo-absence samples based on the similarity of geographical environments helps avoid the erroneous sampling of potential invasion sites that random sampling can cause. In addition, the confidence level of the pseudo-absence samples can be used to identify areas with different levels of suitability for invasive species.

Key words: invasive species, geographic environment, confidence level, Erigeron annuus, the Yangtze River Economic Belt