Biodiv Sci ›› 2023, Vol. 31 ›› Issue (1): 22094.

• Technology and Methodology •

### Generating pseudo-absence samples of invasive species based on the similarity of geographical environment in the Yangtze River Economic Belt

Weifeng Xiao1,2,3, Lüxing Zuo1, Wentao Yang1,3,*, Chaokui Li3

1. School of Earth Sciences and Spatial Information Engineering, Hunan University of Science and Technology, Xiangtan, Hunan 411201
2. Hunan Geological Disaster Monitoring, Early Warning and Emergency Rescue Engineering Technology Research Center, Changsha 410004
3. Hunan Provincial Key Laboratory of Geo-Information Engineering in Surveying, Mapping and Remote Sensing, Hunan University of Science and Technology, Xiangtan, Hunan 411201
• Received: 2022-03-01 Accepted: 2022-06-01 Online: 2023-01-20 Published: 2022-06-23
• Contact: Wentao Yang

Abstract:

Aims: Species spatial distribution models for invasive alien species require diverse samples, including both presence and absence samples. However, most invasive species datasets lack explicit spatial information on species absence. Generating effective pseudo-absence samples is therefore a significant issue in constructing such models. This paper proposes a pseudo-absence sampling method for invasive species based on the similarity of geographical environments.

Methods: First, principal component analysis (PCA) was used to account for linear correlation among the original variables. The K-means algorithm was then used to cluster the invasive species samples, and the geographic environment similarity of each sample was calculated from the PCA components. Next, pseudo-absence samples were generated through a framework that measures similarity on the PCA components and calculates a confidence level for each pseudo-absence sample. Finally, pseudo-absence samples obtained by random sampling, a one-class support vector machine (OCSVM), and the proposed approach were each combined with logistic regression and support vector machine (SVM) models, and accuracy was evaluated on a dataset of the invasive species Erigeron annuus in the Yangtze River Economic Belt.
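
The sampling steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the similarity or confidence formulas, so the exponential-decay similarity, the `scale` normalisation, the `threshold` value, and all function names here are assumptions made for illustration. The inputs are assumed to be PCA component scores that have already been computed for each location, and the toy coordinates are invented.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if a cluster is empty.
        new_centroids = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


def absence_confidence(candidate, centroids, scale):
    """Assumed formula: similarity decays exponentially with distance to the
    nearest presence cluster; absence confidence is 1 minus that similarity."""
    d = min(math.dist(candidate, c) for c in centroids)
    similarity = math.exp(-d / scale)
    return 1.0 - similarity


def sample_pseudo_absences(presence, candidates, k=2, threshold=0.8, seed=0):
    """Keep candidates whose environment is dissimilar to all presence clusters."""
    centroids = kmeans(presence, k, seed=seed)
    # Assumed normalisation: mean distance of presences to their nearest centroid.
    scale = sum(min(math.dist(p, c) for c in centroids) for p in presence)
    scale = max(scale / len(presence), 1e-12)
    scored = [(absence_confidence(c, centroids, scale), c) for c in candidates]
    return [(conf, c) for conf, c in scored if conf >= threshold]


# Invented PCA scores (two components) for presence and candidate locations.
presence = [(0.0, 0.0), (0.2, 0.1), (0.1, -0.1),
            (3.0, 3.0), (3.1, 2.9), (2.9, 3.2)]
candidates = [(0.1, 0.0), (3.0, 3.1), (10.0, -8.0), (-7.0, 9.0)]

selected = sample_pseudo_absences(presence, candidates)
for conf, point in selected:
    print(point, round(conf, 3))
```

In this sketch, candidates lying close to a presence cluster in component space receive a low absence confidence and are rejected, which is the behaviour that distinguishes similarity-based sampling from purely random sampling. Varying `threshold` gives pseudo-absence sets at different confidence levels.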

Results: With both logistic regression and SVM, the proposed sampling approach produced better prediction results than random sampling and OCSVM, supporting its feasibility and effectiveness.

Conclusions: The strategy for generating pseudo-absence samples based on the similarity of geographical environments addresses the problem of random sampling mistakenly drawing absence samples from locations where the invasive species may be present. In addition, the confidence level of the absence samples can be used to identify areas with different levels of suitability for the invasive species.