%A Weifeng Xiao, Lüxing Zuo, Wentao Yang, Chaokui Li
%T Generating pseudo-absence samples of invasive species based on the similarity of geographical environment in the Yangtze River Economic Belt
%0 Journal Article
%D 2023
%J Biodiv Sci
%R 10.17520/biods.2022094
%P 22094-
%V 31
%N 1
%U {https://www.biodiversity-science.net/CN/abstract/article_82521.shtml}
%8 2023-01-20
%X

Aims: Obtaining diverse samples of invasive alien species, including both presence and absence samples, is crucial for species spatial distribution models. However, most invasive species datasets lack explicit spatial information for absence samples. Consequently, generating effective pseudo-absence samples of invasive species is a significant issue in constructing species spatial distribution models. This paper proposes a pseudo-absence sampling method for invasive species based on the similarity of geographical environments.

Methods: First, principal component analysis (PCA) was used to account for the linear correlation among the original variables. The K-means algorithm was then used to cluster the invasive species samples, and the geographic environment similarity of each sample was calculated from the PCA components. Second, pseudo-absence samples of invasive species were generated by establishing a framework that measures similarity on the PCA components and calculates a confidence level for each pseudo-absence sample. Finally, pseudo-absence samples produced by random sampling, one-class support vector machine (OCSVM), and the proposed approach were compared by training logistic regression and support vector machine (SVM) models and evaluating their accuracy on a dataset of the invasive species Erigeron annuus in the Yangtze River Economic Belt.
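
The pipeline described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the similarity measure, the confidence formula, the number of components or clusters, or the data used to fit the PCA. The sketch therefore assumes that PCA is fitted on the pooled presence and candidate points, that similarity is `exp(-d)` where `d` is the distance to the nearest presence cluster center, and that confidence is `1 - similarity`. All function names, parameter values, and the synthetic data are hypothetical.

```python
import numpy as np

def fit_pca(X, n_components):
    """Standardize X and return (mean, std, projection matrix) from an SVD."""
    mean, std = X.mean(axis=0), X.std(axis=0)
    std = np.where(std == 0, 1.0, std)           # guard against constant variables
    Z = (X - mean) / std
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return mean, std, Vt[:n_components].T        # shape (n_variables, n_components)

def kmeans(P, k, n_iter=50, seed=0):
    """Plain Lloyd's K-means; returns the k cluster centers."""
    rng = np.random.default_rng(seed)
    centers = P[rng.choice(len(P), size=k, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(P[:, None] - centers[None], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            P[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers

def absence_confidence(presence_env, candidate_env, n_components=2, k=3):
    """Confidence that each candidate location is a true absence.

    Assumption: similarity = exp(-distance to nearest presence cluster center)
    in PCA space, and confidence = 1 - similarity.
    """
    pooled = np.vstack([presence_env, candidate_env])
    mean, std, W = fit_pca(pooled, n_components)  # assumption: PCA on pooled data
    P = ((presence_env - mean) / std) @ W
    C = ((candidate_env - mean) / std) @ W
    centers = kmeans(P, k)
    d = np.linalg.norm(C[:, None] - centers[None], axis=2).min(axis=1)
    return 1.0 - np.exp(-d)

# Synthetic example: 4 environmental variables (hypothetical data).
rng = np.random.default_rng(42)
presence = rng.normal(0.0, 1.0, size=(60, 4))
candidates = np.vstack([
    rng.normal(0.0, 1.0, size=(20, 4)),   # environments similar to presence sites
    rng.normal(8.0, 1.0, size=(20, 4)),   # environments unlike presence sites
])
conf = absence_confidence(presence, candidates)
pseudo_absence = candidates[conf >= 0.9]  # hypothetical confidence threshold
```

Candidates whose environment resembles that of the presence clusters receive low confidence and are excluded, which is how a similarity-based scheme can avoid drawing pseudo-absences from locations the species could plausibly occupy. The 0.9 threshold is arbitrary and serves only to show how the confidence values might be used.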

Results: Compared with random sampling and OCSVM, the proposed sampling approach yielded better prediction results with both logistic regression and SVM, which validates its feasibility and effectiveness.

Conclusions: The strategy for generating pseudo-absence samples based on the similarity of geographical environments addresses the problem of erroneously sampling potential presence locations of invasive species, which arises with random sampling. In addition, the confidence level of absence samples can be used to delineate different levels of suitable areas for invasive species.