Biodiv Sci ›› 2025, Vol. 33 ›› Issue (8): 25184.  DOI: 10.17520/biods.2025184  cstr: 32101.14.biods.2025184

• Special Feature: Genetic Diversity and Conservation •

Spatiotemporal pattern analysis of eukaryotic genetic data based on the GenBank database

Xin Peng, Chuan Liu, Xiaolei Huang*

  1. State Key Laboratory of Agricultural and Forestry Biosecurity, College of Plant Protection, Fujian Agriculture and Forestry University, Fuzhou 350002, China
  • Received: 2025-05-19 Revised: 2025-08-24 Accepted: 2025-09-12 Online: 2025-08-20 Published: 2025-09-17
  • Contact: Xiaolei Huang

Abstract:

Aims: Genetic data are playing an increasingly vital role in biodiversity research and conservation practices. However, researchers often face constraints such as data quality deficiencies and uneven geographic or taxonomic distribution when using these data. While the genetic data patterns of terrestrial vertebrates have been extensively studied, the spatial distribution patterns of genetic data for global plants, fungi, and other animal groups still lack systematic empirical research. This study aims to assess the current coverage of genetic data across three major eukaryotic kingdoms (Animalia, Plantae, and Fungi), focusing on representative molecular markers to analyze metadata completeness and spatiotemporal distribution patterns, thereby identifying key bottlenecks in biodiversity research applications.

Methods: This study employed a multi-scale analytical approach to evaluate genetic data across the three eukaryotic kingdoms. First, comprehensive statistical analyses were conducted on sequence and genome datasets. We then specifically assessed metadata completeness for three standard DNA barcodes: cytochrome c oxidase subunit I (COI; Animalia), ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit (rbcL; Plantae), and internal transcribed spacer (ITS; Fungi). Finally, we systematically analyzed the spatial distribution patterns and interannual variation trends of these genetic data using geographic grids of different resolutions (4° × 4° for global scale and 2° × 2° for China). 
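As an illustration of the gridding and metadata-completeness steps described in the Methods, the following is a minimal sketch and not the authors' actual pipeline. It assumes that coordinates arrive as GenBank-style `lat_lon` strings (for example "47.94 N 28.12 W"); the function names `parse_lat_lon`, `grid_cell`, and `summarize` are hypothetical, and cells are identified by their south-west corner.

```python
import math
import re
from collections import Counter

# Assumed input format: GenBank-style lat_lon string, e.g. "47.94 N 28.12 W".
_LAT_LON = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*([NS])\s+(\d+(?:\.\d+)?)\s*([EW])\s*$"
)


def parse_lat_lon(value):
    """Return (lat, lon) in signed decimal degrees, or None if missing or unparseable."""
    if not value:
        return None
    match = _LAT_LON.match(value)
    if match is None:
        return None
    lat = float(match.group(1)) * (1 if match.group(2) == "N" else -1)
    lon = float(match.group(3)) * (1 if match.group(4) == "E" else -1)
    if abs(lat) > 90 or abs(lon) > 180:
        return None
    return lat, lon


def grid_cell(lat, lon, size):
    """Return the south-west corner of the size x size degree cell containing the point."""
    # Clamp the upper edges so that lat = 90 or lon = 180 fall into the last cell.
    lat = min(lat, 90 - 1e-9)
    lon = min(lon, 180 - 1e-9)
    return (math.floor(lat / size) * size, math.floor(lon / size) * size)


def summarize(records, size=4):
    """Count records per grid cell and compute the share lacking usable coordinates."""
    counts = Counter()
    missing = 0
    for value in records:
        coords = parse_lat_lon(value)
        if coords is None:
            missing += 1
        else:
            counts[grid_cell(coords[0], coords[1], size)] += 1
    missing_rate = missing / len(records) if records else 0.0
    return counts, missing_rate
```

Calling `summarize(records, size=4)` would correspond to the global 4° × 4° grid, and `size=2` to the 2° × 2° grid used for China; restricting the records to China beforehand is left out of this sketch.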

Results: Animalia has the most sequences (approximately 270 million, with 16,000 genomic datasets), ahead of Plantae (approximately 140 million sequences, 7,000 genomes) and Fungi (approximately 20 million sequences, 17,000 genomes). Geographic metadata deficiencies were prevalent across all three standard barcode markers (COI, rbcL, and ITS). ITS sequences exhibited the highest rate of missing geographic coordinate data (92.07%), followed by rbcL (83.19%) and COI (26.40%). At the global scale, the spatial distribution showed a distinct “Northern Hemisphere Centralization”, with North America, Western Europe, and East Asia being the dominant regions, while the Southern Hemisphere generally lacked data. Over time, COI and rbcL data showed a declining trend, while ITS data exhibited rapid growth. In China, a unique distribution emerged, characterized by “Southern Animalia, Eastern Plantae, and Northern Fungi”, with significant data shortages in the northwest region. Over time, data for Plantae and Fungi in China continued to grow, while data for Animalia remained stable.

Conclusion: These findings highlight that deficiencies in genetic data quality and imbalances in spatial distribution have become important bottlenecks restricting biodiversity research. To address these issues, we recommend the establishment of stringent metadata archiving standards, increased scientific research investment in underrepresented areas such as the Southern Hemisphere and Northwest China, and the promotion of equitable global data resource allocation through the construction of an international scientific research cooperation network. These measures aim to enhance the application value of genetic data in biodiversity research and conservation practices.

Key words: genetic data, DNA barcode, genome, spatiotemporal distribution pattern