Biodiv Sci ›› 2026, Vol. 34 ›› Issue (1): 25263.  DOI: 10.17520/biods.2025263

• Special Feature: Ecological Data Analysis Methods •

An approach for estimating haplotype richness from sequences with unequal lengths

Yuan Jiang1, Beixi Huang1, Xueyuan Jia1, Si Liang1, Yutong Xie1, Ping Fan1*, Gang Song2   

  1. Shaanxi University of Chinese Medicine, Xianyang, Shaanxi 712046, China

  2. Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China

  • Received: 2025-07-06 Revised: 2025-09-28 Accepted: 2026-01-21 Online: 2026-01-20 Published: 2026-01-22
  • Contact: Ping Fan

Abstract:

Aims: Traditional methods for calculating genetic diversity require all sequences within a species to have the same length. However, sequences available in public databases often vary in length, which complicates haplotype identification and genetic diversity assessment. Methods exist for estimating haplotype diversity and nucleotide diversity from sequences of unequal length, but there is currently no effective method for calculating haplotype richness.

Methods: To address this gap, this study introduces a method that estimates haplotype richness for DNA sequences of varying lengths from the number of nucleotide differences between paired sequences (Kij). Three analyses were conducted to validate the method's performance: (1) For equal-length sequences, the results of our method were compared with those of DnaSP. (2) Performance on unequal-length sequences was tested by generating simulated sequences of random lengths from equal-length datasets of birds, mammals, and amphibians, and the method's generalization capability was evaluated using a medicinal plant dataset. (3) The method was applied to assess latitudinal gradients of haplotype richness in birds, mammals, and amphibians.
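As an illustration only, the idea of counting haplotypes from pairwise nucleotide differences can be sketched as follows. The abstract does not describe the algorithm itself, so everything here is an assumption rather than the authors' implementation: the function names are hypothetical, sequences are assumed to share a common start coordinate, positions marked `-`, `N`, or `?` (or beyond the end of the shorter sequence) are treated as not comparable, and two sequences are treated as the same haplotype when their shared sites show zero differences. A greedy grouping, longest sequences first, is swapped in as a simple stand-in for whatever assignment rule the paper uses.

```python
# Hypothetical sketch: NOT the published algorithm, only an illustration
# of counting haplotypes from pairwise differences (Kij) when lengths differ.

MISSING = {"-", "N", "?"}  # assumed symbols for gaps / undetermined bases


def pairwise_differences(a, b):
    """Return (k, overlap): mismatches and number of sites actually compared.

    Only positions present and determined in BOTH sequences are compared;
    zip() stops at the end of the shorter sequence.
    """
    k = 0
    overlap = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in MISSING or y in MISSING:
            continue
        overlap += 1
        if x != y:
            k += 1
    return k, overlap


def _compatible(a, b):
    """True when two sequences share at least one site and never differ."""
    k, overlap = pairwise_differences(a, b)
    return overlap > 0 and k == 0


def _informative_length(seq):
    """Number of determined bases in a sequence."""
    return sum(1 for c in seq.upper() if c not in MISSING)


def estimate_haplotype_richness(seqs):
    """Greedy estimate of the number of distinct haplotypes.

    Longer sequences are placed first so that they define the groups;
    a shorter sequence joins the first group whose members it all matches.
    """
    ordered = sorted(seqs, key=_informative_length, reverse=True)
    groups = []
    for s in ordered:
        for g in groups:
            if all(_compatible(s, m) for m in g):
                g.append(s)
                break
        else:
            groups.append([s])
    return len(groups)


if __name__ == "__main__":
    sample = ["ACGTACGT", "ACGTAC", "ACGTTCGT", "ACGT"]
    print(estimate_haplotype_richness(sample))
```

Note that a short sequence such as `ACGT` in the sample is compatible with more than one longer haplotype; this sketch simply assigns it to the first matching group, which is one of several possible ways to resolve that ambiguity.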

Results: (1) For sequences of equal length, the results of the new method were not significantly different from those of DnaSP (birds: W = 22,018, P = 0.845; mammals: W = 23,096, P = 0.990; amphibians: W = 3,518.5, P = 0.977), but the new method outperformed DnaSP in haplotype identification when base deletions occurred, identifying an average of 1.333 ± 0.188 more haplotypes. (2) Random-length simulation trials confirmed the effectiveness and applicability of the method for estimating haplotype richness from sequences of varying lengths: the mean relative error was 0.130 ± 0.106, indicating overall accuracy, and the variance of the relative error was 0.007 ± 0.007, indicating stability. (3) In the analysis of latitudinal patterns, haplotype richness in birds and mammals showed a significant decreasing trend in the Southern Hemisphere but a relatively stable decreasing trend in the Northern Hemisphere, whereas amphibians showed a continuous decline from south to north.
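The accuracy and stability measures reported in (2) can be illustrated with a minimal sketch. This is an assumed formulation, not the authors' code: relative error is taken as |estimate - reference| / reference for each simulated trial, the function names are hypothetical, and the population variance is used because the abstract does not say whether a sample or population variance was computed. The numbers in the usage example are invented placeholders, not data from the study.

```python
from statistics import mean, pvariance


def relative_errors(estimated, reference):
    """Per-trial relative error, assuming |estimate - reference| / reference."""
    return [abs(e - r) / r for e, r in zip(estimated, reference)]


def summarize_relative_error(estimated, reference):
    """Return (mean relative error, variance of relative error).

    The mean is read as an accuracy measure and the variance as a
    stability measure; population variance is an assumption here.
    """
    errs = relative_errors(estimated, reference)
    return mean(errs), pvariance(errs)


if __name__ == "__main__":
    # Placeholder values: richness estimated from random-length copies
    # versus richness of the original equal-length dataset.
    est = [9, 10, 12]
    ref = [10, 10, 10]
    mre, var = summarize_relative_error(est, ref)
    print(f"mean relative error = {mre:.3f}, variance = {var:.4f}")
```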

Conclusions: This study advances the development of more precise quantitative methodologies and introduces novel analytical tools for the conservation of genetic diversity.

Key words: genetic diversity, unequal length sequences, haplotype richness, method