Biodiv Sci

Cautions on using chloroplast genome-based phylogeny for species identification and biogeography: A case study of Pterocarya

Shanshan Wang1,2, Qi Liu2,3, Yu Li2, Tianrui Wang2, Xiling Dai3, Gregor Kozlowski2,4,5, Jin Xu1*, Yigang Song2*   

    1 Faculty of Urban Construction and Eco-Technology, Shanghai Institute of Technology, Shanghai 201418, China

    2 Key Laboratory of East China Plant Conservation and Utilization, National Forestry and Grassland Administration, Shanghai Chenshan Botanical Garden, Shanghai 201602, China 

    3 School of Life Sciences, Shanghai Normal University, Shanghai 200234, China

    4 Department of Biology and Botanic Garden, University of Fribourg, Fribourg 1700, Switzerland 

    5 Natural History Museum Fribourg, Fribourg 1700, Switzerland

  • Received: 2025-11-25 Revised: 2026-02-20 Accepted: 2026-04-17
  • Contact: Jin Xu, Yigang Song

Abstract:

Aim: The chloroplast genome has been widely used for plant species identification, species delimitation, and phylogenetic reconstruction because of its structural conservation, maternal uniparental inheritance, and high copy number. However, it does not capture pollen-mediated gene flow, is sensitive to genetic drift, and is susceptible to incomplete lineage sorting and hybridization. Relying solely on chloroplast genome evidence in species identification and in studies of population evolutionary history may therefore lead to biased or incomplete conclusions.

Methods: We sequenced, assembled, and annotated whole chloroplast genomes of seven Pterocarya individuals, and conducted comparative genomic and phylogenetic analyses that incorporated published data.

Results: The chloroplast genomes ranged in length from 160,176 to 160,318 bp and exhibited a typical quadripartite structure, encoding 131 genes (86 protein-coding genes, 8 rRNAs, and 37 tRNAs). The number of SSR loci ranged from 87 to 96 and was dominated by mononucleotide repeats. Comparative genomic analysis indicated that chloroplast genomes were highly conserved overall among Pterocarya species. Sequence conservation was higher in the inverted repeat regions and coding regions than in the single-copy regions and non-coding regions. Notably, within the coding regions, the ndhF gene exhibited significantly higher sequence variation than other protein-coding genes. Phylogenetic trees constructed from different data partitions (whole genome, protein-coding sequences, hypervariable regions, and sliding windows) differed in topology, recovering two to five major clades. Individuals of species such as P. stenoptera, P. macroptera var. macroptera, and P. insignis consistently showed lineage admixture and were distributed across clades in the different analyses.
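As an illustration of how sequence variation along an aligned chloroplast genome can be scanned in sliding windows, the following minimal Python sketch computes nucleotide diversity (mean pairwise proportion of differing sites) per window. The abstract does not state which statistic or software was used, so the metric, the function names, and the window and step sizes here are assumptions chosen for illustration only.

```python
from itertools import combinations

VALID = set("ACGT")


def pairwise_diff(a, b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    compared = 0
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in VALID and y in VALID:
            compared += 1
            if x != y:
                diffs += 1
    return diffs / compared if compared else 0.0


def nucleotide_diversity(seqs):
    """Mean pairwise difference over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(pairwise_diff(a, b) for a, b in pairs) / len(pairs)


def sliding_window_pi(alignment, window=600, step=200):
    """Return (window_start, pi) tuples along an alignment.

    `alignment` is a list of equal-length aligned sequences (strings).
    The default window and step sizes are illustrative assumptions.
    """
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all sequences must have the same aligned length")
    results = []
    for start in range(0, max(length - window, 0) + 1, step):
        block = [s[start:start + window] for s in alignment]
        results.append((start, nucleotide_diversity(block)))
    return results


if __name__ == "__main__":
    # Toy alignment; real input would be whole-plastome alignments.
    toy = ["AAAAAAAA", "AAAAAAAT", "AAAAAATT"]
    for start, pi in sliding_window_pi(toy, window=4, step=4):
        print(start, round(pi, 4))
```

Windows with high diversity values in such a scan would be candidates for hypervariable regions; in practice the alignment would be read from a file with a dedicated parser rather than typed in as strings.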

Conclusion: The chloroplast genome has significant limitations for species identification and for resolving biogeographic history within the genus Pterocarya, and relying solely on chloroplast data may lead to systematic bias. Future studies of plant phylogenetics should therefore integrate nuclear genomic data and employ multispecies coalescent methods to achieve more accurate species identification and reconstruction of evolutionary relationships.

Key words: Pterocarya, chloroplast genome, phylogenetic analysis, biogeography