Biodiversity Science

A comparative evaluation of bioinformatic pipelines for invertebrate biodiversity profiling via environmental DNA metabarcoding

Ziling Yan1,2, Xiaoyu Chen2, Meng Yao1,2*   

  1. Institute of Ecology, College of Urban and Environmental Sciences, Peking University, Beijing 100871, China
  2. State Key Laboratory of Gene Function and Modulation Research, School of Life Sciences, Peking University, Beijing 100871, China

  • Received: 2025-09-15; Revised: 2025-12-09
  • Corresponding author: Meng Yao

Abstract

Aims: Environmental DNA (eDNA) metabarcoding has been increasingly applied in biodiversity research in recent years, but several methodological issues remain unresolved despite its rapid development. A key issue is the choice of bioinformatic pipeline, particularly for extremely diverse taxa such as invertebrates. The pipeline used to process sequencing data directly affects the detection results, yet a systematic comparative evaluation of this step is lacking. This study therefore compares and evaluates bioinformatic pipelines commonly used to analyze eDNA-derived invertebrate sequencing data.

Methods: Invertebrate metabarcoding sequencing was carried out on freshwater eDNA samples, and the effects of different bioinformatic pipelines on the processing of invertebrate sequences were compared. Four commonly used clustering or denoising methods (UPARSE, Swarm, UNOISE, and DADA2) and three taxonomic assignment methods (BOLDigger, BLASTN, and a naïve Bayesian classifier) were selected and combined to form 12 bioinformatic pipelines (4 × 3).
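The 12 pipelines follow from pairing each clustering or denoising method with each taxonomic assignment method. A minimal sketch of that enumeration is shown below. It only lists the combinations and does not run any of the tools, and the variable and function names are illustrative assumptions rather than anything taken from the study.

```python
from itertools import product

# Method names as listed in the abstract; the identifiers themselves are
# hypothetical and chosen only for this illustration.
CLUSTERING_OR_DENOISING = ["UPARSE", "Swarm", "UNOISE", "DADA2"]
TAXONOMIC_ASSIGNMENT = ["BOLDigger", "BLASTN", "naive Bayesian classifier"]


def enumerate_pipelines(first_stage, second_stage):
    """Return every (clustering/denoising, assignment) pairing as a list of tuples."""
    return list(product(first_stage, second_stage))


pipelines = enumerate_pipelines(CLUSTERING_OR_DENOISING, TAXONOMIC_ASSIGNMENT)

# Print a numbered list of the resulting combinations.
for index, (first_method, assigner) in enumerate(pipelines, start=1):
    print(f"Pipeline {index:2d}: {first_method} + {assigner}")
```

Keeping the two stages as separate lists makes it straightforward to add or drop a method and regenerate the full set of combinations.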

Results: Of the 12 pipelines evaluated, the combination of DADA2 denoising and BOLDigger taxonomic assignment yielded the largest number of invertebrate molecular operational taxonomic units (MOTUs), together with the highest taxonomic coverage and taxonomic resolution. Among the four clustering or denoising methods, the denoising methods UNOISE and DADA2 yielded more invertebrate MOTUs than the clustering methods UPARSE and Swarm. Among the three taxonomic assignment methods, BOLDigger and BLASTN achieved higher taxonomic coverage and resolution than the naïve Bayesian classifier.

Conclusion: These findings provide a useful reference for eDNA-based studies of freshwater invertebrate biodiversity. They also suggest that bioinformatic methods should be adjusted to the target taxa and barcode regions in order to obtain more accurate and reliable biodiversity data.

Key words: environmental DNA, invertebrate biodiversity, bioinformatic pipeline, clustering, denoising, taxonomic assignment