Biodiversity Science (生物多样性) ›› 2019, Vol. 27 ›› Issue (5): 567-575.  DOI: 10.17520/biods.2018211

• Methods •

EPPS, an automated metabarcoding analysis pipeline built with Nextflow

李诣远 (Li Yiyuan)*, David C. Molik, Michael E. Pfrender

  1. Department of Biological Sciences, University of Notre Dame, Notre Dame, IN 46554, USA
  • Received: 2018-08-01 Accepted: 2019-03-05 Published: 2019-05-20 Online: 2019-05-20

EPPS, a metabarcoding bioinformatics pipeline using Nextflow

Yiyuan Li*, David C. Molik, Michael E. Pfrender

  1. Department of Biological Sciences, University of Notre Dame, Notre Dame, IN 46554, USA
  • Received: 2018-08-01 Accepted: 2019-03-05 Online: 2019-05-20 Published: 2019-05-20

Abstract:

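Rapid species detection based on metabarcoding supports the assessment, prediction, and conservation of biodiversity. This paper describes the steps of a typical metabarcoding analysis and how to set their parameters. Using Nextflow, we built EPPS, a metabarcoding analysis pipeline that automates every step from quality control of the raw data to the comparison of diversity between environments. Nextflow also provides pipeline monitoring, with a visual report of the time and memory consumed by each process. Using a test dataset and a previously published dataset, we show that the pipeline analyzes metabarcoding data effectively and assesses the similarity of biodiversity between environments reliably.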

Key words: environmental DNA, USEARCH, Trimmomatic, principal component analysis

Abstract

Metabarcoding enables rapid assessment of biodiversity. In this study, we discuss popular metabarcoding analysis tools and their parameter settings. Using the pipeline-building program Nextflow, we also develop EPPS, a metabarcoding bioinformatics pipeline that processes data from quality control of raw reads through to biodiversity comparisons between samples. EPPS can summarize the time and memory cost of each process in the pipeline. We apply the pipeline to a test dataset and to a public dataset from a previous study. The results suggest that the pipeline can reliably analyze metabarcoding data and can make it easier to share pipelines between metabarcoding studies.

Key words: environmental DNA, USEARCH, Trimmomatic, principal component analysis
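
The final step named in the abstract, comparing biodiversity between samples, can be illustrated with a minimal sketch. The abstract does not say which measure EPPS uses beyond listing principal component analysis as a keyword, so the Bray-Curtis dissimilarity below is an assumption chosen for illustration and is not taken from EPPS. The sample names, OTU labels, counts, and the function name `bray_curtis` are all hypothetical.

```python
from typing import Dict


def bray_curtis(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Bray-Curtis dissimilarity between two OTU count profiles.

    Returns 0.0 for identical profiles and 1.0 when no OTU is shared.
    This metric is an illustrative assumption, not the EPPS method.
    """
    otus = set(a) | set(b)
    # Sum of the smaller count for every OTU present in either sample.
    shared = sum(min(a.get(otu, 0), b.get(otu, 0)) for otu in otus)
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    return 1.0 - 2.0 * shared / total


# Hypothetical OTU read counts per sample (not data from the paper).
samples = {
    "pond_A": {"OTU1": 10, "OTU2": 5, "OTU3": 0},
    "pond_B": {"OTU1": 8, "OTU2": 7, "OTU3": 0},
    "stream_C": {"OTU1": 0, "OTU2": 0, "OTU3": 15},
}

# Print the dissimilarity for every unordered pair of samples.
names = sorted(samples)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        d = bray_curtis(samples[first], samples[second])
        print(f"{first} vs {second}: {d:.3f}")
```

In this toy table the two pond samples share most of their reads and score close to 0, while the stream sample shares no OTU with either pond and scores 1. A pairwise matrix of this kind is one common input to an ordination of samples.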