Biodiv Sci ›› 2019, Vol. 27 ›› Issue (5): 567-575. DOI: 10.17520/biods.2018211
Methodology
Yiyuan Li*, David C. Molik, Michael E. Pfrender
Received: 2018-08-01
Accepted: 2019-03-05
Online: 2019-05-20
Published: 2019-05-20
Li Yiyuan, Molik David C., Pfrender Michael E. EPPS, a metabarcoding bioinformatics pipeline using Nextflow[J]. Biodiv Sci, 2019, 27(5): 567-575.
Fig. 2 Timeline chart of an EPPS run. The x-axis shows the running time of each process in seconds. Each row is labeled with the stage of the analysis it belongs to: filter, data filtering; demultiplex, primer removal and, if there are multiple primers, demultiplexing; merge, merging of forward and reverse reads; otu_clustering and map, OTU clustering and mapping of reads; plot, plotting the PCA. Because the test data set contains four samples, the filter, demultiplex, merge, and map steps each run as four processes (1 to 4). Each bar shows the time taken by one process, and the dark area in each bar represents the real execution time. Each bar displays two numbers: the task duration and the peak virtual memory size.
Fig. 3 PCA plot based on the test data. Each dot represents a test sample. The distance between dots indicates the dissimilarity between samples. For example, test3 and test4 are more similar to each other than test1 and test2 are.
Fig. 4 EPPS result based on a public data set. Sampling locations are numbered 1 to 8 from upstream to downstream. The suffixes “a”, “b” and “c” indicate three replicates from the same sampling location. Based on the PCA, the most upstream location (Location 1) has a unique fish composition, Locations 3-6 have similar fish compositions, and the downstream locations (Locations 7-8) share a similar fish composition.
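The PCA captions above read distance between dots as dissimilarity between samples. As a rough illustration of that idea only, the sketch below compares samples by the Euclidean distance between their relative OTU abundances, which is the distance a PCA preserves when all components are kept. The OTU counts and the helper names (`relative_abundance`, `dissimilarity`) are hypothetical; they are not the EPPS test data or the code EPPS uses.

```python
from math import sqrt

# Hypothetical OTU read counts per sample (illustrative, not from the paper).
otu_counts = {
    "test1": [120, 30, 0, 50],
    "test2": [10, 200, 40, 0],
    "test3": [60, 60, 60, 20],
    "test4": [55, 65, 58, 22],
}


def relative_abundance(counts):
    """Convert raw read counts to proportions so sequencing depth cancels out."""
    total = sum(counts)
    return [c / total for c in counts]


# One relative-abundance profile per sample.
profiles = {name: relative_abundance(c) for name, c in otu_counts.items()}


def dissimilarity(sample_a, sample_b):
    """Euclidean distance between two samples' relative-abundance profiles."""
    a, b = profiles[sample_a], profiles[sample_b]
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    for pair in [("test3", "test4"), ("test1", "test2")]:
        print(pair, round(dissimilarity(*pair), 3))
```

With these made-up counts, the test3/test4 pair comes out much closer than the test1/test2 pair, which is the kind of pattern the Fig. 3 caption describes.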