Biodiversity Science, 2019, Vol. 27, Issue (5): 526-533. doi: 10.17520/biods.2018209

• Reviews •

DNA barcoding and emerging reference construction and data analysis technologies

Liu Shanlin

  1. Beijing Advanced Innovation Center for Food Nutrition and Human Health, College of Plant Protection, China Agricultural University, Beijing 100193
  • Received: 2018-07-30 Accepted: 2018-12-25 Online: 2019-05-20

DNA barcoding has grown exponentially, both in the number of barcodes generated and in its range of applications. As a conservation tool, for example, it supports species identification from damaged specimens, diet analysis from gut contents and feces, and biodiversity assessment from environmental DNA (eDNA), bulk arthropod samples, or invertebrate-derived DNA (iDNA). These applications often require coupling with high-throughput sequencing (HTS) technologies, in which case the approach is referred to as metabarcoding. Here, we discuss methods for generating reference barcodes on cost-efficient HTS platforms, and introduce several rules of thumb and some widely used tools for data quality control, denoising, and Operational Taxonomic Unit (OTU) clustering. We hope this review will help readers better understand how these emerging technologies can be implemented alongside existing ones to accelerate biodiversity assessments accurately and efficiently.

Key words: DNA barcoding, OTUs, clustering, metabarcoding, high throughput sequencing

Table 1

Marker genes widely used for barcoding

Marker gene | Targeted group | Database
16S | Bacteria and archaea (Sogin et al, 2006) | Ribosomal Database Project (RDP, Cole et al, 2008); Greengenes (DeSantis et al, 2006); SILVA (Pruesse et al, 2007)
ITS | Fungi (Schoch et al, 2012); plants (Group et al, 2011); protists (Pawlowski et al, 2012) | UNITE (Kõljalg et al, 2005); GenBank (Benson et al, 2012)
18S | Protists (Pawlowski et al, 2012) | SILVA (Pruesse et al, 2007)
matK + rbcL | Plants (Hollingsworth et al, 2009) | Barcode of Life Data Systems (BOLD, Ratnasingham & Hebert, 2007); GenBank (Benson et al, 2012)
COI | Fauna (Hebert et al, 2003) and protists (Pawlowski et al, 2012) | Ribosomal Database Project (RDP, Cole et al, 2008)

Table 2

High-throughput methods to obtain barcode sequences

Targeted region length (bp) | Advantages | Disadvantages | Reference
~300 | - | Cannot handle long fragments; Roche 454 platform | Shokralla et al, 2014
~180 | Straightforward, easy to operate, low cost | Short targeted region; can only be used for species pre-clustering | Meier et al, 2016
~650 | Standard full-length COI | Poor universality; multiple rounds of PCR | Shokralla et al, 2015; Cruaud et al, 2017
~650 | Easy to operate, standard full-length COI | Relatively high requirement for computational resources | Liu et al, 2017
~650 | Easy to operate, standard full-length COI | High cost of SMRT platform | Hebert et al, 2018
~650 | Easy to operate, standard full-length COI | Not in mass production | Yang et al, 2018

Fig. 1

Diagram of DNA barcode data analysis
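
Two of the analysis steps named in the abstract, quality control and OTU clustering, can be illustrated with a minimal Python sketch. It is not taken from the review: the function names, the mean Phred cutoff of 20, and the 97% identity threshold are assumptions chosen for illustration. Identity is computed position by position on equal-length reads, a simplification of the pairwise alignment that dedicated clustering tools perform, and denoising is omitted.

```python
from collections import Counter


def mean_quality(quals):
    """Mean Phred score of one read (0.0 for an empty read)."""
    return sum(quals) / len(quals) if quals else 0.0


def identity(a, b):
    """Fraction of matching positions between two equal-length sequences.

    Assumption: reads are ungapped and of equal length. Sequences of
    different lengths are treated as non-matching (identity 0.0).
    """
    if not a or len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_otus(reads, min_quality=20, threshold=0.97):
    """Quality-filter, dereplicate, then greedily cluster reads into OTUs.

    reads: iterable of (sequence, list_of_phred_scores) pairs.
    Returns a list of (centroid_sequence, total_read_count) tuples.
    """
    # 1. Quality control: drop reads whose mean Phred score is too low.
    kept = [seq for seq, quals in reads if mean_quality(quals) >= min_quality]

    # 2. Dereplication: collapse identical reads and count their abundance.
    counts = Counter(kept)

    # 3. Greedy clustering: visit sequences from most to least abundant;
    #    a sequence joins the first centroid it matches, else it founds an OTU.
    otus = []  # each entry is [centroid, read_count]
    for seq, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for otu in otus:
            if identity(seq, otu[0]) >= threshold:
                otu[1] += n
                break
        else:
            otus.append([seq, n])
    return [(centroid, n) for centroid, n in otus]
```

Calling `cluster_otus(reads)` on a list of `(sequence, qualities)` pairs returns one `(centroid, count)` tuple per OTU, ordered by the abundance of the founding sequence. Processing abundant sequences first reflects the common assumption that rare, near-identical variants are more likely to be sequencing errors than distinct taxa.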

[1] Armstrong K, Ball S (2005) DNA barcodes for biosecurity: Invasive species identification. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 360, 1813-1823.
doi: 10.1098/rstb.2005.1713
[2] Baird DJ, Hajibabaei M (2012) Biomonitoring 2.0: A new paradigm in ecosystem assessment made possible by next-generation DNA sequencing. Molecular Ecology, 21, 2039-2044.
doi: 10.1111/j.1365-294X.2012.05519.x
[3] Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW (2012) GenBank. Nucleic Acids Research, 41, D36-D42.
doi: 10.1093/nar/gks1195
[4] Bohmann K, Evans A, Gilbert MTP, Carvalho GR, Creer S, Knapp M, Yu DW, de Bruyn M (2014) Environmental DNA for wildlife biology and biodiversity monitoring. Trends in Ecology & Evolution, 29, 358-367.
[5] Bohmann K, Schnell IB, Gilbert MTP (2013) When bugs reveal biodiversity. Molecular Ecology, 22, 909-911.
doi: 10.1111/mec.2013.22.issue-4
[6] Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJA, Holmes SP (2016) DADA2: High-resolution sample inference from Illumina amplicon data. Nature Methods, 13, 581-583.
doi: 10.1038/nmeth.3869
[7] Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, Fierer N, Pena AG, Goodrich JK, Gordon JI (2010) QIIME allows analysis of high-throughput community sequencing data. Nature Methods, 7, 335-336.
doi: 10.1038/nmeth.f.303
[8] Caporaso JG, Lauber CL, Walters WA, Berg-Lyons D, Lozupone CA, Turnbaugh PJ, Fierer N, Knight R (2011) Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample. Proceedings of the National Academy of Sciences, USA, 108, 4516-4522.
doi: 10.1073/pnas.1000080107
[9] Chen Y, Chen Y, Shi C, Huang Z, Zhang Y, Li S, Li Y, Ye J, Yu C, Li Z (2017) SOAPnuke: A MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data. GigaScience, 7, gix120.
[10] Cole JR, Wang Q, Cardenas E, Fish J, Chai B, Farris RJ, Kulam-Syed-Mohideen A, McGarrell DM, Marsh T, Garrity GM (2008) The Ribosomal Database Project: Improved alignments and new tools for rRNA analysis. Nucleic Acids Research, 37, D141-D145.
[11] Cruaud P, Rasplus J-Y, Rodriguez LJ, Cruaud A (2017) High-throughput sequencing of multiple amplicons for barcoding and integrative taxonomy. Scientific Reports, 7, 41948.
doi: 10.1038/srep41948
[12] Deiner K, Bik HM, Mächler E, Seymour M, Lacoursière-Roussel A, Altermatt F, Creer S, Bista I, Lodge DM, de Vere N (2017) Environmental DNA metabarcoding: Transforming how we survey animal and plant communities. Molecular Ecology, 26, 5872-5895.
doi: 10.1111/mec.2017.26.issue-21
[13] DeSantis TZ, Hugenholtz P, Larsen N, Rojas M, Brodie EL, Keller K, Huber T, Dalevi D, Hu P, Andersen GL (2006) Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Applied and Environmental Microbiology, 72, 5069-5072.
[14] Edgar RC (2010) Search and clustering orders of magnitude faster than BLAST. Bioinformatics, 26, 2460-2461.
doi: 10.1093/bioinformatics/btq461
[15] Edgar RC (2013) UPARSE: Highly accurate OTU sequences from microbial amplicon reads. Nature Methods, 10, 996-998.
doi: 10.1038/nmeth.2604
[16] Edgar RC, Haas BJ, Clemente JC, Quince C, Knight R (2011) UCHIME improves sensitivity and speed of chimera detection. Bioinformatics, 27, 2194-2200.
doi: 10.1093/bioinformatics/btr381
[17] Eid J, Fehr A, Gray J, Luong K, Lyle J, Otto G, Peluso P, Rank D, Baybayan P, Bettman B (2008) Real-time DNA sequencing from single polymerase molecules. Science, 323, 133-138.
[18] Eren AM, Morrison HG, Lescault PJ, Reveillaud J, Vineis JH, Sogin ML (2015) Minimum entropy decomposition: Unsupervised oligotyping for sensitive partitioning of high-throughput marker gene sequences. The ISME Journal, 9, 968-979.
doi: 10.1038/ismej.2014.195
[19] Frøslev TG, Kjøller R, Bruun HH, Ejrnæs R, Brunbjerg AK, Pietroni C, Hansen AJ (2017) Algorithm for post-clustering curation of DNA amplicon data yields reliable biodiversity estimates. Nature Communications, 8, 1188.
doi: 10.1038/s41467-017-01312-x
[20] Group CPB, Li DZ, Gao LM, Li HT, Wang H, Ge XJ, Liu JQ, Chen ZD, Zhou SL, Chen SL (2011) Comparative analysis of a large dataset indicates that internal transcribed spacer (ITS) should be incorporated into the core barcode for seed plants. Proceedings of the National Academy of Sciences, USA, 108, 19641-19646.
doi: 10.1073/pnas.1104551108
[21] Hajibabaei M, Baird DJ, Fahner NA, Beiko R, Golding GB (2016) A new way to contemplate Darwin’s tangled bank: How DNA barcodes are reconnecting biodiversity science and biomonitoring. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 371, 20150330.
doi: 10.1098/rstb.2015.0330
[22] Hao X, Jiang R, Chen T (2011) Clustering 16S rRNA for OTU prediction: A method of unsupervised Bayesian clustering. Bioinformatics, 27, 611-618.
doi: 10.1093/bioinformatics/btq725
[23] Hebert PD, Braukmann TW, Prosser SW, Ratnasingham S, Ivanova NV, Janzen DH, Hallwachs W, Naik S, Sones JE, Zakharov EV (2018) A Sequel to Sanger: Amplicon sequencing that scales. BMC Genomics, 19, 219.
doi: 10.1186/s12864-018-4611-3
[24] Hebert PD, Cywinska A, Ball SL (2003) Biological identifications through DNA barcodes. Proceedings of the Royal Society of London B: Biological Sciences, 270, 313-321.
doi: 10.1098/rspb.2002.2218
[25] Hebert PD, Hollingsworth PM, Hajibabaei M (2016) From writing to reading the encyclopedia of life. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 371, 20150321.
doi: 10.1098/rstb.2015.0321
[26] Hollingsworth PM, Forrest LL, Spouge JL, Hajibabaei M, Ratnasingham S, van der Bank M, Chase MW, Cowan RS, Erickson DL (2009) A DNA barcode for land plants. Proceedings of the National Academy of Sciences, USA, 106, 12794-12797.
doi: 10.1073/pnas.0905845106
[27] Jiang H, Lei R, Ding SW, Zhu S (2014) Skewer: A fast and accurate adapter trimmer for next-generation sequencing paired-end reads. BMC Bioinformatics, 15, 182.
doi: 10.1186/1471-2105-15-182
[28] Kõljalg U, Larsson KH, Abarenkov K, Nilsson RH, Alexander IJ, Eberhardt U, Erland S, Høiland K, Kjøller R, Larsson E (2005) UNITE: A database providing web-based methods for the molecular identification of ectomycorrhizal fungi. New Phytologist, 166, 1063-1068.
doi: 10.1111/j.1469-8137.2005.01376.x
[29] Kress WJ, Erickson DL (2012) DNA barcodes: Methods and protocols. In: DNA Barcodes (eds Kress WJ, Erickson DL), pp. 3-8. Humana Press, Totowa.
[30] Kunz TH, Whitaker JO Jr (1983) An evaluation of fecal analysis for determining food habits of insectivorous bats. Canadian Journal of Zoology, 61, 1317-1321.
doi: 10.1139/z83-177
[31] Liu S, Li Y, Lu J, Su X, Tang M, Zhang R, Zhou L, Zhou C, Yang Q, Ji Y (2013) SOAPBarcode: Revealing arthropod biodiversity through assembly of Illumina shotgun sequences of PCR amplicons. Methods in Ecology and Evolution, 4, 1142-1150.
doi: 10.1111/mee3.2013.4.issue-12
[32] Liu S, Wang X, Xie L, Tan M, Li Z, Su X, Zhang H, Misof B, Kjer KM, Tang M (2016) Mitochondrial capture enriches mito-DNA 100 fold, enabling PCR-free mitogenomics biodiversity analysis. Molecular Ecology Resources, 16, 470-479.
doi: 10.1111/men.2016.16.issue-2
[33] Liu S, Yang C, Zhou C, Zhou X (2017) Filling reference gaps via assembling DNA barcodes using high-throughput sequencing: Moving toward barcoding the world. GigaScience, 6, 1-8.
[34] Mahon AR, Jerde CL, Galaska M, Bergner JL, Chadderton WL, Lodge DM, Hunter ME, Nico LG (2013) Validation of eDNA surveillance sensitivity for detection of Asian carps in controlled and field experiments. PLoS ONE, 8, e58316.
doi: 10.1371/journal.pone.0058316
[35] Matias Rodrigues JF, von Mering C (2013) HPC-CLUST: Distributed hierarchical clustering for large sets of nucleotide sequences. Bioinformatics, 30, 287-288.
[36] Meier R, Wong W, Srivathsan A, Foo M (2016) $1 DNA barcodes for reconstructing complex phenomes and finding rare species in specimen-rich samples. Cladistics, 32, 100-110.
doi: 10.1111/cla.2016.32.issue-1
[37] Nilsson RH, Ryberg M, Abarenkov K, Sjökvist E, Kristiansson E (2009) The ITS region as a target for characterization of fungal communities using emerging sequencing technologies. FEMS Microbiology Letters, 296, 97-101.
doi: 10.1111/fml.2009.296.issue-1
[38] Pawlowski J, Audic S, Adl S, Bass D, Belbahri L, Berney C, Bowser SS, Cepicka I, Decelle J, Dunthorn M (2012) CBOL protist working group: Barcoding eukaryotic richness beyond the animal, plant, and fungal kingdoms. PLoS Biology, 10, e1001419.
doi: 10.1371/journal.pbio.1001419
[39] Pruesse E, Quast C, Knittel K, Fuchs BM, Ludwig W, Peplies J, Glöckner FO (2007) SILVA: A comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Research, 35, 7188-7196.
doi: 10.1093/nar/gkm864
[40] Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J, Glöckner FO (2012) The SILVA ribosomal RNA gene database project: Improved data processing and web-based tools. Nucleic Acids Research, 41, D590-D596.
doi: 10.1093/nar/gks1219
[41] Ratnasingham S, Hebert PD (2007) BOLD: The Barcode of Life Data System. Molecular Ecology Notes, 7, 355-364.
[42] Rognes T, Flouri T, Nichols B, Quince C, Mahé F (2016) VSEARCH: A versatile open source tool for metagenomics. PeerJ, 4, e2584.
doi: 10.7717/peerj.2584
[43] Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, Lesniewski RA, Oakley BB, Parks DH, Robinson CJ (2009) Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Applied and Environmental Microbiology, 75, 7537-7541.
doi: 10.1128/AEM.01541-09
[44] Schnell IB, Thomsen PF, Wilkinson N, Rasmussen M, Jensen LRD, Willerslev E, Bertelsen MF, Gilbert MTP (2012) Screening mammal biodiversity using DNA from leeches. Current Biology, 22, R262-R263.
doi: 10.1016/j.cub.2012.02.058
[45] Schoch CL, Seifert KA, Huhndorf S, Robert V, Spouge JL, Levesque CA, Chen W, Bolchacova E, Voigt K, Crous PW (2012) Nuclear ribosomal internal transcribed spacer (ITS) region as a universal DNA barcode marker for fungi. Proceedings of the National Academy of Sciences, USA, 109, 6241-6246.
doi: 10.1073/pnas.1117018109
[46] Schubert M, Lindgreen S, Orlando L (2016) AdapterRemoval v2: Rapid adapter trimming, identification, and read merging. BMC Research Notes, 9, 88.
doi: 10.1186/s13104-016-1900-2
[47] Shi ZY, Yang CQ, Hao MD, Wang XY, Ward RD, Zhang AB (2018) FuzzyID2: A software package for large data set species identification via barcoding and metabarcoding using hidden Markov models and fuzzy set methods. Molecular Ecology Resources, 18, 666-675.
doi: 10.1111/men.2018.18.issue-3
[48] Shokralla S, Gibson JF, Nikbakht H, Janzen DH, Hallwachs W, Hajibabaei M (2014) Next-generation DNA barcoding: Using next-generation sequencing to enhance and accelerate DNA barcode capture from single specimens. Molecular Ecology Resources, 14, 892-901.
[49] Shokralla S, Porter TM, Gibson JF, Dobosz R, Janzen DH, Hallwachs W, Golding GB, Hajibabaei M (2015) Massively parallel multiplex DNA sequencing for specimen identification using an Illumina MiSeq platform. Scientific Reports, 5, 9687.
doi: 10.1038/srep09687
[50] Sogin ML, Morrison HG, Huber JA, Welch DM, Huse SM, Neal PR, Arrieta JM, Herndl GJ (2006) Microbial diversity in the deep sea and the underexplored “rare biosphere”. Proceedings of the National Academy of Sciences, USA, 103, 12115-12120.
doi: 10.1073/pnas.0605127103
[51] Taberlet P, Coissac E, Hajibabaei M, Rieseberg LH (2012) Environmental DNA. Molecular Ecology, 21, 1789-1793.
doi: 10.1111/j.1365-294X.2012.05542.x
[52] Tang M, Hardman CJ, Ji Y, Meng G, Liu S, Tan M, Yang S, Moss ED, Wang J, Yang C (2015) High-throughput monitoring of wild bee diversity and abundance via mitogenomics. Methods in Ecology and Evolution, 6, 1034-1043.
doi: 10.1111/2041-210X.12416
[53] Turner CR, Miller DJ, Coyne KJ, Corush J (2014) Improved methods for capture, extraction, and quantitative assay of environmental DNA from Asian bigheaded carp (Hypophthalmichthys spp.). PLoS ONE, 9, e114329.
doi: 10.1371/journal.pone.0114329
[54] Yang C, Tan S, Meng G, Bourne DG, O’Brien PA, Xu J, Liao S, Chen A, Chen X, Liu S (2018) Access COI barcode efficiently using high throughput Single End 400 bp sequencing. bioRxiv, doi: 10.1101/498618.
[55] Yu DW, Ji YQ, Emerson BC, Wang XY, Ye CX, Yang CY, Ding ZL (2012) Biodiversity soup: Metabarcoding of arthropods for rapid biodiversity assessment and biomonitoring. Methods in Ecology and Evolution, 3, 613-623.
doi: 10.1111/mee3.2012.3.issue-4
[56] Zhang J, Kapli P, Pavlidis P, Stamatakis A (2013) A general species delimitation method with applications to phylogenetic placements. Bioinformatics, 29, 2869-2876.
doi: 10.1093/bioinformatics/btt499
[57] Zhou X, Li Y, Liu S, Yang Q, Su X, Zhou L, Tang M, Fu R, Li J, Huang Q (2013) Ultra-deep sequencing enables high-fidelity recovery of biodiversity for bulk arthropod samples without PCR amplification. GigaScience, 2, 4.
doi: 10.1186/2047-217X-2-4