Biodiversity Science, 2019, Vol. 27, Issue 5: 526-533. doi: 10.17520/biods.2018209

DNA barcoding and emerging reference construction and data analysis technologies

Liu Shanlin

  1. Beijing Advanced Innovation Center for Food Nutrition and Human Health, College of Plant Protection, China Agricultural University, Beijing 100193
  Received: 2018-07-30; Accepted: 2018-12-25; Online: 2019-05-20
DNA barcoding has been growing exponentially, both in the number of barcodes generated and in the range of its applications. As a conservation tool, for example, it is used for species identification of damaged specimens, diet analysis from gut contents and feces, and biodiversity assessment from environmental DNA (eDNA), bulk arthropod samples, or invertebrate-derived DNA (iDNA). These applications often require high-throughput sequencing (HTS), and DNA barcoding coupled with HTS in this way is referred to as metabarcoding. Here, we discuss methods for generating reference barcodes on cost-efficient HTS platforms, and introduce several rules of thumb and some widely used tools for data quality control, denoising, and operational taxonomic unit (OTU) clustering. We hope this review will help readers better understand how these emerging technologies can be used alongside existing ones to make biodiversity assessment faster, more accurate, and more efficient.
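The abstract mentions rules of thumb for read quality control. One widely used rule, shown here as a minimal Python sketch rather than as this review's specific recommendation, is expected-error filtering: each Phred quality score Q is converted to an error probability 10^(-Q/10), the probabilities are summed over the read, and the read is discarded if that sum (the expected number of errors) exceeds a threshold. This is the idea behind the `--fastq_maxee` option of VSEARCH. The Phred+33 encoding, the thresholds `max_ee=1.0` and `min_len=100`, and all function names are assumptions chosen for illustration.

```python
"""Expected-error quality filtering of FASTQ reads (illustrative sketch).

Assumptions: Phred+33 quality encoding; max_ee and min_len defaults are
illustrative values, not thresholds prescribed by this review.
"""

from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def phred_to_error_prob(qchar: str, offset: int = 33) -> float:
    """Convert one encoded quality character to its base-call error probability."""
    q = ord(qchar) - offset
    return 10 ** (-q / 10)


def expected_errors(quality: str, offset: int = 33) -> float:
    """Sum of per-base error probabilities: the expected number of errors in a read."""
    return sum(phred_to_error_prob(c, offset) for c in quality)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) tuples from FASTQ text, four lines per record."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # skip the '+' separator line
        qual = next(it).strip()
        yield header.strip(), seq, qual


def filter_reads(records: Iterable[Record], max_ee: float = 1.0,
                 min_len: int = 100) -> Iterator[Record]:
    """Keep reads that are at least min_len long and have at most max_ee expected errors."""
    for header, seq, qual in records:
        if len(seq) >= min_len and expected_errors(qual) <= max_ee:
            yield header, seq, qual
```

For a file on disk (the name `reads.fastq` is hypothetical), `with open("reads.fastq") as fh: kept = list(filter_reads(read_fastq(fh)))` streams the records without loading the whole file. Expected errors are preferred over mean quality in this kind of rule because a few very poor bases can push a read over the threshold even when its average quality looks acceptable.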

Key words: DNA barcoding, OTUs, clustering, metabarcoding, high-throughput sequencing

Table 1

Marker genes widely used for barcoding

| Marker gene | Targeted group | Database |
|---|---|---|
| 16S | Bacteria and archaea (Sogin et al, 2006) | Ribosomal Database Project (RDP, Cole et al, 2008); Greengenes (DeSantis et al, 2006); SILVA (Pruesse et al, 2007) |
| ITS | Fungi (Schoch et al, 2012); plants (Group et al, 2011); protists (Pawlowski et al, 2012) | UNITE (Kõljalg et al, 2005); GenBank (Benson et al, 2012) |
| 18S | Protists (Pawlowski et al, 2012) | SILVA (Pruesse et al, 2007) |
| matK + rbcL | Plants (Hollingsworth et al, 2009) | Barcode of Life Data Systems (BOLD, Ratnasingham & Hebert, 2007); GenBank (Benson et al, 2012) |
| COI | Animals (Hebert et al, 2003); protists (Pawlowski et al, 2012) | Ribosomal Database Project (RDP, Cole et al, 2008) |

Table 2

High-throughput methods for obtaining barcode sequences

| Targeted region length (bp) | Advantages | Disadvantages | Reference |
|---|---|---|---|
| ~300 | - | Cannot handle long target fragments; Roche 454 platform | Shokralla et al, 2014 |
| ~180 | Straightforward, easy to operate, low cost | Short target region; can only be used for preliminary species screening (pre-clustering) | Meier et al, 2016 |
| ~650 | Standard full-length COI barcode | Poor universality; requires multiple rounds of PCR | Shokralla et al, 2015; Cruaud et al, 2017 |
| ~650 | Easy to operate; standard full-length COI barcode | Relatively high computational resource requirements | Liu et al, 2017 |
| ~650 | Easy to operate; standard full-length COI barcode | High cost of the SMRT platform | Hebert et al, 2018 |
| ~650 | Easy to operate; standard full-length COI barcode | Not suited to mass production | Yang et al, 2018 |
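High-throughput barcoding of the kind summarized in Table 2 typically pools many specimens into one sequencing run, so reads must afterwards be assigned back to their specimens by short sequence tags (for example, tags attached to the PCR primers). The Python sketch below illustrates this demultiplexing step under assumed conventions that may differ from any of the cited protocols: each merged read begins with a forward tag and ends with the reverse complement of a reverse tag, each specimen is identified by a unique (forward, reverse) tag pair, and tags are matched by Hamming distance. The tag sequences, the sample sheet, and all function names are hypothetical, and tags and primers are not trimmed.

```python
"""Assign merged reads to specimens by forward/reverse tag pairs (illustrative sketch).

Assumed read layout: <forward tag> ... <reverse complement of reverse tag>.
Tag sets and the sample sheet are hypothetical inputs.
"""

from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def match_tag(read: str, tags: Dict[str, str], max_mismatch: int) -> Optional[str]:
    """Return the name of the single tag matching the read's 5' end, or None if zero or several match."""
    hits = [name for name, tag in tags.items()
            if len(read) >= len(tag) and hamming(read[:len(tag)], tag) <= max_mismatch]
    return hits[0] if len(hits) == 1 else None


def demultiplex(reads: Iterable[str],
                fwd_tags: Dict[str, str],
                rev_tags: Dict[str, str],
                sample_sheet: Dict[Tuple[str, str], str],
                max_mismatch: int = 0) -> Dict[str, List[str]]:
    """Bin reads by specimen; unknown or ambiguous tag pairs go to 'unassigned'."""
    bins: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        f = match_tag(read, fwd_tags, max_mismatch)
        # The reverse tag sits at the 3' end, so look for it at the start of the reverse complement.
        r = match_tag(revcomp(read), rev_tags, max_mismatch)
        sample = sample_sheet.get((f, r)) if f and r else None
        bins[sample or "unassigned"].append(read)
    return dict(bins)
```

Using tag pairs rather than one tag per specimen lets a small set of tags label many specimens (for example, 10 forward and 10 reverse tags give 100 combinations). Allowing mismatches (`max_mismatch=1`) is only safe if the tags were designed with enough pairwise distance; ambiguous matches are sent to `unassigned` rather than guessed.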

Fig. 1

Diagram of DNA barcode data analysis
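The abstract names quality control, denoising, and OTU clustering as the main analysis steps. The Python sketch below is not a reproduction of Fig. 1; it is a hedged illustration of two of those steps. It collapses identical reads (dereplication), drops unique sequences seen fewer than `min_size` times (removing singletons is a common noise-reduction rule of thumb, assumed here, and much simpler than true denoising algorithms that infer exact sequence variants), and then clusters the remaining uniques greedily in order of decreasing abundance, the general strategy behind tools such as VSEARCH `--cluster_size`. Identity is approximated as 1 minus the edit distance divided by the length of the longer sequence, a simplification of the alignment-based definitions real tools use; the 97% threshold is a conventional but assumed value, and all function names are hypothetical.

```python
"""Dereplicate, filter rare uniques, and greedily cluster into OTUs (illustrative sketch).

Assumptions: min_size=2 (singleton removal) and a 0.97 identity threshold;
identity is edit-distance based, not the alignment-based identity of real tools.
"""

from collections import Counter
from typing import Dict, Iterable, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a rolling one-row dynamic-programming table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Crude similarity: 1 - edit distance / length of the longer sequence."""
    return 1.0 - edit_distance(a, b) / max(len(a), len(b), 1)


def dereplicate(reads: Iterable[str], min_size: int = 2) -> List[Tuple[str, int]]:
    """Collapse identical reads; keep uniques seen >= min_size times, most abundant first."""
    counts = Counter(reads)
    kept = [(seq, n) for seq, n in counts.items() if n >= min_size]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


def greedy_otus(uniques: Iterable[Tuple[str, int]],
                threshold: float = 0.97) -> Dict[str, List[str]]:
    """Each unique joins the first existing centroid within threshold, else founds a new OTU."""
    otus: Dict[str, List[str]] = {}  # centroid -> member sequences (insertion-ordered)
    for seq, _size in uniques:
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid].append(seq)
                break
        else:
            otus[seq] = [seq]
    return otus
```

Processing uniques in decreasing abundance means the most common sequence in each cluster becomes its centroid, on the reasoning that abundant sequences are more likely to be true biological variants than rare, error-containing ones. The pairwise comparison is quadratic in the number of uniques, which is why production tools rely on k-mer prefiltering and other heuristics.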

