Biodiversity Science ›› 2019, Vol. 27 ›› Issue (5): 534-542. doi: 10.17520/biods.2018201

Analysis of prospective microbiology research using third-generation sequencing technology

Xu Yakun1,2, Ma Yue1,2, Hu Xiaoxi1, Wang Jun1,*

  1 Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101
  2 University of Chinese Academy of Sciences, Beijing 100049
  • Received: 2018-07-30 Accepted: 2018-12-25 Online: 2019-05-20
  • Corresponding author: Wang Jun

Microbes are ubiquitous in human life. In the past, microbial research was limited to pure cultures of single bacterial strains and to qualitative analyses. The development of sequencing technology has greatly accelerated progress in microbiology, and a growing body of evidence shows that human symbiotic microbes, especially intestinal microbes, are closely related to human health. Second-generation sequencing is currently the mainstream technology in microbiology research because of its high throughput, high accuracy and low cost. However, as research questions become more complex, its main disadvantage, short read length (< 450 bp), creates challenges for downstream data analysis and genome assembly and limits its applicability to future research. Third-generation sequencing (TGS) technology emerged in this context. TGS is also called single-molecule sequencing: it sequences single DNA molecules directly and in real time, without PCR amplification. TGS significantly increases read length, to 2-10 kb or even 2.2 Mb. Because of its long reads and lack of GC bias, TGS provides a new method for full-length gene sequencing, facilitates the assembly of complete and reliable microbial genome maps, and further reveals the diversity of microbial structures and functions. This review summarizes the technical characteristics and principles of TGS, and then analyzes its applications and progress in 16S/18S rRNA gene sequencing, complete bacterial genome mapping and metagenomics research.
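
The effect of read length on full-length gene sequencing can be illustrated with a rough calculation. The sketch below computes the smallest number of reads that could tile a target end to end. The target length of 1,500 bp is an assumption (an approximate length for a full-length 16S rRNA gene, not a figure from this review), and the function name is hypothetical. Real assembly needs overlapping reads, so these counts are lower bounds only.

```python
import math


def min_reads_to_span(target_bp: int, read_bp: int) -> int:
    """Lower bound on the number of reads needed to tile a target end to end.

    Overlap between reads is ignored, so real assemblies need more reads.
    """
    if target_bp <= 0 or read_bp <= 0:
        raise ValueError("lengths must be positive")
    return math.ceil(target_bp / read_bp)


# Assumption: approximate full-length 16S rRNA gene size, not from the source.
GENE_BP = 1500

# Read lengths quoted in the abstract: < 450 bp (second generation), 2-10 kb (TGS).
for label, read_bp in [("second generation, 450 bp", 450),
                       ("TGS, 2 kb", 2000),
                       ("TGS, 10 kb", 10000)]:
    print(f"{label}: at least {min_reads_to_span(GENE_BP, read_bp)} read(s)")
```

Under these assumptions a single TGS read can cover the whole gene, whereas short reads must be stitched together from several fragments.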

Key words: microbes, third-generation sequencing, 16S/18S rRNA, metagenomics

Fig. 1

Schematic diagram of PacBio single-molecule real-time sequencing. (a) In each zero-mode waveguide (ZMW) well, a single DNA template molecule, combined with a primer and polymerase, is bound to the bottom of the well. Once sequencing begins, a newly added fluorescently labeled dNTP is held at the bottom of the ZMW for a relatively long time by base pairing, and the corresponding fluorescent signal is recorded in real time by confocal microscopy. (b) (1) A fluorescently labeled cytosine deoxynucleotide; (2) the cytosine deoxynucleotide pairs with the template and enters the DNA strand, emitting a fluorescent signal; (3) the fluorescent group is removed by the DNA polymerase and the fluorescence disappears; (4) a new labeled deoxynucleotide is incorporated; (5) a new cycle begins.
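
The signal logic described in this caption can be sketched as a toy pulse caller. This is a simplified illustration, not PacBio's actual base-calling algorithm: the trace format (one label per camera frame, `None` for a dark frame), the function name `call_bases` and the `min_frames` threshold are all hypothetical. The idea is that an incorporated nucleotide stays at the bottom of the ZMW for many frames, while a freely diffusing one lights up only briefly and is ignored.

```python
from itertools import groupby


def call_bases(trace, min_frames=3):
    """Call one base per sustained fluorescent pulse.

    trace: per-frame channel labels ("A", "C", "G", "T") or None for dark.
    min_frames: hypothetical minimum pulse duration for an incorporation event.
    The calls are the incorporated nucleotides (complementary to the template).
    """
    calls = []
    for channel, run in groupby(trace):
        duration = sum(1 for _ in run)
        if channel is not None and duration >= min_frames:
            calls.append(channel)
    return "".join(calls)


# Toy trace: a long C pulse, a one-frame A flicker (diffusion), then G and C pulses.
trace = [None, "C", "C", "C", "C", None, "A", None,
         "G", "G", "G", None, None, "C", "C", "C", None]
print(call_bases(trace))  # CGC
```

Dark frames between pulses separate repeated incorporations of the same base, so two consecutive C pulses would be called as two bases.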

Fig. 2

Nanopore DNA sequencing, which uses electrical signals for detection. The nanopore is so narrow that only a single DNA molecule can pass through at a time. As a single strand of DNA passes through, it blocks the flow of ions and changes the current across the nanopore. Because the four bases (A, T, C and G) differ in their charge properties, each passing base is identified from the change in current.
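
The identification step described above can be illustrated with a minimal nearest-level classifier. This is a toy sketch under strong assumptions: the reference current values are invented for illustration, each base is assumed to produce one distinct current level, and the function names are hypothetical. Real nanopore signals are noisier and typically reflect several neighboring bases at once, so production base callers are far more elaborate.

```python
# Hypothetical reference current levels in picoamperes (illustrative values only).
REFERENCE_PA = {"A": 50.0, "C": 60.0, "G": 70.0, "T": 80.0}


def nearest_base(current_pa: float) -> str:
    """Return the base whose reference level is closest to the measured current."""
    return min(REFERENCE_PA, key=lambda base: abs(REFERENCE_PA[base] - current_pa))


def call_sequence(levels) -> str:
    """Call one base per measured current level."""
    return "".join(nearest_base(level) for level in levels)


# Four noisy measurements, one per base passing through the pore.
print(call_sequence([51.2, 78.9, 61.0, 69.5]))  # ATCG
```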

Table 1

Comparison of the three generations of sequencing technologies

| Technical platform | Principle | Read length | Advantages | Disadvantages |
|---|---|---|---|---|
| First generation | Chain-terminating sequencing | 600-1,000 bp | Long reads; high accuracy; good ability to deal with repetitive and homopolymer regions. | Low throughput; high cost of Sanger sample preparation, making massively parallel sequencing prohibitive. |
| Second generation | | 200-400 bp | Longest read lengths among the second generation; high throughput. | Challenging sample preparation; hard to deal with repetitive/homopolymer regions. |
| Second generation | Sequencing by synthesis | 2 × 150 bp | Very high throughput. | Short reads. |
| Second generation | Sequencing by | 25-35 bp | High throughput; low cost. | Long sequencing runs (days); short reads, resulting in difficulties in subsequent data analysis and genome assembly. |
| Third generation | Sequencing by | ~1,000 bp | Long average read length; no amplification of sequencing fragments; longest individual reads approach 100 kb. | Low accuracy; dependence on DNA polymerase activity. |
| Third generation | Electronic signals | Maximum recorded: 2.2 Mb | Very long reads; electronic sequencing; portable. | High sequencing error. |
