Biodiversity Science, 2020, Vol. 28, Issue 5: 587-595. doi: 10.17520/biods.2020156

• Reviews •

Verification of virus identity and host association using genomics technology

Benfeng Han, Xin Zhou, Xue Zhang

  1. Department of Entomology, College of Plant Protection, China Agricultural University, Beijing 100193
  • Received: 2020-04-16  Accepted: 2020-06-09  Online: 2020-06-18
  • Xue Zhang

Genomics technology, especially metagenomic sequencing, has played an important role in identifying and tracing unknown viruses. Whereas classical virus taxonomy relies on phenotypic traits, a metagenomics pipeline assembles new virus genomes from short nucleotide fragments without requiring any a priori reference sequences. This approach makes it more efficient to identify viruses and their associated hosts, which is particularly useful for viruses that can cause epidemics. A current challenge is tracing the original and intermediate hosts of a virus. Doing so requires a comprehensive virus sequence library in which each sequence carries definite host information, and such information is still limited. Because wild animals and livestock are major sources of pathogenic viruses, an extensive survey of the global virome is vitally important for identifying and preventing zoonotic epidemics. This review summarizes how genomics technologies are applied to identify viruses and their associated hosts, using the outbreak of SARS-CoV-2 as an example. We also discuss the intrinsic drawbacks of current methodologies and the incompleteness of available virus libraries. We argue that constructing a comprehensive virus database with host associations, one that emphasizes the diversity of viruses and their interactions with other organisms, is both necessary and feasible.

Key words: SARS-CoV-2, high-throughput sequencing, virus diversity, host association, virus evolution

Table 1

Methods used in the phylogenetic analysis of viruses

Method | Application | Software | Reference
Phylogenetic trees | Analyzing relatedness among organisms, visualizing their relationships as a branching tree, and inferring evolutionary history |  | Zhou et al, 2020; Wu et al, 2020; Frias-De-Diego et al, 2019; Yuen et al, 2019
Reconstructing ancestral state in phylogenies (RASP) | Inferring historical biogeography by reconstructing ancestral geographic distributions on phylogenetic trees | RASP | Luo et al, 2015; Frias-De-Diego et al, 2019; Yuen et al, 2019
Phylogenetic network | Combining a multitude of optimal trees into one visualization, displaying recombination and other character-conflict events more intuitively |  | Yu et al, 2020
Haplotype network | An intuitive method for visualizing relationships among individual genotypes at the population level | PopART | Tang et al, 2020; Leigh & Bryant, 2015
Bayesian evolutionary analysis | Inferring when populations diverged from time-calibrated trees modeled in BEAST | BEAST | Luo et al, 2015; Bouckaert et al, 2014; Suchard et al, 2018
Recombination analysis | Identifying possible recombination signals and revealing the role of recombination in gene evolution |  | Wu et al, 2020; Lam et al, 2020
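
As a rough illustration of the haplotype network row in Table 1, the sketch below collapses a set of aligned sequences into unique haplotypes and counts the differences between each pair. Counts and pairwise differences of this kind are the sort of input a haplotype network is drawn from. This is a toy example under stated assumptions, not the procedure used by PopART or by any of the cited studies; the function names and the sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def hamming(a, b):
    """Count the positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def haplotype_summary(aligned):
    """Collapse samples into haplotypes and compute pairwise differences.

    aligned: dict mapping a sample name to its aligned sequence.
    Returns (counts, distances), where counts maps each unique sequence to
    the number of samples carrying it, and distances maps each pair of
    unique sequences (in sorted order) to their number of differences.
    """
    counts = Counter(aligned.values())
    haplotypes = sorted(counts)
    distances = {
        (h1, h2): hamming(h1, h2) for h1, h2 in combinations(haplotypes, 2)
    }
    return counts, distances


# Hypothetical toy alignment: four samples, three distinct haplotypes.
toy = {"s1": "ACGTAC", "s2": "ACGTAC", "s3": "ACGTTC", "s4": "ATGTTC"}
counts, distances = haplotype_summary(toy)
```

In a network drawing, each haplotype would become a node sized by its count, and pairs separated by a single difference would be the natural candidates for direct links.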

Table 2

Virus sequences deposited in the ViPR database (Data are from

Family |  |  |  |  | Percent without host identification
Arenaviridae | 177 | 4,532 | 3,012 | 6,949 | 23.23%
Caliciviridae | 225 | 52,189 | 49,073 | 96,673 | 28.16%
Coronaviridae | 1,043 | 34,864 | 28,823 | 119,573 | 22.80%
Filoviridae | 16 | 3,577 | 3,390 | 22,038 | 22.53%
Flaviviridae | 367 | 345,546 | 261,780 | 877,286 | 39.36%
Hantaviridae | 304 | 10,189 | 6,867 | 10,603 | 14.02%
Hepeviridae | 33 | 17,838 | 15,022 | 19,203 | 17.95%
Herpesviridae | 782 | 58,371 | 45,281 | 300,180 | 60.18%
Nairoviridae | 38 | 3,669 | 1,931 | 3,553 | 9.92%
Paramyxoviridae | 574 | 50,355 | 44,898 | 67,728 | 24.22%
Peribunyaviridae | 183 | 5,031 | 2,434 | 6,367 | 22.80%
Phasmaviridae | 16 | 1,106 | 340 | 1,114 | 0.27%
Phenuiviridae | 215 | 5,189 | 2,934 | 7,133 | 13.37%
Picornaviridae | 1,038 | 127,336 | 116,298 | 346,846 | 26.91%
Pneumoviridae | 17 | 37,289 | 33,516 | 60,235 | 30.36%
Poxviridae | 283 | 10,444 | 7,487 | 125,948 | 40.87%
Reoviridae | 363 | 107,566 | 39,985 | 108,677 | 15.93%
Rhabdoviridae | 530 | 33,347 | 26,510 | 46,761 | 22.12%
Togaviridae | 60 | 12,800 | 10,854 | 46,764 | 36.25%
Total | 6,264 | 921,238 | 700,435 | 2,273,631 |
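
A statistic such as "percent without host identification" can be sketched as the share of sequence records whose host annotation is empty. The example below is a minimal illustration under stated assumptions: the record layout, the `host` field name, and the toy records are hypothetical, and the code does not reproduce how the Table 2 percentages were derived from ViPR.

```python
def percent_without_host(records, host_field="host"):
    """Return the percentage of records with a missing or blank host field.

    records: list of dicts, one per sequence record (hypothetical layout).
    host_field: name of the key holding the host annotation (an assumption).
    """
    if not records:
        raise ValueError("no records supplied")
    # Treat an absent key, None, and whitespace-only strings as missing.
    missing = sum(
        1 for record in records if not (record.get(host_field) or "").strip()
    )
    return 100.0 * missing / len(records)


# Hypothetical toy records; accessions and hosts are illustrative only.
toy_records = [
    {"accession": "X1", "family": "Coronaviridae", "host": "Homo sapiens"},
    {"accession": "X2", "family": "Coronaviridae", "host": ""},
    {"accession": "X3", "family": "Coronaviridae", "host": None},
    {"accession": "X4", "family": "Coronaviridae", "host": "Rhinolophus sp."},
]
pct = percent_without_host(toy_records)
```

Grouping the records by family before calling the function would give one percentage per family, the same layout as the last column of Table 2.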
[1] Adams IP, Glover RH, Monger WA, Mumford R, Jackeviciene E, Navalinskiene M, Samuitiene M, Boonham N (2009) Next-generation sequencing and metagenomic analysis: A universal diagnostic tool in plant virology. Molecular Plant Pathology, 10, 537-545.
doi: 10.1111/j.1364-3703.2009.00545.x pmid: 19523106
[2] Almazan F, Sola I, Zuniga S, Marquez-Jurado S, Morales L, Becares M, Enjuanes L (2014) Coronavirus reverse genetic systems: Infectious clones and replicons. Virus Research, 189, 262-270.
pmid: 24930446
[3] Andersen KG, Rambaut A, Lipkin WI, Holmes EC, Garry RF (2020) The proximal origin of SARS-CoV-2. Nature Medicine, 26, 450-452.
doi: 10.1038/s41591-020-0820-9 pmid: 32284615
[4] Babayan SA, Orton RJ, Streicker DG (2018) Predicting reservoir hosts and arthropod vectors from evolutionary signatures in RNA virus genomes. Science, 362, 577-580.
doi: 10.1126/science.aap9072 pmid: 30385576
[5] Bouckaert R, Heled J, Kuhnert D, Vaughan T, Wu CH, Xie D, Suchard MA, Rambaut A, Drummond AJ (2014) BEAST 2: A software platform for Bayesian evolutionary analysis. PLoS Computational Biology, 10, e1003537.
doi: 10.1371/journal.pcbi.1003537 pmid: 24722319
[6] Briese T, Paweska JT, Mcmullan LK, Hutchison SK, Street C, Palacios G, Khristova ML, Weyer J, Swanepoel R, Egholm M, Nichol ST, Lipkin WI (2009) Genetic detection and characterization of Lujo virus, a new hemorrhagic fever-associated arenavirus from southern Africa. PLoS Pathogens, 5, e1000455.
doi: 10.1371/journal.ppat.1000455 pmid: 19478873
[7] Cui J, Li F, Shi ZL (2019) Origin and evolution of pathogenic coronaviruses. Nature Reviews Microbiology, 17, 181-192.
doi: 10.1038/s41579-018-0118-9 pmid: 30531947
[8] Dong R, Zheng H, Tian K, Yau SC, Mao WG, Yu WP, Yin CC, Yu CL, He RL, Yang J, Yau SS (2017) Virus database and online inquiry system based on natural vectors. Evolutionary Bioinformatics, 13, 1-7.
[9] Forster P, Forster L, Renfrew C, Forster M (2020) Phylogenetic network analysis of SARS-CoV-2 genomes. Proceedings of the National Academy of Sciences, USA, 117, 9241-9243.
[10] Frias-De-Diego A, Jara M, Escobar LE (2019) Papillomavirus in wildlife. Frontiers in Ecology and Evolution, 7, 406.
doi: 10.3389/fevo.2019.00406
[11] Ge XY, Li JL, Yang XL, Chmura AA, Zhu G, Epstein JH, Mazet JK, Hu B, Zhang W, Peng C, Zhang YJ, Luo CM, Tan B, Wang N, Zhu Y, Crameri G, Zhang SY, Wang LF, Daszak P, Shi ZL (2013) Isolation and characterization of a bat SARS-like coronavirus that uses the ACE2 receptor. Nature, 503, 535-538.
doi: 10.1038/nature12711
[12] Gorbalenya AE, Gulyaeva AA, Lauber C, Sidorov IA, Leontovich AM, Penzar D, Samborskiy DV, Baker SC, Baric RS, de Groot RJ, Drosten C, Haagmans BL, Neuman BW, Perlman S, Poon LLM, Sola I, Ziebuhr J (2020) The species severe acute respiratory syndrome-related coronavirus: Classifying 2019-nCoV and naming it SARS-CoV-2. Nature Microbiology, 5, 536-544.
pmid: 32123347
[13] Gusfield D, Eddhu S, Langley C (2004) Optimal efficient reconstruction of phylogenetic networks with constrained recombination. Journal of Bioinformatics and Computational Biology, 2, 173-213.
doi: 10.1142/s0219720004000521 pmid: 15272438
[14] Hadidi A, Flores R, Candresse T, Barba M (2016) Next-generation sequencing and genome editing in plant virology. Frontiers in Microbiology, 7, 1325.
doi: 10.3389/fmicb.2016.01325 pmid: 27617007
[15] Hemida MG, Chu DK, Poon LL, Perera RA, Alhammadi MA, Ng HY, Siu LY, Guan Y, Alnaeem A, Peiris M (2014) MERS coronavirus in dromedary camel herd, Saudi Arabia. Emerging Infectious Diseases, 20, 1231-1234.
pmid: 24964193
[16] Holmes EC, Zhang YZ (2015) The evolution and emergence of hantaviruses. Current Opinion in Virology, 10, 27-33.
doi: 10.1016/j.coviro.2014.12.007 pmid: 25562117
[17] Hon CC, Lam TY, Shi ZL, Drummond AJ, Yip CW, Zeng F, Lam PY, Leung FC (2008) Evidence of the recombinant origin of a bat severe acute respiratory syndrome (SARS)-like coronavirus and its implications on the direct ancestor of SARS coronavirus. Journal of Virology, 82, 1819-1826.
pmid: 18057240
[18] Hu B, Zeng LP, Yang XL, Ge XY, Zhang W, Li B, Xie JZ, Shen XR, Zhang YZ, Wang N, Luo DS, Zheng XS, Wang MN, Daszak P, Wang LF, Cui J, Shi ZL (2017) Discovery of a rich gene pool of bat SARS-related coronaviruses provides new insights into the origin of SARS coronavirus. PLoS Pathogens, 13, e1006698.
doi: 10.1371/journal.ppat.1006698 pmid: 29190287
[19] Hu YF, Yang F, Dong J, Yang J, Zhang T, Sun LL, Jin Q (2011) Direct pathogen detection from swab samples using a new high-throughput sequencing technology. Clinical Microbiology and Infection, 17, 241-244.
doi: 10.1111/j.1469-0691.2010.03246.x pmid: 20412188
[20] Huson DH, Kloepper TH (2005) Computing recombination networks from binary sequences. Bioinformatics, 21(Suppl. 2), 159-165.
[21] Kafer S, Paraskevopoulou S, Zirkel F, Wieseke N, Donath A, Petersen M, Jones TC, Liu S, Zhou X, Middendorf M, Junglen S, Misof B, Drosten C (2019) Re-assessing the diversity of negative strand RNA viruses in insects. PLoS Pathogens, 15, e1008224.
doi: 10.1371/journal.ppat.1008224 pmid: 31830128
[22] Kunin V, Goldovsky L, Darzentas N, Ouzounis CA (2005) The net of life: Reconstructing the microbial phylogenetic network. Genome Research, 15, 954-959.
doi: 10.1101/gr.3666505 pmid: 15965028
[23] Lam TT, Shum MH, Zhu HC, Tong YG, Ni XB, Liao YS, Wei W, Cheung WY, Li WJ, Li LF, Leung GM, Holmes EC, Hu YL, Guan Y (2020) Identification of 2019-nCoV related coronaviruses in Malayan pangolins in southern China. bioRxiv.
[24] Legendre M, Lartigue A, Bertaux L, Jeudy S, Bartoli J, Lescot M, Alempic JM, Ramus C, Bruley C, Labadie K, Shmakova L, Rivkina E, Coute Y, Abergel C, Claverie JM (2015) In-depth study of Mollivirus sibericum, a new 30,000-y-old giant virus infecting Acanthamoeba. Proceedings of the National Academy of Sciences, USA, 112, e5327-e5335.
[25] Leigh J, Bryant D (2015) PopART: Full-feature software for haplotype network construction. Methods in Ecology and Evolution, 6, 1110-1116.
[26] Letko M, Marzi A, Munster V (2020) Functional assessment of cell entry and receptor usage for SARS-CoV-2 and other lineage B betacoronaviruses. Nature Microbiology, 5, 562-569.
doi: 10.1038/s41564-020-0688-y pmid: 32094589
[27] Lu R, Zhao X, Li J, Niu P, Yang B, Wu H, Wang W, Song H, Huang B, Zhu N, Bi Y, Ma X, Zhan F, Wang L, Hu T, Zhou H, Hu Z, Zhou W, Zhao L, Chen J, Meng Y, Wang J, Lin Y, Yuan J, Xie Z, Ma J, Liu WJ, Wang D, Xu W, Holmes EC, Gao GF, Wu G, Chen W, Shi W, Tan W (2020) Genomic characterisation and epidemiology of 2019 novel coronavirus: Implications for virus origins and receptor binding. Lancet, 395, 565-574.
doi: 10.1016/S0140-6736(20)30251-8 pmid: 32007145
[28] Luo T, Comas I, Luo D, Lu B, Wu J, Wei L, Yang C, Liu Q, Gan M, Sun G, Shen X, Liu F, Gagneux S, Mei J, Lan R, Wan K, Gao Q (2015) Southern East Asian origin and coexpansion of Mycobacterium tuberculosis Beijing family with Han Chinese. Proceedings of the National Academy of Sciences, USA, 112, 8136-8141.
[29] Mavarez J, Salazar CA, Bermingham E, Salcedo C, Jiggins CD, Linares M (2006) Speciation by hybridization in Heliconius butterflies. Nature, 441, 868-871.
doi: 10.1038/nature04738 pmid: 16778888
[30] McBreen K, Lockhart PJ (2006) Reconstructing reticulate evolutionary histories of plants. Trends in Plant Science, 11, 398-404.
doi: 10.1016/j.tplants.2006.06.004 pmid: 16839803
[31] Mock F, Viehweger A, Barth E, Marz M (2020) VIDHOP, viral host prediction with deep learning. bioRxiv.
[32] Nie FY, Lin XD, Hao ZY, Chen XN, Wang ZX, Wang MR, Wu J, Wang HW, Zhao G, Ma RZ, Holmes EC, Zhang YZ (2018) Extensive diversity and evolution of hepadnaviruses in bats in China. Virology, 514, 88-97.
doi: 10.1016/j.virol.2017.11.005 pmid: 29153861
[33] Palacios G, Druce J, Du L, Tran T, Birch C, Briese T, Conlan S, Quan PL, Hui J, Marshall J, Simons JF, Egholm M, Paddock CD, Shieh WJ, Goldsmith CS, Zaki SR, Catton M, Lipkin WI (2008) A new Arenavirus in a cluster of fatal transplant-associated diseases. New England Journal of Medicine, 358, 991-998.
doi: 10.1056/NEJMoa073785 pmid: 18256387
[34] Quick J, Loman NJ, Duraffour S, Simpson JT, Severi E, Cowley L, Bore JA, Koundouno R, Dudas G, Mikhail A, Ouedraogo N, Afrough B, Bah A, Baum JH, Becker-Ziaja B, Boettcher JP, Cabeza-Cabrerizo M, Camino-Sanchez A, Carter LL, Doerrbecker J, Enkirch T, Dorival IGG, Hetzelt N, Hinzmann J, Holm T, Kafetzopoulou LE, Koropogui M, Kosgey A, Kuisma E, Logue CH, Mazzarelli A, Meisel S, Mertens M, Michel J, Ngabo D, Nitzsche K, Pallash E, Patrono LV, Portmann J, Repits JG, Rickett NY, Sachse A, Singethan K, Vitoriano I, Yemanaberhan RL, Zekeng EG, Trina R, Bello A, Sall AA, Faye O, Faye O, Magassouba N, Williams CV, Amburgey V, Winona L, Davis E, Gerlach J, Washington F, Monteil V, Jourdain M, Bererd M, Camara A, Somlare H, Camara A, Gerard M, Bado G, Baillet B, Delaune D, Nebie KY, Diarra A, Savane Y, Pallawo RB, Gutierrez GJ, Milhano N, Roger I, Williams CJ, Yattara F, Lewandowski K, Taylor J, Rachwal P, Turner D, Pollakis G, Hiscox JA, Matthews DA, O’Shea MK, Johnston AM, Wilson D, Hutley E, Smit E, Di Caro A, Woelfel R, Stoecker K, Fleischmann E, Gabriel M, Weller SA, Koivogui L, Diallo B, Keita S, Rambaut A, Formenty P, Gunther S, Carroll MW (2016) Real-time, portable genome sequencing for Ebola surveillance. Nature, 530, 228-232.
doi: 10.1038/nature16996 pmid: 26840485
[35] Roossinck MJ (2015) Move over, bacteria! Viruses make their mark as mutualistic microbial symbionts. Journal of Virology, 89, 6532-6535.
doi: 10.1128/JVI.02974-14 pmid: 25903335
[36] Shi M, Lin XD, Chen X, Tian JH, Chen LJ, Li K, Wang W, Eden JS, Shen JJ, Liu L, Holmes EC, Zhang YZ (2018) The evolutionary history of vertebrate RNA viruses. Nature, 556, 197-202.
doi: 10.1038/s41586-018-0012-7 pmid: 29618816
[37] Shi M, Lin XD, Vasilakis N, Tian JH, Li CX, Chen LJ, Eastwood G, Diao XN, Chen MH, Chen X, Qin XC, Widen SG, Wood TG, Tesh RB, Xu J, Holmes EC, Zhang YZ (2016) Divergent viruses discovered in arthropods and vertebrates revise the evolutionary history of the Flaviviridae and related viruses. Journal of Virology, 90, 659-669.
doi: 10.1128/JVI.02036-15 pmid: 26491167
[38] Simmonds P, Adams MJ, Benko M, Breitbart M, Brister JR, Carstens EB, Davison AJ, Delwart E, Gorbalenya AE, Harrach B, Hull R, King AM, Koonin EV, Krupovic M, Kuhn JH, Lefkowitz EJ, Nibert ML, Orton R, Roossinck MJ, Sabanadzovic S, Sullivan MB, Suttle CA, Tesh RB, Van Der Vlugt RA, Varsani A, Zerbini FM (2017) Consensus statement: Virus taxonomy in the age of metagenomics. Nature Reviews Microbiology, 15, 161-168.
doi: 10.1038/nrmicro.2016.177 pmid: 28134265
[39] Suchard MA, Lemey P, Baele G, Ayres DL, Drummond AJ, Rambaut A (2018) Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evolution, 4, vey016.
[40] Tang P, Chiu C (2010) Metagenomics for the discovery of novel human viruses. Future Microbiology, 5, 177-189.
doi: 10.2217/fmb.09.120 pmid: 20143943
[41] Tang XL, Wu CC, Li X, Song YH, Yao XM, Wu XK, Duan YZ, Zhang H, Wang YR, Qian ZH, Cui J, Lu J (2020) On the origin and continuing evolution of SARS-CoV-2. National Science Review, 7, 1012-1023.
doi: 10.1093/nsr/nwaa036
[42] Wu F, Zhao S, Yu B, Chen YM, Wang W, Song ZG, Hu Y, Tao ZW, Tian JH, Pei YY, Yuan ML, Zhang YL, Dai FH, Liu Y, Wang QM, Zheng JJ, Xu L, Holmes EC, Zhang YZ (2020) A new coronavirus associated with human respiratory disease in China. Nature, 579, 265-269.
doi: 10.1038/s41586-020-2008-3 pmid: 32015508
[43] Yu WB, Tang GD, Zhang L, Corlett RT (2020) Decoding the evolution and transmissions of the novel pneumonia coronavirus (SARS-CoV-2/HCoV-19) using whole genomic data. Zoological Research, 41, 247-257.
doi: 10.24272/j.issn.2095-8137.2020.022 pmid: 32351056
[44] Yu Y, Harris AJ, Blair C, He X (2015) RASP (reconstruct ancestral state in phylogenies): A tool for historical biogeography. Molecular Phylogenetics and Evolution, 87, 46-49.
doi: 10.1016/j.ympev.2015.03.008 pmid: 25819445
[45] Yuen LKW, Littlejohn M, Duchêne S, Edwards R, Bukulatjpi S, Binks P, Jackson K, Davies J, Davis JS, Tong SYC, Locarnini S, Townsend J (2019) Tracing ancient human migrations into Sahul using hepatitis B virus genomes. Molecular Biology and Evolution, 36, 942-954.
doi: 10.1093/molbev/msz021 pmid: 30856252
[46] Zhong ZP, Solonenko NE, Li YF, Gazitua MC, Roux S, Davis ME, Van Etten JL, Mosley-Thompson E, Rich VI, Sullivan MB, Thompson LG (2020) Glacier ice archives fifteen-thousand-year-old viruses. bioRxiv.
[47] Zhou P, Fan H, Lan T, Yang XL, Shi WF, Zhang W, Zhu Y, Zhang YW, Xie QM, Mani S, Zheng XS, Li B, Li JM, Guo H, Pei GQ, An XP, Chen JW, Zhou L, Mai KJ, Wu ZX, Li D, Anderson DE, Zhang LB, Li SY, Mi ZQ, He TT, Cong F, Guo PJ, Huang R, Luo Y, Liu XL, Chen J, Huang Y, Sun Q, Zhang XL, Wang YY, Xing SZ, Chen YS, Sun Y, Li J, Daszak P, Wang LF, Shi ZL, Tong YG, Ma JY (2018) Fatal swine acute diarrhoea syndrome caused by an HKU2-related coronavirus of bat origin. Nature, 556, 255-258.
doi: 10.1038/s41586-018-0010-9 pmid: 29618817
[48] Zhou P, Yang XL, Wang XG, Hu B, Zhang L, Zhang W, Si HR, Zhu Y, Li B, Huang CL, Chen HD, Chen J, Luo Y, Guo H, Jiang RD, Liu MQ, Chen Y, Shen XR, Wang X, Zheng XS, Zhao K, Chen QJ, Deng F, Liu LL, Yan B, Zhan FX, Wang YY, Xiao GF, Shi ZL (2020) A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature, 579, 270-273.
doi: 10.1038/s41586-020-2012-7 pmid: 32015507