Biodiversity Science ›› 2015, Vol. 23 ›› Issue (4): 550-555. doi: 10.17520/biods.2015120

• Original Article

Using NCBIminer to search and download nucleotide sequences from GenBank

Xiaoting Xu1, Zhiheng Wang1, *, Dimitar Dimitrov2, Carsten Rahbek3, 4

    1 Department of Ecology and Key Laboratory for Earth Surface Processes of the Ministry of Education, College of Urban and Environmental Sciences, Peking University, Beijing 100871
    2 Natural History Museum, University of Oslo, Oslo, Norway
    3 Center for Macroecology, Evolution and Climate, Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
    4 Imperial College London, Grand Challenges in Ecosystems and the Environment Initiative, Silwood Park Campus, Berkshire, UK
  • Received: 2015-05-07 Accepted: 2015-07-09 Online: 2015-08-03
  • Corresponding author: Wang Zhiheng

GenBank is the leading public database of genetic resources and currently contains over 10^12 base pairs from about 300,000 formally described species. It offers valuable resources for studies on the evolution of species, genes, and genomes. However, the difficulty of mining GenBank data hinders its wider use for large-scale data collection. To address this issue, we introduce a new bioinformatics program, NCBIminer. NCBIminer is freely available, cross-platform, user-friendly software for mining nucleotide sequences from GenBank. Its main purpose is to download sequences for user-specified genes and taxonomic groups based on gene names, types, and one or several reference sequences. The program's algorithms have been described elsewhere; here we focus on the details of using the program, including how to install it, run it, and set its parameters.
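To make the idea of a gene-name and taxon-based search concrete, the sketch below builds a search term and request URL for the public NCBI E-utilities `esearch` service. It is not NCBIminer's algorithm, which also relies on reference sequences; the function names, the example taxon (Quercus), and the example gene names (rbcL, matK) are illustrative assumptions, and no request is actually sent.

```python
from urllib.parse import urlencode

# NCBI E-utilities endpoint for searching Entrez databases.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_term(taxon, gene_names):
    """Combine a taxon with alternative gene names into one Entrez search term.

    Hypothetical helper for illustration; not part of NCBIminer.
    """
    genes = " OR ".join(f"{name}[Gene Name]" for name in gene_names)
    return f"{taxon}[Organism] AND ({genes})"


def build_esearch_url(taxon, gene_names, retmax=20):
    """Return an esearch URL for the nucleotide database (nothing is downloaded)."""
    params = {
        "db": "nuccore",                        # Entrez nucleotide database
        "term": build_term(taxon, gene_names),  # taxon plus gene-name filter
        "retmax": retmax,                       # maximum number of IDs to return
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Example taxon and genes are assumptions chosen only for illustration.
term = build_term("Quercus", ["rbcL", "matK"])
url = build_esearch_url("Quercus", ["rbcL", "matK"])
print(term)
print(url)
```

A name-only query of this kind depends on how consistently submitters annotated their records, which is one reason a tool might also compare candidates against reference sequences.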

Key words: GenBank, bioinformatics, gene, phylogenetic evolution, DNA, nucleotide sequences

Appendix 1

Data format for a sequence in GenBank. The items in the left box are feature types defined in GenBank, while those in the right box are GenBank annotation information.

Appendix 2

NCBIminer workflow. a, Major steps of NCBIminer's workflow; b, The algorithms for establishing improved reference sequences and combining sequences from multiple queries. Modified from Xu et al. (2015).

1 Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. Journal of Molecular Biology, 215, 403-410.
2 Chen ZD (陈之端), Li DZ (李德铢) (2013) On Barcode of Life and Tree of Life. Plant Diversity and Resources (植物分类与资源学报), 35, 675-681. (in Chinese with English abstract)
3 Driskell AC, Ané C, Burleigh JG, McMahon MM, O’Meara BC, Sanderson MJ (2004) Prospects for building the Tree of Life from large sequence databases. Science, 306, 1172-1174.
4 Holt B, Lessard JP, Borregaard MK, Fritz SA, Araujo MB, Dimitrov D, Fabre PH, Graham CH, Graves GR, Jonsson KA, Nogues-Bravo D, Wang ZH, Whittaker RJ, Fjeldsa J, Rahbek C (2013) An update of Wallace’s zoogeographic regions of the world. Science, 339, 74-78.
5 Jones M, Koutsovoulos G, Blaxter M (2011) iPhy: an integrated phylogenetic workbench for supermatrix analyses. BMC Bioinformatics, 12, 30.
6 Li DC (2013) Similarity analysis of DNA sequences based on CLZ complexity. Journal of Computational and Theoretical Nanoscience, 10, 481-487.
7 Li DZ, Gao LM, Li HT, Wang H, Ge XJ, Liu JQ, Chen ZD, Zhou SL, Chen SL, Yang JB, Fu CX, Zeng CX, Yan HF, Zhu YJ, Sun YS, Chen SY, Zhao L, Wang K, Yang T, Duan GW, Grp CPB (2011) Comparative analysis of a large dataset indicates that internal transcribed spacer (ITS) should be incorporated into the core barcode for seed plants. Proceedings of the National Academy of Sciences, USA, 108, 19641-19646.
8 Lu LM (鲁丽敏), Sun M (孙苗), Zhang JB (张景博), Li HL (李洪雷), Lin L (林立), Yang T (杨拓), Chen M (陈闽), Chen ZD (陈之端) (2014) Tree of Life and its applications. Biodiversity Science (生物多样性), 22, 3-20. (in Chinese with English abstract)
9 Pearse WD, Purvis A (2013) phyloGenerator: an automated phylogeny generation tool for ecologists. Methods in Ecology and Evolution, 4, 692-698.
10 Pei NC (裴男才) (2015) Applications of DNA barcoding in evolutionary ecology. Biodiversity Science (生物多样性), 23, 291-292. (in Chinese)
11 Qiu Q, Zhang GJ, Ma T, Qian WB, Wang JY, Ye ZQ, Cao CC, Hu QJ, Kim J, Larkin DM, Auvil L, Capitanu B, Ma J, Lewin HA, Qian XJ, Lang YS, Zhou R, Wang LZ, Wang K, Xia JQ, Liao SG, Pan SK, Lu X, Hou HL, Wang Y, Zang XT, Yin Y, Ma H, Zhang J, Wang ZF, Zhang YM, Zhang DW, Yonezawa T, Hasegawa M, Zhong Y, Liu WB, Zhang Y, Huang ZY, Zhang SX, Long RJ, Yang HM, Wang J, Lenstra JA, Cooper DN, Wu Y, Wang J, Shi P, Wang J, Liu JQ (2012) The yak genome and adaptation to life at high altitude. Nature Genetics, 44, 946-949.
12 Ren BQ (任保青), Chen ZD (陈之端) (2010) DNA barcoding plant life. Chinese Bulletin of Botany (植物学报), 45, 1-12. (in Chinese with English abstract)
13 Sanderson M, Boss D, Chen D, Cranston K, Wehe A (2008) The PhyLoTA browser: processing GenBank for molecular phylogenetics research. Systematic Biology, 57, 335-346.
14 Xu X, Wang Z, Rahbek C, Lessard J-P, Fang J (2013) Evolutionary history influences the effects of water-energy dynamics on oak diversity in Asia. Journal of Biogeography, 40, 2146-2155.
15 Xu XT, Dimitrov D, Rahbek C, Wang ZH (2015) NCBIminer: sequences harvest from Genbank. Ecography, 38, 426-430.
16 Yang ZY, Ran JH, Wang XQ (2012) Three genome-based phylogeny of Cupressaceae s.l.: further evidence for the evolution of gymnosperms and southern hemisphere biogeography. Molecular Phylogenetics and Evolution, 64, 452-470.
17 Zanne AE, Tank DC, Cornwell WK, Eastman JM, Smith SA, FitzJohn RG, McGlinn DJ, O’Meara BC, Moles AT, Reich PB, Royer DL, Soltis DE, Stevens PF, Westoby M, Wright IJ, Aarssen L, Bertin RI, Calaminus A, Govaerts R, Hemmings F, Leishman MR, Oleksyn J, Soltis PS, Swenson NG, Warman L, Beaulieu JM (2013) Three keys to the radiation of angiosperms into freezing environments. Nature, 506, 89-92.