Biodiversity Science ›› 2015, Vol. 23 ›› Issue (4): 550-555. doi: 10.17520/biods.2015120

• Original Article

Using NCBIminer to search and download nucleotide sequences from GenBank

Xiaoting Xu1, Zhiheng Wang1, *, Dimitar Dimitrov2, Carsten Rahbek3, 4

    1 Department of Ecology and Key Laboratory for Earth Surface Processes of the Ministry of Education, College of Urban and Environmental Sciences, Peking University, Beijing 100871
    2 Natural History Museum, University of Oslo, Oslo, Norway
    3 Center for Macroecology, Evolution and Climate, Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
    4 Imperial College London, Grand Challenges in Ecosystems and the Environment Initiative, Silwood Park Campus, Berkshire, UK
  • Received: 2015-05-07 Accepted: 2015-07-09 Online: 2015-08-03
  • Wang Zhiheng

GenBank is the leading public database of genetic resources and currently contains over 10¹² base pairs from about 300,000 formally described species. It offers valuable resources for studies on the evolution of species, genes, and genomes. However, difficulties in mining GenBank data hinder its wider use for large-scale data collection. To address this issue, we introduce a new bioinformatics program, NCBIminer. NCBIminer is freely available, cross-platform, and user-friendly software for mining nucleotide sequences from GenBank. Its main purpose is to download sequences for user-specified genes and taxonomic groups based on gene names, types, and one or several reference sequences. The program's algorithms have been described elsewhere; here, we focus on the details of using the program, including how to install it, run it, and set its parameters.
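To give a concrete sense of what a GenBank search by taxonomic group and gene name involves, the following Python sketch builds a query URL for the NCBI E-utilities ESearch service. It is only an illustration and is not NCBIminer's own implementation, which is described elsewhere. The helper names `build_query_term` and `build_esearch_url`, the example taxon, and the example gene names are assumptions made for this sketch; the code only constructs the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities endpoint for searching Entrez databases.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query_term(taxon, gene_names):
    """Combine one taxonomic group with alternative gene names (hypothetical helper)."""
    genes = " OR ".join(f'"{name}"[Gene Name]' for name in gene_names)
    return f'"{taxon}"[Organism] AND ({genes})'


def build_esearch_url(taxon, gene_names, retmax=20):
    """Return an ESearch URL for the nucleotide database (hypothetical helper)."""
    params = {
        "db": "nuccore",  # Entrez nucleotide database
        "term": build_query_term(taxon, gene_names),
        "retmax": retmax,  # maximum number of record IDs to return
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Example values chosen for illustration only.
    print(build_query_term("Rosaceae", ["rbcL", "matK"]))
    print(build_esearch_url("Rosaceae", ["rbcL", "matK"]))
```

Gene names are annotated inconsistently across GenBank records, so a query built from names alone can miss sequences. This is one reason a tool might also rely on reference sequences, as the abstract describes.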

Key words: GenBank, bioinformatics, gene, phylogenetic evolution, DNA, nucleotide sequences

Appendix 1

Data format for a sequence in GenBank. The items in the left box are feature types defined in GenBank, while those in the right box are GenBank annotation information.
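As a rough illustration of the relationship between feature types and their annotation information, the Python sketch below reads the FEATURES table of a GenBank flat-file record into a list of dictionaries. The record fragment in `SAMPLE` is made up for this example and is not taken from the appendix, and `parse_features` is a hypothetical helper that handles only single-line locations and qualifiers. It is not the parser used by NCBIminer.

```python
# Made-up fragment in GenBank flat-file layout, for illustration only.
SAMPLE = """\
FEATURES             Location/Qualifiers
     source          1..1428
                     /organism="Example species"
                     /mol_type="genomic DNA"
     gene            1..1428
                     /gene="rbcL"
     CDS             1..1428
                     /gene="rbcL"
                     /product="example product"
ORIGIN
"""


def parse_features(text):
    """Collect feature types with their locations and qualifiers.

    Simplified sketch: multi-line locations or qualifier values are not handled.
    """
    features = []
    in_table = False
    for line in text.splitlines():
        if line.startswith("FEATURES"):
            in_table = True
            continue
        if not in_table:
            continue
        if line and not line.startswith(" "):
            break  # a new top-level section (e.g. ORIGIN) ends the table
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("/"):
            # Qualifier line: annotation attached to the most recent feature.
            if features:
                name, _, raw = stripped[1:].partition("=")
                features[-1]["qualifiers"][name] = raw.strip('"')
        else:
            # Feature line: feature type followed by its location.
            key, _, location = stripped.partition(" ")
            features.append(
                {"type": key, "location": location.strip(), "qualifiers": {}}
            )
    return features


if __name__ == "__main__":
    for feature in parse_features(SAMPLE):
        print(feature["type"], feature["location"], feature["qualifiers"])
```

For real records, an established parser such as Biopython's `SeqIO` module with the `"genbank"` format is a more robust choice than a hand-written one.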

Appendix 2

NCBIminer workflow. a, Major steps of the NCBIminer workflow; b, The algorithms for the establishment of improved reference sequences and sequence combination of multiple queries. Modified from Xu et al. (2015).

