Biodiversity Science ›› 2015, Vol. 23 ›› Issue (4): 550-555. DOI: 10.17520/biods.2015120


NCBIminer: a new tool for batch downloading gene sequence data from GenBank

Xiaoting Xu1, Zhiheng Wang1,*, Dimitar Dimitrov2, Carsten Rahbek3,4

    1 Department of Ecology and Key Laboratory for Earth Surface Processes of the Ministry of Education, College of Urban and Environmental Sciences, Peking University, Beijing 100871
    2 Natural History Museum, University of Oslo, Oslo, Norway
    3 Center for Macroecology, Evolution and Climate, Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
    4 Imperial College London, Grand Challenges in Ecosystems and the Environment Initiative, Silwood Park Campus, Berkshire, UK
  • Received: 2015-05-07 Accepted: 2015-07-09 Online: 2015-07-20 Published: 2015-08-03
  • Corresponding author: Zhiheng Wang
  • Supported by:
    National Natural Science Foundation of China (31470564, 31400467, 31321061) and China Postdoctoral Science Foundation (2014M550555)

Using NCBIminer to search and download nucleotide sequences from GenBank

Xiaoting Xu1, Zhiheng Wang1,*, Dimitar Dimitrov2, Carsten Rahbek3,4

    1 Department of Ecology and Key Laboratory for Earth Surface Processes of the Ministry of Education, College of Urban and Environmental Sciences, Peking University, Beijing 100871
    2 Natural History Museum, University of Oslo, Oslo, Norway
    3 Center for Macroecology, Evolution and Climate, Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
    4 Imperial College London, Grand Challenges in Ecosystems and the Environment Initiative, Silwood Park Campus, Berkshire, UK
  • Received: 2015-05-07 Accepted: 2015-07-09 Online: 2015-07-20 Published: 2015-08-03
  • Contact: Wang Zhiheng

Summary:

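Nucleotide sequences carry the genetic information of organisms and are fundamental data for modern biology and ecology. With advances in sequencing technology, large numbers of nucleotide sequences have been obtained and deposited in public data platforms, among which GenBank (http://www.ncbi.nlm.nih.gov/genbank/) is currently one of the largest. As of February 2015, GenBank held more than 180 million nucleotide sequences covering more than 300,000 species worldwide. However, finding and downloading the required data accurately and quickly from such a massive collection has become one of the obstacles limiting the wide use of genetic data. To address this, we developed NCBIminer, a bioinformatics program for downloading GenBank data efficiently and accurately. Given the nucleotide sequence names, the data types, and one or more initial reference sequences supplied by the user, NCBIminer searches for and downloads sequence data of specific genes for multiple user-specified species or taxa. The software can be downloaded from https://github.com/greengirl/NCBIminer/releases/ and is free to use on Windows, Linux, and Mac operating systems; it is also simple to operate and requires no bioinformatics background. To facilitate its use, this paper introduces the workflow and algorithm of the software, its installation, and the parameter settings involved in using it.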

Keywords: GenBank, bioinformatics, gene sequence, phylogeny, DNA, nucleotide sequence

Abstract:

GenBank is the leading public database of genetic resources and currently contains over 10^12 base pairs from about 300,000 formally described species. It offers valuable resources for studies on the evolution of species, genes, and genomes. However, the difficulty of mining GenBank data hinders its wider use for large-scale data collection. To address this issue, we introduce a new bioinformatics program, NCBIminer. NCBIminer is a freely available, cross-platform, and user-friendly program for mining nucleotide sequences from GenBank. Its main purpose is to download sequences for user-specified genes and taxonomic groups based on gene names, data types, and one or several reference sequences. The program's algorithms have been described elsewhere; here, we focus on the details of its usage, including how to install and run it and how to set its parameters.

Key words: GenBank, bioinformatics, gene, phylogenetic evolution, DNA, nucleotide sequences
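
To make the idea of searching GenBank by gene name and taxonomic group concrete, the following is a minimal sketch of how such a query could be expressed against the public NCBI E-utilities `esearch` endpoint. It is not NCBIminer's own algorithm or interface, which this page does not describe; the function names `build_term` and `build_search_url`, the example gene names, and the example taxon are all hypothetical, and the sketch only builds the query URL without contacting NCBI.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint (assumption: used here only for
# illustration; it is not stated to be what NCBIminer calls internally).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_term(gene_names, taxon):
    """Join gene-name synonyms with OR and restrict them to one taxon with AND."""
    genes = " OR ".join(f"{name}[Gene]" for name in gene_names)
    return f"({genes}) AND {taxon}[Organism]"


def build_search_url(gene_names, taxon, retmax=20):
    """Return an esearch URL for the nucleotide database; no request is sent."""
    params = {
        "db": "nucleotide",
        "term": build_term(gene_names, taxon),
        "retmax": retmax,
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical example: two chloroplast markers for the genus Quercus.
    print(build_search_url(["rbcL", "matK"], "Quercus"))
```

A tool in this space would still need to fetch the matching records and filter them, for example by comparing candidates against reference sequences as the abstract mentions; that step is deliberately left out of this sketch.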