Biodiversity Science ›› 2010, Vol. 18 ›› Issue (5): 473-479.  DOI: 10.3724/SP.J.1003.2010.473

Special topic: Biodiversity Informatics (I)

Architecture and implementation of the biodiversity digital library

Zheping Xu*, Jinzhong Cui, Fenghong Liu, Zhaogai Wang, Qiaoling Li

  1. Center for Documentation and Information, Institute of Botany, Chinese Academy of Sciences, Beijing 100093
  • Received: 2010-01-19 Accepted: 2010-04-10 Online: 2010-09-20 Published: 2010-09-20
  • Contact: Zheping Xu
  • * Corresponding author. E-mail: xuzp@ibcas.ac.cn
  • Funding:
    National Science and Technology Infrastructure Platform Project (2005DKA21401)

Abstract

Biodiversity research urgently needs a digital library built on multi-source data. A biodiversity digital library based on a virtual user community shares the features of general digital libraries in data types, storage requirements, and sharing methods, but it also has distinct features in data mining and application. Drawing on a survey of digital library projects in China and abroad and on cooperation with the Biodiversity Heritage Library and Internet Archive projects, we summarize the data types found in various kinds of digital libraries and briefly introduce two data standards relevant to building a biodiversity digital library, Dublin Core and TaxonX. We then design a system framework with three modules: data aggregation; data processing, conversion, and translation; and external data services. On this basis we propose the system architecture and functions of the biodiversity digital library, which is intended to integrate multi-source data, support the virtual community, and provide specific data services to external web sites. We demonstrate the parts of the system that have already been implemented, and finally discuss open problems concerning copyright, full-text recognition (OCR, optical character recognition), and the handling and extension of massive data sets.

Key words: digital library, data standards, Dublin Core, TaxonX, full text search
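
As a rough illustration of the kind of record the Dublin Core standard named above describes, the following minimal sketch serializes this article's own bibliographic metadata as simple Dublin Core elements. It is not the system described in the paper: the `build_dc_record` helper and the `record` wrapper element are hypothetical names chosen for illustration, and only the `dc` namespace URI and the element names (`title`, `creator`, `subject`, `date`, `identifier`) come from the Dublin Core Metadata Element Set.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields):
    """Serialize (element name, value) pairs as simple Dublin Core XML.

    Hypothetical helper for illustration; repeated names are allowed,
    since Dublin Core elements may occur more than once.
    """
    # "record" is an assumed wrapper element, not part of Dublin Core.
    root = ET.Element("record")
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Values taken from the title block, keywords, and citation line of this page.
fields = [
    ("title", "Architecture and implementation of the biodiversity digital library"),
    ("creator", "Zheping Xu"),
    ("creator", "Jinzhong Cui"),
    ("subject", "digital library"),
    ("subject", "Dublin Core"),
    ("date", "2010-09-20"),
    ("identifier", "doi:10.3724/SP.J.1003.2010.473"),
]

record_xml = build_dc_record(fields)
print(record_xml)
```

A flat list of pairs is used instead of a dictionary so that repeatable elements such as `creator` and `subject` keep every value and their original order.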