Metadata Interchange for Chinese Information
Hsueh-hua Chen
Department of Library and Information Science
National Taiwan University
Taipei, 10764, Taiwan, ROC
sherry@ccms.ntu.edu.tw
Chao-chen Chen
Department of Adult and Continuing Education
National Taiwan Normal University
Taipei, 10764, Taiwan, ROC
cc4073@.tw
Kuang-hua Chen
Department of Library and Information Science
National Taiwan University
Taipei, 10764, Taiwan, ROC
khchen@ccms.ntu.edu.tw
ABSTRACT
With the growth of the Internet, digital libraries and museums (DL/M) have become an important research issue. Metadata is a concrete foundation for DL/M research and systems, and its role has been recognized across research fields. Given the nature of the Internet, this paper describes not only the research and development of metadata for information but also the issue of metadata interchange. We propose the metadata format MICI (Metadata Interchange for Chinese Information), developed by the ROSS (Resources Organization and Searching Specification) team, and design a metadata software tool, Metalogy, to support the features discussed in this paper.
I. Preface
Several institutions in Taiwan possess precious collections of rare books, historical remains, artifacts, and documents. For preservation reasons, however, access to these collections is very limited. The WWW now makes it possible to present these valuable resources online. Besides increasing public exposure, this also helps preserve resources that might otherwise deteriorate.
At present, a massive amount of information can be obtained from the WWW, but the quality of the content varies greatly. Our first priority is therefore to present these valuable resources on the web and make them available to users all over the world. Second, because web resources are not properly structured, our task is to organize this content effectively according to the different attributes of the resources.
Many digital libraries/museums (DL/Ms) are developing metadata suited to their digital information. To put digital information on the web, besides digitizing the objects and contents, we have to establish a cataloging system for the information and develop systems for organizing resources, so as to provide a more efficient retrieval mechanism. Traditional methods of information organization are not sufficient for multimedia digital libraries, which contain various kinds of text, image, sound, and interactive materials. Moreover, because of cultural and language differences, metadata developed in other countries cannot be adopted for our collections directly.
Standardization is essential for the web, so we cannot create metadata that suits only one particular digital library. Integration of DL resources is essential as well: a DL should provide users with a single interface offering various information service options. In addition, web resources cannot be provided by one library alone, and distributed retrieval has long been the mainstream system framework. Thus, in organizing web information, we must take two factors into consideration: integration with current systems, and interoperability with other institutions for information exchange. The first step in building the foundation of a Chinese DL/M system is therefore to develop an appropriate information organization model, based on the attributes of our collections and informed by an in-depth study of other metadata and information organization systems.
II. The Development and Implementation of Chinese Metadata
The National Science Council (NSC) of Taiwan established the Taiwan Digital Museum Project (TDMP) in October 1998. Our project, Resources Organization and Searching Specification (ROSS), is a sub-project under TDMP [1]. ROSS addresses information organization and retrieval issues for Chinese DL/Ms, including the design of information storage and management systems, user demand and information retrieval behavior, and interoperability among different systems. Based on our research and past experience, we think at least five topics must be addressed:
1. organizing digital information and establishing standards for Chinese resource description formats
2. analyzing user needs to develop a “user-friendly” environment
3. establishing thesaurus structure and authority file
4. designing systems for information retrieval and search service
5. integrating retrieval mechanisms of digital libraries/museums
The current main task of ROSS is to support the other pilot projects under TDMP; its long-term goal is to formulate guidelines for information organization and retrieval in Chinese DL/Ms. These guidelines should be compatible with international standards. Participation in international organizations (e.g., CIMI, the Consortium for the Computer Interchange of Museum Information) may expedite the globalization of Chinese DL/M systems. Transparent integrated retrieval is likewise an essential function, so developing an integrated retrieval system that meets international standards is crucial for us.
Prior to TDMP, the ROSS Research Team (ROSS, in short) was established under the National Taiwan University Digital Library/Museum (NTUDL/M) in March 1997 to study issues related to Metadata Interchange for Chinese Information (MICI). Its responsibilities include understanding the history and features of the collections; studying various metadata formats, both domestic and international; understanding the relations among metadata, the database, and the whole system; and understanding the requests and retrieval behaviors of potential users. ROSS held that our metadata should describe the attributes of the collections, provide the mandatory access points to users, interoperate with other digital libraries so that information can be exchanged, and take the quality of description into account.
Most of the digitized collections of NTUDL/M are historical records, including the “Dan-Hsin File”, the “An-Li-Da-Chur Document”, “Ino's Collection”, and the “Archives of the Dept. of Anthropology of NTU”. After studying the attributes of these historical records, ROSS studied other related metadata, including CIMI (for art collections) and EAD (for archives). However, because of cultural differences and the uniqueness of our collections, those metadata formats cannot fully meet our needs.
In addition, with regard to interoperability, ROSS considered adopting MARC, which was already well established. After evaluation, however, we found MARC too complicated for the historical records. Moreover, MARC was designed mainly to describe books, and it cannot fully describe the attributes of our unique collections. For example, the concept of “authorship” is not obvious in historical records; instead, “related person” is one of the crucial access points. Forcing a piece of information into a similar (but not exact) element results in a loss of semantics, which is undesirable. Besides, processing MARC requires software that is both specialized and complicated, which would become an undue burden for system design. Thus, weighing cost and benefit, we decided to develop a metadata format for our Chinese collections; nevertheless, many features of MARC and of other metadata formats were adopted.
In the process of developing metadata for historical records, members of ROSS communicated continuously with content experts, end users, specialists in user behavior, and system designers. After much effort, a draft of the metadata for NTUDL/M historical records was formulated in June 1998. After a five-month testing period, we started the revision in November 1998. ROSS called several meetings to discuss how the metadata was used and how it should be revised. Finally, at the end of December, we reached a preliminary consensus.
Furthermore, ROSS started to formulate metadata for the collections of other TDMP subject-based pilot projects, which include historical objects, ancient maps, images and photos (for the History of Dan-Shui River Project), and butterfly specimens (for the Taiwan Butterflies Project). While formulating our initial version, in addition to holding discussions with content experts, we studied various metadata formats and web sites. In particular, the Handbook of Standards: Documenting African Collections (published in 1998 by the International Council of Museums, ICOM) provides guidance on the minimum amount of documentation required for museum objects from Africa, and we mapped its elements to our historical object metadata [2]. For butterflies, a web site maintained by the US government was especially helpful [3]. In our metadata format, elements were divided into eight areas, and a mapping was done for the different types of metadata. These areas are:
1. System Management Area: for the purpose of system management, which includes record number, cataloging language, language, cataloger, and cataloging date.
2. Description Area: for the purpose of describing the resource itself, which includes title, author, recipient, date of production, and place of production and use.
3. Subject Area: for the purpose of describing the subject of the resource, which includes subject/keyword, abstract, area of coverage, category by situation, category by function, category by material, category by technique, related event, related person, ethnic group, date, place, site, and cultural system.
4. Resource Type Area: for the purpose of describing the physical characteristics of the resource, which includes type, physical description, and seal.
5. Relation Area: for the purpose of describing related resources, which includes collection, series, analysis, reference, and citation.
6. Holding Area: for the purpose of describing the acquisition and collection information of the resource, which includes owner, source, registration number, collection information, and rights restrictions.
7. Reproduction Area: for the purpose of describing the information format of the resource, which includes digitized information and other media.
8. Note Area: includes general note, original description, condition, historical comments, reference, and publication record.
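The areas listed above can be written down as a simple lookup structure. The following is only a sketch: the area and element names come from the list, while the dictionary layout and the helper `areas_of` are assumptions made for illustration and are not part of the MICI specification.

```python
# Area and element names are taken from the list above; the dict layout
# and the helper function are illustrative assumptions, not part of MICI.
MICI_AREAS = {
    "System Management": ["record number", "cataloging language", "language",
                          "cataloger", "cataloging date"],
    "Description": ["title", "author", "recipient", "date of production",
                    "place of production and use"],
    "Subject": ["subject/keyword", "abstract", "area of coverage",
                "category by situation", "category by function",
                "category by material", "category by technique",
                "related event", "related person", "ethnic group",
                "date", "place", "site", "cultural system"],
    "Resource Type": ["type", "physical description", "seal"],
    "Relation": ["collection", "series", "analysis", "reference", "citation"],
    "Holding": ["owner", "source", "registration number",
                "collection information", "rights restrictions"],
    "Reproduction": ["digitized information", "other media"],
    "Note": ["general note", "original description", "condition",
             "historical comments", "reference", "publication record"],
}


def areas_of(element):
    """Return every area whose element list contains the given name."""
    return [area for area, elements in MICI_AREAS.items() if element in elements]
```

A lookup such as `areas_of("reference")` returns two areas, because "reference" is listed under both the Relation Area and the Note Area; an implementation would need to qualify such names by area.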
Because different types of metadata coexist, we developed Metalogy to support information exchange and interoperability among different systems. Metalogy manages various types of metadata based on the Z39.50 approach to metadata management [4].
III. The Management, Maintenance and Exchange of Metadata
Because user demands and collection attributes differ, DL/Ms take different approaches to information organization. Consequently, different metadata formats were developed for different purposes. For example, Dublin Core metadata is designed for general web information [5]; FGDC (Federal Geographic Data Committee standard) metadata is designed for geographic information [6]; CIMI (Consortium for Computer Interchange of Museum Information) metadata is designed for museum collections [7]; GILS (Government Information Locator Service) metadata is designed for government information [8]; and TEI (Text Encoding Initiative) headers are designed for archival materials [9]. This variety of metadata formats helps institutions organize their resources properly, so that users can retrieve the information they need more effectively.
The development of various types of metadata shows that no single format can accommodate all types of collections and satisfy all kinds of user demand. Developing metadata for a particular kind of collection requires knowing the user groups and conducting large-scale user studies. In addition, meeting users' needs and describing the resources appropriately requires a thorough understanding of the collection attributes. Finally, one has to understand how the different kinds of metadata interact in order to establish an integrated DL/M system with the property of “distributed processing and integrated retrieval”.
To achieve this goal, it is crucial to manage and maintain metadata effectively. According to the comprehensiveness of its content (from simple to more detailed descriptions), each metadata format can be seen as one layer of a hierarchical framework (see Figure 1). The top-layer metadata contains the most basic “core elements”, which are suitable for almost all resources and collections. The second-layer metadata extends the top layer with some added elements. As a result, elements in lower-layer metadata are more specific and more detailed, and the user group is much smaller.
Figure 1. The hierarchical relationships among different types of metadata
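The layered framework can be sketched as a small class in which each layer inherits the elements of the layer above it and adds its own. This is a minimal illustration under assumptions: the class `MetadataLayer`, its methods, and the sample element names (including "species") are hypothetical and do not describe how Metalogy stores its formats.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MetadataLayer:
    """One layer of the hierarchy: its own elements plus those of its parent."""
    name: str
    own_elements: List[str]
    parent: Optional["MetadataLayer"] = None

    def all_elements(self) -> List[str]:
        # Elements inherited from upper layers come first, then the additions.
        inherited = self.parent.all_elements() if self.parent else []
        return inherited + [e for e in self.own_elements if e not in inherited]

    def shared_elements(self, other: "MetadataLayer") -> List[str]:
        # Elements that both formats understand, i.e. what they can exchange.
        theirs = set(other.all_elements())
        return [e for e in self.all_elements() if e in theirs]


# Hypothetical layers: a three-element core and two domain extensions.
core = MetadataLayer("core", ["title", "creator", "date"])
records = MetadataLayer("historical records", ["related person", "seal"], parent=core)
specimens = MetadataLayer("butterfly specimens", ["species"], parent=core)
```

With these sample layers, `records.shared_elements(specimens)` yields only the three core elements, which is the sense in which two lower-layer formats can still exchange records through the top layer.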
For example, Dublin Core, the most common core metadata format, has only fifteen elements. A special domain (e.g., the archival community) may add domain-specific elements below the Dublin Core. In this way, a domain can exchange information with other fields of study through the most basic core elements while exchanging its complete element set (established in cooperation with its own domain experts) within the domain (see Figure 2). For example, EAD is more elaborate than the Dublin Core. If a system uses Dublin Core to describe resources, then when it exchanges information with an EAD-described resource, the EAD record is converted into Dublin Core through a mapping, and the metadata is degraded. This loss of information is inevitable.
Figure 2. Concept structure of using Dublin Core as the core metadata
(Source: Eric Miller, John Perkins, Thomas Hoffman, “DC for Museum” (slide))
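The degradation that occurs when an EAD record is mapped to Dublin Core can be illustrated with a small crosswalk. The sketch below makes several assumptions: the mapping table is deliberately partial and illustrative rather than an official EAD-to-Dublin Core crosswalk, records are flattened to plain dictionaries, and the function name and sample values are hypothetical.

```python
# Illustrative, partial crosswalk from EAD element names to Dublin Core
# element names. It is an assumption for this example, not an official table.
EAD_TO_DC = {
    "unittitle": "title",
    "origination": "creator",
    "unitdate": "date",
    "scopecontent": "description",
    "unitid": "identifier",
}


def to_dublin_core(ead_record):
    """Map a flat EAD-like record to Dublin Core and report what was lost."""
    dc_record = {}
    dropped = []
    for name, value in ead_record.items():
        target = EAD_TO_DC.get(name)
        if target is None:
            dropped.append(name)  # no core equivalent, so the value is lost
        else:
            dc_record.setdefault(target, []).append(value)
    return dc_record, dropped


# Hypothetical record; the values are made up for the example.
sample = {
    "unittitle": "Land deed",
    "unitdate": "1875",
    "custodhist": "Transferred in 1950",
    "arrangement": "Chronological",
}
dc, lost = to_dublin_core(sample)
```

Here `dc` keeps only the title and date, while `lost` names the two EAD elements that have no counterpart in the table, which is the loss of information described above.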
One institution may hold different kinds of resources. For example, libraries or museums may have collections of calligraphy and paintings, rare books, historical records, and so on, and different metadata elements may be used to describe the different attributes of these collections. For instance, the digital library project of the US Library of Congress proposed using two metadata formats to describe its resources: the traditional MARC and EAD. The metadata we developed for Chinese historical resources covers historical records, objects, pictures/photos, maps, and so on. To manage different kinds of metadata conveniently and to use them with greater flexibility, we adopted Z39.50. In other words, each type of metadata is assigned a TagSet name. In Z39.50, two common TagSets were developed: TagSet-M and TagSet-G (the Dublin Core is included in TagSet-G). Each institution in each domain can develop and name its own metadata: for example, the one developed by CIMI is named TagSet-CIMI, and the one developed by us is named TagSet-MICI. MICI is a rather comprehensive metadata format, however, and not all resource types have to use all of its elements. To make better use of the various kinds of metadata, the data-entry interface of MICI was designed so that users may select a metadata format from the following categories: historical records, objects, maps, and pictures/photos. We discuss the details in the next section.
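The idea of naming each metadata format as a TagSet and letting the cataloger pick a category can be sketched as a small registry. This is an illustration only: the registry functions are hypothetical, and the paper does not state which MICI elements each category uses, so the category profiles below are invented for the example.

```python
# Registry of named TagSets. The name "TagSet-MICI" comes from the text;
# the registry itself and its functions are illustrative assumptions.
TAGSETS = {}


def register_tagset(name, elements):
    """Register a metadata format under a unique TagSet name."""
    if name in TAGSETS:
        raise ValueError(f"TagSet already registered: {name}")
    TAGSETS[name] = list(elements)


# A small subset of MICI elements, enough for the example.
register_tagset("TagSet-MICI", [
    "title", "author", "recipient", "date of production", "area of coverage",
    "related person", "place", "category by material", "physical description",
    "seal", "digitized information",
])

# Hypothetical per-category profiles: the source does not specify which
# elements each category shows, so these selections are assumptions.
CATEGORY_PROFILES = {
    "historical records": ["title", "recipient", "related person", "seal"],
    "objects": ["title", "category by material", "physical description"],
    "maps": ["title", "area of coverage", "place"],
    "pictures/photos": ["title", "date of production", "digitized information"],
}


def entry_form_fields(tagset_name, category):
    """Return the fields a data-entry form would show for one category."""
    allowed = set(TAGSETS[tagset_name])
    return [e for e in CATEGORY_PROFILES[category] if e in allowed]
```

Calling `entry_form_fields("TagSet-MICI", "maps")` returns only the map-related subset, so a cataloger of maps never sees elements such as "seal" even though the full format defines them.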
IV. Design and Implementation of a Multi-metadata Input-Output and Conversion System
This system consists of three components, as shown in Figure 4. The first component, called Local Metadata, maintains all elements and the institution-specific elements and database schema, and handles metadata input and editing, search and retrieval, data management, and rights management for catalogers. The second component deals with the semantics and syntax of exchanging standard metadata formats, for example, conversion among XML-Dublin Core, XML-MICI, XML-MARC, or ISO 2709. The third component handles Z39.50 and includes a Z39.50 server and a Z39.50 client. Its main purpose is to convert the access points sent by a Z39.50 client into the database system's internal access points, and to convert the search results into a Z39.50 record syntax such as GRS-1.
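The access-point conversion performed by the third component can be sketched as a lookup from Z39.50 attribute values to internal database fields. The sketch assumes that clients send Bib-1 Use attributes (4 for Title, 1003 for Author, 21 for Subject heading); the paper does not say which attribute set Metalogy uses, and the table name, column names, and function name are hypothetical.

```python
# Assumed mapping from Bib-1 Use attribute values to internal column names.
# The attribute numbers are standard Bib-1 values; the columns are made up.
BIB1_USE_TO_COLUMN = {
    4: "title",              # Bib-1 Use 4: Title
    1003: "author",          # Bib-1 Use 1003: Author
    21: "subject_keyword",   # Bib-1 Use 21: Subject heading
}


def to_internal_query(use_attribute, term):
    """Translate one Z39.50 access point into a parameterized SQL query."""
    try:
        column = BIB1_USE_TO_COLUMN[use_attribute]
    except KeyError:
        raise ValueError(f"unsupported Bib-1 Use attribute: {use_attribute}") from None
    # The column name comes from the fixed table above, never from the client;
    # the search term is passed separately as a bound parameter.
    sql = f"SELECT record_number FROM records WHERE {column} LIKE ?"
    return sql, (f"%{term}%",)
```

A title search would then be issued as `to_internal_query(4, "deed")`, and an unsupported attribute is rejected instead of being mapped to an approximate field.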
A multi-metadata I/O and conversion system, Metalogy, was designed for two purposes: to make use of other metadata formats, and to use the MICI format with greater flexibility. The main concept behind the system is shown in Figure 3. The system has an all-element table, which is developed by authorized institutions. The core elements of this table are the Dublin Core elements. Each institution may select appropriate elements from the table; however, all core elements must be selected to ensure interoperability among different systems. XML syntax may be used to package the information, and RDF may be considered for organizing it in the future.
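The selection rule and the XML packaging described above can be sketched as follows. This is a minimal illustration, not Metalogy's implementation (which was written in a different environment): the function names, the extra elements `relatedPerson` and `seal`, and the `record` wrapper element are assumptions. The fifteen Dublin Core element names are the standard ones.

```python
import xml.etree.ElementTree as ET

# The fifteen Dublin Core elements, used here as the mandatory core.
DUBLIN_CORE = [
    "title", "creator", "subject", "description", "publisher", "contributor",
    "date", "type", "format", "identifier", "source", "language", "relation",
    "coverage", "rights",
]


def build_local_schema(all_elements, selected):
    """Validate an institution's selection against the all-element table."""
    unknown = [e for e in selected if e not in all_elements]
    if unknown:
        raise ValueError(f"not in the all-element table: {unknown}")
    missing = [e for e in DUBLIN_CORE if e not in selected]
    if missing:
        raise ValueError(f"core elements must be selected: {missing}")
    # Keep the order of the all-element table for stable output.
    return [e for e in all_elements if e in selected]


def package_record(schema, values):
    """Package the filled-in elements of one record as an XML string."""
    root = ET.Element("record")
    for name in schema:
        if name in values:
            ET.SubElement(root, name).text = values[name]
    return ET.tostring(root, encoding="unicode")


# Hypothetical all-element table: the core plus two local elements.
ALL_ELEMENTS = DUBLIN_CORE + ["relatedPerson", "seal"]
schema = build_local_schema(ALL_ELEMENTS, DUBLIN_CORE + ["relatedPerson"])
xml_text = package_record(schema, {"title": "Land deed", "relatedPerson": "Lin"})
```

A selection that omits any core element is rejected by `build_local_schema`, which is how the rule that all core elements must be selected would be enforced in this sketch.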
Figure 3. Framework concept of Metalogy (diagram label: all elements view)
Figure 4. Three components of Metalogy (diagram labels: Exchange Metadata & Exchange Syntax; Local Metadata (Metalogy authoring tool, Element Mapping, Customer metadata management, Z39.50 profile); (Dublin Core, MARC, XML…); (Other DL/DM))
Currently, we have finished the first component of Metalogy. Its structure and modules are shown in Figure 5. The system was developed in the Delphi programming environment. Figures 6 and 7 are two screenshots of Metalogy. Catalogers can use the function shown in Figure 6 to customize the metadata format for special purposes, and then key in the metadata using the function shown in Figure 7.
Figure 5. Structure and modules of the first component of Metalogy
Figure 6. Customization of metadata
Figure 7. Cataloging of metadata
V. Conclusions
Multi-metadata formats have become an international trend for DL systems. However, from our experience with other user institutions, we found that users prefer an interface whose schema is relevant to their own collections. With regard to user demand, we should strive to simplify the database structure and the data-entry interface; the only criterion is the ability to describe the collection attributes. In the long run, however, in order to exchange information, it seems reasonable to develop a flexible multi-metadata format system. The implementation of Metalogy shows that this is a feasible approach.
Acknowledgements
This work is supported in part by the National Science Council of R.O.C. under contract NSC 88-2745-P-002-007. We would like to thank the research assistants involved in this research.
References
[1]
[2] [3] [4] H.H. Chen, C.C. Chen and K.H. Chen. “The Theory and Implementation for Metadata in Digital Library/Museum.” Journal of Library Science, National Taiwan University, No. 13, Dec. 1998, pp. 37-59. (in Chinese)
[5] Dublin Core Metadata Initiative. 1998, <URL: /dc/> (13 Nov. 1998).
[6] FGDC. “Content Standards for Digital Geospatial Metadata -- FGDC.” 1994, <URL: gs.gov/> (13 Nov. 1998).
[7] CIMI. “Consortium for the Interchange of Museum Information – CIMI.” 1997, <URL: /> (13 Nov. 1998).
[8] GILS. “Guidelines for the Preparation of GILS Entries.” 1995, <URL: http://gopher.nara.gov:70/0/managers/gils/guidance/gilsdoc.txt>.
[9]