Converting Taxonomic Descriptions to New Digital Formats

Authors

  • Hong Cui

DOI:

https://doi.org/10.17161/bi.v5i0.46

Keywords:

Taxonomic descriptions, morphological descriptions, semantic markup, supervised machine learning, unsupervised machine learning, system evaluation, XML

Abstract

Most taxonomic descriptions are currently available only in print. Most digital descriptions are in formats such as DOC, HTML, or PDF that are designed for human readers, and these formats do not expose the rich semantics of taxonomic descriptions to computer-aided processing. Newer digital formats such as XML and RDF accommodate semantic annotations that allow computers to process these semantics on humans' behalf. This opens up opportunities for a wide range of innovative uses of taxonomic descriptions, such as more precise and flexible searching, integration with genomic and geographic information, automatic generation of taxonomic keys, text data mining, and information visualization. This paper discusses the challenges in automatically converting multiple collections of descriptions to XML format and reports on an automated system, MARTT. MARTT is a machine-learning system that uses training examples to tag new descriptions in XML format. A number of utilities are implemented as solutions to these challenges: they reduce the effort needed to prepare training examples, facilitate the creation of a comprehensive schema, and predict system performance on a new collection of descriptions. The system has been tested with several plant and algal taxonomic publications, including Flora of China and Flora of North America.
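To make the idea of tagging descriptions from training examples more concrete, here is a deliberately simplified Python sketch of a supervised tagger. It learns word counts from a few hand-tagged clauses and then wraps each clause of a new description in an XML element. The training clauses, the tag names (`leaf`, `stem`, `flower`, `other`), the clause-splitting rule, and the word-overlap scoring are all hypothetical illustrations. This is not MARTT's algorithm or its schema.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical training examples: (clause, tag) pairs a curator might prepare.
# These are NOT MARTT's training data or schema elements.
TRAINING = [
    ("Leaves alternate, ovate, 3-5 cm", "leaf"),
    ("leaf blade glabrous", "leaf"),
    ("Flowers solitary, axillary", "flower"),
    ("petals 5, white", "flower"),
    ("Stems erect, pubescent", "stem"),
    ("stem branched near base", "stem"),
]

def tokenize(text):
    """Return lowercase alphabetic tokens; numbers and punctuation are dropped."""
    return re.findall(r"[a-z]+", text.lower())

def train(examples):
    """Count how often each token occurs in the training clauses of each tag."""
    counts = defaultdict(Counter)
    for clause, tag in examples:
        counts[tag].update(tokenize(clause))
    return counts

def classify(clause, counts, default="other"):
    """Choose the tag whose training clauses share the most tokens with the clause.

    Falls back to `default` when no token was seen during training.
    """
    tokens = tokenize(clause)
    scores = {tag: sum(c[t] for t in tokens) for tag, c in counts.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default

def to_xml(description, counts):
    """Split a description at '.' or ';' and wrap each clause in a tagged element."""
    root = ET.Element("description")
    for clause in re.split(r"[.;]", description):
        clause = clause.strip()
        if clause:
            ET.SubElement(root, classify(clause, counts)).text = clause
    return ET.tostring(root, encoding="unicode")

model = train(TRAINING)
print(to_xml("Stems erect, glabrous. Leaves opposite; petals yellow.", model))
```

With these toy examples the call prints `<description><stem>Stems erect, glabrous</stem><leaf>Leaves opposite</leaf><flower>petals yellow</flower></description>`. Even this tiny case shows why training data matters. The word "glabrous" was seen only in a leaf clause, so the stem clause is classified correctly only because "stems" and "erect" outweigh it. The naive split on periods would also break decimal measurements such as "3.5 cm", one of many details a real system must handle.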

Author Biography

  • Hong Cui
    Assistant Professor, Faculty of Information and Media Studies, University of Western Ontario, Canada

Published

2008-04-28

Section

Articles (peer-reviewed)

How to Cite

Cui, Hong. 2008. “Converting Taxonomic Descriptions to New Digital Formats”. Biodiversity Informatics 5 (April). https://doi.org/10.17161/bi.v5i0.46.