Converting Taxonomic Descriptions to New Digital Formats
DOI:
https://doi.org/10.17161/bi.v5i0.46
Keywords:
Taxonomic descriptions, morphological descriptions, semantic markup, supervised machine learning, unsupervised machine learning, system evaluation, XML
Abstract
Abstract.--The majority of taxonomic descriptions are currently in print format. Most digital descriptions are in formats such as DOC, HTML, or PDF and are intended for human readers. These formats do not convey the rich semantics of taxonomic descriptions for computer-aided processing. Newer digital formats such as XML and RDF accommodate semantic annotations that allow computers to process the rich semantics on humans' behalf, thus opening up opportunities for a wide range of innovative uses of taxonomic descriptions, such as searching in more precise and flexible ways, integrating with genomic and geographic information, generating taxonomic keys automatically, text data mining, and information visualization. This paper discusses the challenges in automated conversion of multiple collections of descriptions to XML format and reports on an automated system, MARTT. MARTT is a machine-learning system that uses training examples to tag new descriptions in XML format. A number of utilities are implemented as solutions to these challenges. The utilities are used to reduce the effort of preparing training examples, to facilitate the creation of a comprehensive schema, and to predict system performance on a new collection of descriptions. The system has been tested with several plant and algal taxonomic publications, including Flora of China and Flora of North America.
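To make the idea of tagging descriptions from training examples concrete, here is a minimal sketch in Python. It is not MARTT's algorithm or schema, which this page does not describe. It assumes a toy word-count scorer in place of a real machine-learning model, and the element names (`description`, `stem`, `leaf`, `flower`), the training clauses, and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical training examples: (clause text, XML element name).
TRAINING = [
    ("Leaves alternate, ovate, 3-5 cm long", "leaf"),
    ("Leaf blade lanceolate, margin serrate", "leaf"),
    ("Flowers white, petals 5", "flower"),
    ("Petals obovate, stamens numerous", "flower"),
    ("Stems erect, branched, glabrous", "stem"),
]

def tokenize(text):
    """Lowercase the words and strip surrounding punctuation."""
    words = (w.strip(",.;").lower() for w in text.split())
    return [w for w in words if w]

def train(examples):
    """Count how often each word occurs under each element name."""
    counts = defaultdict(Counter)
    for text, label in examples:
        counts[label].update(tokenize(text))
    return counts

def classify(model, clause):
    """Pick the element whose training words overlap most with the clause."""
    words = tokenize(clause)
    scores = {label: sum(c[w] for w in words) for label, c in model.items()}
    return max(scores, key=scores.get)

def tag_description(model, description):
    """Split a description on semicolons and wrap each clause in an element."""
    root = ET.Element("description")
    for clause in description.split(";"):
        clause = clause.strip().rstrip(".")
        if clause:
            ET.SubElement(root, classify(model, clause)).text = clause
    return ET.tostring(root, encoding="unicode")

model = train(TRAINING)
print(tag_description(model, "Stems erect; leaves ovate, margin serrate; petals white."))
```

Once clauses carry element names like these, a query such as "all leaf clauses mentioning serrate margins" becomes a structured lookup rather than a full-text search, which is the kind of use the abstract has in mind.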
Published
2008-04-28
Section
Articles (peer-reviewed)
License
Copyright for articles published in this journal is retained by the authors, with first publication rights granted to the journal. All articles are licensed under a Creative Commons Attribution Non-Commercial license.
Competing Interests: The authors have declared that no competing interests exist.
How to Cite
Cui, Hong. 2008. “Converting Taxonomic Descriptions to New Digital Formats”. Biodiversity Informatics 5 (April). https://doi.org/10.17161/bi.v5i0.46.