Converting Taxonomic Descriptions to New Digital Formats

Authors

  • Hong Cui

DOI:

https://doi.org/10.17161/bi.v5i0.46

Keywords:

Taxonomic descriptions, morphological descriptions, semantic markup, supervised machine learning, unsupervised machine learning, system evaluation, XML

Abstract

The majority of taxonomic descriptions are currently in print format, and most digital descriptions are in formats such as DOC, HTML, or PDF that are intended for human readers. These formats do not convey the rich semantics of taxonomic descriptions in a form that computers can process. Newer digital formats such as XML and RDF accommodate semantic annotations that allow computers to process these semantics on humans' behalf. This opens up opportunities for a wide range of innovative uses of taxonomic descriptions, such as searching in more precise and flexible ways, integrating descriptions with genomic and geographic information, generating taxonomic keys automatically, text data mining, and information visualization. This paper discusses the challenges in automatically converting multiple collections of descriptions to XML format and reports on an automated system, MARTT. MARTT is a machine-learning system that uses training examples to mark up new descriptions in XML. A number of utilities are implemented as solutions to these challenges: they reduce the effort of preparing training examples, facilitate the creation of a comprehensive schema, and predict system performance on a new collection of descriptions. The system has been tested with several plant and alga taxonomic publications, including Flora of China and Flora of North America.
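The abstract describes MARTT only at a high level: it learns from hand-marked-up training examples and then tags new descriptions into XML. The sketch below illustrates that general supervised workflow under simplifying assumptions and is not MARTT's actual method. It assumes each clause maps to one XML element and uses a toy multinomial naive Bayes classifier as a stand-in learner. The class name `NaiveBayesTagger`, the helper functions, the element names `stem`/`leaf`/`flower`, and the training clauses are all hypothetical.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical hand-marked-up training clauses: (clause text, XML element name).
TRAINING = [
    ("Stems erect, 20-50 cm", "stem"),
    ("Stems prostrate, hairy", "stem"),
    ("Leaves alternate, ovate", "leaf"),
    ("Leaf blades lanceolate, glabrous", "leaf"),
    ("Flowers solitary, yellow", "flower"),
    ("Petals 5, white", "flower"),
]


def split_clauses(text):
    """Split a description into clause-like segments at periods and semicolons."""
    return [c.strip() for c in re.split(r"[.;]", text) if c.strip()]


def tokenize(clause):
    """Lowercase alphabetic tokens; numbers and punctuation are ignored."""
    return re.findall(r"[a-z]+", clause.lower())


class NaiveBayesTagger:
    """Toy multinomial naive Bayes that assigns one element name to each clause."""

    def __init__(self):
        self.tag_counts = Counter()              # training clauses seen per tag
        self.word_counts = defaultdict(Counter)  # word frequencies per tag
        self.vocab = set()

    def train(self, examples):
        for clause, tag in examples:
            self.tag_counts[tag] += 1
            for word in tokenize(clause):
                self.word_counts[tag][word] += 1
                self.vocab.add(word)

    def predict(self, clause):
        total = sum(self.tag_counts.values())
        vocab_size = len(self.vocab)
        best_tag, best_score = None, -math.inf
        for tag, count in self.tag_counts.items():
            # Log prior plus add-one-smoothed log likelihood of each word.
            denom = sum(self.word_counts[tag].values()) + vocab_size
            score = math.log(count / total)
            for word in tokenize(clause):
                score += math.log((self.word_counts[tag][word] + 1) / denom)
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag

    def to_xml(self, description):
        """Wrap each clause of a new description in its predicted element."""
        root = ET.Element("description")
        for clause in split_clauses(description):
            ET.SubElement(root, self.predict(clause)).text = clause
        return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    tagger = NaiveBayesTagger()
    tagger.train(TRAINING)
    # prints <description><stem>Stems ascending</stem><leaf>leaves opposite, elliptic</leaf><flower>flowers axillary</flower></description>
    print(tagger.to_xml("Stems ascending; leaves opposite, elliptic; flowers axillary."))
```

Splitting at periods and semicolons is a deliberate simplification: real descriptions contain decimals such as "2.5 cm" and nested character statements, which a working markup system would have to handle with a richer segmentation and schema.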

Metrics

File downloads: 1,266
[Chart: monthly file downloads, Jul 2008 to Jan 2026]

Author Biography

  • Hong Cui
    Assistant Professor, Faculty of Information and Media Studies, University of Western Ontario, Canada

Published

2008-04-28

Issue

Vol. 5 (2008)

Section

Articles (peer-reviewed)

How to Cite

Cui, Hong. 2008. “Converting Taxonomic Descriptions to New Digital Formats”. Biodiversity Informatics 5 (April). https://doi.org/10.17161/bi.v5i0.46.