Converting Taxonomic Descriptions to New Digital Formats

Authors

  • Hong Cui

DOI:

https://doi.org/10.17161/bi.v5i0.46

Keywords:

Taxonomic descriptions, morphological descriptions, semantic markup, supervised machine learning, unsupervised machine learning, system evaluation, XML

Abstract

Most taxonomic descriptions are currently in print format, and most digital descriptions are in formats such as DOC, HTML, or PDF that are intended for human readers. These formats do not convey the rich semantics of taxonomic descriptions in a form suitable for computer-aided processing. Newer digital formats such as XML and RDF accommodate semantic annotations that allow computers to process these semantics on humans' behalf. This opens up opportunities for a wide range of innovative uses of taxonomic descriptions: searching in more precise and flexible ways, integrating with genomic and geographic information, generating taxonomic keys automatically, text data mining, and information visualization. This paper discusses the challenges in automatically converting multiple collections of descriptions to XML format and reports on an automated system, MARTT. MARTT is a machine-learning system that uses training examples to tag new descriptions into XML format. A number of utilities are implemented as solutions to these challenges. The utilities reduce the effort required to prepare training examples, facilitate the creation of a comprehensive schema, and predict system performance on a new collection of descriptions. The system has been tested with several plant and alga taxonomic publications, including Flora of China and Flora of North America.
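To make the idea of "using training examples to tag new descriptions into XML" concrete, here is a minimal Python sketch under stated assumptions. It splits a description into clauses at periods and semicolons, assigns each clause a tag with a multinomial naive Bayes classifier trained on a few hand-labeled clauses, and wraps each clause in an XML element. The names `NaiveBayesTagger`, `to_xml`, `TRAINING`, and the tag set `stem`/`leaf`/`flower` are hypothetical; naive Bayes is a stand-in chosen for brevity, not a description of MARTT's actual algorithm or schema.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical hand-labeled training clauses (clause text, XML tag).
TRAINING = [
    ("Leaves alternate, ovate, 3-5 cm", "leaf"),
    ("Leaf blades lanceolate, glabrous", "leaf"),
    ("Flowers solitary, white", "flower"),
    ("Petals 5, yellow", "flower"),
    ("Stems erect, branched", "stem"),
    ("Stem hairy near base", "stem"),
]


def tokenize(text):
    """Lowercase alphabetic tokens; digits and punctuation are dropped."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTagger:
    """Illustrative clause tagger: multinomial naive Bayes with add-one smoothing.

    A hypothetical stand-in for a learning component, not MARTT's method.
    """

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # tag -> token frequencies
        self.tag_counts = Counter()              # tag -> number of training clauses
        self.vocab = set()

    def train(self, examples):
        for clause, tag in examples:
            tokens = tokenize(clause)
            self.tag_counts[tag] += 1
            self.word_counts[tag].update(tokens)
            self.vocab.update(tokens)

    def predict(self, clause):
        """Return the tag with the highest log posterior for the clause."""
        total = sum(self.tag_counts.values())
        best_tag, best_score = None, -math.inf
        for tag, count in self.tag_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.word_counts[tag].values()) + len(self.vocab)
            for tok in tokenize(clause):
                # Add-one smoothing so an unseen word does not zero out a tag.
                score += math.log((self.word_counts[tag][tok] + 1) / denom)
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag


def to_xml(description, tagger):
    """Split at periods/semicolons and wrap each clause in its predicted tag."""
    root = ET.Element("description")
    for clause in re.split(r"[.;]\s*", description):
        clause = clause.strip()
        if clause:
            ET.SubElement(root, tagger.predict(clause)).text = clause
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    tagger = NaiveBayesTagger()
    tagger.train(TRAINING)
    print(to_xml("Stems erect, pubescent. Leaves ovate, glabrous; flowers white.", tagger))
```

With the six training clauses above, the sample description should come out as `<description><stem>Stems erect, pubescent</stem><leaf>Leaves ovate, glabrous</leaf><flower>flowers white</flower></description>`. A production system would need much richer, nested markup and a comprehensive schema; this sketch only shows the train-then-tag loop.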

Author biography

  • Hong Cui
    Assistant Professor, Faculty of Information and Media Studies, University of Western Ontario, Canada

Published

2008-04-28

Section

Articles (peer-reviewed)

How to cite

Cui, Hong. 2008. “Converting Taxonomic Descriptions to New Digital Formats”. Biodiversity Informatics 5 (April). https://doi.org/10.17161/bi.v5i0.46.