A combining approach to find all taxon names (FAT)

Authors

  • Guido Sautter
  • Klemens Böhm
  • Donat Agosti

DOI:

https://doi.org/10.17161/bi.v3i0.34

Keywords:

digital library, systematics, Named Entity Recognition, Taxonomic Name Extraction

Abstract

Most of the literature on natural history is hidden in millions of pages stacked up in our libraries. Various initiatives now aim at making these publications digitally accessible and searchable by applying XML markup technologies. The unique biological names play a crucial role in linking content related to a particular taxon, so discovering and marking them up is extremely important. Since their manual extraction and markup is cumbersome and time-intensive, it needs to be automated. In this paper, we present computational linguistics techniques and evaluate how they can help to extract taxonomic names automatically. We build on an existing approach for the extraction of such names (Koning et al. 2005) and combine it with several other learning techniques. We apply them to the texts sequentially so that each technique can use the results from the preceding ones. In particular, we use structural rules, dynamic lexica with fuzzy lookups, and word-level language recognition. We use legacy documents from different sources and times as a test bed for our evaluation. The experimental results for our combining approach (FAT) show greater than 99% precision and recall. They reveal the potential of computational linguistics techniques towards an automated markup of biosystematics publications.
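The idea of applying techniques in sequence, with each stage building on the decisions of the previous one, can be illustrated with a minimal Python sketch. It is not the FAT implementation. The regular expression, the tiny English word list that stands in for word-level language recognition, the similarity threshold of 0.8, and the function names are all hypothetical simplifications. The sketch runs in three stages. A structural rule proposes candidates, a language filter rejects candidates containing common English words, and a fuzzy lexicon lookup accepts the rest. Accepted genus spellings are added back to the lexicon, so the lexicon grows as the text is processed.

```python
import re
from difflib import SequenceMatcher

# Hypothetical structural rule: a capitalized word followed by a lowercase
# word of at least three letters, e.g. "Pheidole megacephala".
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]{3,})\b")

# Hypothetical stand-in for word-level language recognition: a tiny list of
# common English words. A real system would use a trained language model.
ENGLISH_WORDS = {"the", "this", "these", "results", "species", "were",
                 "found", "in", "and", "we", "also"}


def looks_english(word):
    """Return True if the word is (assumed to be) ordinary English."""
    return word.lower() in ENGLISH_WORDS


def fuzzy_match(word, lexicon, threshold=0.8):
    """Return True if the word is similar to some lexicon entry.

    The similarity ratio tolerates small spelling or OCR errors; the
    threshold value is an assumption, not taken from the paper.
    """
    return any(SequenceMatcher(None, word, entry).ratio() >= threshold
               for entry in lexicon)


def find_names(text, genus_lexicon):
    """Apply the stages in sequence and return the accepted names."""
    lexicon = set(genus_lexicon)  # copied, then grown as names are accepted
    accepted = []
    for m in BINOMIAL.finditer(text):               # stage 1: structural rule
        genus, epithet = m.group(1), m.group(2)
        if looks_english(genus) or looks_english(epithet):
            continue                                # stage 2: language filter
        if fuzzy_match(genus, lexicon):             # stage 3: fuzzy lookup
            lexicon.add(genus)                      # remember this spelling
            accepted.append(m.group(0))
    return accepted
```

For example, with a lexicon containing only "Pheidole", the misspelled "Pheidle pallidula" is still accepted through the fuzzy lookup, while "The species" is rejected by the language filter and "Camponotus ligniperdus" is rejected because its genus is not in the lexicon.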


Published

2006-12-02

Issue

Vol. 3 (2006)

Section

Articles (peer-reviewed)

How to Cite

Sautter, Guido, Klemens Böhm, and Donat Agosti. 2006. “A Combining Approach to Find All Taxon Names (FAT)”. Biodiversity Informatics 3 (December). https://doi.org/10.17161/bi.v3i0.34.