TaxonGrab: Extracting Taxonomic Names From Text

Authors

  • Drew Koning, American Museum of Natural History
  • Indra Neil Sarkar, American Museum of Natural History
  • Thomas Moritz, American Museum of Natural History

DOI:

https://doi.org/10.17161/bi.v2i0.17

Keywords:

Named Entity Recognition, Taxonomic Name Extraction

Abstract

Identification of organism names in biological texts is essential for the management of archival resources that facilitate comparative biological investigation. Because organism nomenclature conforms closely to prescribed rules, automated techniques may be useful for identifying organism names in existing documents, and may also support the completion of comprehensive indices of taxonomic names, which are not yet available. Using a combination of contextual rules and a language lexicon, we have developed a set of simple computational techniques for extracting taxonomic names from biological text. Our method consistently achieves greater than 96% precision and 94% recall, and runs much faster than manual extraction. An implementation of the described method is available as a web-based tool written in PHP. The PHP source code is available from SourceForge: http://sourceforge.net/projects/taxongrab, and the project website is http://research.amnh.org/informatics/taxlit/apps/.
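The abstract names the ingredients (contextual rules plus a language lexicon) but not the rules themselves. As a rough illustration only, the Python sketch below assumes one plausible rule of that kind: a capitalized word (or a single-letter abbreviation such as "H.") that is not in an English lexicon, immediately followed by a lowercase word that is also not in the lexicon, is treated as a candidate binomial name. The tiny `ENGLISH_LEXICON` set and all function names are hypothetical; this is not the TaxonGrab PHP code, and it ignores subspecies, varieties, and authorities. It also shows how precision (share of extracted names that are correct) and recall (share of true names that were extracted) are computed.

```python
import re
from collections.abc import Iterable

# Hypothetical toy lexicon standing in for a full English word list.
# A real lexicon would hold tens of thousands of entries.
ENGLISH_LEXICON = {
    "a", "and", "collected", "found", "in", "near", "of",
    "river", "species", "specimens", "the", "was", "we",
}

# Tokens are runs of letters with an optional trailing period,
# so an abbreviated genus such as "H." survives as one token.
TOKEN_RE = re.compile(r"[A-Za-z]+\.?")


def in_lexicon(word: str) -> bool:
    """True if the word, lowercased and without a trailing period, is ordinary English."""
    return word.rstrip(".").lower() in ENGLISH_LEXICON


def is_genus(token: str) -> bool:
    """Candidate genus: a single-letter abbreviation ("H.") or a capitalized,
    otherwise lowercase word that is not in the lexicon."""
    if re.fullmatch(r"[A-Z]\.", token):
        return True
    return (
        token[0].isupper()
        and token[1:].islower()
        and not token.endswith(".")
        and not in_lexicon(token)
    )


def is_epithet(token: str) -> bool:
    """Candidate species epithet: an all-lowercase word not in the lexicon."""
    word = token.rstrip(".")
    return word.isalpha() and word.islower() and not in_lexicon(word)


def extract_binomials(text: str) -> list[str]:
    """Return every adjacent (genus, epithet) token pair that passes both rules."""
    tokens = TOKEN_RE.findall(text)
    names = []
    for first, second in zip(tokens, tokens[1:]):
        if is_genus(first) and is_epithet(second):
            names.append(f"{first} {second.rstrip('.')}")
    return names


def precision_recall(predicted: Iterable[str], gold: Iterable[str]) -> tuple[float, float]:
    """Precision = correct / extracted; recall = correct / actually present.
    Names are compared as sets, so repeated mentions count once."""
    pred, true = set(predicted), set(gold)
    hits = len(pred & true)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(true) if true else 0.0
    return precision, recall


text = "We found Homo sapiens and Quercus alba near the river; H. sapiens was collected in Ohio."
names = extract_binomials(text)
print(names)  # ['Homo sapiens', 'Quercus alba', 'H. sapiens']
print(precision_recall(names, ["Homo sapiens", "Quercus alba", "H. sapiens", "Ursus arctos"]))  # (1.0, 0.75)
```

With this toy lexicon, ordinary capitalized words such as "We" are rejected because they appear in the lexicon, while "Homo sapiens" and the abbreviated "H. sapiens" pass both rules. A capitalized non-English word such as a place name would still produce a false positive if followed by an unknown lowercase word; reducing errors of that kind is presumably what a large lexicon and further contextual rules are for. The numbers printed here describe only this toy example and say nothing about the 96% precision and 94% recall reported for TaxonGrab.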

Metrics

File downloads
1,952

Published

2005-11-16

Issue

Vol. 2 (2005)

Section

Articles (peer-reviewed)

How to Cite

Koning, Drew, Indra Neil Sarkar, and Thomas Moritz. 2005. “TaxonGrab: Extracting Taxonomic Names From Text”. Biodiversity Informatics 2 (November). https://doi.org/10.17161/bi.v2i0.17.