Character Selection During Interactive Taxonomic Identification: “Best Characters”

Nadia Talent, Richard B. Dickinson, Timothy A. Dickinson

Abstract


Software interfaces for interactive multiple-entry taxonomic identification (polyclaves) sometimes provide a “best character” or “separation” coefficient to guide the user toward the character that would most effectively reduce the number of identification steps required. Such a coefficient could be particularly helpful in forensic identification, where observing some characters requires difficult or expensive procedures, and in very large databases; both uses appear likely to grow in importance. Several current systems also provide tools for developing taxonomies or single-entry identification keys, with a variety of coefficients appropriate to that purpose. For the identification task, however, information theory applies neatly and provides the most appropriate coefficient. To our knowledge, Delta-Intkey is the only currently available system that uses a coefficient related to information theory, and because it is currently being reimplemented, there is an opportunity to improve it. We describe two improvements to the algorithm used by Delta-Intkey. The first improves transparency as the number of remaining taxa decreases, by normalizing the range of the coefficient to [0,1]. The second concerns numeric ranges, which require consistent treatment of sub-intervals and their end-points. A stand-alone Bestchar program for categorical data is provided in the Python and R languages. The source code is freely available and dedicated to the Public Domain.
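As a rough illustration of the normalization idea for categorical data, the following Python sketch scores each character by the expected information gained from observing it, divided by log2(n) for n remaining taxa, so that the score always lies in [0,1]. It assumes equally likely taxa, each recorded with exactly one state per character. The functions normalized_separation and best_character and the example matrix are hypothetical; they are not taken from Bestchar or Delta-Intkey, whose exact formulas (including the handling of polymorphic or unknown states and of numeric ranges) may differ.

```python
import math
from collections import Counter


def normalized_separation(states):
    """Hypothetical normalized separation coefficient for one character.

    `states` maps each remaining taxon to its single recorded state.
    Assumes equiprobable taxa with exactly one state each (an assumption,
    not necessarily how Bestchar or Delta-Intkey treat the data).
    Returns 0.0 when the character separates nothing and 1.0 when every
    taxon ends up alone in its own state.
    """
    n = len(states)
    if n < 2:
        return 0.0  # nothing left to separate
    counts = Counter(states.values())
    # Expected remaining uncertainty in bits after observing the character:
    # sum over states of P(state) * log2(number of taxa sharing that state).
    remaining = sum((k / n) * math.log2(k) for k in counts.values())
    # Information gained, scaled by the maximum possible gain, log2(n).
    return (math.log2(n) - remaining) / math.log2(n)


def best_character(matrix):
    """Rank characters by the normalized coefficient, highest first.

    `matrix` maps character name -> {taxon: state}.
    """
    scores = {ch: normalized_separation(col) for ch, col in matrix.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical four-taxon example.
    matrix = {
        "petal colour": {"A": "white", "B": "white", "C": "pink", "D": "pink"},
        "leaf margin": {"A": "entire", "B": "serrate", "C": "lobed", "D": "crenate"},
        "habit": {"A": "tree", "B": "tree", "C": "tree", "D": "tree"},
    }
    for name, score in best_character(matrix):
        print(f"{name}: {score:.3f}")
    # leaf margin: 1.000
    # petal colour: 0.500
    # habit: 0.000
```

In this sketch a character that isolates every taxon scores 1 and one shared by all taxa scores 0 however many taxa remain, which illustrates why a fixed [0,1] range may be easier to interpret than an unnormalized information measure as the list of remaining taxa shrinks.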

Keywords


Separation coefficient; polyclave; multi-access key; entropy; Delta-Intkey; information theory



DOI: https://doi.org/10.17161/bi.v9i1.4611

Copyright (c) 2014 Nadia Talent, Richard B. Dickinson, Timothy A. Dickinson



Biodiversity Informatics. ISSN: 1546-9735
Hosted by the University of Kansas Libraries.