Authors' responses to reviews of manuscript 4333, provided as a supplementary file for the resubmission (manuscript 4611)

The problems with formatting the tables, which frustrated both reviewers, were very strange and have been fixed.

Since both reviewers thought that the review material was helpful, we have expanded that material while trying to keep it concise. We have added more links between terminologies (clustering, pathway keys, polythetic keys, error tolerance, mismatch threshold) and a quoted definition of "identification".

The material on numeric ranges has been reworked and expanded: the old table 4 is now divided into two parts, with some extra columns and more extensive captions. The accompanying text has also been expanded.

These reviews, particularly review #2, were immensely valuable, since they provide much-needed direction on how best to explain the material, as well as very helpful suggestions about how to deliver the example code for others to use.

Specific responses to reviewer 2:

"The section "Normalizing the coefficient", as I understand, tries to show the point of the authors to improve the "best character" algorithm, first of the two suggested improvements. One can grasp it from the current form of the manuscript, but some further explanation, or examples comparing both methodologies (existing and proposed) would be extremely useful."
Table 6 has been added to illustrate this.

1) The provided code only deals with the first of the two proposed improvements: the normalization of range of coefficients to 0-1. I miss an example for the second case: the application of the program to deal with numeric ranges
Some more discussion has been added about how the code to handle numeric values would form part of the larger software system (and already exists in some systems), and table 5 has been added to show the mechanics of translating numeric ranges into categories.

2) .pyc are byte-code python files that should not therefore be human-readable.
What the authors provide is a .py file, so the extension of the file should be changed.
A wonderful tip. Fixed.

3) There is no Bestchar.pyc in the zip file, just "Bestchar backup.pyc"
Fixed. This was a peculiar and embarrassing problem: something went terribly wrong with the Dropbox repository.

4) some errors are raised when importing the python module: a non-UTF character and a wrong name for the input file. Authors should update the code to avoid these issues.
This should now be fixed. We were unable to find any bad characters in the files using BBEdit's Zap Gremlins command, so we hope that this was a glitch or a side effect of the other problems. In any case, the files have been replaced by the GitHub versions and, hopefully, no more gremlins will infiltrate.

5) The python file is not licensed as Public Domain
The wording has been improved; it certainly did not make sense to mention copyright in that context.

6) The input is named "Table 4 Char 5.txt" and not "characterinput.txt" as is referenced in the manuscript
We have renamed the example file to charinput.txt and added a comment to the file header to serve as the link to the manuscript.

7) Authors should remove the hidden __MACOSX folder.
Hosting the files on GitHub makes that Mac/Windows incompatibility a thing of the past.

8) I suggest changing the way of sharing the files. I suggest either a link in a personal/institutional webpage or (even better) sharing the code via online repositories, such as github, google code...
An excellent tip. The files are now publicly accessible on GitHub (https://github.com/NadiaTalent/Bestchar).

"Finally, in the discussion section, I would suggest to actually discuss the findings and the improvements achieved, and the way they can improve the taxonomic identifications and, in the end, the biodiversity informatics field, rather than giving an overview of the issues related with the taxonomic identification."
and "In general, I miss in the text some linking between the calculations of the indexes and the separation measures and the actual meaning of these indexes when applied to character selection, and a further elaboration of the manuscript to fully expose the potential of the proposed methodology."
These comments have produced many changes throughout the manuscript, particularly expansion of the introduction and the discussion. The expanded tables (particularly table 6) should also help to make clearer how character selection proceeds. A new point seems related to this: some of the essential terms, such as "character ranking" and the information coefficient itself, are also used in multivariate data analysis for a very different purpose.

Specific points to address:
These have all been addressed, and probably need no explanation except for the following.

- Table 1, I miss some reference to or some explanation of the calculations of the different separation measures, both in the legend and in the text for the Xper2 and the Pairwise
A good point. The text now includes an explanation of the Average Pairwise Jaccard Distance. The Xper2 calculations have been glossed as "observed", and the "maximum value" column for Xper2 has been removed. Although we can observe the software's responses to particular data, the maximum was guesswork and should not have been included.
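
As a rough illustration of the Average Pairwise Jaccard Distance mentioned in the response on Table 1, the following minimal Python sketch may help. It is not taken from the Bestchar repository and may differ from the calculation in the manuscript. It assumes the standard set-based Jaccard distance, 1 - |A ∩ B| / |A ∪ B|, applied to the sets of states that two taxa show for one character and averaged over all unordered pairs of taxa. The function names and the example data are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Standard Jaccard distance between two sets: 1 - |a & b| / |a | b|."""
    union = a | b
    if not union:
        # Assumed convention: two empty sets are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_pairwise_jaccard(states_by_taxon):
    """Mean Jaccard distance over all unordered pairs of taxa for one character.

    states_by_taxon maps a taxon name to the set of states recorded for it.
    """
    pairs = list(combinations(states_by_taxon.values(), 2))
    if not pairs:
        # Fewer than two taxa: nothing to separate.
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical data: states of one character recorded for three taxa.
example = {
    "taxon_a": {"red"},
    "taxon_b": {"red", "white"},
    "taxon_c": {"white"},
}
print(average_pairwise_jaccard(example))
```

Under these assumptions, a character whose states never overlap between taxa scores 1, and a character with identical states in every taxon scores 0, so higher values indicate better separation.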