Multiple-Access Graph Search

Lead Author Type

CIS Masters Student


Dr. Jonathan Leidig, jonathan.leidig@gvsu.edu


Dichotomous keys are binary search trees commonly used in biology to identify an unknown organism efficiently. A search starts at the root of the tree and follows a single path to a leaf node, which names the species in question. At each node, the branch taken is determined by a decision about a feature of the organism (e.g., “has vertebrae/has no vertebrae”). The root node is the only access point, and every search must traverse the full depth of the tree from root to leaf. This topology constrains the user: if they cannot answer the decision question at a node, they cannot continue the traversal. Features are often ambiguous, missing, or unclear to the user, and in those situations the user is left with no path forward.

This tool addresses the limitations of dichotomous keys by decomposing the tree into a graph. In the graph representation, each leaf node (species) of the dichotomous key is linked to feature nodes derived from the decisions at its ancestor nodes. Further feature nodes are created from information not included in the key. The resulting graph has multiple points of access, so a user can start a search anywhere in the graph and address only the features they can reliably describe.
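
The decomposition can be illustrated with a minimal Python sketch. It is not the tool's implementation (the tool stores its graph in Neo4j); the `KeyNode` class, the `decompose` function, and the three-species key are hypothetical and exist only to show how each root-to-leaf path becomes a set of species-to-feature links.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeyNode:
    """A node in a dichotomous key: a decision node or a species leaf."""
    question: Optional[str] = None    # feature asked about, e.g. "vertebrae"
    yes: Optional["KeyNode"] = None   # branch taken when the feature is present
    no: Optional["KeyNode"] = None    # branch taken when the feature is absent
    species: Optional[str] = None     # set only on leaf nodes


def decompose(node, path=(), edges=None):
    """Collect (species, feature) edges from every root-to-leaf path."""
    if edges is None:
        edges = set()
    if node.species is not None:
        # A leaf is linked to every decision made on the way down to it.
        for feature in path:
            edges.add((node.species, feature))
        return edges
    decompose(node.yes, path + (f"has {node.question}",), edges)
    decompose(node.no, path + (f"has no {node.question}",), edges)
    return edges


# Hypothetical toy key, not taken from the project's data.
key = KeyNode(
    question="vertebrae",
    yes=KeyNode(
        question="feathers",
        yes=KeyNode(species="bird"),
        no=KeyNode(species="fish"),
    ),
    no=KeyNode(species="insect"),
)

edges = decompose(key)
for species, feature in sorted(edges):
    print(species, "->", feature)
```

In the resulting edge set, "has feathers" leads directly to "bird" without first answering the vertebrae question, which is the multiple-access property described above.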

This search tool uses a Neo4j graph database, a D3.js visualization, and a web interface to support the graph search. The interface provides a searchable list of categorical features in a table view and a graph visualization of the search space, and the user can select new features from either one. Once the user selects a set of features, the interface returns a ranked list of candidate species. It also suggests which features to examine next, ranked by how efficiently each would reduce the search space (i.e., a feature that would eliminate half of the candidates).
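
The ranking and suggestion steps can be sketched in plain Python under some assumptions. The abstract does not state the scoring rules, so this sketch assumes candidates are scored by the number of selected features they match and that features are ranked by how close they come to splitting the candidates in half. The function names and the sample data are hypothetical.

```python
def rank_candidates(species_features, selected):
    """Rank species by matched selected features; drop species with no match."""
    scores = {s: len(feats & selected) for s, feats in species_features.items()}
    matched = [(s, n) for s, n in scores.items() if n > 0]
    # Highest score first, ties broken alphabetically for a stable order.
    return sorted(matched, key=lambda pair: (-pair[1], pair[0]))


def suggest_features(species_features, candidates, selected):
    """Rank unselected features by how evenly they split the candidates."""
    total = len(candidates)
    counts = {}
    for species in candidates:
        for feature in species_features[species] - selected:
            counts[feature] = counts.get(feature, 0) + 1
    # A feature shared by every candidate cannot narrow the search.
    useful = [f for f, c in counts.items() if c < total]
    # Assumed heuristic: the best feature is held by about half the candidates.
    return sorted(useful, key=lambda f: (abs(counts[f] - total / 2), f))


# Hypothetical data, not taken from the project's database.
species_features = {
    "bird": {"has vertebrae", "has wings", "has feathers", "lays eggs"},
    "bat": {"has vertebrae", "has wings", "has fur"},
    "fish": {"has vertebrae", "lays eggs"},
    "frog": {"has vertebrae", "lays eggs"},
    "insect": {"has no vertebrae", "has wings", "lays eggs"},
}
selected = {"has vertebrae"}

ranked = rank_candidates(species_features, selected)
candidates = [species for species, _ in ranked]
suggestions = suggest_features(species_features, candidates, selected)
print(ranked)
print(suggestions)
```

With this data, "has wings" is suggested first because it is held by two of the four remaining candidates, so either answer removes half of them.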

This tool is not limited to dichotomous keys and can be generalized to a variety of search tasks. The project was also evaluated with additional retrieval case studies (e.g., a grass identity matrix and a beer products database).
