The Libraries & Identification module
Identification, also called supervised learning or classification, is one of the most important techniques in bioinformatics. BioNumerics can identify unknown organisms based on several available experiment data sets at once, which leads to more reliable consensus identifications. The same range of similarity and distance coefficients available for cluster analysis can be used for identification.
Libraries
Identification can be as quick and simple as sorting a large list of database entries according to their similarity with an unknown entry. A more targeted approach is to use an identification library, i.e. a collection of units, each of which consists of one or more entries of the same group (taxon, subtype, variant, serotype, ecotype…). A clear, easy-to-survey identification report lists the identifications obtained from each individual data set. Mathematical and statistical methods allow the reliability and relevance of each identification to be estimated. A detailed pairwise comparison can be obtained between any two entries from the database, listing all the experiments that both entries share, together with the percentage similarity. As an alternative to classical similarity-based identification, BioNumerics allows neural networks to be generated for each experiment type. For large databases containing groups that are difficult to distinguish, neural networks can be the quickest and most reliable identification tool.
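The two similarity-based approaches described above can be illustrated with a minimal Python sketch. It is not the BioNumerics API: the function names, the toy entries, and the choice of the Dice coefficient on presence/absence characters are all assumptions made for illustration, and scoring a library unit by the mean similarity of its members is just one plausible rule.

```python
from typing import Dict, List, Set, Tuple


def dice(a: Set[str], b: Set[str]) -> float:
    """Dice coefficient between two sets of binary characters (e.g. band classes)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def rank_by_similarity(
    unknown: Set[str], database: Dict[str, Set[str]]
) -> List[Tuple[str, float]]:
    """Sort all database entries by decreasing similarity to the unknown entry."""
    scores = [(key, dice(unknown, chars)) for key, chars in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


def identify_with_library(
    unknown: Set[str],
    library: Dict[str, List[str]],
    database: Dict[str, Set[str]],
) -> List[Tuple[str, float]]:
    """Score each library unit by the mean similarity of its member entries
    (an assumed scoring rule) and return the units, best match first."""
    unit_scores = {}
    for unit, members in library.items():
        sims = [dice(unknown, database[m]) for m in members]
        unit_scores[unit] = sum(sims) / len(sims)
    return sorted(unit_scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: entry key -> set of characters scored as present.
database = {
    "E1": {"b1", "b2", "b3"},
    "E2": {"b1", "b2", "b4"},
    "E3": {"b5", "b6"},
    "E4": {"b5", "b6", "b7"},
}
# Hypothetical library: unit (e.g. taxon) -> member entry keys.
library = {"TaxonA": ["E1", "E2"], "TaxonB": ["E3", "E4"]}
unknown = {"b1", "b2", "b3"}

ranking = rank_by_similarity(unknown, database)
best_unit, best_score = identify_with_library(unknown, library, database)[0]
print(ranking[0], best_unit, round(best_score, 3))
```

Ranking against individual entries answers "which entry is closest?", while scoring against library units answers "which group does this entry belong to?", which is usually the question an identification is meant to settle.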

Decision networks
Decision networks are among the most versatile and powerful tools for automation in BioNumerics, allowing the user to build workflows that make decisions, predict features, perform queries, fill in fields, create graphs and plots, and much more. A decision network is an operational workflow that carries out one or more (logical) operations and/or actions on the database. The network is built from operators, the building blocks that form the nodes of the network:
- Input operators to retrieve specific, usually experimental, data
- String, Value and Sequence operators, which perform a manipulation on data types
- Boolean operators, which combine one or more binary states into a new binary state
- Output actions, performing a specific action on the database, for example writing a field
The operators of a decision network together form an easy-to-use construction kit that allows one to build automated decision or action workflows, with endless possibilities.
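The operator types listed above can be sketched as a small network of nodes in Python. This is an illustrative model only, not how BioNumerics implements or exposes decision networks: the `Node` class, the helper functions, the field names, and the threshold of 16 are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Entry = Dict[str, Any]  # a database entry modelled as a plain dict (assumption)


@dataclass
class Node:
    """One operator in the network; `inputs` are the upstream nodes."""
    func: Callable[..., Any]
    inputs: List["Node"]

    def evaluate(self, entry: Entry) -> Any:
        # Evaluate upstream nodes first, then apply this operator.
        args = [node.evaluate(entry) for node in self.inputs]
        return self.func(entry, *args)


def input_op(field: str) -> Node:
    """Input operator: retrieve a piece of data from the entry."""
    return Node(lambda entry: entry[field], [])


def value_op(fn: Callable[[Any], Any], source: Node) -> Node:
    """Value operator: manipulate the value produced by another node."""
    return Node(lambda entry, value: fn(value), [source])


def or_op(*sources: Node) -> Node:
    """Boolean operator: combine binary states into a new binary state."""
    return Node(lambda entry, *states: any(states), list(sources))


def write_field(field: str, source: Node, if_true: str, if_false: str) -> Node:
    """Output action: write a field on the entry based on a binary state."""
    def action(entry: Entry, state: bool) -> str:
        entry[field] = if_true if state else if_false
        return entry[field]
    return Node(action, [source])


# Hypothetical resistance-prediction network: flag an entry as resistant when
# a measured value reaches a made-up threshold OR a marker gene is present.
high_mic = value_op(lambda mic: mic >= 16, input_op("mic"))
has_gene = input_op("gene_present")
network = write_field("prediction", or_op(high_mic, has_gene), "R", "S")

entry = {"mic": 2.0, "gene_present": True}
network.evaluate(entry)
print(entry["prediction"])
```

Because every node only depends on the outputs of its upstream nodes, new operators can be added or rewired without touching the rest of the network, which is what makes this kind of construction kit easy to extend.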

Features
- Database screening. Fast identification of batches of entries against entire databases or selections from databases, using all available similarity or distance coefficients. Display of a detailed pairwise comparison between any two entries.
- Libraries. Creation of identification libraries using the multi-library system. Specific similarity measures and settings can be defined per experiment type. Comprehensive identification reports showing results for each available experiment. Many different viewing options and statistical tools to facilitate interpretation.
- Neural Networks. Multiple Neural Networks can be trained for all experiment types and used for quick and accurate identification of complex groupings.
- Decision Networks. Build automated workflows that make decisions, predict features, perform queries, fill in fields, create graphs and plots, etc. Can be used for resistance prediction, in breeding research, for complex reporting, or for automated analysis and sorting of multilevel and polyphasic data. Includes the option to build decision trees.