gcorani - JNCC2 - The Java Implementation of Naive Credal Classifier 2

Statistical classification

JNCC2 addresses the problem of statistical classification. In general, a classifier learns from data the relationship between a set of attributes (also called features) that characterize a given object and the class the object belongs to. For instance, e-mail filtering is a classification problem: the classifier analyzes the frequency of certain keywords in the message to decide whether the message is an ordinary e-mail or spam. Automated reading of postal codes, handwritten character recognition and speech recognition are further examples of classification problems. For further information, visit the Wikipedia page about statistical classification.

Naive Credal Classifier 2

JNCC2 is the Java implementation of the Naive Credal Classifier 2 (NCC2; Corani and Zaffalon, 2008). NCC2 extends the traditional Naive Bayes Classifier (NBC) towards imprecise probabilities; it is designed to return robust classifications, even on small and/or incomplete data sets. A distinctive feature of NCC2 is that it returns set-valued (or imprecise) classifications (i.e., more than one class) when faced with doubtful instances.

Extensive empirical investigation shows that NCC2 returns imprecise judgments on instances whose classification is genuinely doubtful: NBC achieves a much higher classification accuracy on the instances that NCC2 classifies precisely than on those that NCC2 classifies imprecisely.
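The comparison described above can be sketched in a few lines of Java. This is only an illustration of how the two accuracies are computed, not code from JNCC2: the class name PreciseVsImpreciseAccuracy, the Outcome helper and the five listed outcomes are all hypothetical and made up for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PreciseVsImpreciseAccuracy {

    /** One test instance: the true class, NBC's single prediction, and NCC2's set of classes. */
    static final class Outcome {
        final String trueClass;
        final String nbcPrediction;
        final Set<String> ncc2Prediction;

        Outcome(String trueClass, String nbcPrediction, String... ncc2Classes) {
            this.trueClass = trueClass;
            this.nbcPrediction = nbcPrediction;
            this.ncc2Prediction = new HashSet<String>(Arrays.asList(ncc2Classes));
        }

        /** NCC2 is "precise" on an instance when it returns exactly one class. */
        boolean isPrecise() {
            return ncc2Prediction.size() == 1;
        }
    }

    /**
     * Accuracy of NBC restricted to the instances on which NCC2 is precise
     * (onPrecise = true) or imprecise (onPrecise = false).
     * Returns NaN if the selected subset is empty.
     */
    static double nbcAccuracy(List<Outcome> outcomes, boolean onPrecise) {
        int total = 0;
        int correct = 0;
        for (Outcome o : outcomes) {
            if (o.isPrecise() != onPrecise) {
                continue; // instance belongs to the other subset
            }
            total++;
            if (o.nbcPrediction.equals(o.trueClass)) {
                correct++;
            }
        }
        return total == 0 ? Double.NaN : (double) correct / total;
    }

    public static void main(String[] args) {
        // Made-up outcomes for illustration only; these are not results from the paper.
        List<Outcome> outcomes = Arrays.asList(
            new Outcome("spam", "spam", "spam"),
            new Outcome("ham", "ham", "ham"),
            new Outcome("ham", "spam", "spam"),
            new Outcome("spam", "ham", "spam", "ham"),
            new Outcome("ham", "ham", "spam", "ham"));

        System.out.println("NBC accuracy where NCC2 is precise:   " + nbcAccuracy(outcomes, true));
        System.out.println("NBC accuracy where NCC2 is imprecise: " + nbcAccuracy(outcomes, false));
    }
}
```

On real data, a large gap between the two figures would indicate that NCC2 is imprecise on the instances that are actually hard to classify.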

Requirements

Because JNCC2 is written in Java, it runs on any operating system for which a Java Runtime Environment is available. Running JNCC2 requires the Java Runtime Environment, release 5.0 or above, which can be downloaded from the Sun Download Center.

JNCC2 runs from the command line and requires little memory.

Download

JNCC2 is open source; it is released under the terms of the GNU GPL license.

The latest release of JNCC2 is 1.11 (October 2008). The zip file available for download contains:

the binary file (jar file);
sources with javadoc documentation;
user manual & tutorial;
toy examples, explained in the accompanying README files.

>> download jncc2, version 1.11

CHANGELOG
Version 1.11 provides the following improvements over version 1.1:

the software can be run without defining the CLASSPATH variable, which makes JNCC2 easier to use. To see how to run JNCC2 without defining CLASSPATH, have a look at the user manual.

this archive

Bibliography

Naive Credal Classifier 2:

G. Corani, M. Zaffalon, “Learning Reliable Classifiers From Small or Incomplete Data Sets: The Naive Credal Classifier 2”, Journal of Machine Learning Research, 9, 581-621, 2008. >> download

Earlier works: Naive Credal Classifier

NCC2 extends the earlier NCC by incorporating a much more flexible and powerful methodology for dealing with missing data. Earlier works about the Naive Credal Classifier (NCC), authored by Marco Zaffalon, include:

Data sets

JNCC2 loads data from ARFF files. The ARFF format (Attribute-Relation File Format) is a textual format designed for classification problems. It was originally developed for WEKA, an open source software package that implements a wide collection of data mining algorithms.

Since WEKA has become a standard tool for data analysis, large repositories of ARFF data sets have been set up. The WEKA page of data sets is a good starting point for browsing through ARFF repositories.

Note that JNCC2 works only on classification data sets, not on regression data sets.
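One way to tell the two kinds of data set apart before running JNCC2 is to look at the declaration of the class attribute in the ARFF header: in a classification data set it is nominal (a brace-enclosed list of values), whereas in a regression data set it is numeric. The sketch below is a hypothetical helper, not part of JNCC2; it assumes that the class is the last attribute declared in the file, and the two toy ARFF snippets are made up for illustration.

```java
import java.util.Locale;

public class ArffClassCheck {

    /**
     * Returns true if the last @attribute declared in the ARFF header is nominal,
     * i.e. its type is a brace-enclosed list of values such as {spam,ham}.
     * Assumption (not taken from JNCC2): the class is the last declared attribute.
     */
    static boolean hasNominalClass(String arffText) {
        String lastDeclaration = null;
        for (String line : arffText.split("\\r?\\n")) {
            String trimmed = line.trim();
            String lower = trimmed.toLowerCase(Locale.ROOT);
            if (lower.startsWith("@attribute")) {
                lastDeclaration = trimmed; // remember the most recent attribute declaration
            } else if (lower.startsWith("@data")) {
                break; // the header ends where the data section starts
            }
        }
        // Nominal declarations end with "{v1,v2,...}"; numeric ones end with a type keyword.
        return lastDeclaration != null
                && lastDeclaration.contains("{")
                && lastDeclaration.endsWith("}");
    }

    public static void main(String[] args) {
        // Toy classification data set: the class takes one of two nominal values.
        String classification =
              "@relation toy_mail\n"
            + "@attribute contains_offer {yes,no}\n"
            + "@attribute has_attachment {yes,no}\n"
            + "@attribute class {spam,ham}\n"
            + "@data\n"
            + "yes,no,spam\n"
            + "no,yes,ham\n";

        // Toy regression data set: the target is numeric.
        String regression =
              "@relation toy_price\n"
            + "@attribute has_garden {yes,no}\n"
            + "@attribute price numeric\n"
            + "@data\n"
            + "yes,250000\n";

        System.out.println("toy_mail has a nominal class:  " + hasNominalClass(classification));
        System.out.println("toy_price has a nominal class: " + hasNominalClass(regression));
    }
}
```

The check is deliberately simple: it ignores quoting and other corner cases of the ARFF syntax, so it is a quick sanity test rather than a full parser.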