Statistical classification 

 

JNCC2 addresses the problem of statistical classification. In general, a classifier learns from data the relationship between a set of attributes (also called features) that characterize an object and the class the object belongs to. For instance, e-mail filtering is a classification problem: the classifier analyzes the frequency of certain keywords in a message to decide whether it is an ordinary e-mail or spam. Automated reading of postal codes, handwritten character recognition, and speech recognition are further examples of classification problems. For further information, see the Wikipedia page on statistical classification.
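The e-mail example can be made concrete with a minimal sketch. It assumes a naive Bayes model with Laplace smoothing as one possible classifier, and it uses keyword presence rather than frequency to stay short. The function names, the vocabulary, and the four training messages are hypothetical and are not part of JNCC2.

```python
from collections import Counter, defaultdict
import math

# Hypothetical training data: (keywords found in the message, class).
TRAINING = [
    ({"free", "winner"}, "spam"),
    ({"free", "offer"}, "spam"),
    ({"meeting"}, "ordinary"),
    ({"meeting", "report"}, "ordinary"),
]
VOCABULARY = ["free", "winner", "offer", "meeting", "report"]


def train(messages):
    """Count messages per class and, per class, messages containing each keyword."""
    class_counts = Counter()
    keyword_counts = defaultdict(Counter)
    for keywords, label in messages:
        class_counts[label] += 1
        for kw in keywords:
            keyword_counts[label][kw] += 1
    return class_counts, keyword_counts


def classify(keywords_present, vocabulary, class_counts, keyword_counts):
    """Return the class with the highest naive Bayes log-score."""
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior of the class
        for kw in vocabulary:
            # Laplace-smoothed estimate of P(keyword present | class).
            p = (keyword_counts[label][kw] + 1) / (n + 2)
            score += math.log(p if kw in keywords_present else 1 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


class_counts, keyword_counts = train(TRAINING)
print(classify({"free"}, VOCABULARY, class_counts, keyword_counts))
```

The learned "relationship" here is just the table of per-class keyword counts; a new message is assigned to the class under which its keyword pattern is most probable.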


Naive Credal Classifier 2

 

JNCC2 is the Java implementation of the Naive Credal Classifier 2 (NCC2; Corani and Zaffalon, 2008). NCC2 extends the traditional Naive Bayes Classifier (NBC) to imprecise probabilities; it is designed to return robust classifications even on small and/or incomplete data sets. A distinctive feature of NCC2 is that it returns set-valued (or imprecise) classifications, i.e., more than one class, when faced with doubtful instances.
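The idea of a set-valued classification can be illustrated with a deliberately simplified sketch. It is not the NCC2 algorithm: it assumes that a small, finite list of class priors stands in for the set of priors of an imprecise-probability model, and that the likelihood of the instance under each class is already given. All names and numbers are hypothetical. A class is dropped only if some other class has a higher posterior under every prior; whatever remains is returned.

```python
# Hypothetical set of priors over the two classes.
PRIORS = [
    {"spam": 0.3, "ordinary": 0.7},
    {"spam": 0.7, "ordinary": 0.3},
]


def posterior_scores(prior, likelihoods):
    """Unnormalized posterior P(c) * P(x | c) for each class c."""
    return {c: prior[c] * likelihoods[c] for c in likelihoods}


def credal_classify(priors, likelihoods):
    """Return the set of classes that no other class dominates.

    Class a dominates class b if a has a strictly higher posterior
    than b under every prior in `priors`.
    """
    scores = [posterior_scores(p, likelihoods) for p in priors]

    def dominates(a, b):
        return all(s[a] > s[b] for s in scores)

    classes = list(likelihoods)
    return {
        b for b in classes
        if not any(dominates(a, b) for a in classes if a != b)
    }


# Clear-cut instance: one class wins under every prior.
print(credal_classify(PRIORS, {"spam": 0.09, "ordinary": 0.01}))
# Doubtful instance: the winner depends on the prior, so both classes are kept.
print(credal_classify(PRIORS, {"spam": 0.05, "ordinary": 0.04}))
```

When the answer does not depend on the choice of prior, the output is a single class, as with a traditional classifier; when it does, the sketch returns several classes instead of picking one arbitrarily.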

Extensive empirical investigation shows that NCC2 returns imprecise judgments on instances whose classification is genuinely doubtful: NBC achieves a much higher classification accuracy on the instances that NCC2 classifies precisely than on those that NCC2 classifies imprecisely.
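This kind of comparison can be computed as in the sketch below. The record layout, the function name, and the five records are made up for illustration; they are not results from the paper or output of JNCC2.

```python
def nbc_accuracy_by_ncc_outcome(records):
    """Split NBC accuracy by whether the credal classifier was precise.

    records: iterable of (true_class, nbc_prediction, ncc_prediction_set).
    Returns accuracies for the "precise" and "imprecise" groups
    (None for a group with no instances).
    """
    hits = {"precise": 0, "imprecise": 0}
    totals = {"precise": 0, "imprecise": 0}
    for true_class, nbc_prediction, ncc_set in records:
        group = "precise" if len(ncc_set) == 1 else "imprecise"
        totals[group] += 1
        hits[group] += int(nbc_prediction == true_class)
    return {
        group: (hits[group] / totals[group] if totals[group] else None)
        for group in totals
    }


# Hypothetical records, for illustration only.
RECORDS = [
    ("spam", "spam", {"spam"}),
    ("ordinary", "ordinary", {"ordinary"}),
    ("spam", "spam", {"spam"}),
    ("spam", "ordinary", {"spam", "ordinary"}),
    ("ordinary", "ordinary", {"spam", "ordinary"}),
]
print(nbc_accuracy_by_ncc_outcome(RECORDS))
```

A large gap between the two accuracies indicates that the imprecise outputs are concentrated on the instances that are hard to classify.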


Requirements 

As JNCC2 is developed in Java, it runs on any operating system for which a Java Runtime Environment is available. To run JNCC2, the Java Runtime Environment, release 5.0 or above, must be installed; it can be downloaded from the Sun Download Center.

JNCC2 runs from the command line and requires little memory.


Download

 
JNCC2 is open source; it is released under the terms of the GNU General Public License (GPL).

The latest release of JNCC2 is 1.11 (October 2008). The zip file available for download contains:
  • the binary file (jar file);
  • sources with javadoc documentation;
  • user manual & tutorial;
  • toy examples, explained in the accompanying README files.

 >> download jncc2, version 1.11


CHANGELOG
Version 1.11 provides the following improvements over version 1.1:
  • the software can be run without defining the CLASSPATH variable, which makes JNCC2 easier to use. See the user manual for how to run JNCC2 without defining CLASSPATH.

OLD RELEASES

All users are encouraged to use the latest release. However, previous releases are still available from this archive.

Bibliography

Naive Credal Classifier 2:

G. Corani, M. Zaffalon, “Learning Reliable Classifiers From Small or Incomplete Data Sets: The Naive Credal Classifier 2”, Journal of Machine Learning Research, 9, 581-621, 2008. >> download


Earlier works: Naive Credal Classifier

NCC2 extends the earlier NCC by incorporating a much more flexible and powerful methodology for dealing with missing data. Earlier works on the Naive Credal Classifier (NCC), authored by Marco Zaffalon, include:


Data sets

JNCC2 loads data from ARFF files. ARFF (Attribute-Relation File Format) is a textual format designed for classification problems. It was originally developed for WEKA, an open source software package that implements a wide collection of data mining algorithms.

Since WEKA has become a standard tool for data analysis, large repositories of ARFF data sets have been set up. The WEKA page of data sets is a good starting point for browsing ARFF repositories.

Note that JNCC2 works only on classification data sets, not on regression data sets.
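To give an idea of what an ARFF file looks like, and of how a classification data set differs from a regression one, here is a minimal sketch. The sample file is hypothetical. The helper functions are not part of JNCC2 or WEKA, handle only simple unquoted attribute names, and assume that the class is the last declared attribute, which is a common convention but should be checked against the user manual.

```python
# Hypothetical toy ARFF file: '%' starts a comment, '?' marks a missing value.
SAMPLE_ARFF = """\
% Toy e-mail data set
@relation mail
@attribute contains_free {yes,no}
@attribute length numeric
@attribute class {spam,ordinary}
@data
yes,120,spam
no,?,ordinary
"""


def parse_arff_header(text):
    """Return the (attribute_name, type_spec) pairs declared before @data."""
    attributes = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        keyword = line.lower()
        if keyword.startswith("@data"):
            break  # the header ends where the data section begins
        if keyword.startswith("@attribute"):
            _, name, type_spec = line.split(None, 2)
            attributes.append((name, type_spec.strip()))
    return attributes


def is_classification_data(attributes):
    """True if the last attribute (assumed to be the class) is nominal."""
    _, type_spec = attributes[-1]
    return type_spec.startswith("{")


attributes = parse_arff_header(SAMPLE_ARFF)
print(attributes)
print(is_classification_data(attributes))
```

A data set whose target attribute is declared `numeric` rather than as a list of values in braces is a regression data set and would fail this check.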