java.lang.Object
  extended by jncc20.jncc
public class jncc
Main class of the project. It loads data from the file specified by the user (using ArffParser objects), then trains and validates NBC (naive Bayes classifier) and NCC (naive credal classifier). Class jncc implements three kinds of experiments: 1) 10 runs of stratified 10-fold cross-validation; 2) validation via a testing file (a single training/testing experiment); 3) a testing file with unknown classes. In the first two cases the true classes are known, so accuracy statistics are reported to file (via ResultsReporter objects). In the last case the true classes are unknown, so only the NCC predictions are reported to file. Numerical features are discretized via MDL-entropy-based supervised discretization (using MdlDiscretizer objects). Note that the discretization intervals are always computed on the training set and then applied unchanged to the testing set.
Nested Class Summary | |
---|---|
private static class |
jncc.ResultsReporter
Helper class for jncc that performs the following tasks: reads the temporary file where the NBC and NCC predictions are stored; computes the performance indicators; produces the output file, which contains both the discretization log (i.e., whether some numerical features have been discretized into a single bin) and the classifiers' results. |
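The indicators computed by ResultsReporter are not listed on this page. The following is a minimal sketch, assuming the indicators commonly reported for credal classifiers (NBC accuracy; NCC determinacy, single accuracy and set accuracy) and assuming each NCC prediction is stored as an array of non-dominated classes, as the int[][] CredalPredictions parameter of predictionsToFileNbcNcc suggests. CredalMetrics and its methods are hypothetical and not part of jncc.

```java
import java.util.Arrays;

/** Hypothetical helper: indicators commonly reported for NBC and NCC. */
public final class CredalMetrics {

    /** Fraction of instances on which NBC predicts the true class. */
    public static double nbcAccuracy(int[] trueClass, int[] nbcPred) {
        int correct = 0;
        for (int i = 0; i < trueClass.length; i++) {
            if (nbcPred[i] == trueClass[i]) correct++;
        }
        return (double) correct / trueClass.length;
    }

    /** Fraction of instances on which NCC returns a single class. */
    public static double determinacy(int[][] nccPred) {
        int determinate = 0;
        for (int[] set : nccPred) {
            if (set.length == 1) determinate++;
        }
        return (double) determinate / nccPred.length;
    }

    /** Accuracy of NCC restricted to the instances where it returns a single class. */
    public static double singleAccuracy(int[] trueClass, int[][] nccPred) {
        int determinate = 0, correct = 0;
        for (int i = 0; i < trueClass.length; i++) {
            if (nccPred[i].length == 1) {
                determinate++;
                if (nccPred[i][0] == trueClass[i]) correct++;
            }
        }
        return determinate == 0 ? Double.NaN : (double) correct / determinate;
    }

    /** Among indeterminate instances, fraction whose returned set contains the true class. */
    public static double setAccuracy(int[] trueClass, int[][] nccPred) {
        int indeterminate = 0, hits = 0;
        for (int i = 0; i < trueClass.length; i++) {
            if (nccPred[i].length > 1) {
                indeterminate++;
                final int c = trueClass[i];
                if (Arrays.stream(nccPred[i]).anyMatch(k -> k == c)) hits++;
            }
        }
        return indeterminate == 0 ? Double.NaN : (double) hits / indeterminate;
    }
}
```

Single accuracy and set accuracy return NaN when there are no determinate (respectively indeterminate) instances, which avoids a division by zero.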
Field Summary | |
---|---|
private ArffParser |
aParser
Object for parsing ARFF files |
private java.lang.String |
arffTestingFile
Absolute Path of the testing Arff file |
private java.lang.String |
arffTestingFileAddress
Name of the testing Arff file |
private java.util.ArrayList<java.lang.String[]> |
categoryNames
Jagged matrix of category names: each row corresponds to a feature, and rows have different lengths because different features can have different numbers of categories. |
private java.util.ArrayList<java.lang.String> |
classesNames
Names of the output classes. |
private int[] |
cvFoldsIdx
Cross-validation indexes: the fold in which each row of RawDataset falls. |
private double[][] |
discretizationIntervals
Matrix with rows of different length; stores the bin ranges for numerical features |
private int[] |
discretLog
How many times each feature has been discretized in a single bin, over the different training/testing experiments. |
private java.util.ArrayList<java.lang.String> |
featNames
Names of input features |
private int[] |
foldsSize
How many instances are in each fold |
private java.util.ArrayList<java.lang.String> |
nonMarFeatureNamesTesting
Names of NonMar features in testing |
private java.util.ArrayList<java.lang.String> |
nonMarFeatureNamesTraining
Names of NonMar features in training |
private java.util.ArrayList<java.lang.Integer> |
nonMarInCurrentTestingDataset
Positions of the NonMar features in the current testing set (positions may change during CV, as different variables can be discretized into a single bin). |
private java.util.ArrayList<java.lang.Integer> |
nonMarInCurrentTrainingDataset
Positions of the NonMar features in the current training set (positions may change during CV, as different variables can be discretized into a single bin). |
private java.util.ArrayList<java.lang.Integer> |
notUsedFeatures
Variables not used in the current experiment because they were discretized into a single bin; indexes refer to RawDataset. |
private java.util.ArrayList<java.lang.Integer> |
numClassesNonMarTesting
Number of classes (categories) of the NonMar variables in the testing set. |
private java.util.ArrayList<java.lang.Integer> |
numClassForEachUsedFeature
Number of classes for each used feature |
private int |
numCrossVRuns
Number of cross-validation runs |
private int |
numCvFolds
Number of folds used by cross-validation |
private java.util.ArrayList<java.lang.Boolean> |
numFlags
Flags indicating whether each feature is numerical (true) or not (false). |
private java.lang.String |
predsFile
Absolute Path of the predictions file for CV |
private java.util.ArrayList<double[]> |
rawDataset
Copy of the data read from the ARFF file, with -9999 as the marker for missing data and category names replaced by the corresponding indexes. |
private java.util.ArrayList<java.lang.String[]> |
rawTestingSet
Raw testing set exactly as read from file. |
private java.lang.String |
resFile
File reporting the mean and standard deviation of the performance indicators; this is the final output file. |
private java.util.ArrayList<java.lang.Integer>[] |
rowsClassIdx
Indexes of the rows, in RawDataset, which have the same output class. |
private java.util.ArrayList<int[]> |
testingSet
Testing set, accessed by the classifier: numerical variables are discretized, while category names and classes are substituted by indexes; missing data denoted as -9999. |
private java.util.ArrayList<int[]> |
trainingSet
Training set, accessed by the classifier: numerical variables are discretized, while category names and classes are substituted by indexes; missing data denoted as -9999. |
private java.util.ArrayList<java.lang.Integer> |
usedFeatures
Variables used in the current experiment, hence excluding those discretized in a single bin. |
private java.util.ArrayList<java.lang.String> |
usedFeaturesNames
Names of the variables used in the current experiment, hence excluding those discretized in a single bin. |
private java.lang.String |
validationMethod
Set either to "CV" or to the name of the testing Arff file |
private java.lang.String |
workPath
Path where the files for the given experiment (Arff files, NonMar.txt) are found, and where output files will be saved. |
Constructor Summary | |
---|---|
jncc(java.lang.String UserSuppliedWorkingPath,
java.lang.String UserSuppliedArffName,
java.lang.String UserSuppliedValidationName)
Initializes the necessary data members, scans the main Arff file and then instantiates the data members FeatureNames, NumFlags, CategoryNames and RawDataset. |
Method Summary | |
---|---|
private void |
discretizeNumFeaturesOnTrainingData(java.util.ArrayList<double[]> TrainingData)
Discretizes all the numerical features on the Training Set, and instantiates DiscretizationIntervals, UsedFeatures, UsedFeaturesNames, NumClassForEachUsedFeature; updates DiscretizationLog. |
private void |
drawCVindexes()
Draws stratified folds for cross-validation, instantiating CvFoldsIdx. |
private void |
findNonMarInCurrentDataset()
Prepares the NonMarInCurrentDataset data member. |
private int |
getDiscretizationIdx(java.lang.Double currentValue,
int FeatureIdx)
Given a numerical value of a certain discretized feature, returns the index of the bin in which the value falls |
static void |
main(java.lang.String[] args)
Arguments of main: (1) the working path; (2) the name of the main ARFF file; (3) "cv" or the name of the testing ARFF file; (4) [OPTIONAL] "unknownClasses", if the actual classes of the testing set are unknown. |
private void |
predictionsToFileNbcNcc(int NumFold,
int[] NBCPredictions,
int[][] CredalPredictions)
Dumps to file the predictions issued by both NBC and NCC on testing set(s). |
private void |
prepareDataSetFromRawData(java.util.ArrayList<double[]> SourceData,
java.util.ArrayList<int[]> DestinationData)
Takes a raw set of data (undiscretized features) and puts it into a dataset to be accessed by the classifiers; categorical variables are copied unchanged, while numerical variables are converted to categorical ones according to DiscretizationIntervals; numerical variables discretized into a single bin (and hence listed in NonUsedFeatures) are discarded. |
private void |
validateViaCV(java.lang.String[] args)
Validates NBC and NCC via 10 runs of 10-fold cross-validation. |
private void |
validateViaTestingFile(java.lang.String TestingFile)
Validates NBC and NCC via testing file. |
private void |
validateViaTestingFileUnknownClasses()
Learns NCC; classifies the instances of the testing file via NCC, and writes the classifications to file. |
private void |
ZZvalidateViaTestingFileMultipleNCCs(java.lang.String TestingFile)
RESEARCH FEATURE NOT TO BE RELEASED |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
private ArffParser aParser
private java.lang.String arffTestingFile
private java.lang.String arffTestingFileAddress
private java.util.ArrayList<java.lang.String[]> categoryNames
private java.util.ArrayList<java.lang.String> classesNames
private int[] cvFoldsIdx
private double[][] discretizationIntervals
private int[] discretLog
private java.util.ArrayList<java.lang.String> featNames
private int[] foldsSize
private java.util.ArrayList<java.lang.String> nonMarFeatureNamesTesting
private java.util.ArrayList<java.lang.String> nonMarFeatureNamesTraining
private java.util.ArrayList<java.lang.Integer> nonMarInCurrentTestingDataset
private java.util.ArrayList<java.lang.Integer> nonMarInCurrentTrainingDataset
private java.util.ArrayList<java.lang.Integer> notUsedFeatures
private java.util.ArrayList<java.lang.Integer> numClassesNonMarTesting
private java.util.ArrayList<java.lang.Integer> numClassForEachUsedFeature
private int numCrossVRuns
private int numCvFolds
private java.util.ArrayList<java.lang.Boolean> numFlags
private java.lang.String predsFile
private java.util.ArrayList<double[]> rawDataset
private java.util.ArrayList<java.lang.String[]> rawTestingSet
private java.lang.String resFile
private java.util.ArrayList<java.lang.Integer>[] rowsClassIdx
private java.util.ArrayList<int[]> testingSet
private java.util.ArrayList<int[]> trainingSet
private java.util.ArrayList<java.lang.Integer> usedFeatures
private java.util.ArrayList<java.lang.String> usedFeaturesNames
private java.lang.String validationMethod
private java.lang.String workPath
Constructor Detail |
---|
jncc(java.lang.String UserSuppliedWorkingPath, java.lang.String UserSuppliedArffName, java.lang.String UserSuppliedValidationName)
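The constructor builds RawDataset from the main ARFF file, storing category indexes in place of category names and -9999 for missing data. A minimal sketch of encoding one data row, assuming the ARFF missing-value token "?" and assuming that a category's index is its position in the corresponding row of categoryNames; RawRowEncoder is a hypothetical name, and the class column is left out for brevity.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: turn one ARFF data row into a RawDataset-style double[] row. */
public final class RawRowEncoder {

    /** Missing-data marker used throughout jncc. */
    public static final double MISSING = -9999;

    /**
     * @param tokens        the comma-separated feature values of one ARFF data line
     * @param numFlags      true for numerical features, false for categorical ones
     * @param categoryNames jagged list: row f holds the category names of feature f
     *                      (ignored for numerical features)
     */
    public static double[] encode(String[] tokens, List<Boolean> numFlags,
                                  List<String[]> categoryNames) {
        double[] row = new double[tokens.length];
        for (int f = 0; f < tokens.length; f++) {
            String v = tokens[f].trim();
            if (v.equals("?")) {                 // ARFF missing value
                row[f] = MISSING;
            } else if (numFlags.get(f)) {        // numerical: keep the raw value
                row[f] = Double.parseDouble(v);
            } else {                             // categorical: store the category index
                int idx = Arrays.asList(categoryNames.get(f)).indexOf(v);
                if (idx < 0) throw new IllegalArgumentException("Unknown category: " + v);
                row[f] = idx;
            }
        }
        return row;
    }
}
```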
Method Detail |
---|
private void discretizeNumFeaturesOnTrainingData(java.util.ArrayList<double[]> TrainingData)
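A sketch of the bookkeeping that could follow the MDL discretization, assuming the cut points of each numerical feature are already available (the MDL search itself is not shown). A numerical feature with no cut point falls into a single bin, so it is excluded and counted in discretLog; otherwise k cut points give k + 1 bins. UsedFeatureBookkeeping and its parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the bookkeeping done after MDL discretization. */
public final class UsedFeatureBookkeeping {

    public final List<Integer> usedFeatures = new ArrayList<>();
    public final List<Integer> notUsedFeatures = new ArrayList<>();
    public final List<Integer> numClassForEachUsedFeature = new ArrayList<>();

    /**
     * @param cutPoints     cutPoints[f] = cut points found on the training set for
     *                      numerical feature f (null for categorical features)
     * @param numCategories number of categories of each categorical feature
     * @param discretLog    per-feature counter of single-bin discretizations, updated in place
     */
    public UsedFeatureBookkeeping(double[][] cutPoints, int[] numCategories, int[] discretLog) {
        for (int f = 0; f < cutPoints.length; f++) {
            if (cutPoints[f] == null) {               // categorical feature: always used
                usedFeatures.add(f);
                numClassForEachUsedFeature.add(numCategories[f]);
            } else if (cutPoints[f].length == 0) {    // no cut point: a single bin
                notUsedFeatures.add(f);
                discretLog[f]++;
            } else {                                  // k cut points give k + 1 bins
                usedFeatures.add(f);
                numClassForEachUsedFeature.add(cutPoints[f].length + 1);
            }
        }
    }
}
```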
private void drawCVindexes()
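One common way to draw stratified folds, shown here as an assumption rather than the actual drawCVindexes code: group the rows by class (as rowsClassIdx does), shuffle each group, and deal its rows to the folds in round-robin order, so that every fold receives roughly the same class proportions. StratifiedFolds is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of stratified fold assignment (cf. cvFoldsIdx, foldsSize). */
public final class StratifiedFolds {

    /**
     * @param classOfRow class index of each row of the raw dataset
     * @param numClasses number of classes
     * @param numFolds   number of folds (10 in jncc)
     * @return fold index of each row
     */
    public static int[] assign(int[] classOfRow, int numClasses, int numFolds, Random rnd) {
        // Group row indexes by class, as rowsClassIdx does.
        List<List<Integer>> rowsByClass = new ArrayList<>();
        for (int c = 0; c < numClasses; c++) rowsByClass.add(new ArrayList<>());
        for (int r = 0; r < classOfRow.length; r++) rowsByClass.get(classOfRow[r]).add(r);

        int[] foldOfRow = new int[classOfRow.length];
        int next = 0; // keep dealing across classes so fold sizes stay balanced
        for (List<Integer> rows : rowsByClass) {
            Collections.shuffle(rows, rnd);
            for (int r : rows) {
                foldOfRow[r] = next;
                next = (next + 1) % numFolds;
            }
        }
        return foldOfRow;
    }
}
```

Passing a seeded Random makes the folds reproducible, and a different seed per run yields the independent partitions needed for repeated cross-validation.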
private void findNonMarInCurrentDataset()
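A minimal sketch, assuming NonMar features are matched by name against the names of the features used in the current experiment; a feature discretized into a single bin is no longer in that list and is skipped, which is why the positions can change from one CV fold to the next. NonMarLocator is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: locate NonMar features among the features used in this experiment. */
public final class NonMarLocator {

    /**
     * @param nonMarNames       names listed as NonMar (e.g. read from NonMar.txt)
     * @param usedFeaturesNames names of the features kept in the current dataset
     * @return positions, within the current dataset, of the NonMar features still in use
     */
    public static List<Integer> positions(List<String> nonMarNames, List<String> usedFeaturesNames) {
        List<Integer> result = new ArrayList<>();
        for (String name : nonMarNames) {
            int pos = usedFeaturesNames.indexOf(name);
            if (pos >= 0) result.add(pos); // skip features discretized into a single bin
        }
        return result;
    }
}
```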
private int getDiscretizationIdx(java.lang.Double currentValue, int FeatureIdx)
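A sketch of a bin lookup, assuming each row of discretizationIntervals holds the sorted cut points of one feature and that a value equal to a cut point belongs to the lower bin (both are assumptions). BinLookup is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch of a bin lookup given sorted cut points for one feature. */
public final class BinLookup {

    /**
     * Returns the bin index of value: bin 0 is (-inf, cuts[0]], bin i is (cuts[i-1], cuts[i]],
     * and the last bin is (cuts[k-1], +inf). The boundary convention is an assumption.
     */
    public static int binIndex(double value, double[] cuts) {
        int pos = Arrays.binarySearch(cuts, value);
        // Found: value equals cuts[pos], which closes bin pos.
        // Not found: binarySearch returns -(insertionPoint) - 1, and the
        // insertion point is exactly the number of cut points below value.
        return pos >= 0 ? pos : -pos - 1;
    }
}
```

Because the cut points come from the training set only, a testing value outside the training range simply lands in the first or last bin.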
public static void main(java.lang.String[] args)
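A sketch of how the arguments described in the method summary could select one of the three experiments. ExperimentSelector and its Mode enum are hypothetical, and accepting "cv" case-insensitively is an assumption (validationMethod is documented as "CV").

```java
/** Hypothetical sketch of how main's arguments select one of the three experiments. */
public final class ExperimentSelector {

    public enum Mode { CROSS_VALIDATION, TESTING_FILE, TESTING_FILE_UNKNOWN_CLASSES }

    public static Mode select(String[] args) {
        if (args.length < 3) {
            throw new IllegalArgumentException(
                "usage: java jncc20.jncc <workingPath> <mainArffFile> <cv | testingArffFile> [unknownClasses]");
        }
        if (args[2].equalsIgnoreCase("cv")) {
            return Mode.CROSS_VALIDATION;             // 10 runs of stratified 10-fold CV
        }
        if (args.length >= 4 && args[3].equals("unknownClasses")) {
            return Mode.TESTING_FILE_UNKNOWN_CLASSES; // only NCC predictions are written
        }
        return Mode.TESTING_FILE;                     // single training/testing experiment
    }
}
```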
private void predictionsToFileNbcNcc(int NumFold, int[] NBCPredictions, int[][] CredalPredictions)
private void prepareDataSetFromRawData(java.util.ArrayList<double[]> SourceData, java.util.ArrayList<int[]> DestinationData)
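A minimal sketch of the conversion, assuming raw rows already hold category indexes and -9999 for missing values (as documented for rawDataset), that cuts[f] holds the sorted training-set cut points of feature f, and that a value equal to a cut point goes to the lower bin. Unlike prepareDataSetFromRawData, which fills a DestinationData argument, this hypothetical DatasetPreparer returns a new list; the class column is omitted.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of converting raw rows into classifier-ready int rows. */
public final class DatasetPreparer {

    public static final int MISSING = -9999;

    /**
     * @param source       raw rows (numerical values, category indexes, -9999 for missing)
     * @param usedFeatures indexes (into the raw rows) of the features kept in this experiment
     * @param numFlags     numFlags[f] is true when raw feature f is numerical
     * @param cuts         cuts[f] = sorted cut points learned on the training set for feature f
     */
    public static List<int[]> prepare(List<double[]> source, List<Integer> usedFeatures,
                                      boolean[] numFlags, double[][] cuts) {
        List<int[]> destination = new ArrayList<>();
        for (double[] raw : source) {
            int[] row = new int[usedFeatures.size()];
            for (int j = 0; j < row.length; j++) {
                int f = usedFeatures.get(j);        // single-bin features are simply absent
                double v = raw[f];
                if (v == MISSING) {
                    row[j] = MISSING;               // keep the missing-data marker
                } else if (numFlags[f]) {
                    int bin = 0;                    // count cut points strictly below v
                    while (bin < cuts[f].length && cuts[f][bin] < v) bin++;
                    row[j] = bin;
                } else {
                    row[j] = (int) v;               // categorical: already an index
                }
            }
            destination.add(row);
        }
        return destination;
    }
}
```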
private void validateViaCV(java.lang.String[] args)
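The result file reports the mean and standard deviation of each indicator. A minimal sketch of that summary, assuming the statistics are taken over all numCrossVRuns × numCvFolds fold-level values and that the sample standard deviation is used; the actual aggregation may differ (for example, it may be computed over per-run averages). CvSummary is hypothetical.

```java
/** Hypothetical sketch: summarizing one indicator over numCrossVRuns x numCvFolds experiments. */
public final class CvSummary {

    /** Returns {mean, sample standard deviation} of the indicator over all folds of all runs. */
    public static double[] meanAndStd(double[][] valueByRunAndFold) {
        int n = 0;
        double sum = 0;
        for (double[] run : valueByRunAndFold) {
            for (double v : run) { sum += v; n++; }
        }
        double mean = sum / n;
        double sq = 0;
        for (double[] run : valueByRunAndFold) {
            for (double v : run) sq += (v - mean) * (v - mean);
        }
        double std = n > 1 ? Math.sqrt(sq / (n - 1)) : 0.0;
        return new double[] {mean, std};
    }
}
```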
private void validateViaTestingFile(java.lang.String TestingFile)
private void validateViaTestingFileUnknownClasses()
private void ZZvalidateViaTestingFileMultipleNCCs(java.lang.String TestingFile)