java.lang.Object
  jncc20.Jncc
public class Jncc
Main class of the project: it loads the data set from the file specified by the user, then trains and validates NBC and NCC according to the validation method specified by the user. Jncc implements three validation methods: 1) 10 runs of stratified 10-fold cross-validation; 2) validation via testing file (a single training/testing experiment); 3) testing file with unknown classes. In the first two cases, accuracy statistics are reported to file (via ResultsReporter objects), as the true classes are known. In the last case, only the NCC predictions are reported to file, as the true classes are unknown. Numerical features are discretized via MDL-entropy-based supervised discretization (using MdlDiscretizer objects). Note that the discretization intervals are computed on the training set and then applied unchanged to the testing set.
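The rule that intervals are learned on the training set and reused unchanged on the testing set can be illustrated with a small bin lookup. This is only a sketch under stated assumptions: the class `BinLookupSketch` and the method `binIndex` are hypothetical names, the intervals are assumed to be stored as sorted interior cut points, and a value equal to a cut point is assumed to fall in the upper bin. The actual layout used by Jncc and MdlDiscretizer may differ.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of JNCC2: maps a numerical value to a bin index. */
class BinLookupSketch {
    /** Marker for missing data, as documented for rawDataset. */
    static final int MISSING = -9999;

    /**
     * Returns the index of the bin containing value.
     * Assumption: cutPoints holds sorted interior cut points, so k cut
     * points define k + 1 bins; a value equal to a cut point goes to the
     * upper bin. Missing values are passed through unchanged.
     */
    static int binIndex(double value, double[] cutPoints) {
        if (value == MISSING) {
            return MISSING;
        }
        int pos = Arrays.binarySearch(cutPoints, value);
        // When the value is not found, binarySearch returns -(insertionPoint) - 1.
        return pos >= 0 ? pos + 1 : -pos - 1;
    }

    public static void main(String[] args) {
        // Cut points assumed to have been computed on the training set only.
        double[] trainingCutPoints = {2.5, 7.0};
        // The same cut points are applied, unchanged, to testing values.
        double[] testingValues = {1.0, 2.5, 9.0, MISSING};
        for (double v : testingValues) {
            System.out.println(v + " -> bin " + binIndex(v, trainingCutPoints));
        }
    }
}
```

With the cut points {2.5, 7.0}, the values 1.0, 2.5 and 9.0 fall in bins 0, 1 and 2 respectively, and the missing-data marker is left untouched.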
Nested Class Summary | |
---|---|
private class |
Jncc.ResultsReporter
Helper class for Jncc, which accomplishes the following tasks: reads the temporary file where NBC and NCC predictions are stored; computes performance indexes; produces the output files, i.e., ResultsTable.csv (performance indicators), ConfMatrices.txt (confusion matrices) and, if a testing file is supplied, Prediction- |
Field Summary | |
---|---|
private java.lang.String |
arffFileAddress
Absolute path of the main Arff file |
private java.lang.String |
arffTestingFile
Absolute path of the testing Arff file |
private java.lang.String |
arffTestingFileName
Name of the testing Arff file |
private java.util.ArrayList<java.lang.String[]> |
categoryNames
Matrix of Strings with rows of different length; stores the names of the categories (each row corresponds to a different feature); meaningful for categorical features only. |
private java.util.ArrayList<java.lang.String> |
classNames
Names of the output classes. |
private int |
currentCvFold
|
private int[] |
cvFoldsIdx
Indexes for cross validation: in which fold each row of rawDataset falls |
private java.lang.String |
datasetName
Dataset Name as read from the field "@relation" in the Arff file |
private double[][] |
discretizationIntervals
Matrix with rows of different length; stores the bin ranges for numerical features |
private int[] |
discretLog
How many times each feature has been discretized in a single bin, over the different training/testing experiments. |
private java.util.ArrayList<java.lang.String> |
featNames
Names of input features |
private int[] |
foldsSize
How many instances are in each fold |
(package private) com.sun.management.OperatingSystemMXBean |
mxbean
needed to track execution time |
private NaiveBayes |
nbc
Naive Bayes classifier |
private NaiveCredalClassifier2 |
ncc2
NCC2 classifier |
private java.util.ArrayList<java.lang.String> |
nonMarFeatsTesting
Names of NonMar features in testing |
private java.util.ArrayList<java.lang.String> |
nonMarFeatsTraining
Names of NonMar features in training |
private java.util.ArrayList<java.lang.Integer> |
nonMarTesting
Index of NonMar features positions in the current testing set (position might change during CV, as different variables can get discretized into a single bin) |
private java.util.ArrayList<java.lang.Integer> |
nonMarTraining
Index of NonMar features positions in the current training set (position might change during CV, as different variables can get discretized into a single bin) |
private java.util.ArrayList<java.lang.Integer> |
notUsedFeatures
Variables not used in the current experiment, because discretized in a single bin; indexes refer to rawDataset |
private java.util.ArrayList<java.lang.Integer> |
numClassesNonMarTesting
Number of classes of variables NonMar in the testing set. |
private java.util.ArrayList<java.lang.Integer> |
numClassForEachUsedFeature
Number of classes for each used feature |
private int |
numCvFolds
Number of folds used by cross-validation |
private int |
numCvRuns
Number of Cross validation Runs |
private java.util.ArrayList<java.lang.Boolean> |
numFlags
Array of flags indicating whether each feature is numerical (1) or not (0) |
private java.lang.String |
predictionsFile
Absolute path of the temporary predictions file |
private java.lang.String |
probabilitiesFile
File that reports the estimated probabilities by precise classifiers and whether the imprecise classifier is precise or not; used to compute the curve of precision vs. |
private java.util.ArrayList<double[]> |
rawDataset
Copy of the data read from the Arff file, with -9999 as the marker for missing data and category names substituted by the corresponding indexes. |
private java.util.ArrayList<java.lang.String[]> |
rawTestingSet
Raw testing set exactly as read from file. |
private java.lang.String |
resultsFile
File that reports avg and std dev of performance indicators; this is the ultimate output file |
private java.util.ArrayList<java.lang.Integer>[] |
rowsClassIdx
Indexes of the rows, in rawDataset, which have the same output class. |
private double |
startTime
time at which program is started |
private java.util.ArrayList<int[]> |
testingSet
Testing set, accessed by the classifier: numerical variables are discretized, while category names and classes are substituted by indexes; missing data denoted as -9999. |
private java.util.ArrayList<int[]> |
trainingSet
Training set, accessed by the classifier: numerical variables are discretized, while category names and classes are substituted by indexes; missing data denoted as -9999. |
private boolean |
unknownClasses
Whether classes of the testing set are known or not |
private java.util.ArrayList<java.lang.Integer> |
usedFeatures
Variables used in the current experiment, hence excluding those discretized in a single bin. |
private java.util.ArrayList<java.lang.String> |
usedFeaturesNames
Names of the variables used in the current experiment, hence excluding those discretized in a single bin. |
private java.lang.String |
validationMethod
Set either to "CV" or to the name of the testing Arff file |
private java.lang.String |
workPath
Path where the files for the given experiment (Arff files, NonMar.txt) are found, and where output files will be saved. |
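The fields cvFoldsIdx, foldsSize and rowsClassIdx suggest a stratified assignment of rows to folds. The following is a minimal sketch of one way such an assignment could work, not the procedure Jncc actually uses: rows are grouped by class, shuffled within each class, and dealt to folds round-robin. The class `StratifiedFoldsSketch`, its method names and the seed parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of stratified fold assignment; not the JNCC2 code. */
class StratifiedFoldsSketch {

    /** Returns, for each row, the fold it falls in (analogous to cvFoldsIdx). */
    static int[] assignFolds(int[] classOfRow, int numClasses, int numFolds, long seed) {
        // Group row indexes by output class (analogous to rowsClassIdx).
        List<List<Integer>> rowsByClass = new ArrayList<>();
        for (int c = 0; c < numClasses; c++) {
            rowsByClass.add(new ArrayList<>());
        }
        for (int row = 0; row < classOfRow.length; row++) {
            rowsByClass.get(classOfRow[row]).add(row);
        }

        Random rng = new Random(seed);
        int[] foldOfRow = new int[classOfRow.length];
        int nextFold = 0; // carried across classes so that fold sizes stay balanced
        for (List<Integer> rows : rowsByClass) {
            Collections.shuffle(rows, rng);
            for (int row : rows) {
                foldOfRow[row] = nextFold;
                nextFold = (nextFold + 1) % numFolds;
            }
        }
        return foldOfRow;
    }

    /** Counts how many rows fall in each fold (analogous to foldsSize). */
    static int[] foldSizes(int[] foldOfRow, int numFolds) {
        int[] sizes = new int[numFolds];
        for (int fold : foldOfRow) {
            sizes[fold]++;
        }
        return sizes;
    }
}
```

Because rows of the same class are dealt consecutively, the number of rows of any class differs by at most one between folds, which is the property that stratification is meant to guarantee.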
Constructor Summary | |
---|---|
Jncc(java.lang.String UserSuppliedWorkingPath,
java.lang.String UserSuppliedArffName,
java.lang.String UserSuppliedValidationName,
int numArgs)
Initializes the necessary data members, scans the main Arff file and then instantiates the data members FeatureNames, NumFlags, CategoryNames and rawDataset. |
Method Summary | |
---|---|
private static void |
checkArgs(java.lang.String[] args)
Sanity-check of the parameters supplied by the user |
private void |
deleteFileIfExisting(java.lang.String file)
|
private void |
discretizeNumFeats(java.util.ArrayList<double[]> trainingData)
Discretizes all the numerical features on the Training Set, and instantiates DiscretizationIntervals, UsedFeatures, UsedFeaturesNames, NumClassForEachUsedFeature; updates DiscretizationLog. |
private void |
drawCVindexes()
Draws stratified folds for cross-validation, instantiating CvFoldsIdx. |
private boolean |
findFeatName(java.lang.String tmpString)
|
private void |
findNonMarInCurrentDataset()
Prepares the NonMarInCurrentDataset data member. |
private int |
getDiscretizationIdx(java.lang.Double currentValue,
int FeatureIdx)
Given a numerical value of a certain discretized feature, returns the index of the bin in which the value falls |
private void |
initResultsFiles(java.lang.String validationFile)
Initializes the files that store the predictions (which are only temporary) and the performance indicators; validationFile is the only available Arff file in case of CV, or the testing file in case of validation via testing file; it is not defined in case of unknown classes. |
static void |
main(java.lang.String[] args)
Arguments of the main: (1) the working path; (2) the name of the main Arff file; (3) "cv" or the name of the testing Arff file; (4) [OPTIONAL] "unknownClasses", in case the actual classes of the testing set are unknown. |
private void |
parseArffFile()
Scans the main Arff file. |
private void |
parseArffTestingFile(boolean UnknownClasses)
Parses the testing file, checking that all declarations are coherent with those already loaded from the training Arff file; if the classes are unknown, it reads only the instances, without looking for the classes. |
private void |
parseNonMar()
Reads the file NonMar.txt, containing the list of nonMar variables; if no file is found, all variables are assumed to be MAR. |
private void |
prepareDataSetFromRawData(java.util.ArrayList<double[]> SourceData,
java.util.ArrayList<int[]> DestinationData)
Takes a raw set of data (undiscretized features) and puts it into a dataset to be accessed by the classifiers; categorical variables are copied unchanged, while numerical variables are converted to categorical according to DiscretizationIntervals; numerical variables discretized into a unique bin (and hence listed in NonUsedFeatures) are discarded. |
private void |
prepareTrainTestSet()
Prepares training and testing sets for validation via testing set, also discretizing the numerical variables. |
private void |
prepareTrainTestSet(int currentFold)
Prepares training and testing sets for cross-validation, also discretizing the numerical variables. |
private static void |
printArgError()
|
private void |
printElapsedTime()
|
private static void |
printHelp()
Writes a help message to the user, specifying the syntax to be used with JNCC2. |
private void |
saveTmpPredictions()
Dumps to file the predictions issued by the classifiers on the testing set(s); they will be later analyzed to compute the indicators, and eventually deleted. |
private void |
trainValidClassifiers()
Trains the classifiers on the training set, validates them on the testing set and saves the predictions to a temporary file |
private void |
validateCV(java.lang.String[] args)
Validates NBC and NCC via 10 runs of 10-fold cross-validation. |
private void |
validateTFile(java.lang.String TestingFile)
Validates NBC and NCC via testing file. |
private void |
validateTFileUnkClasses()
Learns NCC; classifies the instances of the testing file via NCC, and writes the classifications to file. |
private void |
writePerfIndicators()
Once the classifiers have been validated (either via CV or via a single testing file), saves all the relevant information to file |
private void |
writePredictions()
Writes to file the instances, the actual classes, the probability distribution computed by NBC and the non-dominated classes identified by NCC2. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
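The argument list documented for main maps onto the three validation methods of the class. The sketch below shows one way to derive the method from the arguments; it is an illustration only. The class `ValidationModeSketch`, the enum `Mode` and the case-insensitive comparisons are assumptions, and the real checks performed by checkArgs are not documented here.

```java
/** Hypothetical sketch of choosing a validation method from the arguments. */
class ValidationModeSketch {

    enum Mode { CROSS_VALIDATION, TESTING_FILE, TESTING_FILE_UNKNOWN_CLASSES }

    /**
     * Expected arguments, following the description of main:
     * (1) working path; (2) main Arff file; (3) "cv" or testing Arff file;
     * (4) optional "unknownClasses".
     * Assumption: keywords are matched ignoring case.
     */
    static Mode selectMode(String[] args) {
        if (args.length < 3 || args.length > 4) {
            throw new IllegalArgumentException("expected 3 or 4 arguments");
        }
        boolean unknownClasses = args.length == 4;
        if (unknownClasses && !"unknownClasses".equalsIgnoreCase(args[3])) {
            throw new IllegalArgumentException("unrecognized option: " + args[3]);
        }
        boolean crossValidation = "cv".equalsIgnoreCase(args[2]);
        if (crossValidation && unknownClasses) {
            // Unknown classes only make sense when a testing file is supplied.
            throw new IllegalArgumentException("unknownClasses requires a testing file");
        }
        if (crossValidation) {
            return Mode.CROSS_VALIDATION;
        }
        return unknownClasses ? Mode.TESTING_FILE_UNKNOWN_CLASSES : Mode.TESTING_FILE;
    }
}
```

For example, the hypothetical argument list `{"/tmp/exp", "train.arff", "cv"}` selects cross-validation, while replacing "cv" with a testing file name selects validation via testing file.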
Field Detail |
---|
private java.lang.String arffFileAddress
private java.lang.String arffTestingFile
private java.lang.String arffTestingFileName
private java.util.ArrayList<java.lang.String[]> categoryNames
private java.util.ArrayList<java.lang.String> classNames
private int currentCvFold
private int[] cvFoldsIdx
private java.lang.String datasetName
private double[][] discretizationIntervals
private int[] discretLog
private java.util.ArrayList<java.lang.String> featNames
private int[] foldsSize
com.sun.management.OperatingSystemMXBean mxbean
private NaiveBayes nbc
private NaiveCredalClassifier2 ncc2
private java.util.ArrayList<java.lang.String> nonMarFeatsTesting
private java.util.ArrayList<java.lang.String> nonMarFeatsTraining
private java.util.ArrayList<java.lang.Integer> nonMarTesting
private java.util.ArrayList<java.lang.Integer> nonMarTraining
private java.util.ArrayList<java.lang.Integer> notUsedFeatures
private java.util.ArrayList<java.lang.Integer> numClassesNonMarTesting
private java.util.ArrayList<java.lang.Integer> numClassForEachUsedFeature
private int numCvFolds
private int numCvRuns
private java.util.ArrayList<java.lang.Boolean> numFlags
private java.lang.String predictionsFile
private java.lang.String probabilitiesFile
private java.util.ArrayList<double[]> rawDataset
private java.util.ArrayList<java.lang.String[]> rawTestingSet
private java.lang.String resultsFile
private java.util.ArrayList<java.lang.Integer>[] rowsClassIdx
private double startTime
private java.util.ArrayList<int[]> testingSet
private java.util.ArrayList<int[]> trainingSet
private boolean unknownClasses
private java.util.ArrayList<java.lang.Integer> usedFeatures
private java.util.ArrayList<java.lang.String> usedFeaturesNames
private java.lang.String validationMethod
private java.lang.String workPath
Constructor Detail |
---|
Jncc(java.lang.String UserSuppliedWorkingPath, java.lang.String UserSuppliedArffName, java.lang.String UserSuppliedValidationName, int numArgs)
Method Detail |
---|
private static void checkArgs(java.lang.String[] args)
private void deleteFileIfExisting(java.lang.String file)
private void discretizeNumFeats(java.util.ArrayList<double[]> trainingData)
private void drawCVindexes()
private boolean findFeatName(java.lang.String tmpString)
private void findNonMarInCurrentDataset()
private int getDiscretizationIdx(java.lang.Double currentValue, int FeatureIdx)
private void initResultsFiles(java.lang.String validationFile)
public static void main(java.lang.String[] args)
private void parseArffFile()
private void parseArffTestingFile(boolean UnknownClasses)
private void parseNonMar()
After reading NonMar.txt, puts the names of the NonMar variables in TrainingNonMarFeatureNames and TestingNonMarFeatureNames.
private void prepareDataSetFromRawData(java.util.ArrayList<double[]> SourceData, java.util.ArrayList<int[]> DestinationData)
private void prepareTrainTestSet()
private void prepareTrainTestSet(int currentFold)
private static void printArgError()
private void printElapsedTime()
private static void printHelp()
private void saveTmpPredictions()
private void trainValidClassifiers()
private void validateCV(java.lang.String[] args)
private void validateTFile(java.lang.String TestingFile)
private void validateTFileUnkClasses()
private void writePerfIndicators()
private void writePredictions()
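The conversion described for prepareDataSetFromRawData (categorical values copied, numerical values replaced by bin indexes, single-bin features dropped) can be sketched as follows. This is a simplified illustration, not the actual implementation: the class `PrepareDatasetSketch`, the `prepare` signature, the representation of intervals as sorted interior cut points and the treatment of the class as an ordinary categorical column are all assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of turning raw rows into classifier-ready rows. */
class PrepareDatasetSketch {
    /** Marker for missing data, as documented for rawDataset. */
    static final int MISSING = -9999;

    /**
     * Assumptions: cutPoints[f] holds sorted interior cut points for
     * numerical feature f (an empty array means a single bin) and is
     * ignored for categorical features.
     */
    static List<int[]> prepare(List<double[]> raw, boolean[] numerical, double[][] cutPoints) {
        // Keep categorical features and numerical features with more than one bin.
        List<Integer> used = new ArrayList<>();
        for (int f = 0; f < numerical.length; f++) {
            if (!numerical[f] || cutPoints[f].length > 0) {
                used.add(f);
            }
        }
        List<int[]> prepared = new ArrayList<>();
        for (double[] row : raw) {
            int[] converted = new int[used.size()];
            for (int j = 0; j < converted.length; j++) {
                int f = used.get(j);
                double value = row[f];
                if (value == MISSING) {
                    converted[j] = MISSING;          // missing data stays marked
                } else if (!numerical[f]) {
                    converted[j] = (int) value;      // category index copied unchanged
                } else {
                    int pos = Arrays.binarySearch(cutPoints[f], value);
                    converted[j] = pos >= 0 ? pos + 1 : -pos - 1; // bin index
                }
            }
            prepared.add(converted);
        }
        return prepared;
    }
}
```

Dropping single-bin features before the conversion keeps the column positions of the prepared rows consistent across the training and testing sets of the same experiment, since both are built from the same list of used features.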