CrossValidation&lt;TModel, TLearner, TInput, TOutput&gt; Class
Namespace: Accord.MachineLearning.Performance
```csharp
public class CrossValidation<TModel, TLearner, TInput, TOutput> :
    BaseSplitSetValidation<CrossValidationResult<TModel, TInput, TOutput>, TModel, TLearner, TInput, TOutput>
    where TModel : class, ITransform<TInput, TOutput>
    where TLearner : class, ISupervisedLearning<TModel, TInput, TOutput>
```
The CrossValidation&lt;TModel, TLearner, TInput, TOutput&gt; type exposes the following members.
Constructors

Name | Description
---|---
CrossValidation&lt;TModel, TLearner, TInput, TOutput&gt; | Initializes a new instance of the CrossValidation&lt;TModel, TLearner, TInput, TOutput&gt; class.
Properties

Name | Description
---|---
DefaultValue | Gets or sets a value to be used as the Loss in case the model throws an exception during learning. Default is null (exceptions will not be ignored). (Inherited from BaseSplitSetValidation&lt;TResult, TModel, TLearner, TInput, TOutput&gt;.)
Fit | Gets or sets a LearnNewModel&lt;TLearner, TInput, TOutput, TModel&gt; function that can be used to create new machine learning models using the current learning algorithm. (Inherited from BaseSplitSetValidation&lt;TResult, TModel, TLearner, TInput, TOutput&gt;.)
Folds | Gets the array of data set indexes contained in each fold.
Indices | Gets the array of fold indices for each point in the data set.
K | Gets the number of folds in the k-fold cross-validation.
Learner | Gets or sets a CreateLearnerFromSubset&lt;TLearner, TInput, TOutput&gt; function that can be used to create a TModel from a subset of the learning data set. (Inherited from BaseSplitSetValidation&lt;TResult, TModel, TLearner, TInput, TOutput&gt;.)
Loss | Gets or sets a ComputeLoss&lt;TOutput, TInfo&gt; function that can be used to measure how far the actual model predictions were from the expected ground truth. (Inherited from BaseSplitSetValidation&lt;TResult, TModel, TLearner, TInput, TOutput&gt;.)
ParallelOptions | Gets or sets the parallelization options for this algorithm. (Inherited from ParallelLearningBase.)
Token | Gets or sets a cancellation token that can be used to cancel the algorithm while it is running. (Inherited from ParallelLearningBase.)
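As a quick illustration of the members above, the sketch below sets K, Learner, Loss, ParallelOptions, and Token on the same shortcut class used by the SVM example further down. The fold count, the learner choice, and the data placeholders `x` and `y` are illustrative assumptions, not defaults:

```csharp
using System.Threading;
using Accord.MachineLearning.Performance;
using Accord.MachineLearning.VectorMachines;
using Accord.MachineLearning.VectorMachines.Learning;
using Accord.Math.Optimization.Losses;
using Accord.Statistics.Kernels;

// Illustrative configuration only; x (double[][]) and y (int[])
// stand in for a real training set.
var cts = new CancellationTokenSource();

var cv = new CrossValidation<SupportVectorMachine<Linear, double[]>, double[]>()
{
    K = 5, // number of folds (illustrative choice)

    // How a learning algorithm should be created for each fold:
    Learner = (s) => new SequentialMinimalOptimization<Linear, double[]>(),

    // How the error of a trained model is measured on the held-out fold:
    Loss = (expected, actual, p) => new ZeroOneLoss(expected).Loss(actual),
};

// Limit the degree of parallelism and allow cooperative cancellation:
cv.ParallelOptions.MaxDegreeOfParallelism = 2;
cv.Token = cts.Token; // calling cts.Cancel() stops the run

// var result = cv.Learn(x, y); // runs the k folds and aggregates statistics
```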
Methods

Name | Description
---|---
CreateValidationSplits | Creates a list of the sample indices that should serve as the validation set.
Equals | Determines whether the specified object is equal to the current object. (Inherited from Object.)
Finalize | Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection. (Inherited from Object.)
GetFold | Gets a subset of the training and testing sets.
GetHashCode | Serves as the default hash function. (Inherited from Object.)
GetType | Gets the Type of the current instance. (Inherited from Object.)
Learn | Learns a model that can map the given inputs to the given outputs. (Overrides BaseSplitSetValidation&lt;TResult, TModel, TLearner, TInput, TOutput&gt;.Learn(TInput[], TOutput[], Double[]).)
LearnSubset | Learns and evaluates a model in a single subset of the data. (Inherited from BaseSplitSetValidation&lt;TResult, TModel, TLearner, TInput, TOutput&gt;.)
MemberwiseClone | Creates a shallow copy of the current Object. (Inherited from Object.)
ToString | Returns a string that represents the current object. (Inherited from Object.)
Extension Methods

Name | Description
---|---
HasMethod | Checks whether an object implements a method with the given name. (Defined by ExtensionMethods.)
IsEqual | Compares two objects for equality, performing an elementwise comparison if the elements are vectors or matrices. (Defined by Matrix.)
To(Type) | Overloaded. Converts an object into another type, irrespective of whether the conversion can be done at compile time or not. This can be used to convert generic types to numeric types during runtime. (Defined by ExtensionMethods.)
To&lt;T&gt; | Overloaded. Converts an object into another type, irrespective of whether the conversion can be done at compile time or not. This can be used to convert generic types to numeric types during runtime. (Defined by ExtensionMethods.)
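For instance, the To&lt;T&gt; and To(Type) extensions can be used when the target type is only known at runtime. A minimal sketch, assuming these extensions are brought in by the Accord namespace:

```csharp
using Accord; // assumed home of the To<T>/To(Type) extension methods

object boxed = 42;                         // static type lost through boxing
double asDouble = boxed.To<double>();      // runtime conversion to double (42.0)
object viaType = boxed.To(typeof(double)); // non-generic overload, same result
```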
Remarks

Cross-validation is a technique for estimating the performance of a predictive model. It can be used to measure how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice.

One round of cross-validation involves partitioning a sample of data into complementary subsets, performing the analysis on one subset (called the training set), and validating the analysis on the other subset (called the validation set or testing set). To reduce variability, multiple rounds of cross-validation are performed using different partitions, and the validation results are averaged over the rounds.
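To make the two paragraphs above concrete, here is a minimal, framework-independent sketch of the k-fold procedure; the fold count, the sample count, and the placeholder Score function are illustrative assumptions:

```csharp
using System;
using System.Linq;

// 'Score' is a hypothetical stand-in for training a model on the training
// indices and measuring its error on the validation indices.
static double Score(int[] training, int[] validation)
    => validation.Length; // placeholder; a real implementation trains a model here

int k = 3;  // number of folds (illustrative)
int n = 12; // number of samples (illustrative)

double totalError = 0;
for (int fold = 0; fold < k; fold++)
{
    // Every index i with (i % k == fold) goes to the validation set;
    // the remaining indices form the training set for this round.
    int[] validation = Enumerable.Range(0, n).Where(i => i % k == fold).ToArray();
    int[] training   = Enumerable.Range(0, n).Where(i => i % k != fold).ToArray();

    totalError += Score(training, validation);
}

// Averaging over the k rounds reduces the variability of the estimate:
double averageError = totalError / k;
Console.WriteLine(averageError);
```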
Examples

The first example assesses the performance of support vector machines on the XOR problem:

```csharp
// Ensure results are reproducible
Accord.Math.Random.Generator.Seed = 0;

// This example shows how to use cross-validation to assess
// the performance of Support Vector Machines.

// Consider the example binary data below. We will try to learn
// a XOR problem and see how well SVMs perform on this data.
double[][] data =
{
    new double[] { -1, -1 }, new double[] {  1, -1 },
    new double[] { -1,  1 }, new double[] {  1,  1 },
    new double[] { -1, -1 }, new double[] {  1, -1 },
    new double[] { -1,  1 }, new double[] {  1,  1 },
    new double[] { -1, -1 }, new double[] {  1, -1 },
    new double[] { -1,  1 }, new double[] {  1,  1 },
    new double[] { -1, -1 }, new double[] {  1, -1 },
    new double[] { -1,  1 }, new double[] {  1,  1 },
};

int[] xor = // result of xor for the sample input data
{
    -1,  1,  1, -1,
    -1,  1,  1, -1,
    -1,  1,  1, -1,
    -1,  1,  1, -1,
};

// Create a new cross-validation algorithm, specifying the number of folds
var crossvalidation = new CrossValidation<SupportVectorMachine<Linear, double[]>, double[]>()
{
    K = 3, // Use 3 folds in cross-validation

    // Indicate how learning algorithms for the models should be created
    Learner = (s) => new SequentialMinimalOptimization<Linear, double[]>()
    {
        Complexity = 100
    },

    // Indicate how the performance of those models will be measured
    Loss = (expected, actual, p) => new ZeroOneLoss(expected).Loss(actual),

    Stratify = false, // do not force balancing of classes
};

// If needed, control the parallelization degree
crossvalidation.ParallelOptions.MaxDegreeOfParallelism = 1;

// Compute the cross-validation
var result = crossvalidation.Learn(data, xor);

// Finally, access the measured performance.
double trainingErrors = result.Training.Mean;     // should be 0.30606060606060609 (+/- var. 0.083498622589531682)
double validationErrors = result.Validation.Mean; // should be 0.3666666666666667 (+/- var. 0.023333333333333334)

// If desired, compute an aggregate confusion matrix for the validation sets:
GeneralConfusionMatrix gcm = result.ToConfusionMatrix(data, xor);
double accuracy = gcm.Accuracy; // should be 0.625
double error = gcm.Error;       // should be 0.375
```
The next example assesses the performance of hidden Markov model classifiers in the same way:

```csharp
// Ensure results are reproducible
Accord.Math.Random.Generator.Seed = 0;

// This example shows how to use cross-validation to assess
// the performance of Hidden Markov Models.

// Declare some testing data
int[][] inputs = new int[][]
{
    new int[] { 0,1,1,0 },   // Class 0
    new int[] { 0,0,1,0 },   // Class 0
    new int[] { 0,1,1,1,0 }, // Class 0
    new int[] { 0,1,1,1,0 }, // Class 0
    new int[] { 0,1,1,0 },   // Class 0

    new int[] { 0,0,0,0,0 }, // Class 1
    new int[] { 0,0,0,1,0 }, // Class 1
    new int[] { 0,0,0,0,0 }, // Class 1
    new int[] { 0,0,0 },     // Class 1
    new int[] { 0,0,0,0 },   // Class 1

    new int[] { 1,0,0,1 },   // Class 2
    new int[] { 1,1,0,1 },   // Class 2
    new int[] { 1,0,0,0,1 }, // Class 2
    new int[] { 1,0,1 },     // Class 2
    new int[] { 1,1,0,1 },   // Class 2
};

int[] outputs = new int[]
{
    0,0,0,0,0, // First  5 sequences are of class 0
    1,1,1,1,1, // Middle 5 sequences are of class 1
    2,2,2,2,2, // Last   5 sequences are of class 2
};

// Create a new cross-validation algorithm, specifying the number of folds
var crossvalidation = new CrossValidation<HiddenMarkovClassifier, int[]>()
{
    K = 3, // Use 3 folds in cross-validation

    Learner = (s) => new HiddenMarkovClassifierLearning()
    {
        Learner = (p) => new BaumWelchLearning()
        {
            NumberOfStates = 3
        }
    },

    Loss = (expected, actual, p) =>
    {
        var cm = new GeneralConfusionMatrix(classes: p.Model.NumberOfClasses,
            expected: expected, predicted: actual);
        p.Variance = cm.Variance;
        return p.Value = cm.Kappa;
    },

    Stratify = false,
};

// If needed, control the parallelization degree
crossvalidation.ParallelOptions.MaxDegreeOfParallelism = 1;

// Compute the cross-validation
var result = crossvalidation.Learn(inputs, outputs);

// If desired, compute an aggregate confusion matrix for the validation sets:
GeneralConfusionMatrix gcm = result.ToConfusionMatrix(inputs, outputs);

// Finally, access the measured performance.
double trainingErrors = result.Training.Mean;
double validationErrors = result.Validation.Mean;

double trainingErrorVar = result.Training.Variance;
double validationErrorVar = result.Validation.Variance;

double trainingErrorPooledVar = result.Training.PooledVariance;
double validationErrorPooledVar = result.Validation.PooledVariance;
```
The third example uses the static CrossValidation.Create method to cross-validate a C4.5 decision tree on the Wisconsin (Diagnostic) Breast Cancer dataset:

```csharp
// Ensure we have reproducible results
Accord.Math.Random.Generator.Seed = 0;

// Get some data to be learned. We will be using the Wisconsin
// (Diagnostic) Breast Cancer dataset, where the goal is to determine
// whether the characteristics extracted from a breast cancer exam
// correspond to a malignant or benign type of cancer:
var data = new WisconsinDiagnosticBreastCancer();
double[][] input = data.Features; // 569 samples, 30-dimensional features
int[] output = data.ClassLabels;  // 569 samples, 2 different class labels

// Let's say we want to measure the cross-validation performance of
// a decision tree with a maximum tree height of 5 and where variables
// are able to join the decision path at most 2 times during evaluation:
var cv = CrossValidation.Create(

    k: 10, // We will be using 10-fold cross-validation

    learner: (p) => new C45Learning() // here we create the learning algorithm
    {
        Join = 2,
        MaxHeight = 5
    },

    // Now we have to specify how the tree performance should be measured:
    loss: (actual, expected, p) => new ZeroOneLoss(expected).Loss(actual),

    // This function can be used to perform any special
    // operations before the actual learning is done, but
    // here we will just leave it as simple as it can be:
    fit: (teacher, x, y, w) => teacher.Learn(x, y, w),

    // Finally, we have to pass the input and output data
    // that will be used in cross-validation.
    x: input, y: output
);

// After the cross-validation object has been created,
// we can call its .Learn method with the input and
// output data that will be partitioned into the folds:
var result = cv.Learn(input, output);

// We can grab some information about the problem:
int numberOfSamples = result.NumberOfSamples; // should be 569
int numberOfInputs = result.NumberOfInputs;   // should be 30
int numberOfOutputs = result.NumberOfOutputs; // should be 2

double trainingError = result.Training.Mean;     // should be 0.017771153143274855
double validationError = result.Validation.Mean; // should be 0.0755952380952381

// If desired, compute an aggregate confusion matrix for the validation sets:
GeneralConfusionMatrix gcm = result.ToConfusionMatrix(input, output);
double accuracy = gcm.Accuracy; // result should be 0.92442882249560632
```
Finally, the same CrossValidation.Create pattern can be used to measure the performance of a Naive Bayes classifier:

```csharp
// Ensure we have reproducible results
Accord.Math.Random.Generator.Seed = 0;

// Let's say we have the following data to be classified
// into three possible classes. Those are the samples:
int[][] inputs =
{
    //          input          output
    new int[] { 0, 1, 1, 0 }, //  0
    new int[] { 0, 1, 0, 0 }, //  0
    new int[] { 0, 0, 1, 0 }, //  0
    new int[] { 0, 1, 1, 0 }, //  0
    new int[] { 0, 1, 0, 0 }, //  0
    new int[] { 1, 0, 0, 0 }, //  1
    new int[] { 1, 0, 0, 0 }, //  1
    new int[] { 1, 0, 0, 1 }, //  1
    new int[] { 0, 0, 0, 1 }, //  1
    new int[] { 0, 0, 0, 1 }, //  1
    new int[] { 1, 1, 1, 1 }, //  2
    new int[] { 1, 0, 1, 1 }, //  2
    new int[] { 1, 1, 0, 1 }, //  2
    new int[] { 0, 1, 1, 1 }, //  2
    new int[] { 1, 1, 1, 1 }, //  2
};

int[] outputs = // those are the class labels
{
    0, 0, 0, 0, 0,
    1, 1, 1, 1, 1,
    2, 2, 2, 2, 2,
};

// Let's say we want to measure the cross-validation
// performance of Naive Bayes on the above data set:
var cv = CrossValidation.Create(

    k: 10, // We will be using 10-fold cross-validation

    // First we define the learning algorithm:
    learner: (p) => new NaiveBayesLearning(),

    // Now we have to specify how the n.b. performance should be measured:
    loss: (actual, expected, p) => new ZeroOneLoss(expected).Loss(actual),

    // This function can be used to perform any special
    // operations before the actual learning is done, but
    // here we will just leave it as simple as it can be:
    fit: (teacher, x, y, w) => teacher.Learn(x, y, w),

    // Finally, we have to pass the input and output data
    // that will be used in cross-validation.
    x: inputs, y: outputs
);

// After the cross-validation object has been created,
// we can call its .Learn method with the input and
// output data that will be partitioned into the folds:
var result = cv.Learn(inputs, outputs);

// We can grab some information about the problem:
int numberOfSamples = result.NumberOfSamples; // should be 15
int numberOfInputs = result.NumberOfInputs;   // should be 4
int numberOfOutputs = result.NumberOfOutputs; // should be 3

double trainingError = result.Training.Mean;     // should be 0
double validationError = result.Validation.Mean; // should be 0.15 (+/- var. 0.11388888888888887)

// If desired, compute an aggregate confusion matrix for the validation sets:
GeneralConfusionMatrix gcm = result.ToConfusionMatrix(inputs, outputs);
```