Codification Class

Codification Filter class.
Inheritance Hierarchy
System.Object
  Accord.Statistics.Filters.BaseFilter<Codification<T>.Options, Codification<String>>
    Accord.Statistics.Filters.Codification<String>
      Accord.Statistics.Filters.Codification

Namespace:  Accord.Statistics.Filters
Assembly:  Accord.Statistics (in Accord.Statistics.dll) Version: 3.8.0
Syntax
[SerializableAttribute]
public class Codification : Codification<string>, 
	IAutoConfigurableFilter, IFilter, ITransform<string[], double[]>, 
	ICovariantTransform<string[], double[]>, ITransform

The Codification type exposes the following members.

Constructors
Properties
  Name / Description
Public property: Active
Gets or sets whether this filter is active. An inactive filter passes the input table through as output, unchanged.
(Inherited from BaseFilter<TOptions, TFilter>.)
Public property: Columns
Gets the collection of filter options.
(Inherited from BaseFilter<TOptions, TFilter>.)
Public property: DefaultMissingValueReplacement
Gets or sets the default value to be used as a replacement for missing values. Default is to use System.DBNull.Value.
(Inherited from Codification<T>.)
Public property: Item[Int32]
Gets options associated with a given variable (data column).
(Inherited from BaseFilter<TOptions, TFilter>.)
Public property: Item[String]
Gets options associated with a given variable (data column).
(Inherited from BaseFilter<TOptions, TFilter>.)
Public property: NumberOfInputs
Gets the number of inputs accepted by the model.
(Inherited from BaseFilter<TOptions, TFilter>.)
Public property: NumberOfOutputs
Gets the number of outputs generated by the model.
(Inherited from Codification<T>.)
Public property: Token
Gets or sets a cancellation token that can be used to stop the learning algorithm while it is running.
(Inherited from BaseFilter<TOptions, TFilter>.)
Methods
  Name / Description
Public method: Add(TOptions)
Adds a new column options definition to the collection.
(Inherited from BaseFilter<TOptions, TFilter>.)
Public method: Add(CodificationVariable)
Adds new column options to this filter's collection, specifying how a particular column should be processed by the filter.
(Inherited from Codification<T>.)
Public method: Add(String, CodificationVariable)
Adds new column options to this filter's collection, specifying how a particular column should be processed by the filter.
(Inherited from Codification<T>.)
Public method: Add(String, CodificationVariable, T)
Adds new column options to this filter's collection, specifying how a particular column should be processed by the filter.
(Inherited from Codification<T>.)
Public method: Add(String, CodificationVariable, T)
Adds new column options to this filter's collection, specifying how a particular column should be processed by the filter.
(Inherited from Codification<T>.)
Public method: Apply(DataTable)
Applies the filter to a DataTable.
(Inherited from BaseFilter<TOptions, TFilter>.)
Public method: Apply(DataTable, String)
Applies the filter to a DataTable.
(Inherited from BaseFilter<TOptions, TFilter>.)
Public method: Detect(DataTable)
Auto detects the filter options by analyzing a given DataTable.
Public method: Detect(DataTable, String)
Auto detects the filter options by analyzing a given DataTable.
Public method: Detect(String, String)
Auto detects the filter options by analyzing a set of string labels.
Public method: Detect(String, String)
Auto detects the filter options by analyzing a set of string labels.
Public method: Detect(String, String)
Auto detects the filter options by analyzing a set of string labels.
Public method: Equals
Determines whether the specified object is equal to the current object.
(Inherited from Object.)
Protected method: Finalize
Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection.
(Inherited from Object.)
Public method: GetEnumerator
Returns an enumerator that iterates through the collection.
(Inherited from BaseFilter<TOptions, TFilter>.)
Public method: GetHashCode
Serves as the default hash function.
(Inherited from Object.)
Public method: GetType
Gets the Type of the current instance.
(Inherited from Object.)
Public method: Learn(DataTable, Double)
Learns a model that can map the given inputs to the desired outputs.
(Inherited from Codification<T>.)
Public method: Learn(T, Double)
Learns a model that can map the given inputs to the desired outputs.
(Inherited from Codification<T>.)
Public method: Learn(T, Double)
Learns a model that can map the given inputs to the desired outputs.
(Inherited from Codification<T>.)
Protected method: MemberwiseClone
Creates a shallow copy of the current Object.
(Inherited from Object.)
Protected method: OnAddingOptions
Called when a new column options definition is being added. Can be used to validate or modify these options beforehand.
(Inherited from Codification<T>.)
Protected method: ProcessFilter
Processes the current filter.
(Inherited from Codification<T>.)
Public method: Revert(Int32)
Translates an integer (codeword) representation of the value of a given variable into its original value.
(Inherited from Codification<T>.)
Public method: Revert(String, Int32)
Translates an integer (codeword) representation of the value of a given variable into its original value.
(Inherited from Codification<T>.)
Public method: Revert(String, Int32)
Translates an integer (codeword) representation of the value of a given variable into its original value.
(Inherited from Codification<T>.)
Public method: Revert(String, Int32)
Translates the integer (codeword) representations of the values of the given variables into their original values.
(Inherited from Codification<T>.)
Public method: ToDouble
Converts this instance into a transform that can generate double[].
(Inherited from Codification<T>.)
Public method: ToString
Returns a string that represents the current object.
(Inherited from Object.)
Public method: Transform(String)
Transforms a matrix of key-value pairs (where the first column denotes a key, and the second column a value) into their integer vector representation.
Public method: Transform(T)
Translates an array of values into their integer representation, assuming values are given in original order of columns.
(Inherited from Codification<T>.)
Public method: Transform(T)
Translates a value of the given variables into their integer (codeword) representation.
(Inherited from Codification<T>.)
Public method: Transform(DataRow, String)
Translates an array of values into their integer representation, assuming values are given in original order of columns.
(Inherited from Codification<T>.)
Public method: Transform(DataTable, String)
Translates an array of values into their integer representation, assuming values are given in original order of columns.
(Inherited from Codification<T>.)
Public method: Transform(String, T)
Translates a value of a given variable into its integer (codeword) representation.
(Inherited from Codification<T>.)
Public method: Transform(String, T)
Translates a value of the given variables into their integer (codeword) representation.
(Inherited from Codification<T>.)
Public method: Transform(String, T)
Translates a value of the given variables into their integer (codeword) representation.
(Inherited from Codification<T>.)
Public method: Transform(T, Double)
Applies the transformation to a set of input vectors, producing an associated set of output vectors.
(Inherited from Codification<T>.)
Public method: Transform(T, Int32)
Translates a value of the given variables into their integer (codeword) representation.
(Inherited from Codification<T>.)
Public method: Transform(DataRow, String, String)
Translates an array of values into their integer representation, assuming values are given in original order of columns.
(Inherited from Codification<T>.)
Public method: Transform(DataTable, String, String)
Translates an array of values into their integer representation, assuming values are given in original order of columns.
(Inherited from Codification<T>.)
Public method: Translate(String) (Obsolete.)
Translates an array of values into their integer representation, assuming values are given in original order of columns.
Public method: Translate(DataRow, String) (Obsolete.)
Translates an array of values into their integer representation, assuming values are given in original order of columns.
Public method: Translate(String, Int32) (Obsolete.)
Translates an integer (codeword) representation of the value of a given variable into its original value.
Public method: Translate(String, Int32) (Obsolete.)
Translates an integer (codeword) representation of the value of a given variable into its original value.
Public method: Translate(String, String) (Obsolete.)
Translates a value of a given variable into its integer (codeword) representation.
Public method: Translate(String, String) (Obsolete.)
Translates a value of the given variables into their integer (codeword) representation.
Public method: Translate(String, String) (Obsolete.)
Translates a value of the given variables into their integer (codeword) representation.
Public method: Translate(String, Int32) (Obsolete.)
Translates the integer (codeword) representations of the values of the given variables into their original values.
Public method: Translate(String, String) (Obsolete.)
Translates a value of the given variables into their integer (codeword) representation.
Extension Methods
  Name / Description
Public extension method: HasMethod
Checks whether an object implements a method with the given name.
(Defined by ExtensionMethods.)
Public extension method: IsEqual
Compares two objects for equality, performing an elementwise comparison if the elements are vectors or matrices.
(Defined by Matrix.)
Public extension method: To(Type) (Overloaded.)
Converts an object into another type, irrespective of whether the conversion can be done at compile time or not. This can be used to convert generic types to numeric types during runtime.
(Defined by ExtensionMethods.)
Public extension method: To<T> (Overloaded.)
Converts an object into another type, irrespective of whether the conversion can be done at compile time or not. This can be used to convert generic types to numeric types during runtime.
(Defined by ExtensionMethods.)
Remarks

The codification filter performs an integer codification of classes given in string form. A unique integer identifier is assigned to each of the string classes.
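
As a rough illustration of what such an integer codification amounts to, the sketch below assigns codewords in order of first appearance using a plain dictionary. It is not the Accord.NET implementation: the `LabelCodebook` class is hypothetical, its `Learn`, `Transform` and `Revert` methods are named only to mirror the members listed above, and it assumes that no label is null.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of Accord.NET): maps each distinct string
// label to an integer codeword, numbered in order of first appearance.
public class LabelCodebook
{
    private readonly Dictionary<string, int> codes = new Dictionary<string, int>();
    private readonly List<string> labels = new List<string>();

    // Scans the values once and registers every label not seen before.
    // Assumes no value is null (Dictionary keys cannot be null).
    public void Learn(IEnumerable<string> values)
    {
        foreach (string value in values)
        {
            if (!codes.ContainsKey(value))
            {
                codes[value] = labels.Count;
                labels.Add(value);
            }
        }
    }

    // Label -> codeword. Throws KeyNotFoundException for unknown labels.
    public int Transform(string label)
    {
        return codes[label];
    }

    // Codeword -> label. Throws ArgumentOutOfRangeException for unknown codewords.
    public string Revert(int codeword)
    {
        return labels[codeword];
    }
}
```

With this sketch, learning from the sequence "child", "child", "adult", "elder" makes `Transform("adult")` return 1 and `Revert(2)` return "elder".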

Examples

When handling data tables, often there will be cases in which a single table contains both numerical variables and categorical data in the form of text labels. Since most machine learning and statistics algorithms expect their data to be numeric, the codification filter can be used to create mappings between text labels and discrete symbols.

// Show the initial data
DataGridBox.Show(table);
// Create a new codification filter for the "Category" column
var filter = new Codification(table, "Category");

// Apply the filter and get the result
DataTable result = filter.Apply(table);

// Show it
DataGridBox.Show(result);

The following more elaborate examples show how to use the Codification filter without necessarily handling System.Data.DataTables.

// Suppose we have a data table relating the age of
// a person and their categorical classification, as 
// in "child", "adult" or "elder".

// The Codification filter is able to extract those
// string labels and transform them into discrete
// symbols, assigning integer labels to each of them
// such as "child" = 0, "adult" = 1, and "elder" = 2.

// Create the aforementioned sample table
DataTable table = new DataTable("Sample data");
table.Columns.Add("Age", typeof(int));
table.Columns.Add("Label", typeof(string));

//            age   label
table.Rows.Add(10, "child");
table.Rows.Add(07, "child");
table.Rows.Add(04, "child");
table.Rows.Add(21, "adult");
table.Rows.Add(27, "adult");
table.Rows.Add(12, "child");
table.Rows.Add(79, "elder");
table.Rows.Add(40, "adult");
table.Rows.Add(30, "adult");


// Now, let's say we need to translate those text labels
// into integer symbols. Let's use a Codification filter:

var codebook = new Codification(table);


// After that, we can use the codebook to "translate"
// the text labels into discrete symbols, such as:

int a = codebook.Transform(columnName: "Label", value: "child"); // returns 0
int b = codebook.Transform(columnName: "Label", value: "adult"); // returns 1
int c = codebook.Transform(columnName: "Label", value: "elder"); // returns 2

// We can also do the reverse:
string labela = codebook.Revert(columnName: "Label", codeword: 0); // returns "child"
string labelb = codebook.Revert(columnName: "Label", codeword: 1); // returns "adult"
string labelc = codebook.Revert(columnName: "Label", codeword: 2); // returns "elder"

After we have created the codebook, we can use it to feed data with categorical variables to methods that would otherwise not know how to handle text labels. Continuing with our example, the next code section shows how to convert an entire data table into a numerical matrix.

// We can also process an entire data table at once:
DataTable result = codebook.Apply(table);

// The resulting table can be transformed to jagged array:
double[][] matrix = Matrix.ToArray(result);

// and the resulting matrix will be given by
string str = matrix.ToCSharp();

Finally, by expressing our data in terms of a simple numerical matrix we will be able to feed it to any machine learning algorithm. The following code section shows how to create a linear multi-class Support Vector Machine to classify ages into any of the previously considered text labels ("child", "adult" or "elder").

// Now we will be able to feed this matrix to any machine learning
// algorithm without having to worry about text labels in our data:

// Use the first column as input variables,
// and the second column as output classes
// 
double[][] inputs = matrix.GetColumns(0);
int[] outputs = matrix.GetColumn(1).ToInt32();

// Create a Multi-class learning algorithm for the machine
var teacher = new MulticlassSupportVectorLearning<Linear>()
{
    Learner = (p) => new SequentialMinimalOptimization<Linear>()
    {
        Complexity = 1
    }
};

// Run the learning algorithm
var svm = teacher.Learn(inputs, outputs);

// Compute the classification error (should be 0)
double error = new ZeroOneLoss(outputs).Loss(svm.Decide(inputs));


// After we have learned the machine, we can use it to classify
// new data points, and use the codebook to translate the machine
// outputs to the original text labels:

string result1 = codebook.Revert("Label", svm.Decide(new double[] { 10 })); // child
string result2 = codebook.Revert("Label", svm.Decide(new double[] { 40 })); // adult
string result3 = codebook.Revert("Label", svm.Decide(new double[] { 70 })); // elder

Every Learn() method in the framework expects the class labels to be contiguous and zero-indexed, meaning that if there is a classification problem with n classes, all class labels must be numbers ranging from 0 to n-1. However, not every dataset might be in this format and sometimes we will have to pre-process the data to be in this format. The example below shows how to use the Codification class to perform such pre-processing.

// Let's say we have the following data to be classified
// into three possible classes. Those are the samples:
// 
double[][] inputs =
{
    //               input         output
    new double[] { 0, 1, 1, 0 }, //  0 
    new double[] { 0, 1, 0, 0 }, //  0
    new double[] { 0, 0, 1, 0 }, //  0
    new double[] { 0, 1, 1, 0 }, //  0
    new double[] { 0, 1, 0, 0 }, //  0
    new double[] { 1, 0, 0, 0 }, //  1
    new double[] { 1, 0, 0, 0 }, //  1
    new double[] { 1, 0, 0, 1 }, //  1
    new double[] { 0, 0, 0, 1 }, //  1
    new double[] { 0, 0, 0, 1 }, //  1
    new double[] { 1, 1, 1, 1 }, //  2
    new double[] { 1, 0, 1, 1 }, //  2
    new double[] { 1, 1, 0, 1 }, //  2
    new double[] { 0, 1, 1, 1 }, //  2
    new double[] { 1, 1, 1, 1 }, //  2
};

// Now, suppose that our class labels are not contiguous. We
// have 3 classes, but they have the class labels 5, 1, and 8
// respectively. In this case, we can use a Codification filter
// to obtain a contiguous zero-indexed labeling before learning
int[] output_labels =
{
    5, 5, 5, 5, 5,
    1, 1, 1, 1, 1,
    8, 8, 8, 8, 8,
};

// Create a codification object to obtain an output mapping
var codebook = new Codification<int>().Learn(output_labels);

// Transform the original labels using the codebook
int[] outputs = codebook.Transform(output_labels);

// Create the multi-class learning algorithm for the machine
var teacher = new MulticlassSupportVectorLearning<Gaussian>()
{
    // Configure the learning algorithm to use SMO to train the
    //  underlying SVMs in each of the binary class subproblems.
    Learner = (param) => new SequentialMinimalOptimization<Gaussian>()
    {
        // Estimate a suitable guess for the Gaussian kernel's parameters.
        // This estimate can serve as a starting point for a grid search.
        UseKernelEstimation = true
    }
};

// The following line is only needed to ensure reproducible results.
// Remove, comment out, or change it to enable full parallelism.
teacher.ParallelOptions.MaxDegreeOfParallelism = 1;

// Learn a machine
var machine = teacher.Learn(inputs, outputs);

// Obtain class predictions for each sample
int[] predicted = machine.Decide(inputs);

// Translate the integers back to the original labels
int[] predicted_labels = codebook.Revert(predicted);
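
The relabeling step in this example can be pictured with a small, framework-free sketch. It is only an approximation under stated assumptions: `ContiguousLabels` and its methods are hypothetical names, and codewords are assigned in order of first appearance, which may or may not match the order Accord.NET chooses internally.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of Accord.NET): converts arbitrary integer
// class labels into contiguous, zero-indexed codewords and back again.
public static class ContiguousLabels
{
    // The position of a label in the returned list is its codeword.
    public static List<int> Learn(int[] labels)
    {
        var codebook = new List<int>();
        foreach (int label in labels)
        {
            if (!codebook.Contains(label))
                codebook.Add(label);
        }
        return codebook;
    }

    // Original labels -> codewords in the range 0..n-1.
    public static int[] Transform(List<int> codebook, int[] labels)
    {
        var codewords = new int[labels.Length];
        for (int i = 0; i < labels.Length; i++)
        {
            codewords[i] = codebook.IndexOf(labels[i]);
            if (codewords[i] < 0)
                throw new ArgumentException("Unknown label: " + labels[i]);
        }
        return codewords;
    }

    // Codewords -> original labels.
    public static int[] Revert(List<int> codebook, int[] codewords)
    {
        var labels = new int[codewords.Length];
        for (int i = 0; i < codewords.Length; i++)
            labels[i] = codebook[codewords[i]];
        return labels;
    }
}
```

Under the first-appearance assumption, the labels 5, 1 and 8 used above would become 0, 1 and 2, and reverting the codewords restores the original labels exactly.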

The codification filter can also work with missing values. The example below shows how a codification codebook can be created from a dataset that includes missing values, and how to use this codebook to replace missing values by some other representation (in the case below, replacing null by NaN double numbers).

            // In this example, we will be using a modified version of the famous Play Tennis 
            // example by Tom Mitchell (1998), where some values have been replaced by missing 
            // values. We will use NaN double values to represent values missing from the data.

            // Note: this example uses DataTables to represent the input data, 
            // but this is not required. The same could be performed using plain
            // double[][] matrices and vectors instead.
            DataTable data = new DataTable("Tennis Example with Missing Values");

            data.Columns.Add("Day", typeof(string));
            data.Columns.Add("Outlook", typeof(string));
            data.Columns.Add("Temperature", typeof(string));
            data.Columns.Add("Humidity", typeof(string));
            data.Columns.Add("Wind", typeof(string));
            data.Columns.Add("PlayTennis", typeof(string));

            data.Rows.Add("D1", "Sunny", "Hot", "High", "Weak", "No");
            data.Rows.Add("D2", null, "Hot", "High", "Strong", "No");
            data.Rows.Add("D3", null, null, "High", null, "Yes");
            data.Rows.Add("D4", "Rain", "Mild", "High", "Weak", "Yes");
            data.Rows.Add("D5", "Rain", "Cool", null, "Weak", "Yes");
            data.Rows.Add("D6", "Rain", "Cool", "Normal", "Strong", "No");
            data.Rows.Add("D7", "Overcast", "Cool", "Normal", "Strong", "Yes");
            data.Rows.Add("D8", null, "Mild", "High", null, "No");
            data.Rows.Add("D9", null, "Cool", "Normal", "Weak", "Yes");
            data.Rows.Add("D10", null, null, "Normal", null, "Yes");
            data.Rows.Add("D11", null, "Mild", "Normal", null, "Yes");
            data.Rows.Add("D12", "Overcast", "Mild", null, "Strong", "Yes");
            data.Rows.Add("D13", "Overcast", "Hot", null, "Weak", "Yes");
            data.Rows.Add("D14", "Rain", "Mild", "High", "Strong", "No");

            // Create a new codification codebook to convert 
            // the strings above into numeric, integer labels:
            var codebook = new Codification()
            {
                DefaultMissingValueReplacement = Double.NaN
            };

            // Learn the codebook
            codebook.Learn(data);

            // Use the codebook to convert all the data
            DataTable symbols = codebook.Apply(data);

            // Grab the training input and output instances:
            string[] inputNames = new[] { "Outlook", "Temperature", "Humidity", "Wind" };
            double[][] inputs = symbols.ToJagged(inputNames);
            int[] outputs = symbols.ToArray<int>("PlayTennis");

            // Create a new learning algorithm
            var teacher = new C45Learning()
            {
                Attributes = DecisionVariable.FromCodebook(codebook, inputNames)
            };

            // Use the learning algorithm to induce a new tree:
            DecisionTree tree = teacher.Learn(inputs, outputs);

            // To get the estimated class labels, we can use
            int[] predicted = tree.Decide(inputs);

            // The classification error (~0.214) can be computed as 
            double error = new ZeroOneLoss(outputs).Loss(predicted);

            // Moreover, we may decide to convert our tree to a set of rules:
            DecisionSet rules = tree.ToRules();

            // And using the codebook, we can inspect the tree reasoning:
            string ruleText = rules.ToString(codebook, "PlayTennis",
                System.Globalization.CultureInfo.InvariantCulture);

            // The output should be:
            string expected = @"No =: (Outlook == Sunny)
No =: (Outlook == Rain) && (Wind == Strong)
Yes =: (Outlook == Overcast)
Yes =: (Outlook == Rain) && (Wind == Weak)
";
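
To make the replacement of missing values concrete, the following framework-free sketch encodes a single column of strings into doubles and writes NaN wherever the input is null. It only illustrates the idea: `MissingValueSketch.Encode` is a hypothetical helper, and numbering the codewords in order of first appearance is an assumption rather than a statement about how Accord.NET works internally.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of Accord.NET): encodes one column of
// string labels as doubles, using NaN as the replacement for missing values.
public static class MissingValueSketch
{
    public static double[] Encode(string[] column)
    {
        var codes = new Dictionary<string, int>();
        var result = new double[column.Length];

        for (int i = 0; i < column.Length; i++)
        {
            string value = column[i];

            if (value == null)
            {
                // Missing value: emit the replacement instead of a codeword
                result[i] = Double.NaN;
                continue;
            }

            int code;
            if (!codes.TryGetValue(value, out code))
            {
                // First time this label is seen: assign the next free codeword
                code = codes.Count;
                codes.Add(value, code);
            }

            result[i] = code;
        }

        return result;
    }
}
```

For instance, the column "Sunny", null, null, "Rain" would be encoded as 0, NaN, NaN, 1 by this sketch.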

The codification filter also supports more advanced scenarios where it is necessary to use different categorical representations for different variables, such as one-hot vectors and categorical-with-baseline encodings, as shown in the example below:

// This example downloads an example dataset from the web and learns a multinomial logistic 
// regression on it. However, please keep in mind that the Multinomial Logistic Regression 
// can also work without many of the elements that will be shown below, like the codebook, 
// DataTables, and a CsvReader. 

// Let's download an example dataset from the web to learn a multinomial logistic regression:
CsvReader reader = CsvReader.FromUrl("https://raw.githubusercontent.com/rlowrance/re/master/hsbdemo.csv", hasHeaders: true);

// Let's read the CSV into a DataTable. As mentioned above, this step
// can help, but is not necessarily required for learning the model:
DataTable table = reader.ToTable();

// We will learn an MLR model between the following input and output fields of this table:
string[] inputNames = new[] { "write", "ses" };
string[] outputNames = new[] { "prog" };

// Now let's create a codification codebook to convert the string fields in the data 
// into integer symbols. This is required because the MLR model can only learn from 
// numeric data, so strings have to be transformed first. We can force a particular
// interpretation for those columns if needed, as shown in the initializer below:
var codification = new Codification()
{
    { "write", CodificationVariable.Continuous },
    { "ses", CodificationVariable.CategoricalWithBaseline, new[] { "low", "middle", "high" } },
    { "prog", CodificationVariable.Categorical, new[] { "academic", "general" } },
};

// Learn the codification
codification.Learn(table);

// Now, transform symbols into a vector representation, growing the number of inputs:
double[][] x = codification.Transform(table, inputNames, out inputNames).ToDouble();
double[][] y = codification.Transform(table, outputNames, out outputNames).ToDouble();

// Create a new Multinomial Logistic Regression Analysis:
var analysis = new MultinomialLogisticRegressionAnalysis()
{
    InputNames = inputNames,
    OutputNames = outputNames,
};

// Learn the regression from the input and output pairs:
MultinomialLogisticRegression regression = analysis.Learn(x, y);

// Let's retrieve some information about what we just learned:
int coefficients = analysis.Coefficients.Count; // should be 9
int numberOfInputs = analysis.NumberOfInputs;   // should be 3
int numberOfOutputs = analysis.NumberOfOutputs; // should be 3

inputNames = analysis.InputNames; // should be "write", "ses: middle", "ses: high"
outputNames = analysis.OutputNames; // should be "prog: academic", "prog: general", "prog: vocation"

// The regression is best visualized when it is data-bound to a 
// Windows.Forms DataGridView or WPF DataGrid. You can get the
// values for all different coefficients and discrete values:

// DataGridBox.Show(regression.Coefficients); // uncomment this line

// You can get the matrix of coefficients:
double[][] coef = analysis.CoefficientValues;

// Should be equal to:
double[][] expectedCoef = new double[][]
{
    new double[] { 2.85217775752471, -0.0579282723520426, -0.533293368378012, -1.16283850605289 },
    new double[] { 5.21813357698422, -0.113601186660817, 0.291387041358367, -0.9826369387481 }
};

// And their associated standard errors:
double[][] stdErr = analysis.StandardErrors;

// Should be equal to:
double[][] expectedErr = new double[][]
{
    new double[] { -2.02458003380033, -0.339533576505471, -1.164084923948, -0.520961533343425, 0.0556314901718 },
    new double[] { -3.73971589217449, -1.47672790071382, -1.76795568348094, -0.495032307980058, 0.113563519656386 }
};

// We can also get statistics and hypothesis tests:
WaldTest[][] wald = analysis.WaldTests;        // should all have p < 0.05
ChiSquareTest chiSquare = analysis.ChiSquare;  // should be p=1.06300120956871E-08
double logLikelihood = analysis.LogLikelihood; // should be -179.98173272217591

// You can use the regression to predict the values:
int[] pred = regression.Transform(x);

// And get the accuracy of the prediction if needed:
var cm = GeneralConfusionMatrix.Estimate(regression, x, y.ArgMax(dimension: 1));

double acc = cm.Accuracy; // should be 0.61
double kappa = cm.Kappa;  // should be 0.2993487536492252
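
The difference between the two categorical representations used in this example can be sketched without the framework. The helpers below are hypothetical, and treating the first listed level as the baseline is an assumption suggested by the generated input names above ("ses: middle" and "ses: high", with no column for "low").

```csharp
using System;

// Hypothetical helpers (not part of Accord.NET) contrasting a one-hot
// encoding with a categorical-with-baseline encoding of the same value.
public static class CategoricalEncodings
{
    // One-hot: one column per level, with a 1 in the column of the given value.
    public static double[] OneHot(string value, string[] levels)
    {
        int index = Array.IndexOf(levels, value);
        if (index < 0)
            throw new ArgumentException("Unknown level: " + value);

        var vector = new double[levels.Length];
        vector[index] = 1;
        return vector;
    }

    // With baseline: the first level (assumed baseline) is encoded as all
    // zeros, and every other level gets one column, so one column is saved.
    public static double[] WithBaseline(string value, string[] levels)
    {
        int index = Array.IndexOf(levels, value);
        if (index < 0)
            throw new ArgumentException("Unknown level: " + value);

        var vector = new double[levels.Length - 1];
        if (index > 0)
            vector[index - 1] = 1;
        return vector;
    }
}
```

With the levels "low", "middle" and "high", the one-hot vector for "middle" is { 0, 1, 0 }, while the baseline encoding gives { 0, 0 } for "low" and { 0, 1 } for "high". Dropping the baseline column avoids the exact linear dependency that a full one-hot encoding has with an intercept term.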

Further examples of advanced scenarios, where the source dataset contains both symbolic and discrete/continuous variables, are shown below:

// Let's say we would like to predict a continuous number from a set 
// of discrete and continuous input variables. For this, we will 
// be using the Servo dataset from UCI's Machine Learning repository 
// as an example: http://archive.ics.uci.edu/ml/datasets/Servo

// Create a Servo dataset
Servo servo = new Servo();
object[][] instances = servo.Instances; // 167 x 4 
double[] outputs = servo.Output;        // 167 x 1

// This dataset contains 4 columns, where the first two are 
// symbolic (having possible values A, B, C, D, E), and the
// last two are continuous.

// We will use a codification filter to transform the symbolic 
// variables into one-hot vectors, while keeping the other two
// continuous variables intact:
var codebook = new Codification<object>()
{
    { "motor", CodificationVariable.Categorical },
    { "screw", CodificationVariable.Categorical },
    { "pgain", CodificationVariable.Continuous },
    { "vgain", CodificationVariable.Continuous },
};

// Learn the codebook
codebook.Learn(instances);

// We can gather some info about the problem:
int numberOfInputs = codebook.NumberOfInputs;   // should be 4 (since there are 4 variables)
int numberOfOutputs = codebook.NumberOfOutputs; // should be 12 (due to their one-hot encodings)

// Now we can use it to obtain double[] vectors:
double[][] inputs = codebook.ToDouble().Transform(instances);

// We will use Ordinary Least Squares to create a
// linear regression model with an intercept term
var ols = new OrdinaryLeastSquares()
{
    UseIntercept = true
};

// Use Ordinary Least Squares to estimate a regression model:
MultipleLinearRegression regression = ols.Learn(inputs, outputs);

// We can compute the predicted points using:
double[] predicted = regression.Transform(inputs);

// And the squared error using the SquareLoss class:
double error = new SquareLoss(outputs).Loss(predicted);

// We can also compute other measures, such as the coefficient of determination r² using:
double r2 = new RSquaredLoss(numberOfOutputs, outputs).Loss(predicted); // should be 0.55086630162967354

// Or the adjusted or weighted versions of r² using:
var r2loss = new RSquaredLoss(numberOfOutputs, outputs)
{
    Adjust = true,        
    // Weights = weights; // (uncomment if you have a weighted problem)
};

double ar2 = r2loss.Loss(predicted); // should be 0.51586887058782993

// Alternatively, we can also use the less generic, but maybe more user-friendly method directly:
double ur2 = regression.CoefficientOfDetermination(inputs, outputs, adjust: true); // should be 0.51586887058782993

// Note: this example uses a System.Data.DataTable to represent input data,
// but this is not required. The data could have been represented
// as jagged double matrices (double[][]) directly.

// If you have to handle heterogeneous data in your application, such as user records
// in a database, this data is best represented within the framework using a .NET 
// DataTable object. In order to learn a classification or regression model
// using this DataTable, we first need to convert the table into a representation
// that the machine learning model can understand. Such a representation is quite often
// a matrix of doubles (double[][]).
var data = new DataTable("Customer Revenue Example");

data.Columns.Add("Day", "CustomerId", "Time (hour)", "Weather", "Revenue");
data.Rows.Add("D1", 0, 8, "Sunny", 101.2);
data.Rows.Add("D2", 1, 10, "Sunny", 24.1);
data.Rows.Add("D3", 2, 10, "Rain", 107);
data.Rows.Add("D4", 3, 16, "Rain", 223);
data.Rows.Add("D5", 4, 15, "Rain", 1);
data.Rows.Add("D6", 5, 20, "Rain", 42);
data.Rows.Add("D7", 6, 12, "Cloudy", 123);
data.Rows.Add("D8", 7, 12, "Sunny", 64);

// One way to perform this conversion is by using a Codification filter. The Codification
// filter can take care of converting variables that actually denote symbols (i.e. the 
// weather in the example above) into representations that make more sense given the assumption
// of a real vector-based classifier.

// Create a codification codebook
var codebook = new Codification()
{
    { "Weather", CodificationVariable.Categorical },
    { "Time (hour)", CodificationVariable.Continuous },
    { "Revenue", CodificationVariable.Continuous },
};

// Learn from the data
codebook.Learn(data);

// Now, we will use the codebook to transform the DataTable into double[][] vectors. Due to
// the way the conversion works, we can end up with more columns in the output vectors
// than we started with. If you would like more details about what those columns
// represent, you can pass them as 'out' parameters in the methods that follow below.
string[] inputNames;  // (note: if you do not want to run this example yourself, you 
string outputName;    // can see below the new variable names that will be generated)

// Now, we can translate our training data into integer symbols using our codebook:
double[][] inputs = codebook.Apply(data, "Weather", "Time (hour)").ToJagged(out inputNames);
double[] outputs = codebook.Apply(data, "Revenue").ToVector(out outputName);
// (note: the Apply method transforms a DataTable into another DataTable containing the codified 
//  variables. The ToJagged and ToVector methods are then used to transform those tables into
//  double[][] matrices and double[] vectors, respectively.)

// If we would like to learn a linear regression model for this data, there are two possible
// ways depending on which aspect of the linear regression we are most interested in. If we
// are interested in interpreting the linear regression, performing hypothesis tests with the
// coefficients and performing an actual _linear regression analysis_, then we can use the
// MultipleLinearRegressionAnalysis class for this. If however we are only interested in using
// the learned model directly to predict new values for the dataset, then we could be using the
// MultipleLinearRegression and OrdinaryLeastSquares classes directly instead. 

// This example deals with the former case. For the latter, please see the documentation page
// for the MultipleLinearRegression class.

// We can create a new multiple linear analysis for the variables
var mlra = new MultipleLinearRegressionAnalysis(intercept: true)
{
    // We can also provide the names of the new variables that have been created by the
    // codification filter. Those can help in visualizing the analysis once it is 
    // data-bound to a visual control such as a Windows.Forms.DataGridView or WPF DataGrid:

    Inputs = inputNames, // will be { "Weather: Sunny", "Weather: Rain", "Weather: Cloudy", "Time (hour)" }
    Output = outputName  // will be "Revenue"
};

// To overcome linear dependency errors
mlra.OrdinaryLeastSquares.IsRobust = true;

// Compute the analysis and obtain the estimated regression
MultipleLinearRegression regression = mlra.Learn(inputs, outputs);

// And then predict the label using
double predicted = mlra.Transform(inputs[0]); // result will be ~72.3

// Because we opted for doing a MultipleLinearRegressionAnalysis instead of a simple
// linear regression, we will have further information about the regression available:
int inputCount = mlra.NumberOfInputs;   // should be 4
int outputCount = mlra.NumberOfOutputs; // should be 1
double r2 = mlra.RSquared;              // should be 0.12801838425195311
AnovaSourceCollection a = mlra.Table;   // ANOVA table (bind to a visual control for quick inspection)
double[][] h = mlra.InformationMatrix;  // should contain Fisher's information matrix for the problem
ZTest z = mlra.ZTest;                   // should be 0 (p=0.999, non-significant)
See Also