Codification Class

Namespace: Accord.Statistics.Filters

```csharp
[SerializableAttribute]
public class Codification : Codification<string>,
    IAutoConfigurableFilter, IFilter,
    ITransform<string[], double[]>,
    ICovariantTransform<string[], double[]>,
    ITransform
```

The Codification type exposes the following members.
**Constructors**

| Name | Description |
|---|---|
| `Codification()` | Creates a new Codification filter. |
| `Codification(DataTable)` | Creates a new Codification filter. |
| `Codification(DataTable, String)` | Creates a new Codification filter. |
| `Codification(String, String)` | Creates a new Codification filter. |
| `Codification(String, String)` | Creates a new Codification filter. |
| `Codification(String, String)` | Creates a new Codification filter. |
**Properties**

| Name | Description |
|---|---|
| `Active` | Gets or sets whether this filter is active. An inactive filter passes the input table through as output, unchanged. (Inherited from `BaseFilter<TOptions, TFilter>`.) |
| `Columns` | Gets the collection of filter options. (Inherited from `BaseFilter<TOptions, TFilter>`.) |
| `DefaultMissingValueReplacement` | Gets or sets the default value to be used as a replacement for missing values. The default is to use `System.DBNull.Value`. (Inherited from `Codification<T>`.) |
| `Item[Int32]` | Gets options associated with a given variable (data column). (Inherited from `BaseFilter<TOptions, TFilter>`.) |
| `Item[String]` | Gets options associated with a given variable (data column). (Inherited from `BaseFilter<TOptions, TFilter>`.) |
| `NumberOfInputs` | Gets the number of inputs accepted by the model. (Inherited from `BaseFilter<TOptions, TFilter>`.) |
| `NumberOfOutputs` | Gets the number of outputs generated by the model. (Inherited from `Codification<T>`.) |
| `Token` | Gets or sets a cancellation token that can be used to stop the learning algorithm while it is running. (Inherited from `BaseFilter<TOptions, TFilter>`.) |
**Methods**

| Name | Description |
|---|---|
| `Add(TOptions)` | Adds a new column options definition to the collection. (Inherited from `BaseFilter<TOptions, TFilter>`.) |
| `Add(CodificationVariable)` | Adds new column options to this filter's collection, specifying how a particular column should be processed by the filter. (Inherited from `Codification<T>`.) |
| `Add(String, CodificationVariable)` | Adds new column options to this filter's collection, specifying how a particular column should be processed by the filter. (Inherited from `Codification<T>`.) |
| `Add(String, CodificationVariable, T)` | Adds new column options to this filter's collection, specifying how a particular column should be processed by the filter. (Inherited from `Codification<T>`.) |
| `Add(String, CodificationVariable, T)` | Adds new column options to this filter's collection, specifying how a particular column should be processed by the filter. (Inherited from `Codification<T>`.) |
| `Apply(DataTable)` | Applies the filter to a DataTable. (Inherited from `BaseFilter<TOptions, TFilter>`.) |
| `Apply(DataTable, String)` | Applies the filter to a DataTable. (Inherited from `BaseFilter<TOptions, TFilter>`.) |
| `Detect(DataTable)` | Auto-detects the filter options by analyzing a given DataTable. |
| `Detect(DataTable, String)` | Auto-detects the filter options by analyzing a given DataTable. |
| `Detect(String, String)` | Auto-detects the filter options by analyzing a set of string labels. |
| `Detect(String, String)` | Auto-detects the filter options by analyzing a set of string labels. |
| `Detect(String, String)` | Auto-detects the filter options by analyzing a set of string labels. |
| `Equals` | Determines whether the specified object is equal to the current object. (Inherited from Object.) |
| `Finalize` | Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection. (Inherited from Object.) |
| `GetEnumerator` | Returns an enumerator that iterates through the collection. (Inherited from `BaseFilter<TOptions, TFilter>`.) |
| `GetHashCode` | Serves as the default hash function. (Inherited from Object.) |
| `GetType` | Gets the Type of the current instance. (Inherited from Object.) |
| `Learn(DataTable, Double)` | Learns a model that can map the given inputs to the desired outputs. (Inherited from `Codification<T>`.) |
| `Learn(T, Double)` | Learns a model that can map the given inputs to the desired outputs. (Inherited from `Codification<T>`.) |
| `Learn(T, Double)` | Learns a model that can map the given inputs to the desired outputs. (Inherited from `Codification<T>`.) |
| `MemberwiseClone` | Creates a shallow copy of the current Object. (Inherited from Object.) |
| `OnAddingOptions` | Called when a new column options definition is being added. Can be used to validate or modify these options beforehand. (Inherited from `Codification<T>`.) |
| `ProcessFilter` | Processes the current filter. (Inherited from `Codification<T>`.) |
| `Revert(Int32)` | Translates an integer (codeword) representation of the value of a given variable into its original value. (Inherited from `Codification<T>`.) |
| `Revert(String, Int32)` | Translates an integer (codeword) representation of the value of a given variable into its original value. (Inherited from `Codification<T>`.) |
| `Revert(String, Int32)` | Translates an integer (codeword) representation of the value of a given variable into its original value. (Inherited from `Codification<T>`.) |
| `Revert(String, Int32)` | Translates the integer (codeword) representations of the values of the given variables into their original values. (Inherited from `Codification<T>`.) |
| `ToDouble` | Converts this instance into a transform that can generate `double[]`. (Inherited from `Codification<T>`.) |
| `ToString` | Returns a string that represents the current object. (Inherited from Object.) |
| `Transform(String)` | Transforms a matrix of key-value pairs (where the first column denotes a key, and the second column a value) into their integer vector representation. |
| `Transform(T)` | Translates an array of values into their integer representation, assuming values are given in the original order of columns. (Inherited from `Codification<T>`.) |
| `Transform(T)` | Translates a value of the given variables into their integer (codeword) representation. (Inherited from `Codification<T>`.) |
| `Transform(DataRow, String)` | Translates an array of values into their integer representation, assuming values are given in the original order of columns. (Inherited from `Codification<T>`.) |
| `Transform(DataTable, String)` | Translates an array of values into their integer representation, assuming values are given in the original order of columns. (Inherited from `Codification<T>`.) |
| `Transform(String, T)` | Translates a value of a given variable into its integer (codeword) representation. (Inherited from `Codification<T>`.) |
| `Transform(String, T)` | Translates a value of the given variables into their integer (codeword) representation. (Inherited from `Codification<T>`.) |
| `Transform(String, T)` | Translates a value of the given variables into their integer (codeword) representation. (Inherited from `Codification<T>`.) |
| `Transform(T, Double)` | Applies the transformation to a set of input vectors, producing an associated set of output vectors. (Inherited from `Codification<T>`.) |
| `Transform(T, Int32)` | Translates a value of the given variables into their integer (codeword) representation. (Inherited from `Codification<T>`.) |
| `Transform(DataRow, String, String)` | Translates an array of values into their integer representation, assuming values are given in the original order of columns. (Inherited from `Codification<T>`.) |
| `Transform(DataTable, String, String)` | Translates an array of values into their integer representation, assuming values are given in the original order of columns. (Inherited from `Codification<T>`.) |
| `Translate(String)` | Obsolete. Translates an array of values into their integer representation, assuming values are given in the original order of columns. |
| `Translate(DataRow, String)` | Obsolete. Translates an array of values into their integer representation, assuming values are given in the original order of columns. |
| `Translate(String, Int32)` | Obsolete. Translates an integer (codeword) representation of the value of a given variable into its original value. |
| `Translate(String, Int32)` | Obsolete. Translates an integer (codeword) representation of the value of a given variable into its original value. |
| `Translate(String, String)` | Obsolete. Translates a value of a given variable into its integer (codeword) representation. |
| `Translate(String, String)` | Obsolete. Translates a value of the given variables into their integer (codeword) representation. |
| `Translate(String, String)` | Obsolete. Translates a value of the given variables into their integer (codeword) representation. |
| `Translate(String, Int32)` | Obsolete. Translates the integer (codeword) representations of the values of the given variables into their original values. |
| `Translate(String, String)` | Obsolete. Translates a value of the given variables into their integer (codeword) representation. |
**Extension Methods**

| Name | Description |
|---|---|
| `HasMethod` | Checks whether an object implements a method with the given name. (Defined by ExtensionMethods.) |
| `IsEqual` | Compares two objects for equality, performing an elementwise comparison if the elements are vectors or matrices. (Defined by Matrix.) |
| `To(Type)` | Overloaded. Converts an object into another type, irrespective of whether the conversion can be done at compile time or not. This can be used to convert generic types to numeric types during runtime. (Defined by ExtensionMethods.) |
| `To<T>` | Overloaded. Converts an object into another type, irrespective of whether the conversion can be done at compile time or not. This can be used to convert generic types to numeric types during runtime. (Defined by ExtensionMethods.) |
**Remarks**

The codification filter performs an integer codification of classes given in string form. A unique integer identifier is assigned to each of the string classes.

When handling data tables, there will often be cases in which a single table contains both numerical variables and categorical data in the form of text labels. Since most machine learning and statistics algorithms expect their data to be numeric, the codification filter can be used to create mappings between text labels and discrete symbols.
```csharp
// Show the start data
DataGridBox.Show(table);

// Create a new data projection (column) filter
var filter = new Codification(table, "Category");

// Apply the filter and get the result
DataTable result = filter.Apply(table);

// Show it
DataGridBox.Show(result);
```
The following, more elaborate examples show how to use the Codification filter without necessarily handling System.Data.DataTables.
```csharp
// Suppose we have a data table relating the age of
// a person and its categorical classification, as
// in "child", "adult" or "elder".

// The Codification filter is able to extract those
// string labels and transform them into discrete
// symbols, assigning integer labels to each of them
// such as "child" = 0, "adult" = 1, and "elder" = 2.

// Create the aforementioned sample table
DataTable table = new DataTable("Sample data");
table.Columns.Add("Age", typeof(int));
table.Columns.Add("Label", typeof(string));

//             age   label
table.Rows.Add(10, "child");
table.Rows.Add(07, "child");
table.Rows.Add(04, "child");
table.Rows.Add(21, "adult");
table.Rows.Add(27, "adult");
table.Rows.Add(12, "child");
table.Rows.Add(79, "elder");
table.Rows.Add(40, "adult");
table.Rows.Add(30, "adult");

// Now, let's say we need to translate those text labels
// into integer symbols. Let's use a Codification filter:
var codebook = new Codification(table);

// After that, we can use the codebook to "translate"
// the text labels into discrete symbols, such as:
int a = codebook.Transform(columnName: "Label", value: "child"); // returns 0
int b = codebook.Transform(columnName: "Label", value: "adult"); // returns 1
int c = codebook.Transform(columnName: "Label", value: "elder"); // returns 2

// We can also do the reverse:
string labela = codebook.Revert(columnName: "Label", codeword: 0); // returns "child"
string labelb = codebook.Revert(columnName: "Label", codeword: 1); // returns "adult"
string labelc = codebook.Revert(columnName: "Label", codeword: 2); // returns "elder"
```
After we have created the codebook, we can use it to feed data containing categorical variables to methods that would otherwise not know how to handle text labels. Continuing with our example, the next code section shows how to convert an entire data table into a numerical matrix.
```csharp
// We can also process an entire data table at once:
DataTable result = codebook.Apply(table);

// The resulting table can be transformed to a jagged array:
double[][] matrix = Matrix.ToArray(result);

// and the resulting matrix will be given by
string str = matrix.ToCSharp();
```
Finally, by expressing our data as a simple numerical matrix, we can feed it to any machine learning algorithm. The following code section shows how to create a linear multi-class Support Vector Machine to classify ages into any of the previously considered text labels ("child", "adult" or "elder").
```csharp
// Now we will be able to feed this matrix to any machine learning
// algorithm without having to worry about text labels in our data:

// Use the first column as input variables,
// and the second column as output classes
//
double[][] inputs = matrix.GetColumns(0);
int[] outputs = matrix.GetColumn(1).ToInt32();

// Create a Multi-class learning algorithm for the machine
var teacher = new MulticlassSupportVectorLearning<Linear>()
{
    Learner = (p) => new SequentialMinimalOptimization<Linear>()
    {
        Complexity = 1
    }
};

// Run the learning algorithm
var svm = teacher.Learn(inputs, outputs);

// Compute the classification error (should be 0)
double error = new ZeroOneLoss(outputs).Loss(svm.Decide(inputs));

// After we have learned the machine, we can use it to classify
// new data points, and use the codebook to translate the machine
// outputs to the original text labels:
string result1 = codebook.Revert("Label", svm.Decide(new double[] { 10 })); // child
string result2 = codebook.Revert("Label", svm.Decide(new double[] { 40 })); // adult
string result3 = codebook.Revert("Label", svm.Decide(new double[] { 70 })); // elder
```
Every Learn() method in the framework expects the class labels to be contiguous and zero-indexed: in a classification problem with n classes, all class labels must be integers ranging from 0 to n-1. However, not every dataset comes in this format, and sometimes we will have to pre-process the data first. The example below shows how to use the Codification class to perform such pre-processing.
```csharp
// Let's say we have the following data to be classified
// into three possible classes. Those are the samples:
//
double[][] inputs =
{
    //             input        output
    new double[] { 0, 1, 1, 0 }, //  0
    new double[] { 0, 1, 0, 0 }, //  0
    new double[] { 0, 0, 1, 0 }, //  0
    new double[] { 0, 1, 1, 0 }, //  0
    new double[] { 0, 1, 0, 0 }, //  0
    new double[] { 1, 0, 0, 0 }, //  1
    new double[] { 1, 0, 0, 0 }, //  1
    new double[] { 1, 0, 0, 1 }, //  1
    new double[] { 0, 0, 0, 1 }, //  1
    new double[] { 0, 0, 0, 1 }, //  1
    new double[] { 1, 1, 1, 1 }, //  2
    new double[] { 1, 0, 1, 1 }, //  2
    new double[] { 1, 1, 0, 1 }, //  2
    new double[] { 0, 1, 1, 1 }, //  2
    new double[] { 1, 1, 1, 1 }, //  2
};

// Now, suppose that our class labels are not contiguous. We
// have 3 classes, but they have the class labels 5, 1, and 8
// respectively. In this case, we can use a Codification filter
// to obtain a contiguous zero-indexed labeling before learning
int[] output_labels =
{
    5, 5, 5, 5, 5,
    1, 1, 1, 1, 1,
    8, 8, 8, 8, 8,
};

// Create a codification object to obtain the output mapping
var codebook = new Codification<int>().Learn(output_labels);

// Transform the original labels using the codebook
int[] outputs = codebook.Transform(output_labels);

// Create the multi-class learning algorithm for the machine
var teacher = new MulticlassSupportVectorLearning<Gaussian>()
{
    // Configure the learning algorithm to use SMO to train the
    // underlying SVMs in each of the binary class subproblems.
    Learner = (param) => new SequentialMinimalOptimization<Gaussian>()
    {
        // Estimate a suitable guess for the Gaussian kernel's parameters.
        // This estimate can serve as a starting point for a grid search.
        UseKernelEstimation = true
    }
};

// The following line is only needed to ensure reproducible results.
// Please remove it to enable full parallelization:
teacher.ParallelOptions.MaxDegreeOfParallelism = 1;

// Learn a machine
var machine = teacher.Learn(inputs, outputs);

// Obtain class predictions for each sample
int[] predicted = machine.Decide(inputs);

// Translate the integers back to the original labels
int[] predicted_labels = codebook.Revert(predicted);
```
The codification filter can also work with missing values. The example below shows how a codification codebook can be created from a dataset that includes missing values, and how this codebook can be used to replace missing values by some other representation (in the case below, replacing null values by NaN double numbers).
```csharp
// In this example, we will be using a modified version of the famous Play Tennis
// example by Tom Mitchell (1998), where some values have been replaced by missing
// values. We will use NaN double values to represent values missing from the data.

// Note: this example uses DataTables to represent the input data,
// but this is not required. The same could be performed using plain
// double[][] matrices and vectors instead.

DataTable data = new DataTable("Tennis Example with Missing Values");

data.Columns.Add("Day", typeof(string));
data.Columns.Add("Outlook", typeof(string));
data.Columns.Add("Temperature", typeof(string));
data.Columns.Add("Humidity", typeof(string));
data.Columns.Add("Wind", typeof(string));
data.Columns.Add("PlayTennis", typeof(string));

data.Rows.Add("D1", "Sunny", "Hot", "High", "Weak", "No");
data.Rows.Add("D2", null, "Hot", "High", "Strong", "No");
data.Rows.Add("D3", null, null, "High", null, "Yes");
data.Rows.Add("D4", "Rain", "Mild", "High", "Weak", "Yes");
data.Rows.Add("D5", "Rain", "Cool", null, "Weak", "Yes");
data.Rows.Add("D6", "Rain", "Cool", "Normal", "Strong", "No");
data.Rows.Add("D7", "Overcast", "Cool", "Normal", "Strong", "Yes");
data.Rows.Add("D8", null, "Mild", "High", null, "No");
data.Rows.Add("D9", null, "Cool", "Normal", "Weak", "Yes");
data.Rows.Add("D10", null, null, "Normal", null, "Yes");
data.Rows.Add("D11", null, "Mild", "Normal", null, "Yes");
data.Rows.Add("D12", "Overcast", "Mild", null, "Strong", "Yes");
data.Rows.Add("D13", "Overcast", "Hot", null, "Weak", "Yes");
data.Rows.Add("D14", "Rain", "Mild", "High", "Strong", "No");

// Create a new codification codebook to convert
// the strings above into numeric, integer labels:
var codebook = new Codification()
{
    DefaultMissingValueReplacement = Double.NaN
};

// Learn the codebook
codebook.Learn(data);

// Use the codebook to convert all the data
DataTable symbols = codebook.Apply(data);

// Grab the training input and output instances:
string[] inputNames = new[] { "Outlook", "Temperature", "Humidity", "Wind" };
double[][] inputs = symbols.ToJagged(inputNames);
int[] outputs = symbols.ToArray<int>("PlayTennis");

// Create a new learning algorithm
var teacher = new C45Learning()
{
    Attributes = DecisionVariable.FromCodebook(codebook, inputNames)
};

// Use the learning algorithm to induce a new tree:
DecisionTree tree = teacher.Learn(inputs, outputs);

// To get the estimated class labels, we can use
int[] predicted = tree.Decide(inputs);

// The classification error (~0.214) can be computed as
double error = new ZeroOneLoss(outputs).Loss(predicted);

// Moreover, we may decide to convert our tree to a set of rules:
DecisionSet rules = tree.ToRules();

// And using the codebook, we can inspect the tree reasoning:
string ruleText = rules.ToString(codebook, "PlayTennis",
    System.Globalization.CultureInfo.InvariantCulture);

// The output should be:
string expected = @"No =: (Outlook == Sunny)
No =: (Outlook == Rain) && (Wind == Strong)
Yes =: (Outlook == Overcast)
Yes =: (Outlook == Rain) && (Wind == Weak)
";
```
The codification filter also supports more advanced scenarios where it is necessary to use different categorical representations for different variables, such as one-hot vectors and categorical-with-baseline encodings, as shown in the example below:
```csharp
// This example downloads an example dataset from the web and learns a multinomial logistic
// regression on it. However, please keep in mind that the Multinomial Logistic Regression
// can also work without many of the elements that will be shown below, like the codebook,
// DataTables, and a CsvReader.

// Let's download an example dataset from the web to learn a multinomial logistic regression:
CsvReader reader = CsvReader.FromUrl(
    "https://raw.githubusercontent.com/rlowrance/re/master/hsbdemo.csv", hasHeaders: true);

// Let's read the CSV into a DataTable. As mentioned above, this step
// can help, but is not necessarily required for learning the model:
DataTable table = reader.ToTable();

// We will learn a MLR regression between the following input and output fields of this table:
string[] inputNames = new[] { "write", "ses" };
string[] outputNames = new[] { "prog" };

// Now let's create a codification codebook to convert the string fields in the data
// into integer symbols. This is required because the MLR model can only learn from
// numeric data, so strings have to be transformed first. We can force a particular
// interpretation for those columns if needed, as shown in the initializer below:
var codification = new Codification()
{
    { "write", CodificationVariable.Continuous },
    { "ses", CodificationVariable.CategoricalWithBaseline, new[] { "low", "middle", "high" } },
    { "prog", CodificationVariable.Categorical, new[] { "academic", "general" } },
};

// Learn the codification
codification.Learn(table);

// Now, transform symbols into a vector representation, growing the number of inputs:
double[][] x = codification.Transform(table, inputNames, out inputNames).ToDouble();
double[][] y = codification.Transform(table, outputNames, out outputNames).ToDouble();

// Create a new Multinomial Logistic Regression Analysis:
var analysis = new MultinomialLogisticRegressionAnalysis()
{
    InputNames = inputNames,
    OutputNames = outputNames,
};

// Learn the regression from the input and output pairs:
MultinomialLogisticRegression regression = analysis.Learn(x, y);

// Let's retrieve some information about what we just learned:
int coefficients = analysis.Coefficients.Count; // should be 9
int numberOfInputs = analysis.NumberOfInputs;   // should be 3
int numberOfOutputs = analysis.NumberOfOutputs; // should be 3

inputNames = analysis.InputNames;   // should be "write", "ses: middle", "ses: high"
outputNames = analysis.OutputNames; // should be "prog: academic", "prog: general", "prog: vocation"

// The regression is best visualized when it is data-bound to a
// Windows.Forms DataGridView or WPF DataGrid. You can get the
// values for all different coefficients and discrete values:
// DataGridBox.Show(regression.Coefficients); // uncomment this line

// You can get the matrix of coefficients:
double[][] coef = analysis.CoefficientValues;

// Should be equal to:
double[][] expectedCoef = new double[][]
{
    new double[] { 2.85217775752471, -0.0579282723520426, -0.533293368378012, -1.16283850605289 },
    new double[] { 5.21813357698422, -0.113601186660817, 0.291387041358367, -0.9826369387481 }
};

// And their associated standard errors:
double[][] stdErr = analysis.StandardErrors;

// Should be equal to:
double[][] expectedErr = new double[][]
{
    new double[] { -2.02458003380033, -0.339533576505471, -1.164084923948, -0.520961533343425, 0.0556314901718 },
    new double[] { -3.73971589217449, -1.47672790071382, -1.76795568348094, -0.495032307980058, 0.113563519656386 }
};

// We can also get statistics and hypothesis tests:
WaldTest[][] wald = analysis.WaldTests;        // should all have p < 0.05
ChiSquareTest chiSquare = analysis.ChiSquare;  // should be p=1.06300120956871E-08
double logLikelihood = analysis.LogLikelihood; // should be -179.98173272217591

// You can use the regression to predict the values:
int[] pred = regression.Transform(x);

// And get the accuracy of the prediction if needed:
var cm = GeneralConfusionMatrix.Estimate(regression, x, y.ArgMax(dimension: 1));
double acc = cm.Accuracy; // should be 0.61
double kappa = cm.Kappa;  // should be 0.2993487536492252
```
More examples of advanced scenarios, where the source dataset contains both symbolic and discrete/continuous variables, are shown below:
```csharp
// Let's say we would like to predict a continuous number from a set
// of discrete and continuous input variables. For this, we will
// be using the Servo dataset from UCI's Machine Learning repository
// as an example: http://archive.ics.uci.edu/ml/datasets/Servo

// Create a Servo dataset
Servo servo = new Servo();
object[][] instances = servo.Instances; // 167 x 4
double[] outputs = servo.Output;        // 167 x 1

// This dataset contains 4 columns, where the first two are
// symbolic (having possible values A, B, C, D, E), and the
// last two are continuous.

// We will use a codification filter to transform the symbolic
// variables into one-hot vectors, while keeping the other two
// continuous variables intact:
var codebook = new Codification<object>()
{
    { "motor", CodificationVariable.Categorical },
    { "screw", CodificationVariable.Categorical },
    { "pgain", CodificationVariable.Continuous },
    { "vgain", CodificationVariable.Continuous },
};

// Learn the codebook
codebook.Learn(instances);

// We can gather some info about the problem:
int numberOfInputs = codebook.NumberOfInputs;   // should be 4 (since there are 4 variables)
int numberOfOutputs = codebook.NumberOfOutputs; // should be 12 (due to their one-hot encodings)

// Now we can use it to obtain double[] vectors:
double[][] inputs = codebook.ToDouble().Transform(instances);

// We will use Ordinary Least Squares to create a
// linear regression model with an intercept term
var ols = new OrdinaryLeastSquares()
{
    UseIntercept = true
};

// Use Ordinary Least Squares to estimate a regression model:
MultipleLinearRegression regression = ols.Learn(inputs, outputs);

// We can compute the predicted points using:
double[] predicted = regression.Transform(inputs);

// And the squared error using the SquareLoss class:
double error = new SquareLoss(outputs).Loss(predicted);

// We can also compute other measures, such as the coefficient of determination r² using:
double r2 = new RSquaredLoss(numberOfOutputs, outputs).Loss(predicted); // should be 0.55086630162967354

// Or the adjusted or weighted versions of r² using:
var r2loss = new RSquaredLoss(numberOfOutputs, outputs)
{
    Adjust = true,
    // Weights = weights; // (uncomment if you have a weighted problem)
};

double ar2 = r2loss.Loss(predicted); // should be 0.51586887058782993

// Alternatively, we can also use the less generic, but maybe more user-friendly method directly:
double ur2 = regression.CoefficientOfDetermination(inputs, outputs, adjust: true); // should be 0.51586887058782993
```
```csharp
// Note: this example uses a System.Data.DataTable to represent input data,
// but note that this is not required. The data could have been represented
// as jagged double matrices (double[][]) directly.

// If you have to handle heterogeneous data in your application, such as user records
// in a database, this data is best represented within the framework using a .NET
// DataTable object. In order to try to learn a classification or regression model
// using this datatable, first we will need to convert the table into a representation
// that the machine learning model can understand. Such a representation is quite often
// a matrix of doubles (double[][]).
var data = new DataTable("Customer Revenue Example");

data.Columns.Add("Day", "CustomerId", "Time (hour)", "Weather", "Revenue");

data.Rows.Add("D1", 0, 8, "Sunny", 101.2);
data.Rows.Add("D2", 1, 10, "Sunny", 24.1);
data.Rows.Add("D3", 2, 10, "Rain", 107);
data.Rows.Add("D4", 3, 16, "Rain", 223);
data.Rows.Add("D5", 4, 15, "Rain", 1);
data.Rows.Add("D6", 5, 20, "Rain", 42);
data.Rows.Add("D7", 6, 12, "Cloudy", 123);
data.Rows.Add("D8", 7, 12, "Sunny", 64);

// One way to perform this conversion is by using a Codification filter. The Codification
// filter can take care of converting variables that actually denote symbols (i.e. the
// weather in the example above) into representations that make more sense given the
// assumption of a real vector-based classifier.

// Create a codification codebook
var codebook = new Codification()
{
    { "Weather", CodificationVariable.Categorical },
    { "Time (hour)", CodificationVariable.Continuous },
    { "Revenue", CodificationVariable.Continuous },
};

// Learn from the data
codebook.Learn(data);

// Now, we will use the codebook to transform the DataTable into double[][] vectors. Due to
// the way the conversion works, we can end up with more columns in the output vectors
// than the ones we started with. If you would like more details about what those columns
// represent, you can pass them as 'out' parameters in the methods that follow below.
string[] inputNames; // (note: if you do not want to run this example yourself, you
string outputName;   // can see below the new variable names that will be generated)

// Now, we can translate our training data into integer symbols using our codebook:
double[][] inputs = codebook.Apply(data, "Weather", "Time (hour)").ToJagged(out inputNames);
double[] outputs = codebook.Apply(data, "Revenue").ToVector(out outputName);
// (note: the Apply method transforms a DataTable into another DataTable containing the
// codified variables. The ToJagged and ToVector methods are then used to transform those
// tables into double[][] matrices and double[] vectors, respectively.)

// If we would like to learn a linear regression model for this data, there are two possible
// ways depending on which aspect of the linear regression we are interested in the most. If
// we are interested in interpreting the linear regression, performing hypothesis tests with
// the coefficients and performing an actual _linear regression analysis_, then we can use the
// MultipleLinearRegressionAnalysis class for this. If however we are only interested in using
// the learned model directly to predict new values for the dataset, then we could use the
// MultipleLinearRegression and OrdinaryLeastSquares classes directly instead.

// This example deals with the former case. For the latter, please see the documentation page
// for the MultipleLinearRegression class.

// We can create a new multiple linear analysis for the variables
var mlra = new MultipleLinearRegressionAnalysis(intercept: true)
{
    // We can also inform the names of the new variables that have been created by the
    // codification filter. Those can help in visualizing the analysis once it is
    // data-bound to a visual control such as a Windows.Forms.DataGridView or WPF DataGrid:
    Inputs = inputNames, // will be { "Weather: Sunny", "Weather: Rain", "Weather: Cloudy", "Time (hours)" }
    Output = outputName  // will be "Revenue"
};

// To overcome linear dependency errors
mlra.OrdinaryLeastSquares.IsRobust = true;

// Compute the analysis and obtain the estimated regression
MultipleLinearRegression regression = mlra.Learn(inputs, outputs);

// And then predict the label using
double predicted = mlra.Transform(inputs[0]); // result will be ~72.3

// Because we opted for doing a MultipleLinearRegressionAnalysis instead of a simple
// linear regression, we will have further information about the regression available:
int inputCount = mlra.NumberOfInputs;   // should be 4
int outputCount = mlra.NumberOfOutputs; // should be 1
double r2 = mlra.RSquared;              // should be 0.12801838425195311
AnovaSourceCollection a = mlra.Table;   // ANOVA table (bind to a visual control for quick inspection)
double[][] h = mlra.InformationMatrix;  // should contain Fisher's information matrix for the problem
ZTest z = mlra.ZTest;                   // should be 0 (p=0.999, non-significant)
```