Codification(T) Class

System.Object
  Accord.Statistics.Filters.BaseFilter<Codification<T>.Options, Codification<T>>
    Accord.Statistics.Filters.Codification<T>
      Accord.Statistics.Filters.Codification

[SerializableAttribute]
public class Codification<T> : BaseFilter<Codification<T>.Options, Codification<T>>, 
	ITransform<T[], double[]>, ICovariantTransform<T[], double[]>, 
	ITransform, IUnsupervisedLearning<Codification<T>, T[], double[]>, 
	ITransform<T[], int[]>, ICovariantTransform<T[], int[]>, IUnsupervisedLearning<Codification<T>, T[], int[]>

<SerializableAttribute>
Public Class Codification(Of T)
	Inherits BaseFilter(Of Codification(Of T).Options, Codification(Of T))
	Implements ITransform(Of T(), Double()), ICovariantTransform(Of T(), Double()), 
	ITransform, IUnsupervisedLearning(Of Codification(Of T), T(), Double()), 
	ITransform(Of T(), Integer()), ICovariantTransform(Of T(), Integer()), 
	IUnsupervisedLearning(Of Codification(Of T), T(), Integer())


Type Parameters

T: The object type that needs to be codified. Default is string.

Constructors

	Name	Description
	Codification<T>()	Creates a new Codification Filter.
	Codification<T>(DataTable)	Creates a new Codification Filter.
	Codification<T>(DataTable, String)	Creates a new Codification Filter.
	Codification<T>(String, T)	Creates a new Codification Filter.
	Codification<T>(String, T)	Creates a new Codification Filter.
	Codification<T>(String, T)	Creates a new Codification Filter.

Properties

	Name	Description
	Active	Gets or sets whether this filter is active. An inactive filter passes the input table through as output, unchanged. (Inherited from BaseFilter<TOptions, TFilter>.)
	Columns	Gets the collection of filter options. (Inherited from BaseFilter<TOptions, TFilter>.)
	DefaultMissingValueReplacement	Gets or sets the default value to be used as a replacement for missing values. Default is to use System.DBNull.Value.
	Item[Int32]	Gets options associated with a given variable (data column). (Inherited from BaseFilter<TOptions, TFilter>.)
	Item[String]	Gets options associated with a given variable (data column). (Inherited from BaseFilter<TOptions, TFilter>.)
	NumberOfInputs	Gets the number of inputs accepted by the model. (Inherited from BaseFilter<TOptions, TFilter>.)
	NumberOfOutputs	Gets the number of outputs generated by the model.
	Token	Gets or sets a cancellation token that can be used to stop the learning algorithm while it is running. (Inherited from BaseFilter<TOptions, TFilter>.)

Methods

	Name	Description
	Add(TOptions)	Adds a new column options definition to the collection. (Inherited from BaseFilter<TOptions, TFilter>.)
	Add(CodificationVariable)	Adds new column options to this filter's collection, specifying how a particular column should be processed by the filter.
	Add(String, CodificationVariable)	Adds new column options to this filter's collection, specifying how a particular column should be processed by the filter.
	Add(String, CodificationVariable, T)	Adds new column options to this filter's collection, specifying how a particular column should be processed by the filter.
	Add(String, CodificationVariable, T)	Adds new column options to this filter's collection, specifying how a particular column should be processed by the filter.
	Apply(DataTable)	Applies the Filter to a DataTable. (Inherited from BaseFilter<TOptions, TFilter>.)
	Apply(DataTable, String)	Applies the Filter to a DataTable. (Inherited from BaseFilter<TOptions, TFilter>.)
	Equals	Determines whether the specified object is equal to the current object. (Inherited from Object.)
	Finalize	Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection. (Inherited from Object.)
	GetEnumerator	Returns an enumerator that iterates through the collection. (Inherited from BaseFilter<TOptions, TFilter>.)
	GetHashCode	Serves as the default hash function. (Inherited from Object.)
	GetType	Gets the Type of the current instance. (Inherited from Object.)
	Learn(DataTable, Double)	Learns a model that can map the given inputs to the desired outputs.
	Learn(T, Double)	Learns a model that can map the given inputs to the desired outputs.
	Learn(T, Double)	Learns a model that can map the given inputs to the desired outputs.
	MemberwiseClone	Creates a shallow copy of the current Object. (Inherited from Object.)
	OnAddingOptions	Called when a new column options definition is being added. Can be used to validate or modify these options beforehand. (Overrides BaseFilter<TOptions, TFilter>.OnAddingOptions(TOptions).)
	ProcessFilter	Processes the current filter. (Overrides BaseFilter<TOptions, TFilter>.ProcessFilter(DataTable).)
	Revert(Int32)	Translates an integer (codeword) representation of the value of a given variable into its original value.
	Revert(String, Int32)	Translates an integer (codeword) representation of the value of a given variable into its original value.
	Revert(String, Int32)	Translates an integer (codeword) representation of the value of a given variable into its original value.
	Revert(String, Int32)	Translates the integer (codeword) representations of the values of the given variables into their original values.
	ToDouble	Converts this instance into a transform that can generate double[].
	ToString	Returns a string that represents the current object. (Inherited from Object.)
	Transform(T)	Translates an array of values into their integer representation, assuming values are given in original order of columns.
	Transform(T)	Translates the values of the given variables into their integer (codeword) representations.
	Transform(DataRow, String)	Translates an array of values into their integer representation, assuming values are given in original order of columns.
	Transform(DataTable, String)	Translates an array of values into their integer representation, assuming values are given in original order of columns.
	Transform(String, T)	Translates a value of a given variable into its integer (codeword) representation.
	Transform(String, T)	Translates the values of the given variables into their integer (codeword) representations.
	Transform(String, T)	Translates the values of the given variables into their integer (codeword) representations.
	Transform(String, T)	Translates the values of the given variables into their integer (codeword) representations.
	Transform(T, Double)	Applies the transformation to a set of input vectors, producing an associated set of output vectors.
	Transform(T, Int32)	Translates the values of the given variables into their integer (codeword) representations.
	Transform(DataRow, String, String)	Translates an array of values into their integer representation, assuming values are given in original order of columns.
	Transform(DataTable, String, String)	Translates an array of values into their integer representation, assuming values are given in original order of columns.

Extension Methods

	Name	Description
	HasMethod	Checks whether an object implements a method with the given name. (Defined by ExtensionMethods.)
	IsEqual	Compares two objects for equality, performing an elementwise comparison if the elements are vectors or matrices. (Defined by Matrix.)
	To(Type)	Overloaded. Converts an object into another type, irrespective of whether the conversion can be done at compile time or not. This can be used to convert generic types to numeric types during runtime. (Defined by ExtensionMethods.)
	To<T>	Overloaded. Converts an object into another type, irrespective of whether the conversion can be done at compile time or not. This can be used to convert generic types to numeric types during runtime. (Defined by ExtensionMethods.)

Remarks

The codification filter performs an integer codification of classes given in string form. A unique integer identifier is assigned to each of the string classes.
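
As a rough illustration of what "integer codification" means, the sketch below builds such a mapping by hand using only the .NET base class library. It is a conceptual sketch, not Accord.NET's implementation: the variable names (`toCode`, `toSymbol`) are hypothetical, and it assumes that codes are assigned in order of first appearance.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Conceptual sketch only: NOT Accord.NET's implementation.
// Assumption: codes are assigned in order of first appearance.
string[] labels = { "cat", "dog", "cat", "bird", "dog" };

// "Learn": build the symbol -> code table and its inverse.
var toCode = new Dictionary<string, int>();
var toSymbol = new List<string>();
foreach (string label in labels)
{
    if (!toCode.ContainsKey(label))
    {
        toCode[label] = toSymbol.Count; // next unused code
        toSymbol.Add(label);
    }
}

// "Transform": replace each label by its integer code.
int[] codes = labels.Select(l => toCode[l]).ToArray();

// "Revert": map the codes back to the original labels.
string[] reverted = codes.Select(c => toSymbol[c]).ToArray();

Console.WriteLine(string.Join(", ", codes));    // 0, 1, 0, 2, 1
Console.WriteLine(string.Join(", ", reverted)); // cat, dog, cat, bird, dog
```

The resulting codes are contiguous and zero-indexed, and the inverse table makes the mapping reversible, which is the role Revert plays in the filter.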

Every Learn() method in the framework expects class labels to be contiguous and zero-indexed: in a classification problem with n classes, all class labels must be integers ranging from 0 to n-1. Not every dataset comes in this format, so the data sometimes has to be pre-processed first. The example below shows how to use the Codification class to perform such pre-processing.


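using Accord.MachineLearning.VectorMachines.Learning;
using Accord.Statistics.Filters;
using Accord.Statistics.Kernels;

// Let's say we have the following data to be classified
// into three possible classes. Those are the samples:
// 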
double[][] inputs =
{
    //               input         output
    new double[] { 0, 1, 1, 0 }, //  0 
    new double[] { 0, 1, 0, 0 }, //  0
    new double[] { 0, 0, 1, 0 }, //  0
    new double[] { 0, 1, 1, 0 }, //  0
    new double[] { 0, 1, 0, 0 }, //  0
    new double[] { 1, 0, 0, 0 }, //  1
    new double[] { 1, 0, 0, 0 }, //  1
    new double[] { 1, 0, 0, 1 }, //  1
    new double[] { 0, 0, 0, 1 }, //  1
    new double[] { 0, 0, 0, 1 }, //  1
    new double[] { 1, 1, 1, 1 }, //  2
    new double[] { 1, 0, 1, 1 }, //  2
    new double[] { 1, 1, 0, 1 }, //  2
    new double[] { 0, 1, 1, 1 }, //  2
    new double[] { 1, 1, 1, 1 }, //  2
};

// Now, suppose that our class labels are not contiguous. We
// have 3 classes, but they have the class labels 5, 1, and 8
// respectively. In this case, we can use a Codification filter
// to obtain a contiguous zero-indexed labeling before learning
int[] output_labels =
{
    5, 5, 5, 5, 5,
    1, 1, 1, 1, 1,
    8, 8, 8, 8, 8,
};

// Create a codification object to obtain an output mapping
var codebook = new Codification<int>().Learn(output_labels);

// Transform the original labels using the codebook
int[] outputs = codebook.Transform(output_labels);

// Create the multi-class learning algorithm for the machine
var teacher = new MulticlassSupportVectorLearning<Gaussian>()
{
    // Configure the learning algorithm to use SMO to train the
    //  underlying SVMs in each of the binary class subproblems.
    Learner = (param) => new SequentialMinimalOptimization<Gaussian>()
    {
        // Estimate a suitable guess for the Gaussian kernel's parameters.
        // This estimate can serve as a starting point for a grid search.
        UseKernelEstimation = true
    }
};

// The following line is only needed to ensure reproducible results.
// Remove, comment out, or change it to enable full parallelism.
teacher.ParallelOptions.MaxDegreeOfParallelism = 1;

// Learn a machine
var machine = teacher.Learn(inputs, outputs);

// Obtain class predictions for each sample
int[] predicted = machine.Decide(inputs);

// Translate the integers back to the original labels
int[] predicted_labels = codebook.Revert(predicted);

Most classifiers in the framework also expect all input variables to be of the same nature, i.e. continuous. The codification filter can also convert discrete, categorical, ordinal, and baseline categorical variables into continuous vectors that can be fed to other machine learning algorithms, such as K-Means:


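using Accord.MachineLearning;
using Accord.Statistics.Filters;

// Fix the random seed so that the results are reproducible
Accord.Math.Random.Generator.Seed = 0;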

// Declare some mixed discrete and continuous observations
double[][] observations =
{
    //             (categorical) (discrete) (continuous)
    new double[] {       1,          -1,        -2.2      },
    new double[] {       1,          -6,        -5.5      },
    new double[] {       2,           1,         1.1      },
    new double[] {       2,           2,         1.2      },
    new double[] {       2,           2,         2.6      },
    new double[] {       3,           2,         1.4      },
    new double[] {       3,           4,         5.2      },
    new double[] {       1,           6,         5.1      },
    new double[] {       1,           6,         5.9      },
};

// Create a new codification algorithm to convert 
// the mixed variables above into all continuous:
var codification = new Codification<double>()
{
    CodificationVariable.Categorical,
    CodificationVariable.Discrete,
    CodificationVariable.Continuous
};

// Learn the codification from observations
var model = codification.Learn(observations);

// Transform the mixed observations into only continuous:
double[][] newObservations = model.ToDouble().Transform(observations);

// (newObservations will be equivalent to)
double[][] expected =
{
    //               (one hot)    (discrete)    (continuous)
    new double[] {    1, 0, 0,        -1,          -2.2      },
    new double[] {    1, 0, 0,        -6,          -5.5      },
    new double[] {    0, 1, 0,         1,           1.1      },
    new double[] {    0, 1, 0,         2,           1.2      },
    new double[] {    0, 1, 0,         2,           2.6      },
    new double[] {    0, 0, 1,         2,           1.4      },
    new double[] {    0, 0, 1,         4,           5.2      },
    new double[] {    1, 0, 0,         6,           5.1      },
    new double[] {    1, 0, 0,         6,           5.9      },
};

// Create a new K-Means algorithm
KMeans kmeans = new KMeans(k: 3);

// Compute and retrieve the data centroids
var clusters = kmeans.Learn(newObservations);

// Use the centroids to partition all the data
int[] labels = clusters.Decide(newObservations);
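
The one-hot expansion of the categorical column shown in the expected array above can be reproduced by hand with the .NET base class library. The following is a minimal conceptual sketch, not the filter's actual implementation; it assumes that one-hot positions follow the order in which each symbol first appears, and the names `symbols` and `oneHot` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Conceptual sketch only: NOT Accord.NET's implementation.
// Assumption: one-hot positions follow order of first appearance.
double[] categorical = { 1, 1, 2, 2, 2, 3, 3, 1, 1 };

// Collect the distinct symbols in order of first appearance.
var symbols = new List<double>();
foreach (double value in categorical)
{
    if (!symbols.Contains(value))
        symbols.Add(value);
}

// Expand each value into a vector with a single 1 at its symbol's index.
double[][] oneHot = categorical
    .Select(value =>
    {
        var row = new double[symbols.Count];
        row[symbols.IndexOf(value)] = 1;
        return row;
    })
    .ToArray();

foreach (double[] row in oneHot)
    Console.WriteLine(string.Join(", ", row));
```

Each categorical value becomes a vector as long as the number of distinct symbols, so distances computed by an algorithm such as K-Means no longer depend on the arbitrary numeric codes 1, 2, and 3.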

For more examples, please see the documentation page for the non-generic Codification filter.

Reference

Accord.Statistics.Filters Namespace

Accord.Statistics.Filters.Codification

Accord.Statistics.Filters.Discretization<TInput, TOutput>
