DecisionTree Class
Namespace: Accord.MachineLearning.DecisionTrees
The DecisionTree type exposes the following members.
Constructors

Name | Description
---|---
DecisionTree | Creates a new DecisionTree to process the given inputs and the given number of possible classes.
Properties

Name | Description
---|---
Attributes | Gets the collection of attributes processed by this tree.
InputCount | Obsolete. Please use the NumberOfInputs property.
NumberOfClasses | Gets the number of classes expected and recognized by the classifier. (Inherited from ClassifierBase<TInput, TClasses>.)
NumberOfInputs | Gets the number of inputs accepted by the model. (Inherited from TransformBase<TInput, TOutput>.)
NumberOfOutputs | Gets the number of outputs generated by the model. (Inherited from TransformBase<TInput, TOutput>.)
OutputClasses | Obsolete. Please use the NumberOfOutputs property instead.
Root | Gets or sets the root node for this tree.
Methods

Name | Description
---|---
Compute(Double) | Obsolete. Please use the Decide() method instead.
Compute(Int32) | Obsolete. Please use the Decide() method instead.
Compute(Int32) | Obsolete. Please use the Decide() method instead.
Compute(Double, DecisionNode) | Obsolete. Please use the Decide() method instead.
Decide(TInput) | Computes class-label decisions for a given set of input vectors. (Inherited from ClassifierBase<TInput, TClasses>.)
Decide(Int32) | Computes a class-label decision for a given input. (Inherited from MulticlassClassifierBase.)
Decide(Single) | Computes a class-label decision for a given input. (Inherited from MulticlassClassifierBase.)
Decide(Single) | Computes a class-label decision for a given input. (Inherited from MulticlassClassifierBase.)
Decide(Double) | Computes the tree decision for a given input. (Overrides ClassifierBase<TInput, TClasses>.Decide(TInput).)
Decide(Int32) | Computes the tree decision for a given input. (Overrides MulticlassClassifierBase.Decide(Int32).)
Decide(Nullable<Int32>) | Computes the tree decision for a given input.
Decide(Nullable<Int32>) | Computes the tree decision for a given input.
Decide(TInput, TClasses) | Computes a class-label decision for a given input. (Inherited from ClassifierBase<TInput, TClasses>.)
Decide(Int32, Boolean) | Computes class-label decisions for the given input. (Inherited from MulticlassClassifierBase.)
Decide(Int32, Double) | Computes class-label decisions for the given input. (Inherited from MulticlassClassifierBase.)
Decide(Int32, Int32) | Computes class-label decisions for the given input. (Inherited from MulticlassClassifierBase.)
Decide(Int32, Boolean) | Computes a class-label decision for a given input. (Inherited from MulticlassClassifierBase.)
Decide(Int32, Double) | Computes a class-label decision for a given input. (Inherited from MulticlassClassifierBase.)
Decide(Int32, Double) | Computes a class-label decision for a given input. (Inherited from MulticlassClassifierBase.)
Decide(Int32, Int32) | Computes a class-label decision for a given input. (Inherited from MulticlassClassifierBase.)
Decide(Int32, Int32) | Computes a class-label decision for a given input. (Inherited from MulticlassClassifierBase.)
Decide(Single, Boolean) | Computes class-label decisions for the given input. (Inherited from MulticlassClassifierBase.)
Decide(Single, Double) | Computes class-label decisions for the given input. (Inherited from MulticlassClassifierBase.)
Decide(Single, Int32) | Computes class-label decisions for the given input. (Inherited from MulticlassClassifierBase.)
Decide(Single, Boolean) | Computes a class-label decision for a given input. (Inherited from MulticlassClassifierBase.)
Decide(Single, Double) | Computes a class-label decision for a given input. (Inherited from MulticlassClassifierBase.)
Decide(Single, Double) | Computes a class-label decision for a given input. (Inherited from MulticlassClassifierBase.)
Decide(Single, Int32) | Computes a class-label decision for a given input. (Inherited from MulticlassClassifierBase.)
Decide(Single, Int32) | Computes a class-label decision for a given input. (Inherited from MulticlassClassifierBase.)
Decide(TInput, Boolean) | Computes class-label decisions for the given input. (Inherited from MulticlassClassifierBase<TInput>.)
Decide(TInput, Double) | Computes class-label decisions for the given input. (Inherited from MulticlassClassifierBase<TInput>.)
Decide(TInput, Int32) | Computes class-label decisions for the given input. (Inherited from MulticlassClassifierBase<TInput>.)
Decide(TInput, Double) | Computes a class-label decision for a given input. (Inherited from MulticlassClassifierBase<TInput>.)
Decide(Double, DecisionNode) | Computes the tree decision for a given input.
Decide(Nullable<Int32>, DecisionNode) | Computes the tree decision for a given input.
Decide(Nullable<Int32>, Int32) | Computes the tree decision for a given input.
Equals | Determines whether the specified object is equal to the current object. (Inherited from Object.)
Finalize | Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection. (Inherited from Object.)
GetEnumerator | Returns an enumerator that iterates through the tree.
GetHashCode | Serves as the default hash function. (Inherited from Object.)
GetHeight | Computes the height of the tree, defined as the greatest distance (in links) between the tree's root node and its leaves.
GetType | Gets the Type of the current instance. (Inherited from Object.)
Load(Stream) | Obsolete. Please use Load<T>(Stream, SerializerCompression).
Load(String) | Obsolete. Please use Load<T>(String).
MemberwiseClone | Creates a shallow copy of the current Object. (Inherited from Object.)
Save(Stream) | Obsolete. Please use Save<T>(T, Stream, SerializerCompression) (or use it as an extension method).
Save(String) | Obsolete. Please use Save<T>(T, String) (or use it as an extension method).
ToAssembly(String, String) | Creates a .NET assembly (.dll) containing a static class of the given name implementing the decision tree. The class will contain a single static Compute method implementing the tree.
ToAssembly(String, String, String) | Creates a .NET assembly (.dll) containing a static class of the given name implementing the decision tree. The class will contain a single static Compute method implementing the tree.
ToCode(String) | Generates a C# class implementing the decision tree.
ToCode(TextWriter, String) | Generates a C# class implementing the decision tree.
ToExpression | Creates an Expression Tree representation of this decision tree, which can in turn be compiled into code.
ToMultilabel | Views this instance as a multi-label classifier, giving access to more advanced methods, such as the prediction of one-hot vectors. (Inherited from MulticlassClassifierBase<TInput>.)
ToRules | Transforms the tree into a set of decision rules.
ToString | Returns a string that represents the current object. (Inherited from Object.)
Transform(TInput) | Applies the transformation to an input, producing an associated output. (Inherited from ClassifierBase<TInput, TClasses>.)
Transform(Int32) | Applies the transformation to an input, producing an associated output. (Inherited from MulticlassClassifierBase.)
Transform(Single) | Applies the transformation to an input, producing an associated output. (Inherited from MulticlassClassifierBase.)
Transform(Single) | Applies the transformation to an input, producing an associated output. (Inherited from MulticlassClassifierBase.)
Transform(TInput) | Applies the transformation to a set of input vectors, producing an associated set of output vectors. (Inherited from TransformBase<TInput, TOutput>.)
Transform(TInput, TClasses) | Applies the transformation to an input, producing an associated output. (Inherited from ClassifierBase<TInput, TClasses>.)
Transform(Int32, Double) | Applies the transformation to an input, producing an associated output. (Inherited from MulticlassClassifierBase.)
Transform(Int32, Int32) | Applies the transformation to an input, producing an associated output. (Inherited from MulticlassClassifierBase.)
Transform(Int32, Boolean) | Applies the transformation to an input, producing an associated output. (Inherited from MulticlassClassifierBase.)
Transform(Int32, Double) | Applies the transformation to an input, producing an associated output. (Inherited from MulticlassClassifierBase.)
Transform(Int32, Double) | Applies the transformation to an input, producing an associated output. (Inherited from MulticlassClassifierBase.)
Transform(Int32, Int32) | Applies the transformation to an input, producing an associated output. (Inherited from MulticlassClassifierBase.)
Transform(Int32, Int32) | Applies the transformation to an input, producing an associated output. (Inherited from MulticlassClassifierBase.)
Transform(Single, Boolean) | Applies the transformation to an input, producing an associated output. (Inherited from MulticlassClassifierBase.)
Transform(Single, Double) | Applies the transformation to an input, producing an associated output. (Inherited from MulticlassClassifierBase.)
Transform(Single, Int32) | Applies the transformation to an input, producing an associated output. (Inherited from MulticlassClassifierBase.)
Transform(Single, Int32) | Applies the transformation to an input, producing an associated output. (Inherited from MulticlassClassifierBase.)
Transform(Single, Boolean) | Applies the transformation to an input, producing an associated output. (Inherited from MulticlassClassifierBase.)
Transform(Single, Double) | Applies the transformation to an input, producing an associated output. (Inherited from MulticlassClassifierBase.)
Transform(Single, Double) | Applies the transformation to an input, producing an associated output. (Inherited from MulticlassClassifierBase.)
Transform(Single, Int32) | Applies the transformation to an input, producing an associated output. (Inherited from MulticlassClassifierBase.)
Transform(Single, Int32) | Applies the transformation to an input, producing an associated output. (Inherited from MulticlassClassifierBase.)
Transform(TInput, Boolean) | Applies the transformation to an input, producing an associated output. (Inherited from MulticlassClassifierBase<TInput>.)
Transform(TInput, Double) | Applies the transformation to an input, producing an associated output. (Inherited from MulticlassClassifierBase<TInput>.)
Transform(TInput, Int32) | Applies the transformation to an input, producing an associated output. (Inherited from MulticlassClassifierBase<TInput>.)
Transform(TInput, Boolean) | Applies the transformation to an input, producing an associated output. (Inherited from MulticlassClassifierBase<TInput>.)
Transform(TInput, Double) | Applies the transformation to an input, producing an associated output. (Inherited from MulticlassClassifierBase<TInput>.)
Transform(TInput, Double) | Applies the transformation to an input, producing an associated output. (Inherited from MulticlassClassifierBase<TInput>.)
Transform(TInput, Int32) | Applies the transformation to an input, producing an associated output. (Inherited from MulticlassClassifierBase<TInput>.)
Traverse(DecisionTreeTraversalMethod) | Traverses the tree using a tree traversal method. Can be iterated with a foreach loop.
Traverse(DecisionTreeTraversalMethod, DecisionNode) | Traverses a subtree using a tree traversal method. Can be iterated with a foreach loop.
Extension Methods

Name | Description
---|---
HasMethod | Checks whether an object implements a method with the given name. (Defined by ExtensionMethods.)
IsEqual | Compares two objects for equality, performing an elementwise comparison if the elements are vectors or matrices. (Defined by Matrix.)
SetEquals<DecisionNode> | Compares two enumerables for set equality. Two enumerables are set equal if they contain the same elements, but not necessarily in the same order. (Defined by Matrix.)
To(Type) | Overloaded. Converts an object into another type, irrespective of whether the conversion can be done at compile time or not. This can be used to convert generic types to numeric types during runtime. (Defined by ExtensionMethods.)
To<T> | Overloaded. Converts an object into another type, irrespective of whether the conversion can be done at compile time or not. This can be used to convert generic types to numeric types during runtime. (Defined by ExtensionMethods.)
Remarks

Represents a decision tree which can be compiled to code at run-time. For sample usage and examples of learning, please see the documentation pages for the ID3 and C4.5 learning algorithms.
It is also possible to create random forests using the random forest learning algorithm.
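As a brief illustration of that possibility, the sketch below trains a small random forest with the RandomForestLearning class. This is a minimal, hedged example rather than part of the original page: the data and the NumberOfTrees value are made up for illustration only.

```csharp
// A minimal sketch of random forest learning; the data below is illustrative only.
double[][] inputs =
{
    new double[] { 0, 0 },
    new double[] { 0, 1 },
    new double[] { 1, 0 },
    new double[] { 1, 1 },
};

int[] outputs = { 0, 1, 1, 0 };

// Create a random forest teacher (NumberOfTrees chosen arbitrarily here):
var teacher = new RandomForestLearning()
{
    NumberOfTrees = 10
};

// Learn a forest of decision trees from the data:
RandomForest forest = teacher.Learn(inputs, outputs);

// The forest can be queried just like a single tree:
int[] predicted = forest.Decide(inputs);
```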
This example shows the simplest way to induce a decision tree with discrete variables.
```csharp
// In this example, we will learn a decision tree directly from integer
// matrices that define the inputs and outputs of our learning problem.

int[][] inputs =
{
    new int[] { 0, 0 },
    new int[] { 0, 1 },
    new int[] { 1, 0 },
    new int[] { 1, 1 },
};

int[] outputs = // xor between inputs[0] and inputs[1]
{
    0, 1, 1, 0
};

// Create an ID3 learning algorithm
ID3Learning teacher = new ID3Learning();

// Learn a decision tree for the XOR problem
var tree = teacher.Learn(inputs, outputs);

// Compute the error in the learning
double error = new ZeroOneLoss(outputs).Loss(tree.Decide(inputs));

// The tree can now be queried for new examples:
int[] predicted = tree.Decide(inputs); // should be { 0, 1, 1, 0 }
```
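Once a tree has been learned as above, its structure can be inspected through the members listed earlier (GetHeight, GetEnumerator, ToRules). The following is a minimal sketch building on the tree variable from the previous example; the DecisionNode member names IsLeaf and the use of plain ToString on the rule set are assumptions, not taken from this page.

```csharp
// A minimal sketch (assumptions noted above): inspecting the tree learned in the previous example.

// Height of the learned tree (greatest distance, in links, from the root to a leaf):
int height = tree.GetHeight();

// The tree is enumerable, so its nodes can be visited with a plain foreach:
int leaves = 0;
foreach (DecisionNode node in tree)
{
    if (node.IsLeaf) // assumption: leaf nodes are flagged by an IsLeaf member
        leaves++;    // count terminal decisions
}

// The tree can also be flattened into a set of decision rules:
DecisionSet rules = tree.ToRules();
string text = rules.ToString(); // human-readable form of the rules (assumption)
```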
This example shows a common textbook example, and how to induce a decision tree using a codebook to convert string (text) variables into discrete symbols.
```csharp
// In this example, we will be using the famous Play Tennis example by Tom Mitchell (1998).
// In Mitchell's example, one would like to infer if a person would play tennis or not
// based solely on four input variables. Those variables are all categorical, meaning that
// there is no order between the possible values for the variable (i.e. there is no order
// relationship between Sunny and Rain; one is not bigger nor smaller than the other, they are
// just distinct). Moreover, the rows, or instances, presented below represent days on which the
// behavior of the person has been registered and annotated, pretty much building our set of
// observation instances for learning:

// Note: this example uses DataTables to represent the input data, but this is not required.
DataTable data = new DataTable("Mitchell's Tennis Example");

data.Columns.Add("Day", "Outlook", "Temperature", "Humidity", "Wind", "PlayTennis");

data.Rows.Add("D1", "Sunny", "Hot", "High", "Weak", "No");
data.Rows.Add("D2", "Sunny", "Hot", "High", "Strong", "No");
data.Rows.Add("D3", "Overcast", "Hot", "High", "Weak", "Yes");
data.Rows.Add("D4", "Rain", "Mild", "High", "Weak", "Yes");
data.Rows.Add("D5", "Rain", "Cool", "Normal", "Weak", "Yes");
data.Rows.Add("D6", "Rain", "Cool", "Normal", "Strong", "No");
data.Rows.Add("D7", "Overcast", "Cool", "Normal", "Strong", "Yes");
data.Rows.Add("D8", "Sunny", "Mild", "High", "Weak", "No");
data.Rows.Add("D9", "Sunny", "Cool", "Normal", "Weak", "Yes");
data.Rows.Add("D10", "Rain", "Mild", "Normal", "Weak", "Yes");
data.Rows.Add("D11", "Sunny", "Mild", "Normal", "Strong", "Yes");
data.Rows.Add("D12", "Overcast", "Mild", "High", "Strong", "Yes");
data.Rows.Add("D13", "Overcast", "Hot", "Normal", "Weak", "Yes");
data.Rows.Add("D14", "Rain", "Mild", "High", "Strong", "No");

// In order to learn a decision tree, we will first convert this problem to a simpler
// representation. Since all variables are categories, it does not matter if they are represented
// as strings or numbers, since both are just symbols for the event they represent. Since numbers
// are more easily representable than text strings, we will convert the problem to a discrete
// alphabet through the use of an Accord.Statistics.Filters.Codification codebook.

// A codebook effectively transforms any distinct possible value for a variable into an integer
// symbol. For example, "Sunny" could as well be represented by the integer label 0, "Overcast"
// by 1, "Rain" by 2, and the same goes for the other variables. So:

// Create a new codification codebook to
// convert strings into integer symbols
var codebook = new Codification(data);

// Translate our training data into integer symbols using our codebook:
DataTable symbols = codebook.Apply(data);
int[][] inputs = symbols.ToArray<int>("Outlook", "Temperature", "Humidity", "Wind");
int[] outputs = symbols.ToArray<int>("PlayTennis");

// For this task, in which we have only categorical variables, the simplest choice
// to induce a decision tree is to use the ID3 algorithm by Quinlan. Let's do it:

// Create a teacher ID3 algorithm
var id3learning = new ID3Learning()
{
    // Now that we already have our learning input/output pairs, we should specify our
    // decision tree. We will be trying to build a tree to predict the last column, entitled
    // "PlayTennis". For this, we will be using the "Outlook", "Temperature", "Humidity" and
    // "Wind" columns as predictors (variables which we will use for our decision). Since those
    // are categorical, we must specify, at the moment of creation of our tree, the
    // characteristics of each of those variables. So:

    new DecisionVariable("Outlook",     3), // 3 possible values (Sunny, overcast, rain)
    new DecisionVariable("Temperature", 3), // 3 possible values (Hot, mild, cool)
    new DecisionVariable("Humidity",    2), // 2 possible values (High, normal)
    new DecisionVariable("Wind",        2)  // 2 possible values (Weak, strong)

    // Note: It is also possible to create a DecisionVariable[] from a codebook:
    // DecisionVariable[] attributes = DecisionVariable.FromCodebook(codebook);
};

// Learn the training instances!
DecisionTree tree = id3learning.Learn(inputs, outputs);

// Compute the training error when predicting training instances
double error = new ZeroOneLoss(outputs).Loss(tree.Decide(inputs));

// The tree can now be queried for new examples through
// its Decide method. For example, we can create a query
int[] query = codebook.Transform(new[,]
{
    { "Outlook",     "Sunny"  },
    { "Temperature", "Hot"    },
    { "Humidity",    "High"   },
    { "Wind",        "Strong" }
});

// And then predict the label using
int predicted = tree.Decide(query); // result will be 0

// We can translate it back to strings using
string answer = codebook.Revert("PlayTennis", predicted); // answer will be: "No"
```
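After learning, the tree can also be exported or persisted using the members listed above. The snippet below is a hedged sketch, not taken from the original page: it assumes ToCode(String) returns the generated C# source as a string, relies on Accord.IO.Serializer for saving and loading, and uses the purely illustrative file name "tree.bin".

```csharp
// A hedged sketch of exporting and persisting the tree learned above.

// Generate a C# class implementing the decision tree
// (assumption: ToCode(String) returns the generated source as a string):
string code = tree.ToCode("PlayTennisTree");

// Persist the learned tree to disk and load it back using Accord.IO.Serializer
// (the file name "tree.bin" is only illustrative):
Accord.IO.Serializer.Save(tree, "tree.bin");
DecisionTree loaded = Accord.IO.Serializer.Load<DecisionTree>("tree.bin");
```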
For more examples with discrete variables, please see the ID3Learning documentation page.
This example shows the simplest way to induce a decision tree with continuous variables.
```csharp
// In this example, we will process the famous Fisher's Iris dataset, in
// which the task is to classify whether the features of an Iris flower
// belong to an Iris setosa, an Iris versicolor, or an Iris virginica:
//
//  - https://en.wikipedia.org/wiki/Iris_flower_data_set
//

// First, let's load the dataset into an array of text that we can process
string[][] text = Resources.iris_data.Split(new[] { "\r\n" },
    StringSplitOptions.RemoveEmptyEntries).Apply(x => x.Split(','));

// The first four columns contain the flower features
double[][] inputs = text.GetColumns(0, 1, 2, 3).To<double[][]>();

// The last column contains the expected flower type
string[] labels = text.GetColumn(4);

// Since the labels are represented as text, the first step is to convert
// those text labels into integer class labels, so we can process them
// more easily. For this, we will create a codebook to encode class labels:
//
var codebook = new Codification("Output", labels);

// With the codebook, we can convert the labels:
int[] outputs = codebook.Translate("Output", labels);

// And we can use C4.5 for learning:
C45Learning teacher = new C45Learning();

// Finally, induce the tree from the data:
var tree = teacher.Learn(inputs, outputs);

// To get the estimated class labels, we can use
int[] predicted = tree.Decide(inputs);

// The classification error (0.0266) can be computed as
double error = new ZeroOneLoss(outputs).Loss(predicted);

// Moreover, we may decide to convert our tree to a set of rules:
DecisionSet rules = tree.ToRules();

// And using the codebook, we can inspect the tree reasoning:
string ruleText = rules.ToString(codebook, "Output",
    System.Globalization.CultureInfo.InvariantCulture);

// The output is:
string expected = @"Iris-setosa =: (2 <= 2.45)
Iris-versicolor =: (2 > 2.45) && (3 <= 1.75) && (0 <= 7.05) && (1 <= 2.85)
Iris-versicolor =: (2 > 2.45) && (3 <= 1.75) && (0 <= 7.05) && (1 > 2.85)
Iris-versicolor =: (2 > 2.45) && (3 > 1.75) && (0 <= 5.95) && (1 > 3.05)
Iris-virginica =: (2 > 2.45) && (3 <= 1.75) && (0 > 7.05)
Iris-virginica =: (2 > 2.45) && (3 > 1.75) && (0 > 5.95)
Iris-virginica =: (2 > 2.45) && (3 > 1.75) && (0 <= 5.95) && (1 <= 3.05)
";
```
For more examples with continuous variables, please see the C45Learning documentation page.
The next example shows how to estimate the true performance of a decision tree model using cross-validation:
```csharp
// Ensure we have reproducible results
Accord.Math.Random.Generator.Seed = 0;

// Get some data to be learned. We will be using the Wisconsin
// (Diagnostic) Breast Cancer dataset, where the goal is to determine
// whether the characteristics extracted from a breast cancer exam
// correspond to a malignant or benign type of cancer:
var data = new WisconsinDiagnosticBreastCancer();
double[][] input = data.Features; // 569 samples, 30-dimensional features
int[] output = data.ClassLabels;  // 569 samples, 2 different class labels

// Let's say we want to measure the cross-validation performance of
// a decision tree with a maximum tree height of 5 and where variables
// are able to join the decision path at most 2 times during evaluation:
var cv = CrossValidation.Create(

    k: 10, // We will be using 10-fold cross validation

    learner: (p) => new C45Learning() // here we create the learning algorithm
    {
        Join = 2,
        MaxHeight = 5
    },

    // Now we have to specify how the tree performance should be measured:
    loss: (actual, expected, p) => new ZeroOneLoss(expected).Loss(actual),

    // This function can be used to perform any special
    // operations before the actual learning is done, but
    // here we will just leave it as simple as it can be:
    fit: (teacher, x, y, w) => teacher.Learn(x, y, w),

    // Finally, we have to pass the input and output data
    // that will be used in cross-validation.
    x: input, y: output
);

// After the cross-validation object has been created,
// we can call its .Learn method with the input and
// output data that will be partitioned into the folds:
var result = cv.Learn(input, output);

// We can grab some information about the problem:
int numberOfSamples = result.NumberOfSamples; // should be 569
int numberOfInputs = result.NumberOfInputs;   // should be 30
int numberOfOutputs = result.NumberOfOutputs; // should be 2

double trainingError = result.Training.Mean;     // should be 0.017771153143274855
double validationError = result.Validation.Mean; // should be 0.0755952380952381

// If desired, compute an aggregate confusion matrix for the validation sets:
GeneralConfusionMatrix gcm = result.ToConfusionMatrix(input, output);
double accuracy = gcm.Accuracy; // result should be 0.92442882249560632
```