
DecisionTree Class

Decision tree (for both discrete and continuous classification problems).
Inheritance Hierarchy
System.Object
  Accord.MachineLearning.TransformBase<Double[], Int32>
    Accord.MachineLearning.ClassifierBase<Double[], Int32>
      Accord.MachineLearning.MulticlassClassifierBase<Double[]>
        Accord.MachineLearning.MulticlassClassifierBase
          Accord.MachineLearning.DecisionTrees.DecisionTree

Namespace:  Accord.MachineLearning.DecisionTrees
Assembly:  Accord.MachineLearning (in Accord.MachineLearning.dll) Version: 3.8.0
Syntax
[SerializableAttribute]
public class DecisionTree : MulticlassClassifierBase, 
	IEnumerable

The DecisionTree type exposes the following members.

Constructors
  Name | Description
Public method DecisionTree
Creates a new DecisionTree to process the given inputs and the given number of possible classes.
Top
Properties
  Name | Description
Public property Attributes
Gets the collection of attributes processed by this tree.
Public property InputCount (Obsolete)
Deprecated. Please use the NumberOfInputs property instead.
Public property NumberOfClasses
Gets the number of classes expected and recognized by the classifier.
(Inherited from ClassifierBase<TInput, TClasses>.)
Public property NumberOfInputs
Gets the number of inputs accepted by the model.
(Inherited from TransformBase<TInput, TOutput>.)
Public property NumberOfOutputs
Gets the number of outputs generated by the model.
(Inherited from TransformBase<TInput, TOutput>.)
Public property OutputClasses (Obsolete)
Deprecated. Please use the NumberOfOutputs property instead.
Public property Root
Gets or sets the root node for this tree.
Top
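Since Root is the entry point for every decision, classification can be pictured as a walk from Root down to a leaf. The following self-contained sketch is plain C# with a simplified stand-in node type, not the actual Accord DecisionNode API, and assumes binary numeric splits:

```csharp
using System;

// Simplified stand-in for a decision node (illustration only; the real
// Accord.MachineLearning.DecisionTrees.DecisionNode is richer than this).
class Node
{
    public int Column;        // index of the input variable tested at this node
    public double Threshold;  // split value for that variable
    public Node Left, Right;  // children; both null at a leaf
    public int Output;        // class label stored at a leaf

    public bool IsLeaf { get { return Left == null && Right == null; } }
}

static class RootWalk
{
    // Follow comparisons from the root down to a leaf, which is essentially
    // what DecisionTree.Decide(double[]) does starting from the Root property.
    public static int Decide(Node root, double[] input)
    {
        Node current = root;
        while (!current.IsLeaf)
            current = input[current.Column] <= current.Threshold
                ? current.Left
                : current.Right;
        return current.Output;
    }
}
```

For example, a single split on column 0 at threshold 0.5 sends inputs with input[0] <= 0.5 to the left leaf and all others to the right.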
Methods
  Name | Description
Public method Compute(Double) (Obsolete)
Deprecated. Please use the Decide() method instead.
Public method Compute(Int32) (Obsolete)
Deprecated. Please use the Decide() method instead.
Public method Compute(Int32) (Obsolete)
Deprecated. Please use the Decide() method instead.
Public method Compute(Double, DecisionNode) (Obsolete)
Deprecated. Please use the Decide() method instead.
Public method Decide(TInput)
Computes class-label decisions for a given set of input vectors.
(Inherited from ClassifierBase<TInput, TClasses>.)
Public method Decide(Int32)
Computes a class-label decision for a given input.
(Inherited from MulticlassClassifierBase.)
Public method Decide(Single)
Computes a class-label decision for a given input.
(Inherited from MulticlassClassifierBase.)
Public method Decide(Single)
Computes a class-label decision for a given input.
(Inherited from MulticlassClassifierBase.)
Public method Decide(Double)
Computes the tree decision for a given input.
(Overrides ClassifierBase<TInput, TClasses>.Decide(TInput).)
Public method Decide(Int32)
Computes the tree decision for a given input.
(Overrides MulticlassClassifierBase.Decide(Int32).)
Public method Decide(Nullable<Int32>)
Computes the tree decision for a given input.
Public method Decide(Nullable<Int32>)
Computes the tree decision for a given input.
Public method Decide(TInput, TClasses)
Computes a class-label decision for a given input.
(Inherited from ClassifierBase<TInput, TClasses>.)
Public method Decide(Int32, Boolean)
Computes class-label decisions for the given input.
(Inherited from MulticlassClassifierBase.)
Public method Decide(Int32, Double)
Computes class-label decisions for the given input.
(Inherited from MulticlassClassifierBase.)
Public method Decide(Int32, Int32)
Computes class-label decisions for the given input.
(Inherited from MulticlassClassifierBase.)
Public method Decide(Int32, Boolean)
Computes a class-label decision for a given input.
(Inherited from MulticlassClassifierBase.)
Public method Decide(Int32, Double)
Computes a class-label decision for a given input.
(Inherited from MulticlassClassifierBase.)
Public method Decide(Int32, Double)
Computes a class-label decision for a given input.
(Inherited from MulticlassClassifierBase.)
Public method Decide(Int32, Int32)
Computes a class-label decision for a given input.
(Inherited from MulticlassClassifierBase.)
Public method Decide(Int32, Int32)
Computes a class-label decision for a given input.
(Inherited from MulticlassClassifierBase.)
Public method Decide(Single, Boolean)
Computes class-label decisions for the given input.
(Inherited from MulticlassClassifierBase.)
Public method Decide(Single, Double)
Computes class-label decisions for the given input.
(Inherited from MulticlassClassifierBase.)
Public method Decide(Single, Int32)
Computes class-label decisions for the given input.
(Inherited from MulticlassClassifierBase.)
Public method Decide(Single, Boolean)
Computes a class-label decision for a given input.
(Inherited from MulticlassClassifierBase.)
Public method Decide(Single, Double)
Computes a class-label decision for a given input.
(Inherited from MulticlassClassifierBase.)
Public method Decide(Single, Double)
Computes a class-label decision for a given input.
(Inherited from MulticlassClassifierBase.)
Public method Decide(Single, Int32)
Computes a class-label decision for a given input.
(Inherited from MulticlassClassifierBase.)
Public method Decide(Single, Int32)
Computes a class-label decision for a given input.
(Inherited from MulticlassClassifierBase.)
Public method Decide(TInput, Boolean)
Computes class-label decisions for the given input.
(Inherited from MulticlassClassifierBase<TInput>.)
Public method Decide(TInput, Double)
Computes class-label decisions for the given input.
(Inherited from MulticlassClassifierBase<TInput>.)
Public method Decide(TInput, Int32)
Computes class-label decisions for the given input.
(Inherited from MulticlassClassifierBase<TInput>.)
Public method Decide(TInput, Double)
Computes a class-label decision for a given input.
(Inherited from MulticlassClassifierBase<TInput>.)
Public method Decide(Double, DecisionNode)
Computes the tree decision for a given input.
Public method Decide(Nullable<Int32>, DecisionNode)
Computes the tree decision for a given input.
Public method Decide(Nullable<Int32>, Int32)
Computes the tree decision for a given input.
Public method Equals
Determines whether the specified object is equal to the current object.
(Inherited from Object.)
Protected method Finalize
Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection.
(Inherited from Object.)
Public method GetEnumerator
Returns an enumerator that iterates through the tree.
Public method GetHashCode
Serves as the default hash function.
(Inherited from Object.)
Public method GetHeight
Computes the height of the tree, defined as the greatest distance (in links) between the tree's root node and its leaves.
Public method GetType
Gets the Type of the current instance.
(Inherited from Object.)
Public static method Load(Stream) (Obsolete)
Public static method Load(String) (Obsolete)
Protected method MemberwiseClone
Creates a shallow copy of the current Object.
(Inherited from Object.)
Public method Save(Stream) (Obsolete)
Public method Save(String) (Obsolete)
Public method ToAssembly(String, String)
Creates a .NET assembly (.dll) containing a static class of the given name implementing the decision tree. The class will contain a single static Compute method implementing the tree.
Public method ToAssembly(String, String, String)
Creates a .NET assembly (.dll) containing a static class of the given name implementing the decision tree. The class will contain a single static Compute method implementing the tree.
Public method ToCode(String)
Generates a C# class implementing the decision tree.
Public method ToCode(TextWriter, String)
Generates a C# class implementing the decision tree.
Public method ToExpression
Creates an Expression Tree representation of this decision tree, which can in turn be compiled into code.
Public method ToMultilabel
Views this instance as a multi-label classifier, giving access to more advanced methods, such as the prediction of one-hot vectors.
(Inherited from MulticlassClassifierBase<TInput>.)
Public method ToRules
Transforms the tree into a set of decision rules.
Public method ToString
Returns a string that represents the current object.
(Inherited from Object.)
Public method Transform(TInput)
Applies the transformation to an input, producing an associated output.
(Inherited from ClassifierBase<TInput, TClasses>.)
Public method Transform(Int32)
Applies the transformation to an input, producing an associated output.
(Inherited from MulticlassClassifierBase.)
Public method Transform(Single)
Applies the transformation to an input, producing an associated output.
(Inherited from MulticlassClassifierBase.)
Public method Transform(Single)
Applies the transformation to an input, producing an associated output.
(Inherited from MulticlassClassifierBase.)
Public method Transform(TInput)
Applies the transformation to a set of input vectors, producing an associated set of output vectors.
(Inherited from TransformBase<TInput, TOutput>.)
Public method Transform(TInput, TClasses)
Applies the transformation to an input, producing an associated output.
(Inherited from ClassifierBase<TInput, TClasses>.)
Public method Transform(Int32, Double)
Applies the transformation to an input, producing an associated output.
(Inherited from MulticlassClassifierBase.)
Public method Transform(Int32, Int32)
Applies the transformation to an input, producing an associated output.
(Inherited from MulticlassClassifierBase.)
Public method Transform(Int32, Boolean)
Applies the transformation to an input, producing an associated output.
(Inherited from MulticlassClassifierBase.)
Public method Transform(Int32, Double)
Applies the transformation to an input, producing an associated output.
(Inherited from MulticlassClassifierBase.)
Public method Transform(Int32, Double)
Applies the transformation to an input, producing an associated output.
(Inherited from MulticlassClassifierBase.)
Public method Transform(Int32, Int32)
Applies the transformation to an input, producing an associated output.
(Inherited from MulticlassClassifierBase.)
Public method Transform(Int32, Int32)
Applies the transformation to an input, producing an associated output.
(Inherited from MulticlassClassifierBase.)
Public method Transform(Single, Boolean)
Applies the transformation to an input, producing an associated output.
(Inherited from MulticlassClassifierBase.)
Public method Transform(Single, Double)
Applies the transformation to an input, producing an associated output.
(Inherited from MulticlassClassifierBase.)
Public method Transform(Single, Int32)
Applies the transformation to an input, producing an associated output.
(Inherited from MulticlassClassifierBase.)
Public method Transform(Single, Int32)
Applies the transformation to an input, producing an associated output.
(Inherited from MulticlassClassifierBase.)
Public method Transform(Single, Boolean)
Applies the transformation to an input, producing an associated output.
(Inherited from MulticlassClassifierBase.)
Public method Transform(Single, Double)
Applies the transformation to an input, producing an associated output.
(Inherited from MulticlassClassifierBase.)
Public method Transform(Single, Double)
Applies the transformation to an input, producing an associated output.
(Inherited from MulticlassClassifierBase.)
Public method Transform(Single, Int32)
Applies the transformation to an input, producing an associated output.
(Inherited from MulticlassClassifierBase.)
Public method Transform(Single, Int32)
Applies the transformation to an input, producing an associated output.
(Inherited from MulticlassClassifierBase.)
Public method Transform(TInput, Boolean)
Applies the transformation to an input, producing an associated output.
(Inherited from MulticlassClassifierBase<TInput>.)
Public method Transform(TInput, Double)
Applies the transformation to an input, producing an associated output.
(Inherited from MulticlassClassifierBase<TInput>.)
Public method Transform(TInput, Int32)
Applies the transformation to an input, producing an associated output.
(Inherited from MulticlassClassifierBase<TInput>.)
Public method Transform(TInput, Boolean)
Applies the transformation to an input, producing an associated output.
(Inherited from MulticlassClassifierBase<TInput>.)
Public method Transform(TInput, Double)
Applies the transformation to an input, producing an associated output.
(Inherited from MulticlassClassifierBase<TInput>.)
Public method Transform(TInput, Double)
Applies the transformation to an input, producing an associated output.
(Inherited from MulticlassClassifierBase<TInput>.)
Public method Transform(TInput, Int32)
Applies the transformation to an input, producing an associated output.
(Inherited from MulticlassClassifierBase<TInput>.)
Public method Traverse(DecisionTreeTraversalMethod)
Traverses the tree using a tree traversal method. The result can be iterated with a foreach loop.
Public method Traverse(DecisionTreeTraversalMethod, DecisionNode)
Traverses a subtree using a tree traversal method. The result can be iterated with a foreach loop.
Top
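The Traverse and GetHeight members can be pictured with the following self-contained sketch (plain C#, using a simplified node type rather than Accord's DecisionNode): a pre-order depth-first walk that can be consumed with foreach, and the height definition used by GetHeight, i.e. the greatest number of links from the root to a leaf.

```csharp
using System;
using System.Collections.Generic;

// Simplified binary node (illustration only, not the Accord API).
class TreeNode
{
    public TreeNode Left, Right;
}

static class TreeWalks
{
    // Pre-order depth-first enumeration, analogous in spirit to
    // tree.Traverse(DecisionTreeTraversalMethod.DepthFirst).
    public static IEnumerable<TreeNode> DepthFirst(TreeNode root)
    {
        var stack = new Stack<TreeNode>();
        if (root != null) stack.Push(root);
        while (stack.Count > 0)
        {
            TreeNode node = stack.Pop();
            yield return node;
            // Push right first so the left subtree is visited first.
            if (node.Right != null) stack.Push(node.Right);
            if (node.Left != null) stack.Push(node.Left);
        }
    }

    // Height as the greatest link-distance from the root to any leaf,
    // matching the definition given for GetHeight above.
    public static int Height(TreeNode node)
    {
        if (node == null || (node.Left == null && node.Right == null))
            return 0;
        return 1 + Math.Max(Height(node.Left), Height(node.Right));
    }
}
```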
Extension Methods
  Name | Description
Public extension method HasMethod
Checks whether an object implements a method with the given name.
(Defined by ExtensionMethods.)
Public extension method IsEqual
Compares two objects for equality, performing an elementwise comparison if the elements are vectors or matrices.
(Defined by Matrix.)
Public extension method SetEquals<DecisionNode>
Compares two enumerables for set equality. Two enumerables are set equal if they contain the same elements, but not necessarily in the same order.
(Defined by Matrix.)
Public extension method To(Type) (Overloaded)
Converts an object into another type, irrespective of whether the conversion can be done at compile time or not. This can be used to convert generic types to numeric types during runtime.
(Defined by ExtensionMethods.)
Public extension method To<T> (Overloaded)
Converts an object into another type, irrespective of whether the conversion can be done at compile time or not. This can be used to convert generic types to numeric types during runtime.
(Defined by ExtensionMethods.)
Top
Remarks

Represents a decision tree which can be compiled to code at run-time. For sample usage and learning examples, please see the documentation pages for the ID3 and C4.5 learning algorithms.

It is also possible to create random forests using the random forest learning algorithm.
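Since the tree can be compiled to code at run-time (see ToExpression), the mechanism can be illustrated with the standard System.Linq.Expressions API. The snippet below hand-builds and compiles a one-split decision, input => input[2] <= 2.45 ? 0 : 1, purely to show what an expression-tree representation enables; it does not use the Accord API:

```csharp
using System;
using System.Linq.Expressions;

static class CompiledDecision
{
    public static Func<double[], int> Build()
    {
        // Parameter: double[] input
        ParameterExpression input = Expression.Parameter(typeof(double[]), "input");

        // Body: input[2] <= 2.45 ? 0 : 1
        Expression body = Expression.Condition(
            Expression.LessThanOrEqual(
                Expression.ArrayIndex(input, Expression.Constant(2)),
                Expression.Constant(2.45)),
            Expression.Constant(0),
            Expression.Constant(1));

        // Compile the expression tree into a callable delegate.
        return Expression.Lambda<Func<double[], int>>(body, input).Compile();
    }
}
```

Calling Build() once yields a delegate that evaluates the decision directly, without interpretation overhead, which is the same benefit ToExpression offers for a whole tree.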

Examples

This example shows the simplest way to induce a decision tree with discrete variables.

// In this example, we will learn a decision tree directly from integer
// matrices that define the inputs and outputs of our learning problem.

int[][] inputs =
{
    new int[] { 0, 0 },
    new int[] { 0, 1 },
    new int[] { 1, 0 },
    new int[] { 1, 1 },
};

int[] outputs = // xor between inputs[0] and inputs[1]
{
    0, 1, 1, 0
};

// Create an ID3 learning algorithm
ID3Learning teacher = new ID3Learning();

// Learn a decision tree for the XOR problem
var tree = teacher.Learn(inputs, outputs);

// Compute the error in the learning
double error = new ZeroOneLoss(outputs).Loss(tree.Decide(inputs));

// The tree can now be queried for new examples:
int[] predicted = tree.Decide(inputs); // should be { 0, 1, 1, 0 }

This example shows a common textbook example, and how to induce a decision tree using a codebook to convert string (text) variables into discrete symbols.

// In this example, we will be using the famous Play Tennis example by Tom Mitchell (1998).
// In Mitchell's example, one would like to infer if a person would play tennis or not
// based solely on four input variables. Those variables are all categorical, meaning that
// there is no order between the possible values for the variable (i.e. there is no order
// relationship between Sunny and Rain, one is not bigger nor smaller than the other, but are 
// just distinct). Moreover, the rows, or instances presented above represent days on which the
// behavior of the person has been registered and annotated, pretty much building our set of 
// observation instances for learning:

// Note: this example uses DataTables to represent the input data, but this is not required.
DataTable data = new DataTable("Mitchell's Tennis Example");

data.Columns.Add("Day", "Outlook", "Temperature", "Humidity", "Wind", "PlayTennis");
data.Rows.Add("D1", "Sunny", "Hot", "High", "Weak", "No");
data.Rows.Add("D2", "Sunny", "Hot", "High", "Strong", "No");
data.Rows.Add("D3", "Overcast", "Hot", "High", "Weak", "Yes");
data.Rows.Add("D4", "Rain", "Mild", "High", "Weak", "Yes");
data.Rows.Add("D5", "Rain", "Cool", "Normal", "Weak", "Yes");
data.Rows.Add("D6", "Rain", "Cool", "Normal", "Strong", "No");
data.Rows.Add("D7", "Overcast", "Cool", "Normal", "Strong", "Yes");
data.Rows.Add("D8", "Sunny", "Mild", "High", "Weak", "No");
data.Rows.Add("D9", "Sunny", "Cool", "Normal", "Weak", "Yes");
data.Rows.Add("D10", "Rain", "Mild", "Normal", "Weak", "Yes");
data.Rows.Add("D11", "Sunny", "Mild", "Normal", "Strong", "Yes");
data.Rows.Add("D12", "Overcast", "Mild", "High", "Strong", "Yes");
data.Rows.Add("D13", "Overcast", "Hot", "Normal", "Weak", "Yes");
data.Rows.Add("D14", "Rain", "Mild", "High", "Strong", "No");

// In order to try to learn a decision tree, we will first convert this problem to a simpler
// representation. Since all variables are categories, it does not matter if they are represented
// as strings or as numbers, since both are just symbols for the event they represent. Since numbers
// are more easily representable than text strings, we will convert the problem to use a discrete
// alphabet through the use of a Accord.Statistics.Filters.Codification codebook.

// A codebook effectively transforms any distinct possible value for a variable into an integer
// symbol. For example, "Sunny" could as well be represented by the integer label 0, "Overcast"
// by 1, "Rain" by 2, and the same goes for the other variables. So:

// Create a new codification codebook to 
// convert strings into integer symbols
var codebook = new Codification(data);

// Translate our training data into integer symbols using our codebook:
DataTable symbols = codebook.Apply(data);
int[][] inputs = symbols.ToArray<int>("Outlook", "Temperature", "Humidity", "Wind");
int[] outputs = symbols.ToArray<int>("PlayTennis");

// For this task, in which we have only categorical variables, the simplest choice 
// to induce a decision tree is to use the ID3 algorithm by Quinlan. Let’s do it:

// Create a teacher ID3 algorithm
var id3learning = new ID3Learning()
{
    // Now that we already have our learning input/output pairs, we should specify our
    // decision tree. We will be trying to build a tree to predict the last column, entitled
    // “PlayTennis”. For this, we will be using the “Outlook”, “Temperature”, “Humidity” and
    // “Wind” as predictors (variables which we will use for our decision). Since those
    // are categorical, we must specify, at the moment of creation of our tree, the
    // characteristics of each of those variables. So:

    new DecisionVariable("Outlook",     3), // 3 possible values (Sunny, overcast, rain)
    new DecisionVariable("Temperature", 3), // 3 possible values (Hot, mild, cool)  
    new DecisionVariable("Humidity",    2), // 2 possible values (High, normal)    
    new DecisionVariable("Wind",        2)  // 2 possible values (Weak, strong) 

    // Note: It is also possible to create a DecisionVariable[] from a codebook:
    // DecisionVariable[] attributes = DecisionVariable.FromCodebook(codebook);
};

// Learn the training instances!
DecisionTree tree = id3learning.Learn(inputs, outputs);

// Compute the training error when predicting training instances
double error = new ZeroOneLoss(outputs).Loss(tree.Decide(inputs));

// The tree can now be queried for new examples through 
// its decide method. For example, we can create a query

int[] query = codebook.Transform(new[,]
{
    { "Outlook",     "Sunny"  },
    { "Temperature", "Hot"    },
    { "Humidity",    "High"   },
    { "Wind",        "Strong" }
});

// And then predict the label using
int predicted = tree.Decide(query);  // result will be 0

// We can translate it back to strings using
string answer = codebook.Revert("PlayTennis", predicted); // Answer will be: "No"
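The codebook round-trip used above (strings to integer symbols and back) can be sketched in plain C# with a dictionary and a list; this is only an illustration of the idea behind Codification, not its implementation:

```csharp
using System;
using System.Collections.Generic;

// Minimal symbol codebook: assigns each distinct string an integer
// in order of first appearance, and can map the integer back.
class MiniCodebook
{
    private readonly Dictionary<string, int> toSymbol = new Dictionary<string, int>();
    private readonly List<string> toValue = new List<string>();

    // Encode a string value as its integer symbol (assigning a new one if needed).
    public int Transform(string value)
    {
        int symbol;
        if (!toSymbol.TryGetValue(value, out symbol))
        {
            symbol = toValue.Count;
            toSymbol[value] = symbol;
            toValue.Add(value);
        }
        return symbol;
    }

    // Decode an integer symbol back into its original string value.
    public string Revert(int symbol)
    {
        return toValue[symbol];
    }
}
```

With such a mapping, "Sunny" might become 0 and "Overcast" 1, and reverting the predicted integer recovers the original text label, just as codebook.Revert("PlayTennis", predicted) does above.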

For more examples with discrete variables, please see ID3Learning

This example shows the simplest way to induce a decision tree with continuous variables.

// In this example, we will process the famous Fisher's Iris dataset in
// which the task is to classify whether the features of an Iris flower
// belong to an Iris setosa, an Iris versicolor, or an Iris virginica:
//
//  - https://en.wikipedia.org/wiki/Iris_flower_data_set
//

// First, let's load the dataset into an array of text that we can process
string[][] text = Resources.iris_data.Split(new[] { "\r\n" },
    StringSplitOptions.RemoveEmptyEntries).Apply(x => x.Split(','));

// The first four columns contain the flower features
double[][] inputs = text.GetColumns(0, 1, 2, 3).To<double[][]>();

// The last column contains the expected flower type
string[] labels = text.GetColumn(4);

// Since the labels are represented as text, the first step is to convert
// those text labels into integer class labels, so we can process them
// more easily. For this, we will create a codebook to encode class labels:
//
var codebook = new Codification("Output", labels);

// With the codebook, we can convert the labels:
int[] outputs = codebook.Translate("Output", labels);

// And we can use C4.5 for learning:
C45Learning teacher = new C45Learning();

// Finally, induce the tree from the data:
var tree = teacher.Learn(inputs, outputs);

// To get the estimated class labels, we can use
int[] predicted = tree.Decide(inputs);

// The classification error (0.0266) can be computed as
double error = new ZeroOneLoss(outputs).Loss(predicted);

// Moreover, we may decide to convert our tree to a set of rules:
DecisionSet rules = tree.ToRules();

// And using the codebook, we can inspect the tree reasoning:
string ruleText = rules.ToString(codebook, "Output",
    System.Globalization.CultureInfo.InvariantCulture);

// The output is:
string expected = @"Iris-setosa =: (2 <= 2.45)
Iris-versicolor =: (2 > 2.45) && (3 <= 1.75) && (0 <= 7.05) && (1 <= 2.85)
Iris-versicolor =: (2 > 2.45) && (3 <= 1.75) && (0 <= 7.05) && (1 > 2.85)
Iris-versicolor =: (2 > 2.45) && (3 > 1.75) && (0 <= 5.95) && (1 > 3.05)
Iris-virginica =: (2 > 2.45) && (3 <= 1.75) && (0 > 7.05)
Iris-virginica =: (2 > 2.45) && (3 > 1.75) && (0 > 5.95)
Iris-virginica =: (2 > 2.45) && (3 > 1.75) && (0 <= 5.95) && (1 <= 3.05)
";

For more examples with continuous variables, please see C45Learning

The next example shows how to estimate the true performance of a decision tree model using cross-validation:

// Ensure we have reproducible results
Accord.Math.Random.Generator.Seed = 0;

// Get some data to be learned. We will be using the Wisconsin
// (Diagnostic) Breast Cancer dataset, where the goal is to determine
// whether the characteristics extracted from a breast cancer exam
// correspond to a malignant or benign type of cancer:
var data = new WisconsinDiagnosticBreastCancer();
double[][] input = data.Features; // 569 samples, 30-dimensional features
int[] output = data.ClassLabels;  // 569 samples, 2 different class labels

// Let's say we want to measure the cross-validation performance of
// a decision tree with a maximum tree height of 5 and where variables
// are able to join the decision path at most 2 times during evaluation:
var cv = CrossValidation.Create(

    k: 10, // We will be using 10-fold cross validation

    learner: (p) => new C45Learning() // here we create the learning algorithm
    {
        Join = 2,
        MaxHeight = 5
    },

    // Now we have to specify how the tree performance should be measured:
    loss: (actual, expected, p) => new ZeroOneLoss(expected).Loss(actual),

    // This function can be used to perform any special
    // operations before the actual learning is done, but
    // here we will just leave it as simple as it can be:
    fit: (teacher, x, y, w) => teacher.Learn(x, y, w),

    // Finally, we have to pass the input and output data
    // that will be used in cross-validation. 
    x: input, y: output
);

// After the cross-validation object has been created,
// we can call its .Learn method with the input and 
// output data that will be partitioned into the folds:
var result = cv.Learn(input, output);

// We can grab some information about the problem:
int numberOfSamples = result.NumberOfSamples; // should be 569
int numberOfInputs = result.NumberOfInputs;   // should be 30
int numberOfOutputs = result.NumberOfOutputs; // should be 2

double trainingError = result.Training.Mean; // should be 0.017771153143274855
double validationError = result.Validation.Mean; // should be 0.0755952380952381

// If desired, compute an aggregate confusion matrix for the validation sets:
GeneralConfusionMatrix gcm = result.ToConfusionMatrix(input, output);
double accuracy = gcm.Accuracy; // result should be 0.92442882249560632
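The fold assignment behind k-fold cross-validation can be sketched as follows (plain C#; CrossValidation's actual splitting may shuffle samples differently, so this only illustrates the idea):

```csharp
using System;
using System.Linq;

static class KFold
{
    // Assign each of n samples to one of k folds, round-robin.
    // Fold j's validation set is the samples with fold[i] == j;
    // the remaining samples form that fold's training set.
    public static int[] Assign(int n, int k)
    {
        return Enumerable.Range(0, n).Select(i => i % k).ToArray();
    }
}
```

With n = 569 and k = 10, every fold receives 56 or 57 samples, so each of the ten models is validated on roughly 10% of the data it never saw during training, and the reported training and validation errors are means over the ten folds.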
See Also