NaiveBayes Class 
Namespace: Accord.MachineLearning.Bayes
[SerializableAttribute] public class NaiveBayes : NaiveBayes<GeneralDiscreteDistribution, int>
The NaiveBayes type exposes the following members.
Constructors
Name  Description  

NaiveBayes(Int32, Int32) 
Constructs a new Naïve Bayes Classifier.
 
NaiveBayes(Int32, Double, Int32)  Obsolete.
Obsolete.

Properties
Name  Description  

ClassCount  Obsolete.
Gets the number of possible output classes.
 
Distributions 
Gets the probability distributions for each class and input.
 
InputCount  Obsolete.
Gets the number of inputs in the model.
 
NumberOfClasses 
Gets the number of classes expected and recognized by the classifier.
(Inherited from ClassifierBase<TInput, TClasses>.)  
NumberOfInputs 
Gets the number of inputs accepted by the model.
(Inherited from TransformBase<TInput, TOutput>.)  
NumberOfOutputs 
Gets the number of outputs generated by the model.
(Inherited from TransformBase<TInput, TOutput>.)  
NumberOfSymbols 
Gets the number of symbols for each input in the model.
 
Priors 
Gets the prior beliefs for each class.
(Inherited from Bayes<TDistribution, TInput>.)  
SymbolCount  Obsolete.
Gets the number of symbols for each input in the model.

Methods
Name  Description  

Compute(Int32)  Obsolete.
Computes the most likely class for a given instance.
 
Compute(Int32, Double, Double)  Obsolete.
Computes the most likely class for a given instance.
 
Decide(TInput) 
Computes class-label decisions for a given set of input vectors.
(Inherited from ClassifierBase<TInput, TClasses>.)  
Decide(TInput) 
Computes a class-label decision for a given input.
(Inherited from MulticlassScoreClassifierBase<TInput>.)  
Decide(TInput, TClasses) 
Computes a class-label decision for a given input.
(Inherited from ClassifierBase<TInput, TClasses>.)  
Decide(TInput, Boolean) 
Computes class-label decisions for the given input.
(Inherited from MulticlassClassifierBase<TInput>.)  
Decide(TInput, Double) 
Computes class-label decisions for the given input.
(Inherited from MulticlassClassifierBase<TInput>.)  
Decide(TInput, Int32) 
Computes class-label decisions for the given input.
(Inherited from MulticlassClassifierBase<TInput>.)  
Decide(TInput, Double) 
Computes a class-label decision for a given input.
(Inherited from MulticlassClassifierBase<TInput>.)  
Equals  Determines whether the specified object is equal to the current object. (Inherited from Object.)  
Estimate  Obsolete.
Obsolete.
 
Finalize  Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection. (Inherited from Object.)  
GetHashCode  Serves as the default hash function. (Inherited from Object.)  
GetType  Gets the Type of the current instance. (Inherited from Object.)  
Load(Stream)  Obsolete.
Loads a machine from a stream.
 
Load(String)  Obsolete.
Loads a machine from a file.
 
Load<TDistribution>(Stream)  Obsolete.
Loads a machine from a stream.
 
Load<TDistribution>(String)  Obsolete.
Loads a machine from a file.
 
LogLikelihood(TInput) 
Computes the log-likelihood that the given input
vector belongs to its most plausible class.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
LogLikelihood(TInput) 
Computes the log-likelihood that the given input
vectors belong to each of the possible classes.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
LogLikelihood(TInput, Int32) 
Predicts a class label vector for the given input vector, returning the
log-likelihood that the input vector belongs to its predicted class.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
LogLikelihood(TInput, Double) 
Computes the log-likelihood that the given input
vectors belong to each of the possible classes.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
LogLikelihood(TInput, Int32) 
Computes the log-likelihood that the given input vector
belongs to the specified classIndex.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
LogLikelihood(TInput, Int32) 
Computes the log-likelihood that the given input vector
belongs to the specified classIndex.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
LogLikelihood(TInput, Int32) 
Predicts a class label for each input vector, returning the
log-likelihood that each vector belongs to its predicted class.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
LogLikelihood(TInput, Int32) 
Computes the log-likelihood that the given input vector
belongs to the specified classIndex.
(Inherited from Bayes<TDistribution, TInput>.)  
LogLikelihood(TInput, Int32, Double) 
Computes the log-likelihood that the given input vector
belongs to the specified classIndex.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
LogLikelihood(TInput, Int32, Double) 
Computes the log-likelihood that the given input vector
belongs to the specified classIndex.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
LogLikelihood(TInput, Int32, Double) 
Predicts a class label for each input vector, returning the
log-likelihood that each vector belongs to its predicted class.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
LogLikelihoods(TInput) 
Computes the log-likelihood that the given input
vector belongs to each of the possible classes.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
LogLikelihoods(TInput) 
Computes the log-likelihood that the given input
vectors belong to each of the possible classes.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
LogLikelihoods(TInput, Double) 
Computes the log-likelihood that the given input
vector belongs to each of the possible classes.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
LogLikelihoods(TInput, Int32) 
Predicts a class label vector for the given input vector, returning the
log-likelihoods of the input vector belonging to each possible class.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
LogLikelihoods(TInput, Double) 
Computes the log-likelihood that the given input
vector belongs to each of the possible classes.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
LogLikelihoods(TInput, Int32) 
Computes the log-likelihood that the given input vector
belongs to the specified classIndex.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
LogLikelihoods(TInput, Int32) 
Predicts a class label vector for each input vector, returning the
log-likelihoods of the input vector belonging to each possible class.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
LogLikelihoods(TInput, Int32, Double) 
Predicts a class label vector for the given input vector, returning the
log-likelihoods of the input vector belonging to each possible class.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
LogLikelihoods(TInput, Int32, Double) 
Computes the log-likelihood that the given input vector
belongs to the specified classIndex.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
LogLikelihoods(TInput, Int32, Double) 
Predicts a class label vector for each input vector, returning the
log-likelihoods of the input vector belonging to each possible class.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
MemberwiseClone  Creates a shallow copy of the current Object. (Inherited from Object.)  
Normal(Int32, Int32) 
Constructs a new Naïve Bayes Classifier.
 
Normal(Int32, Int32, NormalDistribution) 
Constructs a new Naïve Bayes Classifier.
 
Normal(Int32, Int32, NormalDistribution) 
Constructs a new Naïve Bayes Classifier.
 
Normal(Int32, Int32, NormalDistribution) 
Constructs a new Naïve Bayes Classifier.
 
Normal(Int32, Int32, Double) 
Constructs a new Naïve Bayes Classifier.
 
Normal(Int32, Int32, NormalDistribution, Double) 
Constructs a new Naïve Bayes Classifier.
 
Probabilities(TInput) 
Computes the probabilities that the given input
vector belongs to each of the possible classes.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
Probabilities(TInput) 
Computes the probabilities that the given input
vector belongs to each of the possible classes.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
Probabilities(TInput, Double) 
Computes the probabilities that the given input
vector belongs to each of the possible classes.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
Probabilities(TInput, Int32) 
Predicts a class label vector for the given input vector, returning the
probabilities of the input vector belonging to each possible class.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
Probabilities(TInput, Double) 
Computes the probabilities that the given input
vector belongs to each of the possible classes.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
Probabilities(TInput, Int32) 
Predicts a class label vector for each input vector, returning the
probabilities of the input vector belonging to each possible class.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
Probabilities(TInput, Int32, Double) 
Predicts a class label vector for the given input vector, returning the
probabilities of the input vector belonging to each possible class.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
Probabilities(TInput, Int32, Double) 
Predicts a class label vector for each input vector, returning the
probabilities of the input vector belonging to each possible class.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
Probability(TInput) 
Predicts a class label for the given input vector, returning the
probability that the input vector belongs to its predicted class.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
Probability(TInput) 
Predicts a class label for the given input vector, returning the
probability that the input vector belongs to its predicted class.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
Probability(TInput, Int32) 
Computes the probability that the given input vector
belongs to the specified classIndex.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
Probability(TInput, Int32) 
Predicts a class label for the given input vector, returning the
probability that the input vector belongs to its predicted class.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
Probability(TInput, Double) 
Predicts a class label for the given input vector, returning the
probability that the input vector belongs to its predicted class.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
Probability(TInput, Int32) 
Computes the probability that the given input vector
belongs to the specified classIndex.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
Probability(TInput, Int32) 
Computes the probability that the given input vector
belongs to the specified classIndex.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
Probability(TInput, Int32) 
Predicts a class label for each input vector, returning the
probability that each vector belongs to its predicted class.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
Probability(TInput, Int32, Double) 
Computes the probability that the given input vector
belongs to the specified classIndex.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
Probability(TInput, Int32, Double) 
Computes the probability that the given input vector
belongs to the specified classIndex.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
Probability(TInput, Int32, Double) 
Predicts a class label for each input vector, returning the
probability that each vector belongs to its predicted class.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
Save(Stream)  Obsolete.
Saves the Naïve Bayes model to a stream.
 
Save(String)  Obsolete.
Saves the Naïve Bayes model to a file.
 
Score(TInput) 
Computes a numerical score measuring the association between
the given input vector and its most strongly
associated class (as predicted by the classifier).
(Inherited from MulticlassScoreClassifierBase<TInput>.)  
Score(TInput) 
Computes a numerical score measuring the association between
the given input vector and its most strongly
associated class (as predicted by the classifier).
(Inherited from MulticlassScoreClassifierBase<TInput>.)  
Score(TInput, Int32) 
Computes a numerical score measuring the association between
the given input vector and a given
classIndex.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
Score(TInput, Int32) 
Predicts a class label for the input vector, returning a
numerical score measuring the strength of association of the
input vector to its most strongly related class.
(Inherited from MulticlassScoreClassifierBase<TInput>.)  
Score(TInput, Double) 
Computes a numerical score measuring the association between
the given input vector and its most strongly
associated class (as predicted by the classifier).
(Inherited from MulticlassScoreClassifierBase<TInput>.)  
Score(TInput, Int32) 
Computes a numerical score measuring the association between
the given input vector and a given
classIndex.
(Inherited from MulticlassScoreClassifierBase<TInput>.)  
Score(TInput, Int32) 
Computes a numerical score measuring the association between
the given input vector and a given
classIndex.
(Inherited from MulticlassScoreClassifierBase<TInput>.)  
Score(TInput, Int32) 
Predicts a class label for each input vector, returning a
numerical score measuring the strength of association of the
input vector to the most strongly related class.
(Inherited from MulticlassScoreClassifierBase<TInput>.)  
Score(TInput, Int32, Double) 
Computes a numerical score measuring the association between
the given input vector and a given
classIndex.
(Inherited from MulticlassScoreClassifierBase<TInput>.)  
Score(TInput, Int32, Double) 
Computes a numerical score measuring the association between
the given input vector and a given
classIndex.
(Inherited from MulticlassScoreClassifierBase<TInput>.)  
Score(TInput, Int32, Double) 
Predicts a class label for each input vector, returning a
numerical score measuring the strength of association of the
input vector to the most strongly related class.
(Inherited from MulticlassScoreClassifierBase<TInput>.)  
Scores(TInput) 
Computes a numerical score measuring the association between
the given input vector and each class.
(Inherited from MulticlassScoreClassifierBase<TInput>.)  
Scores(TInput) 
Computes a numerical score measuring the association between
the given input vector and each class.
(Inherited from MulticlassScoreClassifierBase<TInput>.)  
Scores(TInput, Double) 
Computes a numerical score measuring the association between
the given input vector and each class.
(Inherited from MulticlassScoreClassifierBase<TInput>.)  
Scores(TInput, Int32) 
Predicts a class label vector for the given input vector, returning a
numerical score measuring the strength of association of the input vector
to each of the possible classes.
(Inherited from MulticlassScoreClassifierBase<TInput>.)  
Scores(TInput, Double) 
Computes a numerical score measuring the association between
the given input vector and each class.
(Inherited from MulticlassScoreClassifierBase<TInput>.)  
Scores(TInput, Int32) 
Predicts a class label vector for each input vector, returning a
numerical score measuring the strength of association of the input vector
to each of the possible classes.
(Inherited from MulticlassScoreClassifierBase<TInput>.)  
Scores(TInput, Int32, Double) 
Predicts a class label vector for the given input vector, returning a
numerical score measuring the strength of association of the input vector
to each of the possible classes.
(Inherited from MulticlassScoreClassifierBase<TInput>.)  
Scores(TInput, Int32, Double) 
Predicts a class label vector for each input vector, returning a
numerical score measuring the strength of association of the input vector
to each of the possible classes.
(Inherited from MulticlassScoreClassifierBase<TInput>.)  
ToMulticlass 
Views this instance as a multiclass generative classifier.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
ToMultilabel 
Views this instance as a multilabel generative classifier,
giving access to more advanced methods, such as the prediction
of one-hot vectors.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
ToString  Returns a string that represents the current object. (Inherited from Object.)  
Transform(TInput) 
Applies the transformation to an input, producing an associated output.
(Inherited from ClassifierBase<TInput, TClasses>.)  
Transform(TInput) 
Applies the transformation to a set of input vectors,
producing an associated set of output vectors.
(Inherited from TransformBase<TInput, TOutput>.)  
Transform(TInput, TClasses) 
Applies the transformation to an input, producing an associated output.
(Inherited from ClassifierBase<TInput, TClasses>.)  
Transform(TInput, Boolean) 
Applies the transformation to an input, producing an associated output.
(Inherited from MulticlassClassifierBase<TInput>.)  
Transform(TInput, Int32) 
Applies the transformation to an input, producing an associated output.
(Inherited from MulticlassClassifierBase<TInput>.)  
Transform(TInput, Boolean) 
Applies the transformation to an input, producing an associated output.
(Inherited from MulticlassClassifierBase<TInput>.)  
Transform(TInput, Double) 
Applies the transformation to an input, producing an associated output.
(Inherited from MulticlassClassifierBase<TInput>.)  
Transform(TInput, Int32) 
Applies the transformation to an input, producing an associated output.
(Inherited from MulticlassClassifierBase<TInput>.)  
Transform(TInput, Double) 
Applies the transformation to an input, producing an associated output.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.)  
Transform(TInput, Double) 
Applies the transformation to an input, producing an associated output.
(Inherited from MulticlassLikelihoodClassifierBase<TInput>.) 
Extension Methods
Name  Description  

HasMethod 
Checks whether an object implements a method with the given name.
(Defined by ExtensionMethods.)  
IsEqual 
Compares two objects for equality, performing an element-wise
comparison if the elements are vectors or matrices.
(Defined by Matrix.)  
To(Type)  Overloaded.
Converts an object into another type, irrespective of whether
the conversion can be done at compile time or not. This can be
used to convert generic types to numeric types during runtime.
(Defined by ExtensionMethods.)  
To<T>  Overloaded.
Converts an object into another type, irrespective of whether
the conversion can be done at compile time or not. This can be
used to convert generic types to numeric types during runtime.
(Defined by ExtensionMethods.) 
A naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem with strong (naive) independence assumptions. A more descriptive term for the underlying probability model would be "independent feature model".
In simple terms, a naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature, given the class variable. In spite of their naive design and apparently oversimplified assumptions, naive Bayes classifiers have worked quite well in many complex real-world situations.
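In the usual textbook notation (not taken from this page), the independence assumption above means that, for a class c and feature values x_1, ..., x_n, the posterior factorizes and the classifier picks the class with the largest product:

```latex
P(c \mid x_1, \dots, x_n) \;\propto\; P(c) \prod_{i=1}^{n} P(x_i \mid c),
\qquad
\hat{c} \;=\; \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

Here P(c) plays the role of the prior beliefs exposed by the Priors property, and each P(x_i | c) is one of the per-class, per-input distributions exposed by Distributions.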
This class implements a discrete (integer-valued) NaiveBayes classifier. There is also a special named constructor to create classifiers assuming normal distributions for each variable. For arbitrary distribution classifiers, please see NaiveBayes<TDistribution>.
In this example, we will be using the famous Play Tennis example by Tom Mitchell (1998). In Mitchell's example, one would like to infer whether a person would play tennis based solely on four input variables. Those variables are all categorical, meaning that there is no order between the possible values of a variable (i.e. there is no order relationship between Sunny and Rain; neither is bigger or smaller than the other, they are just distinct). The rows, or instances, presented below represent days on which the behavior of the person has been registered and annotated, and together they form our set of observations for learning:
    DataTable data = new DataTable("Mitchell's Tennis Example");

    data.Columns.Add("Day", "Outlook", "Temperature", "Humidity", "Wind", "PlayTennis");

    data.Rows.Add("D1", "Sunny", "Hot", "High", "Weak", "No");
    data.Rows.Add("D2", "Sunny", "Hot", "High", "Strong", "No");
    data.Rows.Add("D3", "Overcast", "Hot", "High", "Weak", "Yes");
    data.Rows.Add("D4", "Rain", "Mild", "High", "Weak", "Yes");
    data.Rows.Add("D5", "Rain", "Cool", "Normal", "Weak", "Yes");
    data.Rows.Add("D6", "Rain", "Cool", "Normal", "Strong", "No");
    data.Rows.Add("D7", "Overcast", "Cool", "Normal", "Strong", "Yes");
    data.Rows.Add("D8", "Sunny", "Mild", "High", "Weak", "No");
    data.Rows.Add("D9", "Sunny", "Cool", "Normal", "Weak", "Yes");
    data.Rows.Add("D10", "Rain", "Mild", "Normal", "Weak", "Yes");
    data.Rows.Add("D11", "Sunny", "Mild", "Normal", "Strong", "Yes");
    data.Rows.Add("D12", "Overcast", "Mild", "High", "Strong", "Yes");
    data.Rows.Add("D13", "Overcast", "Hot", "Normal", "Weak", "Yes");
    data.Rows.Add("D14", "Rain", "Mild", "High", "Strong", "No");
Note: The DataTable representation is not required; the NaiveBayes could also be trained directly on integer arrays containing the integer codewords.
In order to estimate a discrete Naive Bayes, we will first convert this problem to a simpler representation. Since all variables are categorical, it does not matter whether they are represented as strings or as numbers, since both are just symbols for the events they represent. Since numbers are easier to work with than text strings, we will convert the problem to use a discrete alphabet through the use of a codebook.
A codebook effectively transforms each distinct possible value of a variable into an integer symbol. For example, “Sunny” could just as well be represented by the integer label 0, “Overcast” by 1, “Rain” by 2, and the same goes for the other variables. So:
    // Create a new codification codebook to
    // convert strings into discrete symbols
    Codification codebook = new Codification(data,
        "Outlook", "Temperature", "Humidity", "Wind", "PlayTennis");

    // Extract input and output pairs to train
    DataTable symbols = codebook.Apply(data);
    int[][] inputs = symbols.ToArray<int>("Outlook", "Temperature", "Humidity", "Wind");
    int[] outputs = symbols.ToArray<int>("PlayTennis");
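To make the idea of a codebook concrete, the following is a minimal sketch using a plain dictionary. It is not the framework's Codification class; the name SimpleCodebook and its Encode method are hypothetical, and the sketch simply assumes that symbols are numbered in order of first appearance.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the framework: assigns each distinct
// string the next unused integer, in order of first appearance.
public static class SimpleCodebook
{
    public static int[] Encode(string[] values, Dictionary<string, int> codes)
    {
        var result = new int[values.Length];
        for (int i = 0; i < values.Length; i++)
        {
            if (!codes.TryGetValue(values[i], out int code))
            {
                code = codes.Count;       // next free symbol
                codes[values[i]] = code;  // remember it for later lookups
            }
            result[i] = code;
        }
        return result;
    }
}
```

Encoding the values "Sunny", "Sunny", "Overcast", "Rain" with an initially empty dictionary yields 0, 0, 1, 2, which matches the labelling described in the text.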
Now that we have our learning input/output pairs, we should specify our Bayes model. We will be trying to build a model to predict the last column, entitled “PlayTennis”. For this, we will be using “Outlook”, “Temperature”, “Humidity” and “Wind” as predictors (the variables which we will use for our decision). Since those are categorical, the model needs to know the number of possible symbols for each of those variables; in the code below the learning algorithm is given only the training data, so it presumably determines those counts from the inputs.
    // Create a new Naive Bayes learning
    var learner = new NaiveBayesLearning();

    // Learn a Naive Bayes model from the examples
    NaiveBayes nb = learner.Learn(inputs, outputs);
Now that we have created and estimated our classifier, we can query the classifier for new input samples through the Decide method.
    // Consider we would like to know whether one should play tennis at a
    // sunny, cool, humid and windy day. Let us first encode this instance
    int[] instance = codebook.Translate("Sunny", "Cool", "High", "Strong");

    // Let us obtain the numeric output that represents the answer
    int c = nb.Decide(instance); // answer will be 0

    // Now let us convert the numeric output to an actual "Yes" or "No" answer
    string result = codebook.Translate("PlayTennis", c); // answer will be "No"

    // We can also extract the probabilities for each possible answer
    double[] probs = nb.Probabilities(instance); // { 0.795, 0.205 }
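As a rough check on the probabilities quoted above, the posterior can be worked out by hand from the 14 rows of the table. This assumes plain relative frequencies with no smoothing; the learner's actual estimation settings are not stated here, so small differences are possible. Writing x for the instance (Sunny, Cool, High, Strong):

```latex
\begin{aligned}
P(\text{No})\,P(\text{Sunny}\mid\text{No})\,P(\text{Cool}\mid\text{No})\,P(\text{High}\mid\text{No})\,P(\text{Strong}\mid\text{No})
  &= \tfrac{5}{14}\cdot\tfrac{3}{5}\cdot\tfrac{1}{5}\cdot\tfrac{4}{5}\cdot\tfrac{3}{5} \approx 0.02057 \\
P(\text{Yes})\,P(\text{Sunny}\mid\text{Yes})\,P(\text{Cool}\mid\text{Yes})\,P(\text{High}\mid\text{Yes})\,P(\text{Strong}\mid\text{Yes})
  &= \tfrac{9}{14}\cdot\tfrac{2}{9}\cdot\tfrac{3}{9}\cdot\tfrac{3}{9}\cdot\tfrac{3}{9} \approx 0.00529 \\
P(\text{No}\mid x) &= \frac{0.02057}{0.02057 + 0.00529} \approx 0.795,
\qquad
P(\text{Yes}\mid x) \approx 0.205
\end{aligned}
```

Since "No" is the first PlayTennis value to appear in the data, it is natural for it to receive code 0, which is consistent with the decision 0 and the ordering of the probability vector.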
Please note that, while the example uses a DataTable to show how data stored in tables can be loaded into the framework, it is not necessary at all to use DataTables in your own, final code. For example, please consider the same example shown above, but without DataTables:
    string[] columnNames = { "Outlook", "Temperature", "Humidity", "Wind", "PlayTennis" };

    string[][] data =
    {
        new string[] { "Sunny", "Hot", "High", "Weak", "No" },
        new string[] { "Sunny", "Hot", "High", "Strong", "No" },
        new string[] { "Overcast", "Hot", "High", "Weak", "Yes" },
        new string[] { "Rain", "Mild", "High", "Weak", "Yes" },
        new string[] { "Rain", "Cool", "Normal", "Weak", "Yes" },
        new string[] { "Rain", "Cool", "Normal", "Strong", "No" },
        new string[] { "Overcast", "Cool", "Normal", "Strong", "Yes" },
        new string[] { "Sunny", "Mild", "High", "Weak", "No" },
        new string[] { "Sunny", "Cool", "Normal", "Weak", "Yes" },
        new string[] { "Rain", "Mild", "Normal", "Weak", "Yes" },
        new string[] { "Sunny", "Mild", "Normal", "Strong", "Yes" },
        new string[] { "Overcast", "Mild", "High", "Strong", "Yes" },
        new string[] { "Overcast", "Hot", "Normal", "Weak", "Yes" },
        new string[] { "Rain", "Mild", "High", "Strong", "No" },
    };

    // Create a new codification codebook to
    // convert strings into discrete symbols
    Codification codebook = new Codification(columnNames, data);

    // Extract input and output pairs to train
    int[][] symbols = codebook.Transform(data);
    int[][] inputs = symbols.Get(null, 0, -1); // Gets all rows, columns 0 up to (but not including) the last
    int[] outputs = symbols.GetColumn(-1);     // Gets only the last column

    // Create a new Naive Bayes learning
    var learner = new NaiveBayesLearning();

    NaiveBayes nb = learner.Learn(inputs, outputs);

    // Consider we would like to know whether one should play tennis at a
    // sunny, cool, humid and windy day. Let us first encode this instance
    int[] instance = codebook.Translate("Sunny", "Cool", "High", "Strong");

    // Let us obtain the numeric output that represents the answer
    int c = nb.Decide(instance); // answer will be 0

    // Now let us convert the numeric output to an actual "Yes" or "No" answer
    string result = codebook.Translate("PlayTennis", c); // answer will be "No"

    // We can also extract the probabilities for each possible answer
    double[] probs = nb.Probabilities(instance); // { 0.795, 0.205 }
In this second example, we will be creating a simple multiclass classification problem using integer vectors and learning a discrete Naive Bayes on those vectors.
    // Let's say we have the following data to be classified
    // into three possible classes. Those are the samples:
    //
    int[][] inputs =
    {
        //               input      output
        new int[] { 0, 1, 1, 0 }, //  0
        new int[] { 0, 1, 0, 0 }, //  0
        new int[] { 0, 0, 1, 0 }, //  0
        new int[] { 0, 1, 1, 0 }, //  0
        new int[] { 0, 1, 0, 0 }, //  0
        new int[] { 1, 0, 0, 0 }, //  1
        new int[] { 1, 0, 0, 0 }, //  1
        new int[] { 1, 0, 0, 1 }, //  1
        new int[] { 0, 0, 0, 1 }, //  1
        new int[] { 0, 0, 0, 1 }, //  1
        new int[] { 1, 1, 1, 1 }, //  2
        new int[] { 1, 0, 1, 1 }, //  2
        new int[] { 1, 1, 0, 1 }, //  2
        new int[] { 0, 1, 1, 1 }, //  2
        new int[] { 1, 1, 1, 1 }, //  2
    };

    int[] outputs = // those are the class labels
    {
        0, 0, 0, 0, 0,
        1, 1, 1, 1, 1,
        2, 2, 2, 2, 2,
    };

    // Let us create a learning algorithm
    var learner = new NaiveBayesLearning();

    // and teach a model on the data examples
    NaiveBayes nb = learner.Learn(inputs, outputs);

    // Now, let's test the model output for the first input sample:
    int answer = nb.Decide(new int[] { 0, 1, 1, 0 }); // should be 0 (the label of the first sample)
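To see what a discrete naive Bayes model computes on data like this, here is a from-scratch sketch that estimates the priors and the per-input symbol frequencies by counting. It is an illustration only: the class CountingNaiveBayes and its methods are hypothetical, it uses unsmoothed relative frequencies, and the framework's learner may apply regularization that this sketch omits.

```csharp
using System;
using System.Linq;

// Hypothetical from-scratch sketch (not the framework's implementation):
// discrete naive Bayes with unsmoothed relative-frequency estimates.
public static class CountingNaiveBayes
{
    // Returns the normalized posterior over classes 0..classes-1 for x.
    public static double[] Posterior(int[][] inputs, int[] outputs, int classes, int[] x)
    {
        var scores = new double[classes];
        for (int c = 0; c < classes; c++)
        {
            // Rows that belong to class c
            int[][] rows = inputs.Where((row, i) => outputs[i] == c).ToArray();
            if (rows.Length == 0) continue;

            // Prior P(c) times the product of P(x_j | c) over all inputs j
            double score = (double)rows.Length / inputs.Length;
            for (int j = 0; j < x.Length; j++)
                score *= (double)rows.Count(row => row[j] == x[j]) / rows.Length;
            scores[c] = score;
        }

        // Normalize so the scores sum to one (when any class is possible)
        double sum = scores.Sum();
        return sum > 0 ? scores.Select(s => s / sum).ToArray() : scores;
    }

    // Index of the class with the highest posterior.
    public static int Decide(int[][] inputs, int[] outputs, int classes, int[] x)
    {
        double[] p = Posterior(inputs, outputs, classes, x);
        return Array.IndexOf(p, p.Max());
    }
}
```

For the sample { 0, 1, 1, 0 } and the data above, only class 0 has a nonzero likelihood under these counts: class 1 never has a 1 in the second input, and class 2 never has a 0 in the fourth. The sketch therefore selects class 0, which is the label listed for that sample.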
Like all other learning algorithms in the framework, it is also possible to obtain a better measure of the performance of the Naive Bayes algorithm using cross-validation, as shown in the example below:
    // Ensure we have reproducible results
    Accord.Math.Random.Generator.Seed = 0;

    // Let's say we have the following data to be classified
    // into three possible classes. Those are the samples:
    //
    int[][] inputs =
    {
        //               input      output
        new int[] { 0, 1, 1, 0 }, //  0
        new int[] { 0, 1, 0, 0 }, //  0
        new int[] { 0, 0, 1, 0 }, //  0
        new int[] { 0, 1, 1, 0 }, //  0
        new int[] { 0, 1, 0, 0 }, //  0
        new int[] { 1, 0, 0, 0 }, //  1
        new int[] { 1, 0, 0, 0 }, //  1
        new int[] { 1, 0, 0, 1 }, //  1
        new int[] { 0, 0, 0, 1 }, //  1
        new int[] { 0, 0, 0, 1 }, //  1
        new int[] { 1, 1, 1, 1 }, //  2
        new int[] { 1, 0, 1, 1 }, //  2
        new int[] { 1, 1, 0, 1 }, //  2
        new int[] { 0, 1, 1, 1 }, //  2
        new int[] { 1, 1, 1, 1 }, //  2
    };

    int[] outputs = // those are the class labels
    {
        0, 0, 0, 0, 0,
        1, 1, 1, 1, 1,
        2, 2, 2, 2, 2,
    };

    // Let's say we want to measure the cross-validation
    // performance of Naive Bayes on the above data set:
    var cv = CrossValidation.Create(

        k: 10, // We will be using 10-fold cross validation

        // First we define the learning algorithm:
        learner: (p) => new NaiveBayesLearning(),

        // Now we have to specify how the n.b. performance should be measured:
        loss: (actual, expected, p) => new ZeroOneLoss(expected).Loss(actual),

        // This function can be used to perform any special
        // operations before the actual learning is done, but
        // here we will just leave it as simple as it can be:
        fit: (teacher, x, y, w) => teacher.Learn(x, y, w),

        // Finally, we have to pass the input and output data
        // that will be used in cross-validation.
        x: inputs, y: outputs
    );

    // After the cross-validation object has been created,
    // we can call its .Learn method with the input and
    // output data that will be partitioned into the folds:
    var result = cv.Learn(inputs, outputs);

    // We can grab some information about the problem:
    int numberOfSamples = result.NumberOfSamples; // should be 15
    int numberOfInputs = result.NumberOfInputs;   // should be 4
    int numberOfOutputs = result.NumberOfOutputs; // should be 3

    double trainingError = result.Training.Mean;     // should be 0
    double validationError = result.Validation.Mean; // should be 0.15 (+/- var. 0.11388888888888887)

    // If desired, compute an aggregate confusion matrix for the validation sets:
    GeneralConfusionMatrix gcm = result.ToConfusionMatrix(inputs, outputs);
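For readers unfamiliar with k-fold cross-validation, the sketch below shows the two ingredients the example relies on: partitioning the sample indices into k folds, and a zero-one loss that counts the fraction of wrong predictions. It is a simplified illustration with hypothetical names (KFold, Split, ZeroOneLoss), not the framework's CrossValidation or ZeroOneLoss classes, and its shuffling will not reproduce the framework's fold assignment.

```csharp
using System;
using System.Linq;

// Hypothetical sketch of k-fold partitioning; not the framework's CrossValidation.
public static class KFold
{
    // Shuffles the indices 0..n-1 and deals them into k folds
    // whose sizes differ by at most one.
    public static int[][] Split(int n, int k, int seed)
    {
        var rng = new Random(seed);
        int[] indices = Enumerable.Range(0, n).ToArray();

        // Fisher-Yates shuffle
        for (int i = n - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            (indices[i], indices[j]) = (indices[j], indices[i]);
        }

        // Position p in the shuffled order goes to fold p % k
        return Enumerable.Range(0, k)
            .Select(f => indices.Where((value, pos) => pos % k == f).ToArray())
            .ToArray();
    }

    // Fraction of predictions that differ from the expected labels.
    public static double ZeroOneLoss(int[] expected, int[] actual)
    {
        int errors = 0;
        for (int i = 0; i < expected.Length; i++)
            if (expected[i] != actual[i]) errors++;
        return (double)errors / expected.Length;
    }
}
```

Each fold is held out once for validation while a model is trained on the remaining folds; averaging the loss over the held-out folds gives the validation error. With 15 samples and k = 10, this sketch produces five folds of two samples and five folds of one.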