BaumWelchLearning<TDistribution, TObservation> Class
Namespace: Accord.Statistics.Models.Markov.Learning
public class BaumWelchLearning<TDistribution, TObservation> : BaseBaumWelchLearning<HiddenMarkovModel<TDistribution, TObservation>, TDistribution, TObservation, IFittingOptions>, IConvergenceLearning where TDistribution : Object, IFittableDistribution<TObservation>
The BaumWelchLearning<TDistribution, TObservation> type exposes the following members.
Constructors
Name  Description
BaumWelchLearning<TDistribution, TObservation>()
Initializes a new instance of the BaumWelchLearning<TDistribution, TObservation> class.
BaumWelchLearning<TDistribution, TObservation>(HiddenMarkovModel<TDistribution, TObservation>)
Initializes a new instance of the BaumWelchLearning<TDistribution, TObservation> class.

Properties
Name  Description
Convergence
Gets or sets convergence parameters.
(Inherited from BaseBaumWelchLearning<TModel, TDistribution, TObservation, TOptions>.)
CurrentIteration
Gets or sets the number of performed iterations.
(Inherited from BaseBaumWelchLearning<TModel, TDistribution, TObservation, TOptions>.)
Emissions
Gets or sets the function that initializes the emission distributions in the hidden Markov models.
(Inherited from BaseBaumWelchLearning<TModel, TDistribution, TObservation, TOptions>.)
FittingOptions
Gets or sets the distribution fitting options to use when estimating distribution densities during learning.
(Inherited from BaseBaumWelchLearning<TModel, TDistribution, TObservation, TOptions>.)
HasConverged
Gets or sets whether the algorithm has converged.
(Inherited from BaseBaumWelchLearning<TModel, TDistribution, TObservation, TOptions>.)
Iterations
Obsolete. Please use MaxIterations instead.
(Inherited from BaseBaumWelchLearning<TModel, TDistribution, TObservation, TOptions>.)
LogGamma
Gets the Gamma matrix of log-probabilities created during the last iteration of the Baum-Welch learning algorithm.
(Inherited from BaseBaumWelchLearning<TModel, TDistribution, TObservation, TOptions>.)
LogKsi
Gets the Ksi matrix of log-probabilities created during the last iteration of the Baum-Welch learning algorithm.
(Inherited from BaseBaumWelchLearning<TModel, TDistribution, TObservation, TOptions>.)
LogLikelihood
Gets the log-likelihood of the model at the last iteration.
(Inherited from BaseBaumWelchLearning<TModel, TDistribution, TObservation, TOptions>.)
LogWeights
Gets the sample weights in the last iteration of the Baum-Welch learning algorithm.
(Inherited from BaseBaumWelchLearning<TModel, TDistribution, TObservation, TOptions>.)
MaxIterations
Gets or sets the maximum number of iterations performed by the learning algorithm.
(Inherited from BaseBaumWelchLearning<TModel, TDistribution, TObservation, TOptions>.)
Model
Gets or sets the model being trained.
(Inherited from BaseHiddenMarkovModelLearning<TModel, TObservation>.)
NumberOfStates
Gets or sets the number of states to be used when this learning algorithm needs to create new models.
(Inherited from BaseHiddenMarkovModelLearning<TModel, TObservation>.)
Observations
Gets all observations as a single vector.
(Inherited from BaseBaumWelchLearning<TModel, TDistribution, TObservation, TOptions>.)
ParallelOptions
Gets or sets the parallelization options for this algorithm.
(Inherited from ParallelLearningBase.)
Token
Gets or sets a cancellation token that can be used to cancel the algorithm while it is running.
(Inherited from ParallelLearningBase.)
Tolerance
Gets or sets the maximum change in the average log-likelihood after an iteration of the algorithm used to detect convergence.
(Inherited from BaseBaumWelchLearning<TModel, TDistribution, TObservation, TOptions>.)
Topology
Gets or sets the state transition topology to be used when this learning algorithm needs to create new models. Default is Forward.
(Inherited from BaseHiddenMarkovModelLearning<TModel, TObservation>.)
Methods
Name  Description
ComputeForwardBackward
Computes the forward and backward probability matrices for a given observation referenced by its index in the input training data.
(Inherited from BaseBaumWelchLearning<TModel, TDistribution, TObservation, TOptions>.)
ComputeKsi
Computes the ksi matrix of probabilities for a given observation referenced by its index in the input training data.
(Inherited from BaseBaumWelchLearning<TModel, TDistribution, TObservation, TOptions>.)
Create
Creates an instance of the model to be learned. Inheritors of this abstract class must define this method so new models can be created from the training data.
(Overrides BaseHiddenMarkovModelLearning<TModel, TObservation>.Create(TObservation[][]).)
Equals  Determines whether the specified object is equal to the current object. (Inherited from Object.)
Finalize  Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection. (Inherited from Object.)
Fit
Fits one emission distribution. This method can be overridden in a derived class in order to implement special fitting options.
(Inherited from BaseBaumWelchLearning<TModel, TDistribution, TObservation, TOptions>.)
GetHashCode  Serves as the default hash function. (Inherited from Object.)
GetType  Gets the Type of the current instance. (Inherited from Object.)
Learn
Learns a model that can map the given inputs to the desired outputs.
(Inherited from BaseBaumWelchLearning<TModel, TDistribution, TObservation, TOptions>.)
MemberwiseClone  Creates a shallow copy of the current Object. (Inherited from Object.)
ToString  Returns a string that represents the current object. (Inherited from Object.)
UpdateEmissions
Updates the emission probability matrix.
(Inherited from BaseBaumWelchLearning<TModel, TDistribution, TObservation, TOptions>.)
Extension Methods
Name  Description
HasMethod
Checks whether an object implements a method with the given name.
(Defined by ExtensionMethods.)
IsEqual
Compares two objects for equality, performing an elementwise comparison if the elements are vectors or matrices.
(Defined by Matrix.)
To(Type)  Overloaded.
Converts an object into another type, irrespective of whether the conversion can be done at compile time or not. This can be used to convert generic types to numeric types during runtime.
(Defined by ExtensionMethods.)
To<T>  Overloaded.
Converts an object into another type, irrespective of whether the conversion can be done at compile time or not. This can be used to convert generic types to numeric types during runtime.
(Defined by ExtensionMethods.)
Remarks
The Baum-Welch algorithm is an unsupervised algorithm used to learn a single hidden Markov model object from a set of observation sequences. It uses a variant of the Expectation-Maximization (EM) algorithm to search for a set of model parameters that gives the model a high likelihood of generating the training sequences passed to this algorithm. These parameters are the matrix of transition probabilities A, the vector of state probability distributions B, and the initial probability vector π.
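For reference, the textbook Baum-Welch re-estimation formulas for a single training sequence O = (o_1, ..., o_T) are shown below. They use the notation above (A = [a_ij], π, and emission densities b_j). Here α and β are the forward and backward probabilities, and λ denotes the current model. γ_t(i) is the posterior probability of being in state i at time t, and ξ_t(i, j) is the posterior probability of moving from state i to state j between times t and t+1. These correspond to the Gamma and Ksi matrices that the LogGamma and LogKsi properties expose as log-probabilities. This is the standard single-sequence formulation, not a transcription of this class's code, which also combines several sequences and works in log space.

$$
\gamma_t(i) = \frac{\alpha_t(i)\,\beta_t(i)}{P(O \mid \lambda)}, \qquad
\xi_t(i,j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)}
$$

$$
\hat{\pi}_i = \gamma_1(i), \qquad
\hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}
$$

Each emission distribution b_j is then refitted to the observations o_t, weighted by γ_t(j). For a univariate Normal emission, this weighted fit gives:

$$
\hat{\mu}_j = \frac{\sum_{t=1}^{T} \gamma_t(j)\, o_t}{\sum_{t=1}^{T} \gamma_t(j)}, \qquad
\hat{\sigma}_j^2 = \frac{\sum_{t=1}^{T} \gamma_t(j)\,(o_t - \hat{\mu}_j)^2}{\sum_{t=1}^{T} \gamma_t(j)}
$$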
For increased numerical accuracy, this class performs all computations using log-probabilities.
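Working in log space matters because a sequence's likelihood is built from products of many probabilities below one. For long sequences it underflows double precision. The following is a minimal sketch, with no Accord.NET dependency, of the forward recursion for a discrete HMM carried out entirely with log-probabilities. It is written as a C# 9 top-level program. The helper names (LogSumExp, LogForward) and the toy two-state model are illustrative assumptions, not this class's implementation.

```csharp
using System;

// Numerically stable log(exp(v_1) + ... + exp(v_n)): subtracting the maximum
// first keeps every exponent <= 0, so the sum cannot overflow.
static double LogSumExp(double[] values)
{
    double max = double.NegativeInfinity;
    foreach (double v in values)
        max = Math.Max(max, v);
    if (double.IsNegativeInfinity(max))
        return double.NegativeInfinity; // every term is log(0)

    double sum = 0;
    foreach (double v in values)
        sum += Math.Exp(v - max);
    return max + Math.Log(sum);
}

// log P(O | model) for a discrete HMM via the forward recursion in log space:
//   logAlpha_0(j) = logPi(j) + logB(j, o_0)
//   logAlpha_t(j) = LogSumExp_i(logAlpha_{t-1}(i) + logA(i, j)) + logB(j, o_t)
// and finally log P(O) = LogSumExp_j(logAlpha_{T-1}(j)).
static double LogForward(double[] logInitial, double[][] logTransition,
                         double[][] logEmission, int[] observations)
{
    int n = logInitial.Length;
    var alpha = new double[n];
    for (int j = 0; j < n; j++)
        alpha[j] = logInitial[j] + logEmission[j][observations[0]];

    var terms = new double[n];
    for (int t = 1; t < observations.Length; t++)
    {
        var next = new double[n];
        for (int j = 0; j < n; j++)
        {
            for (int i = 0; i < n; i++)
                terms[i] = alpha[i] + logTransition[i][j];
            next[j] = LogSumExp(terms) + logEmission[j][observations[t]];
        }
        alpha = next;
    }
    return LogSumExp(alpha);
}

// Toy two-state, two-symbol model (made-up numbers for illustration).
double[] logPi = { Math.Log(0.6), Math.Log(0.4) };
double[][] logA =
{
    new[] { Math.Log(0.7), Math.Log(0.3) },
    new[] { Math.Log(0.4), Math.Log(0.6) },
};
double[][] logB =
{
    new[] { Math.Log(0.9), Math.Log(0.1) },
    new[] { Math.Log(0.2), Math.Log(0.8) },
};

int[] shortSeq = { 0, 1, 0 };
double shortLog = LogForward(logPi, logA, logB, shortSeq);

// 10,000 alternating symbols: the log-likelihood stays finite, while its
// exponential is far below the smallest positive double and underflows to 0.
int[] longSeq = new int[10000];
for (int k = 0; k < longSeq.Length; k++)
    longSeq[k] = k % 2;
double longLog = LogForward(logPi, logA, logB, longSeq);

Console.WriteLine($"short: {shortLog}, long: {longLog}, exp(long): {Math.Exp(longLog)}");
```

For the long sequence, each step contributes at most a factor of 0.9 or 0.8 (the largest emission probability for the observed symbol). The log-likelihood is therefore at most about -1642, well below the roughly -745 limit where exp(x) underflows to zero.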
For a more thorough explanation on hidden Markov models with practical examples on gesture recognition, please see Sequence Classifiers in C#, Part I: Hidden Markov Models [1].
[1]: http://www.codeproject.com/Articles/541428/SequenceClassifiersinCsharpPartIHiddenMarko
Examples
In the following example, we will create a continuous hidden Markov model using a univariate Normal distribution to properly model continuous sequences.
using System;
using Accord.Statistics.Distributions.Univariate;
using Accord.Statistics.Models.Markov;
using Accord.Statistics.Models.Markov.Learning;
using Accord.Statistics.Models.Markov.Topology;

// Create continuous sequences. In the sequences below, there
// seem to be two states, one for values between 0 and 1 and
// another for values between 5 and 7. The states seem to be
// switched on every observation.
double[][] sequences = new double[][]
{
    new double[] { 0.1, 5.2, 0.3, 6.7, 0.1, 6.0 },
    new double[] { 0.2, 6.2, 0.3, 6.3, 0.1, 5.0 },
    new double[] { 0.1, 7.0, 0.1, 7.0, 0.2, 5.6 },
};

// Specify an initial normal distribution for the samples.
var density = new NormalDistribution();

// Creates a continuous hidden Markov Model with two states organized in an ergodic
// topology and an underlying univariate Normal distribution as probability density.
var model = new HiddenMarkovModel<NormalDistribution, double>(new Ergodic(2), density);

// Configure the learning algorithm to train the model until the
// difference in the average log-likelihood changes only by as little as 0.0001
var teacher = new BaumWelchLearning<NormalDistribution, double>(model)
{
    Tolerance = 0.0001,
    MaxIterations = 0, // 0: no iteration limit; stop on Tolerance alone
};

// Fit the model
teacher.Learn(sequences);
double logLikelihood = teacher.LogLikelihood;

// See the log-probability of the sequences learned
double a1 = model.LogLikelihood(new[] { 0.1, 5.2, 0.3, 6.7, 0.1, 6.0 }); // -0.12799388666109757
double a2 = model.LogLikelihood(new[] { 0.2, 6.2, 0.3, 6.3, 0.1, 5.0 }); // 0.01171157434400194

// See the log-probability of an unrelated sequence
double a3 = model.LogLikelihood(new[] { 1.1, 2.2, 1.3, 3.2, 4.2, 1.0 }); // -298.7465244473417

double likelihood = Math.Exp(logLikelihood);
a1 = Math.Exp(a1); // 0.879
a2 = Math.Exp(a2); // 1.011 (a density value, so it can exceed 1)
a3 = Math.Exp(a3); // 0.000

// We can also ask the model to decode one of the sequences. After
// this step the resulting sequence will be: { 0, 1, 0, 1, 0, 1 }
int[] states = model.Decide(new[] { 0.1, 5.2, 0.3, 6.7, 0.1, 6.0 });
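The Tolerance setting and the `= 0` iteration setting in the example above decide when training stops. The sketch below shows that kind of stopping rule in isolation. ShouldStop is a hypothetical helper. Using the absolute change in the average log-likelihood, rather than for example a relative change, is an assumption, so the library's own criterion may differ in detail. Treating a zero iteration cap as "run until convergence" follows how the examples use it.

```csharp
using System;

// Hypothetical stopping rule (not the library's code): stop once the average
// log-likelihood changes by no more than `tolerance` between two iterations, or
// once `maxIterations` iterations have run. A value of 0 disables the cap.
static bool ShouldStop(double previous, double current, int iteration,
                       double tolerance, int maxIterations)
{
    if (maxIterations > 0 && iteration >= maxIterations)
        return true;
    return Math.Abs(current - previous) <= tolerance;
}

// Made-up average log-likelihoods recorded after successive iterations.
double[] history = { -9.0, -4.0, -2.5, -2.2, -2.19, -2.18995 };

int stoppedAt = -1;
for (int it = 1; it < history.Length; it++)
{
    if (ShouldStop(history[it - 1], history[it], it, tolerance: 0.0001, maxIterations: 0))
    {
        stoppedAt = it;
        break;
    }
}

Console.WriteLine($"Converged at iteration {stoppedAt}"); // prints "Converged at iteration 5"
```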
In the following example, we will create a discrete hidden Markov model using a generic discrete probability distribution. This reproduces the code example given in the documentation of the non-generic BaumWelchLearning class.
using System;
using Accord.Math;
using Accord.Statistics.Distributions.Univariate;
using Accord.Statistics.Models.Markov;
using Accord.Statistics.Models.Markov.Learning;

// Continuous Markov Models can operate using any
// probability distribution, including discrete ones.

// In the following example, we will try to create a
// Continuous Hidden Markov Model using a discrete
// distribution to detect if a given sequence starts
// with a zero and has any number of ones after that.
int[][] sequences = new double[][]
{
    new double[] { 0,1,1,1,1,0,1,1,1,1 },
    new double[] { 0,1,1,1,0,1,1,1,1,1 },
    new double[] { 0,1,1,1,1,1,1,1,1,1 },
    new double[] { 0,1,1,1,1,1 },
    new double[] { 0,1,1,1,1,1,1 },
    new double[] { 0,1,1,1,1,1,1,1,1,1 },
    new double[] { 0,1,1,1,1,1,1,1,1,1 },
}.ToInt32();

// Create a new Hidden Markov Model with 3 states and
// a generic discrete distribution with two symbols
var hmm = HiddenMarkovModel.CreateDiscrete(3, 2);

// Try to fit the model to the data until the difference in
// the average log-likelihood changes only by as little as 0.0001
var teacher = new BaumWelchLearning<GeneralDiscreteDistribution, int>(hmm)
{
    Tolerance = 0.0001,
    MaxIterations = 0 // 0: no iteration limit; stop on Tolerance alone
};

// Learn the model
teacher.Learn(sequences);
double ll = Math.Exp(teacher.LogLikelihood);

// Calculate the probability that the given
// sequences originated from the model
double l1 = hmm.Probability(new int[] { 0, 1 });       // 0.999
double l2 = hmm.Probability(new int[] { 0, 1, 1, 1 }); // 0.916

// Sequences which do not start with zero have a much lower probability.
double l3 = hmm.Probability(new int[] { 1, 1 });       // 0.000
double l4 = hmm.Probability(new int[] { 1, 0, 0, 0 }); // 0.000

// Sequences which contain a few errors have a higher probability
// than the ones which do not start with zero. This shows some
// of the temporal elasticity and error tolerance of the HMMs.
double l5 = hmm.Probability(new int[] { 0, 1, 0, 1, 1, 1, 1, 1, 1 }); // 0.034
double l6 = hmm.Probability(new int[] { 0, 1, 1, 1, 1, 1, 1, 0, 1 }); // 0.034
The next example shows how to create a multivariate model using a multivariate Normal distribution. In this example, sequences contain vector-valued observations, such as (x, y) pairs.
using System;
using Accord.Statistics.Distributions.Multivariate;
using Accord.Statistics.Models.Markov;
using Accord.Statistics.Models.Markov.Learning;
using Accord.Statistics.Models.Markov.Topology;

// Create sequences of vector-valued observations. In the
// sequences below, a single observation is composed of two
// coordinate values, such as (x, y). There seem to be two
// states, one for (x, y) values less than (5, 5) and another
// for higher values. The states seem to be switched on
// every observation.
double[][][] sequences =
{
    new double[][] // sequence 1
    {
        new double[] { 1, 2 }, // observation 1 of sequence 1
        new double[] { 6, 7 }, // observation 2 of sequence 1
        new double[] { 2, 3 }, // observation 3 of sequence 1
    },
    new double[][] // sequence 2
    {
        new double[] { 2, 2 }, // observation 1 of sequence 2
        new double[] { 9, 8 }, // observation 2 of sequence 2
        new double[] { 1, 0 }, // observation 3 of sequence 2
    },
    new double[][] // sequence 3
    {
        new double[] { 1, 3 }, // observation 1 of sequence 3
        new double[] { 8, 9 }, // observation 2 of sequence 3
        new double[] { 3, 3 }, // observation 3 of sequence 3
    },
};

// Specify an initial normal distribution for the samples.
var density = new MultivariateNormalDistribution(dimension: 2);

// Creates a continuous hidden Markov Model with two states organized in a forward
// topology and an underlying multivariate Normal distribution as probability density.
var model = new HiddenMarkovModel<MultivariateNormalDistribution, double[]>(new Forward(2), density);

// Configure the learning algorithm to train the model until the
// difference in the average log-likelihood changes only by as little as 0.0001
var teacher = new BaumWelchLearning<MultivariateNormalDistribution, double[]>(model)
{
    Tolerance = 0.0001,
    MaxIterations = 0, // 0: no iteration limit; stop on Tolerance alone
};

// Fit the model
teacher.Learn(sequences);
double logLikelihood = teacher.LogLikelihood;

// See the likelihood of the sequences learned
double a1 = Math.Exp(model.LogLikelihood(new[]
{
    new double[] { 1, 2 }, new double[] { 6, 7 }, new double[] { 2, 3 }
})); // 0.000208

double a2 = Math.Exp(model.LogLikelihood(new[]
{
    new double[] { 2, 2 }, new double[] { 9, 8 }, new double[] { 1, 0 }
})); // 0.0000376

// See the likelihood of an unrelated sequence
double a3 = Math.Exp(model.LogLikelihood(new[]
{
    new double[] { 8, 7 }, new double[] { 9, 8 }, new double[] { 1, 0 }
})); // 2.10 x 10^(-89)
The following example shows how to create a hidden Markov model that treats each feature as independent of the others. This is the same as the naive Bayes assumption of independence among the features of the feature vector.
using Accord.Statistics.Distributions.Multivariate;
using Accord.Statistics.Distributions.Univariate;
using Accord.Statistics.Models.Markov;
using Accord.Statistics.Models.Markov.Learning;
using Accord.Statistics.Models.Markov.Topology;

// Let's say we have 2 meteorological sensors gathering data
// from different time periods of the day. Those periods are
// represented below:
double[][][] data =
{
    new double[][] // first sequence (we just repeated the measurements
    {              // once, so there is only one observation sequence)
        new double[] { 1, 2 }, // Day 1, 15:00
        new double[] { 6, 7 }, // Day 1, 16:00
        new double[] { 2, 3 }, // Day 1, 17:00
        new double[] { 2, 2 }, // Day 1, 18:00
        new double[] { 9, 8 }, // Day 1, 19:00
        new double[] { 1, 0 }, // Day 1, 20:00
        new double[] { 1, 3 }, // Day 1, 21:00
        new double[] { 8, 9 }, // Day 1, 22:00
        new double[] { 3, 3 }, // Day 1, 23:00
        new double[] { 1, 3 }, // Day 2, 00:00
        new double[] { 1, 1 }, // Day 2, 01:00
    }
};

// Let's assume those sensors are unrelated (for simplicity). As
// such, let's assume the data gathered from the sensors may reside
// in circular centroids denoting each state the underlying system
// might be in.
NormalDistribution[] initial_components =
{
    new NormalDistribution(), // initial value for the first variable's distribution
    new NormalDistribution()  // initial value for the second variable's distribution
};

// Specify an initial independent normal distribution for the samples.
// (Its type must match the model's Independent<NormalDistribution> emission type.)
var density = new Independent<NormalDistribution>(initial_components);

// Creates a continuous hidden Markov Model with two states organized in an ergodic
// topology and an underlying independent Normal distribution as probability density.
var model = new HiddenMarkovModel<Independent<NormalDistribution>, double[]>(new Ergodic(2), density);

// Configure the learning algorithm to train the model until the
// difference in the average log-likelihood changes only by as little as 0.0001
var teacher = new BaumWelchLearning<Independent<NormalDistribution>, double[]>(model)
{
    Tolerance = 0.0001,
    MaxIterations = 0, // 0: no iteration limit; stop on Tolerance alone
};

// Fit the model
teacher.Learn(data);
double error = teacher.LogLikelihood;

// Get the hidden state associated with each observation
int[] hiddenStates = null;

// log-likelihood of the Viterbi path
double logLikelihood = model.LogLikelihood(data[0], ref hiddenStates);
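In the example above, each state's emission density is a product of one univariate Normal per sensor, so its log-density is a sum of per-sensor log-densities. The following library-free sketch, a C# 9 top-level program, shows that computation. NormalLogPdf and IndependentLogPdf are hypothetical helpers, not Accord.NET APIs, and the state parameters are made-up values.

```csharp
using System;

// Log-density of a univariate Normal N(mean, stdDev^2) evaluated at x.
static double NormalLogPdf(double x, double mean, double stdDev)
{
    double z = (x - mean) / stdDev;
    return -0.5 * Math.Log(2 * Math.PI) - Math.Log(stdDev) - 0.5 * z * z;
}

// Independence: p(x) = p_1(x_1) * p_2(x_2) * ..., so the joint
// log-density is the sum of the per-feature log-densities.
static double IndependentLogPdf(double[] x, double[] means, double[] stdDevs)
{
    double sum = 0;
    for (int d = 0; d < x.Length; d++)
        sum += NormalLogPdf(x[d], means[d], stdDevs[d]);
    return sum;
}

// One (sensor 1, sensor 2) reading scored against a hypothetical state
// whose per-sensor Normals have the made-up parameters below.
double[] reading = { 6, 7 };
double[] stateMeans = { 7.5, 7.5 };
double[] stateStdDevs = { 1.5, 1.0 };

double logDensity = IndependentLogPdf(reading, stateMeans, stateStdDevs);
Console.WriteLine($"log p(reading | state) = {logDensity}");
```

The result equals the log-density of a bivariate Normal with the diagonal covariance diag(1.5², 1.0²). Sensors that are actually correlated would call for a full-covariance MultivariateNormalDistribution instead, as in the multivariate example.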
The next example shows how to fit a mixture-density hidden Markov model.
using System;
using Accord.Statistics.Distributions.Fitting;
using Accord.Statistics.Distributions.Univariate;
using Accord.Statistics.Models.Markov;
using Accord.Statistics.Models.Markov.Learning;
using Accord.Statistics.Models.Markov.Topology;

// Suppose we have a set of six sequences and we would like to
// fit a hidden Markov model with mixtures of Normal distributions
// as the emission densities.

// First, let's consider a set of univariate sequences:
double[][] sequences =
{
    new double[] { 1, 1, 2, 2, 2, 3, 3, 3 },
    new double[] { 1, 2, 2, 2, 3, 3 },
    new double[] { 1, 2, 2, 3, 3, 5 },
    new double[] { 2, 2, 2, 2, 3, 3, 3, 4, 5, 5, 1 },
    new double[] { 1, 1, 1, 2, 2, 5 },
    new double[] { 1, 2, 2, 4, 4, 5 },
};

// Now we can begin specifying an initial Gaussian mixture distribution. It is
// better to give the mixture components different initial parameters:
var density = new Mixture<NormalDistribution>(
    new NormalDistribution(mean: 2, stdDev: 1.0), // 1st component in the mixture
    new NormalDistribution(mean: 0, stdDev: 0.6), // 2nd component in the mixture
    new NormalDistribution(mean: 4, stdDev: 0.4), // 3rd component in the mixture
    new NormalDistribution(mean: 6, stdDev: 1.1)  // 4th component in the mixture
);

// Let's then create a continuous hidden Markov Model with two states organized in a forward
// topology with the underlying univariate Normal mixture distribution as probability density.
var model = new HiddenMarkovModel<Mixture<NormalDistribution>, double>(new Forward(2), density);

// Now we should configure the learning algorithm to train the model. We will learn
// until the difference in the average log-likelihood changes only by as little as 0.0001
var teacher = new BaumWelchLearning<Mixture<NormalDistribution>, double>(model)
{
    Tolerance = 0.0001,
    MaxIterations = 0, // 0: no iteration limit; stop on Tolerance alone

    // Note, however, that since this example is extremely simple and we have only a few
    // data points, a full-blown mixture wouldn't really be needed. Thus there is a
    // great chance that the mixture would degenerate quickly. We can avoid this
    // by specifying some regularization constants in the Normal distribution fitting:
    FittingOptions = new MixtureOptions()
    {
        Iterations = 1, // limit the inner EM to a single iteration
        InnerOptions = new NormalOptions()
        {
            Regularization = 1e-5 // specify a regularization constant
        }
    }
};

// Finally, we can fit the model
teacher.Learn(sequences);
double logLikelihood = teacher.LogLikelihood;

// And now check the likelihood of some approximate sequences.
double a1 = Math.Exp(model.LogLikelihood(new double[] { 1, 1, 2, 2, 3 })); // 2.3413833128741038E+45
double a2 = Math.Exp(model.LogLikelihood(new double[] { 1, 1, 2, 5, 5 })); // 9.94607618459872E+19

// We can see that the likelihood of an unrelated sequence is much smaller:
double a3 = Math.Exp(model.LogLikelihood(new double[] { 8, 2, 6, 4, 1 })); // 1.5063654166181737E-44
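The emission density in the example above is a weighted sum of Normal components. The following library-free sketch, a C# 9 top-level program, evaluates such a mixture density in log space. It also shows why a collapsing component makes the regularization constant necessary. NormalLogPdf and MixtureLogPdf are hypothetical helpers, not the Mixture<T> implementation, and the equal component weights are an assumption of this sketch.

```csharp
using System;

// Log-density of a univariate Normal N(mean, stdDev^2) evaluated at x.
static double NormalLogPdf(double x, double mean, double stdDev)
{
    double z = (x - mean) / stdDev;
    return -0.5 * Math.Log(2 * Math.PI) - Math.Log(stdDev) - 0.5 * z * z;
}

// Mixture: p(x) = sum_k w_k N(x; mean_k, stdDev_k^2). The log is computed
// with the log-sum-exp trick so very small component terms do not underflow.
static double MixtureLogPdf(double x, double[] weights, double[] means, double[] stdDevs)
{
    var terms = new double[weights.Length];
    double max = double.NegativeInfinity;
    for (int k = 0; k < weights.Length; k++)
    {
        terms[k] = Math.Log(weights[k]) + NormalLogPdf(x, means[k], stdDevs[k]);
        max = Math.Max(max, terms[k]);
    }
    double sum = 0;
    foreach (double term in terms)
        sum += Math.Exp(term - max);
    return max + Math.Log(sum);
}

// The four initial components from the example above, with equal weights
// (the weights are an assumption of this sketch).
double[] w = { 0.25, 0.25, 0.25, 0.25 };
double[] mu = { 2, 0, 4, 6 };
double[] sigma = { 1.0, 0.6, 0.4, 1.1 };
double density = Math.Exp(MixtureLogPdf(2.0, w, mu, sigma));

// Why a mixture can "degenerate": a component that collapses onto a data
// point has log-density -0.5*log(2*pi) - log(stdDev) there, which grows
// without bound as stdDev -> 0. Adding a constant to the variance bounds it.
double peakWide = NormalLogPdf(2.0, 2.0, 1.0);
double peakNarrow = NormalLogPdf(2.0, 2.0, 1e-6);

Console.WriteLine($"p(2.0) = {density}; peak log-density: {peakWide} vs {peakNarrow}");
```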
When using Normal distributions, we may often run into problems that are difficult to solve. These include constant variables or other numerical difficulties that prevent the proper estimation of a Normal distribution from the data.
A sign of those difficulties is the learning algorithm throwing one of two exceptions. For univariate distributions (e.g. NormalDistribution), the exception is "Variance is zero. Try specifying a regularization constant in the fitting options". For multivariate distributions such as the MultivariateNormalDistribution, it is a NonPositiveDefiniteMatrixException stating that the "Covariance matrix is not positive definite. Try specifying a regularization constant in the fitting options". In both cases, this indicates that the variables being learned cannot be suitably modeled by Normal distributions. To avoid numerical difficulties when estimating those probabilities, a small regularization constant can be added to the variances or to the covariance matrices. This makes the variances greater than zero and the covariance matrices positive definite.
To specify a regularization constant as suggested by the messages above, we can pass a fitting options object for the model's emission distribution:
using System;
using Accord.Statistics.Distributions.Fitting;
using Accord.Statistics.Distributions.Univariate;
using Accord.Statistics.Models.Markov;
using Accord.Statistics.Models.Markov.Learning;
using Accord.Statistics.Models.Markov.Topology;

// Suppose we have a set of six sequences and we would like to
// fit a hidden Markov model with mixtures of Normal distributions
// as the emission densities.

// First, let's consider a set of univariate sequences:
double[][] sequences =
{
    new double[] { 0.223, 1.05, 0.574, 0.965, 0.448, 0.265, 0.087, 0.362, 0.717, 0.032 },
    new double[] { 1.05, 0.574, 0.965, 0.448, 0.265, 0.087, 0.362, 0.717, 0.032, 0.346 },
    new double[] { 0.574, 0.965, 0.448, 0.265, 0.087, 0.362, 0.717, 0.032, 0.346, 0.989 },
    new double[] { 0.965, 0.448, 0.265, 0.087, 0.362, 0.717, 0.032, 0.346, 0.989, 0.619 },
    new double[] { 0.448, 0.265, 0.087, 0.362, 0.717, 0.032, 0.346, 0.989, 0.619, 0.02 },
    new double[] { 0.265, 0.087, 0.362, 0.717, 0.032, 0.346, 0.989, 0.619, 0.02, 0.297 },
};

// Now we can begin specifying an initial Gaussian mixture distribution. It is
// better to give the mixture components different initial parameters:
var density = new Mixture<NormalDistribution>(
    new NormalDistribution(mean: 2, stdDev: 1.0), // 1st component in the mixture
    new NormalDistribution(mean: 0, stdDev: 0.6), // 2nd component in the mixture
    new NormalDistribution(mean: 4, stdDev: 0.4), // 3rd component in the mixture
    new NormalDistribution(mean: 6, stdDev: 1.1)  // 4th component in the mixture
);

// Let's then create a continuous hidden Markov Model with two states organized in a forward
// topology with the underlying univariate Normal mixture distribution as probability density.
var model = new HiddenMarkovModel<Mixture<NormalDistribution>, double>(new Forward(2), density);

// Now we should configure the learning algorithm to train the model. We will learn
// until the difference in the average log-likelihood changes only by as little as 0.0001
var teacher = new BaumWelchLearning<Mixture<NormalDistribution>, double, MixtureOptions>(model)
{
    Tolerance = 0.0001,
    MaxIterations = 0, // 0: no iteration limit; stop on Tolerance alone

    // Note, however, that since this example is extremely simple and we have only a few
    // data points, a full-blown mixture wouldn't really be needed. Thus there is a
    // great chance that the mixture would degenerate quickly. We can avoid this
    // by specifying some regularization constants in the Normal distribution fitting:
    FittingOptions = new MixtureOptions()
    {
        Iterations = 1, // limit the inner EM to a single iteration
        InnerOptions = new NormalOptions()
        {
            Regularization = 1e-5 // specify a regularization constant
        }
    }
};

// Finally, we can fit the model
teacher.Learn(sequences);
double logLikelihood = teacher.LogLikelihood;

// And now check the likelihood of some approximate sequences.
double[] newSequence = { 0.223, 1.05, 0.574, 0.965, 0.448, 0.265, 0.087, 0.362, 0.717, 0.032 };
double a1 = Math.Exp(model.LogLikelihood(newSequence)); // 11729312967893.566

int[] path = model.Decide(newSequence);

// We can see that the likelihood of an unrelated sequence is much smaller:
double a3 = Math.Exp(model.LogLikelihood(new double[] { 8, 2, 6, 4, 1 })); // 0.0
Typically, any small value will suffice as a regularization constant, although smaller values may lead to longer fitting times. Values that are too high, on the other hand, lead to decreased accuracy.
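The following library-free sketch, a C# 9 top-level program, illustrates what adding a regularization constant does in the two failure cases described above. For a constant variable, the weighted variance is exactly zero, so a Normal density cannot be formed. For two perfectly correlated features, the covariance matrix is singular. RegularizedVariance and IsPositiveDefinite2x2 are hypothetical helpers that mirror the idea, not Accord.NET's NormalOptions code. The constant 1e-5 matches the value used in the examples.

```csharp
using System;

// Weighted variance of x under weights w (e.g. a state's posterior weights),
// plus a regularization constant added to the result.
static double RegularizedVariance(double[] x, double[] w, double regularization)
{
    double weightSum = 0, mean = 0;
    for (int i = 0; i < x.Length; i++)
    {
        weightSum += w[i];
        mean += w[i] * x[i];
    }
    mean /= weightSum;

    double variance = 0;
    for (int i = 0; i < x.Length; i++)
        variance += w[i] * (x[i] - mean) * (x[i] - mean);
    return variance / weightSum + regularization;
}

// Sylvester's criterion for a symmetric 2x2 matrix: positive definite
// if and only if both leading principal minors are positive.
static bool IsPositiveDefinite2x2(double[,] m) =>
    m[0, 0] > 0 && m[0, 0] * m[1, 1] - m[0, 1] * m[1, 0] > 0;

// A constant variable: the unregularized variance is exactly zero.
double[] constant = { 3.0, 3.0, 3.0, 3.0 };
double[] weights = { 0.125, 0.375, 0.25, 0.25 };
double raw = RegularizedVariance(constant, weights, 0.0);          // 0
double regularized = RegularizedVariance(constant, weights, 1e-5); // 1e-5

// Two perfectly correlated features: the covariance matrix is singular.
double[,] cov = { { 1.0, 1.0 }, { 1.0, 1.0 } };
bool before = IsPositiveDefinite2x2(cov); // false (determinant is 0)
cov[0, 0] += 1e-5;                        // add the constant to the diagonal
cov[1, 1] += 1e-5;
bool after = IsPositiveDefinite2x2(cov);  // true

Console.WriteLine($"variance: {raw} -> {regularized}; positive definite: {before} -> {after}");
```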