
BaumWelchLearning<TDistribution, TObservation> Class

Baum-Welch learning algorithm for arbitrary-density (generic) Hidden Markov Models.
Inheritance Hierarchy
System.Object
  Accord.MachineLearning.ParallelLearningBase
    Accord.Statistics.Models.Markov.Learning.BaseHiddenMarkovModelLearning<HiddenMarkovModel<TDistribution, TObservation>, TObservation>
      Accord.Statistics.Models.Markov.Learning.BaseBaumWelchLearning<HiddenMarkovModel<TDistribution, TObservation>, TDistribution, TObservation, IFittingOptions>
        Accord.Statistics.Models.Markov.Learning.BaumWelchLearning<TDistribution, TObservation>

Namespace:  Accord.Statistics.Models.Markov.Learning
Assembly:  Accord.Statistics (in Accord.Statistics.dll) Version: 3.8.0
Syntax
public class BaumWelchLearning<TDistribution, TObservation> : BaseBaumWelchLearning<HiddenMarkovModel<TDistribution, TObservation>, TDistribution, TObservation, IFittingOptions>, 
	IConvergenceLearning
where TDistribution : IFittableDistribution<TObservation>

Type Parameters

TDistribution
The type of the emission distributions in the model.
TObservation
The type of the observations (e.g. int for a discrete model).

The BaumWelchLearning<TDistribution, TObservation> type exposes the following members.

Constructors
  Name  Description
Public method  BaumWelchLearning<TDistribution, TObservation>()
Initializes a new instance of the BaumWelchLearning<TDistribution, TObservation> class.
Public method  BaumWelchLearning<TDistribution, TObservation>(HiddenMarkovModel<TDistribution, TObservation>)
Initializes a new instance of the BaumWelchLearning<TDistribution, TObservation> class.
Top
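When no pre-configured model is passed to the constructor, the algorithm can build the model by itself from the NumberOfStates, Topology and Emissions properties listed below. The following is a minimal sketch of that usage; it assumes Emissions accepts a per-state factory delegate and that Learn creates the model when none has been supplied, and is an illustration rather than a verbatim sample from the library.

// Minimal sketch (see the assumptions in the paragraph above). Types come from
// Accord.Statistics.Models.Markov, Accord.Statistics.Models.Markov.Learning,
// Accord.Statistics.Models.Markov.Topology and Accord.Statistics.Distributions.Univariate.
double[][] observations =
{
    new double[] { 0.1, 5.2, 0.3, 6.7 },
    new double[] { 0.2, 6.2, 0.3, 6.3 },
};

var teacher = new BaumWelchLearning<NormalDistribution, double>()
{
    NumberOfStates = 2,                                    // states of the model to be created
    Topology = new Forward(2),                             // transition topology (Forward is the default)
    Emissions = (stateIndex) => new NormalDistribution(),  // assumed per-state emission factory
    Tolerance = 0.0001,
    MaxIterations = 0 // no upper limit; rely on Tolerance to detect convergence
};

// Learn creates and returns the model since none was supplied:
var model = teacher.Learn(observations);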
Properties
  Name  Description
Public property  Convergence
Gets or sets convergence parameters.
(Inherited from BaseBaumWelchLearning<TModel, TDistribution, TObservation, TOptions>.)
Public property  CurrentIteration
Gets or sets the number of performed iterations.
(Inherited from BaseBaumWelchLearning<TModel, TDistribution, TObservation, TOptions>.)
Public property  Emissions
Gets or sets the function that initializes the emission distributions in the hidden Markov Models.
(Inherited from BaseBaumWelchLearning<TModel, TDistribution, TObservation, TOptions>.)
Public property  FittingOptions
Gets or sets the distribution fitting options to use when estimating distribution densities during learning.
(Inherited from BaseBaumWelchLearning<TModel, TDistribution, TObservation, TOptions>.)
Public property  HasConverged
Gets or sets whether the algorithm has converged.
(Inherited from BaseBaumWelchLearning<TModel, TDistribution, TObservation, TOptions>.)
Public property  Iterations  (Obsolete.)
Please use MaxIterations instead.
(Inherited from BaseBaumWelchLearning<TModel, TDistribution, TObservation, TOptions>.)
Public property  LogGamma
Gets the Gamma matrix of log probabilities created during the last iteration of the Baum-Welch learning algorithm.
(Inherited from BaseBaumWelchLearning<TModel, TDistribution, TObservation, TOptions>.)
Public property  LogKsi
Gets the Ksi matrix of log probabilities created during the last iteration of the Baum-Welch learning algorithm.
(Inherited from BaseBaumWelchLearning<TModel, TDistribution, TObservation, TOptions>.)
Public property  LogLikelihood
Gets the log-likelihood of the model at the last iteration.
(Inherited from BaseBaumWelchLearning<TModel, TDistribution, TObservation, TOptions>.)
Public property  LogWeights
Gets the sample weights in the last iteration of the Baum-Welch learning algorithm.
(Inherited from BaseBaumWelchLearning<TModel, TDistribution, TObservation, TOptions>.)
Public property  MaxIterations
Gets or sets the maximum number of iterations performed by the learning algorithm.
(Inherited from BaseBaumWelchLearning<TModel, TDistribution, TObservation, TOptions>.)
Public property  Model
Gets or sets the model being trained.
(Inherited from BaseHiddenMarkovModelLearning<TModel, TObservation>.)
Public property  NumberOfStates
Gets or sets the number of states to be used when this learning algorithm needs to create new models.
(Inherited from BaseHiddenMarkovModelLearning<TModel, TObservation>.)
Protected property  Observations
Gets all observations as a single vector.
(Inherited from BaseBaumWelchLearning<TModel, TDistribution, TObservation, TOptions>.)
Public property  ParallelOptions
Gets or sets the parallelization options for this algorithm.
(Inherited from ParallelLearningBase.)
Public property  Token
Gets or sets a cancellation token that can be used to cancel the algorithm while it is running.
(Inherited from ParallelLearningBase.)
Public property  Tolerance
Gets or sets the maximum change in the average log-likelihood after an iteration of the algorithm used to detect convergence.
(Inherited from BaseBaumWelchLearning<TModel, TDistribution, TObservation, TOptions>.)
Public property  Topology
Gets or sets the state transition topology to be used when this learning algorithm needs to create new models. Default is Forward.
(Inherited from BaseHiddenMarkovModelLearning<TModel, TObservation>.)
Top
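The ParallelOptions and Token members inherited from ParallelLearningBase can be used to bound the degree of parallelism and to cancel a long-running learning session. The sketch below assumes they are the standard System.Threading.Tasks.ParallelOptions and System.Threading.CancellationToken types suggested by the descriptions above; it is an illustration, not a verbatim sample from the library.

// Sketch only (see assumptions above): limiting parallelism and supporting cancellation.
// 'model' and 'sequences' are as in the examples shown in the Examples section below.
var cancellation = new CancellationTokenSource();

var teacher = new BaumWelchLearning<NormalDistribution, double>(model)
{
    Tolerance = 0.0001,
    MaxIterations = 0,
    Token = cancellation.Token // lets another thread stop the algorithm early
};

teacher.ParallelOptions.MaxDegreeOfParallelism = 1; // e.g. force single-threaded learning

// Calling cancellation.Cancel() from another thread would interrupt this call:
teacher.Learn(sequences);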
Methods
  Name  Description
Protected method  ComputeForwardBackward
Computes the forward and backward probabilities matrices for a given observation referenced by its index in the input training data.
(Inherited from BaseBaumWelchLearning<TModel, TDistribution, TObservation, TOptions>.)
Protected method  ComputeKsi
Computes the ksi matrix of probabilities for a given observation referenced by its index in the input training data.
(Inherited from BaseBaumWelchLearning<TModel, TDistribution, TObservation, TOptions>.)
Protected method  Create
Creates an instance of the model to be learned. Inheritors of this abstract class must define this method so new models can be created from the training data.
(Overrides BaseHiddenMarkovModelLearning<TModel, TObservation>.Create(TObservation).)
Public method  Equals
Determines whether the specified object is equal to the current object.
(Inherited from Object.)
Protected method  Finalize
Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection.
(Inherited from Object.)
Protected method  Fit
Fits one emission distribution. This method can be overridden in a derived class in order to implement special fitting options.
(Inherited from BaseBaumWelchLearning<TModel, TDistribution, TObservation, TOptions>.)
Public method  GetHashCode
Serves as the default hash function.
(Inherited from Object.)
Public method  GetType
Gets the Type of the current instance.
(Inherited from Object.)
Public method  Learn
Learns a model that can map the given inputs to the desired outputs.
(Inherited from BaseBaumWelchLearning<TModel, TDistribution, TObservation, TOptions>.)
Protected method  MemberwiseClone
Creates a shallow copy of the current Object.
(Inherited from Object.)
Public method  ToString
Returns a string that represents the current object.
(Inherited from Object.)
Protected method  UpdateEmissions
Updates the emission probability matrix.
(Inherited from BaseBaumWelchLearning<TModel, TDistribution, TObservation, TOptions>.)
Top
Extension Methods
  Name  Description
Public Extension Method  HasMethod
Checks whether an object implements a method with the given name.
(Defined by ExtensionMethods.)
Public Extension Method  IsEqual
Compares two objects for equality, performing an elementwise comparison if the elements are vectors or matrices.
(Defined by Matrix.)
Public Extension Method  To(Type)  (Overloaded.)
Converts an object into another type, irrespective of whether the conversion can be done at compile time or not. This can be used to convert generic types to numeric types during runtime.
(Defined by ExtensionMethods.)
Public Extension Method  To<T>()  (Overloaded.)
Converts an object into another type, irrespective of whether the conversion can be done at compile time or not. This can be used to convert generic types to numeric types during runtime.
(Defined by ExtensionMethods.)
Top
Remarks

The Baum-Welch algorithm is an unsupervised algorithm used to learn a single hidden Markov model object from a set of observation sequences. It works by using a variant of the Expectation-Maximization algorithm to search for a set of model parameters (i.e. the matrix of transition probabilities A, the vector of state probability distributions B, and the initial probability vector π) under which the model assigns a high likelihood to the training sequences given to this algorithm.
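For reference, the re-estimation step behind this procedure can be summarized with the standard textbook formulas, written here in conventional HMM notation (this is background material, not an excerpt from the library's source). Defining

\gamma_t(i) = P(q_t = i \mid O, \lambda), \qquad \xi_t(i, j) = P(q_t = i,\; q_{t+1} = j \mid O, \lambda),

the expectation step computes these quantities with the forward-backward algorithm, and the maximization step updates the parameters as

\hat{\pi}_i = \gamma_1(i), \qquad \hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}.

In the arbitrary-density case handled by this class, each emission distribution is then re-fitted to the observations weighted by \gamma_t(i); this weighted fitting is the step influenced by the Fit method and the FittingOptions property described above.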

For increased accuracy, this class performs all computations using log-probabilities.
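To see why the log domain matters (a generic numerical illustration, not the library's internal code): the forward-backward recursions multiply many probabilities together, and such products quickly underflow in double precision, while the equivalent sums of log-probabilities remain representable. When probabilities must be added rather than multiplied, the standard log-sum-exp trick keeps the computation stable:

// Generic illustration (not Accord.NET internals): adding two probabilities that are
// only available as log-probabilities, without ever leaving the log domain.
static double LogSum(double logA, double logB)
{
    if (double.IsNegativeInfinity(logA)) return logB;
    if (double.IsNegativeInfinity(logB)) return logA;

    double max = Math.Max(logA, logB);

    // log(exp(logA) + exp(logB)), rearranged so the exponentials never overflow
    return max + Math.Log(Math.Exp(logA - max) + Math.Exp(logB - max));
}

// Example: exp(-1000) * exp(-1000) underflows to 0.0 in double precision,
// but the log of the product, -1000 + -1000 = -2000, is represented exactly.
double logProduct = -1000.0 + -1000.0;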

For a more thorough explanation on hidden Markov models with practical examples on gesture recognition, please see Sequence Classifiers in C#, Part I: Hidden Markov Models [1].

[1]: http://www.codeproject.com/Articles/541428/Sequence-Classifiers-in-Csharp-Part-I-Hidden-Marko

Examples

In the following example, we will create a Continuous Hidden Markov Model using a univariate Normal distribution to properly model continuous sequences.

// Create continuous sequences. In the sequences below, there
//  seem to be two states, one for values between 0 and 1 and
//  another for values between 5 and 7. The states seem to be
//  switched on every observation.
double[][] sequences = new double[][]
{
    new double[] { 0.1, 5.2, 0.3, 6.7, 0.1, 6.0 },
    new double[] { 0.2, 6.2, 0.3, 6.3, 0.1, 5.0 },
    new double[] { 0.1, 7.0, 0.1, 7.0, 0.2, 5.6 },
};


// Specify an initial normal distribution for the samples.
var density = new NormalDistribution();

// Creates a continuous hidden Markov Model with two states organized in an ergodic
//  topology and an underlying univariate Normal distribution as probability density.
var model = new HiddenMarkovModel<NormalDistribution, double>(new Ergodic(2), density);

// Configure the learning algorithm to train the model until the
// difference in the average log-likelihood changes only by as little as 0.0001
var teacher = new BaumWelchLearning<NormalDistribution, double>(model)
{
    Tolerance = 0.0001,
    MaxIterations = 0, // no upper limit; rely on Tolerance to detect convergence
};

// Fit the model
teacher.Learn(sequences);

double logLikelihood = teacher.LogLikelihood;

// See the log-probability of the sequences learned
double a1 = model.LogLikelihood(new[] { 0.1, 5.2, 0.3, 6.7, 0.1, 6.0 }); // -0.12799388666109757
double a2 = model.LogLikelihood(new[] { 0.2, 6.2, 0.3, 6.3, 0.1, 5.0 }); // 0.01171157434400194

// See the probability of an unrelated sequence
double a3 = model.LogLikelihood(new[] { 1.1, 2.2, 1.3, 3.2, 4.2, 1.0 }); // -298.7465244473417

double likelihood = Math.Exp(logLikelihood);
a1 = Math.Exp(a1); // 0.879
a2 = Math.Exp(a2); // 1.011
a3 = Math.Exp(a3); // 0.000

// We can also ask the model to decode one of the sequences. After
// this step the resulting sequence will be: { 0, 1, 0, 1, 0, 1 }
// 
int[] states = model.Decide(new[] { 0.1, 5.2, 0.3, 6.7, 0.1, 6.0 });

In the following example, we will create a discrete hidden Markov model using a generic discrete probability distribution, reproducing the same code example given in the documentation.

// Continuous Markov Models can operate using any
// probability distribution, including discrete ones. 

// In the following example, we will try to create a
// Continuous Hidden Markov Model using a discrete
// distribution to detect if a given sequence starts
// with a zero and has any number of ones after that.

int[][] sequences = new double[][]
{
    new double[] { 0,1,1,1,1,0,1,1,1,1 },
    new double[] { 0,1,1,1,0,1,1,1,1,1 },
    new double[] { 0,1,1,1,1,1,1,1,1,1 },
    new double[] { 0,1,1,1,1,1         },
    new double[] { 0,1,1,1,1,1,1       },
    new double[] { 0,1,1,1,1,1,1,1,1,1 },
    new double[] { 0,1,1,1,1,1,1,1,1,1 },
}.ToInt32();

// Create a new Hidden Markov Model with 3 states and
//  a generic discrete distribution with two symbols
var hmm = HiddenMarkovModel.CreateDiscrete(3, 2);

// Try to fit the model to the data until the difference in
//  the average log-likelihood changes only by as little as 0.0001
var teacher = new BaumWelchLearning<GeneralDiscreteDistribution, int>(hmm)
{
    Tolerance = 0.0001,
    MaxIterations = 0 // no upper limit; rely on Tolerance to detect convergence
};

// Learn the model
teacher.Learn(sequences);

double ll = Math.Exp(teacher.LogLikelihood);

// Calculate the probability that the given
//  sequences originated from the model
double l1 = hmm.Probability(new int[] { 0, 1 });       // 0.999
double l2 = hmm.Probability(new int[] { 0, 1, 1, 1 }); // 0.916

// Sequences which do not start with zero have a much lower probability.
double l3 = hmm.Probability(new int[] { 1, 1 });       // 0.000
double l4 = hmm.Probability(new int[] { 1, 0, 0, 0 }); // 0.000

// Sequences which contain a few errors still have a higher probability
//  than the ones which do not start with zero. This shows some
//  of the temporal elasticity and error tolerance of the HMMs.
double l5 = hmm.Probability(new int[] { 0, 1, 0, 1, 1, 1, 1, 1, 1 }); // 0.034
double l6 = hmm.Probability(new int[] { 0, 1, 1, 1, 1, 1, 1, 0, 1 }); // 0.034

The next example shows how to create a multivariate model using a multivariate normal distribution. In this example, sequences contain vector-valued observations, such as in the case of (x,y) pairs.

// Create sequences of vector-valued observations. In the
// sequence below, a single observation is composed of two
// coordinate values, such as (x, y). There seem to be two
// states, one for (x,y) values less than (5,5) and another
// for higher values. The states seem to be switched on
// every observation.
double[][][] sequences =
{
    new double[][] // sequence 1
    {
        new double[] { 1, 2 }, // observation 1 of sequence 1
        new double[] { 6, 7 }, // observation 2 of sequence 1
        new double[] { 2, 3 }, // observation 3 of sequence 1
    },
    new double[][] // sequence 2
    {
        new double[] { 2, 2 }, // observation 1 of sequence 2
        new double[] { 9, 8 }, // observation 2 of sequence 2
        new double[] { 1, 0 }, // observation 3 of sequence 2
    },
    new double[][] // sequence 3
    {
        new double[] { 1, 3 }, // observation 1 of sequence 3
        new double[] { 8, 9 }, // observation 2 of sequence 3
        new double[] { 3, 3 }, // observation 3 of sequence 3
    },
};


// Specify an initial normal distribution for the samples.
var density = new MultivariateNormalDistribution(dimension: 2);

// Creates a continuous hidden Markov Model with two states organized in a forward
//  topology and an underlying multivariate Normal distribution as probability density.
var model = new HiddenMarkovModel<MultivariateNormalDistribution, double[]>(new Forward(2), density);

// Configure the learning algorithm to train the model until the
// difference in the average log-likelihood changes only by as little as 0.0001
var teacher = new BaumWelchLearning<MultivariateNormalDistribution, double[]>(model)
{
    Tolerance = 0.0001,
    MaxIterations = 0, // no upper limit; rely on Tolerance to detect convergence
};

// Fit the model
teacher.Learn(sequences);

double logLikelihood = teacher.LogLikelihood;

// See the likelihood of the sequences learned
double a1 = Math.Exp(model.LogLikelihood(new[] {
    new double[] { 1, 2 },
    new double[] { 6, 7 },
    new double[] { 2, 3 }})); // 0.000208

double a2 = Math.Exp(model.LogLikelihood(new[] {
    new double[] { 2, 2 },
    new double[] { 9, 8  },
    new double[] { 1, 0 }})); // 0.0000376

// See the likelihood of an unrelated sequence
double a3 = Math.Exp(model.LogLikelihood(new[] {
    new double[] { 8, 7 },
    new double[] { 9, 8  },
    new double[] { 1, 0 }})); // 2.10 x 10^(-89)

The following example shows how to create a hidden Markov model that considers each feature in the feature vector to be independent of the others. This is the same as applying the naive Bayes assumption of independence to each feature.

// Let's say we have 2 meteorological sensors gathering data
// from different time periods of the day. Those periods are
// represented below:

double[][][] data =
{
    new double[][] // first sequence (we just repeated the measurements 
    {              //  once, so there is only one observation sequence)

        new double[] { 1, 2 }, // Day 1, 15:00
        new double[] { 6, 7 }, // Day 1, 16:00
        new double[] { 2, 3 }, // Day 1, 17:00
        new double[] { 2, 2 }, // Day 1, 18:00
        new double[] { 9, 8 }, // Day 1, 19:00
        new double[] { 1, 0 }, // Day 1, 20:00
        new double[] { 1, 3 }, // Day 1, 21:00
        new double[] { 8, 9 }, // Day 1, 22:00
        new double[] { 3, 3 }, // Day 1, 23:00
        new double[] { 1, 3 }, // Day 2, 00:00
        new double[] { 1, 1 }, // Day 2, 01:00
    }
};

// Let's assume those sensors are unrelated (for simplicity). As
// such, let's assume the data gathered from the sensors can be
// grouped into circular clusters, one for each state the underlying
// system might be in.
NormalDistribution[] initial_components =
{
    new NormalDistribution(), // initial value for the first variable's distribution
    new NormalDistribution()  // initial value for the second variable's distribution
};

// Specify an initial independent normal distribution for the samples.
var density = new Independent<NormalDistribution, double>(initial_components);

// Creates a continuous hidden Markov Model with two states organized in an Ergodic
//  topology and an underlying independent Normal distribution as probability density.
var model = new HiddenMarkovModel<Independent<NormalDistribution>, double[]>(new Ergodic(2), density);

// Configure the learning algorithm to train the model until the
// difference in the average log-likelihood changes only by as little as 0.0001
var teacher = new BaumWelchLearning<Independent<NormalDistribution>, double[]>(model)
{
    Tolerance = 0.0001,
    MaxIterations = 0, // no upper limit; rely on Tolerance to detect convergence
};

// Fit the model
teacher.Learn(data);

double logLikelihoodAtConvergence = teacher.LogLikelihood;

// Get the hidden state associated with each observation
int[] hiddenStates = null; // will receive the most likely (Viterbi) state path
double logLikelihood = model.LogLikelihood(data[0], ref hiddenStates); // log-likelihood of that path

Finally, the next example shows how to fit a mixture-density hidden Markov model.

// Suppose we have a set of six sequences and we would like to
// fit a hidden Markov model with mixtures of Normal distributions
// as the emission densities. 

// First, let's consider a set of univariate sequences:
double[][] sequences =
{
    new double[] { 1, 1, 2, 2, 2, 3, 3, 3 },
    new double[] { 1, 2, 2, 2, 3, 3 },
    new double[] { 1, 2, 2, 3, 3, 5 },
    new double[] { 2, 2, 2, 2, 3, 3, 3, 4, 5, 5, 1 },
    new double[] { 1, 1, 1, 2, 2, 5 },
    new double[] { 1, 2, 2, 4, 4, 5 },
};


// Now we can begin specifying an initial Gaussian mixture distribution. It is
// better to add some different initial parameters to the mixture components:
var density = new Mixture<NormalDistribution>(
    new NormalDistribution(mean: 2, stdDev: 1.0), // 1st component in the mixture
    new NormalDistribution(mean: 0, stdDev: 0.6), // 2nd component in the mixture
    new NormalDistribution(mean: 4, stdDev: 0.4), // 3rd component in the mixture
    new NormalDistribution(mean: 6, stdDev: 1.1)  // 4th component in the mixture
);

// Let's then create a continuous hidden Markov Model with two states organized in a forward
//  topology with the underlying univariate Normal mixture distribution as probability density.
var model = new HiddenMarkovModel<Mixture<NormalDistribution>, double>(new Forward(2), density);

// Now we should configure the learning algorithm to train the model. We will
// learn until the difference in the average log-likelihood changes only by as little as 0.0001
var teacher = new BaumWelchLearning<Mixture<NormalDistribution>, double>(model)
{
    Tolerance = 0.0001,
    MaxIterations = 0, // no upper limit; rely on Tolerance to detect convergence

    // Note, however, that since this example is extremely simple and we only have a few
    // data points, a full-blown mixture isn't really needed. Thus there is a good chance
    // that the mixture would quickly become degenerate. We can avoid this by specifying
    // some regularization constants in the Normal distribution fitting:

    FittingOptions = new MixtureOptions()
    {
        Iterations = 1, // limit the inner e-m to a single iteration

        InnerOptions = new NormalOptions()
        {
            Regularization = 1e-5 // specify a regularization constant
        }
    }
};

// Finally, we can fit the model
teacher.Learn(sequences);

double logLikelihood = teacher.LogLikelihood;

// And now check the likelihood of some sequences similar to the training data.
double a1 = Math.Exp(model.LogLikelihood(new double[] { 1, 1, 2, 2, 3 })); // 2.3413833128741038E+45
double a2 = Math.Exp(model.LogLikelihood(new double[] { 1, 1, 2, 5, 5 })); // 9.94607618459872E+19

// We can see that the likelihood of an unrelated sequence is much smaller:
double a3 = Math.Exp(model.LogLikelihood(new double[] { 8, 2, 6, 4, 1 })); // 1.5063654166181737E-44

When using Normal distributions, it is common to encounter problems that are numerically difficult to solve. These may involve constant variables or other numerical issues preventing the proper estimation of a Normal distribution from the data.

A sign of those difficulties is when the learning algorithm throws the exception "Variance is zero. Try specifying a regularization constant in the fitting options" for univariate distributions (e.g. NormalDistribution), or a NonPositiveDefiniteMatrixException stating that the "Covariance matrix is not positive definite. Try specifying a regularization constant in the fitting options" for multivariate distributions such as the MultivariateNormalDistribution. In both cases, this is an indication that the variables being learned cannot be suitably modeled by Normal distributions without further care. To avoid numerical difficulties when estimating those probabilities, a small regularization constant can be added to the variances or to the covariance matrices until they become greater than zero or positive definite.

To specify a regularization constant as given in the above message, we can indicate a fitting options object for the model distribution using:

// Suppose we have a set of six sequences and we would like to
// fit a hidden Markov model with mixtures of Normal distributions
// as the emission densities. 

// First, let's consider a set of univariate sequences:
double[][] sequences =
{
    new double[] { -0.223, -1.05, -0.574, 0.965, -0.448, 0.265, 0.087, 0.362, 0.717, -0.032 },
    new double[] { -1.05, -0.574, 0.965, -0.448, 0.265, 0.087, 0.362, 0.717, -0.032, -0.346 },
    new double[] { -0.574, 0.965, -0.448, 0.265, 0.087, 0.362, 0.717, -0.032, -0.346, -0.989 },
    new double[] { 0.965, -0.448, 0.265, 0.087, 0.362, 0.717, -0.032, -0.346, -0.989, -0.619 },
    new double[] { -0.448, 0.265, 0.087, 0.362, 0.717, -0.032, -0.346, -0.989, -0.619, 0.02 },
    new double[] { 0.265, 0.087, 0.362, 0.717, -0.032, -0.346, -0.989, -0.619, 0.02, -0.297 },
};


// Now we can begin specifying an initial Gaussian mixture distribution. It is
// better to add some different initial parameters to the mixture components:
var density = new Mixture<NormalDistribution>(
    new NormalDistribution(mean: 2, stdDev: 1.0), // 1st component in the mixture
    new NormalDistribution(mean: 0, stdDev: 0.6), // 2nd component in the mixture
    new NormalDistribution(mean: 4, stdDev: 0.4), // 3rd component in the mixture
    new NormalDistribution(mean: 6, stdDev: 1.1)  // 4th component in the mixture
);

// Let's then create a continuous hidden Markov Model with two states organized in a forward
//  topology with the underlying univariate Normal mixture distribution as probability density.
var model = new HiddenMarkovModel<Mixture<NormalDistribution>, double>(new Forward(2), density);

// Now we should configure the learning algorithm to train the model. We will
// learn until the difference in the average log-likelihood changes only by as little as 0.0001
var teacher = new BaumWelchLearning<Mixture<NormalDistribution>, double, MixtureOptions>(model)
{
    Tolerance = 0.0001,
    MaxIterations = 0, // no upper limit; rely on Tolerance to detect convergence

    // Note, however, that since this example is extremely simple and we only have a few
    // data points, a full-blown mixture isn't really needed. Thus there is a good chance
    // that the mixture would quickly become degenerate. We can avoid this by specifying
    // some regularization constants in the Normal distribution fitting:

    FittingOptions = new MixtureOptions()
    {
        Iterations = 1, // limit the inner e-m to a single iteration

        InnerOptions = new NormalOptions()
        {
            Regularization = 1e-5 // specify a regularization constant
        }
    }
};

// Finally, we can fit the model
teacher.Learn(sequences);

double logLikelihood = teacher.LogLikelihood;

// And now check the likelihood of one of the training sequences.
double[] newSequence = { -0.223, -1.05, -0.574, 0.965, -0.448, 0.265, 0.087, 0.362, 0.717, -0.032 };
double a1 = Math.Exp(model.LogLikelihood(newSequence)); // 11729312967893.566

int[] path = model.Decide(newSequence);

// We can see that the likelihood of an unrelated sequence is much smaller:
double a3 = Math.Exp(model.LogLikelihood(new double[] { 8, 2, 6, 4, 1 })); // 0.0

Typically, any small value will suffice as a regularization constant, though smaller values may lead to longer fitting times. Excessively large values, on the other hand, reduce accuracy.
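For the plain univariate case mentioned earlier (the "Variance is zero" message), a minimal sketch could look like the following. It assumes the three-type-parameter BaumWelchLearning<TDistribution, TObservation, TOptions> form used in the previous example, with a NormalOptions object passed directly as the fitting options; it is an illustration, not a verbatim sample from the library.

// Minimal sketch (see assumptions above): regularizing a plain univariate Normal HMM
// so that states whose observations happen to have zero variance can still be fitted.
double[][] sequences =
{
    new double[] { 1, 1, 1, 5, 5, 5 }, // note the constant stretches of values
    new double[] { 1, 1, 5, 5, 5, 5 },
};

var model = new HiddenMarkovModel<NormalDistribution, double>(new Forward(2), new NormalDistribution());

var teacher = new BaumWelchLearning<NormalDistribution, double, NormalOptions>(model)
{
    Tolerance = 0.0001,
    MaxIterations = 0, // no upper limit; rely on Tolerance to detect convergence

    FittingOptions = new NormalOptions()
    {
        Regularization = 1e-6 // small constant added to the variance estimates
    }
};

teacher.Learn(sequences);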

See Also