BagOfAudioWords Class

System.Object
  Accord.MachineLearning.ParallelLearningBase
    Accord.MachineLearning.BaseBagOfWords<BagOfAudioWords, MelFrequencyCepstrumCoefficientDescriptor, Double[], IUnsupervisedLearning<IClassifier<Double[], Int32>, Double[], Int32>, MelFrequencyCepstrumCoefficient, Signal>
      Accord.Audition.BaseBagOfAudioWords<BagOfAudioWords, MelFrequencyCepstrumCoefficientDescriptor, Double[], IUnsupervisedLearning<IClassifier<Double[], Int32>, Double[], Int32>, MelFrequencyCepstrumCoefficient>
        Accord.Audition.BagOfAudioWords

[SerializableAttribute]
public class BagOfAudioWords : BaseBagOfAudioWords<BagOfAudioWords, MelFrequencyCepstrumCoefficientDescriptor, double[], IUnsupervisedLearning<IClassifier<double[], int>, double[], int>, MelFrequencyCepstrumCoefficient>

<SerializableAttribute>
Public Class BagOfAudioWords
	Inherits BaseBagOfAudioWords(Of BagOfAudioWords, MelFrequencyCepstrumCoefficientDescriptor, Double(), IUnsupervisedLearning(Of IClassifier(Of Double(), Integer), Double(), Integer), MelFrequencyCepstrumCoefficient)

Constructors

	Name	Description
	BagOfAudioWords(Int32)	Constructs a new BagOfAudioWords with the given number of words, using an MFCC feature detector to identify features.
	BagOfAudioWords(IUnsupervisedLearning<IClassifier<Double[], Int32>, Double[], Int32>)	Constructs a new BagOfAudioWords with the given clustering algorithm, using an MFCC feature detector to identify features.

Properties

	Name	Description
	Clustering	Gets the clustering algorithm used to create this model. (Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
	Detector	Gets the feature extractor used to identify features in the input data. (Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
	MaxDescriptorsPerInstance	Gets or sets the maximum number of descriptors per instance (here, per audio signal) that should be used to learn the codebook. Default is 0 (meaning to use all descriptors). (Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
	NumberOfDescriptors	Gets or sets the maximum total number of descriptors that should be used to learn the codebook. Default is 0 (meaning to use all descriptors). (Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
	NumberOfInputs	Gets the number of inputs accepted by the model. (Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
	NumberOfOutputs	Gets the number of outputs generated by the model. (Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
	NumberOfWords	Gets the number of words in this codebook. (Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
	ParallelOptions	Gets or sets the parallelization options for this algorithm. (Inherited from ParallelLearningBase.)
	Statistics	Gets statistics about the last codebook learned. (Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
	Token	Gets or sets a cancellation token that can be used to cancel the algorithm while it is running. (Inherited from ParallelLearningBase.)

Methods

	Name	Description
	Create(Int32)	Creates a Bag-of-Words model using MFCC and K-Means.
	Create<TClustering>(TClustering)	Creates a Bag-of-Words model using the MFCC feature extractor and the given clustering algorithm.
	Create<TExtractor>(TExtractor, Int32)	Creates a Bag-of-Words model using the given feature detector and K-Means.
	Create<TExtractor, TClustering>(TExtractor, Int32)	Creates a Bag-of-Words model using the given feature detector and K-Means.
	Create<TExtractor, TClustering>(TExtractor, TClustering)	Creates a Bag-of-Words model using the given feature detector and clustering algorithm.
	Create<TExtractor, TClustering, TFeature>(TExtractor, TClustering)	Creates a Bag-of-Words model using the given feature detector and clustering algorithm.
	Create<TExtractor, TClustering, TPoint, TFeature>(TExtractor, TClustering)	Creates a Bag-of-Words model using the given feature detector and clustering algorithm.
	Equals	Determines whether the specified object is equal to the current object. (Inherited from Object.)
	Finalize	Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection. (Inherited from Object.)
	For	Executes a parallel for using the feature detector in a thread-safe way. (Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
	GetHashCode	Serves as the default hash function. (Inherited from Object.)
	GetType	Gets the Type of the current instance. (Inherited from Object.)
	Init	Initializes this instance. (Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
	InnerLearn<T>	Generic learn method implementation that should work for any input type. This method is useful for re-using code between methods that accept Bitmap, BitmapData, UnmanagedImage, filenames as strings, etc. (Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
	Learn(String[], Double[])	Learns a model that can map the given inputs to the desired outputs. (Inherited from BaseBagOfAudioWords<TModel, TFeature, TPoint, TClustering, TExtractor>.)
	Learn(TFeature[], Double[])	Learns a model that can map the given inputs to the desired outputs. (Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
	Learn(TInput[], Double[])	Learns a model that can map the given inputs to the desired outputs. (Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
	MemberwiseClone	Creates a shallow copy of the current Object. (Inherited from Object.)
	ToString	Returns a string that represents the current object. (Inherited from Object.)
	Transform(String)	Applies the transformation to an input, producing an associated output. (Inherited from BaseBagOfAudioWords<TModel, TFeature, TPoint, TClustering, TExtractor>.)
	Transform(String[])	Applies the transformation to a set of input vectors, producing an associated set of output vectors. (Inherited from BaseBagOfAudioWords<TModel, TFeature, TPoint, TClustering, TExtractor>.)
	Transform(List<TPoint>)	Applies the transformation to an input, producing an associated output. (Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
	Transform(TInput)	Applies the transformation to an input, producing an associated output. (Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
	Transform(TInput[])	Applies the transformation to a set of input vectors, producing an associated set of output vectors. (Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
	Transform(String, Double[])	Applies the transformation to a set of input vectors, producing an associated set of output vectors. (Inherited from BaseBagOfAudioWords<TModel, TFeature, TPoint, TClustering, TExtractor>.)
	Transform(String, Int32[])	Applies the transformation to a set of input vectors, producing an associated set of output vectors. (Inherited from BaseBagOfAudioWords<TModel, TFeature, TPoint, TClustering, TExtractor>.)
	Transform(String[], Double[][])	Applies the transformation to a set of input vectors, producing an associated set of output vectors. (Inherited from BaseBagOfAudioWords<TModel, TFeature, TPoint, TClustering, TExtractor>.)
	Transform(String[], Int32[][])	Applies the transformation to a set of input vectors, producing an associated set of output vectors. (Inherited from BaseBagOfAudioWords<TModel, TFeature, TPoint, TClustering, TExtractor>.)
	Transform(IEnumerable<TPoint>, Double[])	Applies the transformation to a set of input vectors, producing an associated set of output vectors. (Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
	Transform(IEnumerable<TPoint>, Int32[])	Applies the transformation to a set of input vectors, producing an associated set of output vectors. (Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
	Transform(TInput, Double[])	Applies the transformation to a set of input vectors, producing an associated set of output vectors. (Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
	Transform(TInput, Int32[])	Applies the transformation to a set of input vectors, producing an associated set of output vectors. (Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
	Transform(TInput[], Double[][])	Applies the transformation to a set of input vectors, producing an associated set of output vectors. (Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
	Transform(TInput[], Int32[][])	Applies the transformation to a set of input vectors, producing an associated set of output vectors. (Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)

Extension Methods

	Name	Description
	HasMethod	Checks whether an object implements a method with the given name. (Defined by ExtensionMethods.)
	IsEqual	Compares two objects for equality, performing an elementwise comparison if the elements are vectors or matrices. (Defined by Matrix.)
	To(Type)	Overloaded. Converts an object into another type, irrespective of whether the conversion can be done at compile time or not. This can be used to convert generic types to numeric types during runtime. (Defined by ExtensionMethods.)
	To<T>	Overloaded. Converts an object into another type, irrespective of whether the conversion can be done at compile time or not. This can be used to convert generic types to numeric types during runtime. (Defined by ExtensionMethods.)

Remarks

The bag-of-words (BoW) model can be used to transform data of varying length (e.g. words in a text, pixels in an image) into finite-dimensional vectors of fixed length. Those vectors are usually referred to as representations, as they can be used in place of the original data as if they were the data itself. For example, using Bag-of-Words it becomes possible to transform a set of N images with varying sizes and dimensions into an N x C matrix, where C is the number of "visual words" being used to represent each of the N images in the set.

Those rows can then be used in classification, clustering, and any other machine learning tasks where a finite vector representation would be required.
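The mapping from a variable number of descriptors to a fixed-length row can be illustrated without the framework. The following is a minimal sketch, not the library's implementation: the class name BagOfWordsSketch, the Quantize helper, and the toy 2-D codebook are hypothetical, and it assumes each descriptor simply votes for its nearest codeword and that raw counts are kept without normalization.

```csharp
using System;
using System.Collections.Generic;

public static class BagOfWordsSketch
{
    // Squared Euclidean distance between two vectors of equal length.
    static double SquaredDistance(double[] a, double[] b)
    {
        double sum = 0;
        for (int i = 0; i < a.Length; i++)
        {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Maps a variable-length list of descriptors to a histogram with one
    // bin per codeword: each descriptor votes for its nearest codeword.
    // (Assumption: raw counts, no normalization.)
    public static double[] Quantize(IList<double[]> descriptors, double[][] codebook)
    {
        var histogram = new double[codebook.Length];
        foreach (double[] descriptor in descriptors)
        {
            int nearest = 0;
            double best = double.PositiveInfinity;
            for (int k = 0; k < codebook.Length; k++)
            {
                double dist = SquaredDistance(descriptor, codebook[k]);
                if (dist < best) { best = dist; nearest = k; }
            }
            histogram[nearest]++;
        }
        return histogram;
    }

    public static void Main()
    {
        // Hypothetical 2-D codebook with C = 3 words.
        double[][] codebook =
        {
            new[] { 0.0, 0.0 },
            new[] { 10.0, 0.0 },
            new[] { 0.0, 10.0 },
        };

        // Two inputs with different numbers of descriptors.
        var shortInput = new List<double[]> { new[] { 1.0, 1.0 }, new[] { 9.0, 1.0 } };
        var longInput = new List<double[]>
        {
            new[] { 0.5, 9.0 }, new[] { 1.0, 8.0 }, new[] { 9.5, 0.5 },
            new[] { 0.2, 0.1 }, new[] { 0.0, 11.0 },
        };

        // Both rows have length 3 regardless of the input length.
        Console.WriteLine(string.Join(", ", Quantize(shortInput, codebook))); // 1, 1, 0
        Console.WriteLine(string.Join(", ", Quantize(longInput, codebook)));  // 1, 1, 3
    }
}
```

Stacking one such row per input gives the N x C matrix described above.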

The framework can compute BoW representations for audio signals using any choice of feature extractor and clustering algorithm. By default, the framework uses the MFCC feature extractor and the K-Means clustering algorithm.
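To make the role of the clustering algorithm concrete, the sketch below learns a tiny codebook with plain Lloyd-style K-Means iterations over descriptors pooled from all training inputs. It is only an illustration under stated assumptions: the CodebookSketch class and its Learn method are hypothetical, centroids are seeded with the first descriptors rather than by whatever initialization the framework uses, and a fixed iteration count replaces a convergence test.

```csharp
using System;
using System.Linq;

public static class CodebookSketch
{
    // Index of the centroid closest (squared Euclidean) to the point.
    static int Nearest(double[] point, double[][] centroids)
    {
        int best = 0;
        double bestDist = double.PositiveInfinity;
        for (int k = 0; k < centroids.Length; k++)
        {
            double dist = 0;
            for (int j = 0; j < point.Length; j++)
            {
                double diff = point[j] - centroids[k][j];
                dist += diff * diff;
            }
            if (dist < bestDist) { bestDist = dist; best = k; }
        }
        return best;
    }

    // Lloyd iterations: assign each descriptor to its nearest centroid,
    // then move each centroid to the mean of its assigned descriptors.
    public static double[][] Learn(double[][] descriptors, int words, int iterations)
    {
        // Assumption: seed the centroids with the first `words` descriptors.
        double[][] centroids = descriptors.Take(words)
            .Select(seed => (double[])seed.Clone()).ToArray();
        int dim = descriptors[0].Length;

        for (int it = 0; it < iterations; it++)
        {
            var sums = new double[words][];
            for (int k = 0; k < words; k++) sums[k] = new double[dim];
            var counts = new int[words];

            foreach (double[] descriptor in descriptors)
            {
                int nearest = Nearest(descriptor, centroids);
                counts[nearest]++;
                for (int j = 0; j < dim; j++) sums[nearest][j] += descriptor[j];
            }

            for (int k = 0; k < words; k++)
            {
                if (counts[k] == 0) continue; // leave an empty cluster's centroid as-is
                for (int j = 0; j < dim; j++)
                    centroids[k][j] = sums[k][j] / counts[k];
            }
        }
        return centroids;
    }

    public static void Main()
    {
        // Hypothetical 1-D descriptors pooled from all training inputs.
        double[][] pooled =
        {
            new[] { 0.0 }, new[] { 10.0 }, new[] { 1.0 },
            new[] { 11.0 }, new[] { 2.0 }, new[] { 12.0 },
        };
        double[][] codebook = Learn(pooled, words: 2, iterations: 10);
        Console.WriteLine(string.Join(", ", codebook.Select(c => c[0]))); // 1, 11
    }
}
```

Each learned centroid plays the role of one "word"; the number of centroids is the length of every feature vector the model later produces.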

The first example shows how to create and use a BoW with default parameters.

// Ensure results are reproducible
Accord.Math.Random.Generator.Seed = 0;

// The Bag-of-Audio-Words model converts audio signals of arbitrary 
// size into fixed-length feature vectors. In this example, we
// will be setting the codebook size to 32. This means all feature
// vectors that will be generated will have the same length of 32.

// By default, the BoW object will use the MFCC extractor as the 
// feature extractor and K-means as the clustering algorithm.

// Create a new Bag-of-Audio-Words (BoW) model
var bow = BagOfAudioWords.Create(numberOfWords: 32);
// Note: a simple BoW model can also be created using
// var bow = new BagOfAudioWords(numberOfWords: 32);

// Get some training audio files
FreeSpokenDigitsDataset fsdd = new FreeSpokenDigitsDataset(basePath);
string[] trainFileNames = fsdd.Training.LocalPaths;
int[] trainOutputs = fsdd.Training.Digits;

// Compute the model
bow.Learn(trainFileNames);

// After this point, we will be able to translate
// the signals into double[] feature vectors using
double[][] trainInputs = bow.Transform(trainFileNames);

// We can also check some statistics about the dataset:
int numberOfSignals = bow.Statistics.TotalNumberOfInstances; // 1350

// Statistics about all the descriptors that have been extracted:
int totalDescriptors = bow.Statistics.TotalNumberOfDescriptors; // 29106
double totalMean = bow.Statistics.TotalNumberOfDescriptorsPerInstance.Mean; // 21.56
double totalVar = bow.Statistics.TotalNumberOfDescriptorsPerInstance.Variance; // 52.764002965159314
IntRange totalRange = bow.Statistics.TotalNumberOfDescriptorsPerInstanceRange; // [8, 115]

// Statistics only about the descriptors that have been actually used:
int takenDescriptors = bow.Statistics.NumberOfDescriptorsTaken; // 29106
double takenMean = bow.Statistics.NumberOfDescriptorsTakenPerInstance.Mean; // 21.56
double takenVar = bow.Statistics.NumberOfDescriptorsTakenPerInstance.Variance; // 52.764002965159314
IntRange takenRange = bow.Statistics.NumberOfDescriptorsTakenPerInstanceRange; // [8, 115]

After the representations have been extracted, it is possible to use them in arbitrary machine learning tasks, such as classification:

// Now, the features can be used to train any classification
// algorithm as if they were the signals themselves. For example,
// we can use them to train a Chi-Square SVM as shown below:

// Create the SMO algorithm to learn a Chi-Square kernel SVM
var teacher = new MulticlassSupportVectorLearning<ChiSquare>()
{
    Learner = (p) => new SequentialMinimalOptimization<ChiSquare>()
};

// Obtain a learned machine
var svm = teacher.Learn(trainInputs, trainOutputs);

// Use the machine to classify the features
int[] output = svm.Decide(trainInputs);

// Compare the expected and predicted labels for the training set and report the accuracy:
var trainMetrics = GeneralConfusionMatrix.Estimate(svm, trainInputs, trainOutputs);
double trainAcc = trainMetrics.Accuracy; // should be around 0.97259259259259256

// Now, we can evaluate the performance of the model on the testing set:
string[] testFileNames = fsdd.Testing.LocalPaths;
int[] testOutputs = fsdd.Testing.Digits;

// First we transform the testing set to double[]:
double[][] testInputs = bow.Transform(testFileNames);

// Then we compare expected and predicted labels for the testing set and report the accuracy:
var testMetrics = GeneralConfusionMatrix.Estimate(svm, testInputs, testOutputs);
double testAcc = testMetrics.Accuracy; // should be around 0.8666666666666667
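The accuracy figures quoted above are the fraction of samples whose predicted label equals the expected one. As a rough, framework-independent sketch of that computation (the AccuracySketch class, its method names, and the toy labels are hypothetical and not part of the library), a confusion matrix can be tallied and its diagonal summed:

```csharp
using System;

public static class AccuracySketch
{
    // Builds a confusion matrix where rows are expected classes
    // and columns are predicted classes.
    public static int[,] ConfusionMatrix(int[] expected, int[] predicted, int classes)
    {
        if (expected.Length != predicted.Length)
            throw new ArgumentException("Label arrays must have the same length.");

        var matrix = new int[classes, classes];
        for (int i = 0; i < expected.Length; i++)
            matrix[expected[i], predicted[i]]++;
        return matrix;
    }

    // Accuracy is the sum of the diagonal divided by the number of samples.
    public static double Accuracy(int[,] matrix)
    {
        int correct = 0, total = 0;
        for (int r = 0; r < matrix.GetLength(0); r++)
        {
            for (int c = 0; c < matrix.GetLength(1); c++)
            {
                total += matrix[r, c];
                if (r == c) correct += matrix[r, c];
            }
        }
        return total == 0 ? 0.0 : (double)correct / total;
    }

    public static void Main()
    {
        // Toy labels: 6 of the 8 predictions match the expected class.
        int[] expected = { 0, 1, 2, 2, 1, 0, 2, 1 };
        int[] predicted = { 0, 1, 2, 1, 1, 0, 2, 0 };

        int[,] matrix = ConfusionMatrix(expected, predicted, classes: 3);
        Console.WriteLine(Accuracy(matrix));
    }
}
```

A large gap between training and testing accuracy, as in the example above, usually indicates some overfitting to the training signals.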

Reference

Accord.Audition Namespace

Accord.Audition.BagOfAudioWords<TFeature, TPoint>

Accord.Audition.BagOfAudioWords<TFeature, TPoint, TClustering, TExtractor>