
BagOfVisualWords Class

Bag of Visual Words
Inheritance Hierarchy
System.Object
  Accord.MachineLearning.ParallelLearningBase
    Accord.MachineLearning.BaseBagOfWords<BagOfVisualWords, SpeededUpRobustFeaturePoint, Double[], IUnsupervisedLearning<IClassifier<Double[], Int32>, Double[], Int32>, SpeededUpRobustFeaturesDetector, UnmanagedImage>
      Accord.Imaging.BaseBagOfVisualWords<BagOfVisualWords, SpeededUpRobustFeaturePoint, Double[], IUnsupervisedLearning<IClassifier<Double[], Int32>, Double[], Int32>, SpeededUpRobustFeaturesDetector>
        Accord.Imaging.BagOfVisualWords

Namespace:  Accord.Imaging
Assembly:  Accord.Vision (in Accord.Vision.dll) Version: 3.8.0
Syntax
[SerializableAttribute]
public class BagOfVisualWords : BaseBagOfVisualWords<BagOfVisualWords, SpeededUpRobustFeaturePoint, double[], IUnsupervisedLearning<IClassifier<double[], int>, double[], int>, SpeededUpRobustFeaturesDetector>

The BagOfVisualWords type exposes the following members.

Constructors
Properties
Name  Description
Public property Clustering
Gets the clustering algorithm used to create this model.
(Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
Public property Detector
Gets the feature extractor used to identify features in the input data.
(Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
Public property MaxDescriptorsPerInstance
Gets or sets the maximum number of descriptors per image that should be used to learn the codebook. Default is 0 (meaning to use all descriptors).
(Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
Public property NumberOfDescriptors
Gets or sets the maximum number of descriptors that should be used to learn the codebook. Default is 0 (meaning to use all descriptors).
(Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
Public property NumberOfInputs
Gets the number of inputs accepted by the model.
(Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
Public property NumberOfOutputs
Gets the number of outputs generated by the model.
(Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
Public property NumberOfWords
Gets the number of words in this codebook.
(Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
Public property ParallelOptions
Gets or sets the parallelization options for this algorithm.
(Inherited from ParallelLearningBase.)
Public property Statistics
Gets statistics about the last codebook learned.
(Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
Public property Token
Gets or sets a cancellation token that can be used to cancel the algorithm while it is running.
(Inherited from ParallelLearningBase.)
Methods
Name  Description
Public method Compute(Bitmap) Obsolete.
Computes the Bag of Words model.
(Inherited from BaseBagOfVisualWords<TModel, TFeature, TPoint, TClustering, TExtractor>.)
Public method Compute(TPoint) Obsolete.
Computes the Bag of Words model.
(Inherited from BaseBagOfVisualWords<TModel, TFeature, TPoint, TClustering, TExtractor>.)
Public method Compute(Bitmap, Double) Obsolete.
Computes the Bag of Words model.
(Inherited from BaseBagOfVisualWords<TModel, TFeature, TPoint, TClustering, TExtractor>.)
Public static method Create(Int32)
Creates a Bag-of-Words model using SpeededUpRobustFeaturesDetector and KMeans.
Public static method Create<TClustering>(TClustering)
Creates a Bag-of-Words model using the SURF feature detector and the given clustering algorithm.
Public static method Create<TExtractor>(TExtractor, Int32)
Creates a Bag-of-Words model using the given feature detector and KMeans.
Public static method Create<TExtractor, TClustering>(TExtractor, TClustering)
Creates a Bag-of-Words model using the given feature detector and clustering algorithm.
Public static method Create<TExtractor, TClustering>(IImageFeatureExtractor<FeatureDescriptor>, TClustering)
Creates a Bag-of-Words model using the given feature detector and clustering algorithm.
Public static method Create<TExtractor, TClustering, TFeature>(TExtractor, TClustering)
Creates a Bag-of-Words model using the given feature detector and clustering algorithm.
Public static method Create<TExtractor, TClustering, TPoint, TFeature>(TExtractor, TClustering)
Creates a Bag-of-Words model using the given feature detector and clustering algorithm.
Public method Equals
Determines whether the specified object is equal to the current object.
(Inherited from Object.)
Protected method Finalize
Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection.
(Inherited from Object.)
Protected method For
Executes a parallel for using the feature detector in a thread-safe way.
(Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
Public method GetFeatureVector(List<TFeature>) Obsolete.
Gets the codeword representation of a given image.
(Inherited from BaseBagOfVisualWords<TModel, TFeature, TPoint, TClustering, TExtractor>.)
Public method GetFeatureVector(Bitmap) Obsolete.
Gets the codeword representation of a given image.
(Inherited from BaseBagOfVisualWords<TModel, TFeature, TPoint, TClustering, TExtractor>.)
Public method GetFeatureVector(String) Obsolete.
Gets the codeword representation of a given image.
(Inherited from BaseBagOfVisualWords<TModel, TFeature, TPoint, TClustering, TExtractor>.)
Public method GetFeatureVector(UnmanagedImage) Obsolete.
Gets the codeword representation of a given image.
(Inherited from BaseBagOfVisualWords<TModel, TFeature, TPoint, TClustering, TExtractor>.)
Public method GetHashCode
Serves as the default hash function.
(Inherited from Object.)
Public method GetType
Gets the Type of the current instance.
(Inherited from Object.)
Protected method Init
Initializes this instance.
(Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
Protected method InnerLearn<T>
Generic learn method implementation that should work for any input type. This method is useful for re-using code between methods that accept Bitmap, BitmapData, UnmanagedImage, filenames as strings, etc.
(Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
Public method Learn(Bitmap, Double)
Learns a model that can map the given inputs to the desired outputs.
(Inherited from BaseBagOfVisualWords<TModel, TFeature, TPoint, TClustering, TExtractor>.)
Public method Learn(String, Double)
Learns a model that can map the given inputs to the desired outputs.
(Inherited from BaseBagOfVisualWords<TModel, TFeature, TPoint, TClustering, TExtractor>.)
Public method Learn(TFeature, Double)
Learns a model that can map the given inputs to the desired outputs.
(Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
Public method Learn(TInput, Double)
Learns a model that can map the given inputs to the desired outputs.
(Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
Public static method Load(Stream) Obsolete.
Loads a bag of words from a stream.
Public static method Load(String) Obsolete.
Loads a bag of words from a file.
Public static method Load<TPoint>(Stream) Obsolete.
Loads a bag of words from a stream.
Public static method Load<TPoint>(String) Obsolete.
Loads a bag of words from a file.
Public static method Load<TPoint, TFeature>(Stream) Obsolete.
Loads a bag of words from a stream.
Public static method Load<TPoint, TFeature>(String) Obsolete.
Loads a bag of words from a file.
Protected method MemberwiseClone
Creates a shallow copy of the current Object.
(Inherited from Object.)
Public method Save(Stream) Obsolete.
Saves the bag of words to a stream.
(Inherited from BaseBagOfVisualWords<TModel, TFeature, TPoint, TClustering, TExtractor>.)
Public method Save(String) Obsolete.
Saves the bag of words to a file.
(Inherited from BaseBagOfVisualWords<TModel, TFeature, TPoint, TClustering, TExtractor>.)
Public method ToString
Returns a string that represents the current object.
(Inherited from Object.)
Public method Transform(Bitmap)
Applies the transformation to an input, producing an associated output.
(Inherited from BaseBagOfVisualWords<TModel, TFeature, TPoint, TClustering, TExtractor>.)
Public method Transform(Bitmap[])
Applies the transformation to a set of input vectors, producing an associated set of output vectors.
(Inherited from BaseBagOfVisualWords<TModel, TFeature, TPoint, TClustering, TExtractor>.)
Public method Transform(String)
Applies the transformation to an input, producing an associated output.
(Inherited from BaseBagOfVisualWords<TModel, TFeature, TPoint, TClustering, TExtractor>.)
Public method Transform(String[])
Applies the transformation to a set of input vectors, producing an associated set of output vectors.
(Inherited from BaseBagOfVisualWords<TModel, TFeature, TPoint, TClustering, TExtractor>.)
Public method Transform(List<TPoint>)
Applies the transformation to an input, producing an associated output.
(Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
Public method Transform(TInput)
Applies the transformation to an input, producing an associated output.
(Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
Public method Transform(TInput[])
Applies the transformation to a set of input vectors, producing an associated set of output vectors.
(Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
Public method Transform(Bitmap, Double[])
Applies the transformation to an input, producing an associated output.
(Inherited from BaseBagOfVisualWords<TModel, TFeature, TPoint, TClustering, TExtractor>.)
Public method Transform(Bitmap, Int32[])
Applies the transformation to an input, producing an associated output.
(Inherited from BaseBagOfVisualWords<TModel, TFeature, TPoint, TClustering, TExtractor>.)
Public method Transform(Bitmap[], Double[][])
Applies the transformation to a set of input vectors, producing an associated set of output vectors.
(Inherited from BaseBagOfVisualWords<TModel, TFeature, TPoint, TClustering, TExtractor>.)
Public method Transform(Bitmap[], Int32[][])
Applies the transformation to a set of input vectors, producing an associated set of output vectors.
(Inherited from BaseBagOfVisualWords<TModel, TFeature, TPoint, TClustering, TExtractor>.)
Public method Transform(String, Double[])
Applies the transformation to an input, producing an associated output.
(Inherited from BaseBagOfVisualWords<TModel, TFeature, TPoint, TClustering, TExtractor>.)
Public method Transform(String, Int32[])
Applies the transformation to an input, producing an associated output.
(Inherited from BaseBagOfVisualWords<TModel, TFeature, TPoint, TClustering, TExtractor>.)
Public method Transform(String[], Double[][])
Applies the transformation to a set of input vectors, producing an associated set of output vectors.
(Inherited from BaseBagOfVisualWords<TModel, TFeature, TPoint, TClustering, TExtractor>.)
Public method Transform(String[], Int32[][])
Applies the transformation to a set of input vectors, producing an associated set of output vectors.
(Inherited from BaseBagOfVisualWords<TModel, TFeature, TPoint, TClustering, TExtractor>.)
Public method Transform(IEnumerable<TPoint>[], Double[][])
Applies the transformation to a set of input vectors, producing an associated set of output vectors.
(Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
Public method Transform(IEnumerable<TPoint>[], Int32[][])
Applies the transformation to a set of input vectors, producing an associated set of output vectors.
(Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
Public method Transform(TInput, Double[])
Applies the transformation to an input, producing an associated output.
(Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
Public method Transform(TInput, Int32[])
Applies the transformation to an input, producing an associated output.
(Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
Public method Transform(TInput[], Double[][])
Applies the transformation to a set of input vectors, producing an associated set of output vectors.
(Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
Public method Transform(TInput[], Int32[][])
Applies the transformation to a set of input vectors, producing an associated set of output vectors.
(Inherited from BaseBagOfWords<TModel, TPoint, TFeature, TClustering, TExtractor, TInput>.)
Extension Methods
Name  Description
Public extension method HasMethod
Checks whether an object implements a method with the given name.
(Defined by ExtensionMethods.)
Public extension method IsEqual
Compares two objects for equality, performing an elementwise comparison if the elements are vectors or matrices.
(Defined by Matrix.)
Public extension method To(Type)  Overloaded.
Converts an object into another type, irrespective of whether the conversion can be done at compile time or not. This can be used to convert generic types to numeric types during runtime.
(Defined by ExtensionMethods.)
Public extension method To<T>  Overloaded.
Converts an object into another type, irrespective of whether the conversion can be done at compile time or not. This can be used to convert generic types to numeric types during runtime.
(Defined by ExtensionMethods.)
Remarks

The bag-of-words (BoW) model can be used to transform data with multiple possible lengths (e.g. words in a text, pixels in an image) into finite-dimensional vectors of fixed length. Those vectors are usually referred to as representations, as they can be used in place of the original data as if they were the data itself. For example, using Bag-of-Words it becomes possible to transform a set of N images of varying sizes and dimensions into an N x C matrix, where C is the number of "visual words" used to represent each of the N images in the set.

Those rows can then be used in classification, clustering, and any other machine learning tasks where a finite vector representation would be required.

The framework can compute BoW representations for images using any choice of feature extractor and clustering algorithm. By default, it uses the SURF feature detector and the K-Means clustering algorithm.
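Conceptually, once the codebook has been learned, transforming an image reduces to assigning each extracted descriptor to its nearest visual word and counting the assignments. The following self-contained sketch illustrates only this quantization step, using made-up descriptor values and plain Euclidean distance; it is an illustration of the idea, not the Accord API:

```csharp
using System;

public static class BowSketch
{
    // Assign each descriptor to its nearest codeword and build a histogram.
    public static double[] Quantize(double[][] codebook, double[][] descriptors)
    {
        var histogram = new double[codebook.Length];
        foreach (var d in descriptors)
        {
            int best = 0;
            double bestDist = double.MaxValue;
            for (int i = 0; i < codebook.Length; i++)
            {
                double dist = 0; // squared Euclidean distance to codeword i
                for (int j = 0; j < d.Length; j++)
                    dist += (d[j] - codebook[i][j]) * (d[j] - codebook[i][j]);
                if (dist < bestDist) { bestDist = dist; best = i; }
            }
            histogram[best]++; // one vote for the closest visual word
        }
        return histogram;
    }

    public static void Main()
    {
        // Two visual words in a 2-D descriptor space (toy values):
        double[][] codebook = { new[] { 0.0, 0.0 }, new[] { 10.0, 10.0 } };

        // Three descriptors extracted from one "image":
        double[][] descriptors = { new[] { 1.0, 1.0 }, new[] { 9.0, 9.0 }, new[] { 0.5, 0.0 } };

        double[] histogram = BowSketch.Quantize(codebook, descriptors);
        Console.WriteLine(string.Join(", ", histogram)); // prints "2, 1"
    }
}
```

A real BoW model additionally learns the codebook itself (e.g. with K-Means) and may produce outputs in other formats, but the transform step follows this nearest-word counting scheme.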

Examples

The first example shows how to create and use a BoW with default parameters.

// Ensure results are reproducible
Accord.Math.Random.Generator.Seed = 0;

// The Bag-of-Visual-Words model converts images of arbitrary 
// size into fixed-length feature vectors. In this example, we
// will be setting the codebook size to 10. This means all feature
// vectors that will be generated will have the same length of 10.

// By default, the BoW object will use the sparse SURF as the 
// feature extractor and K-means as the clustering algorithm.

// Create a new Bag-of-Visual-Words (BoW) model
var bow = BagOfVisualWords.Create(numberOfWords: 10);
// Note: a simple BoW model can also be created using
// var bow = new BagOfVisualWords(numberOfWords: 10);

// Get some training images
Bitmap[] images = GetImages();

// Compute the model
bow.Learn(images);

// After this point, we will be able to translate
// images into double[] feature vectors using
double[][] features = bow.Transform(images);

// We can also check some statistics about the dataset:
int numberOfImages = bow.Statistics.TotalNumberOfInstances; // 6

// Statistics about all the descriptors that have been extracted:
int totalDescriptors = bow.Statistics.TotalNumberOfDescriptors; // 4132
double totalMean = bow.Statistics.TotalNumberOfDescriptorsPerInstance.Mean; // 688.66666666666663
double totalVar = bow.Statistics.TotalNumberOfDescriptorsPerInstance.Variance; // 96745.866666666669
IntRange totalRange = bow.Statistics.TotalNumberOfDescriptorsPerInstanceRange; // [409, 1265]

// Statistics only about the descriptors that have been actually used:
int takenDescriptors = bow.Statistics.NumberOfDescriptorsTaken; // 4132
double takenMean = bow.Statistics.NumberOfDescriptorsTakenPerInstance.Mean; // 688.66666666666663
double takenVar = bow.Statistics.NumberOfDescriptorsTakenPerInstance.Variance; // 96745.866666666669
IntRange takenRange = bow.Statistics.NumberOfDescriptorsTakenPerInstanceRange; // [409, 1265]

After the representations have been extracted, it is possible to use them in arbitrary machine learning tasks, such as classification:

// Now, the features can be used to train any classification
// algorithm as if they were the images themselves. For example,
// let's assume the first three images belong to a class and
// the second three to another class. We can train an SVM using

int[] labels = { -1, -1, -1, +1, +1, +1 };

// Create the SMO algorithm to learn a Linear kernel SVM
var teacher = new SequentialMinimalOptimization<Linear>()
{
    Complexity = 10000 // make a hard margin SVM
};

// Obtain a learned machine
var svm = teacher.Learn(features, labels);

// Use the machine to classify the features
bool[] output = svm.Decide(features);

// Compute the error between the expected and predicted labels
double error = new ZeroOneLoss(labels).Loss(output);

By default, the BoW uses K-Means to cluster feature vectors. The next example demonstrates how to use a different clustering algorithm when computing the BoW, in this case the Binary Split algorithm.

// Ensure results are reproducible
Accord.Math.Random.Generator.Seed = 0;

// The Bag-of-Visual-Words model converts images of arbitrary 
// size into fixed-length feature vectors. In this example, we
// will be setting the codebook size to 10. This means all feature
// vectors that will be generated will have the same length of 10.

// By default, the BoW object will use the sparse SURF as the 
// feature extractor and K-means as the clustering algorithm.
// In this example, we will use the Binary-Split clustering
// algorithm instead.

// Create a new Bag-of-Visual-Words (BoW) model
var bow = BagOfVisualWords.Create(new BinarySplit(10));

// Since we are using generics, we can setup properties 
// of the binary split clustering algorithm directly:
bow.Clustering.ComputeProportions = true;
bow.Clustering.ComputeCovariances = false;

// Get some training images
Bitmap[] images = GetImages();

// Compute the model
bow.Learn(images);

// After this point, we will be able to translate
// images into double[] feature vectors using
double[][] features = bow.Transform(images);
// Now, the features can be used to train any classification
// algorithm as if they were the images themselves. For example,
// let's assume the first three images belong to a class and
// the second three to another class. We can train an SVM using

int[] labels = { -1, -1, -1, +1, +1, +1 };

// Create the SMO algorithm to learn a Linear kernel SVM
var teacher = new SequentialMinimalOptimization<Linear>()
{
    Complexity = 10000 // make a hard margin SVM
};

// Obtain a learned machine
var svm = teacher.Learn(features, labels);

// Use the machine to classify the features
bool[] output = svm.Decide(features);

// Compute the error between the expected and predicted labels
double error = new ZeroOneLoss(labels).Loss(output); // should be 0

By default, the BoW uses the SURF feature detector to extract sparse features from the images. However, it is also possible to use other detectors, including dense detectors such as HistogramsOfOrientedGradients.
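The difference between the two lies in where descriptors are computed: a sparse detector such as SURF extracts them only at detected interest points, while a dense extractor such as HOG samples the image over a regular grid of cells. A self-contained sketch of generating such a grid of sampling positions (the 64 x 48 image size and 16-pixel step are made-up values, not HOG defaults):

```csharp
using System;
using System.Collections.Generic;

public static class DenseGridSketch
{
    // Regular grid of patch centers over an image: the sampling pattern a
    // dense feature extractor uses instead of detected interest points.
    public static List<(int X, int Y)> GridPositions(int width, int height, int step)
    {
        var positions = new List<(int, int)>();
        for (int y = step / 2; y < height; y += step)
            for (int x = step / 2; x < width; x += step)
                positions.Add((x, y)); // one descriptor will be computed here
        return positions;
    }

    public static void Main()
    {
        var grid = DenseGridSketch.GridPositions(64, 48, 16);
        Console.WriteLine(grid.Count); // prints "12" (a 4 x 3 grid)
    }
}
```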

Accord.Math.Random.Generator.Seed = 0;

// The Bag-of-Visual-Words model converts images of arbitrary 
// size into fixed-length feature vectors. In this example, we
// will be setting the codebook size to 10. This means all feature
// vectors that will be generated will have the same length of 10.

// By default, the BoW object will use the sparse SURF as the 
// feature extractor and K-means as the clustering algorithm.
// In this example, we will use the HOG feature extractor
// and the Binary-Split clustering algorithm instead. However, 
// this is just an example: the best features and the best clustering 
// algorithm might need to be found through experimentation. Please
// also try with KMeans first to obtain a baseline value.

// Create a new Bag-of-Visual-Words (BoW) model using HOG features
var bow = BagOfVisualWords.Create(new HistogramsOfOrientedGradients(), new BinarySplit(10));

// Get some training images
Bitmap[] images = GetImages();

// Compute the model
bow.Learn(images);

// After this point, we will be able to translate
// images into double[] feature vectors using
double[][] features = bow.Transform(images);
// Now, the features can be used to train any classification
// algorithm as if they were the images themselves. For example,
// let's assume the first three images belong to a class and
// the second three to another class. We can train an SVM using

int[] labels = { -1, -1, -1, +1, +1, +1 };

// Create the SMO algorithm to learn a Linear kernel SVM
var teacher = new SequentialMinimalOptimization<Linear>()
{
    Complexity = 100 // make a hard margin SVM
};

// Obtain a learned machine
var svm = teacher.Learn(features, labels);

// Use the machine to classify the features
bool[] output = svm.Decide(features);

// Compute the error between the expected and predicted labels
double error = new ZeroOneLoss(labels).Loss(output); // should be 0

A similarly simple case uses the FREAK detector together with the BinarySplit clustering algorithm:

// Ensure results are reproducible
Accord.Math.Random.Generator.Seed = 0;

// The Bag-of-Visual-Words model converts images of arbitrary 
// size into fixed-length feature vectors. In this example, we
// will be setting the codebook size to 10. This means all feature
// vectors that will be generated will have the same length of 10.

// By default, the BoW object will use the sparse SURF as the 
// feature extractor and K-means as the clustering algorithm.
// In this example, we will use the FREAK feature extractor
// and the Binary-Split clustering algorithm instead.

// Create a new Bag-of-Visual-Words (BoW) model using FREAK binary features
var bow = BagOfVisualWords.Create(new FastRetinaKeypointDetector(), new BinarySplit(10));

// Get some training images
Bitmap[] images = GetImages();

// Compute the model
bow.Learn(images);

bow.ParallelOptions.MaxDegreeOfParallelism = 1;

// After this point, we will be able to translate
// images into double[] feature vectors using
double[][] features = bow.Transform(images);
// Now, the features can be used to train any classification
// algorithm as if they were the images themselves. For example,
// let's assume the first three images belong to a class and
// the second three to another class. We can train an SVM using

int[] labels = { -1, -1, -1, +1, +1, +1 };

// Create the SMO algorithm to learn a Linear kernel SVM
var teacher = new SequentialMinimalOptimization<Linear>()
{
    Complexity = 1000 // make a hard margin SVM
};

// Obtain a learned machine
var svm = teacher.Learn(features, labels);

// Use the machine to classify the features
bool[] output = svm.Decide(features);

// Compute the error between the expected and predicted labels
double error = new ZeroOneLoss(labels).Loss(output); // should be 0

More advanced use cases are also supported. For example, some image patches can be represented using different data representations, such as byte vectors. In this case, it is still possible to use the BoW with an appropriate clustering algorithm that does not depend on Euclidean distances.
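Binary descriptors such as FREAK's are compared bit-by-bit, so a Hamming-style distance is a more natural dissimilarity than the Euclidean distance K-Means relies on, which is why the example below pairs FREAK with K-Modes and a Hamming metric. A self-contained sketch of the Hamming distance between two byte descriptors (toy values; an illustration of the metric, not Accord's Hamming class):

```csharp
using System;

public static class HammingSketch
{
    // Number of differing bits between two equal-length byte descriptors.
    public static int Distance(byte[] a, byte[] b)
    {
        int bits = 0;
        for (int i = 0; i < a.Length; i++)
        {
            int x = a[i] ^ b[i];  // 1-bits mark positions where a and b differ
            while (x != 0)        // count them (Kernighan's bit-clearing trick)
            {
                x &= x - 1;
                bits++;
            }
        }
        return bits;
    }

    public static void Main()
    {
        byte[] d1 = { 0xAA, 0x0F }; // 1010_1010 0000_1111
        byte[] d2 = { 0xA2, 0x0F }; // 1010_0010 0000_1111
        Console.WriteLine(HammingSketch.Distance(d1, d2)); // prints "1"
    }
}
```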

// Ensure results are reproducible
Accord.Math.Random.Generator.Seed = 0;

// The Bag-of-Visual-Words model converts images of arbitrary 
// size into fixed-length feature vectors. In this example, we
// will be setting the codebook size to 10. This means all feature
// vectors that will be generated will have the same length of 10.

// By default, the BoW object will use the sparse SURF as the 
// feature extractor and K-means as the clustering algorithm.
// In this example, we will use the FREAK feature extractor
// and the K-Modes clustering algorithm instead.

// Create a new Bag-of-Visual-Words (BoW) model using FREAK binary features
var bow = BagOfVisualWords.Create<FastRetinaKeypointDetector, KModes<byte>, byte[]>(
    new FastRetinaKeypointDetector(), new KModes<byte>(10, new Hamming()));

// Get some training images
Bitmap[] images = GetImages();

// Compute the model
bow.Learn(images);

// After this point, we will be able to translate
// images into double[] feature vectors using
double[][] features = bow.Transform(images);
// Now, the features can be used to train any classification
// algorithm as if they were the images themselves. For example,
// let's assume the first three images belong to a class and
// the second three to another class. We can train an SVM using

int[] labels = { -1, -1, -1, +1, +1, +1 };

// Create the SMO algorithm to learn a Linear kernel SVM
var teacher = new SequentialMinimalOptimization<Linear>()
{
    Complexity = 1000 // make a hard margin SVM
};

// Obtain a learned machine
var svm = teacher.Learn(features, labels);

// Use the machine to classify the features
bool[] output = svm.Decide(features);

// Compute the error between the expected and predicted labels
double error = new ZeroOneLoss(labels).Loss(output); // should be 0

Other more specialized feature extractors can also be used, such as Haralick texture feature extractors for performing texture classification.

// Ensure results are reproducible
Accord.Math.Random.Generator.Seed = 0;

// The Bag-of-Visual-Words model converts images of arbitrary 
// size into fixed-length feature vectors. In this example, we
// will be setting the codebook size to 3. This means all feature
// vectors that will be generated will have the same length of 3.

// By default, the BoW object will use the sparse SURF as the 
// feature extractor and K-means as the clustering algorithm.
// In this example, we will use the Haralick feature extractor.

// Create a new Bag-of-Visual-Words (BoW) model using Haralick features
var bow = BagOfVisualWords.Create(new Haralick()
{
    CellSize = 256, // divide images in cells of 256x256 pixels
    Mode = HaralickMode.AverageWithRange,
}, new KMeans(3));

// Generate some training images. Haralick is best for classifying
// textures, so we will be generating examples of wood and clouds:
var woodenGenerator = new WoodTexture();
var cloudsGenerator = new CloudsTexture();

Bitmap[] images = new[]
{
    woodenGenerator.Generate(512, 512).ToBitmap(),
    woodenGenerator.Generate(512, 512).ToBitmap(),
    woodenGenerator.Generate(512, 512).ToBitmap(),

    cloudsGenerator.Generate(512, 512).ToBitmap(),
    cloudsGenerator.Generate(512, 512).ToBitmap(),
    cloudsGenerator.Generate(512, 512).ToBitmap()
};

// Compute the model
bow.Learn(images);

bow.ParallelOptions.MaxDegreeOfParallelism = 1;

// After this point, we will be able to translate
// images into double[] feature vectors using
double[][] features = bow.Transform(images);
// Now, the features can be used to train any classification
// algorithm as if they were the images themselves. For example,
// let's assume the first three images belong to a class and
// the second three to another class. We can train an SVM using

int[] labels = { -1, -1, -1, +1, +1, +1 };

// Create the SMO algorithm to learn a Linear kernel SVM
var teacher = new SequentialMinimalOptimization<Linear>()
{
    Complexity = 100 // make a hard margin SVM
};

// Obtain a learned machine
var svm = teacher.Learn(features, labels);

// Use the machine to classify the features
bool[] output = svm.Decide(features);

// Compute the error between the expected and predicted labels
double error = new ZeroOneLoss(labels).Loss(output); // should be 0

In some applications, learning a BoW with the default settings might need a large amount of memory to be available. In those cases, it is possible to reduce the memory and CPU requirements for the learning phase using the NumberOfDescriptors and MaxDescriptorsPerInstance properties. It is also possible to avoid loading all images at once by feeding the algorithm with the image filenames instead of their Bitmap representations:

// Ensure results are reproducible
Accord.Math.Random.Generator.Seed = 0;

// Depending on the problem we are trying to tackle, learning a BoW might require 
// large amounts of available memory. In those cases, we can alleviate the amount
// of memory required by using only a subsample of the training dataset to learn
// the model. Likewise, we can also load images from the disk on-demand instead of
// having to load all of them right at the beginning.

// Create a new Bag-of-Visual-Words (BoW) model
var bow = BagOfVisualWords.Create(numberOfWords: 10);

// We will learn the codebook from only 1000 descriptors, which
// will be randomly selected from the multiple training images
bow.NumberOfDescriptors = 1000; // Note: in the real world, use >10,000 samples

// We will load at most 200 descriptors from each image. This means
// that we will only keep 200 descriptors per image at maximum in
// memory at a given time.
bow.MaxDescriptorsPerInstance = 200; // Note: In the real world, use >1,000 samples

// Get some training images. Here, instead of loading Bitmaps as in
// the other examples, we will just specify their paths in the disk:
string[] filenames =
{
    Path.Combine(basePath, "flower01.jpg"),
    Path.Combine(basePath, "flower02.jpg"),
    Path.Combine(basePath, "flower03.jpg"),
    Path.Combine(basePath, "flower04.jpg"),
    Path.Combine(basePath, "flower05.jpg"),
    Path.Combine(basePath, "flower06.jpg"),
};

// Compute the model
bow.Learn(filenames);

// After this point, we will be able to translate
// images into double[] feature vectors using
double[][] features = bow.Transform(filenames);

// We can also check some statistics about the dataset:
int numberOfImages = bow.Statistics.TotalNumberOfInstances; // 6

// Statistics about all the descriptors that have been extracted:
int totalDescriptors = bow.Statistics.TotalNumberOfDescriptors; // 4132
double totalMean = bow.Statistics.TotalNumberOfDescriptorsPerInstance.Mean; // 688.66666666666663
double totalVar = bow.Statistics.TotalNumberOfDescriptorsPerInstance.Variance; // 96745.866666666669
IntRange totalRange = bow.Statistics.TotalNumberOfDescriptorsPerInstanceRange; // [409, 1265]

// Statistics only about the descriptors that have been actually used:
int takenDescriptors = bow.Statistics.NumberOfDescriptorsTaken; // 1000
double takenMean = bow.Statistics.NumberOfDescriptorsTakenPerInstance.Mean; // 200
double takenVar = bow.Statistics.NumberOfDescriptorsTakenPerInstance.Variance; // 0
IntRange takenRange = bow.Statistics.NumberOfDescriptorsTakenPerInstanceRange; // [200, 200]
// Now, the features can be used to train any classification
// algorithm as if they were the images themselves. For example,
// let's assume the first three images belong to a class and
// the second three to another class. We can train an SVM using

int[] labels = { -1, -1, -1, +1, +1, +1 };

// Create the SMO algorithm to learn a Linear kernel SVM
var teacher = new SequentialMinimalOptimization<Linear>()
{
    Complexity = 10000 // make a hard margin SVM
};

// Obtain a learned machine
var svm = teacher.Learn(features, labels);

// Use the machine to classify the features
bool[] output = svm.Decide(features);

// Compute the error between the expected and predicted labels
double error = new ZeroOneLoss(labels).Loss(output);
See Also