Accord.NET

MiniBatchKMeans Class

Fast k-means clustering algorithm.
Inheritance Hierarchy
System.Object
  Accord.MachineLearning.ParallelLearningBase
    Accord.MachineLearning.KMeans
      Accord.MachineLearning.MiniBatchKMeans

Namespace:  Accord.MachineLearning
Assembly:  Accord.MachineLearning (in Accord.MachineLearning.dll) Version: 3.8.0
Syntax
public class MiniBatchKMeans : KMeans

The MiniBatchKMeans type exposes the following members.

Constructors
Properties
  Name Description
Public property BatchSize
Gets or sets the number of observations in each mini-batch.
Public property Centroids
Gets or sets the cluster centroids.
(Inherited from KMeans.)
Public property Clusters
Gets the clusters found by K-means.
(Inherited from KMeans.)
Public property ComputeCovariances
Gets or sets whether covariance matrices for the clusters should be computed at the end of an iteration. Default is true.
(Inherited from KMeans.)
Public property ComputeError
Gets or sets whether the clustering distortion error (the average distance between all data points and the cluster centroids) should be computed at the end of the algorithm. The result will be stored in Error. Default is true.
(Inherited from KMeans.)
Public property Dimension
Gets the dimensionality of the data space.
(Inherited from KMeans.)
Public property Distance
Gets or sets the distance function used as a distance metric between data points.
(Inherited from KMeans.)
Public property Error
Gets the cluster distortion error after the last call to this class's Compute methods.
(Inherited from KMeans.)
Public property InitializationBatchSize
Gets or sets the size of the batch used during initialization.
Public property Iterations
Gets the number of iterations performed in the last call to this class's Compute methods.
(Inherited from KMeans.)
Public property K
Gets the number of clusters.
(Inherited from KMeans.)
Public property Labels
Public property MaxIterations
Gets or sets the maximum number of iterations to be performed by the method. If set to zero, no iteration limit will be imposed. Default is 0.
(Inherited from KMeans.)
Public property NumberOfInitializations
Gets or sets the number of different initializations of the centroids.
Public property ParallelOptions
Gets or sets the parallelization options for this algorithm.
(Inherited from ParallelLearningBase.)
Public property Token
Gets or sets a cancellation token that can be used to cancel the algorithm while it is running.
(Inherited from ParallelLearningBase.)
Public property Tolerance
Gets or sets the relative convergence threshold for stopping the algorithm. Default is 1e-5.
(Inherited from KMeans.)
Public property UseSeeding
Gets or sets the strategy used to initialize the centroids of the clustering algorithm. Default is KMeansPlusPlus.
(Inherited from KMeans.)
Methods
  Name Description
Public method Compute(Double) Obsolete.
Divides the input data into K clusters.
(Inherited from KMeans.)
Public method Compute(Double, Double) Obsolete.
Divides the input data into K clusters.
(Inherited from KMeans.)
Public method Compute(Double, Double) Obsolete.
Divides the input data into K clusters.
(Inherited from KMeans.)
Protected method ComputeInformation(Double)
Computes the information about each cluster (covariance, proportions and error).
(Inherited from KMeans.)
Protected method ComputeInformation(Double, Int32)
Computes the information about each cluster (covariance, proportions and error).
(Inherited from KMeans.)
Protected method converged
Determines if the algorithm has converged by comparing the centroids between two consecutive iterations.
(Inherited from KMeans.)
Public method Equals
Determines whether the specified object is equal to the current object.
(Inherited from Object.)
Protected method Finalize
Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection.
(Inherited from Object.)
Public method GetHashCode
Serves as the default hash function.
(Inherited from Object.)
Public method GetType
Gets the Type of the current instance.
(Inherited from Object.)
Public method Learn
Learns a model that can map the given inputs to the desired outputs.
(Overrides KMeans.Learn(Double, Double).)
Protected method MemberwiseClone
Creates a shallow copy of the current Object.
(Inherited from Object.)
Public method Randomize
Randomizes the clusters inside a dataset.
(Inherited from KMeans.)
Public method ToString
Returns a string that represents the current object.
(Inherited from Object.)
Extension Methods
  Name Description
Public Extension Method HasMethod
Checks whether an object implements a method with the given name.
(Defined by ExtensionMethods.)
Public Extension Method IsEqual
Compares two objects for equality, performing an elementwise comparison if the elements are vectors or matrices.
(Defined by Matrix.)
Public Extension Method To(Type) Overloaded.
Converts an object into another type, irrespective of whether the conversion can be done at compile time or not. This can be used to convert generic types to numeric types during runtime.
(Defined by ExtensionMethods.)
Public Extension Method To<T> Overloaded.
Converts an object into another type, irrespective of whether the conversion can be done at compile time or not. This can be used to convert generic types to numeric types during runtime.
(Defined by ExtensionMethods.)
Remarks

The Mini-Batch K-Means clustering algorithm is a modification of the K-Means algorithm.

In each iteration, it uses only a portion of the data to update the cluster centroids with a gradient step. These subsets of the data are called mini-batches, and a new one is sampled at random from the whole dataset in each iteration.

Because each iteration processes only a small batch instead of the whole dataset, Mini-Batch K-Means is faster than standard k-means on large datasets.
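As a rough cost estimate, assuming N observations of dimension D, K clusters, a batch size of B, and that the nearest-centroid search dominates each iteration, the per-iteration work of the two algorithms compares as follows:

$$
T_{\text{k-means}} = O(N K D), \qquad T_{\text{mini-batch}} = O(B K D)
$$

For example, with N = 10^6 and B = 100, each mini-batch iteration examines 10^4 times fewer observations, although more iterations may be needed to reach a comparable result.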

The algorithm is composed of the following steps:

  1. Place K points into the space represented by the objects that are being clustered. These points represent the initial group centroids.
  2. Form a batch by randomly choosing B objects from the input dataset. For each object in the batch, determine the group whose centroid is closest. Then, move that centroid toward the object with a gradient step.
  3. Repeat step 2 until the centroids converge or the maximum number of iterations has been reached.
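The steps above can be sketched as a small, self-contained C# routine. This is a hypothetical illustration, not Accord.NET's code: the names MiniBatchKMeansSketch, Fit and Nearest are made up, initialization uses plain random seeding rather than the KMeansPlusPlus default, each batch is sampled with replacement (an assumption), and a fixed iteration count stands in for the convergence test. The per-centroid learning rate of 1/count follows the update rule in the Sculley paper listed under References.

```csharp
using System;
using System.Linq;

// Hypothetical sketch of mini-batch k-means following Sculley (2010).
// Not Accord.NET's implementation; class and method names are illustrative.
public static class MiniBatchKMeansSketch
{
    // Learns k centroids from data using batches of batchSize observations.
    // Assumes k <= data.Length and that all rows have the same length.
    public static double[][] Fit(double[][] data, int k, int batchSize,
                                 int iterations, Random rng)
    {
        int dim = data[0].Length;

        // Step 1: copy k observations, chosen at random without replacement,
        // as initial centroids (plain random seeding, assumed instead of k-means++).
        double[][] centroids = data.OrderBy(_ => rng.Next())
                                   .Take(k)
                                   .Select(row => (double[])row.Clone())
                                   .ToArray();

        int[] counts = new int[k];          // updates received by each centroid
        int[] batch = new int[batchSize];   // indices of the current mini-batch
        int[] nearest = new int[batchSize]; // cached assignment per batch item

        // Step 3 (simplified): run a fixed number of iterations, no tolerance test.
        for (int iter = 0; iter < iterations; iter++)
        {
            // Step 2a: sample B observation indices uniformly, with replacement.
            for (int b = 0; b < batchSize; b++)
                batch[b] = rng.Next(data.Length);

            // Step 2b: assign every batch item before moving any centroid.
            for (int b = 0; b < batchSize; b++)
                nearest[b] = Nearest(centroids, data[batch[b]]);

            // Step 2c: gradient step c <- c + eta * (x - c) with eta = 1 / count.
            for (int b = 0; b < batchSize; b++)
            {
                int c = nearest[b];
                counts[c]++;
                double eta = 1.0 / counts[c];
                double[] x = data[batch[b]];
                for (int j = 0; j < dim; j++)
                    centroids[c][j] += eta * (x[j] - centroids[c][j]);
            }
        }
        return centroids;
    }

    // Index of the centroid with the smallest squared Euclidean distance to x.
    public static int Nearest(double[][] centroids, double[] x)
    {
        int best = 0;
        double bestDist = double.PositiveInfinity;
        for (int c = 0; c < centroids.Length; c++)
        {
            double d = 0;
            for (int j = 0; j < x.Length; j++)
            {
                double diff = x[j] - centroids[c][j];
                d += diff * diff;
            }
            if (d < bestDist) { bestDist = d; best = c; }
        }
        return best;
    }
}
```

For instance, calling Fit(observations, 3, 2, 100, new Random(0)) on the observations from the example further below and then Nearest on each row yields one cluster label per observation, playing the same role as Decide in the Accord.NET example. Because every update is a convex combination of a centroid and a data point, the centroids never leave the bounding box of the data.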

References:

  • D. Sculley. Web-Scale K-Means Clustering. Available at: https://www.eecs.tufts.edu/~dsculley/papers/fastkmeans.pdf

Examples
using Accord.MachineLearning;

// Fix the random seed so that the results are reproducible
Accord.Math.Random.Generator.Seed = 0;

// Declare some observations
double[][] observations =
{
    new double[] { -5, -2, -1 },
    new double[] { -5, -5, -6 },
    new double[] {  2,  1,  1 },
    new double[] {  1,  1,  2 },
    new double[] {  1,  2,  2 },
    new double[] {  3,  1,  2 },
    new double[] { 11,  5,  4 },
    new double[] { 15,  5,  6 },
    new double[] { 10,  5,  6 },
};

// Create a new Mini-Batch K-Means algorithm
MiniBatchKMeans mbkmeans = new MiniBatchKMeans(k: 3, batchSize: 2);

// Learn the clusters (and their centroids) from the observations
var clusters = mbkmeans.Learn(observations);

// Use the learned clusters to assign a cluster label to every observation
int[] labels = clusters.Decide(observations);
See Also