
PrincipalComponentAnalysis Class

Principal component analysis (PCA) is a technique used to reduce multidimensional data sets to lower dimensions for analysis.
Inheritance Hierarchy
System.Object
  Accord.MachineLearning.TransformBase<Double[], Double[]>
    Accord.MachineLearning.MultipleTransformBase<Double[], Double[]>
      Accord.Statistics.Analysis.Base.BasePrincipalComponentAnalysis
        Accord.Statistics.Analysis.PrincipalComponentAnalysis

Namespace:  Accord.Statistics.Analysis
Assembly:  Accord.Statistics (in Accord.Statistics.dll) Version: 3.6.0
Syntax
[SerializableAttribute]
public class PrincipalComponentAnalysis : BasePrincipalComponentAnalysis, 
	ITransform<double[], double[]>, ITransform, IUnsupervisedLearning<MultivariateLinearRegression, double[], double[]>, 
	IMultivariateAnalysis, IAnalysis, IProjectionAnalysis

The PrincipalComponentAnalysis type exposes the following members.

Constructors
Properties
Public property ComponentMatrix (Obsolete.)
Gets a matrix whose columns contain the principal components. Also known as the eigenvectors or loadings matrix.
(Inherited from BasePrincipalComponentAnalysis.)
Public property ComponentProportions
The respective role each component plays in the data set.
(Inherited from BasePrincipalComponentAnalysis.)
Public property Components
Gets the principal components in an object-oriented structure.
(Inherited from BasePrincipalComponentAnalysis.)
Public property ComponentVectors
Gets a matrix whose columns contain the principal components. Also known as the eigenvectors or loadings matrix.
(Inherited from BasePrincipalComponentAnalysis.)
Public property CumulativeProportions
The cumulative distribution of the components' proportion role. Also known as the cumulative energy of the principal components.
(Inherited from BasePrincipalComponentAnalysis.)
Public property Eigenvalues
Provides access to the eigenvalues stored during the analysis.
(Inherited from BasePrincipalComponentAnalysis.)
Public property ExplainedVariance
Gets or sets the amount of explained variance that should be generated by this model. This value will alter the NumberOfOutputs that can be generated by this model.
(Inherited from BasePrincipalComponentAnalysis.)
Public property MaximumNumberOfOutputs
Gets the maximum number of outputs (dimensionality of the output vectors) that can be generated by this model.
(Inherited from BasePrincipalComponentAnalysis.)
Public property Means
Gets the column means of the source data given at method construction.
(Inherited from BasePrincipalComponentAnalysis.)
Public property Method
Gets or sets the method used by this analysis.
(Inherited from BasePrincipalComponentAnalysis.)
Public property NumberOfInputs
Gets the number of inputs accepted by the model.
(Inherited from TransformBase<TInput, TOutput>.)
Public property NumberOfOutputs
Gets or sets the number of outputs (dimensionality of the output vectors) that should be generated by this model.
(Inherited from BasePrincipalComponentAnalysis.)
Public property Overwrite
Gets or sets whether calculations will be performed overwriting data in the original source matrix, using less memory.
(Inherited from BasePrincipalComponentAnalysis.)
Public property Result (Obsolete.)
Gets the projection of the source data, given at the creation of the analysis, into the space spanned by the principal components.
(Inherited from BasePrincipalComponentAnalysis.)
Public property SingularValues
Provides access to the singular values stored during the analysis. If a covariance method is chosen, this will contain an empty vector.
(Inherited from BasePrincipalComponentAnalysis.)
Public property Source (Obsolete.)
Returns the original data supplied to the analysis.
(Inherited from BasePrincipalComponentAnalysis.)
Public property StandardDeviations
Gets the column standard deviations of the source data given at method construction.
(Inherited from BasePrincipalComponentAnalysis.)
Public property Token
Gets or sets a cancellation token that can be used to cancel the algorithm while it is running.
(Inherited from BasePrincipalComponentAnalysis.)
Public property Whiten
Gets or sets whether the transformation result should be whitened (have unit standard deviation) before it is returned.
(Inherited from BasePrincipalComponentAnalysis.)
Top
Methods
Protected method Adjust(Double, Boolean) (Obsolete.)
Adjusts a data matrix, centering and standardizing its values using the already computed column means and standard deviations.
Protected method Adjust(Double, Boolean) (Obsolete.)
Adjusts a data matrix, centering and standardizing its values using the already computed column means and standard deviations.
Public method Compute (Obsolete.)
Computes the Principal Component Analysis algorithm.
Protected method CreateComponents
Creates additional information about principal components.
(Inherited from BasePrincipalComponentAnalysis.)
Public method Equals
Determines whether the specified object is equal to the current object.
(Inherited from Object.)
Protected method Finalize
Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection.
(Inherited from Object.)
Public static method FromCorrelationMatrix
Constructs a new Principal Component Analysis from a correlation matrix.
Public static method FromCovarianceMatrix
Constructs a new Principal Component Analysis from a covariance matrix.
Public static method FromGramMatrix
Constructs a new Principal Component Analysis from a kernel (Gram) matrix.
Public method GetHashCode
Serves as the default hash function.
(Inherited from Object.)
Public method GetNumberOfComponents
Returns the minimal number of principal components required to represent a given percentile of the data.
(Inherited from BasePrincipalComponentAnalysis.)
Public method GetType
Gets the Type of the current instance.
(Inherited from Object.)
Public method Learn
Learns a model that can map the given inputs to the desired outputs.
Protected method MemberwiseClone
Creates a shallow copy of the current Object.
(Inherited from Object.)
Public static method Reduce
Reduces the dimensionality of a given matrix x to the given number of dimensions.
Public method Revert(Double) (Obsolete.)
Reverts a set of projected data into its original form. A complete reverse transformation is only possible if all components are present, and, if the data has been standardized, the original standard deviations and means of the original matrix are known.
Public method Revert(Double)
Reverts a set of projected data into its original form. A complete reverse transformation is only possible if all components are present, and, if the data has been standardized, the original standard deviations and means of the original matrix are known.
Public method ToString
Returns a string that represents the current object.
(Inherited from Object.)
Public method Transform(TInput)
Applies the transformation to a set of input vectors, producing an associated set of output vectors.
(Inherited from MultipleTransformBase<TInput, TOutput>.)
Public method Transform(Double) (Obsolete.)
Obsolete.
(Inherited from BasePrincipalComponentAnalysis.)
Public method Transform(Double)
Applies the transformation to an input, producing an associated output.
(Inherited from BasePrincipalComponentAnalysis.)
Public method Transform(Double, Double)
Projects a given matrix into principal component space.
(Overrides MultipleTransformBase<TInput, TOutput>.Transform(TInput, TOutput).)
Public method Transform(Double, Int32) (Obsolete.)
Projects a given matrix into principal component space.
(Inherited from BasePrincipalComponentAnalysis.)
Public method Transform(Double, Double)
Applies the transformation to an input, producing an associated output.
(Inherited from BasePrincipalComponentAnalysis.)
Public method Transform(Double, Int32) (Obsolete.)
Projects a given matrix into principal component space.
(Inherited from BasePrincipalComponentAnalysis.)
Public method Transform(Double, Int32) (Obsolete.)
Projects a given matrix into principal component space.
(Inherited from BasePrincipalComponentAnalysis.)
Top
Fields
Protected field array (Obsolete.)
(Inherited from BasePrincipalComponentAnalysis.)
Protected field covarianceMatrix (Obsolete.)
(Inherited from BasePrincipalComponentAnalysis.)
Protected field onlyCovarianceMatrixAvailable (Obsolete.)
(Inherited from BasePrincipalComponentAnalysis.)
Protected field result (Obsolete.)
(Inherited from BasePrincipalComponentAnalysis.)
Protected field saveResult (Obsolete.)
(Inherited from BasePrincipalComponentAnalysis.)
Protected field source (Obsolete.)
(Inherited from BasePrincipalComponentAnalysis.)
Top
Extension Methods
Public extension method HasMethod
Checks whether an object implements a method with the given name.
(Defined by ExtensionMethods.)
Public extension method IsEqual
Compares two objects for equality, performing an elementwise comparison if the elements are vectors or matrices.
(Defined by Matrix.)
Public extension method To<T>
Converts an object into another type, irrespective of whether the conversion can be done at compile time or not. This can be used to convert generic types to numeric types during runtime.
(Defined by ExtensionMethods.)
Top
Remarks

Principal Component Analysis, also known as the Karhunen-Loève expansion, is a classical method for dimensionality reduction and exploratory data analysis.

Mathematically, PCA decomposes the covariance matrix of a data matrix into two parts, eigenvalues and column eigenvectors, whereas Singular Value Decomposition (SVD) decomposes the data matrix itself into three parts: singular values, column eigenvectors, and row eigenvectors. The two are related: the eigenvalues are the squares of the singular values (up to a normalization constant), and the column vectors are the same for both.
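As a quick sanity check on this relationship, the sketch below (plain Python with only the standard library, not Accord.NET code) computes both quantities for Lindsay Smith's 2-D example data and verifies that each covariance eigenvalue equals the corresponding squared singular value divided by n - 1:

```python
import math

# Lindsay Smith's example data (n rows, 2 columns)
raw = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
       (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]
n = len(raw)
mx = sum(x for x, _ in raw) / n
my = sum(y for _, y in raw) / n
A = [(x - mx, y - my) for x, y in raw]  # centered data matrix

def eig2x2_sym(a, b, d):
    """Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, d]], largest first."""
    tr, det = a + d, a * d - b * b
    disc = math.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

# Scatter matrix A^T A; its eigenvalues are the squared singular values of A.
sxx = sum(x * x for x, _ in A)
sxy = sum(x * y for x, y in A)
syy = sum(y * y for _, y in A)
sq_singular = eig2x2_sym(sxx, sxy, syy)

# The covariance matrix is A^T A / (n - 1); its eigenvalues are what PCA reports.
cov_eig = eig2x2_sym(sxx / (n - 1), sxy / (n - 1), syy / (n - 1))

# The relationship stated above: eigenvalue = singular_value^2 / (n - 1)
for lam, s2 in zip(cov_eig, sq_singular):
    assert abs(lam - s2 / (n - 1)) < 1e-9

print(cov_eig)  # approximately (1.28402771, 0.0490833989), as in the tutorial
```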

This class uses the SVD on the data set, which generally gives better numerical accuracy.

This class can also be bound to standard controls such as the DataGridView by setting their DataSource property to the analysis' Components property.

Examples

The example below shows a typical usage of the analysis. However, users often ask why the framework produces values different from those of other packages such as STATA or MATLAB. After the simple introductory example below, we will explore why those results are often different.

// Below is the same data used on the excellent paper "Tutorial
//   On Principal Component Analysis", by Lindsay Smith (2002).
double[][] data =
{
    new double[] { 2.5,  2.4 },
    new double[] { 0.5,  0.7 },
    new double[] { 2.2,  2.9 },
    new double[] { 1.9,  2.2 },
    new double[] { 3.1,  3.0 },
    new double[] { 2.3,  2.7 },
    new double[] { 2.0,  1.6 },
    new double[] { 1.0,  1.1 },
    new double[] { 1.5,  1.6 },
    new double[] { 1.1,  0.9 }
};

// Let's create an analysis with centering (covariance method),
// without standardization (which would be the correlation method),
// and with whitening enabled:
var pca = new PrincipalComponentAnalysis()
{
    Method = PrincipalComponentMethod.Center,
    Whiten = true
};

// Now we can learn the linear projection from the data
MultivariateLinearRegression transform = pca.Learn(data);

// Finally, we can project all the data
double[][] output1 = pca.Transform(data);

// Or just its first components by setting 
// NumberOfOutputs to the desired components:
pca.NumberOfOutputs = 1;

// And then calling transform again:
double[][] output2 = pca.Transform(data);

// We can also limit to 80% of explained variance:
pca.ExplainedVariance = 0.8;

// And then call transform again:
double[][] output3 = pca.Transform(data);

A question often asked by users is "why do my matrices have inverted signs?" or "why do my results differ from [another software package]?". In short, despite any such differences, the results are most likely correct (unless you firmly believe you have found a bug; in that case, please file a bug report).
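To see why a sign flip is harmless, recall that eigenvectors are only defined up to a nonzero scalar: if Cv = λv, then C(av) = λ(av) for any scalar a. The sketch below checks this numerically (plain Python, not Accord.NET code; the covariance matrix and eigenpair are the rounded values from Lindsay's tutorial):

```python
# Covariance matrix and one eigenpair from Lindsay Smith's tutorial (rounded)
C = [[0.616555556, 0.615444444],
     [0.615444444, 0.716555556]]
v = (-0.677873399, -0.735178656)   # first eigenvector, as reported by one package
lam = 1.28402771                   # its eigenvalue

def matvec(M, u):
    """Multiply a 2x2 matrix by a 2-vector."""
    return tuple(sum(M[i][j] * u[j] for j in range(2)) for i in range(2))

# Any nonzero scalar multiple of v (including -v, the sign-flipped
# version another package might report) is still an eigenvector:
for a in (1.0, -1.0, 3.5):
    av = tuple(a * x for x in v)
    Cav = matvec(C, av)
    assert all(abs(Cav[i] - lam * av[i]) < 1e-6 for i in range(2))
```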

The example below explores, following the same steps given in Lindsay's tutorial, the points that could cause discrepancies between the results given by Accord.NET and those given by other software packages.

// Reproducing Lindsay Smith's "Tutorial on Principal Component Analysis"
// using the framework's default method. The tutorial can be found online
// at http://www.sccg.sk/~haladova/principal_components.pdf

// Step 1. Get some data
// ---------------------

double[][] data =
{
    new[] { 2.5,  2.4 },
    new[] { 0.5,  0.7 },
    new[] { 2.2,  2.9 },
    new[] { 1.9,  2.2 },
    new[] { 3.1,  3.0 },
    new[] { 2.3,  2.7 },
    new[] { 2.0,  1.6 },
    new[] { 1.0,  1.1 },
    new[] { 1.5,  1.6 },
    new[] { 1.1,  0.9 }
};


// Step 2. Subtract the mean
// -------------------------
//   Note: The framework does this automatically. By default, the framework
//   uses the "Center" method, which only subtracts the mean. However, it is
//   also possible to remove the mean *and* divide by the standard deviation
//   (thus performing the correlation method) by specifying "Standardize"
//   instead of "Center" as the AnalysisMethod.

var method = PrincipalComponentMethod.Center;
// var method = PrincipalComponentMethod.Standardize


// Step 3. Compute the covariance matrix
// -------------------------------------
//   Note: Accord.NET does not need to compute the covariance
//   matrix in order to compute PCA. The framework uses the SVD
//   method which is more numerically stable, but may require
//   more processing or memory. In order to replicate the tutorial
//   using covariance matrices, please see the next unit test.

// Create the analysis using the selected method
var pca = new PrincipalComponentAnalysis(method);

// Compute it
pca.Learn(data);


// Step 4. Compute the eigenvectors and eigenvalues of the covariance matrix
// -------------------------------------------------------------------------
//   Note: Since Accord.NET uses the SVD method rather than the Eigendecomposition
//   method, the Eigenvalues are computed from the singular values. However, it is
//   not the Eigenvalues themselves which are important, but rather their proportion:

// Those are the expected eigenvalues, in descending order:
double[] eigenvalues = { 1.28402771, 0.0490833989 };

// And this will be their proportion:
double[] proportion = eigenvalues.Divide(eigenvalues.Sum());

// Those are the expected eigenvectors,
// in descending order of eigenvalues:
double[,] eigenvectors =
{
    { -0.677873399, -0.735178656 },
    { -0.735178656,  0.677873399 }
};

// Now, here is the place where most users get confused. The fact is that
// the eigenvalue decomposition (EVD) is not unique, and the SVD
// and EVD routines used by the framework produce results which are
// numerically different from packages such as STATA or MATLAB, but
// those results are still correct.

// If v is an eigenvector, a multiple of this eigenvector (such as a*v, with
// a being a scalar) will also be an eigenvector. In the Lindsay case, the
// framework produces a first eigenvector with inverted signs. This is the same
// as considering a=-1 and taking a*v. The result is still correct.

// Retrieve the first expected eigenvector
double[] v = eigenvectors.GetColumn(0);

// Multiply by a scalar and store it back
eigenvectors.SetColumn(0, v.Multiply(-1));

// Everything is alright (up to the 9 decimal places shown in the tutorial)
Assert.IsTrue(eigenvectors.IsEqual(pca.ComponentMatrix, rtol: 1e-9));
Assert.IsTrue(proportion.IsEqual(pca.ComponentProportions, rtol: 1e-9));
Assert.IsTrue(eigenvalues.IsEqual(pca.Eigenvalues, rtol: 1e-5));

// Step 5. Deriving the new data set
// ---------------------------------

double[][] actual = pca.Transform(data);

// transformedData shown in pg. 18
double[,] expected = new double[,]
{
    {  0.827970186, -0.175115307 },
    { -1.77758033,   0.142857227 },
    {  0.992197494,  0.384374989 },
    {  0.274210416,  0.130417207 },
    {  1.67580142,  -0.209498461 },
    {  0.912949103,  0.175282444 },
    { -0.099109437, -0.349824698 },
    { -1.14457216,   0.046417258 },
    { -0.438046137,  0.017764629 },
    { -1.22382056,  -0.162675287 },
};

// Everything is correct (up to 8 decimal places)
Assert.IsTrue(expected.IsEqual(actual, atol: 1e-8));

// Let's say we would like to project down to one 
// principal component. It suffices to set:
pca.NumberOfOutputs = 1;

// And then do the transform
actual = pca.Transform(data);

// transformedData shown in pg. 18
expected = new double[,]
{
    {  0.827970186 },
    { -1.77758033, },
    {  0.992197494 },
    {  0.274210416 },
    {  1.67580142, },
    {  0.912949103 },
    { -0.099109437 },
    { -1.14457216, },
    { -0.438046137 },
    { -1.22382056, },
};

// Everything is correct (up to 8 decimal places)
Assert.IsTrue(expected.IsEqual(actual, atol: 1e-8));

Some users would like to analyze huge amounts of data. In this case, computing the SVD directly on the data could result in memory exceptions or excessive computing times. If the number of dimensions in your data is much smaller than the number of observations (i.e. your matrix has fewer columns than rows), it is a better idea to summarize the data in the form of a covariance or correlation matrix and compute PCA using the EVD.

The example below shows how to compute the analysis with covariance matrices only.

// Compute the mean vector and covariance matrix of the data beforehand
// (here using Accord's Measures class; data is a double[][] of samples):
double[] mean = Measures.Mean(data, dimension: 0);
double[][] cov = Measures.Covariance(data, mean);

// Create the Principal Component Analysis 
// specifying the CovarianceMatrix method:
var pca = new PrincipalComponentAnalysis()
{
    Method = PrincipalComponentMethod.CovarianceMatrix,
    Means = mean // pass the original data mean vector
};

// Learn the PCA projection by passing the covariance matrix
MultivariateLinearRegression transform = pca.Learn(cov);

// Now, we can transform data as usual
double[][] actual = pca.Transform(data);
See Also