
PrincipalComponentAnalysis Class

Principal component analysis (PCA) is a technique used to reduce multidimensional data sets to lower dimensions for analysis.
Inheritance Hierarchy
System.Object
  Accord.MachineLearning.TransformBase<Double[], Double[]>
    Accord.MachineLearning.MultipleTransformBase<Double[], Double[]>
      Accord.Statistics.Analysis.Base.BasePrincipalComponentAnalysis
        Accord.Statistics.Analysis.PrincipalComponentAnalysis

Namespace:  Accord.Statistics.Analysis
Assembly:  Accord.Statistics (in Accord.Statistics.dll) Version: 3.6.0
Syntax
[SerializableAttribute]
public class PrincipalComponentAnalysis : BasePrincipalComponentAnalysis, 
	ITransform<double[], double[]>, ITransform, IUnsupervisedLearning<MultivariateLinearRegression, double[], double[]>, 
	IMultivariateAnalysis, IAnalysis, IProjectionAnalysis

The PrincipalComponentAnalysis type exposes the following members.

Constructors
Properties
Public property ComponentMatrix (Obsolete.)
Gets a matrix whose columns contain the principal components. Also known as the eigenvectors or loadings matrix.
(Inherited from BasePrincipalComponentAnalysis.)
Public property ComponentProportions
The respective role each component plays in the data set.
(Inherited from BasePrincipalComponentAnalysis.)
Public property Components
Gets the principal components in an object-oriented structure.
(Inherited from BasePrincipalComponentAnalysis.)
Public property ComponentVectors
Gets a matrix whose columns contain the principal components. Also known as the eigenvectors or loadings matrix.
(Inherited from BasePrincipalComponentAnalysis.)
Public property CumulativeProportions
The cumulative distribution of the components' proportion role. Also known as the cumulative energy of the principal components.
(Inherited from BasePrincipalComponentAnalysis.)
Public property Eigenvalues
Provides access to the eigenvalues stored during the analysis.
(Inherited from BasePrincipalComponentAnalysis.)
Public property ExplainedVariance
Gets or sets the amount of explained variance that should be generated by this model. This value will alter the NumberOfOutputs that can be generated by this model.
(Inherited from BasePrincipalComponentAnalysis.)
Public property MaximumNumberOfOutputs
Gets the maximum number of outputs (dimensionality of the output vectors) that can be generated by this model.
(Inherited from BasePrincipalComponentAnalysis.)
Public property Means
Gets the column means of the source data given at method construction.
(Inherited from BasePrincipalComponentAnalysis.)
Public property Method
Gets or sets the method used by this analysis.
(Inherited from BasePrincipalComponentAnalysis.)
Public property NumberOfInputs
Gets the number of inputs accepted by the model.
(Inherited from TransformBase<TInput, TOutput>.)
Public property NumberOfOutputs
Gets or sets the number of outputs (dimensionality of the output vectors) that should be generated by this model.
(Inherited from BasePrincipalComponentAnalysis.)
Public property Overwrite
Gets or sets whether calculations will be performed overwriting data in the original source matrix, using less memory.
(Inherited from BasePrincipalComponentAnalysis.)
Public property Result (Obsolete.)
Gets the projection of the source data, given at the creation of the analysis, into the space spanned by the principal components.
(Inherited from BasePrincipalComponentAnalysis.)
Public property SingularValues
Provides access to the singular values stored during the analysis. If a covariance method is chosen, this will contain an empty vector.
(Inherited from BasePrincipalComponentAnalysis.)
Public property Source (Obsolete.)
Returns the original data supplied to the analysis.
(Inherited from BasePrincipalComponentAnalysis.)
Public property StandardDeviations
Gets the column standard deviations of the source data given at method construction.
(Inherited from BasePrincipalComponentAnalysis.)
Public property Token
Gets or sets a cancellation token that can be used to cancel the algorithm while it is running.
(Inherited from BasePrincipalComponentAnalysis.)
Public property Whiten
Gets or sets whether the transformation result should be whitened (have unit standard deviation) before it is returned.
(Inherited from BasePrincipalComponentAnalysis.)
Top
Methods
Protected method Adjust(Double, Boolean) (Obsolete.)
Adjusts a data matrix, centering and standardizing its values using the already computed column means and standard deviations.
Protected method Adjust(Double, Boolean) (Obsolete.)
Adjusts a data matrix, centering and standardizing its values using the already computed column means and standard deviations.
Public method Compute (Obsolete.)
Computes the Principal Component Analysis algorithm.
Protected method CreateComponents
Creates additional information about principal components.
(Inherited from BasePrincipalComponentAnalysis.)
Public method Equals
Determines whether the specified object is equal to the current object.
(Inherited from Object.)
Protected method Finalize
Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection.
(Inherited from Object.)
Public static method FromCorrelationMatrix
Constructs a new Principal Component Analysis from a correlation matrix.
Public static method FromCovarianceMatrix
Constructs a new Principal Component Analysis from a covariance matrix.
Public static method FromGramMatrix
Constructs a new Principal Component Analysis from a kernel (Gram) matrix.
Public method GetHashCode
Serves as the default hash function.
(Inherited from Object.)
Public method GetNumberOfComponents
Returns the minimal number of principal components required to represent a given percentile of the data.
(Inherited from BasePrincipalComponentAnalysis.)
Public method GetType
Gets the Type of the current instance.
(Inherited from Object.)
Public method Learn
Learns a model that can map the given inputs to the desired outputs.
Protected method MemberwiseClone
Creates a shallow copy of the current Object.
(Inherited from Object.)
Public static method Reduce
Reduces the dimensionality of a given matrix x to the given number of dimensions.
Public method Revert(Double) (Obsolete.)
Reverts a set of projected data into its original form. A complete reverse transformation is only possible if all components are present, and, if the data has been standardized, the original standard deviations and means of the original matrix are known.
Public method Revert(Double)
Reverts a set of projected data into its original form. A complete reverse transformation is only possible if all components are present, and, if the data has been standardized, the original standard deviations and means of the original matrix are known.
Public method ToString
Returns a string that represents the current object.
(Inherited from Object.)
Public method Transform(TInput)
Applies the transformation to a set of input vectors, producing an associated set of output vectors.
(Inherited from MultipleTransformBase<TInput, TOutput>.)
Public method Transform(Double) (Obsolete.)
Obsolete.
(Inherited from BasePrincipalComponentAnalysis.)
Public method Transform(Double)
Applies the transformation to an input, producing an associated output.
(Inherited from BasePrincipalComponentAnalysis.)
Public method Transform(Double, Double)
Projects a given matrix into principal component space.
(Overrides MultipleTransformBase<TInput, TOutput>.Transform(TInput, TOutput).)
Public method Transform(Double, Int32) (Obsolete.)
Projects a given matrix into principal component space.
(Inherited from BasePrincipalComponentAnalysis.)
Public method Transform(Double, Double)
Applies the transformation to an input, producing an associated output.
(Inherited from BasePrincipalComponentAnalysis.)
Public method Transform(Double, Int32) (Obsolete.)
Projects a given matrix into principal component space.
(Inherited from BasePrincipalComponentAnalysis.)
Public method Transform(Double, Int32) (Obsolete.)
Projects a given matrix into principal component space.
(Inherited from BasePrincipalComponentAnalysis.)
Top
Fields
Protected field array (Obsolete.)
(Inherited from BasePrincipalComponentAnalysis.)
Protected field covarianceMatrix (Obsolete.)
(Inherited from BasePrincipalComponentAnalysis.)
Protected field onlyCovarianceMatrixAvailable (Obsolete.)
(Inherited from BasePrincipalComponentAnalysis.)
Protected field result (Obsolete.)
(Inherited from BasePrincipalComponentAnalysis.)
Protected field saveResult (Obsolete.)
(Inherited from BasePrincipalComponentAnalysis.)
Protected field source (Obsolete.)
(Inherited from BasePrincipalComponentAnalysis.)
Top
Extension Methods
Public extension method HasMethod
Checks whether an object implements a method with the given name.
(Defined by ExtensionMethods.)
Public extension method IsEqual
Compares two objects for equality, performing an elementwise comparison if the elements are vectors or matrices.
(Defined by Matrix.)
Public extension method To<T>
Converts an object into another type, irrespective of whether the conversion can be done at compile time or not. This can be used to convert generic types to numeric types during runtime.
(Defined by ExtensionMethods.)
Top
Remarks

Principal Component Analysis, also known as the Karhunen-Loève expansion, is a classical method for dimensionality reduction and exploratory data analysis.

Mathematically, PCA decomposes the covariance matrix of a data matrix into two parts, eigenvalues and column eigenvectors, whereas Singular Value Decomposition (SVD) decomposes the data matrix itself into three parts: singular values, column eigenvectors, and row eigenvectors. The two are related: the eigenvalues are the squares of the singular values (up to a normalization constant), and the column vectors are the same for both.
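As a quick sanity check on this relationship, the sketch below (plain Python with only the standard library, not Accord.NET code) computes both quantities for Lindsay Smith's 2-D example data and verifies that each covariance eigenvalue equals the corresponding squared singular value divided by n - 1:

```python
import math

# Lindsay Smith's example data (n rows, 2 columns)
raw = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
       (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]
n = len(raw)
mx = sum(x for x, _ in raw) / n
my = sum(y for _, y in raw) / n
A = [(x - mx, y - my) for x, y in raw]  # centered data matrix

def eig2x2_sym(a, b, d):
    """Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, d]], largest first."""
    tr, det = a + d, a * d - b * b
    disc = math.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

# Scatter matrix A^T A; its eigenvalues are the squared singular values of A.
sxx = sum(x * x for x, _ in A)
sxy = sum(x * y for x, y in A)
syy = sum(y * y for _, y in A)
sq_singular = eig2x2_sym(sxx, sxy, syy)

# The covariance matrix is A^T A / (n - 1); its eigenvalues are what PCA reports.
cov_eig = eig2x2_sym(sxx / (n - 1), sxy / (n - 1), syy / (n - 1))

# The relationship stated above: eigenvalue = singular_value^2 / (n - 1)
for lam, s2 in zip(cov_eig, sq_singular):
    assert abs(lam - s2 / (n - 1)) < 1e-9

print(cov_eig)  # approximately (1.28402771, 0.0490833989), as in the tutorial
```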

This class uses the SVD on the data set, which generally gives better numerical accuracy.

This class can also be bound to standard controls such as the DataGridView by setting their DataSource property to the analysis' Components property.

Examples

The example below shows a typical usage of the analysis. However, users often ask why the framework produces values different from those of other packages such as STATA or MATLAB. After the simple introductory example below, we will explore why those results are often different.

// Below is the same data used on the excellent paper "Tutorial
//   On Principal Component Analysis", by Lindsay Smith (2002).
double[][] data =
{
    new double[] { 2.5,  2.4 },
    new double[] { 0.5,  0.7 },
    new double[] { 2.2,  2.9 },
    new double[] { 1.9,  2.2 },
    new double[] { 3.1,  3.0 },
    new double[] { 2.3,  2.7 },
    new double[] { 2.0,  1.6 },
    new double[] { 1.0,  1.1 },
    new double[] { 1.5,  1.6 },
    new double[] { 1.1,  0.9 }
};

// Let's create an analysis with centering (covariance method),
// without standardization (which would be the correlation method),
// and with whitening enabled:
var pca = new PrincipalComponentAnalysis()
{
    Method = PrincipalComponentMethod.Center,
    Whiten = true
};

// Now we can learn the linear projection from the data
MultivariateLinearRegression transform = pca.Learn(data);

// Finally, we can project all the data
double[][] output1 = pca.Transform(data);

// Or just its first components by setting 
// NumberOfOutputs to the desired components:
pca.NumberOfOutputs = 1;

// And then calling transform again:
double[][] output2 = pca.Transform(data);

// We can also limit to 80% of explained variance:
pca.ExplainedVariance = 0.8;

// And then call transform again:
double[][] output3 = pca.Transform(data);

A question often asked by users is "why do my matrices have inverted signs?" or "why do my results differ from [another software package]?". In short, despite any such differences, the results are most likely correct (unless you firmly believe you have found a bug; in that case, please file a bug report).
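To see why a sign flip is harmless, recall that eigenvectors are only defined up to a nonzero scalar: if Cv = λv, then C(av) = λ(av) for any scalar a. The sketch below checks this numerically (plain Python, not Accord.NET code; the covariance matrix and eigenpair are the rounded values from Lindsay's tutorial):

```python
# Covariance matrix and one eigenpair from Lindsay Smith's tutorial (rounded)
C = [[0.616555556, 0.615444444],
     [0.615444444, 0.716555556]]
v = (-0.677873399, -0.735178656)   # first eigenvector, as reported by one package
lam = 1.28402771                   # its eigenvalue

def matvec(M, u):
    """Multiply a 2x2 matrix by a 2-vector."""
    return tuple(sum(M[i][j] * u[j] for j in range(2)) for i in range(2))

# Any nonzero scalar multiple of v (including -v, the sign-flipped
# version another package might report) is still an eigenvector:
for a in (1.0, -1.0, 3.5):
    av = tuple(a * x for x in v)
    Cav = matvec(C, av)
    assert all(abs(Cav[i] - lam * av[i]) < 1e-6 for i in range(2))
```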

The example below explores, following the same steps given in Lindsay's tutorial, the points that could cause discrepancies between the results given by Accord.NET and those given by other software packages.

// Reproducing Lindsay Smith's "Tutorial on Principal Component Analysis"
// using the framework's default method. The tutorial can be found online
// at http://www.sccg.sk/~haladova/principal_components.pdf

// Step 1. Get some data
// ---------------------

double[][] data =
{
    new[] { 2.5,  2.4 },
    new[] { 0.5,  0.7 },
    new[] { 2.2,  2.9 },
    new[] { 1.9,  2.2 },
    new[] { 3.1,  3.0 },
    new[] { 2.3,  2.7 },
    new[] { 2.0,  1.6 },
    new[] { 1.0,  1.1 },
    new[] { 1.5,  1.6 },
    new[] { 1.1,  0.9 }
};


// Step 2. Subtract the mean
// -------------------------
//   Note: The framework does this automatically. By default, the framework
//   uses the "Center" method, which only subtracts the mean. However, it is
//   also possible to remove the mean *and* divide by the standard deviation
//   (thus performing the correlation method) by specifying "Standardize"
//   instead of "Center" as the AnalysisMethod.

var method = PrincipalComponentMethod.Center;
// var method = PrincipalComponentMethod.Standardize


// Step 3. Compute the covariance matrix
// -------------------------------------
//   Note: Accord.NET does not need to compute the covariance
//   matrix in order to compute PCA. The framework uses the SVD
//   method which is more numerically stable, but may require
//   more processing or memory. In order to replicate the tutorial
//   using covariance matrices, please see the next unit test.

// Create the analysis using the selected method
var pca = new PrincipalComponentAnalysis(method);

// Compute it
pca.Learn(data);


// Step 4. Compute the eigenvectors and eigenvalues of the covariance matrix
// -------------------------------------------------------------------------
//   Note: Since Accord.NET uses the SVD method rather than the Eigendecomposition
//   method, the Eigenvalues are computed from the singular values. However, it is
//   not the Eigenvalues themselves which are important, but rather their proportion:

// Those are the expected eigenvalues, in descending order:
double[] eigenvalues = { 1.28402771, 0.0490833989 };

// And this will be their proportion:
double[] proportion = eigenvalues.Divide(eigenvalues.Sum());

// Those are the expected eigenvectors,
// in descending order of eigenvalues:
double[,] eigenvectors =
{
    { -0.677873399, -0.735178656 },
    { -0.735178656,  0.677873399 }
};

// Now, here is the place where most users get confused. The fact is that
// the eigenvalue decomposition (EVD) is not unique, and the SVD
// and EVD routines used by the framework produce results which are
// numerically different from packages such as STATA or MATLAB, but
// those results are still correct.

// If v is an eigenvector, a multiple of this eigenvector (such as a*v, with
// a being a scalar) will also be an eigenvector. In the Lindsay case, the
// framework produces a first eigenvector with inverted signs. This is the same
// as considering a=-1 and taking a*v. The result is still correct.

// Retrieve the first expected eigenvector
double[] v = eigenvectors.GetColumn(0);

// Multiply by a scalar and store it back
eigenvectors.SetColumn(0, v.Multiply(-1));

// Everything is alright (up to the 9 decimal places shown in the tutorial)
Assert.IsTrue(eigenvectors.IsEqual(pca.ComponentMatrix, rtol: 1e-9));
Assert.IsTrue(proportion.IsEqual(pca.ComponentProportions, rtol: 1e-9));
Assert.IsTrue(eigenvalues.IsEqual(pca.Eigenvalues, rtol: 1e-5));

// Step 5. Deriving the new data set
// ---------------------------------

double[][] actual = pca.Transform(data);

// transformedData shown in pg. 18
double[,] expected = new double[,]
{
    {  0.827970186, -0.175115307 },
    { -1.77758033,   0.142857227 },
    {  0.992197494,  0.384374989 },
    {  0.274210416,  0.130417207 },
    {  1.67580142,  -0.209498461 },
    {  0.912949103,  0.175282444 },
    { -0.099109437, -0.349824698 },
    { -1.14457216,   0.046417258 },
    { -0.438046137,  0.017764629 },
    { -1.22382056,  -0.162675287 },
};

// Everything is correct (up to 8 decimal places)
Assert.IsTrue(expected.IsEqual(actual, atol: 1e-8));

// Let's say we would like to project down to one 
// principal component. It suffices to set:
pca.NumberOfOutputs = 1;

// And then do the transform
actual = pca.Transform(data);

// transformedData shown in pg. 18
expected = new double[,]
{
    {  0.827970186 },
    { -1.77758033, },
    {  0.992197494 },
    {  0.274210416 },
    {  1.67580142, },
    {  0.912949103 },
    { -0.099109437 },
    { -1.14457216, },
    { -0.438046137 },
    { -1.22382056, },
};

// Everything is correct (up to 8 decimal places)
Assert.IsTrue(expected.IsEqual(actual, atol: 1e-8));

Some users would like to analyze huge amounts of data. In this case, computing the SVD directly on the data could result in memory exceptions or excessive computing times. If the number of dimensions in your data is much smaller than the number of observations (i.e. your matrix has fewer columns than rows), it is a better idea to summarize the data in the form of a covariance or correlation matrix and compute PCA using the EVD.

The example below shows how to compute the analysis with covariance matrices only.

// Compute the mean vector and covariance matrix of the data beforehand
// (here using Accord's Measures class; data is a double[][] of samples):
double[] mean = Measures.Mean(data, dimension: 0);
double[][] cov = Measures.Covariance(data, mean);

// Create the Principal Component Analysis 
// specifying the CovarianceMatrix method:
var pca = new PrincipalComponentAnalysis()
{
    Method = PrincipalComponentMethod.CovarianceMatrix,
    Means = mean // pass the original data mean vector
};

// Learn the PCA projection by passing the covariance matrix
MultivariateLinearRegression transform = pca.Learn(cov);

// Now, we can transform data as usual
double[][] actual = pca.Transform(data);
See Also