PrincipalComponentAnalysis Class
Namespace: Accord.Statistics.Analysis
```csharp
[SerializableAttribute]
public class PrincipalComponentAnalysis : BasePrincipalComponentAnalysis,
    ITransform<double[], double[]>, ICovariantTransform<double[], double[]>, ITransform,
    IUnsupervisedLearning<MultivariateLinearRegression, double[], double[]>,
    IMultivariateAnalysis, IAnalysis, IProjectionAnalysis
```
The PrincipalComponentAnalysis type exposes the following members.
Constructors

Name | Description
---|---
PrincipalComponentAnalysis(Double[,], AnalysisMethod) | Obsolete. Constructs a new Principal Component Analysis.
PrincipalComponentAnalysis(Double[][], AnalysisMethod) | Obsolete. Constructs a new Principal Component Analysis.
PrincipalComponentAnalysis(PrincipalComponentMethod, Boolean, Int32) | Constructs a new Principal Component Analysis.
Properties

Name | Description
---|---
ComponentMatrix | Obsolete. Gets a matrix whose columns contain the principal components. Also known as the eigenvector or loadings matrix. (Inherited from BasePrincipalComponentAnalysis.)
ComponentProportions | The proportion of the data set's variance explained by each component. (Inherited from BasePrincipalComponentAnalysis.)
Components | Gets the principal components in an object-oriented structure. (Inherited from BasePrincipalComponentAnalysis.)
ComponentVectors | Gets a matrix whose columns contain the principal components. Also known as the eigenvector or loadings matrix. (Inherited from BasePrincipalComponentAnalysis.)
CumulativeProportions | The cumulative distribution of the components' proportions. Also known as the cumulative energy of the principal components. (Inherited from BasePrincipalComponentAnalysis.)
Eigenvalues | Provides access to the eigenvalues stored during the analysis. (Inherited from BasePrincipalComponentAnalysis.)
ExplainedVariance | Gets or sets the amount of explained variance that should be generated by this model. This value alters the NumberOfOutputs that can be generated by this model. (Inherited from BasePrincipalComponentAnalysis.)
MaximumNumberOfOutputs | Gets the maximum number of outputs (dimensionality of the output vectors) that can be generated by this model. (Inherited from BasePrincipalComponentAnalysis.)
Means | Gets the column means of the source data given at method construction. (Inherited from BasePrincipalComponentAnalysis.)
Method | Gets or sets the method used by this analysis. (Inherited from BasePrincipalComponentAnalysis.)
NumberOfInputs | Gets the number of inputs accepted by the model. (Inherited from TransformBase<TInput, TOutput>.)
NumberOfOutputs | Gets or sets the number of outputs (dimensionality of the output vectors) that should be generated by this model. (Inherited from BasePrincipalComponentAnalysis.)
Overwrite | Gets or sets whether calculations will be performed overwriting data in the original source matrix, using less memory. (Inherited from BasePrincipalComponentAnalysis.)
Result | Obsolete. Gets the projection of the source data given at the creation of the analysis into the space spanned by the principal components. (Inherited from BasePrincipalComponentAnalysis.)
SingularValues | Provides access to the singular values stored during the analysis. If a covariance method is chosen, this will contain an empty vector. (Inherited from BasePrincipalComponentAnalysis.)
Source | Obsolete. Returns the original data supplied to the analysis. (Inherited from BasePrincipalComponentAnalysis.)
StandardDeviations | Gets the column standard deviations of the source data given at method construction. (Inherited from BasePrincipalComponentAnalysis.)
Token | Gets or sets a cancellation token that can be used to cancel the algorithm while it is running. (Inherited from BasePrincipalComponentAnalysis.)
Whiten | Gets or sets whether the transformation result should be whitened (have unit standard deviation) before it is returned. (Inherited from BasePrincipalComponentAnalysis.)
Methods

Name | Description
---|---
Adjust(Double[,], Boolean) | Obsolete. Adjusts a data matrix, centering and standardizing its values using the previously computed column means and standard deviations.
Adjust(Double[][], Boolean) | Obsolete. Adjusts a data matrix, centering and standardizing its values using the previously computed column means and standard deviations.
Compute | Obsolete. Computes the Principal Component Analysis algorithm.
CreateComponents | Creates additional information about principal components. (Inherited from BasePrincipalComponentAnalysis.)
Equals | Determines whether the specified object is equal to the current object. (Inherited from Object.)
Finalize | Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection. (Inherited from Object.)
FromCorrelationMatrix | Constructs a new Principal Component Analysis from a correlation matrix.
FromCovarianceMatrix | Constructs a new Principal Component Analysis from a covariance matrix.
FromGramMatrix | Constructs a new Principal Component Analysis from a kernel (Gram) matrix.
GetHashCode | Serves as the default hash function. (Inherited from Object.)
GetNumberOfComponents | Returns the minimal number of principal components required to represent a given percentile of the data. (Inherited from BasePrincipalComponentAnalysis.)
GetType | Gets the Type of the current instance. (Inherited from Object.)
Learn | Learns a model that can map the given inputs to the desired outputs.
MemberwiseClone | Creates a shallow copy of the current Object. (Inherited from Object.)
Reduce | Reduces the dimensionality of a given matrix x to the given number of dimensions.
Revert(Double[,]) | Obsolete. Reverts a set of projected data into its original form. A complete reverse transformation is only possible if all components are present, and, if the data has been standardized, the standard deviations and means of the original matrix are known.
Revert(Double[][]) | Reverts a set of projected data into its original form. A complete reverse transformation is only possible if all components are present, and, if the data has been standardized, the standard deviations and means of the original matrix are known.
ToString | Returns a string that represents the current object. (Inherited from Object.)
Transform(TInput[]) | Applies the transformation to a set of input vectors, producing an associated set of output vectors. (Inherited from MultipleTransformBase<TInput, TOutput>.)
Transform(Double[,]) | Obsolete. (Inherited from BasePrincipalComponentAnalysis.)
Transform(Double[]) | Applies the transformation to an input, producing an associated output. (Inherited from BasePrincipalComponentAnalysis.)
Transform(Double[][], Double[][]) | Projects a given matrix into principal component space. (Overrides MultipleTransformBase<TInput, TOutput>.Transform(TInput[], TOutput[]).)
Transform(Double[,], Int32) | Obsolete. Projects a given matrix into principal component space. (Inherited from BasePrincipalComponentAnalysis.)
Transform(Double[], Double[]) | Applies the transformation to an input, producing an associated output. (Inherited from BasePrincipalComponentAnalysis.)
Transform(Double[][], Int32) | Obsolete. Projects a given matrix into principal component space. (Inherited from BasePrincipalComponentAnalysis.)
Transform(Double[], Int32) | Obsolete. Projects a given matrix into principal component space. (Inherited from BasePrincipalComponentAnalysis.)
Fields

Name | Description
---|---
array | Obsolete. (Inherited from BasePrincipalComponentAnalysis.)
covarianceMatrix | Obsolete. (Inherited from BasePrincipalComponentAnalysis.)
onlyCovarianceMatrixAvailable | Obsolete. (Inherited from BasePrincipalComponentAnalysis.)
result | Obsolete. (Inherited from BasePrincipalComponentAnalysis.)
saveResult | Obsolete. (Inherited from BasePrincipalComponentAnalysis.)
source | Obsolete. (Inherited from BasePrincipalComponentAnalysis.)
Extension Methods

Name | Description
---|---
HasMethod | Checks whether an object implements a method with the given name. (Defined by ExtensionMethods.)
IsEqual | Compares two objects for equality, performing an elementwise comparison if the elements are vectors or matrices. (Defined by Matrix.)
To(Type) | Overloaded. Converts an object into another type, irrespective of whether the conversion can be done at compile time or not. This can be used to convert generic types to numeric types during runtime. (Defined by ExtensionMethods.)
To&lt;T&gt; | Overloaded. Converts an object into another type, irrespective of whether the conversion can be done at compile time or not. This can be used to convert generic types to numeric types during runtime. (Defined by ExtensionMethods.)
Remarks

Principal Component Analysis (PCA), also known as the Karhunen-Loève expansion, is a classical method for dimensionality reduction and exploratory data analysis.

Mathematically, PCA decomposes the covariance matrix of the data into two parts, eigenvalues and column eigenvectors, whereas Singular Value Decomposition (SVD) decomposes the data matrix itself into three parts: singular values, column eigenvectors, and row eigenvectors. The two are related in that the eigenvalues are the squares of the singular values and the column vectors are the same for both.
This class uses SVD on the data set, which generally gives better numerical accuracy.
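The eigenvalue/singular-value relationship above can be checked numerically. The sketch below is a plain-Python illustration (not Accord.NET code) using the same data as the examples on this page; one subtlety it makes explicit is that when the singular values come from the *centered* data matrix X, the eigenvalues of XᵀX must be divided by n − 1 to match the covariance eigenvalues.

```python
# Plain-Python check (not Accord.NET code): eigenvalues of the covariance
# matrix equal the squared singular values of the centered data matrix X
# divided by (n - 1). Data from Lindsay Smith's PCA tutorial.
import math

data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
        (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]
n = len(data)
mx = sum(x for x, _ in data) / n
my = sum(y for _, y in data) / n
X = [(x - mx, y - my) for x, y in data]  # centered data

def eig2(a, b, c):
    """Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, c]], descending."""
    tr, det = a + c, a * c - b * b
    disc = math.sqrt(tr * tr - 4 * det)
    return [(tr + disc) / 2, (tr - disc) / 2]

# Covariance matrix entries and their eigenvalues
cxx = sum(x * x for x, _ in X) / (n - 1)
cyy = sum(y * y for _, y in X) / (n - 1)
cxy = sum(x * y for x, y in X) / (n - 1)
eig_cov = eig2(cxx, cxy, cyy)

# Gram matrix X^T X: its eigenvalues are the squared singular values of X
gxx = sum(x * x for x, _ in X)
gyy = sum(y * y for _, y in X)
gxy = sum(x * y for x, y in X)
sing_sq = eig2(gxx, gxy, gyy)

# Squared singular values / (n - 1) recover the covariance eigenvalues
recovered = [s2 / (n - 1) for s2 in sing_sq]
```

This is why the framework can work directly from the SVD of the data without ever forming the covariance matrix.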
This class can also be bound to standard controls such as the DataGridView by setting their DataSource property to the analysis' Components property.
The example below shows a typical usage of the analysis. However, users often ask why the framework produces values different from those of other packages such as STATA or MATLAB. After the simple introductory example below, we will explore why those results are often different.
```csharp
// Below is the same data used on the excellent paper "Tutorial
// On Principal Component Analysis", by Lindsay Smith (2002).
double[][] data =
{
    new double[] { 2.5, 2.4 },
    new double[] { 0.5, 0.7 },
    new double[] { 2.2, 2.9 },
    new double[] { 1.9, 2.2 },
    new double[] { 3.1, 3.0 },
    new double[] { 2.3, 2.7 },
    new double[] { 2.0, 1.6 },
    new double[] { 1.0, 1.1 },
    new double[] { 1.5, 1.6 },
    new double[] { 1.1, 0.9 }
};

// Let's create an analysis with centering (covariance method)
// but no standardization (correlation method) and whitening:
var pca = new PrincipalComponentAnalysis()
{
    Method = PrincipalComponentMethod.Center,
    Whiten = true
};

// Now we can learn the linear projection from the data
MultivariateLinearRegression transform = pca.Learn(data);

// Finally, we can project all the data
double[][] output1 = pca.Transform(data);

// Or just its first components by setting
// NumberOfOutputs to the desired components:
pca.NumberOfOutputs = 1;

// And then calling transform again:
double[][] output2 = pca.Transform(data);

// We can also limit to 80% of explained variance:
pca.ExplainedVariance = 0.8;

// And then call transform again:
double[][] output3 = pca.Transform(data);
```
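A side note on the ExplainedVariance setting: the number of components it induces follows directly from the eigenvalue proportions. The sketch below (plain Python, not Accord.NET code) uses the eigenvalues this data set produces, which appear later in the tutorial walkthrough on this page.

```python
# Sketch: how an explained-variance threshold maps to a component count.
# Eigenvalues are the ones this data set produces (see the walkthrough).
eigenvalues = [1.28402771, 0.0490833989]
total = sum(eigenvalues)
proportions = [e / total for e in eigenvalues]  # share of variance per component

# Cumulative proportions (the "cumulative energy")
cumulative, running = [], 0.0
for p in proportions:
    running += p
    cumulative.append(running)

# Minimal number of components whose cumulative share reaches 80%
k = next(i + 1 for i, c in enumerate(cumulative) if c >= 0.80)
```

Here the first component alone already explains about 96% of the variance, so requesting 80% explained variance keeps a single component, matching the NumberOfOutputs = 1 setting above.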
A question often asked by users is "why do my matrices have inverted signs?" or "why do my results differ from [another software]?". In short, despite any differences, the results are most likely correct (unless you firmly believe you have found a bug; in that case, please file a bug report).
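The inverted signs are a consequence of a basic fact: eigenvectors are only defined up to a scalar multiple, so if Cv = λv then C(−v) = λ(−v), and two correct implementations may report eigenvectors with opposite signs. The plain-Python check below (not Accord.NET code) verifies this on the covariance matrix and first eigenpair of the tutorial data discussed on this page.

```python
# Sign ambiguity of eigenvectors: both v and -v satisfy C v = lambda v.
# Covariance matrix and first eigenpair from Lindsay Smith's tutorial data.
C = [[0.616555556, 0.615444444],
     [0.615444444, 0.716555556]]
lam = 1.28402771
v = [-0.677873399, -0.735178656]

def matvec(m, x):
    return [m[0][0] * x[0] + m[0][1] * x[1],
            m[1][0] * x[0] + m[1][1] * x[1]]

cv = matvec(C, v)                       # C v       ~= lam * v
cv_flipped = matvec(C, [-v[0], -v[1]])  # C (-v)    ~= lam * (-v)
```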
The example below explores, following the same steps as Lindsay's tutorial, the points that could cause discrepancies between the results given by Accord.NET and those given by other software.
```csharp
// Reproducing Lindsay Smith's "Tutorial on Principal Component Analysis"
// using the framework's default method. The tutorial can be found online
// at http://www.sccg.sk/~haladova/principal_components.pdf

// Step 1. Get some data
// ---------------------
double[][] data =
{
    new[] { 2.5, 2.4 },
    new[] { 0.5, 0.7 },
    new[] { 2.2, 2.9 },
    new[] { 1.9, 2.2 },
    new[] { 3.1, 3.0 },
    new[] { 2.3, 2.7 },
    new[] { 2.0, 1.6 },
    new[] { 1.0, 1.1 },
    new[] { 1.5, 1.6 },
    new[] { 1.1, 0.9 }
};

// Step 2. Subtract the mean
// -------------------------
// Note: The framework does this automatically. By default, the framework
// uses the "Center" method, which only subtracts the mean. However, it is
// also possible to remove the mean *and* divide by the standard deviation
// (thus performing the correlation method) by specifying "Standardize"
// instead of "Center" as the AnalysisMethod.
var method = PrincipalComponentMethod.Center;
// var method = PrincipalComponentMethod.Standardize

// Step 3. Compute the covariance matrix
// -------------------------------------
// Note: Accord.NET does not need to compute the covariance
// matrix in order to compute PCA. The framework uses the SVD
// method, which is more numerically stable, but may require
// more processing or memory. In order to replicate the tutorial
// using covariance matrices, please see the next unit test.

// Create the analysis using the selected method
var pca = new PrincipalComponentAnalysis(method);

// Compute it
pca.Learn(data);

// Step 4. Compute the eigenvectors and eigenvalues of the covariance matrix
// -------------------------------------------------------------------------
// Note: Since Accord.NET uses the SVD method rather than the Eigendecomposition
// method, the Eigenvalues are computed from the singular values. However, it is
// not the Eigenvalues themselves which are important, but rather their proportion:

// Those are the expected eigenvalues, in descending order:
double[] eigenvalues = { 1.28402771, 0.0490833989 };

// And this will be their proportion:
double[] proportion = eigenvalues.Divide(eigenvalues.Sum());

// Those are the expected eigenvectors,
// in descending order of eigenvalues:
double[,] eigenvectors =
{
    { -0.677873399, -0.735178656 },
    { -0.735178656,  0.677873399 }
};

// Now, here is the place most users get confused. The fact is that
// the Eigenvalue decomposition (EVD) is not unique, and both the SVD
// and EVD routines used by the framework produce results which are
// numerically different from packages such as STATA or MATLAB, but
// those are correct.

// If v is an eigenvector, a multiple of this eigenvector (such as a*v, with
// a being a scalar) will also be an eigenvector. In the Lindsay case, the
// framework produces a first eigenvector with inverted signs. This is the same
// as considering a = -1 and taking a*v. The result is still correct.

// Retrieve the first expected eigenvector
double[] v = eigenvectors.GetColumn(0);

// Multiply by a scalar and store it back
eigenvectors.SetColumn(0, v.Multiply(-1));

// Everything is alright (up to the 9 decimal places shown in the tutorial)
Assert.IsTrue(eigenvectors.IsEqual(pca.ComponentMatrix, rtol: 1e-9));
Assert.IsTrue(proportion.IsEqual(pca.ComponentProportions, rtol: 1e-9));
Assert.IsTrue(eigenvalues.IsEqual(pca.Eigenvalues, rtol: 1e-5));

// Step 5. Deriving the new data set
// ---------------------------------
double[][] actual = pca.Transform(data);

// transformedData shown in pg. 18
double[,] expected = new double[,]
{
    {  0.827970186, -0.175115307 },
    { -1.77758033,   0.142857227 },
    {  0.992197494,  0.384374989 },
    {  0.274210416,  0.130417207 },
    {  1.67580142,  -0.209498461 },
    {  0.912949103,  0.175282444 },
    { -0.099109437, -0.349824698 },
    { -1.14457216,   0.046417258 },
    { -0.438046137,  0.017764629 },
    { -1.22382056,  -0.162675287 },
};

// Everything is correct (up to 8 decimal places)
Assert.IsTrue(expected.IsEqual(actual, atol: 1e-8));

// Let's say we would like to project down to one
// principal component. It suffices to set:
pca.NumberOfOutputs = 1;

// And then do the transform
actual = pca.Transform(data);

// transformedData shown in pg. 18
expected = new double[,]
{
    {  0.827970186 },
    { -1.77758033  },
    {  0.992197494 },
    {  0.274210416 },
    {  1.67580142  },
    {  0.912949103 },
    { -0.099109437 },
    { -1.14457216  },
    { -0.438046137 },
    { -1.22382056  },
};

// Everything is correct (up to 8 decimal places)
Assert.IsTrue(expected.IsEqual(actual, atol: 1e-8));
```
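Step 5 can also be verified by hand: each projected value is the dot product of the mean-centered sample with an eigenvector. The plain-Python cross-check below (not Accord.NET code) reproduces the first transformed value from the tutorial using the sign-flipped first eigenvector from the walkthrough.

```python
# Cross-check of Step 5: projection = dot(centered sample, eigenvector).
mean = (1.81, 1.91)              # column means of the tutorial data
v1 = (0.677873399, 0.735178656)  # first eigenvector after the sign flip
x = (2.5, 2.4)                   # first sample

centered = (x[0] - mean[0], x[1] - mean[1])
projected = centered[0] * v1[0] + centered[1] * v1[1]
```

The result agrees with the first entry of the expected matrix, 0.827970186, to well under 8 decimal places.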
Some users need to analyze very large amounts of data. In this case, computing the SVD directly on the data could result in memory exceptions or excessive computing times. If your data's number of dimensions is much smaller than the number of observations (i.e., your matrix has fewer columns than rows), it is a better idea to summarize your data in the form of a covariance or correlation matrix and compute PCA using the EVD.
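The memory argument can be made concrete: with many rows and few columns, the small d-by-d covariance matrix can be accumulated in a single pass, so the full n-by-d matrix never has to be decomposed (or even held in memory at once). The sketch below is a plain-Python illustration on hypothetical streamed data, not Accord.NET code.

```python
# One-pass accumulation of a 2x2 covariance matrix from streamed rows.
# n is a multiple of 21 so the deterministic (i % 7, i % 3) pattern
# repeats exactly, which makes the expected statistics easy to state.
n, d = 9996, 2
rows = ((float(i % 7), float(i % 3)) for i in range(n))  # streamed "data"

s = [0.0] * d                        # running sums
ss = [[0.0] * d for _ in range(d)]   # running cross-products
for r in rows:
    for j in range(d):
        s[j] += r[j]
        for k in range(d):
            ss[j][k] += r[j] * r[k]

mean = [t / n for t in s]
# cov[j][k] = (sum_i x_ij * x_ik - n * mean_j * mean_k) / (n - 1)
cov = [[(ss[j][k] - n * mean[j] * mean[k]) / (n - 1) for k in range(d)]
       for j in range(d)]
```

Only the d-by-d accumulators survive the pass; this summarized matrix is exactly what the CovarianceMatrix method shown below consumes.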
The example below shows how to compute the analysis with covariance matrices only.
```csharp
// Create the Principal Component Analysis
// specifying the CovarianceMatrix method:
var pca = new PrincipalComponentAnalysis()
{
    Method = PrincipalComponentMethod.CovarianceMatrix,
    Means = mean // pass the original data mean vectors
};

// Learn the PCA projection by passing the covariance matrix
MultivariateLinearRegression transform = pca.Learn(cov);

// Now, we can transform data as usual
double[,] actual = pca.Transform(data);
```