
I am computing PCA on some data. In the first case I fit with 10 components and project onto 3 of the 10 manually:

    import numpy
    from sklearn.decomposition import PCA

    transformer = PCA(n_components=10)
    trained = transformer.fit(train)
    # manual projection onto the first 3 principal axes (no centering)
    one = numpy.matmul(train, numpy.transpose(trained.components_[:3, :]))

Here `trained.components_[:3, :]` is:

    array([[-1.43311999e-03,  1.65635865e-01,  5.49189565e-01,  5.26069645e-02,
             2.42638594e-01,  1.20957807e-02,  1.30595572e-01,  1.09279646e-02,
             7.21299808e-03, -2.79057934e-02, -1.14834589e-02,  5.06289160e-01,
             5.42890317e-01,  8.50422194e-02,  1.80935205e-01,  2.98473275e-05,
            -8.04537378e-04],
           [-1.05419313e-02,  3.09442577e-01, -8.15534934e-02,  4.28621520e-03,
             2.93323569e-01,  3.85849115e-02, -1.16193185e-01,  4.14964652e-01,
             4.16279154e-01,  2.95264788e-01,  3.28620106e-01, -2.60916490e-01,
            -2.37459426e-02,  1.57567265e-01,  4.02873342e-01,  5.28389303e-05,
            -2.07920000e-03],
           [ 8.63072772e-03, -3.26129082e-01,  8.59869400e-02,  3.04770780e-03,
            -3.14966419e-01, -2.47151330e-02,  1.05987767e-01,  3.74235953e-01,
             3.75747065e-01,  2.76035253e-01,  3.18273743e-01,  3.02423861e-01,
             2.76535177e-02, -1.51485057e-01, -4.48558170e-01, -8.83328996e-05,
            -2.25542180e-03]])

and in the second case I fit with only 3 components and use `transform`:

    transformer = PCA(n_components=3)
    trained = transformer.fit(train)
    two = trained.transform(train)

Here the components are:

    array([[-1.43311999e-03,  1.65635865e-01,  5.49189565e-01,  5.26069645e-02,
             2.42638594e-01,  1.20957807e-02,  1.30595572e-01,  1.09279646e-02,
             7.21299808e-03, -2.79057934e-02, -1.14834589e-02,  5.06289160e-01,
             5.42890317e-01,  8.50422194e-02,  1.80935205e-01,  2.98473275e-05,
            -8.04537377e-04],
           [-1.05419314e-02,  3.09442577e-01, -8.15534934e-02,  4.28621520e-03,
             2.93323569e-01,  3.85849115e-02, -1.16193185e-01,  4.14964652e-01,
             4.16279154e-01,  2.95264788e-01,  3.28620106e-01, -2.60916490e-01,
            -2.37459426e-02,  1.57567265e-01,  4.02873342e-01,  5.28389307e-05,
            -2.07919994e-03],
           [ 8.63072765e-03, -3.26129082e-01,  8.59869400e-02,  3.04770780e-03,
            -3.14966419e-01, -2.47151331e-02,  1.05987767e-01,  3.74235953e-01,
             3.75747065e-01,  2.76035253e-01,  3.18273743e-01,  3.02423861e-01,
             2.76535177e-02, -1.51485057e-01, -4.48558170e-01, -8.83328994e-05,
            -2.25542175e-03]])

But `one` does not come out equal to `two`, even though the components are the same in both fits. They differ because `transform` first subtracts the mean vector from the data and only then multiplies by the components. But why should the mean be subtracted here again? It was already subtracted in the first step, when computing the PCA basis.
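Indeed, subtracting the stored mean before the manual projection makes the two results match. A minimal reproduction, with synthetic data standing in for my `train` set; `trained.mean_` is the per-feature mean that `fit` stores and `transform` subtracts:

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    train = rng.normal(size=(100, 17))   # stand-in for the real training data

    trained = PCA(n_components=3).fit(train)
    two = trained.transform(train)

    # transform() centers with the stored per-feature mean before projecting:
    one = (train - trained.mean_) @ trained.components_.T
    print(np.allclose(one, two))         # True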

6 Comments
  • Can you show the component weights of each model? – ldmtwo, Jul 10, 2019 at 17:16
  • They are the same. I have updated the question. – Jul 10, 2019 at 17:20
  • Does this Q&A address your question? It discusses subtracting the mean (forward transform) and then adding it back (inverse transform): stackoverflow.com/questions/32750915/… – Jul 10, 2019 at 19:31
  • I need to think about this a bit, but the components aren't exactly the same, if that is what you were worried about; there is a small difference. I wonder whether casting the training data to float64 would decrease the error (a quick check is sketched below). This might shine some light, but it looks like you are doing it correctly: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/decomposition/pca.py. Their version is `X_transformed = np.dot(X, self.components_.T)`. – ldmtwo, Jul 10, 2019 at 22:00
  • And here: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/decomposition/base.py – ldmtwo, Jul 10, 2019 at 22:00

1 Answer


If you look at the source code, the PCA is calculated through the SVD. I believe it iterates until the result is "good enough," which would explain the tiny differences between the two models' components.

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/decomposition/pca.py
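For intuition, here is the standard centered-SVD formulation that PCA is built on (a sketch, not scikit-learn's actual code; synthetic data again stands in for `train`):

    import numpy as np

    rng = np.random.default_rng(0)
    train = rng.normal(size=(100, 17))   # stand-in data

    Xc = train - train.mean(axis=0)      # center the data, as fit() does
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:3]                  # rows are the principal axes
    scores = Xc @ components.T           # identical to U[:, :3] * S[:3]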

