
I am computing PCA on some data. In the first case I fit with 10 components and project onto 3 of the 10 manually:

    import numpy
    from sklearn.decomposition import PCA

    transformer = PCA(n_components=10)
    trained = transformer.fit(train)
    # manual projection onto the first 3 principal axes (no centering)
    one = numpy.matmul(train, numpy.transpose(trained.components_[:3, :]))

Here `trained.components_[:3, :]` is:

    array([[-1.43311999e-03,  1.65635865e-01,  5.49189565e-01,  5.26069645e-02,
             2.42638594e-01,  1.20957807e-02,  1.30595572e-01,  1.09279646e-02,
             7.21299808e-03, -2.79057934e-02, -1.14834589e-02,  5.06289160e-01,
             5.42890317e-01,  8.50422194e-02,  1.80935205e-01,  2.98473275e-05,
            -8.04537378e-04],
           [-1.05419313e-02,  3.09442577e-01, -8.15534934e-02,  4.28621520e-03,
             2.93323569e-01,  3.85849115e-02, -1.16193185e-01,  4.14964652e-01,
             4.16279154e-01,  2.95264788e-01,  3.28620106e-01, -2.60916490e-01,
            -2.37459426e-02,  1.57567265e-01,  4.02873342e-01,  5.28389303e-05,
            -2.07920000e-03],
           [ 8.63072772e-03, -3.26129082e-01,  8.59869400e-02,  3.04770780e-03,
            -3.14966419e-01, -2.47151330e-02,  1.05987767e-01,  3.74235953e-01,
             3.75747065e-01,  2.76035253e-01,  3.18273743e-01,  3.02423861e-01,
             2.76535177e-02, -1.51485057e-01, -4.48558170e-01, -8.83328996e-05,
            -2.25542180e-03]])

and in the second case I fit with only 3 components and use `transform`:

    transformer = PCA(n_components=3)
    trained = transformer.fit(train)
    two = trained.transform(train)

Here the components are:

    array([[-1.43311999e-03,  1.65635865e-01,  5.49189565e-01,  5.26069645e-02,
             2.42638594e-01,  1.20957807e-02,  1.30595572e-01,  1.09279646e-02,
             7.21299808e-03, -2.79057934e-02, -1.14834589e-02,  5.06289160e-01,
             5.42890317e-01,  8.50422194e-02,  1.80935205e-01,  2.98473275e-05,
            -8.04537377e-04],
           [-1.05419314e-02,  3.09442577e-01, -8.15534934e-02,  4.28621520e-03,
             2.93323569e-01,  3.85849115e-02, -1.16193185e-01,  4.14964652e-01,
             4.16279154e-01,  2.95264788e-01,  3.28620106e-01, -2.60916490e-01,
            -2.37459426e-02,  1.57567265e-01,  4.02873342e-01,  5.28389307e-05,
            -2.07919994e-03],
           [ 8.63072765e-03, -3.26129082e-01,  8.59869400e-02,  3.04770780e-03,
            -3.14966419e-01, -2.47151331e-02,  1.05987767e-01,  3.74235953e-01,
             3.75747065e-01,  2.76035253e-01,  3.18273743e-01,  3.02423861e-01,
             2.76535177e-02, -1.51485057e-01, -4.48558170e-01, -8.83328994e-05,
            -2.25542175e-03]])

But `one` does not come out equal to `two`, even though the components are the same in both fits. They differ because `transform` first subtracts the mean vector from the data and only then multiplies by the components. But why should the mean be subtracted here again? It was already subtracted in the first step, when computing the PCA basis.
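Indeed, subtracting the stored mean before the manual projection makes the two results match. A minimal reproduction, with synthetic data standing in for my `train` set; `trained.mean_` is the per-feature mean that `fit` stores and `transform` subtracts:

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    train = rng.normal(size=(100, 17))   # stand-in for the real training data

    trained = PCA(n_components=3).fit(train)
    two = trained.transform(train)

    # transform() centers with the stored per-feature mean before projecting:
    one = (train - trained.mean_) @ trained.components_.T
    print(np.allclose(one, two))         # True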

6 Comments
  • Can you show the component weights of each model? – ldmtwo, Jul 10, 2019 at 17:16
  • They are the same. I have updated the question. – Jul 10, 2019 at 17:20
  • Does this Q&A address your question? It discusses subtracting the mean (forward transform) and then adding it back (inverse transform): stackoverflow.com/questions/32750915/… – Jul 10, 2019 at 19:31
  • I need to think about this a bit, but the components aren't exactly the same, if that is what you were worried about; there is a small difference. I wonder whether casting the training data to float64 would decrease the error (a quick check is sketched below). This might shine some light, but it looks like you are doing it correctly: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/decomposition/pca.py. Their version is `X_transformed = np.dot(X, self.components_.T)`. – ldmtwo, Jul 10, 2019 at 22:00
  • And here: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/decomposition/base.py – ldmtwo, Jul 10, 2019 at 22:00

1 Answer


If you look at the source code, the PCA is calculated through the SVD. I believe it iterates until the result is "good enough," which would explain the tiny differences between the two models' components.

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/decomposition/pca.py
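For intuition, here is the standard centered-SVD formulation that PCA is built on (a sketch, not scikit-learn's actual code; synthetic data again stands in for `train`):

    import numpy as np

    rng = np.random.default_rng(0)
    train = rng.normal(size=(100, 17))   # stand-in data

    Xc = train - train.mean(axis=0)      # center the data, as fit() does
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:3]                  # rows are the principal axes
    scores = Xc @ components.T           # identical to U[:, :3] * S[:3]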

