
I have a few TB of wide data. I want to reduce the number of features in my dataset before feeding my dataset into a classification model... or should I not?

Obviously, I will want to try both methods if my cluster budget allows, but is there any obvious theory as to why either of the following approaches would be better than the other?

  • Autoencoder (tanh, tanh, tanh layers)
  • Or: drastically reduce the number of inputs to the second and third layers of the classification model itself, perhaps via dropout.
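For the first option, here is a minimal sketch of a tanh autoencoder trained with plain gradient descent on toy data. The layer sizes, learning rate, and step count are all hypothetical choices for illustration; a real model on TB-scale data would use a deep-learning framework with mini-batch streaming.

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples, n_features, n_latent = 256, 20, 5
X = rng.normal(size=(n_samples, n_features))

# Encoder/decoder weights (small illustrative sizes).
W_enc = rng.normal(scale=0.1, size=(n_features, n_latent))
b_enc = np.zeros(n_latent)
W_dec = rng.normal(scale=0.1, size=(n_latent, n_features))
b_dec = np.zeros(n_features)

lr = 0.01
for step in range(500):
    # Forward pass: tanh bottleneck, linear reconstruction.
    H = np.tanh(X @ W_enc + b_enc)       # latent codes
    X_hat = H @ W_dec + b_dec            # reconstruction
    err = X_hat - X

    # Backward pass for mean-squared-error loss.
    grad_W_dec = H.T @ err / n_samples
    grad_b_dec = err.mean(axis=0)
    dH = (err @ W_dec.T) * (1 - H ** 2)  # tanh derivative
    grad_W_enc = X.T @ dH / n_samples
    grad_b_enc = dH.mean(axis=0)

    W_dec -= lr * grad_W_dec
    b_dec -= lr * grad_b_dec
    W_enc -= lr * grad_W_enc
    b_enc -= lr * grad_b_enc

# The encoder alone produces the reduced features for the classifier.
Z = np.tanh(X @ W_enc + b_enc)
print(Z.shape)  # (256, 5)
```

After training, only the encoder half is kept: `Z` is the lower-dimensional representation that would be fed into the downstream classification model.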

1 Answer

    An autoencoder is a good option for dimensionality reduction; however, it requires its own separate training run before you can train the classifier.

    If you want dimensionality reduction before feeding the data into a classification model, how about PCA? It does not require training a neural network: the projection is computed in closed form from the data's covariance structure. t-SNE is another common reduction technique, though it runs its own optimization, does not give a reusable mapping for new points, and is mainly suited to visualization rather than preprocessing at this scale.
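    A minimal sketch of PCA via SVD, assuming the (centered) data fits in memory; for TB-scale data an incremental or randomized variant would be needed instead. The sample sizes and target dimensionality `k` are illustrative.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 50))

    Xc = X - X.mean(axis=0)             # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

    k = 10                              # target dimensionality
    X_reduced = Xc @ Vt[:k].T           # project onto top-k components

    # Fraction of total variance retained by the top-k components.
    explained = (S[:k] ** 2).sum() / (S ** 2).sum()
    print(X_reduced.shape)              # (1000, 10)
    ```

    Unlike an autoencoder, the rows of `Vt` come directly from one SVD, and the same projection matrix can be applied to new data at classification time.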
