I have a few TB of wide data. I want to reduce the number of features before feeding the data into a classification model... or should I not?
Obviously, I'll try both if my cluster budget allows, but is there any theoretical reason why one of the following approaches should be better than the other?
- An autoencoder (tanh, tanh, tanh layers) trained as a separate dimensionality-reduction step, feeding its codes to the classifier
- Or: drastically narrow the second and third layers of the classification model itself, so the classifier learns its own compression, with dropout on those layers for regularization. (I realize dropout doesn't actually reduce the feature count; it just zeroes random activations during training.) Rough sketches of both options below.
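
To make the two options concrete, here's a minimal Keras sketch of what I mean. All sizes (`n_features`, `latent_dim`, layer widths, dropout rate) are placeholders, not tuned values:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

n_features, latent_dim, n_classes = 1000, 64, 10  # hypothetical sizes

# Option 1: autoencoder (tanh, tanh, tanh) trained to reconstruct the inputs;
# the encoder's output then feeds a separate classifier.
inputs = layers.Input(shape=(n_features,))
h = layers.Dense(256, activation="tanh")(inputs)
code = layers.Dense(latent_dim, activation="tanh")(h)
recon = layers.Dense(n_features, activation="tanh")(code)  # tanh output assumes inputs scaled to [-1, 1]
autoencoder = models.Model(inputs, recon)
encoder = models.Model(inputs, code)
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(X, X, ...); then train a classifier on encoder.predict(X)

# Option 2: let the classifier compress internally -- narrow second/third
# layers act as a bottleneck, with dropout as regularization (dropout itself
# only zeroes random activations during training, it doesn't shrink the layer).
clf_in = layers.Input(shape=(n_features,))
x = layers.Dense(256, activation="relu")(clf_in)
x = layers.Dropout(0.5)(x)
x = layers.Dense(latent_dim, activation="relu")(x)  # the bottleneck layer
x = layers.Dropout(0.5)(x)
out = layers.Dense(n_classes, activation="softmax")(x)
classifier = models.Model(clf_in, out)
classifier.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# classifier.fit(X, y, ...)
```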