I have 1100 sequences from 2 classes: 400 from class 1 and 700 from class 2. I have used an autoencoder with a single hidden layer of 2 neurons to capture features. My initial features are tri-grams, and there are 6860 possible tri-grams, so each sequence is represented by a 6860-dimensional count vector. As a result, most of my input vectors are sparse.
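For reference, here is roughly how I build these features (sketched with scikit-learn's CountVectorizer for illustration; `sequences` is a placeholder for my list of raw sequence strings):

```python
from sklearn.feature_extraction.text import CountVectorizer

# `sequences` is a placeholder for my 1100 raw sequence strings.
vectorizer = CountVectorizer(analyzer='char', ngram_range=(3, 3))
X = vectorizer.fit_transform(sequences)  # sparse matrix of tri-gram counts,
                                         # roughly (1100, 6860) in my case
```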

Now, if I calculate the parameters for this network (counting weights only, not biases), I have:

6860 * 2 = 13720 parameters (1st layer)
2 * 6860 = 13720 parameters (2nd layer)
----------------------------------------
           27440 parameters (in total)

That is far too many parameters compared to my number of data points. Hence, I have used a dropout rate of 0.98 on both the input->hidden and hidden->output connections, which (as I understand it) reduces the effective number of parameters to 13720 * 0.02 = 274 per layer, i.e. 548 in total.
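For concreteness, the model looks roughly like this sketch (Keras is just for illustration; the activations are placeholders):

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout

n_trigrams = 6860

autoencoder = Sequential([
    Dropout(0.98, input_shape=(n_trigrams,)),  # dropout on input -> hidden
    Dense(2, activation='relu'),               # bottleneck: 6860*2 weights + 2 biases
    Dropout(0.98),                             # dropout on hidden -> output
    Dense(n_trigrams, activation='sigmoid'),   # reconstruction: 2*6860 weights + 6860 biases
])
autoencoder.compile(optimizer='adam', loss='mse')
autoencoder.summary()  # 34,302 trainable parameters (27,440 weights + 6,862 biases)
```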

Now, after training, I ran the encoder on my test data of 500 sequences and extracted the 2-dimensional hidden-layer representation. I then fed that representation into a second neural network with a single hidden layer of 5 neurons to classify. My results look really good: I get around 90% accuracy on the test data, which is in the same ballpark as the training accuracy.
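Continuing the sketch above (again illustrative; `X_test` is a placeholder for my held-out tri-gram matrix as a dense array):

```python
from keras.models import Model, Sequential
from keras.layers import Dense

# Reuse the trained bottleneck as a feature extractor.
encoder = Model(inputs=autoencoder.input,
                outputs=autoencoder.layers[1].output)  # the 2-neuron hidden layer
codes = encoder.predict(X_test)                        # shape (500, 2)

# Small classifier on the 2-d codes: 2 -> 5 -> 1.
classifier = Sequential([
    Dense(5, activation='relu', input_shape=(2,)),
    Dense(1, activation='sigmoid'),
])
classifier.compile(optimizer='adam', loss='binary_crossentropy',
                   metrics=['accuracy'])
```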

My questions are: am I overfitting in my autoencoder? Am I overfitting by stacking another neural network on top of it? I am worried about my low number of data points. Does my use of dropout seem sensible?

I am also worried that I am counting the parameters wrong, and that my network has more parameters than data samples even after using dropout. Does my calculation seem correct?


1 Answer


Dropout layers do not reduce the number of parameters. Dropout randomly masks activations, independently for each training sample, to limit overfitting. The model still has exactly the same number of parameters.
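You can verify this quickly, e.g. in Keras (a sketch using the layer sizes from your question):

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout

with_dropout = Sequential([
    Dropout(0.98, input_shape=(6860,)),
    Dense(2),
    Dropout(0.98),
    Dense(6860),
])
without_dropout = Sequential([
    Dense(2, input_shape=(6860,)),
    Dense(6860),
])

# Dropout layers own no weights: they only zero activations at training
# time, so both models report the same trainable parameter count.
print(with_dropout.count_params())     # 34302
print(without_dropout.count_params())  # 34302
```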

Compressing over 6000 dimensions down to only 2 seems too aggressive, although it is hard to tell from here. You could run a PCA and use it as a starting point for choosing the number of hidden neurons, e.g. find the dimensionality that accounts for 75% of the variance in your data and go from there. Also try a second layer and add some form of regularization.
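For example, with scikit-learn (a sketch; `X` stands for your tri-gram matrix as a dense array, since PCA centers the data and needs dense input):

```python
import numpy as np
from sklearn.decomposition import PCA

# `X` is a placeholder for the (1100, 6860) tri-gram matrix, densified.
pca = PCA().fit(X)
cumvar = np.cumsum(pca.explained_variance_ratio_)
n_hidden = int(np.searchsorted(cumvar, 0.75)) + 1  # components covering 75% of variance
print(n_hidden)  # candidate size for the bottleneck layer
```

If your matrix is scipy-sparse, TruncatedSVD is the usual substitute, since PCA requires dense input.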

• Thanks for the answer. Given that I only have 1600 data points, is an input of 6860 features to the autoencoder reasonable? You also suggested adding a second layer, but won't that increase the parameter count even more? Or does that not matter for autoencoders?
  – nafizh, Nov 15, 2016 at 18:02
• That is hard to say. I'd focus more on the dimensionality, i.e. the number of inputs to the autoencoder. Just try it and see how the error changes.
  – hh32, Nov 15, 2016 at 18:40
