I have 1100 sequences from 2 classes: 400 from class 1 and 700 from class 2. I have used a one-hidden-layer auto-encoder with 2 neurons to capture my features. My initial features are tri-grams for each sequence, so every sequence is represented by a vector over 6860 possible tri-grams. As a result, most of my input vectors are sparse.
Now, if I calculate the parameters for this network, I have:

    6860 * 2 = 13720 parameters (1st layer)
    2 * 6860 = 13720 parameters (2nd layer)
    ----------------------------------------
               27440 parameters (in total)
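To sanity-check this arithmetic, one could build the same network in Keras and query it directly (a sketch; the activations are my assumptions, and note that Keras would also count one bias per neuron, which the hand count above leaves out):

    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Dense, Input

    autoencoder = Sequential([
        Input(shape=(6860,)),
        Dense(2, activation="relu"),        # encoder: 6860*2 weights + 2 biases
        Dense(6860, activation="sigmoid"),  # decoder: 2*6860 weights + 6860 biases
    ])
    print(autoencoder.count_params())       # 34302 = 27440 weights + 6862 biases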
Now, that is far too many parameters in comparison to my number of data points. Hence, I have used a dropout rate of 0.98 on both the input -> hidden connections and the hidden -> output connections, which makes the effective number of parameters 13720 * 0.02 ≈ 274 per layer, or 548 in total.
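For concreteness, this is roughly how my auto-encoder with dropout looks in Keras (a sketch; the optimizer, loss, and activations are my assumptions, since only the layer sizes and the 0.98 rate are fixed above):

    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Dense, Dropout, Input

    autoencoder = Sequential([
        Input(shape=(6860,)),
        Dropout(0.98),                      # zeroes 98% of input->hidden activations each training step
        Dense(2, activation="relu"),        # 2-neuron bottleneck (the extracted features)
        Dropout(0.98),                      # zeroes 98% of hidden->output activations each training step
        Dense(6860, activation="sigmoid"),  # reconstructs the tri-gram vector
    ])
    autoencoder.compile(optimizer="adam", loss="binary_crossentropy")

    # Dropout layers add no trainable weights of their own, so
    # count_params() returns the same number with or without them.
    print(autoencoder.count_params())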
Now, after training, I ran the encoder on my test data of 500 sequences and extracted the 2-dimensional hidden-layer representation for each one. I then fed that data into another neural network with a single hidden layer of 5 neurons to classify the sequences. My results are really good: I am getting around 90% accuracy on my test data, which is in the same ballpark as my training accuracy.
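The downstream classifier looks roughly like this (a sketch; only the 5-neuron hidden layer is fixed above, and the output unit, activations, and loss are my assumptions for a binary problem):

    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Dense, Input

    classifier = Sequential([
        Input(shape=(2,)),               # the 2-D codes from the trained encoder
        Dense(5, activation="relu"),     # the single 5-neuron hidden layer
        Dense(1, activation="sigmoid"),  # probability of class 2 vs. class 1
    ])
    classifier.compile(optimizer="adam", loss="binary_crossentropy",
                       metrics=["accuracy"])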
My questions: am I overfitting in my autoencoder? Am I overfitting by using another neural network on top of it? I am worried about my low number of data points. Does my use of dropout seem sensible?

I am also worried that I am calculating the parameter counts wrong, and that my network still has more parameters than data samples even after using dropout. Does my calculation seem correct?