
I am doing anomaly detection using machine learning. I have tried different models such as Isolation Forest, SVM, and KNN. The maximum accuracy I can get from each of them is $80\%$ on my dataset, which contains $5$ features and $4000$ samples, $18\%$ of which are anomalous. When I use an autoencoder and tune the reconstruction-loss threshold, I can get $92\%$ accuracy, but the hidden-layer setup of the autoencoder does not seem right despite the accuracy I get. As I said, I have only $5$ features, and the setup is as follows:

    # Encoder: expands the 5 input features to 64 units, then narrows to 16
    self.encoder = tf.keras.Sequential([
        layers.Dense(64, activation="relu"),
        layers.Dense(32, activation="relu"),
        layers.Dense(16, activation="relu"),
    ])
    # Decoder: mirrors the encoder back out to the 5 input features
    self.decoder = tf.keras.Sequential([
        layers.Dense(32, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(5, activation="sigmoid"),
    ])
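
For context, I flag a sample as anomalous when its reconstruction error exceeds the threshold I tune, roughly like this (a sketch; `autoencoder`, `x_test`, and `threshold` stand in for my actual objects):

    import numpy as np

    # Per-sample reconstruction error (mean squared error over the 5 features)
    reconstructions = autoencoder.predict(x_test)
    errors = np.mean(np.square(x_test - reconstructions), axis=1)

    # Flag a sample as anomalous when its error exceeds the tuned threshold
    predictions = (errors > threshold).astype(int)  # 1 = anomalous, 0 = normal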

Is this reasonable? It looks like the encoder first upsamples and then downsamples, which I have not seen before, so I suspect I am missing something here.


1 Answer

The use of sigmoid as the output activation might be very problematic, as it forces the output into the range 0.0 to 1.0. If the features are not in this range, the network will not be able to represent the data well.
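
If your features are not already scaled into [0, 1], a minimal fix is a linear output layer; a sketch of the decoder (everything except the final activation matches your setup):

    self.decoder = tf.keras.Sequential([
        layers.Dense(32, activation="relu"),
        layers.Dense(64, activation="relu"),
        # Linear output can reproduce values outside the [0, 1] range
        layers.Dense(5, activation="linear"),
    ])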

Furthermore, you should add another layer at the end of the encoder that maps the data into a low-dimensional space. This is often called a "bottleneck" layer. Right now your input space is 5-dimensional (5 features), but your latent space is 16-dimensional, so the network can represent the data directly; it does not need to learn any particular pattern. Try an output size of 1 to 4 for this layer (a tunable hyperparameter), so the latent space is strictly smaller than the 5-dimensional input. A sketch of what that could look like follows.
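
Here the layer widths are illustrative, not tuned:

    import tensorflow as tf
    from tensorflow.keras import layers

    bottleneck_dim = 2  # tunable hyperparameter; keep it below the 5 input features

    encoder = tf.keras.Sequential([
        layers.Dense(16, activation="relu"),
        layers.Dense(8, activation="relu"),
        layers.Dense(bottleneck_dim, activation="relu"),  # the bottleneck layer
    ])
    decoder = tf.keras.Sequential([
        layers.Dense(8, activation="relu"),
        layers.Dense(16, activation="relu"),
        layers.Dense(5, activation="sigmoid"),  # fine if features are scaled to [0, 1]
    ])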

Lastly, when you have imbalanced data, accuracy is not a good metric. With a class balance of 18% anomalous and 82% normal, an accuracy of 80% is actually below the no-information baseline: always predicting "normal" already gives 82% accuracy. Use the F1 score or PR AUC instead.
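
With scikit-learn, a sketch of both metrics (`y_true` are the ground-truth labels and `errors` the per-sample reconstruction errors; both names are assumed here):

    from sklearn.metrics import f1_score, average_precision_score

    # F1 needs hard predictions, so apply the chosen threshold first
    y_pred = (errors > threshold).astype(int)
    print("F1:", f1_score(y_true, y_pred))

    # PR AUC (average precision) scores the raw errors, no threshold needed
    print("PR AUC:", average_precision_score(y_true, errors))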

• Hi, thanks for answering. The features are scaled between 0 and 1. Regarding my question, I found that in one highly cited paper the author uses a similar structure of upscaling the input; the next layer compresses it anyway.
  – Riva11, Jul 14, 2022 at 22:14
