
I am doing anomaly detection using machine learning. I have tried different models such as Isolation Forest, SVM, and KNN. The maximum accuracy I can get from each of them is $80\%$ on my dataset, which contains $5$ features and $4000$ samples, $18\%$ of which are anomalous. When I use an autoencoder and tune the reconstruction-loss threshold, I can get $92\%$ accuracy, but the hidden-layer setup of the autoencoder does not seem right despite the accuracy I get. As I said, I have only $5$ features, and the setup is as follows:

    # Encoder: expands the 5 input features to 64 units, then narrows to 16
    self.encoder = tf.keras.Sequential([
        layers.Dense(64, activation="relu"),
        layers.Dense(32, activation="relu"),
        layers.Dense(16, activation="relu"),
    ])
    # Decoder: mirrors the encoder back out to the 5 input features
    self.decoder = tf.keras.Sequential([
        layers.Dense(32, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(5, activation="sigmoid"),
    ])
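
For context, I flag a sample as anomalous when its reconstruction error exceeds the threshold I tune, roughly like this (a sketch; `autoencoder`, `x_test`, and `threshold` stand in for my actual objects):

    import numpy as np

    # Per-sample reconstruction error (mean squared error over the 5 features)
    reconstructions = autoencoder.predict(x_test)
    errors = np.mean(np.square(x_test - reconstructions), axis=1)

    # Flag a sample as anomalous when its error exceeds the tuned threshold
    predictions = (errors > threshold).astype(int)  # 1 = anomalous, 0 = normal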

Is this reasonable? It looks like the encoder first upsamples and then downsamples, which I have not seen before, so I suspect I am missing something here.


1 Answer

The use of sigmoid as the output activation might be very problematic, as it forces the output into the range 0.0 to 1.0. If the features are not in this range, the network will not be able to represent the data well.
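
If your features are not already scaled into [0, 1], a minimal fix is a linear output layer; a sketch of the decoder (everything except the final activation matches your setup):

    self.decoder = tf.keras.Sequential([
        layers.Dense(32, activation="relu"),
        layers.Dense(64, activation="relu"),
        # Linear output can reproduce values outside the [0, 1] range
        layers.Dense(5, activation="linear"),
    ])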

Furthermore, you should add another layer at the end of the encoder that maps the data into a low-dimensional space. This is often called a "bottleneck" layer. Right now your input space is 5-dimensional (5 features), but your latent space is 16-dimensional, so the network can represent the data directly; it does not need to learn any particular pattern. Try an output size of 1 to 4 for this layer (a tunable hyperparameter), so the latent space is strictly smaller than the 5-dimensional input. A sketch of what that could look like follows.
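
Here the layer widths are illustrative, not tuned:

    import tensorflow as tf
    from tensorflow.keras import layers

    bottleneck_dim = 2  # tunable hyperparameter; keep it below the 5 input features

    encoder = tf.keras.Sequential([
        layers.Dense(16, activation="relu"),
        layers.Dense(8, activation="relu"),
        layers.Dense(bottleneck_dim, activation="relu"),  # the bottleneck layer
    ])
    decoder = tf.keras.Sequential([
        layers.Dense(8, activation="relu"),
        layers.Dense(16, activation="relu"),
        layers.Dense(5, activation="sigmoid"),  # fine if features are scaled to [0, 1]
    ])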

Lastly, when you have imbalanced data, accuracy is not a good metric. With a class balance of 18% anomalous and 82% normal, an accuracy of 80% is actually below the no-information baseline: always predicting "normal" already gives 82% accuracy. Use the F1 score or PR AUC instead.
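
With scikit-learn, a sketch of both metrics (`y_true` are the ground-truth labels and `errors` the per-sample reconstruction errors; both names are assumed here):

    from sklearn.metrics import f1_score, average_precision_score

    # F1 needs hard predictions, so apply the chosen threshold first
    y_pred = (errors > threshold).astype(int)
    print("F1:", f1_score(y_true, y_pred))

    # PR AUC (average precision) scores the raw errors, no threshold needed
    print("PR AUC:", average_precision_score(y_true, errors))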

• Hi, thanks for answering. The features are scaled between 0 and 1. Regarding my question, I found that in one highly cited paper the author uses a similar structure of upscaling the input; the next layer compresses it anyway.
  – Riva11, Jul 14, 2022 at 22:14
