How to select a threshold for unsupervised anomaly detection

Question

I am working on an anomaly detection use case. I came across one technique that selects the threshold so that 5% of the validation data is marked as anomalous. How does this work in anomaly detection? There is also another technique that selects the threshold maximizing the difference between TPR and FPR.

Which technique is more useful in unsupervised learning when the results are later compared with ground truth?

We can find the ideal threshold by plotting an ROC curve of TP and FP rates, but is that a good technique to follow in an unsupervised scenario?

Ashwiniku918 · Accepted Answer · 2022-04-20 06:20:08Z


Unsupervised means that you don't have any labelled data. To compute true positive and false positive rates you need labels. Without labels, the ROC curve cannot be calculated.
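If some labels do become available later (for example, the ground truth mentioned in the question), the "maximize TPR minus FPR" criterion can be sketched roughly as below. This is only an illustration: the function name `youden_threshold`, the toy scores, and the convention that higher scores mean "more anomalous" are assumptions, not part of the original post.

```python
import numpy as np

def youden_threshold(scores, labels):
    """Return (threshold, J) where J = TPR - FPR is maximal.

    Assumes higher scores mean more anomalous and that
    labels are 1 for anomaly and 0 for normal.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one anomaly and one normal label")

    best_j, best_t = -np.inf, None
    # Try every observed score as a candidate threshold.
    for t in np.unique(scores):
        pred = scores >= t  # flagged as anomaly
        tpr = (pred & (labels == 1)).sum() / n_pos
        fpr = (pred & (labels == 0)).sum() / n_neg
        j = tpr - fpr
        if j > best_j:
            best_j, best_t = j, t
    return float(best_t), float(best_j)

# Toy data (made up for illustration): three labelled anomalies.
scores = [0.1, 0.2, 0.15, 0.3, 0.9, 0.8, 0.25, 0.85]
labels = [0, 0, 0, 0, 1, 1, 0, 1]
threshold, j = youden_threshold(scores, labels)
print(threshold, j)
```

Note that this needs labels for every scored point, which is exactly what a purely unsupervised setting lacks.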

You may be talking about isolation forest, which assumes that some percentage of the data is anomalous; that percentage is a hyperparameter defined by the user. So you can choose 1% or 10%, depending on the business use case at hand.
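The "mark a fixed percentage as anomalies" idea can be sketched without labels as a quantile of the anomaly scores on validation data. The snippet is a minimal illustration, assuming that higher scores mean "more anomalous"; `quantile_threshold`, `anomaly_fraction`, and the random scores are hypothetical names and data chosen for the example. In scikit-learn's `IsolationForest`, the `contamination` parameter plays a similar role.

```python
import numpy as np

def quantile_threshold(val_scores, anomaly_fraction=0.05):
    """Score threshold above which roughly `anomaly_fraction`
    of the validation scores fall (higher = more anomalous)."""
    if not 0.0 < anomaly_fraction < 1.0:
        raise ValueError("anomaly_fraction must be in (0, 1)")
    val_scores = np.asarray(val_scores, dtype=float)
    return float(np.quantile(val_scores, 1.0 - anomaly_fraction))

# Stand-in anomaly scores; in practice these come from the model.
rng = np.random.default_rng(0)
val_scores = rng.normal(size=1000)

threshold = quantile_threshold(val_scores, anomaly_fraction=0.05)
flagged = val_scores > threshold
print(threshold, int(flagged.sum()))
```

The fraction is a choice made from domain knowledge, not something the data validates; with this rule, about 5% of points are flagged whether or not they are truly anomalous.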


And what about selecting the threshold that marks 5% of validation data as anomalies? How does it work?
– user12
Commented Apr 20, 2022 at 6:36

I think you are telling the model to consider 5% of the training data as anomalous, so the model will train to predict 5% of the data as anomalies.
– Ashwiniku918
Commented Apr 20, 2022 at 6:43

I am not telling, I am asking about the method of calculating a threshold that marks 5% of validation data as anomalies. What is its purpose, and how does it make sense?
– user12
Commented Apr 20, 2022 at 6:50

You are telling your model to classify 5% of your data as anomalies. I am not sure where you heard 5%; it can be 1% or 2% or any other value, depending on your domain understanding. The significance is just that the model will train to classify 5% as anomalies.
– Ashwiniku918
Commented Apr 20, 2022 at 6:55

Usually people keep a validation set because the training and test data may be used during the process of tuning the model. The validation dataset acts as a litmus test for your model before it goes to production. But if you use the test data as well, it would not be wrong.
– Ashwiniku918
Commented Apr 20, 2022 at 7:30
