1
$\begingroup$

I’m working on a fully unsupervised anomaly detection problem. Because there are no labels, I’m having a hard time defining metrics to validate the results (I run several algorithms, but the final output is a binary classification). I was considering a Mann-Whitney test to check for significant differences between the sample that I labelled anomalous and the non-anomalous one. Of course, this will not tell me whether the classification is correct, but at least it would show that my classification method splits the data into two significantly different samples. Does this make sense? Thanks for any reply, and for any effective alternatives you can suggest.
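As a rough sketch of the check described above, assuming a single numeric feature and a boolean anomaly flag per row (the data, the names `feature` and `is_anomaly`, and the helper `compare_groups` are all hypothetical), the test could be run with SciPy's `mannwhitneyu`. The effect size is computed directly from pairwise comparisons so that it does not depend on which U statistic a given SciPy version returns.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)

# Hypothetical data: one feature plus the binary flag produced by the detector.
feature = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(4.0, 1.0, 20)])
is_anomaly = np.concatenate([np.zeros(500, dtype=bool), np.ones(20, dtype=bool)])


def compare_groups(values, flags):
    """Two-sided Mann-Whitney U test between flagged and unflagged values."""
    flagged, normal = values[flags], values[~flags]
    stat, p_value = mannwhitneyu(flagged, normal, alternative="two-sided")
    # Effect size: estimated P(flagged value > normal value), ties counted as 1/2.
    greater = (flagged[:, None] > normal[None, :]).mean()
    ties = (flagged[:, None] == normal[None, :]).mean()
    effect = greater + 0.5 * ties
    return stat, p_value, effect


stat, p_value, effect = compare_groups(feature, is_anomaly)
print(f"U={stat:.1f}, p={p_value:.3g}, P(anomalous > normal)={effect:.3f}")
```

If the tested feature is one the detector itself used to split the data, a significant difference is expected by construction, so the result is more informative on a feature that was held out from the detector.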

$\endgroup$
1
  • $\begingroup$Can you clarify what labeling you did?$\endgroup$
    – Iyar Lin
    Commented Aug 31, 2022 at 8:32

2 Answers

1
$\begingroup$
  1. One way would be to separate testing the algorithm from evaluating the model on your specific task.
  2. For algorithm testing, you can run it on a related benchmark dataset with known labels and validate the results there.
  3. For your own task, you would want to create a ground-truth dataset (for example, a manually labelled sample) and validate against it.
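A minimal sketch of step 3, assuming a small ground-truth sample has been labelled by hand. The lists `truth` and `predicted` and the helper `precision_recall` are made up for illustration.

```python
def precision_recall(predicted, truth):
    """Precision and recall of boolean anomaly flags against ground truth."""
    pairs = list(zip(predicted, truth))
    tp = sum(1 for p, t in pairs if p and t)        # flagged and truly anomalous
    fp = sum(1 for p, t in pairs if p and not t)    # flagged but normal
    fn = sum(1 for p, t in pairs if not p and t)    # missed anomalies
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical hand-labelled sample and the detector's output on the same rows.
truth = [False, False, True, False, True, False, False, True]
predicted = [False, True, True, False, True, False, False, False]

precision, recall = precision_recall(predicted, truth)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Because anomalies are rare, precision and recall on the anomalous class are usually more informative here than plain accuracy.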
$\endgroup$
    0
    $\begingroup$

    In anomaly detection, it is useful to have a certainty score for each value (e.g. A has an 80% chance of being an anomaly), so that you can set a threshold on it, which then acts as a model quality parameter.

    Many existing algorithms can provide such a score, for instance by taking the distance from an outlier to the closest group of normal values.

    Consequently, I would recommend finding a way to measure such a distance and to express it as a relative value (i.e. compared with the distances of the other outliers).
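One possible way to sketch this, assuming the mean distance to the k nearest neighbours as the distance measure (a swapped-in choice, not something prescribed above) and a rank-based relative value. The function names and the synthetic data are hypothetical, and the relative value is a rank, not a calibrated probability.

```python
import numpy as np


def knn_distance_scores(X, k=5):
    """Mean Euclidean distance from each point to its k nearest neighbours."""
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)           # ignore the distance to itself
    nearest = np.sort(dists, axis=1)[:, :k]   # k smallest distances per row
    return nearest.mean(axis=1)


def relative_scores(scores):
    """Rank of each score scaled to (0, 1]; 1.0 is the most distant point."""
    ranks = scores.argsort().argsort()        # 0 = smallest score
    return (ranks + 1) / len(scores)


# Hypothetical data: a 2-D cluster plus one point far away from it.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), [[8.0, 8.0]]])

scores = knn_distance_scores(X)
rel = relative_scores(scores)
print(f"raw score of far point: {scores[-1]:.2f}, relative score: {rel[-1]:.3f}")
```

The pairwise distance matrix is quadratic in the number of rows, so this form only suits small samples.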

    Unfortunately, most unsupervised algorithms have no universal rule for this, because their scores are only meaningful relative to each other. I don't know whether that is your case, but if so, you should try thresholds derived from descriptive statistics (standard deviation, etc.) and pick the most realistic ones.
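As an illustration of trying several statistic-based thresholds, the sketch below flags scores that lie more than `n_std` standard deviations above the mean and reports how many points each setting flags. The scores, the helper `flag_by_std`, and the candidate multipliers are assumptions for the example only.

```python
import numpy as np


def flag_by_std(scores, n_std):
    """Flag scores lying more than n_std standard deviations above the mean."""
    return scores > scores.mean() + n_std * scores.std()


# Hypothetical anomaly scores: mostly around 1.0, plus three injected large values.
rng = np.random.default_rng(1)
scores = np.concatenate([rng.normal(1.0, 0.2, 1000), [3.0, 3.5, 4.0]])

flagged_counts = {}
for n_std in (2.0, 3.0, 4.0):
    flags = flag_by_std(scores, n_std)
    flagged_counts[n_std] = int(flags.sum())
    print(f"n_std={n_std}: {flagged_counts[n_std]} flagged ({flags.mean():.2%})")
```

Comparing the flagged fraction across multipliers with what is plausible for the domain is one way to choose among them. The mean and standard deviation are themselves pulled up by extreme scores, so a median-based variant may behave better on heavy-tailed data.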

    If that is not the case, I'd like to know which unsupervised algorithm you are using, to see whether there is a solution.

    $\endgroup$
    1
    • $\begingroup$Does it answer your question? If not, please let me know.$\endgroup$ Commented Sep 22, 2022 at 7:29
