$\begingroup$

I'm working on a binary classification problem using TensorFlow's low-level APIs. The last layer is wrapped with a sigmoid activation and returns a single value.

For my predictions I just use the standard threshold of 0.5: if the output is > 0.5, I predict class 1, otherwise class 0.
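Roughly, the prediction step looks like this (a minimal sketch rather than my actual code; I'm using Keras layers here just for brevity, and the names and shapes are illustrative):

```python
import tensorflow as tf

# Sketch of the setup: a single output unit with sigmoid activation,
# thresholded at 0.5 to produce the predicted label.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # single sigmoid output
])

x_batch = tf.random.normal((4, 10))         # dummy batch of 4 examples
probs = model(x_batch)                      # shape (4, 1), values in (0, 1)
preds = tf.cast(probs > 0.5, tf.int32)      # 1 if output > 0.5, else 0
```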

I'm also looking for a confidence level for each prediction. Is it possible to get that with this architecture, or does it need some modification?

$\endgroup$

    2 Answers

    $\begingroup$

    Since a neural net that ends with a sigmoid activation outputs probabilities, you can take the output of the network as is.

    If you're referring to scikit-learn's predict_proba, it is equivalent to taking the sigmoid-activated output of the model in TensorFlow. In fact, that's exactly what scikit-learn does.

    E.g. if your model outputs $0.8$ for class $1$, you would classify this as $1$ (since $0.8 > 0.5$), with a probability of $0.8$.
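    For instance (made-up numbers, plain NumPy just to illustrate the correspondence with predict_proba):

    ```python
    import numpy as np

    # Made-up example: the network's sigmoid output for one observation is 0.8.
    p_class1 = 0.8                          # P(y = 1 | x), straight from the model
    p_class0 = 1.0 - p_class1               # P(y = 0 | x)

    pred = int(p_class1 > 0.5)              # predicted label: 1, since 0.8 > 0.5
    proba = np.array([p_class0, p_class1])  # same layout as sklearn's predict_proba
    print(pred, proba)                      # 1 [0.2 0.8]
    ```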

    $\endgroup$
    • $\begingroup$What about the case where the model outputs 0.1? That means it's quite confident about class 0. What would the probability be in that case? Can we say that for class 0 the probability is (1 - sigmoid value), and for class 1 the probability is the sigmoid value itself?$\endgroup$ Commented Apr 15, 2020 at 10:10
    • $\begingroup$Yes, that is correct. This can also be generalized to multi-class networks, if you use a softmax output.$\endgroup$
      – Djib2011
      Commented Apr 15, 2020 at 11:39
    $\begingroup$

    Usually binary classifiers are implemented with one output node and a sigmoid activation function. In that case the output is the predicted probability of an observation belonging to class 1 (as opposed to class 0). If you want a probability distribution, you can simply pair that predicted value $y$ with $1-y$, meaning "the probability of the other class".
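    For example (a sketch with made-up outputs; the tensor names are just illustrative):

    ```python
    import tensorflow as tf

    # Made-up sigmoid outputs for a batch of 3 observations, shape (3, 1).
    y_pred = tf.constant([[0.9], [0.3], [0.5]])

    # Pair P(class 1) with its complement to get a distribution over both classes.
    dist = tf.concat([1.0 - y_pred, y_pred], axis=1)  # columns: [P(class 0), P(class 1)]
    print(dist.numpy())
    # approximately: [[0.1 0.9]
    #                 [0.7 0.3]
    #                 [0.5 0.5]]
    ```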

    Alternatively, you could implement the model with two output nodes and a softmax activation function. The output would then be a probability distribution over the two classes.
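    A minimal sketch of that alternative (layer sizes and names are illustrative; note that the loss then becomes categorical cross-entropy rather than binary, as discussed in the comments below):

    ```python
    import tensorflow as tf

    # Two output nodes with softmax: each output row is [P(class 0), P(class 1)].
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(10,)),
        tf.keras.layers.Dense(2, activation="softmax"),
    ])
    # Integer-label form of categorical cross-entropy.
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

    x_batch = tf.random.normal((4, 10))
    probs = model(x_batch)                   # shape (4, 2), each row sums to 1
    preds = tf.argmax(probs, axis=1)         # predicted class indices
    ```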


    I'm also looking for a confidence level for each prediction. Is it possible to get that with this architecture, or does it need some modification?

    Neural networks do not calculate confidence levels out of the box. Many studies have been made on how to estimate them. The one I know best uses dropout as a perturbation method: after the model is trained, dropout (at possibly varying rates) is applied while making multiple predictions, and standard confidence intervals are built on those predictions. You can read an introduction to it here. I have to say, though, that the implementation is not straightforward, and it's a computationally intensive procedure.
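    A rough sketch of the idea (my own simplified version, not the exact procedure from the linked article; the model, dropout rate, and number of samples are all placeholders):

    ```python
    import numpy as np
    import tensorflow as tf

    # Monte-Carlo-dropout sketch: keep dropout active at prediction time and use
    # the spread of repeated stochastic predictions as an uncertainty estimate.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="relu", input_shape=(10,)),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])

    x = tf.random.normal((1, 10))                   # one (dummy) observation
    samples = np.array([
        model(x, training=True).numpy().item()      # training=True keeps dropout on
        for _ in range(100)                         # 100 stochastic forward passes
    ])

    mean, std = samples.mean(), samples.std()
    ci_95 = (mean - 1.96 * std, mean + 1.96 * std)  # crude 95% interval on the output
    print(mean, std, ci_95)
    ```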

    $\endgroup$
    • $\begingroup$As far as I know, sigmoid returns a value between 0 and 1; values less than 0.5 are treated as class 0, and class 1 otherwise. So can we really get the probability of the other class by merely subtracting from 1? For example, if the sigmoid output is 0.3, that means the 0 label for me, but it doesn't mean that 0.7 (i.e. 1 - 0.3) is the probability of the 1 label. Or are you saying something different?$\endgroup$ Commented Apr 15, 2020 at 10:05
    • $\begingroup$They can be treated as probabilities, or estimates of them. You can use two output nodes and a softmax activation if you want a real distribution of values that sums to 1. In that case you need a categorical cross-entropy loss instead of binary.$\endgroup$
      – Leevo
      Commented Apr 15, 2020 at 12:04
    • $\begingroup$Softmax would be my last option. Anyway, thanks for your reply.$\endgroup$ Commented Apr 15, 2020 at 17:03
