Usually, binary classifiers are implemented with a single output node and a sigmoid activation function. In that case the output is the predicted probability that an observation belongs to class 1 (as opposed to class 0). If you want a probability distribution, you can simply pair that predicted probability y with 1 - y, i.e. the probability of the other class.
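For example, a minimal sketch in NumPy (the probability values are made up for illustration):

```python
import numpy as np

# Hypothetical sigmoid outputs for a batch of observations:
# p is the predicted probability of class 1.
p = np.array([0.92, 0.15, 0.60])

# Pair each p with 1 - p to form a probability distribution
# over the two classes (column 0 = class 0, column 1 = class 1).
dist = np.stack([1 - p, p], axis=1)
print(dist)  # each row sums to 1
```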
Alternatively, you could implement the model with two output nodes and a softmax activation function. The output would then be a probability distribution over the two classes.
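As a sketch, assuming the two output nodes produce raw logits (the `softmax` helper and the logit values below are written out just for illustration):

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical logits from the two output nodes for 3 observations.
logits = np.array([[ 2.0, 0.5],
                   [-1.0, 1.5],
                   [ 0.3, 0.2]])
print(softmax(logits))  # each row is a distribution over the two classes
```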
I'm also looking for a confidence level for each prediction. Is it possible to get that with this architecture, or does it need some modification?
Neural networks do not calculate confidence levels out of the box. Many studies have been made on how to estimate them. The one I know best is based on dropout as a perturbation method: after the model is trained, you keep dropout active at prediction time, run multiple stochastic predictions for the same input, and build standard confidence intervals on those. You can read an introduction to it here. I have to say, though, the implementation is not straightforward, and it's a computationally intensive procedure.
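To make the idea concrete, here is a minimal sketch using Keras, where passing `training=True` keeps dropout active at prediction time. The model architecture and the `mc_dropout_interval` helper are hypothetical placeholders, not the exact procedure from the reference:

```python
import numpy as np
import tensorflow as tf

# Any model containing dropout layers works the same way;
# this small binary classifier is just for illustration.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

def mc_dropout_interval(model, x, n_samples=100, alpha=0.05):
    # Keep dropout active at inference (training=True) and collect
    # n_samples stochastic predictions per observation.
    preds = np.stack([
        model(x, training=True).numpy().ravel()
        for _ in range(n_samples)
    ])
    mean = preds.mean(axis=0)
    # Empirical (1 - alpha) interval from the sampled predictions.
    low = np.percentile(preds, 100 * alpha / 2, axis=0)
    high = np.percentile(preds, 100 * (1 - alpha / 2), axis=0)
    return mean, low, high

x = np.random.rand(5, 20).astype("float32")  # dummy inputs
mean, low, high = mc_dropout_interval(model, x)
```

Note that the repeated forward passes are what make this computationally intensive: you pay roughly `n_samples` times the cost of a single prediction.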