$\begingroup$

I have already read these posts here and here, so please don't mark this as a duplicate.

I am doing binary classification with a random forest; the class labels are 1 and 0. The goal is to predict the likelihood that a supplier will meet the target.

I got the output below from the SHAP summary plot:

[Image: SHAP summary plot]

How do I know which feature leads to class 1 and class 0?

Does it mean high values of each feature lead to class 1?

And low values of each feature lead to class 0?

When my output probability range is 0 to 1, why does the SHAP plot return something like 0 to 0.20, etc.?

What does "mean SHAP value" mean?

$\endgroup$

    1 Answer

    $\begingroup$

    How do I know which feature leads to class 1 and class 0?

    The length of each bar tells you how much influence the feature has on the prediction overall. It does not tell you which class the feature pushes towards.

    Does it mean high values of each feature lead to class 1?

    No. To see the direction of each feature's effect, use the beeswarm summary plot.
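What the beeswarm plot lets you read off, the direction of a feature's effect, can also be checked numerically. The sketch below is only an illustration with made-up numbers: `feature_values`, `shap_class1`, and the helper `direction_of_effect` are hypothetical names, and the SHAP values are assumed to be the ones for class 1. It compares the average SHAP value of observations where the feature is above its median with those at or below it.

```python
from statistics import mean, median

# Hypothetical per-observation data for ONE feature (not taken from the question):
# the raw feature value and that feature's SHAP value for class 1.
feature_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
shap_class1 = [-0.08, -0.05, -0.01, 0.02, 0.06, 0.09]


def direction_of_effect(values, shap_values):
    """Return (mean SHAP for high feature values, mean SHAP for low feature values)."""
    cutoff = median(values)
    high = [s for v, s in zip(values, shap_values) if v > cutoff]
    low = [s for v, s in zip(values, shap_values) if v <= cutoff]
    return mean(high), mean(low)


mean_high, mean_low = direction_of_effect(feature_values, shap_class1)
print(f"mean SHAP when feature is high: {mean_high:+.3f}")
print(f"mean SHAP when feature is low:  {mean_low:+.3f}")
```

If the first number is positive and the second negative, high values of that feature push predictions towards class 1 and low values towards class 0. The opposite pattern, or no clear pattern, is equally possible, so this has to be checked feature by feature.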

    And low values of each feature lead to class 0?

    Same as previous answer.

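    When my output probability range is 0 to 1, why does the SHAP plot return something like 0 to 0.20, etc.?

    It shows how much each feature contributes to the prediction on average, that is, the mean absolute SHAP value across observations. I suspect the reason the contributions don't add up to 1 is that you have an unbalanced dataset. In any case the bars are not expected to sum to 1: for a single observation, the SHAP values sum to the difference between the model's prediction and the base value (the average prediction), not to the probability itself.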
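For a single observation, Shapley-style contributions add up to the prediction minus the base value (the average prediction over a background set), which is why they need not sum to 1. The following is a minimal sketch of that property, not the algorithm the SHAP tree explainer uses: it enumerates exact Shapley values for a toy problem. The function `model`, the `background` rows, and the observation `x` are all hypothetical stand-ins, not the asker's random forest or data.

```python
from itertools import combinations
from math import factorial

# Hypothetical background data: 4 observations, 3 features.
background = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.0],
    [2.0, 1.0, 0.0],
    [1.0, 2.0, 1.0],
]


def model(row):
    """Toy stand-in for a predicted probability of class 1, clipped to [0, 1]."""
    score = 0.1 + 0.2 * row[0] + 0.1 * row[1] * row[2]
    return min(max(score, 0.0), 1.0)


def value(x, subset):
    """Average model output with features in `subset` fixed to x's values
    and the remaining features taken from each background row."""
    total = 0.0
    for bg in background:
        hybrid = [x[j] if j in subset else bg[j] for j in range(len(x))]
        total += model(hybrid)
    return total / len(background)


def shapley_values(x):
    """Exact Shapley values by enumerating every subset of the other features."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(x, s | {i}) - value(x, s))
    return phi


x = [2.0, 2.0, 2.0]
phi = shapley_values(x)
base_value = value(x, set())  # average prediction over the background rows
prediction = model(x)

print("contributions:", [round(p, 4) for p in phi])
print("sum of contributions:", round(sum(phi), 4))
print("prediction - base value:", round(prediction - base_value, 4))
```

The last two printed numbers agree, and neither is 1. With a lower base value, as on a dataset with few positives, the same prediction would leave more room above the base and less below it.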

    What does "mean SHAP value" mean?

    SHAP first computes a value per feature for every observation. To get the overall contribution of each feature, the bar plot then averages the absolute values across observations.
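The averaging step can be sketched in a few lines. The numbers below are made up for illustration, and `shap_matrix`, `feature_names`, `importance`, and `signed_mean` are hypothetical names. Each row is one observation and each column one feature, with values assumed to be SHAP values for class 1.

```python
from statistics import mean

# Hypothetical SHAP values for class 1: one row per observation, one column per feature.
feature_names = ["feature_a", "feature_b"]
shap_matrix = [
    [0.10, -0.02],
    [-0.10, 0.01],
    [0.12, -0.01],
    [-0.12, 0.02],
]

importance = {}   # mean absolute SHAP value: what the bar plot shows
signed_mean = {}  # plain mean: positive and negative effects cancel out

for j, name in enumerate(feature_names):
    column = [row[j] for row in shap_matrix]
    importance[name] = mean([abs(v) for v in column])
    signed_mean[name] = mean(column)

for name in feature_names:
    print(f"{name}: mean |SHAP| = {importance[name]:.3f}, mean SHAP = {signed_mean[name]:+.3f}")
```

Here `feature_a` gets the longer bar because it moves predictions a lot, even though its pushes up and down cancel on average. That is why the bar length says nothing about direction.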

    $\endgroup$
    • $\begingroup$Thanks a lot for the help. Upvoted. I will use the beeswarm plot you suggested, but can you still help me interpret the above graph for questions 2 and 3?$\endgroup$
      – The Great
      Commented Mar 19, 2022 at 2:41
    • $\begingroup$Regarding the sum of contributions, I see that if I add all of them, I get approximately 0.56. So, does it mean my features only explain 56% of the variance in the outcome variable? I am trying to learn from you how you inferred that my data has more 0s than 1s (and that's why it is not adding up to 1).$\endgroup$
      – The Great
      Commented Mar 19, 2022 at 2:44
    • $\begingroup$@TheGreat, the plot that you have only tells you the relative importance of the features. Since you only have two classes, the contribution to each class is the same: positive in one direction and negative in the other, with the same magnitude.$\endgroup$
      – Akavall
      Commented Mar 19, 2022 at 19:19
    • $\begingroup$For the second question, no, the sum does not tell you anything about the variance these features explain; it only shows you the relative importance. The reason, I think, that your sum does not add up to one is that generally what I see on unbalanced datasets is that the contribution of each feature is smaller. The reason for this (I think, not 100% sure) is that the contributions start with some sort of prior that is equal to the overall ratio in the population. So if your share of positives is 0.15, you would start with that prior, and even if all your features bring the prediction to 0,$\endgroup$
      – Akavall
      Commented Mar 19, 2022 at 19:25
    • $\begingroup$their overall contribution will be 0.15.$\endgroup$
      – Akavall
      Commented Mar 19, 2022 at 19:25
