I have an imbalanced intrusion-detection dataset with 3,668,045 samples in the attack class and 477 samples in the benign class, and I made a 70:30 train/test split. The task is to predict whether a given node belongs to the attack class or the benign class. As a first step, I trained a decision tree model without applying any balancing technique and evaluated it on the test set with the sklearn metrics; the results are below.
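For reference, this is roughly how the pipeline looks (a simplified sketch, not my exact code; the synthetic data is only a stand-in so the snippet runs, and the split parameters are illustrative):

```python
# Simplified sketch of my current pipeline: 70:30 split, plain decision tree,
# no resampling or class weighting. The synthetic data is a stand-in for my
# intrusion-detection features; in my data 1 = attack, 0 = benign.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(
    n_samples=100_000,
    n_features=20,
    weights=[0.0005, 0.9995],  # roughly mimics my class imbalance
    flip_y=0.0,
    random_state=42,
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)

clf = DecisionTreeClassifier(random_state=42)  # default settings, no class_weight
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
```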
Scores for the decision tree on the test set:

Accuracy: 0.9998991419799247
True positives: 1100391
True negatives: 55
False positives: 86
False negatives: 25
F2-score: 0.9999661949775551
Precision: 0.9999218520696025
Recall: 0.9999772813190648
F1-score: 0.9999495659261946
Log loss: 0.0034835750853569407
Decision Tree: AUROC (ROC curve) = 0.999
Decision Tree: AUPR (precision/recall curve) = 1.000

Classification report:

              precision    recall  f1-score   support

           0       0.69      0.39      0.50       141
           1       1.00      1.00      1.00   1100416

    accuracy                           1.00   1100557
   macro avg       0.84      0.70      0.75   1100557
weighted avg       1.00      1.00      1.00   1100557
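These numbers were computed with the standard sklearn metric functions, roughly as below (again a sketch; y_proba and the confusion-matrix unpacking are illustrative, and note that precision_score, recall_score, f1_score and fbeta_score use their default pos_label=1, so they treat the attack class as the positive class):

```python
# Sketch of my evaluation code (clf, X_test, y_test, y_pred come from the
# snippet above). The probability of the positive class is used for log loss,
# ROC AUC and average precision.
from sklearn.metrics import (
    accuracy_score, average_precision_score, classification_report,
    confusion_matrix, f1_score, fbeta_score, log_loss,
    precision_score, recall_score, roc_auc_score,
)

y_proba = clf.predict_proba(X_test)[:, 1]
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

print("Accuracy:", accuracy_score(y_test, y_pred))
print("TP:", tp, "TN:", tn, "FP:", fp, "FN:", fn)
print("Precision:", precision_score(y_test, y_pred))  # pos_label=1 (attack)
print("Recall:", recall_score(y_test, y_pred))         # pos_label=1 (attack)
print("F1-score:", f1_score(y_test, y_pred))
print("F2-score:", fbeta_score(y_test, y_pred, beta=2))
print("Log loss:", log_loss(y_test, y_proba))
print("AUROC:", roc_auc_score(y_test, y_proba))
print("AUPR:", average_precision_score(y_test, y_proba))
print(classification_report(y_test, y_pred))
```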
Why am I getting high, almost perfect AUROC and AUPR scores even though the precision and recall for my minority (benign) class are so low? What measures can I take so that the results are not biased toward the majority class, and how can I make sure that my model is generalizing well?