All Questions

1 vote
2 answers
734 views

Ways to increase recall in SVM

I am training an SVM on UCI's Bank Marketing Data Set, using bank-additional-full.csv. As the data is skewed, I am also interested in recall. I am getting accuracy of about 87.95% but my recall is around ...
6 votes
2 answers
3k views

How to deal with missing data for Bernoulli Naive Bayes?

I am dealing with a dataset of categorical data that looks like this: ...
1 vote
1 answer
1k views

Adding machine learning classifier at the end of CNN layer

I wanted to use the CNN as a feature extractor for my images and then feed these features to some machine learning classifiers such as SVM, decision tree and KNN. However, when I tried SVM I got ...
0 votes
1 answer
721 views

Which classification_report metrics are appropriate to report/interpret for a binary label? Individual or macro average for both classes? scikit-learn

First, please forgive my ignorance; I am a newbie but dedicated to learning more. Example: I am using a random forest classifier to predict a binary outcome. The binary outcome equals 1 if people ...
0 votes
0 answers
273 views

Correct method to report Randomized Search CV results

I have searched online but I still cannot find a definitive answer on how to "correctly" report the results from hyperparameter tuning a machine learning model; though, this may just be some ...
1 vote
2 answers
257 views

Why do we need hyperparameter tuning in scikit-learn? Don't scikit-learn models give the best model by default?

When I have the option to build a classifier directly like this: clf = RandomForestClassifier(), why do we perform tuning by restricting the parameters like this <...
0 votes
0 answers
214 views

SVM taking too much time to train

I'm trying to train my ML model with svm.SVC from sklearn, but it takes so long that it doesn't finish training even once. This happens only when a kernel function is used. Currently I have selected 10 features ...
1 vote
1 answer
72 views

The sklearn train_test_split function creates training data and test data which are not similar

I am working on loan default data and my model is not able to make accurate predictions on the test set because the default percentage on the test set is very different from that of the training ...
0 votes
2 answers
76 views

Which random_state to use in train_test_split when deploying final model?

I have developed a Random Forest that gives varying results depending on the random state of the test train split. This is normal, because a lot of the values in the data are extreme, without being ...
1 vote
2 answers
375 views

How to remove test set so that model uses all data as training data?

I have developed a RandomForest classification model and I am pretty satisfied with the results on the test set. Now, my next step is to deploy the model. Before ...
4 votes
2 answers
2k views

Flipping the labels in a binary classification gives different model and results

I have an imbalanced dataset and I want to train a binary classifier to model the dataset. Here was my approach, which resulted in (relatively) acceptable performance: 1- I made a random split to get ...
0 votes
0 answers
52 views

How to improve validation score

I am working on time series classification. My data has 4 classes. I used this paper's architecture on my data (1611.06455). However, my results look like this : . Here is a link to my notebook I ...
1 vote
1 answer
193 views

Feature Engineer each class separately in Binary Classification

I have an imbalanced tabular dataset for a binary classification problem. The dataset is strongly imbalanced, so I have performed oversampling, but it did not solve the issue; you can find the ...
1 vote
0 answers
299 views

Low F1-Score due to Imbalanced Dataset even after resampling

I am performing a binary classification over an imbalanced dataset (class 0: 16,263; class 1: 214). I have used multiple oversampling, undersampling, and combination techniques, below are the results that I have ...
1 vote
0 answers
26 views

Label Encoding for Target Classes: Any Integer or Consecutive Integers from Zero?

I'm handling a very conventional supervised classification task with three (mutually exclusive) target categories (not ordinal ones): ...
