All Questions

33 votes
2 answers
62k views

How to calculate the fold number (k-fold) in cross validation?

I am confused about how I choose the number of folds (in k-fold CV) when I apply cross validation to check the model. Is it dependent on data size or other parameters?
User: Taimur Islam
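There is no single correct k. Common defaults are 5 and 10, and scikit-learn's cross-validation helpers use 5 folds by default. The trade-off is that each model trains on a fraction (k-1)/k of the data. A larger k gives training sets closer to the full dataset, but it costs k model fits, and k equal to the number of samples is leave-one-out. The dependency-free sketch below makes that arithmetic concrete. It assumes unshuffled folds sized the way scikit-learn's KFold sizes them, and the function name kfold_indices is hypothetical.

```python
def kfold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for unshuffled k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # The first n_samples % k folds get one extra sample, as in scikit-learn's KFold.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


# For a fixed dataset of 100 samples, compare training-set size and number of fits.
for k in (2, 5, 10, 100):
    train_idx, val_idx = next(kfold_indices(100, k))
    print(f"k={k}: {k} fits, {len(train_idx)} training / {len(val_idx)} validation samples")
```

With 100 samples, k=5 trains each model on 80 samples. k=100 trains on 99 but needs 100 fits.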
11 votes
1 answer
4k views

Why you shouldn't upsample before cross validation

I have an imbalanced dataset and I am trying different methods to address the data imbalance. I found this article that explains the correct way to cross-validate when oversampling data using SMOTE ...
User: sums22
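The usual reason is leakage. If the minority class is oversampled before splitting, copies of a validation sample (or, with SMOTE, points interpolated from it) can end up in the training folds, and that inflates the validation score. The sketch below uses plain random duplication as a simple stand-in for SMOTE, and it resamples only the training part of each fold. The helper names are hypothetical. With imbalanced-learn, the equivalent is to place the sampler inside its Pipeline so it is applied only when fitting on each training fold.

```python
import random


def random_oversample(indices, labels, rng):
    """Duplicate minority-class indices until every class matches the largest one.

    Plain random oversampling, used here only as a simple stand-in for SMOTE.
    """
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    target = max(len(members) for members in by_class.values())
    resampled = []
    for members in by_class.values():
        resampled.extend(members)
        # Draw extra copies with replacement to reach the target count.
        resampled.extend(rng.choices(members, k=target - len(members)))
    return resampled


def oversampled_cv_splits(labels, k, seed=0):
    """Yield (train_idx, val_idx) pairs; only the training part is oversampled."""
    rng = random.Random(seed)
    order = list(range(len(labels)))
    rng.shuffle(order)
    for f in range(k):
        val_idx = order[f::k]  # every k-th shuffled index forms one fold
        val_set = set(val_idx)
        train_idx = [i for i in order if i not in val_set]
        # Resample after splitting, so no copy of a validation sample enters training.
        yield random_oversample(train_idx, labels, rng), val_idx


labels = [0] * 18 + [1] * 6  # 3:1 class imbalance
for train_idx, val_idx in oversampled_cv_splits(labels, k=3):
    positives = sum(labels[i] for i in train_idx)
    print(f"train={len(train_idx)} (positives={positives}), validation={len(val_idx)}")
```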
10 votes
3 answers
11k views

Nested cross-validation and selecting the best regression model - is this the right SKLearn process?

If I understand correctly, nested CV can help me evaluate which model and hyperparameter tuning process is best. The inner loop (GridSearchCV) finds the best ...
User: BobbyJohnsonOG
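In scikit-learn, nested CV is typically written by passing the GridSearchCV object itself as the estimator to cross_val_score, with a separate outer splitter. That way the tuning step only ever sees the outer training folds. The dependency-free sketch below spells out the two loops. The shrunken-mean "model", its lam grid, and all function names are hypothetical placeholders for a real regressor and parameter grid.

```python
import random


def kfold(indices, k):
    """Split a list of indices into k interleaved folds; yield (train, val)."""
    for f in range(k):
        val = indices[f::k]
        val_set = set(val)
        yield [i for i in indices if i not in val_set], val


def fit_predict(train_y, lam):
    """Hypothetical toy model: a shrunken mean, sum(y) / (n + lam)."""
    return sum(train_y) / (len(train_y) + lam)


def cv_mse(y, indices, lam, k):
    """Mean squared error of the toy model over k folds of `indices` only."""
    errors = []
    for train, val in kfold(indices, k):
        pred = fit_predict([y[i] for i in train], lam)
        errors.extend((y[i] - pred) ** 2 for i in val)
    return sum(errors) / len(errors)


def nested_cv(y, grid, outer_k=5, inner_k=3, seed=0):
    """Return (chosen lam, outer-fold MSE) for each outer fold."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    results = []
    for outer_train, outer_test in kfold(idx, outer_k):
        # Inner loop: tune lam using only the outer training fold.
        best_lam = min(grid, key=lambda lam: cv_mse(y, outer_train, lam, inner_k))
        # Outer loop: refit with the chosen lam and score on the untouched outer fold.
        pred = fit_predict([y[i] for i in outer_train], best_lam)
        mse = sum((y[i] - pred) ** 2 for i in outer_test) / len(outer_test)
        results.append((best_lam, mse))
    return results


rng = random.Random(1)
y = [2.0 + rng.gauss(0, 1) for _ in range(40)]  # synthetic targets for illustration
for lam, mse in nested_cv(y, grid=[0.0, 1.0, 5.0, 20.0]):
    print(f"chosen lam={lam}, outer MSE={mse:.3f}")
```

The outer scores estimate the performance of the whole tuning procedure, not of one fixed parameter value. Different outer folds may therefore pick different values of lam.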
9 votes
1 answer
5k views

What is GridSearchCV doing after it finishes evaluating the performance of parameter combinations that takes so long?

I'm running GridSearchCV to tune some parameters. For example: ...
User: Dan Scally
7 votes
2 answers
6k views

Is there a way of performing stratified cross validation using xgboost module in python?

I am training and predicting on the same dataset, but I want to perform 10-fold cross-validation, predict on the left-out fold, and thus get predictions for the whole dataset. How can I do this? The ...
User: Ved Gupta
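One common route is scikit-learn's StratifiedKFold together with cross_val_predict, using xgboost's scikit-learn wrapper XGBClassifier as the estimator. cross_val_predict returns one prediction per sample, and each prediction comes from a model that did not see that sample. The dependency-free sketch below shows the two steps that combination performs. The first is stratified fold assignment, done here with a simple round-robin per class that is not necessarily StratifiedKFold's exact assignment. The second is out-of-fold prediction. The majority-class "model" and the fit/predict callables are hypothetical stand-ins.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, k):
    """Assign each sample a fold id in [0, k) so every fold keeps roughly the class ratio."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    fold_of = [None] * len(labels)
    for members in by_class.values():
        for j, i in enumerate(members):
            fold_of[i] = j % k  # deal each class round-robin across the folds
    return fold_of


def out_of_fold_predictions(labels, k, fit, predict):
    """Train on k-1 folds and predict the held-out fold, so every sample is predicted once."""
    fold_of = stratified_folds(labels, k)
    preds = [None] * len(labels)
    for f in range(k):
        train = [i for i, fold in enumerate(fold_of) if fold != f]
        model = fit(train)
        for i, fold in enumerate(fold_of):
            if fold == f:
                preds[i] = predict(model, i)
    return preds


labels = [0] * 70 + [1] * 30


def fit_majority(train):
    """Hypothetical stand-in model: remember the majority class of the training fold."""
    return Counter(labels[i] for i in train).most_common(1)[0][0]


def predict_majority(model, i):
    return model


oof = out_of_fold_predictions(labels, 10, fit_majority, predict_majority)
print(Counter(oof))
```

With 70 negatives, 30 positives, and k=10, every fold holds 7 negatives and 3 positives.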
6 votes
2 answers
38k views

How to implement Python's MLPClassifier with GridSearchCV?

I am trying to implement Python's MLPClassifier with 10-fold cross-validation using the GridSearchCV function. Here is a chunk of my code: ...
User: zx mnb
4 votes
2 answers
26k views

Found input variables with inconsistent numbers of samples

I would appreciate it if you could let me know how to resolve this error: Code: ...
User: ebrahimi
4 votes
0 answers
92 views

Does a difference in ROC AUC between cross-validation and the test set indicate overfitting or another problem?

I am training a composite model (XGBoost, Linear Regression, and RandomForest) to predict the probability of people being injured. Here are the results of 5-fold cross-validation. Well, I can see any problem ...
User: GregOliveira
3 votes
2 answers
1k views

Understanding sklearn's learning_curve

I have been using sklearn's learning_curve, and I have a few questions that the documentation does not answer (see also here and here), as well as questions that are raised by the ...
User: Abijah
3 votes
1 answer
4k views

GridSearchCV results are different to directly applied default model (SVM)

I ran a Support Vector Machine model on part of my training set with the following result: ...
User: Mateusz Konopelski
2 votes
2 answers
455 views

Advice and Ideas appreciated - Machine Learning one man project

I have a project where I am supposed to start from scratch and learn how machine learning works. So far everything is working out better than expected, but I feel as if I am offered too many ways to choose ...
User: CRoNiC
2 votes
1 answer
2k views

Validation curve unlike SKLearn sample

I'm trying to implement the validation curve based on this SKLearn tutorial. The site shows how, depending on the parameter value, the model goes from underfitted to overfitted, finding the optimal parameter ...
User: lte__ (reputation 1,379)
2 votes
1 answer
1k views

Cross Validation for Different Metrics - Sklearn

When I do cross-validation using Python's sklearn and take the scores of different metrics (accuracy, precision, etc.) like this: ...
User: Akhmad Zaki
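If each metric is computed with a separate cross_val_score call, the model is refit for every metric. scikit-learn's cross_validate instead accepts a list of scorer names in its scoring parameter (for example ['accuracy', 'precision', 'recall']) and returns every metric from the same fits. The sketch below shows the underlying idea in plain Python: compute all metrics from one set of held-out predictions per fold, then average each metric across folds. The fold data are made up for illustration. Undefined precision or recall is treated as 0.0 here.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision and recall computed from one set of held-out predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def average_over_folds(fold_results):
    """Mean of each metric across folds."""
    keys = fold_results[0].keys()
    return {key: sum(r[key] for r in fold_results) / len(fold_results) for key in keys}


# Hypothetical (true labels, predictions) for two held-out folds.
folds = [([1, 0, 1, 1], [1, 0, 0, 1]), ([0, 0, 1, 0], [0, 1, 1, 0])]
per_fold = [binary_metrics(t, p) for t, p in folds]
print(average_over_folds(per_fold))
```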
2 votes
1 answer
4k views

validation_curve differs from cross_val_score?

I'm trying to see how well a decision tree classifier performs on my input. For this I'm trying to use the validation and learning curves and SKLearn's cross-validation methods. However, they differ, ...
User: lte__ (reputation 1,379)
1 vote
3 answers
18k views

Leave-one-out cross-validation using sklearn (multiple CSV files)

I have 52 CSV files in a folder. I want to build a model based on this data, so I want to apply leave-one-out cross-validation to it. How can I do this using scikit-learn in Python? I ...
User: Bloodstone Programmer
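This assumes "leave one out" here means holding out one whole file per fold, which gives 52 folds for 52 files. Under that reading, scikit-learn's LeaveOneGroupOut covers the case when every row is given a groups label naming its source file. The sketch below does the same split in plain Python. The file names and rows are hypothetical and are built in memory rather than read from disk.

```python
def leave_one_file_out(files):
    """Yield (held_out_name, train_rows, test_rows), holding out one whole file per fold.

    `files` maps a file name to its list of rows. Here it is built in memory; in
    practice each entry could come from reading one CSV file.
    """
    for held_out in sorted(files):
        train_rows = [row for name in sorted(files) if name != held_out for row in files[name]]
        yield held_out, train_rows, list(files[held_out])


# Hypothetical stand-in for 3 of the 52 CSV files: rows of (feature, label).
files = {
    "subject_01.csv": [(0.1, 0), (0.4, 1)],
    "subject_02.csv": [(0.3, 0)],
    "subject_03.csv": [(0.9, 1), (0.8, 1), (0.2, 0)],
}
for name, train_rows, test_rows in leave_one_file_out(files):
    print(name, len(train_rows), len(test_rows))
```

Holding out whole files keeps rows from the same file from appearing on both sides of a split.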
