All Questions
44 questions
33 votes
2 answers
62k views
How to calculate the fold number (k-fold) in cross validation?
I am confused about how to choose the number of folds (k) when I use k-fold cross-validation to evaluate a model. Does it depend on the data size or on other parameters?
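As a rough illustration of how k interacts with the data size, here is a minimal sketch in plain Python. The helper `fold_sizes` is hypothetical (not a library function); it assumes samples are distributed the way scikit-learn's `KFold` documents it, where the first `n_samples % k` folds receive one extra sample.

```python
def fold_sizes(n_samples, k):
    """Return the size of each of the k test folds (hypothetical helper)."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    # The first `extra` folds get one more sample than the rest.
    return [base + 1 if i < extra else base for i in range(k)]


n_samples = 103  # assumed dataset size, for illustration only
for k in (5, 10):
    sizes = fold_sizes(n_samples, k)
    smallest_train = n_samples - max(sizes)
    print(f"k={k}: test fold sizes={sizes}, smallest training set={smallest_train}")
```

A larger k leaves more data for training in each round but requires more model fits; setting k equal to the number of samples gives leave-one-out cross-validation.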
11 votes
1 answer
4k views
Why you shouldn't upsample before cross validation
I have an imbalanced dataset and I am trying different methods to address the data imbalance. I found this article that explains the correct way to cross-validate when oversampling data using SMOTE ...
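The concern in the title can be illustrated with a small sketch in plain Python. It uses naive random duplication as a stand-in for SMOTE (an assumption made to keep the example dependency-free), and all helper names are hypothetical. Duplicating minority samples before splitting lets copies of the same sample land in both the training and the test fold; oversampling only the training part of each fold avoids that.

```python
import random


def kfold_indices(n, k):
    """Yield (train, test) lists of positions for k contiguous folds."""
    base, extra = divmod(n, k)
    start = 0
    for i in range(k):
        stop = start + base + (1 if i < extra else 0)
        yield list(range(0, start)) + list(range(stop, n)), list(range(start, stop))
        start = stop


def oversample(ids, labels, rng):
    """Append random copies of minority ids (label 1) until classes balance."""
    minority = [i for i in ids if labels[i] == 1]
    majority = [i for i in ids if labels[i] == 0]
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return ids + extra


rng = random.Random(0)
labels = [1 if i % 5 == 0 else 0 for i in range(20)]  # toy data: 4 minority samples
ids = list(range(20))

# Oversample first, then split: copies of one sample can straddle train/test.
upsampled = oversample(ids, labels, rng)
leaks_before = sum(
    len({upsampled[p] for p in train} & {upsampled[p] for p in test})
    for train, test in kfold_indices(len(upsampled), 4)
)

# Split first, then oversample only the training part of each fold.
leaks_after = sum(
    len(set(oversample(train, labels, rng)) & set(test))
    for train, test in kfold_indices(len(ids), 4)
)

print("shared samples when oversampling before the split:", leaks_before)
print("shared samples when oversampling inside each fold:", leaks_after)
```

Any sample that appears on both sides of a split makes the validation score look better than it would on genuinely unseen data.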
10 votes
3 answers
11k views
Nested cross-validation and selecting the best regression model - is this the right SKLearn process?
If I understand correctly, nested-CV can help me evaluate what model and hyperparameter tuning process is best. The inner loop (GridSearchCV) finds the best ...
9 votes
1 answer
5k views
What is GridSearchCV doing that takes so long after it finishes evaluating the parameter combinations?
I'm running GridSearchCV to tune some parameters. For example: ...
7 votes
2 answers
6k views
Is there a way of performing stratified cross-validation using the xgboost module in Python?
I am training and predicting on the same dataset, but I want to perform 10-fold cross-validation and predict on the left-out fold, and thus predict on the whole dataset. How can I do this? The ...
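A minimal sketch of the idea in the question (stratified folds, with every sample predicted exactly once while it is held out), in plain Python. The helper `stratified_folds` and the majority-class "model" are hypothetical stand-ins; they are not the xgboost or scikit-learn API.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, k):
    """Deal the indices of each class round-robin into k folds."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for members in by_class.values():
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)
    return folds


def out_of_fold_predictions(labels, k):
    """Predict each sample using a 'model' fitted without that sample's fold."""
    preds = [None] * len(labels)
    for test in stratified_folds(labels, k):
        held_out = set(test)
        train_labels = [y for i, y in enumerate(labels) if i not in held_out]
        # Stand-in model: always predict the most common training label.
        majority = Counter(train_labels).most_common(1)[0][0]
        for i in test:
            preds[i] = majority
    return preds


labels = [0] * 80 + [1] * 20  # toy imbalanced labels, assumed for illustration
folds = stratified_folds(labels, 10)
preds = out_of_fold_predictions(labels, 10)
print([sum(labels[i] for i in fold) for fold in folds])  # positives per fold
print(len(preds), preds.count(None))
```

In scikit-learn, `StratifiedKFold` together with `cross_val_predict` covers the same pattern; whether that fits the asker's xgboost setup depends on details the excerpt does not show.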
6 votes
2 answers
38k views
How to implement Python's MLPClassifier with GridSearchCV?
I am trying to implement Python's MLPClassifier with 10-fold cross-validation using the GridSearchCV function. Here is a chunk of my code: ...
4 votes
2 answers
26k views
Found input variables with inconsistent numbers of samples
I would appreciate it if you could let me know how to resolve this error: Code: ...
4 votes
0 answers
92 views
Does a difference in ROC AUC between cross-validation and the test set indicate overfitting or another problem?
I am training a composite model (XGBoost, Linear Regression, and RandomForest) to predict the probability of people being injured. Well, the results of cross-validation with 5 folds. Well, I can see any problem ...
3 votes
2 answers
1k views
Understanding sklearn's learning_curve
I have been using sklearn's learning_curve, and there are a few questions I have that are not answered by the documentation (see also here and here), as well as questions that are raised by the ...
3 votes
1 answer
4k views
GridSearchCV results differ from a directly applied default model (SVM)
I ran a Support Vector Machine model on part of my training set with the following result: ...
2 votes
2 answers
455 views
Advice and ideas appreciated - machine learning one-man project
I have a project where I am supposed to start from scratch and learn how machine learning works. So far everything is working out better than expected, but I feel as if I am offered too many ways to choose ...
2 votes
1 answer
2k views
Validation curve unlike SKLearn sample
I'm trying to implement the validation curve based on this SKLearn tutorial. On the site, it shows how, depending on the parameters, the model goes from underfitted to overfitted, finding the optimal parameter ...
2 votes
1 answer
1k views
Cross Validation for Different Metrics - Sklearn
When I do cross-validation using Python's sklearn and take the scores for different metrics (accuracy, precision, etc.) like this: ...
2 votes
1 answer
4k views
validation_curve differs from cross_val_score?
I'm trying to see how well a decision tree classifier performs on my input. For this I'm trying to use the validation and learning curves and SKLearn's cross-validation methods. However, they differ, ...
1 vote
3 answers
18k views
Leave-one-out cross-validation using sklearn (multiple CSV files)
I have 52 CSV files in a folder. I want to build a model based on this data. That's why I want to perform leave-one-out cross-validation on these data. How can I do this using scikit-learn in Python? I ...
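One way to read the question is "hold out one file at a time and train on the other 51". A minimal sketch of that split logic in plain Python follows; the file names and the `leave_one_file_out` helper are hypothetical, and loading the CSVs and fitting a model are deliberately left out.

```python
def leave_one_file_out(files):
    """Yield (training_files, held_out_file), holding out each file once."""
    for i, held_out in enumerate(files):
        yield files[:i] + files[i + 1:], held_out


# Hypothetical file names standing in for the 52 CSV files in the folder.
files = [f"data_{i:02d}.csv" for i in range(52)]
splits = list(leave_one_file_out(files))

print(len(splits))  # one split per file
train_files, test_file = splits[0]
print(test_file, len(train_files))
```

If all files are concatenated into one table with a column recording the source file, scikit-learn's `LeaveOneGroupOut` produces the same kind of split with the file as the group.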