
Questions tagged [cross-validation]

Refers to general procedures that attempt to determine the generalizability of a statistical result. Cross-validation arises frequently in the context of assessing how well a particular model fit predicts future observations. Methods for cross-validation usually involve withholding a random subset of the data during model fitting, quantifying how accurately the withheld data are predicted, and repeating this process to get a measure of prediction accuracy.
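The hold-out-and-repeat procedure described above is exactly what k-fold cross-validation automates. A minimal sketch with scikit-learn, using synthetic data as a stand-in for a real dataset:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression data (assumption: stands in for any real dataset).
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)

# 5-fold CV: each fold is withheld once while the model is fit on the
# remaining folds, and the withheld fold is scored; the mean of the
# per-fold scores estimates out-of-sample prediction accuracy.
scores = cross_val_score(Ridge(), X, y,
                         cv=KFold(n_splits=5, shuffle=True, random_state=0),
                         scoring="r2")
print(scores.mean())
```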

2 votes · 1 answer · 77 views

400-instance dataset with XGBoost: is my model overfitting?

I'm working on a regression problem with 400 samples and 7 features, to predict job durations of machinery from historical data. I'm using XGBoost, and a (90, 10) split works better than an (80, 20) split. Is ...
asked by barcamela
1 vote · 0 answers · 28 views

Choosing the number of features via cross-validation

I have an algorithm that trains a binary predictive model for a specified number of features from the dataset (the features are all of the same type, but not all important). Thus, the number of features ...
asked by Roger V.
1 vote · 0 answers · 29 views

Can't understand the evaluation approach used in this paper

In this paper, two deep learning models were proposed: Hybrid-AttUnet++ and EH-AttUnet++. The first model, Hybrid-AttUnet++, is simply a modified U-net model, and the second model is an ensemble ...
asked by AAA_11
0 votes · 0 answers · 15 views

Error in plotting Gaussian Process for 3 models that use Bayesian Optimization

I'm writing a Python script for Orange Data Mining to plot the Gaussian processes in order to find the best hyperparameters for the 5-fold cross-validation accuracy metric. The three models are SVC, ...
asked by Mattma
0 votes · 0 answers · 33 views

How to properly implement Random Undersampling during Cross-Validation in Orange

I am working on a highly imbalanced fraud detection dataset (class 0: 284,315 instances; class 1: 492 instances) and trying to implement random undersampling correctly during cross-validation in Orange. ...
asked by Mattma
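The usual pitfall the question above alludes to is undersampling before splitting, which leaks information across folds. A sketch of the correct ordering in plain scikit-learn (toy data as a hypothetical stand-in for the fraud dataset; Orange's own widgets are not used here):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
# Toy imbalanced data: ~5% positives (assumption, not the real dataset).
X = rng.normal(size=(2000, 5))
y = (rng.random(2000) < 0.05).astype(int)
X[y == 1] += 1.5  # make the minority class learnable

scores = []
for train_idx, test_idx in StratifiedKFold(
        n_splits=5, shuffle=True, random_state=0).split(X, y):
    # Undersample the majority class *inside* the training fold only,
    # so every test fold keeps the original class distribution.
    tr_pos = train_idx[y[train_idx] == 1]
    tr_neg = rng.choice(train_idx[y[train_idx] == 0],
                        size=len(tr_pos), replace=False)
    tr = np.concatenate([tr_pos, tr_neg])
    clf = LogisticRegression().fit(X[tr], y[tr])
    scores.append(f1_score(y[test_idx], clf.predict(X[test_idx])))
print(float(np.mean(scores)))
```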
3 votes · 1 answer · 73 views

Is it statistically wrong to adjust for sex and race and then do subgroup analysis based on them?

I am doing a subgroup analysis of early mortality (outcome) based on transfusion (WITH ADJUSTMENT for both ...
asked by Mohamed Rahouma
3 votes · 1 answer · 140 views

Need advice regarding cross-validation to obtain optimal lambda in Lasso

I am comparatively new to machine learning, and any suggestions and coding corrections will be a great help. I am using Lasso for feature selection and want to select the lambda that produces the ...
asked by h_ihkam
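Selecting the Lasso penalty by cross-validation is built into scikit-learn's `LassoCV` (which calls the penalty `alpha` rather than lambda). A minimal sketch on synthetic data:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: 30 features, only 5 informative (assumption for illustration).
X, y = make_regression(n_samples=200, n_features=30, n_informative=5,
                       noise=5.0, random_state=0)

# LassoCV fits a path of alphas and picks the one with the best
# 5-fold cross-validated score; scaling first is standard practice.
model = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0))
model.fit(X, y)
print(model.named_steps["lassocv"].alpha_)
```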
7 votes · 1 answer · 141 views

Nested cross-validation pipeline and confidence intervals

I'm hoping someone can help me think through this. I've come across a lot of different resources on nested-cv, but I think I'm confused as to how to go about model selection and the appropriate ...
asked by molecularrunner
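The core of nested cross-validation is an inner loop that tunes hyperparameters and an outer loop that estimates the performance of the whole tuning procedure. A sketch with scikit-learn (the dataset and parameter grid are illustrative choices, not from the question):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Inner loop: GridSearchCV re-tunes C on each outer training fold.
inner = GridSearchCV(make_pipeline(StandardScaler(), SVC()),
                     {"svc__C": [0.1, 1, 10]}, cv=3)
# Outer loop: scores the tuned model on folds never seen during tuning,
# giving a near-unbiased estimate of generalization performance.
outer_scores = cross_val_score(inner, X, y, cv=5)
print(outer_scores.mean())
```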
3 votes · 1 answer · 51 views

When I use linear regression in machine learning, is variable selection the same as choosing tuning parameters?

I am a newbie in machine learning. After days of studying the ideas of machine learning, I have made some conclusions, which are below (I only consider supervised learning). Step 1: Data splitting ...
asked by Student coding
0 votes · 0 answers · 23 views

Is this a good way to use a separate validation set with k-fold cross-validation?

I am training a CNN, and I divided the dataset into 70% training set, 20% validation set, and 10% test set. What I want is to use this validation set for early stopping to avoid overfitting the model ...
asked by AAA_11
1 vote · 0 answers · 88 views

XGBoost CV confusion on how to choose eval set

If I am using XGBoost with GridSearchCV, how should I choose my evaluation set? Note, I am referring to eval_set within the model params. My current implementation is using GridSearchCV in order to ...
asked by user54565
1 vote · 0 answers · 47 views

What is the standard ML pipeline for training and testing? [closed]

I have a dataframe containing 1324 rows and 28 columns and I'm kinda lost on which approach to go for when training regression models. Currently I perform a data split and run GridSearchCV to pick the ...
asked by Davi Magalhães
1 vote · 1 answer · 18 views

CV-kNN performing worse than kNN

I have been writing some code which compares some basic classifiers. Just wondering if CV-kNN can perform worse than regular kNN when checking performance on test data? We train the models using a ...
asked by scruby
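It can: cross-validation picks the k with the best average validation score, which is not guaranteed to win on any one small test set. A sketch comparing a fixed-k kNN against a CV-tuned one (dataset and grid are illustrative assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Fixed default k versus k chosen by 5-fold CV on the training set.
plain = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
tuned = GridSearchCV(KNeighborsClassifier(),
                     {"n_neighbors": list(range(1, 21))},
                     cv=5).fit(X_tr, y_tr)
# On a small held-out set, either model can come out ahead by chance.
print(plain.score(X_te, y_te), tuned.score(X_te, y_te))
```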
1 vote · 0 answers · 102 views

Confused about use of random states for training models in scikit

I am new to ML and currently working on improving the accuracy of an MLPClassifier in scikit. My code looks like so ...
asked by Leandro
1 vote · 1 answer · 35 views

XGBoost: find hyperparameters and then cross-validation

I want to train an XGBoost model, and here's how I believe the process should go: Step 1: Find the optimal hyperparameters using GridSearchCV. Step 2: Evaluate the selected parameters. My question is: ...
asked by Math_D
