Questions tagged [scikit-learn]

scikit-learn is a popular machine learning package for Python that has simple and efficient tools for predictive data analysis. Topics include classification, regression, clustering, dimensionality reduction, model selection, and preprocessing.

2 votes
0 answers
9 views

Low Accuracy from Geospatial Random Forest ML modeling problem - Training Exported from QGIS, SCP

I am doing a geospatial assessment integrated with ML modeling. The problem is the very low accuracy percentage: as the number of training features increases, it gets lower. What could be the solution to such a ...
Reem's user avatar
0 votes
0 answers
12 views

Isolation Forest sample size

I am using sklearn's Isolation Forest as a model to detect anomalies. My dataset is relatively small, 50 records with only 2-3 features. To prevent any overfitting, what would you recommend to tune ...
Mar's user avatar
  • 85
4 votes
0 answers
23 views

Trying to train ML model to do regression for US Department of Transportation Kaggle Flights Dataset with 5 million records and 7 features

For a college project for my data science course I am trying to fit a model based on the U.S. DOT's 2015 Kaggle Flight Cancellations dataset, but am not having great luck with model performance (MSE ...
Jake Malis's user avatar
4 votes
1 answer
54 views

Unsupervised Isolation Forest sklearn hyperparameters

I am using sklearn's IsolationForest for an unsupervised anomaly detection task. According to the docs, https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html, there are ...
Mar's user avatar
  • 85
3 votes
1 answer
37 views

Confirm understanding of decision_function in Isolation Forest sklearn

I am looking to better understand sklearn IsolationForest decision_function. My understanding is that if the metric is closer to -1 then the model is more confident ...
Mar's user avatar
  • 85
1 vote
0 answers
19 views

How to correctly use RFECV for feature selection in a Scikit-Learn pipeline with a Simple Decision Tree?

I am working on the Kaggle House Price Prediction competition and have built a Scikit-Learn pipeline that includes: Preprocessing (handling missing values, scaling, encoding), Feature Engineering ...
Jake Ferris's user avatar
-1 votes
0 answers
37 views

ML model for Career Prediction

I am NOT able to figure out how to make an ML model. I have been using ChatGPT for most of it and, as for understanding the code, I'm doing next to nothing. No matter what code I input, the accuracy is always 0%... ...
Ananya Vijay's user avatar
2 votes
0 answers
20 views

Preprocessing multivalue attributes in a dataframe, similar to Nominal

Description: Input is a CSV file. The CSV file contains columns of different data types: ordinal values, nominal values, numerical values, and multi-value. For the multivalue columns: minimum is 1, ...
DILF Unboxing's user avatar
1 vote
0 answers
32 views

Predicting PGA Tour results with Linear Regression

I have curated a dataset from various online sources that contains information about each PGA player's weekly performance/trends. I'm attempting to predict their finishing positions at the next ...
racurry1993's user avatar
2 votes
0 answers
36 views

Determine best hyperparameters in GridSearch - Isolation Forest

I have implemented an Isolation Forest algorithm for anomaly detection (unsupervised learning), where I divided my dataset into 1000 subsets, and for each subset, there is one isolation tree. This ...
Learner's user avatar
2 votes
1 answer
44 views

I can't get my R² above 70%

I tried RandomForest, LGBM, KNeighbors, and Polynomial Regression as algorithms, along with cross-validation, train-test split, and a standard scaler, but nothing seems to get it past the 70% mark. The dataframe has ...
user178825's user avatar
0 votes
0 answers
25 views

Agglomerative clustering classifies 98% of my data in 1 cluster. Why?

I have a JSD distance matrix that I'm trying to cluster. When generating 24 clusters (roughly the number that shows up on the clustermap), it assigns the vast majority of the data to 1 cluster. Weirdly ...
youtube's user avatar
1 vote
0 answers
40 views

OneClassSVM super slow training with poly kernel

In contrast to questions like here, where a slow SVM training results from a high number of samples, I only have around 500 samples. Still, a single training fold (cross-validation) takes several ...
UserPo41085's user avatar
3 votes
1 answer
31 views

Looking to replace missing time series values with values from a competitor that's correlated

I have a dataset of a retailer that has the following attributes: Date, Hour, Enters, Exits. I have another dataset with the same attributes of a competitor that is correlated with the original dataset ...
utink's user avatar
1 vote
1 answer
35 views

scipy bootstrap generates input with inconsistent numbers of samples

I have a dataset of 77 samples, and I am using scipy bootstrap to get a confidence interval to estimate the precision. I am baffled to see that it generates input variables with inconsistent numbers ...
Wouter De Coster's user avatar
