Questions tagged [scikit-learn]

scikit-learn is a popular machine learning package for Python that has simple and efficient tools for predictive data analysis. Topics include classification, regression, clustering, dimensionality reduction, model selection, and preprocessing.

2,303 questions

2 votes, 0 answers, 9 views

Low Accuracy from Geospatial Random Forest ML modeling problem - Training Exported from QGIS, SCP

I am doing a geospatial assessment integrated with ML modeling. The problem is the very low accuracy, which gets lower as the number of training features increases. What could be the solution to such a ...

Reem

asked Apr 21 at 18:45

0 votes, 0 answers, 12 views

Isolation Forest sample size

I am using sklearn's Isolation Forest as a model to detect anomalies. My dataset is relatively small, 50 records with only 2-3 features. To prevent any overfitting, what would you recommend to tune ...

Mar

asked Apr 21 at 18:28
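For the sample-size question above, a minimal sketch of how `max_samples` behaves on a dataset of this size may help frame the discussion. The data here is synthetic (50 rows, 3 features, invented for illustration), and the value 32 is an arbitrary example, not a recommendation. In scikit-learn, `max_samples="auto"` resolves to `min(256, n_samples)`, so on 50 records every tree is built from all rows unless a smaller value is set.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic stand-in for the asker's data: 50 records, 3 features (assumed values).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))

# max_samples="auto" resolves to min(256, n_samples): each tree uses 50 rows here.
auto_model = IsolationForest(n_estimators=100, max_samples="auto", random_state=0).fit(X)

# An integer max_samples draws that many rows per tree; 32 is only an illustrative value.
sub_model = IsolationForest(n_estimators=100, max_samples=32, random_state=0).fit(X)

# The fitted attribute max_samples_ reports the number of rows actually used per tree.
print(auto_model.max_samples_, sub_model.max_samples_)
```

Whether sub-sampling helps on such a small dataset is something to check empirically, for example by comparing how stable the flagged points are across several `random_state` values.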

4 votes, 0 answers, 23 views

Trying to train ML model to do regression for US Department of Transportation Kaggle Flights Dataset with 5 million records and 7 features

For a college project for my data science course I am trying to fit a model based on the U.S. DOT's 2015 Kaggle Flight Cancellations dataset, but am not having great luck with model performance (MSE ...

Jake Malis

asked Apr 19 at 0:36

4 votes, 1 answer, 54 views

Unsupervised Isolation Forest sklearn hyperparameters

I am using sklearn's IsolationForest for an unsupervised anomaly detection task. According to the docs, https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html, there are ...

Mar

asked Apr 17 at 17:33

3 votes, 1 answer, 37 views

Confirm understanding of decision_function in Isolation Forest sklearn

I am looking to better understand sklearn IsolationForest decision_function. My understanding is that if the metric is closer to -1 then the model is more confident ...

Mar

asked Apr 16 at 13:37
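As a hedged illustration for the `decision_function` question above: in scikit-learn, `decision_function` returns `score_samples(X) - offset_`, and `predict` labels a point `-1` (outlier) when that value is below zero and `+1` otherwise. So the `-1` is the predicted label, while the decision value itself is a continuous score where lower means more anomalous. The sketch below uses synthetic data (invented for illustration) to show the relationship.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data: a Gaussian blob plus two points placed far away (assumed values).
rng = np.random.default_rng(0)
X_inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
X_far = np.array([[8.0, 8.0], [-9.0, 7.5]])
X = np.vstack([X_inliers, X_far])

model = IsolationForest(random_state=0).fit(X)

scores = model.decision_function(X)   # continuous; lower means more anomalous
labels = model.predict(X)             # -1 for outliers, +1 for inliers

# decision_function is score_samples shifted by the fitted offset_.
shifted = model.score_samples(X) - model.offset_

# predict flags a point as an outlier exactly when its decision value is negative.
labels_from_scores = np.where(scores < 0, -1, 1)

print(scores[-2:], labels[-2:])
```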

1 vote, 0 answers, 19 views

How to correctly use RFECV for feature selection in a Scikit-Learn pipeline with a Simple Decision Tree?

I am working on the Kaggle House Price Prediction competition and have built a Scikit-Learn pipeline that includes: Preprocessing (handling missing values, scaling, encoding); Feature Engineering ...

Jake Ferris

asked Apr 15 at 2:27
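For the RFECV question above, one common arrangement is to place `RFECV` as a step inside the `Pipeline`, so that feature elimination is refit on each training split along with the preprocessing. The following is a minimal sketch under stated assumptions: the data is synthetic (`make_regression`), a single `StandardScaler` stands in for the asker's preprocessing, and the tree depth and scoring are illustrative choices, not the asker's settings.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFECV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data standing in for the house-price features (assumption).
X, y = make_regression(n_samples=200, n_features=10, n_informative=4,
                       noise=5.0, random_state=0)

pipe = Pipeline([
    # Placeholder for the real preprocessing (imputation, encoding, scaling).
    ("scale", StandardScaler()),
    # RFECV needs an estimator exposing coef_ or feature_importances_;
    # a decision tree provides feature_importances_.
    ("select", RFECV(DecisionTreeRegressor(max_depth=4, random_state=0),
                     step=1, cv=5, scoring="neg_mean_squared_error")),
    # Final model trained only on the features RFECV kept.
    ("model", DecisionTreeRegressor(max_depth=4, random_state=0)),
])
pipe.fit(X, y)

selector = pipe.named_steps["select"]
print(selector.n_features_, selector.support_)
```

The boolean mask in `selector.support_` refers to the columns as they arrive at the `select` step, so with encoders upstream it maps to the transformed columns, not the original dataframe columns.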

-1 votes, 0 answers, 37 views

ML model for Career Prediction

I am NOT able to figure out how to make an ML model. I have been using ChatGPT for most of it, and apart from understanding the code, I'm doing next to nothing myself. No matter what code I input, the accuracy is always 0%... ...

Ananya Vijay

asked Apr 9 at 10:27

2 votes, 0 answers, 20 views

Preprocessing multivalue attributes in a dataframe, similar to Nominal

Description: Input is a CSV file. The CSV file contains columns of different data types: ordinal values, nominal values, numerical values, and multi-value. For the multi-value columns, the minimum is 1, ...

DILF Unboxing

asked Apr 2 at 8:56
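For the multi-value preprocessing question above, one possible approach is scikit-learn's `MultiLabelBinarizer`, which produces one indicator column per distinct value, much like one-hot encoding for nominal columns. This is only a sketch: the column names (`id`, `tags`), the values, and the `;` delimiter are all hypothetical, since the excerpt does not show the actual CSV layout.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical frame: "tags" is a multi-value column with ';' as an assumed delimiter.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "tags": ["red;blue", "green", "blue;green;red"],
})

# Turn each cell into a list of values.
tag_lists = df["tags"].str.split(";")

# One 0/1 indicator column per distinct value; classes_ holds the sorted value names.
mlb = MultiLabelBinarizer()
encoded = pd.DataFrame(mlb.fit_transform(tag_lists),
                       columns=mlb.classes_, index=df.index)

# Replace the raw multi-value column with its indicator columns.
result = pd.concat([df.drop(columns="tags"), encoded], axis=1)
print(result)
```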

1 vote, 0 answers, 32 views

Predicting PGA Tour results with Linear Regression

I have curated a dataset from various online sources that contains information about each PGA player's weekly performance/trends. I'm attempting to predict their finishing positions at the next ...

racurry1993

asked Mar 20 at 18:47

2 votes, 0 answers, 36 views

Determine best hyperparameters in GridSearch - Isolation Forest

I have implemented an Isolation Forest algorithm for anomaly detection (unsupervised learning), where I divided my dataset into 1000 subsets, and for each subset, there is one isolation tree. This ...

Learner

asked Mar 16 at 11:44

2 votes, 1 answer, 44 views

I can't get my R² above 70%

I tried RandomForest, LGBM, KNeighbors, and Polynomial Regression as algorithms, along with cross-validation, train-test split, and a standard scaler; nothing seems to get it past the 70% mark. The dataframe has ...

user178825

asked Mar 7 at 0:22

0 votes, 0 answers, 25 views

Agglomerative clustering classifies 98% of my data in 1 cluster. Why?

I have a JSD distance matrix that I'm trying to cluster. When generating 24 clusters (roughly the number that shows up on the clustermap), it assigns the vast majority of the data to 1 cluster. Weirdly ...

youtube

asked Feb 28 at 9:19

1 vote, 0 answers, 40 views

OneClassSVM super slow training with poly kernel

In contrast to questions like here, where a slow SVM training results from a high number of samples, I only have around 500 samples. Still, a single training fold (cross-validation) takes several ...

UserPo41085

asked Feb 8 at 20:46

3 votes, 1 answer, 31 views

Looking to replace missing time series values with values from a competitor that's correlated

I have a dataset of a retailer that has the following attributes: Date, Hour, Enters, Exits. I have another dataset with the same attributes for a competitor that is correlated with the original dataset ...

utink

asked Jan 28 at 3:23

1 vote, 1 answer, 35 views

scipy bootstrap generates input with inconsistent numbers of samples

I have a dataset of 77 samples, and I am using scipy bootstrap to get a confidence interval to estimate the precision. I am baffled to see that it generates input variables with inconsistent numbers ...

Wouter De Coster

asked Jan 24 at 9:55
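For the scipy bootstrap question above, one plausible cause (an assumption, since the excerpt is truncated) is that the true labels and predictions are passed as independent samples. With the default `method="BCa"`, `scipy.stats.bootstrap` then jackknifes each sample separately, so the statistic can receive arrays of length n-1 and n. Passing `paired=True` keeps the two arrays aligned, and `vectorized=False` lets a plain scikit-learn metric be used as the statistic. The data below is synthetic and only mirrors the 77-sample size mentioned in the question.

```python
import numpy as np
from scipy.stats import bootstrap
from sklearn.metrics import precision_score

# Synthetic labels: 77 samples, predictions agreeing with the truth about 80% of the time.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=77)
flip = rng.random(77) < 0.2
y_pred = np.where(flip, 1 - y_true, y_true)

res = bootstrap(
    (y_true, y_pred),
    precision_score,     # called as precision_score(y_true_resampled, y_pred_resampled)
    paired=True,         # resample the same indices from both arrays
    vectorized=False,    # the metric has no axis argument, so scipy loops over resamples
    n_resamples=999,
    random_state=0,
)

ci = res.confidence_interval
print(ci.low, ci.high)
```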
