Questions tagged [scikit-learn]
scikit-learn is a popular machine learning package for Python that has simple and efficient tools for predictive data analysis. Topics include classification, regression, clustering, dimensionality reduction, model selection, and preprocessing.
2,303 questions
2 votes
0 answers
9 views
Low Accuracy from Geospatial Random Forest ML modeling problem - Training Exported from QGIS, SCP
I am doing a geospatial assessment integrated with ML modeling. The problem is very low accuracy: as the number of training features increases, it gets lower. What could be the solution to such a ...
0 votes
0 answers
12 views
Isolation Forest sample size
I am using sklearn's Isolation Forest as a model to detect anomalies. My dataset is relatively small, 50 records with only 2-3 features. To prevent any overfitting, what would you recommend to tune ...
4 votes
0 answers
23 views
Trying to train ML model to do regression for US Department of Transportation Kaggle Flights Dataset with 5 million records and 7 features
For a college project for my data science course I am trying to fit a model based on the U.S. DOT's 2015 Kaggle Flight Cancellations dataset, but am not having great luck with model performance (MSE ...
4 votes
1 answer
54 views
Unsupervised Isolation Forest sklearn hyperparameters
I am using sklearn's IsolationForest for an unsupervised anomaly detection task. According to the docs, https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html, there are ...
3 votes
1 answer
37 views
Confirm understanding of decision_function in Isolation Forest sklearn
I am looking to better understand sklearn IsolationForest decision_function. My understanding is that if the metric is closer to -1 then the model is more confident ...
1 vote
0 answers
19 views
How to correctly use RFECV for feature selection in a Scikit-Learn pipeline with a Simple Decision Tree?
I am working on the Kaggle House Price Prediction competition and have built a Scikit-Learn pipeline that includes: Preprocessing (handling missing values, scaling, encoding) Feature Engineering ...
-1 votes
0 answers
37 views
ML model for Career Prediction
I am NOT able to figure out how to make an ML model. I have been using ChatGPT for most of it and, apart from understanding the code, I'm doing next to nothing. No matter what code I input, the accuracy is always 0%... ...
2 votes
0 answers
20 views
Preprocessing multivalue attributes in a dataframe, similar to Nominal
Description: Input is a CSV file. The CSV file contains columns of different data types: ordinal values, nominal values, numerical values, and multi-value. For the multivalue columns: minimum is 1, ...
1 vote
0 answers
32 views
Predicting PGA Tour results with Linear Regression
I have curated a dataset from various online sources that contains information about each PGA player's weekly performance/trends. I'm attempting to predict their finishing positions at the next ...
2 votes
0 answers
36 views
Determine best hyperparameters in GridSearch - Isolation Forest
I have implemented an Isolation Forest algorithm for anomaly detection (unsupervised learning), where I divided my dataset into 1000 subsets, and for each subset, there is one isolation tree. This ...
2 votes
1 answer
44 views
I can't get my R² above 70%
I tried RandomForest, LGBM, KNeighbors, and Polynomial Regression as algorithms, along with cross-validation, train-test split, and standard scaler; nothing seems to get it past the 70% mark. The dataframe has ...
0 votes
0 answers
25 views
Agglomerative clustering classifies 98% of my data in 1 cluster. Why?
I have a JSD distance matrix that I'm trying to cluster. When generating 24 clusters (roughly the number that shows up on the clustermap), it assigns the vast majority of the data to 1 cluster. Weirdly ...
1 vote
0 answers
40 views
OneClassSVM super slow training with poly kernel
In contrast to questions like here, where a slow SVM training results from a high number of samples, I only have around 500 samples. Still, a single training fold (cross-validation) takes several ...
3 votes
1 answer
31 views
Looking to replace missing time series values with values from a competitor that's correlated
I have a dataset of a retailer that has the following attributes: Date, Hour, Enters, Exits. I have another dataset with the same attributes for a competitor that is correlated with the original dataset ...
1 vote
1 answer
35 views
scipy bootstrap generates input with inconsistent numbers of samples
I have a dataset of 77 samples, and I am using scipy bootstrap to get a confidence interval to estimate the precision. I am baffled to see that it generates input variables with inconsistent numbers ...