Questions tagged [outlier]
For questions regarding outliers or unusual points in the data.
228 questions
1 vote
0 answers
18 views
Rolling z-score and normalizing
I am using a rolling window z-score method to flag if a record is an outlier. Is it necessary to first normalize the values of the desired feature before computing the rolling z-score?
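A minimal sketch of the situation described above, assuming the normalization in question is a linear rescaling such as min-max scaling. The data, the window size, and the helper names `rolling_zscores` and `min_max` are all hypothetical. A z-score already subtracts the window mean and divides by the window standard deviation, so a positive linear rescaling of the feature should leave the rolling z-scores unchanged:

```python
from math import isclose
from statistics import mean, stdev


def rolling_zscores(values, window):
    """Z-score of each point against the `window` points that precede it."""
    scores = []
    for i, x in enumerate(values):
        if i < window:
            scores.append(None)  # not enough history yet
            continue
        past = values[i - window:i]
        mu, sigma = mean(past), stdev(past)
        # A flat window has no spread, so no z-score is defined.
        scores.append((x - mu) / sigma if sigma > 0 else None)
    return scores


def min_max(values):
    """Linearly rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical series with one obvious spike at index 5.
data = [10.0, 11.0, 9.5, 10.5, 10.2, 30.0, 10.1, 9.9]

raw_scores = rolling_zscores(data, window=4)
scaled_scores = rolling_zscores(min_max(data), window=4)

# The two sets of scores agree up to floating-point error.
same = all(
    (a is None and b is None) or isclose(a, b, rel_tol=1e-9, abs_tol=1e-9)
    for a, b in zip(raw_scores, scaled_scores)
)
print(same)
```

This only covers linear rescaling. A nonlinear transform such as a log would change the scores.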
0 votes
0 answers
11 views
Anomaly detection in time series for drops
I am looking into different statistical methods for determining a decrease in a numeric "count" feature across a time-series dataset. The dataset is relatively small (about 50 records), and ...
9 votes
3 answers
2k views
Regression model R2 drops when I remove outliers: is that even possible?
I'm analyzing how outliers in my dataset of size 8x8000 affect regression models. I have three scenarios: raw dataset (with outliers), Winsorized dataset (2% of the extreme outliers adjusted), and ...
0 votes
1 answer
45 views
Is normalization required before outlier detection?
When working with machine learning or data preprocessing, the order of operations is crucial for accurate results. One common question is: Should normalization or standardization be applied before ...
0 votes
0 answers
16 views
Which input features do I need to drop after examining variance inflation factor (VIF)?
For example, I got the following VIF factor result (weekday_ is one-hot encoding): ...
0 votes
0 answers
22 views
How to determine outliers based on a logarithmic-scaled regression?
I'm facing a problem where I'd like to detect outliers in a data collection. The goal is to be able to identify outliers in a variable Y based on its relation with the variable X. To do that, I did:...
0 votes
0 answers
21 views
Using PCA reconstruction to detect outliers
I have a banking customer for whom I am implementing a pilot. It deals with outlier detection in specific accounts. Now the number of transactions in these accounts, on a daily basis, number in their ...
0 votes
1 answer
34 views
Huge outliers in a small dataset
I have a small dataset that has 66 samples and 19 features. It is a numerical and tabular dataset. The goal is to predict a value according to these 19 features. The data is about a medical physics ...
0 votes
0 answers
12 views
Dealing with a small medical dataset
I am a medical student currently working on a dataset related to medical physics fields extracted from medical physics images. My dataset has 66 rows and 22 columns. The goal is to predict the next 5 ...
0 votes
0 answers
17 views
How important is outlier clearing?
When I'm doing data pre-processing, I always handle outliers, whether by using the mean or median, or sometimes by deleting them. But I realize that sometimes handling them just makes the accuracy lower, so I ...
0 votes
0 answers
81 views
Confused with Isolation Forest
Let's say I have an anomaly detection (unsupervised learning) dataset with 10 observations (two features). The dataset is shown below: After executing the model, the following are the results (anomalies ...
0 votes
1 answer
91 views
How to identify outliers on a box and whisker plot that seems to be compressed?
I have plotted box plots for the features of an ML problem, to identify outliers. I have scaled the data using a MinMaxScaler so that the scaled data is in the range [0,1]. For some columns, the two ...
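The excerpt is truncated, so the following is only a sketch of one plausible cause, with a hypothetical column of 19 ordinary values and one extreme value. Min-max scaling maps the extreme value to 1 and squeezes everything else near 0, which would make the box look compressed. The usual 1.5 × IQR fence still isolates the extreme point, because the fence scales along with the data:

```python
from statistics import quantiles


def min_max(values):
    """Linearly rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical column: 19 ordinary values plus one extreme value.
column = [float(v) for v in range(1, 20)] + [1000.0]
scaled = min_max(column)

# Quartiles and the upper box-plot fence on the scaled data.
q1, _, q3 = quantiles(scaled, n=4)
upper_fence = q3 + 1.5 * (q3 - q1)

flagged = [v for v in scaled if v > upper_fence]

print(f"box spans {q1:.4f} to {q3:.4f} on a [0, 1] axis")
print(f"points above the upper fence: {flagged}")
```

If this is what is happening, plotting the column on a log axis or zooming the axis to the fence range may make the box readable again.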
0 votes
1 answer
43 views
Can we use the tanh activation function to detect outliers?
Can we use the tanh activation function to detect outliers? Does my image below hold true for dataset outliers (after training a model with the tanh activation function)?
1 vote
0 answers
39 views
Outlier detection with elliptic envelope - unexpected error
I am trying to detect outliers with sklearn.covariance.EllipticEnvelope for a single variable, but it throws an unexpected error. Here is an example that reproduces ...
0 votes
1 answer
617 views
Min-Max Scaling more sensitive to outliers than 'Simple Feature Scaling'?
I am confused as to the pros and cons of two different approaches to normalization: Min-Max Scaling, and what the lecturer in the course I am taking refers to as 'Simple Feature Scaling'. The latter ...