I am trying to detect outliers with sklearn.covariance.EllipticEnvelope
for a single variable, but it throws an unexpected error. Here is an example that reproduces the error:
    import pandas as pd
    import numpy as np
    from sklearn.covariance import EllipticEnvelope

    data = [1,2,2,2,2,2,3]
    df = pd.DataFrame(data, columns=['value'])

    def OutlierDetection(data):
        X = data[['value']]
        detector = EllipticEnvelope(support_fraction=1, contamination=0.1)
        model = detector.fit(X)
        prediction = model.predict(X)
        score = model.decision_function(X)
        return data.assign(score=score, prediction=prediction)

    OutlierDetection(df)
I get the following error message:
ValueError: Input contains NaN.
I get the same result if the data is constant (e.g. just 2's), which is not surprising to me. I suspect it may have something to do with how the algorithm is implemented, but I don't really know.
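To narrow this down, here is a small check I put together using only NumPy. It computes the squared distance of each point from the mean, scaled by the variance (the one-dimensional Mahalanobis distance). The helper name `squared_mahalanobis_1d` is my own and is not part of scikit-learn. I am only assuming that the estimator does something with these distances, such as rescaling by their median; I have not confirmed that from the source.

```python
import numpy as np

# Same sample as in the example above.
values = np.array([1, 2, 2, 2, 2, 2, 3], dtype=float)

def squared_mahalanobis_1d(x):
    """Squared distance of each point from the mean, scaled by the variance.

    Hypothetical helper for illustration only, not a scikit-learn function.
    """
    center = x.mean()
    variance = x.var()  # population variance (ddof=0)
    return (x - center) ** 2 / variance

dist = squared_mahalanobis_1d(values)
median_dist = np.median(dist)
print("distances:", dist)
print("median distance:", median_dist)

# Constant data has zero variance, so every distance is 0 / 0.
constant = np.full(7, 2.0)
with np.errstate(divide="ignore", invalid="ignore"):
    constant_dist = squared_mahalanobis_1d(constant)
print("constant-data distances:", constant_dist)
```

For this sample, five of the seven points sit exactly on the mean, so the median distance is zero. If the implementation divides by that median (my guess), that would explain the NaN, and constant data would fail even earlier because the variance itself is zero.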
I hope someone can help.