
I have plotted the feature importances of a random forest in scikit-learn. To improve the predictions of the random forest, how can I use the plot information to remove features? That is, how do I spot, based on the plot, whether a feature is useless or, even worse, decreases the random forest's performance? The plot is based on the attribute feature_importances_, and I use the classifier sklearn.ensemble.RandomForestClassifier.

I am aware that other feature selection techniques exist, but in this question I want to focus on how to use feature_importances_.


Examples of such feature importance plots:

[Two example feature importance plots]
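For reference, plots of this kind can be produced directly from the fitted classifier's feature_importances_. This is only a minimal sketch of such plotting code (the dataset and styling are illustrative, not the exact code behind the figures above):

    import matplotlib.pyplot as plt
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    iris = load_iris()
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(iris.data, iris.target)

    # Sort features so the bars run from most to least important.
    order = np.argsort(model.feature_importances_)[::-1]
    plt.bar(range(len(order)), model.feature_importances_[order])
    plt.xticks(range(len(order)), np.array(iris.feature_names)[order],
               rotation=45, ha="right")
    plt.ylabel("feature importance")
    plt.tight_layout()
    plt.show()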


1 Answer


You can simply use the feature_importances_ attribute to select the features with the highest importance scores. For example, the following function selects the k most important features.

    def selectKImportance(model, X, k=5):
        # Keep the k columns of X with the largest feature importances.
        return X[:, model.feature_importances_.argsort()[::-1][:k]]

Or, if you're using a pipeline, the following transformer class:

    from sklearn.base import BaseEstimator, TransformerMixin

    class ImportanceSelect(BaseEstimator, TransformerMixin):
        def __init__(self, model, n=1):
            self.model = model
            self.n = n

        def fit(self, *args, **kwargs):
            # Fit the underlying model so its feature_importances_ become available.
            self.model.fit(*args, **kwargs)
            return self

        def transform(self, X):
            # Keep the n columns of X with the largest feature importances.
            return X[:, self.model.feature_importances_.argsort()[::-1][:self.n]]
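As a usage sketch (not from the original answer), the transformer can be dropped into a scikit-learn Pipeline; the step names and the downstream logistic regression are arbitrary choices here:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline

    iris = load_iris()

    # Use a random forest's importances to keep the 2 most important features,
    # then fit a logistic regression on the reduced feature matrix.
    pipe = Pipeline([
        ("select", ImportanceSelect(RandomForestClassifier(n_estimators=100), n=2)),
        ("clf", LogisticRegression()),
    ])
    pipe.fit(iris.data, iris.target)
    print(pipe.score(iris.data, iris.target))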

So, for example:

    >>> from sklearn.datasets import load_iris
    >>> from sklearn.ensemble import RandomForestClassifier
    >>> iris = load_iris()
    >>> X = iris.data
    >>> y = iris.target
    >>>
    >>> model = RandomForestClassifier()
    >>> model.fit(X, y)
    RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
                max_depth=None, max_features='auto', max_leaf_nodes=None,
                min_samples_leaf=1, min_samples_split=2,
                min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
                oob_score=False, random_state=None, verbose=0, warm_start=False)
    >>>
    >>> newX = selectKImportance(model, X, 2)
    >>> newX.shape
    (150, 2)
    >>> X.shape
    (150, 4)

And of course, if you want to select based on some criterion other than "top k features", you can adjust the functions accordingly.
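For instance, a thresholded variant (a sketch; the function name and the cutoff value are arbitrary) could keep every feature whose importance exceeds a fixed value:

    def selectAboveThreshold(model, X, threshold=0.1):
        # Keep only the columns of X whose importance exceeds the cutoff.
        return X[:, model.feature_importances_ > threshold]

    # e.g. newX = selectAboveThreshold(model, X, threshold=0.15)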

• Thanks David. Any insight on how to choose the threshold above which features are useful? (Aside from removing the least useful feature, re-running the RF, and seeing how that affects prediction performance.) – Aug 4, 2015 at 18:02
• As with most automated feature selection, I'd say most people use a tuning grid. But using domain expertise when selecting (and engineering) features is probably the most valuable, though that isn't really automatable. – David, Aug 4, 2015 at 18:05
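To make the tuning-grid suggestion from the comments concrete, here is a hedged sketch that cross-validates the number of retained features n for the ImportanceSelect pipeline above (the grid values and the downstream classifier are arbitrary):

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline

    iris = load_iris()

    pipe = Pipeline([
        ("select", ImportanceSelect(RandomForestClassifier(n_estimators=100), n=1)),
        ("clf", LogisticRegression()),
    ])

    # Treat the number of retained features as a hyperparameter and pick it by cross-validation.
    grid = GridSearchCV(pipe, param_grid={"select__n": [1, 2, 3, 4]}, cv=5)
    grid.fit(iris.data, iris.target)
    print(grid.best_params_, grid.best_score_)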
