import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectFromModel
from scipy.io import arff

data = arff.loadarff("C:\\Users\\manib\\Desktop\\Python Job\\Project Work\\Breast\\Breast.arff")
df = pd.DataFrame(data[0])
df.head()
df["Class"].value_counts()

X = df.iloc[:,:24481].values
y = df.iloc[:, -1].values

from sklearn import preprocessing
label_encoder = preprocessing.LabelEncoder()
y = y.astype('str')
y = label_encoder.fit_transform(y)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)

sel = SelectFromModel(RandomForestClassifier(n_estimators = 100))
sel.fit(X_train, y_train)
sel.get_support()

selected_feat = X_train.columns[(sel.get_support())]
len(selected_feat)
print(selected_feat)

    2 Answers


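    The problem is that X and y were built with .values, so they are NumPy arrays, and train_test_split(X, y, ...) returns the same type it receives: NumPy arrays, not pandas DataFrames. NumPy arrays have no attribute named columns, hence the error.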
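A small sketch of that type behaviour, using a made-up two-column DataFrame in place of the ARFF data (the column names gene_a and gene_b are hypothetical). Passing a DataFrame to train_test_split gives DataFrames back; passing .values gives plain arrays without column labels.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the ARFF data: 10 rows, 2 named feature columns
df = pd.DataFrame(np.arange(20).reshape(10, 2), columns=["gene_a", "gene_b"])
y = np.array([0, 1] * 5)

# DataFrame in -> DataFrame out, column labels preserved
X_tr_df, X_te_df, _, _ = train_test_split(df, y, test_size=0.4, random_state=0)
print(type(X_tr_df).__name__)       # DataFrame
print(list(X_tr_df.columns))        # ['gene_a', 'gene_b']

# ndarray in (via .values) -> ndarray out, no .columns attribute
X_tr_np, X_te_np, _, _ = train_test_split(df.values, y, test_size=0.4, random_state=0)
print(type(X_tr_np).__name__)       # ndarray
print(hasattr(X_tr_np, "columns"))  # False
```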

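    If you want to see which features SelectFromModel kept, take the column names from a pandas.DataFrame instead of X_train (a numpy.ndarray). Note that X in the code above is also an ndarray, because of .values, so use the feature columns of df:

    selected_feat = df.iloc[:, :24481].columns[sel.get_support()]

    This returns a pandas Index with the names of the columns kept by the feature selector, in their original order.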
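A minimal end-to-end sketch of this approach, assuming synthetic data from make_classification instead of Breast.arff and hypothetical column names feat_0 … feat_19. Here df holds only features, so df.columns is used directly; with the question's df, which also contains Class, the slice df.iloc[:, :24481].columns is the one that lines up with the mask.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 20 numeric features with hypothetical names
X_raw, y = make_classification(n_samples=200, n_features=20, n_informative=4, random_state=0)
df = pd.DataFrame(X_raw, columns=[f"feat_{i}" for i in range(20)])

# As in the question, the model is trained on plain NumPy arrays
X = df.values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)

sel = SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=0))
sel.fit(X_train, y_train)

# The boolean mask follows the column order of df, so the names come from df
selected_feat = df.columns[sel.get_support()]
print(list(selected_feat))
```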

    If you only want to know how many features were kept, you can run:

    sel.get_support().sum()  # booleans are summed as True = 1, False = 0
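A short sketch, again on synthetic data (an assumption, not the question's dataset), showing that the count from the mask matches the width of the transformed matrix, and that get_support(indices=True) gives the integer positions of the kept features instead of a mask.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic data standing in for the real feature matrix
X, y = make_classification(n_samples=200, n_features=20, n_informative=4, random_state=0)
sel = SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=0)).fit(X, y)

mask = sel.get_support()        # boolean array, one entry per input feature
n_kept = int(mask.sum())        # True counts as 1, False as 0

# The transformed matrix has exactly n_kept columns
print(n_kept, sel.transform(X).shape[1])
# Integer positions of the kept features
print(sel.get_support(indices=True))
```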

      The error comes from these lines:

      X = df.iloc[:,:24481].values
      y = df.iloc[:, -1].values

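      .values turns the DataFrame and Series into NumPy arrays, which drops the column labels. Either remove .values, so that train_test_split returns DataFrames and X_train.columns works, or create extra X_col and y_col objects that keep the labels and read the names from X_col.columns, like this: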

      X_col = df.iloc[:,:24481]
      y_col = df.iloc[:, -1]
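A sketch of the "remove .values" route, under the assumption of a synthetic frame with 10 hypothetically named features plus a string Class column (the real ARFF file has 24481 features and its own labels). Because X_col stays a DataFrame through train_test_split, the question's original X_train.columns line works unchanged.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for pd.DataFrame(data[0]): 10 features + a "Class" column
X_raw, y_raw = make_classification(n_samples=150, n_features=10, n_informative=3, random_state=1)
df = pd.DataFrame(X_raw, columns=[f"feat_{i}" for i in range(10)])
df["Class"] = np.where(y_raw == 1, "pos", "neg")

X_col = df.iloc[:, :10]   # DataFrame: column labels are kept
y_col = df.iloc[:, -1]    # Series

label_encoder = preprocessing.LabelEncoder()
y = label_encoder.fit_transform(y_col.astype(str))

X_train, X_test, y_train, y_test = train_test_split(X_col, y, test_size=0.4, random_state=0)

sel = SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=0))
sel.fit(X_train, y_train)

# X_train is still a DataFrame, so .columns is available
selected_feat = X_train.columns[sel.get_support()]
print(len(selected_feat))
print(list(selected_feat))
```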