AttributeError: 'numpy.ndarray' object has no attribute 'columns'

Question

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectFromModel
from scipy.io import arff

data = arff.loadarff("C:\\Users\\manib\\Desktop\\Python Job\\Project Work\\Breast\\Breast.arff")
df = pd.DataFrame(data[0])
df.head()
df["Class"].value_counts()

X = df.iloc[:, :24481].values
y = df.iloc[:, -1].values

from sklearn import preprocessing
label_encoder = preprocessing.LabelEncoder()
y = y.astype('str')
y = label_encoder.fit_transform(y)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)

sel = SelectFromModel(RandomForestClassifier(n_estimators=100))
sel.fit(X_train, y_train)
sel.get_support()

selected_feat = X_train.columns[(sel.get_support())]
len(selected_feat)
print(selected_feat)

Djib2011 · Accepted Answer · 2019-06-21 12:14:28Z

The problem is that train_test_split(X, y, ...) returns numpy arrays here, not pandas DataFrames, because X was already converted to a numpy array with .values. Numpy arrays have no attribute named columns.
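A quick way to see the difference is the minimal sketch below. It uses a hypothetical toy DataFrame (the column names g1 and g2 are made up for illustration and are not from the original data):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the ARFF data.
df = pd.DataFrame({
    "g1": [0.1, 0.4, 0.3],
    "g2": [1.2, 0.9, 1.1],
    "Class": ["a", "b", "a"],
})

X_frame = df.iloc[:, :2]          # still a pandas DataFrame
X_array = df.iloc[:, :2].values   # plain numpy.ndarray, column names are gone

has_cols_frame = hasattr(X_frame, "columns")
has_cols_array = hasattr(X_array, "columns")

print(type(X_frame).__name__, has_cols_frame)
print(type(X_array).__name__, has_cols_array)
```

Accessing X_array.columns would raise the same AttributeError as in the question.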

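If you want to see which features SelectFromModel kept, read the column names from a pandas.DataFrame instead of from X_train (which is a numpy.ndarray). In the question X is also an array because of .values, so X here must be the DataFrame slice, i.e. df.iloc[:, :24481] without .values: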

selected_feat= X.columns[(sel.get_support())]

This will return a pandas Index with the names of the columns kept by the feature selector.

If you only want to know how many features were kept, you can run this:

sel.get_support().sum() # by default this will count 'True' as 1 and 'False' as 0
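Putting it together, here is a rough sketch of the whole flow with X kept as a DataFrame. It assumes synthetic data from make_classification and made-up column names (gene_0, gene_1, ...) in place of the ARFF file, and it fixes random_state only to make the run repeatable:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the ARFF data; the column names are hypothetical.
features, labels = make_classification(
    n_samples=100, n_features=20, n_informative=5, random_state=0
)
X = pd.DataFrame(features, columns=[f"gene_{i}" for i in range(20)])
y = labels

# No .values here, so the split returns DataFrames for X_train and X_test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0
)

sel = SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=0))
sel.fit(X_train, y_train)

mask = sel.get_support()                 # boolean array, one entry per column
selected_feat = X_train.columns[mask]    # names of the kept columns
n_selected = int(mask.sum())             # True counts as 1, False as 0

print(n_selected, list(selected_feat))
```

Because X stays a DataFrame, X_train.columns and X.columns hold the same names, so either one can be indexed with the mask.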

Stephen Rauch · Accepted Answer · 2020-04-18 22:09:59Z

This happens because of these lines:

X = df.iloc[:,:24481].values
y = df.iloc[:, -1].values

You should either remove .values or create extra variables X_col and y_col that keep the pandas objects, like this:

X_col = df.iloc[:,:24481]
y_col = df.iloc[:, -1]
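As an illustration of the second option, the sketch below trains on plain arrays and uses X_col only to look up the names. The data is a hypothetical random frame with made-up columns f0 to f5 and a Class column, not the original dataset:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Hypothetical data: six random features and a label derived from two of them.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(60, 6)), columns=[f"f{i}" for i in range(6)])
df["Class"] = (df["f0"] + df["f1"] > 0).astype(int)

X_col = df.iloc[:, :-1]   # DataFrame, keeps the column names
y_col = df.iloc[:, -1]    # Series

X = X_col.values          # arrays used for fitting
y = y_col.values

sel = SelectFromModel(RandomForestClassifier(n_estimators=50, random_state=0))
sel.fit(X, y)

# The mask has one entry per column of X, in the same order as X_col.columns.
selected_feat = X_col.columns[sel.get_support()]
print(list(selected_feat))
```

This works as long as X and X_col have the same columns in the same order.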
