
This toolbox offers 13 wrapper feature selection methods (PSO, GA, GWO, HHO, BA, WOA, etc.) with examples. It is simple and easy to implement.


JingweiToo/Wrapper-Feature-Selection-Toolbox-Python


Jx-WFST : Wrapper Feature Selection Toolbox



"Toward Talent Scientist: Sharing and Learning Together" --- Jingwei Too



Introduction

  • This toolbox offers 13 wrapper feature selection methods
  • Demo_PSO provides an example of how to apply PSO to a benchmark dataset
  • The source code of these methods is written based on the pseudocode and the papers

Usage

The main function jfs performs feature selection. To switch algorithms, replace pso in from FS.pso import jfs with the abbreviation of another method.

  • To use particle swarm optimization ( PSO ), write
    from FS.pso import jfs
  • To use differential evolution ( DE ), write
    from FS.de import jfs
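
When the method is picked at run time (for example, from a configuration value), the import can be done dynamically. The following is a minimal sketch, not part of the toolbox: module_name, load_jfs, and AVAILABLE are hypothetical helpers, and the abbreviations are copied from the method list in this README. load_jfs only works when the FS package is importable.

```python
import importlib

# Abbreviations copied from the method list in this README.
AVAILABLE = {"ga", "pso", "de", "cs", "fa", "ba", "fpa",
             "gwo", "ja", "sca", "woa", "ssa", "hho"}


def module_name(abbreviation):
    """Return the dotted module path (e.g. 'FS.pso') after validating the name."""
    key = abbreviation.strip().lower()
    if key not in AVAILABLE:
        raise ValueError(f"unknown method: {abbreviation!r}")
    return f"FS.{key}"


def load_jfs(abbreviation):
    """Import FS.<abbreviation> and return its jfs function.

    Assumes the FS package from this repository is on the import path.
    """
    return importlib.import_module(module_name(abbreviation)).jfs


print(module_name("PSO"))  # FS.pso
```

Calling load_jfs("de") would then be equivalent to from FS.de import jfs.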

Input

  • feat : feature vector matrix ( Instance x Features )
  • label : label matrix ( Instance x 1 )
  • opts : parameter settings
    • N : number of solutions / population size ( for all methods )
    • T : maximum number of iterations ( for all methods )
    • k : k-value in k-nearest neighbor

Output

  • Acc : accuracy of validation model
  • fmdl : feature selection model ( It contains several results )
    • sf : index of selected features
    • nf : number of selected features
    • c : convergence curve
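
To illustrate how these fields might be consumed, the sketch below maps the selected indices to column names and reads the last point of the convergence curve. mock_fmdl is a hypothetical stand-in with made-up values (plain lists instead of NumPy arrays), not output from the toolbox, and summarize is not a toolbox function. The 1 x T shape of c is an assumption inferred from the reshape call in the examples.

```python
# Hypothetical stand-in for the dictionary returned by jfs; values are made up.
mock_fmdl = {
    "sf": [0, 3, 4],                   # indices of selected features
    "nf": 3,                           # number of selected features
    "c": [[0.30, 0.22, 0.22, 0.18]],   # convergence curve, assumed 1 x T
}


def summarize(fmdl, feature_names):
    """Return selected feature names, the kept fraction, and the final fitness."""
    selected = [feature_names[i] for i in fmdl["sf"]]
    kept_fraction = fmdl["nf"] / len(feature_names)
    curve = list(fmdl["c"][0])  # flatten the single row
    return selected, kept_fraction, curve[-1]


names = ["f0", "f1", "f2", "f3", "f4", "f5"]
selected, kept, final_fitness = summarize(mock_fmdl, names)
print(selected)       # ['f0', 'f3', 'f4']
print(kept)           # 0.5
print(final_fitness)  # 0.18
```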

Example 1 : Particle Swarm Optimization ( PSO )

    import numpy as np
    import pandas as pd
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import train_test_split
    from FS.pso import jfs   # change this to switch algorithm
    import matplotlib.pyplot as plt

    # load data
    data  = pd.read_csv('ionosphere.csv')
    data  = data.values
    feat  = np.asarray(data[:, 0:-1])   # feature vector
    label = np.asarray(data[:, -1])     # label vector

    # split data into train & validation (70 -- 30)
    xtrain, xtest, ytrain, ytest = train_test_split(feat, label, test_size=0.3, stratify=label)
    fold = {'xt':xtrain, 'yt':ytrain, 'xv':xtest, 'yv':ytest}

    # parameter
    k    = 5     # k-value in KNN
    N    = 10    # number of particles
    T    = 100   # maximum number of iterations
    w    = 0.9
    c1   = 2
    c2   = 2
    opts = {'k':k, 'fold':fold, 'N':N, 'T':T, 'w':w, 'c1':c1, 'c2':c2}

    # perform feature selection
    fmdl = jfs(feat, label, opts)
    sf   = fmdl['sf']

    # model with selected features
    num_train = np.size(xtrain, 0)
    num_valid = np.size(xtest, 0)
    x_train   = xtrain[:, sf]
    y_train   = ytrain.reshape(num_train)  # Solve bug
    x_valid   = xtest[:, sf]
    y_valid   = ytest.reshape(num_valid)   # Solve bug

    mdl       = KNeighborsClassifier(n_neighbors = k)
    mdl.fit(x_train, y_train)

    # accuracy
    y_pred    = mdl.predict(x_valid)
    Acc       = np.sum(y_valid == y_pred) / num_valid
    print("Accuracy:", 100 * Acc)

    # number of selected features
    num_feat = fmdl['nf']
    print("Feature Size:", num_feat)

    # plot convergence
    curve   = fmdl['c']
    curve   = curve.reshape(np.size(curve,1))
    x       = np.arange(0, opts['T'], 1.0) + 1.0

    fig, ax = plt.subplots()
    ax.plot(x, curve, 'o-')
    ax.set_xlabel('Number of Iterations')
    ax.set_ylabel('Fitness')
    ax.set_title('PSO')
    ax.grid()
    plt.show()

Example 2 : Genetic Algorithm ( GA )

    import numpy as np
    import pandas as pd
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import train_test_split
    from FS.ga import jfs   # change this to switch algorithm
    import matplotlib.pyplot as plt

    # load data
    data  = pd.read_csv('ionosphere.csv')
    data  = data.values
    feat  = np.asarray(data[:, 0:-1])
    label = np.asarray(data[:, -1])

    # split data into train & validation (70 -- 30)
    xtrain, xtest, ytrain, ytest = train_test_split(feat, label, test_size=0.3, stratify=label)
    fold = {'xt':xtrain, 'yt':ytrain, 'xv':xtest, 'yv':ytest}

    # parameter
    k    = 5     # k-value in KNN
    N    = 10    # number of chromosomes
    T    = 100   # maximum number of generations
    CR   = 0.8
    MR   = 0.01
    opts = {'k':k, 'fold':fold, 'N':N, 'T':T, 'CR':CR, 'MR':MR}

    # perform feature selection
    fmdl = jfs(feat, label, opts)
    sf   = fmdl['sf']

    # model with selected features
    num_train = np.size(xtrain, 0)
    num_valid = np.size(xtest, 0)
    x_train   = xtrain[:, sf]
    y_train   = ytrain.reshape(num_train)  # Solve bug
    x_valid   = xtest[:, sf]
    y_valid   = ytest.reshape(num_valid)   # Solve bug

    mdl       = KNeighborsClassifier(n_neighbors = k)
    mdl.fit(x_train, y_train)

    # accuracy
    y_pred    = mdl.predict(x_valid)
    Acc       = np.sum(y_valid == y_pred) / num_valid
    print("Accuracy:", 100 * Acc)

    # number of selected features
    num_feat = fmdl['nf']
    print("Feature Size:", num_feat)

    # plot convergence
    curve   = fmdl['c']
    curve   = curve.reshape(np.size(curve,1))
    x       = np.arange(0, opts['T'], 1.0) + 1.0

    fig, ax = plt.subplots()
    ax.plot(x, curve, 'o-')
    ax.set_xlabel('Number of Iterations')
    ax.set_ylabel('Fitness')
    ax.set_title('GA')
    ax.grid()
    plt.show()

Requirements

  • Python 3
  • Numpy
  • Pandas
  • Scikit-learn
  • Matplotlib

List of available wrapper feature selection methods

  • Note that the methods are adapted so that they can be used for feature selection tasks
  • The extra parameters are the parameter(s) other than the population size and the maximum number of iterations
  • Click on the name of a method to view how to set its extra parameter(s)
  • Use opts to set the specific parameters
  • If you do not set the extra parameters, the algorithm uses its default settings
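
| No. | Abbreviation | Name                         | Year | Extra Parameters |
|-----|--------------|------------------------------|------|------------------|
| 13  | hho          | Harris Hawk Optimization     | 2019 | No               |
| 12  | ssa          | Salp Swarm Algorithm         | 2017 | No               |
| 11  | woa          | Whale Optimization Algorithm | 2016 | Yes              |
| 10  | sca          | Sine Cosine Algorithm        | 2016 | Yes              |
| 09  | ja           | Jaya Algorithm               | 2016 | No               |
| 08  | gwo          | Grey Wolf Optimizer          | 2014 | No               |
| 07  | fpa          | Flower Pollination Algorithm | 2012 | Yes              |
| 06  | ba           | Bat Algorithm                | 2010 | Yes              |
| 05  | fa           | Firefly Algorithm            | 2010 | Yes              |
| 04  | cs           | Cuckoo Search Algorithm      | 2009 | Yes              |
| 03  | de           | Differential Evolution       | 1997 | Yes              |
| 02  | pso          | Particle Swarm Optimization  | 1995 | Yes              |
| 01  | ga           | Genetic Algorithm            | -    | Yes              |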
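
The fallback to default settings for unset extra parameters can be pictured as a dictionary lookup with defaults. The sketch below is an illustration only, not the toolbox's code: resolve_params is a hypothetical helper, and the values in ASSUMED_PSO_DEFAULTS are copied from Example 1; whether they match the toolbox's built-in defaults is an assumption.

```python
# Values copied from Example 1 in this README; treating them as the
# toolbox's built-in defaults is an assumption.
ASSUMED_PSO_DEFAULTS = {"w": 0.9, "c1": 2, "c2": 2}


def resolve_params(opts, defaults):
    """Return the defaults, overridden by any matching keys present in opts."""
    return {name: opts.get(name, value) for name, value in defaults.items()}


# Only w is set by the caller, so c1 and c2 fall back to the assumed defaults.
params = resolve_params({"N": 10, "T": 100, "w": 0.5}, ASSUMED_PSO_DEFAULTS)
print(params)  # {'w': 0.5, 'c1': 2, 'c2': 2}
```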