$\begingroup$

I tried to build a Python class, CustomStackingClassifier(), that implements the stacking method from ensemble machine learning. In this implementation, the outputs of the base classifiers are their predicted probabilities, and StratifiedKFold is used for the cross-validated training. The input matrix for the meta-classifier has dimensions (samples, models * classes).
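
For concreteness, a minimal sketch of that intended layout with toy numbers (three base models, the three wine classes); this is illustrative only, not my actual code:

    import numpy as np

    n_samples, n_models, n_classes = 142, 3, 3  # toy sizes matching my training split
    # each base model contributes one (n_samples, n_classes) block of class probabilities
    meta_input = np.hstack([np.full((n_samples, n_classes), 1 / n_classes)
                            for _ in range(n_models)])
    print(meta_input.shape)  # (142, 9), i.e. (samples, models * classes)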

This code essentially replicates the functionality of sklearn.ensemble.StackingClassifier() manually. However, after testing both on the wine dataset and comparing the results, I found a discrepancy: the custom class reaches about 0.944 accuracy while sklearn's StackingClassifier reaches about 0.972. Despite spending a lot of time on it, I could not pinpoint the issue. I would greatly appreciate any help or insights from the community. Thank you!

I would like to know whether there is a logical issue in CustomStackingClassifier(). If there is, I would appreciate guidance and suggestions for correcting it. If the implementation is correct, why do its results differ from those of sklearn.ensemble.StackingClassifier()?

The code is as follows; any help would be greatly appreciated:

import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn import preprocessing
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


class CustomStackingClassifier(BaseEstimator, ClassifierMixin):

    def __init__(self, base_classifiers, meta_classifier, n_splits=5):
        """
        :param base_classifiers: list of (name, estimator) tuples
        :param meta_classifier: final_estimator
        :param n_splits: cv
        """
        self.base_classifiers = base_classifiers
        self.meta_classifier = meta_classifier
        self.n_splits = n_splits

    def fit(self, X, y):
        """
        :param X: train data
        :param y: train labels
        """
        n_samples = X.shape[0]
        n_classifiers = len(self.base_classifiers)
        n_classes = len(np.unique(y))  # number of classes

        # Stores the out-of-fold predicted probabilities of the base classifiers
        base_probabilities = np.zeros((n_samples, n_classifiers * n_classes))

        # Set up cross-validation with StratifiedKFold, consistent with StackingClassifier
        kf = StratifiedKFold(n_splits=self.n_splits, shuffle=False, random_state=None)

        # Reset the index of the data
        X_re_index = X.reset_index(drop=True)
        y_re_index = y.reset_index(drop=True)

        # Train each base classifier and generate prediction probabilities
        for i, (name, clf) in enumerate(self.base_classifiers):
            fold_probabilities = np.zeros((n_samples, n_classes))
            # Train and predict for each fold
            for train_index, val_index in kf.split(X_re_index, y_re_index):
                X_train, X_val = X_re_index.iloc[train_index], X_re_index.iloc[val_index]
                y_train, y_val = y_re_index.iloc[train_index], y_re_index.iloc[val_index]
                # Train the base classifier
                clf.fit(X_train, y_train)
                # Predict probabilities on the validation set
                fold_probabilities[val_index] = clf.predict_proba(X_val)
            # Save the predicted probabilities of each base classifier into base_probabilities
            base_probabilities[:, i * n_classes: (i + 1) * n_classes] = fold_probabilities

        # Train the meta classifier
        self.meta_classifier.fit(base_probabilities, y_re_index)
        return self

    def predict(self, X):
        """
        :param X: test data
        """
        # Get the predicted probabilities of each base classifier
        base_probabilities = np.column_stack(
            [clf.predict_proba(X) for name, clf in self.base_classifiers]
        )
        # Predict the labels using the meta classifier
        return self.meta_classifier.predict(base_probabilities)

    def predict_proba(self, X):
        """
        :param X: test data
        """
        # Get the predicted probabilities of each base classifier
        base_probabilities = np.column_stack(
            [clf.predict_proba(X) for name, clf in self.base_classifiers]
        )
        # Predict the label probabilities using the meta classifier
        return self.meta_classifier.predict_proba(base_probabilities)


base_models = [
    ('svm', SVC(probability=True, random_state=42)),
    ('knn', KNeighborsClassifier()),
    ('rf', RandomForestClassifier(random_state=42)),
]
meta_model = xgb.XGBClassifier(verbosity=0, random_state=42)

# 1. Load the wine dataset
wine = load_wine()
X = wine.data
y = pd.Series(wine.target)

# 2. Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=41)

# 3. Preprocess the data
scaler = preprocessing.StandardScaler().fit(X_train)
X_train_scaled = pd.DataFrame(scaler.transform(X_train), columns=wine.feature_names)
X_test_scaled = pd.DataFrame(scaler.transform(X_test), columns=wine.feature_names)

# Manual implementation of the stacking model
stacking_model = CustomStackingClassifier(
    base_classifiers=base_models, meta_classifier=meta_model, n_splits=5
)  # accuracy: 0.944

# Stacking model from sklearn
stacking_model = StackingClassifier(
    estimators=base_models, final_estimator=meta_model,
    cv=5, stack_method='auto', verbose=1
)  # accuracy: 0.972

stacking_model.fit(X_train_scaled, y_train)

# 4. Evaluate the model
y_pred = stacking_model.predict(X_test_scaled)
print('Evaluating results of Stacking model:')
print('accuracy:', accuracy_score(y_test, y_pred))
print('precision:', precision_score(y_test, y_pred, average='macro'))
print('recall:', recall_score(y_test, y_pred, average='macro'))
print('F1-score:', f1_score(y_test, y_pred, average='macro'))
$\endgroup$
  • 1
    $\begingroup$As answering this question will probably take time, it would be better to make it easier for the reader by showing the code you used to test your class and the results you got when comparing both methods.$\endgroup$
    – rehaqds
    Commented Jan 2 at 8:14
  • 1
    $\begingroup$Thanks for your comments; I have added the code that can be used for testing and verification.$\endgroup$
    – CM_Li
    Commented Jan 2 at 9:04
  • 1
    $\begingroup$It would be good to add specifically what kind of discrepancies you found between your implementation and the scikit-learn implementation.$\endgroup$
    – Oxbowerce
    Commented Jan 2 at 17:33

1 Answer

$\begingroup$

I think the main issue was how you were fitting the base estimators.

Your first two steps are the same as the sklearn implementation:

  1. Fit base estimators using CV, and record the out-of-fold validation probabilities

  2. Fit the meta estimator on those validation probabilities

You then re-use the base estimators from step 1 for the final prediction step:

  3. For prediction, get the probabilities from the base estimators and hand them over to the meta estimator for a final prediction.

In your predict step, you would have inadvertently been invoking predict_proba on base estimators trained only on the last fold.
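
To see that failure mode in isolation, here is a minimal sketch with toy data (not your original code): refitting the same estimator object inside a CV loop leaves it fitted on whichever fold came last.

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold

    X, y = make_classification(n_samples=100, random_state=0)
    clf = LogisticRegression()
    kf = StratifiedKFold(n_splits=5)

    for train_idx, val_idx in kf.split(X, y):
        clf.fit(X[train_idx], y[train_idx])  # each call overwrites the previous fold's fit

    # After the loop, clf is fitted only on the last fold's 80-row training split,
    # yet the original predict()/predict_proba() reused it as if it had seen all of X.
    print(clf.n_features_in_, len(train_idx))  # 20 80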

The sklearn implementation does it differently:

During training, the estimators are fitted on the whole training data X_train. They will be used when calling predict or predict_proba. To generalize and avoid over-fitting, the final_estimator is trained on out-samples using sklearn.model_selection.cross_val_predict internally.

So their approach is:

(Steps 1 and 2 are identical to yours)

  3. Fit base estimators on all of the data (i.e. no CV), and then set them aside (not used again until prediction). This gives you base estimators trained on all of the data that will be used during predict.

  4. For prediction, get the probabilities from the base estimators and hand them over to the meta estimator for a final prediction (like your prediction step, but using base estimators trained on all of the data).

The difference from your predict step is that the base estimators used at prediction time have been trained on the full training set, rather than on only a single fold.

I amended that part and made a few other changes, including adding some standard estimator checks and using a trailing underscore to denote fitted attributes. You could also replace the CV loop with cross_val_predict, as done in the sklearn implementation (sketched below).
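
For reference, here is a rough sketch of what that cross_val_predict replacement could look like; the helper name out_of_fold_probas is just illustrative, not part of the class below:

    import numpy as np
    from sklearn.base import clone
    from sklearn.model_selection import StratifiedKFold, cross_val_predict

    def out_of_fold_probas(base_classifiers, X, y, n_splits=5):
        """Column-stack out-of-fold class probabilities from each (name, clf) pair."""
        cv = StratifiedKFold(n_splits=n_splits)
        return np.column_stack([
            cross_val_predict(clone(clf), X, y, cv=cv, method='predict_proba')
            for _, clf in base_classifiers
        ])

    # e.g. inside fit(): meta features with shape (n_samples, n_classifiers * n_classes)
    # base_probabilities = out_of_fold_probas(self.base_classifiers, X, y, self.n_splits)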

Comparing the two implementations:

-------------[sklearn implementation]-------------
Evaluating results of Stacking model:
 accuracy: 0.74375
 precision: 0.7491127887469351
 recall: 0.7430302705789962
 F1-score: 0.7442490607858389

-------------[custom implementation]--------------
Evaluating results of Stacking model:
 accuracy: 0.74375
 precision: 0.7491127887469351
 recall: 0.7430302705789962
 F1-score: 0.7442490607858389

To ensure the results are identical, it would be more correct to compare the probability outputs rather than the overall model accuracies.
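
For example (stacking_model_sklearn and stacking_model_custom are placeholder names for the two fitted models from the test script below):

    import numpy as np

    # placeholder names: the two fitted stacking models from the test script below
    proba_sklearn = stacking_model_sklearn.predict_proba(X_val_scaled)
    proba_custom = stacking_model_custom.predict_proba(X_val_scaled)

    # if the implementations are truly equivalent, these agree to within float noise
    print(np.allclose(proba_sklearn, proba_custom, atol=1e-6))
    print(np.abs(proba_sklearn - proba_custom).max())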

Modified implementation and testing

The modified custom implementation:

import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.model_selection import StratifiedKFold
from sklearn.utils.validation import check_array, check_is_fitted, check_X_y


class CustomStackingClassifier(BaseEstimator, ClassifierMixin):

    def __init__(self, base_classifiers, meta_classifier, n_splits=5):
        """
        :param base_classifiers: list of (name, estimator) tuples
        :param meta_classifier: final_estimator
        :param n_splits: cv
        """
        self.base_classifiers = base_classifiers
        self.meta_classifier = meta_classifier
        self.n_splits = n_splits

    def fit(self, X, y):
        """
        :param X: train data
        :param y: train labels
        """
        #
        # Input checks, conversion to ndarray, and some standard fitted attributes
        #
        if hasattr(X, 'columns'):
            # record feature names before check_X_y converts X to an ndarray
            self.feature_names_in_ = np.array(X.columns, dtype='object')
        X, y = check_X_y(X, y)
        self.n_features_in_ = X.shape[1]

        n_samples = len(X)
        n_classifiers = len(self.base_classifiers)
        n_classes = len(np.unique(y))  # number of classes

        # Stores the out-of-fold predicted probabilities of the base classifiers
        base_probabilities = np.zeros((n_samples, n_classifiers, n_classes))

        # Set up cross-validation with StratifiedKFold, consistent with StackingClassifier
        kf = StratifiedKFold(n_splits=self.n_splits)

        # Base classifiers are fitted on the whole of X.
        # They are fit here, then set aside until .predict/.predict_proba
        self.base_classifiers_ = [(name, clone(clf)) for name, clf in self.base_classifiers]
        for name, clf in self.base_classifiers_:
            clf.fit(X, y)

        # The final_estimator/meta_estimator/blender is fitted only on
        # out-of-fold predictions from each base classifier
        for clf_idx, (name, clf) in enumerate(self.base_classifiers):
            # Get out-of-fold probas from the base classifier.
            # You could use cross_val_predict() to replace this block, as done in StackingClassifier
            for train_index, val_index in kf.split(X, y):
                X_train, y_train = X[train_index], y[train_index]
                X_val = X[val_index]
                # Predict probabilities on the validation set and store them
                base_probabilities[val_index, clf_idx, :] = (
                    clone(clf)                # unfitted base classifier
                    .fit(X_train, y_train)    # fit on the train split
                    .predict_proba(X_val)     # get the out-of-fold probas
                )

        # Train the meta classifier on (X=out-of-fold probas, y)
        self.meta_classifier_ = clone(self.meta_classifier).fit(
            base_probabilities.reshape(-1, n_classifiers * n_classes), y
        )
        return self

    def get_base_probas(self, X):
        return np.column_stack([
            clf.predict_proba(X) for name, clf in self.base_classifiers_
        ])

    def predict(self, X):
        """
        :param X: test data
        """
        check_is_fitted(self)
        X = check_array(X)
        # Get the predicted probabilities of each base classifier
        base_probabilities = self.get_base_probas(X)
        # Predict the labels using the meta classifier
        return self.meta_classifier_.predict(base_probabilities)

    def predict_proba(self, X):
        """
        :param X: test data
        """
        check_is_fitted(self)
        X = check_array(X)
        # Get the predicted probabilities of each base classifier
        base_probabilities = self.get_base_probas(X)
        # Predict the label probabilities using the meta classifier
        return self.meta_classifier_.predict_proba(base_probabilities)

Comparing scores as an initial test:

from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, StackingClassifier
import xgboost as xgb
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

#
# Dataset for testing
#
import pandas as pd
from sklearn.datasets import make_classification

X, y = make_classification(
    n_samples=800, n_classes=3, n_informative=3, random_state=0
)
X = pd.DataFrame(X)
y = pd.Series(y)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Data preprocessing (scaler set to pandas output)
scaler = StandardScaler().set_output(transform='pandas').fit(X_train)
X_train_scaled, X_val_scaled = [scaler.transform(x) for x in [X_train, X_val]]

#
# Base models
#
base_models = [
    # Use regularised models so the accuracy etc. is < 100%,
    # which makes it easier to see whether the implementations are the same
    ('svm', SVC(probability=True, C=0.01, random_state=0)),
    ('adab', AdaBoostClassifier(random_state=0, n_estimators=3)),
    ('rf', RandomForestClassifier(max_depth=1, random_state=0)),
]
meta_model = xgb.XGBClassifier(verbosity=0, random_state=0)

#
# Evaluate StackingClassifier and the custom class
#
for use_custom in [False, True]:
    if not use_custom:
        print('[sklearn implementation]'.center(50, '-'))
        # Stacking model from sklearn
        stacking_model = StackingClassifier(
            estimators=base_models,
            final_estimator=meta_model,
            cv=5,
            stack_method='predict_proba',
        )
    else:
        print('[custom implementation]'.center(50, '-'))
        # Manual implementation of the stacking model
        stacking_model = CustomStackingClassifier(
            base_classifiers=base_models,
            meta_classifier=meta_model,
            n_splits=5
        )

    # Fit the selected implementation
    stacking_model.fit(X_train_scaled, y_train)

    # Evaluate the model
    y_pred = stacking_model.predict(X_val_scaled)
    print('Evaluating results of Stacking model:')
    print(' accuracy:', accuracy_score(y_val, y_pred))
    print(' precision:', precision_score(y_val, y_pred, average='macro'))
    print(' recall:', recall_score(y_val, y_pred, average='macro'))
    print(' F1-score:', f1_score(y_val, y_pred, average='macro'))
    print()
$\endgroup$
  • 1
    $\begingroup$Thank you for your valuable feedback. I have tested the code on multiple datasets, and both implementation methods (custom and sklearn) achieve consistent outputs. I appreciate you taking the time to respond, and wish you all the best in life!$\endgroup$
    – CM_Li
    Commented Jan 3 at 2:23
  • $\begingroup$My pleasure @CM_Li, I'm glad it's working as needed. Thanks for your kind words. I wish you all the best too 👍$\endgroup$
    Commented Jan 3 at 13:26
