Note
Go to the end to download the full example code.
Convert a pipeline with a LightGBM classifier¶
sklearn-onnx only converts scikit-learn models into ONNX, but many libraries implement the scikit-learn API so that their models can be included in a scikit-learn pipeline. This example considers a pipeline including a LightGBM model. sklearn-onnx can convert the whole pipeline as long as it knows the converter associated with LGBMClassifier. Let’s see how to do it.
Train a LightGBM classifier¶
import onnxruntime as rt
from skl2onnx import convert_sklearn, update_registered_converter
from skl2onnx.common.shape_calculator import (
    calculate_linear_classifier_output_shapes,
)
from onnxmltools.convert.lightgbm.operator_converters.LightGbm import (
    convert_lightgbm,
)
from skl2onnx.common.data_types import FloatTensorType
import numpy
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from lightgbm import LGBMClassifier

data = load_iris()
X = data.data[:, :2]
y = data.target

ind = numpy.arange(X.shape[0])
numpy.random.shuffle(ind)
X = X[ind, :].copy()
y = y[ind].copy()

pipe = Pipeline(
    [("scaler", StandardScaler()), ("lgbm", LGBMClassifier(n_estimators=3))]
)
pipe.fit(X, y)
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.018751 seconds. You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 47
[LightGBM] [Info] Number of data points in the train set: 150, number of used features: 2
[LightGBM] [Info] Start training from score -1.098612
[LightGBM] [Info] Start training from score -1.098612
[LightGBM] [Info] Start training from score -1.098612
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
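The shuffle step above permutes features and labels with the same index array so that each row stays paired with its label. A minimal sketch of that pattern with only numpy, using a seeded generator for reproducibility (the toy data and seed here are illustrative, not from the example):

```python
import numpy

# Toy data standing in for the iris features/labels (illustrative sizes).
X = numpy.arange(10).reshape(5, 2).astype(float)
y = numpy.array([0, 0, 1, 1, 2])

# Seeded generator so the shuffle is reproducible across runs.
rng = numpy.random.default_rng(0)
ind = rng.permutation(X.shape[0])

# Apply the SAME permutation to both arrays to keep rows and labels paired.
X_shuffled = X[ind, :].copy()
y_shuffled = y[ind].copy()

# Each shuffled row still carries its original label: row i of the toy
# data starts with 2*i, so the original index can be recovered from it.
for row, label in zip(X_shuffled, y_shuffled):
    assert y[int(row[0]) // 2] == label
```

The example itself uses `numpy.random.shuffle` without a seed, so its split differs from run to run; seeding as above is only needed when reproducibility matters.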
Register the converter for LGBMClassifier¶
The converter is implemented in onnxmltools: onnxmltools…LightGbm.py, and the shape calculator: onnxmltools…Classifier.py.
update_registered_converter(
    LGBMClassifier,
    "LightGbmLGBMClassifier",
    calculate_linear_classifier_output_shapes,
    convert_lightgbm,
    options={"nocl": [True, False], "zipmap": [True, False, "columns"]},
)
Convert again¶
model_onnx = convert_sklearn(
    pipe,
    "pipeline_lightgbm",
    [("input", FloatTensorType([None, 2]))],
    target_opset={"": 12, "ai.onnx.ml": 2},
)

# And save.
with open("pipeline_lightgbm.onnx", "wb") as f:
    f.write(model_onnx.SerializeToString())
Compare the predictions¶
Predictions with LightGbm.
print("predict", pipe.predict(X[:5]))
print("predict_proba", pipe.predict_proba(X[:1]))
predict [2 0 0 0 1]
predict_proba [[0.22814003 0.31657806 0.45528191]]
Predictions with onnxruntime.
sess = rt.InferenceSession(
    "pipeline_lightgbm.onnx", providers=["CPUExecutionProvider"]
)

pred_onx = sess.run(None, {"input": X[:5].astype(numpy.float32)})

print("predict", pred_onx[0])
print("predict_proba", pred_onx[1][:1])
predict [2 0 0 0 1]
predict_proba [{0: 0.22814001142978668, 1: 0.3165780305862427, 2: 0.45528194308280945}]
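The probabilities come back from onnxruntime as a list of dictionaries mapping class id to score, because the `zipmap` option defaults to True, while scikit-learn returns a plain array; the values also differ in the last digits because the ONNX graph computes in float32. A small sketch, reusing the numbers printed above, of how one might flatten the ZipMap output and check that both predictions agree up to float32 precision:

```python
import numpy

# Probabilities as printed above: scikit-learn (float64 array) and
# onnxruntime (ZipMap output, one dict per sample).
proba_sklearn = numpy.array([[0.22814003, 0.31657806, 0.45528191]])
proba_zipmap = [
    {0: 0.22814001142978668, 1: 0.3165780305862427, 2: 0.45528194308280945}
]

# Flatten the list of dicts into a dense (n_samples, n_classes) array,
# ordering the columns by class id.
classes = sorted(proba_zipmap[0])
proba_onnx = numpy.array([[row[c] for c in classes] for row in proba_zipmap])

# The two outputs match up to float32 rounding (~7 significant digits).
numpy.testing.assert_allclose(proba_sklearn, proba_onnx, rtol=1e-5)
```

Converting with `zipmap=False` (one of the option values registered above) would make the model output a plain probability array instead, which avoids this flattening step.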
Total running time of the script: (0 minutes 0.102 seconds)