---
title: Inference on multiple targets
description: Maximize performance on CPU, GPU and
parent: Accelerate PyTorch
grand_parent: Tutorials
nav_order: 1
---
# Inference on multiple targets
{: .no_toc }
As a developer who wants to deploy a PyTorch or ONNX model and maximize performance and hardware flexibility, you can leverage ONNX Runtime to optimally execute your model on your hardware platform.
In this tutorial, you'll learn:
- how to use the PyTorch ResNet-50 model for image classification,
- how to convert the model to ONNX format, and
- how to deploy it with ONNX Runtime to the default CPU, NVIDIA CUDA (GPU), and Intel OpenVINO execution providers, using the same application code to load and run inference across hardware platforms.
ONNX is an open-source ML model format developed by Microsoft, Meta, Amazon, and other tech companies to standardize and simplify the deployment of machine learning models on many types of hardware. ONNX Runtime, contributed and maintained by Microsoft, optimizes the inference performance of ONNX models exported from frameworks such as PyTorch and TensorFlow. The ResNet-50 model, typically pretrained on the ImageNet dataset, is commonly used for image classification.
This tutorial demonstrates how to run an ONNX model on CPU, GPU, and Intel hardware with OpenVINO and ONNX Runtime, using Microsoft Azure Machine Learning.
Your environment should have `curl` installed. The `onnxruntime-gpu` package needs access to an NVIDIA CUDA accelerator on your device or compute cluster, but a CPU-only machine is sufficient for the CPU and OpenVINO-CPU demos.
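If you are not sure whether your notebook VM or compute cluster exposes a CUDA GPU, a quick check from a notebook cell is shown below (an optional sanity check, not part of the original notebook):

```python
# nvidia-smi ships with the NVIDIA driver. If the command is not found or errors out,
# the machine has no usable CUDA accelerator and you should follow the CPU or
# OpenVINO-CPU path instead of the GPU one.
!nvidia-smi
```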
Ensure that you have an image to run inference on. For this tutorial, we have a "cat.jpg" image located in the same directory as the notebook files.
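Any JPEG will do, as long as the filename used in the preprocessing cell further down matches. A minimal sanity check (optional, not part of the original notebook):

```python
from pathlib import Path

# The preprocessing cell below opens the image as "cat.jpg" from the notebook's directory;
# change the filename there if you use a different picture.
assert Path("cat.jpg").is_file(), 'Place a "cat.jpg" image next to the notebook files'
```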
In the Azure Notebook terminal or an Anaconda prompt window, run the following commands to create your three environments for CPU, GPU, and/or OpenVINO (note where the environment names and packages differ).
**CPU**

```bash
conda create -n cpu_env_demo python=3.8
conda activate cpu_env_demo
conda install -c anaconda ipykernel
conda install -c conda-forge ipywidgets
python -m ipykernel install --user --name=cpu_env_demo
jupyter notebook
```
**GPU**

```bash
conda create -n gpu_env_demo python=3.8
conda activate gpu_env_demo
conda install -c anaconda ipykernel
conda install -c conda-forge ipywidgets
python -m ipykernel install --user --name=gpu_env_demo
jupyter notebook
```
**OpenVINO**

```bash
conda create -n openvino_env_demo python=3.8
conda activate openvino_env_demo
conda install -c anaconda ipykernel
conda install -c conda-forge ipywidgets
python -m ipykernel install --user --name=openvino_env_demo
python -m pip install --upgrade pip
pip install openvino
```
In the first code cell, install the necessary libraries with the following code snippets (note where the packages differ between the CPU/GPU and OpenVINO variants).
**CPU + GPU**

```python
import sys

if sys.platform in ['linux', 'win32']: # Linux or Windows
    !{sys.executable} -m pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio===0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
else: # Mac
    print("PyTorch 1.9 MacOS Binaries do not support CUDA, install from source instead")

!{sys.executable} -m pip install onnxruntime-gpu onnx onnxconverter_common==1.8.1 pillow
```
**OpenVINO**

```python
import sys

if sys.platform in ['linux', 'win32']: # Linux or Windows
    !{sys.executable} -m pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio===0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
else: # Mac
    print("PyTorch 1.9 MacOS Binaries do not support CUDA, install from source instead")

!{sys.executable} -m pip install onnxruntime-openvino onnx onnxconverter_common==1.8.1 pillow

import openvino.utils as utils
utils.add_openvino_libs_to_path()
```
Import necessary libraries to get models and run inference.
```python
from torchvision import models, datasets, transforms as T
import torch
from PIL import Image
import numpy as np
```
Download a pretrained ResNet-50 model from PyTorch and export to ONNX format.
```python
resnet50 = models.resnet50(pretrained=True)

# Download ImageNet labels
!curl -o imagenet_classes.txt https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt

# Read the categories
with open("imagenet_classes.txt", "r") as f:
    categories = [s.strip() for s in f.readlines()]

# Export the model to ONNX
image_height = 224
image_width = 224
x = torch.randn(1, 3, image_height, image_width, requires_grad=True)
torch_out = resnet50(x)
torch.onnx.export(resnet50,                 # model being run
                  x,                        # model input (or a tuple for multiple inputs)
                  "resnet50.onnx",          # where to save the model (can be a file or file-like object)
                  export_params=True,       # store the trained parameter weights inside the model file
                  opset_version=12,         # the ONNX version to export the model to
                  do_constant_folding=True, # whether to execute constant folding for optimization
                  input_names=['input'],    # the model's input names
                  output_names=['output'])  # the model's output names
```
Sample Output:
```
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 10472  100 10472    0     0  50581      0 --:--:-- --:--:-- --:--:-- 50834
```
Create the preprocessing for the image (e.g. cat.jpg) that you want to run the model on.
```python
# Pre-processing for ResNet-50 Inferencing, from https://pytorch.org/hub/pytorch_vision_resnet/
resnet50.eval()
filename = 'cat.jpg' # change to your filename

input_image = Image.open(filename)
preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0) # create a mini-batch as expected by the model

# move the input and model to GPU for speed if available
print("GPU Availability: ", torch.cuda.is_available())
if torch.cuda.is_available():
    input_batch = input_batch.to('cuda')
    resnet50.to('cuda')
```
Sample Output:
```
GPU Availability:  False
```
Run inference on the model with ONNX Runtime by selecting the appropriate execution provider for the environment. If your environment uses CPU, uncomment the `CPUExecutionProvider` line; if it uses NVIDIA CUDA, uncomment the `CUDAExecutionProvider` line; and if it uses OpenVINO, uncomment the `OpenVINOExecutionProvider` line. Comment out the other `onnxruntime.InferenceSession` lines of code.
```python
# Inference with ONNX Runtime
import onnxruntime
from onnx import numpy_helper
import time

session_fp32 = onnxruntime.InferenceSession("resnet50.onnx", providers=['CPUExecutionProvider'])
# session_fp32 = onnxruntime.InferenceSession("resnet50.onnx", providers=['CUDAExecutionProvider'])
# session_fp32 = onnxruntime.InferenceSession("resnet50.onnx", providers=['OpenVINOExecutionProvider'])

def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

latency = []

def run_sample(session, image_file, categories, inputs):
    start = time.time()
    input_arr = inputs.cpu().detach().numpy()
    ort_outputs = session.run([], {'input': input_arr})[0]
    latency.append(time.time() - start)
    output = ort_outputs.flatten()
    output = softmax(output) # this is optional
    top5_catid = np.argsort(-output)[:5]
    for catid in top5_catid:
        print(categories[catid], output[catid])
    return ort_outputs

ort_output = run_sample(session_fp32, 'cat.jpg', categories, input_batch)
print("ONNX Runtime CPU/GPU/OpenVINO Inference time = {} ms".format(format(sum(latency) * 1000 / len(latency), '.2f')))
```
Sample output:
```
Egyptian cat 0.78605634
tabby 0.117310025
tiger cat 0.020089425
Siamese cat 0.011728076
plastic bag 0.0052174763
ONNX Runtime CPU Inference time = 32.34 ms
```
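ONNX Runtime silently falls back to `CPUExecutionProvider` if the provider you requested is not available in the installed package. To confirm which providers the session is actually using, you can query it; this small check is an optional addition and not part of the original notebook:

```python
# Shows the execution providers the session ended up with, in priority order,
# e.g. ['CUDAExecutionProvider', 'CPUExecutionProvider'] or just ['CPUExecutionProvider']
print(session_fp32.get_providers())
```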
Use PyTorch to benchmark the ONNX Runtime CPU and GPU results for accuracy and latency.
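Below is a minimal sketch of that comparison. It reuses the `resnet50`, `input_batch`, `categories`, `softmax`, and `ort_output` objects defined in the earlier cells and mirrors the timing and top-5 reporting of the ONNX Runtime cell, so treat it as an illustration rather than the notebook's exact cell.

```python
# Inference with PyTorch (sketch): time the native model and compare it with ONNX Runtime
latency = []

with torch.no_grad():
    start = time.time()
    torch_outputs = resnet50(input_batch)   # run the original PyTorch model
    latency.append(time.time() - start)

torch_outputs = torch_outputs.cpu().detach().numpy()
output = softmax(torch_outputs.flatten())
top5_catid = np.argsort(-output)[:5]
for catid in top5_catid:
    print(categories[catid], output[catid])

print("PyTorch Inference time = {} ms".format(format(sum(latency) * 1000 / len(latency), '.2f')))

# Compare the raw PyTorch logits with the ONNX Runtime output from the previous cell
print("***** Verifying correctness *****")
print("PyTorch and ONNX Runtime outputs are close:",
      np.allclose(ort_output, torch_outputs, rtol=1e-05, atol=1e-04))
```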
Use OpenVINO to benchmark the ONNX Runtime OpenVINO execution provider's accuracy and latency.
```python
# Inference with OpenVINO
from openvino.runtime import Core

ie = Core()
onnx_model_path = "./resnet50.onnx"
model_onnx = ie.read_model(model=onnx_model_path)
compiled_model_onnx = ie.compile_model(model=model_onnx, device_name="CPU")

# inference
output_layer = next(iter(compiled_model_onnx.outputs))

latency = []
input_arr = input_batch.detach().numpy()
inputs = {'input': input_arr}

start = time.time()
request = compiled_model_onnx.create_infer_request()
output = request.infer(inputs=inputs)
outputs = request.get_output_tensor(output_layer.index).data
latency.append(time.time() - start)

print("OpenVINO CPU Inference time = {} ms".format(format(sum(latency) * 1000 / len(latency), '.2f')))

print("***** Verifying correctness *****")
for i in range(2):
    print('OpenVINO and ONNX Runtime output {} are close:'.format(i), np.allclose(ort_output, outputs, rtol=1e-05, atol=1e-04))
```
Sample output:
```
Egyptian cat 0.7820879
tabby 0.113261245
tiger cat 0.020114701
Siamese cat 0.012514038
plastic bag 0.0056432663
OpenVINO CPU Inference time = 31.83 ms
***** Verifying correctness *****
OpenVINO and ONNX Runtime output 0 are close: True
OpenVINO and ONNX Runtime output 1 are close: True
```
We've demonstrated that ONNX Runtime is an effective way to run your PyTorch or ONNX model on CPU, NVIDIA CUDA (GPU), and Intel OpenVINO. ONNX Runtime can also enable deployment to more types of hardware through the other Execution Providers. We'd love to hear your feedback; join the conversation in the ONNX Runtime GitHub repo.
Watch the video here for a step-by-step walkthrough of ResNet-50 deployment and flexible inferencing.