After Facebook released PyTorch in October 2016, it quickly gained popularity because of its developer friendliness. With its transparent and Pythonic interface, it is great for research and rapid prototyping, and debugging code or experimenting with model architectures is very easy.
When it came to production, though, Google's TensorFlow was ahead: with TensorFlow Serving, deploying machine learning models was very easy.
That changed in May 2018, when PyTorch was integrated with Caffe2 and gained a full production pipeline. This is the pipeline used at Facebook: models are trained in PyTorch and deployed with Caffe2.
Note: Caffe2 should not be confused with Caffe. They are two completely different frameworks. Caffe was very popular about five years ago, but it has since fallen out of favor.
Facebook’s Deep Learning Pipeline
“Move fast and break things.”
Facebook motto.
For developers and engineers, moving fast requires an easy-to-use development framework in a language that is easy to work with. PyTorch provides exactly that.
In production, however, Facebook needs to operate at an unimaginable scale. Computational efficiency and performance, not developer happiness, are the goals in such environments. Facebook's answer to the performance problem was Caffe2, a blazingly fast framework written in C++ for use in production.
Caffe2 was introduced by Facebook in April 2017. It is versatile, and Caffe2 models can be deployed on many platforms, including mobile. Caffe2-powered Facebook applications have been deployed on over a billion iOS and Android phones.
Facebook maintains interoperability between PyTorch and Caffe2.
More recently, in May 2019, PyTorch 1.1 added support for TensorBoard, which is very useful for visualization and debugging.
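As a quick illustration (not used in the rest of this post), logging a training metric to TensorBoard from PyTorch 1.1 looks roughly like the sketch below; the log directory, tag, and values are placeholders.
# Minimal TensorBoard logging sketch (placeholder values, not part of the notebook)
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("runs/demo")  # log directory is arbitrary
for step in range(100):
    # log a dummy, decreasing "loss" value at each step
    writer.add_scalar("training_loss", 1.0 / (step + 1), step)
writer.close()
# View the logs with: tensorboard --logdir=runs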
Open Neural Network Exchange (ONNX)
Open Neural Network Exchange (ONNX) is an open format that lets users move deep learning models between different frameworks. This open format was initially proposed by Facebook and Microsoft but is now a widely accepted industry standard.
For the deployment of PyTorch models, the most common way is to convert them into an ONNX format and then deploy the exported ONNX model using Caffe2.
In our last post, we described how to train an image classifier and do inference in PyTorch. PyTorch models are saved as .pt or .pth files. In this post, we will explain how to convert a trained PyTorch model to an ONNX model and do inference in Caffe2.
We provide an environment file so that you can easily create your own virtual environment, and a Jupyter notebook so that you can replicate our results and reuse the code for inference in your own projects.
We will also examine the similarities and differences in the inference results using the PyTorch model and the ONNX model.
We use PyTorch 1.1.0 and ONNX 1.5.0 with Python 3.7 for this work.
Environment Setup
You will need to have conda installed on your system. We then set up our virtual environment by running the following command.
conda env create -f environment.yml
This will install the required packages into a virtual environment called pytorch_inference. We then activate it with the following command.
conda activate pytorch_inference
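Once the environment is active, a quick optional sanity check (not part of the notebook) is to confirm that the key packages import and report the expected versions:
# Optional sanity check for the pytorch_inference environment
import torch
import onnx
import caffe2.python.onnx.backend  # confirms the Caffe2 ONNX backend is importable

print("PyTorch:", torch.__version__)  # expected 1.1.0
print("ONNX:", onnx.__version__)      # expected 1.5.0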
The PyTorch model we will be working with can be downloaded from here. The model was trained using PyTorch 1.1.0, and our current virtual environment for inference also has PyTorch 1.1.0. We can now run the notebook to convert the PyTorch model to ONNX and do inference using the ONNX model in Caffe2.
PyTorch to ONNX
Let us see how to export the PyTorch .pt model to ONNX. Below is a snippet doing so.
# Export an ONNX model from a PyTorch .pt model
import torch
import torch.onnx
# Loading the input PyTorch model and mapping the tensors to CPU
device = torch.device('cpu')
model = torch.load('animals_caltech.pt', map_location=device)
# Generate a dummy input that is consistent with the network's architecture
dummy_input = torch.randn(1, 3, 224, 224)
# Export into an ONNX model using the PyTorch model and the dummy input
torch.onnx.export(model, dummy_input, "animals_caltech.onnx")
Since we will be doing the inference on CPU using Caffe2, we set the device to 'cpu' and load the PyTorch model, mapping the tensors to CPU. We then create a dummy input that matches the network's expected input shape. Finally, the export call is a one-liner that takes the PyTorch model, the dummy input, and the target ONNX file name.
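As an optional sanity check (not part of the export snippet above), the exported file can be loaded back and validated with the ONNX checker:
# Optional: validate the exported ONNX model
import onnx

onnx_model = onnx.load("animals_caltech.onnx")
onnx.checker.check_model(onnx_model)                   # raises an exception if the model is malformed
print(onnx.helper.printable_graph(onnx_model.graph))   # human-readable summary of the graph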
Note that prebuilt Caffe2 binaries are currently available without CUDA support for Mac, Ubuntu, and CentOS. All other platforms, as well as CUDA support, require compiling from source. In this work, we tested CPU inference on macOS Mojave and Ubuntu 18.04. If you want to use your ONNX models with CUDA, you will need to build Caffe2 from source.
Inference in Caffe2 using ONNX
We can now deploy our ONNX model on a variety of devices and do inference in Caffe2.
First, make sure you have created the environment with Caffe2 needed to run the ONNX model, and that you are able to import caffe2.python.onnx.backend. Next, download our ONNX model from here; it was exported using PyTorch 1.1.0. Or, if you successfully exported your own ONNX model, feel free to use that instead. We then run the code below to do inference.
# Inference in Caffe2 using the ONNX model
import caffe2.python.onnx.backend as backend
import numpy as np
import onnx
import torch
from PIL import Image                # for loading the test image
from torchvision import transforms   # for the preprocessing pipeline
from IPython.display import display  # to show the image in the notebook
# First load the onnx model
model = onnx.load("animals_caltech.onnx")
# Prepare the backend
rep = backend.prepare(model, device="CPU")
# Transform the image
transform = transforms.Compose([
    transforms.Resize(size=224),
    transforms.CenterCrop(size=224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225])
])
# Load and show the image
test_image_name = "giraffe.jpg"
test_image = Image.open(test_image_name)
display(test_image)
# Apply the transformations to the input image and convert it into a tensor
test_image_tensor = transform(test_image)
# Make the input image ready to be input as a batch of size 1
test_image_tensor = test_image_tensor.view(1, 3, 224, 224)
# Convert the tensor to numpy array
np_image = test_image_tensor.numpy()
# Pass the numpy array to run through the ONNX model
outputs = rep.run(np_image.astype(np.float32))
# Dictionary with class name and index
idx_to_class = {0: 'bear', 1: 'chimp', 2: 'giraffe', 3: 'gorilla', 4: 'llama', 5: 'ostrich', 6: 'porcupine', 7: 'skunk', 8: 'triceratops', 9: 'zebra'}
ps = torch.exp(torch.from_numpy(outputs[0]))
topk, topclass = ps.topk(10, dim=1)
for i in range(10):
    print("Prediction", '{:2d}'.format(i + 1), ":",
          '{:11}'.format(idx_to_class[topclass.cpu().numpy()[0][i]]),
          ", Class Id : ", topclass[0][i].numpy(),
          " Score: ", topk.cpu().detach().numpy()[0][i])
We load the ONNX model and pass it to Caffe2 along with the device information. It needs to be CPU in our case, since we exported it for CPU while generating the ONNX model.
We then read the input test image and resize it so that its smaller side is 224 pixels, preserving the aspect ratio. The central 224×224 region is cropped out and converted into a tensor; this step also scales the values into the range 0-1. The tensor is then normalized channel-wise using the ImageNet mean and standard deviation, as input[channel] = (input[channel] - mean[channel]) / std[channel].
The image tensor is then made to look like a batch of 1 image, since the network architecture inputs batches of images.
The tensor is then converted into a float32 numpy array and run through the loaded model in Caffe2.
The outputs of the model are in the form of log probabilities. We take their exponents to get the actual scores, sort the scores and assign the class with the highest score as our prediction for the input test image.
We print the scores for all 10 classes in descending order, so that we can compare the scores computed by inference with the PyTorch model directly (as in our earlier post) to those computed by inference with the ONNX model in Caffe2.
Here are the results of inference in PyTorch using the PyTorch .pt model and the inference in Caffe2 using the .onnx model:
| Prediction # | Predicted Class | .pt Model Score | .onnx Model Score |
|---|---|---|---|
| 1 | Giraffe | 0.9941407 | 0.99414057 |
| 2 | Ostrich | 0.0034540326 | 0.0034540326 |
| 3 | Zebra | 0.0013424822 | 0.0013424809 |
| 4 | Llama | 0.00086722436 | 0.00086722267 |
| 5 | Bear | 7.583614e-05 | 7.583578e-05 |
| 6 | Triceratops | 6.406967e-05 | 6.4069485e-05 |
| 7 | Porcupine | 1.9247866e-05 | 1.9247791e-05 |
| 8 | Gorilla | 1.8487663e-05 | 1.8487592e-05 |
| 9 | Chimp | 1.6452992e-05 | 1.6452961e-05 |
| 10 | Skunk | 1.5678854e-06 | 1.5678779e-06 |
As we can see above, the scores of the two models are very close with negligible numerical differences.
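If you want to check this programmatically (not shown in the original notebook), you can compare the two score arrays directly; here pt_scores and onnx_scores are hypothetical variable names for the score vectors produced by the two runs:
# Hypothetical check that the PyTorch and ONNX/Caffe2 scores agree numerically
import numpy as np

# pt_scores: scores from the PyTorch .pt model; onnx_scores: scores from the ONNX model in Caffe2
print(np.allclose(pt_scores, onnx_scores, rtol=1e-4, atol=1e-7))  # should print True given the tiny differences above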
Inference Time on CPU
ONNX models have been widely deployed in Caffe2 runtimes in mobile and large-scale applications at Facebook as well as at other companies. Over the last year, the PyTorch team has been working to bring the production and performance advantages of Caffe2 into PyTorch.
As a test, we measured the inference time on 407 test images in two different scenarios.
- Case 1: Inference using the PyTorch 1.1.0 .pt model in PyTorch 1.1.0.
- Case 2: Inference using the exported ONNX model in Caffe2.
Both tests were run on CPU on Ubuntu 18.04.
The mean per-image inference time over the 407 test images was 0.173 seconds using the PyTorch 1.1.0 model and 0.131 seconds using the ONNX model in Caffe2. So even though Caffe2 has already proven its cross-platform deployment capabilities and high performance, PyTorch is slowly closing the performance gap.
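For reference, a minimal sketch of how such a per-image timing comparison might be done is shown below, assuming image_tensors is a list of preprocessed (1, 3, 224, 224) input tensors and model and rep are the loaded PyTorch model and Caffe2 backend from earlier; this is an assumed helper, not the exact benchmarking code used for the numbers above.
# Timing sketch for mean per-image inference time (assumed helper functions)
import time
import torch

def mean_inference_time_pytorch(model, image_tensors):
    # image_tensors: list of (1, 3, 224, 224) float tensors
    model.eval()
    start = time.time()
    with torch.no_grad():
        for t in image_tensors:
            _ = model(t)
    return (time.time() - start) / len(image_tensors)

def mean_inference_time_caffe2(rep, image_tensors):
    start = time.time()
    for t in image_tensors:
        _ = rep.run(t.numpy().astype("float32"))
    return (time.time() - start) / len(image_tensors)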