Introduction
Model optimization is the process of improving a machine learning model's efficiency, effectiveness, and use of resources. The aim is to develop models that are accurate and efficient while using the same amount of time and computational resources as before, if not less.
In this blog, we will discuss Optimizing Models for CPU-based Deployments in Keras using ONNX. Let’s get started!
About ONNX
ONNX stands for Open Neural Network Exchange. It is an open standard for representing and exchanging deep learning models across various platforms and frameworks, so developers can move models between deep learning packages without making significant changes.
It is simple to deploy models using ONNX on various hardware accelerators and inference engines, making it a valuable tool for scalable and effective machine learning deployments.
Optimizing Models with ONNX
In this section of “Optimizing Models for CPU-based Deployments in Keras,” we will discuss the required libraries, the conversion of Keras to the ONNX model, and the inference of the ONNX model.
Required Libraries
The required libraries for optimizing models are:
Numpy: NumPy, short for "Numerical Python", is an essential Python package for numerical computing, used for carrying out mathematical and numerical operations quickly and effectively.
TensorFlow: It is one of the most popular and widely used libraries for building and training deep learning models.
Keras: Keras is an open-source, high-level deep learning API written in Python that runs on top of TensorFlow, Microsoft Cognitive Toolkit (CNTK), and Theano.
Keras2onnx: Keras models can be converted to the ONNX (Open Neural Network Exchange) format using the Python library keras2onnx.
Onnxruntime: ONNX Runtime is an open-source, high-performance runtime engine developed by Microsoft, specially made for running machine learning models in the ONNX format.
Note: Users can install all of these libraries using the pip command, as shown below.
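All of these packages are available on PyPI, so a single pip command is enough (a minimal sketch; note that keras2onnx is version-sensitive, so it may need to be paired with a compatible TensorFlow release):
Code
pip install numpy tensorflow keras2onnx onnxruntime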
Converting Keras Model to ONNX Model
In this section of “Optimizing Models for CPU-based Deployments in Keras,” we will convert a Keras model to an ONNX model using the code given below.
Code
import onnx # Import ONNX
import keras2onnx # Import Keras-ONNX
from keras.models import load_model
# Loading the Keras model
model = load_model('./model-resnet50-final.h5')
# Converting the model via keras2onnx library
onnx_model = keras2onnx.convert_keras(model, model.name)
# Saving the model in ONNX format
onnx.save_model(onnx_model, 'resnet50_v1.onnx')
In the above code, we import the required libraries (onnx, keras2onnx, and load_model), load the Keras model, and convert it into the ONNX format.
After running the above code, resnet50_v1.onnx is produced. This is how a Keras model can be converted to the ONNX format.
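As an optional sanity check, the saved file can be validated with the ONNX checker before it is deployed; a minimal sketch:
Code
import onnx

# Load the saved ONNX model and validate its structure
onnx_model = onnx.load('resnet50_v1.onnx')
onnx.checker.check_model(onnx_model)  # Raises an exception if the model is malformed
print('The ONNX model is well formed.')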
ONNX Model Inference
Inference is the process of running input data through the trained model to get predictions or outputs; inference time is how long a single such pass takes. Let’s run inference and check the top 5 predictions.
Code
import time
import onnxruntime
from tensorflow import keras
import numpy as np
# Load the ONNX model
sess = onnxruntime.InferenceSession('./resnet50_v1.onnx')
# Define the image size and the batch size
IMG_SIZE = 224
loop_count = 10
# Define a class list
class_list = ['10', '11', '12', '13', '14', '15', '16', '17', '18', '19',
'1', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29',
'2', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39',
'3', '40', '41', '42', '4', '5', '6', '7', '8', '9']
# Preprocess the image
img = keras.preprocessing.image.load_img('./57_right.jpeg', target_size=(IMG_SIZE, IMG_SIZE))
input_image = keras.preprocessing.image.img_to_array(img)
input_image = np.expand_dims(input_image, axis=0) # Expanding the dimensions (IMG_SIZE, IMG_SIZE, 3) -> (1, IMG_SIZE, IMG_SIZE, 3)
input_image = keras.applications.resnet50.preprocess_input(input_image)
input_image = input_image.astype(np.float32) # Convert to float32 data type
input_image = np.transpose(input_image, [0, 3, 1, 2]) # Transpose to (batch_size, 3, 224, 224)
# Repeat the input image to match batch size
input_image = np.repeat(input_image, loop_count, axis=0)
input_image = input_image if isinstance(input_image, list) else [input_image]
feed = dict([(inp.name, input_image[n]) for n, inp in enumerate(sess.get_inputs())])
prediction_onnx = sess.run(None, feed)[0] # Run predictions
prediction = prediction_onnx[0] # Take the first image's scores (all batch rows are identical)
top_index = prediction.argsort()[::-1][:5] # Indices of the top-5 predictions
for i in top_index:
    print(' {:.2f} {}'.format(prediction[i], class_list[i]))
The code loads the pre-trained ResNet-50 model in ONNX format, preprocesses an input image, repeats it to simulate a batch, runs inference on the batch, and prints the top 5 predicted classes with their probabilities.
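If you are unsure of the input name and shape the exported model expects (the feed dictionary above is built from them), you can inspect the session directly; a minimal sketch:
Code
import onnxruntime

sess = onnxruntime.InferenceSession('./resnet50_v1.onnx')

# Print the name, shape, and type of every model input and output
for inp in sess.get_inputs():
    print('Input:', inp.name, inp.shape, inp.type)
for out in sess.get_outputs():
    print('Output:', out.name, out.shape, out.type)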
Comparison Between Keras and ONNX Model
In this section, we will compare Keras Model with ONNX Model.
Loading Time of Model
The model size of both the Keras and ONNX models is about 98 MB. First, we will compare the loading time of the two models.
Code
import time
from tensorflow import keras # Importing Keras
start_time = time.time()
# Loading the Keras model
keras_model = keras.models.load_model('./model-resnet50-final.h5')
print("Loading Time of Kerad Model is %s second." %(time.time() - start_time))
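The loading time of the ONNX model can be measured the same way; a minimal sketch, assuming the resnet50_v1.onnx file produced earlier:
Code
import time
import onnxruntime

start_time = time.time()
# Loading the ONNX model
sess = onnxruntime.InferenceSession('./resnet50_v1.onnx')
print("Loading Time of ONNX Model is %s seconds." % (time.time() - start_time))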
The loading time of the ONNX model is 1.532 seconds, which is nearly five times less than that of the Keras model. So, the ONNX model is preferred over the Keras model.
Inference Time of Model
Let’s compare the inference time of the Keras and ONNX models.
Code
import time
import onnxruntime
from tensorflow import keras
import numpy as np
# Load the models
keras_model = keras.models.load_model('./model-resnet50-final.h5')
sess = onnxruntime.InferenceSession('./resnet50_v1.onnx')
# Define the image size and the number of timed runs
IMG_SIZE = 224
loop_count = 10
# Preprocess the image once, so both models see the same input
img = keras.preprocessing.image.load_img('./57_right.jpeg', target_size=(IMG_SIZE, IMG_SIZE))
input_image = keras.preprocessing.image.img_to_array(img)
input_image = np.expand_dims(input_image, axis=0) # Expanding the dimensions (IMG_SIZE, IMG_SIZE, 3) -> (1, IMG_SIZE, IMG_SIZE, 3)
input_image = keras.applications.resnet50.preprocess_input(input_image)
input_image = input_image.astype(np.float32) # Convert to float32 data type
# Keras prediction
start_time = time.time()
for x in range(loop_count):
    pred = keras_model.predict(input_image)[0]
print("Keras inference time is %s seconds." % ((time.time() - start_time) / loop_count))
# ONNX prediction
input_image = np.transpose(input_image, [0, 3, 1, 2]) # Transpose to (batch_size, 3, 224, 224)
feed = dict([(inp.name, input_image) for inp in sess.get_inputs()])
start_time = time.time()
for x in range(loop_count):
    prediction_onnx = sess.run(None, feed)[0]
print("ONNX inference time is %s seconds." % ((time.time() - start_time) / loop_count))
From the above output, we can see that the ONNX inference time is lower than the Keras inference time. So, we can conclude that the ONNX model is faster than the Keras model.
Frequently Asked Questions
How to benchmark the Keras model that is CPU-optimized?
To benchmark your CPU-optimized Keras model, you can evaluate the inference time for relevant data samples on your target CPU. For a more precise measurement, record the beginning and ending times of the inference process using Python's time module and average those times across several runs.
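A minimal sketch of such a benchmark, assuming the ResNet-50 model used in this blog and a dummy input of the matching shape:
Code
import time
import numpy as np
from tensorflow import keras

# Load the model used earlier in this blog
model = keras.models.load_model('./model-resnet50-final.h5')

# Dummy input matching the model's expected shape
dummy_input = np.random.rand(1, 224, 224, 3).astype(np.float32)

runs = 50
start = time.time()
for _ in range(runs):
    model.predict(dummy_input, verbose=0)
print("Average inference time: %.4f seconds" % ((time.time() - start) / runs))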
Can Keras models that are GPU-accelerated be used on CPU-based systems?
On CPU-based platforms, you can employ Keras models that are GPU-accelerated. If GPUs are available, Keras automatically recognizes them and uses them for calculation. However, you can explicitly designate the CPU as the computing backend to optimize for CPU-based deployments.
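A minimal sketch of forcing CPU execution, assuming model and input_image are defined as in the earlier examples:
Code
import tensorflow as tf

# Hide any GPUs from TensorFlow so all operations run on the CPU
tf.config.set_visible_devices([], 'GPU')

# Alternatively, pin a specific computation to the CPU explicitly
with tf.device('/CPU:0'):
    predictions = model.predict(input_image)  # Assumption: model and input_image defined as above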
How to optimize my Keras model for CPU-based deployments?
To run your Keras model on hardware without specialized accelerators like GPUs or TPUs, you must optimize it for CPU-based deployments, since many edge devices, embedded systems, and cloud instances execute machine learning models only on CPUs. One common approach, covered in this blog, is converting the model to ONNX and running it with ONNX Runtime.
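Beyond the conversion shown in this blog, ONNX Runtime also exposes CPU-specific settings; a minimal sketch of tuning the intra-op thread count (the value 4 is an assumption, tune it to your core count):
Code
import onnxruntime

# Configure session-level threading for the CPU execution provider
opts = onnxruntime.SessionOptions()
opts.intra_op_num_threads = 4  # Assumption: match this to your CPU core count

sess = onnxruntime.InferenceSession('./resnet50_v1.onnx',
                                    sess_options=opts,
                                    providers=['CPUExecutionProvider'])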
Conclusion
In this blog, we have discussed Optimizing Models for CPU-based Deployments in Keras. We have gone through the ONNX model and its comparison with the Keras model using a ResNet-50 model.
We hope this blog has helped you gain knowledge of Optimizing Models for CPU-based Deployments in Keras. Do not stop learning! We recommend you read some of our related articles on Optimizing Models for CPU-based Deployments in Keras.
But suppose you have just started your learning process and are looking for questions from tech giants like Amazon, Microsoft, Uber, etc. For placement preparations, you must look at the problems, interview experiences, and interview bundles.