Table of contents
1. Introduction
2. About ONNX
3. Optimizing Models with ONNX
3.1. Required Libraries
3.2. Converting Keras Model to ONNX Model
3.3. ONNX Model Inference
4. Comparison Between Keras and ONNX Model
4.1. Loading Time of Model
4.2. Inference Time of Model
5. Frequently Asked Questions
5.1. How to benchmark the Keras model that is CPU-optimized?
5.2. Can Keras models that are GPU-accelerated be used on CPU-based systems?
5.3. How to optimize my Keras model for CPU-based deployments?
6. Conclusion
Last Updated: Mar 27, 2024
Hard

Optimizing Models for CPU-based Deployments in Keras

Author Ayush Mishra

Introduction

Model optimization is the process of improving a machine learning model's efficiency and use of resources without sacrificing accuracy. The goal is a model that is just as accurate but needs the same or, ideally, less time and compute to run.


In this blog, we will discuss optimizing models for CPU-based deployments in Keras using ONNX. Let's get started!

About ONNX 

ONNX stands for Open Neural Network Exchange. It is an open standard for representing and exchanging deep learning models across platforms and frameworks. A model trained in one framework can be exported to ONNX and used in another without significant changes.

ONNX models can be deployed on a variety of hardware accelerators and inference engines, which makes the format a valuable tool for scalable and efficient machine learning deployments.


Optimizing Models with ONNX

In this section of “Optimizing Models for CPU-based Deployments in Keras,” we will discuss the required libraries, the conversion of Keras to the ONNX model, and the inference of the ONNX model.

Required Libraries

The required libraries for optimizing models are:

  • NumPy: Short for "Numerical Python", NumPy is an essential Python package for fast, efficient numerical and mathematical operations on arrays.

  • TensorFlow: One of the most popular and widely used libraries for building and training deep learning models.

  • Keras: A high-level, open-source deep learning API written in Python. It ships with TensorFlow as tensorflow.keras; earlier standalone releases also ran on top of Microsoft Cognitive Toolkit (CNTK) and Theano.

  • keras2onnx: A Python library that converts Keras models to the ONNX (Open Neural Network Exchange) format. The project is no longer actively developed, and its repository points users to tf2onnx as the successor.

  • onnxruntime: ONNX Runtime is an open-source, high-performance inference engine developed by Microsoft for running machine learning models in the ONNX format.

Note: All of these libraries can be installed with the pip command.
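
Before running the examples, it can help to confirm which of these packages are importable in the current environment. The snippet below is a minimal sketch using only the standard library; the helper name `missing_packages` is an illustrative assumption and not part of any of the libraries above.

```python
import importlib.util

# Import names of the libraries listed above (Keras ships inside tensorflow).
REQUIRED = ["numpy", "tensorflow", "keras2onnx", "onnx", "onnxruntime"]

def missing_packages(names):
    """Return the subset of `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All required libraries are available.")
```

The check only looks the packages up; it does not import them, so it stays fast even when TensorFlow is installed.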

Converting Keras Model to ONNX Model

In this section of “Optimizing Models for CPU-based Deployments in Keras,” we will convert Keras Model to ONNX Model using the below-given code.

Code

import onnx  # ONNX model serialization
import keras2onnx  # Keras-to-ONNX converter
from tensorflow import keras  # Keras API bundled with TensorFlow

# Load the trained Keras model
model = keras.models.load_model('./model-resnet50-final.h5')

# Convert the model with the keras2onnx library
onnx_model = keras2onnx.convert_keras(model, model.name)

# Save the model in ONNX format
onnx.save_model(onnx_model, 'resnet50_v1.onnx')


Explanation

In the above code, we import onnx, keras2onnx, and Keras, load the saved Keras model, convert it, and write the result to disk.

After running the code, resnet50_v1.onnx is produced. This is how a Keras model can be converted to the ONNX format.
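
Since keras2onnx is no longer actively developed, the same conversion can also be done with tf2onnx. The following is a minimal sketch, assuming tf2onnx and a compatible TensorFlow 2.x release are installed; the function name `convert_with_tf2onnx`, the opset value, and the file paths are illustrative choices rather than the author's setup.

```python
def convert_with_tf2onnx(h5_path, onnx_path, opset=13):
    """Convert a saved Keras model to ONNX with tf2onnx and return the ONNX model."""
    # Heavy dependencies are imported here so the module itself loads quickly.
    import tensorflow as tf
    import tf2onnx

    model = tf.keras.models.load_model(h5_path)
    # from_keras returns the ONNX ModelProto plus external tensor storage;
    # passing output_path also writes the model to disk.
    onnx_model, _ = tf2onnx.convert.from_keras(model, opset=opset, output_path=onnx_path)
    return onnx_model

# Example call (paths taken from the code above):
# convert_with_tf2onnx('./model-resnet50-final.h5', 'resnet50_v1.onnx')
```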

ONNX Model Inference

Inference is a single pass of input data through the trained model to obtain its predictions, and inference time is how long that pass takes. Let's run the ONNX model on an image and look at its top 5 predictions.

Code

import onnxruntime
from tensorflow import keras
import numpy as np

# Load the ONNX model
sess = onnxruntime.InferenceSession('./resnet50_v1.onnx')

# Define the image size and the batch size
IMG_SIZE = 224
loop_count = 10

# Define a class list
class_list = ['10', '11', '12', '13', '14', '15', '16', '17', '18', '19',
              '1', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29',
              '2', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39',
              '3', '40', '41', '42', '4', '5', '6', '7', '8', '9']

# Preprocess the image
img = keras.preprocessing.image.load_img('./57_right.jpeg', target_size=(IMG_SIZE, IMG_SIZE))
input_image = keras.preprocessing.image.img_to_array(img)
input_image = np.expand_dims(input_image, axis=0)  # Expanding the dimensions (IMG_SIZE, IMG_SIZE, 3) -> (1, IMG_SIZE, IMG_SIZE, 3)

input_image = keras.applications.resnet50.preprocess_input(input_image)

input_image = input_image.astype(np.float32)  # Convert to float32 data type

# Transpose to (batch_size, 3, 224, 224) only if the exported model expects channels-first input
model_input = sess.get_inputs()[0]
if model_input.shape[1] == 3:
    input_image = np.transpose(input_image, [0, 3, 1, 2])

# Repeat the input image to build a batch of loop_count identical images
input_image = np.repeat(input_image, loop_count, axis=0)
feed = {model_input.name: input_image}

prediction_onnx = sess.run(None, feed)[0]  # Run predictions, shape (loop_count, num_classes)
prediction = prediction_onnx[0]  # All rows are identical, so keep the first one
top_index = prediction.argsort()[::-1][:5]  # Indices of the top-5 scores
for i in top_index:
    print('    {:.2f}  {}'.format(prediction[i], class_list[i]))



Explanation

The code loads a pre-trained ResNet-50 model in ONNX format, preprocesses an input image, repeats it to form a batch of 10, runs inference on that batch, and prints the top 5 predicted classes with their probabilities.
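
The top-5 selection at the end of the script can be isolated into a small helper, which makes it easy to check on toy data. This is a minimal sketch in plain Python; the name `top_k` and the toy labels are assumptions made for illustration.

```python
def top_k(scores, labels, k=5):
    """Return the k (label, score) pairs with the highest scores, best first."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    # Sort class indices by descending score, then keep the first k.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(labels[i], scores[i]) for i in order[:k]]

# Toy example with three classes
for label, score in top_k([0.10, 0.70, 0.20], ["cat", "dog", "bird"], k=2):
    print("    {:.2f}  {}".format(score, label))
```

The length check guards against a class list that does not match the model's output size, which would otherwise silently mislabel predictions.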

Comparison Between Keras and ONNX Model

In this section, we will compare Keras Model with ONNX Model.

Loading Time of Model

The file size of both the Keras and ONNX models is about 98 MB. First, we will compare the loading time of both models.

Code

import time
from tensorflow import keras  # Importing Keras

start_time = time.time()
# Loading the Keras model
keras_model = keras.models.load_model('./model-resnet50-final.h5')
print("Loading time of Keras model is %s seconds." % (time.time() - start_time))



The loading time of the Keras model of size 98 MB is 8.4487 seconds. Now let us see the loading time of the ONNX-optimized model. 

Code

import time
import onnxruntime

start_time = time.time()
# Loading the ONNX model saved earlier
sess = onnxruntime.InferenceSession('./resnet50_v1.onnx')
print("Loading time of ONNX model is %s seconds." % (time.time() - start_time))



The loading time of the ONNX model is 1.532 seconds, about 5.5 times shorter than the 8.4487 seconds of the Keras model. In this measurement, the ONNX model loads considerably faster than the Keras model.
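
The ratio between the two loading times follows directly from the measurements reported above. As a quick check of the arithmetic:

```python
keras_load_s = 8.4487  # Keras loading time reported above, in seconds
onnx_load_s = 1.532    # ONNX loading time reported above, in seconds

speedup = keras_load_s / onnx_load_s
print("ONNX loads about %.1f times faster than Keras." % speedup)
```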

Inference Time of Model

Let’s compare the inference time of the Keras and ONNX models.

Code

import time
import onnxruntime
from tensorflow import keras
import numpy as np

# Load the models
keras_model = keras.models.load_model('./model-resnet50-final.h5')
sess = onnxruntime.InferenceSession('./resnet50_v1.onnx')

# Define the image size and the number of timed runs
IMG_SIZE = 224
loop_count = 10

# Preprocess the image once so both models receive the same input
img = keras.preprocessing.image.load_img('./57_right.jpeg', target_size=(IMG_SIZE, IMG_SIZE))
input_image = keras.preprocessing.image.img_to_array(img)
input_image = np.expand_dims(input_image, axis=0)  # Expanding the dimensions (IMG_SIZE, IMG_SIZE, 3) -> (1, IMG_SIZE, IMG_SIZE, 3)
input_image = keras.applications.resnet50.preprocess_input(input_image)
input_image = input_image.astype(np.float32)  # Convert to float32 data type

# Keras prediction
start_time = time.time()
for x in range(loop_count):
    pred = keras_model.predict(input_image)[0]
print("Keras inference time is %s seconds." % ((time.time() - start_time) / loop_count))

# ONNX prediction
model_input = sess.get_inputs()[0]
onnx_image = input_image
# Transpose to (batch_size, 3, 224, 224) only if the exported model expects channels-first input
if model_input.shape[1] == 3:
    onnx_image = np.transpose(input_image, [0, 3, 1, 2])
feed = {model_input.name: onnx_image}

start_time = time.time()
for x in range(loop_count):
    prediction_onnx = sess.run(None, feed)[0]
print("ONNX inference time is %s seconds." % ((time.time() - start_time) / loop_count))



From the above output, we can see that the ONNX inference time is smaller than the Keras inference time. For this model, the ONNX model runs faster than the Keras model.

Frequently Asked Questions

How to benchmark the Keras model that is CPU-optimized?

To benchmark your CPU-optimized Keras model, you can evaluate the inference time for relevant data samples on your target CPU. For a more precise measurement, record the beginning and ending times of the inference process using Python's time module and average those times across several runs.
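
The timing approach described here can be wrapped in a small reusable helper. The sketch below uses `time.perf_counter` from the standard library, adds a few untimed warm-up calls (an assumption on our part, since first calls often include one-time setup cost), and reports the mean over several runs. The name `benchmark` and its parameters are illustrative.

```python
import time

def benchmark(fn, warmup=2, runs=10):
    """Return the mean wall-clock time in seconds of calling fn() `runs` times."""
    for _ in range(warmup):
        fn()  # untimed warm-up calls
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)

# Stand-in workload; in practice pass e.g. lambda: keras_model.predict(input_image)
mean_s = benchmark(lambda: sum(range(100_000)))
print("Mean time per call: %.6f seconds" % mean_s)
```

Passing the model call as a function keeps the helper independent of Keras or ONNX Runtime, so the same code can time both.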

Can Keras models that are GPU-accelerated be used on CPU-based systems?

Yes. A Keras model trained with GPU acceleration can also run on CPU-based platforms. When GPUs are available, Keras automatically detects them and uses them for computation; on a machine without GPUs, the same model runs on the CPU. You can also explicitly designate the CPU as the compute device to optimize for CPU-based deployments.
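
One common way to pin computation to the CPU is to hide the GPUs from TensorFlow before it is imported, and to request the CPU execution provider in ONNX Runtime. The sketch below assumes TensorFlow's CUDA-based GPU support and an installed onnxruntime; the helper names are illustrative.

```python
import os

def hide_gpus():
    """Hide CUDA devices so that TensorFlow falls back to the CPU.

    Must be called before TensorFlow is imported for the first time.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

def cpu_only_session(onnx_path):
    """Create an ONNX Runtime session restricted to the CPU execution provider."""
    import onnxruntime  # imported lazily; requires the onnxruntime package
    return onnxruntime.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])

hide_gpus()
# import tensorflow as tf            # TensorFlow now sees no GPUs
# sess = cpu_only_session('./resnet50_v1.onnx')
```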

How to optimize my Keras model for CPU-based deployments?

Many edge devices, embedded systems, and cloud instances run machine learning models only on CPUs, without specialized hardware accelerators like GPUs or TPUs. One approach, shown in this blog, is to convert the Keras model to ONNX and serve it with ONNX Runtime, then compare its loading and inference times against the original model on the target CPU.
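
With ONNX Runtime, CPU inference can often be tuned further through session options such as the graph optimization level and the thread count. The following sketch uses options that exist in the onnxruntime Python API; the helper names, the choice of thread count, and the model path are assumptions for illustration, and the best settings depend on the target machine.

```python
import os

def default_thread_count():
    """Pick a thread count from the number of logical CPUs (at least 1)."""
    return max(1, os.cpu_count() or 1)

def make_tuned_session(onnx_path, num_threads=None):
    """Create a CPU session with all graph optimizations enabled."""
    import onnxruntime as ort  # imported lazily; requires the onnxruntime package

    options = ort.SessionOptions()
    options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
    options.intra_op_num_threads = num_threads or default_thread_count()
    return ort.InferenceSession(onnx_path, sess_options=options,
                                providers=["CPUExecutionProvider"])

# Example call (model path from the conversion step):
# sess = make_tuned_session('./resnet50_v1.onnx', num_threads=4)
```

Any such setting is worth re-measuring on the deployment CPU, since more threads do not always mean lower latency.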

Conclusion

In this blog, we have discussed optimizing models for CPU-based deployments in Keras. We converted a ResNet-50 model to ONNX and compared its loading and inference times with those of the Keras model.

We hope this blog has helped you to gain knowledge of optimizing models for CPU-based deployments in Keras. Do not stop learning!



We wish you Good Luck! 

Happy Learning!
