Table of contents
1. Introduction
2. AWS Inferentia
   2.1. TensorFlow ResNet-50 Model for Image Classification
        2.1.1. Setup environment
        2.1.2. Running on CPU
        2.1.3. Compiling and running on Inferentia
        2.1.4. Performance optimization
3. FAQs
4. Key takeaways
Last Updated: Mar 27, 2024

AWS Inferentia

Author Adya Tiwari

Introduction

AWS Inferentia is designed to deliver high-performance inference in the cloud, drive down the total cost of inference, and make it easy for developers to integrate machine learning into their business applications.

AWS's vision is to make deep learning pervasive for everyday developers and to democratize access to cutting-edge infrastructure at minimal cost, with a pay-as-you-go usage model. AWS Inferentia is Amazon's first custom silicon built to accelerate deep learning workloads and is part of a long-term strategy to deliver on this vision.

AWS Inferentia

The AWS Neuron software development kit (SDK) consists of a compiler, a runtime, and profiling tools that optimize the performance of workloads for AWS Inferentia. Developers can take complex neural network models that have been built and trained in popular frameworks such as TensorFlow, PyTorch, and MXNet and deploy them on AWS Inferentia-based Amazon EC2 Inf1 instances. You can keep using the same ML frameworks you use today and move your models onto Inf1 with minimal code changes and without lock-in to vendor-specific solutions.

AWS Inferentia-based Inf1 instances feature the first custom machine learning (ML) chip designed by AWS and optimized for ML inference. Inferentia delivers up to 80% lower cost per inference and up to 2.3x higher throughput than comparable current-generation GPU-based Amazon EC2 instances. Using Inf1 instances, customers can run large-scale ML inference applications such as image recognition, speech recognition, natural language processing, personalization, and fraud detection at the lowest cost in the cloud. Inf1 instances are also available for use with Amazon SageMaker, AWS Batch, Amazon EKS, and Amazon ECS.

TensorFlow ResNet-50 Model for Image Classification

Setup environment

Set up an Inf1 instance as a development environment for compiling pre-trained ML models and as a deployment environment for running the compiled models.

You can choose a Deep Learning AMI based on Ubuntu 18.x or Amazon Linux 2.

Step 1. Launch an Inf1 instance as a development and deployment environment
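
If you prefer the AWS CLI over the console, an Inf1 instance can be launched with aws ec2 run-instances. A minimal sketch follows; the AMI ID, key pair, and security group below are placeholders that you must replace with your own values:

# Launch an inf1.2xlarge instance from a Deep Learning AMI (all IDs are placeholders)
aws ec2 run-instances \
    --image-id ami-0123456789abcdef0 \
    --instance-type inf1.2xlarge \
    --key-name my-key-pair \
    --security-group-ids sg-0123456789abcdef0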

Step 2. Set up the pre-installed Neuron development environment

Software tools and packages are updated frequently, so run the update process. Before updating, stop and uninstall any existing Neuron runtime 1.0 daemon (neuron-rtd):

# Stop and uninstall existing Neuron runtime 1.0 daemon
sudo systemctl stop neuron-rtd
sudo apt remove aws-neuron-runtime aws-neuron-runtime-base -y


# Update OS packages
sudo apt-get update -y


# Update OS headers
sudo apt-get install linux-headers-$(uname -r) -y


# Update Neuron Driver
sudo apt-get install aws-neuron-dkms -y


# Update Neuron Tools
sudo apt-get install aws-neuron-tools -y


# Optional: Update Neuron TensorFlow model server
sudo apt-get install tensorflow-model-server-neuron -y


Send off the pre-introduced Tensorflow Neuron improvement climate on the Deep Learning AMI and run the updated interaction.


# Activate TensorFlow 1
source activate aws_neuron_tensorflow_p36


# Set pip repository to point to the Neuron repository
pip config set global.extra-index-url https://pip.repos.neuron.amazonaws.com


# Update Neuron TensorFlow 1
pip install --upgrade tensorflow-neuron==1.15.5.* neuron-cc


# Update Neuron TensorBoard
pip install --upgrade tensorboard-plugin-neuron

Check the installed TensorFlow Neuron development environment; the Neuron packages installed above should appear in the listing:

pip list | grep neuron

Running on CPU

Step 1. Create a Python script for inference

Create a Python script named infer_resnet50_cpu.py with the following content:

import os
import time
import shutil
import numpy as np
import tensorflow as tf
import tensorflow.compat.v1.keras as keras
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions

# Instantiate Keras ResNet50 model
keras.backend.set_learning_phase(0)
tf.keras.backend.set_image_data_format('channels_last')
model = ResNet50(weights='imagenet')

# Export SavedModel
model_dir = 'resnet50'
shutil.rmtree(model_dir, ignore_errors=True)

tf.saved_model.simple_save(
    session            = keras.backend.get_session(),
    export_dir         = model_dir,
    inputs             = {'input': model.inputs[0]},
    outputs            = {'output': model.outputs[0]})

# Create input from image
img_sgl = image.load_img('kitten_small.jpg', target_size=(224, 224))
img_arr = image.img_to_array(img_sgl)
img_arr2 = np.expand_dims(img_arr, axis=0)
img_arr3 = preprocess_input(img_arr2)

# Run inference, Display results
preds = model.predict(img_arr3)
print(decode_predictions(preds, top=5)[0])
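
The script prints the top-5 ImageNet predictions for the input image. If you also want a CPU throughput number to compare against the Neuron results later, you can append a simple timing loop to the script. A minimal sketch, where the warm-up call and the inference count are our own choices rather than part of the original script:

# Warm up once so one-time graph setup is not counted in the measurement
model.predict(img_arr3)

num_inferences = 1000  # assumed count; lower it if the CPU run is too slow
start = time.time()
for _ in range(num_inferences):
    model.predict(img_arr3)
elapsed_time = time.time() - start

print('By CPU - num_inferences:{:>6}[images], elapsed_time:{:6.2f}[sec], Throughput:{:8.2f}[images/sec]'.format(num_inferences, elapsed_time, num_inferences / elapsed_time))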

Step 2. Prepare input image data
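
The inference script expects a local file named kitten_small.jpg. Any small JPEG will work; for example, you can download a sample image with curl (the URL below is an assumption based on the AWS MXNet model server samples — substitute any image you like):

curl -o kitten_small.jpg https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/kitten_small.jpg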

Step 3. Run the inference script

python infer_resnet50_cpu.py

Compiling and running on Inferentia

Compile the pre-trained Keras ResNet-50 model exported in SavedModel format for the Inferentia chip and perform image classification on a Neuron core. The output format of the compiled model is also a SavedModel.

Step 1. Create a Python script for compiling the model

Create a Python script named compile_resnet50.py with the following content:

import shutil
import tensorflow.neuron as tfn

model_dir = 'resnet50'

# Prepare export directory (old one removed)
compiled_model_dir = 'resnet50_neuron'
shutil.rmtree(compiled_model_dir, ignore_errors=True)

# Compile using Neuron
tfn.saved_model.compile(model_dir, compiled_model_dir)
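
The compiled artifact is still an ordinary SavedModel, so if you want to verify its input and output signatures you can inspect it with TensorFlow's saved_model_cli (an optional check, not part of the original walkthrough):

saved_model_cli show --dir resnet50_neuron --all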



Step 2. Run the compilation script

Run the compilation script, which takes about two minutes on an inf1.2xlarge instance:

time python compile_resnet50.py


Step 3. Create a Python script for inference

Create a Python script named infer_resnet50_neuron.py with the following content. The script loads the model that was compiled in Step 2:

import os
import time
import shutil
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions

# Load model
compiled_model_dir = 'resnet50_neuron'
predictor_inferentia = tf.contrib.predictor.from_saved_model(compiled_model_dir)

# Create input from image
img_sgl = image.load_img('kitten_small.jpg', target_size=(224, 224))
img_arr = image.img_to_array(img_sgl)
img_arr2 = np.expand_dims(img_arr, axis=0)
img_arr3 = preprocess_input(img_arr2)

# Run inference, Display results
model_feed_dict={'input': img_arr3}
infa_rslts = predictor_inferentia(model_feed_dict)
print(decode_predictions(infa_rslts["output"], top=5)[0])

Step 4. Run the inference script

Run the inference script infer_resnet50_neuron.py to perform inference on the Neuron cores. The top-5 predictions should match those from the CPU run:

python infer_resnet50_neuron.py

Performance optimization

By modifying the inference script created in the previous section, we can measure how inference performs on Inferentia.

Step 1. Modify the inference script

Create a Python script named infer_resnet50_perf.py with the following content:

import os
import time
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions

# added for utilizing 4 neuron cores
os.environ['NEURON_RT_VISIBLE_CORES'] = '0-3'

# Load models
compiled_model_dir = 'resnet50_neuron'
predictor_inferentia = tf.contrib.predictor.from_saved_model(compiled_model_dir)

# Create input from image
img_sgl = image.load_img('kitten_small.jpg', target_size=(224, 224))
img_arr = image.img_to_array(img_sgl)
img_arr2 = np.expand_dims(img_arr, axis=0)
img_arr3 = preprocess_input(img_arr2)

model_feed_dict={'input': img_arr3}

# warmup
infa_rslts = predictor_inferentia(model_feed_dict)

num_inferences = 10000

# Run inference on Neuron Cores, Display results
start = time.time()
for _ in range(num_inferences):
    infa_rslts = predictor_inferentia(model_feed_dict)
elapsed_time = time.time() - start

print('By Neuron Core - num_inferences:{:>6}[images], elapsed_time:{:6.2f}[sec], Throughput:{:8.2f}[images/sec]'.format(num_inferences, elapsed_time, num_inferences / elapsed_time))

Step 2. Run the inference script

Run the inference script infer_resnet50_perf.py to measure inference throughput on the Neuron cores:

python infer_resnet50_perf.py

Step 3. View Neuron core usage

Run the neuron-top command in another terminal while inference on the Neuron cores is active to display Neuron core and memory utilization:

neuron-top

Step 4. Modify the inference script

Create a Python script named infer_resnet50_perf2.py with the following content. It submits requests from eight worker threads so that the four visible Neuron cores stay busy instead of idling while a single request is in flight:

import os
import time
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
from concurrent import futures

# added for utilizing 4 neuron cores
os.environ['NEURON_RT_VISIBLE_CORES'] = '0-3'

# Load models
compiled_model_dir = 'resnet50_neuron'
predictor_inferentia = tf.contrib.predictor.from_saved_model(compiled_model_dir)

# Create input from image
img_sgl = image.load_img('kitten_small.jpg', target_size=(224, 224))
img_arr = image.img_to_array(img_sgl)
img_arr2 = np.expand_dims(img_arr, axis=0)
img_arr3 = preprocess_input(img_arr2)

model_feed_dict={'input': img_arr3}

# warmup
infa_rslts = predictor_inferentia(model_feed_dict)

num_inferences = 10000

# Run inference on Neuron Cores, Display results
start = time.time()
with futures.ThreadPoolExecutor(8) as exe:
    fut_list = []
    for _ in range (num_inferences):
        fut = exe.submit(predictor_inferentia, model_feed_dict)
        fut_list.append(fut)
    for fut in fut_list:
        infa_rslts = fut.result()
elapsed_time = time.time() - start

print('By Neuron Core - num_inferences:{:>6}[images], elapsed_time:{:6.2f}[sec], Throughput:{:8.2f}[images/sec]'.format(num_inferences, elapsed_time, num_inferences / elapsed_time))

Step 5. Run the modified inference script

Run the modified inference script infer_resnet50_perf2.py:

python infer_resnet50_perf2.py

Step 6. View Neuron core utilization

As in Step 3, run the neuron-top command to display Neuron core utilization; with the multi-threaded script, all four visible Neuron cores should now show activity:

neuron-top

FAQs

1. Is AWS Inferentia ARM-based?

No. The general-purpose, AWS-designed server processor with 64-bit Arm cores powers the EC2 A1 instance family; AWS Inferentia is a separate chip built specifically for ML inference.

2. What is Inferentia?

AWS Inferentia is a machine learning inference chip custom-built by AWS to deliver high-throughput, low-latency inference performance at a very low cost.

3. What is the Inferentia chip?

AWS Inferentia is a custom machine learning chip designed by AWS that you can use for high-performance inference.

4. Does inference require a GPU?

You train your model on GPUs, so it is natural to consider GPUs for inference deployment as well: GPUs accelerate deep learning training, and inference is just the forward pass of your neural network, which is already accelerated on the GPU.

Key takeaways

AWS Inferentia is designed to deliver high-performance inference in the cloud, drive down the total cost of inference, and make it easy for developers to integrate machine learning into their business applications. It is Amazon's first custom silicon built to accelerate deep learning workloads and is part of a long-term strategy to deliver on that vision.


Happy Learning!
