Table of contents
1. Introduction
2. TorchServe
3. Features of TorchServe
4. Steps to Export Models for Serving using TorchServe
4.1. Install TorchServe
4.2. Export Model to your Local Machine
4.3. Save the model
4.4. Create JSON Request and Label Mapper file
4.5. Create the Model Handler
4.6. Serving the Model in the Localhost
4.7. Register and Serve the Model
4.8. Model Handler
4.8.1. Run the below code for Initialization Function
4.8.2. Run the below code for Preprocessing Function
4.8.3. Run the below code for Inference Function
4.8.4. Run the below code for Post Processing Function
4.9. Testing
5. Frequently Asked Questions
5.1. What is TorchServe?
5.2. How can a PyTorch model be exported for TorchServe?
5.3. What is TorchScript?
5.4. How can we start a TorchServe instance?
5.5. What are the advantages of deploying models with TorchServe?
6. Conclusion
Last Updated: Feb 5, 2025

Exporting models for serving using TorchServe

Author: Vidhi Sareen

Introduction

Do you ever ponder how we can swiftly deploy our machine learning models? To address this issue, we can use TorchServe. TorchServe is a production-ready solution for serving and scaling PyTorch models.

This article will explain the process of exporting models for serving with TorchServe.

TorchServe

TorchServe enables users to serve and scale PyTorch models in production effortlessly. Users can deploy PyTorch models to cloud environments using this flexible and user-friendly tool. TorchServe wraps a PyTorch deep learning model in REST APIs, allowing seamless integration with web applications and services. Additionally, TorchServe enhances inference performance for large models by supporting batch inference.

Features of TorchServe

TorchServe has many features, some of which are:

  • TorchServe simplifies deploying machine learning models by creating an easy-to-use API for others to interact with.
  • TorchServe efficiently handles a large number of user requests.
  • TorchServe helps you monitor and log model performance, customize settings, and version different models.
  • TorchServe also supports containerization for easy deployment.
  • TorchServe regularly checks the health of running models and automatically substitutes unhealthy ones.

Steps to Export Models for Serving using TorchServe

There are multiple steps involved in exporting models for serving with TorchServe. We will go step by step and understand how to export our models effectively.

Install TorchServe

We need to install Java on our computer because TorchServe runs on Java. You can download and install the JDK (Java Development Kit) from Oracle's official website. To confirm Java is installed correctly on your computer, use the following command in the command line.

java -version

If you are working in a notebook environment (for example, on Google Cloud), execute the following command to confirm.

! java -version

After installing Java on your system, you have to install PyTorch. Once PyTorch is installed, we will install TorchServe and the model archiver. TorchServe and torch-model-archiver are two primary components used to deploy and serve PyTorch models in production environments.
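
If PyTorch is not installed yet, it can typically be installed with pip; the exact command depends on your platform and CUDA version, so check pytorch.org for the recommended variant:

pip install torch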

You can install TorchServe, torch-model-archiver, and torch-workflow-archiver by running the following command:

pip install torchserve torch-model-archiver torch-workflow-archiver

In this article, we use a pre-trained model. Run the following command to install the 'transformers' package; it allows you to work with various pre-trained models from Hugging Face. Since we will also use GPUs, we install the 'nvgpu' package as well.

pip install transformers nvgpu

Export Model to your Local Machine

Hugging Face provides a sentiment analysis model called ‘BERTweet’. We need to download both the tokenizer and the model to our machine. Execute the code below to load them:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

# tokenizer
tokenizer = AutoTokenizer.from_pretrained("finiteautomata/bertweet-base-sentiment-analysis")

# model
model = AutoModelForSequenceClassification.from_pretrained("finiteautomata/bertweet-base-sentiment-analysis")


Save the model

We have to save the tokenizer and the model on our local machine. To do this, we need to execute the following commands:

# save tokenizer
tokenizer.save_pretrained('./my_tokenizer')

# save model
model.save_pretrained('./my_model')


In your notebook, you can see that the tokenizer and the model have been saved under the names 'my_tokenizer' and 'my_model'.

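Based on the files referenced later in the torch-model-archiver command, the saved directories should contain roughly the following (exact file names can vary with the transformers version):

ls my_model
# config.json  pytorch_model.bin

ls my_tokenizer
# added_tokens.json  bpe.codes  special_tokens_map.json  tokenizer_config.json  vocab.txt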

Create JSON Request and Label Mapper file

Before deploying the model, we need to create a JSON file. We will send this file in the body of requests to the model endpoint.
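
Based on the request body logged later in the handler, a minimal sample_input.json could look like this (the example texts are placeholders):

{
    "input": ["texts 1", "texts 2"]
}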

Let's also build an additional JSON file named "index_to_name.json" to decode our model's output. During training, labels are usually encoded as integers; using this 'index_to_name.json', we can convert those encoded labels into human-readable strings.
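
Assuming the usual three sentiment classes of the BERTweet model, an illustrative index_to_name.json could look like this (the exact labels depend on the model's configuration):

{
    "0": "negative",
    "1": "neutral",
    "2": "positive"
}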

Create the Model Handler

The model handler works like a pipeline that transforms the data we receive through an HTTP request into the desired output. It is responsible for generating predictions using our model. If we need a specific way of processing the input data, we must create our own handler. To do so, it is good practice to extend the base handler provided by TorchServe. Create a file named "handler.py" and add the following code to start building the custom handler.

import json
import logging
import os

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from ts.torch_handler.base_handler import BaseHandler

logger = logging.getLogger(__name__)

class ModelHandler(BaseHandler):
    def initialize(self, context):
        """Initialize function loads the model and the tokenizer
        Args:
            context (context): It is a JSON Object containing information
            pertaining to the model artifacts parameters.
        """
        properties = context.system_properties
        self.manifest = context.manifest
        logger.info(f'Properties: {properties}')
        logger.info(f'Manifest: {self.manifest}')

We're creating a new class called ModelHandler that inherits from BaseHandler. BaseHandler already provides the essential prediction functions. When creating our own handler, we override these functions as needed: initialize (loading the model and tokenizer), preprocess (preparing the input), inference (making predictions), and postprocess (transforming the output).

Now let's create the initialization function for our handler.

class ModelHandler(BaseHandler):
    def initialize(self, context):
        """Initialize function loads the model and the tokenizer
        Args:
            context (context): It is a JSON Object containing information
            pertaining to the model artifacts parameters.
        """
        properties = context.system_properties
        self.manifest = context.manifest
        logger.info(f'Properties: {properties}')
        logger.info(f'Manifest: {self.manifest}')

We have created an initialize function. When TorchServe calls it, it passes in a 'context' object that carries information about our model. This information has two main parts: 'system_properties' and 'manifest'. For now, we'll use this handler to register the model and check what's inside 'context.system_properties' and 'context.manifest' by looking at the logs.
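
As a rough, illustrative sketch (the exact contents depend on your TorchServe version and deployment), the parts of the context we rely on later in this handler look roughly like this:

# Illustrative only: the keys shown are the ones this handler uses later.
system_properties = {
    "model_dir": "/path/to/extracted/model/archive",  # where the .mar contents are unpacked
    "gpu_id": 0,                                      # None when no GPU is assigned to the worker
}
manifest = {
    "model": {
        "modelFile": "pytorch_model.bin",  # the file passed as --model-file
    }
}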

Serving the Model in the Localhost

To serve the model on our machine, we must build a '.mar' (model archive) file. This file packages all the critical elements of the model in one location. We use this file to register our model with TorchServe, which also makes the model easy to share.

Create a folder called 'model_store' in the working directory and run the following command:

torch-model-archiver \
--model-name bertweet_sentiment_analysis \
--version 1.0 \
--model-file my_model/pytorch_model.bin \
--handler handler.py \
--extra-files "my_model/config.json,my_tokenizer/added_tokens.json,my_tokenizer/bpe.codes,my_tokenizer/special_tokens_map.json,my_tokenizer/tokenizer_config.json,my_tokenizer/vocab.txt,index_to_name.json" \
--export-path model_store

You can see all of the archiver's options in detail by running the command below.

! torch-model-archiver -h

Register and Serve the Model

Once we have created the model archive, our next step is to register the model with TorchServe. Run the command below to register and serve the model.

torchserve --start --model-store model_store --models my_model=bertweet_sentiment_analysis.mar --ncs

The model server will start running on our machine. The "--model-store" option tells TorchServe where to look for model archives. With "--models MODEL_NAME=<PATH_TO_MAR_FILE>" we register the model under a name of our choice.


TorchServe prints three addresses: Inference, Management, and Metrics. These are the URLs we use for different purposes: the Inference address to get predictions from the model, the Management address to manage the models, and the Metrics address to access model metrics.
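
As a quick check, you can query the Management API to list the registered models; here the default management port 8081 is assumed:

curl http://localhost:8081/models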

Model Handler

So far, we have only defined a small part of the initialization function for our model handler. We stopped there and showed how to serve the model first because doing so makes it easier to build and debug the handler incrementally.

Run the below code for Initialization Function

Note - The code below will use a GPU if one is available.

def initialize(self, context):
    """Initialize function loads the model and the tokenizer
    Args:
        context (context): It is a JSON Object containing information
        pertaining to the model artifacts parameters.
    Raises:
        RuntimeError: Raises the Runtime error when the model or
        tokenizer is missing
    """
    properties = context.system_properties
    self.manifest = context.manifest
    model_dir = properties.get("model_dir")

    # use GPU if available
    self.device = torch.device(
        "cuda:" + str(properties.get("gpu_id"))
        if torch.cuda.is_available() and properties.get("gpu_id") is not None
        else "cpu"
    )
    logger.info(f'Using device {self.device}')

    # load the model
    model_file = self.manifest['model']['modelFile']
    model_path = os.path.join(model_dir, model_file)
    if os.path.isfile(model_path):
        self.model = AutoModelForSequenceClassification.from_pretrained(model_dir)
        self.model.to(self.device)
        self.model.eval()
        logger.info(f'Successfully loaded model from {model_file}')
    else:
        raise RuntimeError('Missing the model file')

    # load tokenizer
    self.tokenizer = AutoTokenizer.from_pretrained(model_dir)
    if self.tokenizer is not None:
        logger.info('Successfully loaded tokenizer')
    else:
        raise RuntimeError('Missing tokenizer object')

    # load mapping file
    mapping_file_path = os.path.join(model_dir, 'index_to_name.json')
    if os.path.isfile(mapping_file_path):
        with open(mapping_file_path) as f:
            self.mapping = json.load(f)
        logger.info('Successfully loaded mapping file')
    else:
        logger.warning('Mapping file is missing')
    self.initialized = True

Run the below code for Preprocessing Function

Now, we will write the preprocessing function. This function takes a request, unpacks the information, and preprocesses it.

We can inspect the structure of the request object passed into the preprocess function by adding a log statement in our model handler. To do this, we send a POST request with our sample_input.json file to localhost:8080/predictions/<MODEL_NAME>.
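
For example, assuming the model was registered under the name 'my_model' as in the torchserve command above, the request can be sent with curl:

curl -X POST http://localhost:8080/predictions/my_model \
     -H "Content-Type: application/json" \
     -d @sample_input.json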

[INFO ] Request object: [{'body': {'input': ['texts 1', 'texts 2']}}]

The request object is a list containing a dictionary of the form {'body': sample_input.json}. Let's complete the preprocessing function.

def preprocess(self, requests):
    """Tokenize the input text using the model tokenizer and convert
    it into a PyTorch tensor
    Args:
        requests: the requests object in the form
         [{'body' or ('data'): input.json file}]
    """
    # unpack the data
    data = requests[0].get('body')
    if data is None:
        data = requests[0].get('data')
    inp_x = data.get('input')
    # tokenize
    tokenized_inp = self.tokenizer(inp_x,
                                   padding=True,
                                   return_tensors='pt')
    logger.info('Tokenization is completed!')
    return tokenized_inp

Run the below code for Inference Function

We use the inference function to give the model the tokenized tensors and get back the model outputs. Here's how the function is defined:

def inference(self, inputs):
    """Predict class using the model
    Args:
        inputs: tensor of tokenized data
    """
    # run the model without tracking gradients
    with torch.no_grad():
        outputs = self.model(**inputs.to(self.device))
    probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predictions = torch.argmax(probabilities, dim=1)
    predictions = predictions.tolist()
    logger.info('Predictions generated successfully.')
    return predictions

Run the below code for Post Processing Function

We have one more function left to create, called the postprocess function. This function is essential because it converts the integer outputs of our model into a list of human-readable labels.

def postprocess(self, outputs: list):
    """
    Map the model's integer outputs to their corresponding string labels
    using the mapping available in index_to_name.json
    Args:
        outputs (list): model outputs
    Returns:
        List: the corresponding string labels
    """
    preds = [self.mapping[str(label)] for label in outputs]
    logger.info(f'PREDICTED LABELS: {preds}')
    return [preds]

Testing

Now, let's check if we served our model correctly without any problems. We can use the following command to check if the deployed TorchServe API is available.

! curl http://localhost:8080/ping

You can send a GET request to localhost:8080/ping. It returns the status of the server to you.
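
If the server is healthy, this typically returns a small JSON body such as {"status": "Healthy"} (the exact message can vary between TorchServe versions). You can then send the prediction request shown earlier with sample_input.json to get sentiment labels back. When you are done experimenting, the server can be stopped with:

torchserve --stop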


Frequently Asked Questions

What is TorchServe?

TorchServe is an open-source model serving library for PyTorch that makes it easier to deploy machine learning models for inference in real-world settings.

How can a PyTorch model be exported for TorchServe?

To export a PyTorch model for TorchServe, you can use torch.jit.trace or torch.jit.script to generate a TorchScript representation of the model (or save the eager-mode model weights, as in this article) and then package it into a .mar archive with torch-model-archiver.

What is TorchScript?

TorchScript is a way to serialize and optimize PyTorch models for more efficient deployment and inference. It transforms PyTorch models into a portable intermediate representation that enables faster execution and deployment across several platforms.

How can we start a TorchServe instance?

A configuration file containing the model(s) to serve, the host and port settings, and other details is usually created before starting a TorchServe instance. After that, you run the torchserve --start command, specifying the model store and, if needed, the configuration file's location.

What are the advantages of deploying models with TorchServe?

TorchServe automatically handles model loading, inference, and scaling, enabling effective and concurrent model execution. Thanks to its multi-model serving, batching, and GPU support, PyTorch models can be deployed in production more easily while maintaining excellent performance.

Conclusion

This article discusses TorchServe and the features that make it attractive to developers. We walked through the steps involved in exporting models for serving using TorchServe: exporting the model to the local machine, saving it, creating a model handler, registering and serving the model, and testing it by checking the status of the server.

You can find more informative articles and blogs on our platform. You can also practice coding problems and prepare for interview questions from well-known companies on Coding Ninjas Studio.
