PyTorch is an open-source machine learning library that can be imported into your code to perform machine learning operations as required. Its front-end API is written in Python, while the tensor operations are implemented in C++. Developed by Facebook's AI Research lab (FAIR), it is easy to use, flexible and, most importantly, supports dynamic computation graphs (the graph is built at run time based on the input). In this article we will deploy a PyTorch model on GCP Vertex AI, covering the following steps:
- Overview of Vertex AI
- Installation of libraries
- Implement Pytorch Model
- Flask Application and Pytorch Model
- Dockerizing the Flask Application
- Setting up Google Cloud Environment
- Push the Dockerized Flask Image to GCP Container Registry
- Deploy the GCP Container to GCP Vertex AI
- Test and Monitor the Deployed Container
Overview of Vertex AI
Vertex AI is a service provided by Google Cloud Platform (GCP) that allows developers to build, deploy and, most importantly, conveniently scale machine learning models. It comprises various tools and services through which a developer can efficiently manage a machine learning model across its entire lifecycle, from building and deployment to scaling, all inside Vertex AI.
Terminologies related to Vertex AI
Model Artifact :- The files and data that a machine learning model produces during training are known as model artifacts. They are required because, without them, the trained model cannot be deployed to production or used at all.
Vertex AI Model Registry :- It acts as a central repository for storing and managing the various types of machine learning models, which developers can access throughout the development phase.
Google Cloud Storage (GCS) :- A Google Cloud service that offers scalable, on-demand storage billed according to usage. Being scalable and efficient, it can also handle huge volumes of data.
Containerization :- Packaging an application together with its dependencies into a container so that it behaves the same in every computing environment, regardless of where it is deployed.
Model Endpoint :- A dedicated URL or network location through which a deployed machine learning model can be reached for predictions. It plays an important role because clients send data to the model and receive results from it by calling the endpoint.
Installation of the required libraries
Let's add the required libraries to the requirements.txt file.
Flask==3.0.3
Flask-Cors==4.0.1
numpy==2.0.0
torch==2.3.1
gunicorn==22.0.0
By using the pip install command one can install the libraries mentioned in the requirements.txt file. The command is as follows:
pip install -r requirements.txt
Since we are dockerizing the application, we will mention the above installation command in the Dockerfile.
Implementation of the PyTorch model
Let's implement a PyTorch model that applies a linear transformation to the incoming data. We can make use of nn.Linear, one of the fundamental components of PyTorch. The code is as follows:
Python
import torch
import torch.nn as nn

class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.linear = nn.Linear(10, 1)

    def forward(self, x):
        return self.linear(x)
The nn.Linear module applies a linear transformation to the input data using weights and biases. It takes two parameters, in_features and out_features, which represent the number of input and output features. Upon object creation, it randomly initializes a weight matrix and a bias vector.
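To see what nn.Linear(10, 1) initializes under the hood, you can inspect the parameter shapes directly; a small sketch (not part of the tutorial code above):
Python
import torch
import torch.nn as nn

# the same layer SimpleModel uses: 10 input features -> 1 output feature
linear = nn.Linear(10, 1)

# weight has shape (out_features, in_features), bias has shape (out_features,)
print(linear.weight.shape)  # torch.Size([1, 10])
print(linear.bias.shape)    # torch.Size([1])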
Let's try a sample prediction before we save the model.
Python
model = SimpleModel()
model.linear
Output
Linear(in_features=10, out_features=1, bias=True)
Here we have initialized a linear PyTorch model. Now let's create random input data of the matching shape and make a prediction.
Python
x = torch.randn(1, 10)
t1 = x.to(torch.float)
with torch.no_grad():
    prediction = model(t1).tolist()
prediction
Output
[[-0.26785045862197876]]
So our model works fine. Next we can save our model so that our Flask application can load it and make predictions.
Saving the PyTorch model
The model can be saved using the following code:
Python
# save the model parameters (state dict) to disk
torch.save(model.state_dict(), 'model.pth')
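To confirm that the file was written correctly, you can reload the state dict into a fresh model instance; a quick, optional sanity-check sketch:
Python
import torch

# re-create the architecture and load the saved parameters back in
check_model = SimpleModel()
check_model.load_state_dict(torch.load('model.pth'))
print(check_model.linear.weight.shape)  # torch.Size([1, 10])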
Flask Application and Pytorch Model
As a next step, we need to create a Flask application and load the PyTorch model. Finally, one can make predictions using the model by invoking a REST API.
Create a Flask Application
Let's create a directory called 'app'. Inside the 'app' folder we create a main.py file, which contains the code for the Flask application. The main.py file is as follows:
Python
from flask import (
    Flask, request, jsonify)
from flask_cors import CORS

# create the Flask app before defining the routes
app = Flask(__name__)
CORS(app)

@app.route('/health', methods=['GET'])
def health():
    return jsonify(status='healthy'), 200

@app.route('/predict', methods=['POST'])
def predict():
    # placeholder, implemented in the next section
    return None

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)
Here we created the basic structure needed for a Flask application. For now the predict() method does nothing; to make it functional we need to load our PyTorch model and make predictions when the user invokes the REST API ('/predict'). We also created a health-monitoring API, which is used to check the health of the deployed model. The /health route is referenced later while creating the endpoint in GCP Vertex AI.
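Once this skeleton is running locally (for example with python main.py), you can confirm that the /health route responds; a small check using the requests library (note: requests is not listed in requirements.txt, so install it separately if you want to run this):
Python
import requests

# call the health route of the locally running Flask app
response = requests.get('http://localhost:8080/health')
print(response.status_code)  # 200
print(response.json())       # {'status': 'healthy'}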
Loading the Pytorch model in Flask
To load the PyTorch model, we first need to define the same model class in our Flask application. The code is as follows:
Python
import torch
import torch.nn as nn

# linear module
class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.linear = nn.Linear(10, 1)

    def forward(self, x):
        return self.linear(x)

# initialize the module
model = SimpleModel()

# load the saved weights
model.load_state_dict(torch.load('model.pth'))
Here we defined the linear PyTorch model class and loaded the saved model weights. Now we can implement the predict() method.
Python
@app.route('/predict', methods=['POST'])
def predict():
    data = request.json['inputs']
    data = torch.tensor(data)
    with torch.no_grad():
        prediction = model(data).tolist()
    return jsonify(prediction=prediction)
The complete code is as follows:
Python
from flask import (
    Flask, request, jsonify)
from flask_cors import CORS
import torch
import torch.nn as nn

# create the Flask app
app = Flask(__name__)
CORS(app)

# linear module
class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.linear = nn.Linear(10, 1)

    def forward(self, x):
        return self.linear(x)

# initialize the module
model = SimpleModel()

# load the module
model.load_state_dict(torch.load('model.pth'))

@app.route('/health', methods=['GET'])
def health():
    return jsonify(status='healthy'), 200

@app.route('/predict', methods=['POST'])
def predict():
    data = request.json['inputs']
    data = torch.tensor(data)
    with torch.no_grad():
        prediction = model(data).tolist()
    return jsonify(prediction=prediction)

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)
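The predict() route above assumes that every request carries a well-formed 'inputs' list. If you want the container to fail gracefully on bad payloads, a hedged hardening sketch of the same route (a drop-in replacement inside main.py, relying on the same imports, app and model; not part of the code above) could look like this:
Python
@app.route('/predict', methods=['POST'])
def predict():
    payload = request.get_json(silent=True)
    # reject requests without a JSON body or without the expected 'inputs' key
    if not payload or 'inputs' not in payload:
        return jsonify(error="request body must be JSON with an 'inputs' key"), 400
    try:
        data = torch.tensor(payload['inputs'], dtype=torch.float32)
        with torch.no_grad():
            prediction = model(data).tolist()
        return jsonify(prediction=prediction)
    except (ValueError, TypeError, RuntimeError) as exc:
        # shape or type mismatches end up here instead of crashing the worker
        return jsonify(error=str(exc)), 400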
Dockerizing the Flask Application
To dockerize the Flask application, we need to create a Dockerfile with the necessary installation and run commands.
Creation of a Dockerfile
You need to create the Dockerfile in the same folder that contains the 'app' directory. The commands in the Dockerfile are as follows:
FROM python:3.9-slim
# Install libraries
COPY ./requirements.txt ./
RUN pip install -r requirements.txt && \
rm ./requirements.txt
# container directories
RUN mkdir /app
# Copy app directory (code and Pytorch model) to the container
COPY ./app /app
# run server with gunicorn
WORKDIR /app
EXPOSE 8080
CMD ["gunicorn", "main:app", "--timeout=0", "--preload", \
"--workers=1", "--threads=4", "--bind=0.0.0.0:8080"]
Now we need to build a Docker image based on the above Dockerfile. Before that, let's check the directory structure. The directory structure is as follows:
app
|-- main.py
|-- model.pth
Dockerfile
requirements.txt
The app directory contains our Flask-based Python code (main.py) and the PyTorch model (model.pth).
Build a Docker Container
To build the Docker image you need to execute the command below:
docker build -t flask-torch-docker .
The above command will execute the Dockerfile and build a Docker image named 'flask-torch-docker'.
Run a Docker Container
Let's run the 'flask-torch-docker' image using the command below:
docker run -it -p 8080:8080 flask-torch-docker
Testing a Docker Container Locally
The running container can be tested locally using the curl command below:
curl -X POST http://localhost:8080/predict \
-H "Content-Type: application/json" \
-d '{"inputs": [[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]]}'
Output
{ "prediction": 0.785}
Push the dockerized Flask image (PyTorch model) to GCP
In the steps above we created a PyTorch model served by Flask, dockerized it and ensured that the dockerized application works locally. Now it's time for the crucial step: deploying the PyTorch model on Vertex AI. In this article we will push our dockerized image to the GCP Container Registry and then deploy that container to Vertex AI. As a first step we need to set up the Google Cloud environment.
Setting up Google Cloud Environment
To set up the Google Cloud environment, you need to create an account (or sign in with a Google account) and add payment details; after that you have access to the Google Cloud CLI for managing resources and services. Create a Google Cloud project and install the gcloud CLI. Now we can focus on pushing our dockerized image to the Google Container Registry (GCR).
Steps to push dockerized image to GCR
Let's look at the steps to push the dockerized image to GCR. The steps are as follows:
Step 1: Initialize the Google Cloud SDK (software development kit)
gcloud init
Step 2: Configure Docker to authenticate requests to the Google Container Registry (GCR) using the gcloud command-line tool
gcloud auth configure-docker
Step 3: Building the Docker image
docker build -t flask-torch-docker:latest .
Step 4: Tag the Docker image with the GCR path for your GCP project
docker tag flask-torch-docker:latest gcr.io/your-project-id/flask-torch-docker:latest
In the above command, replace your-project-id with your own GCP project ID. You can use the command below to list all your project IDs:
gcloud projects list
The above command lists your project IDs; pick the right one, substitute it into the tag command above and run it.
Step 5: Push the Docker image to the Google Container Registry (GCR)
docker push gcr.io/your-project-id/flask-torch-docker:latest
Again, replace your-project-id with your own GCP project ID in the command above.
The pushed image can be checked under the Artifact Registry.
Deploying the GCP container to Vertex AI
Now that we have pushed our dockerized PyTorch model to the Container Registry, the next step is to deploy the container to Vertex AI. Log in to your Google Cloud account and search for Vertex AI. The page is as shown below:
This is the home page of Vertex AI; from here you can click on 'Enable all APIs'.
Import the Model Using Model Registry
To import the model, you can choose the model registry functionality from Vertex AI. The model registry page is as follows:
Here you can create a new model in the registry or import a model from the Container Registry or Artifact Registry. In this article we deploy the model by importing the container from the Artifact Registry and providing the necessary model setting details. The steps are as follows:
Step 1: Create new model and provide appropriate name and region
Here we create a new model (for an existing model you can update the version) and provide a name and an appropriate region.
Step 2: Import an existing custom container
Here we choose the option to import an existing custom container from the Artifact Registry and browse to the container that holds the dockerized PyTorch model (the Flask application).
Step 3: Provide Model Setting details
Set the prediction route to /predict and the port to 8080 (as configured in the dockerized Flask app). Set the health route to /health.
Step 4: Click the 'Import model' to create the model in the Model Registry
Finally, one can click the Import model to create the model in the Model Registry.
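The same registration can also be done programmatically with the Vertex AI Python SDK (google-cloud-aiplatform). A hedged sketch, assuming the SDK is installed and the project ID, region and image URI below are replaced with your own values:
Python
from google.cloud import aiplatform

# initialize the SDK for your project and region (placeholders)
aiplatform.init(project='your-project-id', location='us-central1')

# register the custom container as a model in the Model Registry
model = aiplatform.Model.upload(
    display_name='flask-torch-model',
    serving_container_image_uri='gcr.io/your-project-id/flask-torch-docker:latest',
    serving_container_predict_route='/predict',
    serving_container_health_route='/health',
    serving_container_ports=[8080],
)
print(model.resource_name)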
Define the endpoint and Deploy the Model
In the steps above, we created a model in the Model Registry. Now we need to define the endpoint and deploy the model. In the Model Registry we have the option to deploy to an endpoint: select Endpoints from the navigation menu, click Create and then configure it.
Step 1: Enter the model name and select the region
Step 2: Mention details in Model Settings
In Model Settings, first set the traffic split. Then set the number of compute nodes and click Done at the bottom.
Step 3: Deploy the model
After configuring the necessary endpoint details, you can deploy the model by clicking 'DEPLOY TO ENDPOINT'.
After creating the endpoint, click Deploy, select the model name, configure the remaining settings as required and click Deploy. Once the deployment completes, the model is listed under the endpoint.
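Endpoint creation and deployment can likewise be scripted with the same SDK; a sketch under the assumption that model is the object returned by Model.upload above (the machine type and replica counts are illustrative, not prescribed by this article):
Python
from google.cloud import aiplatform

# create a dedicated endpoint and deploy the registered model to it
endpoint = aiplatform.Endpoint.create(display_name='flask-torch-endpoint')
model.deploy(
    endpoint=endpoint,
    machine_type='n1-standard-2',
    min_replica_count=1,
    max_replica_count=1,
    traffic_percentage=100,
)
print(endpoint.resource_name)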
Testing the Endpoints and Monitoring the Model
To test the endpoint you can use the following curl command:
curl -X POST https://<your-endpoint-url>/predict \
-H "Content-Type: application/json" \
-d '{"inputs": [[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]]}'
Replace your-endpoint-url with your own endpoint URL and run the command; it will return a JSON output.
{
"prediction": 0.785
}
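Because the Flask app reads the raw {"inputs": ...} body, Vertex AI's rawPredict method (which forwards the request body unchanged to the container) is the closest match when calling the managed endpoint from code. A hedged sketch using google-auth and requests, with placeholder project, region and endpoint ID:
Python
import google.auth
import google.auth.transport.requests
import requests

# obtain an access token from the local gcloud / service-account credentials
credentials, _ = google.auth.default(
    scopes=['https://www.googleapis.com/auth/cloud-platform'])
credentials.refresh(google.auth.transport.requests.Request())

# placeholders: replace project, region and endpoint ID with your own values
url = ('https://us-central1-aiplatform.googleapis.com/v1/'
       'projects/your-project-id/locations/us-central1/'
       'endpoints/your-endpoint-id:rawPredict')

payload = {"inputs": [[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]]}
response = requests.post(
    url,
    headers={'Authorization': f'Bearer {credentials.token}'},
    json=payload,
)
print(response.json())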
To monitor the deployed model, navigate to 'Deploy and Use', choose Monitoring, and the monitoring page appears, where you can observe the deployed model and configure monitoring to your convenience.
Additional features of Vertex AI
- You can customize dataset labels and other parameters and then deploy the model according to the pricing of the Google Cloud Vertex AI service.
- Apart from this, Vertex AI provides other services such as Workbench and Colab Enterprise for customization and collaborative work.
Applications
- It can be used to classify skin-related and many other diseases and identify them correctly.
- It can be used to predict weather forecasts for upcoming periods along with the expected value ranges.
- It can be used in self-driving cars, where quality datasets help the model learn path trajectories so that driving operations run smoothly.