
Commit 1f6fe64

Add TorchServe CPU Example (#2613)
* add basic ipex torchserve example
* update commands and proxy info
* Added advanced model archive info
* Added advanced model archive info
* Added advanced model archive info
* Update README.md
* grammar
* Update container versions
* Update README.md

---------

Co-authored-by: Pratool Bharti <[email protected]>
1 parent 5ac04d3 commit 1f6fe64

File tree: 2 files changed, +170 -0 lines changed
README.md: 137 additions & 0 deletions
# Serving ResNet50 INT8 model with TorchServe and Intel® Extension for PyTorch optimizations

## Description
This sample provides code to integrate Intel® Extension for PyTorch with TorchServe. This project quantizes a ResNet50 model to INT8 precision to improve performance on CPU.

## Preparation
You'll need to install Docker Engine on your development system. Note that while **Docker Engine** is free to use, **Docker Desktop** may require you to purchase a license. See the [Docker Engine Server installation instructions](https://docs.docker.com/engine/install/#server) for details.

## Quantize Model
Create and quantize a TorchScript model to INT8 precision using the Python environment found in the Intel® Optimized TorchServe container. The command below runs the `quantize_model.py` script included in this commit and outputs `rn50_int8_jit.pt`, which is used in the next step.

```bash
docker run \
    --rm -it -u root \
    --entrypoint='' \
    -v $PWD:/home/model-server \
    intel/intel-optimized-pytorch:2.2.0-serving-cpu \
    python quantize_model.py
```

> [!NOTE]
> If you are working behind a corporate proxy, you will need to include the following parameters in your `docker run` command: `-e http_proxy=${http_proxy} -e https_proxy=${https_proxy}`.
## Archive Model
The [TorchServe Model Archiver](https://github.com/pytorch/serve/blob/master/model-archiver/README.md) is a CLI tool found in the TorchServe container as well as on [PyPI](https://pypi.org/project/torch-model-archiver/). The process is very similar for the [TorchServe Workflow Archiver](https://github.com/pytorch/serve/tree/master/workflow-archiver).

Follow the instructions in the links above depending on whether you intend to archive a model or a workflow. Rather than installing the archiver yourself, use the one provided in the container, as in the example command below:

```bash
docker run \
    --rm -it -u root \
    --entrypoint='' \
    -v $PWD:/home/model-server \
    intel/intel-optimized-pytorch:2.2.0-serving-cpu \
    torch-model-archiver \
        --model-name ipex-resnet50 \
        --version 1.0 \
        --serialized-file rn50_int8_jit.pt \
        --handler image_classifier \
        --export-path /home/model-server/model-store
```

> [!NOTE]
> If you are working behind a corporate proxy, you will need to include the following parameters in your `docker run` command: `-e http_proxy=${http_proxy} -e https_proxy=${https_proxy}`.

### Advanced Model Archival
The `--handler` argument is an important component of serving, as it controls the inference pipeline. TorchServe provides several default handlers [built into the application](https://pytorch.org/serve/default_handlers.html#torchserve-default-inference-handlers) that are sufficient for most inference cases, but you may need to create a custom handler if your application requires additional preprocessing, postprocessing, or other variables to derive a final output.

To create a custom handler, inherit from `BaseHandler` or another built-in handler and override any necessary functionality. Usually, you only need to override the preprocessing and postprocessing methods to meet an application's inference needs.

```python
from ts.torch_handler.base_handler import BaseHandler

class ModelHandler(BaseHandler):
    """
    A custom model handler implementation.
    """
```
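
As a minimal sketch of that pattern, the handler below overrides `postprocess` to return softmax probabilities instead of raw logits; the `ProbabilityHandler` name and the softmax step are illustrative, not part of this sample:

```python
import torch

from ts.torch_handler.base_handler import BaseHandler

class ProbabilityHandler(BaseHandler):
    """Illustrative handler that returns class probabilities instead of raw logits."""

    def postprocess(self, inference_output):
        # inference_output holds the raw model outputs for the batch; convert
        # the logits to probabilities and return one entry per request
        probabilities = torch.softmax(inference_output, dim=1)
        return probabilities.tolist()
```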
> [!TIP]
> For more examples of how to write a custom handler, see the [TorchServe documentation](https://github.com/pytorch/serve/blob/master/docs/custom_service.md).

Additionally, `torch-model-archiver` accepts extra parameters and files to tackle more complex scenarios while archiving the package.

```txt
--requirements-file  Path to a requirements.txt file containing a list of
                     model-specific Python packages to be installed by
                     TorchServe for seamless model serving.
--extra-files        Comma-separated paths to extra dependency files that
                     are required for inference and can be accessed in the
                     handler script.
--config-file        Path to a model-config YAML file that can contain
                     information such as threshold values and any parameter
                     values that need to be passed from training to inference.
```
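
For example, a hypothetical invocation that packages a custom handler together with a label-mapping file and a model config could look like the following (the `custom_handler.py`, `index_to_name.json`, and `model-config.yaml` file names are illustrative):

```bash
torch-model-archiver \
    --model-name ipex-resnet50 \
    --version 1.0 \
    --serialized-file rn50_int8_jit.pt \
    --handler custom_handler.py \
    --extra-files index_to_name.json \
    --config-file model-config.yaml \
    --export-path /home/model-server/model-store
```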
> [!TIP]
> For more use-case examples, see the [TorchServe documentation](https://github.com/pytorch/serve/tree/master/examples).

## Start Server
Start the TorchServe server.

```bash
docker run \
    -d --rm --name server \
    -v $PWD/model-store:/home/model-server/model-store \
    -v $PWD/wf-store:/home/model-server/wf-store \
    --net=host \
    intel/intel-optimized-pytorch:2.2.0-serving-cpu
```

> [!TIP]
> For more information about how to configure the TorchServe server, see the [Intel AI Containers documentation](https://github.com/intel/ai-containers/tree/main/pytorch/serving).

> [!NOTE]
> If you are working behind a corporate proxy, you will need to include the following parameters in your `docker run` command: `-e http_proxy=${http_proxy} -e https_proxy=${https_proxy}`.

Check the server logs to verify that the server has started correctly.

```bash
docker logs server
```
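
You can also hit TorchServe's ping endpoint on the inference port to confirm that the server is healthy:

```bash
curl http://localhost:8080/ping
```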
Register the model using the HTTP/REST Management API and verify that it has been registered.

```bash
curl -v -X POST "http://localhost:8081/models?url=ipex-resnet50.mar&initial_workers=1"
curl -v -X GET "http://localhost:8081/models"
```
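
You can also describe a single registered model to inspect its status and workers:

```bash
curl -v -X GET "http://localhost:8081/models/ipex-resnet50"
```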
Download a [test image](https://raw.githubusercontent.com/pytorch/serve/master/docs/images/kitten_small.jpg) and make an inference request using the HTTP/REST API.

```bash
curl -v -X POST "http://localhost:8080/v2/models/ipex-resnet50/infer" \
    -T kitten_small.jpg
```
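
If you prefer scripting the request, a minimal Python equivalent of the `curl` call above might look like this (it assumes `kitten_small.jpg` is in the current directory and the `requests` package is installed):

```python
import requests

# send the test image to the inference endpoint, mirroring the curl upload
with open("kitten_small.jpg", "rb") as f:
    response = requests.post(
        "http://localhost:8080/v2/models/ipex-resnet50/infer",
        data=f,
    )

print(response.status_code)
print(response.text)  # JSON body with the predicted classes
```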
Unregister the model.

```bash
curl -v -X DELETE "http://localhost:8081/models/ipex-resnet50"
```

## Stop Server
When finished with the example, stop the TorchServe server with the following command:

```bash
docker container stop server
```

## Trademark Information
Intel, the Intel logo, and Intel Xeon are trademarks of Intel Corporation or its subsidiaries.

* Other names and brands may be claimed as the property of others.

&copy; Intel Corporation
quantize_model.py: 33 additions & 0 deletions
import torch
import torchvision.models as models

import intel_extension_for_pytorch as ipex
from intel_extension_for_pytorch.quantization import prepare, convert

# load the pretrained FP32 model and switch it to inference mode
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model = model.eval()

# define a dummy input tensor to use for the model's forward call to record
# operations in the model for tracing
N, C, H, W = 1, 3, 224, 224
dummy_tensor = torch.randn(N, C, H, W)

# ipex supports two quantization schemes, static and dynamic;
# use the default static qconfig here
qconfig = ipex.quantization.default_static_qconfig_mapping

# prepare the model for calibration
model = prepare(model, qconfig, example_inputs=dummy_tensor, inplace=False)

# calibrate by running forward passes (this sample uses the dummy tensor;
# a representative dataset would produce better observed ranges)
n_iter = 100
for i in range(n_iter):
    model(dummy_tensor)

# convert the calibrated model to INT8
model = convert(model)

# trace and freeze the model into a deployable TorchScript artifact
with torch.no_grad():
    model = torch.jit.trace(model, dummy_tensor)
    model = torch.jit.freeze(model)

torch.jit.save(model, './rn50_int8_jit.pt')
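
# optional sanity check (illustrative, not part of the original sample):
# reload the traced INT8 model and confirm it produces ImageNet-sized logits
loaded = torch.jit.load('./rn50_int8_jit.pt')
with torch.no_grad():
    output = loaded(torch.randn(N, C, H, W))
print(output.shape)  # expected: torch.Size([1, 1000])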
