# Serving ResNet50 INT8 model with TorchServe and Intel® Extension for PyTorch optimizations

## Description
This sample provides code to integrate Intel® Extension for PyTorch with TorchServe. It quantizes a ResNet50 model to INT8 precision to improve performance on CPU.

## Preparation
You'll need to install Docker Engine on your development system. Note that while **Docker Engine** is free to use, **Docker Desktop** may require you to purchase a license. See the [Docker Engine Server installation instructions](https://docs.docker.com/engine/install/#server) for details.

## Quantize Model
Create and quantize a TorchScript model for INT8 precision using the Python environment found in the Intel® Optimized TorchServe Container. The command below outputs `rn50_int8_jit.pt`, which is used in the next step.

```bash
docker run \
  --rm -it -u root \
  --entrypoint='' \
  -v $PWD:/home/model-server \
  intel/intel-optimized-pytorch:2.2.0-serving-cpu \
  python quantize_model.py
```

> [!NOTE]
> If you are working behind a corporate proxy, you will need to include the following parameters in your `docker run` command: `-e http_proxy=${http_proxy} -e https_proxy=${https_proxy}`.

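
For reference, the following is a minimal sketch of the static INT8 quantization flow from Intel® Extension for PyTorch that a script like `quantize_model.py` might implement. It is illustrative only; the actual script shipped with this sample may differ, and the calibration loop below uses random placeholder data rather than a real dataset.

```python
# Illustrative sketch only -- not necessarily the quantize_model.py shipped with this sample.
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex
from intel_extension_for_pytorch.quantization import prepare, convert

# Load a pretrained FP32 ResNet50 and switch it to inference mode.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.eval()

# Prepare the model for static INT8 quantization with the default qconfig.
qconfig_mapping = ipex.quantization.default_static_qconfig_mapping
example_inputs = torch.rand(1, 3, 224, 224)
prepared_model = prepare(model, qconfig_mapping, example_inputs=example_inputs, inplace=False)

# Calibrate -- a real script would iterate over a representative dataset
# instead of random tensors.
with torch.no_grad():
    for _ in range(10):
        prepared_model(torch.rand(1, 3, 224, 224))

# Convert to INT8, trace and freeze to TorchScript, then save the artifact
# that the archiving step expects.
converted_model = convert(prepared_model)
with torch.no_grad():
    traced_model = torch.jit.trace(converted_model, example_inputs)
    traced_model = torch.jit.freeze(traced_model)
traced_model.save("rn50_int8_jit.pt")
```
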
## Archive Model
The [TorchServe Model Archiver](https://github.com/pytorch/serve/blob/master/model-archiver/README.md) is a CLI tool found in the TorchServe container as well as on [PyPI](https://pypi.org/project/torch-model-archiver/). The process is very similar for the [TorchServe Workflow](https://github.com/pytorch/serve/tree/master/workflow-archiver).

Follow the instructions in the link above depending on whether you intend to archive a model or a workflow. Use the provided container rather than installing the archiver, as in the example command below:

```bash
docker run \
  --rm -it -u root \
  --entrypoint='' \
  -v $PWD:/home/model-server \
  intel/intel-optimized-pytorch:2.2.0-serving-cpu \
  torch-model-archiver \
    --model-name ipex-resnet50 \
    --version 1.0 \
    --serialized-file rn50_int8_jit.pt \
    --handler image_classifier \
    --export-path /home/model-server/model-store
```

> [!NOTE]
> If you are working behind a corporate proxy, you will need to include the following parameters in your `docker run` command: `-e http_proxy=${http_proxy} -e https_proxy=${https_proxy}`.

### Advanced Model Archival
The `--handler` argument is an important component of serving because it controls the inference pipeline. TorchServe provides several default handlers [built into the application](https://pytorch.org/serve/default_handlers.html#torchserve-default-inference-handlers) that cover most inference cases, but you may need to create a custom handler if your application requires additional preprocessing, postprocessing, or other variables to derive a final output.

To create a custom handler, first inherit from `BaseHandler` or another built-in handler and override any necessary functionality. Usually, you only need to override the preprocessing and postprocessing methods to meet an application's inference needs.

```python
from ts.torch_handler.base_handler import BaseHandler

class ModelHandler(BaseHandler):
    """
    A custom model handler implementation.
    """

    def preprocess(self, data):
        # Override to transform the raw request payload before inference.
        return super().preprocess(data)

    def postprocess(self, data):
        # Override to shape the model output returned to the client.
        return super().postprocess(data)
```

> [!TIP]
> For more examples of how to write a custom handler, see the [TorchServe documentation](https://github.com/pytorch/serve/blob/master/docs/custom_service.md).

The `torch-model-archiver` also accepts additional parameters and files to handle more complex archiving scenarios.

```txt
--requirements-file    Path to a requirements.txt file containing a list of
                       model-specific Python packages to be installed by
                       TorchServe for seamless model serving.
--extra-files          Comma-separated paths to extra dependency files that
                       are required for inference and can be accessed in the
                       handler script.
--config-file          Path to a model-config YAML file that can contain
                       information such as threshold values or any parameter
                       values that need to be passed from training to
                       inference.
```
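
For illustration, a hypothetical archiver invocation combining these options might look like the following. The handler and extra files named here (`custom_handler.py`, `index_to_name.json`, `model-config.yaml`) are placeholders, not files shipped with this sample.

```bash
docker run \
  --rm -it -u root \
  --entrypoint='' \
  -v $PWD:/home/model-server \
  intel/intel-optimized-pytorch:2.2.0-serving-cpu \
  torch-model-archiver \
    --model-name ipex-resnet50 \
    --version 1.0 \
    --serialized-file rn50_int8_jit.pt \
    --handler custom_handler.py \
    --extra-files index_to_name.json \
    --config-file model-config.yaml \
    --export-path /home/model-server/model-store
```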

> [!TIP]
> For more use-case examples, see the [TorchServe documentation](https://github.com/pytorch/serve/tree/master/examples).

## Start Server
Start the TorchServe Server.

```bash
docker run \
  -d --rm --name server \
  -v $PWD/model-store:/home/model-server/model-store \
  -v $PWD/wf-store:/home/model-server/wf-store \
  --net=host \
  intel/intel-optimized-pytorch:2.2.0-serving-cpu
```

> [!TIP]
> For more information about how to configure the TorchServe Server, see the [Intel AI Containers Documentation](https://github.com/intel/ai-containers/tree/main/pytorch/serving).

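
As one example of what such configuration can look like, TorchServe is commonly configured through a `config.properties` file such as the sketch below. The keys shown are standard TorchServe settings, but the file name, mount location, and how the Intel container consumes it are assumptions; consult the documentation linked above for the container's actual configuration mechanism.

```txt
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
model_store=/home/model-server/model-store
workflow_store=/home/model-server/wf-store
```
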
> [!NOTE]
> If you are working behind a corporate proxy, you will need to include the following parameters in your `docker run` command: `-e http_proxy=${http_proxy} -e https_proxy=${https_proxy}`.

Check the server logs to verify that the server has started correctly.

```bash
docker logs server
```
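
You can also confirm that the frontend is responding by querying TorchServe's health check API; this assumes the default inference REST port 8080, which the inference request later in this example also uses.

```bash
curl http://localhost:8080/ping
```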

Register the model using the HTTP/REST API and verify it has been registered.

```bash
curl -v -X POST "http://localhost:8081/models?url=ipex-resnet50.mar&initial_workers=1"
curl -v -X GET "http://localhost:8081/models"
```

Download a [test image](https://raw.githubusercontent.com/pytorch/serve/master/docs/images/kitten_small.jpg) and make an inference request using the HTTP/REST API.

```bash
curl -v -X POST "http://localhost:8080/v2/models/ipex-resnet50/infer" \
  -T kitten_small.jpg
```

Unregister the model.

```bash
curl -v -X DELETE "http://localhost:8081/models/ipex-resnet50"
```

## Stop Server
When finished with the example, stop the TorchServe server with the following command:

```bash
docker container stop server
```

## Trademark Information
Intel, the Intel logo and Intel Xeon are trademarks of Intel Corporation or its subsidiaries.
* Other names and brands may be claimed as the property of others.

©Intel Corporation