Using TEI Container with Intel® Hardware
This guide explains how to build and deploy text-embeddings-inference
containers optimized for Intel® hardware, including CPUs, XPUs, and HPUs.
CPU
Build Docker Image
To build a container optimized for Intel® CPUs, run the following command:
platform="cpu" docker build . -f Dockerfile-intel --build-arg PLATFORM=$platform -t tei_cpu_ipex
Deploy Docker Container
To deploy your model on an Intel® CPU, use the following command:
model='BAAI/bge-large-en-v1.5'
volume=$PWD/data
docker run -p 8080:80 -v $volume:/data tei_cpu_ipex --model-id $model
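Once the container is up, you can run a quick smoke test by requesting an embedding from TEI's /embed endpoint. This assumes the port mapping above, with the server reachable on localhost:8080:
# Request a single embedding; the JSON payload follows TEI's /embed schema.
curl 127.0.0.1:8080/embed -X POST -d '{"inputs":"What is Deep Learning?"}' -H 'Content-Type: application/json'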
XPU
Build Docker Image
To build a container optimized for Intel® XPUs, run the following command:
platform="xpu" docker build . -f Dockerfile-intel --build-arg PLATFORM=$platform -t tei_xpu_ipex
Deploy Docker Container
To deploy your model on an Intel® XPU, use the following command:
model='BAAI/bge-large-en-v1.5'
volume=$PWD/data
docker run -p 8080:80 -v $volume:/data --device=/dev/dri -v /dev/dri/by-path:/dev/dri/by-path tei_xpu_ipex --model-id $model --dtype float16
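As an optional host-side sanity check before launching, you can confirm that the render devices passed through to the container actually exist on the host:
# List the DRI devices that --device=/dev/dri exposes to the container.
ls -l /dev/dri /dev/dri/by-path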
HPU
Build Docker Image
To build a container optimized for Intel® HPUs (Gaudi), run the following command:
platform="hpu" docker build . -f Dockerfile-intel --build-arg PLATFORM=$platform -t tei_hpu
Deploy Docker Container
To deploy your model on an Intel® HPU (Gaudi), use the following command:
model='BAAI/bge-large-en-v1.5'
volume=$PWD/data
docker run -p 8080:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e MAX_WARMUP_SEQUENCE_LENGTH=512 tei_hpu --model-id $model --dtype bfloat16
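Startup on Gaudi includes a warmup phase, so the server may take a moment to become ready. You can poll TEI's /health endpoint, which should return a success status once the model is loaded:
# Returns HTTP 200 when the server is ready to accept requests.
curl -i 127.0.0.1:8080/health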
Prebuilt Docker Images
For convenience, prebuilt Docker images are available on GitHub Container Registry (GHCR). You can pull these images directly instead of building them yourself:
CPU
To use the prebuilt image optimized for Intel® CPUs, run:
docker pull ghcr.io/huggingface/text-embeddings-inference:cpu-ipex-latest
XPU
To use the prebuilt image optimized for Intel® XPUs, run:
docker pull ghcr.io/huggingface/text-embeddings-inference:xpu-ipex-latest
HPU
To use the prebuilt image optimized for Intel® HPUs (Gaudi), run:
docker pull ghcr.io/huggingface/text-embeddings-inference:hpu-latest
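The prebuilt images take the same runtime arguments as the locally built ones. For example, to deploy a model on an Intel® CPU with the prebuilt image (reusing the model and volume settings from above):
model='BAAI/bge-large-en-v1.5'
volume=$PWD/data
# Same run command as before, with the prebuilt GHCR image in place of the local tag.
docker run -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-embeddings-inference:cpu-ipex-latest --model-id $model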