Using TEI Container with Intel® Hardware

This guide explains how to build and deploy text-embeddings-inference containers optimized for Intel® hardware, including CPUs, XPUs, and HPUs.

CPU

Build Docker Image

To build a container optimized for Intel® CPUs, run the following command:

platform="cpu"

docker build . -f Dockerfile-intel --build-arg PLATFORM=$platform -t tei_cpu_ipex

Deploy Docker Container

To deploy your model on an Intel® CPU, use the following command:

model='BAAI/bge-large-en-v1.5'
volume=$PWD/data

docker run -p 8080:80 -v $volume:/data tei_cpu_ipex --model-id $model
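Once the container is running, you can sanity-check the deployment by sending a request to TEI's /embed route (this assumes the default 8080:80 port mapping shown above):

curl 127.0.0.1:8080/embed -X POST -d '{"inputs":"What is Deep Learning?"}' -H 'Content-Type: application/json'

The response is a JSON array containing one embedding vector per input.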

XPU

Build Docker Image

To build a container optimized for Intel® XPUs, run the following command:

platform="xpu"

docker build . -f Dockerfile-intel --build-arg PLATFORM=$platform -t tei_xpu_ipex

Deploy Docker Container

To deploy your model on an Intel® XPU, use the following command:

model='BAAI/bge-large-en-v1.5'
volume=$PWD/data

docker run -p 8080:80 -v $volume:/data --device=/dev/dri -v /dev/dri/by-path:/dev/dri/by-path tei_xpu_ipex --model-id $model --dtype float16
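To confirm the model loaded with the requested dtype, you can query TEI's /info route, which returns metadata about the served model (again assuming the default port mapping):

curl 127.0.0.1:8080/info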

HPU

Build Docker Image

To build a container optimized for Intel® HPUs (Gaudi), run the following command:

platform="hpu"

docker build . -f Dockerfile-intel --build-arg PLATFORM=$platform -t tei_hpu

Deploy Docker Container

To deploy your model on an Intel® HPU (Gaudi), use the following command:

model='BAAI/bge-large-en-v1.5'
volume=$PWD/data

docker run -p 8080:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e MAX_WARMUP_SEQUENCE_LENGTH=512 tei_hpu --model-id $model --dtype bfloat16
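The /embed route also accepts a batch of inputs in a single request, which is typically how you would keep the accelerator busy (assuming the default port mapping above):

curl 127.0.0.1:8080/embed -X POST -d '{"inputs":["What is Deep Learning?","What is Machine Learning?"]}' -H 'Content-Type: application/json'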

Prebuilt Docker Images

For convenience, prebuilt Docker images are available on the GitHub Container Registry (GHCR). You can pull these images directly instead of building them yourself:

CPU

To use the prebuilt image optimized for Intel® CPUs, run:

docker pull ghcr.io/huggingface/text-embeddings-inference:cpu-ipex-latest
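The prebuilt image should work as a drop-in replacement for the locally built tag, so the CPU deploy command above can reference it directly, for example:

model='BAAI/bge-large-en-v1.5'
volume=$PWD/data

docker run -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-embeddings-inference:cpu-ipex-latest --model-id $model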

XPU

To use the prebuilt image optimized for Intel® XPUs, run:

docker pull ghcr.io/huggingface/text-embeddings-inference:xpu-ipex-latest
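Likewise, the XPU deploy command above can point at the prebuilt tag, keeping the same device flags:

docker run -p 8080:80 -v $volume:/data --device=/dev/dri -v /dev/dri/by-path:/dev/dri/by-path ghcr.io/huggingface/text-embeddings-inference:xpu-ipex-latest --model-id $model --dtype float16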

HPU

To use the prebuilt image optimized for Intel® HPUs (Gaudi), run:

docker pull ghcr.io/huggingface/text-embeddings-inference:hpu-latest
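As with the other platforms, the HPU deploy command above works with the prebuilt tag, keeping the Habana runtime flags:

docker run -p 8080:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e MAX_WARMUP_SEQUENCE_LENGTH=512 ghcr.io/huggingface/text-embeddings-inference:hpu-latest --model-id $model --dtype bfloat16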