Inference

TorchRec provides easy-to-use APIs for transforming an authored TorchRec model into an optimized inference model for distributed inference, via eager module swaps.

This transforms TorchRec modules like EmbeddingBagCollection in the model to a quantized, sharded version that can be compiled using torch.fx and TorchScript for inference in a C++ environment.

The intended use is calling quantize_inference_model on the model followed by shard_quant_model.

.. codeblock::

.. automodule:: torchrec.inference.modules

.. autofunction:: quantize_inference_model

.. autofunction:: shard_quant_model

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

inference-api-reference.rst

inference-api-reference.rst

Inference

Files

inference-api-reference.rst

Latest commit

History

inference-api-reference.rst

File metadata and controls

Inference