Inference
---------

TorchRec provides APIs that transform an authored TorchRec model into an optimized model for distributed inference, via eager module swaps.

The swap replaces TorchRec modules in the model, such as ``EmbeddingBagCollection``, with quantized, sharded versions that can be compiled using ``torch.fx`` and TorchScript for inference in a C++ environment.

The intended use is to call ``quantize_inference_model`` on the model and then call ``shard_quant_model`` on the result.
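To make the two-step flow concrete, the following is a minimal, library-free sketch of an eager module swap: a quantize pass followed by a shard pass. It does not use TorchRec or PyTorch. Every name in it (``Module``, ``FloatEmbedding``, ``QuantEmbedding``, ``ShardedQuantEmbedding``, ``swap_modules``, ``quantize``, ``shard``) is a hypothetical stand-in chosen for illustration, and the int8-style rounding and contiguous splitting are assumptions, not a description of how TorchRec quantizes or shards.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Type


@dataclass
class Module:
    """Hypothetical stand-in for a node in a module tree (not torch.nn.Module)."""
    children: Dict[str, "Module"] = field(default_factory=dict)


@dataclass
class FloatEmbedding(Module):
    """Stand-in for an authored, full-precision embedding module."""
    weights: List[float] = field(default_factory=list)


@dataclass
class QuantEmbedding(Module):
    """Stand-in for the quantized counterpart."""
    codes: List[int] = field(default_factory=list)
    scale: float = 1.0


@dataclass
class ShardedQuantEmbedding(Module):
    """Stand-in for the quantized module split across devices."""
    shards: List[List[int]] = field(default_factory=list)
    scale: float = 1.0


def swap_modules(
    root: Module,
    source_type: Type[Module],
    convert: Callable[[Module], Module],
) -> Module:
    """Eagerly replace every descendant of ``source_type`` with ``convert(child)``."""
    for name, child in list(root.children.items()):
        if isinstance(child, source_type):
            root.children[name] = convert(child)
        else:
            swap_modules(child, source_type, convert)
    return root


def quantize(emb: FloatEmbedding) -> QuantEmbedding:
    """Assumed scheme: symmetric rounding onto the range [-127, 127]."""
    scale = max((abs(w) for w in emb.weights), default=0.0) / 127 or 1.0
    return QuantEmbedding(codes=[round(w / scale) for w in emb.weights], scale=scale)


def shard(emb: QuantEmbedding, world_size: int) -> ShardedQuantEmbedding:
    """Assumed scheme: split the codes into ``world_size`` contiguous chunks."""
    step = -(-len(emb.codes) // world_size)  # ceiling division
    shards = [emb.codes[i * step:(i + 1) * step] for i in range(world_size)]
    return ShardedQuantEmbedding(shards=shards, scale=emb.scale)


# Authored model: one sparse (embedding) module and one untouched dense module.
model = Module(children={
    "sparse": FloatEmbedding(weights=[1.27, -0.64, 0.0, 0.32]),
    "dense": Module(),
})

# Step 1: quantize. Step 2: shard the quantized modules.
model = swap_modules(model, FloatEmbedding, quantize)
model = swap_modules(model, QuantEmbedding, lambda m: shard(m, world_size=2))
```

The order matters in this sketch for the same reason it does in the intended use: the shard pass only recognizes modules that the quantize pass has already swapped in, so running it first would leave the model unchanged.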


.. automodule:: torchrec.inference.modules

.. autofunction:: quantize_inference_model
.. autofunction:: shard_quant_model