TorchRec provides easy-to-use APIs for transforming an authored TorchRec model
into an optimized inference model for distributed inference, via eager module swaps.

This transforms TorchRec modules like ``EmbeddingBagCollection`` in the model into
a quantized, sharded version that can be compiled using torch.fx and TorchScript
for inference in a C++ environment.

The intended use is calling ``quantize_inference_model`` on the model, followed by
``shard_quant_model``.

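The two-step flow described above can be sketched as follows. This is a minimal, hedged example rather than the canonical recipe: ``prepare_for_inference`` is a hypothetical helper name, both functions are called with their default arguments only, and it assumes that ``shard_quant_model`` returns the sharded model together with the sharding plan it applied.

.. code-block:: python

    def prepare_for_inference(model):
        """Quantize, then shard, an authored TorchRec model (hypothetical helper)."""
        # Imported here so torch/torchrec are only required when the helper runs.
        from torchrec.inference.modules import (
            quantize_inference_model,
            shard_quant_model,
        )

        # Step 1: swap supported TorchRec modules (for example
        # EmbeddingBagCollection) for their quantized counterparts.
        quant_model = quantize_inference_model(model)

        # Step 2: shard the quantized model. Assumption: the call returns the
        # sharded model together with the sharding plan that was applied.
        sharded_model, sharding_plan = shard_quant_model(quant_model)
        return sharded_model, sharding_plan

Calling ``prepare_for_inference(model)`` on a model that contains TorchRec embedding modules would then yield a module ready for the torch.fx and TorchScript compilation step. Whether the default arguments are appropriate depends on the target hardware and TorchRec version, so the function references below remain the authority on available parameters.
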
.. automodule:: torchrec.inference.modules
.. autofunction:: quantize_inference_model
.. autofunction:: shard_quant_model