Brings the notion of Model-as-a-Service to life
Sparsity-aware deep learning inference runtime for CPUs
Large Language Model Text Generation Inference
An easy-to-use LLM quantization package with user-friendly APIs
OpenAI-style API for open large language models (a minimal client sketch follows this list)
Libraries for applying sparsification recipes to neural networks
Neural Network Compression Framework for enhanced OpenVINO inference
Efficient few-shot learning with Sentence Transformers
A Unified Library for Parameter-Efficient Learning
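The "OpenAI-style API" entry above refers to servers that expose the same HTTP interface as OpenAI's chat-completions endpoint, so existing OpenAI client code can be pointed at a locally hosted open model. Below is a minimal sketch using the official `openai` Python client; the base URL, API key, and model id are placeholders assumed for illustration, not values documented by any specific project in this list.

```python
# Minimal sketch: calling an OpenAI-compatible server with the official openai client.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed address of the local server
    api_key="EMPTY",                      # many local servers ignore the key
)

response = client.chat.completions.create(
    model="my-local-model",  # placeholder model id registered with the server
    messages=[{"role": "user", "content": "Explain Model-as-a-Service in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```

Because only `base_url` changes, the same client code works against the hosted OpenAI API or any compatible open-model server.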