OMP_NUM_THREADS=<physical cores num> numactl -m <node N> -C <physical cores list> python run.py --benchmark -m <MODEL_ID> --ipex-smooth-quant --qconfig-summary-file <path to specific model qconfig> --output-dir "saved_results" --int8

# We provide tuned qconfig recipe files for "meta-llama/Llama-2-7b-hf", "meta-llama/Llama-2-7b-chat-hf", and "EleutherAI/gpt-j-6b".
# For other models, you can simply run with your model_id and try the IPEX default recipes by removing "--qconfig-summary-file <path to specific model qconfig>".
# If the IPEX default recipes do not meet your accuracy requirements, please refer to https://github.com/intel/neural-compressor/blob/master/docs/source/smooth_quant.md#validated-models for tuning additional recipes.
# Note: by default, "--int8" runs int8 mixed with fp32 inference; for the peak performance of static quantization, use "--int8-bf16-mixed" instead (may impact accuracy).
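As a concrete illustration of the general command above, the sketch below assumes a hypothetical machine with 56 physical cores on NUMA node 0 and uses "EleutherAI/gpt-j-6b"; the core count, NUMA node, and qconfig path are placeholders to adapt to your own system (you can inspect the topology with `lscpu` or `numactl --hardware`).

```bash
# Hypothetical example: 56 physical cores on NUMA node 0; adjust to your machine.
# The qconfig path is a placeholder for the tuned recipe file provided for this model.
OMP_NUM_THREADS=56 numactl -m 0 -C 0-55 python run.py --benchmark \
  -m EleutherAI/gpt-j-6b \
  --ipex-smooth-quant \
  --qconfig-summary-file <path to gpt-j-6b qconfig> \
  --output-dir "saved_results" \
  --int8

# Same run with "--int8-bf16-mixed" for peak static quantization performance (may impact accuracy):
OMP_NUM_THREADS=56 numactl -m 0 -C 0-55 python run.py --benchmark \
  -m EleutherAI/gpt-j-6b \
  --ipex-smooth-quant \
  --qconfig-summary-file <path to gpt-j-6b qconfig> \
  --output-dir "saved_results" \
  --int8-bf16-mixed
```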
(1) <a name="generation_sq">For all quantization benchmarks</a>, the first run auto-generates the quantized model, named "best_model.pt", in the "--output-dir" path; you can reuse this quantized model for inference-only benchmarks by adding "--quantized-model-path <output_dir + "best_model.pt">" (see the sketch after these notes). Specifically for static quantization, if "--qconfig-summary-file" is not used, a qconfig recipe will also be generated in the "--output-dir" path.
(2) For Falcon quantizations, "--config-file <CONFIG_FILE>" is required; an example of <CONFIG_FILE> is "utils/model_config/tiiuae_falcon-40b_config.json", as shown in the sketch below.
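The two notes above can be illustrated with the following sketches. Both reuse the general command; the quantized-model path assumes the earlier run used `--output-dir "saved_results"`, and the Falcon model id plus the omission of a tuned "--qconfig-summary-file" (falling back to the IPEX default recipes) are assumptions for illustration.

```bash
# Note (1): reuse a previously generated "best_model.pt" for an inference-only benchmark
# (assumes the first run wrote its artifacts to "saved_results").
OMP_NUM_THREADS=<physical cores num> numactl -m <node N> -C <physical cores list> python run.py --benchmark \
  -m <MODEL_ID> \
  --ipex-smooth-quant \
  --quantized-model-path "./saved_results/best_model.pt" \
  --output-dir "saved_results" \
  --int8

# Note (2): Falcon example with the extra "--config-file" argument; without a tuned
# "--qconfig-summary-file", the IPEX default recipes are used.
OMP_NUM_THREADS=<physical cores num> numactl -m <node N> -C <physical cores list> python run.py --benchmark \
  -m tiiuae/falcon-40b \
  --ipex-smooth-quant \
  --config-file utils/model_config/tiiuae_falcon-40b_config.json \
  --output-dir "saved_results" \
  --int8
```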
For the quantized models to be used in accuracy tests, we can reuse the model files that are named "best_model.pt" in the "--output-dir" path ([generated during inference performance tests](#generation_sq)).
```bash
# general command: