Recommendation precision
- get_recommendation_precision(self, input_model_path: str, output_dir: str, dataset_path: str | None, weight_precision: QuantizationPrecision = QuantizationPrecision.INT8, activation_precision: QuantizationPrecision = QuantizationPrecision.INT8, metric: SimilarityMetric = SimilarityMetric.SNR, threshold: float | int = 0, input_layers: List[Dict[str, int]] | None = None, wait_until_done: bool = True, sleep_interval: int = 30) → QuantizerMetadata
Get recommended precision for a model based on a specified quality threshold.
This function analyzes each layer of the given model and recommends precision settings for layers that do not meet the specified threshold, helping to balance quantization quality and performance.
- Parameters:
input_model_path (str) – The file path where the model is located.
output_dir (str) – The local folder path to save the quantized model.
dataset_path (str, optional) – Path to the calibration dataset used to measure per-layer quantization quality.
weight_precision (QuantizationPrecision) – Target precision for weights.
activation_precision (QuantizationPrecision) – Target precision for activations.
metric (SimilarityMetric) – Metric used to evaluate quantization quality.
threshold (Union[float, int]) – Quality threshold; layers below this threshold will receive precision recommendations.
input_layers (List[Dict[str, int]], optional) – Specifications for input shapes (e.g., to convert from dynamic to static batch size).
wait_until_done (bool) – If True, waits for the quantization process to finish before returning. If False, starts the process and returns immediately (a non-blocking sketch follows this list).
sleep_interval (int) – Interval, in seconds, between checks when wait_until_done is True.
- Raises:
Exception – If an error occurs during model quantization.
- Returns:
Quantization metadata.
- Return type:
QuantizerMetadata
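When wait_until_done is False, the call returns as soon as the job is submitted instead of polling until completion. The lines below are a minimal non-blocking sketch; they reuse the quantizer handle and sample paths from the Example section and only change keyword arguments documented above.
recommendation_metadata = quantizer.get_recommendation_precision(
    input_model_path="./examples/sample_models/test.onnx",
    output_dir="./outputs/quantized/automatic_quantization",
    dataset_path="./examples/sample_datasets/pickle_calibration_dataset_128x128.npy",
    wait_until_done=False,  # return immediately with the job metadata instead of polling
)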
Example
from netspresso import NetsPresso
from netspresso.enums import QuantizationPrecision
netspresso = NetsPresso(email="YOUR_EMAIL", password="YOUR_PASSWORD")
quantizer = netspresso.quantizer()
recommendation_metadata = quantizer.get_recommendation_precision(
    input_model_path="./examples/sample_models/test.onnx",
    output_dir="./outputs/quantized/automatic_quantization",
    dataset_path="./examples/sample_datasets/pickle_calibration_dataset_128x128.npy",
    weight_precision=QuantizationPrecision.INT8,
    activation_precision=QuantizationPrecision.INT8,
    threshold=0,
)
recommendation_precisions = quantizer.load_recommendation_precision_result(recommendation_metadata.recommendation_result_path)
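The loaded result can then be inspected before deciding how to re-quantize the model. The loop below is a minimal sketch; it assumes the object returned by load_recommendation_precision_result is an iterable of per-layer recommendation entries, which this reference does not specify.
for layer_recommendation in recommendation_precisions:
    print(layer_recommendation)  # per-layer entry; exact fields depend on the SDK version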