Plain Quantization

uniform_precision_quantization(self, input_model_path: str, output_dir: str, dataset_path: str | None, metric: SimilarityMetric = SimilarityMetric.SNR, weight_precision: QuantizationPrecision = QuantizationPrecision.INT8, activation_precision: QuantizationPrecision = QuantizationPrecision.INT8, input_layers: List[Dict[str, int]] | None = None, wait_until_done: bool = True, sleep_interval: int = 30)

Apply uniform precision quantization to a model, specifying the precision to use for weights and activations.

This method quantizes all layers in the model uniformly based on the specified precision levels for weights and activations.

Parameters:
  • input_model_path (str) – The file path where the model is located.

  • output_dir (str) – The local folder path to save the quantized model.

  • dataset_path (str | None) – Path to the calibration dataset, used by quantization modes that require calibration data.

  • metric (SimilarityMetric) – Similarity metric used to measure quantization quality.

  • weight_precision (QuantizationPrecision) – Precision for weights.

  • activation_precision (QuantizationPrecision) – Precision for activations.

  • input_layers (List[Dict[str, int]], optional) – Target input shape for quantization (e.g., converting a dynamic batch to a static batch).

  • wait_until_done (bool) – If True, block until the quantization result is available before returning. If False, submit the quantization request and return immediately.

  • sleep_interval (int) – Number of seconds to wait between status checks while wait_until_done is True. Defaults to 30.

Raises:

e – If an error occurs during model quantization.

Returns:

Quantization metadata.

Return type:

QuantizerMetadata

Example

from netspresso import NetsPresso
from netspresso.enums import QuantizationPrecision, SimilarityMetric


netspresso = NetsPresso(email="YOUR_EMAIL", password="YOUR_PASSWORD")

quantizer = netspresso.quantizer()
quantization_result = quantizer.uniform_precision_quantization(
    input_model_path="./examples/sample_models/test.onnx",
    output_dir="./outputs/quantized/uniform_precision_quantization",
    dataset_path="./examples/sample_datasets/pickle_calibration_dataset_128x128.npy",
    metric=SimilarityMetric.SNR,
    weight_precision=QuantizationPrecision.INT8,
    activation_precision=QuantizationPrecision.INT8,
)
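When wait_until_done is True, the client blocks and re-checks the job status every sleep_interval seconds until the quantization finishes. A minimal sketch of that polling pattern, using a hypothetical check_status callable as a stand-in for the real status call (the actual SDK handles this internally):

```python
import time


def poll_until_done(check_status, sleep_interval=30, max_checks=None):
    """Poll check_status() until the job reports completion.

    check_status is a hypothetical callable returning True once the
    quantization job has finished; sleep_interval mirrors the method's
    sleep_interval parameter (seconds between status checks).
    Returns the number of waits performed before completion.
    """
    checks = 0
    while not check_status():
        checks += 1
        # Optional safety valve so a stuck job cannot block forever.
        if max_checks is not None and checks >= max_checks:
            raise TimeoutError("quantization did not finish in time")
        time.sleep(sleep_interval)
    return checks
```

Passing wait_until_done=False corresponds to skipping this loop entirely: the request is submitted and the method returns immediately, leaving status checks to the caller.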