Quantization

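Shrinking a model by storing its numbers (mainly its weights, and sometimes its activations) at lower precision, for example as 8-bit or 4-bit integers instead of 16- or 32-bit floating-point values. This reduces memory use and often speeds up inference, usually at the cost of a small loss in output quality.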
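To make "lower precision" concrete, here is a minimal sketch of one common scheme: symmetric, per-tensor 8-bit quantization. Each float is divided by a single scale factor and rounded to an integer in [-127, 127], so it takes 1 byte instead of 4 for a 32-bit float. The function names and sample weights are hypothetical, chosen for illustration. Real libraries use many variants (per-channel scales, 4-bit formats, calibration data), so treat this as an illustration of the idea rather than any particular tool's method.

```python
# Minimal sketch: symmetric, per-tensor 8-bit quantization.
# Hypothetical helper names; real libraries use many variants of this scheme.

def quantize_int8(values: list[float]) -> tuple[list[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(v) for v in values)
    # Choose the scale so the largest-magnitude value maps to +/-127.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the int8 range used here.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Recover approximate floats from the stored integers."""
    return [x * scale for x in q]

weights = [0.12, -0.53, 0.97, -1.20, 0.004]  # hypothetical sample weights
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# Rounding error per value is at most half a quantization step (scale / 2).
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(q)
print(f"scale={scale:.6f}, max error={max_error:.6f}")
```

The rounding error is what produces the small quality trade-off: each restored weight is close to, but not exactly, the original. Note how the tiny value 0.004 rounds to 0, since it is smaller than half a step.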

Why it matters

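Quantization is a key reason capable models can run on a laptop or phone: using fewer bits per weight shrinks a model enough to fit in the limited memory of consumer devices.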
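A rough back-of-the-envelope calculation shows why the bit width matters so much. Weight memory is approximately the number of parameters times the bits per parameter, divided by 8 bits per byte. The sketch below assumes a hypothetical 7-billion-parameter model and ignores activations, caches, and the small overhead of storing scale factors, so real memory use will be somewhat higher.

```python
# Rough weight-memory estimate: parameters x bits per parameter / 8 bits per byte.
# Ignores activations, caches, and the overhead of storing scale factors.

def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed just to hold the weights, in gigabytes (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

params = 7e9  # hypothetical 7-billion-parameter model
for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit: {weight_memory_gb(params, bits):.1f} GB")
```

Going from 16-bit to 4-bit weights cuts the estimate by a factor of four, which is often the difference between needing a data-center GPU and fitting on consumer hardware.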

Related terms

Distillation
Inference
Open-Weight Model
