Run Mistral 7B and Mixtral 8x7B locally, from budget-friendly Q4 quantization to full FP16 inference.
24GB VRAM • 90+ tok/s on Mistral 7B • Runs Mixtral 8x7B at Q4 (with partial offload)
| Model | FP16 (16-bit) | Q8 (8-bit) | Q4 (4-bit) |
|---|---|---|---|
| Mistral 7B | 14 GB | 8 GB | 5 GB |
| Mixtral 8x7B | 93 GB | 50 GB | 26 GB |
| Mixtral 8x22B | 282 GB | 150 GB | 80 GB |
* MoE models must keep all experts in VRAM even though only a subset (2 of 8) is active per token. Add 1-2 GB for KV cache and runtime overhead.
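The table values follow a simple rule of thumb: weight memory is roughly parameter count times bytes per weight. The Python sketch below reproduces those ballpark figures and adds a fit check against a 24 GB card. The parameter counts and bytes-per-weight constants are assumptions that won't match every quantized build exactly.

```python
# Back-of-the-envelope VRAM math behind the table above:
# weight memory ~= parameter count x bytes per weight, plus 1-2 GB overhead.

PARAMS_B = {                # approximate parameter counts, in billions
    "Mistral 7B": 7.2,
    "Mixtral 8x7B": 46.7,   # MoE: all 8 experts stay resident in VRAM
    "Mixtral 8x22B": 141.0,
}

# Effective bytes per weight (assumptions; real quantized files such as
# GGUF Q4_K_M mix precisions, so actual sizes vary a little).
BYTES_PER_WEIGHT = {"FP16": 2.0, "Q8": 1.07, "Q4": 0.57}

OVERHEAD_GB = 1.5           # KV cache and runtime buffers (per the footnote)

def weights_gb(model: str, quant: str) -> float:
    """Approximate VRAM needed for the weights alone, in GB."""
    return PARAMS_B[model] * BYTES_PER_WEIGHT[quant]

def fits(model: str, quant: str, gpu_gb: float) -> bool:
    """True if weights plus overhead fit entirely on a gpu_gb card."""
    return weights_gb(model, quant) + OVERHEAD_GB <= gpu_gb

for model in PARAMS_B:
    for quant in BYTES_PER_WEIGHT:
        print(f"{model} @ {quant}: ~{weights_gb(model, quant):.0f} GB weights, "
              f"fits 24 GB: {fits(model, quant, 24.0)}")
```

Note that Mixtral 8x7B at Q4 lands just above 24 GB once overhead is counted, which is why a single consumer card needs partial offload to run it.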
Check if your GPU can run Mistral or Mixtral at every quantization level.
Open VRAM Calculator

Rent GPU compute from $0.39/hr. Compare 24+ providers with live pricing.
Browse Cloud GPUs