Calculate exactly how much GPU memory (VRAM) you need to run any AI model locally. Supports 280+ models including LLaMA 3, DeepSeek R1, Mistral, Qwen 2.5, and Phi-4 at FP16, Q8, Q4, and other quantization levels.
7B models (LLaMA 3 8B, Mistral 7B): ~4-5GB at Q4, ~14GB at FP16
13-14B models (Phi-4, Qwen 2.5 14B): ~8-9GB at Q4, ~28GB at FP16
70B models (LLaMA 3 70B, Qwen 2.5 72B): ~40GB at Q4, ~140GB at FP16
Rule of thumb: multiply the parameter count in billions by 0.6 to get Q4 VRAM in GB, or by 2 for FP16.
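A minimal sketch of that rule of thumb in Python (not the calculator's actual implementation; the Q8 factor of 1.1 GB per billion parameters is an assumption, roughly 8 bits per weight plus overhead):

```python
def estimate_vram_gb(params_billion: float, quant: str = "q4") -> float:
    """Rough VRAM in GB needed to hold model weights at a given quantization."""
    gb_per_billion = {
        "fp16": 2.0,  # 16 bits per weight
        "q8": 1.1,    # assumed: ~8 bits per weight plus overhead
        "q4": 0.6,    # ~4.5 bits per weight plus overhead
    }[quant.lower()]
    return params_billion * gb_per_billion

print(estimate_vram_gb(8, "q4"))    # 4.8  -> matches "~4-5GB" for LLaMA 3 8B
print(estimate_vram_gb(70, "q4"))   # 42.0 -> matches "~40GB" for LLaMA 3 70B
print(estimate_vram_gb(7, "fp16"))  # 14.0 -> matches "~14GB" for Mistral 7B
```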
VRAM estimates are approximate. Actual usage varies by model architecture, context length (KV cache), batch size, and runtime.
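A short sketch of why context length matters: the KV cache grows linearly with the number of tokens. The architecture values below are for LLaMA 3 8B (32 layers, 8 KV heads, head dim 128); other models will differ.

```python
def kv_cache_gb(context_len: int, layers: int = 32, kv_heads: int = 8,
                head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    """Approximate KV-cache size in GB at FP16 (2 bytes per element)."""
    per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_elem  # K and V
    return context_len * per_token_bytes / 1e9

print(round(kv_cache_gb(8_192), 2))    # ~1.07 GB at 8K context
print(round(kv_cache_gb(128_000), 2))  # ~16.78 GB at 128K context
```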
For MoE models (Mixtral, DeepSeek), only a fraction of parameters are active per token. Most runtimes still keep every expert resident in VRAM, but runtimes that offload inactive experts can use noticeably less VRAM than the total parameter count suggests, as sketched below.
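A rough sketch contrasting total and active parameters for an MoE model. The Mixtral 8x7B figures used here (~46.7B total, ~12.9B active per token) are approximate assumptions for illustration; the calculator's own numbers may differ.

```python
def moe_vram_gb(total_params_b: float, active_params_b: float,
                gb_per_billion: float = 0.6) -> dict:
    """Rough Q4 VRAM range for an MoE model, reusing the 0.6 GB/B rule of thumb."""
    return {
        # Typical case: every expert stays resident in VRAM.
        "all_experts_resident_gb": total_params_b * gb_per_billion,
        # Lower bound if a runtime offloads inactive experts to CPU RAM.
        "active_only_gb": active_params_b * gb_per_billion,
    }

print(moe_vram_gb(46.7, 12.9))
# {'all_experts_resident_gb': 28.02, 'active_only_gb': 7.74}
```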