Run LLaMA 3 8B and 70B models locally with the right GPU. Tested recommendations from budget to enthusiast builds.
24 GB VRAM • 80-100 tok/s on LLaMA 3 8B • Runs 70B with aggressive quantization or partial CPU offload
| Model | FP16 (full precision) | Q8 (8-bit) | Q4 (4-bit) |
|---|---|---|---|
| LLaMA 3 8B | 16 GB | 9 GB | 5 GB |
| LLaMA 3 70B | 140 GB | 75 GB | 40 GB |
| LLaMA 3.1 405B | 810 GB | 405 GB | ~220 GB |
* Add 1-2 GB of overhead for the context window (KV cache). Values are approximate.
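The table follows a simple rule of thumb: parameter count times bytes per weight, plus the footnote's overhead. Here is a minimal sketch of that calculation (the function name, the 1.5 GB default overhead, and the ~4.6 effective bits per weight for a Q4-style format are our assumptions, not exact figures):

```python
def estimate_vram_gb(params_billions: float, bits_per_weight: float,
                     overhead_gb: float = 1.5) -> float:
    """Weight memory plus a flat allowance for the context window / KV cache.
    Name and default overhead are illustrative, not an official formula."""
    weight_gb = params_billions * bits_per_weight / 8  # bits -> bytes per parameter
    return weight_gb + overhead_gb

# Roughly reproduces the table (which lists weight memory only) plus overhead:
print(estimate_vram_gb(8, 16))    # 8B FP16: 16 GB weights + 1.5 -> 17.5 GB
print(estimate_vram_gb(8, 4.6))   # 8B Q4-style: ~4.6 GB weights -> ~6.1 GB
print(estimate_vram_gb(70, 4.6))  # 70B Q4-style: ~40 GB weights -> ~41.8 GB
```

Note that practical 4-bit formats spend slightly more than their nominal 4 bits per weight on quantization scales, which is why the Q4 column sits above a strict params/2-bytes figure.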
Use our free VRAM Calculator to check if your GPU can run specific models.