Can TPU v6e run Llama 3.2 1B?

Question

Accepted Answer

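Yes. A single TPU v6e chip has 32 GB of HBM (256 GB in total across an 8-chip v6e-8 host), and Llama 3.2 1B needs roughly 0.6 GB for its weights at 4-bit (Q4) quantization, so the model fits with ample room left for the KV cache and activations. Estimated inference speed: about 6,750 tokens/second. This is an estimate, not a measured benchmark; actual throughput depends on batch size, sequence length, and the serving stack.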
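The 0.6 GB figure can be sanity-checked with simple arithmetic: weight memory is roughly the parameter count times the bits per weight, divided by 8. The sketch below is only a back-of-the-envelope estimate. The parameter count (about 1.23 billion), the 32 GB per-chip capacity, and the 20% headroom for KV cache and activations are assumptions for illustration, and the function names are hypothetical.

```python
def quantized_weight_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight size in decimal GB: params * bits / 8 bytes."""
    return num_params * bits_per_weight / 8 / 1e9


def fits_in_memory(weight_gb: float, capacity_gb: float,
                   headroom_fraction: float = 0.2) -> bool:
    """True if the weights fit after reserving a fraction of memory
    for KV cache, activations, and runtime overhead (assumed 20%)."""
    usable_gb = capacity_gb * (1.0 - headroom_fraction)
    return weight_gb <= usable_gb


if __name__ == "__main__":
    # Assumption: Llama 3.2 1B has roughly 1.23e9 parameters.
    weights = quantized_weight_gb(num_params=1.23e9, bits_per_weight=4)
    # Assumption: 32 GB of HBM per chip; substitute your own figure.
    print(f"Q4 weights: {weights:.3f} GB, fits: {fits_in_memory(weights, 32.0)}")
```

Real quantized checkpoints are usually somewhat larger than this estimate, because formats such as Q4 also store per-block scales and often keep some layers at higher precision.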

✅ TPU v6e Can Run Llama 3.2 1B

Hardware Specs

Model Requirements

Estimated Performance