The table below compares AI model performance on the MMLU and HumanEval benchmarks, along with model size, estimated VRAM footprint at 4-bit (Q4) quantization, and license. Rows are sorted by MMLU score in descending order.
| Model | Developer | Size | MMLU | HumanEval | VRAM (Q4) | License |
|---|---|---|---|---|---|---|
| o1 | OpenAI | 200B | 92.3% | 94.0% | 110GB | Proprietary |
| Qwen3 235B-A22B | Alibaba | 235B | 92.3% | 88.0% | 129.3GB | Open |
| DeepSeek R1 (HF) | DeepSeek | 671B | 92.0% | 95.0% | 369.1GB | Open |
| DeepSeek V3 (HF) | DeepSeek | 671B | 92.0% | 95.0% | 369.1GB | Open |
| DeepSeek Coder V2 Instruct (HF) | DeepSeek | 236B | 92.0% | 95.0% | 129.8GB | Open |
| Grok 3 | xAI | 314B | 91.0% | 90.0% | 172.7GB | Proprietary |
| DeepSeek R1 | DeepSeek | 671B | 90.8% | 97.3% | 369.1GB | Open |
| Gemini 2.0 Flash | Google | 30B | 89.0% | 85.0% | 16.5GB | Proprietary |
| Claude 3.5 Sonnet | Anthropic | 175B | 88.7% | 92.0% | 96.3GB | Proprietary |
| GPT-4o | OpenAI | 200B | 88.7% | 90.2% | 110GB | Proprietary |
| Llama 3.1 405B | Meta | 405B | 88.6% | 89.0% | 222.8GB | Open |
| DeepSeek V3 | DeepSeek | 671B | 88.5% | 91.0% | 369.1GB | Open |
| Claude Opus 4.5 | Anthropic | 200B | 88.3% | 92.7% | 110GB | Proprietary |
| Claude Sonnet 4.5 | Anthropic | 120B | 88.0% | 92.5% | 66GB | Proprietary |
| Claude Opus 4.1 | Anthropic | 200B | 88.0% | 92.0% | 110GB | Proprietary |
| Claude Opus 4 | Anthropic | 200B | 87.7% | 91.2% | 110GB | Proprietary |
| Grok 2 | xAI | 314B | 87.5% | 88.0% | 172.7GB | Proprietary |
| Claude Sonnet 4 | Anthropic | 100B | 87.2% | 90.8% | 55GB | Proprietary |
| Claude 3 Sonnet | Anthropic | 100B | 87.0% | 80.0% | 55GB | Proprietary |
| o1-pro | OpenAI | 175B | 87.0% | 90.0% | 96.3GB | Proprietary |
| o3 | OpenAI | 175B | 87.0% | 90.0% | 96.3GB | Proprietary |
| o3-mini | OpenAI | 175B | 87.0% | 90.0% | 96.3GB | Proprietary |
| Gemini 1.0 Pro | Google | 175B | 87.0% | 80.0% | 96.3GB | Proprietary |
| Gemini 1.0 Ultra | Google | 300B | 87.0% | 80.0% | 165GB | Proprietary |
| Claude 3 Opus | Anthropic | 200B | 86.8% | 84.9% | 110GB | Proprietary |
| Claude 3.7 Sonnet | Anthropic | 100B | 86.8% | 89.7% | 55GB | Proprietary |
| DeepSeek R1 Distill Llama 70B | DeepSeek | 70B | 86.5% | 85.0% | 38.5GB | Open |
| GPT-4 Turbo | OpenAI | 175B | 86.4% | 87.1% | 96.3GB | Proprietary |
| Qwen 2.5 72B | Alibaba | 72B | 86.1% | 86.6% | 39.6GB | Open |
| Llama 3.1 70B | Meta | 70B | 86.0% | 80.5% | 38.5GB | Open |
| Llama 3.2 90B Vision | Meta | 90B | 86.0% | - | 49.5GB | Open |
| Llama 3.3 70B | Meta | 70B | 86.0% | 88.4% | 38.5GB | Open |
| Llama 3.3 70B Instruct | Meta | 70B | 86.0% | 88.4% | 42GB | Open |
| Gemini 1.5 Pro | Google | 175B | 85.9% | 84.1% | 96.3GB | Proprietary |
| Claude Haiku 4.5 | Anthropic | 40B | 85.9% | 88.3% | 22GB | Proprietary |
| o1-mini | OpenAI | 30B | 85.2% | 92.0% | 16.5GB | Proprietary |
| Nemotron 70B | NVIDIA | 70B | 85.0% | 73.0% | 38.5GB | Open |
| Phi-4 | Microsoft | 14B | 84.8% | 82.6% | 9GB | Open |
| Mistral Large 2 | Mistral AI | 123B | 84.0% | 92.0% | 67.7GB | Proprietary |
| Qwen3 32B | Alibaba | 32B | 84.0% | 82.0% | 17.6GB | Open |
| Mistral Large (API) | Mistral AI | 123B | 84.0% | 78.0% | 67.7GB | Proprietary |
| Reka Core | Reka AI | 67B | 83.2% | 76.4% | 37GB | Proprietary |
| Qwen 2.5 32B | Alibaba | 32B | 83.0% | 79.0% | 17.6GB | Open |
| DeepSeek R1 Distill Qwen 32B | DeepSeek | 32B | 82.5% | 80.0% | 17.6GB | Open |
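The sort-and-filter workflow the table supports can be sketched in Python. Only a handful of rows are reproduced here, and the 48 GB budget and the 0.55 GB-per-billion-parameter Q4 heuristic (inferred from the table's Size and VRAM columns, and only approximate for some rows such as Phi-4) are illustrative assumptions.

```python
def estimate_q4_vram_gb(params_billions: float) -> float:
    """Rough 4-bit quantized memory footprint: ~0.55 GB per billion parameters.

    This ratio is inferred from the table's Size/VRAM columns (e.g. 70B -> 38.5GB,
    671B -> 369.1GB); it is an approximation, not an official figure.
    """
    return round(params_billions * 0.55, 1)

# A small subset of the table rows above.
models = [
    {"name": "DeepSeek R1", "dev": "DeepSeek", "params_b": 671,
     "mmlu": 90.8, "humaneval": 97.3, "license": "Open"},
    {"name": "Llama 3.3 70B", "dev": "Meta", "params_b": 70,
     "mmlu": 86.0, "humaneval": 88.4, "license": "Open"},
    {"name": "Qwen 2.5 72B", "dev": "Alibaba", "params_b": 72,
     "mmlu": 86.1, "humaneval": 86.6, "license": "Open"},
    {"name": "Phi-4", "dev": "Microsoft", "params_b": 14,
     "mmlu": 84.8, "humaneval": 82.6, "license": "Open"},
    {"name": "GPT-4o", "dev": "OpenAI", "params_b": 200,
     "mmlu": 88.7, "humaneval": 90.2, "license": "Proprietary"},
]

# Open-license models that fit a 48 GB VRAM budget at Q4, best MMLU first.
budget_gb = 48
open_models = sorted(
    (m for m in models
     if m["license"] == "Open"
     and estimate_q4_vram_gb(m["params_b"]) <= budget_gb),
    key=lambda m: m["mmlu"],
    reverse=True,
)

for m in open_models:
    print(f'{m["name"]} ({m["dev"]}): MMLU {m["mmlu"]}%, '
          f'~{estimate_q4_vram_gb(m["params_b"])} GB at Q4')
```

With these rows the filter drops DeepSeek R1 (too large at ~369 GB) and GPT-4o (proprietary), leaving Qwen 2.5 72B, Llama 3.3 70B, and Phi-4 in MMLU order.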