LLM Pricing & Benchmarks
Compare costs, context windows, and benchmark scores across leading models.
Data last verified: 6/14/2026
| Model | Provider | Input / 1M tokens | Output / 1M tokens | Context (tokens) | MMLU | Arena |
|---|---|---|---|---|---|---|
| DeepSeek-V3 | DeepSeek | $0.140 | $0.280 | 64K | 89.4 | 1318 |
| gpt-4o-mini | OpenAI | $0.150 | $0.600 | 128K | 82 | 1225 |
| Gemini 2.5 Flash | Google | $0.150 | $0.600 | 1.0M | 86.8 | 1320 |
| Llama 4 Maverick | Meta | $0.200 | $0.600 | 256K | 87.5 | 1330 |
| Grok 3 Mini | xAI | $0.300 | $0.500 | 131K | 85.4 | 1280 |
| DeepSeek-R1 | DeepSeek | $0.550 | $2.19 | 64K | 90.8 | 1354 |
| Claude 4 Haiku | Anthropic | $0.625 | $2.50 | 200K | 85.2 | 1245 |
| Qwen 3 235B A22B | OpenRouter | $0.800 | $1.60 | 128K | 86.5 | 1310 |
| Llama 3.3 70B | Together AI | $0.880 | $0.880 | 131K | 83.5 | 1285 |
| Gemini 2.5 Pro | Google | $1.25 | $10.00 | 1.0M | 91.7 | 1415 |
| GPT-4.1 | OpenAI | $2.00 | $8.00 | 1.0M | 90.2 | 1372 |
| gpt-4o | OpenAI | $2.50 | $10.00 | 128K | 88.7 | 1287 |
| Claude 4 Sonnet | Anthropic | $3.00 | $15.00 | 200K | 90.4 | 1379 |
| Grok 3 | xAI | $3.00 | $15.00 | 131K | 88.9 | 1340 |
| Claude 3.7 Sonnet | Anthropic | $3.00 | $15.00 | 200K | 88.5 | 1350 |
| o3 | OpenAI | $10.00 | $40.00 | 200K | 92.9 | 1365 |
| Claude 4 Opus | Anthropic | $15.00 | $75.00 | 200K | 91.9 | 1398 |
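
Because input and output tokens are priced separately, the cost of a single request depends on the mix of both. The sketch below is a minimal illustration of that arithmetic, using a handful of prices copied from the table above. The `PRICES` dictionary, the `request_cost` function name, and the example token counts are assumptions made for illustration; the calculation ignores any caching, batch, or volume discounts a provider may offer.

```python
# Prices in USD per 1M tokens, copied from the table above.
# Each entry is (input price, output price). Only a few rows are included.
PRICES = {
    "DeepSeek-V3": (0.14, 0.28),
    "gpt-4o-mini": (0.15, 0.60),
    "Claude 4 Sonnet": (3.00, 15.00),
    "o3": (10.00, 40.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at list price."""
    input_price, output_price = PRICES[model]
    total = input_tokens * input_price + output_tokens * output_price
    return total / 1_000_000  # prices are quoted per 1M tokens


if __name__ == "__main__":
    # Hypothetical workload: 10,000 input tokens and 2,000 output tokens.
    for name in PRICES:
        print(f"{name}: ${request_cost(name, 10_000, 2_000):.4f}")
```

For reasoning models, chain-of-thought tokens typically count toward output, so the output term can dominate the total even when the prompt is short.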
DeepSeek-V3
Strong open-weight model at very low cost.
gpt-4o-mini
Fast, affordable small model for everyday tasks.
Gemini 2.5 Flash
Fast, cost-efficient Gemini with 1M context window.
Llama 4 Maverick
Open-weight multimodal model via API partners.
Grok 3 Mini
Fast, affordable Grok model for everyday tasks.
DeepSeek-R1
Reasoning model. Output is long due to chain-of-thought.
Claude 4 Haiku
Fast, cost-effective model for high-volume tasks.
Qwen 3 235B A22B
Mixture-of-experts model available through unified API.
Llama 3.3 70B
Open-weight model hosted on serverless inference platform.
Gemini 2.5 Pro
1M token context window. Strong coding and reasoning.
GPT-4.1
Long-context coding model with 1M token context window.
gpt-4o
Flagship multimodal model. Pricing per 1M tokens.
Claude 4 Sonnet
Latest Sonnet with extended thinking. Pricing per 1M tokens.
Grok 3
xAI flagship with real-time X data access.
Claude 3.7 Sonnet
Prior-generation Claude Sonnet with extended thinking mode.
o3
Reasoning model. Higher latency, best for complex STEM tasks.
Claude 4 Opus
Most capable Claude model for complex agentic workflows.
