Head-to-head comparison tables across context window, pricing, capabilities, and speed.
The tables below aggregate key specs across the major AI model providers into a single reference. Because pricing and specifications change frequently, treat these as a starting point — always verify current rates at each provider's pricing page before building products or making purchasing decisions.
Pricing pages to bookmark:

- anthropic.com/pricing
- openai.com/pricing
- ai.google.dev/pricing (AI Studio) / cloud.google.com/vertex-ai/pricing (Vertex AI)
- console.x.ai (xAI)
- docs.perplexity.ai/docs/pricing (Perplexity)
- azure.microsoft.com/pricing (Microsoft Azure)
Meta's Llama models are free to download and self-host; API providers charge their own rates.
---
| Model | Provider | Context Window |
|---|---|---|
| Gemini 1.5 Pro | Google | 2,000,000 tokens |
| Gemini 1.5 Flash | Google | 1,000,000 tokens |
| Gemini 2.0 Flash | Google | 1,000,000 tokens |
| Claude 3 Haiku | Anthropic | 200,000 tokens |
| Claude 3 Sonnet | Anthropic | 200,000 tokens |
| Claude 3 Opus | Anthropic | 200,000 tokens |
| Claude 3.5 Sonnet | Anthropic | 200,000 tokens |
| Claude 3.5 Haiku | Anthropic | 200,000 tokens |
| Claude 3.7 Sonnet | Anthropic | 200,000 tokens |
| o1 | OpenAI | 200,000 tokens |
| o3-mini | OpenAI | 200,000 tokens |
| o3 | OpenAI | 200,000 tokens |
| Llama 3.1 (all sizes) | Meta | 128,000 tokens |
| Llama 3.2 (all sizes) | Meta | 128,000 tokens |
| Llama 3.3 70B | Meta | 128,000 tokens |
| GPT-4o | OpenAI | 128,000 tokens |
| GPT-4o mini | OpenAI | 128,000 tokens |
| GPT-4 Turbo | OpenAI | 128,000 tokens |
| Grok 2 | xAI | 131,072 tokens |
| Grok 3 | xAI | 131,072 tokens |
| Phi-4 | Microsoft | 16,384 tokens |
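Key takeaway: Google's Gemini models have the largest context windows by a significant margin: five to ten times the 200,000-token windows offered by Anthropic's Claude models and OpenAI's o-series. The 2M token window in Gemini 1.5 Pro can hold roughly 1,500 dense novel pages or very large codebases.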
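To make the context-window figures concrete, here is a minimal sketch that checks which models could hold a given document. It uses a handful of window sizes from the table above. The four-characters-per-token ratio is only a rough heuristic for English text, and the output reserve assumes that the response shares the window with the prompt; both are assumptions, and the function names are illustrative. Use each provider's own tokenizer for real counts.

```python
# Context windows (in tokens) copied from the table above.
CONTEXT_WINDOWS = {
    "Gemini 1.5 Pro": 2_000_000,
    "Claude 3.5 Sonnet": 200_000,
    "GPT-4o": 128_000,
    "Phi-4": 16_384,
}

# Assumption: about 4 characters per token, a rough heuristic for English.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate (ceiling of characters / 4)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def models_that_fit(text: str, reserve_for_output: int = 4_096) -> list[str]:
    """Return models whose window holds the prompt plus an output reserve."""
    needed = estimate_tokens(text) + reserve_for_output
    return [name for name, window in CONTEXT_WINDOWS.items() if window >= needed]


# A 600,000-character document is about 150,000 tokens by this estimate.
print(models_that_fit("x" * 600_000))
# → ['Gemini 1.5 Pro', 'Claude 3.5 Sonnet']
```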
---
| Model | Provider | Input ($/MTok) | Output ($/MTok) |
|---|---|---|---|
| Phi-4 | Microsoft | $0.07 | $0.14 |
| GPT-4o mini | OpenAI | $0.15 | $0.60 |
| Gemini 1.5 Flash | Google | $0.075–$0.15 | $0.30–$0.60 |
| Gemini 2.0 Flash | Google | $0.10 | $0.40 |
| Claude 3 Haiku | Anthropic | $0.25 | $1.25 |
| Claude 3.5 Haiku | Anthropic | $0.80 | $4.00 |
| Sonar (Perplexity) | Perplexity | $1.00 | $1.00 |
| Gemini 1.5 Pro | Google | $1.25–$2.50 | $5.00–$10.00 |
| GPT-4o | OpenAI | $2.50 | $10.00 |
| Grok 2 | xAI | $2.00 | $10.00 |
| Claude 3.5 Sonnet | Anthropic | $3.00 | $15.00 |
| Claude 3.7 Sonnet | Anthropic | $3.00 | $15.00 |
| Grok 3 | xAI | $3.00 | $15.00 |
| Sonar Pro (Perplexity) | Perplexity | $3.00 | $15.00 |
| o3-mini | OpenAI | $1.10 | $4.40 |
| o3 | OpenAI | $10.00 | $40.00 |
| GPT-4 Turbo | OpenAI | $10.00 | $30.00 |
| o1 | OpenAI | $15.00 | $60.00 |
| Claude 3 Opus | Anthropic | $15.00 | $75.00 |
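Note: Gemini 1.5 Flash and Pro have tiered pricing based on prompt length (under or over 128K tokens), which is why the table shows a range for each. Llama models are not listed because API pricing varies by hosting provider (Groq, Together AI, etc.); the weights themselves are free to download and self-host, though you still pay for your own hardware.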
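As an illustration of how tiered pricing affects a single request, the sketch below uses the Gemini 1.5 Pro range from the table. It assumes that the lower rate covers prompts of up to 128,000 tokens, that the higher rate covers anything longer, and that the selected rate applies to the whole request. The exact boundary and the function name `request_cost` are assumptions, so check the provider's pricing page before relying on them.

```python
# Gemini 1.5 Pro rates in $ per million tokens, from the table above.
# Assumption: prompts of up to 128,000 tokens get the lower rate, longer
# prompts get the higher rate, and the rate applies to the whole request.
TIER_THRESHOLD = 128_000
RATES = {
    "short": {"input": 1.25, "output": 5.00},
    "long": {"input": 2.50, "output": 10.00},
}


def request_cost(prompt_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request under the assumed two-tier scheme."""
    tier = "short" if prompt_tokens <= TIER_THRESHOLD else "long"
    rate = RATES[tier]
    total = prompt_tokens * rate["input"] + output_tokens * rate["output"]
    return total / 1_000_000


print(f"${request_cost(100_000, 1_000):.2f}")  # prompt under the threshold
print(f"${request_cost(200_000, 1_000):.2f}")  # prompt over the threshold
```

Under these assumptions, doubling the prompt from 100,000 to 200,000 tokens roughly quadruples the cost, because the longer prompt is also billed at the higher rate.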
Key takeaway: The cheapest models (Phi-4, GPT-4o mini, Gemini Flash) cost roughly 100 to 200 times less per input token than the most expensive (Claude 3 Opus, o1), and the gap on output tokens can be wider still. For high-volume use cases, model selection has enormous cost implications.
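A quick back-of-the-envelope calculation shows what this gap means at volume. The sketch below uses a few rows from the pricing table and a hypothetical workload of one million requests per month, each with 1,000 input tokens and 200 output tokens. The workload figures and the function name `monthly_cost` are illustrative assumptions, not measurements.

```python
# (input $/MTok, output $/MTok) copied from the pricing table above.
PRICES = {
    "Phi-4": (0.07, 0.14),
    "GPT-4o mini": (0.15, 0.60),
    "GPT-4o": (2.50, 10.00),
    "Claude 3.5 Sonnet": (3.00, 15.00),
    "Claude 3 Opus": (15.00, 75.00),
}


def monthly_cost(model: str, requests: int, in_tokens: int, out_tokens: int) -> float:
    """Dollar cost of `requests` calls with fixed per-call token counts."""
    in_price, out_price = PRICES[model]
    total_in = requests * in_tokens
    total_out = requests * out_tokens
    return (total_in * in_price + total_out * out_price) / 1_000_000


# Hypothetical workload: 1M requests/month, 1,000 tokens in, 200 tokens out.
for model in PRICES:
    print(f"{model:<18} ${monthly_cost(model, 1_000_000, 1_000, 200):>10,.2f}")
```

For this workload, GPT-4o mini comes to about $270 per month and Claude 3 Opus to about $30,000, a difference of roughly 110 times.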
---
| Model | Vision | Tools/Functions | Voice | Image Gen | Web Search | Open Weights |
|---|---|---|---|---|---|---|
| Claude 3.5 Sonnet | Yes | Yes | No | No | No | No |
| Claude 3.7 Sonnet | Yes | Yes | No | No | No | No |
| Claude 3.5 Haiku | Yes | Yes | No | No | No | No |
| GPT-4o | Yes | Yes | Yes (Realtime API) | No | Yes (with tool) | No |
| GPT-4o mini | Yes | Yes | No | No | No | No |
| o1 | Yes | Yes | No | No | No | No |
| o3 | Yes | Yes | No | No | No | No |
| Gemini 1.5 Pro | Yes | Yes | Yes | No | Yes | No |
| Gemini 2.0 Flash | Yes | Yes | Yes | Yes (native) | Yes | No |
| Grok 2 | Yes | Yes | No | No | Yes (X data) | No |
| Grok 3 | Yes | Yes | No | No | Yes (X + web) | No |
| Perplexity Sonar | No | Limited | No | No | Yes (default) | No |
| Llama 3.1 (all) | No | Yes | No | No | No | Yes |
| Llama 3.2 Vision | Yes | Yes | No | No | No | Yes |
| Phi-4 | No | Yes | No | No | No | Yes |
Voice = native audio I/O capability. Image Gen = model can produce images. Web Search = built-in or natively integrated retrieval.
---
| Model | Best For |
|---|---|
| Claude 3.7 Sonnet | Complex reasoning, coding, extended thinking tasks |
| Claude 3.5 Sonnet | General flagship use, writing, agentic tasks, coding |
| Claude 3.5 Haiku | Cost-efficient production, smart fast responses |
| Claude 3 Haiku | Highest-volume cheapest Claude tasks |
| GPT-4o | General flagship, voice apps, multimodal, tool use |
| GPT-4o mini | High-volume cheap tasks needing GPT-4 class quality |
| o1 / o3 | Deep reasoning, math, hard coding problems |
| o3-mini | Affordable reasoning tasks |
| Gemini 1.5 Pro | Very long documents, massive context window use cases |
| Gemini 1.5 Flash | Long-context tasks at low cost |
| Gemini 2.0 Flash | Agentic tasks, real-time apps, native image output |
| Grok 3 | Real-time social/news data, less restricted content |
| Perplexity Sonar | Factual Q&A with citations, research |
| Llama 3.1 8B | On-device, privacy-sensitive, local deployment |
| Llama 3.1 70B / 3.3 70B | Self-hosted production, competitive open-weight quality |
| Llama 3.2 Vision | Open-weight vision tasks, local multimodal apps |
| Phi-4 | Edge deployment, extreme cost efficiency, on-device |
---
This table is qualitative — it represents general market positioning, not precise benchmark scores.
| Tier | Models | Trade-off |
|---|---|---|
| Fast + Cheap | GPT-4o mini, Gemini 1.5 Flash, Claude 3 Haiku, Claude 3.5 Haiku, Phi-4 | Lower cost and latency; good for well-defined, high-volume tasks |
| Balanced | GPT-4o, Claude 3.5 Sonnet, Gemini 2.0 Flash, Grok 2, Llama 3.3 70B | Strong capability at reasonable cost; recommended default for most tasks |
| Powerful + Slower | Claude 3.7 Sonnet (extended thinking), o1, o3, Gemini 1.5 Pro | Higher accuracy on hard tasks; higher cost and latency; use when quality matters most |
---
Benchmarks vs. real-world performance: Model rankings on public benchmarks do not always translate to your specific use case. The best way to evaluate is to test the models that seem most relevant on your actual tasks.
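One lightweight way to act on this advice is a small test harness that runs the same prompts through each candidate model and scores the answers. The sketch below is a minimal version under stated assumptions: the test cases, the substring-match scoring rule, and the stub "models" are all hypothetical placeholders. In practice you would replace each stub with a function that calls the provider's API and use cases drawn from your own workload.

```python
from typing import Callable

# Hypothetical test cases: (prompt, substring the answer must contain).
CASES = [
    ("What is 2 + 2?", "4"),
    ("Name the capital of France.", "Paris"),
]


def score(model: Callable[[str], str], cases=CASES) -> float:
    """Fraction of cases whose response contains the expected substring."""
    passed = sum(
        expected.lower() in model(prompt).lower() for prompt, expected in cases
    )
    return passed / len(cases)


# Stand-in "models" so the sketch runs offline. Replace each with a
# function that sends the prompt to a real API and returns the reply text.
def stub_a(prompt: str) -> str:
    return "The answer is 4." if "2 + 2" in prompt else "Paris"


def stub_b(prompt: str) -> str:
    return "I am not sure."


candidates = {"stub_a": stub_a, "stub_b": stub_b}
results = {name: score(fn) for name, fn in candidates.items()}
print(results)
```

Substring matching is a deliberately crude scoring rule; for open-ended tasks a rubric or human review is usually a better fit.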
Prices change: The AI market is highly competitive and prices have generally decreased over time. The figures in this article reflect early 2025 rates.
New models release frequently: All major providers release new models multiple times per year. This article reflects the model landscape as of early 2025 — there will be newer models by the time you read this.
Context window ≠ context quality: Having a large context window does not guarantee the model uses that context well. Gemini 1.5 Pro's 2M token window is real, but performance on tasks requiring reasoning across that full context varies.
Verify current information:

- anthropic.com/pricing
- openai.com/pricing
- ai.google.dev/pricing
- console.x.ai
- docs.perplexity.ai/docs/pricing