AI Cost Breakdown

What AI actually costs to run — API pricing, token math, and realistic monthly estimates.

How AI Pricing Works

Most AI APIs charge by token — the unit that models use to process text. A token is roughly 3-4 characters, or about 0.75 words. "The quick brown fox" is approximately 5 tokens.
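The two rules of thumb above can be combined into a quick estimator. This is only a rough sketch: `estimate_tokens` is a hypothetical helper, not any provider's tokenizer, and real token counts vary by model and language.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate for English text.

    Averages two heuristics from the text above:
    about 4 characters per token and about 0.75 words per token.
    """
    by_chars = len(text) / 4
    by_words = len(text.split()) / 0.75
    return round((by_chars + by_words) / 2)


sample = "The quick brown fox"
estimate = estimate_tokens(sample)  # 19 characters, 4 words -> about 5 tokens
```

For billing-grade numbers, use the token counts the provider reports in its API responses rather than an estimate like this.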

Pricing is expressed as cost per million tokens (MTok), split into two rates:
- Input tokens: the text you send to the model (your prompt, context, instructions)
- Output tokens: the text the model generates in response

Output tokens are typically more expensive than input tokens, because generation is computationally heavier than processing input. For every model listed below, the output rate is 4-5x the input rate.

To make this concrete: a typical short message or question is around 50-200 tokens. A detailed prompt with context might be 1,000-5,000 tokens. A long document or code file could be 10,000-100,000 tokens.
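To turn those sizes into dollars, here is a minimal sketch of a per-request cost calculation. The rates ($3 input, $15 output per MTok) match the Claude 3.5 Sonnet row listed later in this article. The output token counts are assumptions chosen for illustration, and `request_cost` is a hypothetical helper.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Cost of one request in dollars; rates are dollars per million tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


IN_RATE, OUT_RATE = 3.00, 15.00  # dollars per MTok

# Output sizes below are illustrative assumptions, not measurements.
short_question = request_cost(200, 300, IN_RATE, OUT_RATE)        # about $0.005
detailed_prompt = request_cost(5_000, 500, IN_RATE, OUT_RATE)     # about $0.02
long_document = request_cost(100_000, 1_000, IN_RATE, OUT_RATE)   # about $0.32
```

Even at these rates a single request costs a fraction of a cent to a few tens of cents; volume is what drives the monthly bill.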

Current Pricing by Provider

Prices as of early 2025. These change frequently — always check provider documentation for current rates.

Anthropic (Claude)

| Model | Input (per MTok) | Output (per MTok) |
|---|---|---|
| Claude 3.5 Haiku | $0.80 | $4.00 |
| Claude 3.5 Sonnet | $3.00 | $15.00 |
| Claude 3.7 Sonnet | $3.00 | $15.00 |

Claude 3.5 Haiku is Anthropic's fast, cost-efficient model. The Sonnet models offer stronger reasoning and are preferred for complex tasks.

OpenAI (GPT)

| Model | Input (per MTok) | Output (per MTok) |
|---|---|---|
| GPT-4o mini | $0.15 | $0.60 |
| GPT-4o | $2.50 | $10.00 |
| o1 | $15.00 | $60.00 |
| o3 | $10.00 | $40.00 |

GPT-4o mini is notable for its very low cost. The o1/o3 models are reasoning-optimized and significantly more expensive: they "think" before responding, generating internal reasoning tokens that are billed as output tokens even though they do not appear in the response.
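A small sketch shows how hidden reasoning tokens change the bill. It assumes reasoning tokens are charged at the output rate. The token counts are hypothetical, and `reasoning_call_cost` is an illustrative helper rather than part of any SDK.

```python
def reasoning_call_cost(input_tokens: int, visible_output_tokens: int,
                        reasoning_tokens: int,
                        input_rate: float, output_rate: float) -> float:
    """Cost in dollars of one call to a reasoning model (rates per MTok).

    Assumption: reasoning tokens are billed at the output rate.
    """
    billed_output = visible_output_tokens + reasoning_tokens
    return (input_tokens * input_rate + billed_output * output_rate) / 1_000_000


O1_IN, O1_OUT = 15.00, 60.00  # o1 rates from the table above

# Hypothetical call: 1,000 tokens in, 500 visible tokens out.
without_reasoning = reasoning_call_cost(1_000, 500, 0, O1_IN, O1_OUT)
with_reasoning = reasoning_call_cost(1_000, 500, 4_000, O1_IN, O1_OUT)
multiplier = with_reasoning / without_reasoning
```

With 4,000 assumed reasoning tokens, the call costs roughly $0.285 instead of $0.045, so estimates for reasoning models should budget for output well beyond the visible answer.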

Google (Gemini)

| Model | Input (per MTok) | Output (per MTok) |
|---|---|---|
| Gemini 1.5 Flash | $0.075 | $0.30 |
| Gemini 1.5 Pro | $1.25 | $5.00 |

Gemini 1.5 Flash is one of the cheapest capable models on the market. Gemini 1.5 Pro offers a very large context window (up to 2 million tokens), which matters for processing long documents.

Meta (Llama)

Llama model weights are free to download and use under Meta's community license. However, running them requires infrastructure:
- Self-hosted: you pay for compute (GPU cloud instances), not per token. Cost depends on hardware and utilization.
- Third-party APIs: providers like Together AI, Groq, and Fireworks offer Llama via API at rates typically $0.10-$0.80 per MTok, often cheaper than closed-model alternatives.
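To compare self-hosting against per-token APIs, it helps to convert an hourly GPU price into an effective cost per MTok. The sketch below does that under stated assumptions: the $2.00/hour price and 1,000 tokens/second aggregate throughput are hypothetical placeholders, not benchmarks, and `self_hosted_cost_per_mtok` is an illustrative helper.

```python
def self_hosted_cost_per_mtok(gpu_dollars_per_hour: float,
                              tokens_per_second: float,
                              utilization: float) -> float:
    """Effective dollars per million tokens for a self-hosted deployment.

    utilization is the fraction of each hour the GPU spends serving
    requests (0 < utilization <= 1). Idle time is still paid for.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return gpu_dollars_per_hour / tokens_per_hour * 1_000_000


# Hypothetical instance: $2.00/hour, 1,000 tokens/second when busy.
fully_busy = self_hosted_cost_per_mtok(2.00, 1000, 1.0)  # about $0.56 per MTok
half_idle = self_hosted_cost_per_mtok(2.00, 1000, 0.5)   # about $1.11 per MTok
```

Under these assumed numbers, halving utilization doubles the effective per-token cost, which is why low or bursty traffic often favors a third-party API.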

Practical Cost Examples

10,000 Customer Support Conversations per Month

Assumptions: an average of 800 input tokens and 400 output tokens per conversation. Monthly: 10,000 × 800 = 8M input tokens, 10,000 × 400 = 4M output tokens.

| Model | Monthly Cost |
|---|---|
| GPT-4o mini | $1.20 + $2.40 = $3.60 |
| Claude 3.5 Haiku | $6.40 + $16.00 = $22.40 |
| GPT-4o | $20.00 + $40.00 = $60.00 |
| Claude 3.5 Sonnet | $24.00 + $60.00 = $84.00 |

For high-volume, cost-sensitive use cases, model selection has a dramatic impact. The same workload costs $3.60/month on GPT-4o mini and $84/month on Claude 3.5 Sonnet, a roughly 23x difference.
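The table can be reproduced with a few lines of code, which makes it easy to rerun the comparison when prices or assumptions change. This is a sketch using the rates and workload stated in this article; the variable and function names are illustrative.

```python
# (input rate, output rate) in dollars per MTok, from the pricing tables above
RATES = {
    "GPT-4o mini": (0.15, 0.60),
    "Claude 3.5 Haiku": (0.80, 4.00),
    "GPT-4o": (2.50, 10.00),
    "Claude 3.5 Sonnet": (3.00, 15.00),
}

CONVERSATIONS_PER_MONTH = 10_000
INPUT_TOKENS_PER_CONVERSATION = 800
OUTPUT_TOKENS_PER_CONVERSATION = 400


def monthly_cost(input_rate: float, output_rate: float) -> float:
    """Monthly cost in dollars for the support workload above."""
    input_mtok = CONVERSATIONS_PER_MONTH * INPUT_TOKENS_PER_CONVERSATION / 1_000_000
    output_mtok = CONVERSATIONS_PER_MONTH * OUTPUT_TOKENS_PER_CONVERSATION / 1_000_000
    return input_mtok * input_rate + output_mtok * output_rate


costs = {model: round(monthly_cost(*rates), 2) for model, rates in RATES.items()}
spread = costs["Claude 3.5 Sonnet"] / costs["GPT-4o mini"]  # about 23x

for model, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{model}: ${cost:.2f}/month")
```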

Document Summarization (100 long documents per day)

Assumptions: 50,000 input tokens per document, 2,000 output tokens per summary, and a 30-day month (3,000 documents). Monthly: 150M input tokens, 6M output tokens.

| Model | Monthly Cost |
|---|---|
| Gemini 1.5 Flash | $11.25 + $1.80 = $13.05 |
| GPT-4o | $375 + $60 = $435 |
| Claude 3.5 Sonnet | $450 + $90 = $540 |

For large-context document processing, Gemini 1.5 Flash's pricing makes it extremely competitive.
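Summarization is input-heavy, so the input rate drives most of the bill. The sketch below derives the monthly volumes from the daily figures (assuming a 30-day month) and splits the Gemini 1.5 Flash cost into its input and output parts. Names are illustrative.

```python
DOCS_PER_DAY = 100
DAYS_PER_MONTH = 30          # assumption: 30-day month
INPUT_TOKENS_PER_DOC = 50_000
OUTPUT_TOKENS_PER_DOC = 2_000

docs_per_month = DOCS_PER_DAY * DAYS_PER_MONTH
input_mtok = docs_per_month * INPUT_TOKENS_PER_DOC / 1_000_000    # 150 MTok
output_mtok = docs_per_month * OUTPUT_TOKENS_PER_DOC / 1_000_000  # 6 MTok


def cost_split(input_rate: float, output_rate: float) -> tuple[float, float]:
    """Return (input cost, output cost) in dollars for the monthly volume."""
    return input_mtok * input_rate, output_mtok * output_rate


# Gemini 1.5 Flash rates from the pricing table above
flash_input, flash_output = cost_split(0.075, 0.30)
flash_total = flash_input + flash_output
input_share = flash_input / flash_total  # fraction of the bill due to input
```

For this workload, input accounts for roughly 86% of the Gemini 1.5 Flash bill, so when comparing models for long-document tasks the input rate is the number to watch.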

Embeddings

Embeddings are dense vector representations of text, used for semantic search, retrieval-augmented generation (RAG), and similarity matching. They're priced separately from generation.

- OpenAI text-embedding-3-small: $0.02 per MTok (very cheap)
- OpenAI text-embedding-3-large: $0.13 per MTok
- Google text-embedding-004: $0.00 (free up to quota limits)

Embedding costs are usually negligible compared to generation costs unless you're embedding very large document collections frequently.
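A quick calculation illustrates the scale. The corpus below (100,000 documents averaging 2,000 tokens each) is a hypothetical example, and `embedding_cost` is an illustrative helper; the rates are the OpenAI prices listed above.

```python
def embedding_cost(num_documents: int, avg_tokens_per_document: int,
                   rate_per_mtok: float) -> float:
    """One-time cost in dollars to embed a corpus (rate per million tokens)."""
    total_tokens = num_documents * avg_tokens_per_document
    return total_tokens / 1_000_000 * rate_per_mtok


# Hypothetical corpus: 100,000 documents x 2,000 tokens = 200M tokens
small_model = embedding_cost(100_000, 2_000, 0.02)  # text-embedding-3-small
large_model = embedding_cost(100_000, 2_000, 0.13)  # text-embedding-3-large
```

Embedding this assumed corpus once costs about $4 with the small model and about $26 with the large one. The cost only becomes significant if the whole collection is re-embedded often.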

Historical Trend: Costs Are Falling

AI API costs have dropped dramatically and consistently since GPT-3's release. GPT-4 launched at roughly $60 per MTok for output; GPT-4o runs at $10 and GPT-4o mini at $0.60. If that trajectory holds, models that cost $15/MTok for output today will likely cost $1-3/MTok within 18-24 months.

This has practical implications for budgeting: estimates based on current prices are likely conservative (that is, on the high side) for a 2-3 year horizon. Build your economics around today's prices, but expect significant cost improvement over time.

Building a Cost Model

For any AI-powered feature, estimate:
1. Average tokens per interaction (input + output separately)
2. Volume (interactions per month)
3. Model (cheaper models for high-volume/simple tasks, better models where quality matters)
4. Total monthly cost = (monthly input tokens × input rate) + (monthly output tokens × output rate), with tokens counted in millions to match the per-MTok rates
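The four steps above can be sketched as a small cost model. The `FeatureEstimate` class and the example feature (2,000 input and 500 output tokens per interaction, 50,000 interactions per month) are hypothetical; the rates are GPT-4o's from the pricing table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureEstimate:
    """Inputs to a monthly cost estimate; rates are dollars per MTok."""
    input_tokens_per_interaction: int
    output_tokens_per_interaction: int
    interactions_per_month: int
    input_rate_per_mtok: float
    output_rate_per_mtok: float

    def monthly_cost(self) -> float:
        """(input tokens x input rate) + (output tokens x output rate)."""
        input_tokens = self.input_tokens_per_interaction * self.interactions_per_month
        output_tokens = self.output_tokens_per_interaction * self.interactions_per_month
        return (input_tokens * self.input_rate_per_mtok
                + output_tokens * self.output_rate_per_mtok) / 1_000_000

    def cost_per_interaction(self) -> float:
        return self.monthly_cost() / self.interactions_per_month


# Hypothetical feature priced at GPT-4o rates ($2.50 in, $10.00 out)
feature = FeatureEstimate(2_000, 500, 50_000, 2.50, 10.00)
total = feature.monthly_cost()             # $500 per month
per_call = feature.cost_per_interaction()  # $0.01 per interaction
```

Keeping input and output tokens as separate fields matters because the two are billed at different rates; a single blended token count would hide which side dominates the cost.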

Start with a cheap model and upgrade only where quality falls short. For most business use cases, GPT-4o mini or Claude Haiku handles the majority of work adequately — reserve the expensive models for tasks where the quality difference is demonstrable.
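One way to quantify this strategy is a blended cost: route most traffic to the cheap model and only the hard cases to the expensive one. The sketch below reuses the customer support figures from earlier ($3.60/month on GPT-4o mini, $84/month on Claude 3.5 Sonnet). The 90/10 split is an assumption for illustration, and `blended_monthly_cost` is a hypothetical helper.

```python
def blended_monthly_cost(cheap_cost: float, premium_cost: float,
                         cheap_fraction: float) -> float:
    """Monthly cost when a fraction of traffic goes to the cheap model.

    cheap_cost and premium_cost are what the full workload would cost
    on each model alone; cheap_fraction is between 0 and 1.
    """
    if not 0 <= cheap_fraction <= 1:
        raise ValueError("cheap_fraction must be between 0 and 1")
    return cheap_fraction * cheap_cost + (1 - cheap_fraction) * premium_cost


# Assumed split: 90% of conversations handled by the cheap model.
blended = blended_monthly_cost(3.60, 84.00, 0.9)  # about $11.64 per month
```

Under that assumed split, the blended bill is about $11.64/month, far closer to the cheap model's cost than to the premium model's. This simple model ignores any cost of deciding which requests to escalate.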

Choosing a Provider for Your Product