Anthropic API Deep Dive

Authentication, endpoints, the Messages API, streaming, tool use, and vision: a full reference.

Authentication

Every request to the Anthropic API requires these headers (the last one because the request body is JSON):

x-api-key: sk-ant-api03-...
anthropic-version: 2023-06-01
content-type: application/json

The anthropic-version header is required and tells the API which version of the protocol you're using. Use 2023-06-01, the stable version. Load your API key from an environment variable; never hardcode it.

```python
import os

import anthropic

# The SDK reads ANTHROPIC_API_KEY from the environment automatically
client = anthropic.Anthropic()

# Or set the key explicitly
client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))
```

The Messages API

The core endpoint is POST /v1/messages. Through the Python SDK, a request looks like this:

```python
response = client.messages.create(
    model="claude-opus-4-5",  # Required: model name
    max_tokens=1024,  # Required: output token ceiling
    system="You are a helpful assistant.",  # Optional: system prompt (TOP-LEVEL, not in messages)
    messages=[  # Required: conversation history
        {"role": "user", "content": "Explain how async/await works in Python."}
    ],
    temperature=0.7,  # Optional
    # top_p=1.0,  # Optional: adjust temperature OR top_p, not both (some models reject both)
    # top_k=0,  # Optional: advanced use only
    stop_sequences=["###"],  # Optional: stop generation at these strings
)
```

The System Prompt: Anthropic's Key Difference

This is a frequent mistake for developers coming from OpenAI: the system prompt is a top-level parameter, not a message in the messages array.

```python
# CORRECT: Anthropic
client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,
    system="You are a precise JSON-output assistant.",  # Top-level
    messages=[{"role": "user", "content": "..."}],
)

# WRONG: don't put system in the messages array like OpenAI
messages = [
    {"role": "system", "content": "..."},  # This will cause an error
    {"role": "user", "content": "..."},
]
```
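When porting OpenAI-style code, one way to avoid that error is to pull any system messages out of the list before calling the API. This is a minimal sketch: the helper name split_system is hypothetical, and it assumes system messages carry plain-string content (multiple system messages are joined with blank lines).

```python
from typing import Any


def split_system(messages: list[dict[str, Any]]) -> tuple[str, list[dict[str, Any]]]:
    """Hypothetical helper: separate OpenAI-style system messages from the rest.

    Returns (system_prompt, remaining_messages). Assumes string content.
    """
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return "\n\n".join(system_parts), rest


openai_style = [
    {"role": "system", "content": "You are a precise JSON-output assistant."},
    {"role": "user", "content": "List three primes as JSON."},
]
system, messages = split_system(openai_style)
# Then pass them separately:
# client.messages.create(model=..., max_tokens=..., system=system, messages=messages)
```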

Content Blocks

Anthropic represents message content as a list of content blocks. For text-only messages, the content field also accepts a plain string as shorthand; anything else, such as images, needs the list-of-blocks form:

```python
# String shorthand: works fine for text-only
messages = [{"role": "user", "content": "What is 2+2?"}]

# Full content blocks format: required for multimodal input
# (base64_encoded_string is a placeholder for your base64-encoded image bytes)
messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        {
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": "image/jpeg",
                "data": base64_encoded_string,
            },
        },
    ],
}]
```

For URL-referenced images:

```python
{
    "type": "image",
    "source": {
        "type": "url",
        "url": "https://example.com/image.jpg",
    },
}
```
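For the base64 variant, the image bytes must be encoded as an ASCII string. A minimal sketch that builds the image block from a local file; the helper name image_block is hypothetical, and guessing the media type from the file extension with the standard mimetypes module is an assumption you may want to replace with explicit values.

```python
import base64
import mimetypes
from pathlib import Path

# Image media types accepted in base64 image blocks
SUPPORTED_MEDIA_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}


def image_block(path: str) -> dict:
    """Hypothetical helper: build a base64 image content block from a file."""
    media_type, _ = mimetypes.guess_type(path)  # Guessed from the extension only
    if media_type not in SUPPORTED_MEDIA_TYPES:
        raise ValueError(f"unsupported image type: {media_type}")
    data = base64.standard_b64encode(Path(path).read_bytes()).decode("ascii")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": data},
    }


# Usage (photo.jpg is a placeholder path):
# content = [image_block("photo.jpg"), {"type": "text", "text": "What is in this image?"}]
```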

Tool Use

Tool use allows the model to request that your code execute a function and return the result. The flow: send tool definitions → the model stops with stop_reason "tool_use" and returns a tool_use block → you execute the function → you send the output back in a tool_result block → the model produces its final response.

```python
import json

tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    }
]

# First call: the model may request tool use
response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
)

if response.stop_reason == "tool_use":
    tool_block = next(b for b in response.content if b.type == "tool_use")
    tool_name = tool_block.name
    tool_input = tool_block.input

    # Execute your actual function (get_weather is your own code, not part of the SDK)
    result = get_weather(tool_input["city"], tool_input.get("unit", "celsius"))

    # Send the result back, replaying the conversation so far
    response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=1024,
        tools=tools,
        messages=[
            {"role": "user", "content": "What's the weather in Tokyo?"},
            {"role": "assistant", "content": response.content},  # Include the model's tool_use block
            {
                "role": "user",
                "content": [{
                    "type": "tool_result",
                    "tool_use_id": tool_block.id,
                    "content": json.dumps(result),
                }],
            },
        ],
    )
```

Use tool_choice={"type": "auto"} (default) to let the model decide, {"type": "any"} to force tool use, or {"type": "tool", "name": "specific_tool"} to force a specific tool.
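The tool-use example above handles only the first tool_use block, but a single response can contain several. A minimal sketch of a dispatcher that runs every requested tool and packages all results into one user message, which you would append after the assistant's response.content before calling the API again. The registry, the run_tool_calls name, and the get_weather stub (which returns made-up data) are all hypothetical; it reads blocks through the .type, .id, .name, and .input attributes that SDK tool_use blocks expose.

```python
import json
from types import SimpleNamespace
from typing import Any, Callable


def get_weather(city: str, unit: str = "celsius") -> dict[str, Any]:
    """Hypothetical stub standing in for a real weather lookup (fake data)."""
    return {"city": city, "temperature": 21, "unit": unit}


# Hypothetical registry mapping tool names to local Python functions
TOOL_REGISTRY: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_calls(content: list[Any]) -> dict[str, Any]:
    """Execute every tool_use block; return one user message of tool_result blocks."""
    results = []
    for block in content:
        if block.type != "tool_use":
            continue  # Skip text blocks the model may emit alongside tool calls
        func = TOOL_REGISTRY.get(block.name)
        if func is None:
            # Report unknown tools back to the model instead of crashing
            results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": f"Unknown tool: {block.name}",
                "is_error": True,
            })
            continue
        output = func(**block.input)
        results.append({
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": json.dumps(output),
        })
    return {"role": "user", "content": results}


# Usage with stand-in blocks ("toolu_demo" is a made-up ID)
demo = [
    SimpleNamespace(type="text", text="Let me check."),
    SimpleNamespace(type="tool_use", id="toolu_demo", name="get_weather", input={"city": "Tokyo"}),
]
print(run_tool_calls(demo))
```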

Streaming

```python
with client.messages.stream(
    model="claude-opus-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about APIs."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

# Or with full event handling
with client.messages.stream(
    model="claude-opus-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about APIs."}],
) as stream:
    for event in stream:
        # Deltas can also carry tool-input JSON, so check the delta type first
        if event.type == "content_block_delta" and event.delta.type == "text_delta":
            print(event.delta.text, end="", flush=True)
        elif event.type == "message_stop":
            print("\nDone")
```

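Raw SSE events include message_start, content_block_start, content_block_delta, content_block_stop, message_delta, and message_stop, plus occasional ping events. The content_block_delta events carry the actual text chunks; message_delta carries the stop_reason and usage updates.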
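To see how these events fit together at the protocol level, here is a minimal sketch that parses raw SSE lines (as you would receive them over plain HTTP with "stream": true in the request body) and reassembles the text. The function names are hypothetical, and the sample payloads are hand-written for illustration and trimmed to the fields the parser reads; real events carry more fields.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, parsed_json_data) pairs from raw SSE lines."""
    event, data_lines = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line ends the current event
            if data_lines:
                yield event or "message", json.loads("\n".join(data_lines))
            event, data_lines = None, []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            # Surrounding whitespace is harmless for JSON payloads
            data_lines.append(line[len("data:"):].strip())
    if data_lines:
        # Flush a trailing event that had no terminating blank line
        yield event or "message", json.loads("\n".join(data_lines))


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate text from content_block_delta events with text_delta payloads."""
    parts = []
    for name, data in iter_sse_events(lines):
        if name == "content_block_delta" and data["delta"]["type"] == "text_delta":
            parts.append(data["delta"]["text"])
        elif name == "error":
            raise RuntimeError(data)
    return "".join(parts)


# Hand-written sample stream (illustrative, not captured from the API)
sample = [
    "event: message_start",
    'data: {"type": "message_start", "message": {"id": "msg_demo"}}',
    "",
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hello"}}',
    "",
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": ", world"}}',
    "",
    "event: message_stop",
    'data: {"type": "message_stop"}',
    "",
]
print(collect_text(sample))  # Hello, world
```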

Token Counting

```python
# Count tokens before sending: useful for cost estimation and avoiding overflow
token_count = client.messages.count_tokens(
    model="claude-opus-4-5",
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "This is my prompt."}],
)
print(token_count.input_tokens)
```
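A count is most useful when compared against a budget. A minimal sketch, assuming you want the counted prompt plus the max_tokens ceiling to fit inside the model's context window; the 200,000 default mirrors the 200k figure in the model table, and output_limit is a stand-in for your model's maximum output tokens, which you should look up rather than guess. Both function names are hypothetical.

```python
CONTEXT_WINDOW = 200_000  # Assumed window size (200k); check your model's docs


def fits_in_context(input_tokens: int, max_tokens: int,
                    window: int = CONTEXT_WINDOW) -> bool:
    """True if the counted prompt plus the max_tokens ceiling fits in the window."""
    return input_tokens + max_tokens <= window


def max_output_budget(input_tokens: int, output_limit: int,
                      window: int = CONTEXT_WINDOW) -> int:
    """Largest max_tokens respecting both the window and the output limit (floored at 0)."""
    return max(min(output_limit, window - input_tokens), 0)


# Usage with the SDK result above (output_limit value is illustrative):
# budget = max_output_budget(token_count.input_tokens, output_limit=4096)
```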

Model Names

| Model | Best for | Context window (tokens) |
|---|---|---|
| claude-opus-4-5 | Highest capability, complex tasks | 200k |
| claude-sonnet-4-5 | Balanced performance/cost | 200k |
| claude-haiku-4-5 | Fast, cheap, lightweight tasks | 200k |

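Check the Anthropic docs for the latest model names. Current models follow the pattern claude-[tier]-[version] (for example, claude-sonnet-4-5); older Claude 3.x models put the version first (for example, claude-3-5-haiku-20241022).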

Rate Limits

Rate limits vary by usage tier. The default free tier is very limited, and paid tiers start at a $5 credit purchase. Limits are measured in:

- Requests per minute (RPM)
- Input tokens per minute (ITPM)
- Output tokens per minute (OTPM)

When you hit a rate limit, the API returns a 429 response, and its retry-after header tells you how long to wait. Use exponential backoff; see the error handling article.
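The Python SDK already retries 429s automatically (twice by default, configurable with max_retries on the client), but if you need your own policy, here is a minimal sketch of exponential backoff with "full jitter" that honors a numeric retry-after value when one is available. The function names are hypothetical; if you wrap SDK calls this way, consider setting max_retries=0 on the client so the two retry layers do not stack.

```python
import random
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before the next try, given a 0-based attempt number."""
    if retry_after is not None:
        try:
            # Prefer the server's hint when it is a plain number of seconds
            return float(retry_after)
        except ValueError:
            pass  # e.g. an HTTP-date value: fall back to computed backoff
    # Full jitter: uniform in [0, min(cap, base * 2**attempt)]
    return random.uniform(0, min(cap, base * 2 ** attempt))


def call_with_retries(
    call: Callable[[], T],
    retryable: Tuple[Type[BaseException], ...] = (Exception,),
    get_retry_after: Callable[[BaseException], Optional[str]] = lambda err: None,
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on the given exception types with backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except retryable as err:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error
            sleep(backoff_delay(attempt, get_retry_after(err)))
    raise RuntimeError("unreachable")


# Hypothetical usage with the SDK (client created with max_retries=0):
# response = call_with_retries(
#     lambda: client.messages.create(model="claude-opus-4-5", max_tokens=1024, messages=msgs),
#     retryable=(anthropic.RateLimitError,),
#     get_retry_after=lambda err: err.response.headers.get("retry-after"),
# )
```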

Python SDK vs Raw HTTP

The Python SDK is the right choice for most uses — it handles auth headers, retries, type safety, and streaming abstractions. Raw HTTP is appropriate if you're using a language without an official SDK, building a thin proxy, or debugging at the protocol level.
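For the raw-HTTP case, a minimal sketch using only the standard library's urllib.request. The helper names build_request, send, and extract_text are hypothetical, and extract_text assumes a non-streaming JSON response whose content field is a list of blocks.

```python
import json
import urllib.request
from typing import Any

API_URL = "https://api.anthropic.com/v1/messages"


def build_request(payload: dict[str, Any], api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to the Messages endpoint."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request) -> dict[str, Any]:
    """Send the request and decode the JSON body (urllib raises HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


def extract_text(response_json: dict[str, Any]) -> str:
    """Join the text blocks of a Messages API response body."""
    return "".join(b["text"] for b in response_json["content"] if b["type"] == "text")


# Usage (makes a real, billed network call):
# req = build_request(
#     {"model": "claude-opus-4-5", "max_tokens": 256,
#      "messages": [{"role": "user", "content": "Say hello."}]},
#     os.environ["ANTHROPIC_API_KEY"],
# )
# print(extract_text(send(req)))
```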
