Authentication, endpoints, messages API, streaming, tool use, and vision — full reference.
Every request to the Anthropic API requires two headers:
x-api-key: sk-ant-api03-...
anthropic-version: 2023-06-01
The anthropic-version header is required and tells the API which version of the protocol you're using. Use 2023-06-01 — it's the stable version. Set your API key from an environment variable, never hardcode it.
```python
import anthropic
import os

# SDK reads ANTHROPIC_API_KEY automatically
client = anthropic.Anthropic()

# Or set explicitly
client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))
```
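When the SDK is not available, the same two headers can be set by hand. The following is a minimal sketch using only the standard library. `build_headers` and `build_request` are hypothetical helper names, the `content-type` header is an assumption based on the JSON body, and the request is constructed but never sent.

```python
import json
import os
import urllib.request

API_URL = "https://api.anthropic.com/v1/messages"


def build_headers(api_key: str) -> dict:
    """Return the headers every request needs (hypothetical helper)."""
    return {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",  # assumption: JSON request body
    }


def build_request(body: dict, api_key: str) -> urllib.request.Request:
    """Build, but do not send, a POST request to the Messages endpoint."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=build_headers(api_key),
        method="POST",
    )


# Read the key from the environment rather than hardcoding it.
request = build_request(
    {
        "model": "claude-opus-4-5",
        "max_tokens": 16,
        "messages": [{"role": "user", "content": "Hi"}],
    },
    os.environ.get("ANTHROPIC_API_KEY", ""),
)
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it; that step is left out here so the sketch runs without a key or network access.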
The core endpoint is POST /v1/messages. The request body:
```python
response = client.messages.create(
    model="claude-opus-4-5",  # Required: model name
    max_tokens=1024,  # Required: output token ceiling
    system="You are a helpful assistant.",  # Optional: system prompt (TOP-LEVEL, not in messages)
    messages=[  # Required: conversation history
        {"role": "user", "content": "Explain how async/await works in Python."}
    ],
    temperature=0.7,  # Optional: sampling temperature
    top_p=1.0,  # Optional: nucleus sampling; Anthropic advises adjusting this or temperature, not both
    top_k=0,  # Optional
    stop_sequences=["###"],  # Optional: stop generation at these strings
)
```
This is the most common mistake developers make coming from OpenAI: the system prompt is a top-level parameter, not a message in the array.
```python
# CORRECT: Anthropic
client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,
    system="You are a precise JSON-output assistant.",  # Top-level
    messages=[{"role": "user", "content": "..."}]
)

# WRONG: don't put system in the messages array like OpenAI
messages=[
    {"role": "system", "content": "..."},  # This will cause an error
    {"role": "user", "content": "..."}
]
```
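When porting code that already builds OpenAI-style message lists, one way to avoid this mistake is to pull the system entries out before the call. The sketch below is one possible approach, not an SDK feature: `split_system` is a hypothetical helper, and joining several system messages with blank lines is an assumption.

```python
def split_system(messages):
    """Separate OpenAI-style system messages from the rest.

    Returns (system_text, remaining_messages). Multiple system messages
    are joined with blank lines; that joining rule is an assumption.
    """
    system_parts = []
    remaining = []
    for message in messages:
        if message["role"] == "system":
            system_parts.append(message["content"])
        else:
            remaining.append(message)
    return "\n\n".join(system_parts), remaining


openai_style = [
    {"role": "system", "content": "You are a precise JSON-output assistant."},
    {"role": "user", "content": "List three colors."},
]

system, messages = split_system(openai_style)
# `system` goes to the top-level system parameter,
# `messages` goes to the messages parameter.
print(system)
print(messages)
```

If no system message is present, the helper returns an empty string, in which case the `system` parameter can simply be omitted from the request.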
Anthropic uses a content blocks format rather than a plain string for message content. The user content field can be a string (shorthand) or a list of blocks:
```python
# String shorthand: works fine for text-only
messages=[{"role": "user", "content": "What is 2+2?"}]

# Full content blocks format: required for multi-modal
messages=[{
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        {
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": "image/jpeg",
                "data": base64_encoded_string,
            }
        }
    ]
}]
```
For URL-referenced images:
```python
{
    "type": "image",
    "source": {
        "type": "url",
        "url": "https://example.com/image.jpg"
    }
}
```
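Both image block shapes can be produced by small helpers, which keeps the base64 encoding step in one place. This is a sketch only: `image_block_from_bytes` and `image_block_from_url` are hypothetical names, and the media type check reflects the image formats Anthropic documents as supported (JPEG, PNG, GIF, WebP).

```python
import base64

SUPPORTED_MEDIA_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}


def image_block_from_bytes(data: bytes, media_type: str) -> dict:
    """Build a base64 image content block from raw image bytes."""
    if media_type not in SUPPORTED_MEDIA_TYPES:
        raise ValueError(f"unsupported media type: {media_type}")
    encoded = base64.standard_b64encode(data).decode("ascii")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": encoded},
    }


def image_block_from_url(url: str) -> dict:
    """Build a URL-referenced image content block."""
    return {"type": "image", "source": {"type": "url", "url": url}}


# Placeholder bytes stand in for a real file read with open(path, "rb").
block = image_block_from_bytes(b"abc", "image/png")
print(block["source"]["media_type"], len(block["source"]["data"]))
```

Either block can then be placed in the `content` list next to a text block.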
Tool use allows the model to request that your code execute a function and return the result. The flow: send tool definitions → model returns a tool_use block → you execute the function → send result back → model produces final response.
```python
import json

tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["city"]
        }
    }
]

# First call: the model may request tool use
response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}]
)

if response.stop_reason == "tool_use":
    tool_block = next(b for b in response.content if b.type == "tool_use")
    tool_name = tool_block.name
    tool_input = tool_block.input

    # Execute your actual function (get_weather is your own implementation)
    result = get_weather(tool_input["city"], tool_input.get("unit", "celsius"))

    # Send result back
    response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=1024,
        tools=tools,
        messages=[
            {"role": "user", "content": "What's the weather in Tokyo?"},
            {"role": "assistant", "content": response.content},  # Include model's tool_use block
            {
                "role": "user",
                "content": [{
                    "type": "tool_result",
                    "tool_use_id": tool_block.id,
                    "content": json.dumps(result)
                }]
            }
        ]
    )
```
Use tool_choice={"type": "auto"} (default) to let the model decide, {"type": "any"} to force tool use, or {"type": "tool", "name": "specific_tool"} to force a specific tool.
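With more than one tool, the "execute the function" step is usually a lookup from tool name to a local function. The sketch below shows one way to do that without calling the API. `TOOL_REGISTRY` and `run_tool_calls` are hypothetical names, the `get_weather` body is a stand-in, and `SimpleNamespace` objects are used here only to mimic the `type`, `id`, `name`, and `input` attributes of SDK content blocks.

```python
import json
from types import SimpleNamespace


def get_weather(city, unit="celsius"):
    """Stand-in implementation; a real one would query a weather service."""
    return {"city": city, "unit": unit, "temperature": 21}


# Maps tool names (as declared in `tools`) to local functions.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_calls(content_blocks, registry=TOOL_REGISTRY):
    """Build one tool_result block for every tool_use block in a response."""
    results = []
    for block in content_blocks:
        if block.type != "tool_use":
            continue  # skip text and other block types
        result = {"type": "tool_result", "tool_use_id": block.id}
        handler = registry.get(block.name)
        if handler is None:
            result.update(content=f"Unknown tool: {block.name}", is_error=True)
        else:
            try:
                result["content"] = json.dumps(handler(**block.input))
            except Exception as exc:  # report the failure back to the model
                result.update(content=str(exc), is_error=True)
        results.append(result)
    return results


# Fake response content standing in for response.content.
fake_content = [
    SimpleNamespace(type="text", text="Let me check."),
    SimpleNamespace(type="tool_use", id="toolu_01", name="get_weather",
                    input={"city": "Tokyo"}),
]
print(run_tool_calls(fake_content))
```

The returned list can be sent as the `content` of the follow-up user message. Marking failures with `is_error` lets the model see what went wrong instead of the request loop crashing.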
```python
with client.messages.stream(
    model="claude-opus-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about APIs."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

# Or with full event handling
with client.messages.stream(...) as stream:
    for event in stream:
        # Only text deltas carry a .text attribute; tool input arrives as other delta types
        if event.type == "content_block_delta" and event.delta.type == "text_delta":
            print(event.delta.text, end="", flush=True)
        elif event.type == "message_stop":
            print("\nDone")
```
Raw SSE events include: message_start, content_block_start, content_block_delta, content_block_stop, message_delta, message_stop. The delta events carry the actual text chunks.
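For readers working at the protocol level, the sketch below shows how text could be reassembled from raw SSE lines. It is a simplified parser, not the SDK's implementation: `iter_sse_events` and `collect_text` are hypothetical names, the sample payloads are hand-written in the documented event shape, and SSE details such as comments and reconnection are ignored.

```python
import json


def iter_sse_events(lines):
    """Yield (event_name, data_dict) pairs from an iterable of SSE lines."""
    event_name = None
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line == "":
            # A blank line terminates one event.
            if data_lines:
                yield event_name, json.loads("\n".join(data_lines))
            event_name, data_lines = None, []
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
    if data_lines:  # flush a final event that lacks a trailing blank line
        yield event_name, json.loads("\n".join(data_lines))


def collect_text(lines):
    """Concatenate the text carried by content_block_delta events."""
    parts = []
    for name, data in iter_sse_events(lines):
        if name == "content_block_delta" and data["delta"].get("type") == "text_delta":
            parts.append(data["delta"]["text"])
    return "".join(parts)


sample = [
    'event: content_block_delta',
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hello"}}',
    '',
    'event: content_block_delta',
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": ", world"}}',
    '',
    'event: message_stop',
    'data: {"type": "message_stop"}',
    '',
]
print(collect_text(sample))  # prints "Hello, world"
```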
```python
# Count tokens before sending: useful for cost estimation and avoiding overflow
token_count = client.messages.count_tokens(
    model="claude-opus-4-5",
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "This is my prompt."}]
)
print(token_count.input_tokens)
```
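The overflow check itself is simple arithmetic once the input token count is known. The sketch below assumes the 200k context window listed for these models and assumes that the input tokens plus the requested `max_tokens` must fit inside it; `fits_in_context` and `max_output_budget` are hypothetical helper names.

```python
def fits_in_context(input_tokens, max_tokens, context_window=200_000):
    """Return True if the prompt plus the output ceiling fits in the window."""
    return input_tokens + max_tokens <= context_window


def max_output_budget(input_tokens, context_window=200_000):
    """Largest max_tokens value that still fits, never below zero."""
    return max(0, context_window - input_tokens)


# Example values; in practice input_tokens comes from token_count.input_tokens.
print(fits_in_context(input_tokens=150_000, max_tokens=1024))
print(max_output_budget(input_tokens=199_500))
```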
| Model | Best for | Context |
|---|---|---|
| claude-opus-4-5 | Highest capability, complex tasks | 200k |
| claude-sonnet-4-5 | Balanced performance/cost | 200k |
| claude-haiku-4-5 | Fast, cheap, lightweight tasks | 200k |
Check the Anthropic docs for the latest model names. Current models follow the naming convention claude-[series]-[version]; older generations put the version first, as in claude-3-5-haiku-latest.
Rate limits vary by tier. The default free tier is very limited. Paid tiers start at $5 spend. Limits are measured in:
- Requests per minute (RPM)
- Tokens per minute (TPM)
- Tokens per day (TPD)
When you hit a rate limit, you get a 429 response. The retry-after header tells you how long to wait. Use exponential backoff — see the error handling article.
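The wait-time calculation can be sketched in a few lines. This is an illustrative helper, not part of the SDK (which has its own retry logic): `retry_delay` is a hypothetical name, the base and cap values are arbitrary defaults, the `retry-after` value is assumed to be given in seconds, and the jitter strategy ("full jitter") is one common choice among several.

```python
import random


def retry_delay(attempt, retry_after=None, base=1.0, cap=60.0, rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based).

    If the server sent a retry-after value (assumed to be in seconds),
    honor it. Otherwise use exponential backoff with full jitter.
    """
    if retry_after is not None:
        return float(retry_after)
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: pick a random point between 0 and the ceiling so that
    # many clients retrying at once do not synchronize.
    return ceiling * rng()


for attempt in range(4):
    print(attempt, round(retry_delay(attempt), 2))
```

A caller would sleep for the returned duration after each 429 response and give up after a fixed number of attempts.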
The Python SDK is the right choice for most uses — it handles auth headers, retries, type safety, and streaming abstractions. Raw HTTP is appropriate if you're using a language without an official SDK, building a thin proxy, or debugging at the protocol level.