How AI APIs Work

What an API call to an AI model actually looks like, end to end.

The Full Picture Before You Write a Line

Most AI API tutorials show you a working curl example and move on. If you've used HTTP APIs before, that's fine. If you haven't, or if you want to understand why things are structured the way they are, here is the end-to-end picture.

What a REST API Is

A REST API is a way to communicate with a remote server over HTTP using standard request/response cycles. You send a request; the server processes it and sends back a response. In AI APIs, you send a conversation history and parameters; the server runs inference on a model and returns the generated text.

Everything travels as JSON (JavaScript Object Notation) — a text format for structured data. You serialize your request to JSON, send it over HTTPS, get JSON back, and deserialize it into objects in your language.

HTTPS is mandatory for AI APIs. Never send API keys over plain HTTP.
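To make the serialize/deserialize cycle concrete, here is a minimal sketch using only Python's standard `json` module. The request dict follows the Anthropic Messages shape described later. The response string is a hand-written, trimmed stand-in for a server reply, not a real API response.

```python
import json

# A request body as a plain Python dict (Anthropic Messages-style field names).
request_body = {
    "model": "claude-opus-4-5",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Hello"}],
}

# Serialize: dict -> JSON text -> UTF-8 bytes, which is what travels over HTTPS.
wire_bytes = json.dumps(request_body).encode("utf-8")

# Hand-written, trimmed stand-in for a response body (illustrative only).
response_text = '{"content": [{"type": "text", "text": "Hi there!"}], "stop_reason": "end_turn"}'

# Deserialize: JSON text -> nested dicts and lists you can index into.
response = json.loads(response_text)
print(response["content"][0]["text"])  # prints: Hi there!
```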

Authentication: API Keys

The major AI APIs authenticate with API keys: long random strings that identify who you are and what you're allowed to do. They're typically sent as HTTP headers rather than in the URL, where they could end up in server logs and browser history.

```
# OpenAI
Authorization: Bearer sk-proj-abc123...

# Anthropic
x-api-key: sk-ant-abc123...
anthropic-version: 2023-06-01
```

Critical rule: never expose API keys client-side. In a browser, anyone can open DevTools and read the request headers. Keys belong on your server, loaded from environment variables; never put them in frontend code or commit them to a git repository.

If you accidentally commit an API key, rotate it immediately. GitHub scans public repositories for exposed keys, and so do bad actors.
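One way to follow these rules in server code is to read the key from the environment at startup and fail loudly if it's missing, and to redact it anywhere it might be logged. This is a minimal sketch; `load_api_key` and `redact` are hypothetical helper names, not part of any SDK.

```python
import os


def load_api_key(env_var: str) -> str:
    """Read an API key from the environment, failing loudly if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail at startup instead of sending an unauthenticated request later.
        raise RuntimeError(f"{env_var} is not set; export it or load it from a secrets manager")
    return key


def redact(key: str, visible: int = 4) -> str:
    """Return a log-safe form of a key that shows only its last few characters."""
    return "*" * max(len(key) - visible, 0) + key[-visible:]
```

Typical usage would be `key = load_api_key("ANTHROPIC_API_KEY")` at startup, with `redact(key)` used in any log line that needs to identify which key was loaded.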

The Request Structure

An AI API request is an HTTP POST to the completions or messages endpoint. The body is a JSON object containing:

Model: Which model to use (gpt-4o, claude-opus-4-5, gemini-2.0-flash)
Messages: The conversation history — a list of objects, each with a role (user, assistant) and content
System prompt: Instructions that shape the model's behavior (placement varies by provider)
Parameters: Temperature, max_tokens, etc.

```python
# Minimal working example: OpenAI
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from environment

response = client.chat.completions.create(
    model="gpt-4o",
    max_completion_tokens=1024,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the difference between TCP and UDP?"},
    ],
)
print(response.choices[0].message.content)
```

```python
# Minimal working example: Anthropic
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from environment

response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "What is the difference between TCP and UDP?"},
    ],
)
print(response.content[0].text)
```

Both make an HTTPS POST, both return JSON. The SDKs handle serialization, headers, retries, and response parsing for you.
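To see what the SDKs serialize, here is a minimal sketch that builds the two request bodies as plain dicts. It highlights the main structural difference: OpenAI puts the system prompt in the messages list, while Anthropic takes it as a top-level `system` field. `build_body` is a hypothetical helper, and only the fields used in the examples above are included.

```python
def build_body(provider: str, model: str, system: str, user_text: str,
               max_tokens: int = 1024) -> dict:
    """Build a minimal JSON-ready request body for a single user turn."""
    messages = [{"role": "user", "content": user_text}]
    if provider == "openai":
        # OpenAI Chat Completions: the system prompt is the first message.
        return {
            "model": model,
            "max_completion_tokens": max_tokens,
            "messages": [{"role": "system", "content": system}] + messages,
        }
    if provider == "anthropic":
        # Anthropic Messages: the system prompt is a top-level field.
        return {
            "model": model,
            "max_tokens": max_tokens,
            "system": system,
            "messages": messages,
        }
    raise ValueError(f"unknown provider: {provider}")
```

Passing either dict to `json.dumps` gives roughly what goes over the wire; the SDKs may add further fields of their own.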

The Response Structure

The response object contains:

The generated text: Where to find it varies by provider (OpenAI: choices[0].message.content, Anthropic: content[0].text)
Usage/token counts: How many tokens were in the input and output — what you're billed for
Finish reason: Why generation stopped: a natural end, hitting the token limit, or a tool call. The exact values differ by provider (OpenAI: stop, length, tool_calls; Anthropic: end_turn, max_tokens, tool_use)
Model: Which model actually generated the response (can differ from what you requested if model aliases are used)

```python
# OpenAI response fields (response from client.chat.completions.create)
print(response.choices[0].message.content)  # The text
print(response.usage.prompt_tokens)         # Input tokens
print(response.usage.completion_tokens)     # Output tokens
print(response.choices[0].finish_reason)    # "stop", "length", "tool_calls"
print(response.model)                       # Actual model used

# Anthropic response fields (response from client.messages.create)
print(response.content[0].text)             # The text
print(response.usage.input_tokens)          # Input tokens
print(response.usage.output_tokens)         # Output tokens
print(response.stop_reason)                 # "end_turn", "max_tokens", "tool_use"
print(response.model)                       # Actual model used
```

Always check finish_reason (OpenAI) or stop_reason (Anthropic). If it's length or max_tokens, the response was cut off: either raise the token limit or handle truncation gracefully.
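If you work with the decoded JSON dicts (for example, when calling the API over raw HTTP), a small normalizer keeps the provider differences in one place. This is a minimal sketch; `ParsedResponse` and `parse_response` are hypothetical names. It joins all Anthropic text blocks instead of reading only `content[0]`, because a response can also contain non-text blocks such as `tool_use`.

```python
from dataclasses import dataclass

# The stop reason each provider reports when output hit the token limit.
TRUNCATION_REASON = {"openai": "length", "anthropic": "max_tokens"}


@dataclass
class ParsedResponse:
    text: str
    input_tokens: int
    output_tokens: int
    stop_reason: str
    truncated: bool


def parse_response(provider: str, data: dict) -> ParsedResponse:
    """Pull text, token counts, and stop reason out of a decoded JSON response."""
    if provider == "openai":
        choice = data["choices"][0]
        # content can be null when the model returns tool calls instead of text
        text = choice["message"].get("content") or ""
        reason = choice["finish_reason"]
        in_tok = data["usage"]["prompt_tokens"]
        out_tok = data["usage"]["completion_tokens"]
    elif provider == "anthropic":
        # content is a list of blocks; keep only the text blocks
        text = "".join(b["text"] for b in data["content"] if b.get("type") == "text")
        reason = data["stop_reason"]
        in_tok = data["usage"]["input_tokens"]
        out_tok = data["usage"]["output_tokens"]
    else:
        raise ValueError(f"unknown provider: {provider}")
    return ParsedResponse(text, in_tok, out_tok, reason, reason == TRUNCATION_REASON[provider])
```

The caller can then branch on `parsed.truncated` in one place, for example by retrying with a higher limit or telling the user the answer was cut short.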

The Messages Array: How Conversation History Works

The model has no memory between API calls. You pass the entire conversation history on every call. This is what makes multi-turn conversations work — you append the assistant's previous response and the user's next message, then send the whole thing again.

```python
messages = [
    {"role": "user", "content": "What's the capital of France?"},
    {"role": "assistant", "content": "Paris."},
    {"role": "user", "content": "And what's its population?"},
]
```
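The append-and-resend loop can be sketched in a few lines. To keep it runnable offline, `fake_model` stands in for a real API call; both `chat_turn` and `fake_model` are hypothetical names.

```python
from typing import Callable

Message = dict[str, str]


def chat_turn(history: list[Message], user_text: str,
              call_model: Callable[[list[Message]], str]) -> str:
    """Append the user's message, send the full history, then append the reply."""
    history.append({"role": "user", "content": user_text})
    reply = call_model(history)  # the model sees the whole conversation every time
    history.append({"role": "assistant", "content": reply})
    return reply


def fake_model(messages: list[Message]) -> str:
    # Stand-in for an API call: reports how many messages it was sent.
    return f"(reply to message {len(messages)})"


history: list[Message] = []
chat_turn(history, "What's the capital of France?", fake_model)
chat_turn(history, "And what's its population?", fake_model)
```

After two turns, `history` holds four alternating user/assistant messages. The second call sent three messages, not one, which is exactly why long chats grow expensive.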

Because the client holds all the conversation state, context window size matters: long conversations eventually exceed the model's context limit, and you need a strategy (truncation, summarization, sliding window) for when they do.
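As an illustration of the sliding-window strategy, here is a minimal sketch that keeps the most recent messages that fit a token budget. It assumes a crude estimate of about 4 characters per token; real code should count with the provider's tokenizer or token-counting endpoint. It also assumes the window should start on a user turn, which some APIs expect. `estimate_tokens` and `sliding_window` are hypothetical names.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic (~4 characters per token for English text); an assumption, not exact.
    return max(1, len(text) // 4)


def sliding_window(messages: list[dict], budget: int) -> list[dict]:
    """Keep the most recent messages whose estimated token total fits in `budget`."""
    kept: list[dict] = []
    used = 0
    for msg in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    # Drop leading assistant turns so the window starts with a user message.
    while kept and kept[0]["role"] != "user":
        kept.pop(0)
    return kept
```

This keeps recent context verbatim and forgets the oldest turns completely. Summarization trades some fidelity for keeping a compressed version of that older context.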

What the SDK Does for You

The Python SDKs for both OpenAI and Anthropic handle:
- Setting the right headers (auth, content-type, api-version)
- Serializing your Python objects to JSON
- Making the HTTP request
- Deserializing the JSON response into typed Python objects
- Automatic retries on transient errors (configurable)
- Streaming support

You can always drop to raw HTTP if you need to. The SDK is just convenience.

```python
# Raw HTTP with urllib (no SDK). This targets Anthropic's endpoint; other
# providers follow the same pattern with a different URL, headers, and body.
import json
import os
import urllib.request

url = "https://api.anthropic.com/v1/messages"
payload = {
    "model": "claude-opus-4-5",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello"}],
}
headers = {
    "x-api-key": os.environ["ANTHROPIC_API_KEY"],  # never hard-code keys
    "anthropic-version": "2023-06-01",
    "content-type": "application/json",
}
req = urllib.request.Request(url, data=json.dumps(payload).encode("utf-8"),
                             headers=headers, method="POST")
with urllib.request.urlopen(req) as resp:  # raises urllib.error.HTTPError on 4xx/5xx
    result = json.loads(resp.read())
print(result["content"][0]["text"])
```

How All Major APIs Follow Similar Conventions

The OpenAI API format has become an informal industry standard. Most providers — Anthropic, Google, Mistral, Together, Groq — follow similar patterns: POST to a messages/completions endpoint, JSON body with a messages array, JSON response with token counts. The differences are mostly in headers, field names, and where the system prompt lives.
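Because the differences are concentrated in a few places, a small lookup table can capture them. This is a minimal sketch covering only OpenAI and Anthropic. The URLs and header names match the examples in this article, but verify them against current provider docs. `PROVIDERS` and `build_request` are hypothetical names. The function builds the request without sending it.

```python
import json
import urllib.request

# Per-provider differences: endpoint URL and how the API key is sent.
PROVIDERS = {
    "openai": {
        "url": "https://api.openai.com/v1/chat/completions",
        "auth": lambda key: {"Authorization": f"Bearer {key}"},
    },
    "anthropic": {
        "url": "https://api.anthropic.com/v1/messages",
        "auth": lambda key: {"x-api-key": key, "anthropic-version": "2023-06-01"},
    },
}


def build_request(provider: str, api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the given provider."""
    spec = PROVIDERS[provider]
    headers = {"content-type": "application/json", **spec["auth"](api_key)}
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(spec["url"], data=data, headers=headers, method="POST")
```

Adding a provider that offers an OpenAI-compatible endpoint is then mostly a matter of adding one entry with its URL and auth header.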

This means skills transfer. Once you understand the pattern for one provider, picking up another takes an hour, not a day.

