What an API call to an AI model actually looks like, end to end.
Most AI API tutorials show you a working curl example and move on. If you've used HTTP APIs before, that's enough. If you haven't, or if you want to understand why requests are structured the way they are, this is the end-to-end picture.
A REST API is a way to communicate with a remote server over HTTP using standard request/response cycles. You send a request; the server processes it and sends back a response. In AI APIs, you send a conversation history and parameters; the server runs inference on a model and returns the generated text.
Everything travels as JSON (JavaScript Object Notation) — a text format for structured data. You serialize your request to JSON, send it over HTTPS, get JSON back, and deserialize it into objects in your language.
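A minimal sketch of that round trip using Python's standard json module. The request body and the response string below are illustrative placeholders, not output from a real API.

```python
import json

# Illustrative request body; field names mirror common chat APIs,
# but the values are placeholders.
request_body = {
    "model": "example-model",
    "messages": [{"role": "user", "content": "Hello"}],
}

# Serialize: Python dict -> JSON text, ready to send over HTTPS.
wire_text = json.dumps(request_body)

# A made-up response string standing in for what a server might return.
response_text = '{"content": [{"type": "text", "text": "Hi there"}]}'

# Deserialize: JSON text -> Python dict.
response_obj = json.loads(response_text)
print(response_obj["content"][0]["text"])
```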
HTTPS is mandatory for AI APIs. Never send API keys over plain HTTP.
Every major AI API authenticates via API keys — long random strings that identify who you are and what you're allowed to do. They're passed as HTTP headers, not in the URL.
```
# OpenAI
Authorization: Bearer sk-proj-abc123...

# Anthropic
x-api-key: sk-ant-abc123...
anthropic-version: 2023-06-01
```
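If you build requests by hand, the two header styles can be produced by one small helper. This is a sketch: `auth_headers` is a hypothetical function name, and it covers only the two providers and header names mentioned here.

```python
def auth_headers(provider: str, api_key: str) -> dict:
    """Build authentication headers for a provider (hypothetical helper)."""
    if provider == "openai":
        # OpenAI uses a standard Bearer token in the Authorization header.
        return {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        }
    if provider == "anthropic":
        # Anthropic uses a custom key header plus an API version header.
        return {
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            "Content-Type": "application/json",
        }
    raise ValueError(f"Unknown provider: {provider}")
```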
Critical rule: never expose API keys client-side. In a browser, anyone can open DevTools and read the request headers. Keys belong on your server, in environment variables, never in your frontend code, never in git repositories.
If you accidentally commit an API key, rotate it immediately. GitHub scans public repos for exposed keys and so do bad actors.
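One way to keep keys out of source code is to read them from the environment at startup and fail loudly if they are missing. The sketch below assumes the key lives in an environment variable; `load_api_key` is a hypothetical helper name, not part of any SDK.

```python
import os


def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the named environment variable."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Failing early beats sending an unauthenticated request later.
        raise RuntimeError(f"{var_name} is not set; export it before running.")
    return key
```

Because the key never appears in the file, there is nothing sensitive to commit by accident.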
An AI API request is an HTTP POST to the completions or messages endpoint. The body is a JSON object containing:
- The model name (e.g. gpt-4o, claude-opus-4-5, gemini-2.0-flash)
- A list of messages, each with a role (user, assistant) and content

```python
# Minimal working example: OpenAI
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from environment

response = client.chat.completions.create(
    model="gpt-4o",
    max_completion_tokens=1024,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the difference between TCP and UDP?"},
    ],
)
print(response.choices[0].message.content)
```
```python
# Minimal working example: Anthropic
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from environment

response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "What is the difference between TCP and UDP?"},
    ],
)
print(response.content[0].text)
```
Both make an HTTPS POST, both return JSON. The SDKs handle serialization, headers, retries, and response parsing for you.
The response object contains:
- The generated text (OpenAI: choices[0].message.content, Anthropic: content[0].text)
- The reason generation stopped: a natural end (OpenAI: stop, Anthropic: end_turn), hitting the token limit (OpenAI: length, Anthropic: max_tokens), or a tool call (OpenAI: tool_calls, Anthropic: tool_use)

```python
# OpenAI response fields
print(response.choices[0].message.content)   # The text
print(response.usage.prompt_tokens)          # Input tokens
print(response.usage.completion_tokens)      # Output tokens
print(response.choices[0].finish_reason)     # "stop", "length", "tool_calls"
print(response.model)                        # Actual model used

# Anthropic response fields
print(response.content[0].text)              # The text
print(response.usage.input_tokens)           # Input tokens
print(response.usage.output_tokens)          # Output tokens
print(response.stop_reason)                  # "end_turn", "max_tokens", "tool_use"
print(response.model)                        # Actual model used
```
Always check finish_reason (OpenAI) or stop_reason (Anthropic). If it's length or max_tokens, the response was cut off: either raise the output token limit (max_completion_tokens or max_tokens in the request) or handle truncation gracefully.
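The truncation check can be wrapped in a small provider-agnostic helper. This is a sketch: `was_truncated` is a hypothetical name, and it only knows the two truncation values listed in the response fields above.

```python
# Stop-reason values that mean the output hit the token limit:
# "length" (OpenAI) and "max_tokens" (Anthropic).
TRUNCATION_REASONS = {"length", "max_tokens"}


def was_truncated(reason) -> bool:
    """Return True if the stop reason indicates a cut-off response."""
    return reason in TRUNCATION_REASONS
```

With the SDK objects shown earlier, you would pass `response.choices[0].finish_reason` for OpenAI or `response.stop_reason` for Anthropic.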
The model has no memory between API calls. You pass the entire conversation history on every call. This is what makes multi-turn conversations work — you append the assistant's previous response and the user's next message, then send the whole thing again.
```python
messages = [
    {"role": "user", "content": "What's the capital of France?"},
    {"role": "assistant", "content": "Paris."},
    {"role": "user", "content": "And what's its population?"},
]
```
This statefulness-on-the-client pattern is why context window size matters: long conversations eventually overflow the context limit, and you need a strategy (truncation, summarization, sliding window) when they do.
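A sliding window is the simplest of those strategies. The sketch below is deliberately simplified: it counts messages rather than tokens (a real limit is measured in tokens), and `sliding_window` is a hypothetical helper name.

```python
def sliding_window(messages, max_messages):
    """Keep system messages plus the most recent non-system messages."""
    if max_messages < 1:
        raise ValueError("max_messages must be at least 1")
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    recent = rest[-max_messages:]
    # Drop leading non-user turns so the window starts with a user message.
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return system + recent
```

Starting the window on a user turn is a design choice made here so the trimmed history still reads as a well-formed conversation.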
The Python SDKs for both OpenAI and Anthropic handle:
- Setting the right headers (auth, content-type, api-version)
- Serializing your Python objects to JSON
- Making the HTTP request
- Deserializing the JSON response into typed Python objects
- Automatic retries on transient errors (configurable)
- Streaming support
You can always drop to raw HTTP if you need to. The SDK is just convenience.
```python
# Raw HTTP with urllib (no SDK), shown here for Anthropic.
# Other providers follow the same steps with a different URL and headers.
import json
import os
import urllib.request

url = "https://api.anthropic.com/v1/messages"
payload = {
    "model": "claude-opus-4-5",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello"}],
}
headers = {
    "x-api-key": os.environ["ANTHROPIC_API_KEY"],  # never hardcode the key
    "anthropic-version": "2023-06-01",
    "content-type": "application/json",
}

# Serialize the payload to JSON bytes and send it as an HTTP POST.
req = urllib.request.Request(
    url, data=json.dumps(payload).encode(), headers=headers, method="POST"
)
with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read())

print(result["content"][0]["text"])
```
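Going without an SDK means you also give up its automatic retries. A rough sketch of retrying with exponential backoff around any urllib call follows. The set of retryable status codes, the delays, and the name `call_with_retries` are assumptions for illustration, not the SDKs' actual policy.

```python
import time
import urllib.error

# Assumed set of transient HTTP statuses worth retrying.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def call_with_retries(send, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send() and retry transient failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except urllib.error.HTTPError as err:
            # HTTPError must be caught first: it is a subclass of URLError.
            if err.code not in RETRYABLE_STATUS or attempt == max_attempts:
                raise
        except urllib.error.URLError:
            # Network-level failure (DNS, refused connection, timeout).
            if attempt == max_attempts:
                raise
        # Wait base_delay, then 2x, 4x, ... before the next attempt.
        sleep(base_delay * 2 ** (attempt - 1))
```

You would wrap the request as a zero-argument callable, for example `call_with_retries(lambda: urllib.request.urlopen(req).read())`.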
The OpenAI API format has become an informal industry standard. Most providers — Anthropic, Google, Mistral, Together, Groq — follow similar patterns: POST to a messages/completions endpoint, JSON body with a messages array, JSON response with token counts. The differences are mostly in headers, field names, and where the system prompt lives.
This means skills transfer. Once you understand the pattern for one provider, picking up another takes an hour, not a day.
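To make the "where the system prompt lives" difference concrete, here is a sketch that converts an OpenAI-style message list (system prompt as a message) into the Anthropic-style shape (system prompt as a separate parameter). `split_system_prompt` is a hypothetical helper, and it assumes every content value is a plain string.

```python
def split_system_prompt(messages):
    """Separate system messages from the rest of the conversation.

    Returns (system_text_or_None, remaining_messages).
    """
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    remaining = [m for m in messages if m["role"] != "system"]
    # Join multiple system messages into one prompt; None if there were none.
    system = "\n\n".join(system_parts) if system_parts else None
    return system, remaining
```

The returned pair maps onto the `system=` and `messages=` arguments used in the Anthropic example earlier.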