OpenAI API Deep Dive

Chat completions, assistants, streaming, function calling, embeddings — full reference.

Authentication

OpenAI uses Bearer token authentication in the Authorization header:

Authorization: Bearer sk-proj-abc123...

The SDK picks this up from the OPENAI_API_KEY environment variable automatically.

```python
from openai import OpenAI
import os

client = OpenAI()  # reads OPENAI_API_KEY from env
# or
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
```
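When the SDK is not available, the same header can be set by hand. The sketch below builds a raw request with the standard library and does not send it. `build_chat_request` is a hypothetical helper name, and the payload and placeholder key are purely illustrative.

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"


def build_chat_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (without sending) a POST request that carries the Bearer token."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# The placeholder key is used only when the environment variable is unset.
request = build_chat_request(
    os.environ.get("OPENAI_API_KEY", "sk-placeholder"),
    {"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]},
)
print(request.get_method(), request.full_url)
```

Passing the result to `urllib.request.urlopen(request)` would actually send it. The SDK remains the simpler choice for most code.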

Chat Completions — The Core Endpoint

POST /v1/chat/completions is the endpoint you'll use for most tasks.

```python
response = client.chat.completions.create(
    model="gpt-4o",
    max_completion_tokens=1024,
    temperature=0.7,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain dependency injection."},
    ],
)

print(response.choices[0].message.content)
print(response.usage.prompt_tokens)
print(response.usage.completion_tokens)
print(response.choices[0].finish_reason)  # "stop", "length", "tool_calls"
```

Note: the system prompt goes inside the messages array with role: "system". Anthropic's convention differs: there the system prompt is a separate top-level system parameter, outside the messages array.

Function Calling / Tool Use

OpenAI's older documentation and the deprecated functions parameter call this "function calling"; the current API exposes it through the tools parameter. The concepts are the same; the format differs from Anthropic's.

```python
import json

tools = [
    {
        "type": "function",
        "function": {
            "name": "search_database",
            "description": "Search the product database for items matching a query",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"},
                    "limit": {"type": "integer", "description": "Max results", "default": 10},
                },
                "required": ["query"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    tools=tools,
    tool_choice="auto",  # "auto", "none", "required", or {"type": "function", "function": {"name": "..."}}
    messages=[{"role": "user", "content": "Find me some wireless keyboards under $100"}],
)

if response.choices[0].finish_reason == "tool_calls":
    tool_call = response.choices[0].message.tool_calls[0]
    function_name = tool_call.function.name
    function_args = json.loads(tool_call.function.arguments)  # arguments arrive as a JSON string

    # Execute the function (search_database is your own implementation)
    result = search_database(**function_args)

    # Append the assistant message with the tool_calls, then add the result
    messages = [
        {"role": "user", "content": "Find me some wireless keyboards under $100"},
        response.choices[0].message,  # The assistant's message with tool_calls
        {
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(result),
        },
    ]

    final_response = client.chat.completions.create(
        model="gpt-4o",
        tools=tools,
        messages=messages,
    )
```

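Parallel tool calling: GPT-4o can request several tool calls in one response. Iterate over the whole response.choices[0].message.tool_calls list rather than reading only the first item, and send back one role: "tool" message per call, each with the matching tool_call_id.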
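Handling every call in the list might look like the sketch below. It uses `SimpleNamespace` objects as stand-ins for the SDK's tool call objects so it runs offline; `run_tool_calls`, the registry dict, and the stub `search_database` are hypothetical names for illustration only.

```python
import json
from types import SimpleNamespace


def run_tool_calls(tool_calls, registry):
    """Execute every requested tool call and return one "tool" message per call."""
    results = []
    for call in tool_calls:
        handler = registry[call.function.name]
        arguments = json.loads(call.function.arguments)
        output = handler(**arguments)
        results.append(
            {"role": "tool", "tool_call_id": call.id, "content": json.dumps(output)}
        )
    return results


def search_database(query, limit=10):
    # Stub for the example; a real implementation would query a database.
    return [f"{query} #{i}" for i in range(1, min(limit, 3) + 1)]


# Hypothetical stand-ins for two tool calls returned in a single response.
fake_calls = [
    SimpleNamespace(
        id="call_1",
        function=SimpleNamespace(
            name="search_database",
            arguments='{"query": "wireless keyboard", "limit": 2}',
        ),
    ),
    SimpleNamespace(
        id="call_2",
        function=SimpleNamespace(
            name="search_database",
            arguments='{"query": "wireless mouse"}',
        ),
    ),
]

tool_messages = run_tool_calls(fake_calls, {"search_database": search_database})
for message in tool_messages:
    print(message["tool_call_id"], message["content"])
```

With a real response, `fake_calls` would be `response.choices[0].message.tool_calls`, and the returned messages would be appended after the assistant message before the follow-up request.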

Vision

Pass images via the image_url content type in the messages array:

```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what you see in this image."},
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://example.com/image.jpg",
                    "detail": "high",  # "low", "high", or "auto"
                },
            },
        ],
    }],
)
```

For base64:

```python
{"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_string}"}}
```

The detail parameter affects cost and quality. low gives the model a 512×512 low-resolution version of the image at a fixed cost of 85 tokens. high additionally tiles the image for more detail, which costs significantly more tokens.
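To produce the base64 form from raw bytes, a small helper along these lines could work. `image_part_from_bytes` is a hypothetical name, and the three placeholder bytes stand in for a real image file.

```python
import base64


def image_part_from_bytes(
    image_bytes: bytes, mime_type: str = "image/jpeg", detail: str = "auto"
) -> dict:
    """Return an image_url content part holding the image as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}", "detail": detail},
    }


# Placeholder bytes; in practice read them with open(path, "rb").read().
part = image_part_from_bytes(b"\xff\xd8\xff", detail="low")
print(part["image_url"]["url"])
```

The returned dict goes into the content list of a user message, next to the text part.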

Streaming

```python
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Count from 1 to 10."}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

Chunks have choices[0].delta.content for text and choices[0].delta.tool_calls for tool call chunks (which you assemble incrementally). choices[0].finish_reason is non-null only on the last chunk.
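Assembling tool call chunks incrementally could be sketched as follows. The example assumes each fragment carries an index, an id on its first appearance, and partial function.name / function.arguments strings; `assemble_tool_calls` and `fragment` are hypothetical helpers, and `SimpleNamespace` objects stand in for the SDK's delta objects so the code runs offline.

```python
import json
from types import SimpleNamespace


def assemble_tool_calls(deltas):
    """Merge streamed tool-call fragments into complete calls, keyed by index."""
    calls = {}
    for delta in deltas:
        slot = calls.setdefault(delta.index, {"id": None, "name": "", "arguments": ""})
        if delta.id:
            slot["id"] = delta.id
        fn = delta.function
        if fn is None:
            continue
        if fn.name:
            slot["name"] += fn.name
        if fn.arguments:
            slot["arguments"] += fn.arguments
    return [calls[i] for i in sorted(calls)]


def fragment(index, id=None, name=None, arguments=None):
    # Hypothetical helper that mimics one entry of delta.tool_calls.
    return SimpleNamespace(
        index=index, id=id, function=SimpleNamespace(name=name, arguments=arguments)
    )


deltas = [
    fragment(0, id="call_1", name="search_database", arguments=""),
    fragment(0, arguments='{"query": '),
    fragment(0, arguments='"keyboards"}'),
]
calls = assemble_tool_calls(deltas)
print(calls[0]["name"], json.loads(calls[0]["arguments"]))
```

Against a live stream, the fragments would come from `chunk.choices[0].delta.tool_calls or []` on each chunk, and the arguments string should only be parsed once the stream has finished.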

Structured Output / JSON Mode

```python
# JSON mode: the model is instructed to output valid JSON
# You still need to specify the structure in your prompt
response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},
    messages=[{
        "role": "user",
        "content": "Extract name and age from: 'Alice is 30 years old'. Return as JSON.",
    }],
)

# Structured output with schema (gpt-4o and newer)
from pydantic import BaseModel


class UserInfo(BaseModel):
    name: str
    age: int


response = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Extract: 'Alice is 30 years old'"}],
    response_format=UserInfo,
)
print(response.choices[0].message.parsed)  # UserInfo(name='Alice', age=30)
```
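Because plain JSON mode only promises syntactically valid JSON, the fields you asked for in the prompt still need checking. A minimal sketch using only the standard library, with a plain dataclass (`ExtractedUser`, a hypothetical name) standing in for a schema:

```python
import json
from dataclasses import dataclass


@dataclass
class ExtractedUser:
    name: str
    age: int


def parse_user_info(raw: str) -> ExtractedUser:
    """Parse JSON-mode output and check the fields the prompt asked for."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    name, age = data.get("name"), data.get("age")
    if not isinstance(name, str):
        raise ValueError("'name' must be a string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(age, bool) or not isinstance(age, int):
        raise ValueError("'age' must be an integer")
    return ExtractedUser(name=name, age=age)


print(parse_user_info('{"name": "Alice", "age": 30}'))
```

In real use the `raw` string would be `response.choices[0].message.content` from the JSON mode call. The schema-based parse call makes this manual check unnecessary.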

Embeddings

```python
response = client.embeddings.create(
    model="text-embedding-3-small",  # or "text-embedding-3-large"
    input="The quick brown fox",
)
vector = response.data[0].embedding  # list of floats, 1536 dimensions
print(response.usage.total_tokens)
```

Batch multiple inputs for efficiency: input=["text1", "text2", "text3"]
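For larger collections, the list input can be filled in fixed-size chunks. The sketch below separates the batching logic from the API call so it runs offline: `embed_all` and `batched` are hypothetical helpers, `fake_embed_batch` is a stand-in for the real call, and the default batch size of 100 is an arbitrary assumption, not a documented limit.

```python
def batched(items, batch_size):
    """Yield consecutive slices of items with at most batch_size elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_all(texts, embed_batch, batch_size=100):
    """Embed texts batch by batch, returning the vectors in input order."""
    vectors = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors


def fake_embed_batch(batch):
    # Stand-in for the API call: a one-dimensional "embedding" per text.
    return [[float(len(text))] for text in batch]


vectors = embed_all(["text1", "text22", "text333"], fake_embed_batch, batch_size=2)
print(vectors)
```

With the real client, `embed_batch` could be a function that calls `client.embeddings.create(model="text-embedding-3-small", input=batch)` and returns `[item.embedding for item in response.data]`.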

Token Counting with tiktoken

```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4o")
tokens = enc.encode("Hello, how are you?")
print(len(tokens))  # 6


# Count tokens in a messages array (approximate)
def count_message_tokens(messages, model="gpt-4o"):
    enc = tiktoken.encoding_for_model(model)
    num_tokens = 0
    for message in messages:
        num_tokens += 4  # message overhead
        for key, value in message.items():
            num_tokens += len(enc.encode(str(value)))
    return num_tokens
```
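One common use of such a counter is keeping a conversation inside the context window. The sketch below drops the oldest non-system messages until a budget is met. `trim_to_budget` is a hypothetical helper, and `word_count` is a crude stand-in for a real token counter so the example runs without tiktoken.

```python
def trim_to_budget(messages, count_tokens, budget):
    """Drop the oldest non-system messages until the total fits the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    while rest and count_tokens(system + rest) > budget:
        rest.pop(0)  # discard the oldest remaining message
    return system + rest


def word_count(messages):
    # Stand-in counter for the example: counts whitespace-separated words.
    return sum(len(m["content"].split()) for m in messages)


history = [
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "Tell me about keyboards."},
    {"role": "assistant", "content": "Sure."},
    {"role": "user", "content": "Wireless ones."},
]
trimmed = trim_to_budget(history, word_count, budget=5)
print([m["role"] for m in trimmed])
```

In practice, `count_tokens` would be a tiktoken-based function such as `count_message_tokens`. A production version would also need to keep tool call and tool result messages together when trimming.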

Model Names

| Model | Best for | Context window (tokens) |
|---|---|---|
| gpt-4o | General purpose, vision, tools | 128k |
| gpt-4o-mini | Fast, cheap, most tasks | 128k |
| o3-mini | Math and coding reasoning | 200k |
| o1 | Complex reasoning, slower | 200k |

The Assistants API — When to Use It and When Not To

The Assistants API adds persistent threads, built-in file retrieval, and code interpreter. The overhead: asynchronous runs, polling for completion, more complex state management.

Use it when: You need built-in file search, code execution in a sandbox, or persistent thread storage managed by OpenAI.

Don't use it when: You just want chat completions with conversation history. Managing your own messages array is simpler, more debuggable, and just as capable. Most production apps are better off with chat completions + their own database for conversation history.
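Managing your own messages array can be as small as the sketch below. `Conversation` is a hypothetical class, and `fake_complete` stands in for the API call so the example runs offline. Persistence to a database is left out.

```python
class Conversation:
    """Keeps the messages array locally instead of relying on a hosted thread."""

    def __init__(self, system_prompt, complete):
        # complete is any callable that takes the messages list and returns reply text.
        self.messages = [{"role": "system", "content": system_prompt}]
        self._complete = complete

    def ask(self, user_text):
        self.messages.append({"role": "user", "content": user_text})
        reply = self._complete(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def fake_complete(messages):
    # Stand-in for a real API call so the example runs without a network.
    return f"(reply to: {messages[-1]['content']})"


chat = Conversation("You are a helpful assistant.", fake_complete)
chat.ask("What is dependency injection?")
chat.ask("Give me an example.")
print(len(chat.messages))
```

With the real client, `complete` would wrap `client.chat.completions.create(model="gpt-4o", messages=messages)` and return `response.choices[0].message.content`. Saving `chat.messages` after each turn gives you the conversation history in your own database.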
