Chat completions, assistants, streaming, function calling, embeddings — full reference.
OpenAI uses Bearer token authentication in the Authorization header:
Authorization: Bearer sk-proj-abc123...
The SDK picks this up from the OPENAI_API_KEY environment variable automatically.
```python
from openai import OpenAI
import os

client = OpenAI()  # reads OPENAI_API_KEY from env
# or
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
```
POST /v1/chat/completions is what you'll use for 90% of tasks.
```python
response = client.chat.completions.create(
    model="gpt-4o",
    max_completion_tokens=1024,
    temperature=0.7,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain dependency injection."},
    ]
)

print(response.choices[0].message.content)
print(response.usage.prompt_tokens)
print(response.usage.completion_tokens)
print(response.choices[0].finish_reason)  # "stop", "length", "tool_calls"
```
Note: the system prompt goes inside the messages array with role: "system". Anthropic's API, by contrast, takes the system prompt as a top-level system parameter outside the messages array.
OpenAI's documentation calls this "function calling". In the API itself, the older functions and function_call request parameters are deprecated in favor of tools and tool_choice. The concept is the same as Anthropic's tool use; the format differs.
```python
import json

tools = [
    {
        "type": "function",
        "function": {
            "name": "search_database",
            "description": "Search the product database for items matching a query",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"},
                    "limit": {"type": "integer", "description": "Max results", "default": 10}
                },
                "required": ["query"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    tools=tools,
    tool_choice="auto",  # "auto", "none", or {"type": "function", "function": {"name": "..."}}
    messages=[{"role": "user", "content": "Find me some wireless keyboards under $100"}]
)

if response.choices[0].finish_reason == "tool_calls":
    tool_call = response.choices[0].message.tool_calls[0]
    function_name = tool_call.function.name
    # Arguments arrive as a JSON string, not a dict
    function_args = json.loads(tool_call.function.arguments)

    # Execute function (search_database is your own implementation)
    result = search_database(**function_args)

    # Append the assistant message with the tool_calls, then add result
    messages = [
        {"role": "user", "content": "Find me some wireless keyboards under $100"},
        response.choices[0].message,  # The assistant's message with tool_calls
        {
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(result)
        }
    ]

    final_response = client.chat.completions.create(
        model="gpt-4o",
        tools=tools,
        messages=messages
    )
```
Parallel tool calling: GPT-4o supports requesting multiple tools in one response. Check response.choices[0].message.tool_calls for a list, not just one item.
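Handling the whole list rather than only the first item could look roughly like the sketch below. The helper name `run_tool_calls` and the `registry` dict (tool name mapped to a Python callable) are illustrative assumptions, not part of the SDK; the sketch only relies on each tool call exposing `id`, `function.name`, and `function.arguments`.

```python
import json


def run_tool_calls(tool_calls, registry):
    """Run every requested tool call and build one `tool` message per call.

    `registry` is a hypothetical dict mapping tool names to your own callables.
    """
    messages = []
    for call in tool_calls or []:  # tool_calls is None when no tool was requested
        handler = registry[call.function.name]
        args = json.loads(call.function.arguments)  # arguments are a JSON string
        result = handler(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,  # must match the id the model issued
            "content": json.dumps(result),
        })
    return messages
```

In a real exchange you would pass `response.choices[0].message.tool_calls`, then append the assistant message followed by all returned tool messages before making the follow-up request.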
Pass images via the image_url content type in the messages array:
```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what you see in this image."},
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://example.com/image.jpg",
                    "detail": "high"  # "low", "high", or "auto"
                }
            }
        ]
    }]
)
```
For base64:
```python
{"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_string}"}}
```
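Producing the base64 string from raw image bytes might be sketched as follows. The helper names `to_data_url` and `image_part` are made up for illustration, and the default MIME type is an assumption you should adjust to the actual image format.

```python
import base64


def to_data_url(image_bytes, mime_type="image/jpeg"):
    """Encode raw image bytes as a data URL suitable for the image_url field."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_part(image_bytes, mime_type="image/jpeg", detail="auto"):
    """Build the content-array entry for an inline image."""
    return {
        "type": "image_url",
        "image_url": {"url": to_data_url(image_bytes, mime_type), "detail": detail},
    }
```

For a file on disk, read it in binary mode first, for example `image_part(open("photo.jpg", "rb").read())`.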
The detail parameter affects cost and quality. low gives the model a 512×512 low-resolution version of the image at a fixed cost of 85 tokens. high additionally tiles the image for more detail, which costs significantly more tokens.
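To get a feel for how much more high detail costs, here is a rough estimator. It assumes the tiling rules OpenAI has published for gpt-4o (fit within 2048×2048, shrink the shortest side to 768 px, then 170 tokens per 512×512 tile plus an 85-token base); other models may be priced differently, and the function name is hypothetical. The sketch also assumes images are only ever scaled down, never up.

```python
import math


def estimate_image_tokens(width, height, detail="high"):
    """Approximate gpt-4o image token cost (assumed formula, verify against current docs)."""
    if detail == "low":
        return 85  # fixed cost regardless of size

    # Step 1: fit inside a 2048 x 2048 square, preserving aspect ratio.
    scale = min(1.0, 2048 / max(width, height))
    width, height = round(width * scale), round(height * scale)

    # Step 2: shrink so the shortest side is at most 768 px.
    scale = min(1.0, 768 / min(width, height))
    width, height = round(width * scale), round(height * scale)

    # Step 3: count 512 px tiles and apply per-tile plus base cost.
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return 85 + 170 * tiles
```

Under these assumptions a 1024×1024 image in high detail comes to 765 tokens, about nine times the low-detail cost.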
```python
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Count from 1 to 10."}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
Chunks have choices[0].delta.content for text and choices[0].delta.tool_calls for tool call chunks (which you assemble incrementally). choices[0].finish_reason is non-null only on the last chunk.
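Assembling streamed tool calls incrementally could look like the sketch below. It assumes each tool call delta carries an `index` that identifies which call the fragment belongs to, with `id` and `function.name` typically arriving on the first fragment and `function.arguments` arriving in pieces. The function name `assemble_tool_calls` is illustrative.

```python
def assemble_tool_calls(chunks):
    """Merge streamed tool call fragments into complete calls, keyed by index."""
    calls = {}  # index -> {"id", "name", "arguments"}
    for chunk in chunks:
        if not chunk.choices:  # some chunks may carry no choices at all
            continue
        delta = chunk.choices[0].delta
        for fragment in delta.tool_calls or []:
            entry = calls.setdefault(
                fragment.index, {"id": None, "name": "", "arguments": ""}
            )
            if fragment.id:
                entry["id"] = fragment.id
            if fragment.function and fragment.function.name:
                entry["name"] += fragment.function.name
            if fragment.function and fragment.function.arguments:
                # Arguments are partial JSON; only parse after the stream ends.
                entry["arguments"] += fragment.function.arguments
    return [calls[i] for i in sorted(calls)]
```

Once the stream finishes, each entry's `arguments` string should be complete JSON that you can pass to `json.loads`.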
```python
# JSON mode: the model is instructed to output valid JSON
# You still need to specify the structure in your prompt
response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},
    messages=[{
        "role": "user",
        "content": "Extract name and age from: 'Alice is 30 years old'. Return as JSON."
    }]
)

# Structured output with schema (gpt-4o and newer)
from pydantic import BaseModel

class UserInfo(BaseModel):
    name: str
    age: int

response = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Extract: 'Alice is 30 years old'"}],
    response_format=UserInfo
)
print(response.choices[0].message.parsed)  # UserInfo(name='Alice', age=30)
```
```python
response = client.embeddings.create(
    model="text-embedding-3-small",  # or "text-embedding-3-large"
    input="The quick brown fox"
)
vector = response.data[0].embedding  # list of floats, 1536 dimensions
print(response.usage.total_tokens)
```
Batch multiple inputs for efficiency: input=["text1", "text2", "text3"]
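A batched call might be wrapped as in this sketch. The helper name `embed_texts` is hypothetical; it assumes each item in `response.data` carries an `index` field pointing back to its position in the input list, and sorts on it defensively.

```python
def embed_texts(client, texts, model="text-embedding-3-small"):
    """Embed several texts in one request and return vectors in input order."""
    response = client.embeddings.create(model=model, input=texts)
    # Sort by index so vectors line up with `texts` even if the order differs.
    ordered = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in ordered]
```

Usage would be along the lines of `vectors = embed_texts(client, ["text1", "text2", "text3"])`, giving one vector per input string.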
```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4o")
tokens = enc.encode("Hello, how are you?")
print(len(tokens))  # 6

# Count tokens in a messages array (approximate)
def count_message_tokens(messages, model="gpt-4o"):
    enc = tiktoken.encoding_for_model(model)
    num_tokens = 0
    for message in messages:
        num_tokens += 4  # message overhead
        for key, value in message.items():
            num_tokens += len(enc.encode(str(value)))
    return num_tokens
```
| Model | Best for | Context |
|---|---|---|
| gpt-4o | General purpose, vision, tools | 128k |
| gpt-4o-mini | Fast, cheap, most tasks | 128k |
| o3-mini | Math and coding reasoning | 200k |
| o1 | Complex reasoning, slower | 200k |
The Assistants API adds persistent threads, built-in file retrieval, and code interpreter. The overhead: asynchronous runs, polling for completion, more complex state management.
Use it when: You need built-in file search, code execution in a sandbox, or persistent thread storage managed by OpenAI.
Don't use it when: You just want chat completions with conversation history. Managing your own messages array is simpler, more debuggable, and just as capable. Most production apps are better off with chat completions + their own database for conversation history.
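Managing your own messages array can be as small as the sketch below. The `Conversation` class is a hypothetical wrapper, not an SDK type, and it keeps history in memory only; a production app would persist `self.messages` to its own database.

```python
class Conversation:
    """Minimal self-managed chat history on top of chat completions."""

    def __init__(self, client, model="gpt-4o", system_prompt=None):
        self.client = client
        self.model = model
        self.messages = []
        if system_prompt:
            self.messages.append({"role": "system", "content": system_prompt})

    def send(self, user_text):
        """Append the user turn, call the API, and record the assistant reply."""
        self.messages.append({"role": "user", "content": user_text})
        response = self.client.chat.completions.create(
            model=self.model,
            messages=self.messages,
        )
        reply = response.choices[0].message.content
        self.messages.append({"role": "assistant", "content": reply})
        return reply
```

Because the full history is an ordinary list, you can inspect, truncate, or replay it when debugging, which is the main advantage over threads stored on OpenAI's side.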