A complete walkthrough of building a real AI-powered feature from scratch.
A document summarizer: takes a long text, returns a structured summary with key points, main topics, and a one-sentence overview. We'll build it properly — streaming output, error handling, clean architecture, easy to adapt to other use cases.
By the end, you'll have a pattern you can apply to classification, Q&A, extraction, and anything else.
pip install anthropic python-dotenv
# .env
ANTHROPIC_API_KEY=sk-ant-...
File structure:
``
my_ai_app/
ai_service.py # All AI logic lives here
main.py # Usage example
.env
The key architectural decision: put all AI calls in a service module, not scattered through your application. This makes it easy to swap models, add logging, and debug issues.
```python # ai_service.py import os import json import time import random import logging from typing import Iterator import anthropic from dotenv import load_dotenv
load_dotenv()
logger = logging.getLogger(__name__)
# Initialize the client once — reuse across calls client = anthropic.Anthropic( api_key=os.environ.get("ANTHROPIC_API_KEY"), timeout=60.0 )
SUMMARIZER_SYSTEM_PROMPT = """You are a document summarizer. When given a document, produce a structured summary in JSON format with exactly these fields: - overview: A single sentence summarizing the document - key_points: A list of 3-5 bullet-point strings covering the most important information - topics: A list of 2-4 main topic strings - word_count_estimate: Your estimate of the source document's word count as an integer
Return only valid JSON, no other text."""
def summarize_document(text: str, max_retries: int = 3) -> dict: """ Summarize a document and return a structured dict.
Args: text: The document text to summarize max_retries: Number of retry attempts on transient errors
Returns: dict with keys: overview, key_points, topics, word_count_estimate
Raises: ValueError: If the text is empty or the response can't be parsed anthropic.APIError: On non-retryable API errors """ if not text or not text.strip(): raise ValueError("Document text cannot be empty")
# Truncate if very long — 150k chars is roughly 35k tokens, safe for 200k context if len(text) > 150_000: logger.warning(f"Document truncated from {len(text)} to 150,000 chars") text = text[:150_000] + "\n\n[Document truncated for length]"
prompt = f"Please summarize the following document:\n\n{text}"
for attempt in range(max_retries): try: start_time = time.time()
response = client.messages.create( model="claude-haiku-3-5", # Use Haiku for cost efficiency on this task max_tokens=1024, temperature=0.0, # Deterministic — we want consistent structure system=SUMMARIZER_SYSTEM_PROMPT, messages=[{"role": "user", "content": prompt}] )
elapsed_ms = int((time.time() - start_time) * 1000) logger.info( "summarize_success", extra={ "input_tokens": response.usage.input_tokens, "output_tokens": response.usage.output_tokens, "latency_ms": elapsed_ms, "stop_reason": response.stop_reason } )
if response.stop_reason == "max_tokens": logger.warning("Summary was truncated by max_tokens limit")
raw_text = response.content[0].text.strip()
# Parse the JSON response try: return json.loads(raw_text) except json.JSONDecodeError: # Try to extract JSON if the model added surrounding text import re json_match = re.search(r'\{.*\}', raw_text, re.DOTALL) if json_match: return json.loads(json_match.group()) raise ValueError(f"Could not parse model response as JSON: {raw_text[:200]}")
except anthropic.RateLimitError as e: if attempt == max_retries - 1: raise delay = (2 ** attempt) + random.uniform(0, 1) logger.warning(f"Rate limited, retrying in {delay:.1f}s") time.sleep(delay)
except anthropic.InternalServerError as e: if attempt == max_retries - 1: raise delay = (2 ** attempt) + random.uniform(0, 1) logger.warning(f"Server error, retrying in {delay:.1f}s") time.sleep(delay)
except (anthropic.AuthenticationError, anthropic.BadRequestError): raise # Non-retryable
def summarize_document_streaming(text: str) -> Iterator[str]: """ Stream a prose summary (not structured JSON) for real-time display.
Yields chunks of text as they're generated. """ if not text or not text.strip(): raise ValueError("Document text cannot be empty")
stream_prompt = """You are a document summarizer. Write a concise summary of the following document in 2-3 short paragraphs. Cover the main points, key arguments, and important conclusions. Write in plain English, no bullet points or headers."""
with client.messages.stream( model="claude-haiku-3-5", max_tokens=512, temperature=0.3, system=stream_prompt, messages=[{"role": "user", "content": f"Summarize:\n\n{text[:100_000]}"}] ) as stream: for text_chunk in stream.text_stream: yield text_chunk ```
```python # main.py import logging from ai_service import summarize_document, summarize_document_streaming
logging.basicConfig(level=logging.INFO)
# Long document to summarize document = """ Machine learning is a subset of artificial intelligence that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves... [rest of your document] """
# Option 1: Structured JSON summary print("=== Structured Summary ===") summary = summarize_document(document) print(f"Overview: {summary['overview']}") print(f"\nKey Points:") for point in summary['key_points']: print(f" • {point}") print(f"\nTopics: {', '.join(summary['topics'])}")
# Option 2: Streaming prose summary print("\n=== Streaming Summary ===") for chunk in summarize_document_streaming(document): print(chunk, end="", flush=True) print() ```
The same structure — system prompt, user message, parse response — works for almost every AI task. Change the system prompt and output schema:
Question Answering: ```python SYSTEM = """Answer questions based only on the provided context. If the answer is not in the context, say "I don't know". Return JSON: {"answer": str, "confidence": "high"|"medium"|"low", "source_quote": str}"""
def answer_question(context: str, question: str) -> dict: response = client.messages.create( model="claude-haiku-3-5", max_tokens=512, temperature=0.0, system=SYSTEM, messages=[{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}] ) return json.loads(response.content[0].text) ```
Classification: ```python SYSTEM = """Classify the support ticket into one category. Categories: billing, technical, account, feature_request, other Return JSON: {"category": str, "confidence": float, "reasoning": str}"""
def classify_ticket(ticket_text: str) -> dict: response = client.messages.create( model="claude-haiku-3-5", max_tokens=256, temperature=0.0, system=SYSTEM, messages=[{"role": "user", "content": ticket_text}] ) return json.loads(response.content[0].text) ```
Data Extraction:
``python
SYSTEM = """Extract structured data from the provided text.
Return JSON matching this schema exactly:
{"company_name": str, "founded_year": int|null, "headquarters": str|null, "employee_count": int|null}
Use null for fields not mentioned in the text."""
A few things that matter as your integration grows:
Never expose API keys client-side. Your frontend should call your own backend, which calls the AI API. This applies to both web and mobile apps.
Keep AI calls in a service layer. Don't scatter client.messages.create() calls throughout your codebase. One module, one set of logging, one retry policy, one place to change models.
Log inputs and outputs for debugging. When something goes wrong in production (and it will), you want to be able to reproduce it. Log the prompt and the response. Be careful with PII — log enough to debug, not enough to create a liability.
Set explicit `max_tokens`. Don't rely on the model to stop on its own. A bug in your prompt or an adversarial input can lead to very long (expensive) responses if you don't cap output length.
Test with a variety of inputs. AI systems have edge cases that traditional software doesn't. Test with short inputs, very long inputs, inputs in other languages, inputs that ask the model to do something other than what you want. Find the failure modes before your users do.
Have a follow-up question about this topic?
Ask AI