How to write prompts that generate correct, production-ready code instead of plausible-looking garbage.
The most common mistake when prompting for code is treating the model like a search engine — you type a vague description and hope for something usable. Sometimes that works. For simple, well-trodden tasks it works fine. But as soon as you need something with specific constraints, error handling, a particular integration point, or a specific coding style, vague prompts produce code you'll spend more time fixing than writing from scratch would have taken.
The mental shift: treat prompting for code like writing a ticket for a competent contractor who has never worked with your codebase. They'll produce something that technically compiles and runs, but it won't match your conventions, handle your edge cases, or integrate cleanly — unless you tell them all of that upfront.
1. Language and version
Don't assume. "Write a function to parse CSV" will get you something — but is it Python 3.11 or 3.8? Does it use the built-in csv module or pandas? State it.
2. Framework and dependencies you're already using If you're using FastAPI, say so. If you're using SQLAlchemy 2.0, say so — the ORM syntax changed significantly between 1.x and 2.0, and the model may default to the older version.
3. What the function integrates with
"This function will be called from a FastAPI route handler. It should use our existing db: AsyncSession dependency and return a Pydantic model." That context shapes the entire implementation.
4. Input/output examples
Concrete examples eliminate ambiguity. "Given {'name': 'Alice', 'age': 30}, return User(name='Alice', age=30)" is unambiguous. "Parse user data" is not.
5. Error handling expectations
"Raise a ValueError if the input is empty" vs. "Return None if the input is empty" vs. "Raise an HTTP 422 exception" — all valid, all different implementations. The model will pick one arbitrarily if you don't specify.
6. Constraints No external dependencies, must be synchronous, must handle concurrent calls, must work with Python 3.9+ — state anything that rules out a class of solutions.
Before (vague):
``
Write a function to send an email notification.
After (specific):
``
Write a Python async function send_notification_email that:
- Uses the aiosmtplib library (already installed)
- Accepts: recipient: str, subject: str, body: str, html_body: Optional[str] = None
- Reads SMTP config from environment variables: SMTP_HOST, SMTP_PORT, SMTP_USER, SMTP_PASSWORD
- Sends a multipart email with both text and HTML parts if html_body is provided
- Raises a custom EmailDeliveryError` exception (define it) on delivery failure
- Logs success and failure using the standard logging module
Include a short usage example in a docstring. ```
The second prompt will get you production-quality code. The first will get you a demo.
Always ask for tests when asking for a non-trivial function. The model is good at generating pytest tests, and having them immediately surfaces edge cases you didn't think of.
Write the function AND write a pytest test suite covering:
- The happy path
- Empty input handling
- The error cases mentioned above
- At least one edge case you identify
If the model's tests are flimsy (just checking that the function doesn't throw), ask it to write assertions about the actual output values.
For unfamiliar code, paste it and ask:
Explain what this code does, step by step. Identify any potential bugs,
edge cases, or performance issues. Note any patterns that seem unusual
or non-idiomatic.
This is more useful than just "what does this do?" because it asks for critique, not just description. Models are often better at finding issues when explicitly asked to look for them.
When you have an error, don't paste just the error. Give the model: 1. The error message and full stack trace 2. The code that produced it 3. What input caused it (if you know) 4. What you expected to happen
I'm getting this error when I call process_order(order_id=123)`:
[paste full traceback]
Here is the relevant code: [paste code]
I expected it to return an Order object. What's causing this and how do I fix it? ```
The "what I expected" part is critical — it tells the model the correct behavior, which helps it distinguish a code bug from a misunderstanding of the API.
Claude (Anthropic): Strong at following detailed specifications and maintaining consistency across a long prompt. Good for tasks where you've written out all the constraints and want them all respected. The system prompt is well-suited for establishing coding conventions that should apply to the whole session.
GPT-4o (OpenAI): Fast, capable, and has a huge training corpus of code. Often good for the first pass at well-known patterns. The function calling / tool use features make it useful for agentic coding pipelines.
Gemini 2.0 Flash: Good value for the cost, competes with GPT-4o on many coding tasks, and handles very long contexts well — useful when you need to paste a large codebase for context.
For most code tasks, the differences are smaller than prompt quality. A well-specified prompt to a mid-tier model beats a vague prompt to the frontier model every time.
If you're building a coding assistant or using the API directly, use the system prompt to lock in conventions:
You are a Python coding assistant. Follow these conventions in all code you write:
- Use type hints on all function signatures
- Prefer async/await for I/O operations
- Use Pydantic v2 models for data validation
- Raise specific, typed exceptions rather than generic ones
- All functions must have docstrings
- Use structlog` for logging, not the standard library
When asked for code, produce the implementation and a pytest test suite in the same response. ```
This is dramatically more effective than repeating these requirements in every user message.
Have a follow-up question about this topic?
Ask AI