Learn/Fine-tuning & Evaluation/Fine-tuning with OpenAI, Anthropic & Google
Fine-tuning & Evaluation

Fine-tuning with OpenAI, Anthropic & Google

All three major API providers offer managed fine-tuning. Capabilities, pricing, and target use cases differ meaningfully.

Fine-tuning with OpenAI, Anthropic & Google

All three major API providers offer managed fine-tuning. Capabilities, pricing, and target use cases differ meaningfully.

OpenAI Fine-tuning

The most accessible managed option. Supported models include GPT-4o mini and GPT-3.5 Turbo.

Workflow: prepare JSONL data → upload via API or dashboard → kick off training job → receive a fine-tuned model ID you call like any endpoint.

Pricing: ~$8 per million training tokens + higher inference costs than the base model. OpenAI also supports reinforcement fine-tuning for select use cases — using reward signals rather than direct supervision.

Minimum data: At least 50 examples to see any improvement; 200+ for reliable gains.

Best for: Format and style consistency at scale. Not suited for injecting proprietary knowledge — use RAG for that.

Anthropic Fine-tuning

Available for Claude models through two cloud partners: AWS Bedrock and Google Cloud Vertex AI. Enterprise-oriented positioning reflects Anthropic's focus on safety and governance requirements.

Submit training data through your cloud provider's interface rather than directly through Anthropic's API. Claude Haiku is the primary fine-tuning target — smaller, faster, cheaper to adapt.

Important: Anthropic's constitutional constraints persist through adaptation. The fine-tuned model retains the base model's safety behaviors — matters for enterprise deployments requiring predictable guardrails.

Minimum data: 100+ high-quality examples, with emphasis on curation over volume.

Google Fine-tuning

Fine-tuning for Gemini models through Vertex AI. Broader options than other providers: both supervised fine-tuning (standard instruction-response pairs) and RLHF (requiring human preference data) are available.

Well-integrated into the Google Cloud ecosystem — a natural choice for teams already running GCP workloads.

Minimum data: 100–500 examples depending on task complexity.

Managed vs. Self-hosted

Managed fine-tuning — right when you need reliability without infrastructure investment, or compliance requirements favor a major cloud provider.

Self-hosted fine-tuning (Axolotl, LLaMA-Factory, Unsloth with open-source models) — right when you need full control, your scale makes managed pricing unsustainable, or you want data to stay on your infrastructure.

Teams running more than a few fine-tuning iterations per quarter often find self-hosted infrastructure pays for itself within a year.

Have a follow-up question about this topic?

Ask AI