Model Encyclopedia

Meta Models

Every Llama model — specs, context window, capabilities, and self-hosting considerations.

About Meta AI

Meta — the parent company of Facebook, Instagram, and WhatsApp — made a strategic decision to develop and release its large language models as open-weight models rather than keeping them proprietary. The Llama model family (Large Language Model Meta AI) represents the most widely used open-weight models in the world.

This decision has made Llama the foundation for thousands of fine-tuned models, research projects, and production deployments across the AI ecosystem.

---

What "Open Weights" Means

Open weights means that Meta releases the actual model parameters — the numerical values that define the trained model — for anyone to download and use. This is different from:

  • Closed API models (GPT-4, Claude, Gemini): you send requests to a server; you never have the model itself
  • Fully open source: open weights models may have license restrictions on commercial use

With open weights, you can: - Run the model entirely on your own hardware, with no data leaving your infrastructure - Fine-tune the model on your own data without sharing that data with any provider - Modify, merge, and extend the model - Host it yourself at any scale, paying only infrastructure costs

Llama models are free to download from Meta's website and Hugging Face. Commercial use is permitted under Meta's Llama license (with restrictions for companies over 700M monthly active users).

---

Llama 3.1 Family

Llama 3.1 8B

| Property | Value | |---|---| | Parameters | 8 billion | | Context Window | 128,000 tokens | | Weights | Free / Open | | License | Llama 3.1 Community License | | Hardware (inference) | Can run on consumer GPU (RTX 3080/4080, M-series Mac) | | Quantized versions | Available (4-bit quantization fits in ~5GB VRAM) |

Best for: On-device or local deployment, privacy-sensitive applications, experimentation and fine-tuning, cost-controlled production at scale with self-hosting.

Strengths: Genuinely capable model that outperforms models many times larger from earlier generations. Runs on consumer hardware when quantized. The 128K context window is competitive with commercial models. Excellent for fine-tuning on domain-specific data.

Weaknesses: Requires engineering effort to deploy and serve. No official support. Performance is below larger models on complex tasks.

---

Llama 3.1 70B

| Property | Value | |---|---| | Parameters | 70 billion | | Context Window | 128,000 tokens | | Weights | Free / Open | | Hardware (inference) | Requires server-grade GPU (A100 or equivalent, ~140GB VRAM for full precision; ~40GB quantized) |

Best for: Production applications where quality matters and self-hosting economics are favorable at scale. Competitive alternative to commercial models for many use cases.

Strengths: Performs comparably to GPT-3.5-class commercial models on many benchmarks. Significantly better than the 8B model on complex reasoning and instruction-following. Strong multilingual performance.

Weaknesses: Requires dedicated server infrastructure to run at reasonable speed. Not practical on consumer hardware at full precision.

---

Llama 3.1 405B

| Property | Value | |---|---| | Parameters | 405 billion | | Context Window | 128,000 tokens | | Weights | Free / Open | | Hardware (inference) | Requires multiple high-end GPUs or specialized hardware (8x A100 80GB minimum) |

Best for: Organizations with significant infrastructure willing to run frontier-competitive open models. Research, custom deployments, large-scale fine-tuning.

Strengths: Competitive with frontier commercial models on many benchmarks. Largest openly available model at launch. Meta positions it as a reference for the ecosystem.

Weaknesses: Infrastructure requirements are extreme for most organizations. Available via API providers (see below) if self-hosting is impractical.

---

Llama 3.2 Family (Multimodal)

Llama 3.2 11B Vision

| Property | Value | |---|---| | Parameters | 11 billion | | Context Window | 128,000 tokens | | Vision | Yes — first multimodal Llama | | Weights | Free / Open |

Best for: Vision tasks in privacy-sensitive or on-device contexts. Document understanding, image captioning, visual question answering with local deployment.

Strengths: First Llama model with image understanding. Can run on relatively accessible hardware compared to larger vision models. Opens up multimodal applications for self-hosted deployments.

Weaknesses: Vision capabilities are competitive but not state-of-the-art. Weaker than commercial vision models (GPT-4o, Claude 3.5 Sonnet, Gemini) on demanding visual reasoning tasks.

---

Llama 3.2 90B Vision

| Property | Value | |---|---| | Parameters | 90 billion | | Context Window | 128,000 tokens | | Vision | Yes | | Weights | Free / Open |

Best for: Higher-quality vision tasks with open-weight flexibility. Server deployment for production vision applications.

Strengths: Significantly stronger vision performance than the 11B model. Competitive with some commercial vision models on structured tasks.

Weaknesses: Substantial hardware requirements. Still below GPT-4o and Claude 3.5 Sonnet on complex vision reasoning tasks.

---

Llama 3.3 70B

| Property | Value | |---|---| | Parameters | 70 billion | | Context Window | 128,000 tokens | | Weights | Free / Open | | Released | December 2024 |

Best for: Drop-in replacement for Llama 3.1 70B with improved performance. Best open-weight 70B model available as of early 2025.

Strengths: Improved instruction-following and general performance over 3.1 70B. Meta claims performance approaching the 405B model on many tasks. Same hardware requirements as 3.1 70B.

---

Where to Access Llama Models

Direct download: meta.llama.com (request access, download weights directly)

Hugging Face: huggingface.co/meta-llama — full model repository, community fine-tunes, quantized versions

API providers (pay per token, no self-hosting needed): - Groq: Known for extremely fast inference using custom hardware - Together AI: Wide model selection, fine-tuning support - Fireworks AI: Production-ready API with fast inference - AWS Bedrock: Enterprise access with AWS compliance

Consumer product: Meta AI is available in WhatsApp, Instagram, Facebook Messenger, and as a standalone assistant at meta.ai. Powered by Llama models.

---

Hardware Requirements Summary

| Model | Minimum VRAM (quantized 4-bit) | Practical Setup | |---|---|---| | 8B | ~5 GB | RTX 3060, M2 MacBook Pro | | 70B | ~40 GB | 2x RTX 3090 / A100 40GB | | 405B | ~200 GB+ | 4-8x A100 80GB |

Software: Ollama is the easiest way to run Llama models locally — single command install, automatic model download, OpenAI-compatible API.

---

Why Open Source Matters

Privacy and data control: Your data never leaves your infrastructure. Critical for healthcare, legal, finance, and other sensitive industries.

Cost at scale: At very high volumes, self-hosting can be dramatically cheaper than per-token API fees.

No vendor lock-in: You own your deployment. Provider pricing and availability changes do not affect you.

Customization: Fine-tuning on proprietary data produces domain-specific improvements that are not possible with closed models.

Research and auditability: Open weights enable independent safety research, bias auditing, and capability evaluation.

Honest limitations: Requires engineering to deploy and operate. No official support channel — community support through Hugging Face and Meta forums. Keeping up with updates requires ongoing effort.

Have a follow-up question about this topic?

Ask AI