← All models
MetaCurrent

Llama 3.2 11B Vision

Metas first open-source multimodal Llama model. Understands images and text together. Ideal for vision tasks without cloud dependency.

Context Window

128K tokens

≈ 96K words

Input Price

$0.05

per 1M tokens

Output Price

$0.05

per 1M tokens

Released

September 2024

Capabilities

Vision / Image input

Best for

On-device or self-hosted image understanding tasks

Strengths

  • Free
  • Open source
  • Vision
  • Multimodal
  • Privacy

API identifier

meta-llama/Llama-3.2-11B-Vision-Instruct

Compare Llama 3.2 11B Vision side-by-side with any other model

Compare models →