Meta's first open-source multimodal Llama model. Understands images and text together. Ideal for vision tasks without cloud dependency.
Context Window
128K tokens
≈ 96K words
Input Price
$0.05
per 1M tokens
Output Price
$0.05
per 1M tokens
Released
September 2024
On-device or self-hosted image understanding tasks
API identifier
meta-llama/Llama-3.2-11B-Vision-Instruct
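The listed prices and context window can be turned into a quick budget estimate. The following is a minimal sketch, not an official calculator: the function names `estimate_cost` and `tokens_to_words` are hypothetical, and the 0.75 words-per-token ratio is an assumption inferred from the "128K tokens ≈ 96K words" figure above. Actual billing and tokenization may differ by provider.

```python
# Figures copied from the listing above (USD per 1M tokens).
INPUT_PRICE_PER_M = 0.05
OUTPUT_PRICE_PER_M = 0.05
CONTEXT_WINDOW_TOKENS = 128_000

# Assumption: ratio implied by "128K tokens ≈ 96K words" (96 / 128 = 0.75).
WORDS_PER_TOKEN = 0.75


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated request cost in USD at the listed prices."""
    input_cost = input_tokens * INPUT_PRICE_PER_M / 1_000_000
    output_cost = output_tokens * OUTPUT_PRICE_PER_M / 1_000_000
    return input_cost + output_cost


def tokens_to_words(tokens: int) -> int:
    """Convert a token count to an approximate English word count."""
    return round(tokens * WORDS_PER_TOKEN)


if __name__ == "__main__":
    # Hypothetical request: 100,000 input tokens and 2,000 output tokens.
    print(f"Estimated cost: ${estimate_cost(100_000, 2_000):.4f}")
    print(f"Context window in words: {tokens_to_words(CONTEXT_WINDOW_TOKENS)}")
```

At these prices, a request with 100,000 input tokens and 2,000 output tokens comes to roughly $0.0051, and a full million tokens each of input and output comes to about $0.10.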