Ollama is the tool that made local AI accessible to people who aren't machine learning engineers. It wraps the complexity of running large language models into a simple command-line interface and a cl
Ollama is the tool that made local AI accessible to people who aren't machine learning engineers. It wraps the complexity of running large language models into a simple command-line interface and a clean local API.
Ollama is an open-source runtime for large language models. It handles model downloading, memory management, GPU acceleration, and serving — all automatically. Running a state-of-the-art open model locally can be as simple as:
bash
ollama run llama4
That single command downloads the model if needed and opens an interactive chat session in your terminal.
curl -fsSL https://ollama.com/install.sh | shOllama installs as a background service. It uses your GPU when available (Apple Silicon, NVIDIA, AMD) and falls back to CPU otherwise.
Notable models available in 2026: - Llama 4 (Meta) — strong general-purpose, multiple size variants - Mistral / Mistral Nemo — efficient, fast, European-built - Gemma 3 (Google) — compact and surprisingly capable - Phi-4 (Microsoft) — punches above its weight at small sizes - DeepSeek-R1 — strong reasoning - Qwen 2.5 (Alibaba) — excellent multilingual support
Pull any model: ollama pull modelname. List installed: ollama list.
Ollama exposes an OpenAI-compatible REST API at http://localhost:11434. Code written for the OpenAI API works with Ollama by changing one line:
python
client = openai.OpenAI(
base_url="http://localhost:11434/v1",
api_key="ollama" # required but unused
)
| Model Size | RAM/VRAM Needed | |---|---| | 3B–7B | ~6–8 GB | | 13B | ~16 GB | | 30B–34B | ~24 GB | | 70B | 40 GB+ |
Quantized models (Q4_K_M format) cut memory requirements significantly while preserving most quality.
For a ChatGPT-like browser interface connected to Ollama, install Open WebUI via Docker:
bash
docker run -d -p 3000:80 --add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
--name open-webui ghcr.io/open-webui/open-webui:main
Open http://localhost:3000 — conversation history, model switching, system prompt configuration, all running entirely on your machine.
Have a follow-up question about this topic?
Ask AI