Retrieval-Augmented Generation: give the AI access to your own data at query time.
Retrieval-Augmented Generation (RAG) is an architecture that gives a language model access to a custom knowledge base at query time, without retraining it. When a user asks a question, the system first searches a database of documents (the retrieval step), finds the most relevant passages, and then injects those passages into the model's context alongside the question. The model generates its answer using both its pre-trained knowledge and the retrieved text. The result is a model that can answer questions about private, domain-specific, or recent information it was never trained on.
A typical RAG system works in two phases. During indexing (done once, or as documents are added), documents are chunked into smaller pieces, each chunk is converted into a numerical vector called an embedding using an embedding model, and those vectors are stored in a vector database. During retrieval (at query time), the user's question is embedded using the same model, and the vector database performs a similarity search to find the most semantically similar chunks. Those chunks are retrieved and passed to the language model as context.
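The two phases can be sketched in a few lines of Python. This is a minimal illustration, not a production design: `toy_embed` is a hypothetical stand-in that counts words rather than calling a real embedding model, the in-memory list stands in for a vector database, and the function names and the `max_words` chunk size are assumptions made for the example.

```python
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Indexing, step 1: split a document into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embed(text: str) -> Counter:
    """Stand-in for an embedding model (assumption): a sparse word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a: Counter, b: Counter) -> int:
    """Dot product of two sparse word-count vectors."""
    return sum(count * b[token] for token, count in a.items())


def build_index(documents: list[str], max_words: int = 40) -> list[tuple[str, Counter]]:
    """Indexing phase: chunk every document and store (chunk, vector) pairs."""
    index = []
    for doc in documents:
        for chunk in chunk_text(doc, max_words):
            index.append((chunk, toy_embed(chunk)))
    return index


def retrieve(index: list[tuple[str, Counter]], question: str, k: int = 2) -> list[str]:
    """Retrieval phase: embed the question the same way, return the top-k chunks."""
    query_vector = toy_embed(question)
    ranked = sorted(index, key=lambda item: similarity(query_vector, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]
```

In a real system, `toy_embed` would be replaced by a call to an embedding model and the list by a vector database, but the shape stays the same: index once, then embed and search on every query.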
An embedding is a list of numbers (typically a few hundred to a few thousand floats; 768, 1536, and 3072 are common sizes) that encodes the semantic meaning of a piece of text. Texts with similar meanings produce embeddings that are close together in high-dimensional space. Vector databases (Pinecone, Weaviate, pgvector, ChromaDB) are optimised for finding the nearest neighbours to a query vector, effectively answering "which stored chunks are most semantically similar to this question?" This semantic search is what makes RAG robust to paraphrase and terminology variation.
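To make "close together" concrete, here is a small sketch of cosine similarity and a brute-force nearest-neighbour search. The three-dimensional vectors and their labels are invented for illustration; real embeddings have far more dimensions, and vector databases generally use specialised indexes rather than scanning every vector as this does.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbours(query: list[float], stored: dict[str, list[float]], k: int = 1) -> list[str]:
    """Brute-force search: score every stored vector and keep the k best labels."""
    ranked = sorted(
        stored.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [label for label, _ in ranked[:k]]


# Made-up embeddings (assumption): each label stands for a stored chunk.
stored = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.2],
    "office locations": [0.0, 0.2, 0.9],
}

# Made-up embedding of a paraphrased question such as "Can I get my money back?"
query = [0.8, 0.2, 0.1]

print(nearest_neighbours(query, stored, k=2))
```

The query shares no words with the label "refund policy", yet its vector points in nearly the same direction, which is the property that lets semantic search cope with paraphrase.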
Fine-tuning bakes knowledge into the model's weights, which makes updating it expensive and slow: you have to retrain whenever your data changes. RAG keeps the knowledge external, so you can add, update, or remove documents without touching the model at all. RAG also provides provenance, because you can see which retrieved documents were supplied to the model for a given answer, which is critical for trust and auditability in enterprise applications. For most "custom knowledge base" use cases, start with RAG before investing in fine-tuning.
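The sketch below illustrates why external knowledge is cheap to change and easy to trace. `KnowledgeBase` is a hypothetical class written for this example, and its keyword match is only a stand-in for vector search; the point is that every chunk carries its source, and that updating or removing a document is a data operation with no model retraining involved.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Hypothetical external store: maps a source identifier to its chunks."""
    chunks_by_source: dict[str, list[str]] = field(default_factory=dict)

    def upsert(self, source: str, chunks: list[str]) -> None:
        """Add a document, or replace its chunks if the source already exists."""
        self.chunks_by_source[source] = list(chunks)

    def remove(self, source: str) -> None:
        """Delete a document; missing sources are ignored."""
        self.chunks_by_source.pop(source, None)

    def search(self, keyword: str) -> list[tuple[str, str]]:
        """Keyword match (stand-in for vector search) returning (source, chunk) pairs."""
        needle = keyword.lower()
        return [
            (source, chunk)
            for source, chunks in self.chunks_by_source.items()
            for chunk in chunks
            if needle in chunk.lower()
        ]
```

Because `search` returns the source alongside each chunk, the application can show the user which document an answer drew on.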
Example
User asks: 'What is our refund policy?'
System retrieves relevant policy doc chunks from the vector DB.
AI answers using retrieved text: 'According to our policy (Section 3.2), refunds are accepted within 30 days...'
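The step between retrieval and answering is prompt assembly: the retrieved chunks are placed in the model's context next to the question. A minimal sketch follows; the `build_prompt` name, the instruction wording, and the passage format are assumptions for illustration, and the actual call to a language model is left out.

```python
def build_prompt(question: str, passages: list[tuple[str, str]]) -> str:
    """Combine retrieved (source, text) passages and the question into one prompt."""
    context = "\n".join(
        f"[{number}] ({source}) {text}"
        for number, (source, text) in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "Cite the passage numbers you used.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Placeholder passage (assumption): in practice this comes from the retrieval step.
retrieved = [("Refund policy, Section 3.2", "Refunds are accepted within 30 days...")]
print(build_prompt("What is our refund policy?", retrieved))
```

Numbering the passages and including their sources gives the model something to cite, which is how an answer like the one above can point back to Section 3.2.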