01 / The short version

Vectors find similar. Graphs find related.

Vector RAG (Retrieval-Augmented Generation) is the dominant pattern for enterprise AI in 2024 and 2025. The pattern: chunk your documents, embed them, store the embeddings in a vector database, and at query time fetch the top-N most similar chunks to ground the LLM answer. Simple, fast, popular.
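The whole retrieval half of that loop fits in a few lines. Below is a minimal sketch using a toy bag-of-words embedding in place of a real embedding model; the chunk texts and function names are invented for illustration.

```python
import math

# Toy bag-of-words "embedding"; a real system would use a trained model.
def embed(text):
    vec = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0) + 1
    return vec

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Invented chunks, standing in for a document knowledge base.
chunks = [
    "2023 churn analysis report",
    "marketing memo about retention tactics",
    "press release about competitor pricing",
]
index = [(c, embed(c)) for c in chunks]

def top_n(query, n=2):
    # Rank chunks by similarity to the query and return the best n.
    q = embed(query)
    return [c for c, v in sorted(index, key=lambda cv: -cosine(q, cv[1]))[:n]]
```

Note what the ranking rewards: textual closeness to the query, nothing more. There is no notion of "customer" as an entity anywhere in this pipeline.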

It also has a fundamental ceiling. Vector similarity does not know what your business is. It returns chunks that look textually or semantically close to the query. That is not the same as returning the entities that are actually related to the answer.

A context graph is what comes next. It returns a focused subgraph of governed entities, with their typed relationships, that the agent can reason against. The agent stops guessing. The answer cites a path. The auditor goes home happy.

↳ Vector RAG

Returns chunks that look similar

  • Embeddings find lexical or semantic neighbours
  • No notion of business entities or relationships
  • Cannot reason across multiple hops
  • “Customer” in chunk A might not match “client” in chunk B
  • Hallucinations live in the gap between similar and correct

↳ Context graph

Returns the entities that are related

  • Traverses a typed graph of governed entities
  • Knows what a customer is, what they own, what they have done
  • Multi-hop reasoning by design
  • Every answer cites the path through the graph
  • Hallucinations have nowhere to hide

02 / A worked example

“Why is this customer at risk?”

Imagine a CDO asks the AI: which of our customers are most at risk of churning this quarter?

Vector RAG approach

The system embeds the question. It finds the most similar chunks across your knowledge base. It might return: a 2023 churn analysis report, a marketing memo about retention tactics, a press release about competitor pricing. Plausible-looking material. The LLM dutifully synthesises an answer that sounds confident and is mostly composed of generic risk factors.

The answer might be wrong. It almost certainly cannot point at the specific customers. It has no idea who they are.

Context graph approach

The system parses the question against the ontology. It identifies the relevant classes: Customer, Renewal, Usage, Ticket, SLA, NPS. It traverses the knowledge graph for instances matching the risk criteria: customers with renewals in the next 90 days, declining usage, open tickets, missed SLAs.

It returns 47 specific customer entities, each with the path that explains why they are flagged. The LLM presents them. The CDO can drill into any individual one and see the supporting subgraph.
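A toy version of that traversal, with invented node ids, relationship names, and risk criteria. The point is that each flagged customer comes back with the explicit path(s) that justify the flag.

```python
from datetime import date, timedelta

# Toy typed graph. Node types and edge names are illustrative, not a real schema.
nodes = {
    "cust:acme":   {"type": "Customer", "name": "Acme"},
    "cust:globex": {"type": "Customer", "name": "Globex"},
    "ren:1": {"type": "Renewal", "due": date.today() + timedelta(days=30)},
    "ren:2": {"type": "Renewal", "due": date.today() + timedelta(days=200)},
    "tkt:1": {"type": "Ticket", "status": "open"},
}
edges = [
    ("cust:acme", "HAS_RENEWAL", "ren:1"),
    ("cust:acme", "RAISED", "tkt:1"),
    ("cust:globex", "HAS_RENEWAL", "ren:2"),
]

def out(src, rel):
    # All targets reachable from src along edges of type rel.
    return [t for s, r, t in edges if s == src and r == rel]

def at_risk(horizon_days=90):
    flagged = []
    for nid, n in nodes.items():
        if n["type"] != "Customer":
            continue
        reasons = []
        for rid in out(nid, "HAS_RENEWAL"):
            if (nodes[rid]["due"] - date.today()).days <= horizon_days:
                reasons.append(f"{nid} -HAS_RENEWAL-> {rid} (due soon)")
        for tid in out(nid, "RAISED"):
            if nodes[tid]["status"] == "open":
                reasons.append(f"{nid} -RAISED-> {tid} (open ticket)")
        if reasons:
            flagged.append((n["name"], reasons))
    return flagged
```

Here `at_risk()` flags Acme with two cited paths and leaves Globex alone; each reason string is a traversal the CDO can drill into.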

Same question. Different architecture. Different answer.

03 / What is a “context graph”, really

A focused subgraph, materialised at decision time.

The full knowledge graph is large. A retail bank might have hundreds of millions of nodes (customers, accounts, transactions, products) and billions of edges. You cannot give the whole thing to an LLM. Even if the context window were big enough, the noise would drown the signal.

A context graph is the focused subgraph that is relevant to a specific question. It is built at query time by traversing from a starting entity (or set of entities) along the relationships the ontology says matter for this kind of question. The output is a small, dense, typed graph that the LLM can actually reason about.
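A minimal sketch of that materialisation step, assuming a toy edge list and an ontology reduced to an allow-list of relationship types (all names here are invented):

```python
from collections import deque

# Toy edge list: (source, relation, target).
edges = [
    ("cust:acme", "OWNS", "acct:1"),
    ("acct:1", "HAS_TXN", "txn:9"),
    ("cust:acme", "NPS_SCORE", "nps:3"),
    ("cust:acme", "MENTIONED_IN", "press:77"),  # not relevant to churn
]

def context_graph(seed, allowed, max_hops=2):
    """Bounded traversal from seed, keeping only ontology-allowed edge types."""
    sub, seen, frontier = [], {seed}, deque([(seed, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for s, r, t in edges:
            if s == node and r in allowed:
                sub.append((s, r, t))
                if t not in seen:
                    seen.add(t)
                    frontier.append((t, depth + 1))
    return sub
```

The hop limit and the allow-list are what keep the result small and dense: the press mention is pruned because the ontology says it does not matter for this kind of question.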

You can think of a context graph as the graph equivalent of the retrieval step in RAG: instead of returning unstructured chunks, it returns structured entities the LLM understands as entities. The LLM then uses its language abilities to compose the answer, rather than its (poor) tabular reasoning to construct one.

04 / Why most teams will hit this wall

Vector RAG works until it does not.

Most enterprise AI teams have built vector RAG over the last 18 months. It works well for the “search-then-summarise” use cases that are genuinely text-heavy: legal precedents, policy documents, meeting transcripts. We are not arguing against it for those.

It hits a wall the moment the question is relational: which customers, which models, which suppliers, which transactions. These are the questions enterprise AI is most often asked. They are not text retrieval problems. They are graph problems. Vector RAG handles them by hallucinating, sometimes politely.

The teams that scale AI past pilot are the teams that recognise this and add a context graph layer. The teams that double down on bigger embeddings, more chunks, better rerankers, are usually the ones still stuck at the pilot stage in 18 months.

05 / How they live together

It is not either-or.

The right architecture for most enterprises uses both. The context graph handles the relational backbone of every answer: which entities, which relationships, which constraints. Vector retrieval handles the long tail of unstructured supporting documents: the policy text, the case notes, the email thread that explains why a particular contract was signed.

The agent fetches the context graph first to know what entities to look at, then optionally enriches with vector chunks for narrative detail. The graph is the skeleton. The vectors are the muscle. Neither alone is enough.
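A sketch of that graph-first, vectors-second flow, with toy stand-ins for both stores (entity ids, facts, and chunk texts are all invented):

```python
# Skeleton: typed facts the graph asserts about each entity.
graph_entities = {
    "cust:acme": ["HAS_RENEWAL ren:1 (due in 30d)", "RAISED tkt:1 (open)"],
}

# Muscle: unstructured chunks a vector store might hold.
doc_chunks = [
    "Acme escalated ticket tkt:1 after a missed SLA in March.",
    "General note on retention tactics.",
]

def answer_context(entity_id, name):
    # 1. Graph first: which facts anchor the answer.
    facts = graph_entities.get(entity_id, [])
    # 2. Then enrich: chunks that mention this entity, for narrative detail.
    # (Substring match stands in for a real vector lookup.)
    supporting = [c for c in doc_chunks if name.lower() in c.lower()]
    return {"entity": entity_id, "facts": facts, "supporting_text": supporting}
```

The ordering is the design choice: the graph decides which entities the answer is about, and the vector store is only ever asked to colour in entities the graph has already named.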

Building this hybrid is most of what we do. The graph is the new foundation. The vectors keep the bits the graph cannot easily express.

A 30-minute conversation

Want to talk through your stack?

Tell us where you are using vector RAG today. We will tell you honestly which queries it handles well and which need a context graph layer.

Book a meeting →