The Question Your RAG Can't Answer

A proof of concept to explore Context Graph, a pattern for giving AI agents structured, curated, write-back memory.

I've been thinking about what it would take for an AI agent to have real memory — not just recalled text, but structured knowledge it can traverse, update, and reason over with confidence.

To explore the idea, I built a proof of concept called Context Graph. The premise: instead of storing knowledge as documents or extracting a graph from text, you curate a graph directly as your source of truth. Typed edges. Temporal metadata. And critically, agents can write decisions back into it.

To make the comparison concrete, I ran the same question against three architectures: flat RAG, GraphRAG, and the Context Graph PoC. The question:

The Atlas project is at risk. Who has both ML and DevOps skills, isn't already assigned to Atlas, and who is their manager?

Four constraints. One question. Here's where each approach lands.


The Demo

Eight documents. Real employees, real projects, real skill profiles ... the kind of thing that lives in Confluence, or scattered across HR tools. The same corpus feeds all three approaches.

The demo app showing all 8 corpus documents and the knowledge graph
Three tabs, one question, all three fire in parallel. The knowledge graph on the left is the Context Graph. Curated, not extracted.

Flat RAG: Isolated Chunks, No Relationships

Flat RAG pulls the four most keyword-relevant chunks. For this question, that's:

  • the Atlas requirements doc,
  • Frank's skills profile,
  • Grace's record, and
  • Alice's overview.

Document 6, the one that establishes Frank's actual assignment to Phoenix, doesn't score high enough on keyword overlap to make the cut.
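That failure mode is easy to reproduce. Here's a minimal sketch of top-k keyword retrieval, assuming a toy corpus with invented chunk text (not the actual demo documents): the assignment doc shares almost no tokens with the question, so it never makes the top four.

```python
# Hypothetical sketch of flat-RAG retrieval: score each chunk by raw
# keyword overlap with the question, keep the top 4. Chunk contents
# are illustrative stand-ins, not the real demo corpus.
QUESTION = "atlas risk ml devops skills assigned manager"

CHUNKS = {
    "doc1_atlas_requirements": "atlas project requires ml and devops skills",
    "doc2_frank_skills":       "frank has ml and devops skills",
    "doc3_grace_record":       "grace has ml skills and is assigned to atlas",
    "doc4_alice_overview":     "alice leads engineering and oversees atlas staffing",
    "doc6_frank_assignment":   "frank is currently on the phoenix team",
}

def keyword_score(question: str, chunk: str) -> int:
    # Naive bag-of-words overlap: count shared tokens.
    return len(set(question.split()) & set(chunk.split()))

top4 = sorted(CHUNKS, key=lambda d: keyword_score(QUESTION, CHUNKS[d]),
              reverse=True)[:4]
print(top4)  # doc6 scores zero: "phoenix team" shares no tokens with the question
```

The retriever isn't wrong by its own rules; the chunk that matters is simply invisible to lexical similarity.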

Flat RAG LLM answer: 'there is no one who can cover both ML and DevOps needs who isn't already assigned to Atlas'
The model concludes there is no one, because it never saw the document placing Frank on Phoenix, not Atlas.

It missed Frank entirely. The manager question gets a hard no: "No managers are identified in the provided context."

The information exists in the corpus. It just arrived as isolated sentences with no relationship between them. Flat RAG is a bag of sentences. Bags don't have edges.


GraphRAG: Better, but Still Inherits Your Blind Spots

GraphRAG extracts a graph from the text corpus using an LLM. It identifies entities and relationships, builds a knowledge graph, then traverses it at query time. An improvement over raw retrieval.

Graph RAG extracted graph showing 13 nodes and 16 edges, with three gap badges highlighting missing relationship types
13 nodes, 16 edges. Skills and assignments extracted correctly. But the gap badges show what couldn't be captured.

The traversal finds Frank. But then:

Graph RAG answer identifying Frank as the candidate but showing Manager as Unknown with a red X
Candidate found. Manager unknown. The model suggests checking HR systems or org charts.

No REPORTS_TO edges. Document 5 says "Alice manages the engineering team" ... too vague for a reliable extraction. An extractor shouldn't confidently produce Frank -[REPORTS_TO]-> Alice from that sentence. So it doesn't.

GraphRAG inherits every omission in your source documents. And it can only read: when the agent makes a decision, that decision disappears.


Context Graph: Curated, Temporal, Write-Back

This is the pattern I wanted to explore. A Context Graph isn't extracted from text. It is maintained as the source of truth. Relationships are written explicitly, at the time they become true, with the metadata that actually matters to decisions.

(Frank)-[:HAS_SKILL {since: "2019-06", confidence: 0.98, endorsedBy: "Alice"}]->(DevOps)
(Frank)-[:REPORTS_TO]->(Alice)

The traversal can now follow a verified 4-hop path:

Context Graph tab showing the Cypher query, verified traversal path from Atlas through required skills to Frank to Alice, and the Decision Trace panel
Atlas → REQUIRES → ML ← HAS_SKILL ← Frank → REPORTS_TO → Alice. Every hop is a confirmed database relationship.
Context Graph LLM answer: 'Frank, reporting to Alice (VP Engineering), is the clear and only candidate.'
Complete answer. Manager named. Coordination path identified. And the decision just wrote itself back into the graph.
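To make the traversal logic concrete outside of a database, here's the same four-constraint check as plain Python. The node names follow the article; the extra edges are an illustrative stand-in for the curated graph, not a dump of the real one.

```python
# Minimal in-memory stand-in for the curated graph.
# Each edge is (source, type, target).
EDGES = [
    ("Atlas", "REQUIRES", "ML"),
    ("Atlas", "REQUIRES", "DevOps"),
    ("Frank", "HAS_SKILL", "ML"),
    ("Frank", "HAS_SKILL", "DevOps"),
    ("Frank", "ASSIGNED_TO", "Phoenix"),
    ("Frank", "REPORTS_TO", "Alice"),
    ("Grace", "HAS_SKILL", "ML"),
    ("Grace", "ASSIGNED_TO", "Atlas"),
]

def out(node, rel):
    """All targets reachable from `node` over edges of type `rel`."""
    return {t for s, r, t in EDGES if s == node and r == rel}

def candidates(project):
    required = out(project, "REQUIRES")
    people = {s for s, r, _ in EDGES if r == "HAS_SKILL"}
    results = []
    for p in sorted(people):
        has_all = required <= out(p, "HAS_SKILL")      # both ML and DevOps
        free = project not in out(p, "ASSIGNED_TO")    # not already on Atlas
        if has_all and free:
            results.append((p, out(p, "REPORTS_TO")))  # include the manager
    return results

print(candidates("Atlas"))  # [('Frank', {'Alice'})]
```

Every constraint resolves to a set operation over explicit edges; nothing is inferred from prose, which is why the answer comes back with a manager attached.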

The Part That Interests Me Most: Write-Back

Notice the Decision Trace panel. The agent's recommendation is now a first-class node in the graph: an AgentDecision with typed edges to Frank, to Atlas, with a timestamp and a status.

You can query it. You can update it when Frank actually moves. You can see that the same question was asked a year ago, resolved to Grace, and its outcome is recorded. That history becomes part of the context for every future query.
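The write-back step itself is small. A sketch, assuming an in-memory stand-in for the graph store (the AgentDecision node type and the RECOMMENDS / FOR_PROJECT edge names follow the article; the storage format is invented for illustration):

```python
# Sketch of write-back: the agent's recommendation becomes a node with
# typed edges, a timestamp, and a status -- queryable like anything else.
from datetime import datetime, timezone

nodes = {}
edges = []  # (source, type, target)

def write_back_decision(person: str, project: str) -> str:
    decision_id = f"decision-{len(nodes) + 1}"
    nodes[decision_id] = {
        "type": "AgentDecision",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "status": "proposed",
    }
    edges.append((decision_id, "RECOMMENDS", person))
    edges.append((decision_id, "FOR_PROJECT", project))
    return decision_id

d = write_back_decision("Frank", "Atlas")

# Past recommendations are now ordinary graph data:
past = [(s, t) for s, r, t in edges if r == "RECOMMENDS"]
print(nodes[d]["status"], past)  # proposed [('decision-1', 'Frank')]
```

The status field is what makes the history useful: flip it to "enacted" or "reversed" when the real world catches up, and the next query sees the outcome, not just the suggestion.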

This is what auditable AI could look like in practice. Not just what did the model say but what was recommended, to whom, when, based on what, and what happened next.


What This PoC Shows

Capability                   Flat RAG   GraphRAG               Context Graph
Finds the right candidate    ✗          ✓                      ✓
Knows the manager            ✗          ✗                      ✓
Temporal skill metadata      ✗          ✗                      ✓
Agent decision write-back    ✗          ✗                      ✓
Graph quality                N/A        Only as good as docs   Source of truth

The PoC is deliberately narrow: a staffing scenario, a small corpus, a single domain. But the pattern generalizes. Anywhere an agent needs to reason over relationships, not just retrieve text, a curated graph gives it a fundamentally better foundation to work from.


The Honest Trade-Off

Context Graph requires more upfront investment than either RAG approach. Someone has to write the edges. That's real work, and whether it's worth it depends on what's at stake.

But for domains where AI is making consequential decisions — staffing, routing, resource allocation, diagnosis — the alternative is an agent that confidently guesses a manager's name or quietly omits that the same decision was made and reversed six months ago.

The interesting thing to watch in the demo isn't the answers. It's the knowledge graph panel after the Context Graph query runs. A new Decision node appears in real time, edges radiating out to Frank, to Atlas, anchoring the recommendation in the structure of everything else the system knows.

That's the direction I think agent memory is going ... or needs to go.