Most retrieval systems fail quietly. The agent answers with confidence, but the source library had a better passage three sections down. The paragraph existed. The model never saw it.

That is usually not a model problem. It is a retrieval problem.

Context Repo's Deep Search is built around a simple belief: **long documents are not flat text.** They have titles, headings, sections, subsections, tables, paragraphs, and order. If retrieval ignores that structure, it loses the signals a human would naturally use to find the answer. If retrieval preserves that structure, an AI agent can land on a relevant passage, read the parent section, move to neighboring passages, and gather evidence without loading the entire document.

This article explains the public shape of that system: what hierarchy-aware retrieval is, why it recalls passages that cheaper systems miss, and why it costs more to run.

## What "hierarchical retrieval" means

In Context Repo, a processed document becomes a searchable hierarchy. At the product level, the shape is:

- **Document**: the root artifact.
- **Section**: a heading-delimited region of the document.
- **Paragraph**: the smaller passage-level unit an agent usually needs for an answer.

Each searchable chunk keeps its text plus its place in the tree: document, level, parent, sibling position, and section path. `deep_search` searches inside those chunks. `deep_read` retrieves one chunk in full. `deep_expand` moves through the tree: up to the parent, down to children, sideways to previous or next siblings, or around the match with surrounding context.
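To make that concrete, here is a minimal sketch of what one chunk record could look like. The field names (`chunk_id`, `sibling_index`, `section_path`, and so on) and the sample document are assumptions for illustration, not Context Repo's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Chunk:
    """One searchable unit plus its position in the document tree.

    Hypothetical shape: the article names the concepts, not the fields.
    """
    chunk_id: str
    document_id: str
    level: str                      # "document", "section", or "paragraph"
    parent_id: Optional[str]        # None for the document root
    sibling_index: int              # position among chunks sharing a parent
    section_path: tuple[str, ...]   # headings from the root down to this chunk
    text: str


def describe(chunk: Chunk) -> str:
    """Render where a chunk lives, for example as a citation line."""
    return f"{' > '.join(chunk.section_path)} [{chunk.level} #{chunk.sibling_index}]"


# Invented sample data.
paragraph = Chunk(
    chunk_id="c2",
    document_id="doc-1",
    level="paragraph",
    parent_id="c1",
    sibling_index=0,
    section_path=("Security Guide", "Audit Logs"),
    text="Retention: 90 days",
)
print(describe(paragraph))
```

The text stays small, while the record still carries enough to answer "where did this come from?" without another lookup.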

```mermaid
flowchart TB
  Q(("Agent question"))
  I["Document ingest"]
  H["Document hierarchy<br/>document, section, paragraph"]
  V["Vector index<br/>with structure-aware chunks"]
  S["deep_search<br/>ranked passage hits"]
  R["deep_read<br/>inspect one chunk"]
  E["deep_expand<br/>up, down, next,<br/>previous, surrounding"]

  Q --> S
  I --> H
  H --> V
  V --> S
  S --> R
  S --> E
  E --> R

  classDef stage fill:#27272a,stroke:#52525b,stroke-width:1px,color:#fafafa,rx:10,ry:10
  classDef action fill:#18181b,stroke:#06b6d4,stroke-width:1.5px,color:#fafafa,rx:10,ry:10
  class I,H,V stage
  class S,R,E action
  style Q fill:#3b82f6,stroke:#60a5fa,stroke-width:2.5px,color:#ffffff,font-weight:600
  linkStyle default stroke:#52525b,stroke-width:1.5px
```

The important part is not that the system has more chunks. More chunks alone can make retrieval worse by scattering context. The important part is that the chunks retain enough structure for an agent to answer three follow-up questions:

1. Where did this passage come from?
2. What larger section does it belong to?
3. What should I read next if this passage is close but incomplete?

Flat retrieval systems usually answer only the first question, and sometimes not even that well.

## Why flat retrieval misses real answers

Cheaper retrieval models are useful. Context Repo uses simpler retrieval surfaces too, because not every query deserves the expensive path. The problem is that cheap models have predictable blind spots.

### Keyword search

Keyword search is excellent when the query is exact: an error code, a function name, a customer name, a policy ID. It is cheap, fast, and deterministic.

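It fails when the answer uses different words than the question. A user asks "How long are audit logs retained?" and the document says "Retention: 90 days" under an "Audit Logs" heading. The line that holds the answer shares no literal term with the question: "retained" matches "Retention" only if the engine stems words, and "audit logs" appears in the heading rather than in the passage. A human would find the answer immediately.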
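A small sketch shows the gap. It assumes a deliberately naive matcher: lowercase word tokens, no stemming, no synonyms. Real keyword engines are usually smarter, so treat this as the worst case rather than a description of any particular product.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word tokens with no stemming, as a naive keyword matcher sees them."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


query = "How long are audit logs retained?"
passage = "Retention: 90 days"
heading = "Audit Logs"

# Terms the query shares with the passage that holds the answer.
overlap_with_passage = tokens(query) & tokens(passage)
# Terms the query shares with the heading above that passage.
overlap_with_heading = tokens(query) & tokens(heading)

print(sorted(overlap_with_passage))
print(sorted(overlap_with_heading))
```

The passage with the answer shares nothing with the query. The only literal overlap sits in the heading, which a matcher that indexes passages alone never sees.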

### One vector per document

One-vector-per-document search is cheap to store and simple to query. It can tell you which document is generally about a topic.

It cannot tell you which passage answers the question. For long documents, the embedding becomes an average of many topics, so a query about any one of them matches only weakly. Even after the document is found, the agent still has to read too much text.
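The averaging effect can be illustrated with toy numbers. The three-dimensional vectors below are hand-made stand-ins, not real embeddings, and taking the mean of passage vectors is an assumption about how a single document vector might behave, not a claim about any specific model.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy passage vectors: each passage covers one unrelated topic.
passages = {
    "audit logs": [1.0, 0.0, 0.0],
    "billing": [0.0, 1.0, 0.0],
    "sso setup": [0.0, 0.0, 1.0],
}

# One vector for the whole document, modeled here as the mean of its passages.
doc_vector = [sum(column) / len(passages) for column in zip(*passages.values())]

query = [1.0, 0.0, 0.0]  # a question purely about audit logs
doc_score = cosine(query, doc_vector)
passage_score = cosine(query, passages["audit logs"])

print(round(doc_score, 3), round(passage_score, 3))
```

In this toy setup the passage matches the query perfectly, while the document-level vector scores noticeably lower. The more topics a document covers, the further that score drops.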

### Flat chunk vector search

Flat chunk search is the common RAG baseline. Split a document into chunks, embed each chunk, return top-K nearest chunks.

That works until the chunk needs the heading above it. Tables, parameter lists, short paragraphs, and repeated boilerplate often do not carry enough local meaning by themselves. A flat chunk may contain "90 days" but not "audit logs." A flat chunk may contain "required" but not the feature or policy it refers to. The chunk is relevant only because of where it lives.

### Top-K only retrieval

Top-K retrieval treats the first result list as the whole answer. If the best chunk is incomplete, the agent has to guess or rerun the search. If several chunks from the same section match, the list can waste slots on near-duplicates. If the right answer is one sibling away, the system has no native way to move there.

Context Repo's hierarchy-aware retrieval is designed around those failure modes.

## Why hierarchy improves recall

Recall is the ability to find the relevant evidence when it exists. For an AI context repository, recall matters more than a pretty search results page. If the agent never sees the right passage, every later step is downstream of a miss.

Hierarchy improves recall in four practical ways.

### 1. Passages carry their document context

The system does not treat a paragraph as an orphan. A chunk keeps enough structural context for retrieval to understand what part of the document it belongs to. That matters most for thin passages:

- A table row whose body text is mostly numbers.
- A short list item under a highly specific heading.
- A paragraph that starts with "This applies when..." and depends on the section title.
- A repeated policy clause that only becomes unique because of its location.

The stored passage remains the passage. The retrieval layer uses its place in the document to make it easier to find.
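One common way to get this effect is to prepend the heading path to the text that gets embedded, while storing the passage unchanged. The sketch below assumes that technique. It is not a description of how Context Repo builds its embeddings, and `embedding_input` is a hypothetical helper.

```python
def embedding_input(section_path: list[str], passage: str) -> str:
    """Build the text handed to an embedding model (assumed technique).

    The heading path is added only to the embedding input; the stored
    passage stays exactly as it appeared in the document.
    """
    breadcrumb = " > ".join(section_path)
    return f"{breadcrumb}\n\n{passage}" if breadcrumb else passage


stored_text = "Retention: 90 days"
text_to_embed = embedding_input(["Security Guide", "Audit Logs"], stored_text)

print(text_to_embed)
```

The embedded text now mentions audit logs even though the stored passage does not, which is what lets a thin passage match a question about its section.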

### 2. Search can land at the right level

Some questions want a paragraph. Some questions want a section. Some questions are broad enough that the section heading is the best initial landing point.

A hierarchy gives retrieval multiple useful landing zones. If a section and a paragraph both match, the agent can prefer the more specific passage, then expand upward when the larger section is needed. That is better than forcing every answer through one fixed chunk size.

### 3. Navigation recovers nearby evidence

The first hit does not need to be perfect. It needs to be a good starting point.

Once an agent has a chunk ID, it can move:

- **Up** to read the parent section.
- **Down** to read the child passages under a matched section.
- **Next** or **previous** to continue a sequence.
- **Surrounding** to inspect nearby passages without rerunning search.

This is how humans read a structured document. We find a relevant paragraph, glance at the heading, read the paragraph before it, and sometimes scan the rest of the section. Hierarchical retrieval gives the agent the same moves.
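The moves above can be sketched against a small in-memory tree. The `Node` class, the `expand` function, and its `direction` and `window` parameters are illustrative assumptions. They mirror the moves the article names, not the real `deep_expand` signature.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # compare nodes by identity, not by field values
class Node:
    chunk_id: str
    text: str
    parent: Optional["Node"] = None
    children: list["Node"] = field(default_factory=list)

    def add(self, chunk_id: str, text: str) -> "Node":
        """Append a child in document order and return it."""
        child = Node(chunk_id, text, parent=self)
        self.children.append(child)
        return child


def expand(node: Node, direction: str, window: int = 1) -> list[Node]:
    """Return the chunks reached by one navigation move from `node`."""
    if direction == "up":
        return [node.parent] if node.parent is not None else []
    if direction == "down":
        return list(node.children)
    siblings = node.parent.children if node.parent is not None else [node]
    i = siblings.index(node)
    if direction == "next":
        return siblings[i + 1 : i + 2]
    if direction == "previous":
        return siblings[max(0, i - 1) : i]
    if direction == "surrounding":
        # The match itself plus up to `window` siblings on each side.
        return siblings[max(0, i - window) : i + window + 1]
    raise ValueError(f"unknown direction: {direction}")


# Invented sample section with three paragraphs.
section = Node("s1", "Audit Logs")
p1 = section.add("p1", "Audit logs record admin actions.")
p2 = section.add("p2", "Retention: 90 days")
p3 = section.add("p3", "Exports are available as CSV.")

print([n.chunk_id for n in expand(p2, "surrounding")])
```

Every move is a cheap lookup from a known chunk, so the agent recovers nearby evidence without paying for another search.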

### 4. Repeated searches do not have to repeat themselves

Agentic retrieval is iterative. An agent often searches, reads, asks a narrower follow-up, and searches again.

Deep Search supports session-based deduplication so later calls can avoid re-returning chunks the agent already saw. That keeps exploration moving outward instead of circling the same top result.
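A minimal sketch of the idea, assuming the session simply remembers which chunk IDs it has already returned. The `SearchSession` class and its `filter_new` method are hypothetical names, and the real mechanism may differ.

```python
class SearchSession:
    """Tracks chunk IDs already shown to the agent during one session."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def filter_new(self, ranked_ids: list[str], limit: int) -> list[str]:
        """Return up to `limit` unseen IDs, preserving rank order."""
        fresh: list[str] = []
        for chunk_id in ranked_ids:
            if len(fresh) >= limit:
                break
            if chunk_id in self._seen:
                continue
            self._seen.add(chunk_id)
            fresh.append(chunk_id)
        return fresh


session = SearchSession()
first = session.filter_new(["c2", "c5", "c9"], limit=2)
# A narrower follow-up query ranks some of the same chunks again.
second = session.filter_new(["c2", "c9", "c5", "c7"], limit=2)

print(first, second)
```

The second call skips the chunks the agent already read and fills its slots with new material.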

## Why it costs more

Hierarchy-aware retrieval is not a free lunch. It is more expensive than simpler retrieval models in four places.

### Ingestion cost

A cheap model can store one document row and maybe one embedding. Deep Search has to process the document into a hierarchy, preserve ordering, create passage-level records, and generate embeddings for many chunks. Large documents produce many searchable units.

Every content update has to rebuild the relevant retrieval structure. That is background work users do not see, but the system still pays for it.

### Storage cost

Each chunk stores content, metadata, hierarchy links, and a vector. The vector alone holds 1,536 numbers. Multiply that by every section and paragraph chunk across a document library, and storage grows quickly compared with one-vector-per-document search.

That extra storage is not decorative. It is what lets the agent retrieve a precise passage, cite where it came from, and navigate from it.
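A back-of-the-envelope calculation shows the scale. Only the 1536 dimensions come from this article. The 4 bytes per number (float32), the 1,000 documents, and the 200 chunks per document are assumptions chosen to make the arithmetic easy, and the figures cover raw vectors only, not index overhead or metadata.

```python
DIMENSIONS = 1536
BYTES_PER_NUMBER = 4  # assumption: vectors stored as float32


def vector_bytes(num_vectors: int) -> int:
    """Raw bytes needed to store `num_vectors` embeddings."""
    return num_vectors * DIMENSIONS * BYTES_PER_NUMBER


documents = 1_000            # assumed library size
chunks_per_document = 200    # assumed average for long documents

one_vector_per_document = vector_bytes(documents)
one_vector_per_chunk = vector_bytes(documents * chunks_per_document)

print(f"per document: {one_vector_per_document / 2**20:.2f} MiB")
print(f"per chunk:    {one_vector_per_chunk / 2**30:.2f} GiB")
```

Under these assumptions the vector footprint grows by exactly the number of chunks per document, from a few mebibytes to more than a gibibyte.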

### Query cost

At query time, the system embeds the query, searches the vector index, gathers candidate chunks, applies the caller's scope, handles result quality, removes redundant parent-child hits, enforces limits, and returns navigation metadata. If the agent then expands context, that is another read.

A keyword search can be one indexed lookup. A hierarchy-aware search is a retrieval workflow.
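Part of that workflow can be sketched as post-processing on raw vector hits: apply scope, drop redundant parent-child pairs, enforce the limit. The `Hit` shape and the `postprocess` function are hypothetical, and the rule used here (when a parent and its child both match, keep whichever scores higher) is an assumed policy, not Context Repo's documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Hit:
    chunk_id: str
    document_id: str
    parent_id: Optional[str]
    score: float


def postprocess(hits: list[Hit], allowed_documents: set[str], limit: int) -> list[Hit]:
    """Scope, de-duplicate parent-child pairs, and truncate raw hits."""
    scoped = [h for h in hits if h.document_id in allowed_documents]
    ranked = sorted(scoped, key=lambda h: h.score, reverse=True)
    kept: list[Hit] = []
    for hit in ranked:
        if len(kept) >= limit:
            break
        # Redundant if a kept hit is this hit's direct parent or direct child.
        redundant = any(
            hit.parent_id == k.chunk_id or k.parent_id == hit.chunk_id
            for k in kept
        )
        if not redundant:
            kept.append(hit)
    return kept


# Invented sample hits.
raw_hits = [
    Hit("s1", "doc-1", None, 0.80),   # section
    Hit("p2", "doc-1", "s1", 0.91),   # paragraph inside s1
    Hit("p7", "doc-2", "s4", 0.88),   # outside the caller's scope
    Hit("p9", "doc-1", "s3", 0.75),
]
results = postprocess(raw_hits, allowed_documents={"doc-1"}, limit=2)

print([h.chunk_id for h in results])
```

The section `s1` is dropped because its higher-scoring child `p2` already covers it, which frees a slot for a hit from a different part of the document.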

### Engineering cost

The hardest cost is not the embedding bill. It is keeping the retrieval contract reliable:

- Chunk IDs must remain useful to agents.
- Parent and sibling navigation must stay in document order.
- Large documents must process without blocking the app.
- Search must stay scoped to the caller's documents and collections.
- Result shapes must be stable across the dashboard, REST, MCP, and ChatGPT App surfaces.

This is why Context Repo treats Deep Search as a product layer, not just "a vector database with docs in it."

## The trade-off table

| Retrieval model | Relative cost | Main failure mode | Where Context Repo's hierarchy helps |
|---|---:|---|---|
| Keyword search | Very low | Misses semantic matches and wording changes | Keep it for exact phrases, use Deep Search for meaning |
| One vector per document | Low | Finds the document, not the answer | Return passage-level hits with source position |
| Flat chunk vector search | Medium | Loses heading and section context | Preserve the chunk's place in the document |
| Top-K only retrieval | Medium | Stops at the first list of hits | Let agents read, expand, and continue |
| Hierarchical retrieval | Higher | More ingest, storage, and query work | Better recall on long, structured documents |

The point is not that the expensive model should replace every cheaper model. The point is that each model should be used for the job it is good at.

Context Repo exposes both retrieval layers for that reason. `find_items` helps an agent find the right prompt, document, or collection. `deep_search` helps the agent search inside document content. The companion article [Semantic Search and Deep Search: Two Retrieval Layers](/resources/semantic-search-and-deep-search-two-retrieval-layers) walks through that split in detail.

## Why the cost is worth paying

The cost is worth paying when the corpus has answers that are easy for a human to find and easy for a flat system to miss.

That usually means:

- Long PDFs, manuals, research docs, specs, or knowledge-base articles.
- Documents with headings, subsections, tables, and lists.
- Questions that need a cited passage, not just a relevant document title.
- Agents that need to verify, compare, or expand evidence before answering.
- Workflows where loading the whole document into the model is too slow, too expensive, or too noisy.

In those settings, cheaper retrieval often saves pennies during ingest and spends dollars during reasoning. The model burns context on irrelevant text, misses the answer, or asks for another search. Hierarchical retrieval pays more up front so the agent gets a better starting point and a cleaner path to the surrounding evidence.

## What this gives an AI agent

For an agent, a Deep Search result is not just a snippet. It is a handle into the document.

The result says: here is the passage, here is the document, here is the level, here is the parent, here are the siblings, and here is the score. That shape lets the agent behave less like a search box and more like a reader:

1. Search for likely evidence.
2. Read the best chunk.
3. Expand to the section or neighboring passages.
4. Decide whether the evidence is enough.
5. Cite the exact chunks it used, or say the repository does not establish the answer.

That last part matters. Better retrieval is not just about finding more. It is about giving the agent enough structure to know when it has found enough.
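The five steps can be sketched as a small loop. The function `gather_evidence` and the callables it takes (`search`, `read`, `expand`, `is_enough`) are hypothetical stand-ins for whatever tools and judgment the agent actually has. They are not the signatures of `deep_search`, `deep_read`, or `deep_expand`.

```python
from typing import Callable, Optional


def gather_evidence(
    question: str,
    search: Callable[[str], list[str]],
    read: Callable[[str], str],
    expand: Callable[[str], list[str]],
    is_enough: Callable[[str, list[str]], bool],
) -> Optional[list[str]]:
    """Return the chunk IDs to cite, or None if the evidence never suffices."""
    evidence: dict[str, str] = {}
    for chunk_id in search(question):                      # 1. search
        if chunk_id not in evidence:
            evidence[chunk_id] = read(chunk_id)            # 2. read the hit
        for neighbor_id in expand(chunk_id):               # 3. expand nearby
            if neighbor_id not in evidence:
                evidence[neighbor_id] = read(neighbor_id)
        if is_enough(question, list(evidence.values())):   # 4. decide
            return list(evidence)                          # 5. cite
    return None  # the repository does not establish the answer


# Invented in-memory stand-ins for the retrieval tools.
texts = {
    "p1": "Audit logs record admin actions.",
    "p2": "Retention: 90 days",
    "p3": "Exports are available as CSV.",
}
neighbors = {"p1": ["p2"], "p2": ["p1", "p3"], "p3": ["p2"]}

cited = gather_evidence(
    "How long are audit logs retained?",
    search=lambda q: ["p1"],
    read=lambda chunk_id: texts[chunk_id],
    expand=lambda chunk_id: neighbors[chunk_id],
    is_enough=lambda q, passages: any("90 days" in p for p in passages),
)
print(cited)
```

Here the first hit is close but incomplete, and one expansion reaches the sibling that holds the answer. When `is_enough` never returns true, the function returns `None` instead of a guess.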

## What users can depend on

The retrieval contract is simple:

- Documents are processed into hierarchical chunks.
- Chunks are embedded for semantic search.
- Results include structure, not just text.
- Agents can inspect and expand from any result.
- The system spends more work on ingest and retrieval to improve recall inside long documents.

That is the part that matters in practice. Context Repo gives an agent a better starting point than a flat hit list, then gives it the tools to keep reading only where the evidence leads.

## Where to read next

- [Semantic Search and Deep Search: Two Retrieval Layers](/resources/semantic-search-and-deep-search-two-retrieval-layers). How `find_items`, `deep_search`, `deep_read`, and `deep_expand` fit together.
- [Prompt and Document Management for AI Agents](/resources/prompt-and-document-management-for-ai-agents). How documents are ingested, versioned, and exposed to agents.
- [How MCP Servers Connect AI Agents to Knowledge Bases](/resources/how-mcp-servers-connect-ai-agents-to-knowledge-bases). How these retrieval tools reach Claude, Cursor, ChatGPT, and other MCP clients.
- [What Is an AI Context Repo for Agents?](/resources/what-is-an-ai-context-repo-for-agents). The category framing for prompts, documents, collections, and retrieval.
- [MCP Server install page](/mcp-server). Connect your client and try Deep Search against your own documents.
