You wrote the perfect prompt in Cursor. Two days later you need it in Claude, and your AI does not know where to look. You drop a 200-page PDF into ChatGPT, ask a single question, and the model spends its whole context window re-reading the whole document. Both of these are retrieval problems. They are not the same retrieval problem.
Context Repo is a context repository for AI agents and humans. Inside it, retrieval splits into two surfaces on purpose: one for finding the right artifact, one for finding the right passage inside that artifact. Under the hood both run on the same Convex vector index, using OpenAI text-embedding-3-small for embeddings, but they answer different questions and they return different shapes.
This article walks through both surfaces, when each one earns its place, and the small set of facts (1536-dim vectors, a 256-result vector-store cap, a document-only chunk index) that shape what they can do.
What is the difference between find_items and deep_search?
Every retrieval question in a context repository falls into one of two shapes:
- "Which artifact is relevant here?" This is catalog-level. The unit of return is a prompt, a document, or a collection. An agent uses it when it knows the topic it is after but does not know which item in your repository holds the answer.
- "What does this artifact say about X?" This is content-level. The unit of return is a chunk inside one document. The agent uses it once it has identified the right document (or a small candidate set) and needs the specific passage.
find_items answers the first question. deep_search answers the second. They share an embedding model and an index, but they are separate tools with separate inputs because they are separate jobs.
How does find_items search the catalog?
find_items takes a natural-language query and returns matches across prompts, documents, and collections. The MCP tool signature, simplified:

    find_items({
      query: string,
      type?: 'prompts' | 'documents' | 'collections' | 'all',
      semantic?: boolean, // default: true
    })

Returns
Each result is item-level: title, ID, type, similarity score, and a short highlight (the first ~150 characters of matching content with the query terms bolded). The agent uses the list to decide which artifact to read in full. Of the two surfaces, find_items is the only one that returns prompts in semantic results; deep_search operates on document chunks only.
When to reach for it
Two specific reasons to use this surface:
- The agent does not know the title yet. "Find the document about HIPAA compliance" can match a document called "Compliance Notes 2026" because the body mentions HIPAA. Semantic search bridges the title-to-content gap.
- The agent wants to narrow results to prompts, documents, or collections. Pass type: 'prompts' to retrieve only matching templates. Pass type: 'documents' to retrieve only matching reference material. Pass 'all' (the default) when you want the full catalog.
Set semantic: false when you remember the exact phrase. Literal substring match across title, description, and the first ~4 KiB of each document's content preview is faster, has zero embedding cost, and beats vector search when the query is a token the document literally contains ("error code E2049" does not need a vector index).
The matching REST surface is GET /v1/search with the same q, type, and semantic query parameters; the MCP tool is a thin wrapper over it.
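As a rough illustration of how the find_items arguments could map onto that query string, here is a minimal TypeScript sketch. The base URL is an assumption (the text only names the /v1/search path and its q, type, and semantic parameters), and authentication is left out because it is not covered here.

```typescript
// Assumed base URL for illustration; only the /v1/search path comes from the text.
const ASSUMED_BASE_URL = "https://contextrepo.com";

type ItemType = "prompts" | "documents" | "collections" | "all";

interface FindItemsArgs {
  query: string;
  type?: ItemType;
  semantic?: boolean;
}

// Translate find_items arguments into the q, type, and semantic query parameters.
// Optional arguments are omitted so the server-side defaults apply.
function buildSearchUrl(args: FindItemsArgs, baseUrl: string = ASSUMED_BASE_URL): string {
  const parts: string[] = [`q=${encodeURIComponent(args.query)}`];
  if (args.type !== undefined) parts.push(`type=${args.type}`);
  if (args.semantic !== undefined) parts.push(`semantic=${String(args.semantic)}`);
  return `${baseUrl}/v1/search?${parts.join("&")}`;
}

// Literal lookup for an exact token, skipping the embedding step.
const literalUrl = buildSearchUrl({ query: "error code E2049", semantic: false });
```

Leaving out type and semantic keeps the request short and lets the documented defaults ('all' and true) apply.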
How does deep_search read inside documents?
deep_search is what makes Context Repo a real retrieval system, not a search bar with a vector index bolted on. Signature:

    deep_search({
      query: string,
      documentId?: string, // search inside one document
      collectionId?: string, // search inside one collection's documents
      limit?: number, // server default: 10
      sessionId?: string, // dedup across paginated reads
    })

Scope: documents only. Prompts are stored as single embeddings in a separate index and are reachable only through find_items or search_prompts. That is a deliberate split: a prompt is a unit you read whole; a document is a unit you navigate.
Returns
Each match is chunk-level. The fields on every result:
- A stable chunkId.
- The chunk's content (the text).
- A level: one of "document", "section", or "paragraph".
- A chunkIndex for sibling ordering under the same parent.
- A parentId (or null at the root).
- A similarity score.
The chunks are pieces of the document hierarchy Context Repo builds at ingest time. A textbook chapter becomes a section chunk; the section under that becomes a deeper section; the paragraph under that becomes a paragraph chunk. The structure is always parent-pointer. (Once an agent has a chunkId, calling deep_read returns the same chunk with richer navigation metadata: sectionPath, prevSiblingId, nextSiblingId, headingText, and wordCount.)
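To make the parent-pointer structure concrete, here is a small TypeScript sketch of the chunk shape and a walk from a matched paragraph up to the document root. The field names follow the result fields listed above; the Chunk type itself, the in-memory lookup, and the sample IDs are assumptions for illustration, not Context Repo's storage model.

```typescript
type ChunkLevel = "document" | "section" | "paragraph";

interface Chunk {
  chunkId: string;
  content: string;
  level: ChunkLevel;
  chunkIndex: number;      // order among siblings under the same parent
  parentId: string | null; // null at the root
}

// Follow parentId pointers from a chunk to the root; the result is root-first.
function ancestry(chunkId: string, byId: Map<string, Chunk>): Chunk[] {
  const path: Chunk[] = [];
  const visited = new Set<string>(); // guards against malformed (cyclic) data
  let current = byId.get(chunkId);
  while (current !== undefined && !visited.has(current.chunkId)) {
    visited.add(current.chunkId);
    path.unshift(current);
    current = current.parentId === null ? undefined : byId.get(current.parentId);
  }
  return path;
}

// Hypothetical three-level document.
const sampleChunks: Chunk[] = [
  { chunkId: "doc", content: "HIPAA Policy", level: "document", chunkIndex: 0, parentId: null },
  { chunkId: "sec-3", content: "3. Audit logs", level: "section", chunkIndex: 2, parentId: "doc" },
  { chunkId: "p-1", content: "Logs are retained.", level: "paragraph", chunkIndex: 0, parentId: "sec-3" },
];
const sampleById = new Map<string, Chunk>();
for (const c of sampleChunks) sampleById.set(c.chunkId, c);

const trail = ancestry("p-1", sampleById).map((c) => c.level).join(" > ");
```

Because every chunk stores only its parent, reconstructing the path costs one lookup per level and never requires loading the whole document.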
When to reach for it
This is the right shape for retrieval-augmented generation (RAG) inside large documents. Instead of dropping a 50,000-token PDF into the model's context, the agent runs deep_search, picks the top three chunks by score, reads them, and decides whether it needs more.
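The selection step described here can be sketched in a few lines of TypeScript. The ScoredChunk type and the helper names are made up for illustration; only the idea of ranking by score and keeping the top three comes from the text.

```typescript
interface ScoredChunk {
  chunkId: string;
  content: string;
  score: number;
}

// Keep the k highest-scoring chunks without mutating the caller's array.
function topChunks(matches: ScoredChunk[], k: number = 3): ScoredChunk[] {
  return matches
    .slice()
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Join the selected passages into a compact context block for the model.
function toContext(chunks: ScoredChunk[]): string {
  return chunks.map((c) => `[${c.chunkId}] ${c.content}`).join("\n\n");
}
```

Keeping the chunkId next to each passage lets the agent ask for more context around a specific chunk later instead of re-running the search.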
sessionId is the dedup ledger. Pass one back on the next call and Context Repo will not re-return chunks it already gave you in this session. On the MCP transport an auto-session is created and reused per caller, so iterative exploration just works; on REST you mint a sessionId via POST /v1/pd/session and pass it forward yourself.
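Conceptually, a dedup ledger is a per-session set of chunk IDs that have already been returned. The class below is an in-memory illustration of that idea only; it is not how Context Repo implements sessions, and the class and method names are hypothetical.

```typescript
class SessionLedger {
  private seen = new Map<string, Set<string>>();

  // Return only chunk IDs not yet delivered in this session, and record them.
  filterNew(sessionId: string, chunkIds: string[]): string[] {
    let delivered = this.seen.get(sessionId);
    if (delivered === undefined) {
      delivered = new Set<string>();
      this.seen.set(sessionId, delivered);
    }
    const fresh: string[] = [];
    for (const id of chunkIds) {
      if (!delivered.has(id)) {
        delivered.add(id);
        fresh.push(id);
      }
    }
    return fresh;
  }
}
```

Two calls with the same session ID never return the same chunk twice, while a different session starts with an empty ledger.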
How does deep_expand navigate document chunks?
A chunk on its own is often not enough. The matching paragraph might assume context from its parent section ("As mentioned in §3, the policy applies to..."). The agent needs to walk the tree.
deep_expand is the navigator:

    deep_expand({
      chunkId: string,
      direction: 'up' | 'down' | 'next' | 'previous' | 'surrounding',
      count?: number, // for 'surrounding' direction
    })

The five directions:
- up: read the parent chunk. Useful when the matching paragraph needs the surrounding section's setup.
- down: read the child chunks. Useful when the match landed on a section heading and the agent needs the section body.
- next and previous: read sibling chunks under the same parent. Useful for continuing a list or a sequential narrative.
- surrounding: read a window of same-parent siblings around the target. On a sparse hierarchy where the target is the only child under its parent, surrounding automatically falls back to the last/first chunks of the parent's neighbouring sections, so the agent still gets meaningful context.
The agent picks the direction based on what it needs. The cost is one tool call per navigation step, but each step returns only the chunks that were asked for, so context-window cost stays bounded by intent.
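The five directions can be modelled over an in-memory list of chunks, as in the TypeScript sketch below. This is an illustration of the navigation semantics, not the server's implementation: the TreeChunk type is hypothetical, count is assumed to mean "siblings on each side" with the target included, next and previous are assumed to return a single sibling, and the sparse-hierarchy fallback for surrounding is omitted.

```typescript
type Direction = "up" | "down" | "next" | "previous" | "surrounding";

interface TreeChunk {
  chunkId: string;
  parentId: string | null;
  chunkIndex: number;
}

// Children of a parent, ordered by chunkIndex.
function childrenOf(all: TreeChunk[], parentId: string | null): TreeChunk[] {
  return all
    .filter((c) => c.parentId === parentId)
    .sort((a, b) => a.chunkIndex - b.chunkIndex);
}

function expand(
  all: TreeChunk[],
  chunkId: string,
  direction: Direction,
  count: number = 1,
): TreeChunk[] {
  const target = all.find((c) => c.chunkId === chunkId);
  if (target === undefined) return [];
  if (direction === "up") return all.filter((c) => c.chunkId === target.parentId);
  if (direction === "down") return childrenOf(all, target.chunkId);

  const siblings = childrenOf(all, target.parentId);
  const i = siblings.findIndex((c) => c.chunkId === chunkId);
  if (direction === "next") return siblings.slice(i + 1, i + 2);
  if (direction === "previous") return siblings.slice(Math.max(i - 1, 0), i);
  // "surrounding": assumed window of `count` siblings on each side, target included.
  return siblings.slice(Math.max(i - count, 0), i + count + 1);
}
```

Every branch returns a small, bounded slice of the tree, which is what keeps each navigation step cheap in context-window terms.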
Why two retrieval surfaces, not one
Concretely, here is the failure mode we avoided by splitting them:
- A single combined endpoint returns "document X scored 0.71, document Y scored 0.69, chunk inside X scored 0.66." The agent has to reason about whether to read X in full or just the matching chunk, and ranking does not really sort that out.
- Or the endpoint returns only chunks, and the agent loses the catalog view it needs to choose between candidate documents.
- Or the endpoint returns only documents, and the agent has to pull the entire candidate document just to confirm the match.
By keeping find_items catalog-level and deep_search chunk-level, an agent can run a two-step retrieval pattern that mirrors how humans search:
- Find the right artifact.
- Find the right passage within it.
It maps to the way the agent already thinks about RAG, and it keeps the model's context window honest.
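One way to picture the hand-off between the two steps is a small function that turns catalog hits into the arguments for the chunk-level call. The TypeScript below is a sketch under assumptions: the ItemHit field names and the singular type values are guesses at the result shape, and planDeepSearch is a hypothetical helper, not part of the Context Repo API.

```typescript
// Assumed shape of one find_items result (field names are illustrative).
interface ItemHit {
  id: string;
  title: string;
  type: "prompt" | "document" | "collection";
  score: number;
}

// Subset of the deep_search arguments used in step two.
interface DeepSearchArgs {
  query: string;
  documentId?: string;
  limit?: number;
}

// Step 1 output becomes step 2 input: pick the best-scoring document
// and scope the chunk-level search to it.
function planDeepSearch(
  hits: ItemHit[],
  question: string,
  limit: number = 3,
): DeepSearchArgs | null {
  const documents = hits.filter((h) => h.type === "document");
  if (documents.length === 0) return null; // nothing to read inside
  const best = documents.reduce((a, b) => (b.score > a.score ? b : a));
  return { query: question, documentId: best.id, limit };
}
```

Returning null when no document matches makes the "no suitable artifact" case explicit, so the agent can rephrase the catalog query rather than search inside the wrong document.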
What embedding model and vector store does Context Repo use?
We are pre-launch and have not published benchmark numbers; we will not invent them here. The shape that matters:
- Vectors are 1536-dimensional, computed with OpenAI text-embedding-3-small. The Convex vector index pins the dimension; mismatched embeddings fail silently, which is why we treat it as a hard contract.
- Embeddings live in Convex's native vector index, in the same backend that holds prompts and documents. No second store to keep in sync.
- Embedding generation happens at ingest time for stored content. Query embeddings are computed at call time.
- The vector store has a Convex-imposed cap of 256 results per query. Context Repo caps user-visible limits below that, so callers cannot trip the underlying bound by accident.
- The chunk hierarchy is three levels: document, section, paragraph. Section chunks can nest; paragraphs are the leaves.
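Two of these facts lend themselves to simple guards: the fixed 1536 dimension and the result cap. The TypeScript sketch below shows what such checks might look like. The 1536 and 256 values and the default of 10 come from this article; MAX_USER_LIMIT is a made-up number standing in for whatever user-visible cap is actually enforced, and the function names are hypothetical.

```typescript
const EMBEDDING_DIMENSIONS = 1536;    // text-embedding-3-small, as stated above
const VECTOR_STORE_MAX_RESULTS = 256; // cap on results per vector query, as stated above
const DEFAULT_LIMIT = 10;             // deep_search server default, as stated above
const MAX_USER_LIMIT = 50;            // hypothetical user-visible cap, below the store cap

// Fail loudly on a dimension mismatch instead of letting it pass silently.
function assertEmbeddingShape(vector: number[]): void {
  if (vector.length !== EMBEDDING_DIMENSIONS) {
    throw new Error(
      `expected ${EMBEDDING_DIMENSIONS} dimensions, got ${vector.length}`,
    );
  }
}

// Clamp a caller-supplied limit so it can never exceed the underlying bound.
function clampLimit(requested?: number): number {
  if (requested === undefined || !Number.isFinite(requested)) return DEFAULT_LIMIT;
  const upper = Math.min(MAX_USER_LIMIT, VECTOR_STORE_MAX_RESULTS);
  return Math.min(Math.max(Math.floor(requested), 1), upper);
}
```

Checking the vector length before writing turns a silent indexing failure into an immediate, visible error.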
When to use each surface
| Question | Surface |
|---|---|
| Which document covers our HIPAA policy? | find_items (catalog) |
| What does the HIPAA policy say about audit logs? | deep_search (content) inside the doc found above |
| Show me the paragraph after that one | deep_expand with direction: 'next' |
| What is the section heading this paragraph lives under? | deep_expand with direction: 'up' |
| Find prompts about code review | find_items with type: 'prompts' |
| Search inside one collection only | deep_search with collectionId |
| Search the literal phrase "error code E2049" | find_items with semantic: false |
Internalize that table and the rest is mechanical.
How this connects to the rest of Context Repo
Both surfaces are exposed as MCP tools and as REST endpoints. The MCP server at contextrepo.com/mcp advertises both tools (plus deep_read and deep_expand for chunk navigation). The REST API at /v1 exposes:
- GET /v1/search for catalog retrieval (matches the find_items tool).
- POST /v1/pd/search for chunk-level retrieval (matches deep_search).
- GET /v1/pd/read/:chunkId for chunk inspection (matches deep_read).
- POST /v1/pd/expand for hierarchy navigation (matches deep_expand).
- POST /v1/pd/session to mint a sessionId for cross-call dedup.
The dashboard search bar at contextrepo.com/dashboard/search routes user queries through the same underlying endpoints, so a search you ran from Cursor and a search you ran from the browser return the same ranked results (modulo your auth scope).
There is no separate vector database to keep in sync with the row store. One fewer moving part, one fewer reason for retrieval to drift from truth.
Where to read next
- What Is an AI Context Repo for Agents?: category framing
- Prompt and Document Management for AI Agents: what we are searching across
- How MCP Servers Connect AI Agents to Knowledge Bases: the protocol layer
- Using Context Repo with Claude, Cursor, and ChatGPT: real workflows that lean on both retrieval surfaces
- Deep search documentation: full MCP tools reference
- REST API endpoints: programmatic access