Prompt management and document management sound like two features on a checklist. In practice they are the spine of the product. If the primitives are wrong, nothing built on top of them works. The search ranks the wrong stuff. The agent retrieves stale templates. The version history cannot be trusted to roll back. This article walks through how a context repository handles both, in enough detail to ground a buying decision.

## How does prompt versioning work in Context Repo?

A prompt in Context Repo is a piece of text with metadata. The text is the **template body**. The metadata covers the title, description, engine target (which model the prompt was designed for), tags, and version history.

The template body supports `${variableName}` placeholder syntax. We chose this syntax for two reasons. It is the same placeholder form JavaScript template literals use, so every JavaScript developer and toolchain already knows how to read it, and none of our target clients (Claude, Cursor, ChatGPT, VS Code) has an interpreter that conflicts with it. A prompt looks like this:

```
You are a senior engineer reviewing ${language} code for ${reviewType}.

The code under review:
${code}

Output requirements:
- Be specific about line numbers
- Cite ${language}-specific best practices
- Skip nitpicks unless ${reviewType} == "pedantic"
```

### Are prompts rendered server-side or client-side?

The substitution is **caller-side**, not server-side. When an MCP client reads the prompt via `read_prompt`, it gets the template verbatim. The client's tool-call layer fills in `${language}`, `${reviewType}`, and `${code}` from the surrounding context (the file the user is editing, the review mode they picked, the diff in the buffer). The server does not know those values and does not try to guess. Caller-side substitution is the only design that lets the same prompt work identically across Claude, Cursor, and ChatGPT without the server having to model every client's surrounding context.
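
To make caller-side substitution concrete, here is a minimal Python sketch of what a client might do after `read_prompt` returns the template. The `render_template` helper and its strict handling of missing variables are assumptions made for this example; they are not part of Context Repo or of any specific client.

```python
import re

# Matches ${name} where name is a simple identifier.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def render_template(template: str, values: dict) -> str:
    """Fill ${name} placeholders from `values`; raise if any value is missing."""
    needed = {match.group(1) for match in PLACEHOLDER.finditer(template)}
    missing = needed - set(values)
    if missing:
        raise KeyError(f"missing values for: {sorted(missing)}")
    # A function replacement inserts each value literally, so backslashes
    # or "$" characters inside user code are never re-interpreted.
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


template = "You are a senior engineer reviewing ${language} code for ${reviewType}."
rendered = render_template(template, {"language": "Go", "reviewType": "security"})
print(rendered)  # You are a senior engineer reviewing Go code for security.
```

Failing loudly on a missing variable is one possible choice. A client could just as reasonably leave unknown placeholders untouched and let the model see them.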

### Versions, change logs, and restore

Every time the prompt's `content` field changes, we write a new row to the version history. The shape:

- A monotonically incrementing version number.
- The full template body (no diffs-only storage; we keep the bytes).
- The timestamp and author.
- An optional **change log** message the editor can attach.

The dashboard surfaces this as a diff view. The MCP server surfaces it through `get_prompt_versions` (list every version of a prompt) and `restore_prompt_version` (restore an old version, which creates a new current version pointing at the restored bytes; the original history is never destroyed).

This matters because **prompt engineering is iterative.** You tune a system prompt across thirty edits to find the version that gets the model to behave. Version 17 was the best, then you broke it on version 24, and you cannot remember exactly what version 17 said. Without history this is a hostage situation. With history it is a `restore_prompt_version` call.

We never overwrite history. We never auto-delete versions. A free-trial account and an account on the Ultimate plan get identical version-history behavior.
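
The append-only behavior described in this section can be sketched in a few lines of Python. This is an in-memory illustration only: the `PromptVersion` and `PromptHistory` names, the field names, and the wording of the restore message are hypothetical, and the real storage layer is not shown in this article.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class PromptVersion:
    number: int                 # monotonically increasing, starting at 1
    content: str                # full template body, not a diff
    author: str
    created_at: datetime
    change_log: Optional[str] = None


@dataclass
class PromptHistory:
    versions: List[PromptVersion] = field(default_factory=list)

    def _append(self, content: str, author: str, change_log: Optional[str]) -> PromptVersion:
        version = PromptVersion(
            len(self.versions) + 1, content, author,
            datetime.now(timezone.utc), change_log,
        )
        self.versions.append(version)
        return version

    def edit(self, content: str, author: str, change_log: Optional[str] = None) -> PromptVersion:
        """Write a new version only when the content actually changes."""
        if self.versions and self.versions[-1].content == content:
            return self.versions[-1]
        return self._append(content, author, change_log)

    def restore(self, number: int, author: str) -> PromptVersion:
        """Copy an old version forward as the new current version."""
        if not 1 <= number <= len(self.versions):
            raise ValueError(f"no such version: {number}")
        old = self.versions[number - 1]
        # Nothing is deleted or rewritten; history only grows.
        return self._append(old.content, author, f"Restored from version {number}")


history = PromptHistory()
history.edit("Review ${code}.", "ana")
history.edit("Review ${code} carefully.", "ana", "Add emphasis")
restored = history.restore(1, "ana")
print(restored.number, restored.content)  # 3 Review ${code}.
```

Because a restore is just another append, rolling back is itself reversible: version 2 is still in the list after version 1 has been restored as version 3.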

### How do you search across prompts?

Prompts are indexed two ways:

- **Keyword search** via `search_prompts` does a substring match on title, description, and content. Use this when you remember a literal phrase.
- **Semantic search** via `find_items` (which spans prompts, documents, and collections) compares your query against a 1536-number fingerprint of what each item is *about*, so search finds ideas, not just words.

Both are auth-scoped to the calling user. Semantic search uses [OpenAI's `text-embedding-3-small`](https://platform.openai.com/docs/guides/embeddings) for both queries and stored content, with the dimension fixed at 1536. A dimension mismatch fails silently, so we keep the dimension pinned.
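
The difference between the two search modes is easier to see side by side. The sketch below is a toy model, not the product's implementation: it uses made-up 3-number vectors instead of real 1536-number embeddings, assumes case-insensitive substring matching, and the function and field names are hypothetical. It also raises on a dimension mismatch rather than failing silently, which is one way to guard a pinned dimension.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity that refuses to compare vectors of different sizes."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_search(items, query):
    """Substring match on title, description, and content."""
    needle = query.lower()
    fields = ("title", "description", "content")
    return [i["title"] for i in items if any(needle in i[f].lower() for f in fields)]


def semantic_search(items, query_vector, top_k=3):
    """Rank items by similarity between the query vector and stored vectors."""
    scored = [(cosine_similarity(query_vector, i["embedding"]), i["title"]) for i in items]
    scored.sort(reverse=True)
    return [title for _, title in scored[:top_k]]


items = [
    {"title": "Code review prompt", "description": "Reviews pull requests",
     "content": "You are a senior engineer.", "embedding": [0.9, 0.1, 0.0]},
    {"title": "HIPAA checklist", "description": "Compliance steps for patient data",
     "content": "Encrypt records at rest.", "embedding": [0.1, 0.9, 0.2]},
]

print(keyword_search(items, "hipaa"))            # ['HIPAA checklist']
print(keyword_search(items, "medical privacy"))  # []
# Pretend [0.2, 0.8, 0.1] is the embedding of the query "medical privacy".
print(semantic_search(items, [0.2, 0.8, 0.1], top_k=1))  # ['HIPAA checklist']
```

The second keyword query finds nothing because no item contains that literal phrase, while the vector comparison still surfaces the compliance document.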

## What file formats does Context Repo support for documents?

Documents are the larger artifacts. The product accepts 75+ file formats through [LlamaIndex Cloud](https://www.llamaindex.ai/), plus rendered web content via Firecrawl. The list covers PDFs, Word, Excel, PowerPoint, Markdown, plain text, source code, and most image formats. The Chrome extension and the scrape endpoint additionally capture rendered webpage content (the version after JavaScript has finished, not the raw HTML).

A document, once ingested, has:

- **Title and tags** for cataloging.
- **Full body text** stored verbatim.
- **Embeddings** at the chunk level for semantic retrieval.
- **A hierarchical chunk tree** for `deep_search` and `deep_expand`.
- **Version history** identical to prompts (every content edit writes a new version).

### What does the ingest pipeline do?

When you upload a file or scrape a URL, the pipeline does three things in order:

1. **Extract text.** PDFs lose their layout, Word docs lose their formatting, code files keep their syntax. The output is a normalized markdown-like representation.
2. **Chunk hierarchically.** A document is not a flat string. We split each document into a tree of sections so an agent can read the top-level summary, then drill into a specific section, then drill into a paragraph, without ever loading 200 pages into the model.
3. **Embed each chunk.** Embeddings live in Convex's native vector index. The dimension is 1536. Each chunk has a stable ID, a parent ID (or null at the root), a position within its parent, and the raw text.
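
The chunk shape from step 3 (stable ID, parent ID, position, raw text) can be illustrated with a small Python sketch of step 2. The splitting rule used here (one level of `## ` headings, then blank-line-separated paragraphs) and the dotted ID scheme are assumptions chosen to keep the example short; the article does not specify how the real pipeline decides section boundaries or generates IDs, and the embedding step is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Chunk:
    id: str
    parent_id: Optional[str]   # None at the root
    position: int              # index within the parent
    text: str


def chunk_document(title: str, markdown: str) -> List[Chunk]:
    """Split normalized markdown into root -> sections -> paragraphs."""
    chunks = [Chunk(id="0", parent_id=None, position=0, text=title)]
    sections = []  # list of (heading, body_lines)
    for line in markdown.splitlines():
        if line.startswith("## "):
            sections.append((line[3:].strip(), []))
        elif sections:
            # Text before the first heading is ignored in this sketch.
            sections[-1][1].append(line)
    for s_pos, (heading, lines) in enumerate(sections):
        section_id = f"0.{s_pos}"
        chunks.append(Chunk(section_id, "0", s_pos, heading))
        paragraphs = [p.strip() for p in "\n".join(lines).split("\n\n") if p.strip()]
        for p_pos, paragraph in enumerate(paragraphs):
            chunks.append(Chunk(f"{section_id}.{p_pos}", section_id, p_pos, paragraph))
    return chunks


doc = """## Setup
Install the tool.

Configure the key.

## Methods
We sampled logs.
"""

chunks = chunk_document("Example report", doc)
for chunk in chunks:
    print(chunk.id, chunk.parent_id, chunk.position, repr(chunk.text))
```

Deriving the ID from the chunk's path means re-chunking the same text yields the same IDs, which is one simple way to get stability.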

### How does hierarchical retrieval actually feel?

Two retrieval surfaces, intentionally separate:

- **`find_items`** is catalog-level search. "Find the document about HIPAA compliance." Returns matching prompts, documents, and collections with titles, IDs, and short highlights. Use this when you know roughly what you want and need to identify the artifact.
- **`deep_search`** is content-level search inside one or all documents. Returns ranked, hierarchical chunks with their position in the tree. Use this when you know which document is relevant and you want a passage.

Once a chunk surfaces, `deep_expand` lets the agent navigate the document:

- `up` reads the parent (the section the chunk belongs to).
- `down` reads the children (subsections of the chunk).
- `next` and `previous` read sibling chunks under the same parent.
- `surrounding` reads a window of N chunks before and after.

The shape is identical to the way a human navigates a book's table of contents. The reason it matters for agents is **context-window economics**. Pulling a 50,000-token document into a model's working memory is expensive, slow, and dilutes attention. Pulling the three most relevant 800-token chunks and then expanding only the ones that turn out to matter is cheap, fast, and keeps the agent focused.
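
The numbers in that comparison work out as follows. This is plain arithmetic on the figures quoted above, with variable names chosen for the example.

```python
FULL_DOCUMENT_TOKENS = 50_000
CHUNK_TOKENS = 800
CHUNKS_RETRIEVED = 3

retrieved_tokens = CHUNK_TOKENS * CHUNKS_RETRIEVED
share_of_document = retrieved_tokens / FULL_DOCUMENT_TOKENS
reduction_factor = FULL_DOCUMENT_TOKENS / retrieved_tokens

print(retrieved_tokens)            # 2400
print(f"{share_of_document:.1%}")  # 4.8%
print(round(reduction_factor, 1))  # 20.8
```

Even if the agent then expands a few chunks, it starts from roughly a twentieth of the full document rather than all of it.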

Visually, `deep_search` lands inside a document and `deep_expand` navigates from there:

```mermaid
flowchart TB
  D(("Your document"))

  S1["Section: Setup"]
  S2["Section: Methods"]
  S3["Section: Results"]

  P1["Paragraph"]
  P2["Paragraph"]
  P3["Matched chunk"]
  P4["Paragraph"]
  P5["Paragraph"]

  D --> S1
  D --> S2
  D --> S3
  S2 --> P1
  S2 --> P2
  S2 --> P3
  S3 --> P4
  S3 --> P5

  classDef section fill:#27272a,stroke:#52525b,stroke-width:1px,color:#fafafa,rx:10,ry:10
  classDef para    fill:#18181b,stroke:#3f3f46,stroke-width:1px,color:#a1a1aa,rx:6,ry:6
  classDef match   fill:#18181b,stroke:#06b6d4,stroke-width:1.5px,color:#fafafa,rx:6,ry:6
  class S1,S2,S3 section
  class P1,P2,P4,P5 para
  class P3 match
  style D fill:#3b82f6,stroke:#60a5fa,stroke-width:2.5px,color:#ffffff,font-weight:600
  linkStyle default stroke:#52525b,stroke-width:1.5px
```
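
The `deep_expand` directions can be sketched over the same tree as the diagram. This Python example is an illustration under assumptions: the `expand` function, the `Chunk` fields, and the choice that `surrounding` returns sibling chunks and excludes the starting chunk are all hypothetical, since the article does not define those details.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Chunk:
    id: str
    parent_id: Optional[str]
    position: int
    text: str


def children_of(chunks: Dict[str, Chunk], parent_id: Optional[str]) -> List[Chunk]:
    found = [c for c in chunks.values() if c.parent_id == parent_id]
    return sorted(found, key=lambda c: c.position)


def expand(chunks: Dict[str, Chunk], chunk_id: str, direction: str, window: int = 1) -> List[Chunk]:
    chunk = chunks[chunk_id]
    if direction == "up":
        return [chunks[chunk.parent_id]] if chunk.parent_id is not None else []
    if direction == "down":
        return children_of(chunks, chunk_id)
    siblings = children_of(chunks, chunk.parent_id)
    index = siblings.index(chunk)
    if direction == "next":
        return siblings[index + 1:index + 2]
    if direction == "previous":
        return siblings[max(index - 1, 0):index]
    if direction == "surrounding":
        before = siblings[max(index - window, 0):index]
        after = siblings[index + 1:index + 1 + window]
        return before + after
    raise ValueError(f"unknown direction: {direction}")


rows = [
    ("D", None, 0, "Your document"),
    ("S1", "D", 0, "Section: Setup"),
    ("S2", "D", 1, "Section: Methods"),
    ("S3", "D", 2, "Section: Results"),
    ("P1", "S2", 0, "Paragraph"),
    ("P2", "S2", 1, "Paragraph"),
    ("P3", "S2", 2, "Matched chunk"),
    ("P4", "S3", 0, "Paragraph"),
    ("P5", "S3", 1, "Paragraph"),
]
tree = {row[0]: Chunk(*row) for row in rows}

print([c.id for c in expand(tree, "P3", "up")])                     # ['S2']
print([c.id for c in expand(tree, "P3", "previous")])               # ['P2']
print([c.id for c in expand(tree, "P3", "surrounding", window=2)])  # ['P1', 'P2']
print([c.id for c in expand(tree, "S2", "down")])                   # ['P1', 'P2', 'P3']
```

Starting from the matched chunk `P3`, the agent can read its section, step to a neighbor, or widen the window, and each move returns a handful of chunks rather than the whole document.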

## How does Context Repo isolate data between users?

A collection is a named group of prompts and documents. The same prompt can live in multiple collections. The same document can live in multiple collections. Collections do two useful things, and both connect to per-user isolation.

- **Scope search.** `deep_search` and `find_items` both accept a `collectionId` parameter. Pass it, and the search runs against only the artifacts in that collection.
- **Scope access.** Per-user API keys can be issued with `permissions: ["collections.read", "prompts.read"]` scoped to specific collection IDs. The Cursor IDE on your work laptop can hold an API key that only sees the `work` collection.

Underneath every collection, every prompt, and every document, the same hard line holds. The record is scoped to the authenticated user. Both auth modes (Clerk-issued OAuth JWTs and per-user API keys) carry the user identity, and every Convex query enforces ownership before returning data. There is no team workspace today. If you want two contexts (work and personal), create two collections and issue two API keys.
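
As a rough model of how these checks stack, the Python sketch below applies the ownership rule first and the key's permission and collection scope second. The `ApiKey` and `Prompt` shapes and the `can_read_prompt` function are hypothetical; only the permission string `prompts.read` and the idea of collection-scoped keys come from the text above.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class ApiKey:
    user_id: str
    permissions: FrozenSet[str]
    collection_ids: FrozenSet[str]   # collections this key is scoped to


@dataclass(frozen=True)
class Prompt:
    id: str
    owner_id: str
    collection_ids: FrozenSet[str]   # a prompt can live in several collections


def can_read_prompt(key: ApiKey, prompt: Prompt) -> bool:
    # Hard line: the record must belong to the authenticated user.
    if prompt.owner_id != key.user_id:
        return False
    # The key must carry the read permission.
    if "prompts.read" not in key.permissions:
        return False
    # The prompt must sit in at least one collection the key can see.
    return bool(prompt.collection_ids & key.collection_ids)


work_key = ApiKey(
    user_id="user_1",
    permissions=frozenset({"collections.read", "prompts.read"}),
    collection_ids=frozenset({"work"}),
)
work_prompt = Prompt("p1", "user_1", frozenset({"work", "archive"}))
personal_prompt = Prompt("p2", "user_1", frozenset({"personal"}))
other_users_prompt = Prompt("p3", "user_2", frozenset({"work"}))

print(can_read_prompt(work_key, work_prompt))         # True
print(can_read_prompt(work_key, personal_prompt))     # False
print(can_read_prompt(work_key, other_users_prompt))  # False
```

Checking ownership before anything else means a misconfigured key scope can narrow what a user sees but can never widen it to another user's data.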

## Can I export everything if I leave?

Yes, and the export surface is the same surface that AI clients use day-to-day. Every primitive is reachable through three doors:

- **The dashboard** at [contextrepo.com/dashboard](https://contextrepo.com/dashboard) is the UI for humans.
- **The MCP server** at [contextrepo.com/mcp](https://contextrepo.com/mcp) exposes 28 tools over streamable-HTTP transport for AI clients. See [How MCP Servers Connect AI Agents to Knowledge Bases](/resources/how-mcp-servers-connect-ai-agents-to-knowledge-bases) for the protocol details.
- **The REST API** under [contextrepo.com/v1](https://contextrepo.com/v1) covers 29 operations for programmatic clients. OpenAPI 3.1 spec at [/openapi.json](https://contextrepo.com/openapi.json).

Authentication is two modes, equivalent in capability:

- **OAuth 2.1 with PKCE** (the modern OAuth flow that lets you log in safely without a shared secret in the URL). The deployment issues tokens via Clerk at `https://clerk.contextrepo.com`. Discovery metadata at [`/.well-known/oauth-authorization-server`](https://contextrepo.com/.well-known/oauth-authorization-server) (RFC 8414) and [`/.well-known/oauth-protected-resource`](https://contextrepo.com/.well-known/oauth-protected-resource) (RFC 9728).
- **Per-user API keys** sent as `Authorization: API-Key gm_...` headers. Generated from the dashboard with granular permission scopes per key.

Rate limits use a sliding window via Upstash Redis. The limits are 10 scrapes per minute, 100 mutating API calls per minute, and 120 read-only calls per minute. `X-RateLimit-Remaining` and `Retry-After` headers ship on 429 responses, and the gate fails open if Redis is unreachable.
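
For readers unfamiliar with the pattern, here is a minimal in-memory sketch of a sliding-window limiter with fail-open behavior. It is a stand-in for illustration only: the production gate runs on Upstash Redis, and the class, method names, and the use of `ConnectionError` to model an unreachable store are assumptions of this example.

```python
import time
from collections import defaultdict, deque
from typing import Tuple


class SlidingWindowLimiter:
    def __init__(self, limit: int, window_seconds: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._hits = defaultdict(deque)  # key -> timestamps of accepted calls

    def check(self, key: str) -> Tuple[bool, int, float]:
        """Return (allowed, remaining, retry_after_seconds)."""
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            # The caller can retry once the oldest hit leaves the window.
            return False, 0, hits[0] + self.window - now
        hits.append(now)
        return True, self.limit - len(hits), 0.0


def allow_request(limiter, key: str) -> bool:
    """Fail open: if the limiter's store is unreachable, let the request through."""
    try:
        allowed, _remaining, _retry_after = limiter.check(key)
        return allowed
    except ConnectionError:
        return True


scrape_limiter = SlidingWindowLimiter(limit=10)
results = [allow_request(scrape_limiter, "user_1") for _ in range(11)]
print(results.count(True), results[-1])  # 10 False
```

In an HTTP handler, `remaining` and `retry_after` would feed the `X-RateLimit-Remaining` and `Retry-After` headers on a 429. Failing open trades strict enforcement for availability when the limiter's backing store is down.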

Pagination uses opaque cursors. You echo back the `cursor` value from the previous response to get the next page; we do not expose internal offsets that could break across schema migrations. Default page size is 20, max 100. Iterate by passing `nextCursor` until it comes back empty.
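
A client-side loop over cursor pages might look like the following Python sketch. The response field names (`items`, `nextCursor`) and the `fetch_page` callable are assumptions for this example, so check the API reference for the exact request and response shapes. The fake server below encodes an offset in the cursor purely so the example runs; a real cursor should be treated as an opaque string.

```python
from typing import Callable, Iterator, Optional


def iter_all(fetch_page: Callable[[Optional[str], int], dict], page_size: int = 20) -> Iterator[dict]:
    """Yield every item by following cursors until none is returned."""
    if not 1 <= page_size <= 100:
        raise ValueError("page_size must be between 1 and 100")
    cursor = None
    while True:
        page = fetch_page(cursor, page_size)
        yield from page["items"]
        cursor = page.get("nextCursor")
        if not cursor:
            break


# Fake server with 45 records, standing in for a real HTTP call.
_RECORDS = [{"id": f"prompt-{n}"} for n in range(45)]


def fake_fetch_page(cursor: Optional[str], limit: int) -> dict:
    start = int(cursor) if cursor else 0
    end = start + limit
    next_cursor = str(end) if end < len(_RECORDS) else None
    return {"items": _RECORDS[start:end], "nextCursor": next_cursor}


exported = list(iter_all(fake_fetch_page))
print(len(exported))  # 45
```

With the default page size of 20, the loop above makes three requests: two full pages and a final page of five.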

The export contract is the same as the day-to-day read contract. Whichever auth mode you used to put data in, the same mode can pull every byte back out.

## How to save a versioned prompt and retrieve it from any AI client

Five steps, a few seconds each.

1. **Sign in to Context Repo.** Open [contextrepo.com](https://contextrepo.com) and start the 3-day Pro trial. The trial includes all 28 MCP tools, prompt and document storage, and Chrome extension capture.
2. **Create a prompt in the dashboard.** Open the Prompts page, click New Prompt, paste a template with `${variable}` placeholders, add a description, and save. The save creates version 1 automatically.
3. **Connect an MCP client.** From the MCP Server page, use the one-click install for Cursor, the Claude Desktop instructions, or copy the manual JSON config. Authentication is OAuth or API key, your choice.
4. **Retrieve and use the prompt from the client.** Ask the AI client to `search_prompts` for your prompt by title or content, then `read_prompt` to fetch the template. The client substitutes the variables and runs the prompt against its model.
5. **Edit and review history.** Edit the prompt in the dashboard with an optional change-log message. The next save creates version 2. Call `get_prompt_versions` from any client to see the history, and `restore_prompt_version` to roll back if needed.

Same primitives, same auth, working across whichever AI client you happen to be in.
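
Under the hood, steps 4 and 5 are MCP tool calls, which travel as JSON-RPC 2.0 requests with the method `tools/call`. The Python sketch below builds those request bodies. The tool names come from this article, but the argument names (`query`, `promptId`, `version`) and the example ID are hypothetical placeholders; the MCP tools reference has the real input schemas, and in practice the AI client builds these requests for you.

```python
import json


def tool_call(request_id: int, name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Argument names below are illustrative assumptions, not the documented schema.
requests = [
    tool_call(1, "search_prompts", {"query": "code review"}),
    tool_call(2, "read_prompt", {"promptId": "example-prompt-id"}),
    tool_call(3, "get_prompt_versions", {"promptId": "example-prompt-id"}),
    tool_call(4, "restore_prompt_version", {"promptId": "example-prompt-id", "version": 1}),
]

for request in requests:
    print(json.dumps(request))
```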

## Where this lands for your team

The mental model we settled on after building this:

- **Prompts** are the system prompts and templates you reuse across tools.
- **Documents** are the reference material you upload everywhere and lose track of.
- **Collections** are the project, client, or context boundary.

If your AI work is fragmented across three or more clients, a context repository pays for itself in the time you stop spending re-uploading PDFs and re-pasting prompts. If your AI work is concentrated in one tool with its native storage working fine, you do not need one yet, and the [pricing page](https://contextrepo.com/pricing) will tell you that before you commit.

## Where to read next

- [What Is an AI Context Repo for Agents?](/resources/what-is-an-ai-context-repo-for-agents). The category framing for the whole product line.
- [How MCP Servers Connect AI Agents to Knowledge Bases](/resources/how-mcp-servers-connect-ai-agents-to-knowledge-bases). The protocol layer that surfaces these prompts and documents inside Claude, Cursor, ChatGPT, and 90+ other AI clients.
- [Semantic Search and Deep Search: Two Retrieval Layers](/resources/semantic-search-and-deep-search-two-retrieval-layers). How retrieval works once your AI is connected.
- [Using Context Repo with Claude, Cursor, and ChatGPT](/resources/using-context-repo-with-claude-cursor-and-chatgpt). Concrete workflows in each client.
- [API reference](/docs/api). Request and response shapes for every endpoint.
- [MCP tools reference](/docs/mcp/tools-reference). Full reference of all 28 tools.