Chat Agents · 13 min read

Inside Ask Context Repo: How a LangGraph Agent Turns Docs into Cited Answers

How Ask Context Repo uses a LangGraph agent, Context Repo's retrieval API, and a middleware stack to deliver cited answers from the docs.

By Context Repo Team

A search result is a location where an answer might live. Ask Context Repo returns the answer itself, grounded in our documentation and cited to the exact passage it came from.

That is a harder problem than it sounds. Retrieval gets you close. Synthesis gets you an answer. But synthesis without retrieval hallucinates, and retrieval without synthesis dumps a pile of chunks in the user's lap. The interesting part is what happens between those two steps. This article walks through how we built it.

What Ask Context Repo is

Ask Context Repo is a public chat at ask.contextrepo.com. No login. No API key. You type a question about Context Repo, and you get back a synthesized answer with inline citations you can click to read the original documentation.

It is not a search box. It does not return a ranked list of pages. It reads the docs, reasons about the question, and writes a grounded answer. If the docs do not cover what you asked, it says so.

The interface streams the answer as it generates. You can see the agent's intermediate steps: which searches it ran, which passages it read, which chunks it expanded for more context. The citations are not decorative. Each one links to a specific passage in our published docs, validated before it reaches your browser.

Why a LangGraph agent

The answer is not produced by a single prompt. It is the output of a multi-step reason-and-act loop.

The agent receives your question, decides what to retrieve, calls Context Repo tools to gather evidence, reads the results, and generates a grounded answer. If the first search does not surface enough evidence, it can try a different query. If a retrieved chunk is too narrow, it can expand to siblings or the parent section. This is not a fixed pipeline. The agent decides, on each turn, what action to take next.

LangGraph orchestrates that loop. It is a framework for building stateful, multi-step agent graphs, and we use its TypeScript version. Each node in the graph is a function (model call, tool call, validation step). Edges define the flow. State persists across steps so the agent can accumulate evidence before committing to an answer.
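
To make the loop concrete, here is a minimal, dependency-free sketch of a reason-and-act loop over shared state. It deliberately does not use the LangGraph API. `AgentState`, `decide`, `search`, and `runAgent` are hypothetical names, and the "model" is a stub that keeps searching until it holds two passages.

```typescript
// Hypothetical state shape: evidence accumulates across steps.
interface AgentState {
  question: string;
  evidence: string[];
  answer?: string;
  steps: string[];
}

type Action = { kind: "search"; query: string } | { kind: "answer" };

// Stand-in for the model's decision. A real agent would call an LLM here.
function decide(state: AgentState): Action {
  if (state.evidence.length < 2) {
    const attempt = state.evidence.length + 1;
    return { kind: "search", query: `${state.question} (attempt ${attempt})` };
  }
  return { kind: "answer" };
}

// Stand-in for a retrieval tool call.
function search(query: string): string {
  return `passage for: ${query}`;
}

// The loop: decide, act, update state, repeat until an answer exists
// or the step limit is reached.
function runAgent(question: string, maxSteps = 10): AgentState {
  const state: AgentState = { question, evidence: [], steps: [] };
  for (let i = 0; i < maxSteps && state.answer === undefined; i++) {
    const action = decide(state);
    state.steps.push(action.kind);
    if (action.kind === "search") {
      state.evidence.push(search(action.query));
    } else {
      state.answer = `Answer grounded in ${state.evidence.length} passages`;
    }
  }
  return state;
}
```

The point of the sketch is the shape: the next action is chosen from the current state on every iteration, which is what separates an agent loop from a fixed pipeline.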

We chose LangGraph over a bare prompt-and-tools loop because we needed middleware. More on that in a moment.

The agent is deployed on LangGraph Platform, which handles thread persistence, streaming, and horizontal scaling. The generation model is Claude (Anthropic), configured for consistency.

Context Repo as data source and reasoning engine

Here is the part that makes this architecture unusual: the agent does not embed its own documents. It does not run its own vector index. It does not maintain a local copy of the docs.

It calls Context Repo's public REST API. The same API any user or agent can call.

Context Repo stores the documentation as a collection of documents, chunked into a three-level hierarchy (document, section, paragraph) with 1536-dimension vector embeddings. The agent calls tools like search_content to find relevant passages, read_chunk to get the full text of a specific chunk, and expand_context to navigate up to parent sections or sideways to sibling paragraphs.
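
As an illustration of the three-level hierarchy, the sketch below models chunks in memory and shows how moving up to a parent or sideways to siblings could work. It is an assumption-laden stand-in, not the Context Repo API: the `Chunk` fields, the sample data, and the local `readChunk` and `expandContext` functions are all hypothetical and only echo the tool names above.

```typescript
type Level = "document" | "section" | "paragraph";

// Hypothetical chunk record; the real schema is not described in this article.
interface Chunk {
  id: string;
  level: Level;
  parentId: string | null;
  text: string;
}

// In-memory stand-in for the remote collection.
const chunks: Chunk[] = [
  { id: "doc", level: "document", parentId: null, text: "API guide" },
  { id: "sec", level: "section", parentId: "doc", text: "Authentication" },
  { id: "p1", level: "paragraph", parentId: "sec", text: "Keys are scoped." },
  { id: "p2", level: "paragraph", parentId: "sec", text: "Rotate keys regularly." },
];

function readChunk(id: string): Chunk | undefined {
  return chunks.find((c) => c.id === id);
}

// Navigate up to the parent, or sideways to chunks sharing the same parent.
function expandContext(id: string, direction: "parent" | "siblings"): Chunk[] {
  const chunk = readChunk(id);
  if (!chunk || chunk.parentId === null) return [];
  if (direction === "parent") {
    const parent = readChunk(chunk.parentId);
    return parent ? [parent] : [];
  }
  return chunks.filter((c) => c.parentId === chunk.parentId && c.id !== id);
}
```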

But retrieval is only half the job. Context Repo also provides synthesize, a server-side endpoint that takes a question and a set of evidence chunks, and produces a cited answer with validated sources, identified gaps, and surfaced conflicts. The agent can use this endpoint when it has gathered enough evidence and wants a grounded synthesis pass.

The distinction matters. Context Repo is not just a database the agent reads from. It is a reasoning engine that performs evidence synthesis and citation validation. The agent is the orchestration layer that decides when and how to invoke that engine.

This is the same retrieval infrastructure described in "Cited Answers, Not Just Search Results," "Why Hierarchical Retrieval Finds More Than Flat RAG," and "Semantic Search and Deep Search: Two Retrieval Layers." Ask Context Repo is a live consumer of all three layers.

The agent loop and why middleware is load-bearing

A bare agent loop (model decides, tool executes, repeat) is easy to build and hard to trust. Without constraints, the model can drift off-scope, burn through tool calls chasing a tangent, hallucinate citations that point to nothing, or generate answers that contradict the evidence.

Ten middleware layers wrap the model in Ask Context Repo. Each one enforces a single constraint, and together they form the contract that makes the answers trustworthy. The ones that matter most are described below.

Scope Enforcement classifies each incoming question as in-scope or out-of-scope before the agent begins work. If you ask about something outside the Context Repo docs, the middleware short-circuits with an honest "I don't have documentation about that." No tool calls. No wasted compute.
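
A short-circuit of this kind can be sketched as a wrapper around the agent. This is a minimal illustration under stated assumptions: the keyword-based `classify` function is a hypothetical placeholder for whatever classifier the real middleware uses, and `TurnResult` and `withScopeEnforcement` are invented names.

```typescript
type ScopeVerdict = "in_scope" | "out_of_scope";

const REFUSAL = "I don't have documentation about that.";

interface TurnResult {
  answer: string;
  toolCalls: number;
}

// Hypothetical classifier: a keyword check standing in for a real one.
function classify(question: string): ScopeVerdict {
  const topics = ["context repo", "collection", "citation", "search"];
  const q = question.toLowerCase();
  return topics.some((t) => q.includes(t)) ? "in_scope" : "out_of_scope";
}

// Wrap an agent so out-of-scope questions never reach it.
function withScopeEnforcement(
  agent: (question: string) => TurnResult
): (question: string) => TurnResult {
  return (question) =>
    classify(question) === "out_of_scope"
      ? { answer: REFUSAL, toolCalls: 0 } // short-circuit: the agent never runs
      : agent(question);
}
```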

Tool and Model Budgets cap the number of tool calls and model invocations per turn. The agent cannot run an infinite loop of search-read-search-read. If it exhausts its budget, a wrap-up middleware forces it to generate the best answer it can from the evidence gathered so far.
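
The budget idea can be shown with a small counter that refuses calls once a cap is reached. The sketch below is illustrative only: `Budget`, `createBudgetTracker`, and `runTurn` are hypothetical names, the caps are arbitrary, and the tool call is a stub.

```typescript
interface Budget {
  maxToolCalls: number;
  maxModelCalls: number;
}

// Tracks per-turn usage. Each try* method returns false once its cap is hit.
function createBudgetTracker(budget: Budget) {
  let toolCalls = 0;
  let modelCalls = 0;
  return {
    tryToolCall(): boolean {
      if (toolCalls >= budget.maxToolCalls) return false;
      toolCalls += 1;
      return true;
    },
    tryModelCall(): boolean {
      if (modelCalls >= budget.maxModelCalls) return false;
      modelCalls += 1;
      return true;
    },
  };
}

type BudgetTracker = ReturnType<typeof createBudgetTracker>;

// Run searches until the queries or the budget run out. When the budget
// runs out first, stop and flag that a wrap-up answer is required.
function runTurn(
  queries: string[],
  tracker: BudgetTracker
): { evidence: string[]; forcedWrapUp: boolean } {
  const evidence: string[] = [];
  let forcedWrapUp = false;
  for (const query of queries) {
    if (!tracker.tryToolCall()) {
      forcedWrapUp = true;
      break;
    }
    evidence.push(`result for ${query}`); // stubbed tool call
  }
  return { evidence, forcedWrapUp };
}
```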

Dynamic System Prompt injects a scope-aware instruction set fresh on every turn. This prompt carries five accuracy rules: always search before answering, cite only what was retrieved, acknowledge gaps honestly, surface conflicts between sources, and respect the scope boundary. The prompt is not static. It is rebuilt per turn so the model always sees current instructions.
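
Rebuilding the prompt per turn amounts to a pure function from turn context to text. In the sketch below, the five rules are paraphrased from the paragraph above; everything else is an assumption for illustration, including the `TurnContext` fields and the idea of surfacing the remaining tool budget in the prompt.

```typescript
const ACCURACY_RULES = [
  "Always search before answering.",
  "Cite only what was retrieved.",
  "Acknowledge gaps honestly.",
  "Surface conflicts between sources.",
  "Respect the scope boundary.",
];

// Hypothetical per-turn context; the real middleware may track different fields.
interface TurnContext {
  collection: string;
  remainingToolCalls: number;
}

// Build the system prompt from scratch on every turn.
function buildSystemPrompt(ctx: TurnContext): string {
  const rules = ACCURACY_RULES.map((rule, i) => `${i + 1}. ${rule}`).join("\n");
  return [
    `You answer questions using only the "${ctx.collection}" collection.`,
    `Tool calls remaining this turn: ${ctx.remainingToolCalls}.`,
    "Rules:",
    rules,
  ].join("\n");
}
```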

Summarization compresses long conversation histories when they approach the model's practical context window. This keeps multi-turn threads manageable without losing the thread of the conversation.
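
One simple way to compress a history is to replace older messages with a single summary message and keep the most recent ones verbatim. The sketch below assumes a crude estimate of four characters per token and takes the summarizer as a parameter; `Message`, `estimateTokens`, and `compressHistory` are hypothetical names, and the real middleware may work differently.

```typescript
interface Message {
  role: "user" | "assistant" | "system";
  content: string;
}

// Rough token estimate (about 4 characters per token). This is an assumption,
// not a real tokenizer.
function estimateTokens(messages: Message[]): number {
  return messages.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);
}

// If the history exceeds maxTokens, fold everything except the last
// keepRecent messages into one summary message.
function compressHistory(
  messages: Message[],
  maxTokens: number,
  keepRecent: number,
  summarize: (older: Message[]) => string
): Message[] {
  if (estimateTokens(messages) <= maxTokens || messages.length <= keepRecent) {
    return messages;
  }
  const older = messages.slice(0, messages.length - keepRecent);
  const recent = messages.slice(messages.length - keepRecent);
  const summary: Message = {
    role: "system",
    content: `Summary of earlier turns: ${summarize(older)}`,
  };
  return [summary, ...recent];
}
```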

Citations is the final middleware in the chain. It registers retrieved sources with stable ordinals, validates that every citation marker in the generated answer points to a real source, strips markers that reference nothing, renumbers the survivors densely, and writes the validated citations to a dedicated state channel for the frontend to render.

Without this middleware stack, the agent is just a model with tools. With it, the agent has a contract: stay in scope, stay on budget, stay grounded, and prove every claim.
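
Structurally, a middleware stack is function composition: each layer takes the next handler and returns a new one. The sketch below shows that pattern in isolation. `ModelCall`, `Middleware`, and `compose` are hypothetical names and are not LangGraph's middleware interface.

```typescript
type ModelCall = (input: string) => string;
type Middleware = (next: ModelCall) => ModelCall;

// Wrap the model so the first middleware in the list is the outermost layer.
// It sees the input first and the output last.
function compose(middlewares: Middleware[], model: ModelCall): ModelCall {
  return middlewares.reduceRight((next, middleware) => middleware(next), model);
}
```

With this ordering, a layer placed last in the list runs closest to the model, so it is the first to see the model's output on the way back out.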

Turning evidence into citations you can click

Citations are not an afterthought bolted on at the end. They are a pipeline that runs from the moment the first chunk is retrieved until the answer reaches your browser.

When the agent retrieves chunks from Context Repo, each chunk enters a source registry and receives a stable ordinal. The model sees its evidence formatted as "Source [1] - Document Title > Section Name: content..." lines. When it generates an answer, it references sources using those ordinal markers: [1], [2], [3].
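
A source registry of this kind can be sketched as a map from chunk ID to ordinal. The evidence line format follows the one quoted above; the `RetrievedChunk` fields and the `createSourceRegistry` helper are assumptions made for illustration.

```typescript
// Hypothetical shape of a retrieved chunk.
interface RetrievedChunk {
  chunkId: string;
  documentTitle: string;
  sectionName: string;
  content: string;
}

function createSourceRegistry() {
  const ordinals = new Map<string, number>();
  const sources: RetrievedChunk[] = [];
  return {
    // Ordinals are stable: registering the same chunk again returns
    // the ordinal it was first given.
    register(chunk: RetrievedChunk): number {
      const existing = ordinals.get(chunk.chunkId);
      if (existing !== undefined) return existing;
      sources.push(chunk);
      ordinals.set(chunk.chunkId, sources.length);
      return sources.length;
    },
    // Format the evidence the way the model sees it.
    formatForModel(): string {
      return sources
        .map((s, i) => `Source [${i + 1}] - ${s.documentTitle} > ${s.sectionName}: ${s.content}`)
        .join("\n");
    },
    size(): number {
      return sources.length;
    },
  };
}
```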

After generation, the citations middleware validates every marker. If the model cited [7] but only six sources were retrieved, that marker gets stripped. The surviving markers are renumbered densely (no gaps) and written to a citationsByMessage channel in the graph state. This channel streams to the frontend alongside the answer text.
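
The strip-and-renumber step can be sketched in two passes over the answer text. This is a minimal illustration, not the production middleware: `validateCitations` is a hypothetical name, survivors are renumbered in order of first appearance (an assumption, since the ordering rule is not specified here), and whitespace cleanup around stripped markers is left out.

```typescript
interface ValidatedAnswer {
  text: string;
  // Original ordinals that survived, in their new dense order.
  citedOrdinals: number[];
}

function validateCitations(answer: string, sourceCount: number): ValidatedAnswer {
  const markerPattern = /\[(\d+)\]/g;

  // Pass 1: collect valid ordinals in order of first appearance and
  // assign each a dense new number.
  const renumber = new Map<number, number>();
  let match: RegExpExecArray | null;
  while ((match = markerPattern.exec(answer)) !== null) {
    const ordinal = Number(match[1]);
    if (ordinal >= 1 && ordinal <= sourceCount && !renumber.has(ordinal)) {
      renumber.set(ordinal, renumber.size + 1);
    }
  }

  // Pass 2: rewrite valid markers and strip the ones that point at nothing.
  const text = answer.replace(markerPattern, (_marker, digits: string) => {
    const mapped = renumber.get(Number(digits));
    return mapped === undefined ? "" : `[${mapped}]`;
  });

  return { text, citedOrdinals: Array.from(renumber.keys()) };
}
```

Called with the example above, six registered sources and an answer that cites [1], [7], and [3], the function drops [7] and renumbers [3] to [2].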

The frontend renders each validated citation as a clickable chip. Click it, and you see the source passage in a panel. Click again, and you land on the original published documentation page. The chain is: model claim, validated marker, registered chunk, public docs page. Every link is auditable.

The model cannot invent citations. It can only cite ordinals that correspond to chunks the agent actually retrieved. If it tries to cite something that was not retrieved, the validation step removes it. This is not a policy the model is asked to follow. It is a structural guarantee enforced by code.

Why a public chat needs a guardrail tier

Ask Context Repo is public and anonymous. Anyone with a browser can ask questions. That means the backend needs a layer between visitors and the LLM that the visitor cannot bypass.

Bot verification prevents automated scripts from hammering the endpoint. If you are not a real browser session, you do not reach the agent.

Rate limiting uses sliding-window counters so no single visitor can monopolize the system. Message quotas control how much LLM compute any one session can consume. Input size limits prevent cost inflation from enormous pasted inputs.
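
For readers unfamiliar with sliding windows, the sketch below implements a sliding-window log, a simpler relative of the counter approach: it stores recent request timestamps per visitor and rejects a request once the window is full. The limits, the `visitorId` key, and the in-memory `Map` are assumptions; a production limiter would typically use shared storage.

```typescript
// Allow at most `limit` requests per visitor in any window of `windowMs`.
function createSlidingWindowLimiter(limit: number, windowMs: number) {
  const hits = new Map<string, number[]>();
  return function allow(visitorId: string, nowMs: number): boolean {
    // Keep only timestamps that still fall inside the window.
    const recent = (hits.get(visitorId) ?? []).filter((t) => nowMs - t < windowMs);
    if (recent.length >= limit) {
      hits.set(visitorId, recent);
      return false;
    }
    recent.push(nowMs);
    hits.set(visitorId, recent);
    return true;
  };
}
```

Passing the clock in as `nowMs` keeps the function deterministic, which makes the window behavior easy to test.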

Bounded work clamps tool and model budgets server-side. Even if a question triggers a complex retrieval chain, the agent cannot recurse indefinitely. The system forces a graceful answer once budgets are exhausted.

Fixed scope pinning happens at thread creation. The scope (which collection the agent can access) is set by the server, not by the visitor. A visitor cannot escalate their thread to access documents outside the docs collection. This is not a prompt instruction the model might ignore. It is an argument resolved server-side before the agent graph starts.
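
The essence of server-side pinning is that the scope is never read from the client payload. A minimal sketch, assuming a hypothetical collection name and thread shape (`PINNED_COLLECTION`, `Thread`, and `createThread` are invented for illustration):

```typescript
interface Thread {
  readonly id: string;
  readonly visitorId: string;
  readonly scope: { readonly collection: string };
}

// Hypothetical collection name, fixed in server configuration.
const PINNED_COLLECTION = "context-repo-docs";

let nextThreadId = 1;

// The client payload is untrusted. Only visitorId is read from it;
// any scope or collection field it carries is ignored.
function createThread(clientPayload: Record<string, unknown>): Thread {
  const rawVisitorId = clientPayload["visitorId"];
  const visitorId = typeof rawVisitorId === "string" ? rawVisitorId : "anonymous";
  return Object.freeze({
    id: `thread-${nextThreadId++}`,
    visitorId,
    scope: Object.freeze({ collection: PINNED_COLLECTION }),
  });
}
```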

Auth stripping ensures that internal service metadata, API credentials, and infrastructure details never reach the browser. The response the visitor sees contains only the answer text, citation data, and agent step metadata. Nothing else.
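
An allowlist is the usual way to guarantee this: build the public response from named fields instead of deleting sensitive ones. The sketch below assumes hypothetical field names (`answer`, `citations`, `steps`) that mirror the three categories listed above.

```typescript
interface PublicResponse {
  answer: string;
  citations: unknown[];
  steps: unknown[];
}

// Copy only allowlisted fields. Anything else on the internal object
// (credentials, service metadata) is dropped.
function toPublicResponse(internal: Record<string, unknown>): PublicResponse {
  const answer = internal["answer"];
  const citations = internal["citations"];
  const steps = internal["steps"];
  return {
    answer: typeof answer === "string" ? answer : "",
    citations: Array.isArray(citations) ? citations : [],
    steps: Array.isArray(steps) ? steps : [],
  };
}
```

An allowlist fails closed: a new internal field stays private until someone deliberately adds it to the public shape.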

Each category exists because the chat is public. A private, authenticated agent could trust its caller more. A public, anonymous agent cannot trust its caller at all.

Honest boundaries

The agent is only as good as the documentation it can access. If the answer is not in the collection, the agent says so. This is not a failure mode. It is part of the answer shape.

The scope enforcement middleware classifies questions before the agent starts work. The dynamic system prompt carries an explicit rule: acknowledge the scope boundary. And the model itself is instructed to say "I don't have documentation about that" rather than speculate. Gaps in the answer are preferable to hallucinated claims. The agent can never fabricate a citation, and it can never generate a grounded answer from evidence that does not exist in the collection.