Building an LLM Wiki: From Karpathy's Gist to a Working CLI
On April 4, Andrej Karpathy published a gist that crystallized something I’d been thinking about for a while. The idea: instead of building RAG systems that re-derive knowledge from scattered documents on every query, have an LLM maintain a persistent wiki where the synthesis compounds over time.
“The wiki is a persistent, compounding artifact,” Karpathy writes. “Cross-references are already there. The synthesis already reflects everything you’ve read.”
Four days later, I have a working implementation. Here’s what I built and what I learned.
Key Takeaways
- Karpathy’s gist proposes synthesizing knowledge on write instead of on read, avoiding the RAG problem of re-deriving answers from scratch every query
- wiki-cli implements this as a Bun CLI with a single dependency (Claude Code Agent SDK) in ~500 lines of TypeScript
- Auto-capture hooks silently extract knowledge from every Claude Code session, so the wiki builds itself over time
- The entire system is just markdown files and JSONL logs. No vector database, no embeddings, no infrastructure
What’s Wrong with RAG?
RAG works. But it has an uncomfortable property: every query starts from scratch. Your system fetches relevant chunks, shoves them into a context window, and asks the model to synthesize an answer. The next query does the same thing, with no memory that the synthesis ever happened.
This means you’re paying (in latency, tokens, and quality) to re-derive insights your system already figured out yesterday. Worse, the connections between documents only exist transiently in the model’s context window. They’re never written down.
Karpathy’s proposal flips this. Instead of synthesizing on read, you synthesize on write. New information gets integrated into a persistent set of interconnected markdown pages. When you query later, the hard work is already done.
How Does a Three-Layer Architecture Map to a CLI?
Karpathy describes three layers:
- Raw sources - immutable documents you feed in
- The wiki - LLM-generated markdown pages with summaries and cross-links
- The schema - a configuration document that tells the LLM how to organize everything
I mapped this directly onto a CLI called wiki-cli. It’s a standalone Bun tool with a single dependency: the Claude Code Agent SDK. Each command spawns a Claude Code session that can read, write, and search files. No database, no vector store, just markdown and JSONL logs.
```
~/.wiki/
├── schema.md            # The rules (layer 3)
├── index.md             # One-line summary per page
├── pages/               # The wiki itself (layer 2)
│   ├── electron-ipc-patterns.md
│   ├── bun-sqlite-gotchas.md
│   └── ...
└── logs/
    ├── sources.jsonl    # What was ingested
    └── captures.jsonl   # What was captured from sessions
```
Six commands cover everything:
```bash
wiki init                 # Set up ~/.wiki/ with schema and hooks
wiki ingest <file|url|->  # Feed new knowledge in
wiki query "question"     # Ask the wiki something
wiki lint [--fix]         # Health check for contradictions, orphans, gaps
wiki capture [session-id] # Extract knowledge from a Claude Code session
wiki status               # Dashboard of wiki health
```
Why Is the Schema the Most Important File?
The schema is a markdown document that gets injected into every agent call. It defines page structure, naming conventions, tag taxonomy, cross-reference rules, and (critically) a quality bar.
```markdown
## Quality Bar

**Wiki-worthy knowledge:**
- Architectural decisions and rationale
- Patterns, idioms, and best practices learned
- Gotchas, pitfalls, and non-obvious behaviors
- Domain concepts and mental models
- Reusable solutions to recurring problems

**NOT wiki-worthy:**
- Typo fixes, variable renames
- Routine CRUD operations
- Pure file browsing with no conclusions
```
This quality bar is what prevents the wiki from drowning in noise. Without it, you’d get a page for every minor debugging session. With it, the LLM acts as a curator, not a stenographer.
Every page follows a strict format: YAML frontmatter with tags, sources, and dates, followed by concise content and a `## Related` section with `[[cross-references]]`. The LLM maintains `index.md` as a searchable catalog, one entry per page.
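To make that concrete, here is what a page might look like. The exact frontmatter field names are my reconstruction from the description above, not copied from the repo:

```markdown
---
tags: [electron, ipc]
sources: [session-capture]
created: 2025-04-05
updated: 2025-04-07
---

# Electron IPC Patterns

Prefer `ipcRenderer.invoke` / `ipcMain.handle` for request-response flows;
reserve `send`/`on` for fire-and-forget events.

## Related
- [[bun-sqlite-gotchas]]
```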
What Does the Agent SDK Integration Look Like?
The core of the implementation is a thin wrapper around the Claude Code Agent SDK’s query() function:
```typescript
import { query } from '@anthropic-ai/claude-agent-sdk'

export async function runAgent(opts: AgentCallOptions): Promise<AgentResult> {
  const options = {
    cwd: wikiHome(),
    model: opts.model ?? 'claude-sonnet-4-6',
    tools: opts.tools ?? ['Read', 'Write', 'Glob'],
    permissionMode: 'acceptEdits',
    agent: 'wiki-curator',
    agents: {
      'wiki-curator': {
        description: 'Wiki curator that maintains a structured knowledge base',
        prompt: opts.systemPrompt,
        tools: opts.tools ?? ['Read', 'Write', 'Glob'],
      },
    },
  }

  const q = query({ prompt: opts.userMessage, options })

  let resultText = ''
  let costUsd = 0
  for await (const message of q) {
    if (message.type === 'result') {
      resultText = message.result ?? ''
      costUsd = message.total_cost_usd ?? 0
    }
  }
  return { text: resultText, costUsd }
}
```
A few things worth noting:
- `permissionMode: 'acceptEdits'` means the agent can freely write files within `~/.wiki/`. No human-in-the-loop approval needed for page updates.
- The agent is stateless. Each command spawns a fresh session. The schema, index, and pre-loaded relevant pages are injected into the system prompt. No persistent memory beyond what’s written to disk.
- Tools vary by command. Ingest gets `Read`, `Write`, `Glob`, and optionally `WebFetch` (for URLs). Query gets read-only access. Triage gets no tools at all; it just classifies.
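One way to express those per-command tool sets is a simple lookup table. This is an illustrative sketch, not the repo's actual structure:

```typescript
// Hypothetical per-command tool allowlists; the tool names mirror the
// ones mentioned above, but the declaration itself is illustrative.
type Tool = 'Read' | 'Write' | 'Glob' | 'WebFetch'

const toolsByCommand: Record<string, Tool[]> = {
  ingest: ['Read', 'Write', 'Glob', 'WebFetch'], // WebFetch only used for URLs
  query: ['Read', 'Glob'],                       // read-only access
  triage: [],                                    // classification only, no tools
}
```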
How Does Ingest Work?
The ingest command is the primary way to feed knowledge into the wiki. It accepts local files, URLs, or piped stdin:
```bash
wiki ingest ./architecture-decisions.md
wiki ingest https://docs.bun.sh/api/sqlite
echo "Electron doesn't support bun:sqlite" | wiki ingest -
```
Before the agent runs, the CLI does something important: it pre-loads relevant pages. Using a simple keyword-matching algorithm against `index.md`, it identifies which existing wiki pages are most likely to need updating and includes their full content in the prompt.
```typescript
const entries = parseIndex(index)
const relevant = matchPages(entries, sourceContent.slice(0, 2000))
const pages = await loadPages(wikiPagesDir(), relevant.map(e => e.name))
```
This is a pragmatic alternative to vector search. The index is small enough (one line per page) that keyword matching works well, and it avoids adding an embedding model or vector database as a dependency.
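The matching can be as simple as scoring each page by keyword overlap between its index entry and the incoming source text. A minimal sketch, assuming this signature; it's my reconstruction, not the repo's actual `matchPages`:

```typescript
interface IndexEntry {
  name: string     // page slug, e.g. "electron-ipc-patterns"
  summary: string  // one-line summary from index.md
}

// Score each page by how many of its name/summary keywords appear in the
// new source text; return the highest-scoring matches.
function matchPages(entries: IndexEntry[], source: string, limit = 5): IndexEntry[] {
  const sourceWords = new Set(
    source.toLowerCase().split(/[^a-z0-9]+/).filter(w => w.length > 3)
  )
  return entries
    .map(entry => {
      const keywords = `${entry.name} ${entry.summary}`
        .toLowerCase()
        .split(/[^a-z0-9]+/)
        .filter(w => w.length > 3)
      const score = keywords.filter(w => sourceWords.has(w)).length
      return { entry, score }
    })
    .filter(x => x.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map(x => x.entry)
}
```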
The agent then reads the source material, decides what’s wiki-worthy, creates or updates pages, and maintains cross-references. Every ingest is logged to `sources.jsonl` with the source label and cost.
What Makes Capture the Killer Feature?
Here’s where it gets interesting. The capture command extracts knowledge from Claude Code sessions themselves. Every time you use Claude Code to debug a tricky issue, make an architectural decision, or learn something new about a library, that knowledge can flow back into the wiki.
It works in two phases.
Phase 1: Triage
First, a lightweight agent reads the session transcript and decides if it contains anything wiki-worthy. This uses a single-turn call with no tools, just classification:
```typescript
const triageResult = await runAgent({
  systemPrompt: triagePrompt({ schema, index }),
  userMessage: `Session transcript:\n\n${transcript}`,
  tools: [],
  maxTurns: 1,
})
```
The triage returns a JSON verdict: `{worthy: true/false, topics: [...], summary: "..."}`. Most sessions fail triage, and that’s by design. You don’t want a wiki page for “fixed a typo in the README.”
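Since the verdict comes back as free text from the model, it has to be parsed defensively. A sketch of what that might look like (the field names come from the verdict shape above; the fallback behavior is my assumption):

```typescript
interface TriageVerdict {
  worthy: boolean
  topics: string[]
  summary: string
}

// Pull the first JSON object out of the model's reply; fall back to
// "not worthy" if the response isn't parseable.
function parseTriageVerdict(text: string): TriageVerdict {
  const notWorthy: TriageVerdict = { worthy: false, topics: [], summary: '' }
  const match = text.match(/\{[\s\S]*\}/)
  if (!match) return notWorthy
  try {
    const parsed = JSON.parse(match[0])
    return {
      worthy: Boolean(parsed.worthy),
      topics: Array.isArray(parsed.topics) ? parsed.topics : [],
      summary: typeof parsed.summary === 'string' ? parsed.summary : '',
    }
  } catch {
    return notWorthy
  }
}
```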
Phase 2: Extraction
If the session passes triage, a second agent with full write access extracts the knowledge, creates or updates pages, and maintains the index. The triage summary guides what to focus on, so the extraction agent doesn’t get lost in the noise of a long debugging session.
Auto-Capture via Hooks
The real magic: `wiki init` installs a Claude Code hook that runs `wiki capture --auto` at the end of every session. Silently. In the background.
```json
{
  "hooks": {
    "SessionEnd": [{
      "matcher": "",
      "hooks": [{
        "type": "command",
        "command": "wiki capture --auto"
      }]
    }]
  }
}
```
The `--auto` flag suppresses all output and exits silently if the session isn’t wiki-worthy. You never notice it running. But over days and weeks, your wiki accumulates knowledge from every meaningful coding session without any manual effort.
This is, in my opinion, the most compelling part of the whole system. It turns every Claude Code session into a potential knowledge source. The human’s job becomes exactly what Karpathy described: “curate sources, direct analysis, ask good questions.” The LLM handles everything else.
How Do You Keep a Wiki from Rotting?
Wikis rot. Links break, information goes stale, pages drift into contradiction. The `lint` command runs a health check:
```bash
wiki lint        # Report issues
wiki lint --fix  # Report and fix them
```
It scans for five categories of problems:

- Stale info - content that may be outdated
- Contradictions - pages that disagree with each other
- Orphan pages - no inbound cross-references
- Gaps - `[[page-name]]` references pointing to nonexistent pages
- Merge candidates - pages with significant content overlap
With `--fix`, the agent actually resolves the issues: merging overlapping pages, creating stubs for gaps, adding missing cross-references. Run it weekly and the wiki stays coherent.
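Two of the five checks (orphans and gaps) don't even need an LLM; they are pure graph checks over the `[[cross-references]]`. A sketch under the assumption that pages are keyed by slug:

```typescript
// Given each page's content, find gaps ([[links]] to pages that don't
// exist) and orphans (pages with no inbound cross-references).
function linkIssues(pages: Map<string, string>) {
  const referenced = new Set<string>()
  const gaps: string[] = []
  for (const [name, content] of pages) {
    for (const m of content.matchAll(/\[\[([^\]]+)\]\]/g)) {
      const target = m[1]
      referenced.add(target)
      if (!pages.has(target)) gaps.push(`${name} -> ${target}`)
    }
  }
  const orphans = [...pages.keys()].filter(name => !referenced.has(name))
  return { gaps, orphans }
}
```

The stale-info, contradiction, and merge-candidate checks are where the agent earns its keep, since they require actually reading the pages.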
Design Decisions That Mattered
A few choices that shaped the project:
No vector database. The index is a flat markdown file. Keyword matching against one-line summaries is fast and good enough when your wiki has dozens to low hundreds of pages. This keeps the dependency count at exactly one (the Agent SDK) and means the entire system is human-readable markdown files.
Stateless sessions. Each command starts a fresh Claude Code session. There’s no conversation state to manage, no sessions to resume, no memory to corrupt. The wiki itself is the memory. This makes the system dramatically simpler and more reliable.
Schema as configuration. The schema isn’t code, it’s a markdown document the LLM reads. Want to change how pages are organized? Edit schema.md. Want a stricter quality bar? Add criteria. Want different tag conventions? Update the taxonomy section. No code changes needed.
Two-phase capture. Triage before extraction means 90% of sessions are dismissed with a single cheap API call (no tools, one turn). Only the worthwhile sessions trigger the full extraction pipeline. This keeps the auto-capture hook fast and inexpensive.
What’s It Actually Like to Use?
After a few days, the wiki has pages I didn’t explicitly create. They materialized from sessions where I was debugging Electron IPC issues, figuring out Bun’s SQLite API quirks, or making architecture decisions for other projects.
The query command is where it pays off:
```bash
wiki query "What did I learn about Electron IPC patterns?"
```
Instead of searching through old chat logs or trying to remember which session had that insight, the wiki has a consolidated page with cross-references to related topics. The synthesis was done at capture time, not query time.
It feels like a second brain that actually works. Not because the retrieval is smarter, but because the knowledge was organized when it was fresh.
Credit Where It’s Due
This project is a direct implementation of Andrej Karpathy’s LLM wiki gist. The three-layer architecture, the schema-driven approach, the ingest/query/lint operations: all from his design. What I added was an implementation layer. The Agent SDK integration, the auto-capture hook for Claude Code sessions, and the specific prompts and tooling to make it work as a CLI.
The gist is worth reading in full. The core insight, that LLMs should synthesize on write not on read, is one of those ideas that seems obvious in retrospect but changes how you think about knowledge management entirely. For where this persistent memory layer sits in the broader context-surface map (alongside CLAUDE.md, skills, and MCP), see Context Engineering in Practice.
Frequently Asked Questions
How much does auto-capture cost per session?
Triage is a single-turn call with no tools, so it costs fractions of a cent. Most sessions fail triage and stop there. When a session does pass, the full extraction typically runs $0.01-0.05 depending on transcript length and how many pages need updating.
Does this work with LLMs other than Claude?
Not currently. wiki-cli is built specifically on the Claude Code Agent SDK, which gives the agent access to file system tools (Read, Write, Glob). You could adapt the prompts for another model, but you’d need to replace the agent orchestration layer entirely.
How does it scale as the wiki grows?
The keyword-matching approach works well up to a few hundred pages. Beyond that you’d probably want to add vector search for page selection. The architecture makes this straightforward since you’d only need to swap out `matchPages()` in `index-parser.ts`.
Can I use this across multiple machines?
Yes. Point `WIKI_HOME` at a synced directory (Dropbox, iCloud, git repo) and the wiki is available everywhere Claude Code is installed. The files are just markdown and JSONL, so conflicts are easy to resolve.
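The `WIKI_HOME` lookup can be a one-liner; this is my guess at its shape, not the repo's actual code:

```typescript
import { homedir } from 'node:os'
import { join } from 'node:path'

// Resolve the wiki root: WIKI_HOME wins (so it can point at a synced
// directory), otherwise default to ~/.wiki
function wikiHome(): string {
  return process.env.WIKI_HOME ?? join(homedir(), '.wiki')
}
```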
What if the LLM makes a mistake in a wiki page?
Run `wiki lint` to catch contradictions and stale info. You can also edit pages directly since they’re just markdown files. The next ingest or capture will respect your manual edits.
Try It
The project is built with Bun and requires Claude Code to be installed and authenticated. The entire implementation is about 500 lines of TypeScript across a handful of files.
```bash
git clone https://github.com/iceinvein/wiki-cli
cd wiki-cli
bun install
bun run bin/wiki.ts init
```
Then just use Claude Code normally. The wiki builds itself.
If it was useful, pass it along.