How does codebase-memory-mcp reduce token usage?

AI agents normally explore code through repeated grep-then-read cycles, which burn large numbers of tokens. codebase-memory-mcp answers the same structural questions from a precomputed graph. In a five-query benchmark it used about 3,400 tokens versus about 412,000 tokens for file-by-file search — a roughly 120x reduction.

Which programming languages does codebase-memory-mcp support?

It supports 159 languages through vendored tree-sitter grammars compiled into the binary, including Python, Go, JavaScript, TypeScript, Rust, Java, C, C++, C#, PHP, Ruby, Kotlin, Swift, and many more. Six language families — Python, TypeScript/JavaScript, PHP, C#, Go, and C/C++ — additionally get Hybrid LSP semantic type resolution.

Hybrid LSP is a clean-room re-implementation of the type-resolution algorithms used by real language servers (tsserver, pyright, gopls, intelephense, Roslyn), embedded directly into the binary. It runs alongside tree-sitter to resolve imports, generics, inheritance, and stdlib types — so call edges in the graph mirror what an IDE 'Go to Definition' would resolve, with no language-server process or per-project setup.

Does codebase-memory-mcp support semantic or natural-language code search?

Yes. Alongside structural and BM25 full-text search, it offers semantic vector search over the whole graph via the search_graph tool's semantic_query parameter. It is powered by nomic-embed-code embeddings compiled directly into the binary (768-dimensional), so it bridges vocabulary gaps — finding 'publish' when you search 'send' — with no API key, no Ollama, and no Docker. The indexer also generates SEMANTICALLY_RELATED edges between conceptually similar functions and SIMILAR_TO edges for near-duplicate and clone detection.

Can codebase-memory-mcp detect duplicate or near-clone code?

Yes. During indexing it builds SIMILAR_TO edges using MinHash plus LSH with Jaccard scoring, surfacing near-duplicate and copy-pasted functions across the codebase. Combined with SEMANTICALLY_RELATED edges, this makes refactoring candidates and redundant implementations queryable through the graph.

codebase-memory-mcp

by DeusData

The fastest, most efficient code intelligence engine for AI coding agents. It indexes any repository into a persistent knowledge graph — full-indexing an average repo in seconds and the Linux kernel in 3 minutes — so your agent answers structural questions with ~120x fewer tokens. Tree-sitter parsing across 159 languages, Hybrid LSP type resolution, single static C binary.

~120x fewer tokens

159 languages

3 min Linux kernel index

11 agents supported

Figure: 3D knowledge-graph visualization of the codebase-memory-mcp graph, showing thousands of nodes and edges. Built-in 3D graph visualization (UI variant); explore your knowledge graph at localhost:9749.

What is codebase-memory-mcp?

codebase-memory-mcp is an open-source Model Context Protocol (MCP) server that indexes a codebase into a persistent knowledge graph of functions, classes, call chains, HTTP routes, and cross-service links. Instead of reading files one at a time, an AI coding agent queries the graph — answering structural questions with roughly 120x fewer tokens. It parses 159 languages and ships as a single static C binary with zero runtime dependencies.

It is a structural-analysis backend, not a chatbot: there is no embedded LLM and no API key. Your MCP client (Claude Code, or any MCP-compatible agent) is the intelligence layer; codebase-memory-mcp builds and serves the graph. All processing happens locally — your code never leaves your machine.

Why do AI agents waste tokens exploring code?

AI coding agents explore codebases by reading files one at a time. Every structural question triggers a cascade of grep → read file → grep again → read more files. The cost compounds fast.

Across five structural questions about a real codebase, file-by-file search consumed ~412,000 tokens; the same questions answered from the knowledge graph took ~3,400 tokens — a ~120x reduction.

The win is not about fitting the context window. It is cost (at $3–15 per million tokens, exploration adds up), latency (sub-millisecond graph queries versus seconds of file reading), and accuracy (less noise means better answers and no "lost in the middle" problem).

Source: project benchmark, 5 structural queries — see the full benchmark report.
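To put the cost point in concrete terms, here is a rough worked calculation using the benchmark totals and the $3 to $15 per million tokens range quoted above. It assumes every exploration token is billed at a flat rate; real pricing (caching, input versus output rates) will differ, and the helper name `cost_usd` is purely illustrative.

```python
# Token totals from the five-query benchmark quoted above.
FILE_SEARCH_TOKENS = 412_000
GRAPH_TOKENS = 3_400


def cost_usd(tokens: int, usd_per_million: float) -> float:
    """Cost of a token count at a flat price per million tokens (an assumption)."""
    return tokens * usd_per_million / 1_000_000


for price in (3.0, 15.0):  # low and high ends of the quoted price range
    file_cost = cost_usd(FILE_SEARCH_TOKENS, price)
    graph_cost = cost_usd(GRAPH_TOKENS, price)
    print(f"${price:.0f}/M tokens: file-by-file ${file_cost:.2f}, graph ${graph_cost:.3f}")
```

Under these assumptions, five questions cost on the order of dollars with file-by-file search and about a cent with the graph, and the gap widens with every additional question in a session.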

How do I install codebase-memory-mcp?

Install with a single command, then tell your agent to index the project. It is a single static C binary for macOS, Linux, and Windows — no Docker, no runtime dependencies, no API key.

# 1. One-line install (macOS / Linux). Add --ui for the 3D graph UI.
curl -fsSL https://raw.githubusercontent.com/DeusData/codebase-memory-mcp/main/install.sh | bash

# 2. The installer auto-detects and configures every installed agent.

# 3. Restart your agent, then say: "Index this project"

One command configures all 11 supported agents: Claude Code, Codex CLI, Gemini CLI, Zed, OpenCode, Antigravity, Aider, KiloCode, VS Code, OpenClaw, and Kiro — with MCP entries, instruction files, and pre-tool hooks for each. Windows users run install.ps1. Also available via npm, pip, Homebrew, Scoop, Winget, Chocolatey, AUR, and go install.

What is Hybrid LSP?

Hybrid LSP is semantic type resolution beyond tree-sitter. Tree-sitter alone produces a syntactic AST — it handles naming, structure, and call sites, but it cannot tell that user.profile.display_name() resolves to Profile.display_name declared three modules away, because it does not track imports, generics, inheritance, or stdlib types.

codebase-memory-mcp ships a clean-room re-implementation of the type-resolution algorithms used by real language servers — tsserver/typescript-go, pyright, gopls, intelephense, and Roslyn — embedded directly into the single static C binary. There is no language-server process, no per-project setup, and no API key. This layer runs alongside tree-sitter on every parse and refines CALLS, USAGE, and RESOLVED_CALLS edges with type information, so the graph mirrors what an IDE "Go to Definition" would resolve.
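The idea of type-aware resolution can be illustrated with a toy sketch. This is not the project's implementation (which is embedded in the C binary); the `ClassInfo` record, the `REGISTRY` table, and `resolve_call` are hypothetical names invented here to show how walking declared types turns `user.profile.display_name()` into `Profile.display_name`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ClassInfo:
    """Hypothetical record of what an indexer might know about a class."""
    name: str
    field_types: dict[str, str] = field(default_factory=dict)  # attribute -> class name
    methods: set[str] = field(default_factory=set)


# Toy registry standing in for declarations spread across several modules.
REGISTRY = {
    "User": ClassInfo("User", field_types={"profile": "Profile"}),
    "Profile": ClassInfo("Profile", methods={"display_name"}),
}


def resolve_call(receiver_type: str, chain: list[str]) -> str | None:
    """Resolve a dotted call to 'Class.method', or None if it cannot be resolved.

    receiver_type is the declared type of the first name in the expression;
    chain holds the attribute names that follow it, ending with the method.
    """
    if not chain:
        return None
    current = REGISTRY.get(receiver_type)
    for attr in chain[:-1]:  # walk intermediate attributes by their declared type
        if current is None or attr not in current.field_types:
            return None
        current = REGISTRY.get(current.field_types[attr])
    method = chain[-1]
    if current is not None and method in current.methods:
        return f"{current.name}.{method}"
    return None


# user: User, so user.profile.display_name() resolves through Profile.
print(resolve_call("User", ["profile", "display_name"]))
```

A purely syntactic pass sees only the names `user`, `profile`, and `display_name`; the registry of declared types is what lets the call edge point at the right definition.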

Languages with full Hybrid LSP

Language	What it resolves
Python	Imports and dotted submodule walks, dataclasses, `Self` return types, generics, `@property`, `match/case` patterns, SQLAlchemy 2.0 `Mapped[T]`, Pydantic models, `typing` annotations, async/await, isinstance/walrus narrowing, and common stdlib.
TypeScript / JavaScript / JSX / TSX	Generics, JSX component dispatch, JSDoc inference for plain JS, `.d.ts` declarations, module re-exports, and method chaining via return-type propagation across a shared cross-file registry.
PHP	Namespaces, traits, late-static-binding, PHPDoc inference, parameter binding, and return-type inference.
C#	Global usings, file-scoped namespaces, records (incl. C# 12 primary constructors), LINQ method syntax, `async Task<T>`/`ValueTask<T>` unwrap, generic methods, `var` inference, and common BCL stdlib.
Go	Pre-built per-package cross-file registry, generics, embedded structs, interface satisfaction, and package-aware import resolution.
C / C++	Shared cross-language registry: macros, `typedef` chains, and header-vs-source linking on the C side; templates, namespaces, `auto` inference, and class-hierarchy method resolution on the C++ side.

The two-layer pipeline runs a fast syntactic tree-sitter pass for every one of the 159 languages, then a type-aware Hybrid LSP pass on top for the six families above. Languages that do not yet have a Hybrid LSP pass fall back to textual resolution, so you always get an answer.

Can it do semantic and natural-language code search?

Yes. Beyond structural and full-text search, codebase-memory-mcp performs semantic vector search across the whole graph — so you can find code by meaning, not just by name. A search for send surfaces functions named publish, emit, or dispatch.

It is powered by nomic-embed-code embeddings compiled directly into the binary (768-dimensional, int8). There is no API key, no Ollama, and no Docker; the embeddings run on-device, so semantic search stays 100% local like everything else. Results combine 11 signals, including TF-IDF, API/type/decorator signatures, AST profiles, data flow, Halstead-lite complexity, MinHash, module proximity, and graph diffusion, into one relevance score.
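For intuition, vector search boils down to ranking stored embeddings by similarity to the query embedding. The sketch below uses cosine similarity over made-up 3-dimensional vectors; the real embeddings are 768-dimensional, and the function names, vectors, and the choice of cosine similarity are assumptions for illustration, not the project's scoring code.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical embeddings for three functions (toy 3-dimensional vectors).
INDEX = {
    "publish_event": [0.9, 0.1, 0.0],
    "emit_signal": [0.8, 0.3, 0.1],
    "parse_config": [0.0, 0.2, 0.9],
}


def semantic_search(query_vec: list[float], k: int = 2) -> list[str]:
    """Return the k function names whose vectors are closest to the query."""
    scored = sorted(((cosine(query_vec, vec), name) for name, vec in INDEX.items()), reverse=True)
    return [name for _, name in scored[:k]]


query = [1.0, 0.2, 0.0]  # stand-in for the embedding of the word "send"
print(semantic_search(query))
```

Because ranking depends on vector direction rather than shared characters, a query about sending can land on functions named publish or emit even though the names have nothing in common.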

Meaning-aware edges in the graph

The indexer also writes two kinds of meaning-aware edges, queryable like any other relationship:

Edge	What it captures
`SEMANTICALLY_RELATED`	Conceptually similar functions whose names and tokens differ — vocabulary-mismatch matches, scored ≥ 0.80, within the same language.
`SIMILAR_TO`	Near-duplicate and copy-pasted code, detected with MinHash + LSH and Jaccard scoring — ideal for finding clones and refactor candidates.

# Find code by meaning, not by name — embeddings run locally, no API key.
search_graph(semantic_query=["retry", "backoff", "exponential"])

Semantic and similarity edges are computed in full and moderate index modes; fast mode skips them for the lowest-latency indexing.
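The MinHash + LSH technique named for `SIMILAR_TO` edges can be sketched in a few lines. This is a generic textbook version under stated assumptions (whitespace tokens, 3-token shingles, 64 hash functions, 16 bands); the project's actual tokenization, parameters, and thresholds are not documented here, and every function name below is illustrative.

```python
import hashlib


def shingles(code: str, k: int = 3) -> set:
    """Split code into overlapping k-token shingles (whitespace tokens, for simplicity)."""
    tokens = code.split()
    return {" ".join(tokens[i:i + k]) for i in range(max(1, len(tokens) - k + 1))}


def jaccard(a: set, b: set) -> float:
    """Exact Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def _hash(seed: int, item: str) -> int:
    """One member of a family of hash functions, selected by seed."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash(items: set, num_hashes: int = 64) -> list:
    """Signature: the minimum hash of the set under each hash function."""
    return [min(_hash(seed, item) for item in items) for seed in range(num_hashes)]


def estimated_jaccard(sig_a: list, sig_b: list) -> float:
    """Fraction of matching signature slots approximates the Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def lsh_buckets(sig: list, bands: int = 16) -> set:
    """Split a signature into bands; two items sharing any bucket become candidates."""
    rows = len(sig) // bands
    return {(band, tuple(sig[band * rows:(band + 1) * rows])) for band in range(bands)}


original = shingles("total = 0 ; for item in items : total += item . price ; return total")
clone = shingles("total = 0 ; for item in items : total += item . cost ; return total")

print("exact:", round(jaccard(original, clone), 2))
print("estimated:", round(estimated_jaccard(minhash(original), minhash(clone)), 2))
print("candidate pair:", bool(lsh_buckets(minhash(original)) & lsh_buckets(minhash(clone))))
```

The banding step is what keeps this cheap at scale: only functions that collide in at least one bucket need the more expensive pairwise Jaccard check.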

How much does the knowledge graph save?

Each common structural question costs roughly 200 to 1,500 tokens against the graph, versus 45,000 to 120,000 via file-by-file search. Totals across five queries: ~3,400 vs ~412,000 tokens.

Question type	Graph	File-by-file	Savings
Find function by pattern	~200	~45,000	225x
Trace call chain (depth 3)	~800	~120,000	150x
Dead code detection	~500	~85,000	170x
List all routes	~400	~62,000	155x
Architecture overview	~1,500	~100,000	67x
Total	~3,400	~412,000	~121x
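The savings column can be reproduced directly from the token counts in the table. The short script below only re-derives the published ratios; the variable names are illustrative.

```python
# (question type, graph tokens, file-by-file tokens) taken from the table above.
ROWS = [
    ("Find function by pattern", 200, 45_000),
    ("Trace call chain (depth 3)", 800, 120_000),
    ("Dead code detection", 500, 85_000),
    ("List all routes", 400, 62_000),
    ("Architecture overview", 1_500, 100_000),
]

# Per-question savings factor, rounded to the nearest whole number.
savings = {name: round(files / graph) for name, graph, files in ROWS}

total_graph = sum(graph for _, graph, _ in ROWS)
total_files = sum(files for _, _, files in ROWS)
total_savings = round(total_files / total_graph)

for name, factor in savings.items():
    print(f"{name}: {factor}x")
print(f"Total: {total_graph} vs {total_files} tokens, about {total_savings}x")
```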

A separate evaluation across 31 real-world repositories, described in the preprint, reported 83% answer quality, 10x fewer tokens, and 2.1x fewer tool calls versus file-by-file exploration.

Source: “Codebase-Memory: Tree-Sitter-Based Knowledge Graphs for LLM Code Exploration via MCP”, arXiv:2603.27277 — and the project benchmark report.

How fast is it?

Indexing is RAM-first (LZ4 compression, in-memory SQLite, single dump at end), and memory is released to the OS afterward. Cypher relationship queries run in under a millisecond; name search and depth-5 call-path tracing finish in under 10 ms.
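The "in-memory SQLite, single dump at end" pattern can be sketched with Python's standard `sqlite3` module. This is a minimal illustration, not the project's C code: the `nodes` table schema and the `build_index` function are hypothetical, and LZ4 compression is left out because it is not in the standard library.

```python
import os
import sqlite3
import tempfile


def build_index(db_path: str, nodes: list) -> None:
    """Build the whole table in RAM, then write it to disk in one bulk copy."""
    mem = sqlite3.connect(":memory:")
    # Hypothetical schema, chosen only for this example.
    mem.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT, kind TEXT)")
    mem.executemany("INSERT INTO nodes (name, kind) VALUES (?, ?)", nodes)
    mem.commit()

    disk = sqlite3.connect(db_path)
    mem.backup(disk)  # single dump: copy the in-memory database to the file
    disk.close()
    mem.close()  # releases the in-memory database


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "graph.db")
    build_index(path, [("main", "Function"), ("Profile", "Class")])

    check = sqlite3.connect(path)
    row_count = check.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
    check.close()

print(row_count)
```

Doing all inserts in memory avoids per-row disk writes during indexing; the file is touched once, at the end.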

Operation	Time	Notes
Linux kernel full index	3 min	28M LOC, 75K files → 4.81M nodes, 7.72M edges
Django full index	~6 s	49K nodes, 196K edges
Cypher query	<1 ms	Relationship traversal
Name search (regex)	<10 ms	SQL LIKE pre-filtering
Trace call path (depth 5)	<10 ms	BFS traversal

Source: project Performance benchmarks, measured on Apple M3 Pro.
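The call-path row in the table above mentions BFS traversal. A depth-limited breadth-first search over a call graph can be sketched as follows; the `CALLS` adjacency map and the `trace_callees` function are toy stand-ins for illustration, not the `trace_path` tool itself.

```python
from collections import deque

# Toy call graph: caller -> list of callees.
CALLS = {
    "main": ["load_config", "run"],
    "run": ["fetch", "render"],
    "fetch": ["http_get"],
    "http_get": [],
    "load_config": [],
    "render": [],
}


def trace_callees(start: str, max_depth: int = 5) -> dict:
    """Return {function: depth} for everything reachable within max_depth hops."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        fn = queue.popleft()
        if depths[fn] == max_depth:
            continue  # do not expand beyond the depth limit
        for callee in CALLS.get(fn, []):
            if callee not in depths:  # first visit is the shortest path in BFS
                depths[callee] = depths[fn] + 1
                queue.append(callee)
    return depths


print(trace_callees("main", max_depth=2))
```

Tracing callers instead of callees is the same traversal over the reversed edges.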

Features

159 languages

Python, Go, JS, TS, TSX, Rust, Java, C++, C#, C, PHP, Ruby, Kotlin, Scala, Zig, Elixir, Haskell, OCaml, Swift, Dart, Lean 4, and many more via vendored tree-sitter grammars compiled into the binary.

Hybrid LSP type resolution

Language-server-grade type inference for Python, TS/JS, PHP, C#, Go, and C/C++ — embedded in the binary, no server process or per-project setup.

Pure C, zero dependencies

A single static C binary for macOS, Linux, and Windows. No Docker, no runtime, no API keys. Download, run install, done.

Call-graph tracing

Trace callers and callees across files and packages with import-aware, type-inferred resolution. BFS traversal up to depth 5.

Dead-code detection

Find functions with zero callers, with smart filtering that excludes entry points like route handlers, main(), and framework decorators.
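As a rough sketch of the idea, dead-code detection is a set difference with an exclusion list: take every function, drop the ones that are called, then drop known entry points. The data, the entry-point rules, and `find_dead_code` below are invented for illustration; the project's actual filters are not spelled out here.

```python
# Toy graph data: function name -> metadata, plus (caller, callee) edges.
FUNCTIONS = {
    "main": {"decorators": []},
    "list_users": {"decorators": ["app.route"]},
    "format_name": {"decorators": []},
    "legacy_export": {"decorators": []},
}
CALL_EDGES = [("main", "format_name")]

# Hypothetical entry-point rules: names and decorators that imply external callers.
ENTRY_POINT_NAMES = {"main"}
ENTRY_POINT_DECORATORS = {"app.route"}


def find_dead_code() -> list:
    """Return functions with zero callers that are not recognized entry points."""
    called = {callee for _, callee in CALL_EDGES}
    dead = []
    for name, info in FUNCTIONS.items():
        if name in called:
            continue  # has at least one caller
        if name in ENTRY_POINT_NAMES:
            continue  # invoked by the runtime, not by other code
        if ENTRY_POINT_DECORATORS & set(info["decorators"]):
            continue  # invoked by a framework, e.g. a route handler
        dead.append(name)
    return sorted(dead)


print(find_dead_code())
```

Without the entry-point filter, `main` and `list_users` would be reported too, which is the kind of false positive the smart filtering is meant to avoid.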

Cross-service linking

Matches REST routes to HTTP call sites across services with confidence scoring — and detects gRPC, GraphQL, and tRPC services plus pub/sub channels (EMITS/LISTENS_ON for Socket.IO, EventEmitter, and message buses) and async queue dispatch.
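One plausible way to match a route definition to an HTTP call site is to turn the route template into a regular expression and test call-site paths against it. The sketch below assumes `{param}` style templates, and the confidence values are placeholders invented for this example; neither the template syntax nor the scoring is taken from the project.

```python
import re

# Hypothetical routes declared by one service.
ROUTES = ["/health", "/users/{id}", "/orders/{order_id}/items"]


def route_to_regex(template: str) -> re.Pattern:
    """Convert '/users/{id}' into a pattern where each {param} matches one path segment."""
    literal_parts = re.split(r"\{[^/}]+\}", template)
    return re.compile("^" + "[^/]+".join(re.escape(part) for part in literal_parts) + "$")


def match_call_site(url_path: str):
    """Return (route template, confidence) for the first matching route, or None."""
    for template in ROUTES:
        if route_to_regex(template).match(url_path):
            # Placeholder scores: an exact literal match is treated as more certain
            # than a match that relied on a path parameter.
            confidence = 1.0 if "{" not in template else 0.8
            return template, confidence
    return None


print(match_call_site("/users/42"))
print(match_call_site("/orders/7/items"))
print(match_call_site("/unknown"))
```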

Infrastructure-as-code indexing

Dockerfiles, Kubernetes manifests, and Kustomize overlays become graph nodes with cross-references to the resources they configure.

Auto-sync

A background watcher detects changes and re-indexes incrementally. No manual reindex after editing files.

Team-shared graph artifact

Commit one zstd-compressed snapshot (.codebase-memory/graph.db.zst); teammates bootstrap from it and skip the full reindex.

3D graph visualization

An optional UI binary serves an interactive 3D graph at localhost:9749 to explore nodes, edges, and clusters visually.

14 MCP tools

search_graph, trace_path, detect_changes, query_graph (Cypher), get_architecture, get_code_snippet, manage_adr, and 7 more.

Cypher graph queries

Run read-only Cypher-style queries against the graph for multi-hop patterns that structured search can't express.

Semantic code search

Find code by meaning, not just name, via semantic_query vector search — powered by nomic-embed-code embeddings baked into the binary. No API key, fully local.

Clone & similarity detection

SIMILAR_TO edges (MinHash + LSH) surface near-duplicate code; SEMANTICALLY_RELATED edges link conceptually similar functions across the graph.

Cross-repo intelligence

Index multiple repositories in one store and link them with CROSS_* edges. A multi-galaxy 3D layout and cross-repo architecture summary span the whole fleet.

Data-flow tracing

DATA_FLOWS edges follow values from argument to parameter, with field-access chains — trace how data moves, not just who calls whom.

Change-impact analysis

detect_changes maps an uncommitted git diff to affected symbols and their blast radius, with risk classification — see what a change touches before you ship it.

Architecture Decision Records

manage_adr persists architectural decisions alongside the graph, so design rationale survives across sessions and teammates.

What are the recent releases?

The latest release notes are loaded live from GitHub. Each entry links to its full changelog.

Frequently asked questions

Does codebase-memory-mcp send my code anywhere?

No. All indexing and querying happen 100% locally. There is no embedded LLM and no API key. Release binaries are signed, checksummed, and scanned by 70+ antivirus engines.

Does it support semantic or natural-language code search?

Yes. Alongside structural and full-text search, search_graph's semantic_query parameter runs vector search over the whole graph, powered by nomic-embed-code embeddings compiled into the binary — so it finds publish when you search send. No API key, no Ollama, no Docker; the embeddings run on-device. The indexer also builds SEMANTICALLY_RELATED edges between similar functions and SIMILAR_TO edges for near-clone detection.

Do I need Docker or a runtime?

No. It is a single static C binary with zero runtime dependencies for macOS (arm64/amd64), Linux (arm64/amd64), and Windows (amd64).

How does it stay up to date as I edit code?

A background watcher detects file changes and re-indexes incrementally — typically a sub-millisecond no-op when nothing changed. You only run a manual index for the first build or after a large git pull.
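How the watcher decides what changed is not described here. One common approach, shown below purely as an assumption, is to compare content fingerprints from the last index against the current files and re-index only the difference. The function names and the use of SHA-256 are illustrative choices.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Content hash used to tell whether a file changed since the last index."""
    return hashlib.sha256(content).hexdigest()


def changed_files(previous: dict, current_files: dict) -> dict:
    """Compare stored fingerprints (path -> hash) with current contents (path -> bytes)."""
    current = {path: fingerprint(data) for path, data in current_files.items()}
    return {
        "modified": sorted(p for p in current if p in previous and previous[p] != current[p]),
        "added": sorted(p for p in current if p not in previous),
        "removed": sorted(p for p in previous if p not in current),
    }


# State recorded at the last index, then the working tree after some edits.
last_index = {"a.py": fingerprint(b"x = 1"), "b.py": fingerprint(b"y = 2"), "old.py": fingerprint(b"")}
working_tree = {"a.py": b"x = 1", "b.py": b"y = 3", "c.py": b"z = 4"}

delta = changed_files(last_index, working_tree)
print(delta)
```

When nothing has changed, all three lists come back empty and there is nothing to re-index, which matches the no-op behavior described above.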

Why is there no built-in LLM?

Other code-graph tools embed an LLM to translate natural language into graph queries, which means extra API keys and cost. With MCP, the agent you are already talking to is the query translator — codebase-memory-mcp just builds and serves the graph.

Is it free and open source?

Yes. It is MIT licensed. The full source, signed release binaries, and checksums are on GitHub.