Codebase Graph — AST Intelligence Across 158 Languages
Kodelyth ECC integrates DeusData/codebase-memory-mcp — a single static binary that indexes any codebase into a tree-sitter AST knowledge graph with Hybrid LSP semantic type resolution.
Structural queries like "who calls X" or "what does the auth flow look like" now cost ~3,400 tokens instead of the ~412,000 tokens needed for file-by-file grep, a reduction of about 99%.
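The reduction figure follows directly from the two token counts quoted above. A quick check, using only those numbers (the function name is illustrative):

```python
# Token counts quoted above for one structural query.
GRAPH_TOKENS = 3_400    # answered from the codebase graph
GREP_TOKENS = 412_000   # answered by file-by-file grep

def token_reduction(graph_tokens: int, grep_tokens: int) -> float:
    """Fraction of tokens saved by the graph query relative to grep."""
    return 1 - graph_tokens / grep_tokens

reduction = token_reduction(GRAPH_TOKENS, GREP_TOKENS)
print(f"{reduction:.1%}")  # prints 99.2%
```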
Their binary, their curl script, their MIT license. ECC installs, wires, and surfaces it. No fork, no code copy, no npm dependency.
What you get
- AST-parsed graph — 158 languages via tree-sitter grammars vendored into the binary
- Hybrid LSP — semantic type resolution for Python, TypeScript / JavaScript / JSX / TSX, PHP, C#, Go, C, C++, Java, Kotlin, and Rust (parameter binding, return-type inference, generic substitution, JSX component dispatch, JSDoc inference)
- Cross-service linking — HTTP routes, gRPC, GraphQL, tRPC, EventEmitter channels
- 14 MCP tools — search_graph, trace_path, get_architecture, manage_adr, semantic_query, detect_changes, search_code, dead code detection, Cypher queries, and more
- Zero infrastructure — SQLite-backed, persists to ~/.cache/codebase-memory-mcp/
- Local only — your code never leaves your machine
Auto-install via ECC
Add --codebase-graph to your install:
npm i -g kodelyth-ecc
kodelythecc --target claude-code --codebase-graph
Or after ECC is installed:
kodelythecc codebase install
Both flows:
- Detect if codebase-memory-mcp is on your PATH (idempotent — reuses existing install)
- If not, install via their official curl script (~/.local/bin/codebase-memory-mcp)
- Run their install command, which auto-registers MCP entries in every detected AI-coding agent (~/.claude.json, Codex CLI, Gemini CLI, Zed, OpenCode, Antigravity, Aider, KiloCode, VS Code, OpenClaw, Kiro)
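The three steps above amount to a detect-then-install decision. A minimal sketch of that logic, assuming a hypothetical helper named plan_install; it only describes the steps and does not reproduce ECC's actual wrapper or the upstream script:

```python
import shutil
from typing import Callable, List, Optional

BINARY = "codebase-memory-mcp"

def plan_install(which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return the ordered steps an idempotent install would take.

    Hypothetical helper: `which` is injectable so the PATH lookup can be faked.
    """
    steps = []
    if which(BINARY) is None:
        # Binary not on PATH: fall back to the upstream curl script.
        steps.append("install via upstream curl script")
    else:
        # Binary already present: nothing to download.
        steps.append("reuse existing install")
    # Registration runs in both cases.
    steps.append(f"run `{BINARY} install` to register MCP entries")
    return steps

print(plan_install())
```

Because the PATH check comes first, running the flow a second time skips the download and only repeats the registration step.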
First index
Open a project in your AI tool. Say:
Index this project
The MCP tool index_repository builds the graph. A Django-scale project takes ~6 seconds; the Linux kernel (28M LOC, 75K files) takes 3 minutes.
Verify:
kodelythecc codebase status
codebase-memory-mcp: codebase-memory-mcp 0.8.1
indexed projects: 8
cache dir: /Users/you/.cache/codebase-memory-mcp
next: open a project in your AI tool and say "Index this project"
Query the graph from the CLI
kodelythecc codebase query search_graph '{"name_pattern": ".*Handler.*"}'
kodelythecc codebase query trace_path '{"function_name": "main", "direction": "outbound"}'
kodelythecc codebase query get_architecture '{}'
kodelythecc codebase query detect_changes '{}'
All queries run locally. No LLM cost. Results are structured JSON your AI tool can consume in a single MCP call.
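When scripting these queries, building the JSON argument programmatically avoids shell-quoting mistakes. A small sketch, assuming a hypothetical helper build_query_argv; the tool name and arguments are the ones shown in the commands above:

```python
import json
import shlex
from typing import Any, Dict, List

def build_query_argv(tool: str, args: Dict[str, Any]) -> List[str]:
    """Build the argv for `kodelythecc codebase query <tool> '<json>'`."""
    return ["kodelythecc", "codebase", "query", tool, json.dumps(args)]

argv = build_query_argv(
    "trace_path", {"function_name": "main", "direction": "outbound"}
)
print(shlex.join(argv))  # shell-safe rendering of the same command

# To run it for real (needs kodelythecc on PATH):
#   import subprocess
#   result = subprocess.run(argv, capture_output=True, text=True, check=True)
#   data = json.loads(result.stdout)  # assumption: stdout is one JSON document
```

Passing argv as a list to subprocess.run sidesteps the shell entirely, so the JSON needs no extra escaping.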
CLI reference
kodelythecc codebase install # install binary + auto-register agents
kodelythecc codebase status [--json] # binary version + indexed projects + cache dir
kodelythecc codebase register # re-run their auto-configure step for installed agents
kodelythecc codebase query <cli-cmd> [json] # pass-through to `codebase-memory-mcp cli`
kodelythecc codebase --help # focused help
Graph edge types (selected)
- CALLS — function-to-function
- IMPORTS — module dependency
- DEFINES — file defines a symbol
- IMPLEMENTS — interface/trait implementation
- INHERITS — class inheritance
- HTTP_CALLS, ASYNC_CALLS — cross-service
- EMITS, LISTENS_ON — pub-sub channels
- DATA_FLOWS — arg-to-param mapping with field access chains
- SIMILAR_TO — MinHash + LSH near-clone detection
- SEMANTICALLY_RELATED — vocabulary-mismatch, same-language, score ≥ 0.80
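To make the edge types concrete, here is a toy model of a typed edge list and a "who calls X" lookup over it. The Edge type and every symbol name are hypothetical and chosen for illustration; this is not the upstream storage schema:

```python
from typing import List, NamedTuple

class Edge(NamedTuple):
    src: str
    kind: str  # one of the edge types above, e.g. "CALLS" or "IMPORTS"
    dst: str

# Hypothetical toy graph.
EDGES = [
    Edge("orders.py", "DEFINES", "ProcessOrder"),
    Edge("orders.py", "IMPORTS", "auth.py"),
    Edge("handle_checkout", "CALLS", "ProcessOrder"),
    Edge("retry_job", "CALLS", "ProcessOrder"),
    Edge("ProcessOrder", "CALLS", "charge_card"),
]

def callers_of(symbol: str, edges: List[Edge]) -> List[str]:
    """Sources of every CALLS edge that points at `symbol`."""
    return sorted(e.src for e in edges if e.kind == "CALLS" and e.dst == symbol)

print(callers_of("ProcessOrder", EDGES))  # ['handle_checkout', 'retry_job']
```

Filtering on the edge kind is what keeps the DEFINES edge from orders.py out of the caller list.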
Common queries (via your AI tool)
Once indexed, ask your AI tool things like:
- "Who calls ProcessOrder?"
- "What's the impact of changing AuthMiddleware?"
- "Show me the architecture of this repo"
- "Find dead code — functions with zero callers"
- "Which HTTP routes touch the users table?"
The AI translates natural language to MCP calls behind the scenes. You never write Cypher unless you want to.
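Two of the questions above, dead code and change impact, reduce to simple traversals of a call graph. A self-contained sketch on a hypothetical graph; the function names and the entry-point set are illustrative, and the real tools run against the indexed graph rather than a dict:

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical call graph: caller -> callees.
CALLS: Dict[str, List[str]] = {
    "main": ["handle_request"],
    "handle_request": ["AuthMiddleware", "ProcessOrder"],
    "AuthMiddleware": ["verify_token"],
    "ProcessOrder": [],
    "verify_token": [],
    "legacy_export": ["verify_token"],
}
ENTRY_POINTS = {"main"}

def dead_code(calls: Dict[str, List[str]], entry_points: Set[str]) -> List[str]:
    """Functions with zero callers that are not entry points."""
    called = {callee for callees in calls.values() for callee in callees}
    return sorted(f for f in calls if f not in called and f not in entry_points)

def impact_of(symbol: str, calls: Dict[str, List[str]]) -> List[str]:
    """Every function that reaches `symbol` through one or more calls."""
    reverse: Dict[str, Set[str]] = {}
    for caller, callees in calls.items():
        for callee in callees:
            reverse.setdefault(callee, set()).add(caller)
    seen: Set[str] = set()
    queue = deque([symbol])
    while queue:  # breadth-first walk over reversed CALLS edges
        for caller in reverse.get(queue.popleft(), ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return sorted(seen)

print(dead_code(CALLS, ENTRY_POINTS))      # ['legacy_export']
print(impact_of("AuthMiddleware", CALLS))  # ['handle_request', 'main']
```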
Dashboard view
kodelythecc dashboard → Codebase tab shows:
- Binary version
- Indexed project count (real, from list_projects)
- Graph nodes / edges
- Language distribution
- Entry points (top 5)
- Project list with per-project node + edge counts
When no active session graph exists, the dashboard shows the indexed project list with node/edge counts. When you open a project in your AI tool, its architecture snapshot fills in.
All numbers come from live queries — zero hardcoded values.
Performance
Benchmarked on Apple M3 Pro (from their docs):
| Operation | Time |
|---|---|
| Linux kernel full index | 3 min (28M LOC, 75K files → 4.81M nodes, 7.72M edges) |
| Linux kernel fast index | 1m 12s (1.88M nodes) |
| Django full index | ~6s (49K nodes, 196K edges) |
| Cypher query | <1ms |
| Name search (regex) | <10ms |
| Dead code detection | ~150ms |
| Trace call path (depth=5) | <10ms |
RAM-first pipeline: all indexing runs in memory with LZ4 compression and in-memory SQLite. Memory is released after indexing completes.
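The RAM-first idea can be illustrated with the standard library: build the database in memory, then write it to disk in one pass. This is only a sketch of the general pattern under assumed names (the edges table and the function are hypothetical); it omits LZ4 compression and is not the upstream implementation:

```python
import os
import sqlite3
import tempfile
from typing import Iterable, Tuple

def index_in_memory_then_persist(
    edges: Iterable[Tuple[str, str, str]], db_path: str
) -> None:
    """Build a graph table in RAM, then copy it to disk in a single backup."""
    mem = sqlite3.connect(":memory:")  # all writes stay in memory
    mem.execute("CREATE TABLE edges (src TEXT, kind TEXT, dst TEXT)")
    mem.executemany("INSERT INTO edges VALUES (?, ?, ?)", edges)
    mem.commit()
    disk = sqlite3.connect(db_path)
    try:
        mem.backup(disk)  # one bulk copy instead of many small disk writes
    finally:
        disk.close()
        mem.close()       # releases the in-memory database

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "graph.db")
    index_in_memory_then_persist([("main", "CALLS", "run")], path)
    conn = sqlite3.connect(path)
    count = conn.execute("SELECT COUNT(*) FROM edges").fetchone()[0]
    conn.close()

print(count)  # 1
```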
Attribution
- License: MIT
- Maintainer: DeusData/codebase-memory-mcp
- Research paper: Codebase-Memory: Tree-Sitter-Based Knowledge Graphs for LLM Code Exploration via MCP
- ECC's wrapper: scripts/codebase/index.js — thin, no fork
- Fallback: If upstream disappears, ECC will fork + vendor. MIT permits.
See also
- MCP Server — how ECC exposes its own MCP surface
- External MCP Servers — register more MCP servers
- Dashboard — live codebase tile
- Getting Started — install path