Terse Mode — Output Token Compressor
Make the AI's mouth smaller, not its brain smaller. Same answers, fewer words, byte-exact code.
What this is
A prompt-level output compression skill. When active, the AI drops verbose filler while preserving every technical detail. The output should signal correctness, not chattiness.
When to activate
- Any coding session where long replies dominate (explanations, reviews, planning)
- Extended sessions where you want output tokens to stretch further
- When the AI's output, read aloud, sounds like padding — that's the tell
When to skip
- User explicitly wants long, teaching-oriented explanations
- Documentation-writing tasks where the output IS the artifact
- First contact with a new user who hasn't opted in
The rules — ALWAYS PRESERVED
The AI must byte-preserve these no matter which level:
- Fenced code blocks (```lang ...```) — exact contents, no changes
- Inline code — exact
- Shell commands and error text — exact
- URLs, file paths, function names, identifiers — exact
- Numbers, versions, hashes — exact
- YAML/JSON/config snippets — exact
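The preservation rule can be checked mechanically. Below is a minimal sketch, assuming fenced blocks, inline code, and URLs can be found with simple regular expressions. The patterns and the function names `protected_spans` and `missing_after_compression` are illustrative and not part of the skill. The sketch collects protected spans from the uncompressed reply and reports any that do not appear byte-for-byte in the compressed one.

```python
import re

# Hypothetical patterns for spans that must survive compression unchanged.
FENCE = re.compile(r"```.*?```", re.DOTALL)
INLINE = re.compile(r"`[^`\n]+`")
URL = re.compile(r"https?://[^\s)>\]`]+")


def protected_spans(text: str) -> list:
    """Collect fenced blocks first, then inline code, then URLs."""
    spans = []

    def take_fence(match):
        spans.append(match.group(0))
        return "\n"  # remove the block so it is not re-scanned as inline code

    rest = FENCE.sub(take_fence, text)
    spans += INLINE.findall(rest)
    # Trailing sentence punctuation is not treated as part of the URL.
    spans += [url.rstrip(".,;:") for url in URL.findall(rest)]
    return spans


def missing_after_compression(original: str, compressed: str) -> list:
    """Return protected spans from `original` that are absent in `compressed`."""
    return [span for span in protected_spans(original) if span not in compressed]
```

An empty result means every protected span survived. Anything listed was altered or dropped and the compressed reply should be rejected.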
Levels
off — normal AI voice
Default. No compression.
lite — light trim
- Drop obvious filler ("basically", "essentially", "in order to", "the reason is that")
- Convert "you should X" → "X"
- Merge sentences that repeat the same idea
- Keep normal-looking paragraphs
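The skill works at the prompt level, so no code performs this trimming. As an illustration of the effect the lite rules aim for, here is a rough sketch. The filler phrases come from the list above; the replacement strings and the name `lite_trim` are assumptions. Code spans are split out first and left untouched.

```python
import re

# Filler phrases from the lite rules; the replacements are assumptions.
FILLER = [
    (r"\b(?:basically|essentially),?\s+", ""),
    (r"\bin order to\b", "to"),
    (r"\bthe reason is that\s+", ""),
    (r"\byou should\s+", ""),
]

# Fenced blocks and inline code are never edited.
CODE_SPAN = re.compile(r"(```.*?```|`[^`\n]+`)", re.DOTALL)


def lite_trim(text: str) -> str:
    """Drop filler from prose while leaving code spans byte-exact."""
    parts = CODE_SPAN.split(text)  # odd indices hold the captured code spans
    for i in range(0, len(parts), 2):
        for pattern, replacement in FILLER:
            parts[i] = re.sub(pattern, replacement, parts[i], flags=re.IGNORECASE)
    return "".join(parts)
```

The sketch does not merge repeated sentences or fix capitalization; those steps need judgment that a regex table cannot supply.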
full — default terse
- Fragment sentences: "New ref each render. Wrap in `useMemo`."
- Drop transitional phrases entirely
- Use → and = freely instead of prose connectors
- Assume user is a senior engineer
Example — ~50% shorter:
> New ref each render → re-render. Wrap object in useMemo.

ultra — maximum compression
- Telegram-style. Symbols over words. Numbered points, one line each.
- Only expand if the compression would lose a technical fact.
Example — ~70% shorter:
> Ref/render. useMemo it.

Interaction with other ECC systems
- RTK compresses input tokens (tool output → LLM). Terse compresses output tokens (LLM → user). Stack together for ~55-65% total savings.
- kodelyth-memory captures are still written in normal voice — memory recall is for machines, not humans. Terse mode does NOT affect memory captures.
- code-reviewer and release-captain agents can opt in via the `--terse` flag for one-line PR comments and short commit messages.
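The combined figure depends on how a session's tokens split between input and output. A back-of-the-envelope sketch follows. The token counts and per-side savings rates in the usage note are hypothetical, not measurements, and `combined_savings` is an illustrative name.

```python
def combined_savings(input_tokens: int, input_saving: float,
                     output_tokens: int, output_saving: float) -> float:
    """Token-weighted average of input-side and output-side savings."""
    saved = input_tokens * input_saving + output_tokens * output_saving
    return saved / (input_tokens + output_tokens)
```

For example, `combined_savings(8000, 0.60, 2000, 0.65)` weights a hypothetical 60% input saving over 8,000 tokens against a 65% output saving over 2,000 tokens. A session dominated by input tokens lands closer to the input-side rate, and one dominated by output lands closer to the output-side rate.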
What terse mode NEVER does
- Change what the AI knows
- Skip technical details or trade-offs
- Compress code, commands, or errors
- Translate — write in the user's own language, just tighter
- Auto-activate — user opts in via `/terse` or CLI
Activation
- Slash command: `/terse [lite|full|ultra|off]` — sticks for the session
- CLI: `kodelyth-ecc terse enable [--target claude-code|--all]` — installs the skill + command into your AI tool
- Statusline (Claude Code): shows `[TERSE ⚡ 12.4k]` — lifetime output tokens saved
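The slash-command grammar is small enough to sketch as a parser. This is illustrative only and not how any AI tool actually handles the command. It assumes that a bare `/terse` selects `full`, an assumption based on the "default terse" label, and `parse_terse_command` is a hypothetical helper.

```python
from typing import Optional

LEVELS = ("lite", "full", "ultra", "off")


def parse_terse_command(line: str, default: str = "full") -> Optional[str]:
    """Return the requested level, or None if the line is not a /terse command."""
    tokens = line.strip().split()
    if not tokens or tokens[0] != "/terse":
        return None
    if len(tokens) == 1:
        return default  # assumption: bare /terse means the "full" level
    if len(tokens) == 2 and tokens[1] in LEVELS:
        return tokens[1]
    raise ValueError("usage: /terse [" + "|".join(LEVELS) + "]")
```

A caller would store the returned level in session state so that it persists until the next `/terse` command.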
Honest numbers
- On verbose explain-heavy tasks: 60-70% output token reduction
- On terse debugging chats: 20-30% (less to compress)
- On documentation writing: net zero — skip this mode
- The skill itself adds ~800-1200 input tokens per turn. Below ~2k output tokens, it may be net-negative
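The break-even point follows from simple arithmetic. The sketch below assumes baseline output is measured over the same turn as the overhead. It counts input and output tokens equally by default, and `output_weight` is a hypothetical knob for pricing that charges more per output token. Function names are illustrative.

```python
def net_tokens_saved(baseline_output: int, reduction: float,
                     overhead_input: int = 1000,
                     output_weight: float = 1.0) -> float:
    """Weighted output tokens saved minus the skill's input-token overhead."""
    return baseline_output * reduction * output_weight - overhead_input


def break_even_output(reduction: float, overhead_input: int = 1000,
                      output_weight: float = 1.0) -> float:
    """Baseline output size at which savings exactly cancel the overhead."""
    return overhead_input / (reduction * output_weight)
```

With a 50% reduction and 1,000 tokens of overhead, `break_even_output(0.5)` lands at 2,000 baseline output tokens, which matches the ~2k threshold above. At the 20-30% reduction seen in already-terse chats, the break-even point is considerably higher.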
Attribution
Design inspired by Caveman (MIT, by Julius Brussee). ECC's implementation is independent — a different prompt, a different levels dial (no `wenyan`, no cavespeak persona), and integration with our RTK ledger for combined input+output tracking.