Terse Mode — Output Token Compressor

Make the AI's mouth smaller, not its brain smaller. Same answers, fewer words, byte-exact code.

What this is

A prompt-level output compression skill. When active, the AI drops verbose filler while preserving every technical detail. Signals correctness, not chattiness.

When to activate

Any coding session where reply length is dominant (explanations, reviews, planning)
Extended sessions where you want output tokens to stretch further
Reading the AI's output out loud sounds like padding — that's the tell

When to skip

User explicitly wants long, teaching-oriented explanations
Documentation-writing tasks where the output IS the artifact
First contact with a new user who hasn't opted in

The rules — ALWAYS PRESERVED

The AI must byte-preserve these no matter which level:

Fenced code blocks (``lang ... `) — exact contents, no changes

Inline code — exact


Shell commands and error text — exact
URLs, file paths, function names, identifiers — exact
Numbers, versions, hashes — exact
YAML/JSON/config snippets — exact


Levels

off `— normal AI voice`


Default. No compression.

lite `— light trim`


Drop obvious filler ("basically", "essentially", "in order to", "the reason is that")
Convert "you should X" → "X"
Merge sentences that repeat the same idea
Keep normal-looking paragraphs

Example — same info, ~25% shorter:
> The React component re-renders because a new object reference is created on each render. Wrap the object in

useMemo

full `— default terse`

Fragment sentences: "New ref each render. Wrap in useMemo."


Drop transitional phrases entirely

Use → and = freely instead of prose connectors


Assume user is a senior engineer


Example — ~50% shorter:
> New ref each render → re-render. Wrap object in

useMemo

ultra `— maximum compression`


Telegram-style. Symbols over words. Numbered points, one line each.
Only expand if the compression would lose a technical fact.

Example — ~70% shorter:
> Ref/render.

useMemo

 it.
Interaction with other ECC systems

RTK compresses input tokens (tool output → LLM). Terse compresses output tokens (LLM → user). Stack together for ~55-65% total savings.
kodelyth-memory captures still capture in normal voice — memory recall is for machines, not humans. Terse mode does NOT affect memory captures.

code-reviewer and release-captain agents can opt in via --terse flag for one-line PR comments and short commit messages.


What terse mode NEVER does

Change what the AI knows
Skip technical details or trade-offs
Compress code, commands, or errors
Translate — write in the user's own language, just tighter

Auto-activate — user opts in via /terse or CLI


Activation

Slash command: /terse [lite|full|ultra|off] — sticks for the session

CLI: kodelyth-ecc terse enable [--target claude-code|--all] — installs the skill + command into your AI tool

Statusline (Claude Code): shows [TERSE ⚡ 12.4k] — lifetime output tokens saved


Honest numbers
On verbose explain-heavy tasks: 60-70% output token reduction
On terse debugging chats: 20-30% (less to compress)
On documentation writing: net zero — skip this mode
Skill itself adds ~800-1200 input tokens per turn. Below ~2k output tokens, may be net-negative

Attribution

Design inspired by Caveman (MIT, by Julius Brussee). ECC's implementation is independent — different prompt, different levels dial (no wenyan`, no cavespeak persona), and integrated with our RTK ledger for combined input+output tracking.

Terse Mode — Output Token Compressor

What this is

When to activate

When to skip

The rules — ALWAYS PRESERVED

Levels

off — normal AI voice

lite — light trim

full — default terse

ultra — maximum compression

Interaction with other ECC systems

What terse mode NEVER does

Activation

Honest numbers

Attribution

off `— normal AI voice`

lite `— light trim`

full `— default terse`

ultra `— maximum compression`