Tools for AI Collaboration Are a Different Design Problem

Tools, Claude, AI, MCP, Optimization
By Michael


I've been working on an unreleased TUI app built in Go. I had to refactor because my Bubbletea Model had gotten massive. It had become a single-depth struct with every property dumped at the top level; fields landed wherever was convenient while I moved fast on features. I was leveraging Claude to break the model down into smaller parts, but the problem with that kind of structure is that references to it are scattered throughout the codebase. That's where I was most looking forward to using Claude - cleaning up my tedious, ignorant mess.

I was halfway through the first submodel refactor when I saw three huge flashes of text from Claude. Grep output. That's when I got curious about whether I could make a tool and register it with Claude like a Skill, Hook, or Agent.

The answer is yes, and the tool is checkfor. JSON output, minimal tokens, built for repetition. The whole thing taught me something about a category of tooling we've needed for years, but that not many people are talking about.

Check out checkfor here

The Thesis

Building tools for AI collaboration is a fundamentally different design problem than building tools for humans.

Grep is already optimized. The source code is fine. What's not optimized is the output format. Grep was designed for humans to read in a terminal - color codes, repeated file paths for every match, a thousand and one different arguments. All of that makes perfect sense when you're the one using and reading it.

But Claude doesn't need any of that. Claude pays tokens for colors it can't see, formatting it doesn't use, and repeated information it already has. The optimization target changed from "human readability" to "token efficiency," and that means different tools entirely.

This isn't about making grep better. Grep is great at what it does. This is about recognizing that AI collaboration has different constraints, and we need a new category of tooling built around those constraints.

The Concrete Example

The refactoring touched 16 files in the internal/cli/ directory. Multiple phases - first breaking out the FormModel, then TableModel, then NavigationModel. Each phase meant finding every reference to the old fields and updating them to use the new submodel structure.
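
To make the shape of that change concrete, here's a rough Go sketch of the split. The submodel names come from the refactor described above; every field is invented for illustration and doesn't reflect the actual app.

```go
package cli

// Before: a single-depth model with everything dumped at the top level.
// All field names here are hypothetical.
type flatModel struct {
	formInputs  []string
	formFocus   int
	tableRows   [][]string
	tableCursor int
	navStack    []string
	navIndex    int
	// ...dozens more fields accumulated while moving fast
}

// After: the same state grouped into submodels that the parent delegates to.
type FormModel struct {
	inputs []string
	focus  int
}

type TableModel struct {
	rows   [][]string
	cursor int
}

type NavigationModel struct {
	stack []string
	index int
}

type Model struct {
	form  FormModel
	table TableModel
	nav   NavigationModel
}
```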

Claude was doing the refactor. It would update three or four files, then verify it didn't miss anything by running grep for the old field names. A huge block of output would scroll by, showing every match with context lines. Then Claude would report back: "Found 17 more references across 8 files."

Then it would do it again. Update those files. Verify with grep. More formatted output. "Found 9 references remaining."

And again. And again.

Each verification was necessary. When you're relying on AI to refactor reliably, you have to verify four or five times as often as the agent updates code. One missed reference and the whole thing breaks, and you're back to pasting logs into Claude.

But each grep call was returning formatted output designed for a human to parse visually, and Claude was consuming all of it as tokens.

For most people this isn't an issue. So I spend 1,000 tokens instead of 500 - what's the big deal? That's less than a penny in API costs. But if I had left it that way, I would have had to start a new session and lose context after three refactors instead of getting through all five.

What Makes AI Tooling Different

Humans need formatting. Color codes, visual hierarchy, context lines to understand what they're looking at. When grep shows you a match with two lines before and after, that's helpful. You can see the function it's in, what's happening around it.

AI doesn't need that. Claude pays tokens for colors it can't see. It pays tokens for the file path repeated on every match when the filename listed once in a JSON structure would be enough. Context lines might be useful sometimes, but most of the time the line number alone is sufficient.

Then there's the repetition problem. A human runs grep once to find something. Claude might run the same search 12 times in one session to track progress. "How many references are left?" becomes a question you ask over and over as the refactor proceeds.

For humans, verbose output is mildly inconvenient. For AI, it can kill the entire workflow by exhausting the context window. Token budget isn't a soft constraint you can ignore. It's a hard limit, and when you hit it, the session ends.

The Numbers

Here's what the token usage actually looked like.

The refactoring session required 12 verification queries across multiple phases.

| Method | Total Tokens | Multiplier vs checkfor | Cost (Sonnet 4.5) |
| --- | --- | --- | --- |
| checkfor (actual) | ~8,000 | 1x | $0.024 |
| Grep | ~35,100 | 4.4x | $0.105 |
| Read (16 files × 3 passes) | ~155,250 | 19.4x | $0.466 |

Token calculation for Read tool:

  • 16 files totaling ~3,450 lines
  • Average 15 tokens per line
  • 3 passes for inventory, mid-check, final verification
  • 3,450 lines × 15 tokens = 51,750 tokens per pass; × 3 passes = 155,250 tokens

Session context limit: 200,000 tokens

With the Read tool approach, the session would have exceeded the limit during phase 3 of 5. With checkfor, all phases completed in a single session.

API cost savings for the full refactor: $0.442 vs Read, $0.081 vs Grep.

Design Principles for AI Collaboration Tools

Answer exactly the question asked. checkfor only scans one directory, at a single depth. Not recursive. If you ask about internal/cli/, that's what you get. Nothing more. This is different from most search tools, which default to recursion because humans often want "find this anywhere." But for verification, you usually know exactly which directory matters.
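
As a rough illustration of that principle - not checkfor's actual source - a single-depth scan in Go reads only the entries of the one directory you name and never descends:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// scanDir looks for pattern in the files of a single directory, without
// recursing into subdirectories. Illustrative sketch only.
func scanDir(dir, pattern string) (map[string][]int, error) {
	entries, err := os.ReadDir(dir) // reads this directory only
	if err != nil {
		return nil, err
	}
	hits := make(map[string][]int)
	for _, e := range entries {
		if e.IsDir() {
			continue // single depth: subdirectories are skipped entirely
		}
		path := filepath.Join(dir, e.Name())
		data, err := os.ReadFile(path)
		if err != nil {
			continue
		}
		for i, line := range strings.Split(string(data), "\n") {
			if strings.Contains(line, pattern) {
				hits[path] = append(hits[path], i+1) // 1-based line numbers
			}
		}
	}
	return hits, nil
}

func main() {
	hits, err := scanDir("internal/cli", "formFocus") // hypothetical field name
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(hits)
}
```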

JSON-only output. No human-friendly formatting, no colors, no repeated headers. Structured data that AI can parse instantly. The output includes a match count at the top, then an array of files with their matches. Line numbers, content, optional context. That's it.
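
Expressed as Go types, the output described above might look like the sketch below. The matches_found field name comes from this post; the remaining names are assumptions about a reasonable schema, not checkfor's documented one.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Result mirrors the described output: an exact count up front, then files.
type Result struct {
	MatchesFound int        `json:"matches_found"`
	Files        []FileHits `json:"files"`
}

// FileHits lists each file path once, with all of its matches grouped under it.
type FileHits struct {
	Path    string  `json:"path"`
	Matches []Match `json:"matches"`
}

// Match carries the line number and content; context only appears on request.
type Match struct {
	Line    int      `json:"line"`
	Content string   `json:"content"`
	Context []string `json:"context,omitempty"`
}

func main() {
	out, _ := json.Marshal(Result{
		MatchesFound: 2,
		Files: []FileHits{{
			Path: "internal/cli/form.go",
			Matches: []Match{
				{Line: 42, Content: "m.formFocus = 0"},
				{Line: 87, Content: "return m.formInputs[i]"},
			},
		}},
	})
	fmt.Println(string(out))
}
```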

Minimal by default, configurable when needed. Context lines default to zero. If you need surrounding lines to understand the match, add --context 1 or --context 2. Most verification tasks don't need it. "Is this field still referenced anywhere?" doesn't require knowing what function it's in.
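
In Go's standard flag package, "minimal by default" is literally just a zero default - a sketch of the idea, not checkfor's actual flag handling:

```go
package main

import (
	"flag"
	"fmt"
)

func main() {
	// Zero context lines unless the caller explicitly asks for them.
	context := flag.Int("context", 0, "surrounding lines to include with each match")
	flag.Parse()
	fmt.Printf("including %d context line(s) per match\n", *context)
}
```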

Built for repetition. The tool is designed to be called many times in one session without token bloat. Same query at different stages of a refactor to track progress. 32 matches, then 17, then 9, then zero. Each call costs roughly the same minimal token count.

Native integration. checkfor runs as an MCP (Model Context Protocol) server, which is how tools register themselves with Claude Code. Not a wrapper script, not a hack. Claude Code sees it as a first-class tool with the same status as Read or Grep. Configuration goes in .mcp.json, and it's available immediately.
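
For project-scoped servers, Claude Code reads .mcp.json from the repository root. A minimal entry looks something like the sketch below; the command value is a placeholder for however the checkfor binary is actually invoked, not its documented setup.

```json
{
  "mcpServers": {
    "checkfor": {
      "command": "/usr/local/bin/checkfor",
      "args": [],
      "env": {}
    }
  }
}
```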

Exact counts matter. The JSON output includes matches_found as a top-level field. AI can report "17 references remaining" with confidence, not "approximately 17" or "many references." For tracking refactor progress, exact numbers make the difference between knowing you're done and guessing.

For AI Only

These tools are built for AI to use, not humans. If you run checkfor manually, you'll get raw JSON that's annoying to read. That's intentional. grep is still the right tool when you're searching files yourself. Token-optimized tools exist to make AI collaboration efficient, not to replace your existing workflow.

The Broader Pattern

This pattern applies way beyond search tools.

Every traditional CLI tool outputs information formatted for humans. ls with colorized files and column layouts. find with verbose paths and special characters. git log with formatted commit messages and author info. Test runners with pretty output, progress bars, summary tables.

All of that makes sense when you're the one reading it. But when AI uses these tools, it pays tokens for formatting that serves no purpose. The data is there; it's just wrapped in a presentation layer designed for terminal eyeballs.

There's an entire category of "AI-native tools" that doesn't exist yet. Not replacements for the originals - those work fine for what they do. Complementary tools built around a different optimization target. Where grep optimizes for human readability, checkfor optimizes for token efficiency. Same core function, different constraints.

The optimization target changed, so the tool design has to change with it. Token budgets work like memory constraints in embedded systems. You don't just use less memory, you design around the limitation from the start. That's what AI collaboration tooling needs to do.

The Meta Insight

We're in the early days of AI collaboration tooling. Most of the tools Claude uses were designed decades ago for humans working in terminals. grep dates back to the early 1970s; find is nearly as old. They've been refined over 50 years to be excellent at what they do.

But the constraints are fundamentally different now. When you're designing for token budgets instead of screen real estate, you're solving a different problem. It's not about making grep faster or adding features. It's about recognizing that the output format itself is the bottleneck.

checkfor isn't grep with less output. It's a tool built from scratch with token efficiency as the primary constraint.

This is a new design problem, not an optimization problem.