90% Token Savings: Building a Local Alternative to Claude's WebFetch
I built webfetch-clean, a Go-based tool that fetches web pages, strips out ads and clutter, and returns clean markdown or HTML. It works as both a CLI tool and an MCP server (MCP is a protocol that lets tools integrate directly with Claude Code). The result? 65-96% token cost savings compared to Claude's built-in WebFetch tool.
What This Demonstrates
This project demonstrates Go proficiency (HTTP clients, HTML parsing with goquery), API design (MCP protocol integration with JSON-RPC 2.0, plus a dual-mode CLI/server architecture), cost optimization (roughly 3x to 25x token reduction depending on page complexity), and understanding of LLM economics (where token costs come from, how to minimize them).
The Problem: Paying for Raw HTML
I had just finished building checkfor, an MCP tool for token-efficient file searching. That project taught me to think about token costs differently. Every grep command that returned 35,000 tokens was money. Every verification query added up.
So I asked Claude a question: "What are your most expensive tools?" WebFetch was on that list.
We dug into where the cost was coming from. Turns out, when you use Claude's built-in WebFetch, you pay for the entire raw HTML as input tokens. Fetch a documentation page? That's the full HTML—scripts, styles, ads, navigation menus, tracking pixels—sent to the API before Claude processes it.
A 100KB documentation page becomes roughly 25,000 input tokens (at the usual estimate of about four characters per token). Claude reads it, summarizes it, and gives you an answer. But you've already paid for all 25,000 tokens of raw HTML.
I asked Claude: "Could we build a tool that cleans the HTML locally before it hits the API?"
Claude said yes. We started working on it.
Building It: From Idea to Prototype in an Hour
The design question was simple: where should the HTML processing happen?
Claude's WebFetch does it on the API side. Fetch the page, send everything as input tokens, then process. This might be a liability decision—Anthropic probably doesn't want to be accused of censoring what the API sees by pre-cleaning HTML.
But it's expensive. Really expensive if you're fetching documentation regularly.
My idea: do the cleaning locally. Fetch the HTML with a standard HTTP client, strip out the garbage, convert to markdown, then return just the cleaned output.
Zero API tokens for the processing.
I knew what the core targets were: head, styles, scripts, navigation elements, inline attributes. Claude helped me think through the less obvious candidates—footer, aside, sidebar, popup, modal, cookie banners, social media widgets. All the structural clutter that modern websites use.
I came up with a multi-pass cleaning strategy, inspired by the way compilers make several passes, each with its own objective. HTML is messy enough that automated cleanup needs surgical precision.
Try to do everything in one pass and you'll miss edge cases or accidentally remove content you meant to keep. Multiple passes, each with a specific target, are safer.
First pass removes obvious noise: head, script, style, and nav elements. Second pass targets ads—elements with "ad" or "advertisement" in class or id attributes. Third pass strips tracking iframes. Fourth pass removes structural clutter like footer, aside, sidebar, popup, modal, and cookie banners. Fifth pass strips inline attributes, keeping only href, src, alt, and title. Optional sixth pass removes images if you want text-only output.
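Here's a minimal sketch of what those passes could look like with goquery (the parsing library named in the next paragraph). The `Clean` function name and the exact selectors are illustrative, not the actual webfetch-clean source:

```go
package cleaner

import (
	"strings"

	"github.com/PuerkitoBio/goquery"
)

// Clean runs the passes in order over a parsed document and returns
// the cleaned body HTML.
func Clean(rawHTML string, stripImages bool) (string, error) {
	doc, err := goquery.NewDocumentFromReader(strings.NewReader(rawHTML))
	if err != nil {
		return "", err
	}

	// Pass 1: obvious noise.
	doc.Find("head, script, style, nav").Remove()

	// Pass 2: ad containers by class/id (the token-matching sketch further
	// down shows how short indicators like "ad" are handled safely).
	doc.Find(`[class*="advertisement"], [id*="advertisement"]`).Remove()

	// Pass 3: tracking iframes.
	doc.Find("iframe").Remove()

	// Pass 4: structural clutter.
	doc.Find("footer, aside, .sidebar, .popup, .modal, .cookie-banner").Remove()

	// Pass 5: strip attributes, keeping only href, src, alt, and title.
	keep := map[string]bool{"href": true, "src": true, "alt": true, "title": true}
	doc.Find("*").Each(func(_ int, s *goquery.Selection) {
		for _, node := range s.Nodes {
			kept := node.Attr[:0]
			for _, attr := range node.Attr {
				if keep[attr.Key] {
					kept = append(kept, attr)
				}
			}
			node.Attr = kept
		}
	})

	// Pass 6 (optional): text-only output.
	if stripImages {
		doc.Find("img, picture, svg").Remove()
	}

	return doc.Find("body").Html()
}
```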
Claude found the right packages. goquery for HTML parsing (jQuery-like selectors in Go), html-to-markdown for conversion. Both mature libraries with good documentation.
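Wiring the pieces together is short. This sketch assumes the v1 API of JohannesKaufmann/html-to-markdown (NewConverter / ConvertString) and reuses the hypothetical Clean function from the sketch above:

```go
package cleaner

import (
	"fmt"
	"io"
	"net/http"

	md "github.com/JohannesKaufmann/html-to-markdown"
)

// FetchMarkdown downloads a page, cleans it locally, and converts the
// result to markdown. No API tokens are spent on any of this.
func FetchMarkdown(url string) (string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("fetch %s: %s", url, resp.Status)
	}

	raw, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}

	// Multi-pass cleanup from the sketch above.
	cleaned, err := Clean(string(raw), false)
	if err != nil {
		return "", err
	}

	// Convert the cleaned HTML to markdown.
	converter := md.NewConverter("", true, nil)
	return converter.ConvertString(cleaned)
}
```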
We built it in stages. Started with a simple HTTP fetcher, then the multi-pass cleaner, then the converter. My first version of the ad detection was too aggressive—it removed elements with "read" and "thread" in the class name because they contained "ad". Had to switch to pattern matching that checks whole class tokens for genuine ad indicators (a standalone "ad", a prefix like "ad-", or "advertisement") instead of bare substring matching.
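Roughly what that fix looks like; `isAdClass` and the exact token list are hypothetical stand-ins for the real code:

```go
package cleaner

import (
	"regexp"
	"strings"
)

// adToken matches a whole class/id token that is an ad indicator:
// exactly "ad", "ads", "advert", or "advertisement", or a token that
// starts with "ad-" or ends with "-ad".
var adToken = regexp.MustCompile(`^(ad|ads|advert|advertisement)$|^ad-|-ad$`)

// isAdClass reports whether a class or id attribute looks ad-related,
// without flagging words that merely contain "ad", like "read" or "thread".
func isAdClass(attr string) bool {
	for _, token := range strings.Fields(strings.ToLower(attr)) {
		if adToken.MatchString(token) {
			return true
		}
	}
	return false
}
```

Used during the ad pass, this keeps a .thread-list element alive while still catching .ad-banner or .sidebar-ad.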
Tested it on real URLs. Hacker News, Go documentation, random blog posts. Cleaned output looked good.
Token counts were way down.
Whole thing took about an hour from "could we build this?" to working prototype.
The Results: Real Token Savings
I tested webfetch-clean against Claude's WebFetch on four different types of pages. Here's what the token counts looked like:
| Page | WebFetch Tokens | webfetch-clean Tokens | Reduction |
|---|---|---|---|
| example.com (10KB) | ~2,500 | ~166 | 93% |
| Go Effective Go (100KB) | ~25,000 | ~1,013 | 96% |
| Hacker News (30KB) | ~7,500 | ~2,618 | 65% |
| Go blog post (20KB) | ~5,000 | ~1,124 | 77% |
The Go documentation page was the most dramatic. WebFetch sent 25,000 tokens of raw HTML to the API. webfetch-clean returned 1,013 tokens of cleaned markdown. That's a 96% reduction.
Hacker News was interesting because it's mostly text already—minimal styling, no heavy JavaScript frameworks, simple layout. Still saved 65% by removing the navigation, footer, and voting UI elements.
The pattern held across different page types. Simple pages saved 90%+. Complex pages with lots of structural elements saved 95%+. Even minimalist sites like Hacker News saved 65%.
Every test returned complete, accurate content. We ran WebFetch and webfetch-clean side by side on the same URLs. WebFetch gave AI summaries of the pages. webfetch-clean returned the full cleaned content. webfetch-clean was consistently more detailed because it preserved everything—just without the clutter. No summarization, no risk of missing important details.
This is still a prototype. There are improvements to make and edge cases to find through more extensive QA. But the core concept works, and the token savings are real.
Dual-Mode Architecture
webfetch-clean is designed as an MCP server first. It's a tool for AI, not humans. The whole point is to integrate with Claude Code and provide token-efficient web fetching during AI workflows.
But you can't develop an MCP server without a way to test it quickly. Sending JSON-RPC requests through stdin for every test iteration is slow. So I added a --cli flag that switches between development mode (CLI) and production mode (MCP server). Run it with --cli --url https://example.com for fast iteration and easy debugging. Run it without flags and it speaks JSON-RPC 2.0 over stdin/stdout for Claude Code integration.
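A stripped-down sketch of that dual-mode entry point. The flag names match the ones above, but the JSON-RPC dispatch is only stubbed, and fetchAndClean is a placeholder for the real fetch/clean pipeline:

```go
package main

import (
	"bufio"
	"encoding/json"
	"flag"
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	cliMode := flag.Bool("cli", false, "run once against --url and print the cleaned output")
	url := flag.String("url", "", "page to fetch in CLI mode")
	flag.Parse()

	if *cliMode {
		// Development mode: fetch, clean, print, exit.
		out, err := fetchAndClean(*url)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Println(out)
		return
	}

	// Production mode: JSON-RPC 2.0 over stdin/stdout for Claude Code.
	scanner := bufio.NewScanner(os.Stdin)
	encoder := json.NewEncoder(os.Stdout)
	for scanner.Scan() {
		var req struct {
			ID     json.RawMessage `json:"id"`
			Method string          `json:"method"`
			Params json.RawMessage `json:"params"`
		}
		if err := json.Unmarshal(scanner.Bytes(), &req); err != nil {
			continue // skip malformed lines
		}
		// Real dispatch (initialize, tools/list, tools/call) goes here;
		// this stub just acknowledges every request.
		encoder.Encode(map[string]any{
			"jsonrpc": "2.0",
			"id":      req.ID,
			"result":  map[string]any{},
		})
	}
}

// fetchAndClean stands in for the real fetch, clean, and convert pipeline.
func fetchAndClean(url string) (string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}
```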
This architecture kept development fast while ensuring the production MCP server is the real deliverable.
Cost Analysis: The Business Case
Token savings sound good on paper. But what does this actually mean in dollars?
Consider a development team with 100 engineers. Each developer does 1-10 documentation lookups per day during normal work—checking API references and reading framework docs as they work through problems.
At 1 lookup per developer per day (100 pages/day):
Using Claude WebFetch:
- 100 pages × 25,000 tokens = 2.5 million tokens/day
- Monthly: ~75 million tokens
- Cost at $6/million input tokens: $450/month
Using webfetch-clean:
- 100 pages × 1,000 tokens = 100,000 tokens/day
- Monthly: ~3 million tokens
- Cost at $6/million input tokens: $18/month
Savings: $432/month, or $5,184 annually.
I didn't expect the dollar amounts to be this dramatic when I started building this.
The savings scale linearly. Double your team size or double the lookups per developer, and you double the savings. A 200-person team doing 5 lookups per developer per day (1,000 pages/day) saves about $4,320/month, or roughly $51,800 a year. The percentage reduction stays constant at 96%, but the dollar impact grows with usage.
There's also a non-linear benefit: local processing means no API rate limits. WebFetch at high volume might hit throttling. webfetch-clean doesn't care—it's just making HTTP requests.
You can imagine the savings for a team at the scale of a FAANG company, though I'm sure organizations that size are already running something like this in-house.
For teams actively using AI-assisted development, this isn't theoretical.
What I Learned
The potential savings from this type of tooling are insane.
I knew token costs added up, but I didn't expect the numbers to be this dramatic.
Building tools for AI collaboration is a different game than traditional performance optimization. Squeezing another picosecond from grep's algorithms takes deep expertise and brutal attention to detail. Building webfetch-clean took an hour because the opportunity was obvious once you looked at where the costs were.
The world of AI tooling is still young enough that massive improvements are just sitting there waiting to be built.
I love working collaboratively with AI. Not because it writes code for me, but because the back-and-forth—problem discovery, planning, implementation, testing—actually works. I bring the architectural thinking and domain knowledge. Claude brings speed through deep understanding of the language and best practices.
Check Out the Project
Repository: github.com/hegner123/webfetch-clean
Related work: checkfor - Token-efficient file searching for AI collaboration
This is still a prototype. There are edge cases to find and improvements to make. If you're interested in testing it on different types of pages or have ideas for better cleaning strategies, contributions are welcome.
I practice AI-augmented development, building real solutions to real problems. Workflows, orchestrators, content generation systems, AI co-working tools, and developer tooling are what I enjoy building. Currently exploring AI engineering roles where I can identify problems AI can solve and build the systems to solve them. If you found this interesting, let's connect on LinkedIn or reach out at hegner123@gmail.com.