How do I reduce LLM context costs when scraping the web?

Question

Accepted Answer

Reduce LLM context costs when scraping the web with five concrete techniques:

1. **Convert HTML to clean markdown before sending it to the model.** This typically saves 70-85% of tokens.
2. **Cache fetched URLs locally** for the agent session, so the model never pays to read the same page twice.
3. **Truncate or summarize long pages** to a target length (e.g., 4,000 tokens), using a cheap model to produce the summary before the main model reads the page.
4. **Extract structured JSON when possible.** A 200-token JSON object can replace 5,000 tokens of prose.
5. **Batch small fetches** into one tool call when the model needs to read several short URLs.

Concrete math: a research agent reading 20 pages averaging 30KB of raw HTML costs ~$0.40 at Sonnet 4 pricing. The same task with markdown conversion, caching, and extraction drops to ~$0.05, an 8x reduction with no quality loss.

AgentFetch handles (1), (3), and (4) automatically on the server side and exposes cache headers you can use for (2).

For very large pages (Wikipedia, long-form journalism), summarize-then-read is critical: sending a 50,000-token Wikipedia article to Opus 4 costs $0.75 per read at $15 per million input tokens.
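Technique (1) can be sketched with only the Python standard library. This is a minimal illustration, not a production converter: `MarkdownExtractor` and `html_to_markdown` are hypothetical names, the set of skipped tags is an assumption about what is usually page chrome, and the 4-characters-per-token estimate is a rough heuristic rather than a real tokenizer. Dedicated libraries such as markdownify or html2text handle far more cases (tables, nested lists, code blocks).

```python
import re
from html.parser import HTMLParser

# Assumption: these elements rarely hold content the model needs, so their text is dropped.
SKIP_TAGS = {"head", "script", "style", "nav", "footer", "header", "noscript", "svg", "form"}
HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}
BLOCK_TAGS = {"p", "div", "section", "article", "br", "ul", "ol", "table", "tr"}


class MarkdownExtractor(HTMLParser):
    """Minimal HTML -> markdown: headings, paragraphs, links, and list items only."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a SKIP_TAGS element
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth:
            return
        elif tag in HEADINGS:
            self.parts.append("\n\n" + "#" * HEADINGS[tag] + " ")
        elif tag == "li":
            self.parts.append("\n- ")
        elif tag in BLOCK_TAGS:
            self.parts.append("\n\n")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth:
            return
        elif tag == "a":
            self.parts.append(f"]({self.href})" if self.href else "]")
            self.href = None
        elif tag in HEADINGS or tag in BLOCK_TAGS:
            self.parts.append("\n\n")

    def handle_data(self, data):
        if self.skip_depth:
            return
        text = re.sub(r"\s+", " ", data)  # collapse HTML whitespace runs
        if self.parts and self.parts[-1][-1].isspace():
            text = text.lstrip()  # avoid double spaces after "- " or a line break
        if text:
            self.parts.append(text)

    def markdown(self):
        lines = [line.strip() for line in "".join(self.parts).splitlines()]
        return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()


def html_to_markdown(html: str) -> str:
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    return parser.markdown()


def approx_tokens(text: str) -> int:
    # Rough heuristic (~4 characters per token); use the provider's tokenizer for real counts.
    return (len(text) + 3) // 4


page = (
    "<html><head><title>Demo</title><style>body{margin:0}</style></head>"
    "<body><nav>Home | About</nav><h1>Title</h1>"
    "<p>Hello <a href='/x'>link</a> world.</p>"
    "<ul><li>One</li><li>Two</li></ul></body></html>"
)
md = html_to_markdown(page)
print(md)
print(approx_tokens(page), "->", approx_tokens(md))
```

The navigation bar, stylesheet, and `<head>` disappear entirely, which is where most of the savings on real pages come from.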
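Technique (2), a per-session cache, can be as small as a dictionary keyed on a normalized URL. A sketch, assuming an in-memory cache that lives only as long as the agent session (no expiry, no persistence) and a caller-supplied `fetch` function; `SessionCache` and `normalize_url` are hypothetical names, and this sketch does not read or interpret AgentFetch's cache headers.

```python
from typing import Callable, Dict
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Treat URLs that differ only in scheme/host case or #fragment as the same page."""
    parts = urlsplit(url)
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
    )


class SessionCache:
    """In-memory, per-session URL cache: each page is fetched at most once."""

    def __init__(self, fetch: Callable[[str], str]):
        self._fetch = fetch  # injected so the real fetcher (or a fake) can be swapped in
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> str:
        key = normalize_url(url)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        body = self._fetch(url)
        self._store[key] = body
        return body


calls = []


def fake_fetch(url: str) -> str:
    calls.append(url)
    return f"# Page at {url}"


cache = SessionCache(fake_fetch)
cache.get("https://Example.com/docs#intro")
cache.get("https://example.com/docs")  # same page: served from the cache
print(cache.hits, cache.misses, len(calls))  # 1 1 1
```

In practice `fetch` would wrap the HTTP request plus the markdown conversion, so the cache stores the already-cleaned text.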
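Technique (3) boils down to enforcing a token budget before the main model sees a page. A sketch assuming the rough 4-characters-per-token heuristic: `summarize` is a hypothetical hook where you would call a cheaper model, and when it is absent (or its output still runs long) the page is hard-truncated at the last paragraph break that fits. `fit_to_budget`, `truncate_to_budget`, and the `[truncated]` marker are illustrative names, not any library's API.

```python
from typing import Callable, Optional

CHARS_PER_TOKEN = 4  # rough heuristic; real tokenizers vary by model and language
MARKER = "\n\n[truncated]"


def approx_tokens(text: str) -> int:
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def truncate_to_budget(text: str, max_tokens: int) -> str:
    """Hard-cut text to about max_tokens, preferring a paragraph boundary."""
    if approx_tokens(text) <= max_tokens:
        return text
    limit = max(0, max_tokens * CHARS_PER_TOKEN - len(MARKER))  # leave room for the marker
    cut = text.rfind("\n\n", 0, limit)
    if cut <= 0:
        cut = limit  # no paragraph break in range: cut mid-paragraph
    return text[:cut].rstrip() + MARKER


def fit_to_budget(
    text: str,
    max_tokens: int = 4000,
    summarize: Optional[Callable[[str, int], str]] = None,
) -> str:
    """Summarize with a cheap model if a hook is given, then enforce the budget."""
    if approx_tokens(text) <= max_tokens:
        return text
    if summarize is not None:
        text = summarize(text, max_tokens)
    return truncate_to_budget(text, max_tokens)


long_page = "A paragraph of scraped text.\n\n" * 2000  # ~60,000 characters
short = fit_to_budget(long_page, max_tokens=4000)
print(approx_tokens(long_page), "->", approx_tokens(short))
```

Summarizing first and truncating second means a misbehaving summarizer can never blow the budget.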
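For technique (4), a model (ideally a cheap one) is asked to return only the fields you need as JSON, and your code validates that reply before it enters the main model's context. The sketch covers only the validation and compact re-serialization step; the product schema, `parse_extraction`, and `to_context` are hypothetical, and the model call that produces `reply` is assumed rather than shown.

```python
import json
from typing import Any, Dict

# Hypothetical schema for a product page; the fields are illustrative only.
SCHEMA = {"title": str, "price": float, "currency": str, "in_stock": bool}


def parse_extraction(raw: str) -> Dict[str, Any]:
    """Validate a model's JSON reply against SCHEMA, dropping any extra keys."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    result: Dict[str, Any] = {}
    for key, expected in SCHEMA.items():
        if key not in data:
            raise ValueError(f"missing field: {key}")
        value = data[key]
        # JSON does not distinguish 19 from 19.0, so accept an int where a float is expected.
        if expected is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        if not isinstance(value, expected):
            raise ValueError(f"field {key!r} should be {expected.__name__}")
        result[key] = value
    return result


def to_context(obj: Dict[str, Any]) -> str:
    """Compact serialization: no whitespace after separators, so fewer characters to send."""
    return json.dumps(obj, separators=(",", ":"))


reply = '{"title": "Widget", "price": 19, "currency": "USD", "in_stock": true, "ad": "Buy now!"}'
record = parse_extraction(reply)
print(to_context(record))  # {"title":"Widget","price":19.0,"currency":"USD","in_stock":true}
```

Dropping unexpected keys (here the `ad` field) keeps stray page content from leaking back into the context.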
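Technique (5) replaces several tool-call round trips with one: the model asks for a list of URLs and receives a single combined result. A sketch using a thread pool, assuming a caller-supplied `fetch` function; `batch_fetch`, its `## url` section format, and the error placeholder are hypothetical choices.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def batch_fetch(urls: Iterable[str], fetch: Callable[[str], str], max_workers: int = 8) -> str:
    """Fetch several short URLs concurrently and return ONE combined tool result."""

    def safe_fetch(url: str) -> str:
        try:
            return fetch(url)
        except Exception as exc:  # one bad URL should not sink the whole batch
            return f"[error fetching page: {exc}]"

    unique = list(dict.fromkeys(urls))  # drop duplicates, keep the requested order
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        bodies = list(pool.map(safe_fetch, unique))  # map preserves input order
    return "\n\n---\n\n".join(f"## {url}\n\n{body}" for url, body in zip(unique, bodies))


def fake_fetch(url: str) -> str:
    if url.endswith("/missing"):
        raise RuntimeError("404")
    return f"Short page at {url}."


out = batch_fetch(
    ["https://a.example/1", "https://a.example/missing", "https://a.example/1"], fake_fetch
)
print(out)
```

Reporting failures inline, rather than raising, lets the model decide whether a missing page matters.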
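The cost figures can be reproduced with simple arithmetic. A sketch assuming list input prices of $3 (Sonnet 4) and $15 (Opus 4) per million input tokens, about 4 characters per token, and ignoring output tokens: under those assumptions the raw-HTML run lands near $0.45, in the same range as the ~$0.40 estimate (the exact figure depends on the tokenizer), and the Opus 4 Wikipedia read comes to $0.75. The dictionary keys are hypothetical labels, not API model identifiers.

```python
# Assumed list prices in USD per million INPUT tokens; check current pricing before relying on them.
PRICE_PER_MTOK = {"sonnet-4": 3.00, "opus-4": 15.00}
CHARS_PER_TOKEN = 4  # rough heuristic for English text


def input_cost(tokens: int, model: str) -> float:
    return tokens * PRICE_PER_MTOK[model] / 1_000_000


# Research agent: 20 pages of ~30 KB raw HTML, sent as-is.
raw_tokens = 20 * 30_000 // CHARS_PER_TOKEN
print(raw_tokens, f"${input_cost(raw_tokens, 'sonnet-4'):.2f}")  # 150000 $0.45

# One 50,000-token Wikipedia article read by Opus 4.
print(f"${input_cost(50_000, 'opus-4'):.2f}")  # $0.75

# Input-token budget implied by the optimized ~$0.05 Sonnet 4 run.
print(round(0.05 / PRICE_PER_MTOK["sonnet-4"] * 1_000_000))  # 16667
```

The last line shows the target: roughly 16,700 input tokens for all 20 pages, or under 1,000 tokens per page after conversion, caching, and extraction.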