How do I reduce LLM context costs when scraping the web?
There are five concrete techniques for reducing LLM context costs when scraping the web: (1) convert HTML to clean markdown before sending it to the model, which typically saves 70-85% of tokens; (2) cache fetched URLs locally for the duration of the agent session to avoid redundant reads; (3) truncate or summarize long pages to a target length (e.g., 4,000 tokens) using a cheap model before the main model reads them; (4) extract structured JSON when possible, since a 200-token JSON object can replace 5,000 tokens of prose; (5) batch small fetches into one tool call when the model needs to read several short URLs. Concrete math: a research agent reading 20 pages averaging 30KB of raw HTML (about 600KB in total, roughly 130,000 input tokens) costs ~$0.40 at Sonnet 4 input pricing of $3 per million tokens. The same task with markdown conversion, caching, and extraction drops to ~$0.05, an 8x reduction with no quality loss. AgentFetch handles (1), (3), and (4) server-side automatically and exposes cache headers for (2). For very large pages (Wikipedia, long-form journalism), summarize-then-read is critical: sending a 50,000-token Wikipedia article to Opus 4 costs $0.75 per read at $15 per million input tokens.
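
As a rough illustration of techniques (1), (2), and (3), the sketch below uses only the Python standard library. It is not AgentFetch's implementation: plain-text extraction stands in for real markdown conversion, hard truncation stands in for cheap-model summarization, and the 4-characters-per-token ratio is an assumed approximation rather than a real tokenizer. The names `html_to_text`, `SessionCache`, and `input_cost_usd` are hypothetical, and the `fetch` callable is supplied by the caller.

```python
from html.parser import HTMLParser

# Assumption: ~4 characters per token for English text. Use a real tokenizer
# when accurate counts matter.
CHARS_PER_TOKEN = 4


class _TextExtractor(HTMLParser):
    """Collects visible text and skips elements that rarely carry content."""

    SKIP = {"script", "style", "noscript", "svg", "nav", "footer"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Technique (1), simplified: strip markup down to visible text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def estimate_tokens(text: str) -> int:
    """Rough token estimate (ceiling of characters / CHARS_PER_TOKEN)."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def truncate_to_budget(text: str, max_tokens: int) -> str:
    """Technique (3), simplified: hard-truncate to an approximate token budget."""
    return text[: max_tokens * CHARS_PER_TOKEN]


class SessionCache:
    """Technique (2): fetch and clean each URL at most once per session."""

    def __init__(self, fetch, max_tokens=4000):
        self._fetch = fetch  # caller-supplied callable: url -> raw HTML string
        self._max_tokens = max_tokens
        self._store = {}

    def read(self, url: str) -> str:
        if url not in self._store:
            raw = self._fetch(url)
            cleaned = html_to_text(raw)
            self._store[url] = truncate_to_budget(cleaned, self._max_tokens)
        return self._store[url]


def input_cost_usd(tokens: int, usd_per_million: float) -> float:
    """Input-side cost for a given token count and per-million-token price."""
    return tokens * usd_per_million / 1_000_000


if __name__ == "__main__":
    page = (
        "<html><head><style>p{color:red}</style><script>var x=1;</script>"
        "</head><body><nav>Home</nav><p>Hello <b>world</b></p></body></html>"
    )
    cache = SessionCache(fetch=lambda url: page)
    cleaned = cache.read("https://example.com/a")  # fetched and cleaned once
    cache.read("https://example.com/a")            # served from the cache
    print(estimate_tokens(page), "->", estimate_tokens(cleaned), "tokens (approx.)")
    print(input_cost_usd(50_000, 15.0))  # the 50,000-token Opus 4 read above
```

In a real agent the `fetch` callable would wrap an HTTP client, and the cleaned text, not the raw HTML, is what gets placed in the model's context. The cost helper only covers input tokens; output tokens are billed separately.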