How do I reduce token cost from web fetches?

Question

Accepted Answer

Reduce token cost from web fetches with six concrete levers, listed in order of impact:

1. **Convert HTML to clean markdown.** This saves 70-85% of tokens (use AgentFetch, Jina Reader, or Firecrawl).
2. **Cache aggressively.** Re-reads of the same URL within an agent session hit the cache, so their incremental cost drops to zero.
3. **Extract structured JSON** when a schema is known. A 200-token JSON object replaces 5,000 tokens of prose.
4. **Truncate to a token budget** (e.g., 4,000 tokens) before sending to the main model. Most articles repeat themselves, and the second half adds little.
5. **Summarize, then read** long-form content: summarize with Haiku ($0.25/M input tokens) before piping to Sonnet/Opus ($3-$15/M input tokens). A 30,000-token article costs $0.0075 to summarize with Haiku and about $0.001 to read the summary with Sonnet, versus $0.09 to read the raw article with Sonnet.
6. **Use search tools** (Brave, Tavily) to find the right page instead of crawling navigation.

Concrete math: a 50-page research task costs ~$2 raw, ~$0.30 with markdown conversion alone, ~$0.08 with markdown + caching + extraction, and ~$0.04 with all six levers.

AgentFetch handles (1), (3), and (4) server-side and offers (2) and (5) via configuration. Lever (6) requires a separate search tool, often deployed alongside AgentFetch.
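
To make the arithmetic behind lever (5) and the mechanics of levers (2) and (4) concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not AgentFetch's API: the names `cost_usd`, `summarize_then_read_cost`, `truncate_to_budget`, and `CachedFetcher` are hypothetical, the prices are the per-million input-token figures quoted above, the 333-token summary length is inferred from the ~$0.001 Sonnet figure, and the 4-characters-per-token ratio is a rough heuristic rather than a real tokenizer.

```python
from typing import Callable, Dict

# Assumed input-token prices in USD per million tokens (figures quoted in the answer).
HAIKU_PER_M = 0.25
SONNET_PER_M = 3.00


def cost_usd(tokens: int, price_per_million: float) -> float:
    """Cost of sending `tokens` input tokens at a given per-million price."""
    return tokens * price_per_million / 1_000_000


def summarize_then_read_cost(article_tokens: int, summary_tokens: int) -> float:
    """Lever 5: the cheap model reads the article, the main model reads the summary.

    Only input tokens are counted; output tokens for the summary are ignored here.
    """
    return cost_usd(article_tokens, HAIKU_PER_M) + cost_usd(summary_tokens, SONNET_PER_M)


def truncate_to_budget(text: str, max_tokens: int, chars_per_token: int = 4) -> str:
    """Lever 4: crude truncation using an assumed ~4 characters per token."""
    return text[: max_tokens * chars_per_token]


class CachedFetcher:
    """Lever 2: in-session cache keyed by URL.

    `fetch` is any caller-supplied function that returns page text for a URL.
    """

    def __init__(self, fetch: Callable[[str], str], max_tokens: int = 4000) -> None:
        self._fetch = fetch
        self._max_tokens = max_tokens
        self._cache: Dict[str, str] = {}
        self.misses = 0  # number of times the underlying fetch was actually called

    def get(self, url: str) -> str:
        if url not in self._cache:
            self.misses += 1
            self._cache[url] = truncate_to_budget(self._fetch(url), self._max_tokens)
        return self._cache[url]


if __name__ == "__main__":
    raw = cost_usd(30_000, SONNET_PER_M)
    staged = summarize_then_read_cost(30_000, 333)  # 333-token summary is an assumption
    print(f"raw: ${raw:.4f}, summarize-then-read: ${staged:.4f}")
```

With these assumed prices, reading the 30,000-token article raw comes to $0.09, while the staged path costs under one cent. In practice you would swap the character heuristic for the target model's tokenizer and add an expiry policy to the cache if sessions are long-lived.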