How do I reduce token cost from web fetches?
Six levers reduce token cost from web fetches, listed in order of impact: (1) convert HTML to clean markdown, which saves 70-85% of tokens (use AgentFetch, Jina Reader, or Firecrawl); (2) cache aggressively, so that re-reading the same URL within an agent session hits the cache and the incremental cost drops to zero; (3) extract structured JSON when a schema is known, so that a 200-token JSON object replaces 5,000 tokens of prose; (4) truncate to a token budget (e.g., 4,000 tokens) before sending to the main model, since most articles repeat themselves and the second half adds little; (5) summarize-then-read for long-form content, using Haiku ($0.25/M input tokens) before passing the summary to Sonnet/Opus ($3-$15/M input tokens): a 30,000-token article costs $0.0075 to summarize with Haiku and about $0.001 to read the summary with Sonnet (a summary of roughly 330 tokens), versus $0.09 to read the raw article with Sonnet; (6) use search tools (Brave, Tavily) to find the right page instead of crawling navigation. Concrete math: a 50-page research task costs ~$2 raw, ~$0.30 with markdown conversion alone, ~$0.08 with markdown + caching + extraction, and ~$0.04 with all six levers. AgentFetch handles (1), (3), and (4) server-side and offers (2) and (5) via configuration. Lever (6) requires a separate search tool, which is often deployed alongside AgentFetch.
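
The arithmetic behind lever (5), together with levers (2) and (4), can be sketched in a few lines of Python. This is an illustration under stated assumptions, not AgentFetch's implementation: the names `input_cost`, `summarize_then_read_cost`, `truncate_to_budget`, and `CachedFetcher` are hypothetical, the 333-token summary length is inferred from the $0.001 figure, only input tokens are counted, and the 4-characters-per-token ratio is a rough heuristic rather than a real tokenizer.

```python
from typing import Callable, Dict

# Input-token prices in USD per million tokens, as quoted in the answer above.
HAIKU_PER_M = 0.25
SONNET_PER_M = 3.00


def input_cost(tokens: int, usd_per_million: float) -> float:
    """Cost in USD of sending `tokens` input tokens to a model."""
    return tokens * usd_per_million / 1_000_000


def summarize_then_read_cost(article_tokens: int, summary_tokens: int) -> float:
    """Lever 5: the cheap model reads the article, the main model reads the summary.

    Output-token cost of producing the summary is ignored (assumption).
    """
    return input_cost(article_tokens, HAIKU_PER_M) + input_cost(summary_tokens, SONNET_PER_M)


def truncate_to_budget(text: str, max_tokens: int, chars_per_token: int = 4) -> str:
    """Lever 4: crude truncation, assuming roughly 4 characters per token."""
    return text[: max_tokens * chars_per_token]


class CachedFetcher:
    """Lever 2: in-session cache keyed by URL, wrapping any fetch function."""

    def __init__(self, fetch: Callable[[str], str], max_tokens: int = 4000) -> None:
        self._fetch = fetch            # hypothetical fetcher: URL -> markdown text
        self._max_tokens = max_tokens
        self._cache: Dict[str, str] = {}
        self.misses = 0                # number of real fetches performed

    def get(self, url: str) -> str:
        # Only the first read of a URL triggers a fetch; repeats are served from memory.
        if url not in self._cache:
            self.misses += 1
            self._cache[url] = truncate_to_budget(self._fetch(url), self._max_tokens)
        return self._cache[url]


if __name__ == "__main__":
    raw = input_cost(30_000, SONNET_PER_M)
    cheap = summarize_then_read_cost(30_000, 333)
    print(f"raw: ${raw:.4f}, summarize-then-read: ${cheap:.4f}")
```

In practice the `fetch` argument would wrap whichever markdown-conversion service is in use, and a real tokenizer would replace the character-count heuristic when the budget has to be exact.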