AgentFetch

Why does plain `fetch()` fail for AI agents?

Plain `fetch()` (or `requests.get()`) fails for AI agents for five concrete reasons: (1) raw HTML wastes 70-85% of LLM context tokens on navigation, footers, and scripts; (2) without retry logic, transient 429/503 errors cause silent task failure; (3) JavaScript-heavy sites return empty shells because the content loads only after rendering; (4) anti-bot pages (Cloudflare, PerimeterX) return interstitials that look like content to the model; (5) encoding bugs (windows-1252, mis-declared UTF-8, BOM markers) produce garbled output that the model hallucinates around.

A production agent that fetches 50 URLs with raw `fetch()` will typically get useful content from 60-80% of them, versus 95%+ with a proper fetch layer. The fix is to delegate to a dedicated tool: AgentFetch, Firecrawl, Jina Reader, or a hand-rolled wrapper around readability-lxml + tenacity + chardet. Token cost matters too: even when raw `fetch()` "works", you pay 5-10x more in model tokens than necessary. For agent workloads, the fetch layer is not optional infrastructure: it is the difference between a $0.10 task and a $1.00 task, and between roughly 90% reliability and roughly 60%.
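
To make the hand-rolled option concrete, here is a minimal sketch of such a wrapper. It is an illustration under stated assumptions, not how AgentFetch or any of the named tools work. To stay dependency-free it uses only the Python standard library in place of readability-lxml, tenacity, and chardet, so the extraction is much cruder than a real readability pass. The names `fetch_clean`, `decode_body`, `extract_text`, and `looks_like_interstitial`, the marker phrases, and the `get(url)` callback (which returns a status code, raw bytes, and the declared charset) are all hypothetical. The sketch does not render JavaScript; it only refuses to hand an empty shell or an interstitial to the model.

```python
import codecs
import time
from html.parser import HTMLParser
from typing import Callable, Optional, Tuple

RETRYABLE = {429, 503}  # the transient statuses named above
SKIP_TAGS = {"script", "style", "nav", "footer", "header", "noscript"}
# Assumed marker phrases; real anti-bot pages vary and need broader checks.
INTERSTITIAL_MARKERS = ("just a moment", "checking your browser",
                        "verify you are human")


def decode_body(raw: bytes, declared: Optional[str] = None) -> str:
    """Decode bytes: strip a UTF-8 BOM, then try declared, UTF-8, windows-1252."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):].decode("utf-8", errors="replace")
    for enc in (declared, "utf-8", "windows-1252"):
        if not enc:
            continue
        try:
            return raw.decode(enc)
        except (UnicodeDecodeError, LookupError):
            continue  # wrong or unknown charset: try the next candidate
    return raw.decode("utf-8", errors="replace")


class _TextExtractor(HTMLParser):
    """Collects visible text, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self) -> None:
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "\n".join(parser.parts)


def looks_like_interstitial(text: str) -> bool:
    head = text[:500].lower()
    return any(marker in head for marker in INTERSTITIAL_MARKERS)


def fetch_clean(
    url: str,
    get: Callable[[str], Tuple[int, bytes, Optional[str]]],
    retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Fetch via the caller-supplied `get`, retrying 429/503 with backoff."""
    for attempt in range(retries + 1):
        status, raw, declared = get(url)
        if status in RETRYABLE and attempt < retries:
            sleep(base_delay * 2 ** attempt)  # exponential backoff
            continue
        if status != 200:
            raise RuntimeError(f"fetch failed with HTTP {status}")
        text = extract_text(decode_body(raw, declared))
        if not text or looks_like_interstitial(text):
            # Fail loudly instead of passing a shell or challenge page on.
            raise RuntimeError("no usable content (empty shell or anti-bot page)")
        return text
    raise RuntimeError("unreachable: loop always returns or raises")
```

Passing `get` and `sleep` in as parameters keeps the retry, decoding, and filtering logic testable without network access. In practice `get` would wrap whatever HTTP client the agent already uses. The key design choice is that every failure mode raises an explicit error, so the agent sees a failed fetch rather than garbage that it might treat as content.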