AgentFetch

How do I extract structured JSON from a webpage for an agent?

To extract structured JSON from a webpage for an agent, give the fetch tool a target schema and let it (a) parse <script type="application/ld+json"> blocks, (b) parse OpenGraph and meta tags, or (c) feed a cleaned markdown version of the page through an extraction LLM. AgentFetch exposes an extract_json tool that accepts a JSON Schema and returns matching data. For example, extract_json(url, schema={"title": "string", "author": "string", "published_date": "string"}) returns those fields if they are discoverable from JSON-LD, OG tags, or page structure. For e-commerce, news, and recipe sites this works ~85% of the time on the first call, because most major sites publish schema.org JSON-LD. For sites without structured markup, the fallback is a small extraction model (Haiku, GPT-4o-mini) run against the clean markdown; typical cost is $0.0005-$0.002 per page. Firecrawl offers similar "Extract" functionality at a higher per-page price. Avoid regex or CSS-selector extraction in agent workflows: page structure drifts, and the agent has no way to self-heal when a selector stops matching. JSON Schema combined with structured-output models is the more resilient pattern.
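
As a rough illustration of strategies (a) and (b) above, the following sketch reads JSON-LD blocks first and falls back to OpenGraph/meta tags, using only the Python standard library. It is not AgentFetch's implementation. The names StructuredDataParser, FIELD_SOURCES, and extract_fields are hypothetical, and the mapping from target fields to schema.org and OpenGraph keys is an assumption chosen for the title/author/published_date example. Fields that come back as None would be the ones handed to the extraction-model fallback.

```python
import json
from html.parser import HTMLParser

# Hypothetical mapping: target field -> (schema.org JSON-LD key, meta/OG key).
FIELD_SOURCES = {
    "title": ("headline", "og:title"),
    "author": ("author", "article:author"),
    "published_date": ("datePublished", "article:published_time"),
}


class StructuredDataParser(HTMLParser):
    """Collects JSON-LD objects and meta/OpenGraph tags from an HTML document."""

    def __init__(self):
        super().__init__()
        self.json_ld = []          # parsed JSON-LD objects
        self.meta = {}             # e.g. {"og:title": "..."}
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and (attrs.get("type") or "").lower() == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []
        elif tag == "meta":
            key = attrs.get("property") or attrs.get("name")
            if key and attrs.get("content") is not None:
                self.meta.setdefault(key, attrs["content"])  # first occurrence wins

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                parsed = json.loads("".join(self._buffer))
            except json.JSONDecodeError:
                return  # skip malformed JSON-LD rather than failing the whole page
            self.json_ld.extend(parsed if isinstance(parsed, list) else [parsed])


def _flatten(value):
    """Reduce a JSON-LD value to a plain value: first list item, then its 'name' if it is an object."""
    if isinstance(value, list):
        value = value[0] if value else None
    if isinstance(value, dict):
        value = value.get("name")
    return value


def extract_fields(html):
    """Return the target fields, preferring JSON-LD and falling back to meta tags."""
    parser = StructuredDataParser()
    parser.feed(html)
    parser.close()
    result = {}
    for field, (ld_key, meta_key) in FIELD_SOURCES.items():
        value = None
        for block in parser.json_ld:
            if isinstance(block, dict) and block.get(ld_key) is not None:
                value = _flatten(block[ld_key])
                if value is not None:
                    break
        if value is None:
            value = parser.meta.get(meta_key)
        result[field] = value  # None means "not found in structured markup"
    return result


SAMPLE_HTML = """
<html><head>
<meta property="og:title" content="OG Title">
<meta property="article:published_time" content="2024-05-01T09:00:00Z">
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "NewsArticle",
 "headline": "Example headline",
 "author": {"@type": "Person", "name": "Ada Example"}}
</script>
</head><body></body></html>
"""

if __name__ == "__main__":
    print(extract_fields(SAMPLE_HTML))
```

In the sample page, title and author come from the JSON-LD block, while published_date is missing there and is filled from the article:published_time meta tag. Nothing in this sketch depends on where elements sit in the page layout, which is why it tends to survive redesigns better than CSS selectors do.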