How do I scrape JSON-LD with MCP?

Question

Accepted Answer

Scrape JSON-LD with MCP by calling AgentFetch's `extract_json` tool. It automatically locates the `<script type="application/ld+json">` blocks in the page HTML, parses them, and returns the structured data that matches your requested schema. Example: `extract_json("https://example.com/recipe", schema={"name": "string", "recipeIngredient": "array", "totalTime": "string"})` returns the recipe's schema.org fields if they are present.

JSON-LD is the most reliable structured-data source on the modern web: ~40% of crawlable pages publish it, including most major news sites (NYT, WaPo, Reuters), recipe sites (AllRecipes, Serious Eats), e-commerce stores (Shopify-default themes), and event listings. When JSON-LD is present, extraction costs essentially no model tokens. The data is already structured, so AgentFetch returns 100-500 bytes of clean JSON instead of sending 30KB of HTML through an extraction LLM.

For pages without JSON-LD, AgentFetch falls back to OpenGraph and Twitter Cards meta tags (which cover another ~30% of pages), and then to a small extraction model running on clean markdown (which handles the long tail). This three-tier fallback is invisible to the calling agent, which simply gets the schema-shaped JSON back.

Avoid manually parsing JSON-LD client-side. Pages often ship multiple blocks (Product + BreadcrumbList + Organization), and AgentFetch's matcher picks the right one for your schema.
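To make the multiple-blocks problem concrete, here is a minimal sketch of what locating and matching JSON-LD involves, using only the Python standard library. It is not AgentFetch's implementation: the names `JsonLdCollector`, `parse_jsonld_blocks`, and `pick_best_match` are hypothetical, and the key-overlap scoring is an assumed heuristic chosen for illustration, since the actual matcher is not documented here.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.raw_blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.raw_blocks.append("".join(self._buffer))
            self._in_jsonld = False


def parse_jsonld_blocks(html):
    """Return every JSON-LD object on the page as a flat list of dicts."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    items = []
    for raw in collector.raw_blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the whole page
        # A block can be one object, a list of objects, or an @graph wrapper.
        for obj in data if isinstance(data, list) else [data]:
            if not isinstance(obj, dict):
                continue
            if isinstance(obj.get("@graph"), list):
                items.extend(o for o in obj["@graph"] if isinstance(o, dict))
            else:
                items.append(obj)
    return items


def pick_best_match(items, schema):
    """Assumed heuristic: choose the object sharing the most keys with the schema."""
    wanted = set(schema)
    best, best_score = None, 0
    for item in items:
        score = len(wanted.intersection(item))
        if score > best_score:
            best, best_score = item, score
    if best is None:
        return None
    # Return only the requested fields, in schema order.
    return {key: best[key] for key in schema if key in best}


# Hypothetical page with two blocks; only the Recipe block fits the schema well.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">{"@type": "BreadcrumbList", "name": "Recipes"}</script>
<script type="application/ld+json">
{"@type": "Recipe", "name": "Pancakes", "recipeIngredient": ["flour", "milk"], "totalTime": "PT20M"}
</script>
</head></html>
"""

schema = {"name": "string", "recipeIngredient": "array", "totalTime": "string"}
result = pick_best_match(parse_jsonld_blocks(SAMPLE_HTML), schema)
print(result)
```

Even this toy version has to deal with malformed blocks, `@graph` wrappers, and a BreadcrumbList that also has a `name` field, which is the kind of edge case that makes hand-rolled client-side parsing fragile.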