How do I feed PDFs from URLs to Claude?

Question

Accepted Answer

Feed PDFs from URLs to Claude by calling AgentFetch's `fetch_url` on the PDF URL. It auto-detects the `application/pdf` content type and extracts the text with pdfminer.six, falling back to OCR via Tesseract for scanned or image-only PDFs. It then returns clean markdown the model can read directly. Example: `fetch_url("https://arxiv.org/pdf/2401.12345.pdf")` returns the paper's text as ~10-30KB of markdown.

Without this preprocessing, Claude Desktop and Cursor can't read PDFs at URLs. Claude's native `document` content block is an API-level feature: it takes the PDF as base64 or, in current API versions, by URL. Each page is processed as an image as well as text, so it counts as vision input and costs more than plain text.

Pipeline cost: a typical 20-page PDF is ~80KB of binary, which yields ~40KB of text. At roughly 4 characters per token, that is ~10k tokens, and at $3/M input tokens for Sonnet it costs about $0.03 to read. AgentFetch caches PDF extractions for 24 hours by default, so re-fetching the same URL within that window skips the download and extraction. The model still pays for the input tokens it reads.

For arXiv specifically, prefer the HTML version when one exists (`/html/` instead of `/pdf/`). It has the same content as the PDF and produces 30-50% smaller markdown. The `/abs/` page holds only the abstract and metadata, not the full text.

For PDFs that need OCR (scanned forms, historical archives, image-based reports), AgentFetch detects the missing text layer and routes the file through Tesseract automatically. Expect higher latency (~5-10s per page) and somewhat noisier output. For form-fillable PDFs and tables, use the `extract_json` tool with a schema instead of raw text.
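The answer doesn't show how AgentFetch decides that a response is a PDF. The sketch below shows one way a fetcher might make that call, using only the `Content-Type` header and the raw response bytes. `looks_like_pdf` is a hypothetical helper, not part of AgentFetch. The `%PDF-` magic-byte fallback is an extra check the answer doesn't mention, useful when servers mislabel PDFs as `application/octet-stream`.

```python
from typing import Optional


def looks_like_pdf(content_type: Optional[str], body: bytes) -> bool:
    """Hypothetical helper: decide whether a fetched response should take the PDF path."""
    if content_type:
        # Drop parameters such as "; charset=binary" and compare case-insensitively.
        mime = content_type.split(";", 1)[0].strip().lower()
        if mime == "application/pdf":
            return True
    # Fallback: PDF files carry a "%PDF-" header (e.g. "%PDF-1.7").
    # Some readers tolerate a few leading bytes, so search the first 1024 bytes.
    return b"%PDF-" in body[:1024]


print(looks_like_pdf("application/pdf", b""))                          # True
print(looks_like_pdf("application/octet-stream", b"%PDF-1.7\n"))       # True
print(looks_like_pdf("text/html; charset=utf-8", b"<!doctype html>"))  # False
```

In this sketch the byte check wins over the header. A response labeled `text/html` that actually starts with `%PDF-` is still treated as a PDF.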
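The cost figure is back-of-the-envelope arithmetic. The sketch below spells out its two assumptions: about 4 characters per token (a common rule of thumb for English prose, not a real tokenizer count) and the $3 per million input tokens Sonnet price quoted in the answer. Check current pricing before relying on the result. `estimate_read_cost` is a hypothetical name.

```python
from __future__ import annotations


def estimate_read_cost(text_chars: int,
                       usd_per_million_tokens: float = 3.0,
                       chars_per_token: float = 4.0) -> tuple[int, float]:
    """Rough token count and input cost for feeding extracted text to a model.

    Assumes ~4 characters per token (rule of thumb, not a tokenizer count).
    """
    tokens = round(text_chars / chars_per_token)
    cost_usd = tokens * usd_per_million_tokens / 1_000_000
    return tokens, cost_usd


# ~40KB of extracted text from a typical 20-page PDF (mostly ASCII, so bytes ~ characters)
tokens, cost = estimate_read_cost(40_000)
print(f"{tokens} tokens, ${cost:.2f}")  # 10000 tokens, $0.03
```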
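The 24-hour cache is what makes repeat reads cheap. The answer doesn't describe AgentFetch's cache internals, so here is a generic time-to-live cache keyed by URL, using the 24-hour default from the answer. `TTLCache` and `get_or_compute` are hypothetical names. The injectable clock exists only so the expiry logic can be tested without waiting a day.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """In-memory cache whose entries expire ttl_seconds after they were stored."""

    def __init__(self, ttl_seconds: float = 24 * 60 * 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without waiting
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], str]) -> str:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # fresh hit: skip download and extraction
        value = compute()    # miss or expired: do the expensive work
        self._entries[key] = (now, value)
        return value


# Usage: key on the URL; compute stands in for whatever downloads and extracts the PDF.
cache = TTLCache()
text = cache.get_or_compute("https://example.com/report.pdf",
                            lambda: "extracted markdown")
```

A cache hit only saves the fetch and extraction step. The text still costs input tokens each time it is sent to the model.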
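arXiv serves its full-text HTML rendering at `https://arxiv.org/html/<id>`, while `/abs/<id>` is only the abstract page. Rewriting a PDF link to that HTML URL before fetching can be sketched as follows. `arxiv_html_url` is a hypothetical helper. It handles only new-style IDs such as `2401.12345` (with an optional version suffix) and does not check that an HTML version exists. HTML renderings are not available for every paper, so fall back to the PDF if the HTML URL returns an error.

```python
import re
from typing import Optional

# Matches new-style arXiv PDF links such as https://arxiv.org/pdf/2401.12345.pdf
# or https://arxiv.org/pdf/2401.12345v2 (optional ".pdf" suffix and trailing slash).
_ARXIV_PDF = re.compile(r"^https?://arxiv\.org/pdf/([^/?#]+?)(?:\.pdf)?/?$")


def arxiv_html_url(pdf_url: str) -> Optional[str]:
    """Return the arxiv.org/html/ URL for an arXiv PDF link, or None if it doesn't match."""
    m = _ARXIV_PDF.match(pdf_url)
    if m is None:
        return None  # not an arXiv PDF link, or an old-style ID containing "/"
    return f"https://arxiv.org/html/{m.group(1)}"


print(arxiv_html_url("https://arxiv.org/pdf/2401.12345.pdf"))
# https://arxiv.org/html/2401.12345
```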