How do I feed PDFs from URLs to Claude?
Call AgentFetch's fetch_url on the PDF URL. It detects the application/pdf content type, extracts the text with pdfminer.six (falling back to Tesseract OCR for scanned or image-only PDFs), and returns clean markdown the model can read directly. Example: fetch_url("https://arxiv.org/pdf/2401.12345.pdf") returns the paper's text as roughly 10-30KB of markdown.

Without this preprocessing, Claude Desktop and Cursor can't read PDFs at URLs: Claude's native document content block requires a base64 upload, not a URL fetch, and even then it counts as a vision input, which is more expensive.

Pipeline cost: a typical 20-page PDF is about 80KB of binary, which extracts to about 40KB of text, or roughly 10k tokens. At $3 per million Sonnet input tokens, that is $0.03 to read. AgentFetch caches PDF extractions for 24 hours by default, so re-fetches within that window are free.

For arXiv specifically, prefer the HTML version when available (/html/ instead of /pdf/). The content is the same and the markdown is 30-50% smaller.

For PDFs that need OCR (scanned forms, historical archives, image-based reports), AgentFetch detects the missing text layer and routes the file through Tesseract automatically. Expect higher latency (about 5-10s per page) and slightly noisier output.

For fillable-form PDFs and tables, use the extract_json tool with a schema instead of raw text.
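The pipeline cost figure above is simple arithmetic, and a minimal sketch makes it easy to rerun for other document sizes. The function names here are hypothetical and not part of AgentFetch. The 4-characters-per-token ratio is a rough rule of thumb for English text, not a measured value, and the $3 per million tokens is the Sonnet input price quoted in the answer.

```python
# Back-of-the-envelope cost of reading an extracted PDF.
# Assumptions (not measured): ~4 characters per token for English text,
# and $3 per million input tokens, the Sonnet price quoted in the answer.

CHARS_PER_TOKEN = 4          # rough heuristic; varies with content and tokenizer
PRICE_PER_MTOK_USD = 3.00    # input price per million tokens


def estimate_tokens(text_bytes: int, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Approximate token count for extracted text of the given size."""
    return round(text_bytes / chars_per_token)


def estimate_cost_usd(tokens: int, price_per_mtok: float = PRICE_PER_MTOK_USD) -> float:
    """Input cost in USD for the given number of tokens."""
    return tokens / 1_000_000 * price_per_mtok


if __name__ == "__main__":
    text_bytes = 40_000  # ~40KB of text extracted from a 20-page PDF
    tokens = estimate_tokens(text_bytes)
    print(f"{tokens} tokens, ${estimate_cost_usd(tokens):.2f}")  # 10000 tokens, $0.03
```

Swap in the size of your own extracted markdown to get a quick estimate. For anything billing-sensitive, count tokens with the model's actual tokenizer rather than relying on the character ratio.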