This AI-powered OCR tool extracts text from images and PDF documents into structured Markdown, handling tables, hyperlinks, and embedded images alongside plain text. Upload a file and recognition runs automatically; results are returned page by page, with per-page copy and full-document download options.
What affects recognition accuracy
Source file quality is the main factor:
- Scanned documents at 150 DPI or higher with clear, unobstructed text are recognized most accurately
- Blurry photos, heavily skewed pages, dense watermarks, or very small type (below 6pt) introduce errors
- Multi-column layouts and complex formatting are handled better than with traditional rule-based OCR
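The 150 DPI and 6pt figures above can be checked before uploading. The following is a minimal sketch, not part of the tool: the function names are hypothetical, and it assumes you know the physical width of the scanned page. It relies only on the standard definitions of DPI (pixels per inch) and the typographic point (1/72 inch).

```python
def effective_dpi(pixel_width: int, physical_width_in: float) -> float:
    """Pixels per inch across the page width."""
    if physical_width_in <= 0:
        raise ValueError("physical width must be positive")
    return pixel_width / physical_width_in


def meets_scan_guidance(pixel_width: int, physical_width_in: float,
                        min_dpi: float = 150.0) -> bool:
    """True when the scan reaches the recommended minimum resolution."""
    return effective_dpi(pixel_width, physical_width_in) >= min_dpi


def type_height_px(point_size: float, dpi: float) -> float:
    """Nominal height in pixels of type at a given point size (1 pt = 1/72 in)."""
    return point_size * dpi / 72


# A US Letter page (8.5 in wide) scanned to 1275 px wide
print(effective_dpi(1275, 8.5))        # 150.0
print(meets_scan_guidance(1275, 8.5))  # True
print(type_height_px(6, 150))          # 12.5
```

At 150 DPI, 6pt type is nominally only about 12 pixels tall, which gives a sense of why smaller type becomes error-prone.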
For PDFs, each page is processed independently. Processing time scales with page count, so keeping each submission under 50 pages is recommended.
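For longer PDFs, one way to follow this guidance is to plan the page ranges for each submission in advance. The sketch below only computes the ranges; `page_batches` is a hypothetical helper, and it treats 50 as an upper bound per batch, which is one reading of the recommendation. The actual splitting of the PDF would be done with whatever PDF tool you already use.

```python
def page_batches(total_pages: int, max_pages: int = 50) -> list[tuple[int, int]]:
    """Return 1-based inclusive (first, last) page ranges of at most max_pages each."""
    if total_pages < 0:
        raise ValueError("total_pages must not be negative")
    if max_pages < 1:
        raise ValueError("max_pages must be at least 1")
    return [
        (first, min(first + max_pages - 1, total_pages))
        for first in range(1, total_pages + 1, max_pages)
    ]


# A 120-page document split into submissions of at most 50 pages
print(page_batches(120))  # [(1, 50), (51, 100), (101, 120)]
```

Because each page is processed independently, splitting at arbitrary page boundaries should not change the per-page results.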
What each page result contains
After recognition, each page returns:
- Markdown body: headings, paragraphs, lists, code blocks
- Tables: extracted as Markdown table syntax, copyable separately
- Hyperlinks: URLs found in the document are listed individually
- Embedded images: charts and illustrations are extracted as inline base64 images when detectable
- Page dimensions and DPI: the original pixel dimensions and resolution of the source page
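If you post-process downloaded results in your own scripts, the per-page contents listed above could be modeled along these lines. This is an illustrative sketch only: the class and every field name are assumptions made for the example, not the tool's actual output schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PageResult:
    """Hypothetical container mirroring the per-page items described above."""
    page_number: int
    markdown: str                                          # headings, paragraphs, lists, code blocks
    tables: list[str] = field(default_factory=list)        # each table as Markdown table syntax
    links: list[str] = field(default_factory=list)         # URLs found on the page
    images_b64: list[str] = field(default_factory=list)    # base64-encoded embedded images
    width_px: int = 0                                      # original page width in pixels
    height_px: int = 0                                     # original page height in pixels
    dpi: Optional[float] = None                            # None when resolution is unknown


page = PageResult(
    page_number=1,
    markdown="# Invoice\n\nSee https://example.com for terms.",
    links=["https://example.com"],
    width_px=1275,
    height_px=1650,
    dpi=150.0,
)
print(page.page_number, len(page.links), page.tables)  # 1 1 []
```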
Supported file types
Image formats
- JPEG, PNG, WEBP
- GIF, BMP, TIFF
- SVG (vector graphics)
- Best for single-page scans and screenshots
Document format
- PDF (any page count; under 50 pages per submission is recommended)
- Each page recognized independently
- Results displayed per page with individual download
Downloading results
Individual pages can be downloaded as .md (Markdown) or .txt (plain text). For multi-page documents, "Download All" merges every page into a single file with --- separators between pages.
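If you download pages individually and want to combine them yourself, the same merge can be approximated as follows. This is a minimal sketch: `merge_pages` is a hypothetical helper, and the exact whitespace the tool places around each separator is an assumption.

```python
def merge_pages(pages: list[str], separator: str = "---") -> str:
    """Join per-page Markdown into one document with a separator line between pages."""
    # Blank lines surround the separator so Markdown renders it as a
    # horizontal rule rather than as a heading underline for the line above.
    joiner = f"\n\n{separator}\n\n"
    return joiner.join(page.strip() for page in pages) + "\n"


merged = merge_pages(["# Page 1", "Page 2 text"])
print(merged)
```

The blank lines matter: in Markdown, a `---` line placed directly beneath a line of text turns that text into a heading instead of producing a divider.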