OCR Document Processor — Agent Skills

When to use

Use this skill to extract text from images, scanned PDFs, or photographs. It supports over 100 languages, table detection, and structured output formats.

How it works

Reads the input file (PNG, JPEG, TIFF, BMP, or PDF)
Runs OCR with language detection
Returns the extracted text with confidence scores
Optionally structures output as markdown, JSON, or HTML
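
The last two steps above (confidence scoring and optional structuring) can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the OcrLine type, the 0.0 to 1.0 confidence range, and the mean_confidence and render helpers are all hypothetical names chosen for the example, and the recognition step itself is assumed to have already produced the lines.

```python
import html
import json
from dataclasses import dataclass


@dataclass
class OcrLine:
    """One recognized line of text (hypothetical shape, not the skill's real output)."""
    text: str
    confidence: float  # assumed to lie between 0.0 and 1.0


def mean_confidence(lines):
    """Average the per-line confidence; an empty page scores 0.0."""
    if not lines:
        return 0.0
    return sum(line.confidence for line in lines) / len(lines)


def render(lines, fmt="text"):
    """Render recognized lines as plain text, markdown, JSON, or HTML."""
    if fmt == "text":
        return "\n".join(line.text for line in lines)
    if fmt == "markdown":
        # One paragraph per recognized line.
        return "\n\n".join(line.text for line in lines)
    if fmt == "json":
        return json.dumps({
            "confidence": round(mean_confidence(lines), 3),
            "lines": [
                {"text": line.text, "confidence": line.confidence}
                for line in lines
            ],
        })
    if fmt == "html":
        # Escape recognized text so characters like "<" stay literal.
        return "\n".join(f"<p>{html.escape(line.text)}</p>" for line in lines)
    raise ValueError(f"unsupported format: {fmt}")


page = [OcrLine("Invoice #1024", 0.98), OcrLine("Total: 5 < 10", 0.82)]
print(render(page, "html"))
```

Keeping the confidence next to each line, rather than only one score per page, lets a caller decide later which lines to trust or recheck.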

Capabilities

Image OCR — extract text from PNG, JPEG, TIFF, BMP
PDF OCR — process scanned PDFs page by page
Multi-language — supports 100+ languages
Table detection — extract tabular data to CSV/JSON
Batch processing — handle multiple documents at once
Quality assessment — confidence scoring for results
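
To make the table, batch, and quality items above concrete, here is a small sketch of what a caller might do with the results. It assumes, purely for illustration, that a detected table arrives as a non-empty list of rows with the header first, and that a batch run yields one confidence value per document. The function names and the 0.80 review threshold are hypothetical, not part of the skill.

```python
import csv
import io
import json


def table_to_csv(rows):
    """Serialize a detected table (list of rows, header first) to CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def table_to_json(rows):
    """Serialize the same table as a JSON array of objects keyed by header."""
    header, *body = rows  # assumes at least a header row is present
    return json.dumps([dict(zip(header, row)) for row in body])


def needs_review(batch, threshold=0.80):
    """Return the names of documents whose confidence is below the threshold."""
    return [name for name, confidence in batch.items() if confidence < threshold]


table = [["Item", "Qty"], ["Widget, large", "2"]]
print(table_to_csv(table))
print(table_to_json(table))
print(needs_review({"scan-01.png": 0.95, "scan-02.pdf": 0.61}))
```

Going through the csv module rather than joining cells with commas matters for OCR output, since recognized cells often contain commas or quotes of their own.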

Invoke

/rvanbaalen:ocr-document-processor