Optical Character Recognition (OCR)

Optical Character Recognition, or OCR, is a process by which the text in an image is converted to machine-readable text. It is heavily used in the scanning of documents and is a core component of document capture software. There are several types of OCR, including:

Full page OCR – converts the entire scanned page to text.
Zone OCR – converts only a small portion, or zone, of a document to text.
OCR Separation – a technique for splitting a batch of scanned pages into separate documents based on the text recognized on each page.
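
As a rough illustration of OCR separation, the sketch below assumes each page has already been converted to text by an OCR engine, and that a new document begins on any page whose text starts with "Invoice No". Both the separator pattern and the function name split_documents are hypothetical, chosen only for this example; real capture software typically lets you configure the separator rule.

```python
import re
from typing import List, Pattern

# Hypothetical separator rule: a page whose text starts with
# "Invoice No" marks the first page of a new document.
SEPARATOR = re.compile(r"^\s*invoice\s+no", re.IGNORECASE)


def split_documents(page_texts: List[str],
                    separator: Pattern[str] = SEPARATOR) -> List[List[int]]:
    """Group page indices into documents.

    A page that matches the separator starts a new document. The first
    page always starts a document, even if it does not match.
    """
    documents: List[List[int]] = []
    for index, text in enumerate(page_texts):
        if separator.search(text) or not documents:
            documents.append([])
        documents[-1].append(index)
    return documents


# Example input: OCR text for three scanned pages (made-up sample data).
pages = [
    "Invoice No 1001\nAcme Corp",
    "Page 2 of invoice 1001",
    "Invoice No 1002\nGlobex",
]
groups = split_documents(pages)
print(groups)
```

Here the first two pages are grouped into one document and the third page starts a second one. Zone OCR is often combined with this approach, so that only the region where the separator text is expected has to be recognized.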

Most OCR engines provide a wide variety of output formats, including plain text, Microsoft Word, Microsoft Excel, and Adobe PDF. Searchable PDF is the most common output because it stores both the page image and the recognized text in a single container. OCR software is usually bundled and sold with document scanners, and in its simplest form can be used as a desktop application.
