Document Classification

Document classification is the process of identifying a document's type from certain of its attributes. It is a typical feature of advanced capture applications, and it drives greater automation and efficiency in any scanning and imaging operation. There are several ways to classify documents:

Optical Character Recognition (OCR) – This method searches the recognized text for a particular word or phrase and classifies the document based on the match.
Zone OCR – Zone OCR looks at one or more areas on a page to determine the type, or classification, of the document. For example, a particular form might have a form ID number in the top right corner.
Pattern Matching – Advanced classification engines usually have a pattern or pixel matching algorithm that examines the overall layout and pattern of the page to identify it.
Hybrid – Hybrid classification engines use multiple techniques to “vote” on the identification and enhance accuracy.
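
The keyword, zone, and voting ideas in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular capture product works: it assumes the page text and the zone text have already been produced by an OCR engine, and the `KEYWORDS` and `FORM_IDS` tables, the document type names, and all function names are hypothetical. Pattern matching on page layout is left out because it depends on image data.

```python
from collections import Counter
from typing import Optional

# Hypothetical keyword table: phrase -> document type.
KEYWORDS = {
    "invoice number": "invoice",
    "purchase order": "purchase_order",
    "remittance advice": "remittance",
}

# Hypothetical form IDs expected in a fixed zone (e.g. the top right corner).
FORM_IDS = {"W-9": "tax_form", "INV-01": "invoice"}


def classify_by_keyword(page_text: str) -> Optional[str]:
    """OCR method: return the type of the first known phrase found in the text."""
    lowered = page_text.lower()
    for phrase, doc_type in KEYWORDS.items():
        if phrase in lowered:
            return doc_type
    return None  # no phrase matched, so this classifier abstains


def classify_by_zone(zone_text: str) -> Optional[str]:
    """Zone OCR method: look up the text read from one fixed region of the page."""
    return FORM_IDS.get(zone_text.strip().upper())


def hybrid_vote(votes: list[Optional[str]]) -> Optional[str]:
    """Hybrid method: majority vote that ignores abstentions; a tie returns None."""
    counts = Counter(v for v in votes if v is not None)
    if not counts:
        return None
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # the top two types are tied, so leave the page unclassified
    return ranked[0][0]
```

A caller would collect one vote per technique, for example `hybrid_vote([classify_by_keyword(text), classify_by_zone(zone)])`, and route pages that come back as `None` to manual review. Returning `None` on a tie is a design choice made for this sketch; a real engine might weight the techniques by confidence instead.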

Document classification is a core element of any forms processing application. It is used not only to identify the document type but also to separate one document from the next.
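
One way classification can drive document separation is sketched below in Python. It assumes each scanned page has already been given a type, or `None` when no type was recognized, and it assumes that a recognized page is the first page of a new document while unrecognized pages belong to the document before them. Both assumptions and the `separate` function name are hypothetical; real batches may need different rules.

```python
from typing import Optional


def separate(page_types: list[Optional[str]]) -> list[dict]:
    """Group a scanned batch into documents using per-page classification.

    Assumption: a page with a recognized type starts a new document, and
    unrecognized pages (None) are appended to the current document.
    """
    documents: list[dict] = []
    for page_number, page_type in enumerate(page_types, start=1):
        if page_type is not None or not documents:
            # A recognized page, or the very first page, opens a new document.
            documents.append({"type": page_type, "pages": [page_number]})
        else:
            # An unrecognized page continues the current document.
            documents[-1]["pages"].append(page_number)
    return documents
```

For a five-page batch typed as `["invoice", None, None, "tax_form", None]`, this groups pages 1 to 3 into an invoice and pages 4 to 5 into a tax form.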
