Document Classification
Document classification is a method of identifying a document's type from certain attributes of the document itself. It is a typical feature of advanced capture applications, and it can drive significant automation and efficiency in any scanning and imaging operation. There are a number of ways to classify documents:
- Optical Character Recognition (OCR) – this method searches the recognized text of a page for a particular word or phrase, and classifies the document based on the match.
- Zone OCR – Zone OCR reads one or more fixed areas on a page to determine the type, or classification, of the document. For example, a particular form might have a form ID number in the top right corner.
- Pattern Matching – advanced classification engines usually have a pattern- or pixel-matching algorithm that examines the overall layout of the page to identify it.
- Hybrid – Hybrid classification engines use multiple techniques to “vote” on the identification and enhance accuracy.
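The keyword, zone, and hybrid approaches above can be illustrated with a minimal sketch. It assumes the OCR step has already run and that each page is available as a dictionary of recognized text. The field names (`full_text`, `top_right`), the keyword table, and the form IDs are all hypothetical and do not come from any particular capture product.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

# A page is modeled as already-recognized text (assumption: OCR ran earlier).
# "full_text" holds the whole page; "top_right" holds one zone's text.
Page = Dict[str, str]
Classifier = Callable[[Page], Optional[str]]

# Hypothetical lookup tables, for illustration only.
KEYWORDS = {"invoice": "invoice", "purchase order": "purchase_order"}
ZONE_FORM_IDS = {"INV-01": "invoice", "W-9": "tax_form"}


def classify_by_keyword(page: Page) -> Optional[str]:
    """Full-page OCR approach: look for a known word or phrase."""
    text = page.get("full_text", "").lower()
    for phrase, doc_type in KEYWORDS.items():
        if phrase in text:
            return doc_type
    return None


def classify_by_zone(page: Page) -> Optional[str]:
    """Zone OCR approach: read a form ID from one fixed area of the page."""
    form_id = page.get("top_right", "").strip()
    return ZONE_FORM_IDS.get(form_id)


def classify_hybrid(page: Page, classifiers: List[Classifier]) -> Optional[str]:
    """Hybrid approach: each classifier votes; the most common answer wins.

    Classifiers that return None abstain. Ties go to the type that
    received its first vote earliest.
    """
    results = (classifier(page) for classifier in classifiers)
    votes = Counter(result for result in results if result is not None)
    if not votes:
        return None
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    page = {"full_text": "INVOICE No. 1234", "top_right": "INV-01"}
    print(classify_hybrid(page, [classify_by_keyword, classify_by_zone]))
```

A production engine would typically weight votes by each technique's confidence rather than counting them equally. Simple majority voting is used here only to keep the sketch short.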
Document classification is a core element of any forms processing application. It is used not only to identify the document type, but also to separate documents within a scanned batch.
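As a rough sketch of how classification can support document separation, the function below groups a batch of pages into documents. It assumes a simple rule that is not stated in the text above: a page that classifies as a known type starts a new document, and unclassified pages (represented as `None`) are treated as continuation pages. The function name and the `"unknown"` label are hypothetical.

```python
from typing import List, Optional, Tuple


def separate_documents(
    page_types: List[Optional[str]],
) -> List[Tuple[str, List[int]]]:
    """Group page indexes into documents using per-page classification results.

    Assumed rule: a classified page begins a new document; a page with no
    classification (None) is appended to the document before it. Leading
    unclassified pages are collected under the label "unknown".
    """
    documents: List[Tuple[str, List[int]]] = []
    for index, doc_type in enumerate(page_types):
        if doc_type is not None or not documents:
            # Start a new document at this page.
            documents.append((doc_type or "unknown", [index]))
        else:
            # Continuation page: attach to the current document.
            documents[-1][1].append(index)
    return documents


if __name__ == "__main__":
    batch = ["invoice", None, None, "tax_form", None]
    for doc_type, pages in separate_documents(batch):
        print(doc_type, pages)
```

Under this rule, two consecutive documents of the same type are still split correctly as long as each first page is recognized. Real batches may need extra signals, such as separator sheets or page counts.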