Terminology Categories

Document Classification

Document classification is a method by which certain attributes of a page identify the type of document it belongs to.  It is a typical feature of advanced capture applications, and it drives automation and efficiency in scanning and imaging operations.  There are several ways to classify documents:

  • Optical Character Recognition (OCR) – the text of the page is recognized and searched for a particular word or phrase, and the document is classified based on the match.
  • Zone OCR – Zone OCR reads one or more defined areas on a page to determine the type, or classification, of the document.  For example, a particular form might have a form ID number in the top right corner.
  • Pattern Matching – advanced classification engines usually have a pattern- or pixel-matching algorithm that examines the overall layout of the page to identify it.
  • Hybrid – Hybrid classification engines use multiple techniques to “vote” on the identification and enhance accuracy.
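
The keyword, zone, and voting ideas above can be sketched in a few lines of Python.  This is a minimal illustration, not how any particular capture product works: it assumes the OCR text has already been extracted, and the keyword table, the form IDs, and the function names (classify_by_keyword, classify_by_zone, vote) are all hypothetical.

```python
from collections import Counter

# Hypothetical keyword table: phrase -> document type (illustrative values only).
KEYWORDS = {
    "invoice number": "invoice",
    "purchase order": "purchase_order",
}

# Hypothetical form IDs expected in one fixed zone, e.g. the top right corner.
FORM_IDS = {"W-9": "tax_form", "INV-01": "invoice"}


def classify_by_keyword(page_text):
    """Full-page OCR approach: return the type of the first phrase found, else None."""
    text = page_text.lower()
    for phrase, doc_type in KEYWORDS.items():
        if phrase in text:
            return doc_type
    return None


def classify_by_zone(zone_text):
    """Zone OCR approach: look up the form ID read from a single region."""
    return FORM_IDS.get(zone_text.strip().upper())


def vote(candidates, min_votes=2):
    """Hybrid approach: majority vote over the classifiers' answers.

    None means a classifier abstained. The result is None when no type
    reaches min_votes or when the top two types are tied.
    """
    counts = Counter(c for c in candidates if c is not None)
    if not counts:
        return None
    top = counts.most_common(2)
    winner, n = top[0]
    if n < min_votes:
        return None
    if len(top) > 1 and top[1][1] == n:
        return None
    return winner
```

For example, vote([classify_by_keyword(page_text), classify_by_zone(zone_text)]) returns a type only when both classifiers agree.  Returning None on disagreement is a design choice in this sketch so that uncertain pages can be routed to manual review rather than guessed.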

Document classification is a core element of any forms processing application.  It is used not only to identify the document type but also to support document separation.
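
One simple way classification can feed document separation is to start a new document whenever the classified type changes from one page to the next.  The sketch below assumes each page has already been assigned a type; the function name separate and the sample labels are hypothetical, and real capture applications may use other signals as well.

```python
from itertools import groupby


def separate(page_types):
    """Group consecutive pages that share a classified type into documents.

    Returns a list of (doc_type, [page indexes]) tuples in page order.
    """
    docs = []
    for doc_type, group in groupby(enumerate(page_types), key=lambda item: item[1]):
        docs.append((doc_type, [index for index, _ in group]))
    return docs


pages = ["invoice", "invoice", "tax_form", "invoice"]
print(separate(pages))
```

Note that this rule merges two back-to-back documents of the same type into one, so a batch like that would need an additional cue, such as recognizing a first page.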


