02Sep

Document Classification

Written by Ken Kramer. Posted in Scanning and Capture, Terminology

Document Classification is a method by which certain attributes identify a particular type of document. It is a typical feature of advanced capture applications, and it can greatly increase automation and efficiency in any scanning and imaging operation. There are a number of ways to classify documents:

Optical Character Recognition (OCR) – this method usually finds a particular word or phrase on the page and classifies the document based on that match.
Zone OCR – Zone OCR reads one or more specific areas of a page to determine the type, or classification, of the document. For example, a particular form might have a form ID number in the top right corner, so only that zone needs to be read.
Pattern Matching – advanced classification engines usually have a pattern- or pixel-matching algorithm that examines the overall layout and pattern of the page for identification.
Hybrid – Hybrid classification engines use multiple techniques to “vote” on the identification and enhance accuracy.
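
The OCR, Zone OCR and Hybrid methods above can be sketched in a few lines. This is a minimal illustration, not any product's engine: it assumes OCR text has already been read from the page, and the keyword table, form ID lookup and the rule that a tie sends the page to manual review are all hypothetical choices made for the example.

```python
from collections import Counter

# Hypothetical keyword rules: document type -> phrases that identify it.
KEYWORD_RULES = {
    "invoice": ["invoice number", "amount due"],
    "purchase_order": ["purchase order", "po number"],
}


def classify_by_keywords(ocr_text, rules=KEYWORD_RULES):
    """OCR classification: return the first type whose key phrase appears in the text."""
    text = ocr_text.lower()
    for doc_type, phrases in rules.items():
        if any(phrase in text for phrase in phrases):
            return doc_type
    return None


def classify_by_zone(zone_text, form_ids):
    """Zone OCR: look up the text read from one region, such as a form ID box."""
    return form_ids.get(zone_text.strip())


def hybrid_vote(votes):
    """Hybrid: majority vote over the results of several classifiers.

    Classifiers that returned None abstain; a tie means there is no consensus.
    """
    ranked = Counter(v for v in votes if v is not None).most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # no consensus: route the page to manual review
    return ranked[0][0]


page_text = "ACME Corp  Invoice Number: 1042  Amount Due: $310.00"
votes = [
    classify_by_keywords(page_text),
    classify_by_zone(" F-100 ", {"F-100": "invoice"}),
    "purchase_order",  # stand-in for a pattern-matching engine's answer
]
print(hybrid_vote(votes))  # invoice
```

Here two of the three "voters" agree, so the page is classified as an invoice even though the (simulated) pattern matcher disagreed, which is the accuracy benefit the hybrid approach is after.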

Document classification is a core element of any forms processing application. It is used not only to identify the document type but also in document separation.

Release and Migration

Written by Ken Kramer. Posted in Scanning and Capture, Terminology

Release (or migration) is the final step in any document capture and scanning operation. During the capture workflow, documents and data are collected and held temporarily in the capture software's database until the final workflow step, when they are released, or migrated, to an end storage location. The end repository can be a document management system, network folders or a line-of-business system. Here are some examples of typical release script targets:

Microsoft SharePoint
Office 365
Hyland OnBase
M-Files
ApplicationXtender
OpenText DMS
MAS90
Sage

Most document scanning suites will provide an extensive list of supported migration scripts, and allow the release of the document image, extracted data and OCR text.
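
As a rough illustration of what a release step does, here is a sketch of a release to a network folder, one of the end repositories mentioned above. The `CapturedDocument` model, the file naming and the JSON "sidecar" file for extracted data are assumptions for this example; real release scripts for systems such as SharePoint or OnBase use each vendor's own API, which is not shown here.

```python
import json
import shutil
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class CapturedDocument:
    """A document held in the capture database until release (hypothetical model)."""
    doc_id: str
    image_path: Path                                 # scanned image, e.g. TIFF or PDF
    index_data: dict = field(default_factory=dict)   # extracted data fields
    ocr_text: str = ""


def release_to_folder(docs, dest_dir):
    """Release step: write each image, its extracted data and its OCR text to a folder."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    released = []
    for doc in docs:
        # The image keeps its original extension but is renamed to the document ID.
        target = dest / f"{doc.doc_id}{doc.image_path.suffix}"
        shutil.copy2(doc.image_path, target)
        # Extracted data goes to a JSON "sidecar" file next to the image.
        (dest / f"{doc.doc_id}.json").write_text(json.dumps(doc.index_data, indent=2))
        # OCR text goes to a plain-text file so it can be indexed for search.
        (dest / f"{doc.doc_id}.txt").write_text(doc.ocr_text)
        released.append(target)
    return released
```

Releasing an invoice with ID `INV-1042` would produce `INV-1042.tif`, `INV-1042.json` and `INV-1042.txt` in the destination folder, covering the image, extracted data and OCR text in one pass.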

19Aug

Forms Processing

Written by Ken Kramer. Posted in Scanning and Capture, Terminology

Forms processing is a key feature of document scanning and capture software that provides a means to scan, split and extract data from scanned or imported forms. It usually leverages a combination of OCR, OMR and ICR technology to identify key information and then automatically extract the data from the documents. Form processing and extraction is a complex task, and it takes a powerful, full-featured capture suite to achieve correct extraction and validation. Here is a sample of forms that can be processed:

Surveys
Feedback forms
Immigration forms
HR forms
Bill of lading forms
Shipping forms
Refund forms
Prior authorization forms
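
The extraction and validation steps can be sketched once the recognition engines have turned the form into text. This is a simplified illustration, assuming OCR output is already available; the field names, regular expressions and validation rules are hypothetical, and commercial suites typically rely on form templates and zones rather than text patterns alone.

```python
import re
from datetime import datetime


def valid_date(value):
    """True if value is a real calendar date in MM/DD/YYYY form."""
    try:
        datetime.strptime(value, "%m/%d/%Y")
        return True
    except ValueError:
        return False


def valid_amount(value):
    """True if value is a positive dollar amount such as 1,250.00."""
    return float(value.replace(",", "")) > 0


# Hypothetical field definitions: name -> (pattern with one capture group, validator).
# A validator of None means the pattern alone is enough.
FIELDS = {
    "po_number": (r"PO\s*(?:Number|#)?\s*:?\s*(\d{6})\b", None),
    "ship_date": (r"Ship\s+Date\s*:?\s*(\d{2}/\d{2}/\d{4})", valid_date),
    "total": (r"Total\s*:?\s*\$?([\d,]+\.\d{2})", valid_amount),
}


def extract_form(ocr_text, fields=FIELDS):
    """Extract each field from the form's OCR text and validate it.

    Returns (data, errors); a non-empty error list would send the form to review.
    """
    data, errors = {}, []
    for name, (pattern, validator) in fields.items():
        match = re.search(pattern, ocr_text, re.IGNORECASE)
        if match is None:
            errors.append(f"{name}: not found")
            continue
        value = match.group(1)
        data[name] = value
        if validator is not None and not validator(value):
            errors.append(f"{name}: failed validation ({value!r})")
    return data, errors


form_text = "Bill of Lading  PO Number: 123456  Ship Date: 09/02/2014  Total: $1,250.00"
print(extract_form(form_text))
```

Keeping extraction and validation separate reflects the point above: pulling a value off the page is only half the job, and a date like 13/45/2014 should be caught before the data is released.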

Document Separation

Written by Ken Kramer. Posted in Scanning and Capture, Terminology

Document separation is the means by which document capture software splits a stream of scanned images into individual documents as they are scanned. This is one of the key automation features in any capture application, and it allows you to take advantage of the large document feeders on scanners and “scan the stack”. For example, with an AP Invoice Processing solution, you can scan a stack of 30 invoices, and the scanning software will detect where one invoice ends and the next begins. Methods of document separation are as follows:

OCR Separation – this method uses Optical Character Recognition (OCR) to find key terms on the first page of a document, and then splits when a match occurs. This typically requires a pattern-matching or advanced extraction engine. OCR separation can be refined further by restricting recognition to specific zones, which increases speed and improves accuracy.
Page Separation – the simplest of all methods, this starts a new document after a fixed number of pages.
Folder Separation – this method combines all the pages within a network or PC folder into a single document.
Pattern Separation – this is an advanced method of separation that recognizes patterns and/or images on the first page to identify when to split documents while capturing.
Hybrid Separation – this method combines several of the above methods to minimize false separation and maximize accuracy when the document capture and scanning process is running.
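
Page, OCR and hybrid separation can be sketched as functions that split a scanned stack into documents. In this illustration each page is represented only by its OCR text, and the two first-page detectors (a keyword check and a title check) plus the "at least two detectors must agree" rule are hypothetical stand-ins for what a real capture engine would use.

```python
def separate_by_page_count(pages, pages_per_doc):
    """Page separation: start a new document every `pages_per_doc` pages."""
    if pages_per_doc < 1:
        raise ValueError("pages_per_doc must be at least 1")
    return [pages[i:i + pages_per_doc] for i in range(0, len(pages), pages_per_doc)]


def separate_by_first_page(pages, is_first_page):
    """OCR/pattern separation: split wherever a page is detected as a first page.

    The first page of the stack always starts a document, detected or not.
    """
    docs = []
    for page in pages:
        if not docs or is_first_page(page):
            docs.append([page])
        else:
            docs[-1].append(page)
    return docs


def hybrid_detector(*detectors, min_agree=2):
    """Hybrid separation: call a page a first page only if enough detectors agree."""
    def is_first_page(page):
        return sum(1 for detect in detectors if detect(page)) >= min_agree
    return is_first_page


def has_keyword(text):
    """Hypothetical OCR detector: the key term appears anywhere on the page."""
    return "invoice number" in text.lower()


def has_title(text):
    """Hypothetical pattern detector: the page begins with the form title."""
    return text.startswith("INVOICE")


# Each page is represented here by its OCR text.
stack = [
    "INVOICE  Invoice Number: 1001",
    "Page 2 - terms and conditions",
    "INVOICE  Invoice Number: 1002",
    "INVOICE  Invoice Number: 1003",
    "Page 2 - remit to",
]
docs = separate_by_first_page(stack, hybrid_detector(has_keyword, has_title))
print([len(doc) for doc in docs])  # [2, 1, 2]
```

Requiring agreement between detectors is what reduces false separation: a second page that merely mentions an invoice number in passing is not enough, on its own, to start a new document.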

14Aug

TWAIN

Written by Ken Kramer. Posted in Scanning and Capture, Terminology

TWAIN is a communications protocol and application programming interface (API) that connects software on computing devices to document imaging devices such as scanners. It was originally released in 1992 and has been updated as recently as 2013, with version 2.3. Its objectives were as follows:

To provide cross platform scanning support
To maintain a no charge toolkit for scanning software developers
To ensure scanning hardware compatibility
To ensure ease of implementation

TWAIN supports high-speed desktop scanners, scanning copiers and digital cameras on a wide variety of operating systems, including Microsoft Windows, Mac OS and Linux. The TWAIN Working Group is made up of manufacturers of both scanning hardware and scanning software, including HP, Kodak, Fujitsu, Visioneer and Epson, to name a few.

14Aug

Scanning Software

Written by Ken Kramer. Posted in Scanning and Capture, Terminology

Document scanning software works with a scanner to convert paper documents into document images. Document scanners usually come bundled with basic scanner software that offers features such as scanning profiles, image processing, simple output and integration with basic document management or enterprise content management systems. Scanning software differs from document capture software in that capture software focuses on automation and advanced extraction techniques. Some scanning software includes an OCR engine to convert scanned images into searchable file formats. Scanning software is a critical component of any paperless office initiative: it is easy for just about any office worker to use, and it is usually more cost-effective than capture software for large deployments and user bases.

Document Scanner

Written by Ken Kramer. Posted in Scanning and Capture

Document scanners digitize paper documents through the process of “scanning”. Scanners have camera elements that image the document as it passes through the scanner's feed mechanism. Many modern scanners have two camera elements to image both sides of the document at the same time, reducing scanning time and effort. Scanners are usually paired with scanner software and a scanner driver to manage settings, improve image quality and integrate with the host operating system. Document capture software can be added to a scanner to increase automation and improve efficiency through advanced data extraction, optical character recognition and other technologies. Below are the primary document scanner manufacturers:

Fujitsu
Canon
Kodak
Panasonic
Epson

Many organizations have moved away from desktop scanners and now use network scanners to support a one-to-many usage scenario.

12Aug

Document Scanning

Written by Ken Kramer. Posted in Scanning and Capture, Terminology

Document scanning is the process of using a hardware device, or scanner, to convert paper documents into image files. It is a critical component of any Paperless Office initiative, and it reduces the costs and pain associated with paper documents. Typically, documents are scanned and an image file is created; TIFF and PDF are the industry-standard output formats for document archival. Document scanning differs from document capture in that its sole purpose is to create a digital image: there is no advanced data extraction or data collection, and operators name the digital files manually.

Optical Mark Recognition (OMR)

Written by Ken Kramer. Posted in Scanning and Capture, Terminology

Optical Mark Recognition (OMR) is the process of reading human-made marks on a scanned page. It is most commonly associated with reading marks from forms, surveys and tests. OMR engines typically rely on the contrast between unmarked and marked areas of the document. They are usually part of an overall recognition engine that also includes Optical Character Recognition (OCR) and Intelligent Character Recognition (ICR). One of the most familiar OMR applications is the Scantron test used in schools, and many organizations now use general OMR solutions as a Scantron replacement. OMR is usually paired with dedicated scanning hardware and is part of an overall document capture and scanning solution.
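
The contrast check described above can be sketched as measuring how much of each answer bubble is dark. This is a simplified illustration, assuming a grayscale image (0 = black, 255 = white) that has already been deskewed and aligned to a form template, stored as nested lists; the bubble coordinates, the darkness threshold of 128 and the 50% fill requirement are hypothetical values, not settings from any particular OMR engine.

```python
def fill_ratio(image, box, dark_threshold=128):
    """Fraction of pixels in box (top, left, height, width) darker than the threshold."""
    top, left, height, width = box
    dark = total = 0
    for row in image[top:top + height]:
        for pixel in row[left:left + width]:
            total += 1
            if pixel < dark_threshold:
                dark += 1
    return dark / total if total else 0.0


def read_question(image, bubbles, min_fill=0.5):
    """Return the marked choice, None if blank, or 'MULTIPLE' if several are marked."""
    marked = [choice for choice, box in bubbles.items()
              if fill_ratio(image, box) >= min_fill]
    if not marked:
        return None
    if len(marked) > 1:
        return "MULTIPLE"
    return marked[0]


# A 4 x 12 white answer row with three 4 x 4 bubbles; bubble B is filled in.
row_image = [[255] * 12 for _ in range(4)]
for r in range(4):
    for c in range(4, 8):
        row_image[r][c] = 0
bubbles = {"A": (0, 0, 4, 4), "B": (0, 4, 4, 4), "C": (0, 8, 4, 4)}
print(read_question(row_image, bubbles))  # B
```

Reporting blank and multiple marks separately, instead of guessing, mirrors how test-scoring applications typically flag those answers for a human to check.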

Intelligent Character Recognition (ICR)

Written by Ken Kramer. Posted in Scanning and Capture, Terminology

Intelligent Character Recognition (ICR) is a technology that converts hand-printed text in an image to machine text. ICR is typically used to extract data from scanned or imported forms, and it reduces the need for hand keying information. Its accuracy varies, and most ICR engines read more accurately when used on forms with combed fields or boxes that contain the hand printing. A few ICR engines can also read handwriting, or script, but they are quite costly. ICR engines are typically housed in document capture software or are an extension of OCR.
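
One reason combed fields help is that each box holds exactly one character, so the engine does not have to work out where one hand-printed letter ends and the next begins. The sketch below shows that idea under stated assumptions: the field image is nested lists of pixels with equal-width boxes, the 1-pixel margin that trims the comb lines is arbitrary, and the per-box (character, confidence) results stand in for a hypothetical recognizer that is not shown.

```python
def split_combed_field(field_image, num_boxes, margin=1):
    """Cut a combed field (rows of pixels) into one crop per character box.

    Assumes equal-width boxes spanning the full field; `margin` pixels are
    trimmed from each side of a box to drop the comb lines between boxes.
    """
    box_width = len(field_image[0]) // num_boxes
    crops = []
    for i in range(num_boxes):
        left = i * box_width + margin
        right = (i + 1) * box_width - margin
        crops.append([row[left:right] for row in field_image])
    return crops


def assemble_field(char_results, min_confidence=0.8):
    """Join per-box (character, confidence) results from an ICR engine.

    Returns the field text plus the indexes of boxes an operator should verify.
    """
    text = "".join(char for char, _ in char_results)
    needs_review = [i for i, (_, conf) in enumerate(char_results) if conf < min_confidence]
    return text, needs_review


field_image = [[255] * 30 for _ in range(10)]  # a blank 3-box field, 10 px tall
crops = split_combed_field(field_image, num_boxes=3)
print(len(crops), len(crops[0][0]))  # 3 8
# Hypothetical per-box output from a character recognizer:
print(assemble_field([("4", 0.97), ("2", 0.55), ("7", 0.91)]))  # ('427', [1])
```

Flagging only the low-confidence boxes lets an operator confirm a single character instead of rekeying the whole field.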
