Document Capture and Scanning Terminology

Document Capture & Scanning

Document capture and scanning is the process of ingesting paper and electronic documents, processing them to extract metadata, exporting that metadata to a database table, and exporting the image of the document to a repository.

Document imaging

Document imaging is a historical term for the process of scanning paper and converting it to an image. Document imaging and document scanning are essentially the same action, but today document scanning is the more commonly used term.

Document scanning

Document scanning is the process of scanning paper and converting it to an image.

Enterprise Resource Planning (ERP)

Enterprise Resource Planning systems integrate internal and external management information across an entire organization, embracing finance/accounting, manufacturing, sales and service, customer relationship management, etc. ERP systems automate this activity with an integrated software application. Their purpose is to facilitate the flow of information between all business functions inside the boundaries of the organization and manage the connections to outside stakeholders. ERP systems can run on a variety of computer hardware and network configurations, typically employing a database as a repository for information.

Image

An image is a logical container for files. It is the parent of image files and universal files, and it consists of one or more files (image 1, image 2, etc.). Properties of an image include Filename, Image Type, Has Image, Has Universal, and Original Filename. A universal file is any file that is not in an image format. Image format files are files ending with identifiers such as TIFF, JPG, or GIF.
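The container described above can be sketched as a small data structure. This is only an illustration: the class name, field names, and the extension list are assumptions chosen to mirror the properties listed in the definition, not the schema of any particular capture product.

```python
from dataclasses import dataclass, field
from pathlib import PurePath
from typing import List

# Extensions treated as image formats in this sketch (illustrative, not exhaustive).
IMAGE_EXTENSIONS = {".tif", ".tiff", ".jpg", ".jpeg", ".gif"}


@dataclass
class Image:
    """Logical container that is the parent of image files and universal files."""
    filename: str
    image_type: str
    original_filename: str
    image_files: List[str] = field(default_factory=list)
    universal_files: List[str] = field(default_factory=list)

    def add_file(self, path: str) -> None:
        """Store the path as an image file or a universal file, based on its extension."""
        if PurePath(path).suffix.lower() in IMAGE_EXTENSIONS:
            self.image_files.append(path)
        else:
            self.universal_files.append(path)

    @property
    def has_image(self) -> bool:
        return bool(self.image_files)

    @property
    def has_universal(self) -> bool:
        return bool(self.universal_files)


# Hypothetical usage: one scanned page plus an attached non-image file.
img = Image(filename="inv001", image_type="TIFF", original_filename="scan_0001.tif")
img.add_file("page1.tif")
img.add_file("notes.docx")
```

Here Has Image and Has Universal are derived from the contents of the container rather than stored, which keeps them from drifting out of sync with the file lists.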

Index

An index is a form of metadata, meaning “data which describes data.” An index can exist at multiple levels, such as a document index or a batch index. The properties of an index can include attributes such as Name, Value, Source (OCR, Typed, Barcode, etc.), Location (Page, X, Y, Length, Width), and Barcode Type.
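As a rough illustration, the index properties listed above map naturally onto a record type. The class and field names below are assumptions that follow the attribute list in the definition; the sample values are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Location:
    """Where on the document the value was found."""
    page: int
    x: int
    y: int
    length: int
    width: int


@dataclass(frozen=True)
class Index:
    """One piece of metadata describing a document or a batch."""
    name: str
    value: str
    source: str                       # e.g. "OCR", "Typed", "Barcode"
    level: str = "document"           # "document" or "batch"
    location: Optional[Location] = None
    barcode_type: Optional[str] = None


# Hypothetical examples: a document-level index read by OCR,
# and a batch-level index keyed in by an operator.
po_number = Index(
    name="PO Number",
    value="4500123",
    source="OCR",
    location=Location(page=1, x=420, y=96, length=120, width=18),
)
batch_date = Index(name="Scan Date", value="2024-01-15", source="Typed", level="batch")
```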

Intelligent Character Recognition (ICR)

ICR stands for Intelligent Character Recognition. It is a handwriting recognition system that allows fonts and different styles of handwriting to be learned by a computer in order to generate a textual value from a scanned section of handwritten text. ICR is most frequently used to decode predefined areas on fixed forms. It is not frequently applied to decode an entire page of handwritten text, and is almost never applied to analyze a page of mixed machine-printed and handwritten text.

Metadata

Metadata is, technically, "data about data." In the capture world, metadata is the transaction, system, and document data captured during scanning and passed to a data capture solution for further processing, including the document set, batch number, operator ID, bar code(s), and more.

OCR scanner

An OCR scanner is a scanner with built-in OCR (optical character recognition) software. ... Ordinary scanners do not have this capability; a scanner that is not an OCR scanner will only create non-searchable images of documents. However, using the correct software, you can effectively turn any scanner into an OCR scanner.

Optical Character Recognition (OCR)

OCR, or Optical Character Recognition, is the process of taking a scanned image and converting the words in it into text. OCR software solutions can provide automated data extraction as well as searchable content. Some solutions, such as SharePoint OCR, are built for a particular system and address that system's OCR requirements.

Optical Mark Recognition (OMR)

Optical mark recognition is the process of capturing human-marked data from document forms such as surveys and tests. It is used to read questionnaires and multiple-choice examination papers, where responses are marked as lines or shaded areas.
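One simple way to picture mark detection is to measure how much of each answer area is shaded. The sketch below assumes each answer area has already been cropped and reduced to a grid of 0 (light) and 1 (dark) pixels; the function names and the 0.5 threshold are illustrative assumptions, not the method of any specific OMR product.

```python
from typing import Dict, List, Optional

Region = List[List[int]]  # 1 = dark pixel, 0 = light pixel


def fill_ratio(region: Region) -> float:
    """Fraction of pixels in the region that are dark."""
    total = sum(len(row) for row in region)
    dark = sum(sum(row) for row in region)
    return dark / total if total else 0.0


def read_answer(bubbles: Dict[str, Region], threshold: float = 0.5) -> Optional[str]:
    """Return the single marked choice, or None if zero or several are marked."""
    marked = [label for label, region in bubbles.items()
              if fill_ratio(region) >= threshold]
    return marked[0] if len(marked) == 1 else None


# Hypothetical 3x3 answer areas for one multiple-choice question.
blank = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
shaded = [[1, 1, 1], [1, 1, 0], [1, 1, 1]]

question = {"A": blank, "B": shaded, "C": blank}
answer = read_answer(question)
```

Returning None when more than one area is shaded lets ambiguous responses be routed to a person for review instead of being guessed.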

Semi-Structured Document

Semi-structured documents contain common data elements, but the data is in a different location from document to document. For example, nearly every invoice contains data such as a P.O. number and an invoice total, but it is in a different location on each invoice depending on the vendor. Because of these location differences, it is not feasible to use templates to capture data from semi-structured documents.
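Since fixed positions cannot be relied on, one common alternative (an assumption here, not something this glossary prescribes) is to find values by the label text printed next to them. The sketch below works on text that OCR has already produced; the patterns and sample invoices are hypothetical and far simpler than what real invoices require.

```python
import re
from typing import Dict, Optional

# Hypothetical label patterns; real invoices need many more variants.
PATTERNS = {
    "po_number": re.compile(
        r"\bP\.?O\.?\s*(?:Number|No\.?|#)?\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE
    ),
    "invoice_total": re.compile(
        r"\b(?:Invoice\s+)?Total\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE
    ),
}


def extract_invoice_fields(text: str) -> Dict[str, Optional[str]]:
    """Search the whole text for each label, wherever it appears on the page."""
    fields: Dict[str, Optional[str]] = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


# Two made-up vendors that place the same data in different spots.
vendor_a = "ACME Corp\nPO Number: 4500123\nInvoice Total: $1,250.00\n"
vendor_b = "Invoice Total $980.50   Remit to: Globex\nP.O. # 77-A\n"

fields_a = extract_invoice_fields(vendor_a)
fields_b = extract_invoice_fields(vendor_b)
```

Both layouts yield the same two fields even though the order and wording of the labels differ, which is the property a position-based template cannot provide.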

Structured Document

Structured documents are standardized forms that come in the exact same format or layout every time. Examples of structured documents include credit applications, surveys, and order forms. The data to be captured is always located in the same place on the form. To eliminate the need to manually enter the data from structured forms, a template is created to define each of the individual data fields to capture, such as name, address, or Social Security Number. Document capture or forms processing software can then capture that information at the same location every time.
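The template idea can be sketched as a mapping from field names to fixed positions. Real capture software works with pixel coordinates on the scanned image and runs recognition on each zone; to stay self-contained, this sketch assumes the page is already a list of text lines and uses (row, start column, end column) as the position. The template, field names, and sample page are all hypothetical.

```python
from typing import Dict, List, Tuple

# Field name -> (row, start column, end column) on the form.
Template = Dict[str, Tuple[int, int, int]]


def extract_fields(page: List[str], template: Template) -> Dict[str, str]:
    """Read each field from the same fixed position on every page."""
    result: Dict[str, str] = {}
    for name, (row, start, end) in template.items():
        line = page[row] if row < len(page) else ""
        result[name] = line[start:end].strip()
    return result


# Hypothetical template for a simple credit application form.
CREDIT_APP_TEMPLATE: Template = {
    "name": (0, 6, 30),
    "address": (1, 9, 40),
}

page = [
    "Name: Jane Doe",
    "Address: 12 Main St",
]

captured = extract_fields(page, CREDIT_APP_TEMPLATE)
```

The same template can be applied to every copy of the form without change, which is why this approach only suits documents whose layout never varies.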

Unstructured Document

Unstructured documents are forms and documents where the desired data can be located in varying positions on the page, even within the same document type. An example of an unstructured document is an EOB (Explanation of Benefits) document.
