Full text is a document management and capture term that refers to the complete OCR text of a given document. The full text can be generated before the document is placed in the document management system (front-end capture), or afterward as a back-end process run by an OCR crawler. Full text lets end users search for any word or phrase that occurs within a document, a feature usually called full text search. There are two common ways to store this text layer:
- Image and Text – many older document management repositories support a dual-layer document system. One layer consists of the image file, usually a TIFF, while the other is the text layer, usually a plain text file.
- PDF with Full Text – most OCR engines today can produce a single PDF that contains both the image layer and the text layer, a format known as PDF image with hidden text (often called a searchable PDF). This format is convenient because it acts as a “suitcase” that carries both layers in one portable file.
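To make the image-and-text approach concrete, the sketch below pairs each image file with a text layer that shares its base name (for example `inv_001.tif` and `inv_001.txt`). That naming convention and the `pair_layers` helper are hypothetical; real repositories may link the two layers through database records or other metadata instead. Images without a matching text layer are the ones a back-end OCR process would still need to handle.

```python
from pathlib import PurePosixPath

# Image formats treated as the image layer (TIFF, per the text above).
IMAGE_SUFFIXES = {".tif", ".tiff"}


def pair_layers(filenames):
    """Pair each image layer with a text layer that shares its base name.

    Assumes a hypothetical naming convention (scan.tif + scan.txt).
    Returns {image_name: text_name or None}; None marks an image that has
    no text layer yet, i.e. a candidate for back-end OCR.
    """
    text_layers = {}
    image_layers = []
    for name in filenames:
        path = PurePosixPath(name)
        suffix = path.suffix.lower()
        if suffix == ".txt":
            # Key text layers by their path without the extension.
            text_layers[str(path.with_suffix(""))] = name
        elif suffix in IMAGE_SUFFIXES:
            image_layers.append(name)
    return {
        name: text_layers.get(str(PurePosixPath(name).with_suffix("")))
        for name in image_layers
    }


files = ["scans/inv_001.tif", "scans/inv_001.txt", "scans/inv_002.TIFF"]
print(pair_layers(files))
# {'scans/inv_001.tif': 'scans/inv_001.txt', 'scans/inv_002.TIFF': None}
```

Keeping the layers as separate files makes it easy to regenerate the text without touching the image, at the cost of having to keep the pair together; the single-file PDF format avoids that bookkeeping.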
Most document and content management systems today can “crawl” content to provide full text search across a wide variety of document formats, letting users search within the contents of any supported document.
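At its core, a crawl-and-search pipeline of this kind extracts the text of each stored document, records which documents contain each word in an inverted index, and answers a query by intersecting the document sets for its words. The sketch below assumes the text layers have already been extracted into plain strings keyed by hypothetical document IDs, and uses a deliberately simple tokenizer; production systems add format-specific text extraction, stemming, phrase search, and relevance ranking.

```python
import re
from collections import defaultdict

# Simplified tokenizer: runs of lowercase letters or digits.
TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split on anything that is not a letter or digit."""
    return TOKEN.findall(text.lower())


def build_index(documents):
    """Build an inverted index mapping each word to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every word in the query (AND semantics)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical OCR text layers, as a crawler might collect them.
docs = {
    "inv_001": "Invoice 4471 from Acme Corp, due 30 days",
    "inv_002": "Acme Corp purchase order 88",
    "memo_01": "Meeting notes: renew the Acme contract",
}
idx = build_index(docs)
print(sorted(search(idx, "acme corp")))  # ['inv_001', 'inv_002']
print(sorted(search(idx, "invoice")))    # ['inv_001']
```

Because the index is built once and queried many times, a search touches only the posting sets for the query words rather than rescanning every document's text.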