
How Do PDF Files Work?

PDF files display text correctly wherever they are viewed because they carry their typographic information with them. PDF documents present their pages as images, and the ability to change the basic text is limited. PDF Converter can perform Optical Character Recognition (OCR), the process of extracting text from an image.

Uploaded by

koushi010
Copyright
© Attribution Non-Commercial (BY-NC)

How do PDF files work?

PDF files display text correctly wherever they are viewed because they carry their typographic information with them. Fonts used in the document are embedded in the PDF file and are used after distribution to reconstruct the document. The display therefore does not depend on the required font files being available on the viewing machine, nor on the language of its operating system. PDF documents present their pages as images. They can be marked up and commented on, but the ability to change the basic text is limited. Most PDF files can be searched because the file has two layers: an image layer that is presented onscreen and, behind it, usually a text layer that can be matched to the characters displayed on the screen.

When the starting point for a PDF file is a set of images or a scanning process, this text layer is not present and the result is an image-only PDF. When the starting point is an editable document, the text layer can be created and the PDF is called 'Normal' or 'Searchable'. The creator of a PDF can require a password to allow access to the text layer.

How does PDF Converter work?


PDF Converter can perform Optical Character Recognition (OCR), the process of extracting text from an image. It does not need OCR to unlock PDF or XPS files with an accessible text layer; instead, it must capture the page layout and arrange the existing text and other elements correctly on each page of the new document. OCR is normally used only for input pages without an accessible text layer, or when non-standard character encoding is detected, but you can require it for any conversion under Processing Options in the Converter Assistant.
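The rule for when OCR is applied can be illustrated with a minimal Python sketch. It is not PDF Converter's actual implementation: the `PageInfo` type, its fields, and the `needs_ocr` function are hypothetical names introduced here only to restate the three conditions described above.

```python
from dataclasses import dataclass


@dataclass
class PageInfo:
    """Hypothetical summary of one input page (not a PDF Converter type)."""
    has_accessible_text_layer: bool
    has_nonstandard_encoding: bool = False


def needs_ocr(page: PageInfo, force_ocr: bool = False) -> bool:
    """Return True when the page should be sent through OCR."""
    if force_ocr:
        # Assumed to mirror requiring OCR under Processing Options.
        return True
    if not page.has_accessible_text_layer:
        # Image-only page, or a text layer that cannot be accessed.
        return True
    # A text layer exists; OCR is only needed if its encoding is non-standard.
    return page.has_nonstandard_encoding


pages = [
    PageInfo(has_accessible_text_layer=True),
    PageInfo(has_accessible_text_layer=False),
    PageInfo(has_accessible_text_layer=True, has_nonstandard_encoding=True),
]
print([needs_ocr(p) for p in pages])  # [False, True, True]
```

Pages for which `needs_ocr` returns False would be converted from their existing text layer, so only the layout has to be reconstructed.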

Handling Image-only Pages


Pages without a text layer are a special case for conversion. You can decide how the program should handle these pages: convert them with the built-in Optical Character Recognition (OCR), transfer them as images to the target document, or skip them. You can require inspection of the first pages (up to ten) in files you open. Optionally, you can set conversion to stop if no text-layer pages are detected. If you have ScanSoft OmniPage, you can use it to gain more control over the recognition process.
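The choices described in this section can be sketched as a small planning function in Python. This is an illustration under stated assumptions, not the program's real logic: `ImagePagePolicy`, `plan_conversion`, and their parameters are hypothetical names, and the sketch assumes that the "stop" option is evaluated against the inspected first pages only.

```python
from enum import Enum
from typing import List, Optional


class ImagePagePolicy(Enum):
    """Hypothetical names for the three ways to handle image-only pages."""
    OCR = "ocr"          # convert with the built-in OCR
    AS_IMAGE = "image"   # transfer the page as an image
    SKIP = "skip"        # leave the page out of the target document


MAX_INSPECTED_PAGES = 10  # the text allows inspecting up to ten pages


def plan_conversion(
    has_text_layer: List[bool],
    policy: ImagePagePolicy,
    inspect_first: int = 0,
    stop_if_no_text: bool = False,
) -> Optional[List[str]]:
    """Return one action per page, or None if conversion is stopped."""
    inspect_first = min(inspect_first, MAX_INSPECTED_PAGES)
    if stop_if_no_text and inspect_first > 0:
        # Assumption: stop when none of the inspected pages has a text layer.
        if not any(has_text_layer[:inspect_first]):
            return None
    # Pages with a text layer are converted from that text; others follow the policy.
    return ["text" if has_text else policy.value for has_text in has_text_layer]


print(plan_conversion([True, False, True], ImagePagePolicy.SKIP))
# ['text', 'skip', 'text']
print(plan_conversion([False, False], ImagePagePolicy.OCR,
                      inspect_first=10, stop_if_no_text=True))
# None
```

Keeping the policy as a separate value from the per-page check makes it easy to see that the same image-only page can end up recognized, copied as an image, or dropped, depending only on the user's setting.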
