Abbyy Finereader: Optical Character Recognition Program
ABBYY FineReader ®
Version 7.0
User’s Guide
ABBYY: P.O. Box 72, 127015, Moscow, Russia [email protected]; www.abbyy.com; www.finereader.com.
Contents
Welcome
Chapter 1. Installing and Starting ABBYY FineReader
● Software and hardware requirements
● Installing ABBYY FineReader
● Network server/workstation installation
● Starting ABBYY FineReader
● About ABBYY FineReader activation
Chapter 2. Quick Start
● How to input a document in less than a minute
● The ABBYY FineReader main window
● ABBYY FineReader toolbars
Chapter 3. General Features of ABBYY FineReader
● What is an OCR system?
● New features of ABBYY FineReader 7.0
● Supported document saving formats
● Supported image formats
Chapter 4. Acquiring the Image
● Scanning
● Setting scanning parameters
● Tips on brightness tuning
● Scanning multipage documents
● Opening images
● Acquiring images from the Hot Folder
● Scanning dual pages
● Adding business cards images to a batch
● Page numbering
● Working with an image
● Batch image options
Chapter 5. Page Layout Analysis
● General information on page layout analysis
● Block types
● Automatic page layout analysis options
● Drawing and editing blocks manually
● Manual table layout analysis
● Using block templates
Chapter 6. Recognition
● General information on recognition
● Recognition languages
● Source text print type
● Other recognition options
● Background recognition
● Recognition with training
● How to train a user pattern
● How to edit a user pattern
● User languages and language groups
● How to create a new language
● How to create a new language group
Chapter 7. Checking and Editing Text
● Checking text in ABBYY FineReader
● Check and edit text options
● Adding and deleting words to/from the user dictionary
● Editing text in ABBYY FineReader
● Editing tables
Chapter 8. Saving into External Applications and Formats
● General information on saving recognized text
● Text saving options
● Saving the recognized text in RTF, DOC and Word XML formats
● Saving the recognized text in PDF format
● Saving the recognized text in HTML format
● Saving the recognized text in PPT format
● Saving the page image
Chapter 9. Working with Batches
● General information on working with batches
● Creating a new batch
● Opening a batch
● Adding images to a batch
● Batch page number
● Saving a batch
● Closing a batch page or the whole batch
● Deleting a batch
● Batch settings
● Full–text search in recognized batch pages
Chapter 10. Network Document Processing
● Working with the same batch over a network
● Group work with the same user languages and dictionaries
● Group work with customized dictionaries (languages with dictionary support only)
Appendix. Hot Keys and Glossary
● Hot Keys
● Glossary
Welcome!
User’s Guide
The User’s Guide introduces you to the basics of using ABBYY FineReader. Each chapter starts
with a short summary description and a list of the chapter’s contents.
Online Help
FineReader’s online Help contains basic and advanced information on program features,
settings and dialogs. Online Help is provided in HTML format and has been designed for quick
and easy information retrieval.
Readme File
The Readme file contains the latest information on the software.
Technical Support
If you have any questions on how to use FineReader, please consult all the documentation you
have available (the User’s Guide and the Help file) before contacting our technical support
service. Also, take a look at the technical support section on our website at www.abbyy.com.
You may find the information you need there.
If, after having consulted both your documentation and the ABBYY website, you still require
assistance, email us at [email protected]. Note that our technical support experts will need
the following information from you to be able to deal with your enquiries:
● The serial number of your copy of FineReader
● Your scanner make and model
● A general description of the problem and the full error message text
(if you have encountered an error message)
● Your Windows operating system version
● Any other information you consider important.
Note: Some system information can be obtained by clicking on System Info in the
About... dialog (menu Help/About).
Chapter 1
Installing and Starting
ABBYY FineReader
Chapter Contents:
● Software and hardware requirements
● Installing ABBYY FineReader
● Network server/workstation installation
● Starting ABBYY FineReader
● About ABBYY FineReader activation
Note: Microsoft Internet Explorer 4.0 or later is required to search in recognized pages and
to read news on the ABBYY Community news channel (only for ABBYY FineReader
7.0 Professional Edition).
Installation options
During the installation, you will be asked to select one of the two installation options:
● Typical (recommended) – This option installs all components of the program,
including all recognition languages. You will be prompted to choose a
single interface language during installation.
● Custom installation – This option allows you to choose which components of the
program to install, including which of the available recognition languages.
Consult the readme.htm file on the ABBYY FineReader CD-ROM if you encounter an error
message.
Note: If you wish to retain your user dictionaries and patterns from a previously installed
version of ABBYY FineReader, do not uninstall the older version of the program prior
to installing the new version. All existing user dictionaries and patterns will then
be available for use in the latest version.
Only the system administrator may install ABBYY FineReader 7.0 Corporate Edition on a network
server. There are two stages to the installation. First, the program is installed on the server.
From the server, the program can be installed on workstations using one of the four
methods:
The System Administrator's Guide (which can be found in the Administrator’s Guide folder
on the server where ABBYY FineReader is installed) provides additional information about
installing ABBYY FineReader on workstations, working with the License Manager and working
with the program in a local area network.
Note: Make sure your scanner is connected to your computer, plugged in, and turned on
before you start FineReader. To install a scanner after installing the program, please
consult the user guide supplied with the scanner for installation instructions. If you do
not have a scanner, you can still recognize image files using ABBYY FineReader 7.0.
You will find sample image files in the ABBYY FineReader/Demo folder on the
program CD-ROM.
ABBYY FineReader 7.0 incorporates a specialized activation technology that prohibits illegal
copying and distribution of the software. This technology effectively stops the unauthorized
use of ABBYY products by those who have not signed a License Agreement with ABBYY.
A single-user License Agreement allows for installation on a single PC. Installation of the software
on additional PCs breaches the License Agreement, as well as international copyright
laws. The activation technology controls copying of the software and prevents the installation
of a licensed copy on multiple workstations. At the same time, the technology allows the software
to be reinstalled on the licensed PC as often as necessary.
Depending on the product version and territory of distribution, the functionality of the software
may be limited in the following ways:
● the program cannot save or print recognized Cyrillic texts
(ABBYY FineReader 7.0 Professional Edition);
● the program cannot save or print recognized text in any language
(ABBYY FineReader 7.0 Professional Edition);
● the program will not function prior to activation
(ABBYY FineReader 7.0 Corporate Edition).
The Wizard will generate a code (called an Installation ID), which contains all of the necessary
activation information including system parameters and program information. The
Installation ID does not include personal information about the computer user or the system,
and the code cannot be used to identify the user.
After activation, FineReader 7.0 will be fully functional on the registered system. The program
can be reinstalled on that computer as often as desired without reactivation. The FineReader
Activation Wizard detects and tolerates changes to your PC configuration. Minor upgrades will
not require reactivation. If major changes are made to the system (e.g. reformatting the hard
drive or reinstalling the operating system), an additional activation may be required.
Activation may be required to access the full functionality of FineReader 7.0. This process verifies
that you are installing a genuine ABBYY product. ABBYY guarantees that activation of the
product does not entail the communication of personal information to ABBYY. In fact, activation
may be completely anonymous, if desired.
At activation, the FineReader Activation Wizard creates a unique Installation ID that indicates
only the configuration of your PC at the time of activation. The Installation ID does not
include: personal information about the user; information about other software or data that
may reside on the PC; or information about the specific make or model of the PC. The code is
used solely for the purpose of activation. The Activation Wizard sends only limited information
to the ABBYY activation server: your specific Installation ID and the name, serial
number, version number, and interface language of your copy of the FineReader software.
This information is used only to select the correct language for the program and to generate
the contents of a reply message that is sent to you to confirm the results of activation. None of
this data will be used for any other purpose.
Chapter 2
Quick Start
Chapter Contents:
● How to input a document in less than a minute
● The ABBYY FineReader main window
● ABBYY FineReader toolbars
There are four steps to input a document: scanning, reading, spell-checking and saving
recognized text. Once scanning is complete, the scanned document will appear in the Image
window. The application then asks you to set up the recognition parameters (i.e. resolution,
scan mode and brightness). Once you have identified your preferred parameters, FineReader
will start reading the image and analyzing its layout. Recognized text will be shown highlighted
in blue within the document. The recognized data will also be displayed as editable text in
the Text window. Once you have finished correcting your text, the Scan&Read Wizard will
prompt you to send the final text to an application, save it to a file, or start processing another
document.
Find the FineReader main menu at the top of the FineReader Main window. Four toolbars are
displayed on the main menu: Standard, Formatting, Image Tools, and WizardBar. You
may display or hide any toolbar by clicking on the View menu and selecting the Toolbar. You
can also right-click on any toolbar to open the local menu and then click on the name of the
toolbar that you want to display or hide (currently selected toolbars are highlighted).
A status bar, located at the bottom of the ABBYY FineReader main window, displays information
on the application’s status and operations currently being performed, as well as a brief
description of menu items and selected buttons.
Other windows in the main window include the Batch, Image, Zoom, and Text windows.
The Image, Zoom and Text windows are interconnected: double-clicking on an image area
in the Image window causes that area to be displayed in the Zoom window, and moves the
pointer in the Text window to the position you clicked on (if text has already been recognized
on the page). To alter the on-screen arrangement of the windows:
● In the View menu, select one of the following items: Batch Window; Image
and Text Windows; Zoom Window.
Batch window on the left; Batch View: Thumbnails; Image, Text and Zoom windows – ...a batch that contains only a small number of pages.
Batch window at the top; Batch View: Details; Image, Text and Zoom windows – ...a batch that contains many pages.
Batch window at the top; Batch View: Details; Image and Zoom windows – ...layout analysis and recognition.
Batch window at the top; Batch View: Details; Text and Zoom windows – ...editing recognized text.
The WizardBar
The buttons on the WizardBar launch the main FineReader functions: Scanning, Reading,
Checking and Saving recognition results. The numbers on the buttons indicate the order in
which the document input actions should be performed. You may perform each action separately
or combine them into a single action by clicking the Scan&Read Wizard button to
perform the full document processing cycle automatically.
Each button offers several function modes. Click the small downward-pointing arrow located
at the right side of each button and select the mode of your choice in that local menu. The
button icon automatically displays the previously selected mode. Click the button itself to run
this mode again.
Scan&Read
Scan&Read – scans and reads a document using the current
options.
Scan&Read Multiple Images – scans and reads several consecutive
images.
Open&Read – opens and reads the images selected in the
Open dialog.
Scan&Read Wizard – launches Scan&Read mode. ABBYY
FineReader guides you through the document processing steps
and helps you to obtain the desired results.
1–Scan
Open Image – adds image(s) to the batch. Each added image
is copied to the batch folder.
Scan Image – scans an image.
Scan Multiple Images – scans images continuously. Select
the Stop Scanning item in the File menu to stop scanning.
Hot Folder (Corporate Edition only) – launches folder monitoring
(all images that are added to a specified folder will be
automatically opened in the ABBYY FineReader window). To
disable folder monitoring, select Disable Hot Folder in the
File menu.
Options – opens the Scan/Open Image tab (Options dialog)
to allow you to set scanning options.
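The folder monitoring idea behind the Hot Folder mode can be illustrated with a short Python sketch. This is only an illustration of the general technique (repeatedly listing a folder and picking up image files that have not been seen before); it is not how ABBYY FineReader implements the feature. The function name, the extension list and the demo file names are all assumptions made for the example.

```python
import os
import tempfile

# Extensions loosely based on the image formats named in this guide.
# The exact set that FineReader's Hot Folder accepts is an assumption here.
IMAGE_EXTENSIONS = {".bmp", ".pcx", ".dcx", ".jpg", ".jpeg",
                    ".jp2", ".tif", ".tiff", ".png", ".pdf"}


def find_new_images(folder, already_seen):
    """Return image files in `folder` that are not yet in `already_seen`.

    `already_seen` is a set of paths; it is updated in place so that the
    next call reports only files added since this call.
    """
    new_files = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        extension = os.path.splitext(name)[1].lower()
        if (os.path.isfile(path)
                and extension in IMAGE_EXTENSIONS
                and path not in already_seen):
            new_files.append(path)
    already_seen.update(new_files)
    return new_files


def demo():
    """Simulate two polling passes over a temporary 'hot folder'."""
    with tempfile.TemporaryDirectory() as hot_folder:
        seen = set()
        open(os.path.join(hot_folder, "page1.tif"), "wb").close()
        open(os.path.join(hot_folder, "notes.txt"), "wb").close()  # ignored: not an image
        first_pass = find_new_images(hot_folder, seen)
        open(os.path.join(hot_folder, "page2.png"), "wb").close()
        second_pass = find_new_images(hot_folder, seen)
        return ([os.path.basename(p) for p in first_pass],
                [os.path.basename(p) for p in second_pass])


if __name__ == "__main__":
    print(demo())
```

A real monitor would call a function like `find_new_images` on a timer and hand each new file to the processing step; disabling monitoring then simply means stopping that loop.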
2 – Read
Read – reads the open batch page.
Read All – reads all unrecognized batch pages.
Options – opens the Recognition tab (Options dialog)
to allow you to set document recognition options.
3 – Check Spelling
4 – Save
The Standard toolbar features file and image tools (e.g. undo/redo an action, scroll the batch
pages, clean and rotate the image) and a list of Recognition Languages.
The Formatting toolbar features various text formatting tools. You can edit and format text
in the Text window.
Note: Low monitor resolution may limit the number of buttons displayed on ABBYY
FineReader's toolbars. Although all of FineReader's functionality is available through
the program menus, you must increase the monitor's resolution to display all available
buttons. FineReader allows you to customize the Standard, Image and Formatting
toolbars by removing or adding application command buttons.
Each menu item has its own icon. You can access the full list of commands and their respective
buttons in the Commands list of the Customize dialog (Tools>Customize menu).
The selected command will be added to the list of toolbar commands and displayed on the
chosen toolbar in the main window.
Note:
1. The Toolbar buttons list determines the order of the buttons on the toolbar. To
change the order, select the command you wish to move and click the Up
(Down) button to move the command.
2. Commands may be distributed between a set of groups: select the Separator
item in the Commands list and click the Add button. A separator will be
added to the list of toolbar buttons. The separator may be moved.
3. To restore the default set of buttons on a given toolbar, select the toolbar in
the Toolbars list and click the Reset button. To restore the default set of buttons
on all toolbars, click the Reset All option.
Chapter 3
General Features
of ABBYY FineReader
Chapter Contents:
● What is an OCR system?
● New features of ABBYY FineReader 7.0
● Supported document saving formats
● Supported image formats
ABBYY FineReader is an easy-to-use program that recognizes texts in practically any font without
any prior training. The program features high recognition accuracy and low sensitivity to
print defects due to its incorporation of special recognition technology based on the principles
of Integral, Purposeful and Adaptable (IPA) perception.
The system generates a hypothesis about a recognition object (a character, part of a character,
or several glued characters) and then accepts or rejects the hypothesis according to whether
the structural elements are present. These structural elements are computer equivalents of
character parts crucial for human perception (arcs, circles, dots, etc.). The application then
adapts itself to the text according to the degree of accuracy attained. Purposeful searching and
context information enable the system to recognize even torn and distorted characters, making
the system largely insensitive to print defects. The final result is the recognized text displayed
in the FineReader Text window, which you can edit and save in any convenient format.
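The hypothesize-and-verify idea described above can be shown with a toy Python sketch. It is a deliberately simplified illustration and makes no claim about ABBYY's actual recognition engine: the structural element names ("stem", "dot", "circle", "tall_stem"), the character models and the `recognize` function are all hypothetical.

```python
# Toy structural descriptions. These element names and character models are
# hypothetical and are NOT taken from ABBYY FineReader.
STRUCTURE = {
    "o": {"circle"},
    "l": {"stem"},
    "i": {"stem", "dot"},
    "d": {"circle", "tall_stem"},
}


def recognize(observed_elements):
    """Return the first character hypothesis whose required structural
    elements are all present in the observed set, or None if none fits."""
    # Test the most specific hypotheses (most required elements) first,
    # so that "i" is tried before "l" when both a stem and a dot are seen.
    hypotheses = sorted(STRUCTURE, key=lambda ch: (-len(STRUCTURE[ch]), ch))
    for ch in hypotheses:
        if STRUCTURE[ch] <= observed_elements:  # subset test: hypothesis accepted
            return ch
    return None  # every hypothesis was rejected


if __name__ == "__main__":
    print(recognize({"stem", "dot"}))           # clean glyph
    print(recognize({"stem", "dot", "speck"}))  # same glyph with an extra print defect
    print(recognize({"arc"}))                   # nothing matches
```

Because a hypothesis is accepted as soon as its required elements are found, stray marks such as the "speck" above do not prevent a match, which loosely mirrors the tolerance to print defects described in the text.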
Additional Features
New capabilities in FineReader 7.0 Professional Edition include:
● The image splitting tool lets you split an image into multiple areas and save
each area as a separate page. This mode is particularly useful for recognizing a page of
business cards, books, and PowerPoint printouts.
● Search with morphology support. Any batch created in ABBYY
FineReader can be used as a fully searchable small database. You can search for
words in any grammatical form. (This feature is available for the 34 languages
that have dictionary support.)
● Intel Hyper-Threading Technology support. This technology increases
throughput when recognizing large or numerous documents.
● Duplex scanning. The program creates two separate images if you scan a
two-sided document using a duplex scanner. This option can be turned off if
you do not need duplex scanning.
● JPEG 2000 image files can be opened and saved.
Refer to the “System Administrator’s Guide” in the Administrator’s Guide folder (located on
the server where ABBYY FineReader is installed) for more information about installing ABBYY
FineReader on workstations, working with the License Manager, and working with the program
in a local area network.
PDF:
Files in PDF format (Version 1.3 or earlier)
BMP:
2–bit – black and white
4– and 8–bit – Palette
16–bit – Mask
24–bit – Palette and TrueColor
32–bit – Mask
PCX, DCX:
2–bit – black and white
4– and 8–bit – Palette
24–bit – TrueColor
JPEG:
gray, color
JPEG 2000:
gray, color
TIFF:
black and white – uncompressed, CCITT3, CCITT3FAX, CCITT4, Packbits
gray – uncompressed, Packbits, JPEG
TrueColor – uncompressed, JPEG
Palette – uncompressed, Packbits
multi–image TIFF
PNG:
black and white, gray, color

BMP:
black and white, gray, color
PCX:
black and white, gray
JPEG:
gray, color
JPEG 2000:
gray, color
TIFF:
black and white – uncompressed, CCITT3, CCITT3FAX, CCITT4, Packbits
gray – uncompressed, Packbits, JPEG
color – uncompressed and JPEG
PNG:
black and white, gray, color
Chapter 4
Acquiring the Image
Chapter Contents
● Scanning
● Setting scanning parameters
● Tips on brightness tuning
● Scanning multipage documents
● Opening images
● Acquiring images from the Hot Folder
● Scanning dual pages
● Adding business cards images to a batch
● Page numbering
● Working with the image
● Batch image options
Scanning
ABBYY FineReader communicates with the scanner through a TWAIN interface. The TWAIN
standard, which was adopted in 1992, is a universal standard that unifies the interaction
between a computer image input device (such as a scanner) and an external application.
ABBYY FineReader communicates with a scanner through a TWAIN driver in two ways:
● through the ABBYY FineReader interface. In this case, use the Scanner
Settings dialog and select Use FineReader interface;
● using the scanner’s TWAIN interface. In this case, use the scanner’s
TWAIN dialog to set scanning options; select Use TWAIN–Source interface.
Using the TWAIN source interface makes the “preview image” option available so that you can
set the scanning area and tune the brightness precisely, and see how these changes affect the
previewed image. Every scanner has a unique TWAIN driver dialog. Consult your scanner’s documentation
for precise instructions on using the TWAIN dialog. Using the ABBYY FineReader
interface provides access to two additional features: a) the ability to scan multiple
pages with a scanner that does not have an automatic document feeder (ADF); and b) the
ability to access scanning options in the batch template file (*.fbt) and use them for other
batches.
Note:
1. The Use FineReader interface option may be unavailable (or disabled) for certain
scanner models.
2. If you wish to see the Scanner Settings dialog in Use FineReader interface
mode, select the Display options dialog before scanning item on the
Scan/Open Image tab (Tools>Options).
To start scanning:
Click the 1–Scan button or select the Scan item in the File menu. The
Image window containing a scanned image of the page will appear in
ABBYY FineReader’s Main window.
To scan several pages one after another, click the arrow to the right of the 1–Scan button and
select the Scan Multiple Images item.
If scanning does not begin immediately, one of two dialogs will open:
● The scanner’s TWAIN–Source dialog. Check the scanning options and click
the OK button to start scanning.
● The Scanner Settings dialog. Check the scanning options and click the OK
button to start scanning.
Tip: To start recognition immediately after the source images are scanned, use the
Scan&Read or Scan&Read Multiple Images option:
Click the arrow at the right of the Scan&Read button and select either
Scan&Read or Scan&Read Multiple Images item in the local menu.
ABBYY FineReader will scan and read the images. The scanned image will appear in the
Image window and the recognition results will be displayed in the Text window of the main
window. From there, the text may be exported to an external application or saved in any of a
variety of formats.
Note: Scanning at 400 to 600 dpi resolution (instead of the default 300 dpi) or scanning in
grayscale or color (instead of black & white) mode takes more time. Some scanners
may take up to four times longer to scan at 600 dpi than at 300 dpi.
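The "up to four times" figure is consistent with simple arithmetic: doubling the resolution doubles the number of samples in both directions, so a 600 dpi scan holds four times as many pixels as a 300 dpi scan of the same page. The Python sketch below works this out for a page of 8.5 × 11 inches; the page size and the function names are assumptions chosen for the example, and actual scan times depend on the scanner hardware and driver.

```python
def pixel_count(width_in, height_in, dpi):
    """Number of pixels sampled when scanning a page of the given size (in inches)."""
    return round(width_in * dpi) * round(height_in * dpi)


def pixel_ratio(dpi, baseline_dpi=300):
    """How many times more pixels a scan at `dpi` holds than one at `baseline_dpi`."""
    return (dpi / baseline_dpi) ** 2


if __name__ == "__main__":
    # Assumed page size: 8.5 x 11 inches.
    for dpi in (300, 400, 600):
        print(dpi, pixel_count(8.5, 11, dpi), round(pixel_ratio(dpi), 2))
```

The same quadratic growth applies to the memory and disk space needed for the image, which is one reason to stay at the default resolution unless the source text is very small.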
● To scan images using the ABBYY FineReader TWAIN interface, select the
Scanner Settings item in the Tools menu. The Scanner Settings dialog will
open. Select the appropriate scanning options from the dialog.
● If you wish to scan your images using the TWAINSource interface, your
scanner’s TWAIN dialog will open automatically when you click the 1–Scan
button. Set the scanning parameters in the dialog. Scanning options may
have different names depending on the scanner model. For example, for
brightness the word “threshold”, a “sun” symbol or a black and white circle
may be used. Consult your scanner documentation for a full description of
available options.
If you see that the scanned image is compromised (characters are glued or torn), consult the
table below to find ways to improve image quality.
The procedure for scanning a large number of pages depends on whether your scanner has an
Automatic Document Feeder (ADF).
ADF Scanning:
1. If you are using the ABBYY FineReader interface, select the Use ADF
option in the Scanner Settings dialog (menu Tools>Scanner Settings) and
then select File>Scan Multiple Images to start scanning.
2. If you are using the TWAINSource interface, select the Use ADF option in
the TWAIN dialog of your scanner (remember that each scanner may name
this option differently; consult your scanner documentation for details) and
then select File>Scan Multiple Images to start scanning.
Non–ADF Scanning
● If you are using the ABBYY FineReader interface, select Scan Multiple
Images from the File menu.
If you are using a flatbed scanner without an ADF and the ABBYY FineReader
interface, there are two ways to increase its efficiency:
● Set a pause value (i.e. the time that will elapse between the scanning of one
page and the next). To do this, select the Pause between pages option and
then set the pause value (in seconds) in the Scanner Settings dialog
(Tools>Scanner Settings menu). The scanner will pause for the predefined
time before scanning the next page to allow you to place the next page onto
the scanner. After the pause, scanning continues automatically.
● Select the Stop between pages option in the Scanner Settings dialog
(Tools>Scanner Settings menu). Each time a page scan is completed, a dialog
will ask you if you wish to continue scanning. Click the Yes button to
continue scanning or No to end the process.
When you have finished scanning your pages, select the Stop Scanning item in the File menu.
Scan a page, insert the next page into the scanner and click the Scan button in the TWAIN
dialog of your scanner to continue scanning.
When all pages have been scanned, click the Close or other scanner–specific button in the
TWAIN dialog of your scanner.
Tip: To have greater control over the quality of your scanned images, select the Open image
during scanning option on the Scanning tab (Tools>Options). This command
opens each scanned page in the Image window immediately after it has been scanned.
Reject the scanned page and halt the scanning process by clicking on Stop Scanning in
the File menu. Next, rescan the image.
Opening Images
You can recognize image files without using a scanner (see the list of supported image formats
under “Supported Image Formats”).
To open an image:
● Click on the downward–pointing arrow to the right of the 1–Scan button and
select the Open Image item in the local menu. An Open caption will replace
the Scan caption on the button.
● Select the Open Image item from the File menu.
● In Windows Explorer, right–click the image file you want to open and select
Open with FineReader from the local menu. If ABBYY FineReader is running,
the image will be added to the current batch. Otherwise, the program will be
launched and the most recently used batch opened before the image is added.
● In Microsoft Outlook or Windows Explorer, click on the image file you want
to open and drag it onto the minimized ABBYY FineReader window. The
image will be added to the current batch and opened in the Image window.
Select one or several images in the Open dialog. The selected images will be displayed in the
Batch window, and the last selected image displayed in the Image and Zoom windows. All
selected images are copied into the batch folder. See “General Information on Working with
Batches” section for more information on batch organization and a description of how pages
are displayed within batches.
Tip: If you want the opened images to be recognized immediately, select the Open&Read
mode:
1. Select the Open&Read item from the Process menu or press
CTRL+SHIFT+D. The Open dialog will open.
2. Select the images for recognition in the Open dialog.
When the Hot Folder mode is enabled, the Scan&Read icon on the status bar will be replaced
with the Hot Folder button. If an error occurs, the button will change again.
Double–click this icon to view the error message.
This command splits each dual page into two batch pages. See “General Information on
Working with Batches” section for more information on batches.
Note: If a dual page has been split incorrectly, deselect the Split dual pages checkbox and
rescan the dual page, or add the page images to the batch again. Finally, try to split
the image manually using the Split Image dialog (Image>Split Image).
Note:
1. This process removes the split page from the batch and replaces it with
individual card images. For more detailed information, see the “General Information
on Working with Batches” section.
2. If the image has been split incorrectly, try to split the image manually by using
the Add vertical separator/Add horizontal separator button.
3. In order to delete all separators, click the Remove all separators button.
4. To move a separator, switch to Select separator mode (click the
corresponding button).
5. To delete a separator, switch to Select separator mode (click the
corresponding button) and move the separator outside of the image.
Page Numbering
A number is assigned to each scanned page. The default number is the number of the last
batch page plus one.
You may set page numbers manually if you want to retain the original page numbers in the
document or if you want to scan pages according to page number. To specify page numbers:
● Select the Ask for page number before adding page to the batch item
on the Scan/Open Image tab (Tools>Options menu).
2. Specify a number for the first scanned page in the Page number dialog, then
select Odd and even separately in the Page numbering field. Select an
order for the pages: ascending or descending to reflect the way in which the
double–sided pages have been entered into the automatic document feeder
(i.e. whether the last page or the first page has been placed on top).
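The numbering described above can be pictured with a small sketch. It assumes that, in Odd and even separately mode, consecutive scanned pages differ by two and that the order setting decides whether the numbers go up or down. The function name and parameters are hypothetical and do not correspond to any ABBYY FineReader interface.

```python
def page_numbers(first: int, count: int, order: str = "ascending") -> list[int]:
    """Numbers assigned to `count` scanned pages when odd and even pages
    are scanned in separate passes (assumed step of two between pages)."""
    if order not in ("ascending", "descending"):
        raise ValueError("order must be 'ascending' or 'descending'")
    step = 2 if order == "ascending" else -2
    return [first + i * step for i in range(count)]


# First pass: front sides of a four-sheet document, first page on top.
print(page_numbers(1, 4, "ascending"))
# Second pass: the stack is turned over, so the last back side is scanned first.
print(page_numbers(8, 4, "descending"))
```

Under these assumptions, the two passes together cover pages 1 to 8 without gaps, which is the situation the ascending/descending setting is meant to handle.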
Note: Despeckling may decrease recognition quality if the original document is very faint or
contains a light font. Very small characters, such as periods or commas, and parts of
very thin characters may disappear.
If you scan or open “dusty” images, select Despeckle image in the Image Preprocessing
group on the Scan/Open Image tab (Tools>Options menu) to despeckle the images prior to
adding them to the batch.
2. Invert image
Some scanners invert images (turning black into white and vice versa) during scanning.
You may wish to apply the Invert Image option to create a uniform or standard appearance
(e.g. a black font against a white background) among the documents. To do this:
● Select the Invert Image item from the Image menu.
Note: If you scan or open inverted images, select the Invert image item in the Image
Preprocessing group on the Scan/Open Image tab (Tools>Options menu) prior
to adding these images to the batch.
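Conceptually, inverting an image replaces every pixel value with its complement. The following sketch shows the idea on a tiny grayscale image stored as nested lists (0 = black, 255 = white). This representation and the function name are assumptions made for illustration, not a description of how ABBYY FineReader stores or processes images.

```python
def invert(image: list[list[int]], max_value: int = 255) -> list[list[int]]:
    """Return a copy of `image` with every pixel value complemented."""
    return [[max_value - pixel for pixel in row] for row in image]


# White strokes (255) on a black background (0), as an inverting scanner
# might deliver them.
scanned = [
    [0, 255, 0],
    [0, 255, 0],
]
print(invert(scanned))
```

Applying the operation twice returns the original image, so an accidental inversion can be reversed by inverting again.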
Recognition quality relies on the image having a standard orientation (the text should be read
from top to bottom and all lines should be horizontal). ABBYY FineReader automatically
detects page orientation during the recognition stage. If the program detects page orientation
incorrectly, clear Detect image orientation (during recognition) on the Scan/Open
Image tab and rotate the image manually. To do this:
● Click the corresponding toolbar button or select the Rotate Clockwise item from the
Image menu to rotate the image 90° clockwise.
● Click the corresponding toolbar button or select the Rotate Counter–Clockwise item
from the Image menu to rotate the image 90° counter–clockwise.
● Select the Rotate Upside Down item in the Image menu to rotate the image
180°.
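The three rotations can be illustrated on a small image stored as a list of pixel rows. This is only a sketch of the geometry involved; the function names and the nested-list representation are assumptions and are unrelated to ABBYY FineReader's internals.

```python
Image = list[list[int]]


def rotate_clockwise(image: Image) -> Image:
    """Rotate by 90 degrees clockwise: the bottom row becomes the left column."""
    return [list(row) for row in zip(*image[::-1])]


def rotate_counter_clockwise(image: Image) -> Image:
    """Rotate by 90 degrees counter-clockwise: the top row becomes the left column, reversed."""
    return [list(row) for row in zip(*image)][::-1]


def rotate_upside_down(image: Image) -> Image:
    """Rotate by 180 degrees: reverse the row order and each row."""
    return [row[::-1] for row in image[::-1]]


page = [
    [1, 2],
    [3, 4],
]
print(rotate_clockwise(page))
print(rotate_counter_clockwise(page))
print(rotate_upside_down(page))
```

Two successive 90° rotations in the same direction give the same result as the upside-down rotation.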
7. Print image
You can print the image in the Image window, the pages selected in the Batch window, or all
batch page images. To do this:
● Select Print image from the File menu. The Print dialog will open. Set the
desired printing parameters (the printer to be used, number of pages to be
printed, the number of copies etc.).
8. Undo the previous action
● Click the Undo button on the Standard bar.
Tip: To undo the Undo action, click the Redo button on the Standard bar.
Select Convert color and gray images to black and white to scan images in grayscale
using the TWAINSource interface. The scanned images will not retain color pictures or
colored fonts or backgrounds. This option reduces the amount of disk space needed to store
scanned images.
Chapter 5
Page Layout Analysis
In this chapter you will learn more about: when manual page
analysis is necessary; what block types are available; how to edit
blocks drawn using automatic layout analysis; and how to
streamline the layout analysis with block templates.
Chapter Contents:
● General information on page layout analysis
● Block types
● Automatic page layout analysis options
● Drawing and editing blocks manually
● Manual table layout analysis
● Using block templates
Note: Standalone page layout analysis is also available (Process>Analyze Layout menu).
It may be needed at times, but it often gives inferior results, since coupled
layout analysis/recognition uses information acquired during recognition to
improve layout analysis.
You may opt to draw blocks manually if:
1. Only part of a page is to be recognized;
2. Automatic layout analysis drew blocks incorrectly.
Tip:
● In some cases, the quality of the automatic layout analysis can be improved by
altering the page layout analysis options. To view the current layout analysis
options, go to the Recognition tab (Tools>Options menu).
● If the application has drawn some blocks incorrectly, it is often faster to edit
the incorrect blocks with the block editing tools than to delete the blocks and
draw them again manually.
Block Types
Blocks are image areas enclosed in frames. Blocks tell the system which image areas should be
recognized and in what order. The blocks also influence how the original page layout is
retained. The differently colored frames indicate different types of blocks. The frame colors of
the blocks can be changed on the View tab of the Options dialog (Tools>Options menu) in
the Appearance group. Select the required block type in the Item field and the desired color
in the Color field.
Recognition Area – this is used for automatic recognition and analysis. After the 2–Read
button is clicked, all blocks of this type will be automatically analyzed and recognized.
Text – this is used for text image areas and should only contain single–column text. If there
are pictures within the text, draw separate blocks around them.
Table – this is used for table image areas or for areas of text that are structured in a table.
When the application reads this type of block, it draws vertical and horizontal separators
inside the block to form a table. This block is represented as a table in the output text. You can
draw and edit tables manually.
Picture – this is used for image areas that contain pictures. This type of block may enclose an
actual picture or any other object that should be displayed as a picture (e.g. a section of text).
Barcode – this is used for barcode image areas. If your document contains a barcode that
should be displayed as a series of numbers and letters rather than as a picture, draw a separate
block for the barcode and set the block type to barcode.
Note: It is possible to have barcode analysis and recognition carried out automatically, but
this is not a default option. To enable it, select the Look for barcodes item on the
Recognition tab (Tools>Options menu).
To start automatic layout analysis (and text recognition), click the 2–Read button. Before
clicking this button, however, select the main layout analysis options: document type and
table analysis options.
Document type
Single column – The text is formatted into one column. Use this option if automatic
page layout analysis incorrectly determines the text type as multi–column.
Plain text formatted with spaces – The text is formatted into a single column and set
in a uniformly sized, monospaced font. In the recognized text, left indents are
represented by spaces, each line is separated into a separate paragraph, and empty
lines separate original paragraphs. This approach is useful, for example, when
recognizing C++ code printouts or old computer printouts.
Usually, the application divides tables into rows and columns automatically. If additional
tuning of table options is needed, open the Recognition tab (Tools>Options) and in the Tables
group select the desired item. Change these options if:
● Automatic page layout analysis has drawn the table rows and columns
incorrectly;
● The document contains a large number of simple tables of the same type (i.e.
there are no merged cells or there is always only one line of text per cell).
1. Use the One line of text per cell option if your table has no (or minimal) black separators
and each cell has only a single line of text. For example:
    Kilometers        Miles
    1                 0.62
    5                 3.2
This table has only one line of text per cell.

    Physical          t, degrees
    phenomenon        centigrade
    Water boiling     100
    point
    Water freezing    0
    point
This table has more than one line of text per cell.
2. Use the No merged cells in table option if your table has no merged cells in it. For example:
100 373
Note: Do not select One line of text per cell and/or No merged cells in table if the text
contains tables with differing structures. Selecting these options may result in errors during
layout analysis and may adversely affect recognition quality.
You may change the drawn block type to any of the following: Recognition Area, Text, Table,
Picture, or Barcode. To change a block type:
● Right–click the block and select the Block Type item followed by the
corresponding block type in the local menu.
Modifying blocks
Note: If you click a block corner, you can move both the horizontal and vertical borders
of the block simultaneously.
Note:
1. You can alter block borders by adding new nodes (splitting points). Use the
mouse to move split border segments in any direction.
To add a new node, press Shift, place the mouse pointer to where you want a
new node (the pointer will become a cross) and click on the border. A new
node will be created.
2. ABBYY FineReader imposes certain limitations on block form. To be successfully
recognized, text lines within blocks must be unbroken. To enforce these
requirements, ABBYY FineReader automatically corrects block borders as parts
are added or deleted. For example, if you delete a portion from the top or
bottom of a block, a whole block corner will automatically be cut. Similarly, if
you try to cut off a part from between the two upper or lower corners, the
application will cut the right block corner (upper or lower) as well. The program
will also forbid operations that involve moving the segments that form
the block borders.
● Select the tool and click on the desired block or press the left mouse button
and draw a rectangle around all the blocks you want to select.
Note: You can also select one or more blocks using the block drawing tools. To select
several blocks at once, hold down SHIFT or CTRL while one of these tools is activated and
drag the pointer over the blocks you want to select. To invert the selection (i.e. to
select an unselected block or vice versa), hold down the CTRL key while one of the tools
is activated and drag the pointer over the desired blocks.
To move blocks:
● Hold down ALT with one of the block drawing tools activated and move the blocks.
To renumber blocks:
1. Select the tool.
2. Click the blocks in the desired order. The contents of blocks will be displayed
in the output text in the same order.
Note: If you delete blocks on an image that has already been recognized, the recognized text
in the Text window will also be deleted.
To delete a block:
● Select the tool and click the block you wish to delete, or
● Select the blocks you wish to delete and press DEL on the keyboard.
Note: If you delete a previously recognized block, its associated text in the Text window will
be deleted as well.
Tip: If automatic table layout analysis has incorrectly drawn table rows and columns, editing
the automatic analysis results instead of deleting all the blocks and redrawing them
manually is usually more efficient.
– Remove a separator
If the table cell only contains a picture, select Treat cell as picture in the Block Properties
dialog (View>Properties menu). If the table cell contains both text and pictures, draw a
separate picture block (or blocks) inside the cell.
Note: You can split previously merged cells using the Split Table Cells command (Edit
menu). The Merge Table Rows option does not affect the division of the table into
columns.
Note: To avoid drawing horizontal and vertical separators manually, draw a separate table
block, then right–click on it. Select Analyze Table Structure in the local menu. The
system will then draw all the necessary separators. Should the system draw any
separators incorrectly, you can edit the table manually.
Note: Documents should always be scanned using their respective template(s) and using the
resolution that was used to create the template(s).
Chapter 6
Recognition
The goal of OCR is to read a text from a source image and retain
the source page layout. To succeed, however, the main recognition
parameters (recognition language, font type of the source
text, and document type) must be identified. This chapter deals
with these parameters and other important recognition issues,
including the use of different recognition settings.
Chapter Contents:
● General information on recognition
● Recognition languages
● Source text print type
● Other recognition options
● Background recognition
● Recognition with training
● How to train a user pattern
● How to edit a user pattern
● User languages and language groups
● How to create a new language
● How to create a new language group
You may:
Read – To recognize the open page or all the pages selected in the Batch window;
By default, the 2–Read button recognizes the open image. To change the
mode of the 2–Read button, click the small downward arrow to the right of
the button and select the mode of your choice in the local menu.
Note: When you perform OCR on a page that has already been recognized, only new or
modified blocks will be recognized.
Recognition Languages
ABBYY FineReader recognizes documents written in one or several languages. When
recognizing documents in English or in German, you may also use the corresponding
specialized medical or legal dictionaries in addition to the general–purpose ABBYY
FineReader dictionaries.
To set the text recognition language, select it in the drop–down list on the Standard toolbar.
Note:
1. You may choose to create a language group that includes the language
combinations that you use regularly.
2. Increasing the number of the recognition languages used simultaneously may
reduce recognition quality. For best results, limit the number of languages to
two or three.
3. Before recognizing a document, check that the fonts selected on the
Formatting tab support all the characters contained in the recognition
language(s) chosen. Unsupported characters will be displayed incorrectly (“?” or
“_” symbols will appear instead of letters). See “Fonts for Recognition
Languages that may be Displayed in Text Editor Incorrectly” in ABBYY
FineReader Help for more information.
the language list on the Standard toolbar. The Recognition Language dialog
will open. Select the desired language.
3. The language was disabled during custom installation.
Tip: You may specify the recognition language down to the block level. To do this, right–click
the desired block and select Properties in the local menu. The Properties dialog will
open. Select the Block tab in the dialog and then select the block recognition language
in the Languages field on the tab.
Select an alternate print type to increase recognition quality of dot matrix printouts done in
draft mode or typewritten texts:
● Select the Typewriter item if you wish to recognize typewritten texts.
● Select the Dot Matrix Printer item if you wish to recognize dot matrix
printouts.
● Select the print type of your choice on the Recognition tab in the Options
dialog (Tools>Options menu).
Note: Once you have completed recognition of typewritten texts or dot matrix printouts,
re–enable the Autodetect item to recognize normal texts again.
When processing large numbers of pages, recognition is invariably faster if the processed
image is not displayed on–screen. To run recognition without displaying the image:
● Clear the Show image during recognition item on the General tab
(Tools>Options menu).
Text direction
If the application recognizes blocks containing vertical text incorrectly (a text block or a
table cell):
● Right–click the block with vertical text and select the Properties item in the
local menu. The Block properties dialog will open. Select the relevant item
in the Text direction list in the dialog and re–recognize the image.
Inverted or flipped block
If the application recognizes blocks containing inverted or flipped text incorrectly (a text
block, a table cell, or a whole table):
● Right–click on the appropriate block and select Properties in the local menu.
The Block properties dialog will open. Select the Inverted or Flipped item
in the dialog and re–recognize the image.
Background Recognition
If you wish to simultaneously edit previously recognized pages and run recognition, you may
find background recognition mode useful. To start background recognition:
● Select the Start Background Recognition item in the Process menu.
The sign will appear in the status line at the bottom of FineReader’s main
window. If the Details view mode is active in the Batch window (to activate
Details view mode, right–click on the Batch window and select
View>Details in the local menu), the page currently being recognized will
display the icon in the Opened by column.
Note: The background recognition mode uses the recognition options that were active
when the process started.
Tip: Use Train User Pattern mode only if one of the above applies. In other cases, the time
and effort required will outweigh the slight increase in recognition quality.
Pattern training creates a pattern based on one or two pages recognized in training mode.
ABBYY FineReader uses this pattern to improve recognition of the remaining text.
Sometimes two or three characters may get “glued” together, and ABBYY FineReader may be
unable to enclose each character in an individual frame to separate them. If you cannot move
the frame so that it contains only one whole character, you can train ABBYY FineReader to
recognize the entire character combination. Examples of character combinations (or ligatures)
that are frequently found glued together include ff, fi, and fl.
Note:
1. A pattern is only useful when a document has the same font, font size, and
resolution as the document used to create the user pattern.
2. Each pattern is created specifically for a particular batch. Consequently, it is
deleted if the associated batch is deleted. Patterns can, however, be copied
into other batches. To transfer a user pattern to another batch, simply save the
batch options in a batch template format file.
3. If you switch to recognizing texts set in a different font, always disable any
user patterns – choose Do not use user pattern on the Recognition tab,
menu Tools>Options.
Note:
1. To create several patterns for the same batch, use the Pattern Editor dialog
(click the Pattern Editor button on the Recognition tab or select the
Tools>Pattern Editor menu item). Create a new pattern by clicking on the
New button in the dialog and select it by clicking on the Set Active button.
Working with a created pattern is no different than working with a default
pattern (see steps 1–5). Keep in mind, however, that only one pattern may be
active at a time.
2. If you’ve created several patterns for the same batch, the last created pattern
will be active. The status bar displays the active pattern. To activate another
pattern, select the desired pattern from the pattern list in the Pattern Editor
dialog (Tools>Pattern Editor menu) and click Set Active. Next, click on
Use user pattern on the Recognition tab, Tools>Options menu, in the
Training group.
3. If the Use built–in patterns option is set, ABBYY FineReader will recognize
all texts using its built–in patterns and stop only at uncertain characters. If you
are training the system to read decorative and/or non–standard fonts (for
example, Tibetan), the use of built–in patterns may result in characters being
read incorrectly. To avoid this problem, disable the use of built–in patterns
(clear the Use built–in patterns checkbox on the Recognition tab) and
train the system to recognize each unknown character.
Note:
1. The system can only be trained to recognize characters that are in the alphabet
of the language being recognized. If the keyboard does not contain the
character that you want to train, denote the character by combining two characters
or copy the required character from the Character Table. To access
this table, click the corresponding button in the Pattern Training dialog.
2. The system can be trained to retain character formatting. Select the
corresponding Italic or Bold item in the Pattern Training dialog and then click
the Train button.
3. Training is case sensitive. During training, make sure that you use upper or
lower case characters as appropriate.
Correct mistakes made during training by clicking the Back button to return the frame to its
previous position. The last “image–character” pair to be entered will automatically be removed
from the pattern. The “undo” function is limited to the last word trained.
Training to recognize ligatures
Ligatures, which are combinations of two or three “glued” characters (such as fi, fl, ffi, etc.),
are difficult to separate because they are “glued” as part of the printing process. Better results
can be obtained by training the software to recognize the compound characters as a single
unique character.
Train the same way that you would train separate characters:
1. Type the necessary character combination and click the Train button.
2. The frame in the top dialog window should enclose the entire ligature. You
can move the frame border using the mouse or by clicking the corresponding
buttons.
Each pattern may contain up to 1000 new characters. Limit the number of ligatures that you
train, since these characters may lower recognition quality.
When you train ABBYY FineReader, please remember:
1. ABBYY FineReader doesn’t differentiate between certain characters that look
different to the human eye. Such characters are grouped together and recognized
as a single character. For example, the straight ('), right (’) and left (‘)
apostrophes are all identified as a single character – the straight
apostrophe. The right and left apostrophes will never appear in recognized text
even if you try to train them.
2. The recognition of some characters depends on their environment.
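The first point above amounts to a many-to-one mapping of look-alike characters. The sketch below illustrates it for the apostrophes only; the mapping table and function name are assumptions for illustration, and the guide does not list the full set of characters that ABBYY FineReader groups together.

```python
# Assumed mapping: both typographic apostrophes are folded into the straight one.
APOSTROPHES = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
})


def normalize_apostrophes(text: str) -> str:
    """Replace left and right apostrophes with the straight apostrophe."""
    return text.translate(APOSTROPHES)


print(normalize_apostrophes("the user\u2019s \u2018pattern\u2019"))
```

Text that has passed through such a mapping can no longer contain the left or right apostrophe, which is why training those characters has no effect.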
● Supermarket product lists that contain only product codes. These codes are a
combination of letters and numbers. Create a new language that consists of
only the necessary characters and use it to recognize these documents.
● Documents that are made up of only capitalized text. You may increase the
recognition quality in these documents by creating a language that prohibits
all lowercase letters.
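The effect of a language restricted to a small alphabet can be sketched with a regular expression. The pattern below, which accepts only capital letters and digits, is a hypothetical product-code alphabet; ABBYY FineReader's own regular-expression syntax is described in its Help and may differ from Python's.

```python
import re

# Hypothetical alphabet for product codes: capital letters and digits only.
PRODUCT_CODE = re.compile(r"[A-Z0-9]+")


def fits_language(word: str) -> bool:
    """Return True if every character of `word` belongs to the restricted alphabet."""
    return PRODUCT_CODE.fullmatch(word) is not None


for candidate in ("AB12C", "ab12c", "A-17"):
    print(candidate, fits_language(candidate))
```

With fewer admissible characters there are fewer look-alike candidates to choose from (for example, a lowercase "l" cannot be confused with "1" if lowercase letters are prohibited).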
Create a language group for often–used language combinations. To create a new language or a
language group, open the Language Editor dialog (Tools menu, Language Editor item).
Set the following language parameters for the new language in the Simple
Language Properties dialog:
Note: The spelling checker will consider capitalization of words in the user dictionary to be
correct if they are found in the text with any of the following capitalizations:
dictionary–set capitalization; lowercase only; uppercase only; sentence case capitalization
(first letter capitalized, remaining letters lowercase). Examples include:
● Regular expression (used to specify the grammatical rules of the new language;
see the Regular Expressions section in ABBYY FineReader Help for
details).
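The capitalization rule for user dictionary words stated in the note above can be expressed as a short check. This is a sketch of the rule as worded in this guide, not the spelling checker's actual implementation; the function names and the sample word are hypothetical.

```python
def accepted_capitalizations(dictionary_word: str) -> set[str]:
    """Capitalizations treated as correct for a word stored in the user dictionary."""
    return {
        dictionary_word,               # capitalization as set in the dictionary
        dictionary_word.lower(),       # lowercase only
        dictionary_word.upper(),       # uppercase only
        dictionary_word.capitalize(),  # first letter capitalized, the rest lowercase
    }


def is_capitalization_accepted(dictionary_word: str, found_in_text: str) -> bool:
    """Return True if `found_in_text` is an accepted capitalization of the dictionary word."""
    return found_in_text in accepted_capitalizations(dictionary_word)


# Hypothetical dictionary entry with mixed capitalization.
for variant in ("FineReader", "finereader", "FINEREADER", "Finereader", "fineReader"):
    print(variant, is_capitalization_accepted("FineReader", variant))
```

Any other mix of upper and lower case, such as the last variant in the loop, falls outside the four accepted forms.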
Note:
1. Click on the Advanced button in the Simple Language Properties dialog to set
advanced properties for the new language (e.g. characters to be ignored,
prohibited characters, etc.).
2. By default, new user languages are saved into the batch folder. Note that
ABBYY FineReader Corporate Edition allows you to specify a folder where the
language should be saved. For more information on group work with user
languages and dictionaries, see “Group work with the same user languages and
user dictionaries” in ABBYY FineReader Help.
Note: Customize the recognition languages that are shown on the language list of the
Standard toolbar by selecting Select multiple languages in the list. The
Recognition Language dialog will open. Select the desired languages in the dialog.
1. Select Language Editor in the Tools menu and click the New button. A dialog
will open. Select the Create a new group of languages item in the dialog.
2. The Language Group Properties dialog will open.
Set the following new language group parameters (all parameters are set
in the Language Group Properties dialog):
1. Group name.
2. Languages contained in the group.
Note:
1. If you know that your text will not contain certain characters, you may wish
to prohibit the characters in the relevant language group’s properties. Limiting
the characters to be recognized will increase recognition speed and quality. To
prohibit characters, click the Advanced button in the Language Group
Properties dialog. The Advanced Language Group Properties dialog will
open. Specify the set of prohibited characters in the Prohibited characters
line.
2. By default, the newly created user language group will be saved in the batch
folder. In ABBYY FineReader Corporate Edition, you may specify the destination
folder. For more information on group work with user languages and
dictionaries, see “Group work with the same user languages and user dictionaries”
in ABBYY FineReader Help.
Chapter 7
Checking and Editing Text
Chapter Contents:
● Checking text in ABBYY FineReader
● Check and edit text options
● Adding and deleting words to/from the user dictionary
● Editing text in ABBYY FineReader
● Editing tables
3. The Check Spelling dialog contains three windows. The top window is similar
to the ABBYY FineReader Zoom window and displays the original image of
the word. The middle window displays the actual word, while the line above it
identifies the error type. The Suggestions window at the bottom provides
replacement suggestions (if any). Note that suggestions are based on the
dictionary selected in the Dictionary language drop-down list; any language may
be chosen from this list.
Note: You can enlarge the Check Spelling dialog to make it easier to check and
edit text. Simply click the dialog border; the mouse pointer will become a
double-headed arrow. Drag the border to make the dialog larger or smaller.
Note: When you click the Ignore or Ignore All button, the “uncertain” flag is
removed from the word: the system assumes that the word no longer
contains unrecognized or uncertain characters and removes the highlighting.
As a result, when you export such words in PDF format and select the
Replace uncertain words with images mode, the software will not
replace them with images.
● Select a replacement suggestion and then click the Replace or Replace All
button to replace the current word or all words in the text. If none of the
options in the Suggestions window are correct, you can enter one yourself in
the middle window. (Important: When you switch to edit mode, certain buttons
may change function and adopt new captions.) Click the Confirm
(Confirm All) button to change the current word (or all such words) in the
text and move to the next uncertainly recognized word.
● Click Add... to add the word to the dictionary. Once a word is added, all
subsequent occurrences of this word will be recognized.
● Click Options... to set the spell check options.
● Click Close to close the dialog window.
Moving between uncertain words
To check the recognition results quickly, use the corresponding buttons to move
to the next or previous uncertain word.
You can also use the F4 (SHIFT+F4) hotkey to navigate between uncertain words.
Note: The number of errors displayed in the Text window will change after you
re–recognize the page.
This option stops the spell checker at words with uncertain characters.
Stop at words not found in the dictionary
This option stops the spelling checker at words not found in the dictionary. If a word is not
found in the dictionary, it may have been incorrectly recognized.
Stop at compound words
This option stops the spelling checker at words that are not in the dictionary but can be
formed either according to the available morphology models or from words in the dictionary.
Ignore words with digits and other non–alphabetic characters
This option causes the spell checker to treat all words containing digits and other characters
not included in the recognition language as correct, unless they contain uncertain characters.
A distinctive feature of ABBYY FineReader’s spell checker is that, in addition to adding each
word in its original form, the program also adds its paradigm (i.e. the set of all of its forms). This
feature allows ABBYY FineReader to recognize an entered word in all of its forms.
To add a word to the dictionary during the spell check:
● Click Add in the Check Spelling dialog.
Click OK. The Create Paradigm dialog will open. ABBYY FineReader will query you about
the word forms in order to construct a paradigm for the new word. Select Yes or No to
answer these questions. If you make a mistake, click Anew to have ABBYY FineReader ask the
question again. The Paradigm dialog will display the constructed paradigm.
Note:
1. If you want to add uninflected words rather than creating a paradigm, select
the Add without prompting for word forms option (English dictionary
only) on the Check Spelling tab (Tools>Options menu).
2. You may also add words when you view the list of added words. Simply select
View Dictionaries in the Tools menu. The Select Dictionary dialog will
open. Select the desired language in the Select Language dialog and click
View. The dictionary with the list of the added words will open. Add words by
clicking on the Add button.
The program will notify you if you try to add a word that already exists in the dictionary. You
may view its paradigm and construct a new one if you think the existing paradigm is incorrect
(as with homonyms, for example). Click the Add button in the Add Word dialog.
Tip:
1. ABBYY FineReader allows you to import user dictionaries created by previous
versions (3.0, 4.0 and 5.0).
2. ABBYY FineReader also allows you to import user dictionaries (*.dic) created
using Microsoft Word 6.0, 7.0, 97, and 2000.
To import a dictionary:
1. Select View Dictionaries from the Tools menu, then select the dictionary
language, and click View.
2. Click Import in the opened dialog and select files with *.pmd, *.txt or *.dic
extensions.
To delete a word from the dictionary:
1. Select View Dictionaries from the Tools menu. Select the desired language
and click OK. A dialog will open.
2. Select the word you want to delete and click Delete.
Note: If the ABBYY FineReader Text window displays characters incorrectly (i.e. “?” replaces
some or all of the letters), your current font does not fully support the alphabet of your
recognition language. To correct the problem, select a font that supports your entire
recognition set (for example, Arial Unicode or Bitstream Cyberbit) on the Formatting
tab (Tools>Options menu) in the Fonts group, and recognize the document again.
See “Fonts for Recognition Languages that May be Displayed in Text Editor
Incorrectly” in ABBYY FineReader Help.
After recognition, the page text is displayed in the Text window. When you send your text to
an external application, the layout retention options determine how the text layout is retained.
Set these options on the Formatting tab (Tools>Options menu) and in the format dialogs.
The program automatically highlights uncertainly recognized characters. To disable this
feature, clear Highlight uncertain characters on the View tab (Tools>Options menu).
ABBYY FineReader editor features two document viewing modes: full mode (the full layout is
displayed) and draft mode.
In full mode, blocks with recognized text, tables and pictures are displayed exactly as they
appear on the original image. The complete original layout is retained: columns, tables,
pictures and dropped capitals (oversized letters that are several line widths high). The block
where the pointer is located is the active block. If the pointer is moved using the arrow keys,
the order of navigation between blocks is determined by their numbering on the original
image. If editing makes the amount of text inside a particular block too large to be contained
in the block, parts of other inactive blocks may become invisible. The borders of these block(s)
will be displayed with red markers. When a block is active, its borders are enlarged to display
all of the text.
Draft mode displays the following text features: left indent; paragraph alignment (all
paragraphs are aligned to the left); and text and background color. Draft mode uses a
predetermined font size (12pt by default) throughout to display text. Effects (bold, italic,
underlined, superscript and subscript) are retained.
Switch between draft and full modes by clicking the full mode or draft mode
button in the Text window.
2. The Search dialog will open. Type the word or phrase you wish to find in the
Find what line of the dialog and set the search parameters.
To search and replace a word or phrase in the text you are editing:
1. Do one of the following:
● Select Replace in the Edit menu, or
● Press CTRL+H
2. The Replace dialog will open. Type the word or the phrase you want to find
in the Find what line of the dialog, type the word or phrase that is to replace
the search pattern in the Replace with line, and set the search parameters.
Font effects
1. Click on the desired word or highlight the appropriate text.
2. Do one of the following:
● Click the appropriate font-effect button on the Formatting
bar, or
● Right-click the Text window and select Character Properties in the
local menu. The Character dialog will open. Select the desired font and
set the required font parameters in the dialog, or
● Press CTRL+B for boldface, CTRL+I for italics, or CTRL+U for underlining.
Note: The program allows you to specify the following parameters in the Font
dialog: character spacing, character scale, and use of lowercase capitals.
These formatting changes will only be visible once you export your
document to an application that supports formatting (such as MS Word) and
cannot be seen in ABBYY FineReader’s built-in text editor.
Text alignment
Editing Tables
The table editor provides you with tools to carry out the following:
● Merge cell or row contents
● Split cell contents
● Split row/column contents
● Delete cell contents
To merge cell or row contents:
● Hold down the CTRL key and select the cells or rows you wish to merge, and
then press Merge Table Cells or Merge Table Rows in the Edit menu.
To split cell contents:
● Select Split Table Cells in the Edit menu.
● Select the corresponding tool on the toolbar in the Image window, then click
the row/column you wish to split or add a new horizontal/vertical separator to.
Tip: You can merge row contents by using the corresponding tool or the Merge Table
Rows command (Edit menu).
● Select the cell(s) you wish to delete in the Text window and press DEL.
Chapter 8
Saving into External Applications
and Formats
Chapter Contents:
● General information on saving recognized text
● Text saving options
● Saving the recognized text in RTF, DOC and Word XML formats
● Saving the recognized text in PDF format
● Saving the recognized text in HTML format
● Saving the recognized text in PPT format
● Saving the page image
Note: To save specific pages only, select them prior to clicking the 4–Save button.
After the export is finished, the 4–Save button icon will change to reflect the previous action
(sending the recognized text to an application, sending it by email, copying it to the
Clipboard or saving it to a file). The last export mode becomes the default for the 4–Save
button, so that clicking the icon will use that save option without going to the button's local
menu.
Note: Some additional options may become available depending on the chosen export
format. In the case of the RTF/DOC/Word XML formats, you can set the default page
size and highlight uncertain characters. In the HTML format, you can set the picture
resolution and code page. To set these options, go to the Formats Settings dialog
(Tools>Formats Settings menu). Since the dialog has a separate tab for each format,
just click on the desired format tab and set the options.
Retain pictures
If you choose this option, pictures will be saved together with recognized text. The option is
only available for the RTF/DOC/Word XML, PPT and HTML formats.
Image resolution
(RTF/DOC/Word XML, PDF, PPT and HTML formats)
Sometimes you may want to reduce the image resolution that you are using. For example,
HTML files are normally viewed using browsers, and high-resolution files, due to their size, are
usually unwelcome on the Internet. To reduce image resolution (and, consequently, HTML file
size) without lowering image quality, enter a lower resolution value in the Reduce picture
resolution to field on the Formats>RTF/DOC/Word XML (PDF, PPT, HTML) tab.
Note: Entering a value in the Reduce picture resolution to field that is higher than the
source resolution will cause the value to be ignored. Instead, the pictures will
be saved using the source resolution.
JPEG quality
(saving in RTF, DOC, Word XML, PDF, PPT and HTML)
When saving text in PDF, PPT and HTML formats, JPEG format is used.
When saving results in RTF, DOC, Word XML formats, you can specify that the image be saved
in the JPEG format.
The JPEG format uses a "quality loss" algorithm to compress the image (i.e. the compressing
technology averages groups of pixels and saves the entire region as a single number rather
than assigning numbers to each pixel). The quality of the image will be determined by the
value specified in the JPEG quality field (Tools>Formats Settings, PDF, RTF/DOC/Word
XML, PPT and HTML tabs). Specify a value from 1 to 100 (the default value is 50, the
average value).
A higher value will result in a higher quality saved image. This value also affects image size:
higher values result in the creation of a larger JPEG file. To achieve the most favorable
size/quality ratio, save the image using different JPEG values, and open it in an image viewing
application. The JPEG quality value is set on the Formats Settings>PDF (PPT,
RTF/DOC/Word XML, HTML) tab.
The fonts specified on the Formatting tab are used as the default when saving in
RTF/DOC/Word XML, PPT or HTML formats. You can specify which fonts are used. To change
fonts, go to the Text window or select other fonts on the Formatting tab in the Fonts group.
Save all batch pages or selected ones only
You may choose to save all of the pages in a batch or only selected ones. To save specific pages,
select them before saving.
Recognized text saving modes (when saving several batch
pages at a time)
● Create a separate file for each page – The program saves each batch page
as a separate file. The batch page number is automatically appended to the file
name.
● Name files as source images – This option saves each page in a separate
file and retains the name of the original image.
Note:
1. Pages that are unrelated to the original image (e.g. scanned pages) will not be
saved in this mode. A warning will be displayed when this type of page is
encountered.
2. If consecutive batch pages share the same original image or if all
images have the same name, the program will treat the pages as a multipage
TIFF and save the text into a single file. If several pages have identical
names but are not in consecutive order, the pages will be treated as individual
image files, and the text will be saved in different files, with an index appended
to their file names (_1, _2, etc.).
● Create a new file at each blank page – This option treats the entire batch
as a set of page groups that contain a blank page at the end of each group.
Pages from different groups are saved into different files with file names
consisting of the user-specified name and index number: 1, 2, 3, etc.
● Create a single file for all pages – All (or all selected) batch pages are
saved as a single file.
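The rules behind the Name files as source images and Create a new file at each blank page
modes can be sketched in Python as follows. This is only an illustration of the behavior
described above, not ABBYY FineReader's own code. The function names, the .txt extension,
the decision to leave blank pages out of the output, and the way the index is attached to
the file name are all assumptions made for the example.

```python
from collections import Counter
from itertools import groupby


def name_as_source_images(page_sources, extension=".txt"):
    """Return (file name, page count) pairs for pages listed in batch order.

    `page_sources` holds the source image name of each page. Consecutive
    pages with the same name share one file; a name that comes back later
    gets an index suffix (_1, _2, ...) on every file (assumed numbering).
    """
    runs = [(name, len(list(group))) for name, group in groupby(page_sources)]
    repeats = Counter(name for name, _ in runs)
    seen = Counter()
    result = []
    for name, page_count in runs:
        if repeats[name] > 1:
            seen[name] += 1
            file_name = f"{name}_{seen[name]}{extension}"
        else:
            file_name = f"{name}{extension}"
        result.append((file_name, page_count))
    return result


def split_at_blank_pages(pages, base_name, extension=".txt"):
    """Group pages into files; a blank page closes the current group.

    `pages` is a list of (page_number, is_blank) tuples in batch order.
    Blank pages themselves are not written anywhere (assumption).
    """
    files = {}
    group = []
    index = 1
    for number, is_blank in pages:
        if is_blank:
            if group:
                files[f"{base_name}{index}{extension}"] = group
                index += 1
                group = []
            continue
        group.append(number)
    if group:  # pages that follow the last blank page
        files[f"{base_name}{index}{extension}"] = group
    return files
```

For instance, a batch whose pages come from the images a, a, b, a would yield three files in
the first mode, and a five-page batch with blank pages 3 and 5 would yield two files in the
second mode.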
Layout retention modes are set on the Formatting tab in the Options dialog (Tools>Options
menu).
Note: When you save the text in RTF, DOC and Word XML formats, the program uses the
fonts that are specified on the Formatting tab in the Options dialog
(Tools>Options menu) or those you set during text editing in the Text window.
Tips:
● If you edit the recognized text in Microsoft Word rather than in the
FineReader text window, uncertain characters can still be highlighted. Select
the With background color and/or the With text color items on the
RTF/DOC/Word XML tab in the Highlight uncertain characters group.
All uncertain characters will be highlighted with the specified color.
● When saving results in Word XML, the recognized image can be viewed in the
Zoom window integrated into MS Word. This window presents the magnified
image of the current line or portion of the document. By default, the Open
FineReader's Zoom window in MS Word 2003 option is checked on the
RTF/DOC/Word XML tab.
1. Text and pictures only – This option saves only the recognized text and the
associated pictures.
2. Page image only – This option saves only the image.
3. Text over the page image – This option saves the entire image as a picture
and saves text areas over the picture.
4. Text under the page image – This option saves the entire image as a
picture and puts recognized text under it. This option is useful if you export your
text to document archives: the full-page layout is retained and full-text
search is available in this mode.
1. Select Formats Settings in the Tools menu. The Formats Settings dialog
will open.
2. Set the options you need on the PDF tab in the dialog.
Note:
1. A special Replace uncertain words with images option is available if you
use the Text and pictures only or the Text over the page image mode.
This option replaces all uncertain words with their images. Set this option on
the PDF tab in the Formats Settings dialog.
2. When you save texts that use a non-Latin code page (such as Cyrillic, Greek,
Czech, etc.), ABBYY FineReader uses the fonts provided by ParaType
(www.paratype.com/shop).
3. If, during export to PDF, a message indicates that nonstandard fonts are
present in the text, you must select a mode of working with Type 1 fonts and
the Type 1 fonts themselves. These fonts must be available via Adobe
Type Manager or via the PostScript font installer (in Windows 2000). For more
details about Type 1 fonts, see the "Using Type 1 fonts during export to PDF" section
of ABBYY FineReader Help.
4. Before you can edit PDF files that use a non-Latin code page (such as Cyrillic,
Greek, Czech, etc.) in Adobe Acrobat, you must change the font of the current
text section to a font installed on your computer.
Note: When you save text in HTML format, set the fonts to be used on the Formatting tab in the
Options dialog (Tools>Options menu) or in the Text window during text editing.
● Set the Keep pictures option on the Formatting tab in the Options dialog
(Tools>Options menu).
Note: Each picture is saved into a separate *.jpg file. Set the resolution and quality of
the images on the HTML tab of the Formats Settings dialog (Tools>Formats Settings).
1. Full (uses CSS and requires Internet Explorer 4.0 or later) – Uses the
latest HTML format (HTML 4). HTML 4 supports all document layout retention
types (the actual retention type used depends on the options set on the
Formatting tab in the Retain layout group). The built-in style sheet is used.
2. Simple (compatible with all Internet browsers) – Uses the HTML 3
format. The approximate document layout is retained (i.e. the program retains
the approximate font size but not the first line indent). HTML 3 format
supports only a limited number of font sizes, so ABBYY FineReader chooses the
HTML 3 format font size that corresponds most closely to the actual font size
of the text. HTML 3 is supported by all browsers (Netscape Navigator, Internet
Explorer 3.0 and later).
3. Auto (saves Full and Simple formats in a single file with autoselection
depending on browser type) – Saves both formats (Simple and Full)
to the same file. The browser you use will determine the format that is used.
Note: If your browser does not have full HTML4/CSS support (e.g. Microsoft Internet
Explorer 3.0 or earlier, Netscape 4.x, etc.), use the Simple saving mode.
Note: The application detects the code page automatically. To switch the code page, select
the code page of your choice in the Code page field on the HTML tab in the
Formats Settings dialog.
Note: When you save text in the HTML format, ABBYY FineReader uses either the fonts
specified on the Formatting tab in the Options dialog (Tools>Options menu) or
those set during text editing in the Text window.
Important! When saving results in the .PPT format, ABBYY FineReader creates special
HTML files that contain the different parts of the presentation. To save the
presentation as a single file, resave it using PowerPoint (select Save As in the
File menu and specify PPT as the saving format).
Note: You may want to save only some of the image areas enclosed by blocks (regardless of
type). To do this, select the block or blocks you wish to save, and then check the Save
only selected blocks checkbox in the Save Image As dialog. This option is only available
when saving a single image. Next, enter the file name.
4. Click OK.
Note: If you save several page images from the Batch window as separate files (i.e. the
images are not being saved as one multipage TIFF), the file names will consist of the
file name entered, the page number (4 digits), and the file suffix.
Chapter 9
Working with Batches
Chapter Contents:
● General information on working with batches
● Creating a new batch
● Opening a batch
● Adding images to a batch
● Batch page number
● Saving a batch
● Closing a batch page or the whole batch
● Deleting a batch
● Batch settings
● Full–text search in recognized batch pages
Tip: Saving pages of a similar type (e.g. pages from the same book, those written in the same
language, or those with a similar layout) in the same batch is often useful, since it
streamlines the work process.
The Batch window displays a list of the pages contained in the open batch. To view a page,
click on its icon or double-click on its page number. All files related to this batch page will
open in the appropriate windows, i.e. the text file in the Text window, the image file in the
Image window, etc.
There are two main ways of displaying pages in the Batch window:
Details – This view provides detailed information about each batch page in the
batch window and offers page lists organized by a user-specified
feature. The batch window accommodates a large number of pages,
which is useful when organizing large batches. Open a page by
double-clicking on it.
You may select several different pages, or a number of consecutive pages, or all of
the batch pages in a row:
● To select a number of consecutive pages, hold down the SHIFT key and
click the first and then the last page of the group you want to select.
● To select several pages, hold down the CTRL key and click the desired
pages.
● To select all batch pages, activate the Batch window and choose the
Select All item in the Edit menu or press CTRL+A.
Opening a Batch
ABBYY FineReader automatically creates a new batch at startup.
Note: To tell ABBYY FineReader to open the last open batch at startup, check Open the
last batch at startup on the General tab of the Options dialog (Tools>Options).
Note: You can also add images directly from Windows Explorer:
You can renumber pages directly in the Batch window or from the Renumber Pages dialog.
Once the page number has been changed, all pages in the Batch window will be re–ordered
to reflect the new numbering.
Note:
1. To renumber all batch pages, select the All Pages item in the Renumber
Pages dialog.
2. To renumber only part of a batch:
● Select the pages you wish to renumber in the Batch window.
● Select the Selected pages item in the Renumber Pages dialog.
3. To renumber selected pages continuously, select the Continuous page
numbering option. For example, with 1 chosen as the first number, this option
renumbers pages 2, 5, and 6 as 1, 2, and 3. If the Continuous page numbering
option is not selected, pages 2, 5, and 6 become 1, 5, and 6: the
first page is assigned the chosen number, but the remaining pages
retain their original numbers.
Note: If you renumber only certain batch pages and assign a page a number that is
already in use, a warning will be issued and the operation will be cancelled.
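The renumbering rules above can be expressed as a short Python sketch. It is a model of the
described behavior only, not the program's implementation. The function name, its parameters,
and the choice to signal a cancelled operation by returning None are assumptions.

```python
def renumber(all_pages, selected, first_number, continuous):
    """Return the new numbers for the selected pages, or None if cancelled.

    all_pages    -- every page number currently in the batch
    selected     -- the page numbers chosen for renumbering
    first_number -- the number to give the first selected page
    continuous   -- True if Continuous page numbering is selected
    """
    selected = sorted(selected)
    if not selected:
        return []
    if continuous:
        # Every selected page gets the next consecutive number.
        new_numbers = [first_number + i for i in range(len(selected))]
    else:
        # Only the first selected page changes; the rest keep their numbers.
        new_numbers = [first_number] + selected[1:]
    # A number held by a page outside the selection must not be reused,
    # and the new numbers must not collide with each other.
    untouched = set(all_pages) - set(selected)
    if untouched & set(new_numbers) or len(set(new_numbers)) != len(new_numbers):
        return None
    return new_numbers
```

With pages 2, 5, and 6 selected and 1 as the first number, the sketch returns 1, 2, 3 in
continuous mode and 1, 5, 6 otherwise, matching the example in the note. If the batch also
contains an unselected page 1, the call returns None, which stands in for the warning and
the cancelled operation.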
Saving a Batch
To save a batch:
● Select Save Batch As in the File menu.
● In the Save Batch As dialog, specify the name of the batch and the desired
storage location.
To close a batch:
● Select the Close Batch item in the File menu.
Deleting a Batch
Note: When a batch is deleted, all of its contents (including image and text pages, related
files, user patterns, user languages, etc.) will be deleted, leaving the folder empty.
To delete a batch:
● Select Delete Batch in the Batch menu.
Batch Settings
The following settings will be saved: the Recognition, Scan/Open Image, Formatting, and
Check Spelling tab settings, and the settings on the Formats Settings tab. The user
languages, user language groups and user patterns will also be saved in the file. To apply the
template to all new batches, check Apply this template to new batches in the Save Batch
Template As dialog.
ABBYY FineReader allows you to search all recognized pages for words in every possible
grammatical form. The search pattern may consist of one or several words. The search term may be
in any form (for languages with dictionary support), and the search process will identify the
indicated words anywhere within the text (no matter how far apart) and in any order.
To do a full–text search:
1. Select the Advanced search item in the Edit menu or press ALT+F3.
2. The Search window will open below the Zoom window.
3. Enter the desired text in the Find what field. You can also paste the
Clipboard contents into the field or select a previous search from the
drop-down list.
4. Click Find.
The Search results window will display the list of batch page numbers that contain all of the
words from the Find what field. The date that each identified page was modified will be
displayed and the first page section to contain the search pattern will be highlighted. Click on a
page number to open it in the Image, Text and Zoom windows. The found words will be
highlighted in color in all three windows.
Note: The search function cannot locate specialized characters, such as end-of-line
characters or paragraph marks.
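The matching rule of the full–text search (a page is listed when it contains every word
from the Find what field, in any order and at any distance) can be sketched in Python as
follows. This is a simplified model, not the program's search engine: it compares exact
words without regard to case and does not handle grammatical forms, which the real search
does for languages with dictionary support. The function name and the dictionary of page
texts are assumptions.

```python
import re


def find_pages(pages, query):
    """Return the sorted page numbers whose text contains every query word.

    `pages` maps a page number to its recognized text.
    """
    wanted = set(re.findall(r"\w+", query.lower()))
    if not wanted:
        return []
    hits = []
    for number, text in pages.items():
        words = set(re.findall(r"\w+", text.lower()))
        if wanted <= words:  # every query word occurs somewhere on the page
            hits.append(number)
    return sorted(hits)
```

For a three-page batch where only pages 1 and 2 contain both "page" and "scanner", the
query "page scanner" returns those two page numbers regardless of word order.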
Chapter 10
Network Document Processing
Chapter Contents:
● Working with the same batch over a network
● Group work with the same user languages and dictionaries
● Group work with customized dictionaries (languages with dictionary
support only)
1. Create or open a batch and set up the necessary scanning and recognition
options.
Run ABBYY FineReader and open the batch to be processed on all of the
involved computers.
2. Run Background recognition (Process>Start background recognition) on
all computers involved in recognizing the batch.
3. Start the scanning on the computer with an ADF scanner.
Tip: If your high-speed scanner doesn’t support TWAIN, scan your pages directly into the
ABBYY FineReader batch folder. To do this, scan the images with any scanning
application on the local computer to which the scanner is attached and save the images to the
ABBYY FineReader batch folder. The files must be named in the order they are scanned:
0001.tif, 0002.tif, 0003.tif, etc. ABBYY FineReader will automatically detect and process all of the
scanned images.
4. You may edit the recognized text and save it to a file or send it to a selected
application.
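The file-naming rule from the tip above (0001.tif, 0002.tif, 0003.tif, and so on, in scan
order) can be automated with a small script such as the Python sketch below. It is not part
of ABBYY FineReader. The function names are made up for the example, and the sketch assumes
that the scanned files are already TIFF images, that they are passed in scan order, and that
numbering starts at 0001 in an otherwise empty batch folder.

```python
import shutil
from pathlib import Path


def batch_image_names(count):
    """Return the first `count` file names: 0001.tif, 0002.tif, ..."""
    return [f"{number:04d}.tif" for number in range(1, count + 1)]


def copy_into_batch(scanned_files, batch_folder):
    """Copy TIFF files, given in scan order, into the batch folder.

    Each file receives the next four-digit name. Returns the target paths.
    """
    batch_folder = Path(batch_folder)
    targets = []
    names = batch_image_names(len(scanned_files))
    for source, name in zip(scanned_files, names):
        target = batch_folder / name
        shutil.copy2(source, target)  # copy the file together with its timestamps
        targets.append(target)
    return targets
```

Copying rather than moving keeps the original scans in place in case a page has to be
processed again.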
You may monitor the status of the page (scanned, recognized, edited or exported) in the
Batch window. This information is displayed in the corresponding columns in the Details
batch page view. To set up the Details page view:
● Click the corresponding button on the Standard toolbar, or
● Right-click the Batch window and select the View>Details item in the local
menu.
Customize the Details page view by specifying the columns you want to display in the Batch
window or selecting a column to sort the pages on:
● Right-click the Batch window and select View>Customize. Set the
desired options on the Details tab of the Batch View Settings dialog.
If several computers process the batch, ABBYY FineReader will automatically distribute the
workload between them. As each scanned page becomes available, the first free workstation
running the background recognition will begin processing it. All other computers are locked
out of the page. Refresh the batch page list by pressing F5 or by selecting Update page list in
the Batch menu. At the same time, any of the workstations can open already recognized pages
for checking, editing and saving. All users of the batch can access the changes made by other
users.
Note: Recognition speed enhancements from this approach will be particularly noticeable in
batches that contain a lot of pages.
All of the user languages and dictionaries will be stored in one folder. The default is the batch
folder. Prior to creating a user language, you must specify a folder where user languages will be
stored with the other user dictionaries. To specify a folder:
● Click Change in the Language Editor dialog (Tools>Language Editor)
and select the appropriate folder from the dialog.
This folder will become the default storage folder for all user languages and dictionaries.
After the setup is complete, save the batch settings in a batch template file (*.fbt). To do this:
● Click Save on the Options>General tab (Tools>Options). In the Save
Batch Template As dialog select the folder and enter the file name.
Each user who wants to work with the user languages and their associated dictionaries will
have to load the batch settings from the previously saved *.fbt file when creating a new batch.
To do this:
Select Batch template (.fbt) in the Template field. In the Open Batch Template dialog,
select the necessary *.fbt file. This will set the previously saved batch settings, including the
path to user languages and their dictionaries. Once this is done, all users will have the same
path to user languages and the associated dictionaries.
You can edit user dictionaries using these user languages for recognition. All users can access
changes made by another user. In addition, all users who load the batch template can access
the user languages created in this folder. View the list of available user languages and their
properties in the Language Editor dialog in the User-defined languages group.
The dictionary is locked while a user adds or removes terms. To update a dictionary, click on
Add in the Check Spelling dialog or any button in the Select Dictionary dialog.
Note:
1. You must give read/write access to all users who access a dictionary if you are
using a folder in which the dictionaries of multiple users are stored.
2. User languages that are shared between multiple users are available as “read
only” files: the parameters of a user language that has already been created
cannot be changed. Entries may still be added to or removed from the
user dictionary.
To enable the shared use of custom dictionaries for predefined languages by multiple users,
specify a public folder where all such dictionaries are saved. It can be either a local or a
network folder. To specify the folder:
● Click Browse on the Check Spelling tab of the Options dialog
(Tools>Options menu). Select the appropriate folder for storage of user
dictionaries for predefined languages.
Every user can expand these custom dictionaries. The dictionary is locked while a user
adds/removes a word. Changes made by a user are available for all users of the folder. To
update the dictionary, click Add in the Check Spelling dialog or any button in the Select
Dictionary dialog.
Note: If several users share a folder where custom dictionaries are stored, all of them must
have read/write access to this folder.
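The read/write requirement above can be checked before the folder is selected in the Options dialog. The following Python sketch is not part of ABBYY FineReader; the function name check_shared_folder is hypothetical. It assumes that successfully creating a temporary file in the folder is a reasonable indication of write access for the current user.

```python
import os
import tempfile


def check_shared_folder(path):
    """Return (readable, writable) for a candidate shared dictionary folder.

    Hypothetical helper: it only inspects the file system and knows
    nothing about ABBYY FineReader itself.
    """
    # The folder must exist and be listable by the current user.
    readable = os.path.isdir(path) and os.access(path, os.R_OK)
    writable = False
    if readable:
        try:
            # Creating a temporary file (removed automatically on close)
            # is a more direct write test than os.access on network shares.
            with tempfile.TemporaryFile(dir=path):
                writable = True
        except OSError:
            writable = False
    return readable, writable
```

Each user who will share the dictionaries can run the check against the public folder under their own account; both values should be True before the folder is used.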
Appendix
Hot Keys and Glossary
Chapter Contents:
● Hot Keys
● Glossary
Hot Keys
Menu    Command                                         Shortcut Key
File    Open image from file                            Ctrl+O
        Scan image                                      Ctrl+K
        Scan multiple images                            Ctrl+Shift+K
        Stop scanning                                   Ctrl+T
        Create new batch                                Ctrl+N
        Open a batch                                    Ctrl+P
        Save text to file                               Ctrl+F2
        Save image to file                              F12
Edit    Undo the last action                            Ctrl+Z
        Redo the last undone action                     Ctrl+Y
        Cut the selection and put it to the Clipboard   Ctrl+X
        Copy the selection to the Clipboard             Ctrl+INS or Ctrl+C
        Paste the Clipboard contents                    Ctrl+V or Shift+INS
        Delete the active block, the selection,         DEL
          the selected pages
        Select all text in the Text window,             Ctrl+A
          select all batch pages, select all blocks
          on the open image
        Find the specified text                         Ctrl+F
        Find the next occurrence of the search text     F3
        Search for and replace the specified text       Ctrl+H
View    Magnify the image in the Image window           Ctrl+Shift+Num +
        Zoom Out the image in the Image window          Ctrl+Shift+Num –
        Zoom In to selected blocks                      Ctrl+Shift+Num *
        Properties                                      Alt+ENTER
Batch   Open next batch page                            Alt+Down
        Open previous batch page                        Alt+Up
        Open page with specified number                 Ctrl+G
        Close the current page                          Ctrl+F4
        Delete the recognized text in the Text window   Ctrl+Shift+DEL
        Delete all blocks in the Image window and all   Ctrl+DEL
          recognized text in the Text window
        Update page list                                F5
Glossary
A

Abbreviation
A shortened form of a word or phrase used to represent the whole. For example, MS-DOS (for
Microsoft Disk Operating System), UN (for United Nations), etc.

Activation
The process of obtaining a special code from ABBYY which allows the user to use his copy of
the software in full-function mode on a given computer.

Activation Code
A code that is issued by ABBYY to each user of ABBYY FineReader Professional Edition during
the activation procedure. The Activation Code is required to activate ABBYY FineReader on the
computer that generated the Installation ID.

Activation File
A file issued by ABBYY to each user of ABBYY FineReader Corporate Edition during the
activation procedure. The Activation File contains information required to activate the
software on the server or on a standalone computer as the case may be. From the server, the
product will be activated on workstations.

Active block
Indicates the block that is currently ready to have actions (e.g. deleting, changing type,
etc.) applied to it. The active block frame is bold and there are “squares” in its corners.

ADF (Automatic Document Feeder)
A device that automatically feeds documents through a scanner. A scanner with an ADF can scan
any number of pages without manual intervention. ABBYY FineReader also supports scanning
multiple images.

B

Background recognition
A special recognition mode that allows the user to edit and save already recognized pages
while ABBYY FineReader recognizes other pages at the same time.

Batch
A folder that contains image files, recognized text files and other ABBYY FineReader
information files. There may be up to 9,999 pages in a batch. It is useful to save similar
pages (such as all pages from the same book, those in the same language, or images with the
same layout) in the same batch to streamline the work process.

Block
A framed image area.

Block template
A particular block arrangement that can be used to recognize pages of similar layout. A block
template may be saved in a special file.

Block type
Each block has a type. The following block types are available in ABBYY FineReader:
Recognition Area, Text, Picture, Table and Barcode.

Brightness
A scanning parameter that indicates the contrast between black and white image areas. Setting
the correct brightness increases recognition quality.

Brightness autotuning
Automatic brightness tuning performed either by scanner or ABBYY FineReader. The autotuning
process sets the brightness for every image area separately.
C

Code page
A table that sets the interrelation between the character codes and the characters
themselves. Users can select the characters they need from the set found in the code page.

Compound word
A word made up of two or more stems (general meaning); a word not found in the dictionary,
but potentially made up of two or more terms found in the dictionary (ABBYY FineReader
meaning).

D

Despeckle image
Delete excess small black dots from an image.

dpi (Dots per Inch)
How resolution is measured.

Driver
A program controlling a computer peripheral (e.g., a scanner, a monitor, etc.).

F

Font effects
Certain variations of a font's appearance (i.e. bold, italic, underlined, strikethrough,
subscript, superscript, small caps).

H

Hot folder
A special folder monitored by ABBYY FineReader. Images added to this folder are automatically
opened in the ABBYY FineReader window.

I

Ignored characters
Any non-letter characters found in words (e.g. syllable characters or stress marks). These
characters are ignored during the spell check.

Image type
A scanning parameter that determines whether an image must be scanned in black and white,
gray or color mode.

Installation ID
A computer code that is generated on the basis of the PC hardware parameters.

Inverted image
An image with white characters against a dark background.

L

License Manager
A utility used for managing ABBYY FineReader licenses and activating ABBYY FineReader 7.0
Corporate Edition.

Ligature
A combination of two or more “glued” characters, for example, fi, fl, ffi etc. These
characters are difficult to separate because they are usually “glued” in print. Treating them
as a single compound character improves scanning accuracy.

M

Monospaced font
A font (such as Courier New) in which all of the characters are equally spaced. Select the
Typewriter item in the Print Type group (Recognition tab) to increase the recognition quality
of documents set in monospaced fonts.
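The ABBYY FineReader meaning of a compound word (a word that is not in the dictionary but can be assembled from dictionary terms) can be illustrated with a short Python sketch. This is not ABBYY FineReader's actual algorithm, which this guide does not describe; the function name is_compound, the min_part parameter and the sample word list are hypothetical, and the dictionary is assumed to contain lowercase terms.

```python
def is_compound(word, dictionary, min_part=2):
    """Return True if `word` is absent from `dictionary` but can be
    split into two or more dictionary terms (illustrative only)."""
    word = word.lower()
    if word in dictionary:
        return False  # a dictionary word is not treated as a compound here
    n = len(word)
    # splittable[i] is True when word[:i] can be written as a
    # sequence of dictionary terms.
    splittable = [False] * (n + 1)
    splittable[0] = True
    for end in range(1, n + 1):
        for start in range(end):
            part = word[start:end]
            if splittable[start] and len(part) >= min_part and part in dictionary:
                splittable[end] = True
                break
    return splittable[n]


# Hypothetical sample dictionary (lowercase terms).
sample_terms = {"book", "shelf", "auto", "format"}
print(is_compound("bookshelf", sample_terms))
print(is_compound("book", sample_terms))
```

Because the whole word is known to be missing from the dictionary, any successful split necessarily consists of at least two terms.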
O

Omnifont system
A recognition system that recognizes characters set in any font and font size without prior
training.

Open&Read
A command that processes an image file: opens it, analyzes the page layout, and recognizes
it.

Optional hyphen
A hyphen that indicates exactly where a word or word combination should be split if it occurs
at the end of a line (e.g. “autoformat” should be split as “auto-format”). ABBYY FineReader
replaces all hyphens found in dictionary words with optional hyphens.

P

Page layout
A combination of the way text, tables and pictures are arranged on a page, the way text is
arranged into paragraphs, the font and font size of the text, the number of text columns, the
character and background color, and the text orientation.

Page layout analysis (drawing blocks)
A process of analyzing the page layout and enclosing different image areas in blocks
according to the layout. Blocks may be of different types. Page layout analysis may be
performed automatically in a coupled recognition/page layout analysis procedure (run by
clicking the 2-Read button) or manually.

Paradigm
The set of all grammatical forms of a word.

Pattern
A set of pairs (the character image and the character itself) that is created during pattern
training. A pattern provides additional information during recognition.

Primary form
The form in which a word is listed in the dictionary.

Prohibited characters
If certain characters will never be found in recognized text, they may be specified in a set
of prohibited characters in the language group properties. Specifying these characters
increases the speed and quality of recognition. To specify a set of prohibited characters,
click Advanced in the Language Group Properties dialog. The Advanced language group
properties dialog will open. Specify the set of prohibited characters in the Prohibited
characters line.

R

Resolution
A scanning parameter determining how many dpi to use during scanning. A resolution of 300 dpi
should be used for texts set in 10pt font size and larger; 400 to 600 dpi is preferable for
texts of smaller font size (9pt and less).

S

Scanner
A device for inputting images into a computer.

Scan&Read
The main ABBYY FineReader button. Click it to have ABBYY FineReader scan and recognize your
image(s).
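The rule of thumb given under Resolution can be written as a small Python helper. This is only a sketch: the function name recommended_dpi is hypothetical, and sizes between 9pt and 10pt, which the glossary does not cover, are assumed here to belong to the smaller-font case.

```python
def recommended_dpi(font_size_pt):
    """Return a (min_dpi, max_dpi) scanning range for a given font size.

    Follows the glossary rule: 300 dpi for 10pt and larger,
    400 to 600 dpi for smaller text.
    """
    if font_size_pt <= 0:
        raise ValueError("font size must be positive")
    if font_size_pt >= 10:
        return (300, 300)
    # Assumption: everything below 10pt is treated as small text.
    return (400, 600)


for size in (12, 10, 8):
    low, high = recommended_dpi(size)
    print(f"{size}pt: {low} to {high} dpi")
```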