Document Image Analysis
• A scanner reads the document and converts it into digital (binary) image data. The OCR software then analyzes
the scanned image, classifying the light areas as background and the dark areas as text.
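The slide does not say how the software decides which pixels count as "light" and which as "dark". One common way to pick that cut-off automatically is Otsu's method, which chooses the grey level that best separates the two groups. The sketch below assumes an 8-bit greyscale page (0 = black, 255 = white) given as a list of rows, with dark text on a light background; the function names `otsu_threshold` and `binarize` are illustrative, not part of any particular OCR product.

```python
def otsu_threshold(image):
    """Return the grey level that best separates dark and light pixels (Otsu's method).

    `image` is a list of rows of 8-bit grey values (0 = black, 255 = white).
    """
    hist = [0] * 256
    for row in image:
        for p in row:
            hist[p] += 1

    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_score = 0, -1.0
    w_dark = 0    # number of pixels at or below the candidate threshold
    sum_dark = 0  # sum of their grey levels
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue
        w_light = total - w_dark
        if w_light == 0:
            break
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        # Between-class variance (up to a constant factor); keep the maximum.
        score = w_dark * w_light * (mean_dark - mean_light) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def binarize(image, threshold=None):
    """Classify each pixel: 1 = dark (text), 0 = light (background)."""
    if threshold is None:
        threshold = otsu_threshold(image)
    return [[1 if p <= threshold else 0 for p in row] for row in image]
```

For example, `binarize([[20, 20, 200], [210, 30, 220]])` picks a threshold of 30 and returns `[[1, 1, 0], [0, 1, 0]]`. A single global threshold is the simplest choice; pages with uneven lighting often need a locally adaptive threshold instead.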
Preprocessing
• The OCR software first cleans the image and removes errors (for example, stray specks of noise) to prepare it for reading.
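The slide leaves the cleaning steps unspecified. As one illustrative example, isolated specks in an already binarized page can be removed with a 3×3 median filter: each pixel takes the majority value of its neighbourhood, so lone dots vanish while thick strokes survive. This minimal sketch assumes a binary image as a list of rows (1 = ink, 0 = background) and replicates edge pixels at the border; `median_filter_binary` is a hypothetical name.

```python
def median_filter_binary(img):
    """3x3 median filter for a binary image (1 = ink, 0 = background).

    For 0/1 data the median of 9 values is 1 exactly when at least 5 are 1.
    Border pixels are handled by clamping coordinates (edge replication).
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ones = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    ones += img[yy][xx]
            out[y][x] = 1 if ones >= 5 else 0
    return out
```

A single isolated ink pixel is erased, while a horizontal stroke three pixels thick is left unchanged. The trade-off is that very thin strokes and sharp corners can also be eroded, so real systems tune the filter size to the scan resolution.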
Text recognition
• OCR software recognizes text with one of two main types of algorithm: pattern matching and feature
extraction.
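The difference between the two approaches can be shown on toy glyphs. Pattern matching compares a glyph pixel by pixel with stored templates; feature extraction first reduces the glyph to a few measurements and compares those. The 3×3 glyphs in `TEMPLATES` and the row/column ink counts used as features are hypothetical simplifications for illustration; real systems use larger, size-normalized glyphs and richer features such as strokes, loops, and line intersections.

```python
# Toy 3x3 binary glyphs (hypothetical, for illustration only).
TEMPLATES = {
    "I": ((0, 1, 0), (0, 1, 0), (0, 1, 0)),
    "L": ((1, 0, 0), (1, 0, 0), (1, 1, 1)),
    "T": ((1, 1, 1), (0, 1, 0), (0, 1, 0)),
}


def match_pattern(glyph, templates=TEMPLATES):
    """Pattern matching: choose the template with the fewest differing pixels."""
    def hamming(a, b):
        return sum(p != q for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    return min(templates, key=lambda ch: hamming(glyph, templates[ch]))


def features(glyph):
    """Feature extraction: describe a glyph by its ink count per row and per column."""
    rows = tuple(sum(r) for r in glyph)
    cols = tuple(sum(c) for c in zip(*glyph))
    return rows + cols


def match_features(glyph, templates=TEMPLATES):
    """Classify by the nearest feature vector (squared Euclidean distance)."""
    f = features(glyph)
    def dist(ch):
        g = features(templates[ch])
        return sum((a - b) ** 2 for a, b in zip(f, g))
    return min(templates, key=dist)
```

A "T" with one extra ink pixel, `((1, 1, 1), (0, 1, 0), (0, 1, 1))`, is still classified as `"T"` by both functions. Pattern matching is simple but sensitive to font and size; feature-based matching tolerates more variation because it compares shape descriptions rather than exact pixels.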
Postprocessing
• After analysis, the system converts the extracted text data into a computerized file.
What Is Document Layout Analysis?
Image Acquisition:
1. Raw data is acquired from CT or MRI scans.
2. CT scans measure X-ray absorption, while MRI scans measure the signals that protons emit as they relax
inside a strong magnetic field.
Reconstruction:
3. The raw data is reconstructed into a format suitable for analysis software.
4. The result is a 3D bitmap of greyscale intensities whose elements are called voxels.
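To make the "3D bitmap of voxels" concrete, the sketch below stores a volume as one flat list, slice by slice, and maps a (z, y, x) index both to a position in that list and to a physical position. The `Volume` class is hypothetical, and the voxel spacing in millimetres is an assumption added for illustration; in practice it comes from the scan's metadata.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Volume:
    """A 3D bitmap of greyscale intensities, stored slice by slice (z, then y, then x)."""
    nz: int
    ny: int
    nx: int
    # Assumed physical voxel size (dz, dy, dx) in millimetres (illustrative default).
    spacing_mm: Tuple[float, float, float] = (1.0, 1.0, 1.0)
    data: Optional[List[float]] = None

    def __post_init__(self):
        size = self.nz * self.ny * self.nx
        if self.data is None:
            self.data = [0] * size
        if len(self.data) != size:
            raise ValueError("data length does not match volume shape")

    def _index(self, z, y, x):
        """Flat index of voxel (z, y, x) in row-major (slice-major) order."""
        if not (0 <= z < self.nz and 0 <= y < self.ny and 0 <= x < self.nx):
            raise IndexError((z, y, x))
        return (z * self.ny + y) * self.nx + x

    def __getitem__(self, zyx):
        return self.data[self._index(*zyx)]

    def __setitem__(self, zyx, value):
        self.data[self._index(*zyx)] = value

    def position_mm(self, z, y, x):
        """Physical position of a voxel relative to voxel (0, 0, 0)."""
        dz, dy, dx = self.spacing_mm
        return (z * dz, y * dy, x * dx)

    def voxel_volume_mm3(self):
        dz, dy, dx = self.spacing_mm
        return dz * dy * dx
```

For example, `Volume(2, 3, 4, spacing_mm=(2.0, 0.5, 0.5))` holds 24 voxels, and voxel `(1, 2, 3)` lies at `(2.0, 1.0, 1.5)` mm. Keeping the spacing alongside the intensities matters because slices are often farther apart than the in-plane pixel size.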
Processing Steps:
5. Noise Reduction: Filters remove unwanted noise or artifacts.
6. Segmentation: Identifying different anatomical regions (e.g., tissue, bone).
7. Measurement and Statistics: Quantifying parts of the image data.
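The three processing steps above can be chained in a small pipeline. The slide does not name specific methods, so this sketch picks simple, common ones: a 3×3×3 median filter for noise reduction, an intensity window for segmentation, and voxel counting for measurement. The volume is a nested list indexed `[z][y][x]`, and the intensity window and voxel volume passed in are hypothetical values; production pipelines typically rely on NumPy/SciPy or ITK instead of pure Python.

```python
from statistics import median


def median_filter_3d(vol):
    """Noise reduction: replace each voxel by the median of its 3x3x3 neighbourhood.

    At the borders only neighbours inside the volume are used.
    """
    nz, ny, nx = len(vol), len(vol[0]), len(vol[0][0])
    out = [[[0] * nx for _ in range(ny)] for _ in range(nz)]
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                neigh = [vol[k][j][i]
                         for k in range(max(z - 1, 0), min(z + 2, nz))
                         for j in range(max(y - 1, 0), min(y + 2, ny))
                         for i in range(max(x - 1, 0), min(x + 2, nx))]
                out[z][y][x] = median(neigh)
    return out


def segment(vol, lo, hi):
    """Segmentation: mask (1/0) of voxels whose intensity lies in [lo, hi]."""
    return [[[1 if lo <= v <= hi else 0 for v in row] for row in sl] for sl in vol]


def measure(vol, mask, voxel_mm3=1.0):
    """Measurement: voxel count, physical volume and mean intensity of a region."""
    values = [v
              for sl_v, sl_m in zip(vol, mask)
              for row_v, row_m in zip(sl_v, sl_m)
              for v, m in zip(row_v, row_m) if m]
    count = len(values)
    return {
        "voxels": count,
        "volume_mm3": count * voxel_mm3,
        "mean": sum(values) / count if count else 0.0,
    }
```

For instance, on a volume whose left half has intensity 100 and right half 1000, with one spurious bright voxel in the left half, segmenting the raw data with the window `[500, 2000]` wrongly includes the spike, while segmenting the median-filtered data returns only the right half. This illustrates why noise reduction usually comes before segmentation.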
Applications:
8. Diagnosis: Non-invasive exploration of internal anatomy.
9. Treatment Planning: Creating 3D models for surgical planning.