Chapter 2

Digital Image and Video


Why Do We Process Images?
 Facilitate picture storage and transmission
– Efficiently store an image in a digital camera or machine
– Send an image over a mobile phone or network
 Enhance and restore images
– Remove scratches from an old photo
– Improve visibility of a tumor in a radiograph
Cont…
 Extract information from images
– Measure water pollution from aerial images
– Measure the 3D distances and heights of
objects from stereo images
 Prepare for display or printing
– Adjust image size
– Half toning
Image Processing Applications
 Nuclear medicine
 Medical diagnostics
 Automated industrial inspection
 Remote sensing
 Weather prediction
 Military reconnaissance
 Geological exploration
 Astronomical observations
 Image database management
 The paperless office
 Photographers, advertising agencies and publishers
 Machine vision
 Biometrics (fingerprints, iris, etc.)
 Movies and entertainment
Image Processing Examples
Extraction of Settlement Area from an
Aerial image
Earthquake Analysis from Space
Face Detection
Face Tracking
Fingerprint Recognition
Applications of DIP
 Electromagnetic (EM) band Imaging
– Gamma ray band images
– X-ray band images
–Ultra-violet band images
– Visual light and infra-red images
– Imaging based on micro-waves and radio
waves
Applications of DIP (EM Band Imaging)
 Gamma-Ray Imaging
– Nuclear medicine, astronomical observations.
 X-Ray Imaging
– Medical diagnostics (CAT scans, x-ray scans), industry, astronomy.
 Ultra-Violet Imaging
– Fluorescence microscopy, astronomy
 Visible & Infrared-band Imaging (most widely used)
– Light microscopy, astronomy, remote sensing, industry, law
enforcement, military reconnaissance, etc.
 Micro-wave and Radio band Imagery
– Radar, Medicine (MRI), astronomy
Classification of DIP and Computer Vision Processes

 Low-Level Process (DIP)
– Primitive operations where inputs and outputs are images; major
functions: image pre-processing like noise reduction, contrast
enhancement, image sharpening, etc.
 Mid-Level Process (DIP and Computer Vision)
– Inputs are images, outputs are attributes (e.g., edges); major
functions: segmentation, description, classification / recognition of
objects
 High-Level Process (Computer Vision)
– Make sense of an ensemble of recognized objects; perform the
cognitive functions normally associated with vision
Image Processing Steps
Cont…
• Human vision - perceive and understand world
• Computer vision, Image Understanding /
Interpretation, Image processing.
– 3D world -> sensors (TV cameras) -> 2D images
– Dimension reduction -> loss of information
• low level image processing
• transform of one image to another
• high level image understanding
• knowledge based - imitate human cognition
• make decisions according to information in
image
Cont…
• From raw data (low level) through mid level to classification /
decision (high level), algorithm complexity increases while the
amount of data decreases.
Low level digital image processing
• Low level computer vision ~ digital image processing
• Image Acquisition
– image captured by a sensor (TV camera) and digitized
• Pre-processing
– suppresses noise (image pre-processing)
– enhances some object features - relevant to understanding the
image
– edge extraction, smoothing, thresholding etc.
• Image segmentation
– separate objects from the image background
– colour segmentation, region growing, edge linking etc
• Object description and classification
– after segmentation
Signals and Functions
• What is an image?
• Signal = function (variable with physical meaning)
– one-dimensional (e.g. dependent on time)
– two-dimensional (e.g. images dependent on two co-
ordinates in a plane)
– three-dimensional (e.g. describing an object in space)
– higher-dimensional
• Scalar functions
– sufficient to describe a monochromatic image - intensity
images
• Vector functions
– represent color images - three component colors
Image Functions
• Image - continuous function of a number of variables
• Co-ordinates x, y in a spatial plane
– for image sequences - variable (time) t
• Image function value = brightness at image points
– other physical quantities
– temperature, pressure distribution, distance from the observer
• Image on the human eye retina / TV camera sensor - intrinsically 2D
• 2D image using brightness points = intensity image
• Mapping 3D real world -> 2D image
– 2D intensity image = perspective projection of the 3D scene
– information lost - transformation is not one-to-one
– geometric problem - information recovery
– understanding brightness info
Image definition
 Image definition:
A 2D function obtained by sensing a scene
F(x,y), F(x1,x2), F(x)
F - intensity, grey level
x,y - spatial co-ordinates
(An N x M image runs from f(0,0) in one corner to f(N-1,M-1) in the
opposite corner.)
 No. of grey levels, L = 2^B
 B = no. of bits

B    L      Description
1    2      Binary image (black and white)
6    64     64 levels, limit of human visual system
8    256    Typical grey level resolution
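As a quick illustration (not from the slides), a minimal NumPy sketch that builds a small N x M grey-level image and checks the relation L = 2^B between bit depth and number of grey levels; the array name and sizes are arbitrary:

import numpy as np

# Hypothetical example: an N x M image stored with B bits per pixel.
N, M, B = 4, 6, 8                       # rows, columns, bits per pixel
L = 2 ** B                              # number of grey levels, L = 2^B

# f(x, y) holds the grey level at spatial coordinates (x, y).
f = np.random.randint(0, L, size=(N, M), dtype=np.uint8)

print("grey levels L =", L)             # 256 for B = 8
print("f(0,0) =", f[0, 0])              # first pixel
print("f(N-1,M-1) =", f[N - 1, M - 1])  # last pixel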
Image quality
• Quality of digital image proportional to:
– spatial resolution
• proximity of image samples in image plane
– spectral resolution
• bandwidth of light frequencies captured by sensor
– radiometric resolution
• number of distinguishable gray levels
– time resolution
• interval between time samples at which images captured
Digital Image Storage
• Stored in two parts
– header
• width, height, … cookie
– The cookie (a magic number) indicates what type of image file it is
– data
• uncompressed, compressed, ASCII, binary.
• File types
– JPEG, BMP, PPM.
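To make the header/data split concrete, here is a small, hedged sketch that reads just the header of a binary PPM (P6) file; the parsing is simplified (it ignores '#' comment lines) and the file name is hypothetical:

def read_ppm_header(path):
    """Return (cookie, width, height, maxval) from a binary PPM (P6) file.
    Simplified sketch: assumes no comment lines inside the header."""
    with open(path, "rb") as fh:
        cookie = fh.readline().strip()           # magic number, e.g. b"P6"
        width, height = map(int, fh.readline().split())
        maxval = int(fh.readline())              # e.g. 255 for 8-bit data
    return cookie, width, height, maxval

# Hypothetical usage:
# print(read_ppm_header("example.ppm"))   # (b'P6', 640, 480, 255)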
DIP Course
 Digital Image Fundamentals and Image Acquisition
(briefly)
 Image Enhancement in Spatial Domain
– Pixel operations
– Histogram processing
– Filtering
 Image Enhancement in Frequency Domain
– Transformation and reverse transformation
– Frequency domain filters
– Homomorphic filtering
 Image Restoration
– Noise reduction techniques
– Geometric transformations
Cont…
 Color Image Processing
– Color models
– Pseudocolor image processing
– Color transformations and color segmentation
 Wavelets and Multi-Resolution
Processing
– Multi-resolution expansion
– Wavelet transforms, etc.
 Image Compression
– Image compression models
– Error free compression
– Lossy compression, etc
DIP Course
 Image Segmentation
– Edge, point and boundary detection
– Thresholding
– Region based segmentation, etc
Image Representation
• Image
– Two-dimensional function f(x,y)
– x, y : spatial coordinates
• Value of f : Intensity or gray level
Digital Image

• A set of pixels (picture elements, pels)


• Pixel means
– pixel coordinate
– pixel value
– or both
• Both coordinates and value are discrete
Example
• 640 x 480 8-bit image
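For instance (a sketch, assuming NumPy), a 640 x 480 8-bit image can be held as a 480-row by 640-column array; both the coordinates and the values are discrete:

import numpy as np

height, width = 480, 640
img = np.zeros((height, width), dtype=np.uint8)  # 8-bit: values 0..255

img[100, 200] = 255          # set pixel value at row 100, column 200
print(img.shape, img.dtype)  # (480, 640) uint8
print(img.nbytes)            # 307200 bytes = 640 * 480 * 1 byte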
Cont…
Filtering
• digital images are often processed
using “digital filters”
• digital filters are based on
mathematical functions that operate
on the pixels of the image
Filtering
• there are two classes of digital filters: global and local
• global filters transform each pixel uniformly according to the
function regardless of its location in the image
• local filters transform a pixel depending upon its relation to
surrounding ones
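A minimal sketch of the distinction, assuming NumPy: the global filter applies the same brightness mapping to every pixel regardless of position, while the local filter replaces each pixel by the mean of its 3x3 neighbourhood:

import numpy as np

def global_brightness(img, offset):
    """Global filter: same mapping for every pixel, independent of location."""
    return np.clip(img.astype(np.int16) + offset, 0, 255).astype(np.uint8)

def local_mean_3x3(img):
    """Local filter: each output pixel depends on its 3x3 neighbourhood."""
    padded = np.pad(img.astype(np.float32), 1, mode="edge")
    out = np.zeros(img.shape, dtype=np.float32)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out += padded[1 + dy : 1 + dy + img.shape[0],
                          1 + dx : 1 + dx + img.shape[1]]
    return (out / 9).astype(np.uint8)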
Global Filters
 Brightness and Contrast control
 Histogram thresholding
 Histogram stretching or equalization
 Color corrections
 Hue-shifting and colorizing
 Inversions
 Sharpening
 Blurring
 Un-sharp Masking
 Edge and line detection
 Noise filters
• Edge and line detection filters subtract all
parts of the image except edges or boundaries
between two different regions
• edge detection is often used to recognize
objects of interest in the image
(Figure: edges and lines detected in an image of toy blocks)
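As an illustrative sketch (not the slides' own method), a Sobel-style edge detector in NumPy that highlights boundaries where neighbouring pixels differ greatly in intensity:

import numpy as np

def sobel_edges(img):
    """Return the gradient magnitude of a greyscale image (uint8 array)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
    ky = kx.T
    padded = np.pad(img.astype(np.float32), 1, mode="edge")
    gx = np.zeros(img.shape, dtype=np.float32)
    gy = np.zeros(img.shape, dtype=np.float32)
    for dy in range(3):
        for dx in range(3):
            window = padded[dy : dy + img.shape[0], dx : dx + img.shape[1]]
            gx += kx[dy, dx] * window
            gy += ky[dy, dx] * window
    return np.hypot(gx, gy)  # large values mark edges / boundaries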
Analysis
• Image improvement
– Eliminating noise (due to external effects or
missing pixels), or by increasing the contrast
• Pattern Discovery and Recognition
– OCR – Optical Character Recognition
• Scene Analysis and Computer Vision
– Recognition and reconstruction of 3D models of
the scene.
– Industrial robot that measures the relative sizes,
shapes, positions, and color of the objects.
Image Properties
• Color
– Use color histogram
• Texture
– Surface structure
– Use gray-level representation
• Edge detection
Steps involved in image recognition
Formatting and Conditioning
• Image Formatting
– Image Formatting means capturing an image by bringing it
into a digital form
• Conditioning
– In an image, there are usually features which are uninteresting,
either because they were introduced into the image during the
digitization process as noise, or because they form part of a
background.
– An observed image is composed of informative patterns
modified by uninteresting random variations.
– Conditioning suppresses, or normalizes, the uninteresting
variations in the image, effectively highlighting the interesting
parts of the image.
Labeling
• Informative patterns in an image have structure.
• Patterns are usually composed of adjacent pixels which share
some property such that it can be inferred that they are part of the
same structure (e.g., an edge).
• Edge detection techniques focus on identifying continuous
adjacent pixels which differ greatly in intensity or colour, because
these are likely to mark boundaries, between objects, or an object
and the background, and hence form an edge.
• After the edge detection process is complete, many edges will
have been identified. However, not all of the edges are significant.
• Thresholding: filters out insignificant edges. The remaining edges
are labeled. More complex labeling operations may involve
identifying and labeling shape primitives and corner finding.
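A hedged sketch of the thresholding step: keep only edge responses above a chosen threshold, after which the surviving edge pixels can be labeled (the threshold value here is arbitrary):

import numpy as np

def threshold_edges(edge_magnitude, threshold=100.0):
    """Binary mask of 'significant' edges; weaker responses are discarded."""
    return edge_magnitude >= threshold

# Hypothetical usage, continuing from the Sobel sketch above:
# edges = sobel_edges(img)
# significant = threshold_edges(edges, threshold=100.0)
# print(significant.sum(), "edge pixels survive thresholding")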
Grouping
 Grouping can turn edges into lines by
determining that different edges belong to the
same spatial event.
 The first 3 operations represent the image as a
digital image data structure (pixel information);
from the grouping operation onward, the data
structure also needs to record the spatial events
to which each pixel belongs.
 This information is stored in a logical data structure.
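Grouping edges into lines is commonly done with a Hough-style vote; below is a minimal, illustrative accumulator over (rho, theta) in NumPy — a sketch only, with coarse resolution chosen for brevity:

import numpy as np

def hough_lines(edge_mask, n_theta=180):
    """Accumulate votes for lines rho = x*cos(theta) + y*sin(theta)."""
    h, w = edge_mask.shape
    diag = int(np.ceil(np.hypot(h, w)))
    thetas = np.deg2rad(np.arange(n_theta))
    acc = np.zeros((2 * diag + 1, n_theta), dtype=np.int32)
    ys, xs = np.nonzero(edge_mask)           # coordinates of edge pixels
    for x, y in zip(xs, ys):
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rhos + diag, np.arange(n_theta)] += 1
    return acc  # peaks in acc correspond to dominant lines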
Extracting
• Grouping only records the spatial event(s) to which pixels
belong.
• Feature extraction involves generating a list of properties for
each set of pixels in a spatial event.
• These may include a set's centroid, area, orientation, spatial
moments, grey tone moments, spatial-grey tone moments,
circumscribing circle, inscribing circle, etc.
• Additional properties depend on whether the group is
considered a region or an arc.
• If it is a region, then the number of holes might be useful. In the
case of an arc, the average curvature of the arc might be useful to
know.
• Feature extraction can also describe the topographical
relationships between different groups.
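A minimal sketch (assuming NumPy) of extracting two of the listed properties, area and centroid, for one group of pixels given as a boolean mask; the mask name is hypothetical:

import numpy as np

def region_features(mask):
    """Area and centroid of a region given as a boolean mask."""
    ys, xs = np.nonzero(mask)
    area = xs.size                      # number of pixels in the region
    centroid = (ys.mean(), xs.mean())   # (row, column) centre of mass
    return area, centroid

# Hypothetical usage:
# area, (cy, cx) = region_features(region_mask)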
Matching
• Finally, once the pixels in the image have been grouped into
objects and the relationship between the different objects has
been determined, the final step is to recognize the objects in
the image.
• Matching involves comparing each object in the image with
previously stored models and determining the best match
(e.g., template matching).
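To make the idea concrete, here is a hedged brute-force template-matching sketch using the sum of squared differences (SSD); real systems typically use faster correlation-based implementations:

import numpy as np

def match_template_ssd(image, template):
    """Return the (row, col) where the template best matches the image."""
    ih, iw = image.shape
    th, tw = template.shape
    best, best_pos = np.inf, (0, 0)
    img = image.astype(np.float32)
    tpl = template.astype(np.float32)
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            ssd = np.sum((img[y : y + th, x : x + tw] - tpl) ** 2)
            if ssd < best:
                best, best_pos = ssd, (y, x)
    return best_pos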
Transmission
 Transmission through network
 Formats
 Raw Digital Image
 Compressed Digital Image
 Symbolic Representation
Editing Images
• Editing or retouching an image involves selecting a
region of the digital image for processing using some
special effect
• Image compositing combines components of two or more
images into a single image
• Painting (or rotoscoping) an image is to edit the image by
hand with graphic tools that alter color and details
• Compositing images involves combining separate image
layers into one image
• Layers may be moved and arranged
2. Basics of Video
1. Types of Video Signals
a. Component video - each primary is sent as a separate video signal.
– The primaries can either be RGB or a luminance-chrominance
transformation of them (e.g., YIQ, YUV).
– Best color reproduction
– Requires more bandwidth and good synchronization of the three
components
b. Composite video : color (chrominance) and luminance
– signals are mixed into a single carrier wave.
– Some interference between the two signals is inevitable.
c. S-Video (Separated video, e.g., in S-VHS)
– a compromise between component analog video and the composite
video.
– It uses two lines, one for luminance and another for composite
chrominance signal.
luminance-chrominance
 YUV is a color encoding system typically used as part of a color image pipeline
 color image pipeline: An image pipeline or video pipeline is the set of
components commonly used between an image source (such as a camera, a
scanner, or the rendering engine in a computer game), and an image renderer
(such as a television set, a computer screen, a computer printer or cinema
screen), or for performing any intermediate digital image processing consisting
of two or more separate processing blocks.
 An image/video pipeline may be implemented as computer software, in a digital
signal processor.
 In addition, analog circuits can be used to do many of the same functions.
 Typical components include
 image sensor corrections (including "debaying" or applying a Bayer filter),
 noise reduction, image scaling, gamma correction,
 image enhancement,
 Color space conversion (between formats such as RGB, YUV or YCbCr),
 chroma subsampling,
 Frame rate conversion,
 image compression/video compression (such as JPEG), and
 computer data storage/data transmission.
YUV is a color encoding system typically used as
part of a color image pipeline. It encodes a color
image or video taking human perception into
account, allowing reduced bandwidth for
chrominance components, thereby typically
enabling transmission errors or compression
artifacts to be more efficiently masked by human
perception than when using a "direct" RGB
representation.
Other color encodings have similar properties, and
the main reason to implement or investigate
properties of Y′UV would be for interfacing with
analog or digital television or photographic
equipment that conforms to certain Y′UV
standards.
The Y′UV model defines a color space in terms of
one luma component (Y′) and two chrominance
components, called U (blue projection) and V (red
projection) respectively.
The Y′UV color model is used in the PAL composite
color video (excluding PAL-N) standard.
Previous black-and-white systems used only luma
(Y′) information.
Color information (U and V) was added separately
via a subcarrier so that a black-and-white receiver
would still be able to receive and display a color
picture transmission in the receiver's native black-
and-white format.
 Y′ stands for the luma component (the brightness) and U and
V are the chrominance (color) components; luminance is
denoted by Y and luma by Y′ - the prime symbols (') denote
gamma compression, with "luminance" meaning physical
linear-space brightness, while "luma" is (nonlinear)
perceptual brightness.
 The scope of the terms Y′UV, YUV, YCbCr, YPbPr, etc., is
sometimes ambiguous and overlapping.
 Historically, the terms YUV and Y′UV were used for a
specific analog encoding of color information in television
systems, while YCbCr was used for digital encoding of color
information suited for video and still-image compression and
transmission such as MPEG and JPEG.
 Today, the term YUV is commonly used in the computer
industry to describe file-formats that are encoded using
YCbCr.
 The YPbPr color model used in analog component video and
its digital version YCbCr used in digital video are more or less
derived from it, and are sometimes called Y′UV.
 (CB/PB and CR/PR are deviations from grey on blue–yellow
and red–cyan axes, whereas U and V are blue–luminance and
red–luminance differences respectively.)
 The Y′IQ color space used in the analog NTSC television
broadcasting system is related to it, although in a more
complex way.
 The YDbDr color space used in the analog SECAM and PAL-
N television broadcasting systems, are also related.
(Figure: upper half shows typical use of two imaging pipelines;
lower half shows a computer application.)
Digital signals are often compressed to reduce file size
and save transmission time.
Since the human visual system is much more sensitive to
variations in brightness than color, a video system can be
optimized by devoting more bandwidth to the luma
component (usually denoted Y'), than to the color
difference components Cb and Cr.
 In compressed images, for example, the 4:2:2 Y'CbCr
scheme requires two-thirds the bandwidth of (4:4:4)
R'G'B'.
This reduction results in almost no visual difference as
perceived by the viewer.
Luma
 In video, luma represents the brightness
in an image (the "black-and-white" or achromatic portion
of the image).
 Luma is typically paired with chrominance. Luma
represents the achromatic image, while the chroma
components represent the color information.
 Converting R′G′B′ sources (such as the output of a three-
CCD camera) into luma and chroma allows for chroma
subsampling: because human vision has finer spatial
sensitivity to luminance ("black and white") differences
than chromatic differences, video systems can store and
transmit chromatic information at lower resolution,
optimizing perceived detail at a particular bandwidth.
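As a sketch (using one common set of BT.601-style coefficients; the exact constants vary between standards), converting gamma-corrected R′G′B′ to luma and chroma, after which the chroma planes can be stored at lower resolution:

import numpy as np

def rgb_to_ycbcr(rgb):
    """R'G'B' (uint8, HxWx3) -> Y', Cb, Cr planes, BT.601 full-range style."""
    r, g, b = [rgb[..., i].astype(np.float32) for i in range(3)]
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128.0
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 128.0
    return y, cb, cr

def subsample_420(plane):
    """Keep every second sample horizontally and vertically (4:2:0-like)."""
    return plane[::2, ::2]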
A visualization of YCbCr color space
• (Figure: the CbCr plane at constant luma Y′ = 0.5)
• (Figure: a color image and its Y′, CB and CR components. The Y′
image is essentially a greyscale copy of the main image.)
• YCbCr, Y′CbCr, or Y Pb/Cb Pr/Cr, also written as YCBCR or
Y'CBCR, is a family of color spaces used as a part of the
color image pipeline in video and digital photography
systems.
• Y is the luma component and CB and CR are the blue-
difference and red-difference chroma components.
• Y′ (with prime) is distinguished from Y, which is
luminance, meaning that light intensity is nonlinearly
encoded based on gamma corrected RGB primaries.
Analog Video
• Analog video is represented as a continuous (time varying) signal;
• Digital video is represented as a sequence of digital images
a. NTSC Video
– 525 scan lines per frame, ~30 fps (29.97 fps, 33.37 msec/frame).
– Interlaced, each frame is divided into 2 fields, 262.5 lines/field
– 20 lines reserved for control
– information at the beginning of each field
– So a maximum of 485 lines of visible data
b. Laserdisc and S-VHS have actual
– resolution of ~420 lines
– Ordinary TV -- ~320 lines
– Each line takes 63.5 microseconds to scan.
e. Color representation:
– Uses YIQ color model.
• There are three principal Analogue Video signal formats:
 NTSC ( National Television Systems Committee: the US Standard),
 PAL (Phase Alternate Line: the European Standard) and
 SECAM (the French Standard).
 There are several minor variations of PAL and SECAM as well.
 All these are television video formats in which the information in each
picture captured by the CCD or CRT is scanned from left to right to create
a sequential intensity signal.
 The formats take advantage of the persistence of human vision by using an
interlaced scanning pattern in which the odd and even lines of each picture
are read out in two separate scans of the odd and even fields respectively.
 This allows good reproduction of movement in the scene at the relatively
low field rate of 50 fields/sec for PAL and SECAM and 60 fields/sec for
NTSC.
2. PAL (SECAM) Video
• 625 scan lines per frame, 25 frames per second
(40 msec/frame)
• Interlaced, each frame is divided into 2 fields,
312.5 lines/field
• Color representation:
– Uses YUV color model
PAL
 The PAL signal is a 2:1 interlaced video signal with 625
lines per frame (312.5 lines/field), 50 fields per second and
4:3 aspect ratio. The line frequency is thus 625 × 25 =
15,625 Hz (15.625 kHz), so the line period is 1/(625 × 25) = 64 µs.
 Some time is necessary for the horizontal retrace thus the
time available for encoding the picture information on each
line is less than 64µs.
 The information along a scanned line is thus superimposed
on a 64µs long signal containing a line synchronization
pulse and various blanking intervals to allow for the
horizontal retrace.
 The line signals are joined end to end as the picture is
scanned and various timing pulses are inserted to indicate
the end of each odd and even field.
 These timing pulses are called vertical synchronization
signals.
• The PAL format represents colour as YUV.
• For black and white video, the active line part of the
video signal is simply the space-varying Y component.
• For colour pictures, the colour components are
encoded using QAM to create a composite video signal
c = Y + U sin(ωt) + V cos(ωt).
• Here, ω = 2πFc, where Fc = 4.43 MHz. The term
Phase Alternate Line arises because the phase of the V
modulation is reversed by 180° on each consecutive
line.
• This allows errors in the colour subcarrier to be averaged
out in the receiver.
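A toy numerical sketch of forming the composite signal c(t) = Y + U sin(ωt) + V cos(ωt) with ω = 2πFc and Fc = 4.43 MHz; illustrative only, it ignores blanking, sync, and the per-line V phase reversal, and the Y, U, V values are made up:

import numpy as np

Fc = 4.43e6                      # colour subcarrier frequency in Hz
omega = 2 * np.pi * Fc

t = np.arange(0, 64e-6, 1e-8)    # one 64 microsecond line, 10 ns steps
Y, U, V = 0.5, 0.1, -0.05        # hypothetical constant colour on this line

c = Y + U * np.sin(omega * t) + V * np.cos(omega * t)   # composite signal
print(c.shape, c.min(), c.max())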
NTSC
• NTSC is also a 2:1 interlaced video signal. However it
has 525 lines per frame (262.5 lines/field), 60 fields per
second and 4:3 aspect ratio.
• The line frequency is thus 525 × 30 = 15,750 Hz (15.75 kHz), so the
line period is 1/(525 × 30) ≈ 63.5 µs.
• As with PAL, some time is necessary for the horizontal
retrace, so the time available for encoding the picture
information on each line is less than 63.5 µs.
• The active picture information is combined with this
video signal in a similar manner to PAL except that the
timing of the vertical synchronization pulses is different.
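The line-rate arithmetic for both standards, as a worked sketch (using the nominal frame rates from the slides):

# Nominal line frequency and line period for PAL and NTSC.
for name, lines, fps in (("PAL", 625, 25), ("NTSC", 525, 30)):
    line_freq = lines * fps                 # lines per second
    line_period_us = 1e6 / line_freq        # microseconds per line
    print(f"{name}: {line_freq} Hz, {line_period_us:.1f} us/line")
# PAL:  15625 Hz, 64.0 us/line
# NTSC: 15750 Hz, 63.5 us/line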
Frame Rate and Interlacing
 Persistence of vision: The human eye retains an image for a fraction
of a second after it views the image. This property is essential to all
visual display technologies.
 The basic idea is quite simple, single still frames are presented at a high
enough rate so that persistence of vision integrates these still frames
into motion.
 Motion pictures originally set the frame rate at 16 frames per
second.
 This was rapidly found to be unacceptable and the frame rate was
increased to 24 frames per second.
 In Europe, this was changed to 25 frames per second, as the European
power line frequency is 50 Hz.
 When NTSC television standards were introduced, the frame rate
was set at 30 Hz (1/2 the 60 Hz line frequency).
 Movies filmed at 24 frames per second are simply converted to 30
frames per second on television broadcasting.
Frame Rate and Interlacing
 For some reason, the brighter the still image presented to the
viewer, the shorter the persistence of vision. So, bright
pictures require more frequent repetition.
 If the space between pictures is longer than the period of
persistence of vision -- then the image flickers.
– Large bright theater projectors avoid this problem by placing
rotating shutters in front of the image in order to increase the
repetition rate by a factor of 2 (to 48) or three (to 72) without
changing the actual images.
– Unfortunately, there is no easy way to "put a shutter" in front of a
television broadcast! Therefore, to arrange for two "flashes" per
frame, the flashes are created by interlacing.
 With interlacing, the number of "flashes" per frame is two,
and the field rate is double the frame rate.
 Thus, NTSC systems have a field rate of 59.94 Hz and
PAL/SECAM systems a field rate of 50 Hz.
Digital Video
 Advantages over analog:
– Direct random access --> good for nonlinear video editing
– No problem for repeated recording
– No need for blanking and sync pulse
 Almost all digital video uses component video
 The human eye responds more precisely to brightness information than it does to color;
chroma subsampling (decimating) takes advantage of this.
– In a 4:4:4 scheme, each 8×8 matrix of RGB pixels converts to three YCrCb 8×8 matrices: one for
luminance (Y) and one for each of the two chrominance bands (Cr and Cb).
– A 4:2:2 scheme also creates one 8×8 luminance matrix but decimates every two horizontal pixels to create
each chrominance-matrix entry, thus reducing the amount of data to 2/3rds of a 4:4:4 scheme.
 Ratios of 4:2:0 decimate chrominance both horizontally and vertically, resulting in four Y, one Cr, and one Cb
8×8 matrix for every four 8×8 pixel-matrix sources. This conversion creates half the data required in a 4:4:4
chroma ratio.
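A small sketch of the data-rate comparison implied above, counting samples per 2x2 block of pixels (luma plus chroma) for each scheme:

# Samples per 2x2 block of pixels: 4 luma samples plus chroma samples.
schemes = {
    "4:4:4": 4 + 4 + 4,   # full-resolution Cb and Cr
    "4:2:2": 4 + 2 + 2,   # chroma halved horizontally
    "4:2:0": 4 + 1 + 1,   # chroma halved horizontally and vertically
}
for name, samples in schemes.items():
    print(f"{name}: {samples} samples, {samples / 12:.2f} of 4:4:4")
# 4:2:2 -> 2/3 of the data, 4:2:0 -> 1/2 of the data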
Chroma Subsampling
HDTV
Computer Video Format
 Depends on the i/p and o/p devices (digitizers) for motion video medium.
 Digitizers differ in frame resolution, quantization and frame rate
– IRIS video board VINO takes NTSC video signal and after digitization can
achieve frame resolution of 640x480 pixels, 8 bits/pixel and 4 fps.
– SunVideo digitizer captures NTSC video signal in the form of an RGB signal
with frame resolution of 320x240 pixels, 8 bits/pixel and 30 fps.
 Computer video controller standards
– The Color Graphics Adapter (CGA): 320 x 200 pixels x 2 bits/pixel = 16,000
bytes (storage capacity per image)
– The Enhanced Graphics Adapter (EGA): 640 x 350 pixels x 4 bits/pixel =
112,000 bytes
– The Video Graphics Array (VGA): 640 x 480 pixels x 8 bits/pixel = 307,200
bytes
– The 8514/A Display Adapter Mode: 1024 x 768 pixels x 8 bits/pixel = 786,432
bytes
– The Extended Graphics Array (XGA): 1024x768 at 256 colors or 640x480 at
65,000 colors
– The Super VGA (SVGA): Up to 1024x768 pixels x 24 bits/pixel = 2,359,296
bytes
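The storage figures above follow from width × height × bits-per-pixel / 8; a quick sketch:

# Storage capacity per uncompressed image: width * height * bits / 8 bytes.
modes = [
    ("CGA", 320, 200, 2),
    ("EGA", 640, 350, 4),
    ("VGA", 640, 480, 8),
    ("8514/A", 1024, 768, 8),
    ("SVGA", 1024, 768, 24),
]
for name, w, h, bits in modes:
    print(f"{name}: {w * h * bits // 8:,} bytes")
# CGA 16,000; EGA 112,000; VGA 307,200; 8514/A 786,432; SVGA 2,359,296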
