MM 2
The process of capturing digital images depends initially upon the image’s origin, that is, whether it is a
real-world picture or an already digital image. Real-world pictures are captured with an image capturing
device, such as a CCD scanner or CCD camera for still images, or a frame grabber for moving images.
Graphics are generated by use of interactive graphic systems. Digital images are normally very large, and
the ways to store them are given below.
Image Formats
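Image formats are basically of two kinds: the captured image format and the stored image format.
Captured Image Format
This is the format that comes out of an image frame grabber, such as the VideoPix card, Parallax,
etc. It is specified mainly by two parameters:
• Spatial resolution, given as pixels × pixels.
• Color encoding, given as bits per pixel.
Both these parameter values depend on hardware and software for the input/output of images.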
For example, for image capturing on a SPARCstation, the VideoPix card and its software are used. The
spatial resolution is 320 × 240 pixels and the color can be encoded with 1-bit (a binary image format),
8-bit (color or grayscale) or 24-bit (color-RGB).
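To make the effect of the two parameters concrete, the following sketch estimates the uncompressed size of a captured image as width × height × bits per pixel. The function name raw_image_bytes is hypothetical and only illustrative; the resolution and bit depths are the VideoPix figures quoted above, and headers or row padding that a real file format would add are ignored.

```python
def raw_image_bytes(width, height, bits_per_pixel):
    """Uncompressed size in bytes, ignoring headers and row padding."""
    total_bits = width * height * bits_per_pixel
    return (total_bits + 7) // 8   # round up to whole bytes

# Sizes for the three color encodings mentioned in the text.
sizes = {bpp: raw_image_bytes(320, 240, bpp) for bpp in (1, 8, 24)}
for bpp, size in sizes.items():
    print(f"{bpp:2d}-bit: {size} bytes")
```

The 24-bit image needs 24 times the storage of the 1-bit image at the same resolution, which is one reason images are often compressed before storage.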
Stored Image Format
While storing an image, we store a two-dimensional array of values, in which each value
represents the data associated with a pixel in the image. For a bitmap, this value is a binary digit. For a
color image (pixmap), the value may be a collection of:
• Three numbers representing the intensities of the red, green and blue components of the color at that
pixel.
• Three numbers that are indices to tables of the red, green and blue intensities.
• An index to any number of other data structures that can represent a color.
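The alternatives listed above can be sketched as small Python data structures. This is only an illustration under stated assumptions: the table contents, the variable names and the helper functions lookup_channels and lookup_palette are all hypothetical, and real file formats pack these values into bytes rather than nested lists.

```python
# Bitmap: each pixel value is a single binary digit.
bitmap = [
    [0, 1, 0],
    [1, 1, 1],
]

# Pixmap, direct color: each pixel holds its own (red, green, blue) intensities.
rgb_pixmap = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]

# Pixmap, per-channel tables: each pixel holds three indices, one per table.
red_table = [0, 128, 255]
green_table = [0, 128, 255]
blue_table = [0, 128, 255]

def lookup_channels(pixel):
    """Turn three table indices into red, green and blue intensities."""
    r, g, b = pixel
    return (red_table[r], green_table[g], blue_table[b])

# Pixmap, single index: each pixel is one index into a color table (palette).
palette = [(0, 0, 0), (255, 0, 0), (0, 255, 0), (0, 0, 255)]

def lookup_palette(index):
    """Turn one palette index into the color it refers to."""
    return palette[index]

print(lookup_channels((2, 0, 1)))
print(lookup_palette(3))
```

Indexed forms trade a table lookup for smaller per-pixel values, since an index usually needs fewer bits than a full color.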
The image may be compressed before storage for saving storage space. Some current image file
formats for storing images include GIF, X11 Bitmap, Sun Rasterfile, PostScript, IRIS, JPEG, TIFF, etc.
Graphics Format
Graphics image formats are specified through graphics primitives and their attributes.
• Graphics primitives include lines, rectangles, etc. specifying 2D objects, or polyhedra, etc. specifying
3D objects. A graphics package determines which primitives are supported.
• Attributes of the graphics primitives include line style, line width, color effect, etc., that affect the
outcome of the graphical image.
Graphics primitives and their attributes represent a higher level of image representation,
where the graphical images are not represented by a pixel matrix. This higher-level representation
must be converted into a bitmap or pixmap at some point, for example when the image is displayed.
A bitmap is an array of pixel values with one bit for each pixel. A pixmap is an array of pixel
values with multiple bits (e.g., 8 bits for 256 colors) for each pixel.
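The relationship between the graphics format and a bitmap can be sketched as follows: a scene is stored as a short list of primitives with attributes, and is only turned into a pixel array when needed. The dictionary layout, the primitive names "hline" and "rect", and the function rasterize are hypothetical choices for this illustration, not the interface of any particular graphics package.

```python
def rasterize(primitives, width, height):
    """Convert a list of primitive descriptions into a 1-bit bitmap."""
    bitmap = [[0] * width for _ in range(height)]
    for p in primitives:
        if p["type"] == "hline":
            # line_width is an attribute: it thickens the line downwards.
            for dy in range(p.get("line_width", 1)):
                for x in range(p["x0"], p["x1"] + 1):
                    bitmap[p["y"] + dy][x] = 1
        elif p["type"] == "rect":
            x0, y0, x1, y1 = p["x0"], p["y0"], p["x1"], p["y1"]
            for y in range(y0, y1 + 1):
                for x in range(x0, x1 + 1):
                    on_border = x in (x0, x1) or y in (y0, y1)
                    # filled is an attribute: outline only unless set.
                    if on_border or p.get("filled", False):
                        bitmap[y][x] = 1
    return bitmap

# Two primitives describe the whole scene; the bitmap needs 64 values.
scene = [
    {"type": "rect", "x0": 1, "y0": 1, "x1": 6, "y1": 4},
    {"type": "hline", "x0": 0, "x1": 7, "y": 6, "line_width": 2},
]
bitmap = rasterize(scene, 8, 8)
for row in bitmap:
    print("".join("#" if v else "." for v in row))
```

The scene description stays the same size however large the output bitmap is, which illustrates why the higher-level representation can reduce the data stored per image.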
Q2. Audio representation in computer?
Each vertical bar in the figure represents a single sample. The height of a bar indicates the value of that
sample. The mechanism that converts an audio signal into digital samples is called an analog-to-digital
converter, or ADC. To convert a digital signal back to analog, you need a digital-to-analog converter, or
DAC.
Sampling: A sound wave is smooth and continuous, so it cannot be represented directly in the computer. The
computer measures the amplitude of the waveform at regular time intervals to produce a series of
numbers. Each of these measurements is called a sample. This process is called sampling.
Sampling rate: The rate at which a continuous waveform is sampled is called the sampling rate. Like
frequency, the sampling rate is measured in Hz. To digitize a signal without losing information, the
sampling rate should be at least twice the maximum frequency present in the signal.
Quantization: Just as a waveform is sampled at discrete times, the value of each sample is also discrete. The
quantization of the sample value depends on the number of bits used in measuring the height of
the waveform. Fewer bits per sample give lower sound quality; more bits per sample give higher sound
quality.
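Sampling and quantization can be illustrated with a short sketch that does in software what an ADC does in hardware. The 440 Hz test tone, the 8000 Hz sampling rate, the uniform quantizer and the function names are all assumptions chosen for the example, not values from the text.

```python
import math

def sample_and_quantize(freq_hz, sample_rate_hz, bits, duration_s):
    """Sample a unit-amplitude sine wave and quantize each sample to `bits` bits."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)                # spacing of levels across [-1, 1]
    n_samples = int(round(sample_rate_hz * duration_s))
    exact, quantized = [], []
    for n in range(n_samples):
        t = n / sample_rate_hz               # sampling: regular time intervals
        value = math.sin(2 * math.pi * freq_hz * t)
        code = round((value + 1.0) / step)   # quantization: nearest level index
        exact.append(value)
        quantized.append(code * step - 1.0)  # value the stored code stands for
    return exact, quantized

def max_error(bits):
    """Largest difference between the true and the quantized sample values."""
    exact, quantized = sample_and_quantize(440, 8000, bits, 0.01)
    return max(abs(a - b) for a, b in zip(exact, quantized))

for bits in (4, 8, 16):
    print(bits, "bits per sample: max quantization error", max_error(bits))
```

The error shrinks as the number of bits grows, which is the sense in which more bits per sample give higher quality. The 8000 Hz rate is more than twice the 440 Hz tone, so the sampling-rate condition is also met.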
Human Speech
Speech is based on spoken languages, which means that it has a semantic content. Human beings use
their speech organs without the need to knowingly control the generation of sounds. (Other species
such as bats also use acoustic signals to transmit information, but we will not discuss this here.) Speech
understanding means the efficient adaptation to speakers and their speaking habits. Despite the large
number of different dialects and emotional pronunciations, we can understand each other’s language.
The brain is capable of achieving a very good separation between speech and interference, using the
signals received by both ears. It is much more difficult for humans to filter signals
received in one ear only. The brain corrects speech recognition errors because it understands the
content, the grammar rules, and the phonetic and lexical word forms.
Speech signals have two important characteristics that can be used by speech processing applications:
• Voiced speech signals (in contrast to unvoiced sounds) have an almost periodic structure over a
certain time interval, so that these signals remain quasi-stationary for about 30ms.
• The spectrum of some sounds has characteristic maxima that normally involve up to five
frequencies. These frequency maxima, generated when speaking, are called formants. By definition, a
formant is a characteristic component of the quality of an utterance.
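One way a speech processing application can exploit the quasi-stationary property is to cut the signal into short frames of about 30 ms and analyze each frame separately. The sketch below shows only that framing step. The 8000 Hz sampling rate, the non-overlapping frames and the function name split_into_frames are assumptions for illustration.

```python
def split_into_frames(samples, sample_rate_hz, frame_ms=30):
    """Cut a list of samples into consecutive, non-overlapping frames."""
    frame_len = sample_rate_hz * frame_ms // 1000   # samples per frame
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

signal = [0.0] * 8000        # placeholder: one second of silence at 8000 Hz
frames = split_into_frames(signal, 8000)
print(len(frames), "frames of", len(frames[0]), "samples each")
```

Each frame can then be treated as if its spectrum were fixed, for example when searching for formants. Samples left over at the end that do not fill a whole frame are dropped in this sketch.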
Speech Synthesis
Computers can translate an encoded description of a message into speech. This scheme is
called speech synthesis. A particular type of synthesis is text-to-speech conversion. Fair-quality
text-to-speech software has been commercially available for various computers and workstations, although
the speech produced by some systems lacks naturalness.
Speech recognition is normally achieved by drawing various comparisons. With the current
technology, a speaker-dependent recognition of approximately 25,000 words is possible. The problems
in speech recognition affecting the recognition quality include dialects, emotional pronunciations, and
environmental noise. It will probably take some time before the considerable performance discrepancy
between the human brain and a powerful computer is bridged, so that speech
recognition and speech generation can improve.
The methods used to reconstruct images include the Radon transform and stereoscopy.
Radon Transform
The Radon transform is a mathematical technique with profound implications in medical imaging,
particularly in computed tomography (CT) scans. The underlying principle involves
capturing a series of X-ray projections of an object from multiple angles. Each projection value represents
the integrated X-ray attenuation along a specific line through the object, and the full set of projections
is the Radon transform of the object. These projections are then combined by inverting the Radon
transform to reconstruct a detailed cross-sectional image, commonly referred to as a "tomogram."
In a CT scan, an X-ray source emits X-rays through the object, and a detector measures the intensity of
the X-rays that have passed through it. By rotating the X-ray source and detector around the object,
multiple projections are acquired from various angles. The reconstruction step mathematically processes
these projections, essentially "back-projecting" the intensity values to their corresponding positions in
the reconstructed image. The result is a cross-sectional image that reveals internal structures of the
object without the need for physical dissection.
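The projection and back-projection steps described above can be sketched in a few lines. This is a deliberately simplified illustration, not a clinical reconstruction algorithm: it uses a tiny square grid, assigns each pixel to its nearest detector bin, and back-projects without the filtering step that practical CT adds. The function names bin_index, radon and back_project are hypothetical.

```python
import math

def bin_index(x, y, theta, centre, n_bins):
    """Detector bin hit by pixel (x, y) when projecting at angle theta."""
    s = (x - centre) * math.cos(theta) + (y - centre) * math.sin(theta)
    return int(round(s + n_bins / 2))

def radon(image, angles_deg):
    """Projection: for each angle, sum pixel values into detector bins."""
    n = len(image)
    centre, n_bins = (n - 1) / 2.0, 2 * n    # 2n bins cover the diagonal
    sinogram = []
    for a in angles_deg:
        theta = math.radians(a)
        row = [0.0] * n_bins
        for y in range(n):
            for x in range(n):
                row[bin_index(x, y, theta, centre, n_bins)] += image[y][x]
        sinogram.append(row)
    return sinogram

def back_project(sinogram, angles_deg, n):
    """Unfiltered back-projection: smear every projection back over the grid."""
    centre, n_bins = (n - 1) / 2.0, 2 * n
    recon = [[0.0] * n for _ in range(n)]
    for row, a in zip(sinogram, angles_deg):
        theta = math.radians(a)
        for y in range(n):
            for x in range(n):
                recon[y][x] += row[bin_index(x, y, theta, centre, n_bins)]
    return recon

# A single bright point: its back-projection should peak at the same place.
size = 7
image = [[0.0] * size for _ in range(size)]
image[2][4] = 1.0
angles = list(range(0, 180, 15))
sinogram = radon(image, angles)
recon = back_project(sinogram, angles, size)
peak = max((recon[y][x], x, y) for y in range(size) for x in range(size))
print("peak value", peak[0], "at x =", peak[1], "y =", peak[2])
```

Because the back-projection here is unfiltered, the reconstruction is a blurred version of the original. The peak still marks the position of the point, which is enough to show how projections taken from many angles locate structure inside the object.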
Stereoscopy
Stereoscopy, on the other hand, is a visual technique that aims to mimic the natural perception
of depth by our human visual system. This technique capitalizes on the fact that our eyes are positioned
slightly apart, giving each eye a slightly different view of the same scene. Our brain processes these
distinct views to perceive depth and spatial relationships in the environment.
To replicate this effect artificially, stereoscopy involves capturing or generating two separate
images of a scene, with a slight offset to simulate the viewpoint difference between our eyes. These
images are presented to each eye separately using specialized glasses or devices. When the brain
receives these distinct images, it fuses them to create a perception of depth. Objects appear to be at
different distances from the viewer, and the resulting experience is often referred to as a "3D effect."
Conventional Systems
Conventional systems are used in black-and-white and color television. Conventional television
systems employ the following standards:
NTSC (National Television System Committee)
• NTSC, developed in the U.S., is the oldest and most widely used television standard.
• The color carrier is used with approximately 4.429 MHz or with approximately
3.57 MHz.
• NTSC uses quadrature amplitude modulation with a suppressed color carrier and
works with a motion frequency of approximately 30 Hz.
• 4:3 aspect ratio.
• 525 lines per frame.
• 30 frames per second.
• Scanned in fields (interlaced).
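The NTSC figures in the list above imply a few further numbers, worked out in the short sketch below. The line count and frame rate are the nominal values from the list; the assumption that each frame is split into two fields is the usual meaning of scanning in fields and is labeled as such in the code.

```python
LINES_PER_FRAME = 525
FRAMES_PER_SECOND = 30       # nominal value from the list above
FIELDS_PER_FRAME = 2         # assumption: two interlaced fields per frame

lines_per_second = LINES_PER_FRAME * FRAMES_PER_SECOND
fields_per_second = FRAMES_PER_SECOND * FIELDS_PER_FRAME
lines_per_field = LINES_PER_FRAME / FIELDS_PER_FRAME

print("lines scanned per second:", lines_per_second)
print("fields per second:", fields_per_second)
print("lines per field:", lines_per_field)
```

The half line in each field is what lets the lines of the second field fall between the lines of the first.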
Television is the most important application that has driven the development of motion video.
Television is a telecommunication medium for transmitting and receiving moving images that can be
monochrome (black and white) or colored, with or without accompanying sound. Television may
also refer specifically to a television set, television programming or television transmission.
PAL
Multimedia System (CMP 366.3)
2] Enhanced Systems
Enhanced Definition Television Systems (EDTV) are conventional systems modified to offer
improved vertical and/or horizontal resolution. EDTV is an intermediate solution on the way to digital
interactive television systems and their coming standards.
• Permits several levels of picture resolution similar to that of high-quality computer monitors,
with 720 or 1080 lines (1280×720 pixels or 1920×1080 pixels).
• Uses MPEG-2 compression to squeeze the video into a data flow of about 19 megabits per second so that
it can be accommodated by a standard broadcast TV channel of 6 MHz bandwidth.
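A rough calculation shows why compression is needed to reach a rate of about 19 megabits per second. The sketch below compares that rate with the uncompressed rate at the two resolutions listed. The 24 bits per pixel and 30 frames per second are assumptions made only for this estimate, and the function name raw_video_bitrate is hypothetical.

```python
def raw_video_bitrate(width, height, bits_per_pixel, frames_per_second):
    """Uncompressed video data rate in bits per second."""
    return width * height * bits_per_pixel * frames_per_second

CHANNEL_BITRATE = 19_000_000     # about 19 megabits per second, from the text
BITS_PER_PIXEL = 24              # assumption: 8 bits each for red, green, blue
FRAMES_PER_SECOND = 30           # assumption for this estimate

ratios = {}
for width, height in ((1280, 720), (1920, 1080)):
    raw = raw_video_bitrate(width, height, BITS_PER_PIXEL, FRAMES_PER_SECOND)
    ratios[(width, height)] = raw / CHANNEL_BITRATE
    print(f"{width}x{height}: raw {raw / 1e6:.0f} Mbit/s, "
          f"compression of roughly {ratios[(width, height)]:.0f} to 1 needed")
```

Under these assumptions the compressor has to reduce the data by a factor of several tens before the picture fits the channel.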