Computer Graphics and Multimedia Notes 3
The next figure shows the IFD (Image File Directory) and its content. The IFD is a variable-length table containing directory entries. The length of the table depends on the number of directory entries in it. The first two bytes contain the total number of entries in the table, followed by the directory entries themselves. Each directory entry consists of twelve bytes. The last item in the IFD is a four-byte pointer that points to the next IFD. The byte content of each directory entry is as follows:
The first two bytes contain the tag number (Tag ID).
The next two bytes represent the type of the data, as shown in Table 3-1 below.
The next four bytes contain the count (length) of values of that data type.
The final four bytes contain the data itself or a pointer to it.
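As a sketch, this twelve-byte directory entry can be written as a packed C structure. The field names are illustrative, not from the TIFF specification, and a real reader must also honor the byte order declared in the file header:

    #include <stdint.h>

    // One 12-byte IFD directory entry, laid out exactly as described above.
    #pragma pack(push, 1)
    typedef struct {
        uint16_t tag;    // bytes 0-1: Tag ID
        uint16_t type;   // bytes 2-3: data type (see Table 3-1)
        uint32_t count;  // bytes 4-7: count/length for the data type
        uint32_t value;  // bytes 8-11: the data, or a pointer (file
                         // offset) to it when it exceeds four bytes
    } TiffDirEntry;
    #pragma pack(pop)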
TIFF Tags
The first two bytes of each directory entry contain a field called the Tag ID. Tag IDs are grouped into several categories: Basic, Informational, Facsimile, and Document Storage and Retrieval.
TIFF Classes (Version 5.0)
It has five classes:
1. Class B for binary images
2. Class F for Fax
3. Class G for gray-scale images
4. Class P for palette color images
5. Class R for RGB full-color images.
4.9.3 Resource Interchange File Format (RIFF)
The RIFF file format consists of blocks of data called chunks. They are:
RIFF Chunk - defines the content of the RIFF file.
List Chunk - allows embedding additional file information such as archival location, copyright information, creation date, and so on.
Subchunk - allows adding information to a primary chunk when the primary chunk is not sufficient.
The first chunk in a RIFF file must be a RIFF chunk, and it may contain one or more subchunks. The first four bytes of the RIFF chunk data field are allocated to the form type field, containing four characters that identify the format of the data stored in the file: AVI, WAVE, RMID, PAL, and so on. The table below shows the form types and filename extensions used for Microsoft Windows multimedia RIFF file types.
File Type Form Type File Extension
Waveform Audio File WAVE .WAV
Audio Video Interleaved file AVI .AVI
MIDI File RMID .RMI
Device Independent Bitmap File RDIB .RDI
Palette File PAL .PAL
Each subchunk contains a four-character ASCII string ID to identify the type of data, four bytes of size containing the count of data values, and the data itself. The data structure of a subchunk is the same as that of all other chunks.
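Since every chunk shares this shape, a single C structure can describe the common eight-byte header. This is a sketch; the names are mine, not part of the RIFF specification:

    #include <stdint.h>

    // Common RIFF chunk header: a four-character ASCII ID followed by a
    // little-endian 32-bit size; `size` bytes of data follow the header.
    #pragma pack(push, 1)
    typedef struct {
        char     id[4];  // e.g. "RIFF", "LIST", "fmt ", "data"
        uint32_t size;   // byte count of the data that follows
    } RiffChunkHeader;
    #pragma pack(pop)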
RIFF chunk with two subchunks:
The first four characters of the RIFF chunk are reserved for the "RIFF" ASCII string. The next four bytes define the total data size: the 8 bytes of the RIFF chunk itself plus the size of all subchunks. The first four characters of the data field are reserved for the form type. The rest of the data field contains two subchunks:
(i) fmt - defines the recording characteristics of the waveform.
(ii) data - contains the data for the waveform.
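For PCM audio, the contents of the "fmt " subchunk mirror the conventional Windows WAVEFORMAT/PCMWAVEFORMAT layout. The following C sketch shows it with the conventional field names, presented as a sketch rather than the official header:

    #include <stdint.h>

    // Contents of the "fmt " subchunk for PCM waveform audio.
    #pragma pack(push, 1)
    typedef struct {
        uint16_t wFormatTag;      // 1 = PCM
        uint16_t nChannels;       // 1 = mono, 2 = stereo
        uint32_t nSamplesPerSec;  // e.g. 44100
        uint32_t nAvgBytesPerSec; // nSamplesPerSec * nBlockAlign
        uint16_t nBlockAlign;     // nChannels * wBitsPerSample / 8
        uint16_t wBitsPerSample;  // 8 or 16
    } WaveFmtChunkData;
    #pragma pack(pop)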
LIST Chunk
A RIFF chunk may contain one or more LIST chunks. LIST chunks allow embedding additional file information such as archival location, copyright information, creation date, and a description of the content of the file.
RIFF MIDI FILE FORMAT
RIFF MIDI contains a RIFF chunk with the form type "RMID" and a subchunk called "data" for MIDI data. The layout is: 4 bytes for the ID of the RIFF chunk, 4 bytes for the size, 4 bytes for the form type, 4 bytes for the ID of the "data" subchunk, and 4 bytes for the size of the MIDI data.
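A short C sketch of reading this layout. This is a simplified, hypothetical reader: it assumes the whole file is already in memory, that the "data" subchunk immediately follows the form type, and a little-endian host; error handling is minimal:

    #include <stdint.h>
    #include <string.h>

    // Return a pointer to the embedded MIDI data and store its size, or
    // return NULL if the buffer does not look like a RIFF RMID file.
    static const uint8_t *rmid_midi_data(const uint8_t *buf, size_t len,
                                         uint32_t *midi_size)
    {
        if (len < 20 || memcmp(buf, "RIFF", 4) != 0) return NULL;
        if (memcmp(buf + 8, "RMID", 4) != 0) return NULL;   // form type
        if (memcmp(buf + 12, "data", 4) != 0) return NULL;  // subchunk ID
        memcpy(midi_size, buf + 16, 4);  // 4-byte size (little-endian host assumed)
        return buf + 20;                 // the MIDI data itself
    }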
RIFF DIBS (Device-Independent Bit Maps)
DIB is a Microsoft Windows standard format. It defines bitmaps and color attributes for bitmaps independently of devices. DIBs are normally embedded in .BMP files, .WMF metafiles, and .CLP files.
DIB Structure:
BITMAPINFOHEADER, followed by the RGBQUAD color table, followed by the pixel data.
A RIFF DIB file contains a RIFF chunk with the form type "RDIB" and a subchunk called "data" for DIB data. The layout is: 4 bytes for the ID of the RIFF chunk, 4 bytes for the size of the file (e.g., XYZ.RDI), 4 bytes for the form type, 4 bytes for the ID of the "data" subchunk, and 4 bytes for the size of the DIB data.
RIFF Palette File Format
The RIFF Palette file format contains a RIFF chunk with the form type "RPAL" and a subchunk called "data" for palette data. The Microsoft Windows logical palette structure is enveloped in the RIFF "data" subchunk. The palette structure contains the palette version number, the number of palette entries, the intensities of the red, green, and blue colors, and flags for palette usage. The palette structure is described by the following code segment:
typedef struct tagLOGPALETTE {
    WORD palVersion;             // Windows version number for the structure
    WORD palNumEntries;          // number of palette entries
    PALETTEENTRY palPalEntry[];  // array of PALETTEENTRY data
} LOGPALETTE;
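The PALETTEENTRY elements referenced in the array above each hold one color and its usage flags; the standard Windows definition is:

typedef struct tagPALETTEENTRY {
    BYTE peRed;    // red intensity, 0-255
    BYTE peGreen;  // green intensity, 0-255
    BYTE peBlue;   // blue intensity, 0-255
    BYTE peFlags;  // flags for palette usage
} PALETTEENTRY;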
Figure: Interleaved Audio and Video for AVI Files (a header followed by successive frames, each frame interleaving an audio chunk with a video chunk)
4.9.4 MIDI File Format
The MIDI file format follows the music recording metaphor: it provides the means of storing separate tracks of music for each instrument so that they can be read and synchronized when they are played.
The MIDI file format also contains chunks (i.e., blocks) of data. There are two types of chunks:
(i) header chunks and (ii) track chunks.
Header Chunk
It is made up of 14 bytes.
The first four-character string is the identifier string, "MThd".
The second four bytes contain the data size for the header chunk; it is set to a fixed value of six bytes.
The last six bytes contain the data for the header chunk.
The table below shows an example of a header chunk.
Header Field Byte # Value
Identifier String 1–4 4D 54 68 64
Data Size 5–8 00 00 00 06
Data 9 – 14 00 00 00 01 01 E0
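In this example, the six data bytes decode as format 00 00 (format 0), number of tracks 00 01 (one track), and division 01 E0 (0x01E0 = 480 ticks per quarter note). MIDI files store multi-byte values big-endian, so a reader must assemble them explicitly; a minimal C sketch (the names are mine):

    #include <stdint.h>

    // Assemble a big-endian 16-bit value from two bytes of a MIDI file.
    static uint16_t be16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }

    // Decode the six data bytes of an "MThd" header chunk (no validation).
    typedef struct { uint16_t format, ntracks, division; } MidiHeader;

    static MidiHeader parse_mthd_data(const uint8_t *data)
    {
        MidiHeader h;
        h.format   = be16(data);      // 00 00 -> format 0
        h.ntracks  = be16(data + 2);  // 00 01 -> one track chunk
        h.division = be16(data + 4);  // 01 E0 -> 480 ticks per quarter note
        return h;
    }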
Track Chunk
The track chunk is organized as follows:
The first four-character string is the identifier, "MTrk".
The second four bytes contain the track length.
The rest of the chunk contains MIDI messages.
MIDI Communication Protocol
This protocol uses messages of two or more bytes; the number of bytes depends on the type of message. There are two types of messages: (i) channel messages and (ii) system messages.
Channel Messages
A channel message can have up to three bytes. The first byte is called the status byte, and the other two are called data bytes. The channel number, which addresses one of the 16 channels, is encoded in the lower nibble of the status byte. Each MIDI voice has a channel number, and messages are sent to the channel whose channel number matches the one encoded in the lower nibble of the status byte. There are two types of channel messages: voice messages and mode messages.
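Because the channel number sits in the lower nibble of the status byte, both the message type and the channel can be extracted with simple masks, as in this small C sketch:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint8_t status = 0x93;  // example status byte: Note On, channel 4
        if (status & 0x80) {    // status bytes have the top bit set
            uint8_t type    = status >> 4;          // upper nibble: 0x9 = Note On
            uint8_t channel = (status & 0x0F) + 1;  // lower nibble: channels 1-16
            printf("type=0x%X channel=%u\n", type, channel);
        }
        return 0;
    }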
Voice messages
Voice messages are used to control the voice of the instrument (or device); that is, to switch notes on or off, to send key pressure messages indicating that a key is depressed, and to send control messages for effects such as vibrato, sustain, and tremolo. Pitch wheel messages are used to change the pitch of all notes.
Mode messages
Mode messages are used for assigning voice relationships for up to 16 channels; that is, for setting the device to MONO mode or POLY mode. Omni Mode On enables the device to receive voice messages on all channels.
System Messages
System messages apply to the complete system rather than specific channels and do not contain
any channel numbers. There are three types of system messages: common messages, real-time
messages, and exclusive messages. In the following, we will see how these messages are used.
Common Messages
These messages are common to the complete system. These messages provide for functions
such as select a song, setting the song position pointer with number of beats, and sending a
tune request to an analog synthesizer.
System Real Time Messages
These messages are used for setting the system's real-time parameters. These parameters
include the timing clock, starting and stopping the sequencer, resuming the sequencer from a
stopped position, and resetting the system.
System Exclusive Messages
These messages contain manufacturer-specific data such as identification, serial number, model number, and other information. A standard file format is used here so that the data can be moved across platforms and applications.
JPEG Motion Image:
Motion JPEG images can be embedded in the AVI RIFF file format.
There are two standards available:
MPEG - has patent and copyright issues.
MPEG-2 - provides better resolution and picture quality.
4.9.5 TWAIN
A standard interface was designed to allow applications to interface with different types of input devices, such as scanners and digital still cameras, using a generic TWAIN interface without creating device-specific drivers. The benefits of this approach are as follows:
1. Application developers can code to a single TWAIN specification that allows applications to interface with all TWAIN-compliant input devices.
2. Device manufacturers can write device drivers for their proprietary devices and, by complying with the TWAIN specification, allow the devices to be used by all TWAIN-compliant applications.
TWAIN Specification Objectives
The TWAIN specification was started with a number of objectives:
Supports multiple platforms: including Microsoft Windows, Apple Macintosh OS System 6.x or 7.x, UNIX, and IBM OS/2.
Supports multiple devices: including scanners, digital cameras, frame grabbers, etc.
Standard extensibility and backward compatibility: the TWAIN architecture is extensible to new types of devices and new device functionality, and new versions of the specification are backward compatible.
Easy to use: The standard is well documented and easy to use.
The TWAIN architecture defines a set of application programming interfaces (APIs) and a protocol to acquire data from input devices. It is a layered architecture consisting of a protocol layer and an acquisition layer sandwiched between the application and device layers. The protocol layer is responsible for communication between the application and acquisition layers. The acquisition layer contains the virtual device driver that controls the device. This virtual layer is also called the source.
TWAIN ARCHITECTURE:
As noted above, the TWAIN architecture is layered, with four layers: the application layer, the protocol layer, the acquisition layer, and the device layer.
Application Layer:
A TWAIN application sets up a logical connection with a device. TWAIN does not impose any rules on the design of an application. However, it sets guidelines for the user interface to select sources (logical devices) from a given list, and it also specifies user interface guidelines for acquiring data from the selected sources.
The Protocol Layer:
The application layer interfaces with the protocol layer. The protocol layer is responsible for
communications between the application and acquisition layers. The protocol layer does not
specify the method of implementation of sources, physical connection to devices, control of
devices, and other device-related functionality. This clearly highlights that applications are
independent of sources. The heart of the protocol layer, as shown in the figure, is the Source Manager. It manages all sessions between an application and the sources, and it monitors data acquisition transactions.
The functionality of the Source Manager is as follows:
Provides a standard API for all TWAIN-compliant sources
Provides selection of sources for a user from within an application
Establishes logical sessions between applications and sources, and also manages sessions between multiple applications and multiple sources
Acts as a traffic cop to make sure that transactions and communications are routed to the appropriate sources, and also validates all transactions
Keeps track of sessions and unique session identities
Loads or unloads sources as demanded by an application
Passes all return codes from the source to the application
Maintains a default source
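To make the Source Manager's role concrete, here is a minimal, hypothetical C sketch of a session's life cycle. The function and type names are illustrative only; the real TWAIN API funnels every request through a single entry point using (data group, data type, message) triplets rather than named calls like these:

    #include <stdio.h>

    // Hypothetical pseudo-API: stubs that only trace the session flow
    // the Source Manager coordinates, as described above.
    typedef struct { const char *name; } Source;

    static void   open_source_manager(void)  { puts("Source Manager loaded"); }
    static Source select_source(void)        { Source s = {"Scanner"}; return s; }
    static void   open_source(Source *s)     { printf("session open: %s\n", s->name); }
    static void   transfer_data(Source *s)   { printf("data from %s\n", s->name); }
    static void   close_source(Source *s)    { printf("session closed: %s\n", s->name); }
    static void   close_source_manager(void) { puts("Source Manager unloaded"); }

    int main(void)
    {
        open_source_manager();         // load the Source Manager
        Source src = select_source();  // user picks a logical device
        open_source(&src);             // establish a logical session
        transfer_data(&src);           // acquire in the negotiated format
        close_source(&src);            // end the session
        close_source_manager();
        return 0;
    }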
The Acquisition Layer:
The acquisition layer contains the virtual device driver; it interacts directly with the device driver. This virtual layer is also called the source. The source can be local and logically connected to a local device, or remote and logically connected to a remote device (i.e., a device over the network).
The source performs the following functions:
Control of the device.
Acquisition of data from the device.
Transfer of data in an agreed (negotiated) format; the data can be transferred in native format or another filtered format.
Provision of a user interface to control the device.
The Device Layer:
The purpose of the device driver is to receive software commands and control the device
hardware accordingly. This is generally developed by the device manufacturer and shipped
with the device.
NEW WAVE RIFF File Format: This format contains two subchunks:
1. fmt
2. data
It may contain optional subchunks:
1. Fact chunk
2. Cue points chunk
3. Playlist chunk
4. Associated data chunk
5. Inst (instrument) chunk
Fact Chunk: It stores file-dependent information about the contents of the WAVE file.
Cue Points Chunk: It identifies a series of positions in the waveform data stream (a sketch of one entry appears after this list of chunk descriptions).
Playlist Chunk: It specifies a play order for a series of cue points.
Associated Data Chunk: It provides the ability to attach information, such as labels, to sections
of the waveform data stream.
Inst Chunk: It stores the information a sampling synthesizer needs to play the file's waveform data as instrument samples.
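As referenced above, each entry in the cue points chunk can be pictured with the following C sketch. The layout and field names follow the commonly documented Windows multimedia convention and should be treated as an assumption here, not as text from these notes:

    #include <stdint.h>

    // One cue point entry: the chunk data holds a 32-bit count of entries
    // followed by this structure repeated (layout per the commonly
    // documented Windows multimedia convention; an assumption here).
    #pragma pack(push, 1)
    typedef struct {
        uint32_t dwName;         // unique identifier for the cue point
        uint32_t dwPosition;     // position of the cue in the play order
        char     fccChunk[4];    // chunk containing the cue, e.g. "data"
        uint32_t dwChunkStart;   // file offset of that chunk (0 if only one)
        uint32_t dwBlockStart;   // offset of the block containing the cue
        uint32_t dwSampleOffset; // sample offset of the cue within the block
    } WaveCuePoint;
    #pragma pack(pop)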
Figure: Voice recognition system (an analog-to-digital converter feeds amplitude and noise normalization and then parametric analysis; the result is either used to train reference patterns or, in recognize mode, the unknown pattern is compared with the reference patterns)
4. Insertion Error
Insertion error = (Number of insertion errors / Number of test words) × 100
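For example, if a recognizer inserts 5 spurious words while processing 200 test words, the insertion error is (5 / 200) × 100 = 2.5%.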
System Performance:
System performance is dependent on the size of the vocabulary, speaker independence,
response time, user interface, and system throughput.
Voice Recognition Applications
Voice mail integration: The voice-mail message can be integrated with e-mail messages to
create an integrated message.
Database Input and Query Applications
A number of applications have been developed around the voice recognition and voice synthesis functions. The following lists a few applications that use voice recognition.
Applications such as order entry and tracking: this is a centralized server function; remote users can dial into the system to enter an order or to track it by making a voice query.
Voice-activated rolodex or address book: when a user speaks the name of a person, the rolodex application searches for the name and voice-synthesizes the name, address, telephone numbers, and fax numbers of the selected person. In a medical emergency, ambulance technicians can dial in and register patients by speaking into the hospital's centralized system.
Police can make a voice query through a central database to take follow-up action if they catch a suspect.
Language-teaching systems are an obvious use for this technology. The system can ask the student to spell or speak a word. When the student speaks or spells the word, the system performs voice recognition and measures the student's ability to spell. Based on the student's ability, the system can adjust the level of the course, creating a self-adjusting learning system that follows the individual's pace.
Foreign language learning is another good application, where an individual student can input words and sentences into the system. The system can then correct the pronunciation or grammar.