Chapter 3 - Data Formats

Uploaded by

Sangdup Tamang

Chapter 3 – Data Formats

· This chapter covers the following main topics:


o Introduction
o Alphanumeric character data representation
o Image data representation
o Video data representation
o Audio data representation
o Data Compression

Introduction

· Computers store and process data using the binary number system regardless of the type of data
· Humans, on the other hand, communicate using numbers, text, images, and sound (i.e. multimedia)
· Therefore, it is almost always necessary to:
o Convert multimedia data into binary format for computer storage and processing
o Convert computer binary data into multimedia format for human presentation
· Figure 3.1 on page 62 shows an example of the text data conversion process
o Data is entered into the computer through an input device
o Data is converted into the computer's binary format for storage and processing
· The detailed internal format of the data depends mainly on:
o Complexity and capability of the input device
o Complexity and capability of the output device
o Required accuracy and resolution of the data
o Complexity and capability of the Software processing the data
· Additional data describing the meaning of data is needed when storing and transmitting data
· This is necessary to allow reproduction of the data
· Application programs define the data formats they can recognize and process
· There are 2 types of data format:
o Standard format
§ Publicly known and published data format
§ Allows wide variety of hardware and software to recognize the data
§ Allows many different systems to share and exchange data
§ Standards are designed for high resolution, convenience, high-performance processing, efficient storage, flexibility, and other criteria
§ Some of these objectives may conflict and require trade-offs (e.g. resolution versus performance)
o Proprietary format
§ Unpublished format and only known to a particular system
§ Other systems cannot recognize the data in its original format
§ Data must be converted before it can be stored or processed by a different system
§ Example: WordPerfect and Microsoft Word use different data formats

Alphanumeric Character Data Representation

· Alphanumeric data consists of the letters, numbers, punctuation, and symbols of a particular language (e.g. English)
· Majority of data used in computers is alphanumeric
· Keyboard is the main device for inputting alphanumeric data
· Alternative methods of inputting alphanumeric data also exist:
o Bar Code Reader
§ An optical reader is used to scan a printed code
§ A bar code translation module translates the binary input into a sequence of numbers

o Magnetic Stripe Reader


§ Works much like magnetic tape
§ Alphanumeric data is stored on the card magnetic stripe
§ Special reader reads the data from the magnetic stripe

o Scanning with Optical Character Recognition (OCR)


§ Scan printed or handwritten text using an optical scanner
§ OCR software translates the scanned text images into alphanumeric data

o Voice Input
§ Increasingly popular due to improvements in voice recognition technology
§ Complex translation process is required
§ Voice data is translated into alphanumeric data

· Numeric data is accepted as characters and is converted, by software, to numeric form
· Numeric data representation is covered in more detail in chapters 4 and 5
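
The conversion of typed digit characters into a numeric value can be sketched as follows. This is a minimal illustration, assuming the digit characters have consecutive codes (as they do in ASCII and Unicode); the function name `digits_to_int` is hypothetical, and real software would normally rely on a built-in such as Python's `int()`.

```python
def digits_to_int(text):
    """Convert a string of digit characters (e.g. '472') to an integer."""
    value = 0
    for ch in text:
        if not "0" <= ch <= "9":
            raise ValueError(f"not a decimal digit: {ch!r}")
        # The codes for '0'..'9' are consecutive, so subtracting the code
        # of '0' yields the numeric value of the digit.
        value = value * 10 + (ord(ch) - ord("0"))
    return value


print(digits_to_int("472") + 1)  # numeric addition: 473
print("472" + "1")               # character concatenation: 4721
```

The two print statements show why the conversion matters: the same keystrokes behave differently depending on whether they are held as characters or as a number.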

Encoding Standards
· Encoding is the process of converting alphanumeric data into binary format for storage and processing
· Decoding is the process of converting the binary representation back into alphanumeric characters
· Each supported alphanumeric character has a unique numeric code (e.g. 0 – 255 in 8-bit ASCII)
· Codes are ordered; this order is known as the collating sequence (e.g. 0 – 9, A – Z, a – z in ASCII)
· A coding standard defines the codes for the supported characters
· Standards define two classes of codes
o Printing characters: characters that can be displayed or printed
o Control characters: characters that cannot be displayed or printed
· 3 alphanumeric coding standards are in common use:
o ASCII (American Standard Code for Information Interchange)
§ Originally developed as a 7-bit code supporting 2^7 = 128 characters
§ Extended to an 8-bit code to support an additional 128 characters (2^8 = 256 characters)
§ ASCII is an ANSI standard and is also standardized internationally by ISO (ISO/IEC 646)
§ See English ASCII table in Figure 3.3, page 67

o EBCDIC (Extended Binary Coded Decimal Interchange Code)


§ 8-bit code supporting 2^8 = 256 characters
§ Developed by IBM and supported mainly by the IBM mainframe computers
§ See English EBCDIC table in Figure 3.4, page 68

o Unicode
§ Originally a 16-bit international standard supporting 2^16 = 65,536 characters (later versions extend beyond 16 bits)
Ø 49,000 characters have been defined for common use
Ø 6,400 are reserved for private use
Ø 10,000 are reserved for future expansion
§ Multi-language; developed to overcome ASCII and EBCDIC limitations
§ ASCII is a subset of Unicode (i.e. 7-bit ASCII corresponds to the first 128 Unicode characters, and the Latin-1 8-bit extension to the first 256)
§ Conversion between 16-bit Unicode and ASCII is very simple (add/remove a zero-valued byte on the left)
§ Unicode is increasingly replacing ASCII and EBCDIC as the encoding of choice in most modern systems
§ Windows uses Unicode in an attempt to be accepted as a universal operating system
§ See Unicode table in Figure 3.5, page 69
· Storage requirements for alphanumeric data are very low
· 1 byte of storage is required per character in the case of ASCII/EBCDIC and 2 bytes in the case of 16-bit Unicode
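
Encoding, decoding, the collating sequence, and the per-character storage cost can be illustrated with a short sketch. It uses Python's built-in `ord`, `str.encode`, and `bytes.decode`; the codec name `utf-16-be` is chosen here as a stand-in for a 16-bit Unicode encoding, and the sample text is arbitrary.

```python
text = "Hi!"

# Encoding: each character maps to a numeric code
codes = [ord(ch) for ch in text]
print(codes)                               # [72, 105, 33]
print([format(c, "07b") for c in codes])   # the same codes as 7-bit patterns

# Storage: 1 byte per character in ASCII, 2 bytes in a 16-bit encoding
ascii_bytes = text.encode("ascii")
utf16_bytes = text.encode("utf-16-be")
print(len(ascii_bytes), len(utf16_bytes))  # 3 6

# For ASCII characters the 16-bit code is the ASCII code with a zero byte on the left
print(utf16_bytes.hex(" "))                # 00 48 00 69 00 21

# Decoding: bytes back to characters
decoded = ascii_bytes.decode("ascii")

# Collating sequence: sorting follows the numeric order of the codes
ordered = sorted(["a", "Z", "5"])
print(ordered)                             # ['5', 'Z', 'a']
```

The sorted output shows that in ASCII the digits come before the uppercase letters, which come before the lowercase letters.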

Keyboard Operation
· Keyboard input is handled using the following process
o 2 binary codes, known as scan codes, are assigned to each key
o One scan code is generated when the key is pressed and another when the key is released
o The 2 codes are necessary to allow detection of multiple-key combinations (e.g. Shift + A)
o Scan codes are translated into the appropriate Unicode/ASCII/EBCDIC code
· Keyboard input is treated as a sequential stream of characters, including the carriage return
· See Figure 3.7 on page 72 for an illustration of keyboard operation (note the mistake in the figure: the code for I should be 1001001)
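
The press/release mechanism can be sketched as a small translator. This is an illustration only: the scan-code values in `MAKE_CODES` and the rule that a release code equals the press code with the top bit set are assumptions modelled on a common PC keyboard convention, and the names `MAKE_CODES` and `translate` are hypothetical.

```python
# Assumed press ("make") codes; the release ("break") code is the press code | 0x80.
MAKE_CODES = {0x1E: "a", 0x17: "i", 0x2A: "SHIFT"}


def translate(scan_codes):
    """Turn a stream of press/release scan codes into characters."""
    shift_down = False
    out = []
    for code in scan_codes:
        released = bool(code & 0x80)
        key = MAKE_CODES.get(code & 0x7F)
        if key is None:
            continue  # unknown key: ignored in this sketch
        if key == "SHIFT":
            # Shift counts as held from its press code until its release code
            shift_down = not released
        elif not released:
            out.append(key.upper() if shift_down else key)
    return "".join(out)


# Shift down, I down, I up, Shift up, A down, A up
stream = [0x2A, 0x17, 0x97, 0xAA, 0x1E, 0x9E]
typed = translate(stream)
print(typed)                    # Ia
print(format(ord("I"), "07b"))  # 1001001
```

Because Shift is released before the second key is pressed, only the first letter is capitalized; without separate release codes the translator could not tell when Shift stopped applying.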

Image Data Representation

· Images come in many different shapes, sizes, textures, colors, and shading
· Images are formatted according to processing, display, application, and storage requirements
· Therefore, it is becoming more difficult to define a single universal format for image data
· Image formats are divided into 2 main categories:
o Bit map images (also called continuous or raster images)
o Object images (also called geometric or vector images)

Bit Map Images


· Image with variation in shading, color, shape, and texture (e.g. photograph, painting)
· Bit map image is a representation of the individual points in the image
· Bit map image is made up of pixels with each pixel representing an individual point in the image
· Bit map image is only an approximation of the original image with variable accuracy
· Each pixel is represented as a numeric code representing an actual color value for the pixel
· 2 values determine the resolution/accuracy of the image:
o Size of the pixel: number of pixels per inch (the smaller the pixel, the higher the resolution)
o Number of levels: number of gray levels or colors per pixel (1-3 bytes are typical)
· Computers represent bit map image data using several image representation standards
· Common standards include:
o GIF (Graphics Interchange Format): most common and de facto standard
o TIFF (Tagged Image File Format)
o PCX (PC Paintbrush)
o BMP (Windows Bitmap)
o PNG (Portable Network Graphics)
o JPEG (Joint Photographic Experts Group)
· Figure 3.11 on page 78 shows the GIF file format

· Storage and transmission of bit map images require image description data
o Dimension
o Resolution
o Number of intensity levels
o Definition of color palette
· Bit map storage requirements are large and grow with
o Image resolution
o Number of supported colors
o Size of the image
Storage Requirement = number of pixels in the image x number of color bytes per pixel

Example
A 300 x 200 pixel image with 24-bit color requires:
300 x 200 x 3 = 180,000 bytes (about 180 KB)
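
The storage formula can be checked with a few lines of code. This is a minimal sketch for an uncompressed bit map without any header or palette data; the function name `bitmap_bytes` is hypothetical, and it takes bits per pixel (rather than bytes) so that color depths below one byte can also be tried.

```python
def bitmap_bytes(width, height, bits_per_pixel):
    """Uncompressed size of a bit map image in bytes (pixel data only)."""
    total_bits = width * height * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


print(bitmap_bytes(300, 200, 24))  # 180000 (24-bit color, as in the example)
print(bitmap_bytes(300, 200, 8))   # 60000  (256 colors or gray levels)
print(bitmap_bytes(300, 200, 1))   # 7500   (black and white)
```

Reducing the number of colors per pixel cuts the storage requirement proportionally, which is the trade-off between resolution and storage described earlier.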

Object Images
· Image is made up of simple graphical shapes (e.g. line, square, circle, curve, etc.)
· Each shape is represented using geometric and relative position data
· Object images are typically produced using drawing software
· Computers represent object image data using several image representation standards
· Common standards include:
o PostScript
o SWF
o SVG (Scalable Vector Graphics)
· Figures 3.13 and 3.14 on pages 80 and 81 show the PostScript file format

Advantages
o Efficient manipulation without loss of detail (e.g. moving, rotating, resizing, etc.)
o Great processing flexibility
o Low storage requirements

Disadvantages
o Some images cannot be represented (e.g. photographs)
o Object images must be converted into bit map form before they can be displayed or printed, since display screens and printers plot images pixel by pixel
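
The difference between storing a shape and storing its pixels can be sketched as follows. This is an illustrative model, not any real file format: the `Circle` class and the `rasterize` function are hypothetical, and a pixel is assumed to be "on" when its centre falls inside the circle.

```python
from dataclasses import dataclass


@dataclass
class Circle:
    """An object image element: three numbers describe the whole shape."""
    cx: float
    cy: float
    r: float

    def scaled(self, factor):
        # Resizing only changes the geometric data, so no detail is lost
        return Circle(self.cx * factor, self.cy * factor, self.r * factor)


def rasterize(circle, width, height):
    """Convert the shape into rows of 0/1 pixels for display or printing."""
    rows = []
    for y in range(height):
        row = []
        for x in range(width):
            dx = x + 0.5 - circle.cx
            dy = y + 0.5 - circle.cy
            row.append(1 if dx * dx + dy * dy <= circle.r ** 2 else 0)
        rows.append(row)
    return rows


small = Circle(4, 4, 3)
big = small.scaled(2)  # still just three numbers

for row in rasterize(small, 8, 8):
    print("".join("#" if pixel else "." for pixel in row))
```

The circle is stored as three numbers regardless of its size, while its rasterized form needs one value per pixel and must be regenerated whenever the shape is scaled.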

Representing Characters as Images


· In GUI based systems, characters are represented in different styles and fonts
· To make this possible, characters are represented as images
· In these systems it is necessary to distinguish between the character and its image
o A special description code sequence is stored with each character (i.e. requires multiple control characters)
· Different encoding techniques are in use, often in proprietary formats
· The Macintosh operating system has an encoding scheme, known as glyphs, that allows a character to be encoded with both its identification and its font

Input Devices
· The following are the main devices for inputting image data
o Scanner
o Digital camera
o Video camera
o Graphic drawing and image processing software

Output Devices
· The following are the main devices for outputting image data
o Screen
o Printer

Video Data

· Video representation is time sensitive (i.e. the longer the video clip, the higher the storage requirements)
Storage Requirement = number of pixels in the image x number of color bytes per pixel x frame rate x video length

Example
A video camera recording 2 seconds of 640 x 480 pixel, 24-bit color video at 30 frames per second generates:
640 x 480 x 3 x 30 x 2 = 55,296,000 bytes (about 55.30 MB)
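
The video formula can be verified in the same way. This is a minimal sketch for uncompressed video only; the function name `video_bytes` is hypothetical, and MB is taken as 1,000,000 bytes to match the figure in the example.

```python
def video_bytes(width, height, bytes_per_pixel, frames_per_second, seconds):
    """Uncompressed video size in bytes."""
    return width * height * bytes_per_pixel * frames_per_second * seconds


clip = video_bytes(640, 480, 3, 30, 2)
print(f"{clip:,} bytes = {clip / 1_000_000:.2f} MB")  # 55,296,000 bytes = 55.30 MB

# Data rate needed to deliver the uncompressed video in real time
per_second = video_bytes(640, 480, 3, 30, 1)
print(f"{per_second * 8 / 1_000_000:.1f} Mbit/s")     # 221.2 Mbit/s
```

The second figure shows why real-time playback of uncompressed video demands both fast processing and a high transmission rate.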

· An increase in the value of any of the above variables (e.g. resolution, number of colors, or length of video clip) results in an increase in the storage requirements
· Video representation needs special consideration due to its massive storage requirements
· Video representation requires:
o High-performance processing to decode such a huge amount of data in real time
o A high transmission rate to transfer such a huge amount of data in real time
· There are a number of techniques used to reduce video data size:
o Reduce the size of the image
o Reduce screen resolution
o Limit number of colors
o Reduce frame rate
o Compress data
· Each of the above techniques has obvious drawbacks
· There are 2 methods used to read video data when needed for playback
o Local: data is stored on the local disk
o Streaming: real-time downloading, decoding, and presentation (e.g. Web download, video conferencing)
· There are a number of encoding formats for video data
o QuickTime: developed by Apple
o Indeo: developed by Intel
o MPEG-2: movie-quality images, compressed to 30-40 MB per minute
o RealVideo
o WMV (Windows Media Video)

Input Devices
· The following are the main devices for inputting video data
o Digital video camera
o Analog video camera

Output Devices
· The following are the main devices for outputting video data
o Screen

Audio Data
· Audio is becoming an important component of the Multimedia support in computers
· Audio is an analog waveform
· The audio waveform is sampled thousands of times per second (e.g. 50,000 times a second)
· Each sample is encoded as a binary number whose size depends on the resolution (1 – 3 bytes)
· Audio representation is time sensitive (i.e. the longer the audio clip, the higher the storage requirements)
Storage Requirement = number of bytes per sample x sample rate x audio length

Example
Find the size of the audio data generated, given that:
An audio wave is sampled 50,000 times per second
Each sample requires 2 bytes to represent the sample code
Audio length is 5 seconds
50,000 x 2 x 5 = 500,000 bytes (500 KB)
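
The audio formula can be expressed as a one-line function. This is a minimal sketch for uncompressed audio; the function name `audio_bytes` is hypothetical, and the `channels` parameter is an added assumption (the formula above corresponds to a single channel).

```python
def audio_bytes(sample_rate, bytes_per_sample, seconds, channels=1):
    """Uncompressed audio size in bytes."""
    return sample_rate * bytes_per_sample * seconds * channels


print(audio_bytes(50_000, 2, 5))   # 500000 (the example above)
print(audio_bytes(50_000, 2, 60))  # 6000000 (one minute at the same settings)
```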

· Audio must be converted to digital format (i.e. binary format) for storage and processing
· The conversion process consists of the following high-level steps
o The analog waveform is sampled at regular time intervals (e.g. 50,000 times a second)
o The amplitude of each sample is measured and converted to a specific number
§ The largest positive amplitude (loudest possible sound) is set as the maximum positive number
§ The largest negative amplitude is set as the maximum negative number
§ Zero is set as the middle point
o Each waveform sample is converted into its encoding number
· See Figure 3.15 on page 84 for the audio waveform digitizing process
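
The sampling and encoding steps can be sketched in code. This is an illustration under stated assumptions: a pure sine tone stands in for the analog waveform, samples are encoded as signed integers, and the function name `digitize` and its parameters are hypothetical.

```python
import math


def digitize(freq_hz, sample_rate, seconds, bits=16):
    """Sample a sine tone and encode each sample as a signed integer."""
    max_code = 2 ** (bits - 1) - 1            # 32767 for 16-bit samples
    count = round(sample_rate * seconds)      # number of samples to take
    samples = []
    for i in range(count):
        t = i / sample_rate                   # time of this sample
        amplitude = math.sin(2 * math.pi * freq_hz * t)  # analog value in [-1, 1]
        samples.append(round(amplitude * max_code))      # encoding number
    return samples


# One millisecond of a 1000 Hz tone sampled 50,000 times a second
samples = digitize(1000, 50_000, 0.001)
print(len(samples))  # 50
print(samples[:5])
```

A full-scale positive peak maps to the maximum positive code, a full-scale negative peak to the corresponding negative code, and silence to zero.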

· Computers represent audio data using several representation standards


· Common standards include:
o MOD: stores samples of sounds that can be manipulated to produce new sounds
o MIDI: used to coordinate sounds and signals with connected musical instruments
o VOC: general format with repeat and synchronize features (multimedia presentations)
o WAV: general-purpose format used to store and reproduce sound
o MP3

Input Devices
· The following are the main devices for inputting audio data
o Microphone
o Instruments connected to the computer (e.g. musical keyboard)

Output Devices
· The following are the main devices for outputting audio data
o Speaker

Data Compression
· Data compression is a technique used to reduce the size of data
· It is sometimes necessary to reduce storage requirements and allow for more efficient transmission
· Video, audio, and image data benefit particularly from compression
· Compression is a trade-off between compression efficiency (i.e. storage) and performance impact (i.e. processing)
· There are 2 types of data compression
o Lossless
§ Lossless algorithms allow for the restoration of the exact original data
§ Used when data accuracy is essential and loss of data cannot be tolerated
§ GIF images and ZIP files use this type of compression

Advantages
§ Very accurate

Disadvantages
§ Requires overhead to describe the eliminated data
§ Has a low compression ratio

o Lossy
§ Lossy algorithms do not allow the restoration of the exact original data
§ Eliminated data is not recoverable, so there is no need to describe the eliminated data
§ Used when data accuracy is not essential and loss of data can be tolerated
§ Main advantage is a high compression ratio
§ JPEG images use this type of compression

Advantages
§ Low overhead
§ High compression ratio

Disadvantages
§ Not accurate
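
The contrast between the two types can be illustrated with a toy example. Run-length encoding is used here as a simple lossless technique and value rounding as a simple lossy step; neither is claimed to be the method used by GIF, ZIP, or JPEG, and the function names and sample pixel values are hypothetical.

```python
def rle_encode(data):
    """Lossless: store each run of equal values as a (count, value) pair."""
    runs = []
    for value in data:
        if runs and runs[-1][1] == value:
            runs[-1] = (runs[-1][0] + 1, value)
        else:
            runs.append((1, value))
    return runs


def rle_decode(runs):
    """Rebuild the exact original bytes from the (count, value) pairs."""
    return bytes(value for count, value in runs for _ in range(count))


def quantize(data, step):
    """Lossy: round every value down to a multiple of `step`."""
    return bytes((value // step) * step for value in data)


pixels = bytes([10, 10, 10, 11, 12, 12, 200, 200, 201, 201])

# Lossless: the original data is restored exactly
runs = rle_encode(pixels)
restored = rle_decode(runs)

# Lossy: small differences are discarded first, so fewer runs are needed
lossy_runs = rle_encode(quantize(pixels, 4))
approximated = rle_decode(lossy_runs)

print(len(runs), len(lossy_runs))                  # 5 3
print(restored == pixels, approximated == pixels)  # True False
```

The lossy version needs fewer runs because nearby values were merged, but the original pixel values can no longer be recovered from it.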
