Chapter 3 - Data Formats
Introduction
· Computers store and process data using the binary number system regardless of the type of data
· Humans, on the other hand, communicate using numbers, text, images, and sound (i.e. multimedia)
· Therefore, it is almost always necessary to:
o Convert multimedia data into binary format for computer storage and processing
o Convert computer binary data into multimedia format for human presentation
· Figure 3.1 on page 62 shows an example of the text data conversion process
o Data is entered into the computer through an input device
o Data is converted into the computer binary format for storage and processing
· The detailed internal format of the data depends mainly on:
o Complexity and capability of the input device
o Complexity and capability of the output device
o Required accuracy and resolution of the data
o Complexity and capability of the software processing the data
· Additional data describing the meaning of data is needed when storing and transmitting data
· This is necessary to allow reproduction of the data
· Application programs define the specific formats of the data they can recognize and process
· There are 2 types of data format:
o Standard format
§ Publicly known and published data format
§ Allows wide variety of hardware and software to recognize the data
§ Allows many different systems to share and exchange data
§ Standards are designed for high resolution, convenience and high performance
processing, efficiency in storage, flexibility, and other criteria
§ Some of these objectives may conflict and require trade-offs (e.g.
resolution versus performance)
o Proprietary format
§ Unpublished format and only known to a particular system
§ Other systems will not be able to recognize the data in its original format
§ Data must be converted before it can be stored or processed by a different system
§ Example: WordPerfect and Microsoft Word use different data formats
o Voice Input
§ Becoming increasingly popular due to improvements in voice recognition
technology
§ Complex translation process is required
§ Voice data is translated into alphanumeric data
Encoding Standards
· Encoding is the process of converting alphanumeric data into binary format for storage and
processing
· Decoding is the process of converting from the binary representation into alphanumeric characters
· Each supported alphanumeric character has a unique numeric code (e.g. 0 – 255 in ASCII)
· Codes are ordered; this order is known as the collating sequence (e.g. in ASCII: 0 – 9, A – Z, a – z)
· A coding standard defines the codes for the supported characters
· Standards define two classes of codes
o Printing characters: characters that can be displayed or printed
o Control characters: characters that cannot be displayed or printed
· 3 alphanumeric coding standards are in common use:
o ASCII (American Standard Code for Information Interchange)
§ Originally developed as a 7-bit code supporting 2^7 = 128 characters
§ Amended to an 8-bit code to support an additional 128 characters (2^8 = 256 characters)
§ ASCII is an ISO standard
§ See English ASCII table in Figure 3.3, page 67
o EBCDIC (Extended Binary Coded Decimal Interchange Code)
§ 8-bit code developed by IBM
o Unicode
§ 16-bit international standard supporting 2^16 = 65,536 characters
Ø About 49,000 characters have been defined for common use
Ø 6,400 are reserved for private use
Ø 10,000 are set aside for future expansion
§ Multilingual, developed to overcome ASCII and EBCDIC limitations
§ ASCII is a subset of Unicode (i.e. the ASCII codes correspond to the first 256 Unicode characters)
§ Conversion between Unicode and ASCII is very simple (add/remove zero hex digits
on the left, e.g. ASCII 41 becomes Unicode 0041)
§ Unicode is increasingly replacing ASCII and EBCDIC as the choice of encoding
in most modern systems
§ Windows uses Unicode in an attempt to be accepted as a universal operating
system
§ See Unicode table in Figure 3.5, page 69
· Storage requirements for alphanumeric data are very low
· 1 byte of storage is required per character for ASCII/EBCDIC and 2 bytes for Unicode
Keyboard Operation
· Keyboard input is handled using the following process
o 2 binary codes, known as scan codes, are assigned to each key
o A scan code is generated when the key is pressed and another when the key is released
o The 2 codes are necessary to allow detection of multi-key combinations (e.g. Shift + A)
o Scan codes are translated into the appropriate Unicode/ASCII/EBCDIC code
· Keyboard input is treated as a sequential stream of characters, including the carriage return
· See Figure 3.7 on page 72 for an illustration of keyboard operation (note the mistake in the figure:
the code for I should be 1001001)
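The press/release mechanism can be sketched as follows. The scan code values and the names SHIFT_DOWN, SHIFT_UP, KEY_DOWN, and translate are assumptions chosen for illustration; real keyboards and operating systems use their own tables.

```python
# Illustrative scan codes (assumed values, not taken from the text).
SHIFT_DOWN, SHIFT_UP = 0x2A, 0xAA          # Shift pressed / released
KEY_DOWN = {0x1E: "a", 0x17: "i"}          # press codes of two letter keys


def translate(scan_codes):
    """Turn a stream of scan codes into characters, tracking the Shift state."""
    shift_held = False
    out = []
    for code in scan_codes:
        if code == SHIFT_DOWN:
            shift_held = True
        elif code == SHIFT_UP:
            shift_held = False
        elif code in KEY_DOWN:
            ch = KEY_DOWN[code]
            out.append(ch.upper() if shift_held else ch)
        # Release codes of ordinary keys produce no character.
    return "".join(out)


# Shift down, I down, I up, Shift up, A down, A up
stream = [0x2A, 0x17, 0x97, 0xAA, 0x1E, 0x9E]
text = translate(stream)
ascii_bits = format(ord("I"), "07b")       # 7-bit ASCII code of 'I'
```

Because Shift has its own release code, the translator knows that only the first letter was typed while Shift was held.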
Image Data
· Images come in many different shapes, sizes, textures, colors, and shading
· Images are formatted according to processing, display, application, and storage requirements
· Therefore, it is becoming more difficult to define a single universal format for image data
· Image formats are divided into 2 main categories:
o Continuous or Bit Map Images or Raster Images
o Geometric or Object Images or Vector Images
Bit Map Images
· Storage and transmission of bit map images require image description data:
o Dimension
o Resolution
o Number of intensity levels
o Definition of color palette
· Bit map storage requirements are large and grow with:
o Image resolution
o Number of supported colors
o Size of the image
Storage requirement = number of pixels in the image x number of color bytes per pixel
Example
A 300 x 200 pixel image with 24-bit color (3 bytes per pixel) requires:
300 x 200 x 3 = 180,000 bytes (about 180 KB)
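The calculation above can be checked with a short Python sketch. The function name bitmap_bytes is hypothetical, and the sketch assumes an uncompressed image whose color depth is a whole number of bytes, ignoring any header or palette data.

```python
def bitmap_bytes(width, height, bits_per_pixel):
    """Uncompressed pixel data size in bytes (no header or palette)."""
    bytes_per_pixel = bits_per_pixel // 8   # assumes a multiple of 8 bits
    return width * height * bytes_per_pixel


example_size = bitmap_bytes(300, 200, 24)   # the 300 x 200, 24-bit example
```

Doubling both dimensions multiplies the result by four, which is why resolution has such a strong effect on bit map storage.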
Object Images
· The image is made up of simple graphical shapes (e.g. line, square, circle, curve, etc.)
· Each shape is represented using geometric and relative position data
· Object images are typically produced using drawing software
· Computers represent object image data using several image representation standards
· Common standards include:
o PostScript
o SWF
o SVG
· Figures 3.13 and 3.14 on pages 80 and 81 show the PostScript file format
Advantages
o Efficient manipulation without loss of detail (e.g. moving, rotating, resizing, etc.)
o Great processing flexibility
o Low storage requirements
Disadvantages
o Some images cannot be represented (e.g. photographs)
o Object images must be converted into bit map before they can be displayed or printed
This is necessary since display screens and printers plot images pixel by pixel
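As a rough sketch of these points, the Python snippet below stores a circle as geometric data, resizes it without losing detail, and converts it to a bit map for display. The Circle class and the rasterize function are hypothetical names, and the pixel-center test is only one simple way to plot a shape.

```python
from dataclasses import dataclass


@dataclass
class Circle:
    cx: float   # center x
    cy: float   # center y
    r: float    # radius

    def scaled(self, factor):
        # Resizing only changes the stored numbers, so no detail is lost.
        return Circle(self.cx * factor, self.cy * factor, self.r * factor)


def rasterize(circle, width, height):
    """Convert the circle to a bit map: 1 where a pixel center lies inside it."""
    rows = []
    for y in range(height):
        row = []
        for x in range(width):
            dx = x + 0.5 - circle.cx
            dy = y + 0.5 - circle.cy
            row.append(1 if dx * dx + dy * dy <= circle.r ** 2 else 0)
        rows.append(row)
    return rows


small = Circle(cx=4, cy=4, r=3)
big = small.scaled(2)              # still just three numbers
bitmap = rasterize(small, 8, 8)    # 64 pixel values for the same shape
```

The object form needs three numbers regardless of size, while the bit map grows with the number of pixels.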
Input Devices
· The following are the main devices for inputting image data
o Scanner
o Digital camera
o Video camera
o Graphic drawing and image processing software
Output Devices
· The following are the main devices for outputting image data
o Screen
o Printer
Video Data
· Video representation is time sensitive (i.e. the longer the video clip, the higher the storage
requirements)
Storage requirement = number of pixels in the image x number of color bytes per pixel x frame rate
x video length
Example
A 2-second video clip at 640 x 480 pixels, 24-bit color, and 30 frames per second
requires:
640 x 480 x 3 x 30 x 2 = 55,296,000 bytes (about 55.30 MB)
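The same arithmetic can be written as a small Python sketch. The function name video_bytes is hypothetical, and the sketch assumes uncompressed frames with a whole number of bytes per pixel.

```python
def video_bytes(width, height, bits_per_pixel, frames_per_second, seconds):
    """Uncompressed video size in bytes."""
    frame_bytes = width * height * (bits_per_pixel // 8)
    return frame_bytes * frames_per_second * seconds


clip_size = video_bytes(640, 480, 24, 30, 2)   # the 2-second example
bytes_per_second = clip_size // 2              # rate needed for real-time playback
```

The per-second figure shows why uncompressed video also demands a high transmission rate, not just a large disk.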
· An increase in the value of any of the above variables (e.g. resolution, number of colors, or length
of the video clip) will increase the storage requirements
· Video representation needs special consideration due to its massive storage requirements
· Video representation requires:
o High-performance processing to decode such a huge amount of data in real time
o A high transmission rate to transfer such a huge amount of data in real time
· There are a number of techniques used to reduce video data size:
o Reduce the size of the image
o Reduce screen resolution
o Limit number of colors
o Reduce frame rate
o Compress data
· Each of the above techniques has obvious drawbacks
· There are 2 methods used to read video data when needed for a playback
o Local: data is stored on the local disk
o Streaming: real-time downloading, decoding, and presentation (e.g. Web download,
video conferencing)
· There are a number of encoding formats for video data
o QuickTime: developed by Apple
o Indeo: developed by Intel
o MPEG-2: movie-quality images, with data compressed to 30-40 MB per minute
o Real Video
o WMV
Input Devices
· The following are the main devices for inputting video data
o Digital video camera
o Analog video camera
Output Devices
· The following are the main devices for outputting video data
o Screen
Audio Data
· Audio is becoming an important component of multimedia support in computers
· Audio is an analog signal (a continuous waveform)
· The audio waveform is sampled thousands of times per second (e.g. 50,000 times a second)
· Each sample is encoded as a binary number whose size (1 – 3 bytes) depends on the resolution
· Audio representation is time sensitive (i.e. the longer the audio clip, the higher the storage
requirements)
Storage requirement = number of bytes per sample x sample rate x audio length
Example
Find the size of the generated audio data given:
The audio wave is sampled 50,000 times per second
Each sample requires 2 bytes to represent the sample code
The audio length is 5 seconds
50,000 x 2 x 5 = 500,000 bytes (about 500 KB)
· Audio must be converted to digital format (i.e. binary format) for storage and processing
· The conversion process consists of the following high-level steps
o The analog waveform is sampled at regular time intervals (e.g. 50,000 times a second)
o The amplitude of the sample is measured and converted to a specific number
§ The largest positive amplitude (loudest possible sound) is set as the maximum positive number
§ The largest negative amplitude is set as the maximum negative number
§ Zero is set as the midpoint
o Each waveform sample is converted into its binary code
· See Figure 3.15 on page 84 for the audio waveform digitizing process
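The digitizing steps can be sketched in Python as shown below. The function names digitize, tone, and audio_bytes are hypothetical, the 1 kHz sine wave is an assumed test signal, and the input is assumed to be normalized so that its amplitude lies between -1.0 and 1.0.

```python
import math


def digitize(signal, duration_s, sample_rate, bytes_per_sample=2):
    """Sample signal(t) at regular intervals and convert each amplitude to an integer."""
    max_code = 2 ** (8 * bytes_per_sample - 1) - 1   # 32767 for 2-byte samples
    count = round(duration_s * sample_rate)          # number of samples taken
    samples = []
    for i in range(count):
        t = i / sample_rate                          # time of this sample
        amplitude = signal(t)                        # expected in [-1.0, 1.0]
        samples.append(round(amplitude * max_code))  # zero stays at the midpoint
    return samples


def tone(t):
    """Assumed test signal: a 1 kHz sine wave."""
    return math.sin(2 * math.pi * 1000 * t)


def audio_bytes(sample_rate, bytes_per_sample, seconds):
    """Storage requirement = bytes per sample x sample rate x audio length."""
    return bytes_per_sample * sample_rate * seconds


samples = digitize(tone, duration_s=0.001, sample_rate=50_000)
five_second_clip = audio_bytes(50_000, 2, 5)
```

One millisecond of sound at 50,000 samples per second yields 50 numbers, each of which fits in 2 bytes.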
Input Devices
· The following are the main devices for inputting audio data
o Microphone
o Instruments connected to the computer (e.g. musical keyboard)
Output Devices
· The following are the main devices for outputting audio data
o Speaker
Data Compression
· Data compression is a technique used to reduce the size of data
· It is sometimes necessary to reduce storage requirements and allow for more efficient transmission
· Video, audio, and image data benefit particularly from compression
· Compression is a trade-off between compression efficiency (i.e. storage) and performance
impact (i.e. processing)
· There are 2 types of data compression
o Lossless
§ Lossless algorithms allow for the restoration of the exact original data
§ Used when data accuracy is essential and loss of data cannot be tolerated
§ GIF images and ZIP files use this type of compression
Advantages
§ Very accurate
Disadvantages
§ Requires overhead to describe the eliminated data
§ Has a lower compression ratio than lossy compression
o Lossy
§ Lossy algorithms do not allow the restoration of the exact original data
§ Eliminated data is not recoverable so there is no need to describe the eliminated
data
§ Used when data accuracy is not essential and loss of data can be tolerated
§ Main advantage is high compression ratio
§ The JPEG format uses this type of compression
Advantages
§ Low overhead
§ High compression ratio
Disadvantages
§ Not accurate
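The difference between the two types can be illustrated with a short Python sketch. The lossless half uses the standard zlib module (the DEFLATE algorithm, which is also commonly used inside ZIP files). The lossy half is a toy scheme assumed purely for illustration (dropping the low 4 bits of each value); it is not how JPEG works.

```python
import zlib

# Lossless: the exact original data is restored.
data = b"AAAAAAAABBBBBBBB" * 64          # 1024 bytes of repetitive data
packed = zlib.compress(data)
restored = zlib.decompress(packed)

# Lossy (toy scheme): keep only the high 4 bits of each 8-bit sample.
samples = [200, 201, 203, 57, 58, 60]
reduced = [s >> 4 for s in samples]      # each value now fits in 4 bits
approx = [q << 4 for q in reduced]       # reconstruction is only approximate
```

Here restored is identical to data, while approx is close to samples but the discarded low bits cannot be recovered.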