02 CSA Data Representation
AICT001-3-2
Data Representation
Diploma in Computing and IT: Level - 1
Prepared by: TPM First Prepared on: 03-06-08 Last Modified on: 20-06-08
Quality checked by: KNT
Copyright 2008 Asia Pacific Institute of Information Technology
Topic & Structure of the lesson
Data Formats
• General Considerations
• Number systems and Inter-conversion
• Representation of
– Text
– Images
– Sounds
• Data compression techniques
• Internal Computer Data format
Computer Systems Architecture Slide 2 of 63
Learning Outcomes
Data Formats
• Computers
– Process and store all forms of data in binary format.
• Human communication
– Includes language, images and sounds.
• Data formats
– Specifications for converting data into computer-usable form.
– Define the different ways human data may be represented, stored and processed by a computer.
Computing Systems Data
• Decimal to Binary
– 25₁₀ = 1 1001₂
– 145₁₀ = 1001 0001₂
– 2₁₀ = 10₂
• Binary to Decimal
– 1111 0000₂ = 240₁₀
– 1111₂ = 15₁₀
– 100₂ = 4₁₀
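The conversions above can be checked with a short Python sketch. The helper names `to_binary` and `to_decimal` are illustrative and do not come from the slides: the first uses repeated division by 2, the second accumulates place values from left to right.

```python
def to_binary(n: int) -> str:
    """Convert a non-negative decimal integer to binary by repeated division by 2."""
    if n == 0:
        return "0"
    bits = []
    while n > 0:
        bits.append(str(n % 2))  # each remainder is the next bit, least significant first
        n //= 2
    return "".join(reversed(bits))


def to_decimal(bits: str) -> int:
    """Convert a binary string (spaces allowed) to decimal."""
    value = 0
    for bit in bits.replace(" ", ""):
        value = value * 2 + int(bit)  # shift left one place, then add the new bit
    return value


print(to_binary(25))            # 11001
print(to_binary(145))           # 10010001
print(to_decimal("1111 0000"))  # 240
```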
Binary Conversion
• Octal to Binary
– 25₈ = 010 101₂
– 145₈ = 001 100 101₂
– 2₈ = 010₂
• Binary to Octal
– 1111 0000₂ = 360₈
– 1111₂ = 17₈
– 100₂ = 4₈
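Because 8 = 2³, each octal digit corresponds to exactly three bits. A minimal sketch of both directions follows; the function names are hypothetical and chosen only for this illustration.

```python
def octal_to_binary(octal: str) -> str:
    """Expand each octal digit into its 3-bit group."""
    return " ".join(format(int(digit, 8), "03b") for digit in octal)


def binary_to_octal(bits: str) -> str:
    """Group bits in threes from the right and map each group to one octal digit."""
    bits = bits.replace(" ", "")
    width = -(-len(bits) // 3) * 3      # round the length up to a multiple of 3
    bits = bits.zfill(width)            # pad with leading zeros
    groups = [bits[i:i + 3] for i in range(0, width, 3)]
    return "".join(str(int(group, 2)) for group in groups)


print(octal_to_binary("145"))        # 001 100 101
print(binary_to_octal("1111 0000"))  # 360
```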
Binary Conversion
• Hexadecimal to Binary
– 25₁₆ = 0010 0101₂
– 145₁₆ = 0001 0100 0101₂
– 2₁₆ = 0010₂
• Binary to Hexadecimal
– 1111 0000₂ = F0₁₆
– 1111₂ = F₁₆
– 100₂ = 4₁₆
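Hexadecimal works the same way with 4-bit groups, since 16 = 2⁴. The sketch below leans on Python's built-in `int(text, base)` and `format` rather than manual grouping; the function names are illustrative only.

```python
def hex_to_binary(hex_digits: str) -> str:
    """Expand each hexadecimal digit into its 4-bit group (nibble)."""
    return " ".join(format(int(digit, 16), "04b") for digit in hex_digits)


def binary_to_hex(bits: str) -> str:
    """Parse the bit string as base 2, then print the value in base 16."""
    value = int(bits.replace(" ", ""), 2)
    return format(value, "X")


print(hex_to_binary("145"))        # 0001 0100 0101
print(binary_to_hex("1111 0000"))  # F0
```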
Binary Conversion
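• 2’s Complement (8-bit word)
+25₁₀ = 0001 1001₂
-25₁₀ = ?
0001 1001₂ (+25)
1110 0110₂ (flip the bits)
+ 1 (add 1)
1110 0111₂
-25₁₀ = 1110 0111₂
(The leftmost bit is the sign bit. A positive value must start with 0, so +25 needs at least 6 bits; in 5 bits the pattern 1 1001₂ would read as a negative number.)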
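The flip-and-add-one procedure can be sketched in Python. This assumes a fixed 8-bit word so that a sign bit is available; the width and the function names `twos_complement` and `negate` are assumptions made for the illustration.

```python
def twos_complement(value: int, bits: int = 8) -> str:
    """Return the two's complement bit pattern of value in a fixed-width word."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not low <= value <= high:
        raise ValueError(f"{value} does not fit in {bits} bits")
    # Masking maps negative values onto their two's complement pattern.
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def negate(pattern: str) -> str:
    """Flip every bit, then add 1, discarding any carry out of the top bit."""
    bits = len(pattern)
    flipped = "".join("1" if b == "0" else "0" for b in pattern)
    return format((int(flipped, 2) + 1) % (1 << bits), f"0{bits}b")


print(twos_complement(25))  # 00011001
print(negate("00011001"))   # 11100111
print(twos_complement(-25)) # 11100111
```

Negating twice returns the original pattern, which is a quick way to check a hand calculation.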
1. How many bits will it take to represent the decimal number 3,875,216? How many bytes will it take to store the number?
2. Convert to binary (base 2)
a) 357₈ =
b) EACB₁₆ =
c) 172₁₀ =
3. Convert to hexadecimal (base 16)
a) 531₁₀ =
b) 313₈ =
c) 110111100110₂ =
4. Find the solution for the problems below (apply 2’s complement if necessary)
a) 31₁₀ + 11₁₀ =
b) 45₁₀ - 11₁₀ =
c) 25₁₀ - 15₁₀ =
Data Formats - How to Interpret Data
• The problem:
– Representing text strings, such as
“Hello, world”, in a computer
• Each character is coded as a byte (= 8 bits)
• The most common coding system is ASCII
• ASCII = American Standard Code for Information Interchange
• Defined in ANSI document X3.4-1977
ASCII Features
• 7-bit code
• 8th bit is unused (or used for a parity bit)
• 2⁷ = 128 codes
• Two general types of codes:
– 95 are “Graphic” codes (displayable on a console)
– 33 are “Control” codes (control features of the console or communications channel)
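The counts and the parity option can be illustrated with a short Python sketch. The split used here (codes 00 to 1F plus 7F as control, 20 to 7E as graphic) follows the standard ASCII layout; the even-parity convention and the name `with_even_parity` are assumptions, since the slides do not say which parity scheme is meant.

```python
# Partition the 128 seven-bit codes into the two general types.
control = [c for c in range(128) if c < 0x20 or c == 0x7F]
graphic = [c for c in range(128) if 0x20 <= c <= 0x7E]

print(len(control), len(graphic))  # 33 95


def with_even_parity(code: int) -> int:
    """Use the 8th bit as a parity bit so the byte holds an even number of 1s."""
    ones = bin(code & 0x7F).count("1")
    return (code & 0x7F) | ((ones % 2) << 7)


print(f"{with_even_parity(ord('A')):08b}")  # 01000001 (two 1s already, parity bit 0)
print(f"{with_even_parity(ord('C')):08b}")  # 11000011 (three 1s, parity bit 1)
```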
95 Graphic codes
ASCII Reference Table
Columns give the three high-order bits; rows give the four low-order bits.
      000   001   010   011   100   101   110   111
0000  NUL   DLE   SP    0     @     P     `     p
0001  SOH   DC1   !     1     A     Q     a     q
0010  STX   DC2   "     2     B     R     b     r
0011  ETX   DC3   #     3     C     S     c     s
0100  EOT   DC4   $     4     D     T     d     t
0101  ENQ   NAK   %     5     E     U     e     u
0110  ACK   SYN   &     6     F     V     f     v
0111  BEL   ETB   '     7     G     W     g     w
1000  BS    CAN   (     8     H     X     h     x
1001  HT    EM    )     9     I     Y     i     y
1010  LF    SUB   *     :     J     Z     j     z
1011  VT    ESC   +     ;     K     [     k     {
1100  FF    FS    ,     <     L     \     l     |
1101  CR    GS    -     =     M     ]     m     }
1110  SO    RS    .     >     N     ^     n     ~
1111  SI    US    /     ?     O     _     o     DEL
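Reading the table amounts to concatenating the 3 column bits with the 4 row bits to form a 7-bit code. A minimal sketch, with `ascii_from_table` as a hypothetical helper name:

```python
def ascii_from_table(column_bits: str, row_bits: str) -> str:
    """Combine the 3 high-order (column) bits and 4 low-order (row) bits into a character."""
    code = (int(column_bits, 2) << 4) | int(row_bits, 2)
    return chr(code)


print(ascii_from_table("100", "0001"))        # A  (100 0001 = 41 hex)
print(ascii_from_table("110", "0001"))        # a  (110 0001 = 61 hex)
print(repr(ascii_from_table("000", "1010")))  # '\n' (LF)
```

Upper-case and lower-case letters sit in columns that differ in a single bit, which is why the two forms of a letter are always 20 hex apart.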
33 Control codes
Alphabetic codes
“Hello, world” Example
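As a sketch of how the string is stored, the Python snippet below encodes “Hello, world” as ASCII and prints one code per character. The variable names are illustrative; the hexadecimal values follow directly from the ASCII table.

```python
text = "Hello, world"

# Each character becomes one byte holding its ASCII code.
codes = text.encode("ascii")
hex_codes = " ".join(f"{byte:02X}" for byte in codes)

print(len(codes))            # 12
print(hex_codes)             # 48 65 6C 6C 6F 2C 20 77 6F 72 6C 64
print(f"{codes[0]:07b}")     # 1001000 (the 7-bit code for 'H')
```

The comma (2C) and the space (20) occupy a byte each, just like the letters.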
Numeric codes
“4+15” Example
• “4+15” stored as ASCII codes: 34₁₆ 2B₁₆ 31₁₆ 35₁₆
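The example highlights that the text “4+15” is four character codes, not a sum. The sketch below prints those codes and then converts the digit characters to numbers by subtracting the code for '0' (30 hex); `digits_to_int` is a hypothetical helper written for this illustration.

```python
expression = "4+15"

# The stored form: one ASCII code per character.
codes = [f"{ord(ch):02X}" for ch in expression]
print(codes)  # ['34', '2B', '31', '35']


def digits_to_int(digits: str) -> int:
    """Turn a string of ASCII digit characters into the number it spells."""
    value = 0
    for ch in digits:
        value = value * 10 + (ord(ch) - ord("0"))  # '0' is 30 hex, '9' is 39 hex
    return value


left, right = expression.split("+")
print(digits_to_int(left) + digits_to_int(right))  # 19
```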
Punctuation, etc.
Common Control Codes
• CR 0D carriage return
• LF 0A line feed
• HT 09 horizontal tab
• DEL 7F delete
• NUL 00 null
• ESC 1B escape (1B₁₆ 5B₁₆ is ESC followed by “[”)
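Control codes are invisible when printed, so it helps to look at the code values directly. A small Python sketch follows; the sample string and the dictionary name `CONTROL_CODES` are made up for the illustration, while the code values are those listed above.

```python
CONTROL_CODES = {"CR": 0x0D, "LF": 0x0A, "HT": 0x09, "DEL": 0x7F, "NUL": 0x00}

# A line of text containing a tab and ending in carriage return + line feed.
line = "name\tvalue\r\n"
codes = [f"{ord(ch):02X}" for ch in line]
print(codes)
# ['6E', '61', '6D', '65', '09', '76', '61', '6C', '75', '65', '0D', '0A']

# ESC followed by '[' gives the byte pair 1B 5B.
escape_prefix = "\x1b["
print([f"{byte:02X}" for byte in escape_prefix.encode("ascii")])  # ['1B', '5B']
```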
Unicode
• Version 2.1
– 1998
– Improves on version 2.0
– Includes the Euro sign (20AC₁₆ = €)
– From the standard:
• …contains 38,887 distinct coded characters derived from the
supported scripts. These characters cover the principal written
languages of the Americas, Europe, the Middle East, Africa, India,
Asia, and Pacifica.
• Latest version of Unicode at the time of writing: 6.3.0 (September 2013)
• More details can be found at Unicode's main website
http://www.unicode.org
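The Euro sign makes a convenient test case for code points beyond ASCII. The sketch below uses Python's built-in `chr`, `ord` and the standard `unicodedata` module; the UTF-8 and UTF-16 byte forms are shown as extra context and are not covered in the slides.

```python
import unicodedata

euro = chr(0x20AC)                 # code point 20AC hex
print(euro)                        # €
print(unicodedata.name(euro))      # EURO SIGN
print(f"{ord(euro):04X}")          # 20AC

# The code point does not fit in 7-bit ASCII, so encoding as ASCII fails.
try:
    euro.encode("ascii")
except UnicodeEncodeError:
    print("not representable in ASCII")

# Two common byte encodings of the same code point.
print(euro.encode("utf-16-be").hex().upper())  # 20AC
print(euro.encode("utf-8").hex().upper())      # E282AC
```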
Text Compression
Q&A