Madhusanka Liyanage: Lecture 3: Data Representation in Computer Systems

Lecture 3 of COMP 30660 focuses on data representation in computer systems, covering numerical data, character codes, and error detection techniques. It explains the basic units of data, such as bits, bytes, and words, and delves into integer and floating-point representations, including the IEEE-754 standard. The lecture also discusses character encoding schemes like ASCII and Unicode, as well as methods for data recording and transmission, highlighting the importance of error detection and correction in data integrity.

COMP 30660: Computer Architecture and Organization (CONV)

Lecture 3: Data Representation in Computer Systems

Madhusanka Liyanage
School of Computer Science
University College Dublin, Ireland
1
Learning Objectives

• Understand the fundamentals of numerical data representation in digital computers.
• Gain familiarity with the most popular character codes.
• Become aware of the differences between how data is stored in computer memory and how it is transmitted over networks.
• Understand the concepts of error detecting and correcting codes.

2
Data and Information

• Data can be defined as a representation of facts, concepts, or instructions in a formalized manner, suitable for communication, interpretation, or processing by humans or electronic machines.
• Information is organized or classified data that has some meaningful value for the receiver.
• Information is the processed data on which decisions and actions are based.

3
Basic Unit of Data

• Used to indicate the capacity of a standard data storage system or communication channel.
• Units derived from
– bit
– Byte
– Nibble
– Crumb
– Word

4
Bit

• A bit is the most basic unit of data in a computer.
– It is a state of “on” or “off” in a digital circuit.
– Sometimes these states are “high” or “low” voltage instead of “on” or “off”.

5
Byte

• A byte is a group of eight bits.
– In a byte-addressable system, a byte is the smallest addressable unit of computer storage.
– The term “addressable” means that a particular byte can be retrieved according to its location in memory.

6
Nibble
• A group of four bits is called a nibble (or nybble).
– Half a byte
– Bytes, therefore, consist of two nibbles: a “high-order/upper nibble” and a “low-order/lower nibble”.
– Nibble is most often used in the context of hexadecimal number representations, since a nibble has the same amount of information as one hexadecimal digit.
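
The relationship between nibbles and hexadecimal digits can be sketched in a few lines of Python. The helper name `split_nibbles` is an illustrative assumption, not something from the lecture.

```python
def split_nibbles(byte: int) -> tuple[int, int]:
    """Return (high_nibble, low_nibble) of an 8-bit value."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in the range 0..255")
    # Shift the upper four bits down, mask off the lower four bits.
    return (byte >> 4) & 0xF, byte & 0xF


high, low = split_nibbles(0b1010_0111)
print(f"{high:04b} {low:04b}")  # each nibble as four bits
print(f"{high:X}{low:X}")       # each nibble as one hexadecimal digit
```

Each nibble maps to exactly one hexadecimal digit, which is why a byte is usually written as two hex digits.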

7
Crumb
• A pair of bits (a quarter of a byte) is called a crumb.
– Quarter of a byte
– Often used in early 8-bit computing.

8
Word

• A word is a contiguous group of bytes.
– Words can be any number of bits or bytes.
– Word sizes of 16, 32, or 64 bits are most common.
– In a word-addressable system, a word is the smallest addressable unit of storage.
– The number of bits in a word is usually defined by the size of the registers in the computer's CPU.

9
Data Representation

• Computers work with binary numbers.
• Therefore, numbers, letters, and other symbols must be converted into their binary equivalents.
Integers

12
Integer Representation (Recap)

• The representation of a positive integer is quite straightforward,
– but we want to represent negative as well as positive numbers.
• To do so, add a sign bit to the representation.
• For a positive number the sign bit is set to 0, and for a negative number it is set to 1.
Integer Representation (Recap)

▪ An integer can be represented using a fixed-point representation.
▪ The leftmost bit is treated as the sign bit.
▪ The magnitude of the number is represented by the rest of the bits.

14
Integer Representation (Recap)

▪ A signed number can be represented in the following three ways:
1. Signed magnitude representation.
2. Signed 1’s complement representation.
3. Signed 2’s complement representation.
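
The three representations listed above can be compared with a short Python sketch. The function names and the default 8-bit width are assumptions made for illustration, and each function assumes the value fits in the chosen width.

```python
def signed_magnitude(value: int, bits: int = 8) -> str:
    """Sign bit followed by the magnitude in the remaining bits."""
    sign = "1" if value < 0 else "0"
    return sign + format(abs(value), f"0{bits - 1}b")


def ones_complement(value: int, bits: int = 8) -> str:
    """Negative numbers are formed by inverting every bit of the magnitude."""
    mask = (1 << bits) - 1
    if value >= 0:
        return format(value, f"0{bits}b")
    return format(~abs(value) & mask, f"0{bits}b")


def twos_complement(value: int, bits: int = 8) -> str:
    """Negative numbers are the one's complement plus one (value modulo 2**bits)."""
    mask = (1 << bits) - 1
    return format(value & mask, f"0{bits}b")


for n in (5, -5):
    print(n, signed_magnitude(n), ones_complement(n), twos_complement(n))
```

For positive numbers all three forms agree; they differ only in how a negative value is encoded.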
But how do we represent floating-point numbers?

16
Floating-Point Representation

• The signed magnitude, one’s complement, and two’s complement representations that we have just presented deal with integer values only.
• Without modification, these formats are not useful in scientific or business applications that deal with real number values.
• Floating-point representation solves this problem.

17
Floating-Point: Scientific Notation
• Scientific notation is a way of expressing numbers that are too large or too small to be conveniently written in decimal form.
– For example:
$0.125 = 1.25 \times 10^{-1}$
$5{,}000{,}000 = 5.0 \times 10^{6}$
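
As a quick check of the two examples, Python's built-in `e` format specifier prints a number in decimal scientific notation. The wrapper name `scientific` is an assumption for illustration.

```python
def scientific(value: float, digits: int = 2) -> str:
    """Format value with one digit before the decimal point and `digits` after it."""
    return format(value, f".{digits}e")


print(scientific(0.125))           # 1.25e-01
print(scientific(5_000_000.0, 1))  # 5.0e+06
```

Here `e-01` stands for "times 10 to the power -1" and `e+06` for "times 10 to the power 6".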

18
Scientific Notation
• Scientific notation has a single digit to the left of the decimal point.
• Numbers written in scientific notation have three components:
[Figure: the three components of a number written in scientific notation]

19
Floating-Point Representation
• Computers use a form of scientific notation for floating-point representation.
• The computer representation of a floating-point number consists of three fixed-size fields: sign, exponent, and significand.

• This is the standard arrangement of these fields.


20
Floating-Point Representation

• The one-bit sign field is the sign of the stored value.
• The size of the exponent field determines the range of values that can be represented.
• The size of the significand (mantissa) determines the precision of the representation.

21
Example:
For illustrative purposes, we use a 14-bit model with a 1-bit sign, a 5-bit exponent, and an 8-bit significand.
• Example:
– Express $32_{10}$ in the simplified 14-bit floating-point model.
• We know that $32 = 2^{5}$, so in (binary) scientific notation $32 = 1.0 \times 2^{5}$.
• Using this information, we put $101_2$ ($= 5_{10}$) in the exponent field and 1 in the significand as shown.

22
Example: synonymous forms
$32 = 1.0 \times 2^{5} = 0.1 \times 2^{6} = 0.01 \times 2^{7} = 0.001 \times 2^{8} = 0.0001 \times 2^{9}$

• The illustrations shown at the right are all equivalent representations for 32 using our simplified model.
• Not only do these synonymous representations waste space, but they can also cause confusion.

23
Floating-Point Representation: Negative Exponents

• Another problem with our system is that we have made no allowances for negative exponents.
• E.g. there is no way to express $0.25 = 1/4 = 1.0 \times 2^{-2} = 0.1 \times 2^{-1}$.
– Notice that there is no sign in the exponent field!

24
IEEE-754 Representation
• A technical standard for floating-point arithmetic published by the Institute of Electrical and Electronics Engineers (IEEE).
• The standard defines several interchange formats, including single precision and double precision.

26
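IEEE-754 Representation: How to Solve the Synonymous Forms Issue
• To resolve the problem of synonymous forms, a normalization rule is applied. In the simplified model used in this lecture, the first digit of the significand (immediately after the binary point) must be 1, and the integer part must be zero.
• e.g. $32 = 1.0 \times 2^{5}$ is stored in the normalized form $0.1 \times 2^{6}$.
• This results in a unique pattern for each floating-point number.
– In the IEEE-754 standard itself, normalized numbers have the form $1.f \times 2^{e}$: the leading 1 is implied (not stored) and sits to the left of the binary point.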

27
IEEE-754 Representation: How to Solve Negative Exponents
• To provide for negative exponents, IEEE-754 uses a biased exponent.
• A bias is a number that is approximately midway in the range of values expressible by the exponent field.
• The exponent field in IEEE-754 is filled by adding the bias to the true exponent value.
– So we need to subtract the bias from the value in the exponent field to determine the true exponent.
• Stored exponent values less than the bias correspond to negative true exponents, which represent fractional numbers.
28
IEEE-754 Representation
• The IEEE-754 single precision floating-point standard uses a bias of 127 over its 8-bit exponent.

• The double precision standard has a bias of 1023 over its 11-bit exponent.
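
The single precision bias can be observed directly with Python's standard `struct` module, which packs a float in IEEE-754 binary32 layout when the `>f` format is used. The helper name `float32_fields` is an illustrative assumption. The sketch relies on the single precision layout of 1 sign bit, 8 exponent bits, and 23 fraction bits.

```python
import struct


def float32_fields(x: float) -> tuple[int, int, int]:
    """Return (sign, stored_exponent, fraction) of x as an IEEE-754 single."""
    # Pack as a big-endian 32-bit float, then reinterpret the bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    stored_exponent = (bits >> 23) & 0xFF  # 8-bit biased exponent
    fraction = bits & 0x7FFFFF             # 23 stored fraction bits
    return sign, stored_exponent, fraction


sign, stored, fraction = float32_fields(32.0)
print(sign, stored, stored - 127, fraction)  # true exponent = stored - bias
```

For 32.0 the stored exponent is 132, so subtracting the bias of 127 gives the true exponent 5. For 0.25 the stored exponent is 125, giving a true exponent of -2.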

29
Example 1:
– Express $32_{10}$ in the revised 14-bit floating-point model with a 5-bit exponent and an 8-bit significand. Use 16 as the bias.
• We know that $32 = 1.0 \times 2^{5} = 0.1 \times 2^{6}$.
• To use our excess-16 biased exponent, we add 16 to 6, giving $22_{10}$ ($= 10110_2$).
• Graphically: sign 0, exponent 10110, significand 10000000.

30
Example 2: Representation
– Express $0.0625_{10}$ in the revised 14-bit floating-point model with a 5-bit exponent and an 8-bit significand. Use 16 as the bias.
• We know that 0.0625 is $2^{-4}$. So, in (binary) scientific notation, $0.0625 = 1.0 \times 2^{-4} = 0.1 \times 2^{-3}$.
• To use our excess-16 biased exponent, we add 16 to -3, giving $13_{10}$ ($= 01101_2$).

31
Example 3 (To Do): Representation
– Express $-26.625_{10}$ in the revised 14-bit floating-point model with a 5-bit exponent and an 8-bit significand. Use 16 as the bias.
• We find $26.625_{10} = 11010.101_2$. Normalizing, we have $26.625_{10} = 0.11010101 \times 2^{5}$.
• To use our excess-16 biased exponent, we add 16 to 5, giving $21_{10}$ ($= 10101_2$).
• We also need a 1 in the sign bit (for a negative number).
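
The three worked examples can be reproduced with a small encoder for the lecture's 14-bit model (1 sign bit, 5-bit excess-16 exponent, 8-bit significand normalized to the form 0.1xxxxxxx). This is a minimal sketch: the function name `encode14`, the truncation of extra significand bits, and the all-zero pattern for 0 are assumptions, since the slides do not specify them.

```python
def encode14(value: float, bias: int = 16) -> str:
    """Encode value as 'sign exponent significand' in the 14-bit teaching model."""
    sign = 1 if value < 0 else 0
    magnitude = abs(value)
    if magnitude == 0:
        return "0 00000 00000000"  # assumption: zero is stored as all zeros
    exponent = 0
    # Normalize so that 0.5 <= magnitude < 1, i.e. binary 0.1xxxxxxx.
    while magnitude >= 1:
        magnitude /= 2
        exponent += 1
    while magnitude < 0.5:
        magnitude *= 2
        exponent -= 1
    stored = exponent + bias  # excess-16 biased exponent
    if not 0 <= stored <= 31:
        raise OverflowError("exponent does not fit in 5 bits")
    significand = int(magnitude * 256)  # keep 8 bits, truncating the rest
    return f"{sign} {stored:05b} {significand:08b}"


for number in (32, 0.0625, -26.625):
    print(number, "->", encode14(number))
```

The printed exponent fields are 10110, 01101, and 10101, matching the values 22, 13, and 21 computed in Examples 1 to 3.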

32
What about Characters?

33
Character Codes

34
Character Codes

• Calculations are not useful until their results can be displayed in a manner that is meaningful to people.
• We also need to store the results of calculations and provide a means for data input.
• Thus, human-understandable characters must be converted to computer-understandable bit patterns (and vice versa) using some sort of character encoding scheme.
• Character codes are used for this purpose.
35
Character Codes: Binary-coded decimal (BCD)
• The earliest computer coding systems used six bits.
• Binary-coded decimal (BCD) was one of these early
codes.
• In BCD, each digit is represented by a fixed number
of bits, usually four or eight.
• It was used by IBM mainframes in the 1950s and
1960s.
• As computers have evolved, character codes have
evolved.
• Larger computer memories and storage devices
permit richer character codes.
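
To make the "fixed number of bits per digit" idea concrete, here is a minimal sketch of the four-bit form of BCD, in which every decimal digit is encoded separately. The function name `to_bcd` is an assumption, and the sketch covers digits only, not the six-bit character codes mentioned above.

```python
def to_bcd(number: int) -> str:
    """Encode each decimal digit of a non-negative integer as its own 4-bit group."""
    if number < 0:
        raise ValueError("this sketch handles non-negative integers only")
    return " ".join(format(int(digit), "04b") for digit in str(number))


print(to_bcd(59))   # one 4-bit group per decimal digit
print(to_bcd(2024))
```

Unlike pure binary, the bit pattern grows by exactly four bits for each additional decimal digit.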

36
Character Codes: EBCDIC

• In 1964, BCD was extended to an 8-bit code, Extended Binary-Coded Decimal Interchange Code (EBCDIC).
• EBCDIC was one of the first widely used computer codes that supported upper and lowercase alphabetic characters, in addition to special characters, such as punctuation and control characters.
• EBCDIC and BCD are still in use by IBM mainframes today.
37
ASCII (American Standard Code for Information Interchange)
• Other computer manufacturers chose the 7-bit ASCII as a replacement for 6-bit codes.
• Until recently, ASCII was the dominant character code outside the IBM mainframe world.
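
A short Python sketch shows the 7-bit ASCII code of each character in a string. The helper name `ascii_bits` is an assumption. `str.encode("ascii")` raises `UnicodeEncodeError` for any character outside the ASCII range.

```python
def ascii_bits(text: str) -> list[str]:
    """Return the 7-bit ASCII pattern of every character in text."""
    return [format(code, "07b") for code in text.encode("ascii")]


for char, bits in zip("OK", ascii_bits("OK")):
    print(char, ord(char), bits)
```

Seven bits give 128 possible codes, which is enough for the English letters, digits, punctuation, and control characters.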

39
The ASCII Code

40
41
Unicode
Unicode

• Many of today’s systems embrace Unicode, which began as a 16-bit system and now has a code space running from U+0000 to U+10FFFF, with the goal of encoding the characters of every language in the world.
• Version 14.0 defines 144,697 characters covering 159 modern and historic scripts, as well as symbols, emoji, and non-visual control and formatting codes.
• Maintained by the Unicode Consortium.

43
Unicode

• The Unicode codespace allocation is shown at the right.
• The lowest-numbered Unicode characters comprise the ASCII code.
• The highest provide for user-defined codes.
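
The points above can be explored with Python's standard `unicodedata` module. The helper name `describe` and the sample characters are assumptions chosen for illustration. The general category `Co` marks private-use (user-defined) code points.

```python
import unicodedata


def describe(ch: str) -> tuple[str, str, int]:
    """Return (code point, general category, number of 16-bit UTF-16 units)."""
    utf16_units = len(ch.encode("utf-16-be")) // 2
    return f"U+{ord(ch):04X}", unicodedata.category(ch), utf16_units


# 'A' shares its code with ASCII; U+E000 and U+10FFFD are private-use code points.
for ch in ("A", "\u00e9", "\U0001F600", "\ue000", "\U0010FFFD"):
    print(describe(ch))
```

Code points above U+FFFF need two 16-bit units in UTF-16, which is why a single 16-bit value is no longer enough to hold every Unicode character.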

44
Data Recording and Transmission

45
Codes for Data Recording and Transmission
• When character codes or numeric values are stored in computer memory, their values are unambiguous (fixed).
• However, this is not always the case when data is stored on magnetic disk or transmitted over a distance of more than a few feet.
– Owing to the physical irregularities of data storage and transmission media, bytes can become distorted or garbled.
• Data errors are reduced by the use of suitable coding methods as well as various error-detection techniques.
46
Codes for Data Recording and Transmission
• To transmit data, pulses of “high” and “low” voltage are sent across communications media.
• To store data, changes are induced in the magnetic polarity of the recording medium.
• The period of time during which a bit is transmitted, or the area of magnetic storage within which a bit is stored, is called a bit cell.

47
Non-Return-to-Zero (NRZ)

• The simplest data recording and transmission code is the non-return-to-zero (NRZ) code.
• NRZ encodes 1 as “high” and 0 as “low.”
• The coding of OK (in ASCII) is shown below.

• The problem with NRZ code is that long strings of consecutive zeros or consecutive ones cause synchronization loss.
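
A minimal sketch of NRZ in Python, writing "H" for a high level and "L" for a low level. The function names are assumptions, and so is the choice to send each ASCII character as 8 bits with a leading 0, since the original waveform figure is not available.

```python
def to_bits(text: str) -> str:
    """Concatenate the 8-bit patterns of the ASCII characters in text."""
    return "".join(format(code, "08b") for code in text.encode("ascii"))


def nrz(bits: str) -> str:
    """NRZ: a 1 is sent as a high level, a 0 as a low level."""
    return "".join("H" if bit == "1" else "L" for bit in bits)


bits = to_bits("OK")
print(bits)
print(nrz(bits))
```

The run of four 1s in the letter O already produces four identical levels in a row. With longer runs the receiver has no transitions to keep its clock aligned, which is the synchronization problem described above.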
48
Non-return-to-zero-invert (NRZI)

• Non-Return-to-Zero-Invert (NRZI) reduces this synchronization loss by providing a transition (either low-to-high or high-to-low) for each binary 1 and no transition for each binary 0.

• Although it prevents loss of synchronization over long strings of binary ones, NRZI coding does nothing to prevent synchronization loss within long strings of zeros.
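
The NRZI rule can be sketched as follows, again writing "H" and "L" for the two signal levels. The function name `nrzi` and the assumption that the line starts at the low level are illustrative choices, not taken from the slides.

```python
def nrzi(bits: str, start: str = "L") -> str:
    """NRZI: toggle the level for every 1, keep the level for every 0."""
    level = start
    levels = []
    for bit in bits:
        if bit == "1":
            level = "H" if level == "L" else "L"  # a 1 causes a transition
        levels.append(level)
    return "".join(levels)


print(nrzi("1111"))  # alternates on every bit
print(nrzi("0000"))  # never changes
```

A run of ones now produces a transition in every bit cell, while a run of zeros still leaves the line at a constant level.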
49
Manchester coding

• Manchester coding (also known as phase modulation) prevents this problem by encoding a binary one with an “up” transition and a binary zero with a “down” transition.
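
In the sketch below each bit cell is split into two halves, so an "up" transition is written "LH" and a "down" transition "HL". The function name `manchester` and the two-letter notation are assumptions made for illustration.

```python
def manchester(bits: str) -> str:
    """Manchester: 1 is low-then-high (up), 0 is high-then-low (down)."""
    halves = {"1": "LH", "0": "HL"}
    return "".join(halves[bit] for bit in bits)


print(manchester("10"))
print(manchester("0000"))  # still has a transition in every bit cell
```

Because every bit cell contains a transition, even long runs of identical bits keep the receiver synchronized. The cost is that the signal changes up to twice per bit.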

50
Error Detection and Correction

51
2.8 Error Detection and Correction

• It is physically impossible for any data recording or transmission medium to be 100% perfect 100% of the time over its entire expected useful life.
• As more bits are packed onto a square centimeter of disk storage, and as communications transmission speeds increase, the likelihood of error increases.
• Thus, error detection and correction is critical to accurate data transmission, storage, and retrieval.

52
Types of Error

• Single bit error
– Only one bit in the data unit has changed.
• Burst error
– Two or more bits in the data unit have changed.

53
Error detection/correction

• Error detection
– Checks whether any error has occurred
– Does not need to know the number of errors
– Does not need to know the positions of the errors

• Error correction
– Needs to know the number of errors
– Needs to know the positions of the errors
– More difficult

54
Error Detection

• An error-detecting code includes only enough redundancy to allow the receiver to deduce that an error occurred, but not which error, so that it can request a retransmission.
• Error detection uses the concept of redundancy, which means adding extra bits for detecting errors at the destination.
55
Redundancy

• For error detection, a shorter group of bits may be appended to the end of each unit.
• This technique is called redundancy because the extra bits are redundant to the information.
• They are discarded as soon as the accuracy of the transmission has been determined.

56
Error Detection Techniques

• Some popular techniques for error detection are:
– Parity check
– Checksum
– Cyclic redundancy check
– Cryptographic hash function

57
Parity check

• A check bit, or parity bit, is added to each block of data.
• Two methods
– Even parity checking
– Odd parity checking
• Even parity checking
– 1 is added to the block if the data contains an odd number of 1’s,
– 0 is added if the data contains an even number of 1’s.
– Adding the parity bit makes the total number of 1’s even, which is why it is called even parity checking.
• Odd parity checking
– 0 is added to the block if the data contains an odd number of 1’s,
– 1 is added if the data contains an even number of 1’s.
– Adding the parity bit makes the total number of 1’s odd, which is why it is called odd parity checking.
• Can detect only an odd number of bit errors.
• Only useful for detecting errors, not for correcting them.
58
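
The parity rules can be sketched in Python as follows. The function names `add_parity`, `parity_ok`, and `flip` are assumptions, and appending the parity bit at the end of the block is also a choice made for this sketch.

```python
def add_parity(data_bits: str, even: bool = True) -> str:
    """Append the parity bit that makes the count of 1s even (or odd)."""
    ones = data_bits.count("1")
    parity = ones % 2 if even else 1 - ones % 2
    return data_bits + str(parity)


def parity_ok(codeword: str, even: bool = True) -> bool:
    """Check that the received codeword still has the expected parity."""
    expected = 0 if even else 1
    return codeword.count("1") % 2 == expected


def flip(bits: str, index: int) -> str:
    """Simulate a transmission error by inverting one bit."""
    flipped = "1" if bits[index] == "0" else "0"
    return bits[:index] + flipped + bits[index + 1:]


word = add_parity("1001111")              # ASCII 'O' has five 1s
print(word, parity_ok(word))
print(parity_ok(flip(word, 0)))           # one error is detected
print(parity_ok(flip(flip(word, 0), 1)))  # two errors cancel out
```

A single flipped bit changes the parity and is caught. Two flipped bits restore the parity, so the check passes even though the data is wrong.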
Checksum
• A small data block derived from transmitted/stored digital data for the purpose of detecting errors that may have been introduced during its transmission or storage.
• The procedure which generates this checksum is called a checksum function or checksum algorithm.
• E.g. a checksum of a message can be a modular arithmetic sum of message code words of a fixed word length.
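
One simple instance of the modular sum mentioned above treats each byte as a code word and adds them modulo 256. This is a minimal sketch with an assumed function name `checksum8`, not a specific standard checksum.

```python
def checksum8(data: bytes) -> int:
    """Sum all bytes of the message modulo 256 (8-bit code words)."""
    return sum(data) % 256


message = b"OK"
print(checksum8(message))
print(checksum8(b"OJ") == checksum8(message))  # a single changed byte is detected
print(checksum8(b"PJ") == checksum8(message))  # offsetting changes go unnoticed
```

The sender transmits the checksum with the message and the receiver recomputes it. As the last line shows, errors that cancel in the sum are not detected, which is one motivation for stronger techniques such as the cyclic redundancy check.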

59
Homework

• Find out what the following are:
– Cyclic redundancy check
– Cryptographic hash function

60
Summary

• Understand the fundamentals of numerical data representation in digital computers.
• Gain familiarity with the most popular character codes.
• Become aware of the differences between how data is stored in computer memory and how it is transmitted over telecommunication lines.
• Understand the concepts of error detecting and correcting codes.

61
Thank You

62
