Lecture3

Computer Architecture - Bitwise Standards

CSC 252/452: Computer Organization

Fall 2024: Lecture 3

Instructor: Yanan Guo


Department of Computer Science
University of Rochester
Carnegie Mellon

Announcement
• Programming Assignment 1 is out
• Details:
https://www.cs.rochester.edu/courses/252/fall2024/labs/assignment1.html
• Due on Sep 16th, 11:59 PM
• You have 3 slip days


Announcement
• Programming Assignment 1 is written in C.
• Seek help from TAs.
• TAs are best positioned to answer your questions about
programming assignments!!!
• Programming assignments do NOT repeat the lecture
materials. They ask you to synthesize what you have
learned from the lectures and work out something new.
• Pay attention to Blackboard announcements
• The office hour locations/times have changed…
• I have to move tomorrow’s office hour to early next week.


Last Lecture
• Why Binary (bits)?
• Bit-level manipulations
• Integers
• Representation: unsigned and signed
• Conversion, casting
• Expanding, truncating
• Addition, negation, multiplication, shifting
• Summary

Encoding Negative Numbers

• Two’s Complement

Number line (3 bits): -4 -3 -2 -1 0 1 2 3

Signed    Unsigned    Binary (b2 b1 b0)
  0          0         000
  1          1         001
  2          2         010
  3          3         011
 -4          4         100
 -3          5         101
 -2          6         110
 -1          7         111

Weights in Unsigned:   2^2   2^1   2^0
Weights in Signed:    -2^2   2^1   2^0

101_2 = 1*2^0 + 0*2^1 + (-1*2^2) = -3_{10}

Two’s Complement Implications
• Only 1 zero
• There is (still) a bit that represents sign!
• Unsigned arithmetic still works

Signed    Binary
  0        000
  1        001
  2        010
  3        011
 -4        100
 -3        101
 -2        110
 -1        111

   010         2
+) 101     +) -3
   111        -1

• 3 + 1 becomes -4 (called overflow. More on it later.)

Data Types (in C)


• Suppose you want to define a variable that represents a
person’s age. What assumptions can you make about this
variable’s numerical value?
• Integer
• Non-negative
• Between 0 and 255 (8 bits)
• Define a data type that captures all these attributes:
unsigned char in C
• Internally, an unsigned char variable is represented as an 8-bit,
non-negative, binary number

Data Types (in C)


• What if you want to define a variable that could take
negative values?
• That’s what signed data types (e.g., int, short, etc.) are for
• How are int values internally represented?
• Theoretically could be either sign-magnitude or two’s complement
• Virtually all C implementations use two’s complement (the C23 standard requires it; earlier standards left the choice to the implementation)

Data Types (in C)

C Data Type          32-bit    64-bit    (sizes in bytes)
(unsigned) char        1         1
(unsigned) short       2         2
(unsigned) int         4         4
(unsigned) long        4         8

• C Language
• #include <limits.h>
• Declares constants, e.g.,
• ULONG_MAX
• LONG_MAX
• LONG_MIN
• Values platform specific


Mapping Between Signed & Unsigned


• Mappings between unsigned and two’s complement
numbers: Keep bit representations and reinterpret

Signed Unsigned Binary


0 0 000
1 1 001
2 2 010
3 3 011
-4 4 100
-3 5 101
-2 6 110
-1 7 111


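Mapping Signed ↔ Unsigned

Bits    Signed    Unsigned
0000       0          0
0001       1          1
0010       2          2
0011       3          3
0100       4          4
0101       5          5
0110       6          6
0111       7          7
1000      -8          8
1001      -7          9
1010      -6         10
1011      -5         11
1100      -4         12
1101      -3         13
1110      -2         14
1111      -1         15

• T2U / U2T: for 0000 to 0111 the signed and unsigned values are equal (=)
• T2U / U2T: for 1000 to 1111 the two values differ by 16 (+/- 16)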

Today: Representing Information in Binary


• Why Binary (bits)?
• Bit-level manipulations
• Integers
• Representation: unsigned and signed
• Conversion, casting
• Expanding, truncating
• Addition, negation, multiplication, shifting
• Summary

The Problem

short int x = 15213;
int ix = (int) x;
short int y = -15213;
int iy = (int) y;

C Data Type    64-bit (size in bytes)
char             1
short            2
int              4
long             8

• Converting from smaller to larger integer data type
• Should we preserve the value?
• Can we preserve the value?
• How?

      Decimal    Hex            Binary
x      15213     3B 6D          00111011 01101101
ix     15213     00 00 3B 6D    00000000 00000000 00111011 01101101
y     -15213     C4 93          11000100 10010011
iy    -15213     FF FF C4 93    11111111 11111111 11000100 10010011

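Sign Extension
• Task:
• Given w-bit signed integer x
• Convert it to (w+k)-bit integer with same value
• Rule:
• Make k copies of sign bit:
• X′ = x_{w-1}, …, x_{w-1}, x_{w-1}, x_{w-2}, …, x_0
  (the leading k entries are copies of the MSB)

[Figure: the w bits of X are copied into the low w bits of X′; the k new high bits of X′ all repeat the MSB of X]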

Another Problem
unsigned short x = 47981;
unsigned int ux = x;

      Decimal    Hex            Binary
x      47981     BB 6D          10111011 01101101
ux     47981     00 00 BB 6D    00000000 00000000 10111011 01101101


Unsigned (Zero) Extension

• Task:
• Given w-bit unsigned integer x
• Convert it to (w+k)-bit integer with same value
• Rule:
• Simply pad zeros:
• X′ = 0, …, 0, x_{w-1}, x_{w-2}, …, x_0
  (k copies of 0 followed by the original w bits)

[Figure: the w bits of X are copied into the low w bits of X′; the k new high bits of X′ are all 0]
Yet Another Problem

int x = 53191;
short sx = (short) x;

      Decimal    Hex            Binary
x      53191     00 00 CF C7    00000000 00000000 11001111 11000111
sx    -12345     CF C7          11001111 11000111

• Truncating (e.g., int to short OR unsigned int to unsigned short)
• C’s implementation: leading bits are truncated, results reinterpreted
• So can’t always preserve the numerical value


Today: Representing Information in Binary


• Why Binary (bits)?
• Bit-level manipulations
• Integers
• Representation: unsigned and signed
• Conversion, casting
• Expanding, truncating
• Addition, negation, multiplication, shifting
• Summary
• Representations in memory, pointers, strings

Unsigned Addition
• Similar to Decimal Addition
• Suppose we have a new data type that is 3-bit wide (c.f., short has 16 bits)
• Might overflow: result can’t be represented within the size of the data type

Unsigned    Binary
   0         000
   1         001
   2         010
   3         011
   4         100
   5         101
   6         110
   7         111

Normal Case
   010         2
+) 101     +)  5
   111         7

Overflow Case
   110         6
+) 101     +)  5
  1011        11    True Sum
   011         3    Sum with same bits

Unsigned Addition in C
Operands: w bits             u, v
True Sum: w+1 bits           u + v
Discard Carry: w bits        UAdd_w(u, v) = (u + v) mod 2^w

Two’s Complement Addition

• Has identical bit-level behavior as unsigned addition (a big advantage over sign-magnitude)
• Overflow can also occur

Signed    Binary
  0        000
  1        001
  2        010
  3        011    Max
 -4        100    Min
 -3        101
 -2        110
 -1        111

Normal Case
   010         2
+) 101     +) -3
   111        -1

Overflow Case: Negative Overflow
   110        -2
+) 101     +) -3
  1011        -5    (true sum, below Min)
   011         3    (result after discarding the carry)

Overflow Case: Positive Overflow
   011         3
+) 001     +)  1
  0100         4    (true sum, above Max)
   100        -4    (result in 3 bits)



Two’s Complement Addition in C

Operands: w bits             u, v
True Sum: w+1 bits           u + v
Discard Carry: w bits        TAdd_w(u, v)

Is This Signed Addition an Overflow?

Signed    Binary
  0        000
  1        001
  2        010
  3        011
 -4        100
 -3        101
 -2        110
 -1        111

   111        -1
+) 110     +) -2
  1101        -3
Truncate to 3 bits: 101 = -3

• This is not an overflow by definition
• Because the actual result can be represented using the bit width of the datatype (3 bits here)

Multiplication
• Goal: Computing Product of w-bit numbers x, y
• Exact results can be bigger than w bits
• Up to 2w bits (both signed and unsigned)

Original Number (w bits)
OMax = 2^{w-1} - 1
OMin = -2^{w-1}

Product (2w bits)
PMax = 2^{2w-2} = OMin^2
PMin = -2^{2w-2} + 2^{w-1} = OMin * OMax


Unsigned Multiplication in C

Operands: w bits             u, v
True Product: 2*w bits       u · v
Discard w bits: w bits       UMult_w(u, v)

• Standard Multiplication Function
• Ignores high order w bits
• Effectively Implements the following:
  UMult_w(u, v) = u · v mod 2^w

  Binary                   Hex       Decimal
  1110 1001                E9        233
* 1101 0101              * D5      * 213
  **** **** 1101 1101      C1DD      49629
            1101 1101      DD        221
Signed Multiplication in C

Operands: w bits             u, v
True Product: 2*w bits       u · v
Discard w bits: w bits       low w bits of u · v

• Standard Multiplication Function
• Ignores high order w bits
• Some of which are different for signed vs. unsigned multiplication
• Lower bits are the same

  Binary                   Hex       Decimal
  1110 1001                E9        -23
* 1101 0101              * D5      * -43
  **** **** 1101 1101      03DD      989
            1101 1101      DD        -35
Power-of-2 Multiply with Shift

• Operation
• u << k gives u * 2^k
• Both signed and unsigned

Operands: w bits             u  and  2^k = 0 ••• 0 1 0 ••• 0 0  (a single 1 followed by k zeros)
True Product: w+k bits       u · 2^k  (the bits of u followed by k zeros)
Discard k bits: w bits       the low w bits of u · 2^k

• Examples
• u << 3 == u * 8
• (u << 5) - (u << 3) == u * 24
• Most machines shift and add faster than multiply
• Compiler generates this code automatically

Today: Representing Information in Binary


• Why Binary (bits)?
• Bit-level manipulations
• Integers
• Representation: unsigned and signed
• Conversion, casting
• Expanding, truncating
• Addition, negation, multiplication, shifting
• Summary


Arithmetic: Basic Rules


• Addition:
• Unsigned/signed: Normal addition followed by truncate,
same operation on bit level

• Multiplication:
• Unsigned/signed: Normal multiplication followed by truncate,
same operation on bit level
• Shift: Power-of-2 Multiply

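Why Should I Use Unsigned?

• Don’t use without understanding implications
• Easy to make mistakes
unsigned int i;
for (i = cnt-2; i >= 0; i--)    /* bug: i is unsigned, so i >= 0 is always true */
    a[i] += a[i+1];

• Can be very subtle
#define DELTA sizeof(int)
int i;
for (i = CNT; i-DELTA >= 0; i-= DELTA)    /* bug: sizeof yields an unsigned type, so i-DELTA is unsigned and never negative */
. . .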
Why Should I Use Unsigned? – Bit Set

• Use bits to represent my availability of the week

b6     b5     b4     b3     b2     b1     b0
Sun    Mon    Tue    Wed    Thu    Fri    Sat
1      0      1      1      0      0      1

• Use 1 bit per day, 7 bits in total.
• If bit x is set to 1 then I’m available on day mapped to bit x.
• In C: unsigned int aval;

aval = 1*2^0 + 0*2^1 + 0*2^2 + 1*2^3 + 1*2^4 + 0*2^5 + 1*2^6 = 89_{10}


Today: Floating Point


• Background: Fractional binary numbers and fixed-point
• Floating point representation
• IEEE 754 standard
• Rounding, addition, multiplication
• Floating point in C
• Summary

Can We Represent Fractions in Binary?

• What does 10.01_2 mean?
• C.f., Decimal

12.45 = 1*10^1 + 2*10^0 + 4*10^{-1} + 5*10^{-2}

10.01_2 = 1*2^1 + 0*2^0 + 0*2^{-1} + 1*2^{-2} = 2.25_{10}


Fractional Binary Numbers

Bits:       b_i    b_{i-1}    •••    b_2    b_1    b_0  .  b_{-1}    b_{-2}    b_{-3}    •••    b_{-j}
Weights:    2^i    2^{i-1}    •••    4      2      1       1/2       1/4       1/8       •••    2^{-j}

Can We Represent Fractions in Binary?

• What does 10.01_2 mean?
• C.f., Decimal
  12.45 = 1*10^1 + 2*10^0 + 4*10^{-1} + 5*10^{-2}
• 10.01_2 = 1*2^1 + 0*2^0 + 0*2^{-1} + 1*2^{-2} = 2.25_{10}

Number line: 0 1 2 3

Decimal    Binary
0          00.00
0.25       00.01
0.5        00.10
0.75       00.11
1          01.00
1.25       01.01
1.5        01.10
1.75       01.11
2          10.00
2.25       10.01
2.5        10.10
2.75       10.11
3          11.00
3.25       11.01
3.5        11.10
3.75       11.11

Integer Arithmetic Still Works!
   01.10        1.50
+  01.01     +  1.25
   10.11        2.75
Fixed-Point Representation
• Binary point stays fixed
• Fixed interval between two representable numbers as long as the binary point stays fixed
• The interval in this example is 0.25_{10}
• Fixed-point representation of numbers
• Integer is one special case of fixed-point

Number line (integer case): 0 1 2 3 4 5 6 7 …. 15

The same 4 bits, read with the binary point in two positions:

Decimal    Binary        Decimal    Binary
0          00.00         0          0000.
0.25       00.01         1          0001.
0.5        00.10         2          0010.
0.75       00.11         3          0011.
1          01.00         4          0100.
1.25       01.01         5          0101.
1.5        01.10         6          0110.
1.75       01.11         7          0111.
2          10.00         8          1000.
2.25       10.01         9          1001.
2.5        10.10         10         1010.
2.75       10.11         11         1011.
3          11.00         12         1100.
3.25       11.01         13         1101.
3.5        11.10         14         1110.
3.75       11.11         15         1111.
Limitations of Fixed-Point (#1)

• Can exactly represent numbers only of the form x/2^k
• Other rational numbers have repeating bit representations

Decimal Value    Binary Representation
1/3              0.0101010101[01]…
1/5              0.001100110011[0011]…
1/10             0.0001100110011[0011]…

Representable values with b3b2.b1b0:
0  1/4  1/2  3/4  1  5/4  3/2  7/4  2  ….  15/4

Limitations of Fixed-Point (#2)

• Can’t represent very small and very large numbers at the same time
• To represent very large numbers, the (fixed) interval needs to be large, making it hard to represent small numbers
• To represent very small numbers, the (fixed) interval needs to be small, making it hard to represent large numbers

[Figure: number line from 0 to +∞ with a small fixed interval; "A Small Number" near 0 is representable, while the region toward +∞ is labeled "Unrepresentable large numbers"]

Today: Floating Point


• Background: Fractional binary numbers and fixed-point
• Floating point representation
• IEEE 754 standard
• Rounding, addition, multiplication
• Floating point in C
• Summary

Primer: (Normalized) Scientific Notation

• In decimal: M × 10^E
• E is an integer
• Normalized form: 1 <= |M| < 10
• In M × 10^E: M is the Significand, 10 is the Base, E is the Exponent

Decimal Value        Scientific Notation
2                    2×10^0
-4,321.768           -4.321768×10^3
0.000 000 007 51     7.51×10^{-9}


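Primer: (Normalized) Scientific Notation

• In binary: $(-1)^s \times M \times 2^E$
• $s$ is the sign, $M$ is the significand, 2 is the base, and $E$ is the exponent
• Normalized form:
• $1 \le M < 2$
• $M = 1.b_0 b_1 b_2 b_3 \ldots$, where the bits after the binary point are the fraction

Binary Value       Scientific Notation
1110110110110      $(-1)^0 \times 1.110110110110 \times 2^{12}$
-101.11            $(-1)^1 \times 1.0111 \times 2^{2}$
0.00101            $(-1)^0 \times 1.01 \times 2^{-3}$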
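
A minimal sketch of the same normalization in C, assuming the input is finite and nonzero. The struct `sci2` and the function `normalize2` are illustrative names, not part of any library; only `frexp`, `fabs`, and `signbit` come from `<math.h>`.

```c
#include <math.h>

/* Normalized binary scientific notation: x = (-1)^s * M * 2^E, 1 <= M < 2. */
struct sci2 {
    int s;      /* sign: 1 if x is negative, else 0 */
    double M;   /* significand, in [1, 2) */
    int E;      /* exponent */
};

/* Decompose a finite, nonzero x. frexp() returns a mantissa in [0.5, 1),
 * so we double it and decrement the exponent to land in [1, 2). */
struct sci2 normalize2(double x) {
    struct sci2 r;
    int e;
    double m = frexp(fabs(x), &e);   /* fabs(x) = m * 2^e, 0.5 <= m < 1 */
    r.s = signbit(x) ? 1 : 0;
    r.M = 2.0 * m;
    r.E = e - 1;
    return r;
}
```

Passing -5.75 (which is -101.11 in binary) gives s = 1, M = 1.4375 (1.0111 in binary), and E = 2, matching the second row of the table.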

• If I tell you that there is a number where:
• Fraction = 0101
• s = 1
• E = 10
• You could reconstruct the number as $(-1)^1 \times 1.0101 \times 2^{10}$
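
The reconstruction step can be sketched in C as follows. The function name `reconstruct` and the choice to pass the fraction as a string of '0'/'1' characters are assumptions made for illustration. The usage note also assumes that E = 10 on the slide is the decimal number ten; the slide does not say which base it is written in.

```c
#include <math.h>

/* Rebuild (-1)^s * 1.<frac bits> * 2^E from its three parts.
 * frac is a string of '0'/'1' characters, most significant bit first. */
double reconstruct(int s, const char *frac, int E) {
    double M = 1.0;          /* implicit leading 1 */
    double weight = 0.5;     /* value of the first fraction bit, 2^-1 */
    for (const char *p = frac; *p != '\0'; p++) {
        if (*p == '1')
            M += weight;
        weight /= 2.0;
    }
    double v = ldexp(M, E);  /* M * 2^E */
    return s ? -v : v;
}
```

Under that assumption, `reconstruct(1, "0101", 10)` evaluates -(1.0101 in binary) × 2^10 = -1.3125 × 1024 = -1344.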

Primer: Floating Point Representation

• In binary: $(-1)^s \times M \times 2^E$
• Normalized form:
• $1 \le M < 2$
• $M = 1.b_0 b_1 b_2 b_3 \ldots$, where the bits after the binary point are the fraction
• Encoding
• MSB s is sign bit s
• exp field encodes Exponent (but not exactly the same, more later)
• frac field encodes Fraction (but not exactly the same, more later)

Bit layout, most significant bit first: s | exp | frac

6-bit Floating Point Example

$v = (-1)^s \times M \times 2^E$

Field           s    exp    frac
Width (bits)    1    3      2

• exp has 3 bits, interpreted as an unsigned value
• If exp were E, we could represent exponents from 0 to 7
• How about negative exponents?
• Subtract a bias term: E = exp - bias (i.e., exp = E + bias)
• bias is always $2^{k-1} - 1$, where k is the number of exponent bits
• Example when we use 3 bits for exp (i.e., k = 3):
• bias = 3
• If E = -2, exp is 1 ($001_2$)
• Reserve 000 and 111 for other purposes (more on this later)
• We can now represent exponents from -2 (exp 001) to 3 (exp 110)

E     exp
-3    000 (reserved)
-2    001
-1    010
0     011
1     100
2     101
3     110
4     111 (reserved)
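
The bias rule can be written as a short C sketch. `bias_for` and `exp_field` are hypothetical helper names, and returning -1 for the reserved patterns is a choice made here for illustration; k is assumed to be between 1 and 31.

```c
/* Bias for a k-bit exponent field: 2^(k-1) - 1. */
unsigned bias_for(unsigned k) {
    return (1u << (k - 1)) - 1u;
}

/* Encode exponent E into the exp field (exp = E + bias).
 * Returns -1 if the result would be a reserved pattern
 * (all zeros or all ones) or would not fit in k bits. */
int exp_field(int E, unsigned k) {
    int exp = E + (int)bias_for(k);
    int all_ones = (int)((1u << k) - 1u);
    if (exp <= 0 || exp >= all_ones)
        return -1;
    return exp;
}
```

With k = 3, `bias_for(3)` is 3, `exp_field(-2, 3)` is 1 (001), and `exp_field(3, 3)` is 6 (110), while E = -3 and E = 4 are rejected because they would need the reserved patterns 000 and 111.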

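6-bit Floating Point Example

$v = (-1)^s \times M \times 2^E$

Field           s    exp    frac
Width (bits)    1    3      2
Example         1    100    01

• frac has 2 bits, append them after “1.” to form M
• frac = 10 implies M = $1.10_2$
• Putting it Together: An Example:

$-10.1_2 = (-1)^1 \times 1.01 \times 2^1$

• The value is negative, so s = 1
• E = 1, so exp = E + bias = 1 + 3 = 4 ($100_2$)
• M = $1.01_2$, so frac = 01
• The encoded bit pattern is 1 100 01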
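
The whole encoding can be sketched in C for this toy 6-bit format. `encode6` and `decode6` are illustrative names. The sketch assumes the input is exactly representable as a normalized value and performs no rounding; it also does not handle the reserved exp patterns 000 and 111, which the lecture covers later.

```c
#include <math.h>

/* Toy 6-bit format from the slides: 1 sign bit, 3 exp bits, 2 frac bits. */
enum { EXP_BITS = 3, FRAC_BITS = 2, BIAS = 3 };

/* Encode a value that is exactly representable as a normalized number.
 * Returns the 6-bit pattern, or -1 if x is zero or not finite, needs a
 * reserved exp pattern, or needs more than 2 fraction bits. */
int encode6(double x) {
    if (x == 0.0 || !isfinite(x)) return -1;
    int e;
    double m = frexp(fabs(x), &e);      /* |x| = m * 2^e, 0.5 <= m < 1 */
    double M = 2.0 * m;                 /* 1 <= M < 2 */
    int E = e - 1;
    int exp = E + BIAS;
    if (exp < 1 || exp > 6) return -1;  /* 000 and 111 are reserved */
    double scaled = (M - 1.0) * (1 << FRAC_BITS);  /* fraction as an integer */
    int frac = (int)scaled;
    if ((double)frac != scaled) return -1;         /* would need rounding */
    int s = x < 0.0;
    return (s << (EXP_BITS + FRAC_BITS)) | (exp << FRAC_BITS) | frac;
}

/* Decode a 6-bit pattern whose exp field is not 000 or 111. */
double decode6(int bits) {
    int s = (bits >> (EXP_BITS + FRAC_BITS)) & 1;
    int exp = (bits >> FRAC_BITS) & ((1 << EXP_BITS) - 1);
    int frac = bits & ((1 << FRAC_BITS) - 1);
    double M = 1.0 + (double)frac / (1 << FRAC_BITS);
    double v = ldexp(M, exp - BIAS);    /* M * 2^E */
    return s ? -v : v;
}
```

For the slide's example, -10.1 in binary is -2.5 in decimal, and `encode6(-2.5)` returns 49, which is the bit pattern 110001 (s = 1, exp = 100, frac = 01). `decode6(49)` gives back -2.5.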
