Unit-1 (Computer Architecture)

The document provides an overview of the basic components of a computer system, including: 1) The input unit takes in data from devices like keyboards and scanners and converts it to binary for processing. 2) The CPU, made up of the ALU and CU, processes the data through arithmetic, logical operations, and by coordinating tasks. 3) Memory, including primary RAM and secondary hard disks, temporarily and permanently stores input data and processed output.

Uploaded by

zakir hussain

BY Zakir Hussain

UNIT-I (Computer Architecture)


------------------------------------------------------------------------------------------
Block Diagram of a Computer

Input

All data received by the computer passes through the input unit. The input unit comprises devices such
as a mouse, keyboard, and scanner. Each of these devices acts as a mediator between the user and the
computer.

The data to be processed is entered through the input unit, which converts it into binary form. The
computer then processes the data and produces the desired output.

The three major functions of the input unit are:

• Take the data to be processed from the user.
• Convert the given data into machine-readable form.
• Transmit the converted data to the main memory of the computer.

The sole purpose of the input unit is to connect the user and the computer, which makes communication
between them easy.

CPU – Central Processing Unit

The Central Processing Unit (CPU) is the brain of the computer. Just as the brain controls all human
activities, the CPU controls all the tasks of the computer. Moreover, the CPU carries out all the
arithmetic and logical operations in the computer.

The CPU comprises two units: the ALU (Arithmetic Logic Unit) and the CU (Control Unit). These units
work in sync, so the CPU processes the data as a whole.

ALU – Arithmetic Logic Unit

The name Arithmetic Logic Unit combines two terms, arithmetic and logic, which correspond to the two
primary functions that this unit performs.

1. It performs basic arithmetic operations, such as addition, subtraction, multiplication, and
division, on data that has been placed in primary memory through the input unit. It carries out all
the calculations required on the data and then sends the results back to storage.
2. It performs logical operations such as AND, OR, equal to, and less than. In addition, it carries
out merging, sorting, and selection of the given data.

CU – Control Unit

The control unit, as the name suggests, controls all the activities, tasks, and operations performed
inside the computer. The memory unit sends a set of instructions to the control unit. The control unit
decodes these instructions and converts them into control signals.

These control signals help in prioritizing and scheduling activities. Thus, the control unit coordinates the
tasks inside the computer in sync with the input and output units.

Memory Unit

All the data that has to be processed or has been processed is stored in the memory unit. The memory unit
acts as a hub for all the data and transmits it to the required part of the computer whenever necessary.
The memory unit works in sync with the CPU, which allows faster access to and processing of the data.

There are two types of computer memory:

1. Primary memory – This type of memory cannot store a vast amount of data, so it is only used to hold
data currently in use. The data stored in it is temporary and is erased once the power is switched
off. It is therefore also called temporary memory or main memory.

RAM (Random Access Memory) is an example of primary memory. This memory is directly accessible by
the CPU and is used for both reading and writing. For data to be processed, it first has to be
transferred to the RAM and then to the CPU.

2. Secondary memory – Because primary memory holds data only temporarily, that data is not available
after the power is switched off. For permanent storage, secondary memory is used. It is also called
permanent memory or auxiliary memory. The hard disk is an example of secondary memory. Its data is
retained even in a power failure.

Output

The output unit delivers results to the user. All the information sent to the computer is, once
processed, received by the user through the output unit. Devices such as printers, monitors, and
projectors all come under the output unit.

The output unit presents the data either as a soft copy or as a hard copy: the monitor displays a soft
copy, and the printer produces a hard copy. The output unit accepts the data in binary form from the
computer and converts it into a form readable by the user.

Logic Gates

• Logic gates are the main structural part of a digital system.
• A logic gate is a block of hardware that produces a binary signal of 1 or 0 when its input logic
requirements are satisfied.
• Each gate has a distinct graphic symbol, and its operation can be described by means of an algebraic
expression.
• The seven basic logic gates are AND, OR, XOR, NOT, NAND, NOR, and XNOR.
• The relationship between the input and output binary variables of each gate can be represented in
tabular form by a truth table.
• Each gate has one or two binary input variables, designated A and B, and one binary output
variable, designated x.

AND GATE: The AND gate is an electronic circuit that gives a high output only if all its inputs are
high. The AND operation is represented by a dot (.) sign.

OR GATE: The OR gate is an electronic circuit that gives a high output if one or more of its inputs
are high. The operation performed by an OR gate is represented by a plus (+) sign.

NOT GATE: The NOT gate is an electronic circuit that produces an inverted version of the input at its
output. It is also known as an inverter.

NAND GATE: The NOT-AND (NAND) gate is equivalent to an AND gate followed by a NOT gate.
The NAND gate gives a high output if any of its inputs is low. It is represented by an
AND gate with a small circle on the output. The small circle represents inversion.

NOR GATE: The NOT-OR (NOR) gate is equivalent to an OR gate followed by a NOT gate. The
NOR gate gives a low output if any of its inputs is high. It is represented by an OR gate
with a small circle on the output. The small circle represents inversion.

Exclusive-OR/XOR GATE: The Exclusive-OR gate is a circuit that gives a high output if exactly
one of its inputs is high, but not both. The XOR operation is represented by an encircled plus sign.

EXCLUSIVE-NOR/Equivalence GATE: The Exclusive-NOR gate is a circuit that performs the inverse of
the XOR operation. It gives a low output if exactly one of its inputs is high, and a high output if
both inputs are equal. Its symbol is an XOR gate with a small circle on the output. The small circle
represents inversion.
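
The gate descriptions above can be checked with a short Python sketch. It models each gate as a function on the bits 0 and 1; the names `GATES`, `NOT`, and `truth_table` are illustrative choices, not part of these notes.

```python
from itertools import product

# Each two-input gate is modelled as a function of two bits (0 or 1).
GATES = {
    "AND":  lambda a, b: a & b,
    "OR":   lambda a, b: a | b,
    "XOR":  lambda a, b: a ^ b,
    "NAND": lambda a, b: 1 - (a & b),   # AND followed by NOT
    "NOR":  lambda a, b: 1 - (a | b),   # OR followed by NOT
    "XNOR": lambda a, b: 1 - (a ^ b),   # XOR followed by NOT
}

def NOT(a):
    """Inverter: returns the complement of a single bit."""
    return 1 - a

def truth_table(gate):
    """Outputs of a two-input gate for AB = 00, 01, 10, 11."""
    return [gate(a, b) for a, b in product((0, 1), repeat=2)]

for name, gate in GATES.items():
    print(f"{name:4s}", truth_table(gate))
print("NOT ", [NOT(a) for a in (0, 1)])
```

Each printed row lists the gate output for the input combinations 00, 01, 10, and 11, which is the output column of that gate's truth table.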

Boolean algebra (Boolean function)

Boolean algebra can be considered as an algebra that deals with binary variables and logic operations.
Boolean algebraic variables are designated by letters such as A, B, x, and y. The basic operations
performed are AND, OR, and complement.

A Boolean function is expressed with binary variables, logic operation symbols, parentheses, and an
equals sign. For given values of the variables, the Boolean function can be either 1 or 0. For
instance, consider the Boolean function:

F = x + y'z

The logic diagram for the Boolean function F = x + y'z can be represented as:

• The Boolean function F = x + y'z is transformed from an algebraic expression into a logic diagram
composed of AND, OR, and inverter gates.
• The inverter at input y generates its complement y'.
• There is an AND gate for the term y'z, and an OR gate is used to combine the two terms (x and
y'z).
• The variables of the function are taken to be the inputs of the circuit, and the variable symbol of
the function is taken as the output of the circuit.

The truth table for the Boolean function F = x + y'z can be represented as:

x y z F
0 0 0 0
0 0 1 1
0 1 0 0
0 1 1 0
1 0 0 1
1 0 1 1
1 1 0 1
1 1 1 1

Karnaugh Map(K-Map) method

The K-map is a systematic way of simplifying Boolean expressions. With the K-map method, we can find
the simplest SOP (sum of products) and POS (product of sums) expressions, known as minimum
expressions. The K-map provides a cookbook procedure for simplification.

Just like a truth table, a K-map contains all the possible values of the input variables and their
corresponding output values. In a K-map, however, the values are stored in the cells of an array. Each
cell corresponds to one combination of input variable values and holds the output value for it.

The K-map method is used for expressions containing 2, 3, 4, or 5 variables. For a higher number of
variables, another simplification method, the Quine-McCluskey method, is used. In a K-map, the number
of cells equals the total number of input variable combinations. For example, if the number of
variables is three, the number of cells is $2^3 = 8$, and if the number of variables is four, the
number of cells is $2^4 = 16$. The K-map handles both SOP and POS forms. The K-map grid is filled with
0's and 1's, and the map is solved by forming groups. The following steps are used to solve an
expression using a K-map:

1. First, draw the K-map for the given number of variables.
2. Find the minterms and maxterms in the given expression.
3. For SOP, fill the cells of the K-map corresponding to the minterms with 1.
4. For POS, fill the cells corresponding to the maxterms with 0.
5. Next, form rectangular groups whose sizes are powers of two (2, 4, 8, …), and try to cover as
many cells as possible in each group.
6. From these groups, find the product terms and sum them up for the SOP form.

2 Variable K-Map

The number of cells in 2 variable K-map is four, since the number of variables is two. The following
figure shows 2 variable K-Map.

 There is only one possibility of grouping 4 adjacent min terms.


 The possible combinations of grouping 2 adjacent min terms are {(m0, m1), (m2, m3), (m0, m2) and
(m1, m3)}.
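
Two cells of a K-map are adjacent when their minterm numbers differ in exactly one bit. The Python sketch below lists the adjacent pairs under that rule; the function name `adjacent_pairs` is illustrative and not part of these notes. For 2 variables it reproduces the four pairs listed above.

```python
from itertools import combinations

def adjacent_pairs(num_vars):
    """Return pairs of minterm numbers that differ in exactly one bit."""
    cells = range(2 ** num_vars)
    return [(a, b) for a, b in combinations(cells, 2)
            if bin(a ^ b).count("1") == 1]

print(adjacent_pairs(2))       # adjacent pairs in the 2-variable K-map
print(len(adjacent_pairs(3)))  # number of adjacent pairs in the 3-variable K-map
```

The same function can be called with 4 or 5 to count the two-cell groupings available in the larger maps.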

3 Variable K-Map

The number of cells in 3 variable K-map is eight, since the number of variables is three. The following
figure shows 3 variable K-Map.

 There is only one possibility of grouping 8 adjacent min terms.


 The possible combinations of grouping 4 adjacent min terms are {(m0, m1, m3, m2), (m4, m5, m7,
m6), (m0, m1, m4, m5), (m1, m3, m5, m7), (m3, m2, m7, m6) and (m2, m0, m6, m4)}.
 The possible combinations of grouping 2 adjacent min terms are {(m0, m1), (m1, m3), (m3, m2), (m2,
m0), (m4, m5), (m5, m7), (m7, m6), (m6, m4), (m0, m4), (m1, m5), (m3, m7) and (m2, m6)}.
 If x=0, then 3 variable K-map becomes 2 variable K-map.

4 Variable K-Map

The number of cells in 4 variable K-map is sixteen, since the number of variables is four. The following
figure shows 4 variable K-Map.

 There is only one possibility of grouping 16 adjacent min terms.


• Let R1, R2, R3 and R4 represent the min terms of the first, second, third and fourth rows
respectively. Similarly, let C1, C2, C3 and C4 represent the min terms of the first, second, third
and fourth columns respectively. The possible combinations of grouping 8 adjacent
min terms are {(R1, R2), (R2, R3), (R3, R4), (R4, R1), (C1, C2), (C2, C3), (C3, C4), (C4, C1)}.
 If w=0, then 4 variable K-map becomes 3 variable K-map.

5 Variable K-Map

The number of cells in 5 variable K-map is thirty-two, since the number of variables is 5. The following
figure shows 5 variable K-Map.

 There is only one possibility of grouping 32 adjacent min terms.


 There are two possibilities of grouping 16 adjacent min terms. i.e., grouping of min terms from
m0 to m15 and m16 to m31.
 If v=0, then 5 variable K-map becomes 4 variable K-map.

In all the K-maps above, we used only the min terms notation. Similarly, you can use the Max terms
notation exclusively.

Example

Let us simplify the following Boolean function, f(X,Y,Z)=∏M(0,1,2,4) using K-map.

The given Boolean function is in product of Max terms form. It has 3 variables X, Y and Z, so we
require a 3 variable K-map. The given Max terms are M0, M1, M2 and M4. The 3 variable K-map with zeroes
corresponding to the given Max terms is shown in the following figure.

There are no possibilities of grouping either 8 adjacent zeroes or 4 adjacent zeroes. There are three
possibilities of grouping 2 adjacent zeroes: (M0, M1), (M0, M2) and (M0, M4). After these three
groupings, no zero is left ungrouped. The 3 variable K-map with these three groupings is shown in the
following figure.

Here, we got three prime implicants: X + Y, Y + Z and Z + X. All these prime implicants
are essential, because each grouping contains one zero that is not covered by any other grouping.

Therefore, the simplified Boolean function is

f = (X + Y).(Y + Z).(Z + X)
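
The result of this example can be checked by brute force. The Python sketch below compares the product-of-Max-terms definition with the simplified expression for all eight input combinations. It assumes X is the most significant bit of the Max term index; the function names are illustrative.

```python
from itertools import product

MAXTERMS = {0, 1, 2, 4}  # from f(X, Y, Z) = ∏M(0, 1, 2, 4)

def f_original(x, y, z):
    """f is 0 on the listed Max terms and 1 everywhere else."""
    index = (x << 2) | (y << 1) | z   # assumption: X is the most significant bit
    return 0 if index in MAXTERMS else 1

def f_simplified(x, y, z):
    """Simplified form (X + Y).(Y + Z).(Z + X)."""
    return (x | y) & (y | z) & (z | x)

all_match = all(f_original(*bits) == f_simplified(*bits)
                for bits in product((0, 1), repeat=3))
print(all_match)
```

If the two forms agree on every row of the truth table, the simplification is correct.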

In this way, we can easily simplify Boolean functions of up to 5 variables using the K-map method. For
more than 5 variables, it is difficult to simplify functions using K-maps, because the number
of cells in the K-map doubles with each new variable.

As a result, checking and grouping adjacent ones (minterms) or adjacent zeros (Maxterms) becomes
complicated. We will discuss the Tabular method in the next chapter to overcome the difficulties of
the K-map method.

Combinational Circuits

A combinational circuit consists of logic gates whose outputs at any time are determined directly from
the present combination of inputs, without regard to previous inputs.

A combinational circuit performs a specific information-processing operation that is fully specified
logically by a set of Boolean functions.

The basic components of a combinational circuit are input variables, logic gates, and output variables.

The 'n' input variables come from an external source, whereas the 'm' output variables go to an external
destination. In many applications, the source and destination are storage registers.

Half Adder

The half adder is a combinational logic circuit with two inputs and two outputs. It is designed to add
two single-bit binary numbers A and B, and it is the basic building block for binary addition. The
circuit has two outputs: sum and carry.

Block diagram

Truth Table

Circuit Diagram
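
As a minimal sketch of the half adder described above, the sum output can be modelled as A XOR B and the carry output as A AND B. The function name `half_adder` is illustrative; the loop prints the four rows of the truth table.

```python
def half_adder(a, b):
    """Add two single bits and return (sum, carry)."""
    return a ^ b, a & b

print("A B Sum Carry")
for a in (0, 1):
    for b in (0, 1):
        s, c = half_adder(a, b)
        print(a, b, s, c)
```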

Full Adder

The full adder was developed to overcome the drawback of the half adder circuit, which cannot accept a
carry input. It can add two one-bit numbers A and B together with a carry-in C. The full adder is a
three-input, two-output combinational circuit.

Block diagram

Truth Table

Circuit Diagram
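
The full adder can be sketched in the same way. The version below uses the common gate-level equations Sum = A XOR B XOR Cin and Cout = A.B + Cin.(A XOR B). The function name and the check against ordinary integer addition are illustrative.

```python
def full_adder(a, b, carry_in):
    """Add two bits plus a carry-in and return (sum, carry_out)."""
    total = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return total, carry_out

print("A B Cin Sum Cout")
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, cout = full_adder(a, b, cin)
            # The two output bits together encode the arithmetic sum.
            assert 2 * cout + s == a + b + cin
            print(a, b, cin, s, cout)
```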

Time and control :


The timing for all registers in the basic computer is controlled by a master clock generator. The clock
pulses are applied to all flip-flops and registers in the system, including the flip-flops and registers in the
control unit. The clock pulses do not change the state of a register unless the register is enabled by a
control signal. The control signals are generated in the control unit and provide control inputs for the
multiplexers in the common bus, control inputs in processor registers, and microoperations for the
accumulator.

There are two major types of control organization:

1. hardwired control and


2. microprogrammed control.

In the hardwired organization, the control logic is implemented with gates, flip-flops, decoders, and
other digital circuits. It has the advantage that it can be optimized to produce a fast mode of operation. In
the microprogrammed organization, the control information is stored in a control memory. The control
memory is programmed to initiate the required sequence of microoperations. A hardwired control, as the
name implies, requires changes in the wiring among the various components if the design has to be
modified or changed.

In the microprogrammed control, any required changes or modifications can be done by updating the
microprogram in control memory.

The block diagram of the control unit is shown in Fig. 5.6.

It consists of two decoders, a sequence counter, and a number of control logic gates.

Flip Flop
A flip flop in digital electronics is a circuit with two stable states that can be used to store binary data. The
stored data can be changed by applying varying inputs. Flip-flops and latches are fundamental building
blocks of digital electronics systems used in computers, communications, and many other types of
systems. Both are used as data storage elements.

Flip Flop Types

There are basically 4 types of flip-flops in digital electronics:

1. SR Flip-Flop
2. JK Flip-Flop
3. D Flip-Flop
4. T Flip-Flop
Let’s understand each Flip-flop one by one.

1. SR Flip Flop

This is the most common flip-flop of all. This simple flip-flop circuit has a set input (S) and a reset
input (R). When the set input S is active, the output Q is high and Q’ is low. Once the outputs are
established, the state of the circuit is maintained until S or R goes high, or power is turned off.

The SR flip-flop is the simplest and easiest to understand. Its two outputs are the inverse of each
other. The truth table of the SR flip-flop is given below.

S R Q Q’
0 0 No change (previous state is held)
0 1 0 1
1 0 1 0
1 1 Undefined (invalid input)

2. JK Flip-Flop

Due to the undefined state in the SR flip-flops, another flip-flop is required in electronics. The JK flip-
flop is an improvement on the SR flip-flop where S=R=1 is not a problem.

JK Flip Flop Circuit


The input condition of J=K=1 gives an output inverting the output state. However, the outputs are the
same when one tests the circuit practically.

In simple words, If J and K data input are different (i.e. high and low), then the output Q takes the value
of J at the next clock edge. If J and K are both low, then no change occurs. If J and K are both high at the
clock edge, then the output will toggle from one state to the other. JK Flip-Flops can function as Set or
Reset Flip-flops.

JK FF Truth Table:

J K Q(t) Q(t+1)
0 0 0 0
0 1 0 0
1 0 0 1
1 1 0 1
0 0 1 1
0 1 1 0
1 0 1 1
1 1 1 0
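
The JK behaviour described above (hold, reset, set, toggle) is commonly summarised by the characteristic equation Q(t+1) = J.Q' + K'.Q. The Python sketch below evaluates that equation for every row of the table; the function name `jk_next` is illustrative.

```python
def jk_next(j, k, q):
    """Next state of a JK flip-flop at the clock edge: Q(t+1) = J.Q' + K'.Q"""
    return (j & (1 - q)) | ((1 - k) & q)

print("J K Q(t) Q(t+1)")
next_states = []
for q in (0, 1):
    for j in (0, 1):
        for k in (0, 1):
            next_states.append(jk_next(j, k, q))
            print(j, k, q, next_states[-1])
```

The rows are printed in the same order as the truth table, so the last column can be compared directly.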

3. D Flip-Flop

D flip-flop is a better alternative that is very popular with digital electronics. They are commonly used for
counters and shift registers and input synchronization.

D Flip-Flop
In the D flip-flops, the output can only be changed at the clock edge, and if the input changes at other
times, the output will be unaffected.

Truth Table:

Clock D Q Q’
↓ » 0 0 0 1
↑ » 1 0 0 1
↓ » 0 1 0 1
↑ » 1 1 1 0

The change of state of the output is dependent on the rising edge of the clock. The output (Q) is the same
as the input and can only change at the rising edge of the clock.

4. T Flip-Flop

A T flip-flop is like a JK flip-flop. These are basically single-input versions of JK flip-flops. This
modified form of the JK is obtained by connecting inputs J and K together. It has only one input along
with the clock input.

T flip flop
These flip-flops are called T flip-flops because of their ability to complement their state i.e. Toggle, hence
they are named Toggle flip-flops.

Truth Table:

T Q Q(t+1)
0 0 0
1 0 1
0 1 1
1 1 0
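
The T flip-flop table corresponds to Q(t+1) = T XOR Q. The sketch below checks that this matches a JK flip-flop with J and K tied together, as described above, and shows the output toggling when T is held at 1. Both function names are illustrative.

```python
def jk_next(j, k, q):
    """Next state of a JK flip-flop: Q(t+1) = J.Q' + K'.Q"""
    return (j & (1 - q)) | ((1 - k) & q)

def t_next(t, q):
    """Next state of a T flip-flop: toggle when T = 1, hold when T = 0."""
    return t ^ q

# Connecting J and K together should give the same behaviour as a T input.
same = all(jk_next(t, t, q) == t_next(t, q) for t in (0, 1) for q in (0, 1))
print(same)

# With T held at 1, the output toggles on every clock edge.
q, outputs = 0, []
for _ in range(4):
    q = t_next(1, q)
    outputs.append(q)
print(outputs)
```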

Registers

A register is a collection of flip flops. A flip flop stores a single bit of digital data. To store a
larger number of bits, the storage capacity is increased by grouping several flip flops. To store an
n-bit word, we use an n-bit register containing n flip flops.

Registers are used to perform different types of operations, and the CPU uses them while carrying out
these operations. The inputs fed to the system are stored in the registers, and the results returned by
the system are also stored in the registers. The following operations are performed using registers:

Fetch:

The fetch operation is used

• To take the instructions given by the user.
• To fetch the instructions stored in the main memory.

Decode:

The decode operation is used to interpret the instructions. During decode, the CPU identifies which
operation is to be performed by the instruction.

Execute:

In the execute operation, the CPU carries out the instruction and stores the result in memory. After
the result is stored, it can be displayed on the user's screen.

Types of Registers
There are various types of registers which are as follows:

MAR or Memory Address Register

The MAR is a special type of register that contains the memory address of the data and instruction. The
main task of the MAR is to access instruction and data from memory in the execution phase. The MAR
stores the address of the memory location where the data is to be read or to be stored by the CPU.

Program Counter

The program counter is also called the instruction address register or instruction pointer. It contains
the memory address of the next instruction, that is, the instruction to be executed after the current
instruction completes.

Accumulator Register

The accumulator is the register the CPU uses most often. It is used to store results: when the CPU
produces a result after processing, the result is stored in the accumulator register.

MDR or Memory Data Register

The Memory Data Register is a part of the computer's control unit. It contains the data that is to be
stored in the computer storage or the data fetched from the computer storage. The MDR works as a buffer
that holds a copy of the memory contents until the processor is ready to use it. The MDR first holds
the information, which then goes to the decoder.

The data to be read from or written to the addressed location is contained in the Memory Data
Register.

In a read, the data is fetched from memory and placed into the MDR. In a write, the data is placed into
the MDR from another CPU register and is then written to memory. The memory address register forms one
half of the minimal interface between the computer storage and the microprogram, and the memory data
register forms the other half.

Index Register

The index register is a hardware element that holds a number. This number is added to the address part
of a computer instruction to form an effective address. In the CPU, the index register is a processor
register used to modify the operand address while a program is running.
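
The effective-address calculation described above amounts to one addition. The sketch below illustrates it with hypothetical values: the base address 100 and the function name `effective_address` are assumptions made for the example.

```python
def effective_address(operand_address, index_register):
    """Effective address = address field of the instruction + index register."""
    return operand_address + index_register

# Hypothetical example: consecutive elements of an array that starts at address 100.
BASE = 100
addresses = [effective_address(BASE, i) for i in range(4)]
print(addresses)
```

Incrementing the index register between iterations lets the same instruction reach successive memory locations.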

Memory Buffer Register

The Memory Buffer Register is usually called the MBR. The MBR holds the data or instruction being
written to or read from memory. In simple words, it stores the data or instruction that is coming
from memory or going to memory.

Data Register

The data register is used to temporarily store data that is being transmitted to or from a peripheral device.

Shift Register

A shift register is a group of flip flops that stores multiple bits of data and moves the data from one
flip flop to the next. The bits stored in the register are shifted within the register, and into or out
of it, when a clock pulse is applied. To form an n-bit shift register, we connect n flip flops, so the
number of bits that can be stored equals the number of flip flops. The flip flops are connected in such
a way that the output of each flip flop becomes the input of the next flip flop.

A shift register can shift the bits either to the left or to the right. A shift register that shifts the
bits to the left is known as a "shift left register", and one that shifts the bits to the right is known
as a "shift right register".
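
A minimal Python sketch of the two shift directions follows. The register is modelled as a list of bits, one per flip flop, with the leftmost bit first. The function names and the `serial_in` parameter (the bit entering the register) are assumptions made for illustration.

```python
def shift_left(bits, serial_in=0):
    """One clock pulse: every bit moves one place left; serial_in enters on the right."""
    return bits[1:] + [serial_in]

def shift_right(bits, serial_in=0):
    """One clock pulse: every bit moves one place right; serial_in enters on the left."""
    return [serial_in] + bits[:-1]

register = [1, 0, 1, 1]          # 4-bit register = 4 flip flops
print(shift_left(register))
print(shift_right(register))
```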

Counters
A counter is a special type of sequential circuit used to count pulses. Equivalently, a counter is a
collection of flip flops to which a clock signal is applied.

Counters are one of the widest applications of flip flops. On each clock pulse, the output of the
counter moves to the next state in a predefined sequence, so the number of pulses can be determined
from the output of the counter.

There are the following types of counters:

1. Asynchronous Counters
2. Synchronous Counters

1. Asynchronous or ripple counters

The asynchronous counter is also known as the ripple counter. Below is a diagram of a 2-bit
asynchronous counter built from two T flip-flops. Instead of T flip flops, we can also use JK flip
flops with both inputs set to 1 permanently. The external clock is applied to the clock input of the
first flip flop, FF-A, and the output of FF-A is passed to the clock input of the next flip flop, FF-B.

2. Synchronous counters

In the asynchronous counter, the output of each flip flop is passed to the clock input of the next flip
flop, so the flip flops are connected like a chain. The drawback of this arrangement is that the
propagation delays of the flip flops add up, which delays the count. The synchronous counter is
designed to remove this drawback.

In the synchronous counter, the same clock pulse is passed to the clock input of all the flip flops, so
the clock signals received by all the flip flops are identical. Below is the diagram of a 2-bit
synchronous counter in which the inputs of the first flip flop, FF-A, are set to 1, so the first flip
flop works as a toggle flip-flop. The output of the first flip flop is passed to both the inputs of the
next JK flip flop.
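
The 2-bit synchronous counter described above can be simulated in a few lines. In this sketch FF-A has J = K = 1 and FF-B has J = K = Q_A; both next states are computed from the current outputs because both flip flops see the same clock edge. The function name `clock_pulse` and taking Q_A as the least significant bit are assumptions.

```python
def clock_pulse(qa, qb):
    """Apply one common clock pulse to FF-A and FF-B and return the new (qa, qb)."""
    next_qa = 1 - qa      # J = K = 1: FF-A toggles on every pulse
    next_qb = qb ^ qa     # J = K = Q_A: FF-B toggles only when Q_A is 1
    return next_qa, next_qb

qa, qb = 0, 0
counts = []
for _ in range(5):
    counts.append(2 * qb + qa)   # Q_B is the high bit, Q_A the low bit
    qa, qb = clock_pulse(qa, qb)
print(counts)
```

The counter steps through 0, 1, 2, 3 and then wraps around to 0.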

Sequential Circuits

A combinational circuit does not use any memory, so the previous state of the inputs has no effect on
the present state of the circuit. A sequential circuit, however, has memory, so its output depends on
both the present input and the stored state. This type of circuit uses the previous input, the output,
a clock, and a memory element.

Block diagram

Flip Flop: A flip flop is a sequential circuit that samples its inputs and changes its outputs only
at particular instants of time, not continuously. A flip flop is said to be edge sensitive or edge
triggered, rather than level triggered like a latch.

S-R Flip Flop: It is basically an S-R latch built from NAND gates with an additional enable input. It
is also called a level-triggered SR flip-flop. In this circuit, a change in the output takes place if
and only if the enable input (E) is active. In short, the circuit operates as an S-R latch if E = 1,
and there is no change in the output if E = 0.

Block Diagram

Truth Table:

Number Representation :
Digital computers use the binary number system to represent all types of information inside the
computer. Alphanumeric characters are represented using binary bits (i.e., 0 and 1). Digital
representations are easier to design and store, and they offer greater accuracy and precision.
There are various number representation techniques, for example the binary, octal, decimal, and
hexadecimal number systems. The binary number system is the most relevant and popular for
representing numbers in a digital computer system.

Storing Real Number

Real-number storage is structured as follows.

There are two major approaches to store real numbers (i.e., numbers with fractional component) in
modern computing. These are (i) Fixed Point Notation and (ii) Floating Point Notation. In fixed point
notation, there are a fixed number of digits after the decimal point, whereas floating point number allows
for a varying number of digits after the decimal point.

Fixed-Point Representation −

This representation has a fixed number of bits for the integer part and for the fractional part. For
example, if the fixed-point representation is IIII.FFFF, then the minimum non-zero value that can be
stored is 0000.0001 and the maximum value is 9999.9999. A fixed-point number representation has three
parts: the sign field, the integer field, and the fractional field.

We can represent these numbers using:

• Signed (sign-magnitude) representation: range from $-(2^{k-1} - 1)$ to $(2^{k-1} - 1)$, for k bits.
• 1’s complement representation: range from $-(2^{k-1} - 1)$ to $(2^{k-1} - 1)$, for k bits.
• 2’s complement representation: range from $-2^{k-1}$ to $(2^{k-1} - 1)$, for k bits.

2’s complement representation is preferred in computer systems because it is unambiguous and makes
arithmetic operations easier.
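
The three ranges above can be tabulated for a concrete word length. The sketch below evaluates the formulas for k bits; the function name `signed_ranges` is illustrative.

```python
def signed_ranges(k):
    """Return (min, max) for each k-bit signed representation."""
    largest = 2 ** (k - 1) - 1
    return {
        "sign-magnitude": (-largest, largest),
        "1's complement": (-largest, largest),
        "2's complement": (-(2 ** (k - 1)), largest),
    }

for name, (low, high) in signed_ranges(8).items():
    print(f"{name}: {low} to {high}")
```

For k = 8, 2's complement reaches one value further on the negative side, because it has only a single representation of zero.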

Example − Assume a number uses a 32-bit format that reserves 1 bit for the sign, 15 bits for the
integer part, and 16 bits for the fractional part.

Then, -43.625 is represented as follows:

1 000000000101011 1010000000000000

Here, 0 is used to represent + and 1 is used to represent −. 000000000101011 is the 15-bit binary
value for decimal 43, and 1010000000000000 is the 16-bit binary value for the fraction 0.625.
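
The encoding in this example can be reproduced with a short Python sketch. It assumes the 1 + 15 + 16 bit layout given above with a separate sign bit; the function name `to_fixed_point` and its parameters are illustrative.

```python
def to_fixed_point(value, int_bits=15, frac_bits=16):
    """Encode value as (sign, integer field, fraction field) bit strings."""
    sign = 1 if value < 0 else 0
    scaled = round(abs(value) * 2 ** frac_bits)   # magnitude in units of 2^-frac_bits
    if scaled >= 2 ** (int_bits + frac_bits):
        raise OverflowError("value does not fit in the integer field")
    integer = scaled >> frac_bits
    fraction = scaled & (2 ** frac_bits - 1)
    return f"{sign:b}", f"{integer:0{int_bits}b}", f"{fraction:0{frac_bits}b}"

sign, integer, fraction = to_fixed_point(-43.625)
print(sign, integer, fraction)
```

Values whose fraction is not a multiple of $2^{-16}$ are rounded to the nearest representable number by this sketch.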

The advantage of a fixed-point representation is performance, and the disadvantage is the relatively
limited range of values that it can represent. It is therefore usually inadequate for numerical
analysis, as it does not offer enough range and accuracy. A number whose representation exceeds 32 bits
would have to be stored inexactly.

The smallest positive number that can be stored in the 32-bit format given above is
$2^{-16} \approx 0.000015$, and the largest positive number is
$(2^{15} - 1) + (1 - 2^{-16}) = 2^{15} - 2^{-16} \approx 32768$. The gap between adjacent
representable numbers is $2^{-16}$.

The position of the radix point is fixed by the format. It can be moved left or right only by changing the number of bits assigned to the integer field.

Floating-Point Representation −

This representation does not reserve a specific number of bits for the integer part or the fractional part.
Instead it reserves a certain number of bits for the number (called the mantissa or significand) and a
certain number of bits to say where within that number the decimal place sits (called the exponent).

The floating-point representation of a number has two parts. The first part represents a signed
fixed-point number called the mantissa. The second part designates the position of the decimal (or
binary) point and is called the exponent. The fixed-point mantissa may be a fraction or an integer. A
floating-point number is always interpreted to represent a number in the following form:
$m \times r^e$, where r is the radix (base).

Only the mantissa m and the exponent e are physically represented in the register (including their
signs). A floating-point binary number is represented in a similar manner, except that it uses base 2
for the exponent. A floating-point number is said to be normalized if the most significant digit of the
mantissa is 1.

So, the actual number is $(-1)^s (1 + m) \times 2^{(e - \text{Bias})}$, where s is the sign bit, m is
the mantissa, e is the exponent value, and Bias is the bias number.

Note that signed integers and exponent are represented by either sign representation, or one’s complement
representation, or two’s complement representation.

The floating-point representation is more flexible. Any non-zero number x can be represented in the
normalized form $\pm(1.b_1 b_2 b_3 \ldots)_2 \times 2^n$.
Example − Suppose a number uses a 32-bit format: 1 sign bit, 8 bits for the signed exponent, and 23
bits for the fractional part. The leading bit 1 is not stored (as it is always 1 for a normalized
number) and is referred to as a “hidden bit”.
Then −53.5 is normalized as $-53.5 = (-110101.1)_2 = (-1.101011)_2 \times 2^5$, which is represented as
follows:

1 00000101 10101100000000000000000

Here 00000101 is the 8-bit binary value of the exponent +5, and 10101100000000000000000 is the 23-bit
fractional part (the bits after the hidden leading 1).

Note that the 8-bit exponent field is used to store integer exponents $-126 \le n \le 127$.

The smallest normalized positive number that fits into 32 bits is
$(1.00000000000000000000000)_2 \times 2^{-126} = 2^{-126} \approx 1.18 \times 10^{-38}$, and the
largest normalized positive number that fits into 32 bits is
$(1.11111111111111111111111)_2 \times 2^{127} = (2^{24} - 1) \times 2^{104} \approx 3.40 \times 10^{38}$.
These numbers are represented as follows.

The precision of a floating-point format is the number of positions reserved for binary digits plus one (for
the hidden bit). In the examples considered here the precision is 23+1=24.

The gap between 1 and the next normalized floating-point number is known as machine epsilon. The gap is
$(1 + 2^{-23}) - 1 = 2^{-23}$ for the above example. This is not the same as the smallest positive
floating-point number, because the spacing is non-uniform, unlike in the fixed-point scenario.
Note that non-terminating binary numbers cannot be represented exactly in floating-point
representation; e.g., $1/3 = (0.010101\ldots)_2$ cannot be a floating-point number as its binary
representation is non-terminating.
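
These statements can be checked with Python's standard `struct` module, which can pack a value into the IEEE 754 single-precision format. This is a sketch under the assumption that the 32-bit format discussed here behaves like IEEE single precision; the helper names are illustrative.

```python
import struct

def float32_from_bits(bits):
    """Interpret a 32-bit integer as an IEEE 754 single-precision number."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def bits_from_float32(value):
    """Return the 32-bit pattern of value rounded to single precision."""
    return struct.unpack(">I", struct.pack(">f", value))[0]

# Machine epsilon: gap between 1 and the next representable number.
epsilon = float32_from_bits(bits_from_float32(1.0) + 1) - 1.0
print(epsilon == 2 ** -23)

# Smallest normalized positive number: exponent field 1, fraction 0.
smallest_normal = float32_from_bits(0x00800000)
print(smallest_normal == 2 ** -126)

# 1/3 is not exactly representable: the stored value differs from 1/3.
one_third = float32_from_bits(bits_from_float32(1 / 3))
print(one_third == 1 / 3)
```

The first two comparisons should hold exactly, since powers of two are representable. The last one should not, because 1/3 has a non-terminating binary expansion.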
IEEE Floating point Number Representation −

IEEE (Institute of Electrical and Electronics Engineers) has standardized Floating-Point Representation as
following diagram.

So, the actual number is $(-1)^s (1 + m) \times 2^{(e - \text{Bias})}$, where s is the sign bit, m is
the mantissa, e is the exponent value, and Bias is the bias number. The sign bit is 0 for a positive
number and 1 for a negative number. In IEEE 754, exponents are stored in biased form, i.e., the stored
value e is the true exponent plus the bias (127 for single precision).
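
The formula can be tried out on a single-precision number with Python's standard `struct` module. The sketch below splits −53.5 into the sign, the stored exponent, and the 23 fraction bits, then rebuilds the value with a bias of 127. The function name `decode_float32` is illustrative.

```python
import struct

BIAS = 127  # exponent bias for IEEE 754 single precision

def decode_float32(value):
    """Split value, stored as IEEE 754 single precision, into (s, e, fraction bits)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF    # stored (biased) exponent e
    fraction = bits & 0x7FFFFF        # 23 stored mantissa bits
    return sign, exponent, fraction

s, e, fraction = decode_float32(-53.5)
m = fraction / 2 ** 23                # mantissa as a fraction in [0, 1)
rebuilt = (-1) ** s * (1 + m) * 2 ** (e - BIAS)
print(s, e, e - BIAS, f"{fraction:023b}", rebuilt)
```

The stored exponent is 132, which corresponds to a true exponent of 5 after the bias is subtracted.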

According to the IEEE 754 standard, a floating-point number is represented in the following ways:

• Half Precision (16 bit): 1 sign bit, 5 bit exponent, and 10 bit mantissa
• Single Precision (32 bit): 1 sign bit, 8 bit exponent, and 23 bit mantissa
• Double Precision (64 bit): 1 sign bit, 11 bit exponent, and 52 bit mantissa
• Quadruple Precision (128 bit): 1 sign bit, 15 bit exponent, and 112 bit mantissa
Special Value Representation −

Some special values are defined by particular values of the exponent and mantissa in the IEEE 754
standard.

• All exponent bits 0 with all mantissa bits 0 represents 0. If the sign bit is 0, it is +0, else -0.
• All exponent bits 1 with all mantissa bits 0 represents infinity. If the sign bit is 0, it is +∞, else -∞.
• All exponent bits 0 with non-zero mantissa bits represents a denormalized number.
• All exponent bits 1 with non-zero mantissa bits represents an error value, NaN (Not a Number).
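
The rules above can be written as a small classifier for single-precision bit patterns. This is an illustrative sketch; the function name `classify_float32` and the example patterns are assumptions chosen to exercise each rule.

```python
def classify_float32(bits):
    """Classify a 32-bit pattern by its exponent and mantissa fields."""
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0:
        return "zero" if mantissa == 0 else "denormalized"
    if exponent == 0xFF:
        return "infinity" if mantissa == 0 else "NaN"
    return "normalized"

examples = {
    0x00000000: "+0",
    0x80000000: "-0",
    0x7F800000: "+infinity",
    0x7FC00000: "a NaN pattern",
    0x00000001: "smallest denormalized number",
    0x3F800000: "1.0",
}
for bits, label in examples.items():
    print(f"{bits:#010x} ({label}): {classify_float32(bits)}")
```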
