Microprocessor CoursePDF
There are two typical components of a CPU: the arithmetic logic unit (ALU or ALSU) and the control unit (CU). The
ALU performs arithmetic and logical operations, while the CU fetches instructions from memory, decodes
them and executes them, calling on the ALU when necessary.
A microprocessor incorporates all of the functions of a computer’s CPU on a single integrated circuit (IC).
Initially, more than one circuit was required to house the CPU; as the technology became more
advanced, the number of circuits needed was reduced. Today a single microprocessor can house several CPU cores, for example 4 in
quad-core technologies. In addition to the CPU, the microprocessor also packs cache memory and memory access
circuits. It is a programmable device that receives data, processes it according
to stored instructions and then gives results in the form of output. Its primary language is binary code: 0, 1.
Because of this similarity in usage, the words CPU and microprocessor have become synonymous. If a
person were to refer to a microprocessor as a CPU or vice versa, it would be acceptable.
Main Memory:
Also known as physical memory, it is internal to the computer. The word main is used to distinguish it from
external mass storage devices such as disk drives. Another term for main memory is RAM.
The computer can manipulate only data that is in main memory. Therefore, every program you execute and
every file you access must be copied from a storage device into main memory. The amount of main
memory on a computer is crucial because it determines how many programs can be executed at one time and
how much data can be readily available to a program.
Because computers often have too little main memory to hold all the data they need, computer engineers
invented a technique called swapping, in which portions of data are copied into main memory as they are
needed. Swapping occurs when there is no room in memory for needed data. When one portion of data is
copied into memory, an equal-sized portion is copied (swapped) out to make room.
RAM is constructed from integrated circuits and needs electrical power in order to maintain its
information. When power is lost, the information is lost too! It can be directly accessed by the CPU. The access
time to read or write any particular byte is independent of where in the memory that byte is, and
is currently approximately 50 nanoseconds (a nanosecond is a thousand-millionth of a second). This is broadly comparable with
the speed at which the CPU needs to access data. Main memory is expensive compared to external
memory, so it has limited capacity. The capacity available for a given price is increasing all the time. For
example many home Personal Computers now have a capacity of 16 megabytes (million bytes), while 64
megabytes is commonplace on commercial workstations. The CPU will normally transfer data to and from the
main memory in groups of two, four or eight bytes, even if the operation it is undertaking only requires a
single byte.
DRAM (Dynamic): Its advantage over SRAM is its structural simplicity: only one transistor (a MOSFET) and
a capacitor (to store a bit as a charge) are required per bit, compared to six transistors in SRAM. This allows
DRAM to reach very high density. It also consumes less power and is cheaper than SRAM (except when
the system size is less than 8 K). DRAM is the main computer memory.
The disadvantage is that it stores bit information as a charge that leaks, so the information needs
to be read and written again every few milliseconds. This is known as refreshing the memory, and it requires
extra circuitry, adding to the cost of the system.
DRAM evolved to FPM DRAM (Fast Page Mode), then to EDO DRAM (Extended Data-Out), to SDRAM (Synchronous),
and to DDR SDRAM (Double Data Rate). The improvements were in the refresh speed, access time and write cycle
time.
Classic DRAM has an asynchronous interface, which means that it responds as quickly as possible to changes in
control inputs. SDRAM has a synchronous interface, meaning that it waits for a clock signal before responding
to control inputs and is therefore synchronized with the computer's system bus. The clock is used to drive an
internal circuit that pipelines incoming commands. The data storage area is divided into several banks,
allowing the chip to work on several memory access commands at a time, interleaved among the separate
banks. This allows higher data access rates than an asynchronous DRAM.
Pipelining means that the chip can accept a new command before it has finished processing the previous one.
In a pipelined write, the write command can be immediately followed by another command, without waiting
for the data to be written to the memory array. In a pipelined read, the first requested data appears a
fixed number of clock cycles after the read command. During these clock cycles, additional
commands can be sent. This delay is called the latency and is an important performance parameter to
consider when purchasing SDRAM for a computer. SDRAM has the same read and write cycle time, as it is
synchronized by the CPU clock. Generally we use the term latency when talking about SDRAM, and access time when
talking about normal DRAM.
SDRAM is widely used in computers; after the original SDRAM, further generations of double data rate
(DDR) RAM have entered the mass market – DDR (also known as DDR1), DDR2, DDR3 and DDR4, with the
latest generation (DDR4) released in second half of 2014.
SDRAM is efficient for burst reads and burst writes, that is, reading or writing multiple words from or to
consecutive addresses. Only one clock cycle is needed for each next word. See the example below.
SDRAM has a Memory Bandwidth which is the rate at which data can be read from or written into it, and is
generally associated with the burst read and write.
SDRAM Example:
Memory data path width: 1 word = 4 bytes
Burst size: 8 words = 32 bytes
Memory clock period: 5 ns (clock frequency 200 MHz)
Latency time (from application of row address until first word available): 4 clock cycles
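The figures of this example can be turned into a burst time and a memory bandwidth. The following is only a sketch: the function name and the `first_word_in_latency` switch are assumptions made here, since the notes do not say whether the first word is delivered within the 4 latency cycles or in the cycle after them. By default it assumes latency plus one clock cycle per word.

```python
def burst_transfer(word_bytes, burst_words, clock_ns, latency_cycles,
                   first_word_in_latency=False):
    """Return (total time in ns, bandwidth in bytes per second) for one burst."""
    # Cycles spent streaming words once the latency has elapsed.
    # Assumption: if the first word is already available at the end of the
    # latency, only the remaining words need one extra cycle each.
    streaming = burst_words - 1 if first_word_in_latency else burst_words
    total_ns = (latency_cycles + streaming) * clock_ns
    burst_bytes = word_bytes * burst_words
    bandwidth = burst_bytes / (total_ns * 1e-9)
    return total_ns, bandwidth


# Values of the SDRAM example: 4-byte words, 8-word burst, 5 ns clock, latency 4.
total_ns, bandwidth = burst_transfer(4, 8, 5, 4)
print(f"{total_ns} ns per burst, about {bandwidth / 1e6:.0f} MB/s")
```

Under the default assumption the burst takes 12 clock cycles (60 ns) for 32 bytes; with `first_word_in_latency=True` it takes 11 cycles (55 ns).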
RAM Block (seen in CO)
Expansion by Address: A = 2^(new a) / 2^(old a)
Old a = log2(128K) = log2(2^17) = 17 lines, New a = log2(1M) = log2(2^20) = 20 lines
A = (1M / 128K) = (2^20 / 2^17) = 2^3 = 8 Blocks
[Figure: 20 address lines; 17 go to all blocks, and the remaining 3 drive a 3-to-8 decoder (with enable E) whose outputs go to the block CS inputs.]
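The expansion-by-address computation can be sketched in a few lines. The function name is hypothetical, and the sketch assumes that both memory sizes are powers of two.

```python
def address_expansion(old_words, new_words):
    """Blocks and address lines needed to build new_words from old_words blocks.

    Assumes both sizes are powers of two.
    """
    old_lines = old_words.bit_length() - 1   # log2: lines going to every block
    new_lines = new_words.bit_length() - 1   # log2: lines of the expanded memory
    blocks = new_words // old_words
    decoder_inputs = new_lines - old_lines   # lines left for the block decoder
    return blocks, old_lines, new_lines, decoder_inputs


K = 2**10
M = 2**20
blocks, old_a, new_a, dec_in = address_expansion(128 * K, 1 * M)
print(f"{blocks} blocks, {old_a} lines to all blocks, "
      f"{dec_in}-to-{2**dec_in} decoder on the remaining lines")
```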
As the memory expansion gets larger, the decoder becomes large too. A larger decoder has many gates in
it.
A technique called coincident selection can be used to split the decoder in two: instead of one N-to-2^N
decoder, we use two N/2-to-2^(N/2) decoders, a row decoder and a column decoder. To be selected, a memory block must have both
its row and column decoder outputs enabled. For each block, a gate combines one row decoder output with one column decoder output,
and the gate output is connected to the chip select of that block. If all signals are active low, we use OR gates.
Note that each RAM memory block is designed internally with coincident selection decoders to save space and
power dissipation.
Theoretical Example: Construct a 1GB * 8 memory from 2MB * 8 blocks
1GB/2MB = 2^30/2^21 = 2^9
So a 9-to-2^9 decoder is needed; this decoder contains at least 2^9 = 512 AND gates with 9 inputs each.
Using one 4-to-16 and one 5-to-32 decoder instead is much more efficient. The two decoders contain just 16 + 32 = 48 AND gates,
each with fewer inputs, plus one 2-input gate per block to combine a row output with a column output.
Imagine how much you will save for a larger expansion.
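The gate counts of this example can be checked with a small sketch. The function names and dictionary keys are made up for illustration, and the model assumes one AND gate per decoder output plus one 2-input gate per block for the coincident selection.

```python
def single_decoder(n):
    """One n-to-2^n decoder: 2^n AND gates with n inputs each."""
    return {"and_gates": 2**n, "inputs_per_gate": n}


def coincident_decoders(row_bits, col_bits):
    """Two smaller decoders plus one 2-input selection gate per block."""
    return {
        "and_gates": 2**row_bits + 2**col_bits,          # inside both decoders
        "max_inputs_per_gate": max(row_bits, col_bits),
        "two_input_select_gates": 2**(row_bits + col_bits),  # one per block
    }


print(single_decoder(9))
print(coincident_decoders(4, 5))
```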
Example: 256K/16K = 16 blocks. A 4-to-16 decoder is needed: 256K has an 18-bit address, 16K has a 14-bit address.
Splitting it into two 2-to-4 decoders, we will have 16 connections to the blocks through 16 two-input OR gates.
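[Figure: 18 address lines; 14 go to all blocks, and the remaining 4 are split between a first and a second 2-to-4 decoder (outputs 0 to 3 each), whose outputs are combined to drive the block CS inputs.]
[Figure: ROM example with a 2-to-4 decoder and its truth table.]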
ROM Properties:
The basic ROM structure contains a decoder and an array of gates to collect the outputs.
The ROM may be Active High (AH) or Active Low (AL) (an AH decoder followed by an array of OR gates, or an AL decoder followed by an array of NAND gates).
The ROM has a size equal to 2^(#inputs) * #outputs. In the above example, I have a 4*8 ROM.
The ROM has a truth table which indicates the 0 and 1 bit values for each row and column.
ROM Applications:
There are two major ROM Applications:
In Computers (forming the BIOS, Basic Input Output System, primary role for booting), or computer
based applications (modern cameras, modern cars, planes…)
They can be used to generate mathematical functions (CO, squarer, comparator…)
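As an illustration of a ROM generating a mathematical function, here is a minimal sketch of a squarer seen as a lookup table. The 3-input width and the names are assumptions chosen for the illustration; the ROM size follows the 2^(#inputs) * #outputs rule.

```python
def build_squarer_rom(n_inputs):
    """Return (ROM contents, number of output bits) for a squarer."""
    n_outputs = 2 * n_inputs                    # x * x always fits in 2n bits
    rom = [x * x for x in range(2**n_inputs)]   # one word per input combination
    return rom, n_outputs


rom, n_outputs = build_squarer_rom(3)
print(f"ROM size: {len(rom)} * {n_outputs}")
for address, word in enumerate(rom):
    # Each row of the truth table: input address -> stored output word.
    print(f"{address:03b} -> {word:0{n_outputs}b}")
```

Reading the ROM at address x returns x squared without any arithmetic circuit; the decoder selects the row and the gate array delivers the stored bits.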
Standard ROM
ROM chips are fundamentally different from RAM chips. While RAM uses transistors (or FFs) to turn on or off
access to a capacitor to set or reset a bit value, ROM uses a diode to connect the lines if the value is 1. If the
value is 0, then the lines are not connected at all.
A diode normally allows current to flow in only one direction and has a certain threshold, known as the
forward breakover, that determines how much voltage is required before the diode will conduct; it is
approximately 0.6 volts. If the diode is present, it is a 1, otherwise it is a 0.
As you can see, the way a ROM chip works necessitates the programming of perfect and complete data when
the chip is created. You cannot reprogram or rewrite a standard ROM chip. If it is incorrect, or the data needs
to be updated, you have to throw it away and start over. Creating the original template for a ROM chip is
often a laborious process full of trial and error. But the benefits of ROM chips outweigh the drawbacks. Once
the template is completed, the actual chips can cost as little as a few cents each. They use very little power,
are extremely reliable and, in the case of most small electronic devices, contain all the necessary programming
to control the device.
PROM
Creating ROM chips totally from scratch is time-consuming and very expensive in small quantities. For this
reason, mainly, developers created a type of ROM known as programmable read-only memory (PROM). Blank
PROM chips can be bought inexpensively and coded by anyone with a special tool called a programmer.
PROM chips (see above) have a grid of columns and rows just as ordinary ROMs do. The difference is that
every intersection of a column and row in a PROM chip has a fuse connecting them. A charge sent through a
column will pass through the fuse in a cell to a grounded row indicating a value of 1. Since all the cells have a
fuse, the initial (blank) state of a PROM chip is all 1s. To change the value of a cell to 0, you use a programmer
to send a specific amount of current to the cell. The higher voltage breaks the connection between the column
and row by burning out the fuse. This process is known as burning the PROM.
PROMs can only be programmed once. They are more fragile than ROMs. Some static electricity can easily
cause fuses in the PROM to burn out, changing essential bits from 1 to 0. But blank PROMs are inexpensive
and are great for prototyping the data for a ROM before committing to the costly ROM fabrication process.
EPROM
Working with ROMs and PROMs can be a wasteful business. Even though they are inexpensive per chip, the
cost can add up over time. Erasable programmable read-only memory (EPROM) addresses this issue. EPROM
chips can be rewritten many times. Erasing an EPROM requires a special tool that emits a certain frequency
of ultraviolet (UV) light. EPROMs are configured using an EPROM programmer that provides voltage at
specified levels depending on the type of EPROM used. To reprogram an EPROM, I have to remove it from
where it is, place it on the EPROM programmer, erase it all, and then rewrite it.
EEPROM
Electrically erasable programmable read-only memory (EEPROM) is based on a similar semiconductor
structure to EPROM, but allows its entire contents (or selected banks, an advantage over EPROM) to be electrically
erased, then rewritten electrically, so that the chip need not be removed from the computer (or camera, MP3
player, etc.).
Electrically alterable read-only memory (EAROM) is a type of EEPROM that can be modified one bit at a time.
Writing is a very slow process and again needs higher voltage (usually around 12 V) than is used for read
access. EAROMs are intended for applications that require infrequent and only partial rewriting.
Flash memory (or simply flash) is a modern type of EEPROM invented in 1984. Flash memory can be erased
and rewritten faster than ordinary EEPROM, and newer designs feature very high speed. Modern flash makes
efficient use of silicon chip area, resulting in individual ICs with a capacity as high as 32 GB as of 2007; this
feature, along with its endurance and physical durability, has allowed modern flash to be used frequently for
backups. Flash memory is sometimes called flash ROM or flash EEPROM, and it is the memory used inside USB flash drives.
DVD
Digital Versatile Disc or Digital Video Disc, a type of optical disk technology similar to the CD-ROM. A DVD
holds a minimum of 4.7GB of data, enough for a full-length movie. DVDs are commonly used as a medium for
digital representation of movies and other multimedia presentations that combine sound with graphics.
The DVD specification supports disks with capacities of from 4.7GB to 17GB and access rates of 600KBps to
1.3 MBps. One of the best features of DVD drives is that they are backward-compatible with CD-ROMs,
meaning they can play old CD-ROMs.
Computer Bus Architecture
The BUS connects all computer parts together. It consists of three sets of parallel lines:
the address bus, the data bus, and the control bus.
The data bus consists of 8, 16, or 32, or 64 parallel signal lines. The data bus lines are bidirectional. Many
devices in a system will have their outputs connected to the data bus, but only one device at a time will have
its outputs enabled. Any device connected on the data bus must have three-state outputs (1, 0, or high
impedance, that is, not connected) so that its outputs can be disabled when it is not being used to put data on the bus.
The address bus consists of 16, 20, 24, or 32 parallel signal lines. On these lines the CPU sends out the address
of the memory location that is to be written to or read from. The number of memory locations that the CPU
can address is determined by the number of address lines. If the CPU has N address lines, then it can directly
address 2^N memory locations. When the CPU reads data from or writes data to a port, it sends the port
address out on the address bus.
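The relation between address lines and addressable locations is easy to tabulate. This short sketch uses the address bus widths listed above; the function name is only illustrative.

```python
def addressable_locations(address_lines):
    """Number of memory locations directly addressable with N address lines."""
    return 2**address_lines


for n in (16, 20, 24, 32):
    print(f"{n} address lines -> {addressable_locations(n):,} locations")
```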
The control bus consists of many parallel signal lines depending on the computer complexity. The CPU sends
out signals on the control bus to enable the outputs of addressed memory devices or port devices. Typical
control bus signals are Memory Read, Memory Write, Keyboard Read, and Printer Write. To read a byte of
data from a memory location, the CPU sends out the memory address of the desired byte on the address bus
and then sends out a Memory Read signal on the control bus. The Memory Read signal enables the addressed
memory device to output a data word onto the data bus. The data word from memory travels along the data
bus to the CPU.
Today all computers utilize two types of buses, an internal or local bus and an external bus. An internal bus
enables a communication between internal components such as a computer video card and memory (e.g. ISA,
EISA, PCI, AGP, etc.) and an external bus is capable of communicating with external components such as a SCSI
bus, USB, etc.
A computer's or device's bus speed or throughput is usually measured in bits per second or megabytes per
second.
The bus is not only a cable connection but also hardware (bus architecture), protocol, software, and a bus
controller, including decoders and MUXs. In general a bus system will multiplex k registers of n bits each to
produce an n-line common bus. The number of MUXs needed is n, equal to the number of bits in each
register. Each bit in a register passes through a MUX before going to the bus. The size of each MUX is k-to-1.
Alternatively, a decoder can be used with 3-state buffers to deliver data to the bus. An n-to-2^n decoder will select one out of 2^n bits of
data to pass to a single bus line. Each buffer has an enable input driven by the decoder: if enabled, the data will pass,
otherwise the buffer output is disabled.
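[Figure: one bus line fed through three-state buffers by data bits including B0, C0 and D0; a 2-to-4 decoder with Select and Enable inputs controls the buffers.]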
The above decoder ensures that only one data bit (1 out of 4) passes to the bus line. If the decoder is disabled, no data
passes, and another circuit connected to this bus line may then put its data on it. The Enable bit of this decoder is one
example of a control bus line.
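The behaviour of the decoder with three-state buffers can be modelled as follows. This is a sketch only: the function names are invented, `None` stands for the high impedance state, and the four registers hold example values chosen for the illustration.

```python
def decoder_2to4(select, enable):
    """Outputs of a 2-to-4 decoder: one output is 1 when enabled, none otherwise."""
    return [int(bool(enable) and select == i) for i in range(4)]


def bus_line(data_bits, select, enable):
    """Value on one bus line fed by four three-state buffers.

    Returns None (high impedance) when no buffer drives the line.
    """
    enables = decoder_2to4(select, enable)
    driven = [bit for bit, en in zip(data_bits, enables) if en]
    return driven[0] if driven else None


def common_bus(registers, select, enable=True):
    """Place register number `select` (a list of n bits) on an n-line bus."""
    n = len(registers[0])
    return [bus_line([reg[i] for reg in registers], select, enable)
            for i in range(n)]


# Four example 4-bit registers A, B, C, D (values are arbitrary).
registers = [[1, 0, 1, 0], [0, 1, 1, 0], [1, 1, 1, 1], [0, 0, 0, 1]]
print(common_bus(registers, select=2))                 # register C drives the bus
print(common_bus(registers, select=2, enable=False))   # bus left floating
```

With the decoder disabled every line stays at high impedance, so another circuit can drive the same bus without conflict.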
Internal BUSES
ISA BUS
Introduced by IBM, ISA or Industry Standard Architecture was originally an 8-bit bus that was later expanded
to a 16-bit bus in 1984. It was a widely used standard for two decades and was also known as an internal and external
bus.
In 1993, Intel and Microsoft introduced a PnP (Plug and Play) ISA bus that allowed the computer to
automatically detect and set up ISA peripherals such as a modem or sound card.
Using the PnP technology, an end-user can connect a device without having to
configure it using jumpers (which open or short two pins) or DIP switches (two settings each).
Many manufacturers have tried to eliminate the ISA slots; however, for backwards compatibility
you may find 1 or 2 ISA slots alongside newer technology slots.
An ISA Computer main board contained 8 slots with two usually used for
1. VGA display controller adapter
2. Multi I/O Port adapter with
- 2 serial ports (mouse…)
- a parallel port (printer)
- floppy disk drive interface (1.44 MB)
- IDE (Integrated Drive Electronics) interface for hard disk drives, and CDROM
- a Game Port (joystick)
The other slots were initially empty; the user may add any additional card like a network card.
External Buses
AGP Bus
Introduced by Intel in 1997, AGP or Accelerated Graphics Port is a 32-bit bus designed for the high demands of 3-D
graphics. The AGP bus began to appear with Pentium II machines. It is used alongside PCI.
PCMCIA Bus
The Personal Computer Memory Card International Association (PCMCIA) was founded to provide a standard bus
for laptop computers. So it is basically used in small computers.
SCSI Bus
Small Computer System Interface (SCSI) is a parallel interface standard used by Apple Macintosh computers,
PCs and Unix systems for attaching peripheral devices to a computer.
Registers
Seen in CO.
Registers are storage circuits generally composed of D FFs; they are part of the CPU. They are used to store
information for different operations and computations. When talking about a CPU we specify its registers,
their names, numbers, jobs, number of bits… More on this later.
Control Unit
The Control Unit is the circuitry that controls the flow of data through the processor, and coordinates the
activities of the other units within it. In a way, it is the "brain within the brain", as it controls what happens
inside the processor, which in turn controls the rest of the computer. The Control Unit receives external
instructions or commands which it converts into a sequence of control signals that it applies to the bus to
implement a sequence of register-transfer level operations. It supervises the transfer of information among
the registers and Memory, and instructs the ALU which operation to perform.
The Control Unit (CU) is generally a sizable collection of complex digital circuitry interconnecting and
controlling the many execution units contained within a CPU. The CU accepts an instruction stored in Memory and
decodes it into individual sequential steps (fetching addresses/data from registers/memory, managing
execution [i.e. data sent to the ALU or I/O], and storing the resulting data back into registers/memory). It
controls and coordinates the CPU’s inner workings. These detailed steps from the CU dictate which of the
CPU’s numerous interconnecting hardware control signals to enable/disable, which CPU units are
selected/de-selected, and the proper order of execution of the units as required by the instruction’s operation. The CU
makes use of the Control Bus for this. It manages the translation of instructions (but not the data containing
portion) into several micro-instructions at the machine level. There exist two different CU structures,
hardwired and microprogrammed ones.
Hardwired control units are implemented through the use of sequential logic units, featuring a finite number of gates that
can generate specific results based on the instructions that are used. Hardwired control units are generally
faster than microprogrammed ones, but have become less popular as computers have evolved. (One has been
seen in CO). Microprogrammed control units use microprograms, or control words, that are a sequence of
micro-instructions stored in a special control memory.
Hardwired Control Unit: (Seen in CO Course)
Explanation:
Sel A: first operand register; Sel B: second operand register; Sel D: result register
More operations can be added. Up to 32 different ALSU mics for this structure. Imagine a larger structure.
R7 ← R1 it is: 001 000 111 00000 (000: no second operand register; 00000 for TSF)
R4 ← SHL R4 it is: 100 000 100 11000 (D and A are the same, no B)
R6 ← 0 it is: 110 110 110 01100 (XORing a register with itself gives 0; this is one way to clear a register)
R1 ← R5 – 1 it is: 101 000 001 00110 (decrement a register and store the result in another one)
It is apparent from these examples that many other mics can be generated in the CPU. The most efficient way
to generate control words like the above ALSU mics is to store them in a memory unit referred to as a control
memory. By reading consecutive control words from memory, it is possible to initiate the desired sequence of
mics for the CPU.
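The control words of the four examples can be generated mechanically. The sketch below assumes the field order Sel A, Sel B, Sel D, operation (3 + 3 + 3 + 5 bits) read off the examples; the mnemonics DEC and XOR are labels chosen here, and only the four operation codes that appear in the examples are included.

```python
# 5-bit operation codes taken from the four examples above.
OPCODES = {"TSF": 0b00000, "DEC": 0b00110, "XOR": 0b01100, "SHL": 0b11000}


def control_word(sel_a, sel_b, sel_d, op):
    """Return the 14-bit control word as the string 'AAA BBB DDD OOOOO'."""
    for reg in (sel_a, sel_b, sel_d):
        if not 0 <= reg <= 7:
            raise ValueError("a register select must fit in 3 bits")
    return f"{sel_a:03b} {sel_b:03b} {sel_d:03b} {OPCODES[op]:05b}"


# A tiny "control memory": consecutive control words form a sequence of mics.
control_memory = [
    control_word(1, 0, 7, "TSF"),   # R7 <- R1
    control_word(4, 0, 4, "SHL"),   # R4 <- SHL R4
    control_word(6, 6, 6, "XOR"),   # R6 <- 0
    control_word(5, 0, 1, "DEC"),   # R1 <- R5 - 1
]
for word in control_memory:
    print(word)
```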
The Stack
A useful feature that is included in most CPUs is a stack, or last in, first out (LIFO) list. It is a storage device that
stores information so that the last inserted item (pushed) will be the first one removed or retrieved (popped).
Think of a stack of plates. In computers, it is a reserved memory area where a special register, the stack pointer
(SP), holds the address of the top of the stack, that is, of the last item pushed. When pushing and popping, the
SP is decremented (with push) or incremented (with pop). The stack is initially empty (empty flag = 1, no
popping is possible), and it has a maximum size to avoid overrunning the memory. If the maximum size is reached (full flag = 1), no
more pushing is possible. A major application of stacks in computers is to save register values upon a subroutine call.
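A memory stack with a stack pointer and the two flags can be sketched as below. The class name, the memory size and the choice of raising exceptions on a full or empty stack are assumptions made for this illustration; the SP is decremented on push and incremented on pop, as described above.

```python
class MemoryStack:
    """A stack kept in a reserved memory area and addressed through SP."""

    def __init__(self, memory_size, stack_base, max_items):
        self.memory = [0] * memory_size
        self.base = stack_base        # SP value when the stack is empty
        self.max_items = max_items
        self.sp = stack_base

    @property
    def empty(self):                  # empty flag
        return self.sp == self.base

    @property
    def full(self):                   # full flag
        return self.base - self.sp == self.max_items

    def push(self, value):
        if self.full:
            raise OverflowError("stack full: no more pushing")
        self.sp -= 1                  # SP is decremented with push
        self.memory[self.sp] = value

    def pop(self):
        if self.empty:
            raise IndexError("stack empty: no popping possible")
        value = self.memory[self.sp]
        self.sp += 1                  # SP is incremented with pop
        return value


stack = MemoryStack(memory_size=16, stack_base=16, max_items=4)
stack.push(10)
stack.push(20)
print(stack.pop(), stack.pop(), stack.empty)
```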
Stack Idea:
When converting from Infix to PostFix, you write the operands first, then you add the operator once
both of its values are available, either two direct values or a current result with a previous one.
Evaluating the postfix expression 3 11 * 15 6 * + with a stack:
Value being processed    Stack content (top at right)
3                        3
11                       3 11
*                        33
15                       33 15
6                        33 15 6
*                        33 90
+                        123
5- Consider F5 = A B C D E + * – /
D+E then C*(D+E) then B – C*(D+E) then A / (B – C*(D+E))
6- Consider F6 = A B C * / D – E F / +
B*C then A / (B*C) then A / (B*C) – D then E / F then A / (B*C) – D + E / F
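The stack evaluation of a postfix expression can be sketched as follows. The function name is illustrative, and the sketch assumes space-separated tokens with numeric operands and the four operators used in the examples above.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def eval_postfix(expression):
    """Evaluate a space-separated postfix expression using a stack."""
    stack = []
    for token in expression.split():
        if token in OPS:
            right = stack.pop()       # last pushed value is the right operand
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


print(eval_postfix("3 11 * 15 6 * +"))
```

The same function handles the F5 pattern A B C D E + * – / once numbers are substituted for the variables, for example "100 50 2 3 2 + * - /".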
Instruction Formats:
A computer will usually have a variety of instruction code formats, which are depicted in a rectangular box
symbolizing the bits of the instruction. These bits are divided into groups called fields.
In CO, we have seen an instruction format:
16 bits: 1 indirect bit, 3 bits opcode, 12 bits Address or type.
All operations were carried out in the AC. For example ADD X: AC ← AC + M[X]
This Format is known as a Single Accumulator Organization Instruction Format (Small Computer)
In a more general Register Organization, all registers have same importance, meaning much more hardware
must be added.
ADD R1, R2, R3 (R1 ← R2 + R3)
ADD R1, R2 (R1 ← R1 + R2)
ADD R1, X (R1 ← R1 + M[X])
ADD X, R1, R2 (M[X] ← R1 + R2)
If a CPU implements stack operations (in most cases it is then called a stack organized CPU), we have two
general instructions, PUSH X and POP.
PUSH X will push the memory word at address X onto the stack.
POP will retrieve the top value of the stack and store it in an assigned register.
The SP will be updated automatically.
Saying ADD in a stack organized CPU means: pop 2 values from the stack, add them, then push the answer back.
So there are generally three CPU organizations and instruction formats. Most CPUs combine the three features, like the Intel
8080, the father of the Pentium, which has 7 CPU registers, one of which is the AC; it also has a stack pointer
register and an assigned memory stack.
When writing Assembly instructions, we must take into account the number of registers or addresses an
instruction can have. Sometimes one or two or all of them can be available.
Evaluation Example:
Compute X = (A+B) * (C+D) using 0, 1, 2, 3 address instructions.
A, B, C, D are memory addresses, meaning A stands for M[A]: we want to add M[A] and M[B], then multiply
by M[C] + M[D] and store the result in M[X]. The point is that I don’t work with the direct addresses but with their contents.
Use the following Assembly Symbols: ADD, SUB, MUL, DIV, MOV, LOAD, and STORE
3 Address Instructions:
ADD R1, A, B     R1 ← M[A] + M[B]
ADD R2, C, D     R2 ← M[C] + M[D]
MUL X, R1, R2    M[X] ← R1 * R2
2 Address Instructions:
MOV R1, A
ADD R1, B        The point is that I cannot add 2 memory locations directly into a register
MOV R2, C
ADD R2, D
MUL R1, R2
MOV X, R1
For short computations, this one is the most efficient in terms of hardware and number of instructions. For
more complex computations, 2 Address Instructions is better.
0 Address Instructions (stack organized CPU):
PUSH A
PUSH B
ADD        M[A] and M[B] are popped and added; the result is pushed automatically onto the stack
PUSH C
PUSH D
ADD        M[C] and M[D] are popped and added; the result is pushed automatically onto the stack
MUL        The two addition results are in the stack; they are popped and multiplied, then the product is pushed
POP X      Get the final result
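The stack program above can be traced with a small interpreter. This is a sketch under assumptions: memory is modelled as a dictionary keyed by the symbolic addresses, the operand values are arbitrary, and only the PUSH, POP, ADD and MUL instructions used here are handled.

```python
def run_stack_program(program, memory):
    """Execute zero-address instructions on a stack; return the memory."""
    stack = []
    for line in program:
        op, *args = line.split()
        if op == "PUSH":
            stack.append(memory[args[0]])     # push M[X]
        elif op == "POP":
            memory[args[0]] = stack.pop()     # store the top of the stack
        elif op in ("ADD", "MUL"):
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b if op == "ADD" else a * b)
        else:
            raise ValueError(f"unknown instruction: {op}")
    return memory


program = ["PUSH A", "PUSH B", "ADD", "PUSH C", "PUSH D", "ADD", "MUL", "POP X"]
memory = run_stack_program(program, {"A": 2, "B": 3, "C": 4, "D": 5})
print(memory["X"])   # (2 + 3) * (4 + 5)
```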
Addressing Modes
The way the operands are chosen while executing an instruction depends on the addressing mode.
It specifies a rule for interpreting the address field of the instruction before the operand is actually referenced.
It gives program versatility to the user by providing such facilities as pointers to memory, counters for loops,
array indexing… The good assembly programmer will choose between different Addressing modes to achieve
best performance. Of course most of the times a specific assembly program task can be achieved in different
ways using different addressing modes.
In CO, we saw this Instruction Format:
PC holds the Address of the instruction (step 1). The decoding done in step 2 determines the operation to be
performed, the addressing mode of the instruction and the location of the operands.
The addressing mode may be specified with a distinct binary code.
The opcode, operation code, specifies the operation to perform (ADD, MOVE…)
The mode tells how to locate the operands needed for operation.
The Address will give an address if it is MRI, or a register number or the operand itself if immediate.
Mode Kinds:
Mode 0: zero address instruction, we are using a stack; the address field gives the address of the value being
pushed or popped. If the instruction is not a push or pop, this address field is ignored.
Mode 1: we are using an AC, with operations like those we saw in CO. This mode accepts direct and indirect
addressing, as well as register and I/O kinds, but always working through the AC.
Immediate Mode: the operand here is specified in the instruction itself, and is located in the address field.
Register Mode: The operand is in a register that is within the CPU. A particular register is selected from the
address field in the instruction.
Register Indirect Mode: the instruction specifies a register whose content gives the address of the operand in
memory.
Autoincrement / Autodecrement Mode: like register indirect, but the register value is incremented or
decremented directly after execution of the instruction, in order to access contiguous memory locations. This is useful
for tables of data: arrays.
Direct Access Mode (seen in CO): the address of the operand is equal to the address part of the instruction;
the difference with mode 1 is that the operand can go to any data register, not only the AC.
Indirect Access Mode (seen in CO): the address part of the instruction gives the memory location where the address of the operand is stored;
the difference with mode 1 is that the operand can go to any data register, not only the AC.
Relative Address Mode: in this mode, the PC is added to the address part of the instruction. Useful for processing a huge
amount of data, and also useful in subroutines.
Indexed Addressing Mode: we have an index register (XR) whose value is added to the address part of the
instruction to find the address of the operand.
Effective Address: after seeing all these modes, we define effective address to be the final address of the
operand.
Example for all addressing modes: (Numbers in Hex)
Suppose PC = 200, R1 = 400, XR = 100, M[200] = Load AC + mode, M[201] = B00, M[202] = next instruction
Immediate Mode: AC gets the word directly after the instruction; 201 is the effective address, B00 is the operand
Relative Address Mode: the effective address is B00 + 202 = D02, AC ← M[D02]
(202 and not 201, because the PC is incremented two times to point to the next instruction)
Indexed Addressing Mode: the effective address is B00 + XR = B00 + 100 = C00, AC ← M[C00]
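The effective addresses of this example can be computed with a short sketch. The function and mode names are invented for the illustration; the direct and register indirect cases are not worked in the example above and are derived here from the mode definitions, with R1 assumed to be the register selected by the instruction.

```python
def effective_address(mode, pc=0x200, address_field=0xB00, r1=0x400, xr=0x100):
    """Effective address for a two-word instruction located at `pc`."""
    if mode == "immediate":
        return pc + 1                      # the operand is the word after the opcode
    if mode == "direct":
        return address_field
    if mode == "register_indirect":
        return r1                          # the register holds the operand address
    if mode == "relative":
        return address_field + (pc + 2)    # PC already points to the next instruction
    if mode == "indexed":
        return address_field + xr
    raise ValueError(f"unknown mode: {mode}")


for mode in ("immediate", "direct", "register_indirect", "relative", "indexed"):
    print(f"{mode:18s} effective address = {effective_address(mode):X}")
```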
Number Representations:
1- Whole Numbers: Integers (Review from CO)
For an n-bit register:
Unsigned goes from 0 to 2^n – 1
Signed goes from –2^(n-1) to 2^(n-1) – 1
Examples:
Unsigned short int (16 bits): 0 to 2^16 – 1, that is 0 to 65535 (largest unsigned short)
Increase by 1, you go back to 0
Unsigned int (32 bits): 0 to 2^32 – 1, that is 0 to 4,294,967,295 (largest unsigned int)
In signed, we have a sign bit which is the most significant bit (MSB) in the register.
If 0, the stored number is positive. If 1, the stored number is negative and already in two’s complement.
By default, a number is signed.
Examples:
Short int (16 bits): –2^15 to 2^15 – 1, that is –32768 to 32767
The negative side has one more element because zero is counted with the positives.
Int (32 bits): from –2^31 to 2^31 – 1 (See Figure for examples and overflows)
Long (64 bits) (in C++ long long): from –2^63 to 2^63 – 1
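These ranges, the wrap-around of an unsigned register and the two’s complement reading of a bit pattern can be checked with a few helper functions. The names are illustrative only.

```python
def unsigned_range(n):
    """Smallest and largest values of an n-bit unsigned register."""
    return 0, 2**n - 1


def signed_range(n):
    """Smallest and largest values of an n-bit two's complement register."""
    return -(2**(n - 1)), 2**(n - 1) - 1


def wrap_unsigned(value, n):
    """Value actually stored in an n-bit register (extra bits are lost)."""
    return value % 2**n


def to_signed(pattern, n):
    """Interpret an n-bit pattern as a two's complement number."""
    return pattern - 2**n if pattern >> (n - 1) else pattern


print(unsigned_range(16), signed_range(16))
print(wrap_unsigned(65535 + 1, 16))   # largest unsigned short plus 1
print(to_signed(0b11110000, 8))       # MSB is 1: a negative number
```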
Program Control
Instructions are stored in successive memory locations. A program control instruction may change the PC, thus
we will have a break in the sequence of instruction execution. This is a capability for branching to different
program segments.
Description    Mnemonic     Comments
Branch         BR or BRA    BR and JMP are the same but are generally used with different addressing modes
Jump           JMP
Skip           SKP
Call           CALL         Used with subroutines (in CO: BSA)
Return         RET
Compare        CMP          Compares 2 values; the result affects the status bits
Test           TST          Tests a single bit (status register)
Status bits: there are four standard status bits, which may change after an arithmetic operation:
Carry bit (C)
Sign bit (S)
Zero bit (Z)
Overflow bit (V)
These form the status register; we may have more bits.
You can see above an 8-bit ALU with the 4-bit status register. The bits are set or cleared as a result of an ALU
operation.
C8 is the end carry of the operation.
F7 is the MSB of the result: 1 negative, 0 positive.
The check for zero output block above is an 8-input NOR gate: if all result bits are zero, its output is 1 (Z = 1).
The Overflow bit V is set to 1 if the last two carries (C7 and C8) are different, meaning there is a change of sign.
The status bits can be checked after an ALU operation to determine certain relationships that exist between
the values of A and B.
The table on next page lists the most common branch instructions. If the stated condition is true, program
control is transferred to the address specified by the instruction. If not, control continues with the instruction
that follows.
These conditional instructions can be associated also with the JMP, SKP, CALL, and RET of program control
instructions. Imagine how many instructions we may have, just only during comparison.
Numerical Example on 8 bits:
Let A = 11110000, B = 00010100
1- For Unsigned: A is 240, B is 20
A – B will give 1 11011100, C=1, S=1, V=0, Z=0
The compare instruction CMP updates the status bits as shown above.
The instructions that will cause a branch after this comparison are BHI, BHE, BNE.
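The status bits of this example can be reproduced with a small model of the 8-bit subtraction. It is a sketch that assumes A – B is computed as A plus the two’s complement of B, with C taken as the end carry C8 and V as C7 XOR C8; the function name is hypothetical.

```python
def subtract_with_flags(a, b, bits=8):
    """Compute a - b as a + (NOT b) + 1 and return (result, status bits)."""
    mask = (1 << bits) - 1
    low_mask = mask >> 1                       # all bits except the MSB
    b_inv = ~b & mask                          # one's complement of b
    total = a + b_inv + 1                      # adding 1 gives the two's complement
    result = total & mask
    carry_out = total >> bits                  # C8: end carry
    carry_into_msb = ((a & low_mask) + (b_inv & low_mask) + 1) >> (bits - 1)  # C7
    flags = {
        "C": carry_out,
        "S": result >> (bits - 1),             # MSB of the result
        "V": carry_into_msb ^ carry_out,       # last two carries differ
        "Z": int(result == 0),
    }
    return result, flags


result, flags = subtract_with_flags(0b11110000, 0b00010100)
print(f"{flags['C']} {result:08b}", flags)
```

With A = 11110000 and B = 00010100 this gives the result 11011100 with C=1, S=1, V=0, Z=0, matching the values listed above.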
It should be noted that a recursive subroutine works correctly only with a stack: each call pushes its return address, so you
just continue pushing return addresses.
Program Interrupt
It refers to the transfer of program control from a currently running program to another service program as a
result of an externally or internally generated request. Control returns to the original program after the service
program is executed.
The interrupt procedure is similar to a subroutine call except for some variations:
It is initiated by an internal or external signal (such as I/O) rather than by an instruction.
The address of the interrupt service program is determined by hardware, not by the address field of an instruction.
The interrupt procedure usually stores all the information that defines the state of the CPU (the registers),
rather than only the PC.
The state of the CPU when the interrupt happens is determined by all the registers, including the PC and certain
status bits that form the Program Status Word (PSW). The PSW is stored in a specific register and has several
functions:
It includes the status bits from the last ALU operation.
It specifies which interrupts are allowed to occur.
It specifies whether the CPU is running in user or supervisor mode.
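The three roles of the PSW can be pictured as fields packed into one register. The sketch below is in Python and uses a made-up layout: the bit positions, the single interrupt-enable bit and all names are assumptions for illustration, since real processors each define their own PSW format.

```python
# Hypothetical PSW layout (not that of any real processor).
C_BIT = 0x01           # carry
S_BIT = 0x02           # sign
Z_BIT = 0x04           # zero
V_BIT = 0x08           # overflow
IEN_BIT = 0x10         # interrupts enabled
SUPERVISOR_BIT = 0x20  # 1 = supervisor mode, 0 = user mode

ALU_MASK = C_BIT | S_BIT | Z_BIT | V_BIT


def with_alu_flags(psw, c, s, z, v):
    """Return a PSW whose four ALU status bits are replaced; other bits are kept."""
    psw &= ~ALU_MASK
    if c:
        psw |= C_BIT
    if s:
        psw |= S_BIT
    if z:
        psw |= Z_BIT
    if v:
        psw |= V_BIT
    return psw


def interrupts_allowed(psw):
    return bool(psw & IEN_BIT)


def in_supervisor_mode(psw):
    return bool(psw & SUPERVISOR_BIT)


psw = IEN_BIT                            # user mode, interrupts enabled
psw = with_alu_flags(psw, 1, 1, 0, 0)    # status bits from the compare example: C=1, S=1
print(hex(psw), interrupts_allowed(psw), in_supervisor_mode(psw))  # 0x13 True False
```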
In supervisor (or kernel) mode, the CPU can run privileged instructions related to the Operating System (OS);
these are not accessible in normal (user) mode. Switching between the two modes is done through an interrupt.
The major advantage of having the supervisor mode is that it protects the OS. Without this mechanism there
would be no "security" in an OS, as even the most obscure piece of code could simply access OS (kernel) memory
to view, delete or change it.
The CPU does not respond to an interrupt until the end of the instruction. Before fetching a new one, it checks
its PSW for any pending interrupt.
The CPU may disable interrupt processing while in supervisor mode, since it may be executing privileged
instructions. In normal user mode, interrupts are always enabled. (Recall IEN and R in CO)
More than one interrupt may be received simultaneously, or an interrupt may interrupt another interrupt.
Interrupts are assigned priorities, and the CPU responds according to them.
The last instruction in an interrupt service program is a return (as in a subroutine), which pops the old register
values and PC from the stack and restores the state the CPU had before the interrupt.
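The interrupt sequence described above (check at the end of an instruction, save the state, jump to a hardware-defined address, restore on return) can be sketched in Python. The class TinyCPU is a toy model written for these notes. It assumes that a larger level number means higher priority and that the service program runs with interrupts disabled, so nested interrupts are not modelled.

```python
class TinyCPU:
    """Toy model of interrupt handling; names, fields and policy are illustrative."""

    def __init__(self, vectors):
        self.pc = 0
        self.registers = [0, 0, 0, 0]
        self.psw = {"flags": 0, "ien": True, "supervisor": False}
        self.stack = []
        self.pending = set()    # priority levels of waiting requests
        self.vectors = vectors  # level -> service address, fixed by "hardware"

    def request(self, level):
        """An internal or external signal raises an interrupt request."""
        self.pending.add(level)

    def end_of_instruction(self):
        """Called after each instruction; returns True if an interrupt was taken."""
        if not (self.psw["ien"] and self.pending):
            return False
        level = max(self.pending)  # assumed: larger number = higher priority
        self.pending.discard(level)
        # Save the whole CPU state, not only the PC.
        self.stack.append((self.pc, dict(self.psw), list(self.registers)))
        self.psw["ien"] = False         # assumed: no nesting in this sketch
        self.psw["supervisor"] = True
        self.pc = self.vectors[level]   # address comes from hardware, not an instruction
        return True

    def return_from_interrupt(self):
        """Pop the saved state to restore the interrupted program."""
        self.pc, self.psw, self.registers = self.stack.pop()


cpu = TinyCPU({1: 0x100, 2: 0x200})
cpu.pc = 42
cpu.request(1)
cpu.request(2)
cpu.end_of_instruction()
print(hex(cpu.pc))   # 0x200: the level 2 request is served first
cpu.return_from_interrupt()
print(cpu.pc)        # 42: the original program resumes
```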
Types of Interrupts:
1. External: it comes from I/O, data transfer or networking. Even a power failure is detected a few
milliseconds in advance, so that the CPU has some time to save critical data, or to stop any save or update
in progress, to avoid memory corruption or hard disk damage.
2. Internal: it comes from the illegal or erroneous use of an instruction or data. Also called traps, internal
interrupts are caused by register overflow, divide by zero, stack overflow… The trap determines the
corrective measure to be taken. Note that this is not about a student's small exercise (as in C++), but
about what happens while a large program is running (Office, AutoCAD, a game…).
3. Software: it is initiated by an instruction, such as one that switches between user and supervisor mode.
Many assemblers offer a special instruction to interrupt the CPU and perform I/O (trap #15 in 68K).
Multi-core Microprocessor
A multi-core processor is a single computing component with two or more independent actual CPUs (called
"cores"). The multiple cores can run multiple instructions at the same time, increasing overall efficiency.
Processors were originally developed with only one core. Multi-core processors were developed in the early
2000s by Intel, AMD and ARM. Multicore processors may have two cores (dual-core CPUs, for example AMD
Phenom II X2 and Intel Core Duo), four cores (quad-core CPUs, for example AMD Phenom II X4,
Intel's i5 and i7 processors), six cores (hexa-core CPUs, for example AMD Phenom II X6 and Intel Core i7
Extreme Edition 980X), eight cores (octa-core CPUs, for example Intel Xeon E7-2820 and AMD FX-8350), ten
cores (for example, Intel Xeon E7-2850), or more.
A multi-core processor implements multiprocessing in a single physical package. Multi-core processors are
widely used across many application domains including general-purpose, embedded, network, digital signal
processing (DSP), and graphics.
The improvement in performance gained by the use of a multi-core processor depends very much on the
software algorithms used and their implementation. In particular, possible gains are limited by the fraction of
the software that can be run in parallel. This is called parallel computing, and the high-level language used
must support it. An example is multithreading, available in modern C++ (since C++11) and in Java.
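The limit set by the parallel fraction is usually expressed with Amdahl's law: if a fraction p of the work can be run in parallel on n cores, the speedup is 1 / ((1 - p) + p / n). The notes above do not name this formula; it is brought in here as the standard way to quantify the statement. The Python function below is a minimal sketch with made-up names.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup when `parallel_fraction` of the work is spread over `cores` cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# With half of the program parallel, extra cores help less and less.
for n in (1, 2, 4, 8):
    print(n, round(amdahl_speedup(0.5, n), 2))
```

With p = 0.5 the speedup can never reach 2, however many cores are added, because the serial half always takes the same time.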
i.
If there is a single semiconductor chip maker that the average consumer is aware of, it is likely Intel. It is
the premier chip maker for personal computers: companies such as Apple, Dell, HP, Samsung, and Sony have
product lines that depend on the processors that Intel produces. Intel's processors generally offer the best
performance for all-around usage. This has been especially the case over the last several years with the
introduction and evolution of Intel's Core series product line. Currently, Intel's flagship consumer product line
consists of mobile and desktop-grade Core i3, Core i5 and Core i7 processors, now in their second generation
(dubbed "Sandy Bridge"). The third and latest generation of these processors (dubbed "Ivy Bridge") began to
roll out in late April 2012.
ii.
Though not the behemoth that Intel is in the personal computing space, AMD (Advanced Micro Devices) is a
decisive runner-up, and arguably the only true competitor Intel has in this domain. After spending much of
the early to mid-2000s as the performance and value leader with its Athlon 64 line of personal computing
processors, AMD has been unable to repeat this success in more recent years and has shifted its focus
towards both enthusiast and budget-oriented system configurations. As a result, AMD is considered to be
a viable alternative to Intel. Its current offerings are led by the Phenom series processors and the
Fusion APU processors. The Fusion APU (AMD A-Series) is a relatively new platform (as of 2011 and ongoing)
that attempts to merge high-end graphical capabilities onto the same chip as the processor. This means that
if your work or play requires a powerful graphics card, AMD can potentially offer a cost-effective
alternative.
iii.
The increased need for mobile productivity and entertainment has given rise to a relatively new class of
devices: smartphones and tablets. ARM (Advanced RISC Machines) is well known for its mobile,
power-efficient processor designs. In recent years it has seen its technology used in the products of many
prominent electronics companies. Apple's A4/A5/A5X, Nvidia's Tegra, Samsung's Exynos and Texas
Instruments' OMAP products all integrate ARM processors into what is known as a system-on-a-chip (SoC).
SoCs merge many of the essential components of a computer (such as the CPU, RAM, ROM etc.) on a single
chip, which allows devices that use them to be lightweight and compact. These SoCs have gone on to be
implemented in blockbuster products such as Apple's iPhone and iPad or Samsung's series of Galaxy phones.
ARM's presence as the CPU and architecture of choice on many mobile devices cannot be overstated, as
estimates put their numbers in the billions.