CCE 131
Lecture 1
Introduction-1
Servers
Servers are the modern form of what were once much larger computers and are usually accessed only via a network. They are oriented to carrying sizable workloads, which may consist of either single complex applications (usually a scientific or engineering application) or many small jobs, such as would occur in building a large web server.
Supercomputers
Supercomputers consist of tens of thousands of processors and many terabytes of memory, and cost tens to hundreds of millions of dollars. They are usually used for high-end scientific and engineering calculations, such as weather forecasting, oil exploration, protein structure determination, and other large-scale problems.
Embedded computers
Embedded computers are the largest class of computers and span the widest range of applications and performance. They include the microprocessors found in your car, the computers in a television set, and the networks of processors that control a modern airplane or cargo ship. Embedded computing systems are designed to run one application or one set of related applications that are normally integrated with the hardware and delivered as a single system.
Application software
A typical application, such as a word processor or a large database system, may
consist of millions of lines of code and rely on sophisticated software libraries that
implement complex functions in support of the application.
Systems software
Systems software sits between the hardware and the application software.
There are many types of systems software, but two types of systems software are central to every computer system today: an
operating system and a compiler.
An operating system interfaces between a user’s program and the hardware and provides a variety of services and supervisory
functions. Among the most important functions are:
❑ Handling basic input and output operations
❑ Allocating storage and memory
❑ Providing for protected sharing of the computer among multiple
applications using it simultaneously
Compilers
Compilers perform another vital function: the translation of a program written in a high-level language, such as C, C++, Java, or Visual Basic, into instructions that the hardware can execute. Given the sophistication of modern programming languages and the simplicity of the instructions executed by the hardware, the translation from a high-level language program to hardware instructions is complex.
The embedded system build process is usually done on the host PC using cross-compilation tools, because the target hardware does not have enough resources to run the tools that generate its binary image. The process of compiling code on one system (the host) so that the resulting binary runs on another system (the target) is known as cross-compilation.
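To make the idea concrete, here is a minimal sketch: the same C source is built on the host PC twice, once with the native compiler and once with a cross compiler (the ARM toolchain prefix below is an assumption; the actual command depends on the target board).

/* hello.c - the same source, compiled on the host PC for two machines.
   Native build (binary runs on the host):              gcc -o hello hello.c
   Cross build (binary runs on the embedded ARM target,
   assuming an ARM GNU toolchain is installed):         arm-none-eabi-gcc -o hello.elf hello.c */
int main(void) {
    return 0;   /* trivial program; the point is which machine can run the binary */
}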
The five classic components
The five classic components of a computer are input, output, memory, datapath, and control, with the last two sometimes
combined and called the processor. This organization is independent of hardware technology: you can place every piece of
every computer, past and present, into one of these five categories.
Instruction set architecture (also called architecture): an abstract interface between the hardware and the lowest-level software that encompasses all the information necessary to write a machine language program that will run correctly, including instructions, registers, memory access, I/O, and so on.
Source: https://geteducationskills.com/computer-architecture/
Technologies for Building Processors and Memory
The integrated circuit (IC) combined dozens to hundreds of transistors into a single chip.
If you were running a program on two different desktop computers, you’d say that the faster one is the desktop computer
that gets the job done first.
As an individual computer user, you are interested in reducing response time (the time between the start and completion of a task), also referred to as execution time.
Datacenter managers often care about increasing throughput or bandwidth (the total amount of work done in a given
time).
In most cases, we will need different performance metrics as well as different sets of applications to benchmark personal
mobile devices, which are more focused on response time, versus servers, which are more focused on throughput.
A task
To maximize performance, we want to minimize response time or execution time for some task. Thus, we can relate performance and execution time for a computer X:
Performance_X = 1 / Execution time_X
This means that for two computers X and Y, if the performance of X is greater than the performance of Y, we have
Performance_X > Performance_Y
1 / Execution time_X > 1 / Execution time_Y
Execution time_Y > Execution time_X
That is, the execution time on Y is longer than that on X, if X is faster than Y.
We will use the phrase "X is n times faster than Y", or equivalently "X is n times as fast as Y", to mean
Performance_X / Performance_Y = n
If X is n times as fast as Y, then the execution time on Y is n times as long as it is on X:
Performance_X / Performance_Y = Execution time_Y / Execution time_X = n
Example
If computer A runs a program in 10 seconds and computer B runs the same program in 15 seconds, how much faster is A
than B?
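Working this out with the relation above:
Performance_A / Performance_B = Execution time_B / Execution time_A = 15 / 10 = 1.5
So A is 1.5 times as fast as B for this program.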
System CPU time: the CPU time spent in the operating system performing tasks on behalf of the program.
A simple formula relates the most basic metrics (clock cycles and clock cycle time) to CPU time:
CPU execution time for a program = CPU clock cycles for a program x Clock cycle time
Alternatively, because clock rate and clock cycle time are inverses,
CPU execution time for a program = CPU clock cycles for a program / Clock rate
This formula makes it clear that the hardware designer can improve performance by reducing the number of clock cycles required for a program or the length of the clock cycle.
Example
Our favorite program runs in 10 seconds on computer A, which has a 2 GHz clock. We are trying to help a computer designer
build a computer, B, which will run this program in 6 seconds. The designer has determined that a substantial increase in the
clock rate is possible, but this increase will affect the rest of the CPU design, causing computer B to require 1.2 times as
many clock cycles as computer A for this program. What clock rate should we tell the designer to target?
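Working from the CPU time formula above:
CPU clock cycles_A = CPU time_A x Clock rate_A = 10 seconds x 2 x 10^9 cycles/second = 20 x 10^9 cycles
CPU clock cycles_B = 1.2 x 20 x 10^9 = 24 x 10^9 cycles
Clock rate_B = CPU clock cycles_B / CPU time_B = 24 x 10^9 cycles / 6 seconds = 4 x 10^9 cycles/second
So computer B must run at 4 GHz, twice the clock rate of computer A, to finish the program in 6 seconds.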
One way to think about execution time is that it equals the number of instructions executed multiplied by the average time per instruction. Therefore, the number of clock cycles required for a program can be written as
CPU clock cycles = Instructions for a program x Average clock cycles per instruction
The term clock cycles per instruction, which is the average number of clock cycles each instruction takes to
execute, is often abbreviated as CPI. Since different instructions may take different amounts of time depending on
what they do, CPI is an average of all the instructions executed in the program. CPI provides one way of comparing
two different implementations of the identical instruction set architecture, since the number of instructions executed
for a program will, of course, be the same.
Example
Suppose we have two implementations of the same instruction set architecture. Computer A has a clock cycle time of 250
ps and a CPI of 2.0 for some program, and computer B has a clock cycle time of 500 ps and a CPI of 1.2 for the same
program. Which computer is faster for this program and by how much?
We know that each computer executes the same number of instructions for the program; let’s call this number I. First, find
the number of processor clock cycles for each computer:
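Substituting the given values:
CPU clock cycles_A = I x 2.0 and CPU clock cycles_B = I x 1.2
CPU time_A = CPU clock cycles_A x Clock cycle time_A = I x 2.0 x 250 ps = 500 x I ps
CPU time_B = I x 1.2 x 500 ps = 600 x I ps
Computer A is faster for this program; it is (600 x I ps) / (500 x I ps) = 1.2 times as fast as computer B.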
The Classic CPU Performance Equation
We can now write this basic performance equation in terms of instruction count (the number of instructions executed by the program), CPI, and clock cycle time:
CPU time = Instruction count x CPI x Clock cycle time
or, since the clock rate is the inverse of clock cycle time,
CPU time = Instruction count x CPI / Clock rate
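As a small sketch of how the equation is applied (the instruction count, CPI, and clock rate below are assumed example values, not taken from the lecture):

#include <stdio.h>

int main(void) {
    double instruction_count = 10e9;   /* 10 billion instructions (assumed) */
    double cpi = 2.0;                  /* average clock cycles per instruction (assumed) */
    double clock_rate = 2e9;           /* 2 GHz, so clock cycle time = 1 / clock_rate */

    /* CPU time = Instruction count x CPI x Clock cycle time
                = Instruction count x CPI / Clock rate        */
    double cpu_time = instruction_count * cpi / clock_rate;
    printf("CPU time = %.1f seconds\n", cpu_time);   /* prints 10.0 */
    return 0;
}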
Which code sequence executes the most instructions? Which will be faster? What is the CPI for each sequence?
Both clock rate and power increased rapidly for decades and then flattened off recently. The reason they grew together is
that they are correlated, and the reason for their recent slowing is that we have run into the practical power limit for
cooling commodity microprocessors.
For CMOS, the primary source of energy consumption is so-called dynamic energy, that is, the energy consumed when transistors switch states from 0 to 1 and vice versa. The dynamic energy depends on the capacitive loading of each transistor and the voltage applied:
Energy ∝ Capacitive load x Voltage^2
This is the energy of a full pulse (0 -> 1 -> 0 or 1 -> 0 -> 1); the energy of a single transition is half of that. The power required per transistor is the energy of a transition multiplied by the frequency of transitions:
Power ∝ 1/2 x Capacitive load x Voltage^2 x Frequency switched
Frequency switched is a function of the clock rate. The capacitive load per transistor is a function of both the
number of transistors connected to an output (called the fanout) and the technology, which determines the
capacitance of both wires and transistors.
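As a quick numerical illustration of the quadratic dependence on voltage (the 15% figure is only an example value): if a new process lowers the supply voltage by 15% while the capacitive load per transistor stays the same, the energy of each transition falls to about 0.85^2 ≈ 0.72 of its old value, a reduction of roughly 28%.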
Multiprocessors
To reduce confusion between the words processor and microprocessor, companies refer to processors as “cores,” and such
microprocessors are generically called multicore microprocessors. Hence, a “quadcore” microprocessor is a chip that
contains four processors or four cores.
Today, for programmers to get significant improvement in response time, they need to rewrite their programs to take
advantage of multiple processors. Moreover, to get the historic benefit of running faster on new microprocessors,
programmers will have to continue to improve the performance of their code as the number of cores increases.
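A minimal sketch of what such a rewrite can look like (example code, assuming a POSIX system with pthreads; compile with gcc -pthread): two threads each sum half of an array, so the work can proceed on two cores at once.

#include <pthread.h>
#include <stdio.h>

#define N 1000000
static double data[N];

struct part { int lo, hi; double sum; };   /* the slice of the array each thread works on */

static void *partial_sum(void *arg) {
    struct part *p = arg;
    p->sum = 0.0;
    for (int i = p->lo; i < p->hi; i++)
        p->sum += data[i];
    return NULL;
}

int main(void) {
    for (int i = 0; i < N; i++) data[i] = 1.0;

    struct part a = { 0, N / 2, 0.0 };     /* first half  */
    struct part b = { N / 2, N, 0.0 };     /* second half */
    pthread_t ta, tb;

    pthread_create(&ta, NULL, partial_sum, &a);   /* may run on one core */
    pthread_create(&tb, NULL, partial_sum, &b);   /* may run on another core */
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);

    printf("total = %.0f\n", a.sum + b.sum);
    return 0;
}

The sequential version would be a single loop over all N elements; the parallel version divides the loop range between the threads, which is the kind of restructuring the paragraph above refers to.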
Amdahl’s Law
A rule stating that the performance enhancement possible with a given improvement is limited by the amount that the
improved feature is used.
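Expressed as a formula, using the terms of the definition above:
Execution time after improvement = (Execution time affected by improvement / Amount of improvement) + Execution time unaffected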
A simple design problem illustrates it well. Suppose a program runs in 100 seconds on a computer, with multiply
operations responsible for 80 seconds of this time. How much do I have to improve the speed of multiplication if I want
my program to run five times faster?
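Working through it with Amdahl's Law: running five times faster means finishing in 100 / 5 = 20 seconds. The 20 seconds not spent in multiplication cannot be reduced by a faster multiplier, so we would need
20 seconds = 80 seconds / n + 20 seconds, which requires 80 / n = 0.
No finite improvement n in the speed of multiplication can make this program run five times faster, because the unimproved 20 seconds already equal the entire target time.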