Tute Answers
a) Computer architecture refers to those attributes of a system visible to a programmer or, put another way,
those attributes that have a direct impact on the logical execution of a program. Computer organization
refers to the operational units and their interconnections that realize the architectural specifications.
Examples of architectural attributes include the instruction set, the number of bits used to represent
various data types (e.g., numbers, characters), I/O mechanisms, and techniques for addressing memory.
Organizational attributes include those hardware details transparent to the programmer, such as control
signals; interfaces between the computer and peripherals; and the memory technology used.
b) Computer structure refers to the way in which the components of a computer are interrelated. Computer
function refers to the operation of each individual component as part of the structure.
2) X
3)
4) To read a value from memory, the CPU puts the address of the value it wants into the MAR. The CPU then
asserts the Read control line to memory and places the address on the address bus. Memory places the
contents of the memory location passed on the data bus. This data is then transferred to the MBR. To write a
value to memory, the CPU puts the address of the value it wants to write into the MAR. The CPU also places
the data it wants to write into the MBR. The CPU then asserts the Write control line to memory and places the
address on the address bus and the data on the data bus. Memory transfers the data on the data bus into the
corresponding memory location.
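As an informal illustration, the read and write sequences described above can be sketched in Python. The class and method names here are invented for the sketch; a real CPU carries out these steps in hardware, not software:

```python
# Toy model of the MAR/MBR transfer steps described in the answer above.
class Memory:
    def __init__(self, size):
        self.cells = [0] * size  # one cell per address

class CPU:
    def __init__(self, memory):
        self.memory = memory
        self.mar = 0  # Memory Address Register
        self.mbr = 0  # Memory Buffer Register

    def read(self, address):
        self.mar = address                      # 1. address into MAR
        # 2. assert Read; MAR contents driven onto the address bus
        data = self.memory.cells[self.mar]      # 3. memory drives the data bus
        self.mbr = data                         # 4. data bus latched into MBR
        return self.mbr

    def write(self, address, value):
        self.mar = address                      # 1. address into MAR
        self.mbr = value                        # 2. data into MBR
        # 3. assert Write; address and data buses driven
        self.memory.cells[self.mar] = self.mbr  # 4. memory latches the data

cpu = CPU(Memory(256))
cpu.write(0x10, 42)
print(cpu.read(0x10))  # 42
```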
5)
This program will store the absolute value of content at memory location 0FA into memory location 0FB.
6) The whole point of the clock is to define event times on the bus; therefore, we wish for a bus arbitration
operation to be made each clock cycle. This requires that the priority signal propagate the length of the daisy
chain (Figure 3.26) in one clock period. Thus, the maximum number of masters is determined by dividing the
amount of time it takes a bus master to pass through the bus priority by the clock period.
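To make the division concrete, here is the calculation with assumed figures (the 10 MHz clock and the 2 ns per-master propagation delay are illustrative values, not given in the question):

```python
# Maximum daisy-chain length: the priority signal must traverse every
# master within one clock period.
clock_period_ns = 100      # assumed 10 MHz bus clock
delay_per_master_ns = 2    # assumed propagation delay through one master
max_masters = clock_period_ns // delay_per_master_ns
print(max_masters)  # 50
```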
7) As noted in the answer to Problem 2.7, even though the Intel machine may have a faster clock speed (2.4 GHz
vs. 1.2 GHz), that does not necessarily mean the system will perform faster. Different systems are not
comparable on clock speed. Other factors such as the system components (memory, buses, architecture) and
the instruction sets must also be taken into account. A more accurate measure is to run both systems on a
benchmark. Benchmark programs exist for certain tasks, such as running office applications, performing
floating-point operations, graphics operations, and so on. The systems can be compared to each other on how
long they take to complete these tasks. According to Apple Computer, the G4 is comparable or better than a
higher-clock speed Pentium on many benchmarks.
8)
a) Assuming the same instruction mix means that the additional instructions for each task should be allocated
proportionally among the instruction types. So we have the following table:
c) The speedup factor is the ratio of the execution times. Using Equation 2.2, we calculate the execution time
as T = IC / (MIPS × 10^6).
For the single-processor case, T1 = (2 × 10^6) / (178 × 10^6) = 11 ms. With 8 processors, each processor executes 1/8 of
the 2 million instructions plus the 25,000 overhead instructions.
For this case, the execution time for each of the 8 processors is
T8 = ((2 × 10^6)/8 + 0.025 × 10^6) / (152.8 × 10^6) = 1.8 ms
Therefore we have
Speedup = (time to execute program on a single processor) / (time to execute program on N parallel processors)
= 11 / 1.8 = 6.11
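The arithmetic can be checked with a short script; the times used for the speedup ratio are the rounded values quoted in the answer:

```python
# Execution time T = IC / (MIPS * 10^6), from Equation 2.2.
T1_ms = 2e6 / 178e6 * 1000     # single processor: about 11 ms
speedup = 11 / 1.8             # ratio of the rounded execution times
print(round(T1_ms), round(speedup, 2))  # 11 6.11
```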
d) The answer to this question depends on how we interpret Amdahl's law. There are two inefficiencies in
the parallel system. First, there are additional instructions added to coordinate between threads. Second,
there is contention for memory access. The way that the problem is stated, none of the code is inherently
serial. All of it is parallelizable, but with scheduling overhead. One could argue that the memory access
conflict means that to some extent memory reference instructions are not parallelizable. But based on the
information given, it is not clear how to quantify this effect in Amdahl's equation. If we assume that the
fraction of code that is parallelizable is f = 1, then Amdahl's law reduces to Speedup = N = 8 for this case.
Thus, the actual speedup is only about 75% of the theoretical speedup.
9)
a)
b) Although machine B has a higher MIPS than machine A, it requires a longer CPU time to execute the same
set of benchmark programs.
10)
a) Speedup = (time to access in main memory) / (time to access in cache) = T2/T1.
b) The average access time can be computed as T = H x T1 + (1 – H) x T2
Using Equation (2.8):
Speedup = (Execution time before enhancement) / (Execution time after enhancement) = T2 / T
= T2 / (H × T1 + (1 − H) × T2) = 1 / ((1 − H) + H × (T1/T2))
c) T = H × T1 + (1 − H) × (T1 + T2) = T1 + (1 − H) × T2
Speedup = (Execution time before enhancement) / (Execution time after enhancement) = T2 / T
= T2 / (T1 + (1 − H) × T2) = 1 / ((1 − H) + T1/T2)
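The two formulas can be compared numerically; the hit ratio and access times below are assumed example values, not figures from the question:

```python
def speedup_overlapped(H, T1, T2):
    # part (b): cache and main memory accesses start in parallel
    return T2 / (H * T1 + (1 - H) * T2)

def speedup_sequential(H, T1, T2):
    # part (c): main memory is accessed only after the cache misses
    return T2 / (T1 + (1 - H) * T2)

# assumed: 95% hit ratio, 10 ns cache, 100 ns main memory
print(round(speedup_overlapped(0.95, 10, 100), 2))  # 6.9
print(round(speedup_sequential(0.95, 10, 100), 2))  # 6.67
```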
11)
a) 2^24 = 16 MBytes
b) (i) If the local address bus is 32 bits, the whole address can be transferred at once and decoded in memory.
However, because the data bus is only 16 bits, it will require 2 cycles to fetch a 32-bit instruction or
operand.
(ii) The 16 bits of the address placed on the address bus can't access the whole memory. Thus, a more
complex memory interface control is needed to latch the first part of the address and then the second part
(because the microprocessor must send the address in two steps). For a 32-bit address, one may assume the first half will
decode to access a "row" in memory, while the second half is sent later to access a "column" in memory. In
addition to the two-step address operation, the microprocessor will need 2 cycles to fetch the 32-bit
instruction/operand.
c) The program counter must be at least 24 bits. Typically, a 32-bit microprocessor will have a 32-bit external
address bus and a 32-bit program counter, unless on-chip segment registers are used that may work with
a smaller program counter. If the instruction register is to contain the whole instruction, it will have to be
32-bits long; if it will contain only the op code (called the op code register) then it will have to be 8 bits
long.
12) In cases (a) and (b), the microprocessor will be able to access 2^16 = 64K bytes; the only difference is that with
an 8-bit memory each access will transfer a byte, while with a 16-bit memory an access may transfer a byte or
a 16-bit word.
For case (c), separate input and output instructions are needed, whose execution will generate separate "I/O
signals" (different from the "memory signals" generated with the execution of memory-type instructions); at a
minimum, one additional output pin will be required to carry this new signal.
For case (d), it can support 2^8 = 256 input and 2^8 = 256 output byte ports and the same number of input and
output 16-bit ports; in either case, the distinction between an input and an output port is defined by the
different signal generated by the executed input or output instruction.
13)
a) During a single bus cycle, the 8-bit microprocessor transfers one byte while the 16-bit microprocessor
transfers two bytes. The 16-bit microprocessor has twice the data transfer rate.
b) Suppose we do 100 transfers of operands and instructions, of which 50 are one byte long and 50 are two
bytes long. The 8-bit microprocessor takes 50 + (2 x 50) = 150 bus cycles for the transfer. The 16-bit
microprocessor requires 50 + 50 = 100 bus cycles. Thus, the data transfer rates differ by a factor of 1.5.
14)
a) With a clocking frequency of 10 MHz, the clock period is 10^−7 s = 100 ns. The length of the memory read
cycle is 300 ns.
b) The Read signal begins to fall at 75 ns from the beginning of the third clock cycle (middle of the second half
of T3). Thus, memory must place the data on the bus no later than 55 ns from the beginning of T3.
15)
a) The clock period is 125 ns. Therefore, two clock cycles need to be inserted.
b) From Figure 3.19, the Read signal begins to rise early in T2. To insert two clock cycles, the Ready line can
be put in low at the beginning of T2 and kept low for 250 ns.
16)
a) The clock period is 125 ns. One bus read cycle takes 500 ns = 0.5 µs. If the bus cycles repeat one after
another, we can achieve a data transfer rate of 2 MB/s.
b) The wait state extends the bus read cycle by 125 ns, for a total duration of 0.625 µs. The corresponding
data transfer rate is 1/0.625 = 1.6 MB/s.
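The rates follow directly from the cycle lengths, assuming one byte is transferred per bus read cycle (which is what the 2 MB/s figure implies):

```python
cycle_ns = 4 * 125               # four 125 ns clock periods per bus read cycle
wait_cycle_ns = cycle_ns + 125   # one wait state adds a fifth period
rate = 1000 / cycle_ns           # 1 byte per cycle, in MB/s
rate_wait = 1000 / wait_cycle_ns
print(rate, rate_wait)  # 2.0 1.6
```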
17)
a) The refresh period from row to row must be no greater than 4000/256 = 15.625 µs.
b) An 8-bit counter is needed to count 256 rows (2^8 = 256).
18)
19) In 1ms, the time devoted to refresh is 64 x 150 ns = 9600 ns.
The fraction of time devoted to memory refresh is (9.6 × 10^−6 s)/(10^−3 s) = 0.0096, which is approximately 1%.
20) The cache is divided into 16 sets of 4 lines each. Therefore, 4 bits are needed to identify the set number.
Main memory consists of 4K = 2^12 blocks. Therefore, the set plus tag lengths must be 12 bits, and therefore the
tag length is 8 bits. Each block contains 128 words. Therefore, 7 bits are needed to specify the word.
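The bit-field arithmetic above can be restated in a few lines:

```python
import math

sets = 16                      # 16 sets of 4 lines
blocks_in_memory = 4 * 1024    # 4K = 2^12 blocks in main memory
words_per_block = 128

set_bits = int(math.log2(sets))                         # 4
word_bits = int(math.log2(words_per_block))             # 7
tag_bits = int(math.log2(blocks_in_memory)) - set_bits  # 12 - 4 = 8
print(tag_bits, set_bits, word_bits)  # 8 4 7
```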
21)
a) 8 leftmost bits = tag; 5 middle bits = line number; 3 rightmost bits = byte number
b) slot 3; slot 6; slot 3; slot 21
c) Bytes with addresses 0001 1010 0001 1000 through 0001 1010 0001 1111 are stored in the cache
d) 256 bytes
e) Because two items with two different memory addresses can be stored in the same place in the cache. The
tag is used to distinguish between them.
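A small helper makes the field split from part (a) explicit; the sample address is the first one from part (c):

```python
def split_address(addr):
    # 16-bit address: 8-bit tag | 5-bit line number | 3-bit byte offset
    byte = addr & 0x7
    line = (addr >> 3) & 0x1F
    tag = (addr >> 8) & 0xFF
    return tag, line, byte

# 0001 1010 0001 1000 -> tag 0x1A, line (slot) 3, byte 0
print(split_address(0b0001101000011000))  # (26, 3, 0)
```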
22) Block size = 4 words = 2 doublewords; associativity K = 2; cache size = 4096 words; C = 1024 block frames;
number of sets S = C/K = 512; main memory = 64K × 32 bits = 256 KBytes = 2^18 bytes; address = 18 bits.
23)
24) 24/25/26
27)
i) 1. RAID is a set of physical disk drives viewed by the operating system as a single logical drive.
2. Data are distributed across the physical drives of an array.
3. Redundant disk capacity is used to store parity information, which guarantees data recoverability in
case of a disk failure.
ii) 0: Non-redundant
1: Mirrored; every disk has a mirror disk containing the same data.
2: Redundant via Hamming code; an error-correcting code is calculated across corresponding bits on
each data disk, and the bits of the code are stored in the corresponding bit positions on multiple parity
disks.
3: Bit-interleaved parity; similar to level 2 but instead of an error-correcting code, a simple parity bit is
computed for the set of individual bits in the same position on all of the data disks.
4: Block-interleaved parity; a bit-by-bit parity strip is calculated across corresponding strips on each
data disk, and the parity bits are stored in the corresponding strip on the parity disk.
5: Block-interleaved distributed parity; similar to level 4 but distributes the parity strips across all disks.
6: Block-interleaved dual distributed parity; two different parity calculations are carried out and
stored in separate blocks on different disks.
iii) The disk is divided into strips; these strips may be physical blocks, sectors, or some other unit. The
strips are mapped round robin to consecutive array members. A set of logically consecutive strips that
maps exactly one strip to each array member is referred to as a stripe.
iv) For RAID level 1, redundancy is achieved by having two identical copies of all data. For higher levels,
redundancy is achieved by the use of error-correcting codes.
v) In a parallel access array, all member disks participate in the execution of every I/O request. Typically,
the spindles of the individual drives are synchronized so that each disk head is in the same position on
each disk at any given time. In an independent access array, each member disk operates
independently, so that separate I/O requests can be satisfied in parallel.
vi) For the constant angular velocity (CAV) system, the number of bits per track is constant. At a constant
linear velocity (CLV), the disk rotates more slowly for accesses near the outer edge than for those near
the center. Thus, the capacity of a track and the rotational delay both increase for positions nearer the
outer edge of the disk.
vii) 1. Bits are packed more closely on a DVD. The spacing between loops of a spiral on a CD is 1.6 µm and
the minimum distance between pits along the spiral is 0.834 µm. The DVD uses a laser with shorter
wavelength and achieves a loop spacing of 0.74 µm and a minimum distance between pits of 0.4 µm.
The result of these two improvements is about a seven-fold increase in capacity, to about 4.7 GB.
2. The DVD employs a second layer of pits and lands on top of the first layer. A dual-layer DVD has a
semireflective layer on top of the reflective layer, and by adjusting focus, the lasers in DVD drives can
read each layer separately. This technique almost doubles the capacity of the disk, to about 8.5 GB.
The lower reflectivity of the second layer limits its storage capacity so that a full doubling is not
achieved.
3. The DVD-ROM can be two sided whereas data is recorded on only one side of a CD. This brings total
capacity up to 17 GB.
viii) The typical recording technique used in serial tapes is referred to as serpentine recording. In this
technique, when data are being recorded, the first set of bits is recorded along the whole length of the
tape. When the end of the tape is reached, the heads are repositioned to record a new track, and the
tape is again recorded on its whole length, this time in the opposite direction. That process continues,
back and forth, until the tape is full.
28)
i) t_A = t_s + 1/(2r) + n/(rN), where t_s is the average seek time, r the rotation speed in revolutions per
second, n the number of bytes to transfer, and N the number of bytes per track.
29)
a) Capacity = 8 x 512 x 64 x 1 KB = 256 MB
30)
a) The time consists of the following components: sector read time; track access time; rotational delay; and
sector write time. The time to read or write 1 sector is calculated as follows: a single revolution to read or
write an entire track takes 60,000/3,600 = 16.7 ms. Time to read or write a single sector = 16.7/32 = 0.52 ms.
Track access time = 2 ms, because the head moves between adjacent tracks. The rotational delay is the time
required for the head to line up with sector 1 again. This is 16.7 × (31/32) = 16.2 ms. The head movement time
of 2 ms overlaps with the 16.2 ms of rotational delay, and so only the rotational delay time is counted. Total
transfer time = 0.52 + 16.2 + 0.52 = 17.24 ms.
b) The time to read or write an entire track is simply the time for a single revolution, which is 16.7 ms.
Between the read and the write there is a head movement time of 2 ms to move from track 8 to track 9.
During this time the head moves past 3 sectors and most of a fourth sector. However, because the entire
track is buffered, sectors can be written back in a different sequence from the read sequence. Thus, the
write can start with sector 5 of track 9. This sector is reached 0.52 x 4 = 2.08 ms after the completion of the
read operation. Thus the total transfer time = 16.7 + 2.08 + 16.7 = 35.48 ms.
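The part (a) arithmetic, rounding at each step exactly as the answer does:

```python
rev_ms = round(60_000 / 3_600, 1)          # 16.7 ms per revolution at 3600 rpm
sector_ms = round(rev_ms / 32, 2)          # 0.52 ms to pass one of 32 sectors
rot_delay_ms = round(rev_ms * 31 / 32, 1)  # 16.2 ms waiting for sector 1 again
total_ms = sector_ms + rot_delay_ms + sector_ms
print(round(total_ms, 2))  # 17.24
```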
31) In the first addressing mode, 2^8 = 256 ports can be addressed. Typically, this would allow 128 devices to be
addressed. However, an opcode specifies either an input or output operation, so it is possible to reuse the
addresses, so that there are 256 input port addresses and 256 output port addresses. In the second addressing
mode, 2^16 = 64K port addresses are possible.
32)
a) Each I/O device requires one output (from the point of view of the processor) port for commands and one
input port for status.
b) The first device requires only one port for data, while the second device requires an input data port and
an output data port. Because each device requires one command and one status port, the total number of
ports is seven.
c) seven.
33)
a) The processor scans the keyboard 10 times per second. In 8 hours, the number of times the keyboard is
scanned is 10 x 60 x 60 x 8 = 288,000.
34) Let us ignore data read/write operations and assume the processor only fetches instructions. Then the
processor needs access to main memory once every microsecond. The DMA module is transferring characters
at a rate of 1200 characters per second, or one every 833 µs. The DMA therefore "steals" every 833rd cycle.
This slows down the processor by approximately (1/833) × 100% = 0.12%.
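The slowdown estimate as a one-liner check:

```python
interval_us = 1e6 / 1200            # one DMA transfer every ~833 µs
slowdown_pct = 100 / interval_us    # one stolen 1 µs cycle per interval
print(round(slowdown_pct, 2))  # 0.12
```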
35)
a) Telecommunications links can operate continuously, so burst mode cannot be used, as this would tie up
the bus continuously. Cycle-stealing is needed.
b) Because all 4 links have the same data rate, they should be given the same priority.
36) The answers are the same for (a) and (b). Assume that although processor operations
cannot overlap, I/O operations can.
1 Job: TAT = NT Processor utilization = 50%
2 Jobs: TAT = NT Processor utilization = 100%
4 Jobs: TAT = (2N – 1)NT Processor utilization = 100%
37) The number of partitions equals the number of bytes of main memory divided by the number of bytes in each
partition: 2^24/2^16 = 2^8 = 256. Eight bits are needed to identify one of the 2^8 partitions.
38)
39) 38/39
a) A memory location whose initial contents are zero is needed for both X → AC and AC → X. The
program for X → AC, and its effects are shown below. Assume AC initially contains the value a.
b) For addition, we again need a location, M(0), whose initial value is 0. We also need destination
location, M(1). Assume the initial value in M(1) is y.
40)
a) AB + C + D + E +
b) AB + CD + * E +
c) AB * CD * + E +
d) AB - CDE * - F / G / * H *
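One way to check a postfix string is to evaluate it with a stack; the variable values below are arbitrary, chosen only so the result can be compared by hand against the infix form:

```python
def eval_postfix(tokens, env):
    # tokens: operand names (looked up in env) and binary operators
    ops = {'+': lambda a, b: a + b, '-': lambda a, b: a - b,
           '*': lambda a, b: a * b, '/': lambda a, b: a / b}
    stack = []
    for tok in tokens:
        if tok in ops:
            b = stack.pop()   # right operand is on top of the stack
            a = stack.pop()
            stack.append(ops[tok](a, b))
        else:
            stack.append(env[tok])
    return stack.pop()

env = {'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': 5}
# part (a): A + B + C + D + E
print(eval_postfix('A B + C + D + E +'.split(), env))  # 15
```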
41) a) 20 b) 40 c) 60 d) 30 e) 50 f) 70
42)
a) the address field
b) memory location 14
c) the memory location whose address is in memory location 14
d) register 14
e) the memory location whose address is in register 14
43)
The scheme is similar to that for problem 11.16. Divide the 36-bit instruction into 4 fields: A, B, C, D. Field A is
the first 3 bits; field B is the next 15 bits; field C is the next 15 bits, and field D is the last 3 bits. The 7
instructions with three operands use B, C, and D for operands and A for opcode. Let 000 through 110 be
opcodes and 111 be a code indicating that there are fewer than three operands. The 500 instructions with two
operands are specified with 111 in field A and an opcode in field B, with operands in D and C. The opcodes for
the 50 instructions with no operands can also be accommodated in B.
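A hypothetical packing routine for the 36-bit layout described above (the field order A | B | C | D and the widths 3 | 15 | 15 | 3 are from the answer; the function names are invented for the sketch):

```python
def pack(a, b, c, d):
    # A: 3 bits, B: 15 bits, C: 15 bits, D: 3 bits -> one 36-bit word
    return (a << 33) | (b << 18) | (c << 3) | d

def unpack(word):
    return ((word >> 33) & 0x7, (word >> 18) & 0x7FFF,
            (word >> 3) & 0x7FFF, word & 0x7)

# two-operand instruction: A = 111, opcode 5 in field B, operands in C and D
instr = pack(0b111, 5, 12, 3)
print(unpack(instr))  # (7, 5, 12, 3)
```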
44)
a)
A total of 3 machine registers are used, but now that the two additions use the same register, we no
longer have the opportunity to interleave the calculations for scheduling purposes.
LD SR1, A
LD SR2, B
LD SR4, C
LD SR5, D
ADD SR3, SR1, SR2
ADD SR6, SR4, SR5
This avoids the pipeline conflicts caused by immediately referencing loaded data. Now we do the register
assignment:
LD MR1, A
LD MR2, B
LD MR3, C
LD MR4, D
ADD MR5, MR1, MR2
ADD MR1, MR3, MR4
Five machine registers are used instead of three, but the scheduling is improved.