ACA Solution Manual
Problem 1.1
CPI = (45000 x 1 + 32000 x 2 + 15000 x 2 + 8000 x 2) cycles / (45000 + 32000 + 15000 + 8000) instructions = 155000/100000 = 1.55 cycles/instruction.

MIPS rate = 10^-6 x (40 x 10^6 cycles/s) / (1.55 cycles/instruction) = 25.8 MIPS.

Execution time = (45000 x 1 + 32000 x 2 + 15000 x 2 + 8000 x 2) cycles / (40 x 10^6 cycles/s) = 3.875 ms.

The execution time can also be obtained by dividing the total number of instructions by the MIPS rate:

Execution time = (45000 + 32000 + 15000 + 8000) instructions / (25.8 x 10^6 instructions/s) = 3.875 ms.

Problem 1.2
Instruction set and compiler technology affect the length of the executable code and the memory access frequency. CPU implementation and control determine the clock rate. The memory hierarchy impacts the effective memory access time. These factors together determine the effective CPI, as explained in Section 1.1.4.

Problem 1.3
(a) The effective CPI of the processor is

  CPI = (15 x 10^6 cycles/s) / (10 x 10^6 instructions/s) = 1.5 cycles/instruction.

(b) The effective CPI of the new processor is (1 + 0.3 x 2 + 0.05 x 4) = 1.8 cycles/instruction. Therefore, the MIPS rate is

  (30 x 10^6 cycles/s) / (1.8 cycles/instruction) = 16.7 MIPS.

Problem 1.4
(a) Average CPI = 1 x 0.6 + 2 x 0.18 + 4 x 0.12 + 8 x 0.1 = 2.24 cycles/instruction.
(b) MIPS rate = 40/2.24 = 17.86 MIPS.

Problem 1.5
(a) False. The fundamental idea of multiprogramming is to overlap the computations of some programs with the I/O operations of other programs.
(b) True. In an SIMD machine, all processors execute the same instruction at the same time, hence it is easy to implement synchronization in hardware. In an MIMD machine, different processors may execute different instructions at the same time, and it is difficult to support synchronization in hardware.
(c) True. Interprocessor communication is facilitated by sharing variables on a multiprocessor and by passing messages among the nodes of a multicomputer. The multicomputer approach is usually more difficult to program, since the programmer must pay attention to the actual distribution of data among the processors.
(d) False. In general, an MIMD machine executes different instruction streams on different processors.
(e) True. Contention among processors to access the shared memory may create hot spots, making multiprocessors less scalable than multicomputers.

Problem 1.6
The MIPS rates for the different machine-program combinations are shown in the following table:

              Computer A   Computer B   Computer C
  Program 1     100          10            5
  Program 2       0.1         1            5
  Program 3       0.2         0.1          2
  Program 4       1           0.125        1

Various means of these values can be used to compare the relative performance of the computers. Definitions of the means for a sequence of positive numbers a1, a2, ..., an are summarized below. (See also the discussion in Section 3.1.2.)

(a) Arithmetic mean: AM = (a1 + a2 + ... + an) / n.
(b) Geometric mean: GM = (a1 x a2 x ... x an)^(1/n).
(c) Harmonic mean: HM = n / (1/a1 + 1/a2 + ... + 1/an).

In general, AM >= GM >= HM. Based on the definitions, the following table of mean MIPS rates is obtained:

                     Computer A   Computer B   Computer C
  Arithmetic mean      25.3          2.81         3.25
  Geometric mean        1.19         0.59         2.66
  Harmonic mean         0.25         0.21         2.1

Note that the arithmetic mean of the MIPS rates is proportional to the inverse of the harmonic mean of the execution times. Likewise, the harmonic mean of the MIPS rates is proportional to the inverse of the arithmetic mean of the execution times. The two observations are consistent with Eq. 1.1.
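As a quick cross-check of the two tables above, the three means can be recomputed directly from the per-program MIPS rates. The short Python sketch below is not part of the original solution; the data layout and variable names are mine, and the values come from the first table of Problem 1.6.

from math import prod

rates = {
    "A": [100, 0.1, 0.2, 1],
    "B": [10, 1, 0.1, 0.125],
    "C": [5, 5, 2, 1],
}

for name, a in rates.items():
    n = len(a)
    am = sum(a) / n                     # arithmetic mean
    gm = prod(a) ** (1 / n)             # geometric mean
    hm = n / sum(1 / x for x in a)      # harmonic mean
    print(f"Computer {name}: AM = {am:.2f}, GM = {gm:.2f}, HM = {hm:.2f}")
    # The printed values agree with the mean table above (up to rounding).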
If we use the harmonic mean of the MIPS rates as the performance criterion (i.e., each program is executed the same number of times on each computer), computer C has the best performance. On the other hand, if the arithmetic mean of the MIPS rates is used, which is equivalent to allotting an equal amount of time to the execution of each program on each computer (i.e., fast-running programs are executed more frequently), then computer A is the best choice.

Problem 1.7
* An SIMD computer has a single control unit. The other processors are simple slave processors which accept instructions from the control unit and perform an identical operation at the same time on different data. Each processor in an MIMD computer has its own control unit and execution unit. At any moment, a processor can execute an instruction different from the other processors.
* Multiprocessors have a shared memory structure. The degree of resource sharing is high, and interprocessor communication is carried out via shared variables in the shared memory. In multicomputers, each node typically consists of a processor and local memory. The nodes are connected by communication channels which provide the mechanism for message interchange among processors. Resource sharing is light among processors.
* In the UMA architecture, each memory location in the system is equally accessible to all processors, and the access time is uniform. In the NUMA architecture, the access time to a memory location depends on the proximity of a processor to that memory location; therefore, the access time is nonuniform. In the NORMA architecture, each processor has its own private memory; no memory is shared among processors, and each processor is allowed to access its private memory only. In the COMA architecture, such as that adopted by the KSR-1, each processor has its private cache, and these caches together constitute the global address space of the system. It is like a NUMA with cache in place of memory. A page of data can be migrated to a processor upon demand or be replicated on more than one processor.

Problem 1.8
(a) The total number of cycles needed on a sequential processor is (4 + 4 + 8 + 4 + 2 + 4) x 64 = 1664 cycles.
(b) Each PE executes the same instruction on the corresponding elements of the vectors involved. There is no communication among the processors. Hence the total number of cycles on each PE is 4 + 4 + 8 + 4 + 2 + 4 = 26.
(c) The speedup is 1664/26 = 64 with a perfectly parallel execution of the code.

Problem 1.9
Because the processing power of a CRCW-PRAM and an EREW-PRAM is the same, we need only focus on memory accesses. Below, we prove that the time complexity of simulating a concurrent write or a concurrent read on an EREW-PRAM is O(log n). Before the proof, we assume it is known that an EREW-PRAM can sort n numbers or write a number to n memory locations in O(log n) time.

(a) We present the proof for simulating concurrent writes below.
1. Create an auxiliary array A of length n. When CRCW processor P_i, for i = 0, 1, ..., n-1, desires to write a datum x_i to a location l_i, the corresponding EREW processor P_i writes the ordered pair (l_i, x_i) to location A[i]. These writes are exclusive, since each processor writes to a distinct memory location.
2. Sort the array by the first coordinate of the ordered pairs in O(log n) time, which causes all data written to the same location to be brought together in the output.
3. Each EREW processor P_i, for i = 1, 2, ..., n-1, now inspects A[i] = (l_j, x_j) and A[i-1] = (l_k, x_k), where j and k are values in the range 0 <= j, k <= n-1. If l_j != l_k, processor P_i writes the datum x_j to location l_j; otherwise it does nothing. In addition, P_0 always writes the datum in A[0] to its location. Since the array is sorted by location, at most one processor writes to any given location, so these writes are exclusive, and the whole simulation runs in O(log n) time, dominated by the sort.
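In code, the three steps look like the following sequential sketch. It is not from the text: the function name and data layout are mine, an ordinary sort stands in for the O(log n) EREW-PRAM sort, and the loop body plays the role of the per-processor comparison in step 3.

def simulate_concurrent_write(memory, requests):
    """requests[i] = (l_i, x_i): CRCW processor P_i wants to write datum x_i to location l_i."""
    A = sorted(requests, key=lambda pair: pair[0])   # step 2: sort the pairs by target location
    for i, (loc, val) in enumerate(A):               # step 3: only the "leader" of each group writes
        if i == 0 or A[i - 1][0] != loc:             # exclusive: one writer per distinct location
            memory[loc] = val
    return memory

print(simulate_concurrent_write({}, [(3, "a"), (0, "b"), (3, "c"), (1, "d")]))
# {0: 'b', 1: 'd', 3: 'a'} -- exactly one value survives per contended location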
(b) If we change the positions of some switch modules in the Baseline network, it becomes the Flip network (diagram not reproduced).
(c) Since both the Omega network and the Flip network are topologically equivalent to the Baseline network, they are topologically equivalent to each other.

Problem 2.16
(a) k^n.
(b) n * floor(k/2).
(c) 2k^(n-1).
(d) 2n.
(e) [answer not legible in this copy]
* A k-ary 1-cube is a ring with k nodes. A k-ary 2-cube is a 2D k x k torus. A mesh is a torus without the end-around connections. A 2-ary n-cube is a binary n-cube.
* An Omega network is the multistage network implementation of the shuffle-exchange network. Its switch modules can be repositioned to have the same interconnection topology as a binary n-cube.
* The conventional torus has long end-around connections, but the folded torus has equal-length connections. (See Figure 2.21 in the text.)
* The relation B = 2wN/k will be shown in the solution of Problem 2.18. Therefore, if both the number of nodes N and the wire bisection width B are held constant, the channel width w is proportional to k:

  w = Bk/(2N).

  The latency of a wormhole-routed network is dominated by the term L/w (message length divided by channel width), which is inversely proportional to w and hence inversely proportional to k. This means a network with a higher k will have a lower latency. For two k-ary n-cube networks with the same number of nodes, the one with the lower dimension has a larger k, and hence a lower latency.

  It will be shown in the solution of Problem 2.18 that the hot-spot throughput is equal to the bandwidth of a single channel. Low-dimensional networks have a larger k, hence a higher hot-spot throughput.

Problem 4.1
(a) The processor design space is a coordinate space with the x and y axes representing clock rate and CPI, respectively. Each point in the space corresponds to a design choice of a processor whose performance is determined by the values of the two coordinates.
(b) The time required between issuing two consecutive instructions.
(c) The number of instructions issued per cycle.
(d) The number of cycles required for the execution of a simple instruction, such as add, move, etc.
(e) Two or more instructions attempt to use the same functional unit at the same time.
(f) A coprocessor is usually attached to a processor and performs special functions at high speed. Examples are floating-point and graphics coprocessors.
(g) Registers which are not designated for special usage, as opposed to special-purpose registers such as base registers or index registers.
(h) The addressing mode specifies how the effective address of an operand is generated so that its actual value can be fetched from the correct memory location.
(i) In the case of a unified cache, both data and instructions are kept in the same cache. In split caches, data and instructions are held in separate caches.
(j) Hardwired control: control signals for each instruction are generated by appropriate circuitry such as delay elements. Microcoded control: each instruction is implemented by a set of microinstructions which are stored in a control memory; the decoding of the microinstructions generates the signals that control the execution of an instruction.

Problem 4.2
(a) Virtual address space is the memory space required by a process during its execution to accommodate the variables, buffers, etc., used in the computations.
(b) Physical address space is the set of addresses assigned to the physically available memory words.
(c) Address mapping is the process of translating a virtual address to a physical address.
(d) The entirety of a cache is divided into fixed-size entities called blocks. A block is the unit of data transfer between main memory and cache.
(e) Multiple levels of page tables are used to translate a virtual page number into a page frame number. In this case, some tables actually store pointers to other tables, similar to an indirect addressing mode. The objective is to deal with a large memory space and to facilitate protection.
(f) The hit ratio at level i of the memory hierarchy is the probability that a data item is found in M_i.
(g) A page fault is the situation in which a demanded page cannot be found in the main memory and has to be brought in from the disk.
(h) A hash function maps an element in a large set to an index in a small set. Usually it treats the input element as a number or a sequence of numbers and performs arithmetic operations on it to generate the index. A suitable hash function should map the input set uniformly onto the output set.
(i) An inverted page table contains entries that record the virtual page number associated with each page frame that has been allocated. This is contrary to a direct-mapping page table.
(j) The strategies used to select the page or pages resident in the main memory to be replaced when such a need arises.

Problem 4.4
(a) The comparison is tabulated below:

  Item                 CISC                              RISC
  Instruction format   16-64 bits per instruction        fixed 32-bit format
  Addressing modes     12-24                             limited to 3-5, mostly register-based (memory accessed only via load/store)
  CPI                  2 to 15, about 5 on the average   under 1.5, very close to 1

(b) Advantages of separate caches:
1. The bandwidth is doubled, because two complementary requests can be serviced at the same time.
2. The logic design is simplified, because arbitration between instruction and data accesses to the cache is simplified or eliminated.
3. The access time is reduced, because data and instructions can be placed close to the functional units which will access them. For instance, the instruction cache can be placed close to the instruction fetch and decode units.

Disadvantages of separate caches:
1. The consistency problem is complicated, because data and instructions may coexist in the same cache block. This is true if self-modifying code is allowed or when data and instructions are intermixed and stored in the same cache block. To avoid this would require compiler support to ensure that instructions and data are stored in different cache blocks.
2. Separate caches may lead to inefficient use of cache memory, because the working set size of a program varies with time, and the fraction devoted to data and to instructions also varies. Hence, the sum of the data cache size and the instruction cache size is usually larger than the size of a unified cache. As a result, the utilization of the instruction cache and/or the data cache is likely to be lower.

For separate caches, dedicated data paths are required for both the instruction and the data cache. Separate MMUs and TLBs are also desirable for separate caches to shorten the address translation time, and a higher memory bandwidth should be provided to support the increased demand. In an actual implementation, there is a tradeoff between the degree of support provided and the resulting hardware complexity.

(c)
* Instruction issue: A scalar RISC processor issues one instruction per cycle; a superscalar RISC processor can usually issue more than one per cycle.
* Pipeline architecture: In an m-issue superscalar processor, up to m pipelines may be active in any base cycle. A scalar processor is equivalent to a superscalar processor with m = 1.
* Processor performance: An m-issue superscalar processor can have a performance m times that of a scalar processor, provided both are driven by the same clock rate and no dependence relations or resource conflicts exist among the instructions.

(d) Both superscalar and VLIW architectures employ multiple functional units to allow concurrent instruction execution. A superscalar processor requires more sophisticated hardware support, such as large reorder buffers and reservation stations, in order to make efficient use of the system resources; software support is needed to resolve data dependences and improve efficiency. In a VLIW machine, instructions are compacted by the compiler, which explicitly packs together instructions that can be executed concurrently, based on heuristics or run-time statistics. Because of the explicit specification of parallelism, the hardware and software support needed at run time is usually simpler; for instance, the decoding logic can be simple.

Problem 4.5
Only a single pipeline in a scalar CISC or RISC architecture is active at a time, exploiting parallelism at the microinstruction level, and the operational requirements are simple. In a superscalar RISC, multiple pipelines can be active simultaneously; this requires extensive hardware and software support to effectively exploit instruction-level parallelism. In a VLIW architecture, multiple pipelines can also be active at the same time, and sophisticated compilers are needed to compact irregular code into long instruction words for concurrent execution.

Problem 4.6
(a) The i486 is a CISC processor. The general instruction format consists of optional prefix bytes, an opcode of 1 or 2 bytes, an optional "mod r/m" byte, an optional "s-i-b" (scale-index-base) byte, an address displacement of 0, 1, 2, or 4 bytes, and an immediate field of 0, 1, 2, or 4 bytes (diagram not reproduced). A few variations also exist for some instructions.

Data formats:
* Byte (8 bits): 0-255
* Word (16 bits): 0-64K
* DWord (32 bits): 0-4G
* 8-bit integer: range on the order of 10^2
* 16-bit integer: range on the order of 10^4
* 32-bit integer: range on the order of 10^9
* 64-bit integer: range on the order of 10^18
* 8-bit unpacked BCD (1 digit): 0-9
* 8-bit packed BCD (2 digits): 0-99
* 80-bit packed BCD (18 digits): range on the order of 10^18
* Single-precision real (24 bits): approximately +-10^(+-38)
* Double-precision real (53 bits): approximately +-10^(+-308)
* Extended-precision real (64 bits): approximately +-10^(+-4932)
* Byte string, word string, DWord string, and bit string formats to support ASCII data

(b) There are 12 different modes whereby the effective address (EA) can be generated:
* register mode
* immediate mode
* direct mode: EA = displacement
* register indirect (base) mode: EA = (base register)
* based with displacement: EA = (base register) + displacement
* index with displacement: EA = (index register) + displacement
* scaled index with displacement: EA = (index register) x scale + displacement
* based index: EA = (base register) + (index register)
* based scaled index: EA = (base register) + (index register) x scale
* based index with displacement: EA = (base register) + (index register) + displacement
* based scaled index with displacement: EA = (base register) + (index register) x scale + displacement
* relative: New_PC = PC + displacement (used in conditional jumps, loops, and call instructions)
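For concreteness, the arithmetic behind a couple of these modes can be sketched as follows. The register names and contents below are made-up example values, not part of the i486 description above; the only point is how base, index, scale, and displacement combine into an effective address.

# Hypothetical register contents (example values only).
regs = {"EBX": 0x1000, "ESI": 0x0020}

def ea_based_disp(base, disp):
    """based with displacement: EA = (base register) + displacement"""
    return regs[base] + disp

def ea_based_scaled_index_disp(base, index, scale, disp):
    """based scaled index with displacement: EA = (base) + (index) * scale + displacement"""
    return regs[base] + regs[index] * scale + disp

print(hex(ea_based_disp("EBX", 0x8)))                          # 0x1008
print(hex(ea_based_scaled_index_disp("EBX", "ESI", 4, 0x10)))  # 0x1000 + 0x20*4 + 0x10 = 0x1090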
Problem 4.9
(a) Two situations may cause pipelines to be underutilized: (i) the instruction latency is longer than one base cycle, and (ii) the combined cycle time is greater than the base cycle.
(b) Dependences among instructions or resource conflicts among instructions can prevent the simultaneous execution of instructions.

Problem 4.10
(a) Vector instructions perform identical operations on vectors whose length is usually much larger than 1. Scalar instructions operate on a single number or a pair of numbers at a time.
(b) Suppose the pipeline is composed of k stages and the vector is of length N. The first output is generated in the k-th cycle; afterward, an additional output is generated in each cycle, so the last result comes out of the pipeline in cycle N + k - 1. Using a base scalar machine, it takes Nk cycles. Thus the speedup is Nk/(N + k - 1).
(c) If m-issue vector processing is employed, each vector is of length N/m, and the execution time is (N/m + k - 1) cycles. If only parallel issue is used, the execution time is (N/m)k cycles. Thus, the speed improvement is

  (N/m)k / (N/m + k - 1) = Nk / (N + m(k - 1)).

Problem 4.11
(a) The average cost is

  c = (c1 s1 + c2 s2) / (s1 + s2).

For c to approach c2, the conditions are s2 >> s1 and c2 s2 >> c1 s1.
(b) The effective access time is

  ta = sum over i of fi ti = h t1 + (1 - h) t2.

(c) With t2 = r t1, we have ta = (h + (1 - h) r) t1, and the efficiency is

  E = t1/ta = 1 / (h + (1 - h) r).

(d) The plots of E against the hit ratio h for various values of r are given in a figure (not reproduced).
(e) If r = 100, we require E = 1/(h + (1 - h) x 100) > 0.95. Solving the inequality, we obtain the condition h > 0.9995 (approximately).

Problem 4.14
(a) The inclusion property refers to the property that information present in a lower-level memory must be a subset of that in a higher-level memory.
(b) The coherence property requires that copies of an information item be identical throughout the memory hierarchy.
(c) The write-through policy requires that changes made to a data item in a lower-level memory be made in the next higher-level memory immediately.
(d) The write-back policy postpones the update at the level (i+1) memory until the item is replaced or removed from the level i memory.
(e) Paging divides virtual memory and physical memory into pages of a fixed size to simplify memory management and alleviate the fragmentation problem.
(f) Segmentation divides the virtual address space into variable-sized segments. Each segment corresponds to a logical unit. The main purpose of segmentation is to facilitate the sharing and protection of information among programs.

Problem 4.18
The comparison between symbolic and numeric processing can be summarized attribute by attribute:

* Data objects -- Symbolic: lists, relational databases, scripts, semantic nets, frames, blackboards, objects, production systems. Numeric: integers, floating-point numbers, vectors, matrices.
* Common operations -- Symbolic: search, sort, pattern matching, filtering, contexts, partitions, transitive closures, unification, text retrieval, set operations, reasoning. Numeric: add, subtract, multiply, divide, matrix multiplication, matrix-vector multiplication, reduction operations such as the dot product of vectors, etc.
* Memory requirements -- Symbolic: large memory with an intensive access pattern; addressing is often content-based; locality of reference may not hold. Numeric: great memory demand with intense access; the access pattern usually exhibits a high degree of spatial and temporal locality.
* Communication patterns -- Symbolic: message traffic varies in size and destination, as do the granularity and format of the message units. Numeric: message traffic and granularity are relatively uniform; proper mapping can restrict communication to be largely between neighboring processors.
* Algorithm properties -- Symbolic: nondeterministic, possibly parallel and distributed computations; data dependences may be global and irregular in pattern and granularity. Numeric: typically deterministic; amenable to parallel and distributed computations;
data dependence is mostly local and regular.
* Input/output requirements -- Symbolic: inputs can be graphical and audio as well as from the keyboard; access to very large on-line databases is required. Numeric: large data sets usually exceed the memory capacity; fast I/O is highly desirable.
* Architecture features -- Symbolic: parallel update of large knowledge bases, dynamic load balancing, dynamic memory allocation, hardware-supported garbage collection, stack processor architectures, symbolic processors. Numeric: can be pipelined vector processors, MIMD, or SIMD processors using various memory and interconnection structures; a systolic array is suitable for certain types of computations.

Problem 5.9
(a) Each set of the cache consists of 256/8 = 32 block frames, and the entire cache has 16 x 1024/256 = 64 sets. Similarly, the memory contains 1024 x 1024/8 = 131072 blocks. Thus, the memory address is partitioned into three fields:

  | cache address tag (11 bits) | set address (6 bits) | word address (3 bits) |

A block B of the main memory is mapped to a block frame in set F of the cache if F = B mod 64.
(b) The effective memory access time for this memory hierarchy is 50 x 0.95 + 400 x (1 - 0.95) = 47.5 + 20 = 67.5 ns.

Problem 5.10
(a) The address assignment across the four memory modules M0-M3 is shown in a diagram (not reproduced).
(b) There are 1024/16 = 64 blocks in the main memory, and 256/16 = 16 block frames in the cache.
(c) 10 bits are needed to address each word in the main memory: 2 for selecting the memory module and 8 for the offset of a word within the module. 6 bits are required to select a word in the cache: 2 bits to select the set number and 4 bits to select a word within a block. Besides, each block frame needs a 4-bit address tag to identify the block resident in it.
(d) Memory block B maps into set (B mod 4) of the cache (diagram not reproduced). After the set into which a memory block can be mapped is identified, the address tags of the block frames in that set are compared by associative search with the physical memory address to determine whether the desired block is in the cache.

Problem 5.14
(a) The effective time added to each memory access by cache misses is

  ta = f1 (1 - h1) t1 + f2 (1 - h2) t2,

where the fi are the access frequencies, the hi the hit ratios, and the ti the corresponding miss-service times. The per-instruction execution time (the "CPI", measured here in microseconds) is obtained by adding this memory-stall component to the basic execution and fault-handling components, and the effective MIPS rate of the entire system with p processors is MIPS = p/CPI.
(b) Using the data given,

  ta = 0.5 x (1 - 0.95) x 0.5 + 0.5 x (1 - 0.7) x 0.5 = 0.0875,
  CPI = 0.4 x 0.0875 + ... + 0.08 x 5 = 0.485 microseconds per instruction,

and finally p/0.485 >= 25. Hence, the number of processors needed is p = 13.

Problem 6.1
(a) Speedup = nk / (k + (n - 1)) = (15000 x 5) / (5 + 14999) = 75000/15004 = 4.9987.
(b) Efficiency = n / (k + (n - 1)) = 15000/15004 = 0.9997.
    Throughput = nf / (k + (n - 1)) = 15000 x 25 x 10^6 (instructions/s) / 15004 = 24.99 MIPS.

Problem 6.5
The lower bound of the MAL is the maximum number of checkmarks in any row of the reservation table. The upper bound of the MAL is the number of 1's in the initial collision vector plus 1. A detailed proof can be found in the paper by Shar (1972).

Problem 6.6
(a) Forbidden latencies: 1, 2, and 5. Initial collision vector: (10011).
(b) State transition diagram: (not reproduced).
(c) MAL = 3.
(d) Throughput = 1/(3 x 20 ns) = 16.67 million operations per second (MOPS).
(e) The lower bound of the MAL is 2, so the optimal latency is not achieved.
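Problems 6.6 through 6.14 all follow the same recipe: form the collision vector and then walk the state diagram. The sketch below is mine (the function and variable names are not from the text): starting from an initial collision vector written as a bit string, it follows the greedy policy of always scheduling at the smallest permissible latency and reports the latency cycle it settles into, together with the average latency that cycle achieves. Run on the collision vector (10011) of Problem 6.6 it settles into the constant cycle (3), matching the MAL of 3 in part (c); run on (11100) it reproduces the greedy cycle (1,1,6) of Problem 6.10.

def greedy_latency_cycle(cv_string):
    """Follow the greedy scheduling policy from an initial collision vector.

    cv_string is the collision vector written MSB-first, e.g. "10011" means
    latencies 5, 2 and 1 are forbidden (the i-th bit from the right stands
    for latency i).  Returns the repeating latency cycle and its average.
    """
    m = len(cv_string)
    C = int(cv_string, 2)                  # initial collision vector as an integer
    state, issued, seen = C, [], {}
    while state not in seen:
        seen[state] = len(issued)
        # smallest permissible latency = lowest zero bit; latency m+1 is always safe
        p = next((i for i in range(1, m + 1) if not (state >> (i - 1)) & 1), m + 1)
        issued.append(p)
        state = ((state >> p) | C) & ((1 << m) - 1)   # shift the state register, then OR in C
    cycle = issued[seen[state]:]           # the part of the greedy schedule that repeats
    return cycle, sum(cycle) / len(cycle)

print(greedy_latency_cycle("10011"))   # ([3], 3.0)          -> Problem 6.6, MAL = 3
print(greedy_latency_cycle("11100"))   # ([1, 1, 6], 2.66..) -> Problem 6.10, MAL = 8/3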
Problem 6.7
(a) Reservation table: (not reproduced).
(b) State transition diagram: (not reproduced).
(c) Simple cycles: (4), (5), (7), (3,1), (3,4), (3,5,4), (3,5,7), (1,7), (5,4), (5,7), (3,7), (1,3,4), (1,3,5,4), (1,3,5,7), (1,3,7), (1,4,3), (1,4,4), (1,4,7), (5,3,4), (5,3,7), and (5,3,1,7). Greedy cycle: (1,3).
(d) MAL = (1 + 3)/2 = 2.
(e) Throughput = 1/(2t), where t is the clock period.

Problem 6.9
(a) Forbidden latency: 3. Collision vector: (100).
(b) State transition diagram: (not reproduced).
(c) Simple cycles: (2), (4), (1,4), (1,1,4), and (2,4). Greedy cycles: (2) and (1,1,4).
(d) Optimal constant latency cycle: (2); MAL = 2.
(e) Throughput = 1/(2 x 20 ns) = 25 MOPS.

Problem 6.10
(a) Forbidden latencies: 3, 4, 5. Collision vector: (11100).
(b) State transition diagram: (not reproduced).
(c) Simple cycles: (1,1,6), (2,6), (6), and (1,6).
(d) Greedy cycle: (1,1,6).
(e) MAL = (1 + 1 + 6)/3 = 2.67.
(f) Minimum allowed constant cycle: (6).
(g) Maximum throughput = 1/(MAL x t) = 3/(8t).
(h) Throughput with the constant cycle: 1/(6t).

Problem 6.11
The three pipeline stages are referred to as IF, OF, and EX for instruction fetch, operand fetch, and execution, respectively. The space-time diagram of the original instruction sequence is not reproduced here; comparing the output set of each instruction with the input set of its successor gives

  O(I1) with I(I2) sharing {R0}  -> RAW hazard,
  O(I2) with I(I3) sharing {Acc} -> RAW hazard,
  O(I3) with I(I4) sharing {Acc} -> RAW hazard.

A scheduling which avoids these hazard conditions, by delaying the operand fetch of each dependent instruction until its operand has been written, is shown in the second diagram (not reproduced).

Problem 6.12
(a) For the given value ranges of m and n, we know that mn(N - 1) > N - 1 > N - m. Eq. 6.32 can be rewritten as

  S(m, n) = (mn(N - 1) + mnk) / ((N - m) + mnk).

From elementary algebra, a fraction of this form, whose constant numerator term exceeds its constant denominator term, attains its largest value when the common term mnk is smallest. As a result, the value of k should be 1 in order to maximize S(m, n).
(b) Instruction-level parallelism limits the growth of the superscalar degree.
(c) The multiphase clocking technique limits the growth of the superpipeline degree.

Problem 6.13
Solution 1
(a) Reservation table: (not reproduced).
(b) Forbidden latency: 4. Collision vector: (1000).
(c) State transition diagram: (not reproduced).
(d) Simple cycles: (1,5), (1,1,5), (1,1,1,5), (1,2,5), (1,2,3,5), (1,2,3,2,5), (1,2,3,2,1,5), (2,5), (2,1,5), (2,1,2,5), (2,1,2,3,5), (2,3,5), (3,5), (3,2,5), (3,2,1,5), (3,2,1,2,5), (5), (3,2,1,2), and (3).
(e) Greedy cycles: (1,1,1,5) and (1,2,3,2).
(f) MAL = (1 + 1 + 1 + 5)/4 = 2.
(g) Maximum throughput = 1/(2t).

Solution 2
(a) Reservation table: (not reproduced).
(b) Forbidden latencies: 2, 4. Collision vector: (1010).
(c) State transition diagram: (not reproduced).
(d) Simple cycles: (3), (5), (1,5), and (3,5).
(e) Greedy cycles: (1,5) and (3).
(f) MAL = 3.
(g) Maximum throughput = 1/(3t).

Problem 6.14
(a) The complete reservation table for the composite pipeline spans 12 columns (not reproduced).
(b) Forbidden latencies: 1, 2, 3, 7, 8, 9. Collision vector: (111000111).
(c) State transition diagram: (not reproduced).
(d) Simple cycles: (5), (6), (10), (4,6), (4,10), (5,6), and (5,10). Greedy cycles: (5) and (4,6).
(e) MAL = 5.
(f) Maximum throughput = 1/(5t).

Problem 7.12
(a) A 16 x 16 Omega network built from 2 x 2 switches has log2(16) = 4 stages of eight switches each, with a perfect-shuffle connection pattern in front of each stage (diagram not reproduced).
(b) The connections 1011 -> 0101 and 0111 -> 1001 can be traced on the diagram; as can be seen, there is no blocking between the two connections.
(c) Each switch box can implement two permutations in one pass (straight or cross), and there are (16/2) x log2(16) = 32 switch boxes.
Therefore, the total number of single-pass permutations is

  2^((16/2) x log2(16)) = 2^32,

while the total number of permutations of 16 inputs is 16!. Therefore,

  (number of single-pass permutations) / (total number of permutations) = 2^32 / 16! = 2.05 x 10^-4 (approximately).

(d) At most log2(16) = 4 passes are needed to realize all permutations.

Problem 7.13
(a) A unicast pattern is a one-to-one communication, and a multicast pattern is a one-to-many communication.
(b) A broadcast pattern is a one-to-all communication, and a conference pattern is a many-to-many communication.
(c) The channel traffic at any time instant is indicated by the number of channels used to deliver the messages involved.
(d) The communication latency is indicated by the longest packet transmission time involved.
(e) Partitioning of a physical network into several logical subnetworks. In each of the subnetworks, appropriate routing schemes can be used to avoid deadlock.
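As a small illustration of how paths such as those in Problem 7.12(b) are set up, the sketch below routes a source to a destination through a 16-input Omega network using destination-tag routing. It assumes the conventional model of one perfect-shuffle connection in front of each column of 2 x 2 switches, with the switch in stage k set by the k-th destination bit (most significant bit first); the function name is mine, and the intermediate wire labels depend on how the particular diagram is drawn. Printing the labels for the two connections of part (b) lets one check that they never compete for the same switch output.

def omega_route(src, dst, n=16):
    """Wire labels visited by a message from src to dst in an n-input Omega network."""
    stages = n.bit_length() - 1                  # log2(n) stages of 2x2 switches
    addr, path = src, [src]
    for k in reversed(range(stages)):
        addr = ((addr << 1) | (addr >> (stages - 1))) & (n - 1)   # perfect shuffle (rotate left)
        addr = (addr & ~1) | ((dst >> k) & 1)                     # switch output chosen by destination bit k
        path.append(addr)
    return path

for s, d in [(0b1011, 0b0101), (0b0111, 0b1001)]:
    print(f"{s:04b} -> {d:04b}:", [f"{x:04b}" for x in omega_route(s, d)])
# 1011 -> 0101: ['1011', '0110', '1101', '1010', '0101']
# 0111 -> 1001: ['0111', '1111', '1110', '1100', '1001']
# The two paths use distinct wires after every stage, consistent with the "no blocking"
# observation in Problem 7.12(b).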
