Computer Architecture Unit 10

inevitably increased by the use of custom integrated circuits, in which economic constraints lead to poor characterisation. The problem is compounded in a data-parallel mesh-connected array, since failure at some point of the array disrupts the very data structures on which efficient computations are predicated. The MPP deals with this problem by allowing a column of processors which contains a faulty element to be switched out of use, while one of the four spare columns is added to the edge of the array to maintain its size and shape. Naturally, if a fault occurs during computation, the sequence of instructions following the last dump to external memory must be repeated after replacement of the fault-containing column.
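The column-replacement scheme can be pictured as a mapping from logical columns to healthy physical columns. The following is only a minimal sketch of that idea: the function name `build_column_map` and the toy array width are assumptions made for illustration, and the real MPP performs the bypass in hardware.

```python
def build_column_map(logical_cols, spare_cols, faulty):
    """Map each logical column to a healthy physical column.

    Physical columns are numbered 0 .. logical_cols + spare_cols - 1.
    Faulty columns are skipped, so the columns to their right shift over
    and a spare at the edge is drawn in to keep the logical width constant.
    """
    physical = logical_cols + spare_cols
    healthy = [c for c in range(physical) if c not in faulty]
    if len(healthy) < logical_cols:
        raise RuntimeError("more faulty columns than spares")
    return healthy[:logical_cols]


# Toy example (sizes are hypothetical): 8 logical columns, 4 spares,
# and physical column 3 contains a faulty element.
col_map = build_column_map(logical_cols=8, spare_cols=4, faulty={3})
print(col_map)  # [0, 1, 2, 4, 5, 6, 7, 8]
```

After the map changes, any work done since the last dump to external memory would have to be rerun, as the text notes.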
The processing elements are linked by a two-dimensional nearest-neighbour mesh. This arrangement offers a number of important advantages over other likely alternatives: data structures are easily maintained during shifting, the engineering is simple, the bandwidth is high, and there is a close conceptual match to the formulation of many image processing calculations.
The principal disadvantage of this system is the slow transmission of data between distant processors in the array. However, this becomes apparent only when comparatively small amounts of data are to be transmitted (rather than whole images).
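The cost of moving data between distant processors can be estimated by counting single-step shifts. The sketch below is illustrative only: `mesh_hops` is a hypothetical helper, the mesh is assumed not to wrap around at its edges, and the 128 x 128 array size is an assumption not stated in this section. An 8-connected variant is included purely for comparison.

```python
def mesh_hops(src, dst, connectivity=4):
    """Number of single-step shifts needed to move data from src to dst.

    src and dst are (row, col) positions in a non-wrapping 2D mesh.
    With 4-connectedness each shift moves along one axis (Manhattan
    distance); with 8-connectedness a diagonal shift is also allowed
    (Chebyshev distance).
    """
    dr = abs(dst[0] - src[0])
    dc = abs(dst[1] - src[1])
    if connectivity == 4:
        return dr + dc
    if connectivity == 8:
        return max(dr, dc)
    raise ValueError("connectivity must be 4 or 8")


# Corner to corner on a 128 x 128 array (array size assumed for illustration).
print(mesh_hops((0, 0), (127, 127), connectivity=4))  # 254
print(mesh_hops((0, 0), (127, 127), connectivity=8))  # 127
```

Because one shift instruction moves the data of every processor at once, the hop count is amortised when a whole image is shifted, but it dominates when only a few values have to travel.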
The choice of 4-connectedness rather than 8-connectedness is perhaps surprising, given the minimal increase in complexity which the latter involves compared with a twofold improvement in performance on some operations.
A special-purpose staging memory is provided for conversion of data formats. All highly parallel computers have problems with data input and output, and in parallel computers built from single-bit processors these problems are compounded. The difficulty is that external source data is usually formatted as a single string of integers. If such data is loaded into a two-dimensional array in any simple manner, a considerable amount of time is wasted before processing can start, because the format of the data does not match that of the array.
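The format mismatch can be made concrete: external data arrives as whole integers, whereas an array of single-bit processors works on one bit of every pixel at a time (a bit plane). The following is a minimal software sketch of the conversion in both directions; the function names are hypothetical and nothing here describes how the MPP hardware actually performs it.

```python
def to_bit_planes(pixels, bits=8):
    """Convert a 2D list of integers into a list of bit planes.

    planes[b][r][c] holds bit b (0 = least significant) of pixels[r][c].
    """
    return [[[(value >> b) & 1 for value in row] for row in pixels]
            for b in range(bits)]


def from_bit_planes(planes):
    """Rebuild the 2D list of integers from its bit planes."""
    rows, cols = len(planes[0]), len(planes[0][0])
    return [[sum(planes[b][r][c] << b for b in range(len(planes)))
             for c in range(cols)] for r in range(rows)]


image = [[5, 255], [0, 130]]
planes = to_bit_planes(image)
print(planes[0])                         # [[1, 1], [0, 0]]
print(from_bit_planes(planes) == image)  # True
```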
The MPP included two solutions to this problem. The first was a separate data input/output register. The second was the staging memory, which allowed conversion between bit-plane and integer-string formats. Used together, these two solutions allowed the processor array to operate continuously and so deliver maximum throughput.

Manipal University of Jaipur B1648 Page No. 226

10.4.2 Programming and applications
The MPP system was commissioned by NASA principally for the analysis of Landsat images (satellite imagery of the Earth). This meant that, initially, most applications on the system were in the area of image processing, although the machine eventually proved to be of wider applicability. NASA also used the MPP system for various other applications, described below (see figure 10.9).

Figure 10.9: MPP integrated Circuits

Stereo image analysis: The stereo analysis algorithm on the MPP was designed to work out elevations from synthetic aperture radar images obtained at different viewing angles during a Shuttle mission. By means of an appropriate geometric model, elevations can be worked out from the differing locations of corresponding pixels in a pair of images acquired at different incidence angles, which form a pseudo stereo pair. The main difficulties observed in the matching algorithm are:
• The brightness levels are different in corresponding areas of the two images.
• Images have areas of low contrast and high noise.
• There are local distortions which differ from image to image.
The first two difficulties can be overcome by the use of normalised correlation functions (a standard image processing technique), but the third arises from the different viewing angles. The MPP algorithm operates as follows:
• For each pixel in one of the images (the reference image) a local neighbourhood area is defined. This is correlated with the similar area surrounding each of the candidate match pixels in the second image.
• The measure applied is the normalised mean and variance cross correlation function. The candidate yielding the highest correlation is considered to be the best match, and the locations of the pixels in the two images are compared to produce the disparity value at that point of the reference image.
• The algorithm is iterative. It begins at low resolution, that is, with large areas of correlation around each of a few pixels. When the first pass is complete, the test image is geometrically warped according to the disparity map.
• The process is then repeated at a higher resolution (usually reducing the correlation area, and increasing the number of computed matches, by a factor of two), a new disparity map is calculated and a new warping applied, and so on.
• The procedure is continued either for a predetermined number of passes or until some quality criterion is met.
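The matching step in the list above can be sketched for a single pixel as follows. This is a sequential illustration under stated assumptions, not the MPP implementation: the function names are hypothetical, candidates are searched only along the same row, and the warping and coarse-to-fine passes are omitted. The test image is a synthetic copy of the reference, shifted by two columns and with altered brightness and contrast.

```python
import math


def ncc(a, b):
    """Normalised cross-correlation of two equal-length sequences.

    Subtracting the mean and dividing by the standard deviation makes the
    score insensitive to brightness offset and contrast scaling.
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:          # flat patch: correlation is undefined
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom


def patch(image, row, col, half):
    """Flatten the (2*half+1) square neighbourhood centred on (row, col)."""
    return [image[r][c]
            for r in range(row - half, row + half + 1)
            for c in range(col - half, col + half + 1)]


def best_disparity(ref, test, row, col, half, max_disp):
    """Horizontal offset in `test` whose patch best matches `ref` at (row, col)."""
    ref_patch = patch(ref, row, col, half)
    width = len(test[0])
    best_d, best_score = 0, -2.0
    for d in range(-max_disp, max_disp + 1):
        c = col + d
        if c - half < 0 or c + half >= width:
            continue          # candidate window would leave the image
        score = ncc(ref_patch, patch(test, row, c, half))
        if score > best_score:
            best_d, best_score = d, score
    return best_d, best_score


# Synthetic 5 x 12 pair: the test image is the reference shifted right by
# two columns, with doubled contrast and a brightness offset of 10.
ref = [[c * c + r for c in range(12)] for r in range(5)]
test = [[2 * ((c - 2) ** 2 + r) + 10 for c in range(12)] for r in range(5)]

d, score = best_disparity(ref, test, row=2, col=5, half=1, max_disp=3)
print(d)  # 2
```

On a data-parallel machine the same correlation would be evaluated for every reference pixel simultaneously, with the candidate offsets obtained by shifting the test image across the array.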
Self Assessment Questions
9. MPP is the acronym for ____________.
10. All highly parallel computers have problems concerned with
______________________.

10.5 Coarse-Grained SIMD Architecture
Several technical difficulties arise in fully realising the fine-grained SIMD ideal of one processor per data element. It can therefore be better to begin with a coarse-grained approach and so develop a more rational architecture. A number of parallel computer manufacturers, including nCUBE and Thinking Machines Corporation, have adopted this outlook.
Coarse-grained data-parallel architectures are often developed by manufacturers who are more familiar with the mainstream of computer design than with the application-specific architecture field. They frequently result from MIMD programmes which have discovered the complexities of that approach and sought to mitigate them. The consequence of these roots is systems which can employ a number of different paradigms, including MIMD, multiple-SIMD and what is often called single program multiple data (SPMD), in which each processor executes its own program, but all the programs are the same, and so remain in lock-step. Such systems are frequently used in this data-parallel mode, and it is therefore reasonable to include them within the SIMD paradigm. Naturally, when they are used in a different mode, their operation has to be analysed in a different way.
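The SPMD idea, one program run by every processor on its own share of the data, can be illustrated with a small simulation. This is only a sketch: `spmd_run` and `local_sum_of_squares` are hypothetical names, and the processors are simulated one after another rather than run in parallel.

```python
def spmd_run(program, data, n_pe):
    """Run the same `program` on every (simulated) PE, each with its own block.

    The PEs are simulated sequentially here; on a real machine each call
    would execute on a separate processor.
    """
    block = (len(data) + n_pe - 1) // n_pe          # ceiling division
    return [program(rank, data[rank * block:(rank + 1) * block])
            for rank in range(n_pe)]


def local_sum_of_squares(rank, local_data):
    """The single program: identical code, different data on each PE."""
    return sum(x * x for x in local_data)


partials = spmd_run(local_sum_of_squares, list(range(8)), n_pe=4)
print(partials)       # [1, 13, 41, 85]
print(sum(partials))  # 140
```

The final summation stands in for the combining step that a real system would carry out over its interconnection network.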
Coarse-grained SIMD systems of this type embody the following concepts:
• Each PE is of high complexity, comparable to that of a typical microprocessor.
• The PE is usually constructed from commercial devices rather than incorporating a custom circuit.
• There is a (relatively) small number of PEs, on the order of a few hundreds or thousands.
• Every PE is provided with ample local memory.
• The interconnection method is likely to be one of lower diameter and lower bandwidth than the simple two-dimensional mesh. Networks such as the tree, the crossbar switch and the hypercube can be utilised.
• Provision is often made for huge amounts of relatively high-speed, high-bandwidth backup storage, often using an array of hard disks.
• The programming model assumes that some form of data mapping and remapping will be necessary, whatever the application.
• The application field is likely to be high-speed scientific computing.
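The remark about network diameter in the list above can be checked with a few standard formulas. The sketch below assumes a square mesh without wrap-around, a hypercube with a power-of-two number of PEs, and a complete binary tree with the PEs at its leaves; the function names are hypothetical and the PE counts are chosen only for illustration.

```python
import math


def mesh_diameter(n):
    """Diameter of a square, non-wrapping 2D mesh with n PEs (n a perfect square)."""
    side = math.isqrt(n)
    if side * side != n:
        raise ValueError("n must be a perfect square")
    return 2 * (side - 1)


def hypercube_diameter(n):
    """Diameter of a hypercube with n PEs (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    return n.bit_length() - 1


def binary_tree_diameter(n_leaves):
    """Leaf-to-leaf diameter of a complete binary tree with PEs at the leaves."""
    if n_leaves < 1 or n_leaves & (n_leaves - 1):
        raise ValueError("n_leaves must be a power of two")
    return 2 * (n_leaves.bit_length() - 1)


for n in (256, 1024):
    print(n, mesh_diameter(n), hypercube_diameter(n), binary_tree_diameter(n))
# 256 30 8 16
# 1024 62 10 20
```

A full crossbar switch connects any pair of PEs in a single step, at the cost of hardware that grows with the square of the number of PEs.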

Systems of this type have a number of advantages over fine-grained SIMD, such as the capability to take maximum advantage of the latest processor technology, the ability to perform high-precision computations with no performance penalty, and the easier mapping to a selection of different data types which the smaller number of processors and improved connectivity permit.
The software required for such systems offers both an advantage and a disadvantage. The advantage lies in its closer similarity to normal programming; the disadvantage lies in the less natural programming for some applications. Coarse-grained systems also offer greater variety in their designs, because each component of the design is less constrained than in a fine-grained system. The example given below is, therefore, less specifically representative of its class than was the MPP machine considered earlier.
10.5.1 An example: the CM5
The Connection Machine family marketed by Thinking Machines Corporation has been one of the most commercially successful examples of niche marketing in the computing field in recent years (another which springs to mind is the CRAY family). Although the first of the family, CM1, was fine-grained in concept, the latest, CM5, is definitely coarse-grained. The first factor in its design is the processing element, illustrated in figure 10.10. The important components of the design include:
• internal 64-bit organisation;
• a 40 MHz SPARC microprocessor;
• separate data and control network interfaces;
• up to four floating-point vector processors with separate data paths to memory;
• 32 Mbytes of memory.
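A little arithmetic on the figures in the list above gives a feel for the scale of one processing element. This is a sketch only: the even division of memory between the vector processors is an assumption made for illustration, not something stated here.

```python
MEMORY_BYTES = 32 * 1024 * 1024   # 32 Mbytes per PE (from the list above)
WORD_BYTES = 64 // 8              # 64-bit internal organisation
VECTOR_UNITS = 4                  # maximum number of vector processors

words_per_pe = MEMORY_BYTES // WORD_BYTES
# Assumption: memory is divided evenly between the vector units.
bytes_per_vector_unit = MEMORY_BYTES // VECTOR_UNITS

print(words_per_pe)                             # 4194304
print(bytes_per_vector_unit // (1024 * 1024))   # 8
```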
