MATHEMATICS RESEARCH DEVELOPMENTS

PARALLEL PROGRAMMING

PRACTICAL ASPECTS, MODELS


AND CURRENT LIMITATIONS

No part of this digital document may be reproduced, stored in a retrieval system or transmitted in any form or
by any means. The publisher has taken reasonable care in the preparation of this digital document, but makes no
expressed or implied warranty of any kind and assumes no responsibility for any errors or omissions. No
liability is assumed for incidental or consequential damages in connection with or arising out of information
contained herein. This digital document is sold with the clear understanding that the publisher is not engaged in
rendering legal, medical or any other professional services.
MATHEMATICS RESEARCH DEVELOPMENTS

Additional books in this series can be found on Nova’s website


under the Series tab.

Additional e-books in this series can be found on Nova’s website


under the e-book tab.
MATHEMATICS RESEARCH DEVELOPMENTS

PARALLEL PROGRAMMING

PRACTICAL ASPECTS, MODELS


AND CURRENT LIMITATIONS

MIKHAIL S. TARKOV
EDITOR

New York
Copyright © 2015 by Nova Science Publishers, Inc.

All rights reserved. No part of this book may be reproduced, stored in a retrieval system or
transmitted in any form or by any means: electronic, electrostatic, magnetic, tape, mechanical
photocopying, recording or otherwise without the written permission of the Publisher.

For permission to use material from this book please contact us: [email protected]

NOTICE TO THE READER


The Publisher has taken reasonable care in the preparation of this book, but makes no expressed or
implied warranty of any kind and assumes no responsibility for any errors or omissions. No
liability is assumed for incidental or consequential damages in connection with or arising out of
information contained in this book. The Publisher shall not be liable for any special,
consequential, or exemplary damages resulting, in whole or in part, from the readers’ use of, or
reliance upon, this material. Any parts of this book based on government reports are so indicated
and copyright is claimed for those parts to the extent applicable to compilations of such works.

Independent verification should be sought for any data, advice or recommendations contained in
this book. In addition, no responsibility is assumed by the publisher for any injury and/or damage
to persons or property arising from any methods, products, instructions, ideas or otherwise
contained in this publication.

This publication is designed to provide accurate and authoritative information with regard to the
subject matter covered herein. It is sold with the clear understanding that the Publisher is not
engaged in rendering legal or any other professional services. If legal or any other expert
assistance is required, the services of a competent person should be sought. FROM A
DECLARATION OF PARTICIPANTS JOINTLY ADOPTED BY A COMMITTEE OF THE
AMERICAN BAR ASSOCIATION AND A COMMITTEE OF PUBLISHERS.

Additional color graphics may be available in the e-book version of this book.

LIBRARY OF CONGRESS CATALOGING-IN-PUBLICATION DATA


Parallel programming : practical aspects, models and current limitations / [edited by] Mikhail S.
Tarkov (Institute of Semiconductor Physics, Siberian Branch, Russian Academy of Sciences,
Russia).
pages cm -- (Mathematics research developments)
Includes bibliographical references and index.
ISBN:  (eBook)
1. Parallel programming (Computer science) I. Tarkov, Mikhail S., editor.
QA76.642.P356 2014
005.2'75--dc23
2014034859

Published by Nova Science Publishers, Inc. † New York


CONTENTS

Preface  vii

Chapter 1  Mapping Data Processing Neural Networks onto Distributed Computer Systems with Regular Structures (Mikhail S. Tarkov)  1

Chapter 2  Mapping Parallel Program Graphs onto Graphs of Distributed Computer Systems by Neural Network Algorithms (Mikhail S. Tarkov)  33

Chapter 3  Large-Scale and Fine-Grain Parallelism in Plasma Simulation (A. Snytnikov)  59

Chapter 4  Numerical Modelling of Astrophysical Flow on Hybrid Architecture Supercomputers (I. Kulikov, I. Chernykh, A. Snytnikov, V. Protasov, A. Tutukov and B. Glinsky)  71

Chapter 5  Efficient Computational Approaches for Parallel Stochastic Simulation on Supercomputers (Mikhail A. Marchenko)  117

Chapter 6  Lattice Gas Cellular Automata for a Flow Simulation and Their Parallel Implementation (Yury G. Medvedev)  143

Chapter 7  Parallel Simulation of Asynchronous Cellular Automata (Konstantin Kalgin)  159

Chapter 8  XPU: A C++ Metaprogramming Approach to Ease Parallelism Expression: Parallelization Methodology, Internal Design and Practical Application (Nader Khammassi and Jean-Christophe Le Lann)  175

Chapter 9  An Approach to the Construction of Robust Systems of Interacting Processes (Igor N. Skopin)  199

Chapter 10  Early Learning in Parallel Programming (Igor N. Skopin)  219

Index  231
PREFACE

Parallel programming is intended for the use of parallel computer systems to solve
time-consuming problems that cannot be solved on a sequential computer in a reasonable
time.
These problems can be divided into two classes:
1. Processing large data arrays (including processing images and signals in real time).
2. Simulation of complex physical processes and chemical reactions.
For each of these classes, prospective methods have been designed for solving the
corresponding problems. For data processing, one of the most promising technologies is the use
of artificial neural networks. The particle-in-cell method and cellular automata are very useful for simulation.
Problems of the scalability of parallel algorithms and of porting existing parallel
programs to future parallel computers are very acute now. An important task is to optimize
the use of the equipment (including the CPU cache) of a parallel computer. Along with
parallelizing information processing, it is essential to ensure processing reliability through the
appropriate organization of systems of concurrent interacting processes. From the perspective of
creating high-quality parallel programs, it is important to develop advanced methods of teaching
parallel programming.
The above reasons motivated the creation of this book, whose chapters are
devoted to solving these problems.
The first chapter (by Dr. Mikhail S. Tarkov) is devoted to mapping neural networks onto
regular structures (hypercube, torus) of distributed computer systems (CS). These structures
are now used not only in supercomputers but also serve as a basis for the construction of
parallel systems on VLSI chips (System-on-Chip). As a result of mapping neural networks onto
such a structure, we obtain an effective solution to the problem of organizing interactions
between the neurons within a chip and within the entire distributed CS.
The second chapter (by Dr. Mikhail S. Tarkov) examines the possibility of using the
Hopfield recurrent neural network for mapping structures of parallel programs onto the
structures of distributed CS. It is shown that such a network can be successfully used for
mapping of parallel programs on a multicore computer and for constructing Hamiltonian
cycles in the structure of a distributed CS. In the latter case the neural network algorithms are not
inferior in speed to the permutation ones.
The third chapter (by Dr. Alexey Snytnikov) investigates the relaxation processes in high-
temperature plasma caused by the propagation of the electron beam. The mathematical Particle-
In-Cell (PIC) model is used in the problem. To achieve high performance both large-scale and
fine-grain parallelization techniques are used. Large-scale parallelization is achieved by domain


decomposition over the computing nodes of a cluster supercomputer. Fine-grain parallelization is
done by implementing the computation of the motion of each particle as a separate thread. Thus,
the highest performance is achieved on hybrid supercomputers with GPUs.
The fourth chapter (by Dr. Igor Kulikov et al.) describes the technology of numerical
modeling of astrophysical flows on the hybrid supercomputer with NVIDIA accelerators. To
solve this problem the software packages GPUPEGAS (modeling of astrophysical objects),
AstroPhi (simulation of the dynamics of stars and molecular clouds), PADME (simulation of
the formation of planetary systems) are developed.
The fifth chapter (by Dr. Mikhail A. Marchenko) focuses on the problems of using the
Monte Carlo method (method of numerical stochastic modeling) on supercomputers. An
effective approach is proposed for parallel stochastic modeling and its application in practice,
in particular in the problem of modeling the evolution of electron avalanches in gases. On the
basis of this approach the software library PARMONC was created. The results of the study
on the scalability of parallel algorithms of stochastic modeling are presented.
The sixth chapter (by Dr. Yuri Medvedev) presents the development of cellular automata
models of gas flows. Transitions from Boolean models to integral models, from two-
dimensional models to three-dimensional models, and from gas flow models to models of
multiphase flows are considered. Implementation of models on a cluster using the MPI library
is proposed. The problem of dynamic load balancing over multiple cores of the cluster is solved
for the implementation of models with an integer alphabet.
The seventh chapter (by Dr. Konstantin Kalgin) is devoted to the modeling of parallel
asynchronous cellular automata. A comparative analysis of their parallel implementations is
performed. Parallel algorithms for the model of the physical and chemical process of the surface CO + O2
reaction are tested on different parallel computers: a computer with shared memory, a cluster
(distributed memory computer), and GPU. The specialized language CACHE and programs
for conversion of this language to C language are proposed for cellular automata models of
physical and chemical processes.
In the eighth chapter (by Dr. Nader Khammassi and Prof. Jean-Christophe Le Lann) the
parallel programming model XPU is suggested. It facilitates the programming of parallel
computations without loss of performance. The XPU technology is based entirely on the
traditional C++ programming language and can be easily integrated into many
systems. XPU uses C++ metaprogramming methods in order to simplify the creation of
various kinds of parallelism (task parallelism, data parallelism, temporal parallelism) at all
levels of granularity.
In the ninth chapter (by Dr. Igor N. Skopin) an approach to the construction of robust
systems of interacting processes is proposed. It is shown how the successive solution of some
problems can be naturally represented as a system of interacting concurrent processes. Such
representation improves the robustness of computations.
The tenth (final) chapter (by Dr. Igor N. Skopin) considers the problems associated with
learning parallelism. A new, efficient approach to learning parallel programming is proposed.
This approach is based on constructing program sketches, which do not take into account
any restrictions on concurrency, and on the subsequent mapping of the sketches onto a real
computer.
We hope this book will be of interest to researchers, students and all those who work in
the field of parallel programming and high performance computing.

Mikhail S. Tarkov, Ph.D.


Institute of Semiconductor Physics SB RAS
Novosibirsk, Russia
Tel: +7 (383) 330-84-94
Fax: +7 (383) 330-52-56
Email: [email protected]
In: Parallel Programming ISBN: 978-1-63321-957-1
Editor: Mikhail S. Tarkov © 2015 Nova Science Publishers, Inc.

Chapter 1

MAPPING DATA PROCESSING NEURAL NETWORKS


ONTO DISTRIBUTED COMPUTER SYSTEMS
WITH REGULAR STRUCTURES

Mikhail S. Tarkov*
Institute of Semiconductor Physics SB RAS, Novosibirsk, Russia

Abstract
Methods for efficiently mapping data processing neural networks onto robust distributed
computer systems (CS) are proposed. Cellular neural networks are mapped onto graphs of
parallel programs with the "mesh" and "line" structures. The efficiency of the proposed methods
for neural networks with global connections (the Hopfield network, the Kohonen network, and
the multilayer perceptron) is based on a butterfly scheme, on mapping this scheme onto a
hypercube, and on the subsequent embedding of the hypercube into a torus. These networks are
mapped onto regular graphs of parallel programs ("line", "ring", "mesh", "hypercube",
"torus") intended for implementation on distributed computer systems.

Keywords: neural networks, distributed computer systems, mapping, hypercube, torus

1. Introduction
Currently, there is a steady growth in the volume of processed measurement information
(signals and images) in modern information systems. This also increases the performance
requirements for such systems.
Neural networks realize a promising model of parallel computations [1-9]. Artificial
neural networks are based on the following features of biological neural networks, which allow them to
cope with irregular tasks:

* E-mail address: [email protected]
a simple processing element, the neuron;
a huge number of neurons participating in information processing;
each neuron connected to many other neurons (global connections);
a huge number of inter-neuron connections with changing weights;
massive parallelism of information processing.

A network possessing these properties belongs to the class of connectionist models
of information processing. Their main feature is the use of weighted connections between
processing elements as a means of storing information. The processing is carried out
simultaneously by a large number of neurons, and each neuron is connected to many other
neurons; therefore, the neural network is resistant to malfunctions and is capable of fast computing.
To create a neural network for a specific task is to determine:

 a neuron model;
 connection topology;
 connection weights.

A neurocomputer is a device that contains a neural network as the major component and
has applications in many areas:

artificial intelligence: pattern recognition, image processing, reading handwritten characters, etc.;
 control system and technical control;
 creation of special parallel computers;
 study of the human brain.

Neural networks differ not so much in their neuron model as in the topology of
connections and the rules determining the weights (training rules). Neural network structures
are divided into single-layer and multi-layer ones. Single-layer networks are
cellular neural networks, Hopfield networks and Kohonen networks.
A multi-layer network has an input layer, an output layer and hidden layers. The input layer
receives input data; the output layer generates the result of processing; and the hidden
layers are involved in processing the information.
Unlike traditional means of information processing, neural network programming
is performed implicitly in the training process. Training is organized as follows. There is a
training set, i.e., a given set of examples with answers. These examples are presented to the
neural network. The neurons receive the conditions of an example and transform them. Then the
neurons repeatedly exchange the transformed signals and, finally, give a response in
the form of an output set of signals. A deviation of the output from the correct answer is
penalized. Training means minimizing the penalty as an implicit function of the weights of
the neuronal interconnections.
Traditional computer systems have the following problems:
1. They need a precise description of the algorithm (the computer is oriented to character processing).
2. The data must be exact. The equipment is easily damaged: destruction of the main
elements of memory makes the system faulty.
3. Each object to be processed must be explicitly specified in the memory.
4. It is difficult to build a good algorithm for pattern recognition and associative
sampling.

In neurocomputers (neural networks):

1. The method of data processing is more similar to signal processing. Instead of the
program there is a set of neuron weights; instead of programming there is a training
of neurons (adjustment of neuron weights).
2. The neural network is tolerant to noise; data distortion does not substantially affect
the result (including the failure of individual neurons).
3. Processed objects are represented implicitly by neuron weights. As a result, the
network can work with objects it has not previously encountered and is capable of
generalizing the training results.
4. The neural network is good for solving the problems of perception and associative
sampling.

Real-time image and signal processing requires the creation of highly parallel data
processing means. Autonomous means of computer image and signal processing require not
only high performance of the computing facilities, but also their high reliability and the ability to
learn and to generalize the training outcomes to new situations that arise in the course of
information processing. Artificial neural networks implemented in hardware
have all these qualities.
The enormous number of global inter-neuron connections (synapses) complicates
implementing a neural network as a VLSI layout because the connection length inevitably
increases, which makes it necessary to reduce the clock frequency of the digital
devices. On the other hand, increasing the degree of VLSI integration requires a larger
number of clock generators on the chip, i.e., it leads to distributed processing of the
information, meaning information processing by multiple cooperating processors
(elementary computers).
A system of interacting elementary computers (ECs) is scalable owing to its
homogeneity (all ECs are the same). Scaling is possible only if the means of communication
between processors are distributed, which, for a given VLSI production technology, leads
to representing the interprocessor network as a regular graph with vertices of
bounded degree.
Due to the rapid development of VLSI manufacturing technologies, the question of
combining components into large systems on a chip has arisen. The most common
approach, based on the principle of a common bus, shows a lack of scalability and a decrease in
throughput as the number of connected elements grows. One of the ways to eliminate
such deficiencies is the use of network technology to exchange data between
subsystems of the VLSI chip. Thus the concept of combining computing cores into a network,
NoC (Network-on-Chip), originated. The essence of this approach is to connect the cores,
typically processor cores with local memory and additional devices, by specialized routers.
This approach to communication in VLSI has the advantages of scalability (increasing the
size of the network increases its bandwidth) and parallelism (data in different network
segments are transmitted simultaneously). The structures of such networks are regular and
have a limited node degree (hypercube, torus). The most common structures in such
networks are multi-dimensional tori.
The advantage of this concept over the classic one is that the point-to-point links
between routers provide high throughput owing to the presence of intermediate registers in the
signal path, and the data on different network segments are transmitted and switched
simultaneously. This provides high performance, capacity and resource savings, making the
research and development of new network-on-chip architectures topical. A modern VLSI
chip is no longer seen as a monolithic block of synchronous hardware where all state
transitions occur simultaneously. Most VLSI chips are now regarded as distributed systems
of interacting subsystems: System-on-Chip (SoC) and Network-on-Chip (NoC) [10, 11].
In this chapter, algorithms for mapping neural networks of signal and image processing
onto distributed computer systems with a regular structure are proposed. The algorithms for
mapping neural networks onto hypercube and torus are investigated.

2. Distributed Computer System Topology


A distributed computer system (DCS) [12-17] is a set of elementary computers (ECs)
connected by a network that is program-controlled by these computers. Every elementary computer
includes a computing unit (CU) and a system device (SD) (a router). The system device
works under CU control and has input and output poles connected, respectively, to the output
and input poles of the neighboring ECs. The DCS structure is described by a graph
G_s(V_s, E_s), where V_s is the set of ECs and E_s ⊆ V_s × V_s is the set of connections between ECs.
For a distributed computer system, a graph G_p(V_p, E_p) of a parallel program is usually
defined as a set V_p of the program branches (virtual elementary computers) communicating
with each other by the point-to-point principle, i.e., by transferring messages across logical (virtual)
channels (one- and two-directional) of a set E_p ⊆ V_p × V_p. In the general case, nodes x, y ∈ V_p
and edges (or arcs) (x, y) ∈ E_p are weighted by numbers characterizing the computing
complexities of the branches and the intensities of communications between them.
There are many factors that must be taken into consideration when choosing the DCS
architecture to be used for image processing. The most popular types of DCS topology are
hypercubes and tori [12-17].
In the d-dimensional hypercube H_d with 2^d nodes, two nodes i and j are connected if and only
if the Hamming distance between them is H(i, j) = 1 (Figure 1). For a given number of
nodes the hypercube architecture has the best communication possibilities, but the scalability
of the hypercube is constrained by the fact that the interconnection cost per node (the node degree)
increases with the total number of nodes. From the scalability point of view, the torus is a more
interesting type of interconnection topology.
Figure 1. Example of a hypercube (d = 4).

Figure 2. Example of two-dimensional torus.

The torus E_n(k_1, …, k_n) with n dimensions has N = k_1 · k_2 · … · k_n nodes, and nodes i and j are
connected if and only if, for some l ∈ {1, …, n}, (i − j) mod k_l = 1 (Figure 2).
The mapping of data fragments onto processors must preserve their neighborhood. The
consideration above shows that, for low-level image processing, it is better to use a
parallel program with a mesh structure because it fits the torus well. Actually, two- or
three-dimensional tori are produced from two- or three-dimensional meshes
by wrapping around their columns and rows. Thus, the torus is a trade-off topology effectively used for
low-level image processing. In modern distributed supercomputer systems, multidimensional
tori are usually used as network graphs [15, 16].
A p_1 × p_2 × … × p_n mesh is the nearest-neighbor network in which each node is labeled
(a_1, a_2, …, a_n), a_i ∈ {0, 1, …, p_i − 1}, i = 1, 2, …, n. Two nodes (a_1, a_2, …, a_n) and
(b_1, b_2, …, b_n) are adjacent in the mesh if, for some i, |a_i − b_i| = 1 and, for every j ≠ i,
a_j = b_j. Multidimensional tori are multidimensional meshes with a "wrap around". That is, a
p_1 × p_2 × … × p_n torus is a network of p_1 · p_2 · … · p_n nodes in which two nodes (a_1, a_2, …, a_n)
and (b_1, b_2, …, b_n) are adjacent if, for some i, a_i = (b_i ± 1) mod p_i and, for every j ≠ i,
a_j = b_j.
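As a concrete illustration of these adjacency rules, the following small Python sketch (not taken from the book; the function names are ours) tests adjacency in the d-dimensional hypercube (Hamming distance 1) and in a p_1 × … × p_n torus, and lists the neighbors of a torus node; dropping the modular wrap-around turns the torus test into the mesh adjacency test.

    # Hypercube H_d: nodes are d-bit numbers; i and j are adjacent iff H(i, j) = 1.
    def hypercube_adjacent(i, j):
        return bin(i ^ j).count("1") == 1

    # Torus E_n(p_1, ..., p_n): two nodes are adjacent iff they differ by +-1 (mod p_i)
    # in exactly one coordinate; without the "mod" this is the mesh adjacency rule.
    def torus_adjacent(a, b, p):
        diff = [i for i in range(len(p)) if a[i] != b[i]]
        if len(diff) != 1:
            return False
        i = diff[0]
        return (a[i] - b[i]) % p[i] in (1, p[i] - 1)

    def torus_neighbors(a, p):
        nbrs = []
        for i in range(len(p)):
            for step in (-1, 1):
                b = list(a)
                b[i] = (a[i] + step) % p[i]
                nbrs.append(tuple(b))
        return nbrs

    print(hypercube_adjacent(0b0101, 0b0100))        # True: Hamming distance 1
    print(torus_adjacent((0, 0), (3, 0), (4, 4)))    # True: wrap-around edge of the 4 x 4 torus
    print(torus_neighbors((0, 0), (4, 4)))           # the 4 neighbors of node (0, 0)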
A parallel algorithm for a DCS should be considered as a set of mutually interacting
processes, where each process comprises a sequential program and local data memory [18-21].
The processes interact by sending and receiving messages. The message send operation is
asynchronous. The receive operation is synchronous: it blocks the process until the message
is received. In other words, a parallel algorithm for a distributed CS will be considered as a
distributed virtual CS, with the interacting processes of the parallel algorithm as virtual
processors (elementary computers) of such a system.
The methodology for designing a parallel algorithm involves the following phases [18]:

task decomposition into processes;
synthesis of a graph of interactions between processes;
merging processes;
mapping processes onto the processors of a computer system.

The interaction between the processes is called local if the process interacts with a small
number of other processes called neighbors (the number of neighbors is significantly less than
the total number of processes). An example of a neural network with local interactions is a
cellular neural network. If the number of processes involved in an interaction is comparable to
the total number of processes, then the interaction is global (e.g., the computation of
the sum of the vector components distributed over all the processes of the parallel program).
Neural networks with global interactions are sigmoidal multilayer neural networks, Hopfield
networks, Kohonen networks and others.
The algorithms with local interactions are easily mapped onto regular structures of
parallel programs of types "line" or "mesh" (Figure 3). The algorithms with global
interactions are mapped onto a hypercube which can then be embedded into tori of various
dimensions (including the ring, i.e., one-dimensional torus).

3. Mapping Neural Networks with Local Interactions (Cellular Neural Networks)
In a cellular neural network (CNN) the neurons are located in the nodes of a mesh (for example, see
Figure 3a) or in the nodes of a regular graph. Each neuron has weighted connections with its
neighbors. Such networks are useful for realizing filtering operations [22], which are
often described by the convolution G(i, j) of the image I(i, j) with a set of filter weights
h(k, l, i, j), k, l ∈ {−M, …, 0, …, M}, i = 1, …, N_1, j = 1, …, N_2:

G(i, j) = Σ_{k=−M}^{M} Σ_{l=−M}^{M} h(k, l, i, j) · I(k + i, l + j).     (1)

Filtering (1) usually precedes other, more complex image transformations.

Figure 3. Examples of mapping local algorithms of preliminary image processing onto regular
structures of parallel programs.
In transformation (1) a rectangular window of size (2M + 1) × (2M + 1), M ≪ min(N_1, N_2),
is commonly used; so the calculation of the value at a point of the image involves processing
only a small neighborhood of this point, i.e., filtering algorithms of form (1) are
local. From the locality of transformation (1) it follows that geometric parallelism should be used, i.e.,

1) the neighborhood graph of the processes in a parallel program must correspond to the
neighborhood of the image pixels;
2) the mapping of the data fragments processed by algorithm (1) onto the processes
must preserve the neighborhood of the fragments;
3) as the graph of a parallel program implementing image filtering, it is advisable to use
the "mesh" (Figure 3a) or the "line" (Figure 3b).

In the limit, with maximum parallelization, each process is in one-to-one correspondence with a pixel.
The mesh and the line are well mapped onto hypercubic and toroidal structures of
computer systems. In what follows we always assume that the image components are uniformly
distributed over the computers of the system so that neighboring pixels are always
located in the same EC or in adjacent computers of the mesh or the line (geometric
parallelism).
Computations of neuron activations in networks with global communications (sigmoid
multilayer networks, Hopfield networks, Kohonen network, etc.) are usually reduced to the
computation of the sum of data array components (scalar product of the neuron weight vector
and the corresponding input vector of the neuron). The computation of this sum is one of the
semigroup array operations.

4. Mapping Semigroup Array Operations onto a Distributed Computer System with Torus Topology

A semigroup operation is a binary associative operation [19]. Examples of such
operations are addition, multiplication, conjunction, disjunction, exclusive OR, and the
computation of a minimum or a maximum. In [19] the problem of implementing semigroup
operations on data arrays distributed over the mesh is solved so that the operation result is
located in every processor. This solution can easily be translated onto the torus.
In this chapter we consider an alternative approach to mapping the semigroup operation
onto computer systems with torus topology. This approach is based on using the butterfly
scheme in the parallel realization of the semigroup operation and on mapping the butterfly onto the
hypercube with a subsequent XOR-embedding [12] of the hypercube onto the torus. We show that
this approach gives a more efficient parallel execution of the semigroup operation than the
approach proposed in [19].
4.1. Semigroup Operations on Mesh and Torus Using Cyclic Intercomputer Data Shifts

Let an array x = (x_1, …, x_n) be initially mapped onto a 2D mesh with √n × √n processors
so that processor P_ij contains the data element x_{(i−1)√n + j}, i, j ∈ {1, …, √n}. It is required to realize a
semigroup operation ⊕ on x so that all processors get the operation result.
In [19] an algorithm for implementing semigroup operations on the processor mesh
is proposed as a sequence of cyclic shifts with the execution of operation ⊕ after each shift:

1. implement in parallel a cyclic data shift along each row i so that every processor of the
row gets the result r_i = ⊕_{j=1}^{√n} x_{(i−1)√n + j};
2. implement in parallel a cyclic data shift along each column j so that every processor
of the column gets the result s = ⊕_{i=1}^{√n} r_i, which is equal to the required value
s = ⊕_{i=1}^{√n} ⊕_{j=1}^{√n} x_{(i−1)√n + j}.
.

A cyclic shift in a row (or column) of the mesh proceeds as follows: every data element moves
from processor to processor to the right until it reaches the rightmost processor; after that
the shift direction is reversed (the caterpillar algorithm [19]) (see the example
in Figure 4a).

Figure 4. a) Caterpillar’s algorithm of data exchanges in a mesh row (a column); b) Data shifts in a row
(a column) of a torus (exchange of components of vector x in a ring of computers).
The shifts in the 2D torus differ from those in the 2D mesh in the following: the
shift direction does not change because, in the 2D torus, the rows and the columns are
wrapped around (see Figure 4b). This algorithm is easily generalized to the k-dimensional torus
E_k(2^{d_1}, …, 2^{d_k}): the semigroup operation is implemented as a sequence of k steps with
2^{d_i} − 1 cyclic shifts at the i-th step, i = 1, …, k.
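A minimal serial Python sketch of this cyclic-shift scheme on one ring (one dimension of the torus) is given below; it simulates the data movement rather than implementing MPI code. After n − 1 shift-and-accumulate steps every "computer" of the ring holds the full result of the associative operation ⊕.

    from operator import add

    def ring_semigroup(values, op=add):
        """Cyclic-shift reduction on a ring of n computers.
        values[j] is the local operand of computer j; after n-1 steps of
        'send right, receive from the left, accumulate', every computer
        holds op(values[0], ..., values[n-1])."""
        n = len(values)
        acc = list(values)        # running result in each computer
        buf = list(values)        # data element currently travelling round the ring
        for _ in range(n - 1):
            buf = [buf[(j - 1) % n] for j in range(n)]        # simultaneous cyclic shift
            acc = [op(acc[j], buf[j]) for j in range(n)]
        return acc

    print(ring_semigroup([1, 2, 3, 4, 5]))          # every node ends with 15
    print(ring_semigroup([3, 1, 4, 1, 5], op=max))  # every node ends with 5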

4.2. Semigroup Operations in a Hypercube

The above algorithm for realizing the semigroup operation on the torus is not optimal.
This operation can be realized faster by a system of parallel processes known as the
"butterfly" (see Figure 5 for an example of sum evaluation on the butterfly). This system
replicates the computations so that the maximum number of ⊕ operations is
executed simultaneously on different pairs of operands.

Figure 5. Sum evaluation on the butterfly.

Figure 6. Hypercube obtained from the butterfly of Figure 5.


The butterfly is easily mapped onto the hypercube: the butterfly operations that cannot
be realized in parallel are merged into the same process. In Figure 5 the merged operations lie
on the same vertical line. The hypercube in Figure 6 is obtained by merging the operations of
Figure 5. Here the numbers in brackets are the summation step numbers.
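The merged butterfly can be simulated in a few lines of Python (an illustrative sketch, not from the book): in this sketch, at step s every hypercube node v combines its partial result with that of node v XOR 2^(s−1), so after d = log_2 n steps every node holds the complete result.

    def hypercube_butterfly(values, op=lambda a, b: a + b):
        """Butterfly (recursive doubling) semigroup operation on a d-dimensional
        hypercube of n = 2**d nodes. At step s node v combines its partial
        result with node v ^ 2**(s-1); after d steps all nodes hold the result."""
        n = len(values)
        assert n & (n - 1) == 0, "number of nodes must be a power of two"
        acc = list(values)
        s = 1
        while s < n:
            acc = [op(acc[v], acc[v ^ s]) for v in range(n)]   # simultaneous pairwise exchange
            s <<= 1
        return acc

    print(hypercube_butterfly([1, 2, 3, 4, 5, 6, 7, 8]))   # every node ends with 36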

4.3. Mapping Hypercube onto a Torus

The hypercube can be effectively mapped onto a torus. A method for mapping a
hypercube onto a torus (the XOR-embedding) is proposed in [12].
An embedding of a graph G into a graph H is an injection f from the nodes of G to the nodes of H.
If graph G is not isomorphic to some subgraph of H, then dilations of the edges of G are inevitable.
The dilation of an edge (a, b) of G is the distance in H between nodes f(a) and f(b).
The XOR-embedding of a hypercube H_d onto a k-dimensional torus E_k(2^{d_1}, …, 2^{d_k}),
Σ_{i=1}^{k} d_i = d, is realized as follows. First, the offsets K_j are defined:

K_1 = 0,
K_j = Σ_{i=1}^{j−1} d_i,  1 < j ≤ k.

If G is the hypercube H_d and T is the torus, then node v of G is mapped onto node
(m_1, …, m_k) = f_XOR(v) in T as follows [12]:

m_j(i) = v(i + K_j),  i ∈ {0, …, d_j − 1},  i ≠ d_j − 2,
m_j(d_j − 2) = XOR(v(K_{j+1} − 1), v(K_{j+1} − 2)).

Here x(i) is the i-th bit of the binary representation of x. It is shown in [12] that the hypercube
H_d can be embedded into the torus E_k(2^{d_1}, …, 2^{d_k}), Σ_{i=1}^{k} d_i = d, with the average edge dilation

D = (3 Σ_{i=1}^{k} 2^{d_i − 2} − k) / d.     (2)

The average hypercube edge dilations on two-dimensional tori E_2(2^m, 2^m) with
m = 2, 3, 4, 5, 6 are shown in Table 1.
As a result of mapping the hypercube onto the torus, we have paths in the torus instead of
edges of the hypercube. The paths, in principle, can intersect each other; thus, besides dilations,
congestions on edges are possible. Congestions can increase the communication latency.
Table 1. The hypercube edge dilations on the 2D torus

n   16    64     256   1024  4096
D   1     1.667  2.75  4.6   7.833
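The values in Table 1 follow directly from formula (2). A short Python check (assuming two-dimensional tori E_2(2^m, 2^m), i.e., k = 2 and d_1 = d_2 = m, d = 2m):

    def average_dilation(d_list):
        """Average edge dilation (2) of the XOR-embedding of hypercube H_d
        onto the torus E_k(2**d_1, ..., 2**d_k), d = d_1 + ... + d_k."""
        k = len(d_list)
        d = sum(d_list)
        return (3 * sum(2 ** (di - 2) for di in d_list) - k) / d

    for m in range(2, 7):
        n = 2 ** (2 * m)                     # number of nodes of the 2^m x 2^m torus
        print(n, round(average_dilation([m, m]), 3))
    # prints: 16 1.0, 64 1.667, 256 2.75, 1024 4.6, 4096 7.833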

Theorem 1. The XOR hypercube-onto-torus embedding does not produce congestions.

Proof.
1. Let us first consider a one-dimensional torus (a ring) and an arbitrary mapping of the hypercube
onto the torus, i.e., an arbitrary node enumeration. Let two arbitrary paths with different source and
destination nodes intersect on the ring (i.e., have common edges), and let both message
transfers begin simultaneously.
If these two paths are oppositely oriented, then there are no congestions because every
edge is used for the simultaneous message transmissions as two oppositely oriented arcs (links).
If the two paths are oriented the same way, then there are no congestions because the
transfers begin simultaneously: when an input message comes to a node for forwarding, a
suitable output arc is already free because the outgoing message transfer across that arc has finished.

2. Let us consider the general case of the XOR embedding. The butterfly communication on the
d-dimensional hypercube is realized in d steps. At step

s ∈ {1, 2, …, d}     (3)

node v communicates with node v′ if |v − v′| = 2^{s−1}.

Consider the standard mapping of the d-dimensional hypercube onto the torus E_n(k_1, k_2, …, k_n),

k_i = 2^{d_i},  Π_{i=1}^{n} k_i = 2^d,     (4)

defined as f(v) = (p_1, p_2, …, p_n), where

p_i = (v mod Π_{j=1}^{i} k_j) div Π_{j=1}^{i−1} k_j,  i = 1, 2, …, n.     (5)

Two different nodes v and v′ lie on the m-th one-dimensional torus if

f(|v − v′|) = (0, …, 0, p_m, 0, …, 0),  p_m ≠ 0,

where p_m stands in position m (m − 1 zeros before it and n − m zeros after it).
Let us show that for each s ∈ {1, 2, …, d} there is m ≤ n such that

f(|v − v′|) = f(2^{s−1}) = (0, …, 0, 2^{s−1−Σ_{i=1}^{m−1} d_i}, 0, …, 0),     (6)

with the nonzero component in position m.

From (3) and (4) it follows that m ≤ n is such that

2^{Σ_{i=1}^{m−1} d_i} ≤ 2^{s−1} < 2^{Σ_{i=1}^{m} d_i}.     (7)

From (7) it follows that:

for i ∈ {1, 2, …, m − 1} we have 2^{s−1} mod Π_{j=1}^{i} 2^{d_j} = 0; then, by (5),
p_i(2^{s−1}) = 0, i = 1, 2, …, m − 1;

p_m(2^{s−1}) = 2^{s−1} / 2^{Σ_{i=1}^{m−1} d_i} = 2^{s−1−Σ_{i=1}^{m−1} d_i};

for i ∈ {m + 1, …, n} we have 2^{s−1} mod Π_{j=1}^{i} 2^{d_j} = 2^{s−1} and
2^{s−1} div Π_{j=1}^{i−1} 2^{d_j} = 0, because 2^{s−1} < Π_{j=1}^{i−1} 2^{d_j} for i > m.

Expression (6) is proved.

From (6) it follows that:

1. for the standard mapping, any two communicating nodes belong to one one-dimensional torus;
2. any two different pairs of communicating nodes belong either to the same one-dimensional
torus or to two different non-intersecting tori; in both cases, in accordance with point 1 of the
proof, we have no congestions.

3. Consider the XOR embedding in the general case.
From (6) it follows that, for any two communicating nodes v and v′, the standard
embeddings are as follows:

f(v) = (p_1, …, p_{m−1}, p_m, p_{m+1}, …, p_n),
f(v′) = (p_1, …, p_{m−1}, p_m ± 2^{s−1−Σ_{i=1}^{m−1} d_i}, p_{m+1}, …, p_n),

where s, m, n satisfy (3), (4), (7).
The XOR embedding changes the same bits in the components of f(v) and f(v′).
Therefore, the embeddings f_XOR(v) and f_XOR(v′) differ in the m-th component only.
Hence, these two nodes lie on one one-dimensional torus and, for the XOR embedding, there are
no congestions.
Theorem 1 is proved.

4.4. Time Analysis of Semigroup Operation Execution on Torus

Let t_w be the time for moving a data element between adjacent computers, t_o the
time of the semigroup operation execution on two arguments, T_C the time of parallel
execution of the complete semigroup operation on a torus with the use of cyclic data shifts, and
T_HT the time of the same operation execution on the hypercube mapped onto a torus.

Theorem 2. T_C > T_HT.

Proof.
For arbitrary k, Σ_{i=1}^{k} d_i = d, the time of parallel execution of the complete semigroup
operation on the torus with the use of cyclic data shifts is

T_C = Σ_{i=1}^{k} (2^{d_i} − 1)·(t_w + t_o).

The time of execution of the semigroup operation on the hypercube mapped onto the torus is

T_HT = (D·t_w + t_o)·d.

Taking into account equation (2) for D, we have

T_HT = (3 Σ_{i=1}^{k} 2^{d_i − 2} − k)·t_w + d·t_o.

Then

T_C − T_HT = (Σ_{i=1}^{k} (2^{d_i} − 1) − 3 Σ_{i=1}^{k} 2^{d_i − 2} + k)·t_w + (Σ_{i=1}^{k} (2^{d_i} − 1) − d)·t_o
           = (1/4)·t_w·Σ_{i=1}^{k} 2^{d_i} + t_o·Σ_{i=1}^{k} (2^{d_i} − d_i − 1) > 0,

because 2^{d_i} − d_i − 1 ≥ 0 for d_i ≥ 1. Theorem 2 is proved.


In particular, for a k-dimensional torus with n^{1/k} × n^{1/k} × … × n^{1/k} = n computers, the time
values T_C and T_HT are, respectively,

T_C = k·(n^{1/k} − 1)·(t_w + t_o)

and

T_HT = log_2 n·(t_o + D·t_w) = log_2 n·t_o + (k/4)·(3·n^{1/k} − 4)·t_w.

From here we have

T_C − T_HT = (k/4)·n^{1/k}·t_w + (k·(n^{1/k} − 1) − log_2 n)·t_o,     (8)

where k·(n^{1/k} − 1) − log_2 n ≥ 0 for n ≥ 2^k. From (8), for n → ∞ we get the asymptotic relation

T_C − T_HT = O(n^{1/k})·(t_w + t_o).

So, two algorithms for mapping semigroup (binary associative) array operations onto
distributed computer systems with torus topology have been analyzed. The first algorithm is based
on cyclic data shifts in the rows and columns of the torus. The second algorithm is based on
the butterfly scheme mapped onto the hypercube with a subsequent XOR-mapping
of the hypercube onto the torus. It is shown that, in spite of the dilation of the hypercube edges on
the torus, the hypercube-onto-torus mapping algorithm executes the semigroup
operation on the torus in less time than the cyclic-data-shift algorithm.
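Theorem 2 can also be illustrated numerically. The following Python sketch evaluates T_C and T_HT from the formulas above; for the timing constants it borrows, purely as sample values, the Cray T3E figures quoted later in this chapter.

    def t_cyclic(d_list, tw, to):
        """T_C: cyclic-shift semigroup operation on the torus E_k(2**d_1, ..., 2**d_k)."""
        return sum((2 ** di - 1) * (tw + to) for di in d_list)

    def t_hypercube_on_torus(d_list, tw, to):
        """T_HT: butterfly on the hypercube XOR-embedded into the same torus."""
        k, d = len(d_list), sum(d_list)
        D = (3 * sum(2 ** (di - 2) for di in d_list) - k) / d   # formula (2)
        return (D * tw + to) * d

    tw, to = 8.3e-9, 0.83e-9            # sample times (Cray T3E values, see Section 6.2)
    for d_list in ([5, 5], [4, 4, 4], [3, 3, 3, 3]):
        print(d_list, t_cyclic(d_list, tw, to), t_hypercube_on_torus(d_list, tw, to))
    # T_C exceeds T_HT in every case, in agreement with Theorem 2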

5. Mapping Neural Networks with Global Connections onto Hypercubic Computer Systems

The neural network element (the neuron) implements a transformation of the form

y = f(w, x),

where x is a vector of input signals, w is a vector of the neuron weight coefficients, and y is the
output signal of the neuron. Each weight coefficient corresponds to one input (synapse) of the
neuron. A set of neurons processing the same input vector x forms a layer of neurons. The
layer operation is described by the formula
Y = f(W, x),     (9)

where W is the matrix whose rows are the weight vectors of the layer's neurons, and Y is the
vector of the layer output signals.
The bulk of the computations in neural networks [23, 24] consists of semigroup operations
on the rows of the weight matrix W of the neuron layer and the vector x whose
elements correspond to the pixels of the processed image. This leads to a large volume of
computations and the need to use highly parallel computer systems [21]. The maximum
achievable degree of parallelism is equal to the number of image pixels.
The method of organizing interprocess exchanges in the parallel implementation of
operation (9) is determined by the distribution of the coefficients of the weight matrix W over the
processes. At present, there are many methods of mapping neural networks onto parallel
computer systems [27], but these methods do not use the mapping of semigroup operations onto
the hypercube with subsequent embedding into the torus, which requires less computation time
than the other methods. This mapping is used here.
Let an image (or signal) x contain N = 2^q pixels (features). These pixels can be
mapped onto the vertices of a hypercube of dimension q. The number N of hypercube
vertices can be reduced by a factor of 2^r, r < q, by repeatedly merging vertices along r of the
coordinates i ∈ {1, 2, …, q}.
Consider a layer of neurons in which operation (9) takes the form

Y = f(W·x),     (10)

where W·x is the product of the weight matrix W of the layer and the vector x composed of the
image pixels, the fragments of which are evenly distributed among the computers of the
mesh (see Figure 3), and f is the neuron activation function. Formula (10) describes the
computations in the layers of multilayer backpropagation networks and in Hopfield networks.
Consider two ways of embedding layers of neurons onto the structure of a distributed CS:

placement of the rows of the weight matrix W onto the computers of the CS (parallelism of neurons), and
placement of the columns of the weight matrix W onto the computers of the CS (parallelism of synapses),

as illustrated by the sketch below.
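The two placements can be pictured with the following Python/NumPy sketch (illustrative only, with function names of our choosing): splitting W by rows, each computer produces the outputs of its own neurons and needs the whole vector x; splitting W by columns, each computer produces partial sums for all neurons, which must then be combined.

    import numpy as np

    def layer_by_rows(W_blocks, x, f):
        """Parallelism of neurons: computer s holds a block of rows of W and,
        after an all-to-all exchange of x, computes its own outputs."""
        return f(np.concatenate([Ws @ x for Ws in W_blocks]))

    def layer_by_columns(W_col_blocks, x_blocks, f):
        """Parallelism of synapses: computer s holds a block of columns of W and
        its fragment of x; the partial sums are combined (here: summed) afterwards."""
        partial = [Ws @ xs for Ws, xs in zip(W_col_blocks, x_blocks)]
        return f(sum(partial))

    rng = np.random.default_rng(0)
    m, N, n = 4, 8, 2                           # neurons, pixels, computers
    W, x = rng.normal(size=(m, N)), rng.normal(size=N)
    f = np.tanh
    by_rows = layer_by_rows(np.split(W, n, axis=0), x, f)
    by_cols = layer_by_columns(np.split(W, n, axis=1), np.split(x, n), f)
    print(np.allclose(by_rows, by_cols), np.allclose(by_rows, f(W @ x)))   # True True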

5.1. Placement of the Weight Matrix Rows to Hypercube Nodes

Consider the organization of intercomputer exchanges when the rows of the weight matrix W
are distributed over the hypercube nodes. Since each row of the weight matrix corresponds
to one network neuron, the distribution of rows describes the placement of the neurons. To
perform the calculations for all neurons according to formula (10), it is necessary to collect all
the pixels of image x at each vertex, i.e., to perform an all-to-all exchange of the components of
vector x. The subsequent multiplication of the weight matrix rows by this vector can then be
performed in all processes in parallel (the number of concurrent multiplications of vector pairs
equals the number of hypercube vertices).
An all-to-all data exchange in the q-dimensional hypercube reduces to the sequential
implementation of bidirectional point-to-point exchanges across the dimensions i = 1, 2, …, q of
the hypercube, which corresponds to sequential execution of the butterfly stages. At each stage
all the bidirectional exchanges are performed simultaneously. After each stage the amount of
data doubles in each process. Upon completion of q stages each process contains all the
data of vector x. Figure 6 shows an example of organizing the intercomputer exchanges on a
hypercube for q = 3; the numbers in brackets are the exchange step numbers.
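A serial Python sketch of this all-to-all collection (illustrative, not MPI code): at step i every node exchanges its accumulated fragment with its neighbor across one hypercube dimension, so the amount of local data doubles at each step.

    def hypercube_allgather(fragments):
        """All-to-all collection of the x fragments on a hypercube of n = 2**q nodes.
        fragments[v] is the pixel block initially stored in node v; after q steps
        every node holds all fragments (kept here as a dict indexed by origin)."""
        n = len(fragments)
        assert n & (n - 1) == 0
        data = [{v: fragments[v]} for v in range(n)]
        dim = 1
        while dim < n:
            # simultaneous bidirectional exchange across one hypercube dimension
            data = [{**data[v], **data[v ^ dim]} for v in range(n)]
            dim <<= 1
        return data

    frags = [[10 * v, 10 * v + 1] for v in range(8)]    # 2 pixels per node, q = 3
    gathered = hypercube_allgather(frags)
    print(all(len(d) == 8 for d in gathered))           # True: each node has all fragments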
Suppose that N is the number of image pixels, m is the number of neurons in the layer, n = 2^q is
the number of processes in the parallel program (CPUs in the system), t_o is the execution time
of one arithmetic operation, and t_w is the transmission time of one data item. Then the time of
the exchanges in the hypercube is

T_ex = (N/n)·t_w·Σ_{i=0}^{q−1} 2^i = N·(1 − 1/n)·t_w.

So, we have

Theorem 3. The time T_ex of the all-to-all data exchange in the hypercube is equal to

T_ex = N·(1 − 1/n)·t_w.     (11)

Since the number of pixels in the image is N ≫ 1, the time of the sequential
implementation of the multiplication W·x in the layer is

T_seq = 2mN·t_o.

Let the number of neurons m and the number of pixels N in the image be multiples of the
number of processes n of the parallel program. Consider the case where the layer neurons are
evenly distributed over the processes. Under the assumption that the parallel program
(computer system) has a hypercube topology with n = 2^q processes (processors), we obtain the
time of the parallel implementation of the computations in the layer (since N ≫ 1, the
time of computing the activation function is neglected):

T_r = T_seq/n + T_ex = (2mN/n)·t_o + N·(1 − 1/n)·t_w.

From here we have the speedup coefficient

S_r = T_seq/T_r = 2mN·t_o / ((2mN/n)·t_o + N·(1 − 1/n)·t_w) = n / (1 + (n − 1)·t_w/(2m·t_o)).

Thus, we have proved the following theorem.

Theorem 4. When the rows of matrix W are allocated to the computers (parallelism of neurons),
the speedup coefficient does not depend on the number of pixels in the image and is equal to

S_r = n / (1 + (n − 1)·t_w/(2m·t_o)).     (12)

5.2. Placement of the Weight Matrix Columns to Hypercube Nodes

When the columns of the weight matrix W are placed on the processes, the parallel
computation of the product W·x can be organized as follows:
a) The coefficients of matrix W are multiplied by the corresponding components of vector x in
parallel and, for each neuron, the resulting products are summed. When N = 2^d is a multiple of n,
the partial sums are computed in parallel in all processes.
b) To calculate the total sum for each neuron, it is necessary to perform interprocess data
exchanges using a binary tree of interprocess communications embedded in the graph of the
computer system. The number of computed sums is equal to the number m of neurons, which
may be arbitrary and, in particular, a multiple of the number of processes (a serial sketch of
steps a) and b) is given below).
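The sketch below (illustrative Python/NumPy, with hypothetical function names; it assumes n is a power of two and divides N) simulates steps a) and b): each of the n computers multiplies its column block of W by its fragment of x, and the partial-sum vectors are then combined pairwise along a binary tree in log_2 n combining steps.

    import numpy as np

    def column_parallel_product(W, x, n):
        """W @ x with the columns of W distributed over n computers:
        a) local partial sums, b) pairwise (binary-tree) combination."""
        W_blocks = np.split(W, n, axis=1)
        x_blocks = np.split(x, n)
        partial = [Wb @ xb for Wb, xb in zip(W_blocks, x_blocks)]   # step a)
        while len(partial) > 1:                                     # step b)
            partial = [partial[i] + partial[i + 1]
                       for i in range(0, len(partial), 2)]
        return partial[0]

    rng = np.random.default_rng(1)
    m, N, n = 4, 16, 4
    W, x = rng.normal(size=(m, N)), rng.normal(size=N)
    print(np.allclose(column_parallel_product(W, x, n), W @ x))     # True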
The time of multiplying the corresponding components of the weight vectors and the
image vector is equal to

T_mult = (mN/n)·t_o.

The time of summing the resulting products is

T_add = m·(N/n − 1)·t_o.

Then the computation time of all partial sums in the computers working in parallel is
equal to

T_ps = T_mult + T_add = (mN/n)·t_o + m·(N/n − 1)·t_o = m·(2N/n − 1)·t_o.     (13)
Next, the full sums are calculated for each of the m neurons on the hypercube of n
processes in log_2 n steps. At each of these steps, a summing operation can be performed for
no more than two neurons. Since, during the transition from step to step, the number of terms
is halved, and the minimum possible number of sums computed at each step is 1, the
time of calculating all the complete sums for m neurons equals

T_cs = Σ_{i=1}^{log_2 n} (t_w + t_o)·max(1, m/2^i).     (14)

The full time of the parallel implementation is

T_c = T_ps + T_cs = m·(2N/n − 1)·t_o + Σ_{i=1}^{log_2 n} (t_w + t_o)·max(1, m/2^i).     (15)

For m a multiple of n, from (14) we have

T_cs = m·(t_w + t_o)·Σ_{i=1}^{log_2 n} 1/2^i = m·(t_w + t_o)·(n − 1)/n.     (16)

From formulas (13)-(16) it follows that

T_c = T_ps + T_cs = m·(2N/n − 1)·t_o + m·(t_w + t_o)·(n − 1)/n
    = (m/n)·((2N − 1)·t_o + (n − 1)·t_w).

The speedup is

S_c = T_seq/T_c = 2mN·t_o / ((m/n)·((2N − 1)·t_o + (n − 1)·t_w))
    = n / ((2N − 1)/(2N) + (n − 1)·t_w/(2N·t_o)).     (17)
For N ≫ 1, from (17) we have

Theorem 5. When the columns of matrix W are allocated to the computers (parallelism of
synapses), the speedup S_c does not depend on the number m of neurons in the layer and is equal to

S_c = n / (1 + (n − 1)·t_w/(2N·t_o)).     (18)

From (12) and (18) it follows

Theorem 6. If m > N, then S_r > S_c; otherwise S_r ≤ S_c. That is, if the number of neurons m is
greater than the number N of synapses of a neuron (image pixels), then distributing the rows of
matrix W over the computers (parallelism of neurons) is more efficient than distributing its
columns (parallelism of synapses), and vice versa.
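The comparison in Theorem 6 can be seen numerically from (12) and (18). A Python sketch (the timing constants are arbitrary sample values; only the ratio t_w/t_o matters):

    def s_rows(n, m, tw, to):
        """Speedup (12): rows of W on the computers (parallelism of neurons)."""
        return n / (1 + (n - 1) * tw / (2 * m * to))

    def s_cols(n, N, tw, to):
        """Speedup (18): columns of W on the computers (parallelism of synapses)."""
        return n / (1 + (n - 1) * tw / (2 * N * to))

    n, N = 64, 4096
    tw, to = 10.0, 1.0
    for m in (1024, 4096, 16384):
        print(m, round(s_rows(n, m, tw, to), 1), round(s_cols(n, N, tw, to), 1))
    # m < N favours the column placement, m > N the row placement, m = N gives equality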

5.3. Mapping the Hopfield Network

In the Hopfield neural network the weight matrix W is square, i.e., the number of neurons
m is equal to the number of synapses N (the number of pixels in the image). From this, (12)
and (18) we obtain

S_r = S_c = n / (1 + (n − 1)·t_w/(2N·t_o)),

i.e., the following theorem is correct.

Theorem 7. For all values of the image and computer system parameters, the efficiency of
neuron parallelization is equal to the efficiency of synapse parallelization when mapping the
Hopfield network onto the hypercube.

5.4. Mapping the Kohonen Network

In analyzing the mapping of the Kohonen network onto a hypercubic DCS, we should
note that, instead of the scalar product of the neuron weight vector w^i and the input vector x
of the signal (image), a measure of proximity ("distance") between these vectors is
calculated. For the distance we may use the value

d(w^i, x) = Σ_{j=1}^{N} (w_j^i − x_j)^2.     (19)

For N ≫ 1, the time of the sequential computation of all distances for m neurons is equal to

T_seq = 3mN·t_o.     (20)


An analysis similar to the above shows that, for the parallel computation of the distances, we have

S_r = n / (1 + (n − 1)·t_w/(3m·t_o))     (21)

and

S_c = n / (1 + (n − 1)·t_w/(3N·t_o)).     (22)

From (21) and (22) we obtain Theorem 7 for the Kohonen network.
The time of the sequential choice of the value d_min = min_{i=1,…,m} d(w^i, x) is equal to

T_min = t_o·(m − 1).     (23)

 
The parallel choise of minimum distanse d min among distances d wi , x is realized in
two steps:

s
1. In the first step, in each s -th computer the minimum dmin , s  1,2,..., n is searched.
2. In the second step of the search of dmin  min dmin
0 1
, dmin ,..., dmin
n 1
, using a 
hypercube, the butterfly is used, where the addition operation is replaced by an
operation of selecting the minimum of the two values.

As a result, each computer will have the value d min .
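A serial Python sketch of this two-step minimum search (illustrative only): local minima first, then the butterfly over the hypercube with min in place of addition.

    def parallel_min(distance_blocks):
        """Two-step search for d_min on a hypercube of n = 2**q computers.
        distance_blocks[s] holds the distances d(w_i, x) stored in computer s."""
        n = len(distance_blocks)
        assert n & (n - 1) == 0
        local = [min(block) for block in distance_blocks]       # step 1: local minima
        s = 1
        while s < n:                                            # step 2: butterfly with min
            local = [min(local[v], local[v ^ s]) for v in range(n)]
            s <<= 1
        return local                                            # every computer holds d_min

    blocks = [[5.0, 2.5], [7.1, 3.3], [0.9, 6.6], [4.4, 8.8]]
    print(parallel_min(blocks))                                 # [0.9, 0.9, 0.9, 0.9]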


The time of the parallel search for d_min is equal to

T_par min = t_o·(m/n − 1) + (t_w + t_o)·log_2 n.     (24)

From (23) and (24) we get the search speedup

S_par min = T_min/T_par min = n·(m − 1) / (m − n + ((t_w + t_o)/t_o)·n·log_2 n).     (25)

For m ≫ n, from (25) we get

S_par min ≈ n / (1 + ((t_w + t_o)/t_o)·(n·log_2 n)/m),     (26)

and, for m ≫ n·log_2 n and (t_w + t_o)/t_o = O(1), from (26) we have

S_par min ≈ n.     (27)

Formula (27) shows that the search for the value d_min can be parallelized effectively when the
number m of neurons is large relative to the number of computers n.

6. Mapping Neural Networks with Global Connections onto Toroidal Computer Systems

In modern DCSs the multidimensional torus [15, 16] is most commonly used as the graph of
intercomputer connections.

6.1. Placement of the Weight Matrix Rows to Torus Nodes

An all-to-all exchange in the k-dimensional torus is reduced to the implementation of all-to-all
exchanges in the torus rings, i.e., in the structures described by cyclic subgroups. In each
ring the exchanges are performed as shown in Figure 5.
Every computer M_j, j = 1, …, n_i, i = 1, …, k, transmits its pixel array to the computer
M_{(j+1) mod n_i} and receives an array from the computer M_{(j−1) mod n_i}. It
is assumed that the links of the ring operate simultaneously.
These steps continue until each computer of the ring has received all the pixels distributed
over its computers. The exchanges are performed in parallel for all rings of dimension i
and successively across the dimensions i = 1, 2, …, k. For a two-dimensional torus (Figure 2), the
exchanges can, for example, be executed in parallel in all the horizontal rings, and then in parallel
in all the vertical rings.
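This ring exchange can be sketched in Python as a serial simulation (illustrative only, with function names of our choosing): within one ring every computer repeatedly passes on the block it received at the previous step, and the procedure is applied dimension by dimension, here for a 2D torus.

    def ring_allgather(blocks):
        """All-to-all collection in one ring: at every step each node passes on
        the block received at the previous step to its right-hand neighbour."""
        n = len(blocks)
        collected = [dict(b) for b in blocks]       # blocks[j]: {origin: data} of node j
        travelling = [dict(b) for b in blocks]
        for _ in range(n - 1):
            travelling = [travelling[(j - 1) % n] for j in range(n)]   # cyclic shift
            collected = [{**collected[j], **travelling[j]} for j in range(n)]
        return collected

    def torus2d_allgather(grid):
        """Dimension-by-dimension all-to-all on a 2D torus: rows first, then columns."""
        grid = [ring_allgather(row) for row in grid]                    # all rows in parallel
        cols = [ring_allgather([grid[i][j] for i in range(len(grid))])  # then all columns
                for j in range(len(grid[0]))]
        return [[cols[j][i] for j in range(len(cols))] for i in range(len(grid))]

    grid = [[{(i, j): f"pixels({i},{j})"} for j in range(4)] for i in range(3)]
    result = torus2d_allgather(grid)
    print(all(len(result[i][j]) == 12 for i in range(3) for j in range(4)))   # True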
Upon completion of the l-th step of the exchange, l ∈ {1, 2, …, k}, each computer contains
(N/n)·Π_{i=1}^{l} n_i data elements and, after k steps, it contains N elements, because
Π_{i=1}^{k} n_i = n. The time of performing l exchange steps is

T_e(l) = (N/n)·(n_1 − 1 + Σ_{i=2}^{l} (n_i − 1)·Π_{j=1}^{i−1} n_j)·t_w.     (28)
For l = k, transforming formula (28), we obtain:

Theorem 8. When the rows of the weight matrix W are distributed over the torus computers, the
time T_ex = T_e(k) of the all-to-all data exchange does not depend on the torus dimension k and is equal to

T_ex = N·(1 − 1/n)·t_w.     (29)

From (11) and (29) it follows that the times of all-to-all communication on the hypercube
and torus are the same. Given the fact that, after the exchange, all the calculations are
performed in parallel, we obtain for the torus the following:

Theorem 9. When the rows of matrix W are allocated to the torus computers (parallelism of
neurons), the speedup does not depend on the number of pixels in the image and is equal to
the speedup (12) for the hypercube.

6.2. Placement of the Weight Matrix Columns to Torus Nodes

The hypercube, which results from merging the butterfly processes, can be embedded into a torus
(the XOR-embedding) [12]. When the columns of matrix W are placed on the computers, the computation
time of all partial sums in the computers working in parallel is given by (13). Next, the complete
sums are calculated in log_2 n steps for each of the m neurons on the hypercube with n nodes
(computers) embedded into the torus. At each of these steps the summing operations can be
performed for no more than two neurons. Similarly to calculations (14)-(18), taking into account the
dilation D of the hypercube edges on the torus, for N ≫ 1 we get

Theorem 10. In allocating the matrix W columns (parallelism of synapses) to the torus
computers, speedup Sc does not depend on the number of m neurons in a layer and is equal to

1
Sc  n . (30)
1
 n  1 Dtw
2 Nto

From (18) and (30) we have the following:

Theorem 11. If m > N/D then S_r > S_c, else S_r <= S_c; i.e., if the neurons number m is greater than the ratio of the neuron synapses number N (image pixels) to the average dilation D of the hypercube edges on the torus, then the distribution of the weight matrix W rows (parallelism of neurons) to computers is more efficient than its columns distribution (parallelism of synapses), and vice versa.

Table 2. Speedup coefficient S_r with respect to the neurons number m

m     1024   2048   4096   8192   16384   32768   65536
S_r    171    293    455    630     780     885     949

To obtain the numerical values of the speedup we use the parameters of the Cray T3E computer [25]: CPU performance (1200 Mflops) and communication bandwidth (480 Mb/sec). Assume that the data element size is 4 bytes. Then we have t_o = 1/(1.2*10^9) ≈ 0.83*10^-9 seconds and t_w = 4/(480*10^6) ≈ 8.3*10^-9 seconds. Considering n = 1024, from the above formulas we obtain the speedup coefficient S_c = 753 and the values of S_r shown in Table 2. From Table 2 it follows that, for a large number of neurons in the layer (m = 16384, 32768, 65536), it is profitable to obtain parallelization through neurons and, when m <= 8192, it is appropriate to realize the parallelization by synapses.
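The values of Table 2 can be reproduced by a short calculation. The sketch below plugs the Cray T3E parameters into the row-parallelism speedup written in the form S_r = n / (1 + (n - 1) t_w / (2 m t_o)), which is assumed here from (18); it illustrates the arithmetic and is not code from the chapter.

/* Sketch: reproducing the S_r column of Table 2 from the Cray T3E parameters. */
#include <stdio.h>

int main(void) {
    const double t_o = 0.83e-9;   /* time of one arithmetic operation, s */
    const double t_w = 8.3e-9;    /* time to transfer one data element, s */
    const int    n   = 1024;      /* number of computers */
    const int    m_values[] = {1024, 2048, 4096, 8192, 16384, 32768, 65536};

    for (int i = 0; i < 7; ++i) {
        int m = m_values[i];
        double s_r = n / (1.0 + (n - 1) * t_w / (2.0 * m * t_o));
        printf("m = %6d  S_r = %.0f\n", m, s_r);
    }
    return 0;
}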

6.3. Mapping Hopfield Network and Kohonen Network onto the Torus

For the Hopfield network with m = N, from (18) and (30) we obtain the following:

Theorem 12. In mapping the Hopfield network onto the torus, for D > 1 and any parameter values of the image and computer system, the neurons parallelization is more efficient than the synapses parallelization, i.e.,

S_r > S_c .

It is easy to see that, for mapping the Kohonen network onto the torus, when placing rows of the weight matrix on computers, formula (21) is correct for the speedup S_r, while for placing the columns the speedup is

S_c = n / ( 1 + (n - 1) D t_w / (3 N t_o) ) .   (31)

Comparing (21) and (31), we see that, for the Kohonen network, Theorem 12 is correct.
In higher-order Hopfield networks [26], the neuron data inputs are products of pixel intensities, the choice of which for multiplication is determined by the configuration of the objects to be recognized and, therefore, can be arbitrary. In this respect, efficient parallelization in such networks can be achieved by neurons parallelism, with the all-to-all exchange carried out prior to computing, so that each process obtains the whole image. At the same time, synapses parallelism cannot be used, because the multiplication of pixel values distributed over different processes involves a large number of interprocess communications.

6.4. Mapping Multilayer Neural Networks

Consider a two-layer sigmoid neural network with m neurons in the hidden layer and k neurons in the output layer. When processing data arrays (images, for example) with N elements, the following relations usually hold:

m <= N, k <= N .

Let the neural network be mapped onto a DCS with the structure of a hypercube or torus with the number of computers n <= m <= N. According to Theorem 6, for m < N the weight matrix of the hidden layer should be placed on processors by columns (parallelism of synapses).
For the output neural network layer, we obtain the parallelization coefficients:

a) for the hypercube

S_r = n / ( 1 + (n - 1) t_w / (2 k t_o) ),   S_c = n / ( 1 + (n - 1) t_w / (2 m t_o) );

b) for the torus

S_r = n / ( 1 + (n - 1) D t_w / (2 k t_o) ),   S_c = n / ( 1 + (n - 1) D t_w / (2 m t_o) ).

Hence, when k > m, we see S_r > S_c; otherwise S_c >= S_r. That is, for image (or signal) compression problems (k > m) in the output layer we should use neurons parallelism, and for classification (recognition) tasks (m > k) we should use synapses parallelism.
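A small numerical sketch of this rule is given below; the Cray T3E times, the dilation D and the layer sizes k and m are assumed values chosen only to illustrate the comparison on the torus.

/* Sketch: comparing S_r and S_c for the output layer on a torus (illustrative values). */
#include <stdio.h>

int main(void) {
    double t_o = 0.83e-9, t_w = 8.3e-9;   /* operation and transfer times (assumed, as above) */
    double D = 2.0;                        /* dilation of hypercube edges on the torus (assumed) */
    int n = 1024;                          /* computers */
    int k = 16, m = 4096;                  /* output and hidden layer sizes: recognition case (assumed) */

    double s_r = n / (1.0 + (n - 1) * D * t_w / (2.0 * k * t_o));
    double s_c = n / (1.0 + (n - 1) * D * t_w / (2.0 * m * t_o));
    printf("S_r = %.1f, S_c = %.1f -> use %s parallelism\n",
           s_r, s_c, (k > m) ? "neurons" : "synapses");
    return 0;
}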

7. Training Neural Networks on Distributed Computer Systems


7.1. Training Hopfield Networks

7.1.1. Training Hopfield Network According to Hebb Rule

According to Hebb rule [1], the matrix of the Hopfield network weight coefficients is
evaluated as

W = sum_{p=1}^{n} X^p (X^p)^T ,   (32)

where X^p is a reference vector, p = 1,...,n, and n is the number of reference vectors.

The components of vector X^p are distributed to m computers (processors) of the computer system. According to (32), each vector X^p needs the pairwise multiplication of all its components. For parallel execution of this multiplication it is necessary to:

1. Perform an all-to-all exchange of the vector X^p components among the computers to obtain the vector X^p entirely in each processor.
2. Multiply the corresponding share of N/m components of the vector X^p by all components of the same vector in each processor.

As a result, we have N/m rows of matrix W in each computer.
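A minimal single-process sketch of this procedure is given below (the vector length, the number of computers and the number of reference vectors are assumed values); the all-to-all exchange is modelled simply by letting every simulated computer read the whole vector.

/* Sketch: Hebb-rule accumulation (32) with the vector components block-distributed
   over M computers; each computer produces its N/M rows of W. */
#include <stdio.h>

#define N  8              /* vector length (assumed) */
#define M  4              /* number of computers, N % M == 0 (assumed) */
#define P  2              /* number of reference vectors (assumed) */

int main(void) {
    double X[P][N] = {
        { 1, -1,  1, -1,  1, -1,  1, -1 },
        { 1,  1, -1, -1,  1,  1, -1, -1 }
    };
    double W[N][N] = {{0}};          /* row r of W lives on computer r / (N/M) */
    int rows_per_pc = N / M;

    for (int p = 0; p < P; ++p) {
        /* step 1: all-to-all exchange -> every computer now sees X[p] entirely
           (in this simulation the full vector is simply shared memory) */
        for (int pc = 0; pc < M; ++pc) {
            /* step 2: computer pc updates its own band of rows */
            for (int r = pc * rows_per_pc; r < (pc + 1) * rows_per_pc; ++r)
                for (int c = 0; c < N; ++c)
                    W[r][c] += X[p][r] * X[p][c];
        }
    }
    printf("W[0][0] = %g (each computer holds %d rows of W)\n", W[0][0], rows_per_pc);
    return 0;
}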

7.1.2. Training Hopfield Network According to Projection Method

This training is based on the formulas [2]:

1. W^0 = 0.

2. Y^p = (W^(p-1) - E) X^p,
   W^p = W^(p-1) - (Y^p Y^(pT)) / (Y^(pT) Y^p),   p = 1,...,n,

where E is the identity matrix.

For each vector X^p, p = 1,...,n, we need to perform the following steps:

1) an all-to-all exchange of the vector X^p components among the computers to obtain the vector X^p entirely in each processor;
2) parallel multiplication of the matrix (W^(p-1) - E) bands by vector X^p to obtain vector Y^p fragments;
3) an all-to-all exchange of vector Y^p components among the computers to obtain the vector Y^p entirely in each computer;
4) inner product Y^(pT) Y^p computation: parallel multiplication of vector Y^p components, using the doubling scheme with mapping it onto the hypercube to calculate the sum of the products (a sketch of this doubling scheme follows the list);
5) outer product Y^p Y^(pT) computation: parallel multiplication of column vector Y^p fragments by row vector Y^(pT);
6) parallel computation of matrix W^p bands.
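A sketch of the doubling scheme used in step 4 is given below (a single-process simulation with assumed sizes and data): each computer forms a partial sum over its fragment of Y^p, and the partial sums are then added pairwise along hypercube edges in log_2 n steps.

/* Sketch: inner product Y^T Y via local partial sums plus a doubling scheme on a hypercube. */
#include <stdio.h>

#define Q 2
#define NODES (1 << Q)     /* n computers (assumed) */
#define FRAG 3             /* Y components per computer (assumed) */

int main(void) {
    double Y[NODES][FRAG] = {
        {1, 2, 3}, {4, 5, 6}, {7, 8, 9}, {10, 11, 12}
    };
    double s[NODES];

    /* local partial sums, all computers in parallel */
    for (int node = 0; node < NODES; ++node) {
        s[node] = 0.0;
        for (int i = 0; i < FRAG; ++i)
            s[node] += Y[node][i] * Y[node][i];
    }

    /* doubling scheme: log2(n) pairwise additions along hypercube edges */
    for (int step = 0; step < Q; ++step) {
        int mask = 1 << step;
        for (int node = 0; node < NODES; ++node) {
            int partner = node ^ mask;
            if (node < partner) {
                double sum = s[node] + s[partner];
                s[node] = s[partner] = sum;
            }
        }
    }
    printf("Y^T Y on every computer: %g\n", s[0]);   /* = 1^2 + ... + 12^2 = 650 */
    return 0;
}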

7.2. Training Two-Layer Sigmoidal Neural Network on a Hypercube

Consider the backpropagation algorithm [1], which calculates the weight corrections for the output and hidden layers. Let a two-layer sigmoidal neural network process a vector x = (x_1, x_2, ..., x_N). The outputs of the neurons in the hidden layer are calculated by the formula

u_j = f(a_j^(1)),   a_j^(1) = sum_{i=0}^{N} w_ji^(1) x_i,   j = 1,...,n,

where n is the number of neurons in the hidden layer, a_j^(1) is the activation of the j-th neuron of the hidden layer, w_ji^(1), i = 0,1,...,N, j = 1,...,n, are this neuron's weights, and f is the neuron activation function. The network outputs (the outputs of the neurons in the output layer) are

y_k = f(a_k^(2)),   a_k^(2) = sum_{j=0}^{n} w_kj^(2) u_j,   k = 1,...,m,

where m is the number of neurons in the output layer, a_k^(2) is the activation of the k-th neuron of the output layer, and w_kj^(2), k = 1,...,m, j = 1,...,n, are this neuron's weights.

7.2.1. Training Neurons in the Output Layer Is Carried Out by the Formulas

dw_kj^(2) = delta_k u_j,   delta_k = (y_k - d_k) * df(a_k^(2))/da_k^(2),   (33)

w_kj^(2)(t+1) = w_kj^(2)(t) - dw_kj^(2)(t),   (34)

where d_k is the desired output of the k-th neuron of the output layer and t is the number of the current training iteration.
Values delta_k, k = 1,...,m, and u_j, j = 1,...,n, are distributed over computers in the same way as the neurons of the output and hidden layers, respectively. Formula (33) implies the need for computing the products delta_k u_j, k = 1,...,m, j = 1,...,n. Therefore, before performing calculations using formulas (33)-(34), one must perform an all-to-all intercomputer exchange of the quantities delta_k, k = 1,...,m, or of the values u_j, j = 1,...,n, depending on the method of distributing the matrix w^(2)(t) elements over computers.
If the matrix w^(2)(t) rows are distributed over computers (neurons parallelism), it is necessary to make an all-to-all exchange of the signal values u_j, j = 1,...,n, and then perform parallel computations using formulas (33)-(34). In each computer the signals u_j, j = 1,...,n, are multiplied by the elements delta_k, k = 1,...,m, that are located on this computer.
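The following sketch illustrates this neurons-parallelism case with assumed sizes and values: the rows of w^(2) and the corresponding delta_k stay on their computers, while the signals u_j are already available everywhere after the all-to-all exchange. The sign convention follows the reconstruction of (33)-(34) above.

/* Sketch: one training step (33)-(34) for the output layer with neurons parallelism. */
#include <stdio.h>

#define M 4     /* output neurons (assumed) */
#define N 3     /* hidden neurons (assumed) */

int main(void) {
    double w2[M][N]  = {{0.1,0.2,0.3},{0.4,0.5,0.6},{0.7,0.8,0.9},{1.0,1.1,1.2}};
    double delta[M]  = {0.05, -0.02, 0.01, 0.00};  /* delta_k, local to each row's computer */
    double u[N]      = {0.3, 0.7, 0.9};            /* u_j, available everywhere after the exchange */

    for (int k = 0; k < M; ++k)          /* each row is processed by its own computer in parallel */
        for (int j = 0; j < N; ++j)
            w2[k][j] -= delta[k] * u[j]; /* w(t+1) = w(t) - delta_k * u_j, cf. (33)-(34) */

    printf("w2[0][0] = %g\n", w2[0][0]);
    return 0;
}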

If the matrix w^(2)(t) columns are distributed over computers (synapses parallelism), there must first be an all-to-all exchange of the values delta_k, k = 1,...,m, and then parallel computations by formulas (33)-(34) should be performed. So, to complete the step of training the neural network output layer, it is necessary to carry out:

- an all-to-all intercomputer data exchange;
- parallel computations of the products delta_k u_j, k = 1,...,m, j = 1,...,n.

7.2.2. Training the Hidden Layer Is Carried Out by the Equations

dw_ji^(1) = delta_j x_i,   delta_j = ( df(a_j^(1))/da_j^(1) ) sum_{k=1}^{m} delta_k w_kj^(2),

w_ji^(1)(t+1) = w_ji^(1)(t) - dw_ji^(1)(t).

The basic operation here is to calculate the quantities delta_j, j = 1,...,n. The method of computing delta_j in parallel depends on the distribution of the matrix w^(2) elements over computers.
If the matrix w^(2)(t) rows are distributed over computers (neurons parallelism of the output layer), then it is necessary:

1) to multiply, in every computer, the elements delta_k disposed therein by their respective rows w_k^(2), k in {1,...,m} (parallel operation of all computers);
2) to sum the products delta_k w_kj^(2) for all values j = 1,...,n using a doubling scheme;
3) to multiply the obtained sums by the corresponding derivatives df(a_j^(1))/da_j^(1), j = 1,...,n (parallel operation of all computers).

If the matrix w^(2)(t) columns are distributed over computers (synapses parallelism of the output layer neurons), then it is necessary:

- to carry out an all-to-all intercomputer exchange of the values delta_k, k = 1,...,m;
- to calculate the values delta_j, j = 1,...,n, in parallel on all the computers of the system.

The method of calculating the values dw_ji^(1) = delta_j x_i is similar to calculating the values dw_kj^(2) = delta_k u_j. It is determined by the type of distribution over computers of the weight matrix w^(1) elements of the hidden layer.

Conclusion
The methods for efficient mapping data processing neural networks onto robust
distributed computer systems (DCS) are proposed. The cellular neural networks are mapped
onto the graphs of parallel programs with structures "mesh" and "line". The efficiency of the
proposed methods for neural networks with global connections (Hopfield network, Kohonen
network and multilayer perceptron) is based on a butterfly scheme and doubling scheme, and
mapping these schemes onto a hypercube with a subsequent hypercube embedding onto the
torus. These networks are mapped onto regular graphs of parallel programs ("line", "ring",
"mesh", "hypercube", "torus"), intended for their implementation on DCS.
The method for mapping a neuron layer (for multilayer feedforward networks, the Hopfield network or the Kohonen network) depends on the ratio of the number of neurons in the layer to the number of neuron weight coefficients (the number of image pixels): if the number of neurons is relatively small, distributing the layer weight matrix columns (synapses parallelism) is more efficient; otherwise the distribution of the weight matrix rows (neurons parallelism) is more efficient. In particular, for mapping the Hopfield network onto the torus, the rows distribution gives the best result. On the hypercube, for the Hopfield network, both weight distribution methods give the same result.
The proposed mapping techniques provide a uniform distribution of the results of image transformation by the neuron layer over the processes of the parallel program with a toroidal structure. Hence, the mapping of the weight matrix of the second neuron layer can be realized similarly to the first layer mapping. The mapping method (by rows or columns) is also determined by the ratio of the number of neurons of the second layer to the number of its input signals.
So, we found that the neural network layers mapping onto a distributed computing
system, during the network operation and during the training phase, is reduced to the
following types of distributed computing schemes:

1) parallel (independent) computations in elementary computers of the system;


2) an all-to-all intercomputer data exchange;
3) sums computation by a doubling scheme with the mapping of this scheme onto a
hypercube with a subsequent embedding of the hypercube into the torus
(implementation of semigroup array operations on the hypercube and torus).

The proposed methods lead to the generation of parallel programs with the following regular structures: line, mesh, binary tree, hypercube and toroidal structures of different dimensions (in particular, the ring).

References
[1] Haykin, S., (1999). Neural Networks. A Comprehensive Foundation, Prentice Hall Inc.
[2] Michel, A., Farrel, J, (1990). Associative memories via artificial neural networks, IEEE
Control System Magazine. 10, 6-16.
[3] Sundararajan, N., Saratchandran, P., (1988). Parallel Architectures for Artificial Neural
Networks. Paradigms and Implementations. IEEE Computer Society.

[4] Ayoubi, R.A., Bayoumi, M.A., (2003). Efficient Mapping Algorithm of Multilayer
Neural Network on Torus Architecture. IEEE Trans. on Parallel and Distributed
Systems. 14, 932-943.
[5] Lin, W.-M., Prasanna, V.K., Prjitula, K.W., (1991). Algorithmic Mapping of Neural
Network Models onto Parallel SIMD Machines. IEEE Trans. on Computers. 40,
1390-1401.
[6] Mahapatra, R.N., Mahapatra, S., (1996). Mapping of neural network models onto two-
dimensional processor arrays. Parallel Computing. 22, 1345-1357.
[7] Mahapatra, S., Mahapatra, R.N., Chatterji, B.N., (1997). A parallel formulation of back-
propagation learning on distributed memory multiprocessors. Parallel Computing. 22,
1661-1675.
[8] Fujimoto, Y., Fukuda, N., Akabane, T., (1992). Massively Parallel Architectures for
Large Scale Neural Network Simulations. IEEE Trans. on Neural Networks. 3,
876-888.
[9] Tarkov, M.S., Mun, Y., Choi, J., Choi, H.I., (2002). Mapping Adaptive Fuzzy Kohonen Clustering Network onto Distributed Image Processing System. Parallel Computing. 28, 1239–1256.
[10] Wolf, W., Jerraya, A.A., Martin, G., (2008). Multiprocessor System-on-Chip (MPSoC) Technology. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems. 27, 1701-1713.
[11] Pham, P.-H., Jelaca, D., Farabet, C., Martini, B., LeCun, Y., Culurciello, E., (2012). NeuFlow: Dataflow Vision Processing System-on-a-Chip. IEEE 55th International Midwest Symposium on Circuits and Systems (MWSCAS). 1044–1047.
[12] Gonzalez, A., Valero-Garcia, M., Diaz de Cerio, L., (1995). Executing Algorithms with Hypercube Topology on Torus Multicomputers. IEEE Trans. on Parallel and Distributed Systems. 6, 803–814.
[13] Lakshmivarahan, S., Dhall, S.K., (1999). Ring, torus and hypercube
architectures/algorithms for parallel computing. Parallel Computing. 25, 1877-1906.
[14] Tarkov, M.S., (2011). Mapping semigroup array operations onto multicomputer with
torus topology. Proc. of the 5th Int. Conf. on Ubiquitous Information Management and
Communication. Article No. 135.
[15] Yu, H., Chung, I-Hsin, Moreira, J., (2006). Topology Mapping for Blue Gene/L Supercomputer. Proc. of the ACM/IEEE SC2006 Conf. on High Performance Networking and Computing. ACM Press, 52–64.
[16] Balaji, P.; Gupta, R.; Vishnu, A. & Beckman, P. (2011). Mapping Communication
Layouts to Network Hardware Characteristics on Massive-Scale Blue Gene Systems,
Comput. Sci. Res. Dev. 26, 247–256.
[17] Palmer, J.E., (1986). The NCUBE family of parallel supercomputers. Proc. of the
International Conference on Computer Design. IEEE.
[18] Foster, I. Designing and Building Parallel Programs. Available at: http://www-unix.mcs.anl.gov/dbpp
[19] Miller, R., Boxer, L., (2000). Algorithms Sequential and Parallel: A Unified Approach.
Prentice Hall.
[20] Ortega, J.M. (1988). Introduction to Parallel and Vector Solution of Linear Systems,
New York: Plenum.

[21] Parhami, B. (2002). Introduction to Parallel Processing. Algorithms and Architectures,


New York: Kluwer Academic Publishers.
[22] Gonzalez, R.C., Woods, R. E., (2008). Digital Image Processing. Prentice Hall.
[23] de Ridder, D., Duin, R.P.W., Verbeek, P.W., van Vliet, L.J., (1999). The Applicability
of Neural Networks to Non-linear Image Processing. Pattern Analysis & Applications.
2, 111-128.
[24] Egmont-Petersen, M., de Ridder, D., Handels, H., (2002). Image processing with neural
networks—a review, Pattern Recognition. 35, 2279–2301.
[25] The Cray T3E. Available at: http://www.cray-cyber.org/systems/t3e.php
[26] Spirkovska, L., Reid, M.B., (1994). Higher-Order Neural Networks Applied to 2D and
3D Object Recognition, Machine Learning, 15, 169-199.
In: Parallel Programming ISBN: 978-1-63321-957-1
Editor: Mikhail S. Tarkov © 2015 Nova Science Publishers, Inc.

Chapter 2

MAPPING PARALLEL PROGRAM GRAPHS ONTO GRAPHS OF DISTRIBUTED COMPUTER SYSTEMS BY NEURAL NETWORK ALGORITHMS

Mikhail S. Tarkov*
Institute of Semiconductor Physics SB RAS, Novosibirsk, Russia
* E-mail address: [email protected]

Abstract
The problem of mapping a parallel program with weighed vertices (processes) and edges
(interprocess exchanges) onto a weighed graph of the distributed computer system is
considered. An algorithm for solving this problem based on the use of Hopfield networks is
proposed. The algorithm is tested on mapping a number of graphs of parallel programs onto a
multicore computer. Experiments have shown that the proposed algorithm provides well-
balanced sub-optimal mappings. Optimal solutions are found for mapping a “line”-graph onto
a two-dimensional torus. To increase the probability of finding an optimal mapping, a method
for splitting the mapping is proposed. The method’s essence is reducing the solution matrix to
a block-diagonal form. The Wang recurrent neural network is used to exclude incorrect
solutions of the problem of mapping the line-graph onto a three-dimensional torus. An
algorithm based on a recurrent neural Wang network and the WTA (“Winner takes all”)
principle is proposed for the construction of Hamiltonian cycles in graphs of distributed
computer systems.

Keywords: mapping, graphs of parallel programs, multicore systems, load balancing, neuron,
Hopfield networks

1. Introduction
Due to the ever increasing power of computers it is possible to create high-performance multicomputer systems [1]. Such a system, in general, is a set of computing nodes unified by communication lines, with a high degree of node isolation: each node has its own processor (often multicore), its own memory, possibly its own hard drive, and a means to communicate with other nodes (network card, modem, etc.). Nodes of the computer system can have different performance and different communication capabilities. In general, the structure of a computer system is arbitrary and depends on several factors, such as the hardware capabilities, the purpose for which the system is created, and the financial constraints.
Unlike systems with shared memory, in multicomputer systems a more urgent issue is reducing the amount of intercomputer interaction, since the capacity of the network is low. We must, therefore, allocate the processes of a parallel program to the available computing nodes so as to minimize the execution time of the program. To do this, we must load the processors according to their performance (the higher the performance, the higher the load) and minimize the interaction between processes at different nodes, since such interaction increases CPU idle time and reduces the efficiency of the computer system.
In general, this problem is reduced to the mapping of the graph of the parallel program
onto the graph of the computer system [2-6]. The purpose of the mapping is to minimize the
execution time of the program. Due to the complexity of the mapping problem (it is NP-
complete), various heuristics are widely used to find the best mappings. Currently, popular
methods are based on the analogies with physics and biology, such as the method of
simulated annealing, genetic algorithms and neural networks [7]. The latter include the
Hopfield network [8, 9].

2. Mapping Problem
The purpose of optimal allocation of parallel program processes (branches) to the system
processors is to minimize the execution time of the program, which is equivalent to
minimizing the downtime of each processor involved in the task and, at the same time,
minimizing the cost of communications among the processes placed in different processors.
Let
G_p(V_p, E_p) be a graph of a parallel program;
V_p be the set of the program branches, n = |V_p|;
E_p be the set of logical (virtual) channels, each channel implementing an interaction between two branches;
G_s(V_s, E_s) be a graph of the computer system;
V_s be the set of processors (processor cores, CPUs), m = |V_s|, m <= n;
E_s be the set of connections between processors;
w_x be the weight (computational complexity) of branch x in V_p;
omega_i be the performance of processor i in V_s;
tau_xi = w_x / omega_i be the run time of branch x in V_p on CPU i in V_s;
c_xy be the weight of edge (x, y) in E_p, equal to the number of information units transmitted between branches x and y;

d_ij be the time of information unit transmission between processors (cores) i and j.


Let f_m : G_p -> G_s be a mapping of the program graph G_p(V_p, E_p) onto the graph G_s(V_s, E_s) of the computer system. The quality of the mapping f_m will be estimated by the objective function

H_g(f_m) = H_gc(f_m) + H_gt(f_m) ,

where H_gc(f_m) is an estimate of the computational load imbalance and H_gt(f_m) is an estimate of the total interprocessor communications time.
For mapping f_m, the full time of computations on the i-th CPU is equal to

T_i = sum_{f_m(x) = i} tau_xi .

From here we have

H_gc(f_m) = sum_{i=1}^{m} (T_i - T_min)^2 ,

where T_min = (sum_x w_x) / (sum_i omega_i) is the minimum possible (ideal) execution time of the parallel program (the run time of the program on a single processor with a capacity equal to the total performance of the system, at no cost of the interactions between the program branches).
The interaction cost is estimated by the function

H_gt(f_m) = sum_{x != y, i = f_m(x), j = f_m(y)} c_xy d_ij ,

where the summation is taken over all pairs (x, y) of interacting branches of the parallel program.

3. Hopfield Network for the Mapping Problem


Consider the neuron states matrix v of size n x m. Each row of the matrix corresponds to a branch of the parallel program, and each column corresponds to a processor (core). Each row of matrix v must contain one and only one non-zero element equal to one; the other elements are zero (a branch of a parallel program cannot be mapped simultaneously onto several processors). Each column can contain any number of elements equal to one (including zero), but the total number of unit elements must be equal to the number of branches of the parallel program. We call a matrix v satisfying these restrictions a permissible (correct) solution of the mapping problem.
The corresponding Hopfield neural network energy is described by the Lyapunov function

H = (A/2) H_c + (B/2) H_g .   (1)

Here A and B are the Lyapunov function parameters. The minimum of H_c ensures the above restrictions on the elements of matrix v; H_g is the objective function.

H_c = H_c1 + H_c2 ,   H_c1 = ( sum_x sum_i v_xi - n )^2 ,   H_c2 = sum_x ( sum_i v_xi - 1 )^2 .   (2)

The H_c1 minimum ensures exactly n units in the matrix v. The H_c2 minimum provides exactly one unit in each row of matrix v.

H_g = H_gc + H_gt ,   H_gc = sum_i ( sum_x v_xi tau_xi - T_min )^2 ,
H_gt = sum_x sum_i sum_y sum_j v_xi v_yj c_xy d_ij .   (3)

Here v_xi is the state of the neuron in row x and column i of matrix v.
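The two energy terms can be written down directly from (2) and (3); the sketch below does so for a small example with assumed sizes and parameter values (it is an illustration, not the author's implementation).

/* Sketch: constraint term (2) and objective term (3) for a given neuron-state matrix v. */
#include <stdio.h>

#define NB 4   /* n: number of program branches (assumed) */
#define NP 2   /* m: number of processors (assumed)       */

/* H_c = H_c1 + H_c2: n ones in total and exactly one 1 per row, cf. (2) */
double Hc(double v[NB][NP]) {
    double total = 0.0, h2 = 0.0;
    for (int x = 0; x < NB; ++x) {
        double row = 0.0;
        for (int i = 0; i < NP; ++i) { row += v[x][i]; total += v[x][i]; }
        h2 += (row - 1.0) * (row - 1.0);
    }
    return (total - NB) * (total - NB) + h2;
}

/* H_g = H_gc + H_gt: load imbalance plus communication cost, cf. (3) */
double Hg(double v[NB][NP], double tau[NB][NP], double tmin,
          double c[NB][NB], double d[NP][NP]) {
    double hgc = 0.0, hgt = 0.0;
    for (int i = 0; i < NP; ++i) {
        double ti = 0.0;
        for (int x = 0; x < NB; ++x) ti += v[x][i] * tau[x][i];
        hgc += (ti - tmin) * (ti - tmin);
    }
    for (int x = 0; x < NB; ++x)
        for (int i = 0; i < NP; ++i)
            for (int y = 0; y < NB; ++y)
                for (int j = 0; j < NP; ++j)
                    hgt += v[x][i] * v[y][j] * c[x][y] * d[i][j];
    return hgc + hgt;
}

int main(void) {
    /* a correct mapping: branches 0,1 on core 0 and branches 2,3 on core 1 */
    double v[NB][NP]   = {{1,0},{1,0},{0,1},{0,1}};
    double tau[NB][NP] = {{1,1},{1,1},{1,1},{1,1}};
    double c[NB][NB]   = {{0,1,0,0},{1,0,1,0},{0,1,0,1},{0,0,1,0}};  /* "line" graph */
    double d[NP][NP]   = {{0,1},{1,0}};
    printf("Hc = %g, Hg = %g\n", Hc(v), Hg(v, tau, 2.0, c, d));
    return 0;
}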
The Hopfield network dynamics, minimizing function (1), is described by the system of equations

du_xi/dt = - dH/dv_xi ,   (4)

where u_xi is the activation of the neuron with indexes x, i (x = 1,...,n, i = 1,...,m), v_xi = 1 / (1 + exp(-lambda u_xi)) is the state (output signal) of the neuron, and lambda is the activation function parameter.
From (1)-(4) we have

du_xi/dt = -A ( sum_y sum_j v_yj + sum_j v_xj - n - 1 )
           -B [ ( sum_y v_yi tau_yi - t_min ) tau_xi + sum_y sum_j v_yj c_xy d_ij ] .   (5)

The difference equation corresponding to (5) is (t is the number of the current iteration, dt is the time step value):

u_xi^(t+1) = u_xi^t + dt { -A ( sum_y sum_j v_yj + sum_j v_xj - n - 1 )
             -B [ ( sum_y v_yi tau_yi - t_min ) tau_xi + sum_y sum_j v_yj c_xy d_ij ] } .   (6)

In order to accelerate the convergence, the Hopfield network (6) is transformed into the Wang network [10, 11] by multiplying the objective-function term of the optimization problem by exp(-t/tau), where tau is a parameter:

u_xi^(t+1) = u_xi^t + dt { -A ( sum_y sum_j v_yj + sum_j v_xj - n - 1 )
             -B [ ( sum_y v_yi tau_yi - t_min ) tau_xi + sum_y sum_j v_yj c_xy d_ij ] exp(-t/tau) } .   (7)

t 1
The new value of vxi is calculated immediately after finding the corresponding value of
u xit 1 (Gauss–Seidel method).

4. Mapping Parallel Programs onto Multicore Computers


In experiments, the mappings onto a computer with m = 2, 4, 8 cores are investigated for the following types of parallel programs with the same weights w_x = 1, x = 0,...,n-1, of nodes (branches) and weights c_xy = 0 (there is no edge (x, y)) or c_xy = 1 (there is an edge (x, y)), x, y = 0,...,n-1, x != y, of the program graph edges:

1. a set of independent branches (no data exchanges between branches, i.e., c_xy = 0 for any pair (x, y));
2. typical graphs of parallel programs (line, ring and mesh) (Figure 1), c_xy = 1 for adjacent vertices x and y of the program graph;
3. irregular grids with identical edges: c_xy = 1 for adjacent vertices x and y.

The computer system parameters:

core performance omega_i = 1, i = 0,...,m-1;

d_ij = 0 if i = j,  d_ij = 1 if i != j.   (8)

According to (8), we consider the cost of data exchanges between the branches of the
program within the core to be negligible with respect to the intercore exchanges. In other
words, the data in the core are considered as arrays processed by one branch.

Figure 1. Typical graphs of parallel programs (line, ring and mesh).

For w_x = 1 and omega_i = 1 we have tau_xi = 1, x = 1,...,n, i = 1,...,m. The neural network parameters are A = 1000, B = 100, dt = 1, lambda = 1, tau = 100. The mapping procedure is as follows:

do
{   initialize();                       /* random initial u, v        */
    do
    {   iterate(); iter = iter + 1;     /* one step of process (7)    */
        delta = nobalance();            /* current load imbalance     */
    } while (delta > 0 && iter < maxiter);
} while (delta > maxnb || noncorrect());

Here
initialize sets the initial values of the elements of matrix u^0 (and, respectively, v^0) using a random number generator;
iterate performs a step of the iterative process (7), calculating new values of the elements of matrices u^(t+1) and v^(t+1);
nobalance calculates the load imbalance by the formula

delta = sqrt( sum_{i=0}^{m-1} (T_i - T_min)^2 ) / (m * T_min) ;

t 1
noncorrect verifies conditions (2) of the correctness of solution v^(t+1).
The iterative process (7) continues until the load balance is obtained (delta = 0) or the maximum number of iterations maxiter = 1000 is reached. When the iterative process is completed under these conditions, the correctness of solution v^(t+1) is verified. If solution v^(t+1) is incorrect or the load imbalance exceeds the allowable maximum maxnb, then we set new initial conditions and the iteration process is repeated (the algorithm is restarted).
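The two checks can be sketched as follows; the sizes, tolerances and function bodies are illustrative (the function names mirror the procedure above, but their implementations are assumed):

/* Sketch: load imbalance and correctness checks used by the mapping procedure. */
#include <math.h>
#include <stdio.h>

#define NB 8            /* branches   */
#define NP 4            /* processors */

double nobalance(double T[NP], double tmin) {
    double s = 0.0;
    for (int i = 0; i < NP; ++i) s += (T[i] - tmin) * (T[i] - tmin);
    return sqrt(s) / (NP * tmin);
}

int noncorrect(double v[NB][NP]) {
    double total = 0.0;
    for (int x = 0; x < NB; ++x) {
        double row = 0.0;
        for (int i = 0; i < NP; ++i) { row += v[x][i]; total += v[x][i]; }
        if (fabs(row - 1.0) > 1e-6) return 1;     /* a row must contain exactly one 1 */
    }
    return fabs(total - NB) > 1e-6;               /* the total number of ones must be n */
}

int main(void) {
    double T[NP] = {2, 2, 2, 2};
    double v[NB][NP] = {{0}};
    printf("imbalance = %g\n", nobalance(T, 2.0));          /* perfectly balanced -> 0 */
    printf("noncorrect on zero matrix: %d\n", noncorrect(v));
    return 0;
}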
Tables 1-4 show the following results, obtained on a Pentium(R) Dual-Core E5200 CPU, 2.5 GHz, in a series of 100 tests of the algorithm for mapping the program graphs onto a computer with four cores (m = 4):

I_a is the average number of iterations over 100 tests;
I_m is the maximum number of iterations;
t_a is the average running time of the mapping algorithm (in seconds);
t_m is the maximum running time of the mapping algorithm (in seconds);
N_a is the average number of restarts of the algorithm (a restart occurs when the condition delta > maxnb || noncorrect() holds);
N_m is the maximum number of restarts of the algorithm under the same condition;
C_a is the average total amount of data transferred between the cores for the evaluated mapping;
C_m is the maximum total amount of data transferred between the cores for the evaluated mapping.

Note that, under the above weights c_xy, the values C_a and C_m specify the number of the program graph G_p(V_p, E_p) edges connecting the graph vertices mapped onto different computer cores.

Table 1. Independent tasks

n      4       8       16      32      64
I_a    61      50      50      57      70
I_m    374     265     275     356     222
t_a    0.0084  0.026   0.05    0.066   0.441
t_m    0.109   0.235   0.579   0.781   2.92
N_a    0.51    0.59    0.41    0.28    0.82
N_m    6       5       7       3       5