PARALLEL PROGRAMMING
Ivan Stanimirović
Arcler Press
www.arclerpress.com
Parallel Programming
Ivan Stanimirović
Arcler Press
2010 Winston Park Drive,
2nd Floor
Oakville, ON L6H 5R7
Canada
www.arclerpress.com
Tel: 001-289-291-7705
001-905-616-2116
Fax: 001-289-291-7601
Email: [email protected]
This book contains information obtained from highly regarded resources. Reprinted material
sources are indicated, and copyright remains with the original owners. Copyright for images and
other graphics remains with the original owners as indicated. A wide variety of references are
listed. Reasonable efforts have been made to publish reliable data. The authors, editors, and
publisher are not responsible for the accuracy of the information in the published chapters or the
consequences of their use. The publisher assumes no responsibility for any damage or grievance to
persons or property arising out of the use of any materials, instructions, methods, or thoughts in
the book. The authors, editors, and publisher have attempted to trace the copyright holders
of all material reproduced in this publication and apologize to copyright holders if permission has
not been obtained. If any copyright holder has not been acknowledged, please write to us so we
may rectify it.
Notice: Registered trademarks of products or corporate names are used only for explanation and
identification, without intent to infringe.
Arcler Press publishes a wide variety of books and eBooks. For more information about
Arcler Press and its products, visit our website at www.arclerpress.com
ABOUT THE AUTHOR
Ivan Stanimirović gained his PhD from the University of Niš, Serbia, in 2013.
His work spans from multi-objective optimization methods to applications of
generalized matrix inverses in areas such as image processing, computer
graphics, and visualization. He is currently working as an Assistant Professor
at the Faculty of Sciences and Mathematics, University of Niš, on computing
generalized matrix inverses and their applications.
TABLE OF CONTENTS
6.4. Statistical Tools ................................................................................. 95
6.5. Methodological Framework .............................................................. 99
Chapter 10 Branch And Bound ................................................................................ 181
10.1. General Description ..................................................................... 182
10.2. Pruning Strategies .............................................................. 184
10.3. Branching Strategies...................................................................... 184
10.4. The Traveling Salesman Problem (TSP) .......................................... 187
LIST OF FIGURES
Figure 6.4. Sample control chart
Figure 8.1. Johann Gutenberg (1398–1468)
Figure 8.2. Al-Khwarizmi (lived between AD 780 and 850)
Figure 8.3. Leonardo of Pisa (1170–1250)
Figure 8.4. The growth of the main functions of low complexity
Figure 8.5. An example for the shortest path algorithm
Figure 8.6. An example for the shortest time
Figure 9.1. A graph with 5 stages
Figure 9.2. Graph corresponding to a 3-stage project problem
Figure 9.3. Recursion tree for the traveling salesman problem
Figure 9.4. Positions that the queen can attack
Figure 9.5. Example of two mutually threatening queens on a 4-by-4 board
Figure 9.6. Scheme of the reduced solution tree
Figure 9.7. A decision tree for the 4-queens program
Figure 9.8. Example of a Hamiltonian cycle
Figure 10.1. FIFO branching strategy
Figure 10.2. LIFO branching strategy
Figure 10.3. Tree of states for a traveling salesman problem with n = 4 and i0 = i4 = 1
Figure 10.4. Possible paths
Figure 11.1. Computable and non-computable problems
Figure 11.2. A quantum Turing machine
LIST OF TABLES
Introduction
CONTENTS
1.1. Background ........................................................................................ 2
1.2. Definition of the Problem ................................................................... 3
1.3. Main Objectives ................................................................................. 3
1.4. Justification ......................................................................................... 4
1.5. Cloud Computing ............................................................................... 4
1.6. FDTD Method ....................................................................... 5
1.7. Computational Parallelism .................................................................. 5
1.8. Parallel Programming Models ............................................................. 7
1.1. BACKGROUND
Around 1870, Maxwell formulated the partial differential equations of electrodynamics.
They represent the fundamental unification of the electric and magnetic fields,
predicting the phenomenon of electromagnetic waves, which the Nobel laureate
Richard Feynman called the most outstanding achievement of nineteenth-century
science [2].
The finite-difference time-domain (FDTD) method solves Maxwell's equations by
directly modeling the propagation of electromagnetic waves within a volume. The
method was introduced in 1966 by Kane Yee as a numerical modeling technique for
electrodynamics.
At first, it was almost impossible to implement this method computationally, largely
because of the lack of computing resources. However, with the advent of more
powerful modern equipment, easier access to it, and further improvements to the
algorithm, the FDTD method has become a standard tool for solving problems of this
type. Today, we can say that this method brings together a set of numerical
techniques for solving Maxwell's equations in the time domain and allows the
electromagnetic analysis of a wide range of problems [3]. Scientists and engineers
now use computers to solve these equations and investigate the resulting
electromagnetic fields.
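To make the idea concrete, here is a minimal sketch of a one-dimensional FDTD update loop in Python with NumPy. It is an illustration only (the grid size, number of time steps, Courant factor, and source are assumptions), not the Meep-based workflow used later in this book.

    import numpy as np

    # Illustrative 1D FDTD update loop in normalized units (assumed parameters).
    nx, nt = 200, 500          # grid cells and time steps (assumptions)
    ez = np.zeros(nx)          # electric field at integer grid points
    hy = np.zeros(nx - 1)      # magnetic field at staggered (Yee) half points

    for t in range(nt):
        # Update H from the spatial derivative of E (leapfrog in time).
        hy += 0.5 * (ez[1:] - ez[:-1])
        # Update E from the spatial derivative of H.
        ez[1:-1] += 0.5 * (hy[1:] - hy[:-1])
        # Inject a Gaussian pulse source at the center of the grid.
        ez[nx // 2] += np.exp(-((t - 30) ** 2) / 100.0)

Each time step advances the fields by one leapfrog update; real solvers add material properties and absorbing boundaries on top of this basic loop.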
1.4. JUSTIFICATION
The main justification for developing the "System for the creation and
management of computer clusters for FDTD simulations with the Meep package
on the EC2 service" is to decrease the time it takes to perform FDTD
simulations, with better throughput, and to monitor the resources used while
multiple FDTD simulations are executed in parallel. In this way, the user can
check the status of their jobs while they are being resolved through the Meep
package.
1.5.2. Functionality
EC2 provides a virtual computational environment, allowing us to use
Web service interfaces to request a cluster for use, load our own application
environment, manage permissions and network access, and run the image on as
many systems as required.
It is designed to be used together with other Amazon services such as Amazon
S3, Amazon EBS, Amazon SimpleDB, and Amazon SQS to provide a complete solution
for computing, query processing, and storage across a wide range of applications.
Figure 1.2: Sending messages between two computers via MPI.
Chapter 2
CONTENTS
2.1. Starcluster ......................................................................................... 10
2.2. Meep Parallel Package ...................................................................... 11
2.3. GWT – Google Web Toolkit .............................................................. 12
2.4. Ganglia............................................................................................. 12
2.5. Architecture ...................................................................................... 14
2.6. Creating And Configuring A Public AMI ............................................ 16
2.7. Efficiency Test (FDTD Problem) ......................................................... 22
2.8. Efficiency Tests (Using Amazon EC2 Platform) ................................... 22
2.9. Analysis Of The Results ..................................................................... 31
In this chapter, we will look at the details of the tools we have used to
develop the project and explain the important concepts and features
provided by each of them.
2.1. STARCLUSTER
StarCluster is a utility that enables the creation, management, and monitoring
of computer clusters hosted on the Amazon Elastic Compute Cloud (EC2) service,
all through a master instance. Its main objective is to minimize the
administration associated with configuring and using computer clusters in
research laboratories or in general applications that use distributed
computing.
To use this tool, a configuration file is created containing the Amazon Web
Services (AWS) account information, the type of AMIs to use, and any additional
features we want in the cluster. The tool can then be run using its
commands [10].
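As an illustration only, the configuration file and commands might look like the sketch below. The section layout and option names follow StarCluster's general format but may differ between versions, and the access keys, key pair, AMI ID, and cluster name are placeholders.

    [aws info]
    AWS_ACCESS_KEY_ID = <your-access-key>
    AWS_SECRET_ACCESS_KEY = <your-secret-key>

    [key mykey]
    KEY_LOCATION = ~/.ssh/mykey.rsa

    [cluster smallcluster]
    KEYNAME = mykey
    CLUSTER_SIZE = 4
    NODE_IMAGE_ID = ami-xxxxxxxx
    NODE_INSTANCE_TYPE = m1.small

    $ starcluster start smallcluster      # create and configure the cluster
    $ starcluster sshmaster smallcluster  # log in to the master instance
    $ starcluster terminate smallcluster  # shut the cluster down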
2.1.3. OpenMPI
It is a project combining technologies and resources from other projects (FT-MPI,
LA-MPI, LAM/MPI, and PACX-MPI) to build an MPI library. OpenMPI is an open-source
implementation of the MPI-1 and MPI-2 standards [11].
The simulation advances in time steps, and the processors are responsible for
communicating the values using MPI.
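As a small illustration of this kind of exchange, the sketch below uses the mpi4py Python bindings (rather than the MPI library calls Meep makes internally); the array size, number of steps, and neighbor pattern are assumptions. Each process sends the value on its right boundary to the next rank at every step.

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    local = np.full(10, float(rank))   # each process owns a slab of values (assumed size)

    for step in range(5):              # a few illustrative time steps
        if rank + 1 < size:
            # Send our right boundary value to the next process ...
            comm.send(local[-1], dest=rank + 1, tag=step)
        if rank > 0:
            # ... and receive the left neighbor's boundary value.
            left = comm.recv(source=rank - 1, tag=step)
            local[0] = 0.5 * (local[0] + left)   # illustrative update with the received value

Run, for example, with: mpirun -np 4 python exchange.py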
2.3.1. Introduction
GWT, or the Google Web Toolkit, is a framework created by Google that
facilitates the use of AJAX technology. It solves the big problem of
client-side code (HTML, JavaScript) compatibility across browsers, enabling
developers to build an application without having to test it in each
browser. The concept of the Google Web Toolkit is quite simple: you write the
code in Java using any Java development environment (IDE), and the compiler
translates it into HTML and JavaScript.
2.4. GANGLIA
2.4.1. Introduction
Monitoring a computer cluster requires proper management of resources; with
this information the administrator can spend less time detecting, investigating,
and troubleshooting failures, and can also prepare a contingency plan.
2.4.2. Functioning
Ganglia is organized in a hierarchical scheme. It is based on send/receive
communication over a multicast protocol to monitor the status of the cluster,
and it uses a tree of point-to-point connections between levels of cluster
nodes to report their status. Ganglia uses status messages in a multicast
environment as the basis of its communication protocol. To maintain
communication, each node sends its state at a regular time interval; in this
way it signals that it is active, and when a node stops sending, it is no
longer included in the monitoring.
2.5. ARCHITECTURE
The CTL file is based on the libctl library, a set of Scheme-based language
tools. The CTL file can be written in any of three ways:
• Scheme, a programming language developed at MIT. This language
follows the form (function arguments ...) and can be executed under a
GNU Guile interpreter;
• Libctl, a library for the Guile interpreter that simplifies
communication between Scheme and scientific computing
software. Libctl defines the basic interface and a host of useful
features; and
• Meep, a CTL file written in terms of Meep itself, which defines all the
interfaces specific to the FDTD calculation.
2.5.3. Monitoring
Monitoring of the resources used by the cluster nodes is performed with the
Ganglia tool. Usage information is sent from the cluster via XML files and is
processed by the tool for viewing.
Ganglia provides a graphical, real-time view of the information through its
Web front-end, informing administrators and cluster users about the resources
consumed.
• 1.7 GB RAM;
• 2 virtual cores 2.5 GHz;
• 350 GB hard drive;
• Ubuntu 9.04 32-bit.
Before installing StarCluster, we describe some of its most important
dependencies.
• Python (2.4): An interpreted programming language that allows a program
to be divided into modules for reuse in other programs.
• boto (1.9b+): A Python module for handling current and future services
offered by the AWS infrastructure.
• paramiko (1.7.6+): Another Python module, which implements the SSH2
protocol (encryption and authentication) for connections to remote
computers.
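For instance, paramiko can open an SSH connection to a node and run a command. The short sketch below shows the typical calls; the host name, user, and key file are placeholders.

    import paramiko

    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())  # accept unknown hosts (demo only)
    client.connect("ec2-xx-xx-xx-xx.compute.amazonaws.com",       # placeholder master node
                   username="root", key_filename="mykey.rsa")     # placeholder key file

    stdin, stdout, stderr = client.exec_command("uptime")          # run a command remotely
    print(stdout.read().decode())
    client.close()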
To install StarCluster, we downloaded the latest development version from the
Git repository (software version management) and then compiled and installed
it via Python; the steps are described as follows:
• Download the StarCluster installer from the repository.
• At the end, an executable file "tomcat.sh" that can start or stop the
server is placed in "/etc/init.d/" so that Apache Tomcat is brought up
whenever a node is started.
Resolution 40
When the problem was run with different numbers of nodes, five outcomes were
obtained; the relevant comparison was made to verify their consistency, and
it was found that all runs generate exactly the same result. Thus, we may
assume that the result is the same no matter how many nodes are used.
Figure 2.4. Plot of nodes vs. time (minutes) for the ring-resonance exercise.
As we can see in Figure 2.4, we have executed the problem five times with
different numbers of nodes, and we can see that the time improves when the
exercise is solved with more nodes. This happens only up to a point; in this
• Define the Gaussian pulse source that drives the ring into resonance.
• Write the output condition, in this case when the pulse has been turned
off and 150 further time steps have elapsed, and then define what we want to
output, in this case the flux spectrum.
Table 2.2 shows the data of the geometric structure.
Spectral flux
Frequency             Input port              Paso (through) port     Extraction port
0.36                  1.87190375615421e-8     5.33633135330676e-9     2.25302194395918e-10
0.360200100050025     1.91956223787719e-8     5.74839347857691e-9     2.32817150153588e-10
From the two executions we obtain, at the end, two tables like the one shown
above. The corresponding data in each column, except the frequency, are then
divided: the results of the run with the ring are divided by the results of
the run without the ring. At the end, we get a single table from which we plot
the frequency against each spectral flux, as shown in Figure 2.6.
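Column by column, this normalization is a simple element-wise division. The Python/NumPy sketch below is only an illustration: the with-ring numbers mirror the table above (rounded), while the no-ring numbers are invented for the example.

    import numpy as np

    # Columns: frequency, input port, pass-through port, extraction port (layout assumed).
    with_ring = np.array([[0.360000000000000, 1.8719e-8, 5.3363e-9, 2.2530e-10],
                          [0.360200100050025, 1.9196e-8, 5.7484e-9, 2.3282e-10]])
    no_ring   = np.array([[0.360000000000000, 2.0100e-8, 1.9800e-8, 2.5000e-10],
                          [0.360200100050025, 2.0150e-8, 1.9850e-8, 2.5100e-10]])  # illustrative

    normalized = with_ring.copy()
    # Divide every column except the frequency by the run without the ring.
    normalized[:, 1:] = with_ring[:, 1:] / no_ring[:, 1:]
    print(normalized)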
What we see is due to the resonance of the ring; we must also take numerical
errors into account. For this case, in the range between 0.3 and 0.4, we can
see that several fluctuations appear in the response curve (Figure 2.8).
Figure 2.9: Plot of nodes vs. time (minutes) for the transmission exercise
with the ring.
As we can see in Figure 2.9, we executed each problem five times with
different numbers of nodes. We note first that the problem with the ring takes
a little longer than the problem without the ring; the interesting thing,
however, is that the proportion by which the time decreases as more nodes are
used is very similar in the two cases.
Freq. Real         Freq. Imaginary   Q          |Amp|   Amplitude          Error
harminv0: 0.4621   1.76E-04          -1329.10   0.002   -0.0095-0.0012i    2.69E-04
To process the end result of the tabulated data, we find the frequencies at
which resonance occurs, but not all values should be considered, because, as
we can see, there is a margin of error to take into account. To find out
whether each frequency value is correct, a comparison is performed: the
absolute value of the imaginary frequency must be greater than the margin of
error. In Table 2.5, we note which candidates are correct values.
Freq. Real        Freq. Imaginary   Q          |Amp|   Amplitude           Error
harminv0: 0.46    1.76E-04          -1329.10   0.002   -0.0095-0.0012i     2.69E-04
harminv0: 0.49    -0.0016           149.42     0.049   0.017+0.04874i      3.63E-05
harminv0: 0.50    -5.20E-04         490.13     0.065   -0.037-0.05496i     1.40E-05
harminv0: 0.51    -0.0027           94.93      0.059   0.0519+0.013851i    1.15E-04
harminv0: 0.52    -3.66E-04         723.34     0.134   0.06928+0.11025i    2.31E-05
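Applying the criterion stated above, the candidate modes can be filtered with a few lines of Python. This is an illustrative sketch, not part of the Meep toolchain; the tuples simply mirror the rows of Table 2.5.

    # (real frequency, imaginary frequency, error) taken from the Harminv output rows
    modes = [
        (0.46, 1.76e-4, 2.69e-4),
        (0.49, -0.0016, 3.63e-5),
        (0.50, -5.20e-4, 1.40e-5),
        (0.51, -0.0027, 1.15e-4),
        (0.52, -3.66e-4, 2.31e-5),
    ]

    # Keep only the modes whose imaginary frequency exceeds the error margin in magnitude.
    valid = [m for m in modes if abs(m[1]) > m[2]]
    for freq, imag, err in valid:
        print("resonance candidate at frequency", freq)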
Figure 2.10: Plot of nodes vs. time (minutes) for the transmission exercise
with the ring, using Harminv.
Figure 2.11: Plot of nodes vs. time (minutes) for the transmission exercise
without the ring, using Harminv.
As we can see in Figures 2.10 and 2.11, running each problem five times with
various numbers of nodes, we notice that in each case the time decreases as
we use more nodes.
Chapter 3
CONTENTS
3.1. Partition ............................................................................................ 34
3.2. Domain Decomposition ................................................................... 34
3.3. Functional Decomposition................................................................ 35
3.4. List Partitions Design ......................................................................... 36
3.5. Communication ................................................................................ 36
3.6. Agglomeration .................................................................................. 37
3.7. Reducing Costs of Software Engineering ........................................... 40
3.8. Load Balancing Algorithms ............................................................... 41
3.9. Task Scheduling Algorithms............................................................... 41
3.10. Allocation List Design ..................................................................... 42
3.11. Model of The Atmosphere ............................................................... 42
3.12. Agglomeration ................................................................................ 45
3.13. Load Distribution ............................................................................ 47
3.1. PARTITION
The partition stage of a design is intended to expose opportunities for parallel
execution. Hence, the focus is on defining a large number of small tasks in
order to yield a fine-grained decomposition of a problem. A fine-grained
decomposition of a problem, just as fine sand, is easier to pour than a pile
of bricks, and it provides the greatest flexibility in terms of possible
parallel algorithms. In later design stages, evaluation of communication
requirements, the target architecture, or software engineering issues may lead
us to forgo opportunities for parallel execution identified at this stage.
A good partition divides into small pieces both the computation associated
with a problem and the data on which this computation operates. When
designing a partition, programmers most commonly focus first on the data
associated with a problem, then determine an appropriate partition for the data,
and finally work out how to associate computation with data. This partitioning
technique is known as domain decomposition. The alternative approach, first
decomposing the computation to be performed and then dealing with the data, is
termed functional decomposition. These are complementary techniques that may be
applied to different components of a single problem or even applied to the same
problem to obtain alternative parallel algorithms.
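As a toy illustration of domain decomposition (a sketch only, not tied to any particular problem in this book), the data are first split into disjoint pieces and the same computation is then attached to each piece:

    # Domain decomposition: partition the data, attach the same computation to each piece.
    data = list(range(16))                      # the problem's data (illustrative)
    num_tasks = 4
    chunks = [data[i::num_tasks] for i in range(num_tasks)]   # disjoint subsets, one per task

    # Each "task" applies the same operation to its own subset of the data.
    partial_results = [sum(x * x for x in chunk) for chunk in chunks]
    total = sum(partial_results)                # combine the per-task results
    print(total)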
Initially, one task is created for the tree root. A task evaluates its node and
then, if that node is not a leaf, creates a new task for each child (subtree).
3.5. COMMUNICATION
The tasks generated by a partition are intended to execute concurrently but,
in general, cannot execute independently. The computation to be performed in
one task typically requires data associated with other tasks, which means that
data must be transferred between tasks; tasks are linked directly, "in which one
task can send messages and the other can receive." The communication associated
with an algorithm can therefore be specified in two phases. The first phase is
to define a channel structure that links, either directly or indirectly, tasks
that require data (consumers) with tasks that possess those data (producers). In
the second phase, the messages to be sent and received on these channels
are specified. Ultimately, it all depends on our deployment technology.
In domain decomposition problems, communication requirements may be
difficult to determine. Remember that this strategy produces tasks by first
partitioning the data structures into disjoint subsets and then associating
with each datum the operations that operate solely on that datum. This part of
the design is usually simple. However, some operations that require data from
several tasks usually remain. Communication is then necessary to manage the
data transfer required for these tasks to proceed. The organization of this
3.6. AGGLOMERATION
The algorithm resulting from the previous two phases is not considered
complete, in the sense that it is not yet specialized for efficient execution
on any particular parallel machine; indeed, it may be quite inefficient if,
for example, it creates many more tasks than there are processors on the
target computer and that computer is not designed for the efficient execution
of small tasks.
Therefore, in this phase we revisit the decisions made in the partitioning and
communication phases in order to obtain an algorithm that will execute
efficiently on some class of parallel computer. In particular, we consider
whether it is useful to combine, or agglomerate, the tasks identified by the
partitioning phase so as to provide fewer tasks of larger size. We also
determine whether it is worthwhile to replicate data or computations.
The three examples describing this phase are:
1. In this example, the size of the tasks is increased by reducing the
dimension of the decomposition from three to two.
Even though the number of tasks has been reduced by this stage, the design is
still somewhat abstract, since issues relating to the mapping of tasks to
processors remain unresolved. On the other hand, we may choose at this stage
to reduce the number of tasks to exactly one per processor. We might do this
because our target parallel environment requires an SPMD program.
This phase focuses on the general issues that arise when the granularity of
tasks is increased. There are three possible, sometimes competing, objectives
of agglomeration and replication: (i) reducing communication costs by
increasing the granularity of computation and communication, (ii) maintaining
flexibility with respect to scalability and mapping decisions, and (iii)
reducing software engineering costs.
Regarding the increased granularity, we have the following:
• A critical issue that affects parallel performance is communication
cost; clearly, an improvement can be achieved by sending less data, or
also by using fewer messages even when the same amount of data is sent.
• Another concern is the cost of task creation.
The following images show the same computation in fine-grained and
coarse-grained form; the second example is exploded to show its outgoing
messages (dark gray) and incoming messages (light gray).
The grid is divided into 8 × 8 = 64 tasks, each responsible for a single point,
and they require 64 × 4 = 256 communications, 4 per task, transferring a total
of 256 data values.
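The effect of agglomeration on these counts can be checked with a little arithmetic. The sketch below is an illustration only, assuming a four-neighbor exchange on an 8 × 8 grid and square blocks of points per task.

    def stencil_traffic(n, block):
        """Messages and data values exchanged per time step for an n x n grid
        split into block x block tasks, with a four-neighbor exchange."""
        tasks = (n // block) ** 2
        messages = tasks * 4                # each task exchanges with 4 neighbors
        data_values = messages * block      # each message carries one block edge
        return tasks, messages, data_values

    print(stencil_traffic(8, 1))   # fine-grained: (64, 256, 256)
    print(stencil_traffic(8, 2))   # agglomerated 2x2 blocks: (16, 64, 128)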
This grid is periodic in the x and y dimensions, meaning that grid point (1, j)
is regarded as being adjacent to (N_x, j), and (i, 1) to (i, N_y). A vector
of values is maintained at each grid point, representing quantities such as
pressure, temperature, wind speed, and humidity.
At this stage, let us work through the example with each of the techniques
"Partition, Communication, Agglomeration, and Mapping." The grid used
to represent the state in the atmosphere model is a natural candidate
for domain decomposition. Decompositions in the x, y, and/or z dimensions are
possible.
Each task maintains as its state the various values associated with its grid point
and is responsible for the computation required to update that state at each time
step. Therefore, we have a total of N_x × N_y × N_z tasks, each with O(1) data and
computation per time step.
First, we consider the communication needs. Let us identify three
distinct communications as depicted in Figure 3.1.
Figure 3.1: The task and channel structure for a two-dimensional finite
difference computation with a nine-point stencil, assuming one grid point per
processor. Only the channels used by the shaded task are shown.
where the term in the sum denotes the mass at grid point (i, j, k). This sum can
be computed using one of the parallel algorithms presented in
Section 2.4.1.
3. Physics Computations. If each task encapsulates a single grid
point, then the physics component of the model requires
significant communication. For example, the total clear sky
(TCS) at level k is defined as

    TCS_k = ∏_{i=1}^{k} (1 − cld_i) = TCS_{k−1} (1 − cld_k),

where level 0 is the top of the atmosphere and cld_i is the cloud
fraction at level i. This is a prefix product operation. In all, the
physics component of the model requires on the order of 30
communications per grid point and per time step.
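Written as code, TCS is just a running (prefix) product over the vertical levels. The cloud fractions in the short sketch below are invented for illustration.

    cld = [0.0, 0.1, 0.3, 0.05, 0.2]   # illustrative cloud fractions; level 0 = top of atmosphere

    tcs = [1.0] * len(cld)             # TCS at level 0 is 1 (clear sky at the top)
    for k in range(1, len(cld)):
        # Prefix product: TCS_k = TCS_{k-1} * (1 - cld_k)
        tcs[k] = tcs[k - 1] * (1.0 - cld[k])
    print(tcs)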
Figure 3.2: Using agglomeration to reduce communication requirements in the
atmosphere model. (a) Each task is responsible for a single point and therefore
must obtain data from eight other tasks to apply the nine-point stencil. (b)
Granularity is increased to blocks of 2 × 2 points.
3.12. AGGLOMERATION
Our fine-grained domain decomposition of the atmosphere model created
N_x × N_y × N_z tasks: between 10^5 and 10^7, depending on the size of the
problem. This is likely to be many more than are needed, and some degree of
agglomeration can be considered. We identified three reasons for
agglomeration:
1. A small amount of agglomeration (from one to four mesh points
per task) can reduce the communication requirements associated
with the nine-point stencil from eight to four messages per task
per time step.
2. Communication requirements in the horizontal dimension are
relatively small: a total of four messages containing eight data
values. In contrast, the vertical dimension requires communication
not only for the finite difference stencil (two messages, two
data values) but also for various other computations. These
communications can be avoided by agglomerating tasks within
each vertical column.
3. Agglomeration in the vertical dimension is also desirable from a
software engineering standpoint. Horizontal dependencies are
restricted to the dynamics component of the model; the physics
component operates within individual columns only. Therefore, a
two-dimensional horizontal decomposition allows existing sequential
physics code to be reused in a parallel program without modification.
This analysis makes it appear sensible to refine our parallel algorithm to
use a two-dimensional horizontal decomposition of the model grid in which each
task encapsulates at least four grid points. Communication requirements
are then reduced to those associated with the nine-point stencil and the
summation operation. Note that this algorithm creates at most N_x × N_y / 4
tasks: between 10^3 and 10^5, depending on the size of the problem. This number
is likely
and the northern and southern hemispheres. The image shows the reduction in
load imbalance that can be achieved with this technique; this reduction should
be weighed against the resulting increase in communication costs.
Chapter 4
CONTENTS
4.1. History.............................................................................................. 53
4.2. Parallel Computing ........................................................................... 53
4.3. Background ...................................................................................... 54
4.4. Types Of Parallelism .......................................................................... 59
4.5. Hardware ......................................................................................... 61
4.6. Applications ..................................................................................... 67
4.7. History.............................................................................................. 68
Parallel systems are those that have the ability to perform multiple
operations simultaneously. Generally, these systems handle large amounts of
information, on the order of terabytes, and can process hundreds of requests
per second. Parallel systems are composed of several subsystems sharing
information, resources, and memory in some way. Parallel systems are
multiprocessor systems with more than one processor and communication between
them; in a strongly coupled system, the processors share memory and a clock,
and communication is usually done through the shared memory. The advantages
include gains in reliability (Figure 4.1).
Figure 4.1: The Cray-2 supercomputer, the fastest in the world from 1985 to 1989.
Parallel computing is a programming technique in which many instructions are
executed simultaneously. It is based on the principle that large problems can
be divided into smaller parts that can be solved concurrently ("in parallel").
There are several types of parallel computing: bit-level parallelism,
instruction-level parallelism, data parallelism, and task parallelism. For
many years, parallel computing has been applied in high-performance computing
(HPC), but interest in it has increased in recent years due to the physical
constraints preventing further frequency scaling. Parallel computing has
become the dominant paradigm in computer architecture, mainly in the form of
multicore processors. Recently, however, the power consumption of parallel
computers has become a concern.
Parallel computers can be classified according to the level of parallelism
that their hardware supports: multicore and multiprocessor computers have
multiple processing elements within a single machine, while clusters, MPPs,
and grids use multiple computers to work on the same task.
Parallel computer programs are more difficult to write than sequential ones
because concurrency introduces new types of software errors. Communication
and synchronization between the different subtasks are typically the greatest
obstacles to achieving good parallel program performance.
4.1. HISTORY
Software has traditionally been written for serial computation. To solve a
problem, an algorithm is constructed and implemented as a serial stream of
instructions. These instructions are executed on the central processing unit
of a computer. Only when one instruction is completed does the next one run.
Parallel computing uses multiple processing elements simultaneously to
solve a problem. This is achieved by dividing the problem into independent
parts so that each processing element can execute its part of the algorithm
at the same time as everyone else. Processing elements can be diverse and
include resources such as a single computer with many processors, multiple
networked computers, specialized hardware or a combination thereof.
4.3. BACKGROUND
Traditionally, software has been written for serial computation. To solve
a problem, an algorithm is constructed that produces a serial stream of
instructions. These instructions are executed on the central processing unit
of a computer. Only one instruction may execute at a time; after that
instruction finishes, the next one executes.
Parallel computing, on the other hand, uses multiple processing elements
simultaneously to solve a problem. This is accomplished by breaking the
problem into separate parts so that each processing element can execute its
part of the algorithm simultaneously. Processing elements can be diverse
and include resources such as a single computer with multiple processors,
multiple networked computers, specialized hardware, or any combination
of the above.
Frequency scaling was the dominant reason for improvements in
computer performance from the mid-eighties until 2004. The runtime of a
program is equal to the number of instructions multiplied by the average time
per instruction. Keeping everything constant, increasing the clock frequency
decreases the average time it takes to execute an instruction. An increase in
frequency thus decreases runtime for all compute-bound programs.
However, the power consumed by a chip is given by the equation P = C × V² × F,
where P is power, C is the capacitance switched per clock cycle (proportional
to the number of transistors whose inputs change), V is voltage, and F is the
processor frequency (cycles per second). Increases in frequency therefore
increase the amount of energy used in a processor. Increasing processor power
consumption ultimately led to Intel's May 2004 cancellation of its Tejas and
Jayhawk processors, which is usually cited as the end of frequency scaling as
the dominant paradigm in computer architecture.
Moore’s Law is the empirical observation that transistor density in a
microprocessor doubles every 18 to 24 months. Despite power consumption
issues, and repeated predictions of its end, Moore’s Law is still in effect.
With the end of frequency scaling, these additional transistors (which are
no longer used for frequency scaling) can be used to add extra hardware for
parallel computing.
Parallel Computer Systems 55
4.3.2. Dependencies
Understanding data dependencies is essential to implementing parallel
algorithms. No program can run faster than the longest chain of dependent
calculations (known as the critical path), since calculations that depend on
previous calculations in the chain must be executed in order. However, most
algorithms do not consist of just one long chain of dependent calculations;
there are usually opportunities to execute independent calculations in parallel.
will often need to update a variable that is shared between them. The
instructions of the two threads may be interleaved in any order. For
example, consider the following program:

Thread A                        Thread B
1A: Read variable V             1B: Read variable V
2A: Add 1 to variable V         2B: Add 1 to variable V
3A: Write back to variable V    3B: Write back to variable V

If instruction 1B is executed between 1A and 3A, or if instruction 1A is
executed between 1B and 3B, the program will produce incorrect data. This
is known as a race condition. The programmer must use a lock to provide
mutual exclusion. A lock is a programming language construct that allows one
thread to take control of a variable and prevent other threads from reading or
writing it until that variable is unlocked. The thread holding the lock is
free to execute its critical section (the section of a program that requires
exclusive access to some variable) and to unlock the variable when it is
finished. Therefore, to guarantee correct program execution, the program above
can be rewritten to use locks:

Thread A                        Thread B
1A: Lock variable V             1B: Lock variable V
2A: Read variable V             2B: Read variable V
3A: Add 1 to variable V         3B: Add 1 to variable V
4A: Write back to variable V    4B: Write back to variable V
5A: Unlock variable V           5B: Unlock variable V

Here, one thread will successfully lock variable V, while the other thread
will be locked out, unable to proceed until V is unlocked again. This
guarantees the correct execution of the program. Locks, while necessary to
ensure correct program execution, can greatly slow down a program.
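In Python, for example, the rewritten program corresponds to guarding the shared variable with a threading.Lock. The sketch below is a minimal illustration; the iteration counts are arbitrary.

    import threading

    v = 0
    v_lock = threading.Lock()

    def increment(times):
        global v
        for _ in range(times):
            with v_lock:        # lock variable V
                v += 1          # read, add 1, and write back as one critical section
            # the lock is released automatically when the with-block exits

    a = threading.Thread(target=increment, args=(100000,))
    b = threading.Thread(target=increment, args=(100000,))
    a.start(); b.start()
    a.join(); b.join()
    print(v)                    # always 200000 with the lock in place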
Locking multiple variables with non-atomic locks introduces the possibility
of deadlock. An atomic lock locks multiple variables all at once; if it cannot
lock all of them, it locks none of them. If two threads each need to lock the
same two variables using non-atomic locks, it is possible that one thread will
lock the first variable and the second thread will lock the second variable.
In this case, neither thread can finish, and the result is deadlock.
Many parallel programs require that their subtasks act in synchrony.
This requires the use of a barrier. Barriers are typically implemented using
a software lock. One class of algorithms, known as lock-free and wait-free
algorithms, avoids the use of locks and barriers altogether. However, this
approach is generally difficult to implement and requires data structures
physically put the ideas of dataflow theory into practice. Beginning in the
late 1970s, process calculi such as the Calculus of Communicating Systems and
Communicating Sequential Processes were developed to permit algebraic
reasoning about systems composed of interacting components. More recent
additions to the family of process calculi, such as the π-calculus, have added
the capability for reasoning about dynamic topologies. Logics such as
Lamport's TLA+, and mathematical models such as traces and actor event
diagrams, have also been developed to describe the behavior of concurrent systems.
4.5. HARDWARE
4.6. APPLICATIONS
As parallel computers become larger and faster, it becomes feasible to
solve problems that previously took too long to run. Parallel computing
is used in a wide range of fields, from bioinformatics (protein folding) to
economics (simulation in mathematical finance). Common types of
problems found in parallel computing applications are:
● Dense linear algebra;
● Sparse linear algebra;
● Spectral methods (e.g., Cooley-Tukey Fast Fourier transform);
● N-body problems (for example Simulation of Barnes-Hut);
● Structured grid problems (e.g., lattice Boltzmann methods);
4.7. HISTORY
The origins of true (MIMD) parallelism go back to Federico Luigi, Conte
Menabrea, and his "Sketch of the Analytical Engine Invented by Charles
Babbage." In 1954, IBM introduced the 704, through a project in which Gene
Amdahl was one of the principal architects. It became the first commercially
available computer to use fully automatic floating-point arithmetic commands.
In 1958, IBM researchers John Cocke and Daniel Slotnick discussed the use
of parallelism in numerical calculations for the first time.
Burroughs Corporation introduced in 1962 a four-processor computer that had
access to 16 memory modules through a crossbar switch. In 1967, Amdahl and
Slotnick published a debate about the feasibility of parallel processing
at the American Federation of Information Processing Societies Conference. It
was during this debate that Amdahl's Law was coined, to define the limit of
speedup due to parallelism.
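Amdahl's Law bounds the achievable speedup by the serial fraction of a program. The small sketch below evaluates the standard formula; the 90% parallel fraction is just an example.

    def amdahl_speedup(parallel_fraction, processors):
        """Maximum speedup when only parallel_fraction of the work can be parallelized."""
        serial = 1.0 - parallel_fraction
        return 1.0 / (serial + parallel_fraction / processors)

    for n in (2, 4, 16, 1024):
        print(n, round(amdahl_speedup(0.9, n), 2))   # tends toward 1/0.1 = 10x as n grows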
In 1969, Honeywell introduced its first Multics system, a symmetric
multiprocessor system capable of running up to eight processors in parallel.
C.mmp, a multiprocessor project of the 1970s at Carnegie Mellon University,
was "among the first multiprocessors with more than a few processors." "The
first bus-connected multiprocessor with snooping caches was the Synapse N+1 in
1984."
SIMD parallel computers can be traced back to the 1970s. The motivation
behind early SIMD computers was to amortize the gate delay of the processor's
control unit over multiple instructions. In 1964, Slotnick had proposed
building a massively parallel computer for the Lawrence Livermore National
Laboratory. His design was funded by the US Air Force, which was the earliest
SIMD parallel-
computing effort, the ILLIAC IV. The key to its design was
a fairly high degree of parallelism, with up to 256 processors, which allowed
the machine to work on large datasets in what would later be known as vector
processing.
However, the ILLIAC IV was called "the most infamous of supercomputers"
because the project was only one-quarter completed, yet it took 11 years and
cost nearly four times the original estimate. When it was finally ready to run
its first real application in 1976, it was surpassed by existing commercial
supercomputers such as the Cray-1.
Chapter 5
CONTENTS
5.1. Web Compatibility Tests .................................................................... 72
5.2. Proposed Technique .......................................................................... 73
5.3. Results ............................................................................................. 76
5.4. Conclusion ....................................................................................... 76
2. Get the full-screen dimension using the Java Toolkit API; and
3. Generate the screenshot through the Java Robot API.
For the compatibility testing section, an HTML report has been developed
whose contents are accessed by clicking on each fault found in the
comparison.
5.3. RESULTS
This technique has been implemented in a real continuous-development
environment, and the results demonstrate its efficiency. On the one hand,
the implementation of the algorithm was a very simple task for the
development team to carry out. The technique has reduced the execution time of
the compatibility tests, and therefore of the whole test stage, by 92%.
A conventional compatibility test performed by a team member takes
approximately 10 minutes, depending on the page. A compatibility test on
the same page with the proposed technique takes approximately 1 minute, the
time required to observe the heat map with the results. A decrease in the
total time of the release process of each version of the site is observed
from the implementation of the proposed technique onward.
5.4. CONCLUSION
The proposed technique is an initiative to automate compatibility testing
through an image comparison algorithm. To validate it, it has been implemented
in a large company that works with continuous software development. The
results show that the tool accelerates testing through the automation of
web compatibility tests.
On the one hand, test execution times decreased by 82%, and this also
reduced the total time of the release process for each version of the site. In
addition, it generates reports in an HTML format that is very easy to understand.
With the implementation of this approach, a comprehensive visualization
of components in search of incompatibilities across the different browsers is
obtained. In this way, manual tasks are reduced simply to observing the
reports. Likewise, it allows detecting more defects that might be missed when
the tests are performed manually.
However, there were drawbacks in comparing sites that contain varying
advertisements and/or pop-up elements, producing false positives in the tests.
As future work it is proposed, first, to improve the implemented algorithm
in order to avoid comparisons on elements that are not part of the site (such
as ads). Second, virtualization tools (such as Docker) will be used to run the
functional tests. Finally, techniques will be implemented to avoid the work
required to observe the results, so as to make the process fully automated.
Also, the algorithm should be improved so that it does not raise alarms for
scenarios that are merely small rendering differences between browsers.
Chapter 6
Theoretical Framework
CONTENTS
6.1. Definition of Process......................................................................... 80
6.2. Analysis of Key Processes.................................................................. 84
6.3. Review Process ................................................................................. 88
6.4. Statistical Tools ................................................................................. 95
6.5. Methodological Framework .............................................................. 99
6.1.3. Documented
Some of the aspects that should include documentation of a process are as
follows:
a. Process flow diagram including possible interactions with other
processes.
b. Performance measures for the different phases of the process
(usually abbreviated PPM, for Process Performance Measures).
c. Process owner’s name.
d. Members of the management team process.
The narrative of the process steps must be clear, concise, operational, and
easy to communicate, so that it is useful for training and analysis.
Besides the flowchart, the use of checklists, performance criteria, and a
classification of the inputs and outputs of the process is also useful.2
6.1.4. Measured
The process must be measured to know its level of performance relative
to the expectations of its internal or external customers, so that we can act
accordingly.
The performance measures of a process, or PPMs, should be a clear indicator
of its health. Such measures have to be few and very representative
of the "health" of the process. They should be an indicator of the value
added, both to business operations and to customer satisfaction.
It is also important to establish a hierarchy among the metrics used
throughout the process, so that, ultimately, we can ensure the satisfaction of
customer requirements.
6.1.6. Stakeholders/Clients/Users
To map the processes of an organization, we should identify its stakeholders,
clients, and users.
• Interest Group: all those who have an interest in an organization,
its activities, and achievements. These can include customers,
partners, employees, shareholders, owners, management, and
legislators4.
• Customer/user: Person who regularly uses the services of a
professional or business
To identify stakeholders/customers/users, the Unit/Service must hold a
working session starting with an initial brainstorm. Then, for each of the
possible groups of clients and users, a discussion is held in which we try to
clearly identify each group and relate it to the following points: services
required, needs they would have from the point of view of the Unit or based
on previously conducted studies and analyses, expectations that customers
might have about our current or future services, etc.
6.2.1. Flowchart
The flowchart is one of the most widespread processes analysis tools. The
graphical view of a process facilitates a comprehensive understanding of
it and detection of improvement points. The flow diagram is the graphical
representation of the process. There is an extensive bibliography and
standards for the preparation of flowcharts. However, it is advisable to use
some very simple concepts that are easily assimilated by all components of
the unit or service. Once developed the flowchart, it can be used to identify
opportunities for improvement or simple adjustments and, on it, perform
process optimization. The flow chart is used in these cases to visualize the
sequence of changes to run.
The flowchart should be drawn up while the process is being described; this
facilitates the work of the committee and the understanding of the process.
It should start by setting the starting and ending points of the process.
Subsequently, they identify and classify the different activities that make up
the process, the interrelationship between all the areas of decision making,
etc. This whole network is represented by the predefined symbols according
to the type of diagram.
Flowcharts use a number of predefined symbols to represent the flow of
operations with their relationships and dependencies. The format
of the flowchart is not fixed; there are different types employing different
symbologies. An important aspect before making the flowchart is to establish
how deep the description of activities is intended to go, always trying to
maintain the same level of detail.
9 Time and motion study, Fred Meyers, Pearson Education Mexico (2000).
officials who fight for their individual advancement and that of their areas
by creating useless jobs and rigid, incomprehensible rules.
Bureaucracy generates excessive paperwork in the office. Managers spend
between 40% and 50% of their time writing and reading work-related material;
60% of all administrative work time is used in activities such as reviewing,
filing, locating, and selecting information, while only 40% is spent
on important tasks related to the process.
The downsides of bureaucracy are innumerable, so we must evaluate delays and
minimize or eliminate them. Excessive bureaucracy can be identified
by asking the following questions:
a. Are unnecessary checks and balances made?
b. Does the activity inspect or approve someone else's work?
c. Is more than one signature required?
d. Are multiple copies needed?
e. Are copies stored without any apparent reason?
f. Are copies sent to people who do not need the information?
g. Are there individuals or entities that hinder the effectiveness and
efficiency of the process?
h. Is unnecessary correspondence written?
i. Do existing organizational procedures regularly prevent the effective,
efficient, and timely execution of tasks?
j. Does someone approve what is already approved?
Management must lead an attack against the excessive bureaucracy that
has infiltrated the systems that control an entity. Many activities do not
contribute to the content of the process output. These exist for information
and protective purposes only, and every effort should be made to minimize
them.14
The attack against bureaucracy must begin with guidelines informing
management and employees that the company will not tolerate unnecessary
bureaucracy; that each approval signature and each review activity must be
financially justified; that total cycle time reduction is a key purpose of the
enterprise; and that any non-value-added activity that slows the process
will be targeted for elimination.
6.3.1.4. Simplification
Increasing complexity creates increasing difficulties for everyone involved,
as the activities, decisions, relationships, and essential information
6.3.1.9. Standardization
You should choose an easy way to perform an activity, ensuring that everyone
involved in the process carries it out the same way every time.
This standardization is very important because it ensures that all
current and future workers use best practices related to successful
procedures, which should:
• Be realistic;
• Define responsibilities;
• Establish limits of authority;
• Cover emergencies;
• Not be open to different interpretations;
• Be easy to understand.
These procedures often include a flow chart and instructions.
• New concepts;
• New view of the process;
• New factors for success;
• Development of new options;
• Overcoming organizational barriers.
6.4.1. Histogram
Histograms are a statistical tool that allows a variable to be graphed using
bars. The values on the vertical axis represent the frequencies of the
values shown on the horizontal axis. For this, classes are set up, which are
simply the intervals into which the observations fall; thus we have:
The base of each rectangle represents the interval width, and its height is
determined by the frequency (Figure 6.2).20
20 Histogram – http://www.ucv.cl/web/estadistica/histogr.htm.
Figure 6.2: Example histogram: frequencies (vertical axis) of the values of C1 (horizontal axis), grouped into class intervals.
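A histogram like the one in Figure 6.2 can be reproduced with a few lines of Python using matplotlib; the data below are randomly generated purely for illustration.

    import numpy as np
    import matplotlib.pyplot as plt

    data = np.random.normal(loc=0.0, scale=4.5, size=200)   # illustrative sample

    # Bars: class intervals on the x-axis, frequencies on the y-axis.
    plt.hist(data, bins=10, edgecolor="black")
    plt.xlabel("C1")
    plt.ylabel("Frequency")
    plt.title("Example histogram")
    plt.show()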
Example Pareto chart of C1 (counts, percentages, and cumulative percentage):

Category (C1)     Cutting   Polishing   Gluing   Lacquering   Other
Count (C2)        60        45          20       11           6
Percentage        42.3      31.7        14.1     7.7          4.2
Cumulative %      42.3      73.9        88.0     95.8         100.0
Example X-bar control chart of C5: sample means for 10 samples, with center line X̄ = 6.120, upper control limit UCL = 6.792, and lower control limit LCL = 5.447.
6.4.4. Correlation
In probability and statistics, correlation indicates the strength and direction
of a linear relationship and proportionality between two statistical variables.
Two quantitative variables are considered correlated when the values
of one vary systematically with respect to the corresponding values of the
other: if we have two variables (A and B), there is correlation if increasing
values of A are accompanied by increasing values of B, and vice versa. The
correlation between two variables does not, by itself, imply any causal
relationship.
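Numerically, the (Pearson) correlation between two variables can be computed directly; the short NumPy sketch below uses invented sample values.

    import numpy as np

    a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    b = np.array([2.1, 3.9, 6.2, 8.1, 9.8])      # roughly increases with a

    r = np.corrcoef(a, b)[0, 1]                  # Pearson correlation coefficient
    print(r)                                     # close to +1: strong positive linear relationship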
Since in many cases the times are large, the time series are shown in minutes
or hours, depending on the case.
Once the information regarding time was collected, we proceeded to gather
data on the costs of each of the activities in order to establish approximate
costs versus the cycle time of each of the processes, resulting in a
Cycle Time vs. Cost diagram. The costs of these activities contribute to the
total cost of the service, and these activities provide value to the
organization for the provision of the full service.
Additionally, each process activity was analyzed in order to establish
whether or not it provides value to the customer or to the organization, or
whether in fact it does not add any value to the delivery of the service.
Once the coded chart of the processes was obtained, the causes, effects, and
solutions for each of the problems encountered were reviewed so as to
improve the productivity of the company, following the steps of the
Harrington methodology.
Finally, as part of the model, the proposed flow diagrams to be followed for
the execution of each of the processes were established.
Chapter 7
Modular Programming
CONTENTS
7.1. Programs And Sentences ................................................ 102
7.2. Modularity Linguistics..................................................................... 102
7.3. Normal And Pathological Connections ........................................... 103
7.4. How To Achieve Minimum Cost Systems ........................................ 103
7.5. Complexity In Human Terms........................................................... 104
7.6. Cohesion ........................................................................................ 116
The sentences of a program appear in the order in which they will enter the
compiler. This order is known as the lexicographical order of a program. For
our study, the term lexicographical will always mean "as written," that is,
the order in which the sentences appear in a compiler listing of the program.
Returning to the example, we say that sentence C is lexicographically after A2.
It is important to note that the lexicographical order almost always does not
correspond to the order in which the sentences are executed.
One purpose of the boundary elements (A1 and A2 in the example) is
to delimit the extent within which identifiers and their associated objects
(variables) are defined.
We are now able to define the term program module, or simply module:
a module is a lexicographically contiguous sequence of sentences,
enclosed between boundary elements, and having an aggregate identifier.
In other words, a module is a group of contiguous sentences having a
single identifier by which they are referenced.
This general definition covers the particular implementations of specific
languages, such as "paragraphs" and "sections" in COBOL, "functions" in C,
"procedures" in Pascal, etc.
A programming language includes a certain type of module that can
be executed only if the specific linguistic constructions for defining and
activating these modules are used.
carry out all tasks and keep all items in mind at once. Successful design is
based on an old principle: divide and conquer. Specifically, we will say that
the cost of implementing a computer system can be minimized when it can be
separated into parts that are:
• Manageably small;
• Solvable separately.
Of course, the interpretation of "manageably small" varies from person to
person. On the other hand, many attempts to partition systems into very small
parts have produced systems with increased implementation times, primarily
because the second point was not satisfied: the parts could not be solved
separately.
Similarly, we can say that maintenance costs are minimized when the parts
of a system are:
• Easily relatable to the application;
• Manageably small;
• Correctable separately.
Often the person making the modification is not the one who designed the
system.
It is important that the parts of a system are manageably small in order
to simplify maintenance. The work needed to find and correct an error in a
"piece" of 1,000 lines of code is far greater than with pieces of 20 lines.
Not only is the time to locate the fault reduced, but also, if the change is
very cumbersome, the piece can be completely rewritten. This is the concept of
the "disposable module," which has been used successfully many times.
Moreover, to minimize maintenance costs, we must ensure that each module is
independent of the others. In other words, we must be able to make changes to
module A without introducing unwanted effects in module B.
read. If we instead break the expression down into smaller parts, we have a
greater number of linear elements and a reduction in nesting. The resulting
sequence of expressions is easier to read:
• A = diff(Y1, Y0); B = diff(X1, X0)
• A2 = square(A); B2 = square(B)
• DISTANCE = sqrt(sum(A2, B2))
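The same decomposition can be written as small, separately understandable modules. The Python sketch below simply mirrors the fragments above; the function names are taken from them.

    import math

    def diff(a, b):
        return a - b

    def square(x):
        return x * x

    def distance(x0, y0, x1, y1):
        # Each step is a small, separately readable module, mirroring the sequence above.
        a = diff(y1, y0)
        b = diff(x1, x0)
        return math.sqrt(square(a) + square(b))

    print(distance(0, 0, 3, 4))   # 5.0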
Many aspects of the modularization can be understood only if modules
are discussed in relation to others. In principle, we see the concept of
independence. We say that two modules are completely independent if both
can work completely without presence of the other. This implies that there
are no interconnections between the modules, and has a zero value that on
the scale of “dependency.”
We see generally the greater the number of interconnections between
two modules, have less independence.
The concept of functional independence is a direct derivation of
modularity and concepts of abstraction and information hiding.
The question here is: how much do we need to know about a module to
understand another module? The more we should know about the module B
in order to understand the module A, it is less independent of B.
The sheer number of connections between modules is not a complete
measure of functional independence. Functional independence is measured
with two qualitative criteria: coupling and cohesion. We study in principle
the first one.
Highly "coupled" modules are joined by strong interconnections; loosely
coupled modules have few, weak interconnections; and "uncoupled" modules
have no interconnections between them and are independent.
Clearly, the overall cost of the system will be strongly influenced by the
degree of coupling between modules.
Using multiple entry points guarantees that the system has more than the
minimum number of interconnections. Still, if each entry point determines a
function with minimal connection to other modules, the system behaves much
like a minimally interconnected one.
However, the presence of multiple entry points to the same module may
be an indication that the module is performing more than one specific function.
It also tempts the programmer to partially overlap the code of the actions
included within the same module, coupling them by content.
Similarly, alternate return points are often useful within the spirit of
normally connected systems. This occurs when a module must continue execution
at a point that depends on the value resulting from a decision made by a
previously invoked subordinate module. With minimal connection, the subordinate
module returns the value as a parameter, which must then be tested again in the
upper module. However, the upper module may instead indicate directly, by some
means, the point at which execution should continue (a relative displacement,
+ or −, from the calling instruction, or a parameter holding an explicit
address).
We say that there is control coupling when the upper module communicates
to the subordinate module information that governs its execution. This
information can be passed as data used as signals or "flags," or as
memory addresses for conditional jump instructions (branch addresses).
These control elements are "disguised" as data.
Data coupling is minimal, and no system can function without it.
Data communication is necessary for system operation; control communication,
however, is an undesirable and dispensable characteristic, although it occurs
very frequently in programs.
It can be minimized if only data is transmitted through the system
interfaces.
Control coupling includes every form of connection that communicates
control elements. This not only involves transfers of control (addresses
or flags), but may also involve passing data that changes, regulates, or
synchronizes the execution of another module.
This form of indirect or secondary control coupling is known as
coordination. Coordination involves one module in the procedural context
of another. This can be understood through the following example: suppose
module A calls module B and supplies it with discrete data elements. The
function of module B is to group the data items into a compound item for
module A (the upper module). Module B will send module A signals or flags
indicating that it needs another elementary item, or indicating that it is
returning the compound item. These flags are used within module A to
coordinate B and drive its operation as required.
When a module modifies the procedural content of another module, we
say that there is hybrid coupling. Hybrid coupling is the modification, from
one module, of sentences belonging to another. In this case, for the destination
(modified) module the coupling is seen as control, while for the calling
(modifying) module it is considered data.
The degree of interdependence between two modules connected by hybrid
coupling is very strong. Fortunately, it is a practice in decline, reserved
almost exclusively to assembler programmers.
Alternatives:
1. We write the literal "72" in all printing routines of all programs
(bound at writing time).
2. We replace the literal by the manifest constant LONG_PAG, to which
we assign the value "72" in every program (bound at compile time);
see the sketch after this list.
3. We place the constant LONG_PAG in an external file included by the
programs (bound at compile time).
4. Our language does not allow the declaration of constants, so we
define a global variable LONG_PAG to which we assign the
initialization value "72" (bound at link-editing time).
5. We define a system parameter file with a LONG_PAG field to which
the value "72" is assigned. This value is read, along with the other
parameters, when the system starts (bound at runtime).
6. We define in the parameter file one record per terminal in the system
and customize the value of the LONG_PAG field according to the printer
attached to each terminal. Thus terminals with 12-inch printers print
72 lines per page, while a terminal using an inkjet printer with
legal-size paper prints 80 (bound at runtime).
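As a minimal sketch of the difference in binding time (the file name params.txt and its format are assumptions made here for illustration, not part of the book), alternative 2 fixes the value when the program is compiled, while alternative 5 defers the decision to run time:

/* Alternative 2: bound at compile time */
#define LONG_PAG 72

/* Alternative 5: bound at run time, read from a parameter file */
#include <stdio.h>

static int long_pag = 72;                 /* default if the file is missing */

static void load_parameters(void) {
    FILE *f = fopen("params.txt", "r");   /* hypothetical parameter file */
    if (f != NULL) {
        int value;
        if (fscanf(f, "%d", &value) == 1)
            long_pag = value;
        fclose(f);
    }
}

int main(void) {
    load_parameters();
    printf("lines per page: %d\n", long_pag);
    return 0;
}

Changing the page length in the first case requires recompiling every program; in the second case it only requires editing the parameter file.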
We now consider the relationship between binding time and intermodule
connections, as it affects the degree of coupling between modules.
Again, an intermodule reference that is fixed to a specific referent or object
at definition time produces a stronger link than one fixed at translation time
or even later.
The possibility of compiling a module separately from the rest of the system
facilitates maintenance and modification, compared with having to compile all
the modules together. Similarly, if link-editing of the module is deferred until
just before execution, the implementation of changes is simplified.
There is a particular form of coupling derived from the lexicographic
structure of the program modules. In this case we speak of content coupling.
Two forms of content coupling can be distinguished:
• Lexicographic inclusion: occurs when one module is lexicographically
included in another; it is a minor form of coupling. Generally, the modules
cannot be executed separately. This is the case in which the subordinate
module is activated inline within the context of the upper module.
• Partial overlap: an extreme case of content coupling, in which part of one
module writes the data of another module, for example through misuse of
pointers or explicit memory manipulation to modify its variables.
7.5.5. Decoupling
Decoupling is any method or systematic technique for making program modules
more independent.
Each type of coupling generally suggests a decoupling method. For example,
the coupling caused by binding can be removed by parameterizing the appropriate
values, as seen in the example of the line counter in the printing programs.
Decoupling from the functional point of view can rarely be done except in
the early design phase.
As a general rule, the most effective approach is a design discipline that
favors input-output coupling and control coupling over content coupling and
hybrid coupling, and that seeks to limit the scope of common-environment
coupling.
7.6. COHESION
We can have a module that reads control parameters from a console card
reader, erroneous transaction records from a file on tape, valid transaction
records from another file on tape, and the previous master records from a file
on disk. This module, which could be called "readings" and which groups all
the input operations, is logically cohesive.
Logical cohesion is stronger than coincidental cohesion, because it represents
at least a minimum of association between the problem and the elements of the
module. However, we can see that a logically cohesive module does not perform
one specific function, but rather bundles a number of functions.
In the above diagram, we can see that processing elements 1, 2, and 3 are
associated by communication on the current input data, while 2, 3, and 4 are
linked by the output data.
The data flow diagram (DFD) is an objective means to determine whether
the elements of a module are associated by communication.
Communication relationships give an acceptable degree of cohesion.
Communicational cohesion is common in commercial applications.
Typical examples include:
• A module that prints or records a transaction file.
• A module that receives data from different sources and transforms
and assembles them into a print line.
In practical terms, we can say that functional cohesion is that which is not
sequential, communicational, procedural, temporal, logical, or coincidental.
The clearest and most understandable examples come from the field of
mathematics. A module that computes a square root is certainly highly
cohesive, and probably fully functional. It is unlikely to contain anything
superfluous beyond what is absolutely essential for the mathematical function,
and it is unlikely that processing elements could be added without somehow
altering the calculation.
In contrast, a module that computes both the square root and the cosine is
unlikely to be fully functional (two distinct functions would be performed).
• 7: Communicational;
• 9: Sequential;
• 10: Functional.
In any case, this is not a fixed rule, but a guideline.
The designer's obligation is to understand the effects of variations in
cohesion, especially in terms of modularity, in order to make trade-off
decisions that favor one aspect over another.
Chapter 8
Recursive Programming
CONTENTS
8.1. Classification Of Recursive Functions ............................................. 126
8.2. Design Recursive Functions ............................................................ 127
8.3. Bubble Method ............................................................................... 138
8.4. Sorting By Direct Selection ............................................................. 140
8.5. Method Binary Insertion ................................................................. 141
8.6. Method Quicksort (Quicksort)......................................................... 141
8.7. Mixing Method (Merge Sort) ........................................................... 143
8.8. Sequential Search ........................................................................... 146
8.9. Binary Search (Binary Search) ......................................................... 147
8.10. Seeking The Maximum And Minimum .......................................... 149
8.11. Greedy Method ............................................................................ 152
8.12. Optimal Storage On Tape (Optimal Storage On Tapes) .................. 152
8.13. The Knapsack Problem .................................................................. 153
8.14. Single Source Shortest Paths (Shortest Route From an Origin) ........ 156
The programming area is very large and full of details. In the C programming
language, as in other programming languages, a technique called recursion can
be applied. Memory allocation can be static or dynamic, and at any given time
a combination of the two can be employed.
Recursion is a technique in which a problem is solved by replacing it with a
problem of the same form but simpler.
For example, the definition of the factorial for N >= 0.
Another element is the ability to check and verify that the solution is
correct (mathematical induction).
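A minimal sketch of that recursive definition in C (written for this edition; the book's own formula and listing are not reproduced in this extract):

/* factorial(n) = 1 if n == 0, and n * factorial(n - 1) otherwise, for n >= 0 */
unsigned long factorial(unsigned int n) {
    if (n == 0)
        return 1;                     /* base case */
    return n * factorial(n - 1);      /* the same problem, but simpler */
}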
Generally, recursive solutions are less efficient in time and space than their
iterative versions, because of the subroutine calls and the dynamic variables
created in the recursive calls.
GCD(M, N) = M if N = 0
GCD(M, N) = GCD(N, M mod N) if N > 0
Draw the recursion tree for GCD(15, 4) and GCD(15, 3).
5. Write a program that prints a word backwards (whatever its size)
without using arrays or dynamic memory.
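As a minimal sketch of the recursive definition given above (not the book's own listing), Euclid's GCD in C:

/* greatest common divisor, following the recursive definition above */
int gcd(int m, int n) {
    if (n == 0)
        return m;              /* base case */
    return gcd(n, m % n);      /* GCD(N, M mod N) when N > 0 */
}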
Two ideas changed the world. In 1448, in the city of Mainz, a goldsmith named
Johann Gutenberg discovered a way to print books by putting together movable
metal type. At that moment the Dark Ages began to dissipate and the human
intellect was freed; science and technology triumphed, and the seed was sown
that led to the Industrial Revolution. Several historians suggest that we owe
this to typography. Imagine a world in which only an elite could read these
lines. Others, however, insist that the key development was not typography;
it was algorithmics (Figure 8.1).
The execution time of the algorithm grows as fast as the Fibonacci numbers
themselves: T(n) is exponential in n, which implies that the algorithm is
impractical except for very small values of n.
A demonstration of the complexity of the algorithm can be seen as follows.
Consider a naive approach to calculating the complexity of the recursive
Fibonacci function. Let S(n) be the number of additions needed to find F(n).
For the first values we have:
S(1) = S(2) = 0, S(3) = 1, S(4) = 2, S(5) = 4, S(6) = 7.
In general, by induction, the number of additions needed to compute F(n) is
S(n) = S(n − 1) + S(n − 2) + 1.
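For reference, the naive recursive routine being analyzed can be sketched in C as follows (written for this edition, not reproduced from the book):

/* naive recursive Fibonacci: each internal call performs one addition,
   so the number of additions satisfies S(n) = S(n - 1) + S(n - 2) + 1 */
long fib(int n) {
    if (n <= 2)
        return 1;                      /* F(1) = F(2) = 1 */
    return fib(n - 1) + fib(n - 2);
}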
fib(200) would take at least 2^92 seconds. This means that if we started
counting today, we would still be working long after the sun becomes a red
giant star.
But technology keeps improving: computing power roughly doubles every
18 months, a phenomenon called Moore's law. With this extraordinary growth,
the fib function will possibly run much faster next year. Checking the data,
the running time of fib(n) is roughly proportional to 2^(0.694n) ≈ (1.6)^n,
so it takes about 1.6 times longer to compute F(n + 1) than F(n). Under
Moore's law, computing power grows by about a factor of 1.6 each year. So if
we can reasonably compute F(100) with today's technology, next year we will
be able to compute F(101), the year after F(102), and so on: only one more
Fibonacci number per year. Such is the behavior of exponential time.
How long does the iterative algorithm take? The loop consists of a single
step and is executed n − 1 times, so the algorithm is linear in n. From
exponential time we have moved to polynomial time: great progress in running
time. It is now reasonable to compute F(200), or even F(200,000).
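A minimal sketch of that linear-time loop in C (an illustration added here; with fixed-width integers a value such as F(200) overflows, so arbitrary-precision arithmetic would be needed in practice):

/* iterative Fibonacci: the single-statement loop runs n - 1 times,
   so the running time is linear in n */
long fib_linear(int n) {
    long prev = 0, curr = 1;           /* F(0), F(1) */
    for (int i = 1; i < n; i++) {
        long next = prev + curr;
        prev = curr;
        curr = next;
    }
    return curr;                       /* F(n), for n >= 1 */
}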
When analyzing an algorithm, you must first determine which operations are
employed and what they cost.
Some operations can be bounded by a constant time; for example, a comparison
between two characters can be done in fixed time.
Comparing strings, on the other hand, depends on the length of the strings
(the time can still be bounded).
To determine the running time of an algorithm, both an a priori and an
a posteriori analysis can be used. In an a priori analysis, a function that
bounds the algorithm's running time is obtained; in an a posteriori analysis,
statistics are collected about the algorithm's behavior during execution.
f(n) = O(g(n)) if there exist two constants c and n0 such that
|f(n)| ≤ c|g(n)| for all n > n0.
When it is said that an algorithm has a computational time O(g(n)), it means
that if the algorithm is run on some computer on the same type of data but for
larger n, the time will be less than some constant times |g(n)|.
If A(n) = a_m n^m + ... + a_1 n + a_0, then A(n) = O(n^m),
where A(n) is the number of steps of the given algorithm.
If an algorithm consists of steps whose orders of magnitude are c_1 n^(m_1),
c_2 n^(m_2), c_3 n^(m_3), ..., c_k n^(m_k), then the order of the algorithm is
O(n^m), where m = max{m_i, 1 ≤ i ≤ k}.
As an example of orders of magnitude, suppose there are two algorithms for the
same task, one requiring O(n^2) operations and the other O(n log n). For
n = 1024, the first algorithm requires 1,048,576 operations and the second
10,240. If the computer performs each operation in one microsecond, the first
algorithm takes approximately 1.05 seconds and the second about 0.0102 seconds
for the same input.
The most common times are:
O(1) < O(log n) < O(n) < O(n log n) < O(n^2) < O(n^3) < O(2^n).
Note: the base of the logarithm is two (log_b N = log_a N / log_a b).
O(1): the number of basic operations is fixed, so the time is bounded by
a constant.
O(n), O(n^2), and O(n^3) are of polynomial type.
O(2^n) is exponential.
Algorithms with complexity greater than O(n log n) are often impractical.
An exponential algorithm is practical only for very small values of n.
Example (Table 8.1), assuming one operation per microsecond:

f(n)        n = 10      n = 20     n = 30     n = 40      n = 50       n = 70        n = 100
n           0.00001 s   0.00002 s  0.00003 s  0.00004 s   0.00005 s    0.00007 s     0.0001 s
n log n     0.00003 s   0.00008 s  0.00014 s  0.00021 s   0.00028 s    0.00049 s     0.0006 s
n^2         0.0001 s    0.0004 s   0.0009 s   0.0016 s    0.0025 s     0.0049 s      0.01 s
n^3         0.001 s     0.008 s    0.027 s    0.064 s     0.125 s      0.343 s       1 s
n^4         0.01 s      0.16 s     0.81 s     2.56 s      6.25 s       24 s          1.6 min
n^5         0.1 s       3.19 s     24.3 s     1.7 min     5.2 min      28 min        2.7 hours
n^(log n)   0.002 s     4.1 s      17.7 s     5 min 35 s  1 h 4 min    2.3 days      224 days
2^n         0.001 s     1.04 s     17 min     12 days     35.6 years   37 million years   2.6 million ages of the universe
3^n         0.059 s     58 min     6.5 years  385,495 years  22,735 million years  52,000 million years  10^18 million ages of the universe
In Figure 8.4 you can graphically observe the growth of the main
functions of temporal complexity.
The notation O (big O) is used to indicate an upper bound. A lower bound can
also be established:
f(n) = Ω(g(n)) iff there exist constants c and n0 such that
|f(n)| ≥ c|g(n)| for all n > n0. If f(n) = Ω(g(n)) and f(n) = O(g(n)), then:
f(n) = Θ(g(n)) iff there exist positive constants c1, c2, and n0 such that
for all n > n0,
c1|g(n)| ≤ |f(n)| ≤ c2|g(n)|.
This indicates that the worst and best cases take the same order of time.
Example: an algorithm that searches for the maximum of n unordered elements
always performs n − 1 iterations, and is therefore
Θ(n).
Searching for a given value in an array: O(n), Ω(1).
Exercises
1. Given the algorithm of the Towers of Hanoi shown in the chapter on
recursion, determine its order, assuming there are 63 rings.
2. Investigate the complexity of Ackermann's algorithm and compare it
with that of the Towers of Hanoi.
So that:
T(n) = n + 5n^2, so the complexity is O(n^2).
finds its correct position in the array. The algorithm is now shown in C:
#include <stdio.h>
#define N 10

void quicksort(int [], int, int);

int main() {
    int a[N] = {10, 8, 7, 2, 1, 3, 5, 4, 6, 9}, i;
    quicksort(a, 0, N - 1);
    for (i = 0; i < N; i++)
        printf("%d ", a[i]);
    putchar('\n');
    getchar();
    return 0;
}
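The quicksort routine itself does not appear in this extract; a minimal sketch consistent with the prototype above (written for this edition, using a simple Hoare-style partition, not the book's own listing) is:

void quicksort(int a[], int low, int high) {
    int i = low, j = high, t;
    int pivot = a[(low + high) / 2];
    while (i <= j) {                       /* partition around the pivot */
        while (a[i] < pivot) i++;
        while (a[j] > pivot) j--;
        if (i <= j) {
            t = a[i]; a[i] = a[j]; a[j] = t;
            i++; j--;
        }
    }
    if (low < j)  quicksort(a, low, j);    /* sort the left part  */
    if (i < high) quicksort(a, i, high);   /* sort the right part */
}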
#include <stdio.h>
#define N 10

void MergeSort(int [], int, int);
void merge(int [], int, int, int);

int main() {
    int i, a[N] = {9, 7, 10, 8, 2, 4, 6, 5, 1, 3};
    MergeSort(a, 0, 9);
    for (i = 0; i < N; i++)
        printf("%d ", a[i]);
    getchar();
    return 0;
}

void merge(int a[], int low, int mid, int high) {
    int b[N], h, i, j, k;
    h = low; i = low; j = mid + 1;
    while (h <= mid && j <= high) {        /* copy the smaller of the two heads */
        if (a[h] <= a[j]) {
            b[i] = a[h]; h++;
        } else {
            b[i] = a[j]; j++;
        }
        i++;
    }
    if (h > mid)                           /* copy whatever remains */
        for (k = j; k <= high; k++) { b[i] = a[k]; i++; }
    else
        for (k = h; k <= mid; k++) { b[i] = a[k]; i++; }
    for (k = low; k <= high; k++)
        a[k] = b[k];
}
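The MergeSort driver itself is not shown in this extract; a minimal sketch consistent with the prototype and with the merge routine above (an assumption added here, not the book's listing) is:

void MergeSort(int a[], int low, int high) {
    if (low < high) {
        int mid = (low + high) / 2;
        MergeSort(a, low, mid);        /* sort the left half   */
        MergeSort(a, mid + 1, high);   /* sort the right half  */
        merge(a, low, mid, high);      /* merge the two halves */
    }
}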
Consider the array of ten elements A = (310, 285, 179, 652, 351, 423, 861,
254, 450, 520). MergeSort starts by splitting it into two subarrays of size
five. The elements A(1:5) are in turn divided into subarrays of sizes three
and two. Then the elements A(1:3) are divided into two subarrays of sizes two
and one. The two values A(1:2) are divided into single-element subarrays and
merging begins. Up to this point no element has been moved. Pictorially the
array can be seen as follows:
(310 | 285 | 179 | 652 | 351 | 423, 861, 254, 450, 520)
Vertical bars show the boundaries of the subarrays. A(1) and A(2) are merged
to produce:
(285, 310 | 179 | 652 | 351 | 423, 861, 254, 450, 520)
Then A(3) is merged with A(1:2) to give:
(179, 285, 310 | 652 | 351 | 423, 861, 254, 450, 520)
Elements A(4) and A(5) are merged:
(179, 285, 310 | 351, 652 | 423, 861, 254, 450, 520)
Following the merge of A(1:3) and A(4:5) we have:
(179, 285, 310, 351, 652 | 423, 861, 254, 450, 520)
At this point the algorithm has returned to the first invocation of MergeSort
and the second recursive call is performed. The recursive calls on the right
subarray produce the following arrangements:
(179, 285, 310, 351, 652 | 423 | 861 | 254 | 450, 520)
A(6) and A(7) are merged, and then A(8) is merged with A(6:7) to give:
(179, 285, 310, 351, 652 | 254, 423, 861 | 450, 520)
#include <stdio.h>
#define N 10

int main() {
    int x, i = 0, a[N] = {9, 7, 10, 8, 2, 4, 6, 5, 1, 3};
    scanf("%d", &x);
    while (i < N && a[i] != x)
        i++;
    if (i > N - 1)
        printf("Data not found\n");
    else
        printf("The data is at position %d\n", i);
    getchar();
    getchar();
    return 0;
}
If there are two or more occurrences of the same value, the first one is found.
However, the algorithm can be modified to report all occurrences of the value
sought. A variant of this algorithm is presented next, using recursion instead
of iteration.
#include <stdio.h>
#define N 10

void sequential(int [], int, int, int);

int main() {
    int x, a[N] = {9, 7, 10, 8, 2, 4, 6, 5, 1, 3};
    scanf("%d", &x);
    sequential(a, N, x, 0);
    getchar();
    getchar();
    return 0;
}

void sequential(int a[], int n, int x, int i) {
    if (i > n - 1)
        printf("Data not located\n");
    else if (a[i] == x)
        printf("Data located at position %d\n", i);
    else
        sequential(a, n, x, i + 1);
}
#include <stdio.h>
#define N 10

int main() {
    int x, low, high, mid, j, a[] = {1, 2, 3, 5, 6, 7, 8, 9, 10, 13};
    j = -1;                        /* -1 means "not yet found" */
    low = 0;
    high = N - 1;
    scanf("%d", &x);
    while (low <= high) {
        mid = (low + high) / 2;
        if (x < a[mid])
            high = mid - 1;
        else if (x > a[mid])
            low = mid + 1;
        else {
            j = mid;
            break;
        }
    }
    if (j == -1)
        printf("Data not found\n");
    else
        printf("Data found at position %d\n", j);
    getchar();
    getchar();
    return 0;
}
printf("The maximum value is %d and the minimum value is %d\n", max, min);
    getchar();
}
For the straightforward algorithm, the best case occurs when the elements are
in increasing order, requiring n − 1 comparisons, while the worst case requires
2(n − 1) comparisons. The average is
[2(n − 1) + (n − 1)] / 2 = 3n/2 − 1.
Next, a recursive algorithm that finds the maximum and the minimum of a set of
elements using the divide-and-conquer strategy is shown. Besides the array,
the algorithm takes four parameters: the first two are passed by value and the
last two by reference. The second and third parameters indicate the subset to
be analyzed, and the last two parameters are used to return the minimum and
maximum of that subset. After the recursion, the minimum and maximum of the
given set are obtained. The algorithm in C:
#include <stdio.h>

void MaxMin(int [], int, int, int *, int *);
int max(int, int);
int min(int, int);

int main() {
    int fmax, fmin, a[] = {4, 2, 10, 5, -7, 9, 80, 6, 3, 1};
    MaxMin(a, 0, 9, &fmax, &fmin);
    printf("The maximum value is %d and the minimum value is %d\n", fmax, fmin);
    getchar();
    return 0;
}

void MaxMin(int a[], int i, int j, int *fmax, int *fmin) {
    int gmax, gmin, hmax, hmin, mid;
    if (i == j)                              /* one element   */
        *fmax = *fmin = a[i];
    else if (i == j - 1) {                   /* two elements  */
        if (a[i] < a[j]) {
            *fmax = a[j];
            *fmin = a[i];
        } else {
            *fmax = a[i];
            *fmin = a[j];
        }
    } else {                                 /* divide and conquer */
        mid = (i + j) / 2;
        MaxMin(a, i, mid, &gmax, &gmin);
        MaxMin(a, mid + 1, j, &hmax, &hmin);
        *fmax = max(gmax, hmax);
        *fmin = min(gmin, hmin);
    }
}

int max(int g, int h) { if (g > h) return g; return h; }
int min(int g, int h) { if (g < h) return g; return h; }
T(n) = T(⌊n/2⌋) + T(⌈n/2⌉) + 2   for n > 2
T(n) = 1                          for n = 2
T(n) = 0                          for n = 1
When n is a power of two:
T(n) = 2T(n/2) + 2 = 2(2T(n/4) + 2) + 2 = 4T(n/4) + 4 + 2 = 4(2T(n/8) + 2) + 4 + 2
= 8T(n/8) + 8 + 4 + 2 = 8(2T(n/16) + 2) + 8 + 4 + 2 = 16T(n/16) + 16 + 8 + 4 + 2
= ... = 3n/2 − 2.
Does this mean the recursive version is better in practice? Not necessarily.
In terms of storage it is worse, because it requires a stack to store i, j,
fmax, and fmin. Given n elements, log n + 1 levels of recursion are required,
and at each level 5 values plus a return address must be saved. MaxMin can
therefore be inefficient because of the way the stack and the recursion are
handled.
Example:
n = 3, (l1, l2, l3) = (5, 10, 3).
There are n! = 6 possible orderings:
1, 2, 3:  5 + (5 + 10) + (5 + 10 + 3) = 38
1, 3, 2:  5 + (5 + 3) + (5 + 3 + 10) = 31
2, 1, 3:  10 + (10 + 5) + (10 + 5 + 3) = 43
2, 3, 1:  10 + (10 + 3) + (10 + 3 + 5) = 41
3, 1, 2:  3 + (3 + 5) + (3 + 5 + 10) = 29
3, 2, 1:  3 + (3 + 10) + (3 + 10 + 5) = 34
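The ordering 3, 1, 2, which stores the shortest program first, is optimal; that observation is the basis of the greedy rule. A minimal sketch in C (written for this edition, not the book's listing): sort the program lengths in non-decreasing order and add up the retrieval times.

#include <stdio.h>

/* greedy optimal storage on tapes: store the programs in non-decreasing order
   of length and return the total retrieval time (the sum of the prefix sums) */
int optimal_storage(int len[], int n) {
    int i, j, t, prefix = 0, total = 0;
    for (i = 0; i < n - 1; i++)                 /* selection sort by length */
        for (j = i + 1; j < n; j++)
            if (len[j] < len[i]) { t = len[i]; len[i] = len[j]; len[j] = t; }
    for (i = 0; i < n; i++) {
        prefix += len[i];                       /* time needed to reach program i */
        total += prefix;
    }
    return total;
}

int main() {
    int len[] = {5, 10, 3};
    printf("minimum total retrieval time: %d\n", optimal_storage(len, 3));   /* 29 */
    return 0;
}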
0 ≤ xi ≤ 1, 1 ≤ i ≤ n   (3)
A feasible solution is any set (x1, x2, ..., xn) satisfying (2) and (3). An
optimal solution is a feasible solution that maximizes the gain (1) (Table
8.3).
Example: n = 3, M = 20, (p1, p2, p3) = (25, 24, 15), (w1, w2, w3) = (18, 15, 10).
Four possible solutions are:
(x1, x2, x3)    Σ wi xi    Σ pi xi
(0, 2/3, 1)     20         31
(0, 1, 1/2)     20         31.5
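The solution (0, 1, 1/2), with profit 31.5, is the optimal one, and it is exactly what the greedy rule (fill the knapsack in decreasing order of pi/wi) produces. A minimal C sketch of that rule, written for this edition under the assumption that the items are already sorted by profit density:

#include <stdio.h>

/* fractional knapsack: items assumed sorted by p[i]/w[i] in decreasing order;
   returns the maximum achievable profit for capacity M */
double greedy_knapsack(double p[], double w[], int n, double M) {
    double profit = 0.0, remaining = M;
    for (int i = 0; i < n && remaining > 0; i++) {
        if (w[i] <= remaining) {                 /* take the whole item */
            profit += p[i];
            remaining -= w[i];
        } else {                                 /* take only a fraction */
            profit += p[i] * (remaining / w[i]);
            remaining = 0;
        }
    }
    return profit;
}

int main() {
    double p[] = {24, 15, 25}, w[] = {15, 10, 18};    /* densities 1.6, 1.5, 1.39 */
    printf("maximum profit: %.1f\n", greedy_knapsack(p, w, 3, 20));   /* 31.5 */
    return 0;
}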
1 2 3 4 5 6 NEAR
1 ∞ 10 ∞ 30 45 ∞ 0
2 10 ∞ 50 ∞ 40 25 0
3 ∞ 50 ∞ ∞ 35 15 2, 6
4 30 ∞ ∞ ∞ ∞ 20 1, 6, 0
5 45 40 35 ∞ ∞ 50 2, 3, 0
6 ∞ 25 15 20 55 ∞ 2, 0
generated. Then a shortest path to the second-nearest vertex is generated, and
so on. For the example graph, the vertex closest to V0 is V2 (c(V0, V2) = 10),
so the path V0 V2 is the first path generated. The second closest destination
is V3, at a distance of 25, and the path V0 V2 V3 is generated next. To
generate further paths we must determine (i) the next vertex for which a
shortest path must be generated and (ii) a shortest path to that vertex. Let S
be the set of vertices (including V0) to which shortest paths have already
been generated. For w not in S, let DIST(w) be the length of the shortest path
from V0 that passes only through vertices in S and ends at w. It is observed
that:
I. If the next shortest path is to vertex u, then the path begins at V0,
ends at u, and goes only through vertices belonging to S.
II. The destination of the next path generated must be the vertex u with
minimum distance DIST(u) among all vertices not in S.
III. Having selected such a vertex u in II and generated the shortest path
from V0 to u, vertex u becomes a member of S. At this point the length of a
shortest path starting at V0, passing only through vertices in S, and ending
at a vertex w not in S may decrease; that is, the value of DIST(w) may change.
If it changes, it is because there is a shorter path starting at V0, going
through u, and then on to w. The intermediate vertices of the path from V0 to
u and of the path from u to w must all be in S. In addition, the path from V0
to u must be a shortest one; otherwise DIST(w) would not be properly defined.
Also, the path from u to w contains no intermediate vertices.
The observations above lead to a simple algorithm (the algorithm was developed
by Dijkstra). In fact, it only determines the lengths of the shortest paths
from vertex V0 to all the other vertices of G.
It is assumed that the n vertices of G are numbered 1 to n. The set S is kept
as an array with S(i) = 0 if vertex i is not in S, and S(i) = 1 if it is. The
graph is assumed to be represented by a cost matrix.
The pseudocode of the algorithm is as follows:
Procedure SHORTEST-PATHS(v, COST, DIST, n)
// DIST(j) is the length of the shortest path from vertex v to
// vertex j in the graph G with n vertices.
// G is represented by the cost matrix COST(1:n, 1:n)
Boolean S(1:n); real COST(1:n, 1:n), DIST(1:n)
integer u, v, n, num, i, w
for i ← 1 to n do
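The remainder of the procedure is not reproduced in this extract. As a hedged illustration of the same idea, a compact C version over a cost matrix (written for this edition; the 4-vertex matrix below is made up, and 999 stands in for ∞) might look as follows:

#include <stdio.h>
#define NV 4
#define INF 999

/* Dijkstra's algorithm on a cost matrix, O(n^2); on return, dist[j] holds
   the length of the shortest path from vertex v to vertex j */
void shortest_paths(int v, int cost[NV][NV], int dist[NV]) {
    int s[NV];
    for (int i = 0; i < NV; i++) { s[i] = 0; dist[i] = cost[v][i]; }
    s[v] = 1; dist[v] = 0;
    for (int num = 1; num < NV; num++) {
        int u = -1;
        for (int w = 0; w < NV; w++)           /* closest vertex not yet in S */
            if (!s[w] && (u == -1 || dist[w] < dist[u]))
                u = w;
        s[u] = 1;
        for (int w = 0; w < NV; w++)           /* try to improve paths through u */
            if (!s[w] && dist[u] + cost[u][w] < dist[w])
                dist[w] = dist[u] + cost[u][w];
    }
}

int main() {
    int cost[NV][NV] = {{0, 10, INF, 30}, {10, 0, 50, INF},
                        {INF, 50, 0, 20}, {30, INF, 20, 0}};
    int dist[NV];
    shortest_paths(0, cost, dist);
    for (int j = 0; j < NV; j++)
        printf("dist[%d] = %d\n", j, dist[j]);
    return 0;
}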
1 2 3 4 5 6 7 8
1 0
2 300 0
3 1000 800 0
4 1200 0
5 1500 0 250
6 1000 0 900 1400
7 0 1000
8 0
If v = 5, this indicates that minimum-cost paths are sought from node 5 to all
the other nodes. The corresponding run is:
It will be noted that this algorithm has a complexity of O(n^2), where n is
the number of nodes, since it consists of nested loops, as follows:
Chapter 9
Dynamic Programming
CONTENTS
9.1. Optimality Principle ....................................................................... 162
9.2. Multistage Graphs (Multistage Graphs) ........................................... 163
9.3. Traveling Salesman Problem (TSP) ................................................... 166
9.4. Ix Return On The Same Route (Backtracking) .................................. 170
9.5. The Eight Queens Puzzle (8-Queens) .............................................. 171
9.6. Hamiltonian Cycles (Hamiltonian Path) .......................................... 177
for all j ≤ L. The edge <V(i, j), V(i + 1, L)>, j ≤ L, is assigned a weight or
cost of N(i, L − j) and corresponds to allocating L − j units of resource to
project i, 1 ≤ i < r. In addition, G has edges of the type
<V(r, j), V(r + 1, n)>; each of these edges is assigned a weight of
max over 0 ≤ p ≤ n − j of N(r, p).
The resulting graph for a three-project problem with n = 4 is shown in Figure 9.2.
It should be easy to see that an optimal allocation of resources is defined by
a maximum-cost path from s to t.
COST(2,2) = min {4 + COST(3,6), 2 + COST(3,7), 1 + COST(3,8)} = 7
COST(2,3) = 9
COST(2,4) = 18, COST(2,5) = 15
COST(1,1) = min {9 + COST(2,2), 7 + COST(2,3), 3 + COST(2,4), 2 + COST(2,5)}
= 16.
So the length of a minimum-cost path from s to t is 16. This path can easily be
determined if we record the decision made at each stage. Let D(i, j) be the
value of L that minimizes c(j, L) + COST(i + 1, L). For the five-stage graph of
the figure we have:
D(3,6) = 10; D(3,7) = 10; D(3,8) = 10;
D(2,2) = 7; D(2,3) = 6; D(2,4) = 8; D(2,5) = 8; D(1,1) = 2
So the minimum-cost path is s = 1, v2, v3, v4, ..., vk−1, t. It is easy to see
that v2 = D(1,1) = 2; v3 = D(2, D(1,1)) = 7; and v4 = D(3, D(2, D(1,1)))
= D(3,7) = 10.
D(j) ← r
repeat
// reconstruct the path of least cost: P(1) ← 1; P(k) ← n
for j ← 2 to k − 1 do
P(j) ← D(P(j − 1))
repeat
end FGRAPH
It will be noted that the two loops are not nested, so the time required
is Θ(n).
n     2^n     n!
1     2       1
2     4       2
4     16      24
8     256     40,320
Let G = (V, E) be a directed graph with edge costs c_ij, defined so that
c_ij > 0 for all i and j, and c_ij = ∞ if <i, j> ∉ E. Let |V| = n and assume
that n > 1. A tour of G is a directed cycle that includes every vertex in V.
The cost of the tour is the sum of the costs of its edges. The traveling
salesman problem (TSP) is to find a tour of minimum cost.
One application of the problem is the following. Suppose we have to plan the
route of a postal truck that collects mail from mailboxes located at n
different sites. A graph with n + 1 vertices can be used to represent this
situation: one vertex represents the post office where the truck begins its
journey and to which it must return, and the edge <i, j> is assigned a cost
equal to the distance from site i to site j. The route taken by the postal
truck is a tour, and we want to minimize the distance traveled. In the
following discussion we consider a tour that starts at vertex 1 and ends at
the same vertex, and we seek the minimum cost. Every tour consists of an edge <1, k>
for some k ∈ V − {1} and a path from vertex k to vertex 1. That path from
vertex k to vertex 1 goes through every vertex in V − {1, k}. Hence the
principle of optimality holds. Let g(i, S) be the length of a shortest path
starting at vertex i, going through all vertices in S, and ending at vertex 1.
Then g(1, V − {1}) is the length of an optimal traveling salesman tour. From
the principle of optimality it follows that:
g(1, V − {1}) = min {c_1k + g(k, V − {1, k})}, 2 ≤ k ≤ n   (1)
Generalizing, one obtains (for i ∉ S):
g(i, S) = min over j ∈ S of {c_ij + g(j, S − {j})}   (2)
We can solve for g(1, V − {1}) if we know g(k, V − {1, k}) for every choice of
k. Those g values can be obtained using (2). Clearly, g(i, ∅) = c_i1 for
1 ≤ i ≤ n. Hence we can use (2) to obtain g(i, S) for all S of size 1, then
for all S with |S| = 2, and so on. When |S| < n − 1, the values of i and S for
which g(i, S) is needed are such that i ≠ 1, 1 ∉ S, and i ∉ S.
int tsp(int **cost, int *m, int d, int o, int v, int r) {
    /* cost: cost matrix; m: visited marks; d: cities still to visit;
       o: current city; v: number of cities; r: starting city */
    if (d == 1)
        return cost[o][r];                 /* last city: close the tour */
    int dist, dmin = 999;
    m[o] = 1;                              /* mark the current city */
    for (int i = 0; i < v; i++)
        if (m[i] == 0) {
            dist = cost[o][i] + tsp(cost, m, d - 1, i, v, r);
            m[i] = 0;                      /* unmark after exploring this branch */
            if (dist < dmin)
                dmin = dist;
        }
    return dmin;
}
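A possible way to invoke this routine (an illustration added for this edition; the cost matrix below is made up, although its off-diagonal values are consistent with the c values used in the example that follows):

#include <stdio.h>
#include <stdlib.h>

int main() {
    int n = 4;
    int data[4][4] = {{999, 10, 15, 20},
                      {  5, 999,  9, 10},
                      {  6, 13, 999, 12},
                      {  8,  8,  9, 999}};
    int **cost = malloc(n * sizeof(int *));
    int *m = calloc(n, sizeof(int));       /* all cities initially unvisited */
    for (int i = 0; i < n; i++) {
        cost[i] = malloc(n * sizeof(int));
        for (int j = 0; j < n; j++)
            cost[i][j] = data[i][j];
    }
    /* start at city 0, with n cities still to visit, returning to city 0 */
    printf("shortest tour length: %d\n", tsp(cost, m, n, 0, n, 0));
    return 0;
}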
Example. Consider the following graph, whose edge lengths are given in the
matrix C (Figure 8.3):
g(2, ∅) = c21 = 5, g(3, ∅) = c31 = 6, g(4, ∅) = c41 = 8.
Using (2) we obtain:
g(2, {3}) = c23 + g(3, ∅) = 15; g(2, {4}) = c24 + g(4, ∅) = 18;
g(3, {2}) = 18; g(3, {4}) = 20;
g(4, {2}) = 13; g(4, {3}) = 15.
Now we calculate g(i, S) with |S| = 2, i ≠ 1, 1 ∉ S, i ∉ S:
g(2, {3,4}) = min {c23 + g(3, {4}), c24 + g(4, {3})} = 25
g(3, {2,4}) = min {c32 + g(2, {4}), c34 + g(4, {2})} = 25
g(4, {2,3}) = min {c42 + g(2, {3}), c43 + g(3, {2})} = 23
Finally, from (1) we obtain:
g(1, {2,3,4}) = min {c12 + g(2, {3,4}), c13 + g(3, {2,4}), c14 + g(4, {2,3})}
x3, ..., xn). Often the search is for all vectors that satisfy P. For example,
sorting the integers stored in A(1:n) is a problem whose solution can be
expressed as an n-tuple in which xi is the index of the i-th smallest element.
The criterion P here is the inequality A(xi) ≤ A(xi+1) for 1 ≤ i < n. Sorting
numbers is not itself an example of backtracking; it is just an example of a
problem whose solution can be formulated in terms of n-tuples. In this section,
some algorithms whose solution is best obtained by backtracking are studied.
Suppose the set Si has size mi. Then there are m = m1 · m2 · ... · mn tuples
that are potential candidates to satisfy P. The "brute force" approach would
form all m n-tuples and evaluate each one against P, saving those that produce
the optimum. The virtue of backtracking is its ability to produce the same
answer with far fewer steps. Its basic idea is to build the vector one
component at a time and to evaluate the function P(x1, x2, x3, ..., xn) to
test whether the vector formed so far still has a chance of being optimal. The
great advantage of this method is that if it turns out that the partial vector
(x1, x2, x3, ..., xi) has no chance of leading to an optimal solution, then the
m(i+1) · ... · mn remaining candidate vectors can be ignored entirely.
Several problems solved using backtracking require that all solutions satisfy
a complex set of constraints. These constraints can be divided into two
categories: explicit and implicit. Explicit constraints are rules that restrict
each xi to take values only from a given set. Examples of explicit constraints
are:
xi ≥ 0, or Si = {all nonnegative real numbers}
xi = 0 or 1, or Si = {0, 1}
li ≤ xi ≤ ui, or Si = {a : li ≤ a ≤ ui}
Explicit constraints may or may not depend on the particular instance I of the
problem to be solved. All tuples that satisfy the explicit constraints define a
possible solution space for I. The implicit constraints determine which of the
tuples in the solution space of I actually satisfy the criterion. Thus, implicit
constraints describe the way in which the xi must relate to one another.
xi is the column in which queen i will be placed; the explicit constraints with
this formulation are Si = {1, 2, 3, ..., 8}, 1 ≤ i ≤ 8, so the solution space
consists of 8^8 8-tuples. One of the implicit constraints of the problem is
that no two xi may be equal (i.e., every queen must be in a different column),
nor may two queens lie on the same diagonal. The first constraint implies that
all solutions are permutations of the 8-tuple (1, 2, ..., 8). This reduces the
solution space from 8^8 tuples to 8! tuples (Figure 9.4).
#include <stdio.h>
#include <stdlib.h>

void mark(int **, int, int, int);
void empty(int **, int);
void solution(int **, int);
void dam(int **, int **, int, int, int, int);
void retract(int **, int **, int, int, int);   /* "return" renamed: it is a C keyword */

int cont = 0;                                  /* number of solutions found */

int main() {
    int **matrix, **board, queens;
    printf("Enter the number of queens: ");
    scanf("%d", &queens);
    matrix = (int **)malloc(sizeof(int *) * queens);
    board = (int **)malloc(sizeof(int *) * queens);
    for (int i = 0; i < queens; i++) {
        matrix[i] = (int *)malloc(sizeof(int) * queens);
        board[i] = (int *)malloc(sizeof(int) * queens);
    }
    empty(matrix, queens);
    empty(board, queens);
    for (int i = 0; i < queens; i++)           /* try a queen in every row of column 0 */
        dam(matrix, board, i, 0, 1, queens);
    if (cont == 0)
        printf("No solutions to the problem with %d queens\n", queens);
    return 0;
}

/* place a queen at (row, column); if all queens are placed, print the board,
   otherwise try every free row of the next column; finally backtrack */
void dam(int **matrix, int **board, int row, int column, int placed, int queens) {
    board[row][column] = 1;
    mark(matrix, queens, row, column);
    if (placed == queens)
        solution(board, queens);
    else
        for (int j = 0; j < queens; j++)
            if (matrix[j][column + 1] == 0)
                dam(matrix, board, j, column + 1, placed + 1, queens);
    retract(matrix, board, row, column, queens);
}

/* remove the queen at (row, column) and rebuild the attack matrix
   from the queens that remain on the board */
void retract(int **matrix, int **board, int row, int column, int queens) {
    board[row][column] = 0;
    empty(matrix, queens);
    for (int i = 0; i < queens; i++)
        for (int j = 0; j < queens; j++)
            if (board[i][j] == 1)
                mark(matrix, queens, i, j);
}

void empty(int **vect, int queens) {
    for (int i = 0; i < queens; i++)
        for (int j = 0; j < queens; j++)
            vect[i][j] = 0;
}

/* mark every square attacked by a queen placed at (falfil, calfil) */
void mark(int **matrix, int queens, int falfil, int calfil) {
    for (int row = 0; row < queens; row++)
        for (int column = 0; column < queens; column++)
            if ((row + column == falfil + calfil) || (row - column == falfil - calfil))
                matrix[row][column] = 1;
    for (int row = 0; row < queens; row++) {
        matrix[falfil][row] = 1;
        matrix[row][calfil] = 1;
    }
}

/* print one solution (definition not present in the extracted text; reconstructed) */
void solution(int **board, int queens) {
    cont++;
    printf("Solution %d:\n", cont);
    for (int i = 0; i < queens; i++) {
        for (int j = 0; j < queens; j++)
            printf("%d ", board[i][j]);
        printf("\n");
    }
}
9.6.1. Definition
A Hamiltonian path is a path that passes through each vertex exactly once.
A Hamiltonian circuit or Hamiltonian cycle is a cycle that passes through each
vertex exactly once (except for the vertex from which it starts and at which
it ends). A graph that contains a Hamiltonian cycle is called a Hamiltonian
graph (Figure 9.8).
end if
repeat
end NextValue
Using the procedure NextValue, the recursive backtracking scheme can be
particularized to find all Hamiltonian cycles.
The procedure first initializes the adjacency matrix GRAPH(1:n, 1:n), then
sets X(2:n) ← 0 and X(1) ← 1, and executes the call Hamiltonian(2).
The traveling salesman problem (TSP) is a Hamiltonian-cycle problem with the
difference that each edge has a cost and a minimum-cost cycle is sought.
Chapter 10
Branch and Bound
CONTENTS
10.1. General Description ..................................................................... 182
10.2. Poda Strategies.............................................................................. 184
10.3. Branching Strategies...................................................................... 184
10.4. The Traveling Salesman Problem (TSP) .......................................... 187
Function RyP {
    P = Sons(x, k)
    while (not empty(P)) {
        x(k) = extract(P)
        if (esFactible(x, k) and G(x, k) < optimal) {
            if (esSolution(x))
                optimal = G(x, k)    // keep the best solution found so far
            else
                RyP(x, k + 1)        // branch on the next component
        }
    }
}
where:
• G(x, k) is the algorithm's estimation (bounding) function.
• P is the pile of candidate nodes still to be considered.
• esFactible is the function that decides whether the proposal is valid.
• esSolution is the function that checks whether the target has been reached.
• optimal is the value of the objective function evaluated on the best
solution found so far.
• Note: we use less than (<) for minimization problems and greater than (>)
for maximization problems.
to be expanded by the algorithm. The LNV contains all the nodes that have been
generated but not yet explored. Depending on how the nodes are stored in the
list, the traversal of the tree will be one or another, leading to the three
strategies listed below.
FIFO strategy: with the FIFO (first in, first out) strategy, the LNV is a
queue, resulting in a breadth-first traversal of the tree (Figure 10.1).
Figure 10.3: Tree states for a traveling salesman problem with n = 4 and i0 =
i4 = 1.
In order to use LC branch and bound to search the traveling salesman
state-space tree, we need to define a cost function c(·) and two other
functions ĉ(·) and u(·) such that ĉ(R) ≤ c(R) ≤ u(R) for every node R; c(R)
is the cost of a minimum-cost solution node in the subtree rooted at R, so
the solution node of least c(·) corresponds to a shortest tour of G. One way
to choose ĉ is the following.
A simple ĉ(·) such that ĉ(A) ≤ c(A) for all A is obtained by defining ĉ(A)
to be the length of the path defined at node A. For example, the path defined
at the node of the preceding tree with i0, i1, i2 = 1, 2, 4 consists of the
edges <1, 2> and <2, 4>. A better ĉ(·) can be obtained by using the reduced
cost matrix corresponding to G. A row (column) is said to be reduced if and
only if it contains at least one zero and all its remaining entries are
non-negative. A matrix is reduced if and only if every row and column is
reduced. As an example of reducing the cost matrix of a given graph G,
consider the following matrix.
The matrix corresponds to a graph with five vertices. Since every tour includes
exactly one edge <i, j> with i = k, 1 ≤ k ≤ 5, and exactly one edge <i, j> with
j = k, 1 ≤ k ≤ 5, subtracting a constant t from all the elements of one row or
one column of the cost matrix reduces the length of every tour by exactly t
units. A minimum-cost tour remains a minimum-cost tour after this subtraction.
If t is chosen as the minimum entry of row i (column j), subtracting it from
all entries of row i (column j) introduces a zero into row i (column j). By
repeating this procedure as often as necessary, the cost matrix can be reduced.
The total amount subtracted from all rows and columns is a lower bound and can
be used as the ĉ value of the root of the state-space tree. Subtracting 10, 2,
2, 3, 4, 1, and 3 from rows 1, 2, 3, 4, and 5 and from columns 1 and 3,
respectively, the matrix of part (a) of the figure above becomes the reduced
matrix of part (b) of the same figure. The total amount subtracted is 25.
Therefore, every tour of the original graph has a length of at least 25 units.
Every node in the traveling salesman state-space tree can be associated with a
cost matrix. Let A be the cost matrix of node R, and let S be a child of R such
that the tree edge (R, S) corresponds to including edge <i, j> in the tour.
If S is not a leaf, then the cost matrix for S can be obtained as follows:
• Having chosen edge <i, j>, change every entry in row i and column j of A
to ∞. This prevents the use of any further edge leaving vertex i or
entering vertex j.
• Set A(j, 1) to ∞. This prevents the use of edge <j, 1>.
• Reduce all rows and columns of the resulting matrix, except for rows and
columns containing only ∞. Each amount subtracted is added to the variable
"r." The resulting matrix is B.
• ĉ(S) = ĉ(R) + A(i, j) + r
Here S denotes the current node and R its parent.
The first two steps are valid because no tour in the subtree rooted at S can
contain edges of the type <i, k> or <k, j> or <j, 1> (except for the edge
<i, j>). Since r is the total amount subtracted in step 3, we have
ĉ(S) = ĉ(R) + A(i, j) + r. For leaf nodes, ĉ(·) = c(·) is easy to compute,
because each leaf defines a complete tour. For the upper bound function u, we
can use u(R) = ∞ for every node R.
For the tree transitions the following example is taken.
Subtracting by rows: from row 1 we subtract 10, so r = 10; from row 2 we
subtract 2, so r = 12; from row 3 we subtract 2, so r = 14; from row 4 we
subtract 3, so r = 17; from row 5 we subtract 4, so r = 21.
The resulting matrix is:
For S = 2 (edge <1, 2>):
Knowing the lowest cost, the first part of the path is chosen, and A(2, 1) is
set to ∞.
In this case every row and every column already contains a zero, so r = 0.
In this case every row has at least one zero, but the first column has none,
so r = 11.
It is observed that the leaf node with the lowest cost is S = 10.
Therefore: C(11) = 28 + 0 + 0 = 28.
#include <stdio.h>
#include <stdlib.h>

typedef struct List {
    int **matrix;                       /* reduced cost matrix of this node */
    int *brands;                        /* marks of the cities already visited */
    int cost, accountant, city, mc;
    struct List *sig;                   /* next node in the list */
} List;

/* subtract the row minimum from every finite, nonzero entry of row i */
void restar_fila(List *q, int tam, int min, int i) {
    for (int k = 0; k < tam; k++)
        if (q->matrix[i][k] != 999 && q->matrix[i][k] != 0)
            q->matrix[i][k] -= min;
}

/* subtract the column minimum from every finite, nonzero entry of column i */
void restar_columna(List *q, int tam, int min, int i) {
    for (int k = 0; k < tam; k++)
        if (q->matrix[k][i] != 999 && q->matrix[k][i] != 0)
            q->matrix[k][i] -= min;
}
/* reduce every row and then every column of the node's matrix; the function
   header and the start of the row loop are missing in the source, so they are
   reconstructed here (the name "reducir" is assumed) */
void reducir(List *q, int tam) {
    int min;
    for (int i = 0; i < tam; i++) {                 /* rows */
        min = q->matrix[i][0];
        for (int j = 1; j < tam; j++)
            if (min > q->matrix[i][j])
                min = q->matrix[i][j];
        if (min != 999)
            q->cost += min;
        if (min != 0 && min != 999)
            restar_fila(q, tam, min, i);
    }
    for (int i = 0; i < tam; i++) {                 /* columns */
        min = q->matrix[0][i];
        for (int j = 1; j < tam; j++)
            if (min > q->matrix[j][i])
                min = q->matrix[j][i];
        if (min != 999)
            q->cost += min;
        if (min != 0 && min != 999)
            restar_columna(q, tam, min, i);
    }
}
/* read the cost matrix from the user; 999 stands for "no edge" */
void generate(List *q, int tam) {
    int cont = 0, destination;
    for (int i = 0; i < tam; i++)
        for (int j = 0; j < tam; j++)
            q->matrix[i][j] = 999;
    printf("To stop entering connections for a city, enter 99\n");
    while (cont < tam) {
        printf("\nCity %d\n", cont + 1);
        do {
            printf("From city %d to: ", cont + 1);
            scanf("%d", &destination);
            if (destination == 99)
                printf("Finished city %d\n", cont + 1);
            else if (destination > tam || destination <= 0)
                printf("This city does not exist\n");
            else if (destination == cont + 1)
                printf("You are already in this city\n");
            else {
                printf("Enter the cost from %d to %d: ", cont + 1, destination);
                scanf("%d", &q->matrix[cont][destination - 1]);
            }
        } while (destination != 99);   /* closing reconstructed: stop on 99 */
        cont++;
    }
}
That is, there is no algorithm of the kind Hilbert was seeking. A cynic might
say that mathematicians breathed a sigh of relief, because if such an algorithm
existed, they would all be out of work once it was found. In reality,
mathematicians were astonished by this remarkable discovery.
The decision problem was posed as a challenge to symbolic logicians: to find a
general algorithm that decides whether a given formula of the first-order
calculus is a theorem. In 1936, Alonzo Church and Alan Turing independently
showed that it is impossible to write such an algorithm. As a consequence, it
is also impossible to decide with an algorithm whether certain specific
arithmetic statements are true or false.
The question goes back to Gottfried Leibniz, who in the seventeenth century,
after successfully building a mechanical calculating machine, dreamed of
building a machine that could manipulate symbols to determine whether a
sentence of mathematics is a theorem. The first thing such a machine would
need is a clear and precise formal language. In 1928, David Hilbert and Wilhelm
Ackermann posed the problem in the form mentioned above.
A first-order logical formula is called universally valid if it logically
follows from the axioms of the first-order calculus. Gödel's completeness
theorem establishes that a logical formula is universally valid in this sense
if and only if it is true in every interpretation of the formula in a model.
Before this question could be answered, the notion of algorithm had to be
formally defined. This was done by Alonzo Church in 1936 with the concept of
"effective calculability," based on his lambda calculus, and by Alan Turing
with the Turing machine. The two approaches are equivalent, in the sense that
exactly the same problems can be solved with both.
The negative answer to the decision problem was given by Alonzo Church in 1936
and, independently and shortly thereafter, by Alan Turing, also in 1936. Church
showed that no algorithm (defined via the recursive functions) can decide
whether two lambda calculus expressions are equivalent. Church based his result
on earlier work by Stephen Kleene. Turing, for his part, reduced the problem to
the halting problem for Turing machines. Turing's proof is generally considered
to have been more influential than Church's.
Both works were influenced by Kurt Gödel's earlier work on the incompleteness
theorem, especially by the method of assigning numbers to logical formulas in
order to reduce logic to arithmetic.
Turing's argument is as follows. Suppose we had a general decision algorithm
for first-order logic. The question of whether a Turing machine halts can be
translated into a first-order formula, which could then be submitted to the
decision algorithm. But Turing had already shown that there is no general
algorithm that can decide whether a Turing machine halts.
It is important to note that if the problem is restricted to a specific
first-order theory with constants, predicates, and axioms, a decision algorithm
for that theory may exist. Examples of decidable theories are Presburger
arithmetic and the static type systems of programming languages.
However, the general first-order theory of the natural numbers known as Peano
arithmetic cannot be decided by that kind of algorithm; this follows from
Turing's argument summarized above.
In addition, Gödel's theorem showed that there is no algorithm whose input can
be any statement about the integers and whose output says whether it is true or
not. Following Gödel closely, other mathematicians such as Alonzo Church,
Stephen Kleene, Emil Post, Alan Turing, and many others found more problems
that lack an algorithmic solution. Perhaps the most striking feature of these
early results on problems that cannot be solved by computers is that they were
obtained in the 1930s, before the first computer was built!
Chapter 11
Turing’s Hypothesis
CONTENTS
11.1. Hypothesis Church–Turing ............................................................ 202
11.2. Complexity ................................................................................... 202
11.3. Thesis Sequential Computability ................................................... 203
11.4. NP Problems................................................................................. 204
11.2. COMPLEXITY
The study of computability leads us to understand which problems admit an
algorithmic solution and which do not. For those problems for which algorithms
exist, it is also interesting to know how many computing resources their
execution requires. Only algorithms that use a feasible amount of resources
are useful in practice.
11.4. NP PROBLEMS
This section contains what is perhaps the most important development in
research on algorithms in the early 70’s, not only in computer science but
also in electrical engineering, operations research and related areas.
An important idea is the distinction between a group of problems whose
solution is obtained in polynomial time and a second group of problems
whose solution is not obtained in polynomial time.
The theory of NP-completeness does not provide algorithms for solving the
problems of the second group in polynomial time, nor does it prove that no
such algorithms exist. Instead, it shows that all the problems for which no
polynomial-time algorithm is currently known are computationally related. In
fact, two classes of problems can be defined: NP-hard problems and NP-complete
problems. An NP-complete problem has the property that it can be solved in
polynomial time if and only if all other NP-complete problems can also be
solved in polynomial time. If an NP-hard problem can be solved in polynomial
time, then all NP-complete problems can be solved in polynomial time.
Although the problems classified as NP-hard or NP-complete are not known to
be solvable sequentially in polynomial time, these same problems can be solved
in polynomial time on a non-deterministic machine.
x1 = ⊥, x2 = ⊥, x3 = T
be NP-complete, while some decision problems can be NP-hard without being
NP-complete.
The tape is similar to that of a traditional Turing machine; the only
difference is that each element of the tape is a qubit. The alphabet of this
new machine is formed by the space of qubit values. The head position is
represented by an integer variable (Figure 11.2).
INDEX
A
Acceleration of parallelization 55
Acceleration program 55
Additional agglomeration 47
Algorithm grows 134
Algorithmic solution 202
Algorithm implemented 76
Algorithm remains proportional 141
Amazon Web Services (AWS) 10
Application management 15
Applications investigated 131
Application-specific integrated circuit approaches (ASIC) 66
Appropriate partition 34
Atmosphere 42, 43, 44, 45, 46, 47, 48
Atmospheric processes 42
Automatic parallelization 67
B
Berkeley Open Infrastructure for Network Computing (BOINC) 64
Bound design method algorithm 182
Bound technique 182
C
Calculation of communicating systems 59
Calculation of Resonance Frequency 29
Calculation of resources 5
Code and style sheets (CSS) 72
Combinatorial optimization 186
Combinatorial problem 171
Communication 6, 7, 34, 35, 36, 37, 39, 40, 41, 43, 44, 45, 46, 49
Communication and synchronization 52, 53
Comparing strings 135
Comprehensive visualization 76
Computational electromagnetic field 3
Computational environment 4
Computer program 60
Computer services 4
Consuming valuable resources 92
Countryside mathematical 177
Critical parameter 129
Critical processes 92
W
Written communication 93