Lecture 5

Amazon Web Services (AWS)

EC2 also provides the capability to save a specific running instance as an image, thus allowing users to
create their own templates for deploying systems.
These templates are stored in S3, which delivers persistent storage on demand.
S3 is organized into buckets; these are containers of objects that are stored in binary form and can be
enriched with attributes.
Users can store objects of any size, from simple files to entire disk images, and have them accessible
from anywhere.
Besides EC2 and S3, a wide range of services can be leveraged to build virtual computing systems,
including networking support, caching systems, DNS, database support (relational and not), and others.
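As a rough, hedged sketch of how these services are typically driven programmatically, the following uses boto3 (the AWS SDK for Python); the bucket name, object key, and instance ID are placeholders, and credentials and region configuration are assumed to be set up in the environment.

```python
import boto3

# S3: buckets are containers of objects stored in binary form.
s3 = boto3.client("s3")
s3.create_bucket(Bucket="example-templates-bucket")     # placeholder name
s3.put_object(Bucket="example-templates-bucket",
              Key="images/app-disk.img",                # placeholder key
              Body=b"object payload of any size, from files to disk images")

# EC2: save a specific running instance as an image (a reusable template).
ec2 = boto3.client("ec2")
ec2.create_image(InstanceId="i-0123456789abcdef0",      # placeholder instance
                 Name="my-deployment-template")
```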
Google AppEngine
 Google AppEngine is a scalable runtime environment mostly devoted to executing Web applications.
 These take advantage of the large computing infrastructure of Google to dynamically scale as the demand
varies over time.
 AppEngine provides both a secure execution environment and a collection of services that simplify the
development of scalable and high-performance Web applications.
 These services include in-memory caching, scalable data storage, job queues, messaging, and cron tasks.
 Developers can build and test applications on their own machines using the AppEngine software
development kit (SDK), which replicates the production runtime environment and helps test and profile
applications.
 Once development is complete, developers can easily migrate their application to AppEngine, set quotas to
contain the costs generated, and make the application available to the world. The languages currently
supported are Python, Java, and Go.
 Go (also known as Golang) is a programming language developed by Google engineers; it has gained
popularity for its simplicity, efficiency, and strong support for concurrency.
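As an illustration only (not the official AppEngine quickstart), a minimal Python Web application of the kind that can run on the AppEngine standard runtime might look like the sketch below; the route, message, and port are invented, and a real deployment would also need an app.yaml descriptor, which is not shown.

```python
from flask import Flask

# A minimal Web application; AppEngine scales instances of it as demand varies.
app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello from a scalable runtime!"

if __name__ == "__main__":
    # Local testing, similar in spirit to what the AppEngine SDK's
    # development server provides before migrating to production.
    app.run(host="127.0.0.1", port=8080)
```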
Microsoft Azure
Microsoft Azure is a cloud operating system and a platform for developing applications in the cloud. It
provides a scalable runtime environment for Web applications and distributed applications in
general.
 Applications in Azure are organized around the concept of roles, which identify a distribution unit for
applications and embody the application’s logic. Currently, there are three types of roles: Web role,
worker role, and virtual machine role.
The Web role is designed to host a Web application, the worker role is a more generic container of
applications and can be used to perform workload processing, and the virtual machine role provides
a virtual environment in which the computing stack can be fully customized, including the operating
system.
Besides roles, Azure provides a set of additional services that complement application execution,
such as support for storage (relational data and blobs), networking, caching, content delivery, and
others.
Hadoop
Apache Hadoop is an open-source framework that is suited for processing large data sets on commodity
hardware.
Hadoop is an implementation of MapReduce, an application programming model developed by Google,
which provides two fundamental operations for data processing: map and reduce.
The former transforms and synthesizes the input data provided by the user; the latter aggregates the
output obtained by the map operations.
Hadoop provides the runtime environment, and developers need only provide the input data and
specify the map and reduce functions that need to be executed (a minimal word-count sketch of the model follows below).
Yahoo!, the sponsor of the Apache Hadoop project, has put considerable effort into transforming the
project into an enterprise-ready cloud computing platform for data processing.
Hadoop is an integral part of the Yahoo! cloud infrastructure and supports several business processes of
the company.
Currently, Yahoo! manages the largest Hadoop cluster in the world, which is also available to academic
institutions.
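To make the two operations concrete, here is a plain-Python word-count sketch of the MapReduce model; it does not use the actual Hadoop API, and the sample documents are invented. In a real Hadoop job, many map and reduce tasks of this kind run in parallel across the nodes of the cluster.

```python
from collections import defaultdict

def map_fn(line):
    # Map: transform the input, emitting one (word, 1) pair per word.
    return [(word, 1) for word in line.split()]

def reduce_fn(word, counts):
    # Reduce: aggregate the values emitted by the map operations.
    return word, sum(counts)

documents = ["the quick brown fox", "the lazy dog", "the fox"]

# Shuffle: group intermediate (key, value) pairs by key.
groups = defaultdict(list)
for line in documents:
    for word, one in map_fn(line):
        groups[word].append(one)

print(dict(reduce_fn(w, c) for w, c in groups.items()))
# {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```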
Force.com and Salesforce.com
Force.com is a cloud computing platform for developing social enterprise applications.
The platform is the basis for Salesforce.com, a Software-as-a-Service solution for customer
relationship management.
Force.com allows developers to create applications by composing ready-to-use blocks; a complete
set of components supporting all the activities of an enterprise is available.
Developers can also build their own components or integrate those available in AppExchange into
their applications.
The platform provides complete support for developing applications, from the design of the data
layout to the definition of business rules and workflows and the definition of the user interface.
The Force.com platform is completely hosted on the cloud and provides complete access to its
functionalities and those implemented in the hosted applications through Web services technologies.
Manjrasoft Aneka
Manjrasoft Aneka is a cloud application platform for the rapid creation of scalable
applications and their deployment on various types of clouds in a seamless and elastic
manner.
It supports a collection of programming abstractions for developing applications and a
distributed runtime environment that can be deployed on heterogeneous hardware
(clusters, networked desktop computers, and cloud resources).
Developers can choose different abstractions to design their application: tasks, distributed
threads, and map-reduce.
These applications are then executed on the distributed service-oriented runtime
environment, which can dynamically integrate additional resources on demand.
The service-oriented architecture of the runtime has a great degree of flexibility and
simplifies the integration of new features, such as the abstraction of a new programming
model and associated execution management environment.
Manjrasoft Aneka
Services manage most of the activities happening at runtime: scheduling, execution, accounting,
billing, storage, and quality of service.
These platforms are key examples of technologies available for cloud computing.
They mostly fall into the three major market segments identified in the reference model:
Infrastructure-as-a-Service, Platform-as-a-Service, and Software-as-a-Service.
Principles of Parallel and Distributed Computing
 Eras of Computing:
• The two fundamental and
dominant models of computing
are sequential and parallel.
• The sequential computing era
began in the 1940s;
• The parallel (and distributed)
computing era followed it
within a decade.
• The four key elements of
computing developed during
these eras are architectures,
compilers, applications, and
problem-solving
environments.
Principles of Parallel and Distributed Computing
 Eras of Computing (Contd.):
• The computing era started with developments in hardware architectures, which enabled the creation
of system software, particularly compilers and operating systems, that supports the
management of such systems and the development of applications.

• The development of applications and systems is a major element, and it reached consolidation when
problem-solving environments were designed and introduced to facilitate and empower engineers.

• This is when the paradigm characterizing computing achieved maturity and became mainstream.
Moreover, every aspect of this era underwent a three-phase process: research and development
(R&D), commercialization, and commoditization.
Principles of Parallel and Distributed Computing
 Parallel vs. Distributed computing :
• The terms parallel computing and distributed computing are often used interchangeably, even though they
mean slightly different things.

• The term parallel implies a tightly coupled system, whereas distributed refers to a wider class of systems,
including those that are tightly coupled.

• Parallel Computing:
• The term parallel computing refers to a model in which the computation is divided among several processors
sharing the same memory.
• The architecture of a parallel computing system is often characterized by the homogeneity of components: each
processor is of the same type and it has the same capability as the others.
• The shared memory has a single address space, which is accessible to all the processors.
• Parallel programs are then broken down into several units of execution that can be allocated to different
processors and can communicate with each other by means of the shared memory.
• Originally, only those architectures that featured multiple processors sharing the same physical memory
and that were considered a single computer were regarded as parallel systems.
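A minimal sketch of this shared-memory model, using Python's multiprocessing module as a stand-in for multiple processors writing to a single shared array (the data values and worker count are illustrative):

```python
from multiprocessing import Array, Process

def square_slice(shared, start, end):
    # Each unit of execution works on its own slice of the shared memory.
    for i in range(start, end):
        shared[i] = shared[i] * shared[i]

if __name__ == "__main__":
    data = Array("d", range(16))      # memory visible to all workers
    n_workers = 4
    chunk = len(data) // n_workers
    workers = [Process(target=square_slice,
                       args=(data, w * chunk, (w + 1) * chunk))
               for w in range(n_workers)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
    print(list(data))                 # results were written through shared memory
```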
Principles of Parallel and Distributed Computing
 Distributed Computing:
 The term distributed computing encompasses any architecture or system that allows the computation to be
broken down into units and executed concurrently on different computing elements, whether these are
processors on different nodes, processors on the same computer, or cores within the same processor.

 Therefore, distributed computing includes a wider range of systems and applications than parallel
computing and is often considered a more general term.

 Even though it is not a rule, the term distributed often implies that the locations of the computing elements
are not the same and such elements might be heterogeneous in terms of hardware and software features.

 Classic examples of distributed computing systems are computing grids or Internet computing systems,
which bring together the greatest variety of architectures, systems, and applications in the world.
Principles of Parallel and Distributed Computing
 Elements of Parallel Processing:
• Processing of multiple tasks simultaneously on multiple processors is called parallel processing.
• The parallel program consists of multiple active processes (tasks) simultaneously solving a given
problem.
• A given task is divided into multiple subtasks using a divide-and-conquer technique, and each
subtask is processed on a different central processing unit (CPU).
• Programming on a multiprocessor system using the divide-and-conquer technique is called parallel
programming.
Principles of Parallel and Distributed Computing
 Hardware architectures for parallel processing
• The core elements of parallel processing are CPUs.
• Based on the number of instruction and data streams that can be processed simultaneously, computing
systems are classified into the following four categories:
• Single-instruction, single-data (SISD) systems
• Single-instruction, multiple-data (SIMD) systems
• Multiple-instruction, single-data (MISD) systems
• Multiple-instruction, multiple-data (MIMD) systems
Principles of Parallel and Distributed Computing
• Single-instruction, single-data (SISD) systems:
• An SISD computing system is a uniprocessor machine capable of executing a single instruction, which
operates on a single data stream.
• In SISD, machine instructions are processed sequentially; hence computers adopting this model are
popularly called sequential computers.
• Most conventional computers are built using the SISD model. All the instructions and data to be
processed have to be stored in primary memory.
• The speed of the processing element in the SISD model is limited by the rate at which the computer can
transfer information internally.
• Dominant representative SISD systems are the IBM PC, Macintosh, and workstations.
Principles of Parallel and Distributed Computing
• Single-instruction, multiple-data (SIMD) systems:
• An SIMD computing system is a multiprocessor machine capable of executing the same instruction on all the
CPUs but operating on different data streams.
• Machines based on the SIMD model are well suited to scientific computing, since such workloads involve many
vector and matrix operations.
• Dominant representative SIMD systems are Cray's vector processing machines and Thinking Machines'
Connection Machine (CM).
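As a loose software analogy (not the Cray or Connection Machine hardware themselves), NumPy's vectorized operations illustrate the SIMD idea of one operation applied across many data elements; the array contents here are arbitrary.

```python
import numpy as np

a = np.arange(1_000_000, dtype=np.float64)
b = np.full(1_000_000, 2.0)

# Conceptually a single instruction (element-wise multiply) applied to
# every element pair; NumPy dispatches it to vectorized machine code.
c = a * b
print(c[:5])   # [0. 2. 4. 6. 8.]
```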
Principles of Parallel and Distributed Computing
• Multiple-instruction, single-data (MISD) systems:
• An MISD computing system is a multiprocessor machine capable of executing different instructions on different
PEs but all of them operating on the same data set.
• Machines built using the MISD model are not useful in most applications; a few machines have been built, but
none of them is available commercially.
Principles of Parallel and Distributed Computing
• Multiple-instruction, multiple-data (MIMD) systems:
• An MIMD computing system is a multiprocessor machine capable of executing multiple instructions on multiple
data sets.
• Each PE in the MIMD model has separate instruction and data streams; hence machines built using this model
are well suited to any kind of application.
• Unlike SIMD and MISD machines, PEs in MIMD machines work asynchronously.
Principles of Parallel and Distributed Computing
• MIMD machines are broadly categorized into shared-memory MIMD and distributed-memory MIMD based on
the way PEs are coupled to the main memory.
• In the shared memory MIMD model, all the PEs are connected to a single global memory and they all have
access to it.
• Systems based on this model are also called tightly coupled multiprocessor systems.
• The communication between PEs in this model takes place through the shared memory; modification of the
data stored in the global memory by one PE is visible to all other PEs.
Principles of Parallel and Distributed Computing
• Distributed-memory MIMD machines: In the distributed-memory MIMD model, all PEs have a local memory.
Systems based on this model are also called loosely coupled multiprocessor systems.
• The communication between PEs in this model takes place through the interconnection network (the
interprocess communication channel, or IPC).
• The network connecting PEs can be configured as a tree, mesh, cube, and so on.
• Each PE operates asynchronously, and if communication/synchronization among tasks is necessary, they can do
so by exchanging messages between them.
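A minimal sketch of loosely coupled PEs that cooperate only by exchanging messages, using a Python multiprocessing pipe as a stand-in for the interconnection network (the payload is illustrative):

```python
from multiprocessing import Pipe, Process

def worker(conn):
    task = conn.recv()        # a message arrives over the "network"
    conn.send(sum(task))      # reply with a result message
    conn.close()

if __name__ == "__main__":
    parent_end, child_end = Pipe()
    p = Process(target=worker, args=(child_end,))
    p.start()
    parent_end.send([1, 2, 3, 4])   # no shared memory: only messages
    print(parent_end.recv())        # -> 10
    p.join()
```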
Principles of Parallel and Distributed Computing
• Approaches to parallel programming:
• A sequential program runs on a single processor and has a single line of control.
• To make many processors collectively work on a single program, the program must be divided into smaller
independent chunks so that each processor can work on separate chunks of the problem.
• The program decomposed in this way is a parallel program.
• A wide variety of parallel programming approaches are available. The most prominent among them are the
following:
• Data parallelism
• Process parallelism
• Farmer-and-worker model

• These three models are all suitable for task-level parallelism.


Principles of Parallel and Distributed Computing
• Data parallelism:
• In the case of data parallelism, the divide-and-conquer technique is used to split data into multiple sets, and
each data set is processed on different PEs using the same instruction.
• This approach is highly suitable for processing on machines based on the SIMD model.

• Process parallelism:
• In the case of process parallelism, a given operation has multiple (but distinct) activities that can be processed
on multiple processors.

• Farmer-and-worker:
• In the case of the farmer-and-worker model, a job distribution approach is used: one processor is configured as
the master and all the remaining PEs are designated as slaves.
• The master assigns jobs to the slave PEs and, on completion, they inform the master, which in turn collects the results.
• These approaches can be utilized in different levels of parallelism.
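A small farmer-and-worker sketch in Python, in which a master process assigns jobs through one queue and collects results through another (the jobs and the number of workers are illustrative). The None sentinels are just a simple way to tell each worker that no more jobs are coming.

```python
from multiprocessing import Process, Queue

def worker(jobs, results):
    while True:
        job = jobs.get()
        if job is None:                  # sentinel: no more work
            break
        results.put((job, job * job))    # report the result to the master

if __name__ == "__main__":
    jobs, results = Queue(), Queue()
    workers = [Process(target=worker, args=(jobs, results)) for _ in range(3)]
    for p in workers:
        p.start()
    for j in range(10):                  # the master assigns jobs
        jobs.put(j)
    for _ in workers:                    # one stop sentinel per worker
        jobs.put(None)
    collected = sorted(results.get() for _ in range(10))
    for p in workers:
        p.join()
    print(collected)                     # the master has collected all results
```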
Principles of Parallel and Distributed Computing
• Levels of parallelism:
• Levels of parallelism are decided based on the lumps of code (grain size) that can be a potential candidate for
parallelism.
• Table 2.1 lists categories of code granularity for parallelism.
• All these approaches have a common goal: to boost processor efficiency by hiding latency.
• To conceal latency, there must be another thread ready to run whenever a lengthy operation occurs.
• The idea is to execute concurrently two or more single-threaded applications, such as compiling, text
formatting, database searching, and device simulation.
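A tiny Python sketch of latency hiding: while one thread waits on a lengthy (simulated) I/O operation, another thread keeps the processor busy (the sleep time and workload are invented).

```python
import threading
import time

def slow_io():
    time.sleep(1.0)                       # stands in for a lengthy operation
    print("I/O finished")

def compute():
    total = sum(i * i for i in range(100_000))
    print("compute finished:", total)

io_thread = threading.Thread(target=slow_io)
io_thread.start()
compute()          # runs while the I/O thread is blocked, hiding its latency
io_thread.join()
```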
Principles of Parallel and Distributed Computing
• Parallelism within an application can be detected at several levels:
• Large grain (or task level)
• Medium grain (or control level)
• Fine grain (data level)
• Very fine grain (multiple-instruction issue)
Principles of Parallel and Distributed Computing
 Elements of distributed computing:
 Distributed computing studies the models, architectures,
and algorithms used for building and managing distributed
systems.
 A distributed system is a collection of independent
computers that appears to its users as a single coherent
system.
 A distributed system is one in which components located
at networked computers communicate and coordinate
their actions only by passing messages.
Principles of Parallel and Distributed Computing
 Components of a distributed
system:
• A distributed system is the result of the interaction
of several components that traverse the entire
computing stack from hardware to software.
• It emerges from the collaboration of several
elements that—by working together—give users
the illusion of a single coherent system.
• Figure 2.10 provides an overview of the different
layers that are involved in providing the services of
a distributed system.
Principles of Parallel and Distributed Computing
• At the very bottom layer, computer and network
hardware constitute the physical infrastructure;
these components are directly managed by the
operating system, which provides the basic
services for interprocess communication (IPC),
process scheduling and management, and
resource management in terms of file system
and local devices.
• Taken together these two layers become the
platform on top of which specialized software is
deployed to turn a set of networked computers
into a distributed system.
Principles of Parallel and Distributed Computing
• The use of well-known standards at the operating
system level, and even more at the hardware and
network levels, allows easy harnessing of
heterogeneous components and their organization
into a coherent and uniform system.
• For example, network connectivity between
different devices is controlled by standards,
which allow them to interact seamlessly.
• At the operating system level, IPC services are
implemented on top of standardized
communication protocols such as Transmission
Control Protocol/Internet Protocol (TCP/IP), User
Datagram Protocol (UDP), or others.
Principles of Parallel and Distributed Computing
• The middleware layer leverages such services to
build a uniform environment for the
development and deployment of distributed
applications.
• This layer supports the programming paradigms
for distributed systems.
• By relying on the services offered by the
operating system, the middleware develops its
own protocols, data formats, and programming
language or frameworks for the development of
distributed applications.
• All of them constitute a uniform interface to
distributed application developers that is
completely independent of the underlying
operating system and hides all the
heterogeneities of the bottom layers.
Principles of Parallel and Distributed Computing
• The top of the distributed system stack is
represented by the applications and services
designed and developed to use the middleware.
• These can serve several purposes and often
expose their features in the form of graphical
user interfaces (GUIs) accessible locally or
through the Internet via a Web browser.
• For example, in the case of a cloud computing
system, the use of Web technologies is strongly
preferred, not only to interface distributed
applications with the end user but also to
provide platform services aimed at building
distributed systems.
Principles of Parallel and Distributed Computing
• The general reference architecture of a
distributed system is contextualized in the case
of a cloud computing system.
Principles of Parallel and Distributed Computing
• Architectural styles for distributed computing:
• Architectural styles are mainly used to determine the vocabulary of components and connectors
that are used as instances of the style together with a set of constraints on how they can be
combined.
• The architectural styles are organized into two major classes:
I. Software architectural styles
II. System architectural styles

• The first class relates to the logical organization of the software;


• The second class includes all those styles that describe the physical organization of distributed
software systems in terms of their major components.
