
1. Scalable Computing over the Internet


Over the past 60 years, computing technology has undergone a series of
platform and environment changes. In this section, we assess evolutionary
changes in machine architecture, operating system platform, network
connectivity, and application workload. Instead of using a centralized
computer to solve computational problems, a parallel and distributed
computing system uses multiple computers to solve large-scale problems
over the Internet. Thus, distributed computing becomes data-intensive and
network-centric. This section identifies the applications of modern computer
systems that practice parallel and distributed computing. These large-scale
Internet applications have significantly enhanced the quality of life and
information services in society today.

a. The Age of Internet Computing


Billions of people use the Internet every day. As a result, supercomputer
sites and large data centers must provide high-performance computing
services to huge numbers of Internet users concurrently. Because of this
high demand, the Linpack Benchmark for high-performance computing
(HPC) applications is no longer optimal for measuring system performance.
The emergence of computing clouds instead demands high-throughput
computing (HTC) systems built with parallel and distributed computing
technologies [5,6,19,25]. We have to upgrade data centers using fast
servers, storage systems, and high-bandwidth networks. The purpose is to
advance network-based computing and web services with the emerging new
technologies.

a.1 The Platform Evolution


Computer technology has gone through five generations of development,
with each generation lasting from 10 to 20 years. Successive generations
overlap by about 10 years. For instance, from 1950 to 1970, a handful of
mainframes, including the IBM 360 and CDC 6400, were built to satisfy the
demands of large businesses and government organizations. From 1960 to
1980, lower-cost mini-computers such as the DEC PDP 11 and VAX Series
became popular among small businesses and on college campuses.

From 1970 to 1990, we saw widespread use of personal computers built with
VLSI microprocessors. From 1980 to 2000, massive numbers of portable
computers and pervasive devices appeared in both wired and wireless
applications. Since 1990, the use of both HPC and HTC systems hidden in
clusters, grids, or Internet clouds has proliferated. These systems are
employed by both consumers and high-end web-scale computing and
information services.
The general computing trend is to leverage shared web resources and
massive amounts of data over the Internet. Figure 1.1 illustrates the
evolution of HPC and HTC systems. On the HPC side, supercomputers
(massively parallel processors or MPPs) are gradually replaced by clusters of
cooperative computers out of a desire to share computing resources. The
cluster is often a collection of homogeneous compute nodes that are
physically connected in close range to one another.

On the HTC side, peer-to-peer (P2P) networks are formed for distributed
file sharing and content delivery applications. A P2P system is built over
many client machines (a concept we will discuss further in Chapter 5). Peer
machines are globally distributed in nature. P2P, cloud computing, and web
service platforms are more focused on HTC applications than on HPC
applications. Clustering and P2P technologies lead to the development of
computational grids or data grids.
a.2 High-Performance Computing
For many years, HPC systems have emphasized raw speed performance. The
speed of HPC systems increased from Gflops in the early 1990s to Pflops by
2010. This improvement was driven mainly by the demands from the
scientific, engineering, and manufacturing communities. For example, the
Top 500 most powerful computer systems in the world are measured by
floating-point speed in Linpack benchmark results. However, the number of
supercomputer users is limited to less than 10% of all computer users.
Today, the majority of computer users are using desktop computers or large
servers when they conduct Internet searches and market-driven computing
tasks.
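To make "floating-point speed" concrete, here is a minimal sketch, assuming NumPy is installed. It is only an illustration, not the official Linpack benchmark: it times the solution of a dense linear system (Linpack's core operation) and divides a nominal flop count by the wall time to report Gflop/s. The matrix size n is an arbitrary choice.

```python
# A minimal sketch (not the official Linpack benchmark) of measuring
# floating-point speed: solve a dense linear system A x = b and convert the
# nominal flop count (~2/3 n^3 + 2 n^2 for an LU-based solve) into Gflop/s.
import time
import numpy as np

n = 2000
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

start = time.perf_counter()
x = np.linalg.solve(A, b)            # dense LU factorization + triangular solves
elapsed = time.perf_counter() - start

flops = (2 / 3) * n**3 + 2 * n**2    # nominal operation count for the solve
print(f"~{flops / elapsed / 1e9:.2f} Gflop/s in {elapsed:.3f} s")
```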
a.3 High-Throughput Computing
The development of market-oriented high-end computing systems is
undergoing a strategic change from an HPC paradigm to an HTC paradigm.
This HTC paradigm pays more attention to high-flux computing. The main
application for high-flux computing is in Internet searches and web services
by millions or more users simultaneously. The performance goal thus shifts
to measure high throughput or the number of tasks completed per unit of
time. HTC technology needs to not only improve in terms of batch processing
speed, but also address the acute problems of cost, energy savings, security,
and reliability at many data and enterprise computing centers. Both HPC and
HTC systems are therefore needed to meet the demands of all computer
users.
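To make the throughput metric concrete, here is a minimal sketch using only the Python standard library. It runs many small, independent tasks on a thread pool and reports tasks completed per second; the handle_request function is a hypothetical stand-in for a small I/O-bound web request.

```python
# A minimal sketch of the HTC metric: throughput measured as the number of
# tasks completed per unit of time when many small, independent requests
# are served concurrently.
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request(i: int) -> int:
    time.sleep(0.01)              # stand-in for a small I/O-bound web request
    return i

num_tasks = 500
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=50) as pool:
    list(pool.map(handle_request, range(num_tasks)))
elapsed = time.perf_counter() - start
print(f"throughput: {num_tasks / elapsed:.0f} tasks/second")
```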
a.4 Three New Computing Paradigms
As Figure 1.1 illustrates, with the introduction of SOA, Web 2.0 services
become available. Advances in virtualization make it possible to see the
growth of Internet clouds as a new computing paradigm. The maturity
of radio-frequency identification (RFID), Global Positioning System (GPS), and
sensor technologies has triggered the development of the Internet of Things
(IoT). These new paradigms are only briefly introduced here.
When the Internet was introduced in 1969, Leonard Kleinrock of UCLA
declared: “As of now, computer networks are still in their infancy, but as they
grow up and become sophisticated, we will probably see the spread of
computer utilities, which like present electric and telephone utilities, will
service individual homes and offices across the country.” Many people have
redefined the term “computer” since that time. In 1984, John Gage of Sun
Microsystems created the slogan, “The network is the computer.” In 2008,
David Patterson of UC Berkeley said, “The data center is the computer.
There are dramatic differences between developing software for millions to
use as a service versus distributing software to run on their PCs.” Recently,
Rajkumar Buyya of Melbourne University simply said: “The cloud is the
computer.”
2. High Performance Computing and High Throughput Computing
High-Performance Computing (HPC) is a computing approach used to solve
computational problems, process complex data, and perform scientific
simulations. HPC systems consist of a large number of processors or compute
nodes, high-speed interconnects, and specialized libraries and tools. To use
HPC systems effectively, users must have proper knowledge of parallel
computing and optimization techniques. HPC
is used in various fields such as engineering, finance, commercial
applications, weather forecasting, and automotive design.

Advantages of HPC
 Faster computation: HPC systems can process large and complex
calculations and processes faster and more efficiently than the traditional
approach.
 Scalability: HPC systems are more scalable because they can handle
many processors and computing nodes according to their applications.
 Parallel processing: HPC systems make use of parallel processing in
order to divide the computations into smaller tasks so that they can be
processed simultaneously.
 Improved accuracy: HPC systems produce more accurate
simulations and calculations.
High Throughput Computing (HTC) is defined as a type of computing that
aims to run a large number of computational tasks using resources in
parallel. HTC systems consist of a distributed network of computers known
as computing clusters. These systems are used to schedule a large number
of jobs effectively. HTC mainly focuses on increasing the overall
throughput of the system by running many smaller tasks in parallel.
HTC is commonly used in scientific research and engineering applications in
order to process large data sets or perform simulations that require
extensive computational power.

Advantages of HTC
 Flexibility: HTC is more flexible and can be used for many computing
tasks related to business analytics and scientific research.
 Cost-Effectiveness: HTC is more cost-effective as compared to the
solutions offered by High-Performance Computing (HPC), as it makes use
of hardware and software that is readily available and less expensive,
while performing more tasks.
 Reliability: HTC systems are mostly designed to provide high reliability
and make sure that all tasks run efficiently even if any one of the
individual components fails.
 Resource Optimization: HTC also does proper resource allocation by
ensuring that all available resources are used efficiently, which in turn
increases the value of the available computing resources.
Difference between HPC and HTC
Stands for:
HPC stands for High-Performance Computing.
HTC stands for High-Throughput Computing.

Definition:
HPC is the type of computing that makes use of multiple computer processors to perform complex computations in parallel.
HTC is the type of computing that executes a large number of simple, computationally independent tasks in parallel.

Workload:
HPC runs large-scale, complex, and computationally intensive applications that need significant resources and memory.
HTC runs a large number of independent, small tasks that do not require a large amount of memory or resources.

Processing Power:
HPC is designed to provide maximum performance and speed for large tasks.
HTC is designed to increase the number of tasks completed in a given amount of time.

Resource Management:
HPC makes use of job schedulers and resource managers to manage processes.
HTC makes use of distributed resource management.

Fault Tolerance:
HPC systems have complex fault-tolerance mechanisms to reduce the risk of data loss and data corruption.
In HTC systems, the failure of an individual task does not affect any other running task.

Scaling:
HPC scales up when a few users are running together.
HTC systems scale horizontally for simple tasks that require less computational speed.

Applications:
HPC is used in applications such as engineering design, weather forecasting, and drug discovery.
HTC is used in applications such as bioinformatics and other research applications.

Conclusion
In conclusion, High-Performance Computing (HPC) and High-Throughput
Computing (HTC) are two different types of computing systems for handling
multiple processes simultaneously. Both HPC and HTC have their own
advantages and limitations. In some applications, HPC and HTC can be used
in combination to exploit their respective advantages and improve the
overall functioning of the system.
3. Parallel Computing
Before diving into parallel computing, let's first look at how computer
software traditionally performed its computations and why that approach fell
short in the modern era.
Computer software was written conventionally for serial computing. This
meant that to solve a problem, an algorithm divides the problem into smaller
instructions. These discrete instructions are then executed on the Central
Processing Unit of a computer one by one. Only after one instruction is
finished does the next one start.
A real-life example of this would be people standing in a queue waiting for a
movie ticket and there is only one cashier. The cashier is giving tickets one
by one to the people in the queue. The complexity of this situation increases
when there are
2 queues and only one cashier.
So, in short, serial computing works as follows:
In this, a problem statement is broken into discrete instructions.
Then the instructions are executed one by one.
Only one instruction is executed at any moment of time.
This caused a huge problem in the computing industry, as only one
instruction was executed at any moment in time. This was a huge waste of
hardware resources, since only one part of the hardware was active for a
particular instruction at any given time. As problem statements grew heavier
and bulkier, so did the time needed to execute them. Examples of such
processors are the Pentium 3 and Pentium 4.
Now let’s come back to our real-life problem. We could definitely say that
complexity will decrease when there are 2 queues and 2 cashiers giving
tickets to 2 persons simultaneously. This is an example of Parallel
Computing.
Parallel Computing:
It is the simultaneous use of multiple processing elements to solve a
problem. Problems are broken down into instructions and solved
concurrently, since each resource applied to the work is active at
the same time.
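To make the cashier analogy concrete, here is a minimal sketch, assuming a multi-core machine and only the Python standard library. It times the same independent work items processed one by one and then concurrently on worker processes; serve_customer is a hypothetical stand-in for the cashier issuing one ticket.

```python
# A minimal sketch contrasting serial and parallel execution of the same
# independent work items: first one "cashier", then four in parallel.
import time
from concurrent.futures import ProcessPoolExecutor

def serve_customer(_: int) -> None:
    time.sleep(0.2)               # stand-in for the cashier issuing one ticket

if __name__ == "__main__":
    customers = range(8)

    start = time.perf_counter()
    for c in customers:           # serial: one cashier, one queue
        serve_customer(c)
    print(f"serial:   {time.perf_counter() - start:.2f} s")

    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=4) as pool:  # parallel: four cashiers
        list(pool.map(serve_customer, customers))
    print(f"parallel: {time.perf_counter() - start:.2f} s")
```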
Advantages of Parallel Computing over Serial Computing are as follows:
1. It saves time and money as many resources working together will reduce
the time and cut potential costs.
2. It can be impractical to solve larger problems on Serial Computing.
3. It can take advantage of non-local resources when the local resources are
finite.
4. Serial Computing ‘wastes’ the potential computing power, thus Parallel
Computing makes better use of the hardware.
Types of Parallelism:

1. Bit-level parallelism –
It is the form of parallel computing based on increasing the processor's
word size. It reduces the number of instructions that the system
must execute in order to perform a task on large-sized data.
Example: Consider a scenario where an 8-bit processor must compute the
sum of two 16-bit integers. It must first sum up the 8 lower-order bits,
then add the 8 higher-order bits, thus requiring two instructions to
perform the operation. A 16-bit processor can perform the operation with
just one instruction.

2. Instruction-level parallelism –
A processor can execute more than one instruction during each clock cycle
by re-ordering and grouping instructions, which are then
executed concurrently without affecting the result of the program. This
is called instruction-level parallelism.

3. Task Parallelism –
Task parallelism employs the decomposition of a task into subtasks and
then allocates each subtask to a processor for execution. The
processors execute the subtasks concurrently.
4. Data-level parallelism (DLP) –
Instructions from a single stream operate concurrently on several data
elements. DLP is limited by non-regular data access patterns and by
memory bandwidth. (A short code sketch illustrating these forms of
parallelism follows this list.)
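The sketch below, using only the Python standard library, illustrates two of these forms. Bit-level: an 8-bit machine adds two 16-bit integers with two add instructions (low byte, then high byte plus carry), whereas a 16-bit machine needs only one. Data-level: the same operation is applied to different partitions of one data set on separate worker processes. The function names are illustrative only.

```python
# A minimal sketch of bit-level and data-level parallelism.
from concurrent.futures import ProcessPoolExecutor

def add16_on_8bit_alu(a: int, b: int) -> int:
    lo = (a & 0xFF) + (b & 0xFF)                # instruction 1: low-order bytes
    carry = lo >> 8
    hi = ((a >> 8) + (b >> 8) + carry) & 0xFF   # instruction 2: high bytes + carry
    return (hi << 8) | (lo & 0xFF)

def sum_squares(chunk: range) -> int:
    return sum(x * x for x in chunk)            # same instructions, different data

if __name__ == "__main__":
    # Bit-level: two 8-bit additions emulate one 16-bit addition.
    assert add16_on_8bit_alu(0x12FF, 0x0101) == 0x12FF + 0x0101

    # Data-level: partition one data set and apply the same operation to
    # each partition on a separate worker process.
    data = range(1_000_000)
    chunks = [range(i, min(i + 250_000, len(data)))
              for i in range(0, len(data), 250_000)]
    with ProcessPoolExecutor() as pool:
        total = sum(pool.map(sum_squares, chunks))
    assert total == sum(x * x for x in data)
    print(total)
```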

Why parallel computing?


 The whole real-world runs in dynamic nature i.e. many things happen at a
certain time but at different places concurrently. This data is extremely
large and difficult to manage.
 Real-world data needs more dynamic simulation and modeling, and for
achieving the same, parallel computing is the key.
 Parallel computing provides concurrency and saves time and money.
 Complex, large datasets, and their management can be organized only
using a parallel computing approach.
 Ensures the effective utilization of the resources. The hardware is
guaranteed to be used effectively whereas in serial computation only
some part of the hardware was used and the rest remained idle.
 Also, it is impractical to implement real-time systems using serial
computing.
Applications of Parallel Computing:
 Databases and Data mining.
 Real-time simulation of systems.
 Science and Engineering.
 Advanced graphics, augmented reality, and virtual reality.

Limitations of Parallel Computing:


 It involves issues such as communication and synchronization between
multiple sub-tasks and processes, which are difficult to achieve.
 The algorithms must be designed in such a way that they can be handled
by a parallel mechanism.
 The algorithms or programs must have low coupling and high cohesion,
but it is difficult to create such programs.
 Only highly skilled and expert programmers can code a parallelism-
based program well.

Future of Parallel Computing: The computing landscape has undergone a
great transition from serial computing to parallel computing. Tech giants such
as Intel have already taken a step towards parallel computing by employing
multicore processors. Parallel computation will revolutionize the way
computers work in the future, for the better. With all the world
connecting to each other even more than before, parallel computing plays a
bigger role in helping us stay connected. With faster networks, distributed
systems, and multi-processor computers, it becomes even more necessary.
4. Distributed Computing
Distributed computing refers to a system where processing and data
storage is distributed across multiple devices or systems, rather than being
handled by a single central device. In a distributed system, each device or
system has its own processing capabilities and may also store and manage
its own data. These devices or systems work together to perform tasks and
share resources, with no single device serving as the central hub.
One example of a distributed computing system is a cloud computing
system, where resources such as computing power, storage, and
networking are delivered over the Internet and accessed on demand. In this
type of system, users can access and use shared resources through a web
browser or other client software.
Components
There are several key components of a Distributed Computing System
 Devices or Systems: The devices or systems in a distributed system have
their own processing capabilities and may also store and manage their
own data.
 Network: The network connects the devices or systems in the distributed
system, allowing them to communicate and exchange data.
 Resource Management: Distributed systems often have some type of
resource management system in place to allocate and manage shared
resources such as computing power, storage, and networking.
The architecture of a Distributed Computing System is typically a Peer-to-
Peer Architecture, where devices or systems can act as both clients and
servers and communicate directly with each other.
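As a minimal sketch of this peer-to-peer idea, assuming only the Python standard library (the port numbers and the "square" service are illustrative assumptions), each node below exposes a small service over the network and also calls the service of another node, so it acts as both server and client with no central hub.

```python
# A minimal sketch of nodes that act as both servers and clients.
import threading
from xmlrpc.server import SimpleXMLRPCServer
import xmlrpc.client

def start_node(port: int) -> None:
    """Run a node that provides a simple 'square' service on localhost."""
    server = SimpleXMLRPCServer(("localhost", port), logRequests=False)
    server.register_function(lambda x: x * x, "square")
    threading.Thread(target=server.serve_forever, daemon=True).start()

if __name__ == "__main__":
    start_node(8001)                      # node A provides a service
    start_node(8002)                      # node B provides the same service

    # Each node can also request the other's service: no central hub.
    peer_b = xmlrpc.client.ServerProxy("http://localhost:8002")
    peer_a = xmlrpc.client.ServerProxy("http://localhost:8001")
    print(peer_b.square(7), peer_a.square(9))   # 49 81
```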
Characteristics
There are several characteristics that define a Distributed Computing
System
 Multiple Devices or Systems: Processing and data storage is distributed
across multiple devices or systems.
 Peer-to-Peer Architecture: Devices or systems in a distributed system
can act as both clients and servers, as they can both request and provide
services to other devices or systems in the network.
 Shared Resources: Resources such as computing power, storage, and
networking are shared among the devices or systems in the network.
 Horizontal Scaling: Scaling a distributed computing system typically
involves adding more devices or systems to the network to increase
processing and storage capacity. This can be done through hardware
upgrades or by adding additional devices or systems to the network.
Advantages of the Distributed Computing System are:
 Scalability: Distributed systems are generally more scalable than
centralized systems, as they can easily add new devices or systems to
the network to increase processing and storage capacity.
 Reliability: Distributed systems are often more reliable than centralized
systems, as they can continue to operate even if one device or system
fails.
 Flexibility: Distributed systems are generally more flexible than
centralized systems, as they can be configured and reconfigured more
easily to meet changing computing needs.
There are a few limitations of Distributed Computing Systems:
 Complexity: Distributed systems can be more complex than centralized
systems, as they involve multiple devices or systems that need to be
coordinated and managed.
 Security: It can be more challenging to secure a distributed system, as
security measures must be implemented on each device or system to
ensure the security of the entire system.
 Performance: Distributed systems may not offer the same level of
performance as centralized systems, as processing and data storage is
distributed across multiple devices or systems.

Applications of Distributed Computing Systems
Distributed Computing Systems have a number of applications, including:
 Cloud Computing: Cloud Computing systems are a type of distributed
computing system that are used to deliver resources such as computing
power, storage, and networking over the Internet.
 Peer-to-Peer Networks: Peer-to-Peer Networks are a type of distributed
computing system that is used to share resources such as files and
computing power among users.
 Distributed Architectures: Many modern computing systems, such as
microservices architectures, use distributed architectures to distribute
processing and data storage across multiple devices or systems.
5. Grid Computing
Grid computing is a distributed architecture that combines computer
resources from different locations to achieve a common goal. It breaks down
tasks into smaller subtasks, allowing concurrent processing. In this section,
we discuss grid computing.
What is Grid Computing?
Grid Computing can be defined as a network of computers working together
to perform a task that would rather be difficult for a single machine. All
machines on that network work under the same protocol to act as a virtual
supercomputer. The tasks that they work on may include analyzing huge
datasets or simulating situations that require high computing power.
Computers on the network contribute resources like processing power and
storage capacity to the network.
Grid Computing is a subset of distributed computing, where a virtual
supercomputer comprises machines connected by a network, usually
Ethernet or sometimes the Internet. It can also be seen as a form of
Parallel Computing where instead of many CPU cores on a single machine, it
contains multiple cores spread across various locations. The concept of grid
computing isn’t new, but it is not yet perfected as there are no standard
rules and protocols established and accepted by people.
Why is Grid Computing Important?
• Scalability: It allows organizations to scale their computational
resources dynamically. As workloads increase, additional machines can be
added to the grid, ensuring efficient processing.
• Resource Utilization: By pooling resources from multiple computers,
grid computing maximizes resource utilization. Idle or underutilized
machines contribute to tasks, reducing wastage.
• Complex Problem Solving: Grids handle large-scale problems that
require significant computational power. Examples include climate modeling,
drug discovery, and genome analysis.
• Collaboration: Grids facilitate collaboration across geographical
boundaries. Researchers, scientists, and engineers can work together on
shared projects.
• Cost Savings: Organizations can reuse existing hardware, saving costs
while accessing excess computational resources. Additionally, cloud
resources can be used cost-effectively.
Working of Grid Computing
A grid computing network mainly consists of three types of machines:
• Control Node: A computer, usually a server or a group of servers which
administrates the whole network and keeps the account of the resources in
the network pool.
• Provider: The computer that contributes its resources to the network
resource pool.
• User: The computer that uses the resources on the network.
When a computer makes a request for resources to the control node, the
control node gives the user access to the resources available on the network.
When it is not in use it should ideally contribute its resources to the network.
Hence, a normal computer on the network can swing between being a user
and a provider based on its needs. The nodes may consist of machines with
similar platforms running the same OS, called homogeneous networks, or
machines with different platforms running different OSs, called
heterogeneous networks. This is the distinguishing feature of grid computing
compared with other distributed computing architectures. For controlling the
network and its resources, a software/networking protocol is used, generally
known as middleware. This is responsible for administering the network, and
the control nodes are merely its executors. As a grid computing system
should use only the unused resources of a computer, it is the job of the
control node to ensure that no provider is overloaded with tasks.
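Here is a minimal sketch of the control-node role just described; the class and method names are hypothetical and not part of any real grid middleware. Jobs submitted by users are assigned to the least-loaded provider so that no provider is overloaded. Real middleware performs this role with far more sophistication (queuing, priorities, fault recovery).

```python
# A minimal sketch of a control node assigning jobs to the least-loaded provider.
from dataclasses import dataclass, field

@dataclass
class Provider:
    name: str
    capacity: int                 # how many jobs this provider will accept
    jobs: list = field(default_factory=list)

class ControlNode:
    """Plays the middleware/control-node role: tracks providers, assigns jobs."""
    def __init__(self, providers):
        self.providers = providers

    def submit(self, job: str) -> str:
        # Pick the provider with the most spare capacity (least loaded).
        best = max(self.providers, key=lambda p: p.capacity - len(p.jobs))
        if best.capacity - len(best.jobs) <= 0:
            raise RuntimeError("all providers busy; a real grid would queue the job")
        best.jobs.append(job)
        return best.name

if __name__ == "__main__":
    grid = ControlNode([Provider("lab-pc-1", 2), Provider("lab-pc-2", 3)])
    for job in ["render-frame", "protein-fold", "matrix-solve", "climate-step"]:
        print(job, "->", grid.submit(job))
```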
The meaning of the term Grid Computing has changed over the years.
According to “The Grid: Blueprint for a New Computing Infrastructure” by Ian
Foster and Carl Kesselman, published in 1999, the idea was to consume
computing power the way electricity is consumed from a power grid. This idea is
similar to the current concept of cloud computing, whereas now grid
computing is viewed as a distributed collaborative network. Currently, grid
computing is being used in various institutions to solve a lot of
mathematical, analytical, and physics problems.

What are the Types of Grid Computing?


• Computational grid: A computational grid is a collection of high-
performance processors. It enables researchers to utilize the combined
computing capacity of the machines. Researchers employ computational grid
computing to complete resource-intensive activities like mathematical
calculations.
• Scavenging grid: Similar to computational grids, CPU scavenging grids
have a large number of conventional computers. Scavenging refers to the
process of searching for available computing resources in a network of
normal computers.
• Data grid: A data grid is a grid computing network that connects
multiple computers together to enable huge amounts of data storage. You
can access the stored data as if it were on your local system, without
worrying about where it is physically located on the grid.
Use Cases of Grid Computing
• Genomic Research
• Drug Discovery
• Cancer Research
• Weather Forecasting
• Risk Analysis
• Computer-Aided Design (CAD)
• Animation and Visual Effects
• Collaborative Projects
Advantages of Grid Computing
• Grid Computing provides high resource utilization.
• Grid Computing allows parallel processing of tasks.
• Grid Computing is designed to be scalable.
Disadvantages of Grid Computing
• The software of the grid is still in the evolution stage.
• Grid computing introduces complexity.
• Limited Flexibility
• Security Risks
What is Distributed Computing?
Distributed computing, as described earlier, is a system in which processing
and data storage are spread across multiple devices or systems that work
together to perform tasks and share resources, with no single device serving
as the central hub; a cloud computing system is one common example.
What is Cluster Computing?
Cluster computing is a collection of tightly or loosely connected computers
that work together so that they act as a single entity. The connected
computers execute operations together, thus creating the impression of a
single system. The clusters are generally connected through fast local area
networks (LANs).
