Unit-1 Part-1
PART-1 Systems Modeling, Clustering and virtualization: Scalable Computing over the Internet,
Technologies for Network based systems, System models for Distributed and Cloud Computing, Software
environments for distributed systems and clouds, Performance, Security and Energy Efficiency.
Billions of people use the Internet every day. As a result, supercomputer sites and
large data centers must provide high-performance computing (HPC) services to huge
numbers of Internet users concurrently. However, raw HPC speed alone is no longer
the optimal measure of system performance.
The emergence of computing clouds instead demands high-throughput computing
(HTC) systems built with parallel and distributed computing technologies. This
requires upgrading data centers with fast servers, storage systems, and high-bandwidth
networks to support emerging new technologies.
Computer technology has gone through five generations of development, with each
generation lasting from 10 to 20 years and successive generations overlapping by about
10 years. Earlier generations were built to satisfy the demands of large businesses and
government organizations. Since 1990, both HPC and HTC systems have been hidden inside
clusters, grids, and Internet clouds.
High-Performance Computing:
For many years, HPC systems have emphasized raw speed performance, which reached the
Pflops (petaflops) range by 2010. This improvement was driven mainly by demands from
the scientific, engineering, and manufacturing communities.
For example, the Top 500 most powerful computer systems in the world are measured
by floating-point speed in Linpack benchmark results. However, the number of
supercomputer users is limited to less than 10% of all computer users.
Today, the majority of computer users are using desktop computers or large servers
when they conduct Internet searches and market-driven computing tasks.
High-Throughput Computing:
HTC pays more attention to high-flux computing, where the main applications are Internet
searches and web services accessed by millions of users simultaneously. The performance
goal thus shifts from raw speed to high throughput, measured by the number of tasks
completed per unit of time.
Degrees of Parallelism:
Bit-level parallelism (BLP): It converts bit-serial processing to word-level
processing gradually. Over the years, users graduated from 4-bit microprocessors
to 8-, 16-, 32-, and 64-bit CPUs.
Instruction-level parallelism (ILP): The processor executes multiple instructions
simultaneously.
Ex: pipelining, superscalar execution, VLIW (very long instruction word) architectures,
and multithreading.
Data-level parallelism (DLP): The same instruction operates on many data elements at
once, as in SIMD (single instruction, multiple data) architectures. More hardware
support is needed.
Task-level parallelism (TLP): Different threads (tasks or functions) are distributed
across multiple processors and executed in parallel in a parallel computing
environment.
Job-level parallelism (JLP): The highest level of parallelism, where the goal of a lab
or computer center is to execute as many independent jobs as possible in a given time
period. A small sketch contrasting data-level and task-level parallelism follows this list.
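To make the distinction concrete, here is a minimal Python sketch; the data values, the chunk split, and the function name total are illustrative assumptions rather than anything from the text. It contrasts data-level parallelism, where one operation is applied across a whole data set, with task-level parallelism, where independent tasks run on separate worker processes.

# Minimal sketch contrasting DLP and TLP (illustrative assumptions only).
from concurrent.futures import ProcessPoolExecutor

data = list(range(1, 9))

# Data-level parallelism (conceptually): the same operation is applied to
# every element of the data set; SIMD hardware would do this in lockstep.
squares = [x * x for x in data]

def total(chunk):
    # Task-level parallelism: each independent task works on its own chunk.
    return sum(chunk)

if __name__ == "__main__":
    chunks = [data[:4], data[4:]]
    with ProcessPoolExecutor(max_workers=2) as pool:
        partial_sums = list(pool.map(total, chunks))   # tasks run in parallel
    print(squares)             # element-wise results (DLP)
    print(sum(partial_sums))   # combined result of the parallel tasks (TLP)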
The IoT refers to the networked interconnection of everyday objects, tools, devices, or
computers. It can be seen as a wireless network of sensors that interconnects all the things
we use in our daily lives. RFID and GPS technologies are also used here. The IoT demands
universal addressability of all the objects or things, whether stationary or moving.
Cyber-Physical System (CPS) is the result of interaction between computational
processes and the physical world. A CPS integrates “cyber” (heterogeneous, asynchronous)
with “physical” (concurrent and information-dense) objects. A CPS merges the “3C”
technologies of computation, communication, and control into an intelligent closed feedback
system between the physical world and the information world, a concept which is actively
explored in the United States.
1.2 TECHNOLOGIES FOR NETWORK BASED SYSTEMS
With the concept of scalable computing under our belt, it’s time to explore hardware,
software, and network technologies for distributed computing system design and applications.
System-Area Interconnects:
The nodes in small clusters are interconnected by an Ethernet switch or a LAN. As
shown below, a LAN is typically used to connect client machines to servers. A storage
area network (SAN) connects servers to network storage such as disk arrays, while
network attached storage (NAS) connects client machines directly to disk arrays. All of
these network types appear in a large cluster built with commercial network components
(e.g., from Cisco or Juniper). If not much data is shared, a small cluster can be built
with an Ethernet switch plus copper cables linking the end machines (clients and servers).
As seen in the figure above, the host machine is equipped with the physical hardware. A
virtual machine (VM) is built with virtual resources managed by a guest OS to run a specific
application (e.g., VMware running Ubuntu for Hadoop). Between the VMs and the host
platform we need a middleware layer called the VM monitor (VMM). A hypervisor (VMM) is a
program that allows different operating systems to share a single hardware host. This
approach is called a bare-metal VM because the hypervisor handles the CPU, memory, and I/O
directly. A VM can also be implemented in a dual mode, as shown above, where part of the
VMM runs at the user level and another part runs at the supervisor level.
VM primitive operations: The VMM provides the VM abstraction to the guest OS. With full
virtualization, the VMM can export a VM abstraction identical to the physical machine, so
that a standard OS can run on it just as it would on the physical hardware. The low-level
VMM operations identified by Mendel Rosenblum are: multiplexing VMs between hardware
machines, suspending a VM and storing it in stable storage, resuming (provisioning) a
suspended VM on a new hardware platform, and migrating a VM from one hardware platform to
another.
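As a hedged illustration of how management software talks to a hypervisor, the sketch below uses the Python libvirt binding to list the VMs a VMM currently manages; it assumes the libvirt-python package is installed and a local QEMU/KVM hypervisor is reachable at the qemu:///system URI (both are assumptions, not part of the text).

# Hedged sketch: list the VMs (domains) managed by a local hypervisor.
import libvirt

conn = libvirt.open("qemu:///system")    # connect to the local VMM (assumed URI)
try:
    for dom in conn.listAllDomains():    # every VM the hypervisor knows about
        state = "running" if dom.isActive() else "stopped"
        print(f"{dom.name()}: {state}")
finally:
    conn.close()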
Low Cost Design Philosophy: High-end switches or routers that provide high-bandwidth
networks cost more and do not fit the low-cost design philosophy of cloud computing.
For a fixed budget, commodity switches and networks are more desirable.
Web 2.0 is the second stage of the development of the Internet, characterized by the
transformation of static web pages into dynamic content and by the growth of social media.
Data is increasing by leaps and bounds every day, coming from sensors, simulations, web
services, mobile services, and so on. Acquiring, storing, and accessing these huge data
sets require standard tools that support high-performance, scalable file systems, databases,
algorithms, and visualization. With science becoming data-centric, the storage and analysis
of data play a huge role in the appropriate use of data-intensive technologies.
1.3 SYSTEM MODELS FOR DISTRIBUTED AND CLOUD COMPUTING
Distributed and Cloud Computing systems are built over a large number of independent
computer nodes, which are interconnected by a SAN, LAN, or WAN. A few LAN switches can
easily connect hundreds of machines as a working cluster.
1.3.1 Clusters of Cooperative Computers
A computing cluster consists of interconnected standalone computers that work jointly as
a single integrated computing resource. This approach is particularly effective in
handling heavy workloads with large data sets.
Cluster Architecture: The figure below shows the architecture of a typical server cluster
built around a low-latency, high-bandwidth interconnection network. For building a large
cluster, the interconnection network can use Gigabit Ethernet, Myrinet, or InfiniBand
switches.
Through a hierarchical construction using a SAN, LAN, or WAN, scalable clusters can be built
with an increasing number of nodes. The cluster is connected to the Internet through a
VPN (virtual private network) gateway, which has an IP address used to locate the cluster.
Most clusters have loosely coupled nodes, which are autonomous computers with their own
OS.
Major Cluster Design Issues: A cluster-wide OS, or a single OS that virtually controls the
whole cluster, is not yet available. This makes designing and achieving a single-system
image (SSI) difficult and expensive. Consequently, all applications must rely on middleware
to achieve the coupling between machines in a cluster or between clusters.
1.3.2 Grid Computing Infrastructures
Grid computing is designed to allow close interaction among applications running on distant
computers simultaneously.
Computational Grids: A computational grid provides an infrastructure that couples computers,
software/hardware, sensors, and other resources together. The grid can be constructed across
LANs, WANs, and other networks on a regional, national, or global scale; such grids are also
termed virtual platforms. Computers, workstations, servers, and clusters are used in a grid,
while PCs, laptops, and other devices can be viewed as access devices to a grid system. The
figure below shows an example grid built by different organizations over multiple systems of
different types, running different operating systems.
Grid Families: Grid technology demands new distributed computing models,
software/middleware support, network protocols, and hardware infrastructures. National grid
projects have been followed by industrial grid platforms from IBM, Microsoft, HP, Dell-EMC,
Cisco, and Oracle.
1.3.3 Peer-to-Peer Network Families
P2P Systems: In a P2P system, every node acts as both a client and a server, providing part
of the system resources. Peer machines are simply client computers connected to the Internet.
All client machines act autonomously to join or leave the system freely. This implies that no
master-slave relationship exists among the peers. No central coordination or central database
is needed. In other words, no peer machine has a global view of the entire P2P system. The
system is self-organizing with distributed control.
P2P Application Families: There are four major categories of P2P networks: distributed file
sharing, collaborative platforms, distributed P2P computing, and other P2P platforms.
Ex: BitTorrent, Napster, Skype, Genome@home, JXTA, .NET etc.
A cloud allows workloads to be deployed and scaled out through rapid provisioning of
physical or virtual systems. The cloud supports redundant, self-recovering, and highly
scalable programming models that allow workloads to recover from software or hardware
failures. The cloud system also monitors the resource use in such a way that allocations can
be rebalanced when required.
In grids/web services, Java, and CORBA, an entity is, respectively, a service, a Java object,
and a CORBA distributed object in a variety of languages. These architectures build on the
traditional seven Open Systems Interconnection (OSI) layers that provide the base
networking abstractions. On top of this we have a base software environment, which would
be .NET or Apache Axis for web services, the Java Virtual Machine for Java, and a broker
network for CORBA. On top of this base environment one would build a higher level
environment reflecting the special features of the distributed computing environment.
The Evolution of SOA: The service-oriented architecture (SOA) has evolved over
the years. SOA applies to building grids, clouds, grids of clouds, clouds of grids,
clouds of clouds (also known as interclouds), and systems of systems in general. A
large number of sensors provide data-collection services, denoted in the figure as SS
(sensor service).
A sensor can be a ZigBee device, a Bluetooth device, a WiFi access point, a personal
computer, a GPS device, or a wireless phone, among other things. Raw data is collected by sensor
services. All the SS devices interact with large or small computers, many forms of grids,
databases, the compute cloud, the storage cloud, the filter cloud, the discovery cloud, and so
on. Filter services (fs in the figure) are used to eliminate unwanted raw data, in order to
respond to specific requests from the web, the grid, or web services.
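As a loose sketch of the filter-service idea, the following Python fragment drops raw sensor readings that do not match a request; the record format, field names, and threshold are all illustrative assumptions rather than anything defined in the text.

# Hedged sketch of a filter service (fs): discard unwanted raw sensor
# readings before passing them on to a compute or storage cloud.
def filter_service(raw_readings, min_value=10.0):
    # Keep only the readings relevant to the current request.
    return [r for r in raw_readings if r["value"] >= min_value]

if __name__ == "__main__":
    raw = [{"sensor": "ss-1", "value": 3.2},
           {"sensor": "ss-2", "value": 27.5},
           {"sensor": "ss-3", "value": 14.1}]
    print(filter_service(raw))   # only ss-2 and ss-3 survive the filter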
A distributed system inherently has multiple system images. This is mainly due to the fact
that all node machines run with an independent operating system. To promote resource
sharing and fast communication among node machines, it is best to have a distributed OS that
manages all resources coherently and efficiently. Such a system is most likely to be a closed
system, and it will likely rely on message passing and RPCs for internode communications. It
should be pointed out that a distributed OS is crucial for upgrading the performance,
efficiency, and flexibility of distributed applications.
In this section, we will explore programming models for distributed computing with expected
scalable performance and application flexibility.
Hadoop Library: Hadoop enables users to write and run applications over vast amounts of
distributed data. Users can easily scale Hadoop to store and process petabytes of data
in the web space. The package is economical (open source), efficient (it exploits a high
degree of parallelism), and reliable (it keeps multiple copies of the data).
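As a hedged illustration of the programming style Hadoop supports, the following self-contained Python sketch simulates the classic MapReduce word-count pattern locally; in a real deployment Hadoop would distribute the map and reduce phases across the cluster, and the sample text and function names here are assumptions for illustration.

# Local simulation of the MapReduce word-count pattern Hadoop runs at scale.
from collections import defaultdict

def map_phase(lines):
    # Mapper: emit (word, 1) pairs for every word in the input.
    for line in lines:
        for word in line.split():
            yield word, 1

def reduce_phase(pairs):
    # Reducer: sum the counts emitted for each word.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return counts

if __name__ == "__main__":
    text = ["hadoop stores data", "hadoop processes data in parallel"]
    for word, count in sorted(reduce_phase(map_phase(text)).items()):
        print(word, count)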
Performance metrics are needed to measure various distributed systems. In this section, we
will discuss various dimensions of scalability and performance laws. Then we will examine
system scalability against OS images and the limiting factors encountered.
Amdahl’s Law:
Consider the execution of a given program on a uniprocessor workstation with a total
execution time of T minutes. Now consider the same program running in parallel on a
cluster of many processing nodes. Assume that a fraction α of the code must be executed
sequentially (the sequential bottleneck), so the remaining (1 − α) of the code can be
compiled for parallel execution by n processors. The total execution time becomes
αT + (1 − α)T/n, where the first term is the sequential execution time on a single
processor and the second term is the parallel execution time on n nodes.
Speedup S = T / [αT + (1 − α)T/n] = 1 / [α + (1 − α)/n]
The maximum speedup of n is obtained only when the sequential fraction α approaches zero,
i.e., when the code is fully parallelizable.
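A quick numeric sketch of the speedup formula above; the values of α and n used below are illustrative assumptions. Note how the speedup saturates near 1/α once n grows large.

# Amdahl's law: speedup for a sequential fraction alpha on n processors.
def amdahl_speedup(alpha, n):
    return 1.0 / (alpha + (1.0 - alpha) / n)

if __name__ == "__main__":
    for alpha in (0.0, 0.05, 0.25):          # assumed sequential fractions
        for n in (4, 64, 1024):              # assumed processor counts
            print(f"alpha={alpha:.2f}, n={n:>4}: "
                  f"speedup={amdahl_speedup(alpha, n):.1f}")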
1.5.2 Fault Tolerance and System Availability
In addition to performance, system availability and application flexibility are two other
important design goals in a distributed computing system.
System Availability: High availability (HA) is needed in all clusters, grids, P2P networks
and cloud systems. A system is highly available if it has a long mean time to failure (MTTF)
and a short mean time to repair (MTTR).
System Availability = MTTF/(MTTF + MTTR)
System availability depends on many factors, including hardware, software, and network
components. Any failure that leads to the failure of the entire system is known as a single
point of failure. The general goal of any manufacturer or user is to build a system with
no single point of failure. To achieve this goal, the key factors are adding hardware
redundancy, increasing component reliability, and designing for testability.
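A short numeric sketch of the availability formula above; the MTTF and MTTR figures are assumptions chosen only for illustration.

# System availability = MTTF / (MTTF + MTTR)
def availability(mttf_hours, mttr_hours):
    return mttf_hours / (mttf_hours + mttr_hours)

if __name__ == "__main__":
    # Assumed figures: a failure every 1,000 hours on average, 2 hours to repair.
    print(f"{availability(1000, 2):.4f}")   # ~0.9980, i.e., about 99.8% uptime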
Threats to networks and systems: The figure below summarizes various attack types and the
damage they cause to users. Information leaks lead to a loss of confidentiality. Loss of
data integrity can be caused by user alteration, Trojan horses, and service spoofing
attacks, while denial of service (DoS) attacks lead to a loss of Internet connectivity and
system operation. Users need to protect clusters, grids, clouds, and P2P systems from
malicious intrusions that may destroy hosts, network, and storage resources.
Primary performance goals in conventional parallel and distributed computing systems are
high performance and high throughput, considering some form of performance reliability
(e.g., fault tolerance and security).
To run a server farm (data center), a company has to spend a huge amount of money on
hardware, software, operational support, and energy every year. Therefore, companies should
carefully assess whether their installed server farm (more specifically, the volume of
provisioned resources) is at an appropriate level, particularly in terms of utilization. It was
estimated in the past that, on average, one-sixth (15 percent) of the full-time servers in a
company are left powered on without being actively used (i.e., they are idling) on a daily
basis. This indicates that with 44 million servers in the world, around 4.7 million servers are
not doing any useful work.