
06 Distributed Computing

The document discusses distributed computing and compares it to other computing concepts like parallel computing, grid computing, cloud computing, and cluster computing. Distributed computing involves dividing processing tasks among multiple computers or nodes that communicate over a network to work together on problems and share workload. It allows for scalability, fault tolerance, and more efficient use of computing resources compared to centralized systems.

Uploaded by Low Wai Leong


WQD7007 Big Data Management

Distributed Computing


Agenda
• Distributed computing compared to other computing concepts
• Distributed computing in Hadoop


Distributed Computing
• Distributed computing is a field of computer
science that studies distributed systems.

• A distributed system is a model in which components located on networked computers communicate and coordinate their actions by passing messages. The components interact with each other to achieve a common goal.

• Three characteristics:
• the computers operate concurrently
• the computers fail independently
• the computers do not share a global clock

Source: Tanenbaum, Andrew S.; Steen, Maarten van (2002). Distributed systems: principles and paradigms. Upper Saddle River, NJ: Pearson Prentice Hall.
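A minimal sketch of the message-passing idea, assuming Python threads stand in for networked computers and a `queue.Queue` stands in for the network. The function names `node` and `coordinator` are illustrative only; real components would run on separate machines and exchange messages over sockets or an RPC framework.

```python
import queue
import threading


def node(node_id, numbers, outbox):
    """A simulated node: computes on its local data and sends the result as a message."""
    partial = sum(numbers)
    # The result is shared only by sending a message, not through shared state.
    outbox.put({"from": node_id, "partial_sum": partial})


def coordinator(data, num_nodes):
    """Splits the work, starts the nodes, and combines the messages they send back."""
    inbox = queue.Queue()
    chunks = [data[i::num_nodes] for i in range(num_nodes)]
    threads = [
        threading.Thread(target=node, args=(i, chunk, inbox))
        for i, chunk in enumerate(chunks)
    ]
    for t in threads:
        t.start()
    # Wait for exactly one message per node; the arrival order is not fixed.
    total = 0
    for _ in range(num_nodes):
        msg = inbox.get()
        total += msg["partial_sum"]
    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    print(coordinator(list(range(1, 101)), num_nodes=4))  # 5050
```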

Distributed Computing
• A model in which multiple software components run on multiple computers to improve efficiency and performance.
• The work is shared among multiple systems, which may also be in different locations, so that the network works as a single computer.

• Two types of distributed systems:
• Computers are physically close together (connected by a local network)
• Computers are geographically distant (connected by a wide area network)

Distributed Computing
• A distributed system can take many different configurations, combining mainframes, personal computers, workstations, minicomputers, and so on.
• For example, in a 3-tier distributed model each tier performs a different task:
1. User interface processing (performed in the PC at the user's
location)
2. Business processing (done in a remote computer)
3. Database access and processing (performed in another
computer that provides centralized access)
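The three tiers can be sketched as three layers of functions. This is only an illustration under simplifying assumptions: everything runs in one Python process, an in-memory SQLite database plays the role of the central database computer, and the table, functions, and discount rule are hypothetical. In a real 3-tier deployment each layer would run on its own machine and the calls between them would cross the network.

```python
import sqlite3


# --- Tier 3: database access (would run on the central database computer) ---
def make_database():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("alice", 120.0), ("bob", 40.0), ("alice", 30.0)],
    )
    return conn


def fetch_orders(conn, customer):
    rows = conn.execute(
        "SELECT amount FROM orders WHERE customer = ?", (customer,)
    ).fetchall()
    return [amount for (amount,) in rows]


# --- Tier 2: business processing (would run on a remote application server) ---
def customer_total(conn, customer, discount_threshold=100.0, discount=0.1):
    total = sum(fetch_orders(conn, customer))
    # Hypothetical business rule: 10% off totals above the threshold.
    if total > discount_threshold:
        total *= 1 - discount
    return round(total, 2)


# --- Tier 1: user interface processing (would run on the user's PC) ---
def render(customer, total):
    return f"{customer}: {total:.2f}"


if __name__ == "__main__":
    db = make_database()
    print(render("alice", customer_total(db, "alice")))  # alice: 135.00
    print(render("bob", customer_total(db, "bob")))      # bob: 40.00
```

Keeping each tier behind its own function boundary is what makes it possible to move that tier to a separate computer later without changing the others.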


Distributed Computing VS Parallel Computing
• Parallel computing: There are many processing steps, and the steps are executed at the same time.
• For example, video processing benefits from having steps work in parallel.
• Distributed Computing: There are many processing steps, but the processing is divided among several computers or nodes.
• Each computer/node works concurrently and then forwards its individual output to a master (responsible) node that aggregates the outputs.
• The order in which the processing completes is not guaranteed.
• Example: Hadoop and MapReduce
• How to choose?
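The distributed pattern described above (nodes work concurrently, forward partial outputs to a master, and the arrival order is not guaranteed) can be sketched with a thread pool standing in for the nodes. This is an assumption-laden toy, not how Hadoop is implemented: the random delay only simulates nodes finishing at unpredictable times, and `as_completed` hands results back in completion order rather than submission order.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def worker(node_id, chunk):
    """A simulated node: processes its chunk and returns (node_id, partial output)."""
    time.sleep(random.uniform(0, 0.05))  # nodes finish at unpredictable times
    return node_id, max(chunk)


def master(data, num_nodes=4):
    """Splits the data, collects partial outputs as they arrive, and aggregates them."""
    chunks = [data[i::num_nodes] for i in range(num_nodes)]
    arrival_order = []
    partials = []
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        # Skip empty chunks so every worker has something to process.
        futures = [pool.submit(worker, i, c) for i, c in enumerate(chunks) if c]
        # as_completed yields futures in completion order, not submission order.
        for fut in as_completed(futures):
            node_id, partial = fut.result()
            arrival_order.append(node_id)
            partials.append(partial)
    return max(partials), arrival_order
```

Calling `master([3, 17, 8, 42, 5, 23, 11, 9])` always aggregates to 42, while the recorded arrival order of the nodes can differ from run to run.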

Distributed Computing VS Grid Computing
• Grid Computing: a more secure, location-specific kind of distributed system, such as one within a specific organization or firm.
• A service for sharing computing power and data storage capacity over the Internet.
• Is more concerned with the efficient utilization of a pool of heterogeneous systems through optimal workload management: utilizing an enterprise's entire computational resources (servers, networks, storage, and information), acting together to create one or more large pools of computing resources.
• Distributed Computing normally refers to managing or pooling hundreds or thousands of computer systems that are individually more limited in their memory and processing power.
Source: http://www.jatit.org/distributed-computing/grid-vs-distributed.htm
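To illustrate workload management over a pool of heterogeneous machines, here is a simple greedy scheduler: each job, largest first, goes to the node that would finish it earliest given that node's relative speed. The node names, speeds, and job sizes are hypothetical, and this heuristic is just one possible policy, not the algorithm used by any particular grid middleware.

```python
def schedule(jobs, node_speeds):
    """Greedily assigns jobs to heterogeneous nodes.

    jobs: list of (job_name, work_units)
    node_speeds: dict of node_name -> work units processed per second
    Returns (assignment dict job -> node, makespan in seconds).
    """
    finish = {node: 0.0 for node in node_speeds}  # when each node becomes free
    assignment = {}
    # Placing the largest jobs first usually gives a better balance.
    for name, work in sorted(jobs, key=lambda j: j[1], reverse=True):
        # Pick the node that would complete this job earliest.
        best = min(node_speeds, key=lambda n: finish[n] + work / node_speeds[n])
        finish[best] += work / node_speeds[best]
        assignment[name] = best
    return assignment, max(finish.values())


if __name__ == "__main__":
    jobs = [("a", 8), ("b", 4), ("c", 4), ("d", 2)]
    speeds = {"server": 4.0, "workstation": 2.0, "pc": 1.0}
    print(schedule(jobs, speeds))
```

With these inputs the fast server takes jobs `a` and `c`, the workstation takes `b`, and the PC takes `d`, so all work finishes after 3.0 seconds even though the machines differ in speed.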

Distributed Computing VS Cloud Computing
• Cloud Computing: creating a distributed system/architecture at a remote location or on a virtual facility.
• Example: Amazon Web Services (AWS)
• Massively scalable and flexible IT capabilities are delivered to users as a service using Internet technologies.
• Services may include infrastructure, platforms, applications, and storage space.
• Users pay only for the services and resources they actually use; they do not need to build infrastructure of their own.
• Minimal flexibility: the applications and services run on a remote server, so enterprises using cloud computing have minimal control over the functions of the software as well as the hardware.


Distributed Computing VS Cluster Computing
• Cluster Computing: a cluster is a system comprising two or more computers or systems (called nodes) that work together to execute applications or perform other tasks, so that users have the impression that only a single system is responding to them. This creates the illusion of a single resource (virtual machine).
• Types of clusters: High Availability (HA) and fail-over clusters
• These models are built to provide services and resources without interruption through the use of implicit redundancy in the system.
• The general idea is that if a cluster node fails, its applications or services fail over to another node, where they remain available.
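The fail-over idea can be sketched from the client's point of view: requests go to the primary node and are redirected to a standby when the primary fails. The `Node` and `FailoverCluster` classes are hypothetical; production HA clusters usually detect failures with heartbeats and switch over automatically, which this toy does not model.

```python
class NodeDown(Exception):
    """Raised when a (simulated) node cannot serve a request."""


class Node:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def handle(self, request):
        if not self.healthy:
            raise NodeDown(self.name)
        return f"{self.name} handled {request}"


class FailoverCluster:
    """Presents several redundant nodes as one service (a single system image)."""

    def __init__(self, nodes):
        self.nodes = list(nodes)  # the first node is the primary, the rest are standbys

    def handle(self, request):
        failed = []
        for node in self.nodes:
            try:
                return node.handle(request)
            except NodeDown:
                failed.append(node.name)  # fail over to the next node
        raise RuntimeError(f"all nodes failed: {failed}")


if __name__ == "__main__":
    primary, standby = Node("node-1"), Node("node-2")
    cluster = FailoverCluster([primary, standby])
    print(cluster.handle("req-1"))  # node-1 handled req-1
    primary.healthy = False         # simulate a failure of the primary
    print(cluster.handle("req-2"))  # node-2 handled req-2
```

The caller only ever talks to `cluster`, which is the "illusion of a single resource" from the slide: the failure of `node-1` is hidden as long as a standby is healthy.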

Summary
• Grid Computing
• Loosely coupled (Decentralization)
• Diversity and Dynamism
• Distributed Job Management & scheduling

• Cloud computing
• Dynamic computing infrastructure
• IT service-centric approach
• Self service based usage model
• Minimally or self managed platform

Source: https://www.quora.com/What-is-the-difference-between-grid-cloud-cluster-and-distributed-computing

Summary
• Cluster computing
• Tightly coupled systems
• Single system image
• Centralized Job management & scheduling system

• Distributed Computing
• Solves a single large problem by breaking it down into several tasks, each computed on an individual computer of the distributed system.

Source: https://www.quora.com/What-is-the-difference-between-grid-cloud-cluster-and-distributed-computing

Advantages
• Distributed systems are inherently scalable
because they work across a variety of different
machines.
• The system can easily be expanded by adding more machines as
needed.
• They can adjust how many system resources they use according to the demand the system is under.
• If a system is under high demand, then it can have every machine
running to capacity.
• However, if the load on the system is relatively low, it can take
different components of the distributed system offline to save
power and wear on the system.
• When demand on the system goes up again, these components can come back online.
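Adjusting resources to demand can be reduced to a simple sizing rule: keep just enough machines online to cover the current load. The per-node capacity and the minimum/maximum limits below are hypothetical parameters; real autoscaling policies typically also add cooldown periods so machines are not switched on and off too often.

```python
import math


def nodes_needed(requests_per_sec, capacity_per_node, min_nodes=1, max_nodes=None):
    """Number of machines to keep online for the current demand."""
    needed = max(min_nodes, math.ceil(requests_per_sec / capacity_per_node))
    if max_nodes is not None:
        needed = min(needed, max_nodes)  # never exceed the machines we actually have
    return needed


def plan(current_nodes, requests_per_sec, capacity_per_node, **limits):
    """Returns how many nodes to bring online (positive) or take offline (negative)."""
    return nodes_needed(requests_per_sec, capacity_per_node, **limits) - current_nodes
```

For example, with a capacity of 100 requests/s per node, a load of 950 requests/s needs `nodes_needed(950, 100)`, i.e. 10 machines; if the load then falls to 120 requests/s, `plan(10, 120, 100)` returns -8, meaning eight machines can be taken offline to save power and wear.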

Advantages
• Redundancy:
• several machines can provide the same services, so if one is
unavailable, work does not stop.
• The system is fault tolerant: it can sustain individual computer failures without crashing, because the network as a whole keeps working as a single computer.
• Distributed computing encourages parallel processing of jobs, which can offer enormous performance gains.
• For example, a network of processors can execute graphics computations much faster than a single highly-clocked core.
• Better price/performance ratio
• Adding microprocessors to a distributed computing system is more cost-effective than adding mainframes to a centralized computer.

Drawbacks
• Analysis as a whole:
• When a project is divided up according to the computing power of each individual computer, it cannot solve some big problems that must be analysed as a whole.
• Trusted source:
• With so many computers involved, we need to make sure they come from trusted sources and that no malicious participants take part.
• Data synchronization:
• A problem in distributed systems because different components could handle different tasks and data.
• At any given point in time, there will be small periods of time in
which data exists on one component, but not on others.
• Difficulties in troubleshooting and diagnosing problems
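The data-synchronization window described above can be reproduced with a primary that accepts writes immediately and copies them to a replica later (asynchronous replication). The classes are hypothetical and the "network" is just a method call; the point is that between `write` and `sync`, the replica still serves the old value.

```python
class Replica:
    def __init__(self):
        self.data = {}


class Primary:
    """Accepts writes immediately and copies them to replicas later (asynchronously)."""

    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas
        self.pending = []  # updates not yet propagated to the replicas

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))

    def sync(self):
        """Propagates pending updates; in a real system this happens over the network."""
        for key, value in self.pending:
            for r in self.replicas:
                r.data[key] = value
        self.pending.clear()


if __name__ == "__main__":
    replica = Replica()
    primary = Primary([replica])
    primary.write("balance", 100)
    print(primary.data.get("balance"), replica.data.get("balance"))  # 100 None
    primary.sync()
    print(primary.data.get("balance"), replica.data.get("balance"))  # 100 100
```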

Distributed Computing in Hadoop

• Tools that use distributed computing concepts:

• HDFS → distributed file system

• MapReduce → distributed computation
• This includes all other tools that use MapReduce as their backbone
• ZooKeeper → distributed coordination

Source: https://www.slideshare.net/KonstantinVShvachko/distributed-computing-with-apache-hadoop-technology-overview
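The MapReduce model can be sketched with the classic word count, run here in a single Python process to show the three phases: map emits (word, 1) pairs, shuffle groups the pairs by word, and reduce sums each group. This is not Hadoop's Java API; in Hadoop the map and reduce tasks would run on many nodes over data stored in HDFS blocks.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emits (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(mapped_pairs):
    """Shuffle: groups all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combines all counts for one word."""
    return key, sum(values)


def word_count(lines):
    # In Hadoop, each line (or HDFS block) could be mapped on a different node.
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["the cat sat", "the dog sat", "The end"]
    print(word_count(docs))  # {'the': 3, 'cat': 1, 'sat': 2, 'dog': 1, 'end': 1}
```

Because each mapper only looks at its own line and each reducer only at one key's group, both phases can be spread across machines; only the shuffle needs data to move between nodes.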

ZooKeeper
• A distributed coordination service for distributed
apps
• Event coordination and notification
• Leader election
• Distributed locking

• ZooKeeper can help in building High Availability (HA) systems
• Installing ZooKeeper:
• https://medium.com/@ryannel/installing-zookeeper-on-ubuntu-9f1f70f22e25

Source: https://www.slideshare.net/KonstantinVShvachko/distributed-computing-with-apache-hadoop-technology-overview
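Leader election with ZooKeeper is usually built on the recipe from the ZooKeeper documentation: every candidate creates an ephemeral sequential znode under an election path, the candidate with the lowest sequence number is the leader, and when the leader's session ends its ephemeral znode disappears so the next-lowest candidate takes over. The sketch below only simulates that rule in memory, without a ZooKeeper server; the `ElectionSim` class and the `n_` node names are illustrative assumptions. For a real ensemble, client libraries such as kazoo (Python) or Apache Curator (Java) provide ready-made election and lock recipes.

```python
import itertools


class ElectionSim:
    """In-memory stand-in for an election path holding ephemeral sequential znodes."""

    def __init__(self):
        self._counter = itertools.count()
        self.znodes = {}  # znode name -> candidate (session) id

    def join(self, candidate):
        # Mimics a sequential znode: a zero-padded, monotonically increasing suffix.
        name = f"n_{next(self._counter):010d}"
        self.znodes[name] = candidate
        return name

    def session_expired(self, candidate):
        # Ephemeral znodes are removed automatically when their session ends.
        self.znodes = {n: c for n, c in self.znodes.items() if c != candidate}

    def leader(self):
        # The candidate owning the lowest sequence number is the leader.
        if not self.znodes:
            return None
        return self.znodes[min(self.znodes)]


if __name__ == "__main__":
    election = ElectionSim()
    for server in ["server-a", "server-b", "server-c"]:
        election.join(server)
    print(election.leader())  # server-a
    election.session_expired("server-a")
    print(election.leader())  # server-b
```

Zero-padding the sequence number makes string order match numeric order, so `min` on the names picks the lowest sequence number.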

Examples:
• SETI:
• Massive amounts of data from the stars are recorded by many observatories around the world in the search for intelligent life.
• The Search for Extraterrestrial Intelligence (SETI) takes these massive information stores and slices them into smaller pieces of data for easy analysis by distributed computing applications running as screen savers on individual users' PCs all around the world.
• Tens of thousands of PCs run the SETI screen saver, a small downloaded file. While a PC is idle, its screen saver downloads a data slice from SETI and runs the analysis application; when the analysis is complete, the analyzed data slice is uploaded back to SETI.

Source: https://setiathome.berkeley.edu/

Examples:
• World Community Grid - IBM

• By using the idle time of computers around the world, this research project can analyse the human genome, HIV, etc.
• Users install the WCG client software on their Internet-connected computers. The client works in the background when the computer is idle, using spare system resources.
• Major finding in 2014: the project discovered 7 compounds that destroy a particular type of nerve cancer cell without side effects.


Examples:
• The “Folding@home” project is a real example of distributed computing: you install its application, and when your computer goes into screen-saver mode, the project uses your computer's processing resources.
• “While you keep going with your everyday
activities, your computer will be working to help us
find cures for diseases like cancer, ALS, Parkinson’s,
Huntington’s, Influenza and many others.”

Source: https://foldingathome.org/start-folding/
