BigData_ParallelComputing
BIG DATA COMPUTING
UNIT-1
Presented by:
Priyanka Rahi
SINGLE NODE CAPACITY
In a typical analytics cycle, the computer's job is to store data, move that
data from storage into compute capacity (which includes memory) for
processing, and write the important results back to storage once they are
computed.
With Big Data, you have more data than will fit on a single computer.
PARALLELISM
Linear Processing
Linear processing is the traditional method of computing a problem: the problem statement is broken into a set of instructions that are executed sequentially until all instructions are completed successfully.
If an error occurs in any one of the instructions, the entire sequence of instructions is executed from the beginning after the error has been resolved.
It is evident from this processing method that linear processing is best suited for minor computing tasks, and is inefficient and time consuming when it comes to processing complex problems such as Big Data.
Parallel Processing
The alternative to linear processing is parallel processing. Here too, the problem statement is broken down into a set of executable instructions.
The instructions are then distributed to multiple execution nodes of equal processing power and are executed in parallel.
Since the instructions run on separate execution nodes, errors can be fixed and the affected instructions re-executed locally, independent of other instructions. A sketch contrasting the two approaches follows.
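To make the contrast concrete, here is a minimal sketch in Python, assuming the "problem" is a list of independent, CPU-bound instructions; the function heavy_instruction and the task sizes are illustrative stand-ins, not part of any particular library.

```python
import time
from multiprocessing import Pool

def heavy_instruction(n):
    # Stand-in for one unit of work in the instruction set.
    return sum(i * i for i in range(n))

tasks = [2_000_000] * 8

if __name__ == "__main__":
    # Linear processing: execute the instructions one after another.
    start = time.perf_counter()
    linear_results = [heavy_instruction(t) for t in tasks]
    print(f"linear:   {time.perf_counter() - start:.2f}s")

    # Parallel processing: distribute the same instructions across
    # worker processes of equal "processing power".
    start = time.perf_counter()
    with Pool(processes=4) as pool:
        parallel_results = pool.map(heavy_instruction, tasks)
    print(f"parallel: {time.perf_counter() - start:.2f}s")

    # Both approaches compute the same answer.
    assert linear_results == parallel_results
```

On a multi-core machine the parallel run should finish in a fraction of the linear time, which is the first advantage listed on the next slide.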
PARALLEL PROCESSING ADVANTAGES
Parallel processing offers significant advantages when dealing with complex problems such as Big Data.
Some of the other benefits of using Parallel processing are:
Reduced processing times: Parallel processing can process Big Data in a
fraction of the time compared to linear processing.
Lower memory and processing requirements per node: Since the problem's instructions
are executed on separate execution nodes, each node's memory and processing requirements
stay low even while processing large volumes of data.
Flexibility: The biggest advantage of parallel processing is that execution nodes
can be added and removed as and when required. This significantly reduces
infrastructure cost.
DATA SCALING
Data scaling is a technique to manage, store, and process the overflow of data.
One option is scaling up: getting a larger single-node computer. But when your data is growing
exponentially, it will eventually outgrow whatever capacity a single machine can offer.
The alternative is scaling out: adding additional nodes of the same capacity until the problem is
tractable.
The individual nodes arranged in this way are called a computing cluster.
Compute clusters can solve problems known as "embarrassingly parallel"
calculations. These are workloads that can easily be divided and run
independently of one another. If any one process fails, it has no impact on the others and
can easily be rerun. An example would be changing the date format in a single
column of a large dataset that has been split into multiple smaller chunks stored
on different nodes of the cluster; a sketch of this follows.
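The snippet below sketches that example. The in-memory lists stand in for chunks stored on different cluster nodes, and the particular date formats are illustrative assumptions.

```python
from datetime import datetime
from multiprocessing import Pool

def reformat_dates(chunk):
    # Convert "31/12/2023" -> "2023-12-31" for every row in this chunk.
    # A failure here would affect only this chunk and could be rerun;
    # the other chunks proceed independently.
    return [datetime.strptime(d, "%d/%m/%Y").strftime("%Y-%m-%d")
            for d in chunk]

chunks = [
    ["31/12/2023", "01/01/2024"],   # e.g. chunk on node 1
    ["15/06/2023", "20/07/2023"],   # e.g. chunk on node 2
    ["05/03/2024", "09/04/2024"],   # e.g. chunk on node 3
]

if __name__ == "__main__":
    with Pool() as pool:
        converted = pool.map(reformat_dates, chunks)
    print(converted)
```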
ISSUES IN PARALLEL COMPUTING
Some tasks, however, such as sorting a large data set, add significant complexity to the
process.
Now, the multiple computations must coordinate with one another because
each process needs to be aware of the state of its peer processes in order
to complete the calculation.
This requires sending messages across a network to each other or writing
them to a file system that is accessible to all processes on the cluster.
The level of complexity increases significantly, because you are basically
asking a cluster of computers to behave as a single computer.
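Here is a small sketch of that coordination problem, using a single machine to stand in for a cluster: each worker can sort its own partition independently, but producing one globally sorted output requires a merge step that must see every partition's result at once. On a real cluster, that merge is where network messages or a shared file system come in.

```python
import heapq
from multiprocessing import Pool

partitions = [
    [42, 7, 19],    # data on node 1
    [3, 88, 21],    # data on node 2
    [56, 4, 30],    # data on node 3
]

if __name__ == "__main__":
    # Phase 1: independent local sorts (embarrassingly parallel).
    with Pool() as pool:
        sorted_parts = pool.map(sorted, partitions)

    # Phase 2: global merge -- this step needs the state of *every*
    # partition, which is what forces the processes to coordinate.
    globally_sorted = list(heapq.merge(*sorted_parts))
    print(globally_sorted)
```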
DATA LOCALITY
In the Hadoop ecosystem, the concept of “bringing compute to the data” is
a central idea in the design of the cluster.
The cluster is designed in a way that computations on certain pieces, or
partitions, of the data will take place right at the location of the data when
possible.
The resulting output will also be written to the same node.
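A minimal sketch of this idea follows, using a hypothetical Node class to model a cluster node that holds one partition: the function travels to the data, the partition never leaves its node, and the output is written back on the same node.

```python
class Node:
    def __init__(self, name, partition):
        self.name = name
        self.partition = partition   # data stored locally on this node
        self.output = None

    def run(self, func):
        # The computation executes where the data lives; only the small
        # function travels over the network, not the (huge) partition.
        self.output = func(self.partition)
        return self.output

cluster = [Node("node1", [1, 2, 3]), Node("node2", [4, 5, 6])]
totals = [node.run(sum) for node in cluster]   # compute ships to the data
print(totals)   # [6, 15]; each result stays on the node that produced it
```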
FAULT TOLERANCE
Fault tolerance comes into play when computers break and outages happen.
Fault tolerance refers to the ability of a system to continue operating without interruption when one
or more of its components fail.
This works for Hadoop's primary data storage system (HDFS) and other similar storage systems (like
S3 and object storage).
Consider the first 3 partitions of a dataset labelled P1, P2, and P3, which reside on the first node.
In this system, copies of each of these data partitions are also stored on other locations or nodes
within the cluster.
If the first node ever goes down, you can add a new node to the cluster and recover the lost
partitions by copying data from one of the other nodes where copies of P1, P2, and P3 partitions are
stored.
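A toy sketch of this recovery process, assuming a replication factor of 2; the dictionary cluster and the helper recover are purely illustrative, since HDFS and object stores handle replication and recovery automatically.

```python
cluster = {
    "node1": {"P1": [1, 2], "P2": [3, 4], "P3": [5, 6]},
    "node2": {"P1": [1, 2], "P3": [5, 6]},   # replicas
    "node3": {"P2": [3, 4]},                 # replica
}

def recover(failed_node, cluster):
    """Rebuild a failed node's partitions from surviving replicas."""
    lost = cluster.pop(failed_node)
    new_node = {}
    for pid in lost:
        # Find any surviving node that holds a copy of this partition.
        source = next(n for n, parts in cluster.items() if pid in parts)
        new_node[pid] = list(cluster[source][pid])   # copy the replica
    cluster["node4"] = new_node                      # replacement node
    return cluster

# node1 goes down; P1, P2, and P3 are rebuilt on a new node from copies.
print(recover("node1", cluster)["node4"])
```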