
Platforms to Handle Big Data

Dr. Jigna Ashish Patel


Assistant Professor, CSE Dept,
Institute of Technology,
Nirma University
Objective of the lecture
• Choosing the right platform
• Understanding the needs of the application/algorithm
• Making the right decision by asking:
• How quickly do we need to get the results?
• How big is the data to be processed?
• Does the model building require several iterations or a single iteration?
System/platform level requirements
• Will there be a need for more data processing capability in the future?
• Is the rate of data transfer critical for this application?
• Is there a need for handling hardware failures within the application?
Horizontal Scaling
• It involves distributing the workload across many servers, which may even be commodity machines.
• It is also known as “scale out”, where multiple independent machines are added in order to improve the processing capability.
• Typically, multiple instances of the operating system are running on
separate machines.
Vertical Scaling
• Vertical scaling involves installing more processors, more memory and faster hardware, typically within a single server.
• It is also known as “scale up” and usually involves a single instance of an operating system.
Horizontal Scaling Platforms
• Peer-to-Peer Network
• Apache Hadoop
• Apache Spark

Vertical Scaling Platforms


• High performance computing clusters
• Multicore CPU
• Graphics Processing Unit (GPU)
• Field Programmable Gate Arrays (FPGA)
Peer-to-Peer networks
• Peer-to-peer networks involve millions of machines connected in a network.
• They use a decentralized and distributed network architecture in which the nodes (known as peers) both serve and consume resources.
• They are among the oldest distributed computing platforms.
• Message Passing Interface (MPI) is the communication scheme typically used in such a setup to communicate and exchange data between peers (see the sketch after this list).
• Each node can store data instances, and the scale-out is practically unlimited (can be millions of nodes).
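As a concrete illustration of message passing between peers, here is a minimal sketch using the mpi4py package (the package choice, the per-peer workload, and the script name are assumptions, not part of the slides): each peer computes a local partial result and the results are aggregated with a reduce operation.

```python
# Minimal message-passing sketch with mpi4py (illustrative only).
# Run with, e.g.: mpiexec -n 4 python peers.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # this peer's id
size = comm.Get_size()   # total number of peers

# Each peer computes a partial result on its own slice of the work.
local_result = sum(range(rank * 100, (rank + 1) * 100))

# Peers exchange and aggregate results; rank 0 collects the global sum.
total = comm.reduce(local_result, op=MPI.SUM, root=0)
if rank == 0:
    print("global sum:", total)
```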
Apache Hadoop
• An open source framework for storing and processing large datasets using clusters of commodity hardware.
• Hadoop is designed to scale up to hundreds of nodes.
• It is highly fault tolerant.
• The Hadoop platform contains the following two important components: (1) HDFS, the Hadoop Distributed File System, for storage, and (2) YARN for resource management and job scheduling (a minimal word-count sketch in the Hadoop Streaming style follows).
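To make the processing model concrete, below is a hedged word-count sketch in the Hadoop Streaming style (a standard illustrative example; the script name, paths, and invocation are assumptions). The mapper emits key/value pairs and the reducer aggregates the values for each key.

```python
# Hadoop Streaming style word count (illustrative; paths/names assumed).
# Example invocation:
#   hadoop jar hadoop-streaming.jar -input /data/in -output /data/out \
#     -mapper "python wc.py map" -reducer "python wc.py reduce" -file wc.py
import sys
from itertools import groupby

def mapper():
    # Emit "word<TAB>1" for every word read from stdin.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    # Hadoop sorts mapper output by key, so equal words arrive together.
    pairs = (line.rstrip("\n").split("\t") for line in sys.stdin)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        print(f"{word}\t{sum(int(count) for _, count in group)}")

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```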
Apache Spark
• Developed by researchers at the University of California, Berkeley, and designed to overcome the disk I/O limitations of Hadoop.
• It has the ability to perform in-memory computations.
• It allows data to be cached in memory, thus eliminating Hadoop’s disk overhead limitation for iterative tasks (see the caching sketch below).
• It supports Java, Scala and Python, and for certain tasks it has been tested to be up to 100× faster than Hadoop MapReduce.
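A minimal PySpark sketch of in-memory caching for iterative work is shown below (the file name and the per-iteration operation are placeholders); the point is that the data is loaded once and reused from memory across iterations instead of being re-read from disk each time.

```python
# Minimal PySpark caching sketch (assumes a local Spark installation and a
# hypothetical input file "points.csv").
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("caching-demo").getOrCreate()

# Load the data once and cache it in memory so repeated passes avoid
# the per-iteration disk I/O of a chain of MapReduce jobs.
points = spark.read.csv("points.csv", inferSchema=True).cache()

for i in range(10):            # each iteration reuses the cached data
    total = points.count()     # stand-in for one pass of an iterative algorithm
    print(f"iteration {i}: {total} rows")

spark.stop()
```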
HPC clusters
• Also known as blades or supercomputers, these are machines with thousands of cores.
• They can have a variety of disk organizations, caches, communication mechanisms, etc.
• They use powerful hardware that is optimized for speed and throughput.
• They are not as scalable as Hadoop or Spark clusters, but they are still capable of processing terabytes of data.
Multicore CPU
• Multicore refers to one machine having dozens of processing cores. They usually have shared memory but only one disk.
• The number of cores per chip and the number of operations that a core can perform have increased significantly. Newer breeds of motherboards allow multiple CPUs within a single machine, thereby increasing the parallelism (a shared-memory sketch follows).
• Until the last few years, CPUs were mainly responsible for accelerating algorithms for big data analytics.
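As a small illustration of parallelism on a multicore CPU, here is a sketch using Python's standard multiprocessing module (the workload and the number of worker processes are illustrative assumptions).

```python
# Minimal multicore sketch: split a computation across worker processes.
from multiprocessing import Pool

def partial_sum(chunk):
    """Work done independently on one core."""
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i::8] for i in range(8)]   # split work across 8 workers
    with Pool(processes=8) as pool:
        result = sum(pool.map(partial_sum, chunks))
    print(result)
```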
GPU
• A GPU is designed to accelerate the creation of images in a frame buffer intended for display output.
• GPUs were primarily used for graphical operations such as video and image editing, accelerating graphics-related processing, etc. Due to their massively parallel architecture, recent developments in GPU hardware and related programming frameworks have given rise to GPGPU (general-purpose computing on GPUs); a minimal sketch follows.
• In addition to the processing cores, a GPU has its own high-throughput GDDR5 memory, which is many times faster than typical DDR3 main memory.
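A minimal GPGPU sketch is shown below using the CuPy library (an assumption; the slides do not prescribe a framework). It presumes an NVIDIA GPU with CUDA drivers installed and offloads an element-wise computation to the device.

```python
# Minimal GPGPU sketch with CuPy (requires an NVIDIA GPU + CUDA).
import cupy as cp

x = cp.random.random(10_000_000)   # array allocated in GPU memory
y = cp.sqrt(x) * 2.0               # computed in parallel on the GPU cores
print(float(y.sum()))              # result transferred back to the host
```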
FPGA
• FPGAs are highly specialized hardware units that are custom-built for specific applications.
• FPGAs can be highly optimized for speed and can be orders of
magnitude faster compared to other platforms for certain
applications.
• Due to customized hardware, the development cost is typically
much higher compared to other platforms.
• On the software side, coding has to be done in a hardware description language (HDL), with low-level knowledge of the hardware, which increases the algorithm development cost.
Comparison of platforms
System/Platform Level characteristics

• Scalability
• Data I/O performance
• Fault Tolerance
Scalability

Platform Scalability
Peer-to-Peer *****
Virtual Clusters (MapReduce/MPI) *****
Virtual Clusters (Spark) *****
HPC clusters (MPI/MapReduce) ***
Multicore (Multithreading) **
GPU (CUDA) **
FPGA (HDL) *
Data I/O Performance

Platform Data I/O Performance
Peer-to-Peer *
Virtual Clusters (MapReduce/MPI) **
Virtual Clusters (Spark) ***
HPC clusters (MPI/MapReduce) ****
Multicore (Multithreading) ****
GPU (CUDA) *****
FPGA (HDL) *****
Fault Tolerance

Platform Fault Tolerance
Peer-to-Peer *
Virtual Clusters (MapReduce/MPI) *****
Virtual Clusters (Spark) *****
HPC clusters (MPI/MapReduce) ****
Multicore (Multithreading) ****
GPU (CUDA) ****
FPGA (HDL) ****
Comparison of platforms
Application/Algorithm Level characteristics

• Real time processing


• Data size supported
• Iterative task support
Real Time Processing

Platform Real Time Processing
Peer-to-Peer *
Virtual Clusters (MapReduce/MPI) **
Virtual Clusters (Spark) **
HPC clusters (MPI/MapReduce) ***
Multicore (Multithreading) ***
GPU (CUDA) *****
FPGA (HDL) *****
Data Size supported

Platform Data Size Supported
Peer-to-Peer *****
Virtual Clusters (MapReduce/MPI) ****
Virtual Clusters (Spark) ****
HPC clusters (MPI/MapReduce) ****
Multicore (Multithreading) **
GPU (CUDA) **
FPGA (HDL) **
Iterative Task Support

Platform Iterative Task Support
Peer-to-Peer **
Virtual Clusters (MapReduce/MPI) **
Virtual Clusters (Spark) ***
HPC clusters (MPI/MapReduce) ****
Multicore (Multithreading) ****
GPU (CUDA) ****
FPGA (HDL) ****
How will you choose one of the platforms for a particular criterion?
• Amount of time
• Number of iterations
• Fault tolerance
• Scalability
Choice of platform
• Data size
• Speed/Throughput
• Training/Applying a model
K-means clustering
K-means on MapReduce
K-means on MPI
K-means on GPU
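The sketch below is a plain NumPy version of the K-means iteration (the data, k, and iteration count are illustrative placeholders). The assignment step is what the MapReduce, MPI, and GPU variants parallelize as a "map" over the points, and the centroid update plays the role of the "reduce".

```python
# Minimal NumPy K-means sketch (illustrative data and parameters).
import numpy as np

def kmeans(points, k=3, iterations=10):
    # Start from k randomly chosen points as initial centroids.
    centroids = points[np.random.choice(len(points), k, replace=False)]
    for _ in range(iterations):
        # "Map" step: assign each point to its nearest centroid.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # "Reduce" step: recompute each centroid as the mean of its points
        # (a robust implementation would also handle empty clusters).
        centroids = np.array([points[labels == j].mean(axis=0) for j in range(k)])
    return centroids, labels

if __name__ == "__main__":
    data = np.random.rand(1000, 2)          # hypothetical 2-D data
    centers, assignment = kmeans(data)
    print(centers)
```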
