MapReduce Business Driver - NoSQL Case Study


DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

(ARTIFICIAL INTELLIGENCE & MACHINE LEARNING)


----------------------------------------------------------------------------------------------------------------

Big Data Analysis

Demonstrate how business problems have been successfully solved faster, cheaper, and more effectively using NoSQL, with reference to Google's MapReduce case study. Also illustrate the business drivers and the findings.

Business Drivers:

Google faced a significant challenge with processing and analysing large-scale datasets generated by its search
engine and other services. Traditional relational database systems were struggling to handle the immense volume
of data in a timely and cost-effective manner.

The business problems included:

Scalability: The need to process and analyze massive amounts of data quickly and efficiently. (Volume)

Cost-effectiveness: Traditional relational databases were proving to be expensive to scale and maintain for such
large datasets.

Performance: The requirement for faster processing to provide real-time insights and results. (Velocity)

Google introduced MapReduce, a programming model and an associated implementation, which leveraged NoSQL principles to tackle these challenges effectively. The MapReduce model splits a job into smaller sub-tasks that can be executed in parallel across a distributed computing cluster.

Map Phase: The input data is divided into smaller chunks, and a "map" function is applied to each chunk. This
function processes and generates key-value pairs as intermediate outputs.

Shuffle and Sort Phase: The intermediate key-value pairs are sorted and grouped by key across different nodes
in the cluster. This prepares the data for the next phase.

Reduce Phase: The sorted data is passed to a "reduce" function, which aggregates and processes the data for the
final output.
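To make the three phases concrete, the following is a minimal, single-machine Python sketch of a hypothetical word-count job. It is illustrative only: the function names and sample chunks are invented for the example, and in Google's actual framework the same steps run in parallel across many machines rather than in one process.

from collections import defaultdict

def map_phase(chunk):
    # Map: emit an intermediate (word, 1) key-value pair for every word in one chunk.
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle_and_sort(mapped_pairs):
    # Shuffle and sort: group the intermediate values by key, as the framework
    # would do when moving data between map and reduce nodes.
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: aggregate the grouped values for each key into the final output.
    return {key: sum(values) for key, values in grouped.items()}

chunks = ["big data needs big clusters", "map reduce handles big data"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
print(reduce_phase(shuffle_and_sort(mapped)))
# {'big': 3, 'data': 2, 'needs': 1, 'clusters': 1, 'map': 1, 'reduce': 1, 'handles': 1}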


Google's MapReduce demonstrates how NoSQL and distributed computing can successfully solve complex
business problems faster, cheaper, and more effectively. The MapReduce framework enabled Google to process
massive amounts of data, gaining valuable insights and paving the way for advancements in various domains.


Google leveraged the following advantages of its MapReduce implementation:

Faster Data Processing: Google's MapReduce framework allowed them to distribute the data processing tasks
across multiple servers, enabling parallel execution. This led to significant speed-ups in data processing. For
instance, tasks that took hours or days with traditional methods could now be completed in minutes or even
seconds.

Cost Savings: The MapReduce approach also resulted in cost savings. By utilizing commodity hardware and
distributing tasks across a cluster, Google could achieve high performance at a fraction of the cost of traditional
solutions. This approach eliminated the need for expensive, specialized hardware.

Scalability: The combination of MapReduce with Bigtable, a distributed NoSQL database, allowed Google to scale their infrastructure horizontally. As data volumes grew, they could add more servers to the cluster, ensuring that the system's performance remained consistent even with increasing data loads.

Efficient Resource Utilization: MapReduce's task distribution ensured optimal utilization of resources. Each
server in the cluster could work on its assigned task, minimizing idle time and maximizing overall efficiency.

Resilience and Fault Tolerance: The distributed nature of MapReduce and Bigtable increased resilience. If a server failed during processing, its tasks could be automatically rerouted to healthy nodes, minimizing downtime and data loss; a simple sketch of this retry behaviour is given after the Flexibility point below.

Flexibility: Bigtable's NoSQL design provided flexibility in data modelling. Unlike rigid relational databases, Bigtable allowed Google to store various types of data without predefined schemas, accommodating the evolving nature of web data.
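The retry behaviour mentioned under Resilience and Fault Tolerance can be sketched as follows. This is a deliberately simplified, single-machine illustration with invented worker names and a simulated random failure; the real framework relies on heartbeats and re-executes the tasks of failed machines, which is far more involved than this.

import random

def run_on_worker(worker, chunk):
    # Stand-in for a map task running on one node; occasionally simulate a node failure.
    if random.random() < 0.2:
        raise RuntimeError(f"{worker} failed")
    return len(chunk.split())

def process_with_retry(chunk, workers):
    # Master-style logic: try healthy nodes until the task succeeds.
    for worker in workers:
        try:
            return run_on_worker(worker, chunk)
        except RuntimeError:
            continue
    raise RuntimeError("no healthy worker available")

workers = ["node-1", "node-2", "node-3"]
print([process_with_retry(c, workers) for c in ["big data", "map reduce at scale"]])
# e.g. [2, 4] -- the word counts are still produced even if individual workers fail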


The following are the business drivers behind Google's development of MapReduce:

Volume:

MapReduce is designed to handle massive volumes of data. Traditional data processing systems, like relational
databases, can struggle to scale effectively as data volumes increase. However, MapReduce's distributed
processing model allows it to handle vast amounts of data by dividing it into smaller chunks that can be processed
in parallel across a cluster of servers. This approach ensures that the system can scale horizontally by adding more
servers to the cluster as data volumes grow. This scalability enables efficient processing and analysis of large
datasets without compromising performance.
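As a rough, local illustration of this chunk-and-parallelise idea, the sketch below splits an assumed toy input into fixed-size chunks and processes them with a pool of worker processes. In a real cluster the chunks would be distributed across separate machines; the chunk size, worker count, and data here are chosen purely for the example.

from multiprocessing import Pool

def count_words(chunk):
    # Per-chunk work: a stand-in for whatever analysis each node would perform.
    return len(chunk.split())

if __name__ == "__main__":
    document = "big data " * 1000                      # assumed toy input (2,000 words)
    words = document.split()
    chunk_size = 250                                   # assumed chunk size
    chunks = [" ".join(words[i:i + chunk_size])
              for i in range(0, len(words), chunk_size)]

    with Pool(processes=4) as pool:                    # 4 local workers stand in for cluster nodes
        partial_counts = pool.map(count_words, chunks) # process chunks in parallel
    print(sum(partial_counts))                         # combine partial results: 2000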

Velocity:

Velocity refers to the speed at which data is generated and needs to be processed. In the context of real-time or
near-real-time data processing, MapReduce might not be the best fit due to its batch-oriented nature. However,
for scenarios where data doesn't need to be processed in real-time, MapReduce can still be highly effective. By
breaking down data processing into smaller tasks that can be executed in parallel, MapReduce significantly speeds
up the processing time compared to traditional single-threaded approaches. This means that even though
MapReduce doesn't address real-time velocity, it does help handle the high velocity of data by efficiently
processing large volumes of data within reasonable time frames.
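A back-of-envelope calculation illustrates the scale of this speed-up. All figures below are assumed for illustration only, not published Google numbers, and the ideal speed-up ignores shuffle, scheduling, and straggler overheads.

# Assumed figures for illustration only.
data_tb = 10                         # dataset size, in TB
node_throughput_mb_s = 50            # per-node scan rate, MB/s
nodes = 1000                         # machines in the cluster

data_mb = data_tb * 1024 * 1024
sequential_hours = data_mb / node_throughput_mb_s / 3600
parallel_minutes = data_mb / (node_throughput_mb_s * nodes) / 60

print(f"one node:   ~{sequential_hours:.0f} hours")      # ~58 hours
print(f"{nodes} nodes: ~{parallel_minutes:.1f} minutes") # ~3.5 minutes, ignoring overheads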

Google's MapReduce framework addresses the volume and velocity business drivers. It excels at processing large
volumes of data in a parallel and distributed manner, which leads to efficient data processing and analysis.
Additionally, while not designed for real-time processing, MapReduce can still handle data with a relatively high
velocity within reasonable time frames due to its parallel processing capabilities. These characteristics make
MapReduce a powerful solution for managing and analyzing vast amounts of data efficiently and effectively.

