BDA Unit III
Definition: Hadoop is an open-source framework designed for storing and processing large volumes of data in a distributed computing environment. It enables scalable and efficient data handling by utilizing clusters of commodity hardware. The Hadoop framework allows parallel processing and fault tolerance, making it a powerful tool for managing big data.
Introduction: In today's digital world, the volume of data generated is enormous, requiring systems that can handle, store, and analyze large datasets efficiently. Traditional database management systems (DBMS) struggle with scalability and performance when dealing with massive amounts of data. Hadoop was developed as a solution to these challenges, providing a distributed computing model that processes large datasets across multiple nodes simultaneously. By leveraging the Hadoop Distributed File System (HDFS) and the MapReduce processing paradigm, Hadoop ensures high availability, fault tolerance, and efficient handling of structured and unstructured data.
Hadoop Processing Model:
Introduction: Hadoop provides an efficient way to process large amounts of data by dividing tasks into smaller sub-tasks and executing them in parallel across a distributed system. The Hadoop processing model includes data ingestion, storage, processing, and retrieval. The MapReduce programming model plays a crucial role in processing data efficiently by breaking it down into two primary phases: the Map phase and the Reduce phase.
MapReduce:
Introduction: The MapReduce framework enables efficient parallel processing of big data by distributing tasks across multiple computing nodes. It consists of two core functions:
Map Function: Processes input data and generates intermediate key-value pairs.
Reduce Function: Aggregates and processes the intermediate results to produce the final output.
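The two phases above can be sketched as a small single-process simulation in Python (real Hadoop jobs are typically written against the Java MapReduce API; the function names here are illustrative). The classic word-count job shows the flow: Map emits (word, 1) pairs, a shuffle step groups pairs by key, and Reduce sums the values for each key.

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit an intermediate (word, 1) pair for every token in the line.
    return [(word, 1) for word in line.split()]

def reduce_phase(key, values):
    # Reduce: aggregate all values collected for one key into a final count.
    return (key, sum(values))

def run_job(lines):
    # Shuffle/sort: group intermediate pairs by key before reducing,
    # mimicking what the framework does between the two phases.
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_phase(line):
            groups[key].append(value)
    return dict(reduce_phase(k, v) for k, v in sorted(groups.items()))

counts = run_job(["big data big insights", "big data"])
print(counts)  # {'big': 3, 'data': 2, 'insights': 1}
```

In a real cluster, many Mappers and Reducers run this same logic concurrently on different splits of the input.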
Mapper:
Definition: The Mapper function is the first phase of the MapReduce framework, responsible for processing input data and transforming it into key-value pairs.
Introduction: Each Mapper processes a portion of the input dataset independently and generates key-value pairs as output. These intermediate key-value pairs are later sorted and passed to the Reducer. The Mapper function is highly parallelizable, allowing multiple Mappers to process data simultaneously for efficiency.
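A Mapper can be sketched in the style of a Hadoop Streaming mapper, where key-value pairs are plain tab-separated text records (this is a Python sketch of the Streaming contract, not the native Java Mapper class; in a real job the input would come from sys.stdin):

```python
def mapper(lines):
    # Hadoop Streaming contract: emit "key<TAB>value" text records.
    # Here every token becomes a key with the value 1.
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

records = list(mapper(["hadoop stores hadoop"]))
print(records)  # ['hadoop\t1', 'stores\t1', 'hadoop\t1']
```

Note that the Mapper itself does no aggregation; duplicate keys are expected and are resolved later by the Reducer.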
Reducer:
Definition: The Reducer function is the second phase of MapReduce, which processes intermediate key-value pairs generated by the Mappers and consolidates them into final results.
Introduction: After sorting and grouping the intermediate results, the Reducer applies aggregation, computation, or transformation operations to generate the final output. This phase is responsible for reducing large amounts of data into meaningful insights.
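A matching Streaming-style Reducer sketch (again illustrative Python, not the Java Reducer class): because the framework delivers records sorted by key, all values for one key arrive adjacent to each other, so the Reducer can aggregate with a single grouped pass.

```python
from itertools import groupby

def reducer(records):
    # Records arrive sorted by key after the shuffle, so all values for
    # one key are adjacent; group them and emit the aggregated total.
    parsed = (r.split("\t", 1) for r in records)
    for key, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(v) for _, v in group)}"

sorted_records = ["data\t1", "hadoop\t1", "hadoop\t1"]
print(list(reducer(sorted_records)))  # ['data\t1', 'hadoop\t2']
```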
Combiner:
Definition: The Combiner is an optional optimization step in MapReduce that performs local aggregation on Mapper output before it is sent to the Reducer.
Introduction: By reducing the amount of intermediate data transferred across the network, the Combiner minimizes data shuffling overhead, improving overall performance and efficiency in Hadoop processing.
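The savings can be seen in a small sketch (assumed Python simulation; in Hadoop the Combiner is usually the same class as the Reducer, run locally on each Mapper node): four intermediate records shrink to two before anything crosses the network.

```python
from collections import Counter

def combiner(pairs):
    # Local aggregation: sum values per key on the mapper node so fewer
    # records have to be shuffled to the Reducers.
    combined = Counter()
    for key, value in pairs:
        combined[key] += value
    return sorted(combined.items())

raw = [("big", 1), ("data", 1), ("big", 1), ("big", 1)]
print(combiner(raw))  # [('big', 3), ('data', 1)] -- 4 records shrink to 2
```

A Combiner is only safe for operations that are commutative and associative (such as sum or max), since the framework may run it zero, one, or many times.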
Partitioner:
Definition: The Partitioner function determines how intermediate key-value pairs are distributed to Reducers in a MapReduce job.
Introduction: Partitioning ensures load balancing by assigning specific key ranges to different Reducers. This step improves efficiency by preventing data skew and ensuring even data distribution.
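Hadoop's default Partitioner hashes the key modulo the number of Reducers; a Python sketch of that idea (using crc32 as a stand-in hash, since Java's hashCode is not available here) looks like:

```python
import zlib

def partition(key, num_reducers):
    # Hash-style partitioner: the same key always lands on the same
    # reducer, while distinct keys spread across all reducers.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

# Every occurrence of a key is routed consistently, so one Reducer
# sees all values for that key.
assignments = {k: partition(k, 3) for k in ["apple", "banana", "cherry"]}
print(assignments)
```

The determinism is the important property: all values for one key must reach the same Reducer, or the final aggregation would be incomplete.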
NoSQL Databases:
Definition: NoSQL databases are non-relational databases designed for distributed data storage and high-performance operations. They handle structured, semi-structured, and unstructured data efficiently without requiring a fixed schema.
Introduction: Traditional SQL-based databases struggle with scalability and flexibility when dealing with large datasets. NoSQL databases were developed to overcome these limitations by offering a flexible schema, high availability, and horizontal scalability. These databases are widely used in real-time applications, big data analytics, and distributed environments.
Types of NoSQL Databases:
1. Key-Value Stores: Data is stored as simple key-value pairs for fast lookups. Examples: Redis, DynamoDB.
2. Document Stores: Data is stored in flexible document formats like JSON or BSON. Examples: MongoDB, CouchDB.
3. Column-Family Stores: Data is stored in column-oriented structures for fast retrieval and
analysis. Examples: Cassandra, HBase.
4. Graph Databases: Data is represented as interconnected nodes and relationships, suitable for social networks and recommendation engines. Examples: Neo4j, ArangoDB.
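The schema flexibility of a document store can be illustrated with a minimal in-memory sketch (plain Python dictionaries standing in for a real store such as MongoDB or CouchDB; the function names are illustrative): two documents in the same collection can carry different fields with no schema change.

```python
# Each record is a schemaless JSON-like dict keyed by an id,
# as in a document store such as MongoDB or CouchDB.
store = {}

def insert(doc_id, doc):
    # Store the document as-is; no fixed schema is enforced.
    store[doc_id] = doc

def find(predicate):
    # Query by arbitrary predicate over document contents.
    return [d for d in store.values() if predicate(d)]

insert("u1", {"name": "Asha", "followers": 120})
insert("u2", {"name": "Ravi", "city": "Pune"})  # different fields: no fixed schema
print(find(lambda d: "city" in d))  # [{'name': 'Ravi', 'city': 'Pune'}]
```

A relational table would require both rows to share one column set; here each document carries only the fields it needs.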
Advantages of NoSQL:
Scalability: Horizontally scalable to handle increasing data volumes.
High Availability: Built-in replication ensures data redundancy and fault tolerance.
Support for Large Data Volumes: Handles big data applications efficiently.
Use Cases of NoSQL:
Social Media: Storing and analyzing user interactions, recommendations, and messages.
Data Model: SQL databases use tables with rows and columns, while NoSQL databases use key-value, document, column, or graph models.
NewSQL:
Definition: NewSQL databases are modern relational database systems that combine the scalability of NoSQL with the strong consistency of traditional SQL databases.
Use Case: SQL databases target OLTP workloads, NoSQL databases target big data and real-time processing, and NewSQL databases target scalable SQL processing.