
Hadoop YARN

The document provides an overview of Big Data technologies, focusing on MapReduce, YARN, and NoSQL databases. It explains the architecture and differences between Hadoop 1 and Hadoop 2, as well as the features and benefits of Apache Spark. Additionally, it discusses various types of NoSQL databases, their advantages and disadvantages, and applications in real-world scenarios.

Uploaded by

Arut Jothi

UNIT-III

Dr.G.Arutjothi
Assistant Professor
MapReduce, YARN & NoSQL

Big Data Technologies and Databases


Introduction to MapReduce
• - A programming model for processing large datasets
• - Uses parallel computing across clusters
• - Consists of Map and Reduce functions
Processing Data with Hadoop using MapReduce
• - Data is split into chunks and processed in parallel
• - Mappers perform initial processing
• - Reducers aggregate and output results
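
The split, map, and reduce steps above can be sketched with a word count in plain Python. This is only an illustration of the programming model, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are hypothetical, and the mappers run one after another here rather than in parallel on a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Mapper: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: aggregate all values seen for one key."""
    return key, sum(values)


def word_count(chunks):
    # On a cluster each chunk would go to a separate mapper in parallel;
    # in this sketch the mappers simply run sequentially.
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input, already split into two chunks.
chunks = ["big data big clusters", "data moves to clusters"]
counts = word_count(chunks)
print(counts)
```

The grouping step between the two phases is what lets each reducer see every value for its key, even though the values came from different mappers.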
Introduction to YARN & Architecture
• - YARN: Yet Another Resource Negotiator
• - Separates cluster resource management and job scheduling from the data-processing logic
• - Components: Resource Manager, Node Manager, Application Master
Managing Resources & Applications with Hadoop YARN
• - YARN dynamically allocates resources to applications
• - Ensures efficient resource utilization
• - Supports multiple processing frameworks
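
As a rough illustration of dynamic allocation, the toy model below has one resource manager hand out memory "containers" on a set of nodes to applications from different frameworks. It is a sketch under simplifying assumptions, not YARN's API: the class names only echo the YARN components, the "most free memory" placement rule is made up for the example, and real YARN schedulers are considerably more involved.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Toy stand-in for a Node Manager: tracks free memory on one node."""
    name: str
    free_mb: int
    containers: list = field(default_factory=list)


class ResourceManager:
    """Toy stand-in for the Resource Manager: grants containers on request."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, app_id, memory_mb):
        # Hypothetical placement rule: the node with the most free memory
        # that can still fit the request.
        candidates = [n for n in self.nodes if n.free_mb >= memory_mb]
        if not candidates:
            return None  # nothing fits; the request would have to wait
        node = max(candidates, key=lambda n: n.free_mb)
        node.free_mb -= memory_mb
        node.containers.append((app_id, memory_mb))
        return node.name

    def release(self, app_id):
        # Give back every container held by a finished application.
        for node in self.nodes:
            kept = []
            for owner, mb in node.containers:
                if owner == app_id:
                    node.free_mb += mb
                else:
                    kept.append((owner, mb))
            node.containers = kept


rm = ResourceManager([NodeManager("node-1", 4096), NodeManager("node-2", 2048)])
first = rm.allocate("mapreduce-job", 3072)   # fits only on node-1
second = rm.allocate("spark-job", 2048)      # fits only on node-2
third = rm.allocate("spark-job", 2048)       # no node has room left
rm.release("mapreduce-job")                  # node-1 memory is returned
fourth = rm.allocate("spark-job", 2048)      # now fits on node-1
print(first, second, third, fourth)
```

The point of the sketch is that applications from different frameworks draw on the same pool, and resources freed by one become available to another.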
Difference between Hadoop 1 and Hadoop 2
Sr. No. | Key | Hadoop 1 | Hadoop 2
1 | New Components and API | Has fewer components and APIs than Hadoop 2. | Has more components and APIs than Hadoop 1, such as the YARN API, the YARN framework, and an enhanced Resource Manager.
2 | Support | Supports only the MapReduce processing model. | Supports the MapReduce model as well as other distributed computing models.
3 | Resource Management | MapReduce is responsible for both processing and cluster resource management. | YARN is used for cluster resource management, while processing is handled by different processing models.
4 | Scalability | Less scalable; limited to 4000 nodes per cluster. | Scalable up to 10000 nodes per cluster.
5 | Implementation | Can run only a Map task or a Reduce task. | Can be used to run generic tasks.
6 | Windows Support | No support for Microsoft Windows provided by Apache. | Supports Microsoft Windows.
Big Data Technologies
• Big data technologies like Hadoop, Spark, and NoSQL are essential for managing vast amounts of data generated by various sources.
• Hadoop is an open-source framework that allows for distributed storage and processing of large data sets across clusters of computers.
• Its ecosystem includes HDFS for storage, MapReduce for processing, and Hive for data warehousing.
• Spark, on the other hand, is an analytics engine that offers fast, in-memory processing and supports various workloads.
• NoSQL databases, like MongoDB, Cassandra, and Couchbase, offer flexibility and scalability for diverse data types.
• Understanding these technologies is crucial for making informed decisions about data architecture and strategy, and for improving business intelligence, operational efficiency, and innovation.
Exploring Apache Spark: Key Features and Performance Benefits
• Apache Spark is a powerful big data technology with in-memory computing, allowing it to process large datasets faster than traditional frameworks like Hadoop MapReduce.
• It offers a unified framework that can handle various workloads, simplifying architecture and reducing overhead.
• Spark's programming language support allows developers to write applications in Java, Scala, Python, or R, making it accessible to a wider audience.
• Its rich ecosystem includes integrations with Hadoop, Apache Hive, and NoSQL databases, as well as libraries for machine learning, graph processing, and SQL-based queries.
• Spark's horizontal scalability and fault tolerance make it an attractive choice for mission-critical applications.
Big Data Technologies & NoSQL Databases

• - NoSQL: Non-relational databases designed for Big Data
• - NoSQL databases are designed to handle various data models, offering flexibility and performance that traditional SQL databases may struggle to match.
• - Scalable and flexible data models
• - Used in distributed systems
Understanding NoSQL Databases: Types and Use Cases
There are four main types of NoSQL databases:
1. Document Stores
2. Key-value Stores
3. Column-family Stores
4. Graph Databases

* Document Stores store data as documents, ideal for complex data structures and varied formats.
* Key-value Stores store data as a collection of key-value pairs, ideal for caching, session management, and rapid lookups.
* Column-family Stores store data by column rather than by row, efficient for analytical queries and large-scale data warehousing.
* Graph Databases represent and analyze relationships between interconnected data points, excelling in social networks, fraud detection, and recommendation engines.
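
The four data models described above can be pictured with ordinary Python data structures. This is a minimal sketch for intuition only: the keys, field names, and sample values are hypothetical, and real NoSQL systems add persistence, distribution, and their own query interfaces on top of these shapes.

```python
# Document store: each record is a self-contained, possibly nested document.
document_store = {
    "user:1": {
        "name": "Asha",
        "roles": ["admin", "editor"],
        "address": {"city": "Salem"},
    },
}

# Key-value store: an opaque value looked up by its key.
key_value_store = {"session:abc123": "user:1"}

# Column-family store: row key -> column family -> column -> value.
column_family_store = {
    "user:1": {
        "profile": {"name": "Asha"},
        "activity": {"last_login": "2024-01-01"},
    },
}

# Graph database: nodes with edges, here as an adjacency mapping.
graph_store = {
    "Asha": {"Ravi"},
    "Ravi": {"Asha", "Meena"},
    "Meena": {"Ravi"},
}


def friends_of_friends(graph, person):
    """Relationship query: people two hops away who are not already friends."""
    direct = graph.get(person, set())
    two_hops = set()
    for friend in direct:
        two_hops |= graph.get(friend, set())
    return two_hops - direct - {person}


suggestions = friends_of_friends(graph_store, "Asha")
print(suggestions)
```

The friend-of-a-friend lookup is the kind of traversal behind a simple recommendation: it follows relationships directly instead of joining tables.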
Features & Types of NoSQL Databases
• - Types: Document-based, Key-Value, Column-family, Graph databases
• - Schema-less structure
• - High availability and partition tolerance
Advantages & Disadvantages of NoSQL
• - Advantages: Scalable, flexible schema, fast performance
• - Disadvantages: Often eventual rather than strong consistency, complex querying, limited standardization
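
Eventual consistency can be seen in a toy two-replica model: a write lands on one replica first, so a read from the other replica may be stale until the replicas synchronize. This is a simplified sketch, not how any particular database implements replication; the `Replica` class, the `sync` function, and the version-number rule (the higher version wins) are assumptions made for the example.

```python
class Replica:
    """Toy replica holding key -> (version, value)."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, version):
        # Keep the write only if it is newer than what this replica has.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(source, target):
    """Copy every entry from source to target; target keeps the newer version."""
    for key, (version, value) in source.data.items():
        target.write(key, value, version)


a, b = Replica(), Replica()
a.write("cart", ["book"], version=1)

stale = b.read("cart")   # replica b has not seen the write yet
sync(a, b)
fresh = b.read("cart")   # after synchronization both replicas agree
print(stale, fresh)
```

The window between the write and the synchronization is the trade-off listed above: reads stay fast and available, but they are not guaranteed to return the latest value.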
Applications of NoSQL Databases
• - Social media (Facebook, Twitter)
• - E-commerce (Amazon, eBay)
• - Real-time analytics
• - Internet of Things (IoT) applications
