The document provides an overview of Big Data technologies, focusing on MapReduce, YARN, and NoSQL databases. It explains the architecture and differences between Hadoop 1 and Hadoop 2, as well as the features and benefits of Apache Spark. Additionally, it discusses various types of NoSQL databases, their advantages and disadvantages, and applications in real-world scenarios.
Hadoop YARN
UNIT-III
MapReduce, YARN & NoSQL
Dr. G. Arutjothi, Assistant Professor
Big Data Technologies and Databases
Introduction to MapReduce
• A programming model for processing large datasets
• Uses parallel computing across the nodes of a cluster
• Consists of Map and Reduce functions

Processing Data with Hadoop using MapReduce
• Input data is split into chunks that are processed in parallel
• Mappers perform the initial processing, emitting intermediate key-value pairs
• The framework shuffles and sorts these pairs by key
• Reducers aggregate the values for each key and output the results

Introduction to YARN & Architecture
• YARN: Yet Another Resource Negotiator
• Separates resource management from job scheduling and monitoring
• Components: Resource Manager (cluster-wide resource arbiter), Node Manager (per-node agent that launches and monitors containers), Application Master (per-application process that negotiates resources and tracks its tasks)

Managing Resources & Applications with Hadoop YARN
• YARN dynamically allocates resources (memory and CPU, granted as containers) to applications
• Ensures efficient resource utilization
• Supports multiple processing frameworks, such as MapReduce and Spark, on the same cluster

Difference between Hadoop 1 and Hadoop 2
Sr. No. | Key | Hadoop 1 | Hadoop 2
1 | Components and APIs | Fewer components and APIs than Hadoop 2. | More components and APIs than Hadoop 1, such as the YARN API, the YARN framework, and an enhanced Resource Manager.
2 | Support | Supports only the MapReduce processing model. | Supports the MapReduce model as well as other distributed computing models.
3 | Resource Management | MapReduce is responsible for both processing and cluster-resource management. | YARN handles cluster-resource management, while processing is done by different processing models.
4 | Scalability | Less scalable; limited to about 4000 nodes per cluster. | Scalable up to 10000 nodes per cluster.
5 | Implementation | Runs only Map tasks or Reduce tasks (fixed map and reduce slots). | Can run generic tasks.
6 | Windows Support | No support for Microsoft Windows provided by Apache. | Microsoft Windows is supported.

Big Data Technologies
• Big data technologies such as Hadoop, Spark, and NoSQL databases are essential for managing the vast amounts of data generated by various sources.
• Hadoop is an open-source framework for the distributed storage and processing of large datasets across clusters of computers.
• Its ecosystem includes HDFS for storage, MapReduce for processing, and Hive for data warehousing.
• Spark, on the other hand, is an advanced analytics engine that offers fast, in-memory processing and supports various workloads.
• NoSQL databases, such as MongoDB, Cassandra, and Couchbase, offer flexibility and scalability for diverse data types.
• Understanding these technologies is crucial for making informed decisions about data architecture and strategy, and for improving business intelligence, operational efficiency, and innovation.

Exploring Apache Spark: Key Features and Performance Benefits
Apache Spark is a big data engine built around in-memory computing. It keeps intermediate data in memory instead of writing it to disk between steps. As a result, it can process large datasets faster than disk-based frameworks such as Hadoop MapReduce, especially in iterative workloads. It offers a unified framework for batch processing, streaming, machine learning, and SQL queries, which simplifies architecture and reduces overhead. Spark provides APIs in Java, Scala, Python, and R, making it accessible to a wider audience. Its ecosystem includes integrations with Hadoop (HDFS and YARN), Apache Hive, and NoSQL databases. It also includes libraries for machine learning (MLlib), graph processing (GraphX), and SQL-based queries (Spark SQL). Spark's horizontal scalability and fault tolerance make it an attractive choice for mission-critical applications.

Big Data Technologies & NoSQL Databases
• NoSQL: non-relational databases designed for Big Data. They handle various data models and offer flexibility and performance where traditional SQL databases may struggle.
• Scalable and flexible data models
• Used in distributed systems

Understanding NoSQL Databases: Types and Use Cases
There are four types of NoSQL databases: 1. Document Stores, 2. Key-value Stores, 3. Column-family Stores, and 4. Graph Databases.
• Document Stores store data in documents (for example, JSON-like records), ideal for complex data structures and varied formats.
• Key-value Stores store data as a collection of key-value pairs, ideal for caching, session management, and rapid lookups.
• Column-family Stores store data in a column-oriented way, efficient for analytical queries and large-scale data warehousing.
• Graph Databases represent and analyze relationships between interconnected data points, excelling in social networks, fraud detection, and recommendation engines.

Features & Types of NoSQL Databases
• Types: Document-based, Key-Value, Column-family, Graph databases
• Schema-less structure
• High availability and partition tolerance (many NoSQL systems favor these over strict consistency)

Advantages & Disadvantages of NoSQL
• Advantages: scalable, flexible schema, fast performance
• Disadvantages: eventual consistency (reads may briefly return stale data), complex querying, limited standardization

Applications of NoSQL Databases
• Social media (Facebook, Twitter)
• E-commerce (Amazon, eBay)
• Real-time analytics
• Internet of Things (IoT) applications
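The following Python class is a minimal sketch of the key-value model described in the NoSQL types above. In this model, each value is stored and retrieved only by its key, which is the pattern used for caching and session management. The class keeps data in an in-memory dictionary with an optional time-to-live (TTL). The class name `KeyValueStore`, its methods, and the TTL behavior are hypothetical illustrations, not the API of any real NoSQL product. A real distributed store would also persist, replicate, and partition the data across nodes.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class KeyValueStore:
    """Hypothetical in-memory key-value store (illustration only, not a real product's API)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        # key -> (value, expiry timestamp, or None if the entry never expires)
        self._data: Dict[str, Tuple[Any, Optional[float]]] = {}
        # The clock is injectable so expiry can be demonstrated deterministically.
        self._clock = clock

    def put(self, key: str, value: Any, ttl: Optional[float] = None) -> None:
        # Values are opaque: no schema is enforced, so any Python object can be stored.
        expires_at = self._clock() + ttl if ttl is not None else None
        self._data[key] = (value, expires_at)

    def get(self, key: str, default: Any = None) -> Any:
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            # Expired entries are evicted lazily, on the first read after expiry.
            del self._data[key]
            return default
        return value

    def delete(self, key: str) -> bool:
        # Returns True if the key was present and has been removed.
        return self._data.pop(key, None) is not None


if __name__ == "__main__":
    now = [0.0]  # fake clock value, advanced by hand
    store = KeyValueStore(clock=lambda: now[0])
    store.put("session:42", {"user": "alice", "cart": ["book"]}, ttl=30)
    store.put("config:theme", "dark")
    print(store.get("session:42"))    # {'user': 'alice', 'cart': ['book']}
    now[0] = 31.0                     # advance past the 30-second TTL
    print(store.get("session:42"))    # None
    print(store.get("config:theme"))  # dark
```

The session entry and the theme setting hold values of different shapes in the same store, which mirrors the schema-less property listed above. All lookups go through the key alone, which is what makes key-value stores fast for rapid lookups but limited for the complex querying noted as a disadvantage.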