1.1.2 and 1.1.3



APEX INSTITUTE OF TECHNOLOGY.

AIT-IBM CSE
CHANDIGARH UNIVERSITY, MOHALI

Big Data Technologies (Spark & Scala)

(22CSH-391)
Lecture-1 (CO1)
By
Dr Geeta Rani (E15227)
Associate Professor (Chandigarh University)
Course Outcomes

• CO1: Understand the components of the Hadoop Ecosystem and Data Science methodology
• CO2: Understand the constructs of Scala
• CO3: Understand Apache Spark and its components
• CO4: Design applications using Scala
• CO5: Develop applications using Spark and its available libraries
MapReduce and YARN
MapReduce
• MapReduce is the processing layer of Hadoop.
• It is a programming model for processing large volumes of data in parallel by dividing the work into a set of independent chunks.
• A MapReduce job has two kinds of processes: the Mapper and the Reducer.
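The model above can be sketched in plain Python as a word count. This is a minimal single-machine illustration, not Hadoop's Java API: the function names (`split_into_chunks`, `mapper`, `shuffle`, `reducer`, `word_count`) are hypothetical, the chunk size is an arbitrary assumption, and threads stand in for the cluster nodes that would run mappers in parallel.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def split_into_chunks(lines, chunk_size):
    """Divide the input into fixed-size chunks (stand-ins for HDFS input splits)."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def mapper(chunk):
    """Emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between Map and Reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines, chunk_size=2, workers=4):
    chunks = split_into_chunks(lines, chunk_size)
    # Run one mapper per chunk in parallel (threads stand in for cluster nodes).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(mapper, chunks))
    grouped = shuffle(chain.from_iterable(mapped))
    return dict(reducer(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    text = ["big data big ideas", "spark and scala", "big spark"]
    print(word_count(text))
    # {'big': 3, 'data': 1, 'ideas': 1, 'spark': 2, 'and': 1, 'scala': 1}
```

Each mapper only sees its own chunk, which is what lets the chunks be processed in parallel; only the shuffle step needs to see all intermediate pairs.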
MapReduce
• Map phase: the first phase of data processing. This is where the complex logic, business rules and costly per-record code are specified.
• Reduce phase: the second phase of processing. It receives the Mapper's intermediate key-value pairs grouped by key, and this is where lightweight processing such as aggregation or summation is specified.
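To show the division of labour between the two phases, here is a hypothetical sales-total job, again in plain Python and assuming a made-up record format `region,product,amount`. The Map function does the per-record parsing, validation and business-rule filtering; the intermediate pairs are then sorted by key (a simplified stand-in for the framework's shuffle and sort); the Reduce function only sums.

```python
from itertools import groupby


def map_phase(record):
    """Map: the 'costly' per-record work (parsing, validation, business rules).

    Assumed (hypothetical) record format: "region,product,amount".
    Yields (region, amount) only for well-formed sales with a positive amount.
    """
    parts = record.strip().split(",")
    if len(parts) != 3:
        return  # malformed record: skip it
    region, _product, amount_text = parts
    try:
        amount = float(amount_text)
    except ValueError:
        return  # non-numeric amount: skip it
    if amount > 0:  # example business rule: ignore refunds and zero-value rows
        yield region.strip().upper(), amount


def reduce_phase(region, amounts):
    """Reduce: lightweight aggregation, just a sum per key."""
    return region, sum(amounts)


def run_job(records):
    # Map every record, then sort by key so equal keys are adjacent,
    # mirroring the sort/shuffle step that happens before the Reduce phase.
    intermediate = sorted(
        (pair for record in records for pair in map_phase(record)),
        key=lambda pair: pair[0],
    )
    return [
        reduce_phase(region, (amount for _, amount in group))
        for region, group in groupby(intermediate, key=lambda pair: pair[0])
    ]


if __name__ == "__main__":
    sales = ["north,pen,10", "south,book,25.5", "north,ink,-3", "bad line", "north,pad,5"]
    print(run_job(sales))  # [('NORTH', 15.0), ('SOUTH', 25.5)]
```

Keeping the expensive work in the Map phase means it is spread across all input chunks, while the Reduce phase only has to combine values that are already clean.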
Hadoop version 2.0

• YARN (Yet Another Resource Negotiator) was introduced in Hadoop version 2.0 in 2012 by Yahoo and Hortonworks. The basic idea behind YARN is to relieve MapReduce by taking over responsibility for resource management and job scheduling.
• YARN allows different data processing methods, such as graph processing, interactive processing, stream processing and batch processing, to run on and process data stored in HDFS. YARN therefore opens up Hadoop to types of distributed applications beyond MapReduce.
• YARN enables users to choose the tool that fits each requirement, such as Spark for real-time processing, Hive for SQL and HBase for NoSQL.
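The split of responsibilities described above can be illustrated with a toy model. This is not YARN's API: the classes only borrow the names ResourceManager and NodeManager, and `allocate`, `launch`, first-fit placement and the memory/vcore figures are simplifying assumptions. What it shows is that the ResourceManager hands out generic containers on whichever node has room, without knowing whether the requester is a MapReduce job or a Spark application.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Toy stand-in for a NodeManager: tracks free resources on one node."""
    name: str
    free_memory_mb: int
    free_vcores: int
    containers: list = field(default_factory=list)

    def can_fit(self, memory_mb, vcores):
        return self.free_memory_mb >= memory_mb and self.free_vcores >= vcores

    def launch(self, app_id, memory_mb, vcores):
        # Reserve the resources and record which application owns the container.
        self.free_memory_mb -= memory_mb
        self.free_vcores -= vcores
        self.containers.append(app_id)


class ResourceManager:
    """Toy stand-in for the ResourceManager: grants containers cluster-wide.

    It only sees resource requests; it does not care which framework
    (MapReduce, Spark, ...) the requesting application uses.
    """

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, app_id, count, memory_mb, vcores):
        granted = []
        for _ in range(count):
            # Simple first-fit placement (a hypothetical policy for this sketch).
            node = next((n for n in self.nodes if n.can_fit(memory_mb, vcores)), None)
            if node is None:
                break  # cluster is full: in this toy, the rest is simply not granted
            node.launch(app_id, memory_mb, vcores)
            granted.append(node.name)
        return granted


if __name__ == "__main__":
    rm = ResourceManager([NodeManager("node1", 4096, 4), NodeManager("node2", 4096, 4)])
    print(rm.allocate("mapreduce-job", count=3, memory_mb=2048, vcores=1))  # ['node1', 'node1', 'node2']
    print(rm.allocate("spark-app", count=3, memory_mb=1024, vcores=1))  # ['node2', 'node2']
```

The first-fit loop is only a placeholder; real YARN delegates this decision to a pluggable scheduler such as the Capacity Scheduler or the Fair Scheduler.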
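Limitations of Hadoop 1.0 (before YARN)
• Scalability: a single JobTracker handled both resource management and job scheduling/monitoring for the whole cluster, which became a bottleneck as clusters grew.
• Inefficient utilization of computational resources: each TaskTracker had a fixed number of map slots and reduce slots, so idle reduce slots could not run map tasks, and vice versa.
• The Hadoop framework was limited to the MapReduce processing paradigm.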
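The resource-utilization limitation can be made concrete with a small calculation. The sketch below is a deliberate simplification that assumes every task needs exactly one slot or one generic resource unit; the cluster size and slot counts are made-up numbers, and the function names are hypothetical. During a map-only stage, the fixed reduce slots of the slot model sit idle, while a shared pool (closer to YARN's container model) can give every unit to map tasks.

```python
def slot_utilization(pending_map_tasks, pending_reduce_tasks, nodes, map_slots, reduce_slots):
    """Hadoop 1.x style: each node has fixed map slots and fixed reduce slots.

    A map task may only use a map slot; a reduce task may only use a reduce slot.
    """
    busy = (min(pending_map_tasks, nodes * map_slots)
            + min(pending_reduce_tasks, nodes * reduce_slots))
    return busy / (nodes * (map_slots + reduce_slots))


def container_utilization(pending_map_tasks, pending_reduce_tasks, nodes, units_per_node):
    """YARN style (simplified): every task takes one generic unit from a shared pool."""
    busy = min(pending_map_tasks + pending_reduce_tasks, nodes * units_per_node)
    return busy / (nodes * units_per_node)


if __name__ == "__main__":
    # 3 nodes, each with 4 units of capacity; 12 map tasks waiting, no reduce tasks yet.
    print(slot_utilization(12, 0, nodes=3, map_slots=2, reduce_slots=2))  # 0.5
    print(container_utilization(12, 0, nodes=3, units_per_node=4))  # 1.0
```

With the same hardware, half of the slot-based cluster is idle during the map stage, while the shared pool keeps every unit busy.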
Q/A
• What are the two main components of YARN?
• a) ResourceManager and NameNode
• b) ResourceManager and NodeManager
• c) TaskTracker and JobTracker
• d) DataNode and SecondaryNameNode
Ans: b

Q/A
• Which programming framework is commonly used with YARN for
distributed data processing?
• a) Hadoop MapReduce
• b) Apache Spark
• c) Apache Hive
• d) Apache Kafka
Ans: b) Apache Spark

Q/A
• What is the role of the NodeManager in YARN?
• a) Launching and monitoring containers and managing resources on its own node
• b) Managing and storing metadata information
• c) Coordinating and scheduling MapReduce jobs
• d) Storing and managing data blocks
Ans: a

8/8/2021
THANK YOU
