Big Data Spark Cs606pc Syllabus

The CS606PC Big Data-Spark course aims to equip students with the skills to process Big Data using Spark and its ecosystem. Students will learn to develop MapReduce programs, write Hive queries, and perform various Spark operations, including RDD transformations and SQL queries. The course includes practical experiments on Hadoop architecture, data management in HDFS, and utilizing PySpark for data analysis.


CS606PC: BIG DATA-SPARK

B.Tech. III Year II Sem.                L T P C
                                        0 0 4 2
Course Objectives:
 The main objective of the course is to process Big Data with advanced architectures such as Spark, and to handle streaming data in Spark

Course Outcomes:
 Develop MapReduce programs to analyze large datasets using Hadoop and Spark
 Write Hive queries to analyze large datasets
 Outline the Spark ecosystem and its components
 Perform the filter, count, distinct, map, and flatMap RDD operations in Spark
 Build queries using Spark SQL
 Apply Spark joins on sample datasets
 Make use of Sqoop to import and export data between Hadoop and a database

List of Experiments:
1. Study of Big Data Analytics and Hadoop Architecture
(i) Know the concept of Big Data architecture
(ii) Know the concept of Hadoop architecture

2. Loading a Dataset into HDFS for Spark Analysis; Installation of Hadoop and Cluster Management

(i) Installing a Hadoop single-node cluster in an Ubuntu environment
(ii) Knowing the difference between single-node and multi-node clusters
(iii) Accessing the Web UI and its port number
(iv) Installing and accessing environments such as Hive and Sqoop

3. File Management Tasks & Basic Linux Commands

(i) Creating a directory in HDFS
(ii) Moving back and forth between directories
(iii) Listing directory contents
(iv) Uploading and downloading a file in HDFS
(v) Checking the contents of a file
(vi) Copying and moving files
(vii) Copying and moving files between the local filesystem and HDFS
(viii) Removing files and paths
(ix) Displaying the first few lines of a file
(x) Displaying the aggregate length of a file
(xi) Checking the permissions of a file
(xii) Zipping and unzipping files (with and without permissions) and pasting them to a location
(xiii) Copy and paste commands

4. MapReduce
(i) Definition of MapReduce
(ii) Its stages and terminology
(iii) Word-count program to understand MapReduce (mapper phase, reducer phase, driver code)
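The word-count logic of item (iii) can be sketched without a Hadoop cluster. The plain-Python sketch below mimics the three MapReduce phases (mapper, shuffle, reducer); the sample lines are made up for illustration and a real job would use the Hadoop streaming or Java API instead:

```python
from collections import defaultdict

def mapper(line):
    # Mapper phase: emit a (word, 1) pair for every word in the line
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle/sort phase: group all values by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(word, values):
    # Reducer phase: sum the counts for one word
    return word, sum(values)

lines = ["spark makes big data simple", "big data needs big tools"]
mapped = [pair for line in lines for pair in mapper(line)]
counts = dict(reducer(w, v) for w, v in shuffle(mapped).items())
print(counts["big"])  # 3
```

The same mapper/reducer pair, written as standalone scripts reading stdin, is what Hadoop streaming would run on each node.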
5. Implementing matrix multiplication with Hadoop MapReduce
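One common MapReduce scheme for experiment 5: to compute C = A x B, the mapper emits each element of A and B under every output cell (i, j) it contributes to, and the reducer joins the two sides on the inner index k, multiplies, and sums. A plain-Python sketch of that key scheme, with small hard-coded matrices chosen only for illustration:

```python
from collections import defaultdict

A = [[1, 2], [3, 4]]   # A is n x m
B = [[5, 6], [7, 8]]   # B is m x p
n, m, p = 2, 2, 2

# Mapper: tag each element with every output cell (i, j) it feeds into
pairs = []
for i in range(n):
    for k in range(m):
        for j in range(p):
            pairs.append(((i, j), ("A", k, A[i][k])))
for k in range(m):
    for j in range(p):
        for i in range(n):
            pairs.append(((i, j), ("B", k, B[k][j])))

# Shuffle: group contributions by output cell
groups = defaultdict(list)
for key, value in pairs:
    groups[key].append(value)

# Reducer: join the A and B sides on k, multiply, and sum
C = {}
for (i, j), values in groups.items():
    a = {k: v for tag, k, v in values if tag == "A"}
    b = {k: v for tag, k, v in values if tag == "B"}
    C[(i, j)] = sum(a[k] * b[k] for k in a)

print(C[(0, 0)])  # 1*5 + 2*7 = 19
```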

6. Compute Average Salary and Total Salary by Gender for an Enterprise.
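The aggregation in experiment 6 is a group-by on the gender column. A minimal plain-Python sketch of the logic (the employee records are hypothetical; on Hadoop the same grouping would be done in the reducer keyed on gender):

```python
from collections import defaultdict

# Hypothetical employee records: (name, gender, salary)
employees = [
    ("asha",  "F", 50000),
    ("ravi",  "M", 40000),
    ("meera", "F", 70000),
    ("kiran", "M", 60000),
]

totals = defaultdict(float)
counts = defaultdict(int)
for _, gender, salary in employees:
    totals[gender] += salary
    counts[gender] += 1

# Average salary per gender, derived from the totals and counts
averages = {g: totals[g] / counts[g] for g in totals}
for gender in sorted(totals):
    print(gender, totals[gender], averages[gender])
```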


7. (i) Creating Hive tables (external and internal)
(ii) Loading data into external Hive tables from SQL tables (or structured CSV files) using Sqoop
(iii) Performing operations such as filtering and updating
(iv) Performing joins (inner, outer, etc.)
(v) Writing user-defined functions on Hive tables
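For the filter and join steps (items iii and iv), the query syntax in HiveQL is close to standard SQL. The sketch below uses Python's built-in sqlite3 as a stand-in for Hive so it runs anywhere; the table names and rows are made up for illustration:

```python
import sqlite3

# sqlite3 stands in for Hive here; the filter and join SQL reads much the same in HiveQL.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE employee (id INTEGER, name TEXT, dept_id INTEGER)")
cur.execute("CREATE TABLE dept (dept_id INTEGER, dept_name TEXT)")
cur.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                [(1, "asha", 10), (2, "ravi", 20), (3, "meera", 10)])
cur.executemany("INSERT INTO dept VALUES (?, ?)",
                [(10, "sales"), (20, "hr")])

# Filtering (item iii)
rows = cur.execute(
    "SELECT name FROM employee WHERE dept_id = 10 ORDER BY name").fetchall()
print(rows)  # [('asha',), ('meera',)]

# Inner join (item iv)
joined = cur.execute(
    "SELECT e.name, d.dept_name FROM employee e "
    "JOIN dept d ON e.dept_id = d.dept_id ORDER BY e.name").fetchall()
print(joined)
```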

8. Create SQL tables: an Employee table (id, designation) and a Salary table (salary, dept id). Create external tables in Hive with schemas similar to the above, move the data to Hive using Sqoop, and load the contents into the tables. Filter into a new table, write a UDF to encrypt the table with the AES algorithm, and decrypt it with the key to show the contents.

9. (i) PySpark definition (Apache PySpark) and the differences between PySpark, Scala, and pandas
(ii) PySpark files and class methods
(iii) SparkFiles.get(filename)
(iv) SparkFiles.getRootDirectory()

10. PySpark RDDs

(i) What are RDDs?
(ii) Ways to create RDDs
(iii) Parallelized collections
(iv) External datasets
(v) Existing RDDs
(vi) Spark RDD operations (count, foreach(), collect, join, cache())
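The actions and transformations listed in item (vi) have plain-Python analogues that make their semantics clear. The sketch below runs without a cluster (the data is made up); on real RDDs these would be method calls such as rdd.count() or rdd.join(other):

```python
data = [1, 2, 2, 3, 3, 3]

count = len(data)              # count(): number of elements
collected = list(data)         # collect(): bring all elements back to the driver
distinct = sorted(set(data))   # distinct(): unique elements

# foreach(): apply a side-effecting function to every element
squares = []
for x in data:
    squares.append(x * x)

# cache() only marks an RDD for in-memory reuse, so it has no plain-Python analogue.

# join(): pair RDDs are joined on their key
left = [("a", 1), ("b", 2)]
right = [("a", 3), ("c", 5)]
joined = [(k, (v, w)) for k, v in left for k2, w in right if k == k2]
print(count, distinct, joined)
```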

11. Perform PySpark transformations

(i) map and flatMap
(ii) Removing words that are not needed to analyze the text
(iii) groupBy
(iv) Calculating how many times each word occurs in the corpus
(v) Performing a task (say, counting the words 'spark' and 'apache' in rdd3) separately on each partition and collecting the output from each partition
(vi) Unions of RDDs
(vii) Joining two pair RDDs based on their keys
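Most of the transformations above can be sketched on plain Python lists, which shows what each one does before running it on an RDD. The sample lines and stop-word list below are made up for illustration:

```python
from collections import Counter
from itertools import chain

lines = ["apache spark is fast", "spark is simple"]
stop_words = {"is", "the", "a"}   # hypothetical stop-word list for item (ii)

# flatMap (item i): each line yields many words, flattened into one sequence
words = list(chain.from_iterable(line.split() for line in lines))

# filter (item ii): drop words not needed for the analysis
kept = [w for w in words if w not in stop_words]

# groupBy + word count (items iii-iv)
counts = Counter(kept)
print(counts["spark"])  # 2

# union (item vi): concatenate two datasets
union = words + ["extra"]

# join (item vii): pair RDDs joined on their key
left = [("spark", 1)]
right = [("spark", 10), ("hive", 20)]
joined = [(k, (v, w)) for k, v in left for k2, w in right if k == k2]
print(joined)  # [('spark', (1, 10))]
```

In PySpark these map directly to rdd.flatMap, rdd.filter, rdd.union, and pair-RDD join; the per-partition counting of item (v) would use rdd.mapPartitions instead.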

12. PySpark SparkConf: attributes and applications

(i) What is PySpark SparkConf()?
(ii) Using SparkConf, create a Spark session to write a DataFrame that reads details from a CSV file, and later move that CSV file to another location
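The file-handling half of item (ii) can be sketched without a Spark session, using Python's csv and shutil modules; the rows and paths below are hypothetical, and in PySpark the write step would be df.write.csv(...) on a DataFrame built from a SparkSession configured via SparkConf:

```python
import csv
import shutil
import tempfile
from pathlib import Path

# Hypothetical rows that the DataFrame would hold
rows = [("id", "name"), (1, "asha"), (2, "ravi")]

workdir = Path(tempfile.mkdtemp())
src = workdir / "details.csv"
dst_dir = workdir / "archive"
dst_dir.mkdir()

# Write the CSV (in PySpark: df.write.csv(...))
with open(src, "w", newline="") as f:
    csv.writer(f).writerows(rows)

# Move the CSV to another location (item ii)
moved = shutil.move(str(src), str(dst_dir / "details.csv"))
print(Path(moved).exists())  # True
```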

TEXT BOOKS:
1. Spark in Action, Marko Bonaci and Petar Zecevic, Manning.
2. PySpark SQL Recipes: With HiveQL, Dataframe and Graphframes, Raju Kumar
Mishra and Sundar Rajan Raman, Apress Media.

WEB LINKS:
1. https://infyspringboard.onwingspan.com/web/en/app/toc/lex_auth_01330150584451891225182_shared/overview
2. https://infyspringboard.onwingspan.com/web/en/app/toc/lex_auth_01258388119638835242_shared/overview
3. https://infyspringboard.onwingspan.com/web/en/app/toc/lex_auth_0126052684230082561692_shared/overview
