Big Data Spark Cs606pc Syllabus

The CS606PC Big Data-Spark course aims to equip students with the skills to process Big Data using Spark and its ecosystem. Students will learn to develop MapReduce programs, write Hive queries, and perform various Spark operations, including RDD transformations and SQL queries. The course includes practical experiments on Hadoop architecture, data management in HDFS, and utilizing PySpark for data analysis.


CS606PC: BIG DATA-SPARK

B.Tech. III Year II Sem.                L T P C
                                        0 0 4 2
Course Objectives:
 The main objective of the course is to process Big Data with advanced architectures such as Spark, and to handle streaming data in Spark

Course Outcomes:
 Develop MapReduce programs to analyze large datasets using Hadoop and Spark
 Write Hive queries to analyze large datasets
 Outline the Spark ecosystem and its components
 Perform the filter, count, distinct, map, and flatMap RDD operations in Spark
 Build queries using Spark SQL
 Apply Spark joins on sample datasets
 Make use of Sqoop to import and export data between Hadoop and a database

List of Experiments:
1. Study of Big Data Analytics and Hadoop Architecture
(i) Know the concept of Big Data architecture
(ii) Know the concept of Hadoop architecture

2. Loading a Dataset into HDFS for Spark Analysis; Installation of Hadoop and Cluster Management

(i) Installing a Hadoop single-node cluster in an Ubuntu environment
(ii) Knowing the difference between single-node and multi-node clusters
(iii) Accessing the Web UI and its port number
(iv) Installing and accessing environments such as Hive and Sqoop

3. File Management Tasks & Basic Linux Commands

(i) Creating a directory in HDFS
(ii) Moving back and forth between directories
(iii) Listing directory contents
(iv) Uploading and downloading a file in HDFS
(v) Checking the contents of a file
(vi) Copying and moving files
(vii) Copying and moving files between the local filesystem and HDFS
(viii) Removing files and paths
(ix) Displaying the first few lines of a file
(x) Displaying the aggregate length of a file
(xi) Checking the permissions of a file
(xii) Zipping and unzipping files (with and without permissions) and pasting them to a location
(xiii) Copy and paste commands

4. MapReduce
(i) Definition of MapReduce
(ii) Its stages and terminology
(iii) Word-count program to understand MapReduce (mapper phase, reducer phase, driver code)
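The word-count logic of item (iii) can be sketched without a Hadoop cluster. The plain-Python sketch below mimics the three MapReduce phases (mapper, shuffle, reducer); the sample lines are made up for illustration and a real job would use the Hadoop streaming or Java API instead:

```python
from collections import defaultdict

def mapper(line):
    # Mapper phase: emit a (word, 1) pair for every word in the line
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle/sort phase: group all values by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(word, values):
    # Reducer phase: sum the counts for one word
    return word, sum(values)

lines = ["spark makes big data simple", "big data needs big tools"]
mapped = [pair for line in lines for pair in mapper(line)]
counts = dict(reducer(w, v) for w, v in shuffle(mapped).items())
print(counts["big"])  # 3
```

The same mapper/reducer pair, written as standalone scripts reading stdin, is what Hadoop streaming would run on each node.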
5. Implementing matrix multiplication with Hadoop MapReduce
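One common MapReduce scheme for experiment 5: to compute C = A x B, the mapper emits each element of A and B under every output cell (i, j) it contributes to, and the reducer joins the two sides on the inner index k, multiplies, and sums. A plain-Python sketch of that key scheme, with small hard-coded matrices chosen only for illustration:

```python
from collections import defaultdict

A = [[1, 2], [3, 4]]   # A is n x m
B = [[5, 6], [7, 8]]   # B is m x p
n, m, p = 2, 2, 2

# Mapper: tag each element with every output cell (i, j) it feeds into
pairs = []
for i in range(n):
    for k in range(m):
        for j in range(p):
            pairs.append(((i, j), ("A", k, A[i][k])))
for k in range(m):
    for j in range(p):
        for i in range(n):
            pairs.append(((i, j), ("B", k, B[k][j])))

# Shuffle: group contributions by output cell
groups = defaultdict(list)
for key, value in pairs:
    groups[key].append(value)

# Reducer: join the A and B sides on k, multiply, and sum
C = {}
for (i, j), values in groups.items():
    a = {k: v for tag, k, v in values if tag == "A"}
    b = {k: v for tag, k, v in values if tag == "B"}
    C[(i, j)] = sum(a[k] * b[k] for k in a)

print(C[(0, 0)])  # 1*5 + 2*7 = 19
```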

6. Compute Average Salary and Total Salary by Gender for an Enterprise.
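The aggregation in experiment 6 is a group-by on the gender column. A minimal plain-Python sketch of the logic (the employee records are hypothetical; on Hadoop the same grouping would be done in the reducer keyed on gender):

```python
from collections import defaultdict

# Hypothetical employee records: (name, gender, salary)
employees = [
    ("asha",  "F", 50000),
    ("ravi",  "M", 40000),
    ("meera", "F", 70000),
    ("kiran", "M", 60000),
]

totals = defaultdict(float)
counts = defaultdict(int)
for _, gender, salary in employees:
    totals[gender] += salary
    counts[gender] += 1

# Average salary per gender, derived from the totals and counts
averages = {g: totals[g] / counts[g] for g in totals}
for gender in sorted(totals):
    print(gender, totals[gender], averages[gender])
```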


7. (i) Creating Hive tables (external and internal)
(ii) Loading data into external Hive tables from SQL tables (or structured CSV files) using Sqoop
(iii) Performing operations such as filtering and updating
(iv) Performing joins (inner, outer, etc.)
(v) Writing user-defined functions on Hive tables
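For the filter and join steps (items iii and iv), the query syntax in HiveQL is close to standard SQL. The sketch below uses Python's built-in sqlite3 as a stand-in for Hive so it runs anywhere; the table names and rows are made up for illustration:

```python
import sqlite3

# sqlite3 stands in for Hive here; the filter and join SQL reads much the same in HiveQL.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE employee (id INTEGER, name TEXT, dept_id INTEGER)")
cur.execute("CREATE TABLE dept (dept_id INTEGER, dept_name TEXT)")
cur.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                [(1, "asha", 10), (2, "ravi", 20), (3, "meera", 10)])
cur.executemany("INSERT INTO dept VALUES (?, ?)",
                [(10, "sales"), (20, "hr")])

# Filtering (item iii)
rows = cur.execute(
    "SELECT name FROM employee WHERE dept_id = 10 ORDER BY name").fetchall()
print(rows)  # [('asha',), ('meera',)]

# Inner join (item iv)
joined = cur.execute(
    "SELECT e.name, d.dept_name FROM employee e "
    "JOIN dept d ON e.dept_id = d.dept_id ORDER BY e.name").fetchall()
print(joined)
```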

8. Create SQL tables: an Employee table (id, designation) and a Salary table (salary, dept id). Create external tables in Hive with schemas similar to the above, move the data to Hive using Sqoop, and load the contents into the tables. Filter into a new table, write a UDF to encrypt the table with the AES algorithm, and decrypt it with the key to show the contents.

9. (i) PySpark definition (Apache PySpark) and the differences between PySpark, Scala, and pandas
(ii) PySpark files and class methods
(iii) SparkFiles.get(filename)
(iv) SparkFiles.getRootDirectory()

10. PySpark RDDs

(i) What are RDDs?
(ii) Ways to create RDDs
(iii) Parallelized collections
(iv) External datasets
(v) Existing RDDs
(vi) Spark RDD operations (count, foreach(), collect, join, cache())
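The actions and transformations listed in item (vi) have plain-Python analogues that make their semantics clear. The sketch below runs without a cluster (the data is made up); on real RDDs these would be method calls such as rdd.count() or rdd.join(other):

```python
data = [1, 2, 2, 3, 3, 3]

count = len(data)              # count(): number of elements
collected = list(data)         # collect(): bring all elements back to the driver
distinct = sorted(set(data))   # distinct(): unique elements

# foreach(): apply a side-effecting function to every element
squares = []
for x in data:
    squares.append(x * x)

# cache() only marks an RDD for in-memory reuse, so it has no plain-Python analogue.

# join(): pair RDDs are joined on their key
left = [("a", 1), ("b", 2)]
right = [("a", 3), ("c", 5)]
joined = [(k, (v, w)) for k, v in left for k2, w in right if k == k2]
print(count, distinct, joined)
```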

11. Perform PySpark transformations

(i) map and flatMap
(ii) Removing words that are not needed to analyze the text
(iii) groupBy
(iv) Calculating how many times each word occurs in the corpus
(v) Performing a task (say, counting the words 'spark' and 'apache' in rdd3) separately on each partition and collecting the output from each partition
(vi) Unions of RDDs
(vii) Joining two pair RDDs based on their keys
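Most of the transformations above can be sketched on plain Python lists, which shows what each one does before running it on an RDD. The sample lines and stop-word list below are made up for illustration:

```python
from collections import Counter
from itertools import chain

lines = ["apache spark is fast", "spark is simple"]
stop_words = {"is", "the", "a"}   # hypothetical stop-word list for item (ii)

# flatMap (item i): each line yields many words, flattened into one sequence
words = list(chain.from_iterable(line.split() for line in lines))

# filter (item ii): drop words not needed for the analysis
kept = [w for w in words if w not in stop_words]

# groupBy + word count (items iii-iv)
counts = Counter(kept)
print(counts["spark"])  # 2

# union (item vi): concatenate two datasets
union = words + ["extra"]

# join (item vii): pair RDDs joined on their key
left = [("spark", 1)]
right = [("spark", 10), ("hive", 20)]
joined = [(k, (v, w)) for k, v in left for k2, w in right if k == k2]
print(joined)  # [('spark', (1, 10))]
```

In PySpark these map directly to rdd.flatMap, rdd.filter, rdd.union, and pair-RDD join; the per-partition counting of item (v) would use rdd.mapPartitions instead.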

12. PySpark SparkConf: attributes and applications

(i) What is PySpark SparkConf()?
(ii) Using SparkConf, create a Spark session to write a DataFrame that reads details from a CSV file, and later move that CSV file to another location
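The file-handling half of item (ii) can be sketched without a Spark session, using Python's csv and shutil modules; the rows and paths below are hypothetical, and in PySpark the write step would be df.write.csv(...) on a DataFrame built from a SparkSession configured via SparkConf:

```python
import csv
import shutil
import tempfile
from pathlib import Path

# Hypothetical rows that the DataFrame would hold
rows = [("id", "name"), (1, "asha"), (2, "ravi")]

workdir = Path(tempfile.mkdtemp())
src = workdir / "details.csv"
dst_dir = workdir / "archive"
dst_dir.mkdir()

# Write the CSV (in PySpark: df.write.csv(...))
with open(src, "w", newline="") as f:
    csv.writer(f).writerows(rows)

# Move the CSV to another location (item ii)
moved = shutil.move(str(src), str(dst_dir / "details.csv"))
print(Path(moved).exists())  # True
```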

TEXT BOOKS:
1. Spark in Action, Marko Bonaci and Petar Zecevic, Manning.
2. PySpark SQL Recipes: With HiveQL, Dataframe and Graphframes, Raju Kumar
Mishra and Sundar Rajan Raman, Apress Media.

WEB LINKS:
1. https://infyspringboard.onwingspan.com/web/en/app/toc/lex_auth_01330150584451891225182_shared/overview
2. https://infyspringboard.onwingspan.com/web/en/app/toc/lex_auth_01258388119638835242_shared/overview
3. https://infyspringboard.onwingspan.com/web/en/app/toc/lex_auth_0126052684230082561692_shared/overview
