Big Data Spark CS606PC Syllabus
Course Outcomes:
Develop MapReduce programs to analyze large datasets using Hadoop and Spark.
Write Hive queries to analyze large datasets.
Outline the Spark ecosystem and its components.
Perform the filter, count, distinct, map, and flatMap RDD operations in Spark.
Build queries using Spark SQL.
Apply Spark joins on sample data sets.
Make use of Sqoop to import and export data from Hadoop to a database and vice versa.
List of Experiments:
1. Study of Big Data Analytics and Hadoop Architecture
(i) Know the concept of big data architecture
(ii) Know the concept of Hadoop architecture
4. MapReduce
(i) Definition of MapReduce
(ii) Its stages and terminology
(iii) Word-count program to understand MapReduce (Mapper phase, Reducer phase,
Driver code)
5. Implementing matrix multiplication with Hadoop MapReduce
8. Create a SQL table of employees: an Employee table (id, designation) and a
Salary table (salary, dept id). Create external tables in Hive with the same
schema as the above tables, move the data to Hive using Sqoop, and load the
contents into the tables. Filter the data into a new table, write a UDF to
encrypt the table with the AES algorithm, and decrypt it with the key to show
the contents.
9. (i) PySpark definition (Apache PySpark) and the differences between PySpark, Scala, and pandas
(ii) PySpark files (SparkFiles) and class methods
(iii) get(filename)
(iv) getRootDirectory()
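Illustrative sketch for Experiments 4 and 5: the Mapper, Reducer, and Driver roles of the word-count and matrix-multiplication exercises can be tried out in plain Python before moving to a Hadoop cluster. This is only a minimal in-memory simulation under stated assumptions, not the Hadoop API. The helper names (run_map_reduce, wc_mapper, make_mm_mapper, and so on), the sample lines, and the 2 x 2 matrices are all hypothetical and chosen for illustration.

```python
from collections import defaultdict


def run_map_reduce(records, mapper, reducer):
    """Hypothetical driver: one in-memory map -> shuffle -> reduce pass."""
    groups = defaultdict(list)
    for record in records:
        # Map phase: each record yields zero or more (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle phase: collect all values that share a key.
            groups[key].append(value)
    # Reduce phase: collapse each key's list of values into one result.
    return {key: reducer(key, values) for key, values in groups.items()}


# --- Experiment 4 (iii): word count ---
def wc_mapper(line):
    # Emit (word, 1) for every word in the line.
    for word in line.lower().split():
        yield word, 1


def wc_reducer(word, counts):
    # Sum the 1s emitted for this word.
    return sum(counts)


# --- Experiment 5: matrix multiplication C = A x B ---
def make_mm_mapper(m, p):
    """A has m rows and B has p columns (both assumed known to the mapper)."""
    def mapper(record):
        name, row, col, value = record
        if name == "A":
            # A[i][k] is needed by every output cell C[i][j].
            for j in range(p):
                yield (row, j), ("A", col, value)
        else:
            # B[k][j] is needed by every output cell C[i][j].
            for i in range(m):
                yield (i, col), ("B", row, value)
    return mapper


def mm_reducer(cell, tagged_values):
    # Pair A[i][k] with B[k][j] on the shared index k and sum the products.
    a = {k: v for name, k, v in tagged_values if name == "A"}
    b = {k: v for name, k, v in tagged_values if name == "B"}
    return sum(a[k] * b[k] for k in a if k in b)


def to_records(name, matrix):
    # Flatten a nested-list matrix into (name, row, col, value) records.
    return [(name, i, j, v)
            for i, row in enumerate(matrix)
            for j, v in enumerate(row)]


# Driver code with made-up sample data.
lines = ["big data spark", "spark and hadoop", "big data"]
word_counts = run_map_reduce(lines, wc_mapper, wc_reducer)

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
records = to_records("A", A) + to_records("B", B)
product = run_map_reduce(records, make_mm_mapper(2, 2), mm_reducer)

print(word_counts)
print(product)
```

In a real Hadoop job, the mapper and reducer would be separate classes (or streaming scripts) and the shuffle would be handled by the framework. The single dictionary used here only mirrors that grouping step on one machine.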
TEXT BOOKS:
1. Spark in Action, Marko Bonaci and Petar Zecevic, Manning.
2. PySpark SQL Recipes: With HiveQL, Dataframe and Graphframes, Raju Kumar
Mishra and Sundar Rajan Raman, Apress Media.
WEB LINKS:
1. https://infyspringboard.onwingspan.com/web/en/app/toc/lex_auth_01330150584451891225182_shared/overview
2. https://infyspringboard.onwingspan.com/web/en/app/toc/lex_auth_01258388119638835242_shared/overview
3. https://infyspringboard.onwingspan.com/web/en/app/toc/lex_auth_0126052684230082561692_shared/overview