Big Data For Machine Learning - Syllabus
CLR-4 : Review the MongoDB Aggregation framework
CLR-5 : Infer about the different kinds of ecosystem tools in Hadoop

Course Learning Outcomes (CLO):
At the end of this course, learners will be able to:

Each CLO below is tagged with its Level of Thinking (Bloom), Expected Proficiency (%) and Expected Attainment (%), followed by its mapping (H = High, M = Medium, L = Low, "-" = not addressed) to the programme learning outcomes: Scientific Reasoning, Reflective Thinking, Critical Thinking, Leadership Skills, Problem Solving, Research Skills, Self-Directed Learning, Multicultural Competence, Team Work, Disciplinary Knowledge, Community Engagement, ICT Skills and Analytical Reasoning.
CLO-1 : Understand the Hadoop architecture and its business implications 1 80 70 L H - H L - - - L L - H - - -
CLO-2 : Build reliable, scalable distributed systems with Apache Hadoop 1 85 75 M H M M H - - - M L - H - - -
CLO-3 : Import and export data into the Hadoop Distributed File System 2 75 70 M H H H M - - - M L - H - - -
CLO-4 : Interpret MongoDB design goals and set up a MongoDB environment 2 85 80 M H M H M - - - M L - H - - -
CLO-5 : Develop Big Data solutions using Hadoop ecosystem tools 3 85 75 H H M H H - - - M L - H - - -
Duration (hour): Unit 1: 15 | Unit 2: 15 | Unit 3: 15 | Unit 4: 15 | Unit 5: 15

Session plan: for each session (S-1 to S-15) and session learning outcome (SLO-1/SLO-2), the entries below give the topic covered in each of the five units.
S-1
SLO-1  Unit 1: Basics of data and what is Big Data; applications of Big Data | Unit 2: Blocks and replication management; HDFS architecture | Unit 3: Data ingestion into Big Data; what is data ingestion? | Unit 4: Intro to PyMongo; install PyMongo, the Python driver | Unit 5: PySpark ML: preprocess data
SLO-2  Unit 1: Big Data requirement for traditional data | Unit 2: Distributed storage (HDFS) | Unit 3: Sources of data which can be ingested into the environment | Unit 4: Steps to connect to MongoDB | Unit 5: Model training
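The Unit 5 preprocessing step can be illustrated in plain Python. A minimal sketch of min-max scaling, one common preprocessing technique; the toy feature values below are invented for illustration:

```python
# Min-max scaling: rescale each feature column to the [0, 1] range,
# a common preprocessing step before model training.
def min_max_scale(rows):
    """rows: list of equal-length numeric feature vectors."""
    cols = list(zip(*rows))
    mins = [min(c) for c in cols]
    maxs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi != lo else 0.0
         for v, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]

# Toy data: two features with very different ranges.
data = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]
print(min_max_scale(data))  # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```

Libraries such as scikit-learn or PySpark ML provide the same operation as a ready-made transformer.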
S-2
SLO-1  Unit 1: Data warehousing and the BI space; Big Data solutions | Unit 2: HDFS Federation | Unit 3: Sqoop introduction; need for Sqoop | Unit 4: PyMongo basic operations | Unit 5: Hyperparameter tuning and AutoML
SLO-2  Unit 1: What is a distributed file system | Unit 2: What are NameNode and DataNode; NameNode high availability | Unit 3: Where can we use Sqoop; import and export syntaxes in Sqoop | Unit 4: Perform basic Create, Retrieve, Update and Delete (CRUD) operations using PyMongo | Unit 5: Inference of the model
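The S-2 PyMongo CRUD topic can be sketched as follows. This assumes a local MongoDB instance running on the default port; the database, collection and document contents are invented for illustration:

```python
from pymongo import MongoClient

# Connect to a local MongoDB instance (assumes mongod is running on the
# default port; database and collection names are placeholders).
client = MongoClient("mongodb://localhost:27017/")
coll = client["course_db"]["students"]

# Create
coll.insert_one({"name": "Asha", "score": 91})
# Retrieve
doc = coll.find_one({"name": "Asha"})
# Update
coll.update_one({"name": "Asha"}, {"$set": {"score": 95}})
# Delete
coll.delete_one({"name": "Asha"})
```

The same four calls work unchanged against a hosted MongoDB Atlas cluster by swapping the connection string.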
S-3
SLO-1  Unit 1: Characteristics of Big Data and dimensions of scalability | Unit 2: Component failures and recoveries | Unit 3: Incremental imports in Sqoop | Unit 4: One end-to-end tutorial showing installation, data loading and processing | Unit 5: Deploy the model
SLO-2  Unit 1: Applications of Big Data | Unit 2: Basic Hadoop shell commands implementation | Unit 3: Importing data into Hive using Sqoop; case study on Sqoop | Unit 4: Introduction to Spark | Unit 5: Serve the model
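The Sqoop import and incremental-import syntaxes covered in S-2/S-3 look like the following; the JDBC connection string, credentials and table name are placeholders:

```shell
# Plain import from an RDBMS table into HDFS:
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/orders

# Incremental append: on each run, import only rows whose id is
# greater than the last value seen.
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl_user -P \
  --table orders \
  --incremental append \
  --check-column id \
  --last-value 10000
```

Adding the `--hive-import` flag sends the imported data into a Hive table instead of a raw HDFS directory, which is the S-3 "importing data into Hive using Sqoop" case.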
S-4-5
SLO-1/SLO-2  Unit 1: Tutorial 1: Programs in MapReduce | Unit 2: Tutorial 4: Hadoop command hands-on | Unit 3: Tutorial 7: Case study | Unit 4: Tutorial 10: PyMongo hands-on | Unit 5: Tutorial 13: Hands-on PySpark and various examples on Spark
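A Tutorial 1-style MapReduce program can be sketched in pure Python. The map, shuffle and reduce phases are simulated in-process here; on a cluster the same map/reduce functions would run via Hadoop Streaming:

```python
from collections import defaultdict

# Word count in the MapReduce style: map emits (word, 1) pairs,
# shuffle groups pairs by key, reduce sums the counts per key.
def map_phase(line):
    for word in line.split():
        yield word.lower(), 1

def reduce_phase(word, counts):
    return word, sum(counts)

def word_count(lines):
    groups = defaultdict(list)  # stand-in for the shuffle-and-sort phase
    for line in lines:
        for word, one in map_phase(line):
            groups[word].append(one)
    return dict(reduce_phase(w, c) for w, c in groups.items())

print(word_count(["big data", "Big Data tools"]))
# {'big': 2, 'data': 2, 'tools': 1}
```

Because the mapper and reducer only see key-value pairs, the same two functions scale out unchanged when the framework distributes the input splits.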
S-6
SLO-1  Unit 1: Historical concepts of Hadoop; where Hadoop is used; Big Data platforms | Unit 2: Features of Hadoop 2.0 | Unit 3: Flume; introduction to ingesting data using Flume | Unit 4: Spark architecture | Unit 5: Model inference
SLO-2  Unit 1: Apache Hadoop: introduction to Hadoop | Unit 2: The HDFS sink | Unit 3: Application of data ingestion | Unit 4: PySpark and Databricks | Unit 5: Deployment of the model
S-7
SLO-1  Unit 1: Distributed computing environment; what Hadoop is and why it is important | Unit 2: Partitioning and interceptors | Unit 3: Introduction to Flume; need for Flume | Unit 4: Case study | Unit 5: Export the model
SLO-2  Unit 1: Hadoop comparison with traditional systems | Unit 2: Different file formats used | Unit 3: Flume architecture: event, source, channel and sink | Unit 4: Introduction to Spark SQL | Unit 5: Kafka; data streaming
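The Flume source/channel/sink wiring (including the HDFS sink from S-6) is expressed in an agent properties file. A minimal configuration fragment; the agent name "a1", the port and the HDFS path are placeholders:

```
# One netcat source -> one memory channel -> one HDFS sink.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

Each event flows source -> channel -> sink; the channel buffers events so the sink can fail and retry without losing data.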
S-8
SLO-1  Unit 1: Data and types of data: structured, unstructured, semi-structured and quasi-structured data | Unit 2: Anatomy of a file write | Unit 3: Demo: data ingestion using Flume | Unit 4: Basics of Spark SQL as an ETL tool | Unit 5: What is Kafka and its architecture?
SLO-2  Unit 1: Types of data (continued) | Unit 2: Anatomy of a file read; case study | Unit 3: Case study | Unit 4: Case study on Spark SQL; performance tuning | Unit 5: Connect to KSQL, or SQL or Python, for analytics
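Spark SQL as an ETL tool, the S-8 Unit 4 topic, can be sketched as a read-transform-write job. This requires a Spark installation; the file paths and column names are invented for illustration:

```python
from pyspark.sql import SparkSession

# Read a CSV (extract), register it as a view and aggregate it with
# SQL (transform), then persist the result as Parquet (load).
spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

orders = spark.read.csv("/data/orders.csv", header=True, inferSchema=True)
orders.createOrReplaceTempView("orders")

daily = spark.sql("""
    SELECT order_date, SUM(amount) AS total
    FROM orders
    GROUP BY order_date
""")
daily.write.parquet("/data/daily_totals")
```

The SQL string runs on Spark's distributed engine, so the same query scales from a laptop to a cluster without changes.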
S-9-10
SLO-1/SLO-2  Unit 1: Tutorial 2: HDFS commands | Unit 2: Tutorial 5: HDFS commands (reading and loading files) | Unit 3: Tutorial 8: Using Sqoop and Flume | Unit 4: Tutorial 11: Spark SQL examples | Unit 5: Tutorial 14: Implementing Spark MLlib
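The HDFS shell commands practiced in Tutorials 2 and 5 follow one pattern: `hdfs dfs -<command> <path>`. A short sketch, with placeholder paths; running it requires a Hadoop installation:

```shell
hdfs dfs -mkdir -p /user/student/input              # create a directory
hdfs dfs -put local.txt /user/student/input         # copy a local file in
hdfs dfs -ls /user/student/input                    # list directory contents
hdfs dfs -cat /user/student/input/local.txt         # print file contents
hdfs dfs -get /user/student/input/local.txt copy.txt  # copy a file out
hdfs dfs -rm -r /user/student/input                 # remove recursively
```

Most subcommands mirror their POSIX counterparts (`ls`, `cat`, `rm`), with `-put`/`-get` moving data between the local filesystem and HDFS.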
S-11
SLO-1  Unit 1: HDFS design | Unit 2: Intro to Hive; Hive architecture | Unit 3: Introduction to MongoDB; understanding the MongoDB ecosystem | Unit 4: Case study | Unit 5: Twitter -> Kafka -> Spark Streaming -> analytics
SLO-2  Unit 1: Different HDFS shell commands | Unit 2: Query submission in Hive | Unit 3: Limitations of RDBMS | Unit 4: PySpark and Azure Databricks (free tier) | Unit 5: Case study
SRM Institute of Science and Technology - Academic Curricula – (M.Tech Regulations 2020) 45
S-12
SLO-1  Unit 1: File formats supported | Unit 2: Hive basic operations | Unit 3: Why NoSQL? Business use cases of NoSQL | Unit 4: PySpark ML basics | Unit 5: Example using Twitter data: MongoDB -> Kafka -> PySpark/ADB
SLO-2  Unit 1: Hadoop main components, with a diagram | Unit 2: Creating a table and loading data from HDFS | Unit 3: Why choose MongoDB, and its advantages? Explore MongoDB collections and documents | Unit 4: PySpark ML: walkthrough and pricing details | Unit 5: Twitter API (access, token)
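The PySpark ML basics of S-12 can be sketched as a small Pipeline. This requires a Spark installation; the column names and toy rows are invented for illustration:

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler

# Assemble raw feature columns into a single vector column, then fit
# a classifier; both steps are chained as Pipeline stages.
spark = SparkSession.builder.appName("ml-sketch").getOrCreate()
df = spark.createDataFrame(
    [(1.0, 2.0, 0.0), (2.0, 1.0, 1.0), (3.0, 4.0, 1.0)],
    ["f1", "f2", "label"],
)

assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")
model = Pipeline(stages=[assembler, lr]).fit(df)
model.transform(df).select("label", "prediction").show()
```

Packaging the preprocessing and the estimator in one Pipeline means the fitted model applies the identical transformations at inference time.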
S-13
SLO-1  Unit 1: HDFS overview and design | Unit 2: Internal and external tables | Unit 3: Create a free hosted MongoDB database using MongoDB Atlas; working with MongoDB | Unit 4: PySpark ML: instance setup and stopping | Unit 5: Using MongoDB and examples of MongoDB
SLO-2  Unit 1: MapReduce: Python-based program | Unit 2: HQL; bucketing and partitioning in Hive; case study on Hive | Unit 3: Case study on MongoDB: hands-on | Unit 4: PySpark ML: load the data | Unit 5: Implementing PyMongo; analytics; case study
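The Hive topics from S-12/S-13 (creating a table, loading data from HDFS, partitioning and bucketing) can be sketched in HiveQL; table names, columns and paths are placeholders, and running this requires a Hive installation:

```sql
-- A managed (internal) table; dropping it also deletes its data files.
CREATE TABLE orders (id INT, amount DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Load data already sitting in HDFS into the table.
LOAD DATA INPATH '/data/orders.csv' INTO TABLE orders;

-- Partitioning splits data by order_date on disk; bucketing further
-- hashes rows by id into a fixed number of files per partition.
CREATE TABLE orders_part (id INT, amount DOUBLE)
PARTITIONED BY (order_date STRING)
CLUSTERED BY (id) INTO 4 BUCKETS;
```

An EXTERNAL table, by contrast, only references files at a given LOCATION and leaves them in place when the table is dropped, which is the internal-vs-external distinction in S-13.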
S-14-15
SLO-1/SLO-2  Unit 1: Tutorial 3: Implementing HDFS shell commands and Python-based MapReduce programs | Unit 2: Tutorial 6: Hive commands | Unit 3: Tutorial 9: MongoDB | Unit 4: Tutorial 12: Spark MLlib examples | Unit 5: Tutorial 15: Streaming using Kafka
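A Tutorial 15-style Kafka round trip can be sketched with the third-party kafka-python package. This assumes a broker on localhost:9092; the topic name and message are placeholders:

```python
from kafka import KafkaProducer, KafkaConsumer

# Produce one message to a topic, then consume it back.
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("tweets", b"hello stream")
producer.flush()

consumer = KafkaConsumer("tweets",
                         bootstrap_servers="localhost:9092",
                         auto_offset_reset="earliest")
for msg in consumer:
    print(msg.value)
    break
```

In the Twitter -> Kafka -> Spark Streaming pipeline from S-11, a producer like this publishes tweets while Spark subscribes to the topic for analytics.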
Course Designers
Experts from Industry: Ms Leena Shibu, Data Scientist, Great Learning
Experts from Higher Technical Institutions:
Internal Experts: Dr. N. Arunachalam, SRMIST