BDDA - Course Outline

Uploaded by

Aru Ranjan

FORE School of Management, New Delhi

Programme: PGDM (FMG-30, IMG-15, FM-04 & BDA-02)

Course Name: Big Data and Data Analytics for Managers (Using Python) Credit: 3.0
Term: 4 Academic Year: 2022-2023
Faculty: Prof. Ashok Kumar Harnal, and Mr. Anuj Saini (20 hrs) for BDA-02
Office Contact No.: 8750893093
Email: [email protected]

Introduction

This course has two objectives: first, to build up students' project profiles on Kaggle/GitHub using important
machine learning techniques; and second, to analyze big data on Spark, a unified platform for data analytics. We
begin by covering two very important machine learning techniques that are widely used in the data analytics
community. Learning to optimize hyperparameters, especially when there are many of them, is very important in any
model-building exercise. We briefly cover Hadoop, a big-data storage platform, and then move on to analyzing data
on this platform using Spark. We also cover streaming analytics, that is, analyzing data in motion. Streaming
analytics has numerous applications (for example, in social media analytics), and a number of business models (for
example, those of Uber or of smart cities) are built only on streaming technologies. This course assumes some
prior basic working knowledge of two Python libraries, pandas and NumPy. This is a project-oriented Big Data
course with Python as the primary language.
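
To make "analyzing data in motion" concrete, here is a minimal sketch in plain Python. It is an illustration only, not course material: the function name `rolling_mean` and the sample readings are hypothetical, and it uses the standard library rather than Spark or Kafka. It computes a running average over the most recent values as each one arrives, without storing the whole stream.

```python
from collections import deque
from typing import Iterable, Iterator


def rolling_mean(stream: Iterable[float], window: int) -> Iterator[float]:
    """Yield the mean of the most recent `window` values as each value arrives.

    Hypothetical helper for illustration; not part of Spark, Kafka, or the course.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    recent = deque(maxlen=window)  # holds at most `window` latest values
    total = 0.0
    for value in stream:
        if len(recent) == window:
            total -= recent[0]  # the oldest value is about to be evicted
        recent.append(value)
        total += value
        yield total / len(recent)


if __name__ == "__main__":
    # Made-up readings, e.g. ride requests per minute in one city zone.
    readings = [1, 2, 3, 4]
    for average in rolling_mean(readings, window=2):
        print(average)
```

The generator processes one value at a time, so it works the same way on a finite list as on an unbounded source; streaming platforms apply the same windowing idea at much larger scale.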

Students are expected to have laptops with a minimum of 8 GB of RAM. They are strongly advised to upgrade to 16 GB.
Windows 10, macOS or Ubuntu will do as the operating system.

Objectives: Broadly speaking, the course's objectives are two-fold:

1. Executing real-world projects on Kaggle so as to build up the project profile of students.
2. Learning techniques to handle Big Data and streaming data.

Text Book:

1. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition (2019), by Aurélien Géron

Reference Book:

1. Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists, by Alice Zheng and
Amanda Casari
2. Spark: The Definitive Guide, by Bill Chambers and Matei Zaharia
3. Hadoop: The Definitive Guide, by Tom White

Course Pedagogy: This is a project-based and lab-oriented course. For every topic there is a project. Students are
first exposed to a problem, then study the data and learn the techniques and tools needed to solve the problem, and
finally build a model and present a solution. For projects related to Big Data and streaming analytics we will
use virtual machines.

Evaluation Components:

Team Project: 20%
Class Participation: 10%
Quiz: 10%
Mid Term: 20%
End Term: 40%
Total: 100 marks
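
As a worked illustration of how these weights combine, here is a minimal sketch. The weights come from the list above; the function name `final_mark` and the sample scores are hypothetical, and the sketch assumes each component is first expressed as a percentage (0 to 100) before weighting, which the outline does not state.

```python
# Component weights in marks, as listed in the evaluation scheme.
WEIGHTS = {
    "Team Project": 20,
    "Class Participation": 10,
    "Quiz": 10,
    "Mid Term": 20,
    "End Term": 40,
}


def final_mark(scores: dict) -> float:
    """Combine per-component percentages (0-100) into a mark out of 100.

    Hypothetical helper; assumes every component is scored as a percentage.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] / 100 for name in WEIGHTS)


if __name__ == "__main__":
    # Made-up scores for one student, each as a percentage.
    sample = {
        "Team Project": 80,
        "Class Participation": 90,
        "Quiz": 70,
        "Mid Term": 60,
        "End Term": 75,
    }
    print(final_mark(sample))
```

With these sample scores the End Term contributes 30 of its 40 marks, which shows how heavily the final examination weighs relative to the other components.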

Session Plan: (Each session is of 90 minutes unless specified)

Session(s) | Topic/Session Theme | Project(s) (see note on data sources below) | Learning outcomes
-----------|---------------------|----------------------------------------------|------------------
1 | Data Pipelining | Practical | To learn smooth processing of data through pipelines
2-3 | Using pipes in modeling, stacking classifiers | Otto project from Kaggle | Learn to use pipelines in any predictive modeling project
3-4 | Structure in data: t-SNE and UMAP | Otto project from Kaggle | To learn how to discover whether data has some structure or is mostly random
5-6 | Bayesian hyperparameter optimization | Kaggle project: Satellite images discrimination | Learn how to tune hyperparameters for best possible predictive modeling
 | 15-minute online quiz | |
7-8 | CatBoost: Experiments with | Kaggle project: BNP Paribas Cardif Claims Management | Learn to perform modeling of data with a large number of categorical features
9 | Learning Hadoop | Simple experiments on Hadoop | To learn challenges in Big Data storage
10 | Spark basics | Experiment with Spark DataFrames | Understand Spark as a unified platform for predictive modeling
11-12 | Machine Learning with Spark, I | Classification | Same as above
 | 15-minute online quiz | |
13-14 | Machine Learning with Spark, II | Bike-sharing dataset: Regression problem | Same as above
15 | Spark streaming | Analyzing streaming data |
16-17 | Kafka: Streaming Analytics | Simple experiments with Kafka | To learn basics of streaming analytics
18-19 | Spark-Kafka data pipeline | Analyzing streaming data over a pipeline | Develop a simple pipeline for streaming data
19-20 | Students' project work in class | |

For official use:
As benchmarked with course content in the previous year, the contents of this course: (Please mark the
right option below)
(a) Is totally new
(b) Has not changed at all
(c) Has undergone less than/equal to 20% change
(d) Has undergone more than 20% change

Faculty: Prof. Ashok Harnal        Area Chair: Prof. Shilpi Jain

Manager (Academics-1)

Dean (Academics)
