M.SC - BigData Syllabus

The document is a syllabus for the M.Sc. Big Data Analytics program at MIT - World Peace University for the 2018-2019 batch. It outlines the structure and requirements of the 2-year program. The first year focuses on providing foundational knowledge of big data technologies, mathematics, statistics, and programming. The second year includes advanced topics and an internship. The program aims to equip students with skills for careers in big data analytics and for further research. It uses a trimester system and credits are awarded based on contact hours. A minimum attendance of 90% is required to appear for exams.

SYLLABUS

DR VISHWANATH KARAD
MIT - WORLD PEACE UNIVERSITY

FACULTY OF SCIENCE

M.SC. Big Data Analytics

BATCH – 2018-19

Dr. Sudhir Gavhane
Dean, LASC
PROGRAMME STRUCTURE
Preamble:

Big Data Analytics addresses problems that industry faces today. Its techniques and tools are
used to solve problems in a wide variety of industries, such as manufacturing, services, retail,
banking and finance, sports, pharmaceuticals, and aerospace.

Big Data Analytics is interdisciplinary. It analyses ever-growing data (growing in volume,
velocity and variety) by applying techniques such as data mining, machine learning, and deep
learning, drawn from computer science, statistics and mathematics.

Big Data Analytics must cope with rapid changes in both domain knowledge and technology. It is
one of the fastest growing and most promising technologies.

The first year provides a foundation in Big Data technology, mathematics and statistics, and
programming languages. It covers technologies such as Hadoop, techniques such as data mining,
computer programming, and the mathematics and statistics subjects that students need as a base.

The second year includes subjects from the track each student chooses according to his/her own
interest within Big Data Analytics. It also includes advanced topics and technologies in Big Data.
A mini project and an internship give students industrial exposure.

Vision and Mission of the Programme
Vision:
To contribute to the society through excellence in scientific and knowledge-based education
utilizing the potential of computer science with a deep passion for wisdom, culture and
values.

Mission:

• Big Data Analytics aims to offer thorough professional training that prepares students to
  embark on careers in Big Data Analytics, one of the fastest growing technologies. It also gives
  them a very good foundation for further study at PhD level.
• Prepare and equip students for opportunities in ever-changing technology through hands-on
  industrial training.
• Transform students into globally competent professionals through international
  training/internship.
• Nurture creativity and inculcate entrepreneurial skills among the students.

Programme Educational Objectives

• To enable learners to develop expert knowledge and analytical skills in current and developing
  areas of analysis, statistics, and machine learning.
• To provide learners with a deep and systematic knowledge of business and technical strategies
  for data analytics and the skills to implement solutions in these areas.
• To help learners develop applied skills that are directly complementary and relevant to the
  workplace.
• To develop in the learner a deep and systematic understanding of current issues in research and
  analysis.
• To enable learners to conduct independent research and analysis in the field of data analytics.
• To enable the learner to identify, develop and apply detailed analytical, creative,
  problem-solving skills.
• To provide the learner with a comprehensive platform for career development, innovation and
  further study.

Programme Specific Outcomes

• A graduate with an M.Sc. in Big Data Analytics will be able to communicate computer science
  concepts, designs, and solutions effectively and professionally.
• The course offers training that prepares students to embark on careers in Big Data Analytics,
  one of the fastest growing technologies. It also gives them a very good foundation for further
  study at PhD level.
• Prepare and equip students for opportunities in ever-changing technology through hands-on
  industrial training.
• Transform students into globally competent professionals through internship.
• Nurture creativity and inculcate entrepreneurial skills among the students.
• Project work gives students hands-on experience in solving a real-world problem.
• The syllabus also develops the professional skills and problem-solving abilities required for
  a career in the software industry.

Programme Structure:
(a) Programme duration: 2 years full time.
(b) System followed: Trimester
(c) Credits System:
(i) Per term or per year, as applicable
(ii) Total in the programme, as applicable
(d) Credits for activities other than academics: NA
(e) Internship: Three months of full-time industrial training must be completed.
(f) Assessment Criteria: A minimum of 50% of the first-year credits is required for
admission to the second year.
(g) Branches or Specialisations: NA
(h) Mandatory Attendance to appear for examination:
Students are expected to attend every lecture, tutorial, and laboratory practical
session in a course. To allow for contingencies, the attendance requirement is a
minimum of 90% of the classes scheduled/held.
(j) Medium of Instruction and Examination: English
(k) Eligibility criteria for admission to the programme: B.Sc.(CS), BCS, B.Sc.(IT),
BCA, BE-IT, Comp., E&TC with 50% marks (45% aggregate marks for candidates from
backward class categories and persons with disability belonging to Maharashtra
state only)

M.Sc. Big Data Analytics
2017-18

A. Definition of Credit:

4 hours Lecture / Tutorial per week    3 credits
3 hours Practical (Lab) per week       3 credits

B. Credits:-

Total number of credits for two years Post Graduate M.Sc. Programme would be 120.

C. Structure of Credits for Post Graduate M.Sc. Programme:

S. No.  Category                                                                           Suggested Breakup of Credits (Total 120)
1       Humanities and Social Sciences and Peace Programmes including Management courses   10
2       Professional core courses including Laboratory/Mini Project Work                   84
3       Professional Elective courses                                                      06
4       Full Time Industrial Training                                                      20
        Total                                                                              120

D. Course code and definition:

Course code   Definition
L             Lecture
T             Tutorial
WP            Humanities and Social Sciences and Peace Programmes including Management courses
MBD           M.Sc. (Big Data Analytics)

E. Grading Scheme:

Marks (out of 100)   Grade               Grade Point
80-100               O: Outstanding      10
70-79                A+: Excellent       9
60-69                A: Very Good        8
55-59                B+: Good            7
50-54                B: Above Average    6
45-49                C: Average          5
40-44                Pass                4
0-39                 Fail                0
Ab                   Absent              NA
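
The grading scheme above is a lookup from marks to a grade and grade point. A minimal Python sketch of that lookup follows. The function name `grade_for` and the use of `None` to represent an absent student are assumptions made for illustration; the table does not say how fractional marks are treated, so this sketch simply places them in the band whose lower bound they reach.

```python
# Grade bands copied from the grading scheme table:
# (lowest mark in the band, grade, grade point).
GRADE_BANDS = [
    (80, "O: Outstanding", 10),
    (70, "A+: Excellent", 9),
    (60, "A: Very Good", 8),
    (55, "B+: Good", 7),
    (50, "B: Above Average", 6),
    (45, "C: Average", 5),
    (40, "Pass", 4),
    (0, "Fail", 0),
]


def grade_for(marks):
    """Return (grade, grade_point) for marks out of 100; None means absent."""
    if marks is None:
        return ("Ab: Absent", None)  # the table lists the grade point as NA
    if not 0 <= marks <= 100:
        raise ValueError("marks must be between 0 and 100")
    # Bands are ordered from highest to lowest, so the first match wins.
    for lowest, grade, point in GRADE_BANDS:
        if marks >= lowest:
            return (grade, point)


print(grade_for(72))
print(grade_for(39))
```

Keeping the bands in one list means a change to the scheme only touches the data, not the lookup logic.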

M. Sc. Big Data Analytics (First Year) (Batch 2017-18)
Trimester – I

                                                                          Weekly Workload, Hrs   Credits  Assessment, Marks**
Sr.  Course Code        Name of Course                              Type  Theory  Tutorial  Lab  Th  Lab  CCA*  LCA*  Term End Test  Total
1    MIT-WPU-MBD-1101   Data Warehousing & Data Mining              Core  3       1         -    3   -    50    -     50             100
2    MIT-WPU-MBD-1102   Parallel And Distributed Computing          Core  4       -         -    3   -    50    -     50             100
3    MIT-WPU-MBD-1103   Big Data Architecture & Ecosystem - Hadoop  Core  4       -         -    3   -    50    -     50             100
4    MIT-WPU-MBD-1104   Python Programming                          Core  3       1         -    3   -    50    -     50             100
5    MIT-WPU-MBD-1105   Lab on Python                               Core  -       -         3    -   3    -     50    50             100
6    MIT-WPU-MBD-1106   Lab on Hadoop using HDFS                    Core  -       -         3    -   3    -     50    50             100
7    WP                 Philosophy of Science and Spirituality      SEC   3       -         -    2   -    25    25    -              50
     Total                                                                17      02        06   14  06   225   125   300            650

Type: Core
Weekly Teaching Hours: 25
Total Credits: First Year M.Sc. Big Data Analytics Trimester I: 20
*CCA: Class Continuous Assessment
*LCA: Laboratory Continuous Assessment
**Assessment Marks are valid only if Attendance criteria are met

M. Sc. Big Data Analytics (First Year) (Batch 2017-18)
Trimester – II

                                                                          Weekly Workload, Hrs   Credits  Assessment Marks**
Sr.  Course Code        Name of Course                              Type  Theory  Tutorial  Lab  Th  Lab  CCA*  LCA*  Term End Test  Total
1    MIT-WPU-MBD-1201   R Programming                               Core  3       1         -    3   -    50    -     50             100
2    MIT-WPU-MBD-1202   Distributed Processing using Hadoop         Core  4       -         -    3   -    50    -     50             100
3    MIT-WPU-MBD-1203   Operation Research                          Core  4       -         -    3   -    50    -     50             100
4    MIT-WPU-MBD-1204   Next Generation Databases                   Core  3       1         -    3   -    50    -     50             100
5    MIT-WPU-MBD-1205   Lab on R Programming                        Core  -       -         3    -   3    -     50    50             100
6    MIT-WPU-MBD-1206   Lab on Hadoop and Tools                     Core  -       -         3    -   3    -     50    50             100
7    WP                 Philosophy of Science and Spirituality      SEC   3       -         -    2   -    25    25    -              50
     Total                                                                17      02        06   14  06   225   125   300            650

Type: Core
Weekly Teaching Hours: 25
Total Credits: First Year M.Sc. Big Data Analytics Trimester II: 20
*CCA: Class Continuous Assessment
*LCA: Laboratory Continuous Assessment
**Assessment Marks are valid only if Attendance criteria are met

M. Sc. Big Data Analytics (First Year) (Batch 2017-18)
Trimester – III

                                                                          Weekly Workload, Hrs   Credits  Assessment Marks**
Sr.  Course Code        Name of Course                              Type  Theory  Tutorial  Lab  Th  Lab  CCA*  LCA*  Term End Test  Total
1    MIT-WPU-MBD-1301   Statistical Computing                       Core  4       -         -    3   -    50    -     50             100
2    MIT-WPU-MBD-1302   Information Security                        Core  4       -         -    3   -    50    -     50             100
3    MIT-WPU-MBD-1303   Apache Spark                                Core  3       1         -    3   -    50    -     50             100
4    MIT-WPU-MBD-1304   Machine Learning Algorithm-I                Core  3       1         -    3   -    50    -     50             100
5    MIT-WPU-MBD-1305   Lab on Statistical Computing                Core  -       -         3    -   3    -     50    50             100
6    MIT-WPU-MBD-1306   Lab on Machine Learning Algorithms-I        Core  -       -         3    -   3    -     50    50             100
7    WP                 Creativity and Innovation                   SEC   3       -         -    2   -    25    25    -              50
     Total                                                                17      02        06   14  06   225   125   300            650

Type: Core
Weekly Teaching Hours: 25
Total Credits: First Year M.Sc. Big Data Analytics Trimester III: 20
Total First Year M.Sc. Big Data Analytics Credits: 60
*CCA: Class Continuous Assessment
*LCA: Laboratory Continuous Assessment
**Assessment Marks are valid only if Attendance criteria are met

M. Sc. Big Data Analytics(Second Year) (Batch 2017-18)
Trimester – I

Sr. No.  Course Code  Name of Course  Type  Weekly Workload, Hrs (Theory, Tutorial, Lab)  Credits (Th, Lab)  Assessment Marks** (CCA*, LCA*, Term End Test, Total)

1  MIT-WPU-MBD-2101  Principles Of Deep Learning  Core  3 1 3 50 50 100
2  MIT-WPU-MBD-2102  Machine Learning Algorithm-II  Core  3 1 3 50 50 100
3  MIT-WPU-MBD-2103  Data Science life cycle  Core  4 3 50 50 100
4  MIT-WPU-MBD-2104  Lab on Machine Learning Algorithms II  Core  4 3 50 50 100
5  MIT-WPU-MBD-2105  Lab Data Science life cycle  Core  3 3 50 50 100
6  Elective I  Elective  4 3 50 50 100
7  WP  Scientific studies of Peace – Mind, matter, Spirit and consciousness  SEC  3 2 25 25 50
Total :  21 02 03 14 06 225 125 300 650

Type: Core/Elective
Weekly Teaching Hours: 26
Total Credits: Second Year M.Sc. Big Data Analytics Trimester I: 20
*CCA: Class Continuous Assessment
*LCA: Laboratory Continuous Assessment
**Assessment Marks are valid only if Attendance criteria are met

M. Sc. Big Data Analytics (Second Year) (Batch 2017-18)
Trimester – II

Sr. No.  Course Code  Name of Course  Type  Weekly Workload, Hrs (Theory, Tutorial, Lab)  Credits (Th, Lab)  Assessment Marks** (CCA*, LCA*, Term End Test, Total)

1  MIT-WPU-MBD-2201  Natural Language Processing  Core  4 3 50 50 100
2  MIT-WPU-MBD-2202  Web & Social Intelligence  Core  3 1 3 50 50 100
3  MIT-WPU-MBD-2203  Cloud Computing  Core  4 3 50 50 100
4  MIT-WPU-MBD-2204  Lab on Web & Social Intelligence  Core  3 1 3 50 50 100
5  MIT-WPU-MBD-2205  Mini Project  Core  3 3 50 50 100
6  Elective II  Elective  4 3 50 50 100
7  WP  Business-strategic planning and finance  SEC  3 2 25 25 50
Total :  21 02 03 17 03 225 125 300 650

Type: Core/Elective
Weekly Teaching Hours: 26
Total Credits: Second Year M.Sc. Big Data Analytics Trimester II: 20
*CCA: Class Continuous Assessment
*LCA: Laboratory Continuous Assessment
**Assessment Marks are valid only if Attendance criteria are met

M. Sc. Big Data Analytics (Second Year) (Batch 2017-18)
Trimester – III

Sr. No.  Course Code  Name of Course  Type  Weekly Workload, Hrs (Theory, Tutorial, Lab)  Credits (Th, Lab)  Assessment Marks** (CCA*, LCA*, Term End Test, Total)

1  MIT-WPU-MS-2301  Full Time Industrial Training  Core  4 3 50 50 100
Total :  4 3 50 50 100

Type: Core
Weekly Teaching Hours: 15
Total Credits: Second Year M.Sc. Big Data Analytics Trimester III: 20
*CCA: Class Continuous Assessment
*LCA: Laboratory Continuous Assessment
**Assessment Marks are valid only if Attendance criteria are met

Total Second Year M.Sc. Big Data Analytics Credits: 60

Elective Courses:

           Big Data Analytics                                    Big Data Analytics
           Code               Title                              Code               Title
Elect I    MIT-WPU-MBD-2106   Internet Of Things                 MIT-WPU-MBD-2206   Marketing Analytics
Elect II   MIT-WPU-MBD-2107   Introduction to image processing   MIT-WPU-MBD-2207   HR Analytics

Name of Specialisation: Big Data Analytics


COURSE STRUCTURE

Course Code MIT-WPU-MBD-1101
Course Category Core Big Data Analytics
Course Title Data Warehousing & Data Mining
Teaching Scheme and Credits L T Laboratory Credits
Weekly load hrs 4 - - 3
Pre-requisites:
Understanding of: relational database normalization techniques, physical design of a database,
concepts of algorithm design and analysis.
Basic understanding of: software engineering principles and techniques, probability and
statistics (Bayesian theory, regression, hypothesis testing).
Course Objectives:
1. To understand the structure of a Data Warehouse.
2. To understand different data pre-processing techniques.
3. To understand basic descriptive and predictive data mining techniques.
4. To use a data mining tool on different data sets.
5. To understand Classification algorithms.
6. To understand Prediction algorithms.
7. To understand Clustering algorithms.

Course Outcomes:
The student will gain knowledge of:
• Data processing and data quality.
• Modelling and design of data warehouses.
• Basic and advanced concepts of algorithms for data mining.
• A data mining tool, with practical experience of applying data mining algorithms.

Course Contents:
Introduction to Data Mining
Basic concepts of data mining, Types of Data to be mined.
Introduction to Data Warehouse
Data Warehouse and DBMS, Architecture of Data Warehouse
Data pre-processing
Need for data pre-processing, Attributes and Data types
Data Mining Techniques: Association Rule Mining
Basic idea: item sets, Frequent Item-sets
Data Mining Techniques: Classification
Definition of Classification, Decision tree Induction: Information gain, gain ratio, Gini Index
Data Mining Techniques: Prediction
Definition of Prediction, Linear regression
Data Mining Techniques: Clustering
Definition of Clustering, Partitioning Methods
Performance Measures
Precision, recall, F-measure
Problem solving with R or Weka: filters, Discretization, mining association rules, decision trees,
Prediction, k-means
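
The decision tree topics listed above (information gain and the Gini index) are both impurity measures computed from class proportions. The following is a minimal Python sketch of the standard definitions, not course material: the function names and the toy labels are assumptions made for illustration, and the gain ratio is left out for brevity.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def information_gain(labels, groups):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    total = len(labels)
    weighted = sum(len(g) / total * entropy(g) for g in groups)
    return entropy(labels) - weighted


# Hypothetical toy data: a split that separates the two classes perfectly.
parent = ["yes", "yes", "no", "no"]
split = [["yes", "yes"], ["no", "no"]]
print(entropy(parent))                  # 1.0
print(gini(parent))                     # 0.5
print(information_gain(parent, split))  # 1.0
```

A decision tree learner would evaluate candidate splits with one of these measures and keep the split that reduces impurity the most.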

Learning Resources:
Reference Books:
1. Han, Data Mining: Concepts and Techniques, Elsevier, ISBN: 9789380931913 / 9788131205358
2. Margaret H. Dunham, S. Sridhar, Data Mining: Introductory and Advanced Topics, Pearson
Education
3. Kimball, Data Warehousing: Fundamentals for IT Professionals, 3rd edition, Wiley Publication
4. Ian H. Witten, Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques,
Elsevier (Morgan Kaufmann), ISBN: 9789380501864
5. Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Introduction to Data Mining (2005),
Addison Wesley, ISBN: 0-321-32136-7
6. [Research papers]: Relevant research papers that contain recent results and
developments in the data mining field

Pedagogy: Participative learning, discussions, algorithm, Flowchart & Program writing,
experiential learning through practical problem solving, assignment, PowerPoint presentation.

Assessment Scheme:
Class Continuous Assessment (CCA) 50 Marks
Assignments  Test  Presentations  Case study  MCQ  Oral  Attendance
10           10    -              -           10   10    10
Term End Examination : 50 Marks

Syllabus:

Module  Workload in Hrs
No.     Theory  Lab  Assess  Contents
1       4       -    -       Introduction to Data Mining: Basic concepts of data mining,
                             Types of Data to be mined, Stages of the Data Mining Process,
                             Data Mining Techniques, Knowledge Discovery in Databases,
                             Data Mining Issues, Applications of Data Mining
2       5       -    -       Introduction to Data Warehouse: Data Warehouse and DBMS,
                             Architecture of Data Warehouse, Multidimensional data model,
                             Concepts of OLAP and Data Cube, OLAP operations,
                             Dimensional Data Modelling: Star, Snowflake schemas
3       6       -    -       Data pre-processing: Need for data pre-processing, Attributes
                             and Data types, Statistical descriptions of Data, Handling
                             missing Data, Data sampling, Data cleaning, Data Integration
                             and transformation, Data reduction, Discretization and
                             generating concept hierarchies
4       4       -    -       Data Mining Techniques: Association Rule Mining: Basic idea:
                             item sets, Frequent Item-sets, Association Rule Mining,
                             Generating item sets and rules efficiently, FP growth algorithm
5       9       -    -       Data Mining Techniques: Classification: Definition of
                             Classification, Decision tree Induction: Information gain,
                             gain ratio, Gini Index, Issues: Over-fitting, tree pruning
                             methods, missing values, continuous classes, Classification
                             and Regression Trees (CART), Bayesian Classification: Bayes
                             Theorem, Naïve Bayes classifier, Bayesian Networks, Linear
                             classifiers, Least squares, SVM classifiers, Lazy Learners
                             (or Learning from Your Neighbors)
6       3       -    -       Data Mining Techniques: Prediction: Definition of Prediction,
                             Linear regression, Non-linear regression, Logistic regression
7       6       -    -       Data Mining Techniques: Clustering: Definition of Clustering,
                             Partitioning Methods, Hierarchical Methods, Distance Measures
                             in Algorithmic Methods, Density Based Clustering
8       3       -    -       Performance Measures: Precision, recall, F-measure, confusion
                             matrix, cross-validation, bootstrap.
9       5       -    -       Problem solving with R or Weka: filters, Discretization,
                             mining association rules, decision trees, Prediction, k-means
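
The performance measures in Module 8 (precision, recall, F-measure) all derive from the counts in a confusion matrix. The following Python sketch shows the standard binary-class formulas under stated assumptions: the function name, the choice to return 0.0 when a denominator is zero, and the toy spam/ham labels are illustrative and do not come from the syllabus.

```python
def precision_recall_f1(actual, predicted, positive):
    """Compute precision, recall and F1 for one class treated as positive."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)  # true positives
    fp = sum(1 for a, p in pairs if a != positive and p == positive)  # false positives
    fn = sum(1 for a, p in pairs if a == positive and p != positive)  # false negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels: 2 true positives, 1 false positive, 1 false negative.
actual = ["spam", "spam", "spam", "ham", "ham", "ham"]
predicted = ["spam", "spam", "ham", "spam", "ham", "ham"]
print(precision_recall_f1(actual, predicted, positive="spam"))
```

With these labels each of the three measures works out to 2/3, because the classifier makes one mistake in each direction.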

Prepared By Checked By Approved By

Ms. Devyani B Kamble Ms. Pradnya Mahadik Dr. Sudhir Gavhane


Assistant Professor BOS Chairman Dean, LASC

COURSE STRUCTURE

Course Code MIT-WPU-MBD-1102


Course Category Core Big Data Analytics
Course Title Parallel And Distributed Computing
Teaching Scheme and Credits L T Laboratory Credits
Weekly load hrs 4 - - 3
Pre-requisites:

1. Ability to program well in C, C++ or Fortran.
2. Willingness to rethink how problems should be solved.
3. Algorithms & Data Structures
4. Basics of Computer Architecture
Course Objectives:

1. To learn basic models of parallel machines and tools.
2. To learn how to parallelize programs and how to use basic tools like MPI and POSIX threads.
3. To learn core ideas behind parallel and distributed computing.
4. To explore the methodologies adopted for concurrent and distributed environments.
5. To understand the networking aspects of parallel and distributed computing.
6. To provide an overview of the computational aspects of parallel and distributed computing.
7. To learn parallel and distributed computing models.

Course Outcomes:

Students will be able to:


1. Explore the methodologies adopted for concurrent and distributed environment.
2. Analyse the networking aspects of Distributed and Parallel Computing.
3. Explore the different performance issues and tasks in parallel and distributed computing.
4. Develop parallel algorithms for solving real–world problems.

Course Contents:

1. Parallel and Distributed Computing: Introduction, Benefits and Needs, Programming
Environment, Theoretical Foundations- Parallel Algorithms Parallel Models and Algorithms-
Sorting- Matrix Multiplication- Convex Hull- Pointer Based Data Structures.

2. Synchronization- Process Parallel Languages- Architecture of Parallel and Distributed
Systems- Consistency and Replication- Security- Parallel Operating Systems.

3. Management of Resources in Parallel Systems- Tools for Parallel Computing- Parallel
Database Systems and Multimedia Object Servers.

4. Networking Aspects of Distributed and Parallel Computing- Process- Parallel and
Distributed Scientific Computing.

5. High-Performance Computing in Molecular Sciences- Communication- Multimedia
Applications for Parallel and Distributed Systems- Distributed File Systems.
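
Matrix multiplication, listed under parallel algorithms above, decomposes naturally by rows because each output row can be computed independently. The sketch below illustrates only that decomposition. It uses Python's `concurrent.futures` thread pool instead of the MPI or POSIX threads named in the course objectives, and the function names are assumptions. In standard CPython, threads typically do not speed up pure-Python arithmetic, so treat this as a structural illustration, not a performance example.

```python
from concurrent.futures import ThreadPoolExecutor


def multiply_row(row, b):
    """Compute one output row: the dot product of `row` with each column of b."""
    return [sum(x * y for x, y in zip(row, col)) for col in zip(*b)]


def parallel_matmul(a, b, workers=4):
    """Row-wise decomposition: each task computes one output row independently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input rows.
        return list(pool.map(lambda row: multiply_row(row, b), a))


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(parallel_matmul(a, b))  # [[19, 22], [43, 50]]
```

The same row-wise split carries over to a message-passing setting, where each process would receive a block of rows of `a` and a full copy of `b`.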

Learning Resources:
Reference Books:
1. Jacek Błażewicz, et al., “Handbook on parallel and distributed processing”, Springer
Science & Business Media, 2013.
2. Andrew S. Tanenbaum, and Maarten Van Steen, “Distributed Systems: Principles and
Paradigms”. Prentice-Hall, 2007.
3. George F.Coulouris, Jean Dollimore, and Tim Kindberg, “Distributed systems: concepts
and design”, Pearson Education, 2005.
4. Gregor Kosec and Roman Trobec, “Parallel Scientific Computing: Theory, Algorithms, and
Applications of Mesh Based and Meshless Methods”, Springer, 2015.

Supplementary Reading:
1. Quinn, M. J., Parallel Computing: Theory and Practice (McGraw-Hill Inc.).
2. Gibbons, A., W. Rytter, Efficient Parallel Algorithms (Cambridge Uni. Press).
3. Shameem A and Jason, Multicore Programming, Intel Press, 2006

Weblinks:
1. https://www.tutorialspoint.com/parallel_algorithm/parallel_algorithm_introduction.htm

Pedagogy: Participative learning, discussions, demonstrations, practical, assignment, PowerPoint
presentation

Assessment Scheme:
Class Continuous Assessment (CCA) 50 Marks
Assignments  Test  Presentations  Case study  MCQ  Oral  Attendance
20           10    10             -           -    -     10
Term End Examination : 50 Marks

Syllabus:

Module  Workload in Hrs
No.     Theory  Lab  Assess  Contents
1       10      -    -       Parallel and Distributed Computing: Introduction- Benefits
                             and Needs- Parallel and Distributed Systems- Programming
                             Environment- Theoretical Foundations- Parallel Algorithms:
                             Introduction- Parallel Models and Algorithms- Sorting- Matrix
                             Multiplication- Convex Hull- Pointer Based Data Structures.
2       10      -    -       Synchronization- Process Parallel Languages- Architecture of
                             Parallel and Distributed Systems- Consistency and
                             Replication- Security- Parallel Operating Systems.
3       6       -    -       Management of Resources in Parallel Systems- Tools for
                             Parallel Computing- Parallel Database Systems and Multimedia
                             Object Servers.
4       11      -    -       Networking Aspects of Distributed and Parallel Computing-
                             Process- Parallel and Distributed Scientific Computing.
5       8       -    -       High-Performance Computing in Molecular Sciences-
                             Communication- Multimedia Applications for Parallel and
                             Distributed Systems- Distributed File Systems.

Prepared By Checked By Approved By

Ms. Deepali Sonawane Ms. Pradnya Mahadik Dr. Sudhir Gavhane


Assistant Professor BOS Chairman Dean, LASC

COURSE STRUCTURE

Course Code MIT-WPU-MBD-1103


Course Category Core Big Data Analytics
Course Title Big Data Architecture & Ecosystem - Hadoop
Teaching Scheme and Credits L T Laboratory Credits
Weekly load hrs 4 - - 3
Pre-requisites:

Some basic knowledge and experience of Java (Jars, Array, Classes, Objects, etc.)
Course Objectives:

1. Learn to ingest data into Hadoop.
2. Learn to build and maintain reliable, scalable, distributed systems with Hadoop.
3. Be able to apply Hadoop ecosystem components.

Course Outcomes:

1. Students will learn to ingest data into Hadoop.
2. They will learn to work with distributed systems using Apache Hadoop.
3. They will be able to apply Hadoop ecosystem components.

Course Contents:

1. Introduction to big data: Introduction, distributed file system, Big Data and its importance,
Drivers, Big data analytics, Big data applications, Algorithms using MapReduce, Matrix-Vector
Multiplication by MapReduce.

2. Introduction to HADOOP: Big Data, Apache Hadoop & Hadoop Ecosystem, MapReduce,
Data Serialization.

3. HADOOP Architecture: Architecture, Storage, Task trackers, Hadoop Configuration

4. HADOOP ecosystem and YARN: Hadoop ecosystem components, Hadoop 2.0 New Features:
NameNode High Availability, HDFS Federation, MRv2, YARN, Running MRv1 in YARN.
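
Matrix-vector multiplication, named in item 1 above, is a common first MapReduce exercise. The sketch below simulates the map and reduce phases in plain Python on a single machine; it does not use Hadoop, and the function names and sample data are assumptions made for illustration. It also assumes the whole vector fits in memory on every mapper.

```python
from collections import defaultdict


def map_phase(entries, v):
    """Map: each matrix entry (i, j, m_ij) emits the pair (i, m_ij * v[j])."""
    for i, j, m_ij in entries:
        yield i, m_ij * v[j]


def reduce_phase(pairs):
    """Reduce: sum all values that share the same row key i."""
    sums = defaultdict(int)
    for i, value in pairs:
        sums[i] += value
    return dict(sums)


# Hypothetical sparse 2x3 matrix stored as (row, column, value) triples.
matrix = [(0, 0, 1), (0, 2, 2), (1, 1, 3)]
vector = [4, 5, 6]
result = reduce_phase(map_phase(matrix, vector))
print(result)  # {0: 16, 1: 15}
```

In a real cluster the framework would group the emitted pairs by key between the two phases; here the dictionary in `reduce_phase` plays that role.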

Learning Resources:
Reference Books:
1. Boris Lublinsky, Kevin T. Smith, Alexey Yakubovich, “Professional Hadoop Solutions”,
Wiley, ISBN: 9788126551071, 2015.
2. Chris Eaton, Dirk deRoos et al., “Understanding Big Data”, McGraw Hill, 2012.
3. Tom White, “Hadoop: The Definitive Guide”, O'Reilly, 2012.
4. Donald Miner, Adam Shook, “MapReduce Design Patterns: Building Effective Algorithms and
Analytics for Hadoop”.

Weblinks:
https://cloudthat.in/course/processing-bigdata-with-apache-hadoop/

Pedagogy: Participative learning, discussions, algorithm, Flowchart & Program writing,
experiential learning through practical problem solving, assignment, PowerPoint presentation.

Assessment Scheme:
Class Continuous Assessment (CCA) 50 Marks
Assignments  Test  Presentations  Case study  MCQ  Oral  Attendance
20           10    10             -           -    -     10
Term End Examination : 50 Marks

Syllabus:

Module  Workload in Hrs
No.     Theory  Lab  Assess  Contents
1       11      -    -       Introduction to big data
                             Introduction, distributed file system, Big Data and its
                             importance, Four Vs, Drivers for Big data, Big data analytics,
                             Big data applications, Algorithms using MapReduce,
                             Matrix-Vector Multiplication by MapReduce.
2       11      -    -       Introduction to HADOOP
                             Big Data, Apache Hadoop & Hadoop Ecosystem, Moving Data in
                             and out of Hadoop, Understanding inputs and outputs of
                             MapReduce, Data Serialization.
3       12      -    -       HADOOP Architecture
                             Hadoop Architecture, Hadoop Storage: HDFS, Common Hadoop
                             Shell commands, Anatomy of File Write and Read, NameNode,
                             Secondary NameNode, and DataNode, Hadoop MapReduce Paradigm,
                             Map and Reduce tasks, Job, Task trackers, Cluster Setup, SSH &
                             Hadoop Configuration, HDFS Administering, Monitoring &
                             Maintenance.
4       11      -    -       HADOOP ecosystem and YARN
                             Hadoop ecosystem components, Schedulers (Fair and Capacity),
                             Hadoop 2.0 New Features: NameNode High Availability, HDFS
                             Federation, MRv2, YARN, Running MRv1 in YARN.

Prepared By Checked By Approved By

Ms. Deepali Sonawane Ms. Pradnya Mahadik Dr. Sudhir Gavhane


Assistant Professor BOS Chairman Dean, LASC

COURSE STRUCTURE

Course Code MIT-WPU-MBD-1104


Course Category Core Big Data Analytics
Course Title Python Programming
Teaching Scheme and Credits L T Laboratory Credits
Weekly load hrs 4 - - 3
Pre-requisites:
Knowledge of any scripting language, XML.

Course Objectives:
1. To understand why Python is a useful scripting language for developers.
2. To learn how to design and program Python applications.
3. To learn how to use lists, tuples, and dictionaries in Python programs.
4. To learn how to identify Python object types.
5. To define the structure and components of a Python program.
6. To learn how to write loops and decision statements in Python.
Course Outcomes:
1. Students will demonstrate the ability to solve problems using system approaches, critical
and innovative thinking, and technology to create solutions.
2. Students will design, develop, and present their final project.
3. Students will understand the purpose and the process of code reviews.
4. Students will be able to create scripts in Python for Autodesk's Maya.
5. Students will understand and will be able to articulate and apply the principles of 3D
graphics
Course Contents:
Introduction to Python
Introduction to the Python language.

Conditional Statements & Looping
Introduction to conditional and looping statements in Python.

String Manipulation
Introduction to various operations on strings.

Lists, Tuples and Dictionaries
Introduction to various operations on lists, tuples and dictionaries.

Functions
Introduction to functions in Python.

Modules
Introduction to modules and packages in Python.

Input-Output
Handling of inputs in Python.

Regular expressions
Use of regular expressions in Python.

CGI
Introduction to CGI and cookies.

Database
Handling of databases in Python.
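
For the regular expressions topic above, the difference between matching and searching is a frequent source of confusion, so a short sketch may help. It uses only the standard `re` module; the sample text is an assumption made for illustration.

```python
import re

text = "Big Data Analytics with Python"

# re.match only succeeds if the pattern matches at the start of the string.
at_start = re.match(r"Python", text)
print(at_start)                        # None
print(re.match(r"Big", text).group())  # Big

# re.search scans the whole string and returns the first match anywhere.
anywhere = re.search(r"Python", text)
print(anywhere.group())                # Python

# A modifier (flag) such as re.IGNORECASE changes how the pattern is applied.
ignore_case = re.search(r"python", text, re.IGNORECASE)
print(ignore_case.group())             # Python
```

Both functions return a match object on success and `None` on failure, so the result should be checked before calling `.group()` on input that may not match.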

Learning Resources:
Reference Books:
1. Dive into Python by Mark Pilgrim
2. Programming Python by Mark Lutz, O’Reilly Media
3. Python Programming: An Introduction to Computer Science by John Zelle

Supplementary Reading:
1. Python Testing Cookbook by Greg L. Turnquist

Web Resources:
1. www.tutorialspoint.com/python/
2. docs.python.org/3/tutorial/
3. www.learnpython.org
4. www.guru99.com/python-tutorials.html
5. www.tutorialspoint.com/cprogramming/
6. www.learn-c.org/
7. www.w3schools.in/c-tutorial/

Pedagogy: Participative learning, discussions, Problem Solving, experiential learning through
practical problem solving, assignment, PowerPoint presentation

Assessment Scheme:
Class Continuous Assessment (CCA) 50 Marks
Assignments Test Presentations Case study MCQ Oral Attendance
20 10 10 - - 10
Term End Examination : 50 Marks

Syllabus:

Module  Workload in Hrs
No.     Theory  Lab  Assess  Contents
1       4       -    -       Introduction to Python
                             History, Features, Setting up path, Working with Python,
                             Basic Syntax, Variable and Data Types, Operator
2       4       -    -       Conditional Statements & Looping
                             If, If-else, Nested if-else
                             For, While, Nested loops
                             Break, Continue, Pass
3       5       -    -       String Manipulation
                             Accessing Strings, Basic Operations, String slices,
                             Function and Methods
4       6       -    -       Lists, Tuple and Dictionaries
                             Lists: Introduction, Accessing list, Operations, Working with
                             lists, Function and Methods
                             Tuple: Introduction, Accessing tuples, Operations, Working,
                             Functions and Methods
                             Dictionaries: Introduction, Accessing values in dictionaries,
                             Working with dictionaries, Properties, Functions
5       4       -    -       Functions
                             Defining a function, Calling a function, Types of functions,
                             Function Arguments, Anonymous functions, Global and local
                             variables
6       4       -    -       Modules
                             Importing module, Math module, Random module, Packages,
                             Composition
7       4       -    -       Input-Output
                             Printing on screen, Reading data from keyboard, Opening and
                             closing file, Reading and writing files, Functions
8       4       -    -       Regular expressions
                             Match function, Search function, Matching vs Searching,
                             Modifiers, Patterns
9       5       -    -       CGI
                             Introduction, Architecture, CGI environment variable, GET and
                             POST methods, Cookies, File upload
10      5       -    -       Database
                             Introduction, Connections, Executing queries, Transactions,
                             Handling error
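
The database module above lists connections, executing queries, transactions and error handling. The sketch below walks through those four steps with Python's built-in `sqlite3` module. The syllabus does not name a database, so the choice of SQLite, the in-memory connection, and the `students` table are assumptions made for illustration.

```python
import sqlite3

# An in-memory database keeps the sketch self-contained; a course lab
# might use a different database and driver.
conn = sqlite3.connect(":memory:")
try:
    cur = conn.cursor()
    cur.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

    # Parameterized queries keep data separate from the SQL text.
    cur.execute("INSERT INTO students (name) VALUES (?)", ("Asha",))
    conn.commit()  # end the transaction and make the insert permanent

    try:
        # Violates the NOT NULL constraint, so the driver raises an error.
        cur.execute("INSERT INTO students (name) VALUES (?)", (None,))
        conn.commit()
    except sqlite3.IntegrityError as exc:
        conn.rollback()  # undo the failed transaction
        print("insert rejected:", exc)

    cur.execute("SELECT id, name FROM students")
    rows = cur.fetchall()
    print(rows)  # [(1, 'Asha')]
finally:
    conn.close()
```

The commit and rollback calls mark the transaction boundaries: the first insert survives because it was committed, while the rejected insert leaves no trace.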

Prepared By Checked By Approved By

Ms. Punam Nikam Ms. Pradnya Mahadik Dr. Sudhir Gavhane


Assistant Professor BOS Chairman Dean, LASC

COURSE STRUCTURE

Course Code MIT-WPU-MBD-1105


Course Category Core Big Data Analytics
Course Title Lab on Python
Teaching Scheme and Credits L T Laboratory Credits
Weekly load hrs - - 3 3

Pre-requisites:
Knowledge of any scripting language, XML.

Course Objectives:
1. To understand why Python is a useful scripting language for developers.
2. To learn how to design and program Python applications.
3. To learn how to use lists, tuples, and dictionaries in Python programs.
4. To learn how to identify Python object types.
5. To define the structure and components of a Python program.
6. To learn how to write loops and decision statements in Python.
Course Outcomes:
1. Students will demonstrate the ability to solve problems using system approaches, critical
and innovative thinking, and technology to create solutions.
2. Students will design, develop, and present their final project.
3. Students will understand the purpose and the process of code reviews.
4. Students will be able to create scripts in Python for Autodesk's Maya.
5. Students will understand and will be able to articulate and apply the principles of 3D
graphics
Course Contents:
Introduction to Python
Introduction to the Python language.

Conditional Statements & Looping


Introduction to conditional and looping statements in Python.

String Manipulation
Introduction to various operations on strings.

Lists, Tuple and Dictionaries


Introduction to various operations on Lists, Tuple and Dictionaries.


Functions
Introduction to functions in python.

Modules
Introduction to modules and packages in Python.

Input-Output
Handling of inputs in python

Regular expressions
Use of regular expression in python

CGI
Introduction to CGI and cookies.

Database
Handling of database in python.
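
The database topic above (connections, executing queries, transactions, error handling) can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumptions: the syllabus does not name a database engine, and the `accounts` table and `transfer` helper are hypothetical.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts inside a single transaction."""
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if an exception escapes the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except sqlite3.Error:
        # Error handling: the failed transaction has already been rolled back.
        return False

# Connection to an in-memory database (hypothetical schema for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
with conn:
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )

ok = transfer(conn, "alice", "bob", 30)       # succeeds and is committed
failed = transfer(conn, "alice", "bob", 500)  # violates the CHECK, rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

The second transfer leaves both balances unchanged, which is the point of wrapping the two updates in one transaction.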

Laboratory Exercises / Practical:

1. Introduction to Python : Assignment on simple programs in python

2. Conditional Statements & Looping: Assignment on conditional statements and looping statements

3. String Manipulation: Assignment on string manipulations.

4. Lists, Tuple and Dictionaries : Assignment on Lists, tuples and dictionaries

5. Functions: Assignment on functions.

6. Modules : Assignment on use of modules

7. Input-Output : Assignment on Input-Output operations

8. Regular expressions : Assignment on use of regular expressions


9. CGI : Assignment on CGI

10. Database : Assignment on database
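
For exercise 8, the difference between matching and searching listed in the syllabus can be illustrated with the standard `re` module. The sample string is made up for the example; the "modifier" shown is the `re.IGNORECASE` flag.

```python
import re

text = "Module 8: Regular expressions"
digits = re.compile(r"\d+")

# match() only succeeds if the pattern matches at the start of the string.
at_start = digits.match(text)

# search() scans the whole string for the first location that matches.
anywhere = digits.search(text)

# A modifier (flag) changes how the pattern is applied, here ignoring case.
case_insensitive = re.search(r"regular", text, re.IGNORECASE)
```

Here `at_start` is `None` because the string begins with a letter, while `anywhere` finds the digit further along.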

Learning Resources:
Reference Books:
1. Dive into Python by Mark Pilgrim
2. Programming Python by Mark Lutz, O’Reilly Media
3. Python Programming: An Introduction to Computer Science by John Zelle

Supplementary Reading:
1. Python Testing Cookbook by Greg L. Turnquist

Web Resources:
1. www.tutorialspoint.com/python/
2. docs.python.org/3/tutorial/
3. www.learnpython.org
4. www.guru99.com/python-tutorials.html
5. www.tutorialspoint.com/cprogramming/
6. www.learn-c.org/
7. www.w3schools.in/c-tutorial/

Pedagogy: Participative learning, discussions, Problem Solving, experiential learning through
practical problem solving, assignment, PowerPoint presentation

Assessment Scheme:
Class Continuous Assessment (CCA) 50 Marks
Assignments Test Presentations Case study MCQ Oral Attendance
20 10 10 - - 10
Term End Examination : 50 Marks

Syllabus:


Module No. | Contents | Workload in Hrs (Theory / Lab / Assess)
1 | Introduction to Python: History; Features; Setting up path; Working with Python; Basic Syntax; Variable and Data Types; Operator | - / 2 / -
2 | Conditional Statements & Looping: If, If-else, Nested if-else; For, While, Nested loops; Break, Continue, Pass | - / 2 / -
3 | String Manipulation: Accessing Strings; Basic Operations; String slices; Function and Methods | - / 2 / -
4 | Lists, Tuple and Dictionaries: Lists (Introduction, Accessing list, Operations, Working with lists, Function and Methods); Tuple (Introduction, Accessing tuples, Operations, Working, Functions and Methods); Dictionaries (Introduction, Accessing values in dictionaries, Working with dictionaries, Properties, Functions) | - / 2 / -
5 | Functions: Defining a function; Calling a function; Types of functions; Function Arguments; Anonymous functions; Global and local variables | - / 3 / -
6 | Modules: Importing module; Math module; Random module; Packages; Composition | - / 3 / -
7 | Input-Output: Printing on screen; Reading data from keyboard; Opening and closing file; Reading and writing files; Functions | - / 3 / -
8 | Regular expressions: Match function, Search function; Matching vs Searching; Modifiers; Patterns | - / 3 / -
9 | CGI: Introduction; Architecture; CGI environment variable; GET and POST methods; Cookies; File upload | - / 2 / -
10 | Database: Introduction; Connections; Executing queries; Transactions; Handling error | - / 2 / -
Prepared By Checked By Approved By

Ms. Punam Nikam Ms. Pradnya Mahadik Dr. Sudhir Gavhane


Assistant Professor BOS Chairman Dean, LASC

COURSE STRUCTURE

Course Code MIT-WPU-MBD-1106


Course Category Core Big Data Analytics
Course Title Lab on Hadoop using HDFS
Teaching Scheme and Credits L T Laboratory Credits
Weekly load hrs 4 - - 3
Pre-requisites:

Some basic knowledge and experience of Java (Jars, Array, Classes, Objects, etc.)
Course Objectives:

1. Learn tips and tricks for Big Data use cases and solutions.
2. Learn to build and maintain reliable, scalable, distributed systems with Apache Hadoop.
3. Be able to apply Hadoop ecosystem components.

Course Outcomes:

1. Students will learn tips and tricks for Big Data use cases and solutions.
2. They will be able to build distributed systems with Apache Hadoop.
3. They will be able to apply Hadoop ecosystem components.

Course Contents:

1. Introduction to big data: Introduction, distributed file system, Big Data and its importance,
Drivers, Big data analytics, Big data applications. Algorithms using MapReduce, Matrix-Vector
Multiplication by MapReduce.

2. Introduction to HADOOP: Big Data, Apache Hadoop & Hadoop Ecosystem, MapReduce,
Data Serialization.

3. HADOOP Architecture: Architecture, Storage, Task trackers, Hadoop Configuration

4. HADOOP ecosystem and YARN: Hadoop ecosystem components, Hadoop 2.0 New Features:
NameNode High Availability, HDFS Federation, MRv2, YARN, Running MRv1 in YARN.
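
The "Matrix-Vector Multiplication by MapReduce" topic in unit 1 can be sketched in plain Python by simulating the map, shuffle and reduce phases in memory. This is a teaching sketch, not Hadoop code: the function names are hypothetical, the matrix is given as `(row, column, value)` triples, and the vector is assumed small enough to be available to every mapper.

```python
from collections import defaultdict

def map_phase(entries, v):
    """For each matrix entry (i, j, m_ij), emit the pair (i, m_ij * v[j])."""
    for i, j, m_ij in entries:
        yield i, m_ij * v[j]

def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Sum the partial products of each row to get one output component."""
    return {i: sum(values) for i, values in groups.items()}

# The 2x2 matrix [[1, 2], [3, 4]] stored as (row, column, value) triples.
matrix = [(0, 0, 1), (0, 1, 2), (1, 0, 3), (1, 1, 4)]
vector = [5, 6]
result = reduce_phase(shuffle(map_phase(matrix, vector)))
```

Each key is a row index, so the reducer for row i computes the i-th component of the product.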

Lab Assignments
1. Lab on Install and configure Hadoop cluster


2. Lab on Manipulating files in HDFS using hadoop fs commands.


3. Lab on manipulating files in HDFS programmatically using the FileSystem API. Alternative
Hadoop File Systems: IBM GPFS, MapR-FS, Lustre, Amazon S3 etc.
4. Lab on writing an Inverted Index MapReduce application with custom Partitioner and
Combiner. Custom types and Composite Keys, Custom Comparators, InputFormats and
OutputFormats, Distributed Cache, MapReduce Design Patterns, Sorting, Joins.
5. Lab on writing a streaming MapReduce job in Python. YARN and Hadoop 2.0.
6. Lab on Exporting data from HDFS to an Other data integration tools: Flume, Kafka,
Informatica, Talend etc.
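
For the streaming MapReduce lab, the mapper and reducer are ordinary programs that read lines from standard input and write tab-separated key/value lines to standard output. The sketch below, a word count chosen only as an assumed example, writes both as generator functions over lines so they can be tried without a cluster; in a real streaming job each would loop over `sys.stdin` and `print` its output.

```python
from itertools import groupby

def mapper(lines):
    """Emit 'word<TAB>1' for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"

def reducer(sorted_lines):
    """Sum the counts per word; input must arrive sorted by key."""
    pairs = (line.split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

docs = ["big data big ideas", "Data moves"]
mapped = sorted(mapper(docs))  # stands in for the framework's sort and shuffle
counts = list(reducer(mapped))
```

The reducer relies on all lines for one key being adjacent, which is why the sort step sits between the two functions.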

Learning Resources:
Reference Books:
1. Boris Lublinsky, Kevin T. Smith, Alexey Yakubovich, “Professional Hadoop Solutions”,
Wiley, ISBN: 9788126551071, 2015.
2. Chris Eaton, Dirk deRoos et al., “Understanding Big Data”, McGraw Hill, 2012.
3. Tom White, “Hadoop: The Definitive Guide”, O'Reilly, 2012.
4. MapReduce Design Patterns (Building Effective Algorithms & Analytics for Hadoop) by
Donald Miner & Adam Shook

Supplementary Reading:

Weblinks:
https://cloudthat.in/course/processing-bigdata-with-apache-hadoop/

Pedagogy: Participative learning, discussions, algorithm, Flowchart & Program writing,
experiential learning through practical problem solving, assignment, PowerPoint presentation.

Assessment Scheme:
Laboratory Continuous Assessment (LCA) 50 Marks
Practical | Oral based on practical | Site Visit | Mini Project | Problem based Learning | Any other
10 | 20 | - | - | 20 | -
Term End Examination : 50 Marks


Syllabus:

Module No. | Contents | Workload in Hrs (Theory / Lab / Assess)
1 | Introduction to big data: Introduction, distributed file system, Big Data and its importance, Four Vs, Drivers for Big data, Big data analytics, Big data applications. Algorithms using MapReduce, Matrix-Vector Multiplication by MapReduce. | 11 / - / -
2 | Introduction to HADOOP: Big Data, Apache Hadoop & Hadoop Ecosystem, Moving Data in and out of Hadoop, Understanding inputs and outputs of MapReduce, Data Serialization. | 11 / - / -
3 | HADOOP Architecture: Hadoop Architecture, Hadoop Storage: HDFS, Common Hadoop Shell commands, Anatomy of File Write and Read, NameNode, Secondary NameNode, and DataNode, Hadoop MapReduce Paradigm, Map and Reduce tasks, Job, Task trackers, Cluster Setup, SSH & Hadoop Configuration, HDFS Administering, Monitoring & Maintenance. | 12 / - / -
4 | HADOOP ecosystem and YARN: Hadoop ecosystem components, Schedulers (Fair and Capacity), Hadoop 2.0 New Features: NameNode High Availability, HDFS Federation, MRv2, YARN, Running MRv1 in YARN. | 11 / - / -



Prepared By Checked By Approved By

Ms. Deepali Sonawane Ms. Pradnya Mahadik Dr. Sudhir Gavhane


Assistant Professor BOS Chairman Dean, LASC
COURSE STRUCTURE

Course Code MIT-WPU-MBD-1201


Course Category Core Big Data Analytics
Course Title R Programming
Teaching Scheme and Credits L T Laboratory Credits
Weekly load hrs 4 - - 3
Pre-requisites: Knowledge of any Programming Language

Course Objectives:

1. Understand the basics of R programming including objects, classes, vectors etc.


2. Write functions including generic functions using various methods and loops
3. Install various packages and work effectively in the R environment
4. Become proficient in writing a fundamental program and perform analytics with R

Course Outcomes:
Students will be able to:
1. Recognize and make appropriate use of different types of data structures
2. Use R to create sophisticated figures and graphs
3. Identify and implement appropriate control structures to solve a particular programming
problem
4. Design and write functions in R and implement simple iterative algorithms.

Course Contents:

Introduction to R
Overview of R programming, Evolution of R, Applications of R programming, Basic syntax

Basic Concepts of R
Reserved Words, Variables & Constants

Data structures in R
Vectors, Matrix

Control flow
If...else, ifelse() Function

Functions
R Functions, Function Return Value

Strings



String construction rules

R packages
Study of different packages in R

R Data Reshaping
Joining Columns and Rows in a Data Frame

Working with files


Reading from and writing to different types of files

R object and Class


Object and Class, R S3 Class, R S4 Class

Data visualization in R and Data Management


Bar Chart, Dot Plot

Statistical modelling and Databases in R


Mean, mode, median
Learning Resources:

Reference Books:
1. The Art of R Programming: A Tour of Statistical Software Design by Norman Matloff
2. R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics (O'Reilly
Cookbooks) by Paul Teetor
3. R in Action by Rob Kabacoff
4. Practical Data Science with R by Nina Zumel, John Mount, Jim Porzak
5. Learning R: A Step-by-Step Function Guide to Data Analysis by Richard Cotton

Pedagogy:
Participative learning, discussions, algorithm, Flowchart & Program writing, experiential learning
through practical problem solving, assignment, PowerPoint presentation.

Assessment Scheme:
Class Continuous Assessment (CCA) 50 Marks
Assignments | Test | Presentations | Case study | MCQ | Oral | Attendance
10 | 10 | - | - | 10 | 10 | 10

Term End Examination : 50 Marks



Syllabus:

Module No. | Contents | Workload in Hrs (Theory / Lab / Assess)
1 | Introduction to R: Overview of R programming, Evolution of R, Applications of R programming, Basic syntax | 2 / - / -
2 | Basic Concepts of R: Reserved Words, Variables & Constants, Operators, Operator Precedence, Data Types, Input and Output | 4 / - / -
3 | Data structures in R: Vectors, Matrix, List in R programming, Data Frame, Factor | 5 / - / -
4 | Control flow: If...else, ifelse() Function, for loop, While Loop, Break & next, Repeat Loop | 4 / - / -
5 | Functions: R Functions, Function Return Value, Environment & Scope, R Recursive Function, R Infix Operator, R Switch Function | 4 / - / -
6 | Strings: String construction rules, String Manipulation functions | 3 / - / -
7 | R packages: Study of different packages in R | 2 / - / -
8 | R Data Reshaping: Joining Columns and Rows in a Data Frame, Merging Data Frames, Melting and Casting | 4 / - / -
9 | Working with files: Reading from and writing to different types of files | 2 / - / -
10 | R object and Class: Object and Class, R S3 Class, R S4 Class, R Reference Class, R Inheritance | 2 / - / -
11 | Data visualization in R and Data Management: Bar Chart, Dot Plot, Scatter Plot (3D), Spinning Scatter Plots, Pie Chart, Histogram (3D) [including colorful ones], Overlapping Histograms, Boxplot, Plotting with Base and Lattice Graphics, Missing Value Treatment, Outlier Treatment, Sorting Datasets, Merging Datasets, Binning variables | 7 / - / -
12 | Statistical modelling and Databases in R: Mean, mode, median, Linear regression, Decision tree, K-means Clustering, RODBC and DBI Package, Performing queries | 6 / - / -

Prepared By Checked By Approved By

Preeti Adhav Pradnya Mahadik Dr. Sudhir Gavhane

Lecturer BOS Chairman Dean
COURSE STRUCTURE

Course Code MIT-WPU-BA-1202


Course Category Core Big Data Analytics
Course Title Distributed Processing of Data using Hadoop
Teaching Scheme and Credits L T Laboratory Credits
Weekly load hrs 3 -- -- 3
Pre-requisites:

• Some basic knowledge and experience of Java (Jars, Array, Classes, Objects, etc.)

Course Objectives:

• What Hadoop is and how it can help process large data sets.
• How to write MapReduce programs using the Hadoop API.
• How to use HDFS (the Hadoop Distributed Filesystem), from the command line and API,
for effectively loading and processing data in Hadoop.
• How to ingest data from an RDBMS or a data warehouse to Hadoop.
• Best practices for building, debugging and optimizing Hadoop solutions.
• Get introduced to tools like Pig, Hive, HBase, Elastic MapReduce etc. and understand how
they can help in Big Data projects.

Course Outcomes:

• Understand Sqoop architecture and uses. Able to load real-time data from an RDBMS
table/query onto HDFS. Able to write Sqoop scripts for exporting data from HDFS onto
RDBMS tables.
• Understand Apache Pig and the Pig data flow engine. Understand data types, data model, and
modes of execution.
• Able to store the data from a Pig relation onto HDFS.
• Able to load data into a Pig relation with or without schema.
• Able to split, join, filter, and transform the data using Pig operators. Able to write Pig scripts
and work with UDFs.
• Understand the importance of Hive and the Hive architecture. Able to create Managed, External,
Partitioned and Bucketed tables. Able to query the data and perform joins between tables.
Understand storage formats of Hive. Understand vectorization in Hive.

Course Contents

Data Storage
What the Hadoop Distributed File System (HDFS) is. Architecture of HDFS. Architectural assumptions
and goals. How data is stored in HDFS. How data is read from HDFS.
Namenodes and Datanodes
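
How data is stored in HDFS comes down to splitting a file into blocks and replicating each block across DataNodes. The arithmetic can be sketched as below; the 128 MB block size and replication factor of 3 are assumed here as commonly used Hadoop 2 defaults (both are configurable), and `hdfs_footprint` is a hypothetical helper, not part of any Hadoop API.

```python
import math

def hdfs_footprint(file_size_mb, block_size_mb=128, replication=3):
    """Return (blocks, block replicas, raw storage in MB) for one file.

    Assumes the final block only occupies the bytes it actually holds,
    so raw storage is the file size times the replication factor.
    """
    blocks = math.ceil(file_size_mb / block_size_mb)
    replicas = blocks * replication
    raw_storage_mb = file_size_mb * replication
    return blocks, replicas, raw_storage_mb

blocks, replicas, raw_mb = hdfs_footprint(1000)
```

For a 1000 MB file this gives 8 blocks (seven full blocks plus one partial), 24 block replicas for the NameNode to track, and 3000 MB of raw disk usage.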

Data Processing
Uses of MapReduce. Architecture of the MapReduce framework. Phases of a
MapReduce job. MapReduce design patterns. YARN architecture.

Data Integration
How to integrate Hadoop into your existing enterprise. Introduction to Sqoop.

Higher Level Tools


Workflows with Oozie. Introduction to Hive and its architecture. Data types and file formats.
How to create tables and load data. How to read and query data. Introduction to Pig.
Grunt shell. Pig's data model. Introduction to HBase. Architecture, client
API and MapReduce integration.

Learning Resources:

Reference Books:

1. Hadoop: The Definitive Guide by Tom White.


2. MapReduce Design Patterns (Building Effective Algorithms & Analytics for Hadoop) by
Donald Miner & Adam Shook
3. Professional Hadoop Solutions by Boris Lublinsky, Kevin Smith, and Alexey Yakubovich

Weblinks:
https://cloudthat.in/course/processing-bigdata-with-apache-hadoop/

Pedagogy:
Participative learning, discussions, algorithm, Flowchart & Program writing, experiential learning
through practical problem solving, assignment, PowerPoint presentation
Assessment Scheme:
Class Continuous Assessment (CCA) 50 marks

Assignments Test Attendance Viva Presentation Any other


10 10 10 10 10 -

Term End Examination : 50 marks


Syllabus:
Module No. | Contents | Workload in Hrs (Theory / Lab / Assess)
1 | Data Storage: File System Abstraction; Big Data and Distributed File Systems; Hadoop Distributed File System (HDFS); HDFS Architecture; Architectural assumptions and goals; How data is stored in HDFS; How data is read from HDFS; Namenodes and Datanodes; Blocks; Data Replication; Fault Tolerance; Data Integrity; Namespaces; Federation in Hadoop 2.0; High Availability in Hadoop 2.0; Security and Encryption; HDFS Interfaces: FileSystem API, FSShell, WebHDFS, Fuse etc. | 13 / - / -
2 | Data Processing: MapReduce; The fundamentals: map() and reduce(); Data Locality; Architecture of the MapReduce framework; Phases of a MapReduce Job; Custom types and Composite Keys; Custom Comparators; InputFormats and OutputFormats; Distributed Cache; MapReduce Design Patterns; Sorting; Joins; YARN and Hadoop 2.0; Separating resource management and processing; YARN Applications: MapReduce, Tez, HBase, Storm, Spark, Giraph etc.; YARN Architecture, ResourceManager, NodeManagers, ApplicationMasters, Containers, Fault Tolerance | 12 / - / -
3 | Data Integration: Integrating Hadoop into your existing enterprise; Introduction to Sqoop | 10 / - / -
4 | Higher Level Tools: Defining workflows with Oozie; An introduction to Hive; Architecture; Interfaces: Hive Shell, Thrift, JDBC, ODBC etc.; HiveQL: A dialect of SQL; Data Types and File Formats; Creating Tables and Loading Data; Schema at Read; Querying Data; User Defined Functions; An introduction to Pig; Grunt Shell; Pig's Data Model; Pig Latin; User Defined Functions; An introduction to HBase; Architecture; Client API; MapReduce Integration; Schema Design | 10 / - / -

Prepared By Checked By Approved By

Ms. Varsha Gholave Ms. Pradnya Mahadik Dr. Sudhir Gavhane


Assistant Professor BOS Chairman Dean, LASC
COURSE STRUCTURE

Course Code MIT-WPU-MBD-1203


Course Category Core Big Data Analytics
Course Title Operational Research
Teaching Scheme and Credits L T Laboratory Credits
Weekly load hrs 3 1 -- 3
Pre-requisites:
1. Linear algebra
2. Probability and Statistics
Course Objectives:

• To introduce the students to the use of basic methodology for the solution of linear programs
and integer programs.
• To introduce the students to the advanced methods for large-scale transportation and
assignment problems.

Course Outcomes:
• Define and formulate linear programming problems and appreciate their limitations.
• Solve linear programming problems using appropriate techniques and optimization solvers,
interpret the results obtained and translate solutions into directives for action.
• Conduct and interpret post-optimal and sensitivity analysis and explain the primal-dual
relationship.
• Identify the special features of the transportation problem and the assignment problem.

Course Contents

Introduction to Operation Research


Brief introduction about Optimization and the OR process. Descriptive vs. Simulation. Exact vs.
Heuristic techniques, Deterministic vs. Stochastic models.

LPP and Methods to solve LPP


Duality Theory and applications. Dual Simplex method. Sensitivity analysis in L.P., Parametric
Programming. Transportation, assignment and least cost transportation. Interior point methods:
scaling techniques, log barrier methods. Dual and primal dual extensions
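
The primal-dual relationship behind the duality topics above can be written out for a linear program in symmetric form. The symbols $A$, $b$, $c$, $x$, $y$ are notation assumed here for illustration; the syllabus does not fix one.

```latex
\begin{aligned}
\text{Primal:}\quad & \max_{x}\; c^{\top} x
  && \text{subject to } A x \le b,\; x \ge 0, \\
\text{Dual:}\quad & \min_{y}\; b^{\top} y
  && \text{subject to } A^{\top} y \ge c,\; y \ge 0.
\end{aligned}
% Weak duality: for any primal-feasible x and dual-feasible y,
%   c^T x <= b^T y.
% Strong duality: if either problem has a finite optimum, so does the
% other, and the two optimal objective values are equal.
```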

Non-Linear programming
Kuhn-Tucker conditions. Convex functions and convex regions. Convex programming
problems. Algorithms for solving convex programming problems.
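
For reference, the Kuhn-Tucker conditions named above can be stated for a problem of the form $\min f(x)$ subject to $g_i(x) \le 0$, $i = 1, \dots, m$. This form and the multiplier symbols $\mu_i$ are assumptions chosen for illustration.

```latex
\begin{aligned}
& \nabla f(x^{*}) + \sum_{i=1}^{m} \mu_i \,\nabla g_i(x^{*}) = 0
  && \text{(stationarity)} \\
& g_i(x^{*}) \le 0, \quad i = 1, \dots, m
  && \text{(primal feasibility)} \\
& \mu_i \ge 0, \quad i = 1, \dots, m
  && \text{(dual feasibility)} \\
& \mu_i \, g_i(x^{*}) = 0, \quad i = 1, \dots, m
  && \text{(complementary slackness)}
\end{aligned}
% For a convex programming problem (f and all g_i convex) these
% conditions are sufficient for x* to be a global minimum.
```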

PERT and CPM
Basic differences between PERT and CPM. Arrow Networks, time estimates, Earliest
expected time. Representation in Tabular Form, Critical Path. Probability of meeting scheduled
date of completion.

Calculation on CPM network, Various floats for activities. Critical path updating projects.
Operation time-cost trade-off curve project. Time-cost trade-off curve: selection of schedule
based on Cost.
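
The CPM calculations listed above (forward pass, backward pass, floats and the critical path) can be sketched as follows. The five-activity network and the `critical_path` function are hypothetical, and activities are assumed to be listed so that every predecessor appears before its successors.

```python
def critical_path(activities):
    """activities: {name: (duration, [predecessors])} in topological order."""
    earliest_start, earliest_finish = {}, {}
    # Forward pass: an activity starts when its last predecessor finishes.
    for name, (duration, preds) in activities.items():
        earliest_start[name] = max(
            (earliest_finish[p] for p in preds), default=0
        )
        earliest_finish[name] = earliest_start[name] + duration
    project_length = max(earliest_finish.values())

    latest_start, latest_finish = {}, {}
    # Backward pass: an activity must finish before its earliest-needed successor.
    for name in reversed(list(activities)):
        successors = [s for s, (_, preds) in activities.items() if name in preds]
        latest_finish[name] = min(
            (latest_start[s] for s in successors), default=project_length
        )
        latest_start[name] = latest_finish[name] - activities[name][0]

    # Total float: how long an activity can slip without delaying the project.
    total_float = {n: latest_start[n] - earliest_start[n] for n in activities}
    critical = [n for n in activities if total_float[n] == 0]
    return project_length, total_float, critical

activities = {
    "A": (3, []),
    "B": (2, []),
    "C": (4, ["A"]),
    "D": (1, ["B"]),
    "E": (2, ["C", "D"]),
}
length, floats, critical = critical_path(activities)
```

Activities with zero total float form the critical path; here that is A, C, E with a project length of 9 time units.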

Network Flow Problem


Formulation, Max-Flow Min-Cut theorem. Ford and Fulkerson’s algorithm. Exponential
behavior of Ford and Fulkerson’s algorithm.
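
A minimal sketch of the augmenting-path idea follows. Note the swap: plain Ford-Fulkerson lets any augmenting path be chosen, which is where the exponential behaviour mentioned above can arise; this sketch always picks a shortest path by breadth-first search (the Edmonds-Karp variant), which keeps the number of augmentations polynomial. The example network and function name are hypothetical.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """capacity: {u: {v: cap}}. Returns the value of a maximum flow."""
    residual = {u: dict(edges) for u, edges in capacity.items()}
    for u, edges in capacity.items():
        for v in edges:
            residual.setdefault(v, {}).setdefault(u, 0)  # reverse edges
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow  # no augmenting path left, so the flow is maximum
        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push flow: reduce forward capacity, increase reverse capacity.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

network = {"s": {"a": 3, "b": 2}, "a": {"b": 1, "t": 2}, "b": {"t": 3}}
value = max_flow(network, "s", "t")
```

In this network the edges leaving `s` have total capacity 5 and form a minimum cut, so by the Max-Flow Min-Cut theorem the maximum flow is also 5.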

Learning Resources:

Reference Books:
1. Hadley G. (1969): Linear Programming, Addison-Wesley.
2. Taha H. A. (1971): Operations Research: An Introduction, Macmillan, N.Y.
3. Kanti Swaroop, Gupta and Manmohan (1985): Operations Research, Sultan
Chand and Co.
4. Sharma J. K. (2003): Operations Research: Theory and Applications, 2nd Ed.,
Macmillan India Ltd.
5. Sharma J. K. (1986): Mathematical Models in Operations Research, McGraw Hill.

Pedagogy:
Participative learning, discussions, algorithm, Flowchart & Program writing, experiential learning
through practical problem solving, assignment, PowerPoint presentation
Assessment Scheme:

Class Continuous Assessment (CCA) 50 marks

Assignments Test Attendance Viva Presentation Any other


10 10 10 10 10 -
Term End Examination : 50 marks
Syllabus:

Module No. | Contents | Workload in Hrs (Theory / Lab / Assess)
1 | Introduction to Operation Research: The nature of O.R., History, Meaning, Models, Principles. Problem solving with mathematical models. Optimization and the OR process. Descriptive vs. Simulation. Exact vs. Heuristic techniques, Deterministic vs. Stochastic models. | 5 / - / -
2 | LPP and Methods to solve LPP: Linear Programming, Introduction. Graphical Solution and Formulation of L.P. Models. Simplex Method (Theory and Computational aspects), Revised Simplex. Duality Theory and applications. Dual Simplex method. Sensitivity analysis in L.P., Parametric Programming. Transportation, assignment and least cost transportation. Interior point methods: scaling techniques, log barrier methods. Dual and primal dual extensions. | 10 / - / -
3 | Non-Linear programming: Kuhn-Tucker conditions. Convex functions and convex regions. Convex programming problems. Algorithms for solving convex programming problems. | 10 / - / -
4 | PERT and CPM: Basic differences between PERT and CPM. Arrow Networks, time estimates, Earliest expected time. Latest allowable occurrence time. Forward Pass Computation, Backward Pass Computation. Representation in Tabular Form, Critical Path. Probability of meeting scheduled date of completion. Calculation on CPM network, Various floats for activities. Critical path updating projects. Operation time-cost trade-off curve project. Time-cost trade-off curve: selection of schedule based on Cost. | 10 / - / -
5 | Network Flow Problem: Formulation, Max-Flow Min-Cut theorem. Ford and Fulkerson’s algorithm. Exponential behavior of Ford and Fulkerson’s algorithm. | 10 / - / -

Prepared By Checked By Approved By

Ms. Varsha Ghule Pradnya Mahadik Dr. Sudhir Gavhane


Assistant Professor BOS Chairman Dean, LASC

COURSE STRUCTURE

Course Code MIT-WPU-MBD-1204


Course Category Core Big Data Analytics
Course Title Next Generation Databases (NoSQL databases)
Teaching Scheme and Credits L T Laboratory Credits
Weekly load hrs 4 - - 3
Pre-requisites:
Knowledge of RDBMS

Course Objectives:
1. To study the usage and applications of Object Oriented database
2. To acquire knowledge on variety of NoSQL databases
3. To attain inquisitive attitude towards research topics in NoSQL databases
Course Outcomes:
1: Master the basics of SQL and construct queries using PL/SQL efficiently and apply object
oriented features for developing database applications.
2: Compare and Contrast NoSQL databases with each other and Relational Database
Systems
3: Critically analyse and evaluate variety of NoSQL databases.
4: Demonstrate the knowledge of Key-Value databases, Document based Databases,
Column based Databases and Graph Databases.
Course Contents:
1. Introduction to NOSQL
Definition of NOSQL, History of NOSQL and Different NOSQL products, Exploring MongoDB
Java/Ruby/Python, Interfacing and Interacting with NOSQL
2. NOSQL Basics
NOSQL Storage Architecture, CRUD operations with MongoDB, Querying, Modifying and
Managing NOSQL Data stores, Indexing and ordering datasets (MongoDB/CouchDB/Cassandra)
3. Advanced NOSQL
NOSQL in CLOUD, Parallel Processing with Map Reduce, Big Data with Hive
4. Working with NOSQL
Surveying Database Internals, Migrating from RDBMS to NOSQL, Web Frameworks and NOSQL,
using MySQL as a NOSQL


5. Developing Web Application with NOSQL and NOSQL Administration


PHP and MongoDB, Python and MongoDB, Creating Blog Application with PHP, NOSQL
Database Administration
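
The CRUD operations that the course practises on document databases can be illustrated without a running server. The sketch below is a toy in-memory collection built from plain dictionaries; `ToyCollection` and its method names are hypothetical and do not mirror the API of MongoDB or any real driver.

```python
import itertools

class ToyCollection:
    """Hypothetical in-memory stand-in for a document collection."""

    def __init__(self):
        self._docs = {}
        self._ids = itertools.count(1)

    def insert(self, doc):
        """Create: store a copy of the document under a generated id."""
        doc_id = next(self._ids)
        self._docs[doc_id] = {**doc, "_id": doc_id}
        return doc_id

    def find(self, **criteria):
        """Read: return documents whose fields equal all given criteria."""
        return [
            d for d in self._docs.values()
            if all(d.get(k) == v for k, v in criteria.items())
        ]

    def update(self, doc_id, **changes):
        """Update: set or add fields on one document."""
        self._docs[doc_id].update(changes)

    def delete(self, doc_id):
        """Delete: remove one document; report whether it existed."""
        return self._docs.pop(doc_id, None) is not None

posts = ToyCollection()
first = posts.insert({"title": "Hello NoSQL", "tags": ["intro"]})
# Documents in one collection need not share the same fields.
second = posts.insert({"title": "Schemaless", "author": "guest"})
posts.update(first, author="admin")
by_admin = posts.find(author="admin")
removed = posts.delete(second)
remaining = len(posts.find())
```

The two inserted documents have different fields, which is the schemaless property that distinguishes document stores from relational tables.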

Learning Resources:
Reference Books:
Dan Sullivan, "NoSQL for Mere Mortals", 1st Edition, Pearson Education, 2015. (ISBN-13:
978-9332557338)

Supplementary Reading:
Pramod J. Sadalage, Martin Fowler, "NoSQL Distilled: A Brief Guide to the Emerging
World of Polyglot Persistence", 1st Edition, Pearson Education, 2012. (ISBN-13: 978-8131775691)

Pedagogy: Participative learning, discussions, Problem Solving, experiential learning through
practical problem solving, assignment, PowerPoint presentation
Assessment Scheme:
Class Continuous Assessment (CCA) 50 Marks
Assignments Test Presentations Case study MCQ Oral Attendance
20 10 10 - - 10
Term End Examination : 50 Marks

Syllabus:

Module No. | Contents | Workload in Hrs (Theory / Lab / Assess)
1 | Introduction to NOSQL: Definition of NOSQL, History of NOSQL and Different NOSQL products, Exploring MongoDB Java/Ruby/Python, Interfacing and Interacting with NOSQL | 6 / - / -
2 | NOSQL Basics: NOSQL Storage Architecture, CRUD operations with MongoDB, Querying, Modifying and Managing NOSQL Data stores, Indexing and ordering datasets (MongoDB/CouchDB/Cassandra) | 12 / - / 1
3 | Advanced NOSQL: NOSQL in CLOUD, Parallel Processing with Map Reduce, Big Data with Hive | 8 / - / 1
4 | Working with NOSQL: Surveying Database Internals, Migrating from RDBMS to NOSQL, Web Frameworks and NOSQL, using MySQL as a NOSQL | 9 / - / 1
5 | Developing Web Application with NOSQL and NOSQL Administration: PHP and MongoDB, Python and MongoDB, Creating Blog Application with PHP, NOSQL Database Administration | 10 / - / 1



Prepared By Checked By Approved By

Ms. Smita Patil Ms. Pradnya Mahadik Dr. Sudhir Gavhane


Assistant Professor BOS Chairman Dean, LASC
COURSE STRUCTURE

Course Code MIT-WPU-MBD-1205


Course Category Core Big Data Analytics
Course Title Lab on R Programming
Teaching Scheme and Credits L T Laboratory Credits
Weekly load hrs - - 3 3
Pre-requisites: Knowledge of any Programming Language

Course Objectives:

1. Understand the basics of R programming including objects, classes, vectors etc.


2. Write functions including generic functions using various methods and loops
3. Install various packages and work effectively in the R environment
4. Become proficient in writing a fundamental program and perform analytics with R

Course Outcomes:
Students will be able to:
1. Recognize and make appropriate use of different types of data structures
2. Use R to create sophisticated figures and graphs
3. Identify and implement appropriate control structures to solve a particular programming
problem
4. Design and write functions in R and implement simple iterative algorithms.

Course Contents:

Basic Concepts of R: Variables, constants, Operators, datatypes, input output

Data structures in R:
Vectors, Matrix, List, Data Frame/ Factor

Control flow:
Decision making, Repeat, while, for

Functions: built-in, user defined

R packages, R Data Reshaping: Joining Columns and Rows in a Data Frame, Merging Data
Frames

Working with files, R object and Class: csv, excel, S3 and S4 Class, reference

Data visualization in R and Data Management: Bar Chart, Dot Plot, Scatter Plot (3D), Spinning
Scatter Plots, Pie Chart, Histogram (3D) [including colorful ones], Overlapping Histograms,
Boxplot, Plotting with Base and Lattice Graphics, Missing Value Treatment, Outlier Treatment,
Sorting Datasets, Merging Datasets, Binning variables

Statistical modelling and Databases in R: Mean, mode, median, Linear regression,
Decision tree, K-means Clustering

Laboratory Exercises / Practical:


1. Assignments on Basic Concepts of R
2. Assignments on Data structures in R
3. Assignments on Control flow
4. Assignments on Functions
5. Assignments on R packages, R Data Reshaping
6. Assignments on Working with files, R object and Class
7. Assignments on Data visualization in R and Data Management
8. Assignments on Statistical modelling and Databases in R
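
The statistics in assignment 8 (mean, median, mode and a simple linear regression) are easy to check by hand on small data. The sketch below uses Python rather than R only to stay consistent with the Python lab elsewhere in this syllabus; the data values and the `fit_line` helper are made up for illustration, and the regression is an ordinary least-squares fit of one predictor.

```python
from statistics import mean, median, mode

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        / sum((x - x_bar) ** 2 for x in xs)
    )
    return slope, y_bar - slope * x_bar

scores = [2, 3, 3, 5, 7]
summary = (mean(scores), median(scores), mode(scores))

# Points lying exactly on y = 1 + 2x, so the fit should recover that line.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```

Comparing such hand-checkable results with the output of the corresponding R functions is a quick sanity check before moving to larger datasets.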
Learning Resources:

Reference Books:
1. The Art of R Programming: A Tour of Statistical Software Design by Norman Matloff
2. R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics (O'Reilly
Cookbooks) by Paul Teetor
3. R in Action by Rob Kabacoff
4. Practical Data Science with R by Nina Zumel, John Mount, Jim Porzak
5. Learning R: A Step-by-Step Function Guide to Data Analysis by Richard Cotton

Pedagogy:
Participative learning, discussions, algorithm, Flowchart & Program writing, experiential learning
through practical problem solving, assignment, PowerPoint presentation.
Assessment Scheme:
Class Continuous Assessment (CCA) 50 Marks

Assignments | Test | Presentations | Case study | MCQ | Oral | Attendance
10 | 10 | - | - | 10 | 10 | 10

Term End Examination : 50 Marks External



Laboratory Continuous Assessment (LCA) 50 Marks

Practical | Oral based on practical | Site Visit | Mini Project | Problem based Learning | Attendance
10 | 10 | - | 10 | 10 | 10

Term End Examination : 50 Marks External



Syllabus:

Module No. | Contents | Workload in Hrs (Theory / Lab / Assess)
1 | Basic Concepts of R: Variables, constants, Operators, datatypes, input output | - / 3 / -
2 | Data structures in R: Vectors, Matrix, List, Data Frame/ Factor | - / 3 / -
3 | Control flow: Decision making, Repeat, while, for | - / 3 / -
4 | Functions: built-in, user defined | - / 3 / -
5 | R packages, R Data Reshaping: Joining Columns and Rows in a Data Frame, Merging Data Frames | - / 3 / -
6 | Working with files, R object and Class: csv, excel, S3 and S4 Class, reference | - / 3 / -
7 | Data visualization in R and Data Management: Bar Chart, Dot Plot, Scatter Plot (3D), Spinning Scatter Plots, Pie Chart, Histogram (3D) [including colorful ones], Overlapping Histograms, Boxplot, Plotting with Base and Lattice Graphics, Missing Value Treatment, Outlier Treatment, Sorting Datasets, Merging Datasets, Binning variables | - / 3 / -
8 | Statistical modelling and Databases in R: Mean, mode, median, Linear regression, Decision tree, K-means Clustering | - / 3 / -

Prepared By Checked By Approved By

Preeti Adhav Pradnya Mahadik Dr. Sudhir Gavhane


Lecturer BOS Chairman Dean
COURSE STRUCTURE

Course Code MIT-WPU-BA-1206


Course Category Core Big Data Analytics
Course Title Lab on Hadoop and Databases
Teaching Scheme and Credits L T Laboratory Credits
Weekly load hrs -- -- 3 3
Pre-requisites:
• Some basic knowledge and experience of Java (Jars, Array, Classes, Objects, etc.)

Course Objectives:

• What Hadoop is and how it can help process large data sets.
• How to write MapReduce programs using the Hadoop API.
• How to use HDFS (the Hadoop Distributed Filesystem), from the command line and API,
for effectively loading and processing data in Hadoop.
• How to ingest data from an RDBMS or a data warehouse to Hadoop.
• Best practices for building, debugging and optimizing Hadoop solutions.
• Get introduced to tools like Pig, Hive, HBase, Elastic MapReduce etc. and understand how
they can help in Big Data projects.

Course Outcomes:
• Understand Sqoop architecture and uses. Able to load real-time data from an RDBMS
table/query onto HDFS. Able to write Sqoop scripts for exporting data from HDFS onto
RDBMS tables.
• Understand Apache Pig and the Pig data flow engine. Understand data types, data model, and
modes of execution.
• Able to store the data from a Pig relation onto HDFS.
• Able to load data into a Pig relation with or without schema.
• Able to split, join, filter, and transform the data using Pig operators. Able to write Pig scripts
and work with UDFs.
• Understand the importance of Hive and the Hive architecture. Able to create Managed, External,
Partitioned and Bucketed tables. Able to query the data and perform joins between tables.
Understand storage formats of Hive. Understand vectorization in Hive.

Course Contents

Data Storage
What is the Hadoop Distributed File System (HDFS)? Architecture of HDFS. Architectural assumptions and goals. How data is stored in HDFS. How data is read from HDFS. Namenodes and Datanodes.
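
To make the storage topic concrete, the following is a minimal Python sketch of how a file is divided into fixed-size blocks that are each replicated. It assumes a 128 MB block size and a replication factor of 3 (common Hadoop 2.x defaults, both configurable); `plan_blocks` is a hypothetical helper for illustration, not a Hadoop API.

```python
def plan_blocks(file_size_bytes, block_size_bytes=128 * 1024 * 1024, replication=3):
    """Return (number of blocks, size of the last block, total block replicas)."""
    if file_size_bytes == 0:
        return 0, 0, 0
    # Integer ceiling division: the last block may be only partly filled.
    num_blocks = -(-file_size_bytes // block_size_bytes)
    last_block = file_size_bytes - (num_blocks - 1) * block_size_bytes
    return num_blocks, last_block, num_blocks * replication


MB = 1024 * 1024
blocks, last, replicas = plan_blocks(300 * MB)
print(blocks, last // MB, replicas)
```

Under these assumptions a 300 MB file occupies two full blocks plus one partial block, and every block is stored on several Datanodes while the Namenode only records where the replicas live.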

Data Processing
What is the use of MapReduce? Architecture of the MapReduce framework. Phases of a MapReduce job. MapReduce design patterns. YARN architecture.
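
The phases of a MapReduce job can be sketched in plain Python as below. This is an in-memory illustration only, assuming a word-count task; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`) are hypothetical and are not part of the Hadoop API.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every input record and emit (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle_phase(pairs):
    """Group values by key and sort the keys, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups, reducer):
    """Call the reducer once per key with all of that key's values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    for word in line.lower().split():
        yield word, 1


def sum_reducer(key, values):
    return sum(values)


lines = ["big data big ideas", "data beats ideas"]
counts = reduce_phase(shuffle_phase(map_phase(lines, word_mapper)), sum_reducer)
print(counts)
```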

Data Integration
How to integrate Hadoop into your existing enterprise. Introduction to Sqoop.

Higher Level Tools


Workflows in Oozie. An introduction to Hive and its architecture. Data types and file formats. How to create tables and load data. How to read and query data. Introduction to Pig and the Grunt shell. Pig's data model. An introduction to HBase: architecture, client API and MapReduce integration.

Lab Assignments
1. Lab on manipulating files in HDFS programmatically using the FileSystem API. Alternative Hadoop file systems: IBM GPFS, MapR-FS, Lustre, Amazon S3, etc.
2. Lab on writing an Inverted Index MapReduce application with a custom Partitioner and Combiner. Custom types and composite keys, custom comparators, InputFormats and OutputFormats, Distributed Cache, MapReduce design patterns, sorting, joins.
3. Lab on writing a streaming MapReduce job in Python. YARN and Hadoop 2.0.
4. Lab on importing data from an RDBMS to HDFS using Sqoop.
5. Lab on exporting data from HDFS to an RDBMS. Other data integration tools: Flume, Kafka, Informatica, Talend, etc.
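
As a starting point for the inverted-index assignment, the sketch below imitates the roles of the mapper, combiner, partitioner and reducer in plain Python. It is a single-process illustration under simplifying assumptions: the first-letter partitioning rule and all function names are hypothetical, and a real job would implement the corresponding Hadoop classes instead.

```python
from collections import defaultdict


def mapper(doc_id, text):
    """Emit one (word, doc_id) pair per word occurrence."""
    for word in text.lower().split():
        yield word, doc_id


def combiner(pairs):
    """Drop duplicate (word, doc_id) pairs produced by one map task."""
    return sorted(set(pairs))


def partitioner(word, num_reducers):
    """Hypothetical custom partitioner: route words by their first letter."""
    return (ord(word[0]) - ord("a")) % num_reducers


def reducer(word, doc_ids):
    """Build the sorted posting list for one word."""
    return word, sorted(set(doc_ids))


def inverted_index(docs, num_reducers=2):
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for doc_id, text in docs.items():
        for word, d in combiner(mapper(doc_id, text)):
            partitions[partitioner(word, num_reducers)][word].append(d)
    index = {}
    for part in partitions:          # each partition stands in for one reducer
        for word in sorted(part):
            key, postings = reducer(word, part[word])
            index[key] = postings
    return index


docs = {"d1": "hadoop stores data", "d2": "spark reads hadoop data data"}
index = inverted_index(docs)
print(index["data"])
```

The combiner reduces the volume of data sent to the reducers, while the partitioner decides which reducer receives each key; both leave the final index unchanged.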

Learning Resources:

Reference Books:

1. Hadoop: The Definitive Guide by Tom White.
2. MapReduce Design Patterns (Building Effective Algorithms & Analytics for Hadoop) by Donald Miner & Adam Shook.
3. Professional Hadoop Solutions by Boris Lublinsky, Kevin Smith, and Alexey Yakubovich.

Weblinks:
https://cloudthat.in/course/processing-bigdata-with-apache-hadoop/

Pedagogy:
Participative learning, discussions, algorithm, Flowchart & Program writing, experiential learning
through practical problem solving, assignment, PowerPoint presentation
Assessment Scheme:

Class Continuous Assessment (CCA): 50 marks
Practical: 15, Viva: 10, Attendance: 15, Mini Project: 10, Any other: -

Term End Examination: 50 marks

Syllabus:
Module 1: Data Storage (Theory: -, Lab: 13, Assess: -)
File System Abstraction
Big Data and Distributed File Systems
Hadoop Distributed File System (HDFS)
HDFS Architecture, Architectural assumptions and goals
How data is stored in HDFS
How data is read from HDFS
Namenodes and Datanodes
Blocks, Data Replication
Fault Tolerance
Data Integrity
Namespaces
Federation in Hadoop 2.0
High Availability in Hadoop 2.0
Security and Encryption
HDFS Interfaces: FileSystem API, FSShell, WebHDFS, Fuse etc.

Module 2: Data Processing (Theory: -, Lab: 12, Assess: -)
MapReduce
The fundamentals: map() and reduce()
Data Locality
Architecture of the MapReduce framework
Phases of a MapReduce Job
Custom types and Composite Keys
Custom Comparators
InputFormats and OutputFormats
Distributed Cache
MapReduce Design Patterns
Sorting, Joins
YARN and Hadoop 2.0
Separating resource management and processing
YARN Applications: MapReduce, Tez, HBase, Storm, Spark, Giraph etc.
YARN Architecture
ResourceManager
NodeManagers
ApplicationMasters
Containers, Fault Tolerance

Module 3: Data Integration (Theory: -, Lab: 10, Assess: -)
Integrating Hadoop into your existing enterprise
Introduction to Sqoop

Module 4: Higher Level Tools (Theory: -, Lab: 10, Assess: -)
Defining workflows with Oozie
An introduction to Hive
Architecture
Interfaces: Hive Shell, Thrift, JDBC, ODBC etc.
HiveQL: A dialect of SQL
Data Types and File Formats
Creating Tables and Loading Data
Schema on Read, Querying Data
User Defined Functions
An introduction to Pig
Grunt Shell
Pig's Data Model
Pig Latin
User Defined Functions
An introduction to HBase
Architecture
Client API
MapReduce Integration
Schema Design

Prepared By: Ms. Varsha Ghule, Assistant Professor
Checked By: Pradnya Mahadik, BOS Chairman
Approved By: Dr. Sudhir Gavhane, Dean, LASC

COURSE STRUCTURE
Course Code: MIT-WPU-MBD-1301
Course Category: Core Big Data Analytics
Course Title: Statistical Computing
Teaching Scheme and Credits L T Laboratory Credits
Weekly load hrs 3 3
Pre-requisites:
1. Linear algebra
2. Probability and Statistics
Course Objectives:

1. To provide an understanding of the concepts and techniques of Business Statistics.
2. To learn how to use Excel, Python or R to solve Business Statistics problems.
3. To learn Experimental Design.

Course Outcomes:
1. The student should be able to formulate and solve problems related to the topics covered in this course.
2. The student should be able to solve the problems using Python or R.
3. The student should be able to perform statistical analysis on a variety of data.

Course Contents:

1. Data and Statistics


2. Descriptive Statistics: Tabular and Graphical Presentations
3. Descriptive Statistics: Numerical Measures
4. Probability
5. Discrete Probability Distributions
6. Continuous Probability Distribution
7. Sampling and Sampling Distributions
8. Interval Estimation
9. Fundamentals of Hypothesis Testing
10. Two-Sample Tests
11. Inferences about Population Variances
12. Tests of Goodness of Fit and Independence
13. Experimental Design and ANOVA
14. Simple Linear Regression
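
As a worked illustration for the Probability topic, the sketch below applies Bayes' theorem, P(A | B) = P(B | A) P(A) / P(B), to a screening-test scenario. The numbers (1% prevalence, 95% detection rate, 5% false-positive rate) are invented for the example, and `bayes_posterior` is a hypothetical helper name.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(A | B) given P(A), P(B | A) and P(B | not A)."""
    # Total probability of observing B, over both A and not-A.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(posterior, 3))
```

Even with an accurate test, the posterior stays well below one half here because the condition is rare, which is the point the theorem makes explicit.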

Laboratory Exercises / Practical:


1. Discrete Probability Distributions
2. Continuous Probability Distribution
3. Sampling and Sampling Distributions

4. Interval Estimation
5. Fundamentals of Hypothesis Testing
6. Two-Sample Tests
7. Inferences about Population Variances
8. Tests of Goodness of Fit and Independence
9. Experimental Design and ANOVA
10. Simple Linear Regression
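
For the Interval Estimation exercise, a minimal Python sketch of a confidence interval for the mean when σ is known is given below. It uses only the standard library; the sample values and σ = 8 are made up for illustration, and `z_confidence_interval` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_confidence_interval(sample, sigma, confidence=0.95):
    """Interval: sample mean +/- z * sigma / sqrt(n), with sigma assumed known."""
    n = len(sample)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    centre = mean(sample)
    return centre - margin, centre + margin


sample = [82, 75, 91, 68, 88, 79, 85, 72, 94, 76]
low, high = z_confidence_interval(sample, sigma=8.0)
print(round(low, 2), round(high, 2))
```

When σ is unknown, the same structure applies with the sample standard deviation and a t critical value in place of z.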

Learning Resources:
Reference Books:

Text Book: David R. Anderson, Dennis J. Sweeney, Thomas A. Williams, Jeffrey D. Camm and James J. Cochran, Statistics for Business and Economics, 12th Edition, Cengage Learning, 2014. (A newer 13th edition has been published but is mostly unavailable.)

Pedagogy: Participative learning, discussions, demonstrations, practical, assignment

Assessment Scheme:

Class Continuous Assessment (CCA): Assignments 10%, Test 10%, Presentations 10%, Case study 10%, Attendance 10%

Term End Examination: 50%

Syllabus:

Module 1 (Workload: 4 hrs)
Data and Statistics: Applications in Business and Economics, Data, Data Sources, Descriptive Statistics, Statistical Inference, Computers and Statistical Analysis, Data Mining, and Ethical Guidelines for Statistical Practice (Self Study)

Module 2 (Workload: 4 hrs)
Descriptive Statistics: Tabular and Graphical Presentations
Summarizing Qualitative Data
Summarizing Quantitative Data
Cross Tabulation and Scatter Diagrams
Data Visualization Practices (Self Study)

Module 3 (Workload: 4 hrs)
Descriptive Statistics: Numerical Measures
Measures of Location
Measures of Variability
Measures of Shape, Relative Location and Detecting Outliers
Exploratory Data Analysis
Measures of Association between Two Variables
Data Dashboards (Self Study)

Module 4 (Workload: 4 hrs)
Probability
Basic Probability Concepts
Conditional Probability
Bayes’ Theorem

Module 5 (Workload: 4 hrs)
Discrete Probability Distributions
Probability Distribution for a Discrete Random Variable
Properties: Expectation, Variance
Binomial Distribution
Poisson Distribution
Hypergeometric Distribution
Discrete Bivariate Distributions: Covariance and Financial Portfolios

Module 6 (Workload: 4 hrs)
Continuous Probability Distribution
Uniform Probability Distributions
Normal Probability Distribution
Normal Approximation to Binomial Probabilities
Exponential Probability Distribution

Module 7 (Workload: 4 hrs)
Sampling and Sampling Distributions
Simple Random Sampling
Point Estimation
Introduction to Sampling Distribution
Sampling Distribution of the Mean
Sampling Distribution of Proportion
Properties of Point Estimators
Other Sampling Methods

Module 8 (Workload: 4 hrs)
Interval Estimation
Confidence Interval Estimation for the Mean (σ known)
Confidence Interval Estimation for the Mean (σ unknown)
Determining Sample Size
Confidence Interval Estimation for the Proportion

Module 9 (Workload: 4 hrs)
Fundamentals of Hypothesis Testing
Hypothesis Testing Methodology
Z test of Hypothesis for the Mean (σ known)
t test of Hypothesis for the Mean (σ unknown)
Z test of Hypothesis for the Proportion
Decision Making, Probability of Type-II Errors, Sample Size Determination

Module 10 (Workload: 4 hrs)
Two-Sample Tests
Comparing Means of Two Independent Populations
Comparing Means of Two Related Populations
Comparing Two Population Proportions

Module 11 (Workload: 4 hrs)
Inferences about Population Variances
Inferences about a Population Variance
Inferences about Two Population Variances

Module 12 (Workload: 4 hrs)
Tests of Goodness of Fit and Independence
Test the Equality of Three or More Population Proportions
Test of Independence
Goodness of Fit Test: A Multinomial Population (Self Study)

Module 13 (Workload: 4 hrs)
Experimental Design and ANOVA
An Introduction
ANOVA and the Completely Randomized Design
Multiple Comparison Procedure
Randomized Block Design and Factorial Experiment (Self Study)

Module 14 (Workload: 4 hrs)
Simple Linear Regression
Simple Linear Regression Model
Least Squares method
Coefficient of Determination
Model Assumptions
Testing for Significance
Computer Solution
Residual Analysis (Self Study)
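
The least squares method and the coefficient of determination from the Simple Linear Regression module can be sketched in a few lines of Python. The data set below is invented for illustration, and `fit_line` and `r_squared` are hypothetical helper names rather than functions from any statistics package.

```python
def fit_line(xs, ys):
    """Least squares estimates: b1 = Sxy / Sxx and b0 = y_bar - b1 * x_bar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def r_squared(xs, ys, b0, b1):
    """Coefficient of determination: 1 - SSE / SST."""
    y_bar = sum(ys) / len(ys)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = fit_line(xs, ys)
print(round(b0, 2), round(b1, 2), round(r_squared(xs, ys, b0, b1), 2))
```

The residuals `y - (b0 + b1 * x)` computed inside `r_squared` are the same quantities examined in residual analysis.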

Prepared By: Ms. Pradnya Mahadik, Assistant Professor
Checked By: Ms. Pradnya Mahadik, BOS Chairman
Approved By: Dr. Sudhir Gavhane, Dean, LASC

COURSE STRUCTURE

Course Code: MIT-WPU-MBD-1302
Course Category: Core Big Data Analytics
Course Title: Information Security
Teaching Scheme and Credits (weekly load, hrs): L 4, T -, Laboratory -, Credits 3
Pre-requisites:
Basic concepts of Networking.

Course Objectives:
1. To provide an understanding of the principal concepts, major issues, technologies and basic approaches in information security.
2. To develop a basic understanding of cryptography, how it has evolved, and some key encryption techniques used today.
Course Outcomes:
The students will have a firm understanding of:
1. Basic concepts related to network and system level security.
2. Basics of cryptography, security management and network security techniques.
3. Information security governance, and related legal and regulatory issues.
4. How threats to an organization are discovered, analyzed, and dealt with.

Course Contents:

UNIT - I
Security Attacks (Interruption, Interception, Modification and Fabrication), Security Services

UNIT - II
Conventional Encryption Principles, Conventional encryption algorithms

UNIT - III
Public key cryptography principles, public key cryptography algorithms

UNIT - IV
Email privacy: Pretty Good Privacy (PGP) and S/MIME.

UNIT - V
IP Security Overview, IP Security Architecture

UNIT - VI
Web Security Requirements, Secure Socket Layer (SSL) and Transport Layer Security (TLS)

UNIT - VII
Basic concepts of SNMP, SNMPv1 Community facility and SNMPv3.

UNIT - VIII
Firewall Design principles, Trusted Systems. Intrusion Detection Systems.
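
The modification and fabrication attacks listed under Unit I are commonly countered with message authentication codes. The sketch below is a minimal illustration using HMAC-SHA256 from the Python standard library; the key and messages are made-up demo values, and `sign` and `verify` are hypothetical helper names.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> str:
    """Compute an HMAC-SHA256 tag over the message with the shared key."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(key: bytes, message: bytes, tag: str) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(key, message), tag)


key = b"shared-secret"                      # demo key, not for real use
tag = sign(key, b"transfer 100")
print(verify(key, b"transfer 100", tag))    # untouched message
print(verify(key, b"transfer 900", tag))    # modified in transit
```

A receiver who shares the key can detect any change to the message, but because both sides hold the same key an HMAC does not provide non-repudiation.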

Learning Resources:
TEXT BOOKS:
1. Network Security Essentials (Applications and Standards) by William Stallings, Pearson Education.
2. Hack Proofing Your Network by Ryan Russell, Dan Kaminsky, Rain Forest Puppy, Joe Grand, David Ahmad, Hal Flynn, Ido Dubrawsky, Steve W. Manzuik and Ryan Permeh, Wiley Dreamtech.

REFERENCES:
1. Fundamentals of Network Security by Eric Maiwald (Dreamtech Press).
2. Network Security - Private Communication in a Public World by Charlie Kaufman, Radia Perlman and Mike Speciner, Pearson/PHI.
3. Cryptography and Network Security, Third Edition, Stallings, PHI/Pearson.
4. Principles of Information Security, Whitman, Thomson.
5. Network Security: The Complete Reference, Roberta Bragg, Mark Rhodes-Ousley, TMH.
6. Introduction to Cryptography, Buchmann, Springer.

Pedagogy: Participative learning, discussions, algorithm, flowchart & program writing, experiential learning through practical problem solving, assignment, PowerPoint presentation.

Assessment Scheme:
Class Continuous Assessment (CCA): 50 Marks
Assignments: 10, Test: 10, Presentations: 10, Case study: -, MCQ: 10, Oral: -, Attendance: 10
Term End Examination: 50 Marks

Syllabus:
Module 1 (Theory: 9, Lab: -, Assess: -)
UNIT - I
Security Attacks (Interruption, Interception, Modification and Fabrication), Security Services (Confidentiality, Authentication, Integrity, Non-repudiation, Access Control and Availability) and Mechanisms, a model for internetwork security, Internet Standards and RFCs, buffer overflow & format string vulnerabilities, TCP session hijacking, ARP attacks, route table modification, UDP hijacking, and man-in-the-middle attacks.

Module 2 (Theory: 6, Lab: -, Assess: -)
UNIT - II
Conventional Encryption Principles, conventional encryption algorithms, cipher block modes of operation, location of encryption devices, key distribution, approaches to message authentication, Secure Hash Functions and HMAC.

Module 3 (Theory: 6, Lab: -, Assess: -)
UNIT - III
Public key cryptography principles, public key cryptography algorithms, digital signatures, digital certificates, Certificate Authority and key management, Kerberos, X.509 Directory Authentication Service.

Module 4 (Theory: 4, Lab: -, Assess: -)
UNIT - IV
Email privacy: Pretty Good Privacy (PGP) and S/MIME.

Module 5 (Theory: 7, Lab: -, Assess: -)
UNIT - V
IP Security Overview, IP Security Architecture, Authentication Header, Encapsulating Security Payload, Combining Security Associations and Key Management.

Module 6 (Theory: 5, Lab: -, Assess: -)
UNIT - VI
Web Security Requirements, Secure Socket Layer (SSL) and Transport Layer Security (TLS), Secure Electronic Transaction (SET).

Module 7 (Theory: 5, Lab: -, Assess: -)
UNIT - VII
Basic concepts of SNMP, SNMPv1 Community facility and SNMPv3. Intruders, Viruses and related threats.

Module 8 (Theory: 3, Lab: -, Assess: -)
UNIT - VIII
Firewall Design principles, Trusted Systems. Intrusion Detection Systems.
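
To illustrate the public key principles and digital signatures of Unit III, the following toy RSA sketch uses deliberately tiny primes so the arithmetic can be followed by hand. It is a classroom illustration only: real systems use keys of 2048 bits or more together with padding schemes, and `make_keys` and `apply_key` are hypothetical helper names.

```python
def make_keys(p, q, e):
    """Build a toy RSA key pair from two small primes and a public exponent."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)        # modular inverse of e (requires Python 3.8+)
    return (e, n), (d, n)


def apply_key(value, key):
    """Modular exponentiation: encrypts, decrypts, signs or verifies."""
    exponent, modulus = key
    return pow(value, exponent, modulus)


public, private = make_keys(p=61, q=53, e=17)

message = 65
ciphertext = apply_key(message, public)      # anyone can encrypt
recovered = apply_key(ciphertext, private)   # only the key owner can decrypt

signature = apply_key(message, private)      # only the key owner can sign
checked = apply_key(signature, public)       # anyone can verify
print(ciphertext, recovered, checked)
```

Encryption and signing use the same operation with the roles of the two keys swapped, which is why one key pair can provide both confidentiality and authentication.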

Prepared By: Ms. Devyani B Kamble, Assistant Professor
Checked By: Ms. Pradnya Mahadik, BOS Chairman
Approved By: Dr. Sudhir Gavhane, Dean, LASC

COURSE STRUCTURE

Course Code: MIT-WPU-MBD-1303
Course Category: Core Big Data Analytics
Course Title: Big Data – Apache Spark - In memory distributed processing
Teaching Scheme and Credits (weekly load, hrs): L 4, T -, Laboratory -, Credits 3
Pre-requisites:
Basic knowledge of object-oriented programming concepts, Java, database concepts and any flavor of the Linux operating system.
Course Objectives:
1. To understand the concepts of Scala and learn their implementation.
2. To understand the Apache Spark architecture.
3. To understand Spark Resilient Distributed Datasets (RDDs): transformations and actions.

Course Outcomes:
The student will gain knowledge of:
1. Concepts of Scala and its implementation.
2. Concepts of Spark and how it is used along with Scala.
Course Contents:
Introduction: Introduction to Scala, History of Scala

Conditional Expressions: If-else, While, do-while

Scala Function: Function declaration, function definition.

Scala Classes and Objects: Object, Class, Singleton Object

Array and Strings: Single dimensional

Scala Collections: Sequence, List

File Input-Output: Reading and Writing of files

Introduction to Apache Spark: Features of Apache Spark

Resilient Distributed Dataset(RDD): Introduction of Resilient Distributed Dataset


Spark RDD operations: RDD Transformation

Learning Resources:
1. Programming Scala by Dean Wampler, Alex Payne

2. Scala Cookbook by Alvin Alexander

3. Scala in depth by Joshua D. Suereth

4. Programming in Scala by Martin Odersky, Lex Spoon, Bill Venners

5. Scala for the Impatient by Cay S. Horstmann

6. Learning Spark by Matei Zaharia, Patrick Wendell, Andy Konwinski, Holden Karau

7. Advanced Analytics with Spark by Sandy Ryza, Uri Laserson, Sean Owen and Josh Wills

8. Mastering Apache Spark by Mike Frampton

9. Apache Spark Graph Processing by Rindra Ramamonjison


Pedagogy: Participative learning, discussions, algorithm, Flowchart & Program writing,
experiential learning through practical problem solving, assignment, PowerPoint presentation.
Assessment Scheme:
Class Continuous Assessment (CCA): 50 Marks
Assignments: 10, Test: 10, Presentations: -, Case study: -, MCQ: 10, Oral: 10, Attendance: 10
Term End Examination: 50 Marks

Syllabus:
Module 1 (Theory: 3, Lab: -, Assess: -)
Introduction: Introduction to Scala, History of Scala, Features, Basic Syntax, Scala Comments, Variables, Data types, Operators.

Module 2 (Theory: 5, Lab: -, Assess: -)
Conditional Expressions: If-else, While, do-while, for, Pattern matching, break statement.

Module 3 (Theory: 7, Lab: -, Assess: -)
Scala Function: Function declaration, function definition, Function calling, Functions-Call by name, Functions with named arguments, Functions with variable arguments, Default parameter values, Nested functions, Recursion, Higher order functions, Scala Closures.

Module 4 (Theory: 4, Lab: -, Assess: -)
Scala Classes and Objects: Object, Class, Singleton Object, Companion Object, access modifiers, constructors, method overloading, inheritance, method overriding, this keyword, field overriding, final, Scala Abstract class, Scala Trait, Apply and Unapply.

Module 5 (Theory: 5, Lab: -, Assess: -)
Array and Strings: Single dimensional, Passing array into the function, Multidimensional Array, Strings, String methods, String Interpolation

Module 6 (Theory: 5, Lab: -, Assess: -)
Scala Collections: Sequence, List, Set, Map, Tuples, Options, Iterators

Module 7 (Theory: 1, Lab: -, Assess: -)
File Input-Output: Reading and Writing of files

Module 8 (Theory: 5, Lab: -, Assess: -)
Introduction to Apache Spark: Features of Apache Spark, Apache Spark Architecture, Spark Applications, Apache Spark Components, Describe the Different Data Sources and Formats in Spark.

Module 9 (Theory: 5, Lab: -, Assess: -)
Resilient Distributed Dataset (RDD): Introduction of RDD, Features of RDD in Spark, RDD operations.

Module 10 (Theory: 5, Lab: -, Assess: -)
Spark RDD operations: RDD Transformation, RDD Action.
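
The difference between RDD transformations (lazy) and actions (which trigger computation) can be imitated in a few lines of plain Python. The `MiniRDD` class below is a hypothetical teaching stand-in, not the Spark API: it only records transformations and runs them when an action is called.

```python
from functools import reduce as fold


class MiniRDD:
    """Hypothetical stand-in for an RDD; records steps, computes on demand."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = steps

    # Transformations: return a new MiniRDD and compute nothing yet.
    def map(self, fn):
        return MiniRDD(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        return MiniRDD(self._source, self._steps + (("filter", pred),))

    def _run(self):
        """Replay the recorded steps over the source, one element at a time."""
        for item in self._source:
            keep = True
            for kind, fn in self._steps:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):      # a filter step rejected the element
                    keep = False
                    break
            if keep:
                yield item

    # Actions: trigger the recorded steps and return a result.
    def collect(self):
        return list(self._run())

    def reduce(self, fn):
        return fold(fn, self._run())


calls = []


def traced_square(x):
    calls.append(x)                     # lets us observe when work happens
    return x * x


rdd = MiniRDD(range(1, 6)).map(traced_square).filter(lambda x: x % 2 == 1)
print(calls)             # still empty: transformations are lazy
print(rdd.collect())     # the action runs the whole chain
```

In Spark the recorded chain of transformations is the lineage that also allows lost partitions to be recomputed.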

Prepared By: Ms. Devyani B Kamble, Assistant Professor
Checked By: Ms. Pradnya Mahadik, BOS Chairman
Approved By: Dr. Sudhir Gavhane, Dean, LASC

COURSE STRUCTURE

Course Code: MIT-WPU-MBD-1304
Course Category: Core Big Data Analytics
Course Title: Machine Learning Algorithm - I
Teaching Scheme and Credits (weekly load, hrs): L 4, T --, Laboratory --, Credits 3
Pre-requisites:
1. The main prerequisite for machine learning is data analysis.
2. Familiarity with probability theory
3. Familiarity with linear algebra
Course Objectives:
1. To introduce the basic concepts and techniques of Machine Learning.
2. To develop skills in using recent machine learning software for solving practical problems.
3. To become familiar with a set of well-known supervised, semi-supervised and unsupervised learning algorithms.

Course Outcomes:
1. Select real-world applications that need machine learning based solutions.
2. Implement and apply machine learning algorithms.
3. Select appropriate algorithms for solving a particular group of real-world problems.
4. Recognize the characteristics of machine learning techniques that are useful for solving real-world problems.

Course Contents

Introduction to learning
What are supervised, unsupervised and reinforcement learning? Visualization of algebraic concepts.

Linear Regression
What is regression? What are the simple one-variable regression line and the coefficients of the line? What are the assumptions of linear regression? What is the gradient descent algorithm, and how is the cost function used to find the 'beta' values?
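
A minimal sketch of gradient descent for the one-variable regression line is shown below. It assumes the usual mean squared error cost J(b0, b1) = (1/2n) Σ (b0 + b1·x − y)²; the learning rate, iteration count and data are arbitrary choices for illustration.

```python
def cost(xs, ys, b0, b1):
    """Mean squared error cost, halved so the gradient has no factor of 2."""
    return sum(((b0 + b1 * x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))


def gradient_descent(xs, ys, learning_rate=0.05, iterations=2000):
    """Repeatedly step both 'beta' values against the gradient of the cost."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        errors = [(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        grad_b0 = sum(errors) / n
        grad_b1 = sum(e * x for e, x in zip(errors, xs)) / n
        b0 -= learning_rate * grad_b0
        b1 -= learning_rate * grad_b1
    return b0, b1


xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]          # generated from y = 1 + 2x
b0, b1 = gradient_descent(xs, ys)
print(round(b0, 3), round(b1, 3), cost(xs, ys, b0, b1))
```

A learning rate that is too large makes the updates diverge, while one that is too small needs many more iterations; for this convex cost there is a single global minimum.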

Gradient Descent
How to represent the problem in matrix form? How to use gradient descent for multiple features, and how to use feature scaling techniques in gradient descent? What are the types of feature scaling? How to find the coefficients analytically?
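
Two common types of feature scaling, min-max scaling and standardization (z-scores), can be sketched as follows. The feature values are invented for illustration and the helper names are hypothetical.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


areas = [50, 80, 120, 200]     # e.g. a feature measured in square metres
print(min_max_scale(areas))
print(standardize(areas))
```

Bringing features onto comparable scales keeps the cost surface from being stretched along one axis, which lets gradient descent use a single learning rate effectively.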

Logistic Regression
What is the logistic regression model? What is the sigmoid function and its graphical representation? What is the receiver operating characteristic (ROC) curve, and what is it used for?
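
The sketch below shows the sigmoid function and how individual points on a ROC curve arise from sweeping the decision threshold. The logits and labels are made-up values, and `roc_point` is a hypothetical helper rather than a library function.

```python
from math import exp


def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def roc_point(scores, labels, threshold):
    """Return (false positive rate, true positive rate) at one threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    positives = sum(labels)
    negatives = len(labels) - positives
    return fp / negatives, tp / positives


logits = [-2.0, -0.5, 0.3, 1.2, 2.5]
labels = [0, 0, 1, 0, 1]
scores = [sigmoid(z) for z in logits]

# Lowering the threshold moves along the curve from (0, 0) towards (1, 1).
curve = [roc_point(scores, labels, t) for t in (0.9, 0.7, 0.5, 0.3, 0.0)]
print(curve)
```

Each threshold trades sensitivity (true positive rate) against the false positive rate, so the curve can be used to choose an operating point for the classifier.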

Optimization and Classifications


What is the optimization objective derived from logistic regression? What is a large margin classifier? What is the concept behind large margin classification using SVM?

Learning Resources:

1. T. Hastie, R. Tibshirani and J. Friedman, “Elements of Statistical Learning”, Springer, 2009.
2. E. Alpaydin, “Machine Learning”, MIT Press, 2010.
3. K. Murphy, “Machine Learning: A Probabilistic Perspective”, MIT Press, 2012.
4. C. Bishop, “Pattern Recognition and Machine Learning”, Springer, 2006.
5. Shai Shalev-Shwartz, Shai Ben-David, “Understanding Machine Learning: From Theory to Algorithms”, Cambridge University Press, 2014.
6. John Mueller and Luca Massaron, “Machine Learning for Dummies”, John Wiley & Sons, 2016.
Pedagogy:

Participative learning, discussions, algorithm, program writing, experiential learning through practical problem-solving, assignment, PowerPoint presentation

Assessment Scheme:

Class Continuous Assessment (CCA): Assignments 10, Test 10, Problem solving 10, Attendance 10, Case study 10, Any other -

Term End Examination: 50 marks (External)

Syllabus:

Module 1 (Theory: 5, Lab: -, Assess: -)
Introduction to learning
Supervised, Unsupervised and Reinforcement Learning, geometry (lines, curves and 3D spaces) and visualization of algebraic concepts

Module 2 (Theory: 8, Lab: -, Assess: -)
Linear Regression
Regression as a concept, simple one variable regression line, coefficients of the line, assumptions of linear regression, Gradient descent algorithm, cost function to find 'beta' values and concept, local and global minima, concept of learning rate

Module 3 (Theory: 7, Lab: -, Assess: -)
Gradient Descent
Matrix representation of problem, Gradient descent for multiple features, use of feature scaling techniques in gradient descent, types of feature scaling, finding coefficients analytically, normal equation, (matrix) non-invertibility

Module 4 (Theory: 13, Lab: -, Assess: -)
Logistic Regression
Logistic regression model, matrix representation, general Sigmoid function and graphical representation, decision boundary (linear and non-linear), metrics for logistic regression (accuracy, sensitivity, specificity, etc.), Receiver operating characteristic (ROC) curve, use of the ROC curve to find the optimum decision boundary, convexity and non-convexity of a group of points

Module 5 (Theory: 12)
Optimization and Classifications
Optimization objective from logistic regression to support vector machines, large margin classifier, concepts behind large margin classifications, kernels (concept, types and graphical explanations), using SVM

Prepared By: Archana Varade, Assistant Professor
Checked By: Pradnya Mahadik, BOS Chairman
Approved By: Dr. Sudhir Gavhane, Dean

COURSE STRUCTURE

Course Code: MIT-WPU-MBD-1305
Course Category: Core Big Data Analytics
Course Title: Lab on Statistical Computing
Teaching Scheme and Credits L T Laboratory Credits
Weekly load hrs 3 3

Course Objectives:

1. To provide an understanding of the concepts and techniques of Business Statistics.
2. To learn how to use Excel to solve Business Statistics problems.
3. To provide hands-on training in Python and R.

Course Outcomes:
1. The student should be able to formulate and solve problems related to the topics covered in this course.
2. The student should be able to solve the problems using Python or R.
3. The student should be able to perform statistical analysis on a variety of data.

Course Contents:
Laboratory Exercises / Practical:

1. Data and Statistics


2. Descriptive Statistics: Tabular and Graphical Presentations
3. Descriptive Statistics: Numerical Measures
4. Probability
5. Discrete Probability Distributions
6. Continuous Probability Distribution
7. Sampling and Sampling Distributions
8. Interval Estimation
9. Fundamentals of Hypothesis Testing
10. Two-Sample Tests
11. Inferences about Population Variances
12. Tests of Goodness of Fit and Independence
13. Experimental Design and ANOVA
14. Simple Linear Regression
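
As a small example for the Discrete Probability Distributions practical, the binomial and Poisson probability mass functions can be computed directly from their formulas with the Python standard library. The parameter values are arbitrary and the function names are hypothetical.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam): e^(-lam) * lam^k / k!."""
    return exp(-lam) * lam ** k / factorial(k)


# Probability of exactly 2 defectives in 10 items with a 10% defect rate.
print(round(binomial_pmf(2, 10, 0.1), 4))
# Probability of exactly 2 arrivals when 1 arrival is expected on average.
print(round(poisson_pmf(2, 1.0), 4))
```

The same values can be cross-checked against Excel or R as part of the exercise.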

Learning Resources:
Reference Books:

Text Book: David R. Anderson, Dennis J. Sweeney, Thomas A. Williams, Jeffrey D. Camm and James J. Cochran, Statistics for Business and Economics, 12th Edition, Cengage Learning, 2014. (A newer 13th edition has been published but is mostly unavailable.)

Pedagogy: Participative learning, discussions, demonstrations, practical, assignment

Assessment Scheme:

Laboratory Continuous Assessment (LCA): Practical 20%, Oral based on practical 10%, Problem based Learning 10%, Attendance 10%

Term End Examination: 50%

Syllabus:
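Module 1 (Workload: 3 hrs): Data and Statistics
Module 2 (Workload: 3 hrs): Descriptive Statistics: Tabular and Graphical Presentations
Module 3 (Workload: 3 hrs): Descriptive Statistics: Numerical Measures
Module 4 (Workload: 3 hrs): Probability
Module 5 (Workload: 3 hrs): Discrete Probability Distributions
Module 6 (Workload: 3 hrs): Continuous Probability Distribution
Module 7 (Workload: 3 hrs): Sampling and Sampling Distributions
Module 8 (Workload: 3 hrs): Interval Estimation
Module 9 (Workload: 3 hrs): Fundamentals of Hypothesis Testing
Module 10 (Workload: 3 hrs): Two-Sample Tests
Module 11 (Workload: 3 hrs): Inferences about Population Variances
Module 12 (Workload: 3 hrs): Tests of Goodness of Fit and Independence
Module 13 (Workload: 3 hrs): Experimental Design and ANOVA
Module 14 (Workload: 3 hrs): Simple Linear Regression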

Prepared By: Ms. Pradnya Mahadik, Assistant Professor
Checked By: Ms. Pradnya Mahadik, BOS Chairman
Approved By: Dr. Sudhir Gavhane, Dean, LASC

COURSE STRUCTURE

Course Code: MIT-WPU-MBD-1306
Course Category: Core Big Data Analytics
Course Title: Lab on Machine Learning Algorithms - I
Teaching Scheme and Credits (weekly load, hrs): L --, T --, Laboratory 3, Credits 3
Pre-requisites:
1. Basic Linear Algebra
2. Programming Experience
3. Statistics and Probability
Course Objectives:
1. To introduce basic machine learning techniques.
2. To develop skills in using recent machine learning software for solving practical problems in a high-performance computing environment.
3. To develop skills in applying appropriate supervised, semi-supervised or unsupervised learning algorithms for solving practical problems.
Course Outcomes:
Students will be able to:
1. Implement and apply machine learning algorithms to solve problems.
2. Select appropriate algorithms for solving real-world problems.
3. Use machine learning techniques in a high-performance computing environment to solve real-world problems.
Course Contents

Laboratory Exercises / Practical:


1. Exercises to solve real-world problems using the following machine learning methods:
• Linear Regression
• Logistic Regression
• Multi-Class Classification
• Neural Networks
• Support Vector Machines
• K-Means Clustering & PCA
2. Develop programs to implement Anomaly Detection & Recommendation Systems.
3. Implement GPU computing models to solve some of the problems mentioned in Problem 1.
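
As a possible starting point for the K-Means exercise, the sketch below implements the two alternating steps of the algorithm (assign each point to the nearest centroid, then move each centroid to the mean of its members) in plain Python. It makes simplifying assumptions: the first k points are used as initial centroids, the iteration count is fixed, and the data points are invented.

```python
def closest(point, centroids):
    """Index of the centroid with the smallest squared Euclidean distance."""
    distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    return distances.index(min(distances))


def kmeans(points, k, iterations=10):
    centroids = [points[i] for i in range(k)]   # simplification: first k points
    assignment = []
    for _ in range(iterations):
        # Assignment step: label each point with its nearest centroid.
        assignment = [closest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:   # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, assignment


points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.5), (0.5, 1.5), (8.5, 9.0)]
centroids, assignment = kmeans(points, k=2)
print(centroids, assignment)
```

Practical implementations usually add random or k-means++ initialization, several restarts and a convergence check; the lab can extend the sketch in those directions.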

Reference Books

1. Peter Flach: Machine Learning: The Art and Science of Algorithms that Make
Sense of Data, Cambridge University Press, Edition 2012.
2. Hastie, Tibshirani, Friedman: Introduction to Statistical Machine Learning with
Applications in R, Springer, 2nd Edition-2012.
3. C. M. Bishop : Pattern Recognition and Machine Learning, Springer 1st Edition-
2013.
4. Ethem Alpaydin : Introduction to Machine Learning, PHI 2nd Edition-2013.
5. Parag Kulkarni : Reinforcement and Systematic Machine Learning for Decision
Making, Wiley-IEEE Press, Edition July 2012.
Supplementary Reading:

Web Resources:
Weblinks: -

MOOCs: -

Pedagogy:

Mini Project development, Problem solving approach, Participative learning, discussions, algorithm,
Program writing, experiential learning through practical problem-solving, assignment, PowerPoint
presentation

Assessment Scheme:

Class Continuous Assessment (CCA): Assignments 10, Test 10, Presentations 10, Attendance 10, Viva 10, Any other -

Term End Examination: 50 marks (External)

Syllabus:

Module 1 (Theory: -, Lab: 3, Assess: -)
Exercises to solve real-world problems using the following machine learning methods: Linear Regression, Logistic Regression

Module 2 (Theory: -, Lab: 3, Assess: -)
Exercises to solve real-world problems using the following machine learning methods: Multi-Class Classification, Neural Networks

Module 3 (Theory: -, Lab: 3, Assess: -)
Exercises to solve real-world problems using the following machine learning methods: Support Vector Machines, K-Means Clustering & PCA

Module 4 (Theory: -, Lab: 3, Assess: -)
Develop programs to implement Anomaly Detection & Recommendation Systems.

Module 5 (Theory: -, Lab: 3, Assess: -)
Implement GPU computing models to solve some of the problems mentioned in Problem 1.

Module 6 (Theory: -, Lab: 3, Assess: -)
Implement GPU computing models to solve some of the problems mentioned in Problem 2.

Module 7 (Theory: -, Lab: 3, Assess: -)
Implement GPU computing models to solve some of the problems mentioned in Problem 3.

Prepared By: Dr. C. H. Patil, Assistant Professor
Checked By: Pradnya Mahadik, Course Coordinator
Approved By: Dr. Sudhir Gavhane, Dean

COURSE STRUCTURE

Course Code: MIT-WPU-MBD-2101
Course Category: Core Big Data Analytics
Course Title: Principles of Deep Learning
Teaching Scheme and Credits (weekly load, hrs): L 3, T --, Laboratory --, Credits 3
Pre-requisites:
This is an upper-level undergraduate/graduate course. All students should have the following skills:
1. Calculus, Linear Algebra
2. Probability & Statistics
3. Ability to code in Python.

Course Objectives:
Learning in neural networks: output vs hidden layers; linear vs nonlinear networks
Course Outcomes: Understand Deep Learning

Course Contents
Course overview: What is deep learning? DL successes; syllabus & course logistics
Intro to neural networks: cost functions, hypotheses and tasks; training data; maximum likelihood based cost, cross entropy, MSE cost; feed-forward networks; MLP, sigmoid units; neuroscience inspiration
Learning in neural networks: output vs hidden layers; linear vs nonlinear networks
Backpropagation: learning via gradient descent; recursive chain rule (backpropagation); if time: bias-variance tradeoff, regularization; output units: linear, softmax; hidden units: tanh
Deep learning strategies I (e.g., GPU training, regularization, etc.); project proposals
Deep learning strategies II (e.g., ReLUs, dropout, etc.)
SCC/TensorFlow overview: How to use the SCC cluster; introduction to TensorFlow
CNNs I: Convolutional neural networks
Deep Belief Nets I: probabilistic methods
RNNs I: Recurrent neural networks
Other DNN variants (e.g. attention, memory networks, etc.)
Neural Turing Machines
Unsupervised deep learning I (e.g. autoencoders etc.)
Unsupervised deep learning II (e.g. deep generative models etc.)
Deep reinforcement learning
Vision applications I
NLP applications I

Laboratory Exercises / Practical: NA

Reference Books
1. Ian Goodfellow, Yoshua Bengio, Aaron Courville. Deep Learning.
Supplementary Reading:
1. Duda, R.O., Hart, P.E., and Stork, D.G. Pattern Classification. Wiley-Interscience. 2nd Edition. 2001.
2. Theodoridis, S. and Koutroumbas, K. Pattern Recognition. Edition 4. Academic Press, 2008.
3. Russell, S. and Norvig, P. Artificial Intelligence: A Modern Approach. Prentice Hall Series in Artificial Intelligence. 2003.
4. Bishop, C. M. Neural Networks for Pattern Recognition. Oxford University Press. 1995.
5. Hastie, T., Tibshirani, R. and Friedman, J. The Elements of Statistical Learning. Springer. 2001.
6. Koller, D. and Friedman, N. Probabilistic Graphical Models. MIT Press. 2009.

Web Resources:

Weblinks: -

MOOCs: -

Pedagogy:

Mini Project development, Problem solving approach, Participative learning, discussions, algorithm,
Program writing, experiential learning through practical problem-solving, assignment, PowerPoint
presentation

Assessment Scheme:

Class Continuous Assessment (CCA): Assignments 10, Test 10, Presentations 10, Attendance 10, Viva 10, Any other -

Term End Examination: 50 marks (External)

Syllabus:

Module No. | Contents | Workload in Hrs: Theory | Lab | Assess
1 | Course overview: What is deep learning? DL successes; syllabus & course logistics | 2 | - | -
2 | Intro to neural networks: cost functions, hypotheses and tasks; training data; maximum likelihood based cost, cross entropy, MSE cost; feed-forward networks; MLP, sigmoid units; neuroscience inspiration | 4 | - | -
3 | Learning in neural networks: output vs hidden layers; linear vs nonlinear networks | 4 | - | -
4 | Backpropagation: learning via gradient descent; recursive chain rule (backpropagation); if time: bias-variance tradeoff, regularization; output units: linear, softmax; hidden units: tanh | 4 | - | -
5 | Deep learning strategies I (e.g., GPU training, regularization, etc.); project proposals | 2 | |
6 | Deep learning strategies II (e.g., ReLUs, dropout, etc.) | 2 | |
7 | SCC/TensorFlow overview: how to use the SCC cluster; introduction to TensorFlow | 2 | |
8 | CNNs I: Convolutional neural networks | 2 | |
9 | Deep Belief Nets I: probabilistic methods | 2 | |
10 | RNNs I: Recurrent neural networks | 2 | |
11 | Other DNN variants (e.g., attention, memory networks, etc.) | 2 | |
12 | Neural Turing Machines | 2 | |
13 | Unsupervised deep learning I (e.g., autoencoders, etc.) | 2 | |
14 | Unsupervised deep learning II (e.g., deep generative models, etc.) | 2 | |
15 | Deep reinforcement learning | 2 | |
16 | Vision applications I | 2 | |
17 | NLP applications I | 2 | |
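
Module 2 names cross entropy and MSE as cost functions, and Module 4 lists softmax output units. The following is a minimal sketch of those three quantities in plain Python. The function names (`softmax`, `cross_entropy`, `mse`) and the sample scores are illustrative assumptions, not part of the syllabus.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target_index):
    """Negative log-likelihood of the correct class."""
    return -math.log(probs[target_index])

def mse(predictions, targets):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

# Hypothetical three-class example where class 0 is the true label.
probs = softmax([2.0, 1.0, 0.1])
loss = cross_entropy(probs, 0)
print(probs, loss)
```

Minimizing the cross entropy of a softmax output is the maximum likelihood based cost mentioned in Module 2 for classification; MSE plays the same role for regression-style outputs.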

Prepared By Checked By Approved By

Dr. C. H. Patil Pradnya Mahadik Dr. Sudhir Gavhane


Assistant Professor Course Coordinator Dean LASC
COURSE STRUCTURE

Course Code MIT-WPU-MBD-2102


Course Category Core Big Data Analytics
Course Title Machine Learning Algorithm -II
Teaching Scheme and Credits L T Laboratory Credits
Weekly load hrs 4 -- -- 3
Pre-requisites:
1. The main prerequisite for machine learning is data analysis.
2. Familiarity with probability theory
3. Familiarity with linear algebra
Course Objectives:
1. To introduce the basic concepts and techniques of Machine Learning.
2. To develop the skills in using recent machine learning software for solving practical
problems.
3. To be familiar with a set of well-known supervised, semi-supervised and unsupervised
learning algorithms

Course Outcomes:
1. Select real-world applications that need machine learning based solutions.
2. Implement and apply machine learning algorithms.
3. Select appropriate algorithms for solving a particular group of real-world problems.
4. Recognize the characteristics of machine learning techniques that are useful to solve
real-world problems.

Course Contents

Decision trees and random forests


Concept, diagrammatic representation, random forest as a voting committee of decision trees,
parameter meaning and explanation.
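
The "voting committee" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the one-split "trees" (`stump`) and their thresholds are hand-picked and hypothetical, whereas a real random forest would learn each tree from a bootstrap sample of the training data.

```python
from collections import Counter

def stump(feature_index, threshold):
    """Return a one-split 'tree' that predicts 1 when the feature exceeds the threshold."""
    def predict(sample):
        return 1 if sample[feature_index] > threshold else 0
    return predict

def forest_predict(trees, sample):
    """Let every tree vote and return the most common label."""
    votes = Counter(tree(sample) for tree in trees)
    return votes.most_common(1)[0][0]

# Three hypothetical trees form the committee.
trees = [stump(0, 2.0), stump(1, 5.0), stump(0, 3.5)]
print(forest_predict(trees, [3.0, 6.0]))  # two of the three trees vote 1
print(forest_predict(trees, [1.0, 6.0]))  # two of the three trees vote 0
```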
Naive Bayes:
Venn diagrams, Naive Bayes algorithm, application and problems, Naive Bayes learning, Bayesian
inference, Retail basket analysis; Concept of boosting and bagging
Unsupervised learning methods/Clustering:
K-means algorithm, optimization objective, graphical representation, random initialization,
choosing number of clusters
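
A minimal sketch of K-means (Lloyd's algorithm) in plain Python, covering random initialization and the optimization objective. The names `k_means` and `distortion`, the fixed iteration count, and the seed are assumptions made for illustration.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k, iterations=20, seed=0):
    """Alternate between assigning points and updating centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initialization from the data
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters

def distortion(centroids, clusters):
    """Optimization objective: total squared distance to the assigned centroid."""
    return sum(squared_distance(p, centroids[j])
               for j, members in enumerate(clusters) for p in members)

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, clusters = k_means(points, k=2)
print(centroids, distortion(centroids, clusters))
```

One common way to choose the number of clusters is to compute `distortion` for several values of `k` and look for the point where the decrease levels off.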
Association Rules
Association rule mining, K-nearest neighbours algorithm.
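
Association rule mining is usually described in terms of support and confidence. The sketch below computes both for a toy set of shopping baskets; the basket data and function names are hypothetical and chosen only for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical retail baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support(baskets, {"bread"}))
print(confidence(baskets, {"bread"}, {"milk"}))
```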



Learning Resources:

1. T. Hastie, R. Tibshirani and J. Friedman, “Elements of Statistical Learning”, Springer, 2009.
2. E. Alpaydin, “Machine Learning”, MIT Press, 2010.
3. K. Murphy, “Machine Learning: A Probabilistic Perspective”, MIT Press, 2012.
4. C. Bishop, “Pattern Recognition and Machine Learning”, Springer, 2006.
5. Shai Shalev-Shwartz, Shai Ben-David, “Understanding Machine Learning: From Theory to Algorithms”, Cambridge University Press, 2014.
6. John Mueller and Luca Massaron, “Machine Learning For Dummies”, John Wiley & Sons, 2016.
Pedagogy:

Participative learning, discussions, algorithm, Program writing, experiential learning through


practical problem-solving, assignment, PowerPoint presentation

Assessment Scheme:

Class Continuous Assessment (CCA)


Assignments Test Problem solving Attendance Case study Any other
10 10 10 10 10 -

Term End Examination: 50 marks External



Syllabus:

Module No. | Contents | Workload in Hrs: Theory | Lab | Assess
1 | Decision trees and random forests: Concept, diagrammatic representation, random forest as a voting committee of decision trees, parameter meaning and explanation. | 12 | - | -
2 | Naive Bayes: Venn diagrams, Naive Bayes algorithm, application and problems, Naive Bayes learning, Bayesian inference, Retail basket analysis; Concept of boosting and bagging | 12 | - | -
3 | Unsupervised learning methods/Clustering: K-means algorithm, optimization objective, graphical representation, random initialization, choosing number of clusters | 12 | - | -
4 | Association Rules: Association rule mining, K-nearest neighbours algorithm. | 09 | - | -

Prepared By Checked By Approved By

Sameer Kakade Pradnya Mahadik Dr. Sudhir Gavhane


Asst.Prof. Course Coordinator Dean LASC
COURSE STRUCTURE

Course Code MIT-WPU-MBD-2103


Course Category Core Big Data Analytics
Course Title Data Science life cycle & Visualization
Teaching Scheme and Credits L T Laboratory Credits
Weekly load hrs 3 - -- 3
Pre-requisites:
Computing: The Structure and Interpretation of Computer Programs
Math: Linear Algebra: some basic concepts like linear operators, eigenvectors, derivatives, and
integrals to enable statistical inference and derive new prediction algorithms.

Course Objectives:
• To describe Data Science Life cycle.
• To describe Data Visualization
Course Outcomes:
• Students will be able to understand Data Science Life cycle & Data Visualization
Course Contents:
1. What is Data Science?
What does Data Science involve?
Era of Data Science
Business Intelligence vs Data Science
Life cycle of Data Science including Extract Transform and Load
• Data Preprocessing
• Data Imputation
• Data Cleaning
• Data Transformation
• Data Visualization
• Data Analysis
• Data Engineering - Big Data
Tools of Data Science

2. Data Extraction Wrangling & Exploration


Data Analysis Pipeline
What is Data Extraction
Types of Data
Raw and Processed Data
Data Wrangling
Exploratory Data Analysis

3. Visualization of Data



Introduction to Visualization.
Human Perception and Information Processing
Data types
Graphical perception (the ability of viewers to interpret visual
(graphical) encodings of information and thereby decode information in graphs)
Color for information display
Color management systems
Picture visualization and fruition
Data Transformation into sources of knowledge through visual representation.
Requirements and heuristics for high-quality visualizations.
Charts and standard views: relevance and appropriateness.

Advanced and innovative tools for data visualization and advanced quantitative analysis.
The evaluation of the quality of visualizations and infographics.

Learning Resources:

Reference Books:
1. Foundations of Data Science By Avrim Blum, John Hopcroft, and Ravindran Kannan

Pedagogy:
Participative learning, discussions, algorithm, experiential learning through practical problem
solving, assignment, PowerPoint presentation

Assessment Scheme:

Class Continuous Assessment (CCA)


Assignments Test Problem solving Attendance Case study Any other
10 10 10 10 10 -

Term End Examination : 50 marks External



Syllabus :
Module No. | Contents | Workload in Hrs: Theory | Lab | Assess
1 | What is Data Science?: What does Data Science involve?; Era of Data Science; Business Intelligence vs Data Science; Life cycle of Data Science; Tools of Data Science | 12 | - | -
2 | Data Extraction Wrangling & Exploration: Data Analysis Pipeline; What is Data Extraction; Types of Data; Raw and Processed Data; Data Wrangling; Exploratory Data Analysis | 12 | - | -
3 | Visualization of Data: Introduction to Visualization; Human Perception and Information Processing; Data types; Graphical perception (the ability of viewers to interpret visual (graphical) encodings of information and thereby decode information in graphs); Color for information display; Color management systems; Picture visualization and fruition; Data Transformation into sources of knowledge through visual representation; Requirements and heuristics for high-quality visualizations; Charts and standard views: relevance and appropriateness; Advanced and innovative tools for data visualization and advanced quantitative analysis; The evaluation of the quality of visualizations and infographics. | 12 | - | -

Prepared By Checked By Approved By

Preeti Adhav Pradnya Mahadik Dr. Sudhir Gavhane


Asst.Prof. BOS Chairman Dean LASC
COURSE STRUCTURE

Course Code MIT-WPU-MBD-2104


Course Category Core Big Data Analytics
Course Title Machine Learning Algorithm -I
Teaching Scheme and Credits L T Laboratory Credits
Weekly load hrs 4 -- -- 3
Pre-requisites:
1. The main prerequisite for machine learning is data analysis.
2. Familiarity with probability theory
3. Familiarity with linear algebra
Course Objectives:
1. To introduce the basic concepts and techniques of Machine Learning.
2. To develop the skills in using recent machine learning software for solving practical
problems.
3. To be familiar with a set of well-known supervised, semi-supervised and unsupervised
learning algorithms

Course Outcomes:
1. Select real-world applications that need machine learning based solutions.
2. Implement and apply machine learning algorithms.
3. Select appropriate algorithms for solving a particular group of real-world problems.
4. Recognize the characteristics of machine learning techniques that are useful to solve
real-world problems.

Course Contents

Introduction to learning
What is Supervised, Unsupervised and Reinforcement Learning? visualization of algebraic
concepts

Linear Regression
What is Regression? What is simple one variable regression line and coefficients of the line? What
are assumptions of linear regression? What is Gradient descent algorithm, cost function to find
'beta' values and concept
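
A minimal sketch of batch gradient descent for a one-variable regression line, assuming a mean-squared-error cost. The names `fit_line`, `b0` and `b1` (the 'beta' values), the learning rate and the sample data are illustrative assumptions.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y ~ b0 + b1 * x by gradient descent on J = (1 / 2n) * sum(error^2)."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [b0 + b1 * x - y for x, y in zip(xs, ys)]
        grad_b0 = sum(errors) / n                              # dJ/db0
        grad_b1 = sum(e * x for e, x in zip(errors, xs)) / n   # dJ/db1
        b0 -= learning_rate * grad_b0
        b1 -= learning_rate * grad_b1
    return b0, b1

# Hypothetical data lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
b0, b1 = fit_line(xs, ys)
print(b0, b1)
```

If the learning rate is set too high for the data, the updates overshoot and the coefficients diverge instead of converging.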

Gradient Descent
How to represent matrix of problem? How to use Gradient descent for multiple features and
scaling techniques in gradient descent? What are types of feature scaling, finding coefficients
analytically?
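
The two common kinds of feature scaling, and the analytical alternative to gradient descent for a single feature, can be sketched as follows. The function names are assumptions for illustration, and each function assumes its input values are not all identical (otherwise a division by zero occurs).

```python
def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

def closed_form_line(xs, ys):
    """One-feature least-squares solution: slope = cov(x, y) / var(x)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope

print(min_max_scale([10, 20, 30]))
print(closed_form_line([0, 1, 2, 3, 4], [1, 3, 5, 7, 9]))
```

Scaling matters for gradient descent because features on very different ranges make the cost surface elongated and slow to descend; the closed-form solution does not need it.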

Logistic Regression
What is Logistic regression model? What is Sigmoid function and its graphical representation?
What is Receiver-operating characteristic (RoC) curve? What is the use of RoC curve?
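
A short sketch of the sigmoid function and of how the points on an RoC curve are obtained by sweeping the decision threshold. The scores and labels are made-up illustration data, and `roc_points` is a hypothetical helper name.

```python
import math

def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

# Hypothetical model outputs before the sigmoid, with their true labels.
scores = [sigmoid(z) for z in (-2.0, -0.5, 0.5, 2.0)]
labels = [0, 1, 0, 1]
print(roc_points(scores, labels))
```

Each point corresponds to one candidate decision boundary, so the curve shows the trade-off between sensitivity and false alarms when choosing a threshold.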

Optimization and Classifications


What is Optimization objective from logistic regression? What is large margin classifier? What is
concept behind large margin classifications using SVM?

Learning Resources:

1. T. Hastie, R. Tibshirani and J. Friedman, “Elements of Statistical Learning”, Springer, 2009.
2. E. Alpaydin, “Machine Learning”, MIT Press, 2010.
3. K. Murphy, “Machine Learning: A Probabilistic Perspective”, MIT Press, 2012.
4. C. Bishop, “Pattern Recognition and Machine Learning”, Springer, 2006.
5. Shai Shalev-Shwartz, Shai Ben-David, “Understanding Machine Learning: From Theory to Algorithms”, Cambridge University Press, 2014.
6. John Mueller and Luca Massaron, “Machine Learning for Dummies”, John Wiley & Sons, 2016.
Pedagogy:

Participative learning, discussions, algorithm, Program writing, experiential learning through


practical problem-solving, assignment, PowerPoint presentation

Assessment Scheme:

Class Continuous Assessment (CCA)


Assignments Test Problem solving Attendance Case study Any other
10 10 10 10 10 -

Term End Examination : 50 marks External

Syllabus:

Module No. | Contents | Workload in Hrs: Theory | Lab | Assess
1 | Introduction to learning: Supervised, Unsupervised and Reinforcement Learning, geometry (lines, curves and 3D spaces) and visualization of algebraic concepts | 5 | - | -
2 | Linear Regression: Regression as a concept, simple one variable regression line, coefficients of the line, assumptions of linear regression, Gradient descent algorithm, cost function to find 'beta' values and concept, local and global minima, concept of learning rate | 8 | - | -
3 | Gradient Descent: Matrix representation of problem, Gradient descent for multiple features, use of feature scaling techniques in gradient descent, types of feature scaling, finding coefficients analytically, normal equation (matrix) non-invertibility | 7 | - | -
4 | Logistic Regression: Logistic regression model, matrix representation, general Sigmoid function and graphical representation, decision boundary (linear and non-linear), metrics for logistic regression (accuracy, sensitivity, specificity, etc.), Receiver-operating characteristic (RoC) curve, use of RoC curve to find out optimum decision boundary, convexity and non-convexity of a group of points | 13 | - | -
5 | Optimization and Classifications: Optimization objective from logistic regression to support vector machines, large margin classifier, concepts behind large margin classifications, kernels (concept, types and graphical explanations), using SVM | 12 | |

Prepared By Checked By Approved By

Archana Varade Pradnya Mahadik Dr. Sudhir Gavhane


Assistant Professor BOS Chairman Dean
COURSE STRUCTURE

Course Code MIT-WPU-MBD-2105


Course Category Core Big Data Analytics
Course Title Lab on R Programming
Teaching Scheme and Credits L T Laboratory Credits
Weekly load hrs - - 3 3
Pre-requisites
Computing: The Structure and Interpretation of Computer Programs
Math: Linear Algebra: some basic concepts like linear operators, eigenvectors, derivatives, and
integrals to enable statistical inference and derive new prediction algorithms.

Course Objectives:

• To describe Data Science Life cycle.
• To describe Data Visualization
Course Outcomes:
Students will be able to understand Data Science Life cycle & Data Visualization
Course Contents:

• Data Cleaning
• Data Transformation
• Data Visualization
• Data Analysis
• Data Engineering - Big Data
• Tableau Desktop
• Getting Started
• Connecting to Data
• Visual Analytics
• Dashboards and Stories
• Mapping
• Calculations
• Why is Tableau Doing That?
• How To cleanse & represent

Learning Resources:



Reference Books:
Foundations of Data Science By Avrim Blum, John Hopcroft, and Ravindran Kannan

Pedagogy:
Participative learning, discussions, algorithm, Flowchart & Program writing, experiential learning
through practical problem solving, assignment, PowerPoint presentation.

Assessment Scheme:
Class Continuous Assessment (CCA) 50

Assignments Test Presentations Case study MCQ Oral Attendance


10 10 - - 10 10 10

Term End Examination : 50 Marks External

Laboratory Continuous Assessment (LCA) 50

Practical | Oral based on practical | Site Visit | Mini Project | Problem based Learning | Attendance
10 | 10 | - | 10 | 10 | 10

Term End Examination : 50 Marks External

Syllabus:

Module Workload in Hrs


Contents
No. Theory Lab Assess
1 Assignment on Data Cleansing
2 Assignment on Transformation
3 Assignment on



4 Basics of Tableau - -
i. Tableau interface:
• Menus and Toolbar
• Data Pane
• Analytics Pane
• Sheet Tabs
• Shelves and Cards
• Marks Card
• Legends
• Layout for Dashboards & Stories
• Distributing and Publishing
ii. Distributing & publishing:
• Ways to share
• Exploring images and PDFs
• Workbook file types
• Opening workbook files
• Sharing securely
5 Connecting with Data - -
• Getting Started with Data
• Managing Metadata
• Managing Extracts
• Saving and Publishing Data Sources
• Data Prep with Text and Excel Files
• Join Types with Union
• Cross-database Joins
• Data Blending
• Additional Data Blending Topics
• Connecting to Cubes
• Connecting to PDFs

6 Visual Analytics - -
• Getting Started with Visual Analytics
• Drill Down and Hierarchies
• Sorting
• Grouping
• Additional Ways to Group
• Creating Sets
• Working with Sets
• Ways to Filter
• Using the Filter Shelf
• Interactive Filters
• Where Tableau Filters
• Additional Filtering Topics
• Parameters
• Formatting
• The Formatting Pane
• Basic Tooltips
• Viz in Tooltip
• Trend Lines
• Reference Lines
• Forecasting
• Clustering
• Analysis with Cubes and MDX
7 Dashboards and Stories - -
• Getting Started with Dashboards and Stories
• Building a Dashboard
• Dashboard Objects
• Dashboard Formatting
• Device Designer
• Dashboard Interactivity Using Actions
• Story Points
8 Mapping - -
• Getting Started with Mapping
• Maps in Tableau
• Editing Unrecognized Locations
• Spatial Files
• Expanding Tableau's Mapping Capabilities
• Custom Geocoding
• Polygon Maps
• Mapbox Integration
• WMS: Web Mapping Services
• Background Images
9 Calculations - -
• Calculation Syntax
• Introduction to LOD Expressions
• Modifying Table Calculations
• Aggregate Calculations
• Logic Calculations
• String Calculations
• Number Calculations
• Type Calculations
• Conceptual Topics with LOD Expressions
• Aggregation and Replication with LOD Expressions
• Nested LOD Expressions
• How to Integrate R and Tableau
• Using R within Tableau
• Date Calculations
• Getting Started with Calculations
• Intro to Table Calculations
10 Why is Tableau Doing That? - -
• Understanding Pill Types
• Measure Names and Measure Values
• Aggregation, Granularity, and Ratio Calculations
• When to Blend and When to Join
• Fixing "Incorrect" Sorts
• Filtering for Top Across Panes
11 How To - -
• Finding the Second Purchase Date with LOD Expressions
• Using a Parameter to Change Fields
• Cleaning Data by Bulk Re-aliasing
• Bollinger Bands
• Bump Charts
• Control Charts
• Funnel Charts
• Pareto Charts
• Waterfall Charts

Prepared By Checked By Approved By

Preeti Adhav Pradnya Mahadik Dr. Sudhir Gavhane


Lecturer BOS Chairman Dean LASC
COURSE STRUCTURE

Course Code MIT-WPU-MBD-2106


Course Category Elective Big Data Analytics
Course Title Internet Of Things
Teaching Scheme and Credits L T Laboratory Credits
Weekly load hrs 4 - - 3
Pre-requisites:
1. Knowledge of networking, sensing, databases, programming, and related technology.
2. Familiarity with business concepts and marketing.
Course Objectives:

1. Vision and Introduction to IoT.


2. Understand IoT Market perspective.
3. Data and Knowledge Management and use of Devices in IoT Technology.
4. Understand State of the Art – IoT Architecture.
5. Real World IoT Design Constraints, Industrial Automation and Commercial Building
Automation in IoT.

Course Outcomes:

1. Students will understand IoT Market perspective.


2. Students will get Data and Knowledge Management and use of Devices in IoT
Technology.
3. Students will understand State of the Art – IoT Architecture.
4. Students will get Real World IoT Design Constraints, Industrial Automation and
Commercial Building Automation in IoT.

Course Contents:
M2M to IoT
Introduction of M2M to IoT

M2M to IoT – A Market Perspective


Introduce basic concepts of IoT. Emerging industrial structure for IoT and development of IoT
architecture.

M2M and IoT Technology Fundamentals


Fundamental concepts of technology required for M2M and IoT

IoT Architecture-State of the Art
Includes study of IoT reference model.

IoT Reference Architecture


Study of different views of reference architecture. Introduction to Industrial Automation- Service-
oriented architecture-based device integration

Commercial Building Automation


Case study for Commercial Building Automation.

Learning Resources:
Reference Books:
1. Jan Holler, Vlasios Tsiatsis, Catherine Mulligan, Stefan Avesand, Stamatis
Karnouskos, David Boyle, “From Machine-to-Machine to the Internet of Things:
Introduction to a New Age of Intelligence”, 1st Edition, Academic Press, 2014.
2. Anahory, Murray, “Data Warehousing in the Real World”, Pearson Education.
3. Vijay Madisetti and Arshdeep Bahga, “Internet of Things (A Hands-on-Approach)”,
1st Edition, VPT, 2014.
4. Francis daCosta, “Rethinking the Internet of Things: A Scalable Approach to
Connecting Everything”, 1st Edition, Apress Publications, 2013.

Supplementary Reading:
1. Fawzi Behmann, Kwok Wu, “Collaborative Internet of Things (C-IoT): For Future
Smart Connected Life and Business”

Weblinks:
www.tutorialspoint.com

Pedagogy:
Participative learning, discussions, Problem Solving, experiential learning through practical
problem solving, assignment, PowerPoint presentation

Assessment Scheme:
Class Continuous Assessment (CCA) 50 Marks

Assignments Test Presentations Case study MCQ Oral Attendance


20 10 10 - - 10

Term End Examination : 50 Marks

Syllabus:

Module No. | Contents | Workload in Hrs: Theory | Lab | Assess
1 | M2M to IoT: The Vision-Introduction, From M2M to IoT, M2M towards IoT-the global context, A use case example, Differing Characteristics | 5 | - | -
2 | M2M to IoT – A Market Perspective: Introduction, Some Definitions, M2M Value Chains, IoT Value Chains, An emerging industrial structure for IoT, The international driven global value chain and global information monopolies. M2M to IoT-An Architectural Overview – Building an architecture, Main design principles and needed capabilities, An IoT architecture outline, standards considerations | 7 | - | 1
3 | M2M and IoT Technology Fundamentals: Devices and gateways, Local and wide area networking, Data management, Business processes in IoT, Everything as a Service (XaaS), M2M and IoT Analytics, Knowledge Management | 7 | - | 1
4 | IoT Architecture-State of the Art: Introduction, State of the art, Architecture Reference Model-Introduction, Reference Model and architecture, IoT reference Model | 6 | - | 1
5 | IoT Reference Architecture: Introduction, Functional View, Information View, Deployment and Operational View, Other Relevant architectural views. Real-World Design Constraints- Introduction, Technical Design constraints-hardware is popular again, Data representation and visualization, Interaction and remote control. Industrial Automation- Service-oriented architecture-based device integration, SOCRADES: realizing the enterprise integrated Web of Things, IMC-AESOP: from the Web of Things to the Cloud of Things | 8 | - | 1
6 | Commercial Building Automation: Introduction, Case study: phase one-commercial building automation today, Case study: phase two- commercial building automation in the future | 7 | - | 1

Prepared By Checked By Approved By

Ms. Smita Patil Pradnya Mahadik Dr. Sudhir Gavhane


Assistant Professor BOS Chairman Dean, LASC
COURSE STRUCTURE

Course Code MIT-WPU-MBD-2107


Course Category Elective Big Data Analytics
Course Title Introduction to image processing
Teaching Scheme and Credits L T Laboratory Credits
Weekly load hrs -- 04 -- 03
Pre-requisites:
Basic knowledge of Core Java programming

Course Objectives:
1. To learn the fundamental concepts of Digital Image Processing.
2. To study basic image processing operations.
3. To understand image analysis algorithms.
4. To expose students to current applications in the field of digital image processing.

Course Outcomes:
1. Understand image formation and the role human visual system plays in perception of gray
and color image data.
2. Get broad exposure to and understanding of various applications of image processing in
industry, medicine, and defense.
3. Learn the signal processing algorithms and techniques in image enhancement and image
restoration.
4. Acquire an appreciation for the image processing issues and techniques and be able to apply
these techniques to real world problems.
5. Be able to conduct independent study and analysis of image processing problems and
techniques

Course Contents

Introduction
What is Image Processing?, The origins of Image Processing, Examples of Fields that use Image
Processing, Gamma-Ray Imaging, X-Ray Imaging, Imaging in the Ultraviolet Band, Imaging in
the Visible and Infrared Bands, Imaging in the Microwave Band, Imaging in the Radio Band,
Fundamental steps in Digital Image Processing, Components of an Image Processing System

Digital Image Fundamentals


Elements of Visual Perception, Light and the Electromagnetic Spectrum, Image sensing and
Acquisition, Image Sampling and Quantization, Some Basic Relationships between Pixels, An
Introduction to the Mathematical Tools Used in Digital Image Processing, Array versus Matrix
Operations, Linear versus Nonlinear Operations, Arithmetic Operations, Set and Logical
Operations

Intensity Transformation and Spatial Filtering

Background, Some Basic Intensity Transformation Functions, Histogram Processing, Histogram
Equalization, Histogram Matching (Specification), Local Histogram Processing, Fundamentals of
Spatial Filtering, Smoothing Spatial Filters, Sharpening Spatial Filters, Combining Spatial
Enhancement Methods
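
Histogram equalization can be sketched in plain Python for a small grayscale image stored as a list of rows. This uses one common formulation, normalizing the cumulative histogram by its smallest non-zero value; textbooks differ slightly on this detail, and the function name `equalize` is an assumption for illustration.

```python
def equalize(image, levels=256):
    """Histogram-equalize a grayscale image given as a list of rows of ints."""
    pixels = [p for row in image for p in row]
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative histogram: number of pixels with intensity <= each level.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:  # constant image: nothing to spread out
        return [row[:] for row in image]
    mapping = [max(0, round((c - cdf_min) / (total - cdf_min) * (levels - 1)))
               for c in cdf]
    return [[mapping[p] for p in row] for row in image]

# A hypothetical low-contrast 2x2 image.
print(equalize([[10, 20], [30, 40]]))
```

The mapping stretches the intensities that actually occur so that they cover the full output range.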

Filtering in the Frequency Domain


Background, Preliminary Concepts, Sampling and the Fourier Transform of Sampled Functions,
The Discrete Fourier Transform (DFT) of One variable, Extension to Functions of Two Variables.

Image Restoration and Reconstruction


A Model of the Image Degradation / Restoration Process, Noise Models, Restoration in the
Presence of Noise Only- Spatial Filtering, Periodic Noise Reduction by Frequency Domain
Filtering, Bandreject Filters, Bandpass Filters, Notch Filters, Estimating the Degradation Function,
Inverse Filtering, Minimum Mean Square Error(Wiener) Filtering, Geometric Mean Filter

Morphological Image Processing


Preliminaries, Erosion and Dilation, Opening and Closing, The Hit-or-Miss Transformation, Some
Basic Morphological Algorithms, Boundary Extraction, Hole Filling, Extraction of Connected
Components, Convex Hull, Thinning, Thickening, Skeletons, Pruning, Morphological
Reconstruction
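
Erosion, dilation, opening and closing can be sketched for a binary image with a 3x3 square structuring element. The choice of structuring element and the convention that pixels outside the image count as 0 are assumptions made for this illustration.

```python
def window(image, r, c):
    """Values in the 3x3 window centred at (r, c); outside the image counts as 0."""
    rows, cols = len(image), len(image[0])
    return [image[i][j] if 0 <= i < rows and 0 <= j < cols else 0
            for i in range(r - 1, r + 2)
            for j in range(c - 1, c + 2)]

def erode(image):
    """Keep a pixel only if every pixel in its 3x3 window is set."""
    return [[int(all(window(image, r, c))) for c in range(len(image[0]))]
            for r in range(len(image))]

def dilate(image):
    """Set a pixel if any pixel in its 3x3 window is set."""
    return [[int(any(window(image, r, c))) for c in range(len(image[0]))]
            for r in range(len(image))]

def opening(image):
    """Erosion followed by dilation: removes small bright specks."""
    return dilate(erode(image))

def closing(image):
    """Dilation followed by erosion: fills small dark holes."""
    return erode(dilate(image))

square = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
for row in erode(square):
    print(row)
```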

Image Segmentation
Fundamentals, Point, Line, and Edge Detection, Background, Detection of Isolated Points, Line
Detection
Edge Models, Basic Edge Detection, Edge Linking and Boundary Detection, Thresholding,
Foundation, Basic Global Thresholding, Optimum Global Thresholding Using Otsu's Method.
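
Otsu's method picks the global threshold that maximizes the between-class variance of the two resulting pixel groups. A minimal sketch, assuming integer intensities in the range 0 to `levels - 1` and the convention that pixels less than or equal to the threshold form the first class:

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximizes between-class variance."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_between = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(levels):
        w0 += hist[t]          # pixel count in class 0 (intensity <= t)
        sum0 += t * hist[t]    # intensity sum of class 0
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue           # one class is empty, so there is no split
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2  # proportional to the variance
        if between > best_between:
            best_t, best_between = t, between
    return best_t

# Hypothetical bimodal image: five dark and five bright pixels.
pixels = [10] * 5 + [200] * 5
print(otsu_threshold(pixels))
```

When several thresholds give the same score, this sketch returns the lowest one; any value between the two intensity groups separates them equally well.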
Learning Resources:

Reference Books
B1: Cay S. Horstmann and Gary Cornell, Core Java, Volume 1 and Volume 2.
B2: Herbert Schildt, The Complete Reference Java 2, Fifth Edition (TMH).

Pedagogy:
Participative learning, discussions, algorithm, Flowchart & Program writing, experiential learning
through practical problem solving, assignment, PowerPoint presentation.

Assessment Scheme:
Class Continuous Assessment (CCA) 50 Marks

Assignments Test Presentations Case study MCQ Oral Any other

Syllabus:

Module No. | Contents | Workload in Hrs: Theory | Lab | Assess
1 | Introduction [3]: What is Image Processing?, The origins of Image Processing, Examples of Fields that use Image Processing, Gamma-Ray Imaging, X-Ray Imaging, Imaging in the Ultraviolet Band, Imaging in the Visible and Infrared Bands, Imaging in the Microwave Band, Imaging in the Radio Band, Fundamental steps in Digital Image Processing, Components of an Image Processing System | 4 | - | -
2 | Digital Image Fundamentals [6]: Elements of Visual Perception, Light and the Electromagnetic Spectrum, Image sensing and Acquisition, Image Sampling and Quantization, Some Basic Relationships between Pixels, An Introduction to the Mathematical Tools Used in Digital Image Processing, Array versus Matrix Operations, Linear versus Nonlinear Operations, Arithmetic Operations, Set and Logical Operations | 10 | - | -
3 | Intensity Transformation and Spatial Filtering [7]: Background, Some Basic Intensity Transformation Functions, Histogram Processing, Histogram Equalization, Histogram Matching (Specification), Local Histogram Processing, Fundamentals of Spatial Filtering, Smoothing Spatial Filters, Sharpening Spatial Filters, Combining Spatial Enhancement Methods | 9 | - | -
4 | Filtering in the Frequency Domain [10]: Background, Preliminary Concepts, Sampling and the Fourier Transform of Sampled Functions, The Discrete Fourier Transform (DFT) of One variable, Extension to Functions of Two Variables. | 7 | - | -
5 | Image Restoration and Reconstruction [6]: A Model of the Image Degradation / Restoration Process, Noise Models, Restoration in the Presence of Noise Only- Spatial Filtering, Periodic Noise Reduction by Frequency Domain Filtering, Bandreject Filters, Bandpass Filters, Notch Filters, Estimating the Degradation Function, Inverse Filtering, Minimum Mean Square Error (Wiener) Filtering, Geometric Mean Filter | 7 | - | -
6 | Morphological Image Processing [5]: Preliminaries, Erosion and Dilation, Opening and Closing, The Hit-or-Miss Transformation, Some Basic Morphological Algorithms, Boundary Extraction, Hole Filling, Extraction of Connected Components, Convex Hull, Thinning, Thickening, Skeletons, Pruning, Morphological Reconstruction | 8 | - | -
7 | Image Segmentation [7]: Fundamentals, Point, Line, and Edge Detection, Background, Detection of Isolated Points, Line Detection, Edge Models, Basic Edge Detection, Edge Linking and Boundary Detection, Thresholding, Foundation, Basic Global Thresholding, Optimum Global Thresholding Using Otsu's Method. | | - | -

Prepared By Checked By Approved By

Nilesh Magar Pradnya Mahadik Dr. Sudhir Gavhane


Assistant professor Course Coordinator Dean LASC
COURSE STRUCTURE

Course Code MIT-WPU-MBD-2201


Course Category Core Big Data Analytics
Course Title Natural Language Processing
Teaching Scheme and Credits L T Laboratory Credits
Weekly load hrs 3 -- -- 3
Pre-requisites:
1. Linear algebra
2. Probability & Statistics
3. Artificial Intelligence and Neural Networks
Course Objectives: To understand natural language processing, algorithms, structures and
meanings
Course Outcomes:
1. Students will understand Word forms.
2. Students will understand structures.
3. Students will understand meaning processing.
Course Contents

Introduction to Natural Language Processing


Brief History and introduction about Natural Language Processing

ML basics
Algorithms, Naïve Bayes, Bayesian Statistics, HMM, CRF
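
As one illustration of how an HMM is used for tagging, the Viterbi algorithm finds the most probable hidden-state sequence for a sequence of observed words. The two-tag model below, with its probabilities, is entirely made up for illustration and is not taken from the course material.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable hidden-state sequence for an HMM given as nested dicts."""
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Best previous state from which to arrive in state s.
            prev = max(states, key=lambda p: best[-1][p] * trans_p[p][s])
            scores[s] = best[-1][prev] * trans_p[prev][s] * emit_p[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    return list(reversed(path))

# Hypothetical toy tagging model.
states = ["NOUN", "VERB"]
start_p = {"NOUN": 0.7, "VERB": 0.3}
trans_p = {"NOUN": {"NOUN": 0.3, "VERB": 0.7},
           "VERB": {"NOUN": 0.6, "VERB": 0.4}}
emit_p = {"NOUN": {"dogs": 0.8, "bark": 0.2},
          "VERB": {"dogs": 0.1, "bark": 0.9}}
print(viterbi(["dogs", "bark"], states, start_p, trans_p, emit_p))
```

For long sentences the products become very small, so practical implementations usually add log-probabilities instead of multiplying probabilities.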

Word Forms
POS tagging and Chunking: Morphology fundamentals; Morphological Diversity of Indian
Languages; Morphology Paradigms; Finite State Machine Based Morphology; Automatic
Morphology Learning; Shallow Parsing; Named Entities; Maximum Entropy Models; Random
Fields, POS tagging techniques, Chunking techniques:CRF.

Structures
Theories of Parsing, Parsing Algorithms; Robust and Scalable Parsing on Noisy Text as in Web
documents; dependency parsing; Hybrid of Rule Based and Probabilistic Parsing: MST, MALT
parser; Scope Ambiguity and Attachment Ambiguity resolution.
Meaning
Lexical Knowledge Networks, Wordnet Theory; Indian Language Wordnets and Multilingual
Dictionaries; Semantic Roles; Word Sense Disambiguation; WSD and Multilinguality; Metaphors;
Co-references.

Learning Resources:

Reference Books:
1. Allen, James, “Natural Language Understanding”, Second Edition, Benjamin/Cummings, 1995.
2. Charniak, Eugene, “Statistical Language Learning”, MIT Press, 1993.
3. Jurafsky, Dan and Martin, James, “Speech and Language Processing”, Second Edition, Prentice
Hall, 2008.
4. Manning, Christopher and Schütze, Hinrich, “Foundations of Statistical Natural Language
Processing”, MIT Press, 1999.
5. Akshar Bharati, Vineet Chaitanya, Rajeev Sangal, “Natural Language Processing: A Paninian
Perspective”

Web Resources:

Weblinks: -

MOOCs:-

Pedagogy:

Participative learning, discussions, algorithm, Flowchart & Program writing, experiential learning
through practical problem solving, assignment, PowerPoint presentation

Assessment Scheme:

Class Continuous Assessment (CCA)


Assignments Test Presentations Attendance Viva Any other
10 10 10 10 10 -

Term End Examination: 50 marks External



Syllabus:

Module No. | Contents | Workload in Hrs: Theory | Lab | Assess
1 | Introduction to Natural Language Processing: Brief History, Applications: Speech to text, story understanding, QA system, Machine Translation, Text summarization, text classification, sentiment analysis, chatterbox, challenges/Open Problems, Natural Language (NL) Characteristics and NL computing techniques, NL tasks: Segmentation, Chunking, tagging, NER, Parsing, Word Sense Disambiguation, NL Generation, Web 2.0 Applications: Sentiment Analysis; Text Entailment; Cross Lingual Information Retrieval (CLIR). | 10 | - | -
2 | ML basics: Algorithms, Naïve Bayes, Bayesian Statistics, HMM, CRF | 5 | - | -
3 | Word Forms: POS tagging and Chunking: Morphology fundamentals; Morphological Diversity of Indian Languages; Morphology Paradigms; Finite State Machine Based Morphology; Automatic Morphology Learning; Shallow Parsing; Named Entities; Maximum Entropy Models; Random Fields, POS tagging techniques, Chunking techniques: CRF. | 10 | - | -
4 | Structures: Theories of Parsing, Parsing Algorithms; Robust and Scalable Parsing on Noisy Text as in Web documents; dependency parsing; Hybrid of Rule Based and Probabilistic Parsing: MST, MALT parser; Scope Ambiguity and Attachment Ambiguity resolution. | 10 | - | -
5 | Meaning: Lexical Knowledge Networks, Wordnet Theory; Indian Language Wordnets and Multilingual Dictionaries; Semantic Roles; Word Sense Disambiguation; WSD and Multilinguality; Metaphors; Coreferences. | 10 | |

Prepared By: Mr. Sameer Kakade, Asst. Professor
Checked By: Ms. Pradnya Mahadik, BOS Chairman
Approved By: Dr. Sudhir Gavhane, Dean, LASC

COURSE STRUCTURE

Course Code: MIT-WPU-MBD-2202
Course Category: Core Big Data Analytics
Course Title: Web & Social Intelligence
Teaching Scheme and Credits:   L    T    Laboratory    Credits
Weekly load hrs:               4    -    -             3
Pre-requisites:
Knowledge of any scripting language, XML and cloud

Course Objectives:

Organizations worldwide are recognizing the opportunity that the web and social media offer for
meeting business objectives across Sales, Marketing, CRM, Product Development, and Research.
This has created an ever-increasing demand for skilled Web Analytics professionals. The
objective of this course is to help meet that demand.

Course Outcomes:
After taking this course, you will be able to:
• Utilize various Application Programming Interface (API) services to collect data from
different social media sources such as YouTube, Twitter, and Flickr.
• Process the collected data (primarily structured) using methods involving correlation,
regression, and classification to derive insights about the sources and people who generated
that data.
• Analyze unstructured data (primarily textual comments) for sentiments expressed in them.
• Use different tools for collecting, analyzing, and exploring social media data for research
and development purposes.
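
Illustration (sentiment in textual comments): a minimal sketch of lexicon-based sentiment scoring, assuming a tiny hand-made word list. The word sets, function names, and sample comments are hypothetical; practical systems use much larger lexicons or trained classifiers.

```python
# Tiny illustrative lexicons (assumption, not a published resource).
POSITIVE = {"good", "great", "love", "excellent", "helpful"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "boring"}


def sentiment_score(comment):
    """Count positive words minus negative words in a comment."""
    words = [w.strip(".,;:!?\"'").lower() for w in comment.split()]
    positives = sum(w in POSITIVE for w in words)
    negatives = sum(w in NEGATIVE for w in words)
    return positives - negatives


def sentiment_label(comment):
    """Map the score to a coarse label."""
    score = sentiment_score(comment)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for text in ["Great video, I love it!", "Terrible audio and boring content.", "Uploaded yesterday."]:
    print(text, "->", sentiment_label(text))
```

This approach ignores negation ("not good") and sarcasm, which is one reason the course treats sentiment analysis as its own topic.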

Course Contents:

Introduction to web analytics


What’s analysis?
Getting started with Google Analytics

Google Analytics
Getting started with Google Analytics
How Google Analytics works
Accounts, profiles, and users
Navigating Google Analytics
Content performance analysis
Pages and Landing Pages
Event Tracking and AdSense
Site Search

Visitor analysis
Unique visitors
Geographic and language information
Technical reports
Benchmarking
Social media analytics
Facebook Insights
Twitter analytics
YouTube analytics
Social Ad analytics / ROI measurement
Social & CRM Analysis
Radian6
Sentiment analysis
Workflow management
Text analytics

Learning Resources:
Reference Books:
1. Avinash Kaushik, “Web Analytics 2.0”.
2. Avinash Kaushik, “Web Analytics: An Hour a Day”.
(Avinash Kaushik is Digital Marketing Evangelist for Google and Co-Founder and Chief Education
Officer of Market Motive.)
Supplementary Reading:
Weblinks:

Pedagogy:
Participative learning, discussions, Problem Solving, experiential learning through practical
problem solving, assignment, PowerPoint presentation

Assessment Scheme:
Class Continuous Assessment (CCA): 50 Marks

Assignments Test Presentations Case study MCQ Oral Attendance


20 10 10 - - 10

Term End Examination : 50 Marks

Syllabus:

Module 1: Introduction to web analytics (Theory: 5 hrs, Lab: -, Assess: -)
What’s analysis?
Is analysis worth the effort?
• Small businesses
• Medium and large scale businesses
Analysis vs intuition
What is web analytics?
Getting started with Google Analytics
• How Google Analytics works
• Accounts, profiles, and users

Module 2: Google Analytics (Theory: 7 hrs, Lab: -, Assess: 1)
Getting started with Google Analytics
How Google Analytics works
Accounts, profiles, and users
Navigating Google Analytics
Basic metrics
The main sections of Google Analytics reports
Traffic Sources
Direct, referring, and search traffic
Campaigns
AdWords, AdSense

Module 3: Content performance analysis (Theory: 7 hrs, Lab: -, Assess: 1)
Pages and Landing Pages
Event Tracking and AdSense
Site Search

Module 4: Visitor analysis (Theory: 6 hrs, Lab: -, Assess: 1)
Unique visitors
Geographic and language information
Technical reports
Benchmarking

Module 5: Social media analytics (Theory: 8 hrs, Lab: -, Assess: 1)
Facebook Insights
Twitter analytics
YouTube analytics
Social Ad analytics / ROI measurement

Module 6: Social & CRM Analysis (Theory: 7 hrs, Lab: -, Assess: 1)
Radian6
Sentiment analysis
Workflow management
Text analytics

Prepared By: Ms. Smita Patil, Assistant Professor
Checked By: Pradnya Mahadik, BOS Chairman
Approved By: Dr. Sudhir Gavhane, Dean, LASC

COURSE STRUCTURE

Course Code: MIT-WPU-MBD-2203
Course Category: Core Big Data Analytics
Course Title: Cloud Computing
Teaching Scheme and Credits:   L    T    Laboratory    Credits
Weekly load hrs:               --   04   --            03
Pre-requisites:
1. Basic understanding of Distributed Computing
2. Basic understanding of networking: VLAN, IP addressing (Class A, B, C), VNET, subnets, an
introduction to RFC 1918, and how DNS systems work in general
3. Cloud Storage Systems
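
Illustration (IP addressing pre-requisite): a minimal sketch using Python's standard `ipaddress` module to test whether an IPv4 address falls in one of the three private ranges reserved by RFC 1918, and to report its historical classful category from the first octet. The helper names `is_rfc1918` and `classful_class` are assumptions made for this example.

```python
import ipaddress

# The three private IPv4 blocks reserved by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address):
    """True if the IPv4 address lies in an RFC 1918 private block."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)


def classful_class(address):
    """Historical classful category, decided by the first octet."""
    first_octet = int(address.split(".")[0])
    if first_octet < 128:
        return "A"
    if first_octet < 192:
        return "B"
    if first_octet < 224:
        return "C"
    return "D/E"


for addr in ["10.1.2.3", "172.31.255.255", "172.32.0.1", "192.168.1.1", "8.8.8.8"]:
    print(addr, classful_class(addr), is_rfc1918(addr))

# A /24 subnet holds 256 addresses (including network and broadcast).
print(ipaddress.ip_network("192.168.1.0/24").num_addresses)
```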
Course Objectives:
This course traces the evolution of cloud computing and the services available today, which
may lead to the design and development of a simple cloud service. It also focuses on some
key challenges and issues around cloud computing.
Course Outcomes:
After successfully completing this course, students should be able to:

• Articulate the main concepts, key technologies, strengths, and limitations of cloud
computing and the possible applications for state-of-the-art cloud computing.
• Identify the architecture and infrastructure of cloud computing, including SaaS, PaaS, IaaS,
public cloud, private cloud, hybrid cloud, etc.
• Explain the core issues of cloud computing such as security, privacy, and interoperability.
• Choose the appropriate technologies, algorithms, and approaches for the related issues.
• Identify problems, and explain, analyze, and evaluate various cloud computing solutions.
• Provide the appropriate cloud computing solutions and recommendations according to the
applications used.
• Attempt to generate new ideas and innovations in cloud computing.
• Collaboratively research and write a research paper, and present the research online.

Course Contents:

INTRODUCTION
Introduction of Cloud

CLOUD SERVICES
Types of Cloud services
Service providers: Google, Amazon, Microsoft Azure, IBM, Salesforce

COLLABORATING USING CLOUD SERVICES

Email Communication over the Cloud - CRM Management - Project Management - Event
Management - Task Management – Calendar - Schedules - Word Processing – Presentation -
Spreadsheet - Databases – Desktop - Social Networks and Groupware

VIRTUALIZATION FOR CLOUD
Need for Virtualization – Pros and cons of Virtualization – Types of Virtualization – System
VM, Process VM, Virtual Machine Monitor – Virtual machine properties - Interpretation and
Binary translation, HLL VM - Hypervisors – Xen, KVM, VMware, VirtualBox, Hyper-V.

SECURITY, STANDARDS AND APPLICATIONS


Security in Clouds: Cloud security challenges – Software as a Service Security. Common
Standards: The Open Cloud Consortium – The Distributed Management Task Force –
Standards for Application Developers – Standards for Messaging – Standards for Security.
End user access to cloud computing, Mobile Internet devices and the cloud.
Learning Resources:
TEXT BOOKS:
1. John Rittinghouse and James Ransome, Cloud Computing: Implementation, Management,
and Security, CRC Press, 2010.
2. Michael Miller, Cloud Computing: Web-Based Applications That Change the Way You
Work and Collaborate Online, Que Publishing, August 2008.
3. James E. Smith and Ravi Nair, Virtual Machines, Morgan Kaufmann Publishers, 2006.
REFERENCES:
1. David E. Y. Sarna, Implementing and Developing Cloud Computing Applications, CRC Press,
2011.
2. Lee Badger, Tim Grance, Robert Patt-Corner, Jeff Voas, Draft Cloud Computing Synopsis
and Recommendations, NIST, May 2011.
3. Anthony T. Velte, Toby J. Velte, Robert Elsenpeter, Cloud Computing: A Practical
Approach, Tata McGraw-Hill, 2010.
4. Haley Beard, Best Practices for Managing and Measuring Processes for On-demand
Computing, Applications and Data Centers in the Cloud with SLAs, Emereo Pty Limited,
July 2008.
5. G. J. Popek and R. P. Goldberg, Formal Requirements for Virtualizable Third Generation
Architectures, Communications of the ACM, Vol. 17, No. 7, July 1974.

Pedagogy:
Participative learning, discussions, algorithm, Flowchart & Program writing, experiential learning
through practical problem solving, assignment, PowerPoint presentation.
Assessment Scheme:
Class Continuous Assessment (CCA) 50 Marks
Assignments Test Presentations Case study Attendance Oral Any other
10 10 10 10 10 - -

Term End Examination : 50 Marks of external Examination

Syllabus:

Module 1: INTRODUCTION (Theory: 9 hrs, Lab: -, Assess: -)
Cloud: definition, benefits, usage scenarios - History of Cloud Computing - Cloud
Architecture - Types of Clouds - Business models around Clouds – Major Players in Cloud
Computing - Issues in Clouds - Eucalyptus - Nimbus - OpenNebula, CloudSim.

Module 2: CLOUD SERVICES (Theory: 9 hrs, Lab: -, Assess: -)
Types of Cloud services: Software as a Service - Platform as a Service – Infrastructure as
a Service - Database as a Service - Monitoring as a Service – Communication as a Service.
Service providers: Google, Amazon, Microsoft Azure, IBM, Salesforce

Module 3: COLLABORATING USING CLOUD SERVICES (Theory: 9 hrs, Lab: -, Assess: -)
Email Communication over the Cloud - CRM Management - Project Management - Event
Management - Task Management – Calendar - Schedules - Word Processing – Presentation -
Spreadsheet - Databases – Desktop - Social Networks and Groupware

Module 4: VIRTUALIZATION FOR CLOUD (Theory: 9 hrs, Lab: -, Assess: -)
Need for Virtualization – Pros and cons of Virtualization – Types of Virtualization – System
VM, Process VM, Virtual Machine Monitor – Virtual machine properties - Interpretation and
Binary translation, HLL VM - Hypervisors – Xen, KVM, VMware, VirtualBox, Hyper-V.

Module 5: SECURITY, STANDARDS AND APPLICATIONS (Theory: 9 hrs, Lab: -)
Security in Clouds: Cloud security challenges – Software as a Service Security. Common
Standards: The Open Cloud Consortium – The Distributed Management Task Force –
Standards for Application Developers – Standards for Messaging – Standards for Security.
End user access to cloud computing, Mobile Internet devices and the cloud.

Prepared By: Nilesh Magar, Assistant Professor
Checked By: Pradnya Mahadik, Course Coordinator
Approved By: Dr. Sudhir Gavhane, Dean LASC

COURSE STRUCTURE

Course Code: MIT-WPU-MBD-2204
Course Category: Lab Big Data Analytics
Course Title: Web & Social Intelligence
Teaching Scheme and Credits:   L    T    Laboratory    Credits
Weekly load hrs:               4    -    -             3
Pre-requisites:

Course Objectives:

Organizations worldwide are recognizing the opportunity that the web and social media offer for
meeting business objectives across Sales, Marketing, CRM, Product Development, and Research.
This has created an ever-increasing demand for skilled Web Analytics professionals. The
objective of this course is to help meet that demand.

Course Outcomes:
After taking this course, you will be able to:
• Utilize various Application Programming Interface (API) services to collect data from
different social media sources such as YouTube, Twitter, and Flickr.
• Process the collected data (primarily structured) using methods involving correlation,
regression, and classification to derive insights about the sources and people who generated
that data.
• Analyze unstructured data (primarily textual comments) for sentiments expressed in them.
• Use different tools for collecting, analyzing, and exploring social media data for research
and development purposes.

Course Contents:

Introduction to web analytics


What’s analysis?
Getting started with Google Analytics

Google Analytics
Getting started with Google Analytics
How Google Analytics works
Accounts, profiles, and users
Navigating Google Analytics
Content performance analysis
Pages and Landing Pages
Event Tracking and AdSense
Site Search
Visitor analysis
Unique visitors
Geographic and language information
Technical reports
Benchmarking
Social media analytics
Facebook Insights
Twitter analytics
YouTube analytics
Social Ad analytics / ROI measurement
Social & CRM Analysis
Radian6
Sentiment analysis
Workflow management
Text analytics
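
Illustration (visitor and landing-page analysis): a minimal lab-style sketch of how "unique visitors" and "landing pages" can be derived from a raw hit log. The log layout `(visitor_id, session_id, page)`, the sample rows, and the function names are hypothetical; this shows the idea only and is not how Google Analytics exposes its data.

```python
from collections import Counter

# Hypothetical hit log, already in time order: (visitor_id, session_id, page).
HITS = [
    ("v1", "s1", "/home"),
    ("v1", "s1", "/pricing"),
    ("v2", "s2", "/blog/post-1"),
    ("v1", "s3", "/pricing"),
    ("v3", "s4", "/home"),
]


def unique_visitors(hits):
    """Number of distinct visitor ids seen in the log."""
    return len({visitor for visitor, _, _ in hits})


def landing_pages(hits):
    """Count sessions by their first page (the landing page)."""
    first_page = {}
    for _, session, page in hits:
        # setdefault keeps only the first page recorded for each session.
        first_page.setdefault(session, page)
    return Counter(first_page.values())


print("Unique visitors:", unique_visitors(HITS))
print("Landing pages:", landing_pages(HITS).most_common())
```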

Learning Resources:
Reference Books:
1. Avinash Kaushik, “Web Analytics 2.0”.
2. Avinash Kaushik, “Web Analytics: An Hour a Day”.
(Avinash Kaushik is Digital Marketing Evangelist for Google and Co-Founder and Chief Education
Officer of Market Motive.)
Supplementary Reading:
Weblinks:

Pedagogy:
Participative learning, discussions, Problem Solving, experiential learning through practical
problem solving, assignment, PowerPoint presentation



Assessment Scheme:
Class Continuous Assessment (CCA): 50 Marks

Assignments Test Presentations Case study MCQ Oral Attendance


20 10 10 - - 10

Term End Examination : 50 Marks

Syllabus:

Module 1: Introduction to web analytics (Theory: 5 hrs, Lab: -, Assess: -)
What’s analysis?
Is analysis worth the effort?
• Small businesses
• Medium and large scale businesses
Analysis vs intuition
What is web analytics?
Getting started with Google Analytics
• How Google Analytics works
• Accounts, profiles, and users

Module 2: Google Analytics (Theory: 7 hrs, Lab: -, Assess: -)
Getting started with Google Analytics
How Google Analytics works
Accounts, profiles, and users
Navigating Google Analytics
Basic metrics
The main sections of Google Analytics reports
Traffic Sources
Direct, referring, and search traffic
Campaigns
AdWords, AdSense

Module 3: Content performance analysis (Theory: 7 hrs, Lab: -, Assess: 1)
Pages and Landing Pages
Event Tracking and AdSense
Site Search

Module 4: Visitor analysis (Theory: 6 hrs, Lab: -, Assess: 1)
Unique visitors
Geographic and language information
Technical reports
Benchmarking

Module 5: Social media analytics (Theory: 8 hrs, Lab: -, Assess: 1)
Facebook Insights
Twitter analytics
YouTube analytics
Social Ad analytics / ROI measurement

Module 6: Social & CRM Analysis (Theory: 7 hrs, Lab: -, Assess: 1)
Radian6
Sentiment analysis
Workflow management
Text analytics

Prepared By: Ms. Smita Patil, Assistant Professor
Checked By: Pradnya Mahadik, BOS Chairman
Approved By: Dr. Sudhir Gavhane, Dean, LASC

COURSE STRUCTURE

Course Code: MIT-WPU-MBD-2206
Course Category: Elective Big Data Analytics
Course Title: Marketing Analytics
Teaching Scheme and Credits:   L    T    Laboratory    Credits
Weekly load hrs:               4    --   --            3
Pre-requisites:

Course Objectives:
1. This course focuses on developing marketing strategies and resource allocation
decisions driven by quantitative analysis.
2. This course covers basic concepts of the marketing process and metrics for measuring
brand assets.
3. This course includes Customer Lifetime Value, regression analysis, and spreadsheets
with formulas.
Course Outcomes:
1. Students will know the basic marketing strategies.
2. Students will learn the core concepts and tools in marketing.
3. Students will know how to measure and calculate brand value.
4. Students will understand the marketing models.

Course Contents

The Marketing Process


What is the marketing process, and what are its strategic challenges? How can marketing
strategy be built with data using text analytics? How to utilize data to improve marketing
strategies?

Metrics for Measuring Brand Assets


What are metrics for measuring brand assets? What does Snapple show about brand value?
How to develop brand personality, develop brand architecture, build a brand pyramid, and
measure and calculate brand value?

Customer Lifetime Value


What is Customer Lifetime Value (CLV)? How to calculate CLV, understand the CLV Formula,
apply the CLV Formula, extend the CLV Formula, use CLV to make decisions?
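
Illustration (CLV formula): a minimal sketch of one commonly used simplified form, CLV = m × r / (1 + d − r). It assumes a constant margin m per period, a constant retention rate r, a constant discount rate d, an infinite horizon, and the first margin arriving at the end of period 1. These assumptions and the function names are for illustration only; other timing conventions give slightly different closed forms. The second function checks the closed form against the discounted sum it comes from.

```python
def clv_closed_form(margin, retention, discount):
    """CLV = m * r / (1 + d - r) under the assumptions stated above."""
    if not 0 <= retention <= 1 or discount <= 0:
        raise ValueError("expected 0 <= retention <= 1 and discount > 0")
    return margin * retention / (1 + discount - retention)


def clv_by_summation(margin, retention, discount, periods=1000):
    """Discounted expected margins, period by period (truncated sum)."""
    return sum(
        margin * retention ** t / (1 + discount) ** t
        for t in range(1, periods + 1)
    )


# Hypothetical inputs: 100 margin per year, 80% retention, 10% discount rate.
closed = clv_closed_form(100, 0.8, 0.1)
summed = clv_by_summation(100, 0.8, 0.1)
print(round(closed, 2), round(summed, 2))
```

Raising retention increases CLV quickly because r appears in both the numerator and the denominator, which is why retention is a common lever when using CLV to make decisions.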

Marketing Experiments
How are spreadsheets with formulas used? How to determine cause and effect through
experiments? How to design basic experiments, before-and-after experiments, and full factorial
web experiments? How to calculate projected lift?
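
Illustration (full factorial design and lift): a minimal sketch, assuming a hypothetical web experiment with three two-level factors and made-up conversion counts. A full factorial design tests every combination of factor levels; lift is computed here as the relative change in conversion rate of a treatment over the control, which is one common definition.

```python
from itertools import product


def full_factorial(factors):
    """Every combination of factor levels, as a list of dicts."""
    names = list(factors)
    return [
        dict(zip(names, combo))
        for combo in product(*(factors[name] for name in names))
    ]


def lift(control_conversions, control_visitors, treatment_conversions, treatment_visitors):
    """Relative change in conversion rate: (treatment - control) / control."""
    control_rate = control_conversions / control_visitors
    treatment_rate = treatment_conversions / treatment_visitors
    return (treatment_rate - control_rate) / control_rate


# Hypothetical factors for a landing-page test: 2 x 2 x 2 = 8 cells.
design = full_factorial({
    "headline": ["A", "B"],
    "button": ["red", "green"],
    "price": [9, 12],
})
print(len(design), design[0])

# Hypothetical counts: control 50/1000, treatment 65/1000.
print(f"{lift(50, 1000, 65, 1000):.1%}")
```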

Regression Basics
What is regression analysis? How to interpret regression outputs? What are multivariable
regressions and omitted variable bias? How to use price elasticity to evaluate marketing? What
are log-log models and marketing mix models?
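
Illustration (log-log model and price elasticity): in a log-log regression ln(Q) = a + b·ln(P), the slope b is the price elasticity of demand. The sketch below fits that line by ordinary least squares in plain Python. The data are synthetic, generated from Q = 1000·P^(-2), so the fitted elasticity should come out at about -2; the function names are assumptions made for this example.

```python
import math


def ols_slope_intercept(xs, ys):
    """Simple linear regression y = intercept + slope * x by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def price_elasticity(prices, quantities):
    """Slope of ln(quantity) regressed on ln(price)."""
    log_p = [math.log(p) for p in prices]
    log_q = [math.log(q) for q in quantities]
    slope, _ = ols_slope_intercept(log_p, log_q)
    return slope


# Synthetic demand data with a known elasticity of -2 (no noise).
prices = [1.0, 2.0, 4.0, 5.0, 8.0]
quantities = [1000 * p ** -2 for p in prices]
print(round(price_elasticity(prices, quantities), 6))
```

An elasticity of -2 means a 1% price increase is associated with roughly a 2% drop in quantity sold. With real sales data, leaving out a relevant driver such as advertising would bias this slope, which is the omitted variable bias listed above.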

Learning Resources:
1. Ashok Charan (NUS, Singapore), Marketing Analytics: A Practitioner's Guide to Marketing
Analytics and Research Methods.
2. Dilip Soman and Sara N-Marandi (University of Toronto, Canada), Managing Customer Value:
One Stage at a Time.
3. Luiz Moutinho (Dublin City University, Ireland), Worldwide Casebook in Marketing
Management.
4. Mark Jeffery, Data-Driven Marketing: The 15 Metrics Everyone in Marketing Should Know,
2010.
5. Alistair Croll and Benjamin Yoskovitz, Lean Analytics: Use Data to Build a Better Startup
Faster (Lean Series), 2013.
6. Chuck Hemann and Ken Burbary, Digital Marketing Analytics: Making Sense of Consumer
Data in a Digital World (Que Biz-Tech), 2013.

Pedagogy:

Participative learning, discussions, algorithm, Program writing, experiential learning through
practical problem-solving, assignment, PowerPoint presentation

Assessment Scheme:

Class Continuous Assessment (CCA)


Assignments Test Case study-1 Attendance Case study-2 Any other
10 10 10 10 10 -

Term End Examination : 50 marks External

Syllabus:

Module 1: The Marketing Process (Theory: 7 hrs, Lab: -, Assess: -)
Introduction to the Marketing Process, Marketing Process, Strategic Challenge, Marketing
Strategy with Data, Using Text Analytics, Utilizing Data to Improve Marketing Strategy,
Improving the Marketing Process with Analytics, case study

Module 2: Metrics for Measuring Brand Assets (Theory: 10 hrs, Lab: -, Assess: -)
Intro to Metrics for Measuring Brand Assets, Snapple and Brand Value, Developing Brand
Personality, Developing Brand Architecture, Brand Pyramid, Measuring Brand Value, Revenue
Premium as a Measure of Brand Equity, Calculating Brand Value, case study

Module 3: Customer Lifetime Value (Theory: 10 hrs, Lab: -, Assess: -)
Customer Lifetime Value (CLV), Calculating CLV, Understanding the CLV Formula, Applying the
CLV Formula, Extending the CLV Formula, Using CLV to Make Decisions, CLV: A Forward
Looking Measure, case study

Module 4: Marketing Experiments (Theory: 10 hrs, Lab: -, Assess: -)
Spreadsheet with Formulas, Determining Cause and Effect through Experiments, Designing Basic
Experiments, Designing Before - After Experiments, Designing Full Factorial Web Experiments,
Designing an Experiment, Analyzing an Experiment, Projecting Lift, Calculating Projected Lift,
Pitfalls of Marketing Experiments, Maximizing Effectiveness, case study

Module 5: Regression Basics (Theory: 8 hrs)
Using Regression Analysis, What Regressions Reveal, Interpreting Regression Outputs,
Multivariable Regressions, Omitted Variable Bias, Using Price Elasticity to Evaluate
Marketing, Understanding Log-Log Models, Marketing Mix Models

Prepared By: Archana Varade, Assistant Professor
Checked By: Pradnya Mahadik, BOS Chairman
Approved By: Dr. Sudhir Gavhane, Dean


COURSE STRUCTURE

Course Code: MIT-WPU-MBD-2207
Course Category: Elective Big Data Analytics
Course Title: Human Resource Analytics
Teaching Scheme and Credits:   L    T    Laboratory    Credits
Weekly load hrs:               4    -    -             3
Pre-requisites:
Knowledge of any scripting language, XML.

Course Objectives:
1. To introduce the use of data analytics techniques in HR.

Course Outcomes:
1. Students will be able to use data analytics techniques in HR.

Course Contents:
HR Analytics in perspective
Introduction to role of data analytics in HR

A day in the life of HR


Introduction to daily activities of HR using case study

An analytics method
Challenges in HR and their solutions using data analytics

Hands-on introduction to HRA


A practical approach to collect and clean data required for HRA.

Toolkits
Introduction to various toolkits required for HRA.

Data challenges
Introduction to statistical methods for processing of data.
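
Illustration (correlation and R²): a minimal sketch of the Pearson correlation coefficient r and its square, computed in plain Python. The data (training hours against a performance score for five employees) are made up for this example. For a simple one-variable linear regression, R² equals r².

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_squared(xs, ys):
    """Share of variance in ys explained by a straight-line fit on xs."""
    return pearson_r(xs, ys) ** 2


# Hypothetical HR data: training hours and performance scores.
training_hours = [1, 2, 3, 4, 5]
performance = [2, 4, 5, 4, 5]
print(round(pearson_r(training_hours, performance), 4))
print(round(r_squared(training_hours, performance), 4))
```

A high correlation here would not show that training causes better performance; separating correlation from causation needs an experiment or a stronger study design.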


Making HR data operational


Use of HR data for analysis

Predictive analytics
Introduction to the use of predictive analysis for HR data.

Learning Resources:
Reference Books:
1. Jac Fitz-enz, The New HR Analytics: Predicting the Economic Value of Your Company's Human
2. Martin R. Edwards and Kirsten Edwards, Predictive HR Analytics: Mastering the HR Metric
3. Jac Fitz-enz and John Mattox II, Predictive Analytics for Human Resources
Supplementary Reading:

1. James C. Sesil, Applying Advanced Analytics to HR Management Decisions: Methods for
Selection, Developing Incentives and Improving Collaboration, First Edition (English,
Paperback)
Web Resources:

Weblinks:

1. MOOCs:

Pedagogy: Participative learning, discussions, Problem Solving, experiential learning through
practical problem solving, assignment, PowerPoint presentation

Assessment Scheme:
Class Continuous Assessment (CCA) 50 Marks
Assignments Test Presentations Case study MCQ Oral Attendance
20 10 10 - - 10
Term End Examination : 50 Marks


Syllabus:

Module 1: HR Analytics in perspective (Theory: 4 hrs, Lab: -, Assess: -)
Analytics roles
Defining HR Analytics
Typical problems (working session)

Module 2: A day in the life of HR (Theory: 3 hrs, Lab: -, Assess: -)
Case Examples

Module 3: An analytics method (Theory: 5 hrs, Lab: -, Assess: -)
Understanding the organizational system (Lean)
Locating the HR challenge in the system
Valuing HR Analytics (working session)

Module 4: Hands-on introduction to HRA (Theory: 9 hrs, Lab: -, Assess: -)
Typical data sources
Typical questions faced (survey)
Typical data issues
Connecting HR Analytics to business benefit (3 x case studies)
Techniques for establishing questions
Building support and interest
Obtaining data
Cleaning data (exercise)
Supplementing data

Module 5: Toolkits (Theory: 6 hrs, Lab: -, Assess: -)
Options, advantages and disadvantages
Common toolkits: OrgVue, Tableau, Excel, Alteryx, QlikView
Practical exercises

Module 6: Data challenges (Theory: 6 hrs, Lab: -, Assess: -)
Correlation (R², ecological fallacy, 10 simple stats)
Causation

Module 7: Making HR data operational (Theory: 4 hrs, Lab: -, Assess: -)
Case examples

Module 8: Predictive analytics (Theory: 8 hrs)
When to use predictive analysis
Importance of innovation
What is “the organization as a system”?
Organization design
Process led design
Workforce planning
Transition management
Impact analysis
Communication
Real time HR Analytics



Prepared By: Ms. Punam Nikam, Assistant Professor
Checked By: Ms. Pradnya Mahadik, BOS Chairman
Approved By: Dr. Sudhir Gavhane, Dean, LASC
