Department of Computer Science: Syllabus MSC (Data Analytics) 2020-2021
Syllabus
Department Overview
Vision
The Department of Computer Science endeavors to imbibe the vision of the University
“Excellence and Service”. The department is committed to this philosophy which pervades
every aspect and functioning of the department.
Mission
“To develop IT professionals with ethical and human values”. To accomplish our mission, the
department encourages students to apply their acquired knowledge and skills towards
professional achievements in their career. The department also moulds the students to be
socially responsible and ethically sound.
Programme Objectives
To enable learners to develop knowledge and skills in current and emerging areas of
data analytics.
To critically assess and evaluate business and technical strategies for data analytics.
To demonstrate expert knowledge of data analysis, statistics, tools, techniques and
technologies of data analytics.
To develop project-management, critical-thinking, problem-solving and decision-
making skills.
To formulate and implement a novel research idea and conduct research in the field of
data analytics.
Syllabus for MSc (Data Analytics) 2020-21
Programme Outcomes
On successful completion of the MSc programme, students will be able to:
PO1: Engage in continuous reflective learning in the context of technology and scientific
advancement.
PO2: Identify the need for and scope of interdisciplinary research.
PO3: Enhance research culture and uphold scientific integrity and objectivity.
PO4: Understand professional, ethical and social responsibilities.
PO5: Understand the importance and the judicious use of technology for the sustainability of
the environment.
PO6: Enhance disciplinary competency, employability and leadership skills.
PSO1: Problem Analysis and Design: Ability to identify, analyze and design solutions for
data analytics problems using fundamental principles of mathematics, statistics, computing
sciences, and relevant domain disciplines.
PSO2: Modern software tool usage: Acquire the skills in handling data analytics
programming tools towards problem solving and solution analysis for domain specific
problems.
PSO3: Societal and Environmental Concern: Utilize the data analytics theories for societal
and environmental concerns.
PSO4: Professional Ethics: Understand and commit to professional ethics and cyber
regulations, responsibilities, and norms of professional computing practices.
PSO5: Applications in Multidisciplinary domains: Understand the role of statistical
approaches and apply the same to solve real-life problems in the fields of data analytics.
PSO6: Project Management: Apply the research-based knowledge to analyse and solve
advanced problems in data analytics.
Trimester I
Total 14 300 12
Trimester II
Total 14 300 12
Trimester III
Total 14 300 12
Trimester IV
Total 14 300 12
Trimester V
Total 14 300 12
Trimester VI
Total 16 300 12
Trimester – I
Course Objectives
To provide a strong foundation in data analytics and its application areas, and to
understand the underlying core concepts and emerging technologies in data analytics.
Course Outcomes
INTRODUCTION
Data Analytics - Types – Phases - Quality and Quantity of data – Measurement - Exploratory
data analysis - Business Intelligence.
BIG DATA
Big Data and Cloud technologies - Introduction to HADOOP: Big Data, Apache Hadoop,
MapReduce - Data Serialization - Data Extraction - Stacking Data - Dealing with data.
DATA VISUALIZATION
Introduction to data visualization – Data visualization options – Filters – Dashboard
development tools – Creating an interactive dashboard with dc.js - summary.
Essential Reading
[1] Davy Cielen, Arno D. B. Meysman, Mohamed Ali, Introducing Data Science, Manning
Publications Co., 1st edition, 2016.
[2] Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, An Introduction to
Statistical Learning: with Applications in R, Springer, 1st edition, 2013.
[3] Bart Baesens, Analytics in a Big Data World: The Essential Guide to Data Science and its
Applications, Wiley.
[4] D J Patil, Hilary Mason, Mike Loukides, Ethics and Data Science, O’Reilly, 1st edition,
2018.
Recommended Reading
[1] Dr Anil Maheshwari, Data Analytics Made Accessible, Publisher: Amazon.com Services
LLC.
[2] Joel Grus, Data Science from Scratch: First Principles with Python, O’Reilly, 1st edition,
2015.
[3] Cathy O'Neil, Rachel Schutt, Doing Data Science: Straight Talk from the Frontline,
O’Reilly, 1st edition, 2013.
[4] Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman, Mining of Massive Datasets,
Cambridge University Press, 2nd edition, 2014.
[5] Eric Siegel, Predictive Analytics The Power to Predict Who Will Click, Buy, Lie, or Die,
2nd Ed., Wiley.
Course Objectives
This course equips students to visualize and analyse data using R and to
communicate statistical results correctly.
Course Outcomes
R AND RSTUDIO
Getting started with R - installing R and RStudio - getting help - installing and loading
packages - simple arithmetic calculations - data structure – expressions - conditional statements
– functions – loops - R Markdown - introduction to Statistics - probability and data with R.
Lab Exercises
1. R program to illustrate different data structures
2. Defining functions and producing a report in R Markdown
Lab Exercises
1. Loading dataset and visualizing data
2. Producing descriptive statistics measures
evaluating the normal distribution - working with the normal distribution - binomial
distribution - normal approximation to binomial - working with the binomial distribution.
Lab Exercises
1. Computing probabilities in R
2. Functions for probability distributions in R
ESTIMATION
Introduction to Inference - sampling from population - maximum likelihood estimator - least
squares estimator - confidence interval (CI) (for a mean) - accuracy vs. precision - required
sample size for mean, CI (for the mean) examples.
Lab Exercises
1. Finding ML estimates and least squares estimates
2. Constructing confidence interval
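The confidence-interval exercise can be sketched as follows. The course itself works in R; this is only an illustrative Python sketch using the standard library. It assumes a large-sample (normal-approximation) interval, whereas a small sample would normally call for a t critical value, which the standard library does not provide. The function name `mean_confidence_interval` and the sample values are hypothetical.

```python
import math
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Large-sample (normal-approximation) confidence interval for a mean."""
    n = len(sample)
    mean = statistics.fmean(sample)
    # Sample standard deviation (n - 1 in the denominator).
    sd = statistics.stdev(sample)
    # Two-sided critical value from the standard normal distribution.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin


# Hypothetical measurements, used only for illustration.
heights = [170.2, 168.5, 172.1, 169.8, 171.4, 170.9, 167.6, 173.0]
low, high = mean_confidence_interval(heights)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

Raising `confidence` widens the interval, which is the trade-off between confidence level and precision listed in this unit.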
TESTING OF HYPOTHESIS
Introduction - hypothesis testing (HT) - decision errors - large sample and small sample tests -
inference for other estimators - significance vs. confidence level - statistical vs. practical
significance - inference for proportions.
Lab Exercises
1. Carrying out large sample tests in R
2. Small-sample tests in R: t-test and paired t-test
Essential Reading
[1] Grolemund G., Hands-on programming with R: write your own functions and simulations,
O'Reilly Media Inc., 2014.
[2] James G., Witten D., Hastie T., & Tibshirani R., An introduction to statistical learning: with
Applications in R, Springer, 2013.
Recommended Reading
[1] Gupta S. C., & Kapoor V. K., Fundamentals of Mathematical Statistics, Sultan Chand &
Sons, 2018.
[2] Peng R. D, Exploratory data analysis with R, Lulu.Com, 2012.
[3] Peng R. D, R programming for data science, Leanpub, 2016.
[4] Teetor P, R cookbook: Proven recipes for data analysis, statistics, and graphics, O'Reilly
Media Inc., 2011.
[5] Crawley M. J., The R book, John Wiley & Sons, 2012.
Course Objectives
Course Outcomes
INTRODUCTION TO PYTHON
Structure of Python Program - Underlying mechanism of Module Execution - Branching and
Looping - Problem Solving Using Branches and Loops - Functions - Lists and Mutability -
Problem Solving Using Lists and Functions.
Lab Exercises
1. Demonstrate usage of branching and looping statements
2. Demonstrate Recursive functions
3. Demonstrate Lists
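A minimal sketch of what these three exercises might look like is given below. The function names and sample data are illustrative assumptions, not part of the syllabus.

```python
def factorial(n):
    """Recursive factorial: n! = n * (n - 1)!, with 0! = 1 as the base case."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1 if n == 0 else n * factorial(n - 1)


def classify(numbers):
    """Branching inside a loop: split numbers into even and odd lists."""
    evens, odds = [], []
    for x in numbers:
        if x % 2 == 0:
            evens.append(x)
        else:
            odds.append(x)
    return evens, odds


# Lists are mutable: an alias sees later changes, a slice copy does not.
original = [1, 2, 3]
alias = original        # second name for the same list object
snapshot = original[:]  # shallow copy made before the change
original.append(4)

print(factorial(5))
print(classify([1, 2, 3, 4]))
print(alias, snapshot)
```

The last three assignments show why mutability matters when lists are passed to functions: a function that appends to its argument changes the caller's list as well.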
USING NUMPY
Basics of NumPy - Computation on NumPy - Aggregations - Computation on Arrays -
Comparisons, Masks and Boolean Arrays - Fancy Indexing - Sorting Arrays - Structured Data:
NumPy’s Structured Array.
Lab Exercises
1. Demonstrate Aggregation
2. Demonstrate Indexing and Sorting
Essential Reading
[1] Jake VanderPlas, Python Data Science Handbook - Essential Tools for Working with Data,
O’Reilly Media Inc., 2016.
[2] Zhang Y., An Introduction to Python and Computer Programming, Springer Publications,
2016.
Recommended Reading
[1] Joel Grus, Data Science from Scratch: First Principles with Python, O’Reilly Media, 2016.
[2] T. R. Padmanabhan, Programming with Python, Springer Publications, 2016.
Trimester – II
Course Objectives
Linear Algebra plays a fundamental role in the theory of Data Science. This course aims at
introducing the basic notions of vector spaces, Linear Algebra and the use of Linear Algebra in
applications to Data Science.
Course Outcomes
LINEAR MAPS
Definition of Linear Maps - Algebraic Operations on - Null spaces and Injectivity -
Range and Surjectivity - Fundamental Theorems of Linear Maps - Representing a Linear Map
by a Matrix - Invertible Linear Maps - Isomorphic Vector spaces - Linear Map as Matrix
Multiplication - Operators - Products of Vector Spaces - Product of Direct Sum - Quotients of
Vector spaces.
Essential Reading
Recommended Reading
[1] E. Davis, Linear algebra and probability for computer science applications, CRC Press,
2012.
[2] J. V. Kepner and J. R. Gilbert, Graph algorithms in the language of linear algebra, Society
for Industrial and Applied Mathematics, 2011.
[3] D. A. Simovici, Linear algebra tools for data mining, World Scientific Publishing, 2012.
[4] P. N. Klein, Coding the matrix: linear algebra through applications to computer science,
Newtonian Press, 2015.
Course Objectives
The main objective of this course is to provide fundamental knowledge of, and practical
experience with, database concepts. It covers the concepts and terminology needed to construct
database tables and write effective queries. It also helps students comprehend the data
warehouse and its functions.
Course Outcomes
INTRODUCTION
Concept & Overview of DBMS, Data Models, Database Languages, Database Administrator,
Database Users, Three Schema architecture of DBMS. Basic concepts, Design Issues, Mapping
Constraints, Keys, Entity-Relationship Diagram, Weak Entity Sets, Extended E-R features.
Lab Exercises
1. Data Definition
2. Table Creation
3. Constraints
Lab Exercises
1. Insert, Select, Update & Delete Commands
2. Nested Queries & Join Queries
3. Views
Lab Exercises
1. Importing source data structures
2. Design Target Data Structures
Lab Exercises
1. Create target structure
2. Design and build the ETL mapping
Lab Exercises
1. Perform the ETL process and transform into data map
2. Create the cube and process it
3. Generating Reports
4. Creating the Pivot table and pivot chart using some existing data
Essential Reading
[1] Henry F. Korth and Abraham Silberschatz, Database System Concepts, McGraw-Hill.
[2] Thomas Connolly and Carolyn Begg, Database Systems: A Practical Approach to Design,
Implementation and Management, Third Edition, Pearson Education, 2007.
[3] Ralph Kimball and Margy Ross, The Data Warehouse Toolkit: The Complete Guide to
Dimensional Modeling, 2nd edition, John Wiley & Sons Inc., New York, USA, 2002.
Recommended Reading
[1] Lior Rokach and Oded Maimon, Data Mining and Knowledge Discovery Handbook,
Springer, 2nd edition, 2010.
Course Objectives
To preprocess and analyze data, to choose relevant models and algorithms for respective
applications, and to develop research interest in advances in data mining.
Course Outcomes
Lab Exercises
1. Identify a dataset and preprocess it using normalization techniques
2. Explore data reduction techniques
Lab Exercises
1. Identify frequent itemsets using Apriori Algorithm
2. Generate FP Tree for a transaction dataset
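The Apriori exercise can be illustrated with a short, unoptimized sketch. It assumes transactions are given as sets of item names and that support is an absolute count. The `apriori` function and the basket data are hypothetical and not taken from the course material.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Count the transactions that contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    items = {item for t in transactions for item in t}
    frequent = count({frozenset([i]) for i in items})
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        frequent = count(candidates)
        result.update(frequent)
        k += 1
    return result


# Hypothetical market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent_itemsets = apriori(baskets, min_support=3)
for itemset, support in sorted(frequent_itemsets.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), support)
```

The prune step is what distinguishes Apriori from brute-force counting: a candidate is discarded without scanning the data if any of its subsets is already known to be infrequent.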
CLASSIFICATION TECHNIQUES
Basic Concepts – Decision Tree Induction – Bayes Classification Methods – Rule-Based
Classification – Model Evaluation and Selection – Techniques to Improve Classification
Accuracy – Bayesian Belief Networks – Classification by Backpropagation – Support Vector
Machines.
Lab Exercises
1. Construct Decision Tree for a dataset and identify the order of attributes
2. Apply Bayes Classification
CLUSTERING TECHNIQUES
Cluster Analysis – Partitioning Methods - Hierarchical Methods – Density-Based Methods
(Includes all clustering techniques under the given categories in the Text Book).
Lab Exercises
1. Demonstrate Naïve Bayes Classifier
2. Apply K-Means Clustering for given number of clusters
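For the K-Means exercise, a minimal sketch of Lloyd's algorithm in plain Python is shown below. It assumes the initial centroids are supplied by the caller, which fixes the number of clusters, and that points are tuples of coordinates. The data and the `k_means` name are illustrative only.

```python
import math


def k_means(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D points forming two visibly separate groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = k_means(points, centroids=[(0, 0), (10, 10)])
print(centroids)
print(clusters)
```

The result depends on the initial centroids. In practice they are often chosen at random and the algorithm is run several times.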
Essential Reading
[1] Jiawei Han, Micheline Kamber and Jian Pei, Data Mining: Concepts and Techniques,
Morgan Kaufmann, Third Edition, 2012.
[2] Arun K Pujari, Data Mining Techniques, Second Edition, Universities Press India Pvt. Ltd.,
2010.
Recommended Reading
[1] Daniel T. Larose and Chantal D. Larose, Data Mining and Predictive Analytics, Wiley
Series on Methods and Applications in Data Mining, Wiley Publications.
[2] Ian H. Witten, Eibe Frank and Mark A. Hall, Data Mining: Practical Machine Learning
Tools and Techniques, Morgan Kaufmann, Third Edition, 2014.
Web Resources:
[1] https://data-flair.training/blogs/data-mining-tutorial/
[2] https://www.tutorialride.com/data-mining/data-mining-tutorial.htm
Trimester - III
MDA331: Artificial Intelligence
Total Teaching Hours for Semester: 60
Max Marks: 100 Credits: 04
Course Objectives
This course aims at developing an understanding about the issues involved in defining and
simulating perception, identifying the problems where AI is required and the different methods
available, to compare and contrast different AI techniques available, to define and explain
learning algorithms and to provide the student additional experience in the analysis and
evaluation of complicated systems.
Course Outcomes
CO1: Express the modern view of AI and its foundation
CO2: Illustrate Search Strategies with algorithms and Problems
CO3: Implement Propositional logic and apply inference rules
CO4: Apply suitable techniques for NLP and Game Playing
INTRODUCTION
Introduction to AI, The Foundations of AI, AI Technique -Tic-Tac-Toe. Problem
characteristics, Production system characteristics, Production systems: 8-puzzle problem.
Searching: Uninformed search strategies – Breadth first search, depth first search.
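The two search strategies named in this unit differ only in the order in which the frontier is expanded. The sketch below assumes the state space is a small directed graph stored as an adjacency dictionary. The graph and the function names are illustrative assumptions.

```python
from collections import deque


def bfs(graph, start, goal):
    """Breadth-first search: returns a path with the fewest edges, or None."""
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()  # FIFO queue: shallowest node first
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                frontier.append(path + [neighbour])
    return None


def dfs(graph, start, goal):
    """Depth-first search: returns some path (not necessarily shortest), or None."""
    frontier = [[start]]
    visited = set()
    while frontier:
        path = frontier.pop()  # LIFO stack: deepest node first
        node = path[-1]
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        # Reverse so that neighbours are explored in their listed order.
        for neighbour in reversed(graph.get(node, [])):
            if neighbour not in visited:
                frontier.append(path + [neighbour])
    return None


# Hypothetical state space: two routes from A to G of different lengths.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["G"], "D": ["E"], "E": ["G"], "G": []}
print(bfs(graph, "A", "G"))
print(dfs(graph, "A", "G"))
```

On this graph breadth-first search finds the two-edge route through C, while depth-first search commits to the first branch and returns the longer route through B.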
KNOWLEDGE REPRESENTATION
First order logic. Inference in first order logic, propositional vs. first order inference,
unification & lifting, Clausal form conversion, Forward chaining, Backward chaining,
Resolution.
SELF LEARNING
Propositional logic - syntax & semantics
GAME PLAYING
Overview, Minimax algorithm, Alpha-Beta pruning, Additional Refinements. Probabilistic
Reasoning: Ad Hoc Methods, Expert System, Expert System Shells.
Essential Reading
[1] E. Rich and K. Knight, Artificial Intelligence, 3rd Edition, New York: TMH, 2019.
[2] S. Russell and P. Norvig, Artificial Intelligence A Modern Approach, 3rd Edition, Pearson
Education, 2019.
Recommended Reading
[1] Eugene Charniak and Drew McDermott, Introduction to Artificial Intelligence, 2nd Edition,
Singapore: Pearson Education, 2005.
[2] George F Luger, Artificial Intelligence Structures and Strategies for Complex Problem
Solving, 4th Edition. Singapore: Pearson Education, 2008, ISBN-13 9780321545893.
[3] N. J. Nilsson, Artificial Intelligence: A New Synthesis, 1st Edition, USA: Morgan
Kaufmann, 2000.
[4] Patterson, Introduction to artificial intelligence, ISBN-13: 978-0134771007.
Web Resources
[1] https://ai.google/education/
[2] https://intellipaat.com/blog/tutorial/artificial-intelligence-tutorial/
[3] https://www.javatpoint.com/artificial-intelligence-tutorial
Course Objectives
This course equips students to assess the relationship between variables in a data set and a
continuous response variable. In this course, students learn to fit simple and multiple linear
regression models using the R program.
Course Outcomes
INTRODUCTION
Introduction to regression: regression through the origin, linear least squares, regression to the
mean, basic definitions: notation for data, the empirical mean, the empirical standard deviation
and variance, normalization, empirical covariance, some facts about correlation.
Lab Exercises in R
1. Visualizing data for model fitting using R
2. Finding least squares estimates for parameters in the simple linear model
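The least squares estimates for the simple linear model y = b0 + b1·x have a closed form: b1 = S_xy / S_xx and b0 = ȳ − b1·x̄. The lab itself is run in R; the sketch below only illustrates the arithmetic in Python, with a hypothetical function name and made-up data.

```python
def least_squares(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Centered cross-product and sum of squares.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up data lying exactly on the line y = 1 + 2x.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
intercept, slope = least_squares(x, y)
print(intercept, slope)
```

Because the fitted line always passes through (x̄, ȳ), the intercept follows directly once the slope is known.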
Lab Exercises in R
1. Building a basic linear regression model for the association between a single
explanatory variable and a response variable.
2. Finding interval estimates and testing hypotheses in a simple linear model.
Lab Exercises in R
1. Building a multiple linear regression model for the association between explanatory
variables and a response variable.
2. Finding interval estimates and testing hypotheses in multiple linear models.
Lab Exercises in R
1. Residual analysis of linear regression model
2. Model selection and nested model testing
Lab Exercises in R
1. Building a logistic regression model for the categorical response variable
2. Modelling count data using a Poisson regression model.
Essential Reading
[1] Fox, J., & Weisberg, S, An R companion to applied regression, Sage publications, 2018.
[2] Caffo, B., Regression models for data science in R, Leanpub, 2015.
Recommended Reading
[1] Ciaburro, G., Regression Analysis with R: Design and develop statistical nodes to identify
unique relationships within data at scale, Packt Publishing Ltd, 2018.
[2] Sheather, S., A modern approach to regression with R, Springer Science & Business Media,
2009.
[3] Lilja, D. J., Linear Regression Using R: An Introduction to Data Modeling, University of
Minnesota Libraries Publishing, 2016.
Course Objectives
This course introduces Big Data as it arises in real-time applications and shows how it is
processed using emerging technologies. It breaks down the complexity of processing Big Data
by taking a practical approach to developing Java applications on top of the Hadoop platform.
It describes the Hadoop architecture and how to work with the Hadoop Distributed File System
(HDFS) and HBase on the Ubuntu platform.
Course Outcomes
CO1: Understand Big Data concepts in real-time scenarios
CO2: Understand the architecture of Hadoop through practical work
CO3: Apply the MapReduce concept and implement it in the cloud
INTRODUCTION
Distributed file system – Big Data and its importance, Four Vs, Drivers for Big data, Big data
analytics, Big data applications, Algorithms using map reduce, Matrix-Vector Multiplication
by Map Reduce.
Apache Hadoop - Moving Data in and out of Hadoop - Understanding inputs and outputs of
MapReduce - Data Serialization - Problems with traditional large-scale systems - Requirements
for a new approach - Hadoop - Scaling - Distributed Framework - Hadoop vs. RDBMS - Brief
history of Hadoop.
Lab Exercises
1. Word count application in Hadoop.
2. Sorting the data using MapReduce.
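The word count exercise follows the map, shuffle and reduce pattern. The sketch below simulates those three phases in a single Python process so the data flow can be followed without a cluster. It is not Hadoop code, and the function names and input lines are illustrative assumptions.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce phase: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in mapper(line))
    # Sorting by key mirrors the sorted order in which a reducer sees keys.
    return dict(reducer(k, v) for k, v in sorted(shuffle(pairs).items()))


# Hypothetical input split: each string stands for one line of a file.
lines = ["big data big insight", "data moves data"]
print(word_count(lines))
```

On a real cluster the mapper and reducer run on different nodes and the framework performs the shuffle. Only the two user-defined functions need to be written.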
CONFIGURATIONS OF HADOOP
Hadoop Processes (NN, SNN, JT, DN, TT) - Temporary directory - UI - Common errors when
running a Hadoop cluster and their solutions.
Setting up Hadoop on a local Ubuntu host: Prerequisites, downloading Hadoop, setting up
SSH, configuring the pseudo-distributed mode, HDFS directory, NameNode, Examples of
MapReduce, Using Elastic MapReduce, Comparison of local versus EMR Hadoop.
Understanding MapReduce: Key/value pairs, the Hadoop Java API for MapReduce, Writing
MapReduce programs, Hadoop-specific data types, Input/output.
Developing MapReduce Programs: Using languages other than Java with Hadoop, Analysing a
large dataset.
Lab Exercises
1. Finding max and min value in Hadoop.
2. Implementation of decision tree algorithms using MapReduce.
Lab Exercises
1. Implementation of K-means Clustering using MapReduce.
2. Generation of Frequent Itemset using MapReduce.
HADOOP STREAMING
Hadoop Streaming - Streaming Command Options - Specifying a Java Class as the
Mapper/Reducer - Packaging Files With Job Submissions - Specifying Other Plug-ins for Jobs.
Lab Exercises
1. Count the number of missing and invalid values by joining two given large
datasets.
2. Use Hadoop MapReduce to evaluate the number of products sold in each country on
an online shopping portal (the dataset is provided).
Lab Exercises
1. Analyze the sentiment of product reviews using a MapReduce
technique on Apache Hadoop.
2. Trend Analysis based on Access Pattern over Web Logs using Hadoop.
Essential Reading
[1] Boris Lublinsky, Kevin T. Smith, Alexey Yakubovich, Professional Hadoop Solutions,
Wiley, 2015.
[2] Tom White, Hadoop: The Definitive Guide, O’Reilly Media Inc., 2015.
[3] Garry Turkington, Hadoop Beginner's Guide, Packt Publishing, 2013.
Recommended Reading
[1] Pethuru Raj, Anupama Raman, DhivyaNagaraj and Siddhartha Duggirala, High-
Performance Big-Data Analytics: Computing Systems and Approaches, Springer, 2015.
[2] Jonathan R. Owens, Jon Lentz and Brian Femiano, Hadoop Real-World Solutions
Cookbook, Packt Publishing, 2013.
[3] Tom White, Hadoop: The Definitive Guide, O’Reilly, 2012.