
Department of Computer Science

Interdisciplinary Masters Programme

Syllabus

MSc (Data Analytics)


2020-2021

CHRIST (Deemed to be University), Bangalore.


Karnataka, India
www.christuniversity.in
Syllabus for MSc (Data Analytics) 2020-2021

Department Overview

The Department of Computer Science of CHRIST (Deemed to be University) strives to shape
outstanding computer professionals with ethical and human values to reshape the Nation's
destiny. The training imparted aims to prepare young minds for the challenging opportunities
in the IT industry with a global awareness rooted in the Indian soil, nourished and supported by
experts in the field.

Vision

The Department of Computer Science endeavors to imbibe the vision of the University
“Excellence and Service”. The department is committed to this philosophy which pervades
every aspect and functioning of the department.

Mission

“To develop IT professionals with ethical and human values”. To accomplish our mission, the
department encourages students to apply their acquired knowledge and skills towards
professional achievements in their career. The department also moulds the students to be
socially responsible and ethically sound.

Introduction to the Programme

MSc (Data Analytics) is a six-trimester interdisciplinary postgraduate degree programme
conducted by the Department of Computer Science. It is designed for working professionals
and graduates who want to launch their careers in the lucrative field of data analytics. As
organizations look for ways to harness the power of big data, technology professionals
experienced in analytics are in high demand. The programme aims to offer thorough knowledge
of the theory and practice of data analytics so that learners can become leading practitioners
in the field. It accommodates a wide audience of learners whose specific interests in data
analytics may be either technical or business-focused.

Programme Objectives

• To enable learners to develop knowledge and skills in current and emerging areas of
data analytics.
• To critically assess and evaluate business and technical strategies for data analytics.
• To demonstrate expert knowledge of data analysis, statistics, tools, techniques and
technologies of data analytics.
• To develop project-management, critical-thinking, problem-solving and decision-making
skills.
• To formulate and implement a novel research idea and conduct research in the field of
data analytics.


Ethics and Human Values


1. Only licensed proprietary software or open source software is used for academic teaching
and learning purposes.
2. Copying programs from the internet, friends or other sources is strongly discouraged,
since it impairs the development of programming skills.
3. Unique practical (domain-based) exercises ensure that students do not engage in
code plagiarism.
4. Projects undertaken by students during the course are done in teams to improve
collaborative work and synergy between team members.
5. Projects are modularized so that each student takes individual responsibility for
common goals.
6. Passion for excellence is promoted among the students, be it in software development
or project documentation.
7. Giving due credit to sources in seminars and research assignments is promoted
among the students.
8. The course and its design enforce the practice of good referencing techniques to improve
the sense of integrity.
9. Courses involving group discussions and debates on ethical practices and human values
are designed to sensitize students in dealing with customers and members within the
organization.

Programme Outcomes
On successful completion of the MSc programme, students will be able to:
PO1: Engage in continuous reflective learning in the context of technological and scientific
advancement.
PO2: Identify the need for and scope of interdisciplinary research.
PO3: Enhance research culture and uphold scientific integrity and objectivity.
PO4: Understand their professional, ethical and social responsibilities.
PO5: Understand the importance and judicious use of technology for the sustainability of
the environment.
PO6: Enhance disciplinary competency, employability and leadership skills.

Programme Specific Outcomes

PSO1: Problem Analysis and Design: Ability to identify, analyse and design solutions for
data analytics problems using fundamental principles of mathematics, statistics, computing
sciences and relevant domain disciplines.
PSO2: Modern Software Tool Usage: Acquire skills in handling data analytics programming
tools for problem solving and solution analysis of domain-specific problems.
PSO3: Societal and Environmental Concern: Utilize data analytics theories to address societal
and environmental concerns.
PSO4: Professional Ethics: Understand and commit to professional ethics and cyber
regulations, responsibilities and norms of professional computing practices.
PSO5: Applications in Multidisciplinary Domains: Understand the role of statistical
approaches and apply them to solve real-life problems in the field of data analytics.
PSO6: Project Management: Apply research-based knowledge to analyse and solve
advanced problems in data analytics.


Programme Structure of MSc (Data Analytics): Trimester-wise

Trimester I

Course Code  Course Title                                Course Type  Hrs/Week  Marks  Credits
MDA131       Principles of Data Analytics                Core         04        100    04
MDA171       Statistical Methods using R                 Core         05        100    04
MDA172       Python for Data Analytics                   Core         05        100    04
             Total                                                    14        300    12

Trimester II

Course Code  Course Title                                Course Type  Hrs/Week  Marks  Credits
MDA231       Mathematical Foundation for Data Analytics  Core         04        100    04
MDA271       Database Technologies                       Core         05        100    04
MDA272       Data Mining                                 Core         05        100    04
             Total                                                    14        300    12

Trimester III

Course Code  Course Title                                Course Type  Hrs/Week  Marks  Credits
MDA331       Artificial Intelligence                     Core         04        100    04
MDA371       Regression Modelling                        Core         05        100    04
MDA372       Big Data Analytics                          Core         05        100    04
             Total                                                    14        300    12


Trimester IV

Course Code  Course Title                                Course Type  Hrs/Week  Marks  Credits
MDA471       Machine Learning                            Core         05        100    04
MDA472       Natural Language Processing                 DSE          05        100    04
             Generic Elective - I                        GE           04        100    04
             Total                                                    14        300    12

Trimester V

Course Code  Course Title                                Course Type  Hrs/Week  Marks  Credits
MDA571       Data Visualization                          Core         05        100    04
MDA572       Neural Networks and Deep Learning           DSE          05        100    04
             Generic Elective - II                       GE           04        100    04
             Total                                                    14        300    12

Trimester VI

Course Code  Course Title                                Course Type  Hrs/Week  Marks  Credits
MDA681       Project                                     Core         08        100    04
             Generic Elective - III                      GE           04        100    04
             Generic Elective - IV                       GE           04        100    04
             Total                                                    16        300    12


Elective Courses offered by the Computer Science Department


Discipline Specific Elective
MDA472 Natural Language Processing DSE 05 100 04

MDA572 Neural Networks and Deep Learning DSE 05 100 04

Generic Elective Courses

MDA461 Business Intelligence GE 04 100 04

MDA561 Internet of Things GE 04 100 04

MDA661 Web Analytics GE 04 100 04

MDA662 Cloud Analytics GE 04 100 04


Trimester – I

MDA131: Principles of Data Analytics

Total Teaching Hours for Semester: 60


Max Marks: 100 Credits: 04

Course Objectives

To provide a strong foundation in data analytics and its application areas, and to develop an
understanding of the underlying core concepts and emerging technologies in data analytics.

Course Outcomes

CO1: Explore the fundamental concepts of data analytics
CO2: Understand data analysis techniques for applications handling large data
CO3: Understand various machine learning algorithms used in the data analytics process
CO4: Visualize and present the inference using various tools
CO5: Learn to think through the ethics surrounding privacy, data sharing and algorithmic
decision-making

Unit-1 Teaching Hours: 12

INTRODUCTION
Data Analytics - Types – Phases - Quality and Quantity of data – Measurement - Exploratory
data analysis - Business Intelligence.

Unit-2 Teaching Hours: 12

BIG DATA
Big Data and Cloud technologies - Introduction to Hadoop: Big Data, Apache Hadoop,
MapReduce - Data Serialization - Data Extraction - Stacking Data - Dealing with data.
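To make the MapReduce idea in this unit concrete, here is a minimal single-machine sketch in Python of the classic word-count job: a map phase emits (word, 1) pairs, a shuffle groups them by key, and a reduce phase sums each group. It is not Hadoop code and does not use the Hadoop API; the function names `map_phase`, `shuffle` and `reduce_phase` are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(documents: Iterable[str]) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in every document."""
    pairs = []
    for doc in documents:
        for word in doc.lower().split():
            pairs.append((word, 1))
    return pairs


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group intermediate values by key (done by the framework in Hadoop)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


if __name__ == "__main__":
    docs = ["big data needs big tools", "data tools"]
    print(reduce_phase(shuffle(map_phase(docs))))
    # {'big': 2, 'data': 2, 'needs': 1, 'tools': 2}
```

In a real cluster the map and reduce functions run in parallel on different nodes; writing the shuffle out explicitly here only shows what each reducer receives.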

Unit-3 Teaching Hours: 12

DATA VISUALIZATION
Introduction to data visualization – Data visualization options – Filters – Dashboard
development tools – Creating an interactive dashboard with dc.js - summary.

Unit-4 Teaching Hours: 12

ANALYTICS AND MACHINE LEARNING
Machine learning – Modeling Process – Training model – Validating model – Predicting new
observations – Supervised learning algorithms – Unsupervised learning algorithms.


Unit-5 Teaching Hours: 12

ETHICS AND RECENT TRENDS
Data Science Ethics – Doing good data science – Owners of the data – Valuing different
aspects of privacy – Getting informed consent – The Five Cs – Diversity – Inclusion – Future
Trends.

Essential Reading

[1] Davy Cielen, Arno D. B. Meysman, Mohamed Ali, Introducing Data Science, Manning
Publications Co., 1st edition, 2016.
[2] Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, An Introduction to
Statistical Learning: with Applications in R, Springer, 1st edition, 2013.
[3] Bart Baesens, Analytics in a Big Data World: The Essential Guide to Data Science and its
Applications, Wiley.
[4] D J Patil, Hilary Mason, Mike Loukides, Ethics and Data Science, O’Reilly, 1st edition,
2018.

Recommended Reading

[1] Anil Maheshwari, Data Analytics Made Accessible, Amazon.com Services LLC.
[2] Joel Grus, Data Science from Scratch: First Principles with Python, O’Reilly, 1st edition,
2015.
[3] Cathy O'Neil, Rachel Schutt, Doing Data Science: Straight Talk from the Frontline,
O’Reilly, 1st edition, 2013.
[4] Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman, Mining of Massive Datasets,
Cambridge University Press, 2nd edition, 2014.
[5] Eric Siegel, Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die,
2nd Ed., Wiley.


MDA171: Statistical Methods using R

Total Teaching Hours for Semester: 75


Max Marks: 100 Credits: 04

Course Objectives

This course equips students to visualize and analyse data using R and to communicate
statistical results correctly.

Course Outcomes

CO1: Understand R and RStudio
CO2: Create reports using R Markdown
CO3: Analyse data for a given problem
CO4: Apply probability and statistics to real-life problems
CO5: Draw scientific inference from data using R

Unit-1 Teaching Hours: 15

R AND R STUDIO
Getting started with R - installing R and RStudio - getting help - installing and loading
packages - simple arithmetic calculations - data structures - expressions - conditional
statements - functions - loops - R Markdown - introduction to statistics - probability and
data with R.

Lab Exercises
1. R program to illustrate different data structures
2. Defining functions and making reports in R Markdown

Unit-2 Teaching Hours: 15

EXPLORATORY DATA ANALYSIS
Visualizing numerical data - graphing systems available in R - descriptive statistics - measures
of central tendency and dispersion - correlation - transforming data - exploring categorical
variables.

Lab Exercises
1. Loading dataset and visualizing data
2. Producing descriptive statistics measures

Unit-3 Teaching Hours: 15

PROBABILITY AND PROBABILITY DISTRIBUTIONS
Introduction - disjoint events - general addition rule - independence - probability examples -
disjoint vs. independent - conditional probability - probability trees - normal distribution -
evaluating the normal distribution - working with the normal distribution - binomial
distribution - normal approximation to binomial - working with the binomial distribution.

Lab Exercises
1. Computing probabilities in R
2. Functions for probability distributions in R
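The normal approximation to the binomial listed in this unit can be checked numerically. The lab itself uses R; the sketch below is an assumed Python equivalent using only the standard library (`math.comb` and `statistics.NormalDist`). It compares the exact P(X ≤ k) with the continuity-corrected approximation Φ((k + 0.5 − np) / √(np(1 − p))); the values n = 100, p = 0.5, k = 55 are arbitrary.

```python
import math
from statistics import NormalDist


def binom_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def normal_approx_cdf(k: int, n: int, p: float) -> float:
    """Normal approximation to P(X <= k), using a continuity correction of 0.5."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return NormalDist(mu, sigma).cdf(k + 0.5)


if __name__ == "__main__":
    n, p, k = 100, 0.5, 55
    print(f"exact  P(X <= {k}) = {binom_cdf(k, n, p):.4f}")
    print(f"normal P(X <= {k}) = {normal_approx_cdf(k, n, p):.4f}")
```

The two values agree closely here because np and n(1 − p) are both large; the approximation degrades when p is near 0 or 1 with small n.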

Unit-4 Teaching Hours: 15

ESTIMATION
Introduction to Inference - sampling from population - maximum likelihood estimator - least
squares estimator - confidence interval (CI) (for a mean) - accuracy vs. precision - required
sample size for mean, CI (for the mean) examples.

Lab Exercises
1. Finding ML estimates and least squares estimates
2. Constructing confidence interval
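For the confidence interval exercise, a large-sample interval for a mean is x̄ ± z · s/√n, and the sample size needed for a margin of error E is n ≥ (z · σ / E)². The sketch below computes both in Python with the standard library rather than R. It uses the normal quantile, so it does not cover the small-sample t interval, and the function names are illustrative.

```python
import math
from statistics import NormalDist, mean, stdev
from typing import List, Tuple


def z_quantile(level: float) -> float:
    """Two-sided critical value, e.g. about 1.96 for level=0.95."""
    return NormalDist().inv_cdf(0.5 + level / 2)


def z_confidence_interval(data: List[float], level: float = 0.95) -> Tuple[float, float]:
    """Large-sample CI for the population mean: x_bar +/- z * s / sqrt(n)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = mean(data)
    s = stdev(data)  # sample standard deviation (n - 1 in the denominator)
    half_width = z_quantile(level) * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


def required_sample_size(sigma: float, margin: float, level: float = 0.95) -> int:
    """Smallest n with z * sigma / sqrt(n) <= margin, i.e. n >= (z * sigma / margin) ** 2."""
    return math.ceil((z_quantile(level) * sigma / margin) ** 2)
```

For instance, with σ = 15 and a desired margin of 3 at 95% confidence, `required_sample_size(15, 3)` gives 97, since (1.96 × 15 / 3)² ≈ 96.04 is rounded up.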

Unit-5 Teaching Hours: 15

TESTING OF HYPOTHESIS
Introduction - hypothesis testing (HT) - decision errors - large sample and small sample tests -
inference for other estimators - significance vs. confidence level - statistical vs. practical
significance - inference for proportions.

Lab Exercises
1. Carrying out large sample tests in R
2. Some small sample tests: t-test, paired t-test in R

Essential Reading

[1] Grolemund G., Hands-on programming with R: write your own functions and simulations,
O'Reilly Media Inc., 2014.
[2] James G., Witten D., Hastie T., & Tibshirani R., An introduction to statistical learning: with
Applications in R, Springer, 2013.

Recommended Reading

[1] Gupta S. C., & Kapoor V. K., Fundamentals of Mathematical Statistics, Sultan Chand &
Sons, 2018.
[2] Peng R. D., Exploratory data analysis with R, Lulu.com, 2012.
[3] Peng R. D., R programming for data science, Leanpub, 2016.
[4] Teetor P., R cookbook: Proven recipes for data analysis, statistics, and graphics, O'Reilly
Media Inc., 2011.
[5] Crawley M. J., The R book, John Wiley & Sons, 2012.


MDA172: Python for Data Analytics


Total Teaching Hours for Semester: 75
Max Marks: 100 Credits: 04

Course Objectives

The objective of this course is to provide comprehensive knowledge of the Python programming
paradigms required for data analytics.

Course Outcomes

CO1: Demonstrate the use of built-in objects of Python
CO2: Demonstrate significant experience with the Python program development environment
CO3: Implement numerical programming, data handling and visualization through the
NumPy, Pandas and Matplotlib modules

Unit-1 Teaching Hours: 15

INTRODUCTION TO PYTHON
Structure of Python Program - Underlying mechanism of Module Execution - Branching and
Looping - Problem Solving Using Branches and Loops - Functions - Lists and Mutability -
Problem Solving Using Lists and Functions.
Lab Exercises
1. Demonstrate usage of branching and looping statements
2. Demonstrate Recursive functions
3. Demonstrate Lists
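A minimal Python sketch of the three Unit 1 lab themes (branching inside a loop, a recursive function, and list mutability) might look like the following; the functions are illustrative choices, not prescribed exercises.

```python
from typing import List


def factorial(n: int) -> int:
    """Recursive factorial: the base case stops the recursion at n <= 1."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n <= 1:
        return 1
    return n * factorial(n - 1)


def classify(numbers: List[int]) -> List[str]:
    """Branching inside a loop: label each number as even or odd."""
    labels = []
    for x in numbers:
        if x % 2 == 0:
            labels.append("even")
        else:
            labels.append("odd")
    return labels


def append_in_place(items: List[int], value: int) -> None:
    """Lists are mutable: the caller's list is changed and nothing is returned."""
    items.append(value)


if __name__ == "__main__":
    print(factorial(5))         # 120
    print(classify([1, 2, 3]))  # ['odd', 'even', 'odd']
    data = [1, 2]
    append_in_place(data, 3)
    print(data)                 # [1, 2, 3]
```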

Unit-2 Teaching Hours: 15

SEQUENCE DATATYPES AND OBJECT-ORIENTED PROGRAMMING
Sequences, Mapping and Sets - Dictionaries - Classes: Classes and Instances - Inheritance -
Exception Handling - Introduction to Regular Expressions using the “re” module.
Lab Exercises
1. Demonstrate Tuples and Sets
2. Demonstrate Dictionaries
3. Demonstrate inheritance and exception handling
4. Demonstrate use of “re”.
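One possible sketch covering several Unit 2 lab topics together: a dictionary, a base class and a subclass (inheritance), exception handling, and a regular expression compiled with the `re` module. The class names and the deliberately simple email pattern are assumptions for illustration; the pattern is not a complete email validator.

```python
import re
from typing import Dict, List


class Record:
    """Base class holding a name."""

    def __init__(self, name: str) -> None:
        self.name = name

    def describe(self) -> str:
        return f"Record({self.name})"


class Student(Record):
    """Subclass that adds a dictionary of marks keyed by course code."""

    def __init__(self, name: str, marks: Dict[str, int]) -> None:
        super().__init__(name)
        self.marks = marks

    def describe(self) -> str:  # overrides Record.describe
        return f"Student({self.name}, courses={sorted(self.marks)})"

    def mark_for(self, course: str) -> int:
        try:
            return self.marks[course]
        except KeyError:
            # Translate the low-level KeyError into a clearer error for callers.
            raise ValueError(f"no mark recorded for {course!r}") from None


# Simple illustrative pattern: local part, "@", domain, ".", suffix.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")


def find_emails(text: str) -> List[str]:
    """Return every substring that looks like a simple email address."""
    return EMAIL.findall(text)
```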

Unit-3 Teaching Hours: 15

USING NUMPY
Basics of NumPy - Computation on NumPy - Aggregations - Computation on Arrays -
Comparisons, Masks and Boolean Arrays - Fancy Indexing - Sorting Arrays - Structured Data:
NumPy’s Structured Array.


Lab Exercises
1. Demonstrate Aggregation
2. Demonstrate Indexing and Sorting

Unit-4 Teaching Hours: 15

DATA MANIPULATION WITH PANDAS
Introduction to Pandas Objects - Data indexing and Selection - Operating on Data in Pandas -
Handling Missing Data - Hierarchical Indexing - Combining Data Sets - Aggregation and
Grouping - Pivot Tables.
Lab Exercises
1. Demonstrate handling of missing data
2. Demonstrate hierarchical indexing

Unit-5 Teaching Hours: 15

VISUALIZATION AND MATPLOTLIB
Basic functions of Matplotlib - Simple Line Plot, Scatter Plot - Density and Contour Plots -
Histograms, Binnings and Density - Customizing Plot Legends, Colour Bars - Three-Dimensional
Plotting in Matplotlib.
Lab Exercises
1. Demonstrate Scatter Plot
2. Demonstrate 3D plotting

Essential Reading

[1] Jake VanderPlas, Python Data Science Handbook: Essential Tools for Working with Data,
O’Reilly Media Inc., 2016.
[2] Y. Zhang, An Introduction to Python and Computer Programming, Springer Publications,
2016.

Recommended Reading

[1] Joel Grus, Data Science from Scratch: First Principles with Python, O’Reilly Media, 2016.
[2] T. R. Padmanabhan, Programming with Python, Springer Publications, 2016.


Trimester – II

MDA231: Mathematical Foundation for Data Analytics

Total Teaching Hours for Semester: 60


Max Marks: 100 Credits: 04

Course Objectives

Linear Algebra plays a fundamental role in the theory of Data Science. This course aims at
introducing the basic notions of vector spaces, Linear Algebra and the use of Linear Algebra in
applications to Data Science.

Course Outcomes

CO1: Understand the properties of vector spaces
CO2: Use the properties of linear maps in solving problems in linear algebra
CO3: Demonstrate proficiency in eigenvalues, eigenvectors and inner product spaces
CO4: Apply mathematics to some applications in data science

Unit-1 Teaching Hours: 15

INTRODUCTION TO VECTOR SPACES
Vector Spaces: Rⁿ and Cⁿ, Lists, Fⁿ and a Digression on Fields, Definition of Vector Spaces,
Subspaces, Sums of Subspaces, Direct Sums, Span and Linear Independence, Bases, Dimension.

Unit-2 Teaching Hours: 20

LINEAR MAPS
Definition of Linear Maps - Algebraic Operations on L(V, W) - Null Spaces and Injectivity -
Range and Surjectivity - Fundamental Theorem of Linear Maps - Representing a Linear Map
by a Matrix - Invertible Linear Maps - Isomorphic Vector Spaces - Linear Map as Matrix
Multiplication - Operators - Products of Vector Spaces - Product of Direct Sum - Quotients of
Vector Spaces.

Unit-3 Teaching Hours: 10

EIGENVALUES, EIGENVECTORS, AND INNER PRODUCT SPACES
Eigenvalues and Eigenvectors - Eigenvectors and Upper-Triangular Matrices - Eigenspaces
and Diagonal Matrices - Inner Products and Norms - Linear Functionals on Inner Product
Spaces.


Unit-4 Teaching Hours: 15

MATHEMATICS APPLIED TO DATA SCIENCE
Singular value decomposition - Handwritten digits and a simple algorithm - Classification of
handwritten digits using SVD bases - Tangent distance - Text mining.

Essential Reading

[1] S. Axler, Linear algebra done right, Springer, 2017.
[2] L. Eldén, Matrix methods in data mining and pattern recognition, Society for Industrial
and Applied Mathematics, 2007.

Recommended Reading

[1] E. Davis, Linear algebra and probability for computer science applications, CRC Press,
2012.
[2] J. V. Kepner and J. R. Gilbert, Graph algorithms in the language of linear algebra, Society
for Industrial and Applied Mathematics, 2011.
[3] D. A. Simovici, Linear algebra tools for data mining, World Scientific Publishing, 2012.
[4] P. N. Klein, Coding the matrix: linear algebra through applications to computer science,
Newtonian Press, 2015.


MDA271: Database Technologies


Total Teaching Hours for Semester: 75
Max Marks: 100 Credits: 04

Course Objectives

The main objective of this course is to provide fundamental knowledge of, and practical
experience with, database concepts. It covers the concepts and terminology needed to
construct database tables and write effective queries. The course also aims to help students
comprehend data warehouses and their functions.

Course Outcomes

CO1: Demonstrate various databases
CO2: Compose effective queries
CO3: Distinguish a database from a data warehouse and examine its applications

Unit-1 Teaching Hours: 15

INTRODUCTION
Concept & Overview of DBMS, Data Models, Database Languages, Database Administrator,
Database Users, Three Schema architecture of DBMS. Basic concepts, Design Issues, Mapping
Constraints, Keys, Entity-Relationship Diagram, Weak Entity Sets, Extended E-R features.

Lab Exercises
1. Data Definition
2. Table Creation
3. Constraints

Unit-2 Teaching Hours: 15

RELATIONAL MODEL AND DATABASE DESIGN
SQL and Integrity Constraints, Concept of DDL, DML, DCL. Basic Structure, Set Operations,
Aggregate Functions, Null Values, Domain Constraints, Referential Integrity Constraints,
Assertions, Views, Nested Subqueries, Functional Dependency, Different Anomalies in
Designing a Database, Normalization using Functional Dependencies, Boyce-Codd Normal
Form, 4NF, 5NF.

Lab Exercises
1. Insert, Select, Update & Delete Commands
2. Nested Queries & Join Queries
3. Views
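The Unit 2 lab commands (insert, select, update, joins, nested subqueries and views) can be tried without a database server using Python's built-in `sqlite3` module and an in-memory database. The `dept` and `emp` tables and their rows below are hypothetical, and SQLite's SQL dialect may differ in details from the DBMS used in the lab.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two related tables and a view."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces REFERENCES constraints only when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE dept (
            dept_id INTEGER PRIMARY KEY,
            name    TEXT NOT NULL UNIQUE
        );
        CREATE TABLE emp (
            emp_id  INTEGER PRIMARY KEY,
            name    TEXT NOT NULL,
            salary  REAL CHECK (salary > 0),
            dept_id INTEGER REFERENCES dept(dept_id)
        );
        INSERT INTO dept VALUES (1, 'Analytics'), (2, 'Sales');
        INSERT INTO emp VALUES
            (1, 'Ravi', 50000, 1),
            (2, 'Meera', 65000, 1),
            (3, 'John', 40000, 2);
        -- A view stores the join below as a named, reusable query.
        CREATE VIEW emp_dept AS
            SELECT e.name AS emp_name, d.name AS dept_name, e.salary
            FROM emp e JOIN dept d ON e.dept_id = d.dept_id;
        """
    )
    # DML: give John a 10% raise.
    conn.execute("UPDATE emp SET salary = salary * 1.1 WHERE name = 'John'")
    conn.commit()
    return conn


def above_average(conn: sqlite3.Connection) -> list:
    """Nested subquery: employees earning more than the overall average salary."""
    return conn.execute(
        "SELECT name FROM emp WHERE salary > (SELECT AVG(salary) FROM emp) ORDER BY name"
    ).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(above_average(db))
    print(db.execute("SELECT * FROM emp_dept ORDER BY emp_name").fetchall())
```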


Unit-3 Teaching Hours: 15

DATA WAREHOUSE: THE BUILDING BLOCKS
Defining Features, Data Warehouses and Data Marts, Architectural Types, Overview of the
Components, Metadata in the Data Warehouse. Data Design and Data Preparation: Principles of
Dimensional Modeling: From Requirements to Data Design, The Star Schema, Star Schema Keys,
Advantages of the Star Schema, Star Schema: Examples. Dimensional Modeling: Advanced
Topics: Updates to the Dimension Tables, Miscellaneous Dimensions, The Snowflake Schema,
Aggregate Fact Tables, Families of Stars.

Lab Exercises
1. Importing source data structures
2. Design Target Data Structures

Unit-4 Teaching Hours: 15

REQUIREMENTS, REALITIES, ARCHITECTURE AND DATA FLOW
Requirements, ETL Data Structures, Extracting, Cleaning and Conforming, Delivering
Dimension Tables, Delivering Fact Tables.

Lab Exercises
1. Create target structure
2. Design and build the ETL mapping

Unit-5 Teaching Hours: 15

IMPLEMENTATION, OPERATIONS AND ETL SYSTEMS
Development, Operations, Metadata, Real-Time ETL Systems.

Lab Exercises
1. Perform the ETL process and transform into data map
2. Create the cube and process it
3. Generating Reports
4. Creating the Pivot table and pivot chart using some existing data

Essential Reading
[1] Henry F. Korth and Abraham Silberschatz, Database System Concepts, McGraw Hill.
[2] Thomas Connolly and Carolyn Begg, Database Systems: A Practical Approach to Design,
Implementation and Management, Third Edition, Pearson Education, 007.
[3] The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling, 2nd Edition,
John Wiley & Sons Inc., New York, USA, 2002.

Recommended Reading
[1] Lior Rokach and Oded Maimon, Data Mining and Knowledge Discovery Handbook,
Springer, 2nd edition, 2010.


MDA272: Data Mining


Total Teaching Hours for Semester: 75
Max Marks: 100 Credits: 04

Course Objectives

To preprocess and analyze data, to choose relevant models and algorithms for their respective
applications, and to develop research interest in advances in data mining.

Course Outcomes

CO1: Understand different types of data to be mined
CO2: Categorize scenarios for applying different data mining techniques
CO3: Evaluate different models used for classification and clustering
CO4: Develop a focus on research and innovation

Unit-1 Teaching Hours: 15

INTRODUCTION AND DATA PREPROCESSING
Data Mining – Kinds of data to be mined – Kinds of patterns to be mined – Technologies –
Targeted Applications – Major Issues in Data Mining – Data Objects and Attribute Types –
Measuring Data Similarity and Dissimilarity – Data Cleaning – Data Integration – Data
Reduction – Data Transformation – Data Discretization.

Lab Exercises
1. Identify a dataset and preprocess it using normalization techniques
2. Explore data reduction techniques
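Two common normalization techniques for the first lab exercise are min-max normalization, v' = (v − min) / (max − min) · (new_max − new_min) + new_min, and z-score normalization, v' = (v − mean) / σ. A plain-Python sketch of both follows; in the lab the values would come from the chosen dataset rather than literals.

```python
from statistics import mean, pstdev
from typing import List


def min_max(values: List[float], new_min: float = 0.0, new_max: float = 1.0) -> List[float]:
    """Linearly map values from [min, max] onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; the range is zero")
    scale = (new_max - new_min) / (hi - lo)
    return [(v - lo) * scale + new_min for v in values]


def z_score(values: List[float]) -> List[float]:
    """Standardize to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("standard deviation is zero")
    return [(v - mu) / sigma for v in values]
```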

Unit-2 Teaching Hours: 15

MINING FREQUENT PATTERNS AND ADVANCED PATTERN MINING
Basic Concepts – Frequent Itemset Mining Methods – Pattern Evaluation Methods – Pattern
Mining in Multilevel, Multidimensional Space – Constraint-Based Frequent Pattern Mining –
Mining Compressed or Approximate Patterns – Pattern Exploration and Application.

Lab Exercises
1. Identify frequent itemsets using Apriori Algorithm
2. Generate FP Tree for a transaction dataset
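The level-wise search used by the Apriori algorithm (count candidates, keep those meeting minimum support, then join and prune to form the next level) can be sketched in plain Python. This is a teaching sketch for small transaction lists, with `min_support` taken as an absolute count; it is not an optimized implementation and does not build an FP-tree.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def apriori(transactions: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Return every frequent itemset mapped to its support count."""
    # Level 1: every individual item is a candidate.
    items = {item for t in transactions for item in t}
    candidates = {frozenset([i]) for i in items}
    frequent: Dict[FrozenSet[str], int] = {}
    k = 1
    while candidates:
        # Count the transactions that contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        joined = {a | b for a, b in combinations(list(level), 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent
```

For example, with transactions {bread, milk}, {bread, butter}, {bread, milk, butter} and {milk} and `min_support=2`, the sketch keeps the three single items plus {bread, milk} and {bread, butter}; {bread, milk, butter} is pruned because its subset {milk, butter} is infrequent.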

Unit-3 Teaching Hours: 15

CLASSIFICATION TECHNIQUES
Basic Concepts – Decision Tree Induction – Bayes Classification Methods – Rule-Based
Classification – Model Evaluation and Selection – Techniques to Improve Classification
Accuracy – Bayesian Belief Networks – Classification by Backpropagation – Support Vector
Machines.


Lab Exercises
1. Construct Decision Tree for a dataset and identify the order of attributes
2. Apply Bayes Classification

Unit-4 Teaching Hours: 15

CLUSTERING TECHNIQUES
Cluster Analysis – Partitioning Methods – Hierarchical Methods – Density-Based Methods
(includes all clustering techniques under these categories in the textbook).

Lab Exercises
1. Demonstrate Naïve Bayes Classifier
2. Apply K-Means Clustering for given number of clusters
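K-means, one of the partitioning methods in this unit, alternates an assignment step (each point joins its nearest centroid) with an update step (each centroid moves to the mean of its points) until the centroids stop changing. The pure-Python sketch below works on 2-D points and, as a simplifying assumption, uses the first k points as initial centroids; library implementations usually use random or k-means++ initialization.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Return (centroids, labels) once centroids stop moving or max_iter is reached."""
    centroids = list(points[:k])  # naive initialization: the first k points
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each point.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels
```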

Unit-5 Teaching Hours: 15

OUTLIER DETECTION AND APPLICATIONS
Outliers and Outlier Analysis – Clustering-Based Approach – Classification-Based Approach –
Mining Complex Data Types – Data Mining Applications.
Lab Exercises
1. Demonstrate Hierarchical clustering for a large dataset
2. Case studies and assignment
Essential Reading

[1] Jiawei Han, Micheline Kamber and Jian Pei, Data Mining: Concepts and Techniques,
Morgan Kaufmann Publishers, Third Edition, 2012.
[2] Arun K Pujari, Data Mining Techniques, Second Edition, Universities Press India Pvt. Ltd.,
2010.

Recommended Reading

[1] Daniel T. Larose and Chantal D. Larose, Data Mining and Predictive Analytics, Wiley
Series on Methods and Applications in Data Mining, Wiley Publications.
[2] Ian H. Witten, Eibe Frank and Mark A. Hall, Data Mining: Practical Machine Learning
Tools and Techniques, Morgan Kaufmann Publishers, Third Edition, 2014.

Web Resources:
[1] https://data-flair.training/blogs/data-mining-tutorial/
[2] https://www.tutorialride.com/data-mining/data-mining-tutorial.htm


Trimester – III
MDA331: Artificial Intelligence
Total Teaching Hours for Semester: 60
Max Marks: 100 Credits: 04

Course Objectives
This course aims to develop an understanding of the issues involved in defining and
simulating perception; to identify problems that require AI and the methods available for
them; to compare and contrast the available AI techniques; to define and explain learning
algorithms; and to give students additional experience in analysing and evaluating
complicated systems.

Course Outcomes
CO1: Express the modern view of AI and its foundation
CO2: Illustrate Search Strategies with algorithms and Problems
CO3: Implement Propositional logic and apply inference rules
CO4: Apply suitable techniques for NLP and Game Playing

Unit – 1 Teaching Hours: 12

INTRODUCTION
Introduction to AI, The Foundations of AI, AI Technique - Tic-Tac-Toe. Problem
characteristics, Production system characteristics, Production systems: 8-puzzle problem.
Searching: Uninformed search strategies – Breadth first search, depth first search.
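The two uninformed strategies named here differ mainly in the frontier: breadth-first search expands paths from a FIFO queue, depth-first search from a LIFO stack. A Python sketch on a small hypothetical graph stored as an adjacency dictionary:

```python
from collections import deque
from typing import Dict, List, Optional

Graph = Dict[str, List[str]]


def bfs(graph: Graph, start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search: returns a path with the fewest edges, or None."""
    frontier = deque([[start]])  # FIFO queue of paths
    visited = {start}
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None


def dfs(graph: Graph, start: str, goal: str) -> Optional[List[str]]:
    """Depth-first search with an explicit stack; the path found need not be shortest."""
    frontier = [[start]]  # LIFO stack of paths
    visited = set()
    while frontier:
        path = frontier.pop()
        node = path[-1]
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        # Push neighbours in reverse so the first-listed neighbour is expanded first.
        for nxt in reversed(graph.get(node, [])):
            if nxt not in visited:
                frontier.append(path + [nxt])
    return None
```

On the graph {S: [A, B], A: [C, D], B: [G], D: [G]}, BFS returns S, B, G (fewest edges), while this DFS dives into A first and returns S, A, D, G.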

Unit – 2 Teaching Hours: 12

LOCAL SEARCH ALGORITHMS
Generate and Test, Hill climbing, simulated annealing search, Constraint satisfaction problems,
Greedy best first search, A* search, AO* search. Toy problems.

Unit – 3 Teaching Hours: 12

KNOWLEDGE REPRESENTATION
First order logic. Inference in first order logic, propositional vs. first order inference,
unification and lifting, Clausal form conversion, Forward chaining, Backward chaining,
Resolution.
SELF LEARNING
Propositional logic - syntax & semantics

Unit – 4 Teaching Hours: 12

GAME PLAYING
Overview, Minimax algorithm, Alpha-Beta pruning, Additional Refinements. Probabilistic
Reasoning: Ad Hoc Methods, Expert System, Expert System Shells.
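Minimax with alpha-beta pruning can be sketched over an explicit game tree whose leaves hold utilities for the maximizing player. Representing the tree as nested Python lists is an assumption made for brevity; a game-playing program would generate successor states instead.

```python
import math
from typing import List, Union

Tree = Union[int, List["Tree"]]


def alphabeta(node: Tree, maximizing: bool,
              alpha: float = -math.inf, beta: float = math.inf) -> float:
    """Minimax value of node, skipping branches that cannot change the result."""
    if isinstance(node, int):  # leaf: utility for MAX
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:  # the MIN node above will never allow this branch
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if beta <= alpha:  # the MAX node above already has a better option
            break
    return value
```

For the tree [[3, 12, 8], [2, 4, 6], [14, 5, 2]] with MAX to move, the value is 3, and the leaves 4 and 6 are never examined because the second MIN node is cut off after its first leaf.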


Unit – 5 Teaching Hours: 12

NATURAL LANGUAGE PROCESSING
Introduction, Practical Applications of NLP, Syntax processing, Semantic Analysis, Pragmatic
and Discourse Processing: Analysis, Perception.

Essential Reading
[1] E. Rich and K. Knight, Artificial Intelligence, 3rd Edition, New York: TMH, 2019.
[2] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, 3rd Edition, Pearson
Education, 2019.

Recommended Reading
[1] Eugene Charniak and Drew McDermott, Introduction to Artificial Intelligence, 2nd Edition.
Singapore: Pearson Education, 2005.
[2] George F Luger, Artificial Intelligence: Structures and Strategies for Complex Problem
Solving, 4th Edition. Singapore: Pearson Education, 2008, ISBN-13 9780321545893.
[3] N. J. Nilsson, Artificial Intelligence: A New Synthesis, 1st Edition, USA: Morgan
Kaufmann, 2000.
[4] Patterson, Introduction to artificial intelligence, ISBN-13: 978-0134771007.

Web Resources

[1] https://ai.google/education/
[2] https://intellipaat.com/blog/tutorial/artificial-intelligence-tutorial/
[3] https://www.javatpoint.com/artificial-intelligence-tutorial


MDA371: Regression Modelling


Total Teaching Hours for Semester: 75
Max Marks: 100 Credits: 04

Course Objectives

This course equips students to assess the relationship between variables in a data set and a
continuous response variable. In this course, students learn to fit simple and multiple linear
regression models using R.

Course Outcomes

CO1: Understand simple and multiple linear regression models.
CO2: Analyze relationships between multiple variables.
CO3: Build linear models and predict the study variable using R.
CO4: Validate regression models.
CO5: Model categorical and count data.

Unit-1 Teaching Hours: 15

INTRODUCTION
Introduction to regression: regression through the origin, linear least squares, regression to the
mean, basic definitions: notation for data, the empirical mean, the empirical standard deviation
and variance, normalization, empirical covariance, some facts about correlation.

Lab Exercises in R
1. Visualizing data for model fitting using R
2. Finding least squares estimates for parameters in the simple linear model
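For the simple linear model y = β₀ + β₁x + ε, the least squares estimates are β̂₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and β̂₀ = ȳ − β̂₁x̄, while regression through the origin gives β̂ = Σxᵢyᵢ / Σxᵢ². The lab computes these in R (for example with `lm()`); the sketch below shows the same arithmetic in plain Python as an illustration only.

```python
from statistics import mean
from typing import List, Tuple


def least_squares(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def slope_through_origin(x: List[float], y: List[float]) -> float:
    """Least squares slope when the intercept is forced to zero."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
```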

Unit-2 Teaching Hours: 15

SIMPLE LINEAR REGRESSION MODEL
Simple linear model with normal errors, regression parameters: interpretation, properties,
estimation and testing of hypotheses, prediction using the regression model, R-squared.

Lab Exercises in R
1. Building a basic linear regression model for the association between a single
explanatory variable and a response variable.
2. Finding interval estimates and testing hypotheses in a simple linear model.

Unit-3 Teaching Hours: 15

MULTIVARIABLE REGRESSION ANALYSIS


Multivariable linear regression model, estimation, example with two variables (simple linear
regression), the general case, interpretation of the coefficients, fitted values, residuals and
residual variation.

Lab Exercises in R
1. Building a multiple linear regression model for the association between explanatory
variables and a response variable.
2. Finding interval estimates and testing hypotheses in multiple linear regression models.
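The Unit 3 model y = b0 + b1*x1 + ... + bp*xp + error can be estimated by solving the normal equations (X^T X) b = X^T y, where X carries a leading column of ones for the intercept. The Python sketch below (standard library only, made-up data) does this with Gaussian elimination so that each step is visible. It is not how R works internally: `lm()` uses a QR decomposition, which is numerically more stable than forming X^T X. For a model with an intercept, the residuals sum to zero and are orthogonal to every predictor, which makes a useful check in the lab.

```python
def fit_multiple_regression(X, y):
    """Least squares coefficients [b0, b1, ..., bp] via the normal equations.

    X is a list of observations, each a list of p predictor values.
    """
    rows = [[1.0] + list(obs) for obs in X]  # prepend the intercept column
    k = len(rows[0])
    # Augmented system [X^T X | X^T y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Forward elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(col + 1, k):
            factor = a[row][col] / a[col][col]
            for c in range(col, k + 1):
                a[row][c] -= factor * a[col][c]
    # Back substitution
    b = [0.0] * k
    for i in range(k - 1, -1, -1):
        b[i] = (a[i][k] - sum(a[i][j] * b[j] for j in range(i + 1, k))) / a[i][i]
    return b


def fitted_and_residuals(X, y, b):
    """Fitted values b0 + sum(bj * xj) and residuals y - fitted."""
    fitted = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], obs)) for obs in X]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return fitted, residuals


# Made-up data with two predictors
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [9.1, 7.8, 19.2, 17.9, 26.0]
b = fit_multiple_regression(X, y)
fitted, residuals = fitted_and_residuals(X, y, b)
print("coefficients:", [round(v, 3) for v in b])
print("sum of residuals:", round(sum(residuals), 10))
```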

Unit-4 Teaching Hours: 15

RESIDUALS, VARIATION, DIAGNOSTICS AND MODEL SELECTION


Residuals; influential, high-leverage and outlying points; residual, leverage and influence
measures. Model selection: the Rumsfeldian triplet, general rules, R-squared and adjusted
R-squared, variance inflation factor, the impact of over- and under-fitting on residual variance
estimation, covariate model selection.
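For a simple linear model, several Unit 4 measures have textbook closed forms: the leverage of observation i is h_i = 1/n + (x_i - mean(x))^2 / Sxx; Cook's distance is D_i = e_i^2 / (p * s2) * h_i / (1 - h_i)^2, with p fitted coefficients and s2 = SSE / (n - p); adjusted R-squared is 1 - (1 - R^2)(n - 1)/(n - k - 1) for k predictors; and VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing x_j on the other predictors. A minimal Python sketch of these definitions with made-up data follows; in R, `hatvalues()`, `cooks.distance()` and `car::vif()` compute them for a fitted `lm` object.

```python
from statistics import mean


def leverage_and_cooks(x, y):
    """Leverage h_i and Cook's distance D_i for the simple linear model y = b0 + b1*x."""
    n, p = len(x), 2  # p = number of coefficients (intercept and slope)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]  # residuals
    s2 = sum(ei * ei for ei in e) / (n - p)            # residual variance estimate
    h = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]  # diagonal of the hat matrix
    cooks = [ei * ei / (p * s2) * hi / (1 - hi) ** 2 for ei, hi in zip(e, h)]
    return h, cooks


def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared for a model with k predictors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def vif(r2_j):
    """Variance inflation factor, given R-squared from regressing x_j on the other predictors."""
    return 1 / (1 - r2_j)


# Made-up data: the last point sits far from the others in x (high leverage)
x = [1.0, 2.0, 3.0, 4.0, 10.0]
y = [1.2, 1.9, 3.1, 3.9, 11.5]
h, cooks = leverage_and_cooks(x, y)
print("leverage:", [round(v, 3) for v in h])
print("Cook's D:", [round(v, 3) for v in cooks])
```

The leverages always sum to the number of coefficients (here 2), since they are the diagonal of the hat matrix, whose trace equals p.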

Lab Exercises in R
1. Residual analysis of linear regression model
2. Model selection and nested model testing

Unit-5 Teaching Hours: 15

GENERALIZED LINEAR MODELS


Logistic regression: modelling a binary response, estimation, odds, modelling the odds,
interpreting logistic regression. Poisson regression: modelling count data, the Poisson
distribution, linear regression and Poisson regression for counts, estimation, the mean-variance
relationship, rates.
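Logistic regression models the log-odds, log(p / (1 - p)) = b0 + b1*x, so exp(b1) is the multiplicative change in the odds for a one-unit increase in x. The maximum likelihood estimates have no closed form. The Python sketch below finds them by Newton-Raphson for a single predictor; for this model, that coincides with the iteratively reweighted least squares (IRLS) used by R's `glm(y ~ x, family = binomial)`. The data are made up and deliberately not perfectly separable, since otherwise the estimates diverge. Poisson regression (`family = poisson`) can be fitted with the same iteration, using the log link and weights equal to the fitted means.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Fit logit P(y = 1) = b0 + b1*x by Newton-Raphson (equivalently IRLS here)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        p = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Score: gradient of the log-likelihood
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix (negative Hessian), weighted by p(1 - p)
        w = [pi * (1 - pi) for pi in p]
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        # Newton step b += I^{-1} g, with the 2x2 inverse written out
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (-i01 * g0 + i00 * g1) / det
    return b0, b1


# Made-up binary data (not perfectly separable)
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(x, y)
print(f"b0={b0:.3f}, b1={b1:.3f}, odds ratio per unit x={math.exp(b1):.3f}")
```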

Lab Exercises in R
1. Building a logistic regression model for a categorical response variable
2. Modelling count data using a Poisson regression model.

Essential Reading
[1] Fox, J., & Weisberg, S., An R Companion to Applied Regression, Sage Publications, 2018.
[2] Caffo, B., Regression Models for Data Science in R, Leanpub, 2015.

Recommended Reading
[1] Ciaburro, G., Regression Analysis with R: Design and develop statistical nodes to identify
unique relationships within data at scale, Packt Publishing Ltd, 2018.
[2] Sheather, S., A Modern Approach to Regression with R, Springer Science & Business Media,
2009.
[3] Lilja, D. J., Linear Regression Using R: An Introduction to Data Modeling, University of
Minnesota Libraries Publishing, 2016.

MDA372: Big Data Analytics


Total Teaching Hours for Semester: 75
Max Marks: 100 Credits: 04

Course Objectives

This course introduces Big Data as it arises in real-time applications and the emerging
technologies used to process it. It takes a practical approach to the complexity of processing
Big Data by developing Java applications on top of the Hadoop platform. It describes the Hadoop
architecture and how to work with the Hadoop Distributed File System (HDFS) and HBase on the
Ubuntu platform.

Course Outcomes

CO1: Understand Big Data concepts in real-time scenarios.
CO2: Understand the architecture of Hadoop through practical work.
CO3: Apply the MapReduce paradigm to implement applications in the cloud.

Unit-1 Teaching Hours: 15

INTRODUCTION
Distributed file system: Big Data and its importance, the four Vs, drivers for Big Data, Big Data
analytics, Big Data applications, algorithms using MapReduce, matrix-vector multiplication
by MapReduce.
Apache Hadoop: moving data in and out of Hadoop, understanding inputs and outputs of
MapReduce, data serialization, problems with traditional large-scale systems, requirements
for a new approach, Hadoop, scaling, the distributed framework, Hadoop vs. RDBMS, a brief
history of Hadoop.
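Matrix-vector multiplication x = Mv is a standard first example of an algorithm expressed in MapReduce: the mapper turns each matrix entry m_ij into the key-value pair (i, m_ij * v_j), and the reducer for key i sums its values to give x_i. The Python sketch below simulates the map, shuffle and reduce phases in a single process. It assumes that v is small enough for every mapper to hold in memory, and the function names are illustrative, not part of Hadoop's API.

```python
from collections import defaultdict


def map_phase(matrix_entries, v):
    """Mapper: for each stored entry (i, j, m_ij) emit the pair (i, m_ij * v_j)."""
    for i, j, m_ij in matrix_entries:
        yield i, m_ij * v[j]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reducer: x_i is the sum of all products emitted for row i."""
    return {i: sum(values) for i, values in groups.items()}


# Sparse 3x3 matrix stored as (row, column, value) triples
M = [(0, 0, 2.0), (0, 2, 1.0), (1, 1, 3.0), (2, 0, 4.0), (2, 2, 5.0)]
v = [1.0, 2.0, 3.0]
x = reduce_phase(shuffle(map_phase(M, v)))
print(x)
```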

Lab Exercises
1. Word count application in Hadoop.
2. Sorting the data using MapReduce.
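The word count lab follows the same pattern: the mapper emits (word, 1) for every word it reads, the framework groups the pairs by word, and the reducer adds up the ones. A single-process Python simulation is sketched below for reasoning about the data flow; the lab itself would implement a `Mapper` and a `Reducer` with Hadoop's Java API, and the tokenisation rule used here (lower-case letters and apostrophes) is an arbitrary choice.

```python
import re
from collections import defaultdict


def mapper(line):
    """Emit (word, 1) for each word in one input line."""
    for word in re.findall(r"[a-z']+", line.lower()):
        yield word, 1


def reducer(word, counts):
    """Sum all counts emitted for one word."""
    return word, sum(counts)


def word_count(lines):
    groups = defaultdict(list)  # shuffle: group mapper output by key
    for line in lines:
        for word, one in mapper(line):
            groups[word].append(one)
    # Each reducer receives its keys in sorted order
    return dict(reducer(word, counts) for word, counts in sorted(groups.items()))


print(word_count(["The cat sat on the mat", "the dog sat"]))
```

Because addition is associative and commutative, the same reduce logic can also act as a combiner that pre-aggregates counts on the map side, reducing the data shuffled across the network.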

Unit-2 Teaching Hours: 15

CONFIGURATIONS OF HADOOP
Hadoop processes: NameNode (NN), Secondary NameNode (SNN), JobTracker (JT), DataNode (DN)
and TaskTracker (TT); temporary directory; UI; common errors when running a Hadoop cluster
and their solutions.
Setting up Hadoop on a local Ubuntu host: prerequisites, downloading Hadoop, setting up
SSH, configuring the pseudo-distributed mode, HDFS directory, NameNode, examples of
MapReduce, using Elastic MapReduce, comparison of local versus EMR Hadoop.
Understanding MapReduce: key/value pairs, the Hadoop Java API for MapReduce, writing
MapReduce programs, Hadoop-specific data types, input/output.

Developing MapReduce programs: using languages other than Java with Hadoop, analysing a
large dataset.

Lab Exercises
1. Finding the maximum and minimum values using Hadoop.
2. Implementation of decision tree algorithms using MapReduce.

Unit-3 Teaching Hours: 15

ADVANCED MAPREDUCE TECHNIQUES


Simple, advanced and in-between joins, graph algorithms, using language-independent data
structures.
Hadoop configuration properties: setting up a cluster, cluster access control, managing the
NameNode, managing HDFS, MapReduce management, scaling.
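One common approach to the joins above, and the most general one, is the reduce-side (repartition) join: each mapper tags its records with the dataset they came from and emits them under the join key, and each reducer receives all records for one key and pairs the records from the two sources. The Python sketch below simulates an inner join of this kind in one process; the record layout, the field names and the tags "L"/"R" are hypothetical.

```python
from collections import defaultdict


def join_mapper(tag, records, key_field):
    """Emit (join key, (source tag, record)) so the reducer can tell the inputs apart."""
    for record in records:
        yield record[key_field], (tag, record)


def join_reducer(tagged_records):
    """Inner join for one key: combine every left record with every right record."""
    left = [rec for tag, rec in tagged_records if tag == "L"]
    right = [rec for tag, rec in tagged_records if tag == "R"]
    for left_rec in left:
        for right_rec in right:
            yield {**left_rec, **right_rec}


def reduce_side_join(left_records, right_records, key_field):
    groups = defaultdict(list)  # shuffle: group tagged records by join key
    for key, value in join_mapper("L", left_records, key_field):
        groups[key].append(value)
    for key, value in join_mapper("R", right_records, key_field):
        groups[key].append(value)
    joined = []
    for key in sorted(groups):  # reducers see keys in sorted order
        joined.extend(join_reducer(groups[key]))
    return joined


# Hypothetical example datasets
customers = [{"cust_id": 1, "country": "IN"}, {"cust_id": 2, "country": "US"}]
orders = [{"cust_id": 1, "item": "book"}, {"cust_id": 1, "item": "pen"},
          {"cust_id": 3, "item": "lamp"}]
print(reduce_side_join(customers, orders, "cust_id"))
```

Keys that appear in only one input (customer 2, and the order for customer 3) produce no output in an inner join; a left or full outer join would instead emit them with the missing fields left empty.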

Lab Exercises
1. Implementation of K-means Clustering using MapReduce.
2. Generation of frequent itemsets using MapReduce.

Unit-4 Teaching Hours: 15

HADOOP STREAMING
Hadoop Streaming: streaming command options, specifying a Java class as the mapper/reducer,
packaging files with job submissions, specifying other plug-ins for jobs.
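Hadoop Streaming runs any executable that reads records from standard input and writes tab-separated key-value lines to standard output as the mapper or reducer. The framework sorts the mapper output by key before it reaches the reducer, so the reducer must detect where one key's lines end. A word-count mapper and reducer in Python are sketched below; the script name `wc.py` and its `map`/`reduce` arguments are hypothetical. The pair can be tested locally with `cat input.txt | python3 wc.py map | sort | python3 wc.py reduce` before being submitted to the cluster through the streaming jar's `-input`, `-output`, `-mapper`, `-reducer` and `-file` options. The jar's location and the preferred flags depend on the Hadoop version, so check them against its Streaming documentation.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit 'word<TAB>1' for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(lines):
    """Sum the counts for each word; input lines must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Hypothetical command-line entry point: `python3 wc.py map` or `python3 wc.py reduce`
if __name__ == "__main__" and len(sys.argv) > 1:
    stage = mapper if sys.argv[1] == "map" else reducer
    for out_line in stage(sys.stdin):
        print(out_line)
```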
Lab Exercises
1. Count the number of missing and invalid values by joining two given large datasets.
2. Using Hadoop MapReduce, evaluate the number of products sold in each country on an
online shopping portal (dataset provided).

Unit-5 Teaching Hours: 15

HIVE & PIG


Hive: architecture, installation, configuration, Hive vs. RDBMS, tables, DDL and DML,
partitioning and bucketing, Hive Web Interface. Pig: use cases of Pig, Pig components, data
model, Pig Latin.

Lab Exercises
1. Sentiment analysis of product reviews using the MapReduce technique provided by Apache
Hadoop.
2. Trend analysis based on access patterns over web logs using Hadoop.

Essential Reading
[1] Boris Lublinsky, Kevin T. Smith, Alexey Yakubovich, Professional Hadoop Solutions,
Wiley, 2015.
[2] Tom White, Hadoop: The Definitive Guide, O’Reilly Media Inc., 2015.
[3] Garry Turkington, Hadoop Beginner's Guide, Packt Publishing, 2013.

Recommended Reading
[1] Pethuru Raj, Anupama Raman, Dhivya Nagaraj and Siddhartha Duggirala, High-Performance
Big-Data Analytics: Computing Systems and Approaches, Springer, 2015.
[2] Jonathan R. Owens, Jon Lentz and Brian Femiano, Hadoop Real-World Solutions
Cookbook, Packt Publishing, 2013.
[3] Tom White, Hadoop: The Definitive Guide, O’Reilly, 2012.
