PG Diploma in Data Analytics 2024
A Public-Private-Partnership University under RUSA 2.0 of MHRD (Government of India), established by the
Karnataka Govt. Act No. 24 of 2021
Semester First
Number of credits 3
COURSE OBJECTIVES
This course teaches students the process of working with data at large scale. It helps students understand the many
forms in which data exists in the wild and how to make use of it.
COURSE OUTCOMES
Philosophies of data science - Data science in a big data world - Benefits and uses of data science and big data - Facets of
data: Structured data, Unstructured data, Natural language, Machine-generated data, Audio, image and video,
Streaming data - The big data ecosystem: Distributed file systems, Distributed programming frameworks, Data
integration frameworks, Machine learning frameworks, NoSQL databases, Scheduling tools, Benchmarking tools,
System deployment, Service programming and Security.
Overview of the data science process - Retrieving data - Data preparation: Cleansing, integrating, and transforming data
- Exploratory data analysis - Data modeling: Model and variable selection, Model execution, Model diagnostics and
model comparison - Presentation and automation: Presenting data, Automating data analysis
Applications for machine learning in data science - Tools used in machine learning - Modeling process - Training the model
- Validating the model - Predicting new observations - Types of machine learning algorithms: Supervised learning
algorithms, Unsupervised learning algorithms, Reinforcement learning algorithms - Semi-supervised learning
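The modeling steps listed above (training, validating, predicting new observations) can be sketched for a simple supervised case. This is only an illustration: the synthetic data, the 80/20 split and the straight-line model are assumptions made for the example, not part of the prescribed syllabus.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): y = 3x + 2 plus random noise.
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 2.0 + rng.normal(0, 1.0, size=100)

# Hold out the last 20 observations for validation.
x_train, x_valid = x[:80], x[80:]
y_train, y_valid = y[:80], y[80:]

# Training: fit slope and intercept by least squares on the training data.
slope, intercept = np.polyfit(x_train, y_train, deg=1)

# Validation: mean squared error on data the model has not seen.
valid_pred = slope * x_valid + intercept
valid_mse = float(np.mean((y_valid - valid_pred) ** 2))

# Prediction: apply the fitted model to a new observation.
new_prediction = slope * 4.0 + intercept
print(f"slope={slope:.2f} intercept={intercept:.2f} validation MSE={valid_mse:.2f}")
```

Keeping the validation data separate from the training data is what makes the error estimate meaningful for new observations.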
Distributing data storage and processing with frameworks - Case study: Assessing risk when loaning money - Join the
NoSQL movement - Introduction to NoSQL - Case Study
Introducing connected data and graph databases - Text mining and text analytics - Text mining in the real world - Text
mining techniques - MapReduce - Dashboard development tools.
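To make the MapReduce topic concrete, the sketch below imitates the map, shuffle and reduce phases for a word count in plain Python. The function names and the two sample documents are hypothetical. A real framework would distribute these phases across machines, which this single-process example does not attempt.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group the emitted values by key, as a framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical input documents.
documents = ["big data is big", "data science uses data"]
mapped = chain.from_iterable(map_phase(doc) for doc in documents)
word_counts = reduce_phase(shuffle_phase(mapped))
print(word_counts)
```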
TEXT BOOKS
1. Introducing Data Science, Davy Cielen, Arno D. B. Meysman and Mohamed Ali, Manning Publications, 2016.
2. Think Like a Data Scientist, Brian Godsey, Manning Publications, 2017.
REFERENCE BOOKS
1. Doing Data Science: Straight Talk from the Frontline, Cathy O'Neil, Rachel Schutt, O'Reilly, 1st edition, 2013.
2. Mining of Massive Datasets, Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman, Cambridge University
Press, 2nd edition, 2014
3. An Introduction to Statistical Learning: with Applications in R, Gareth James, Daniela Witten, Trevor Hastie,
Robert Tibshirani, Springer, 1st edition, 2013
BLUE PRINT
Semester First
Number of credits 3
COURSE OBJECTIVES
This course covers the introduction, principles, design and implementation of DBMS. It introduces distributed
systems and gives a brief overview of data mining and data warehousing. It aims to
provide a strong foundation in database concepts and to develop skills for the design and
implementation of a database application, with a brief exposure to advanced database concepts.
COURSE OUTCOMES
Data- Database- Database management system- Characteristics of the database approach- Role of Database
administrators- Role of Database Designers- End Users- Advantages of Using a DBMS-Data models, Schema and
Instances –Database design - Database Engine – 1 tier architecture – 2 tier architecture- 3 tier architecture – History of
Database Management systems- Types of Databases.
Data Model and Types of Data Model- Relational Data Model- Hierarchical Model- Network Data Model-
Object/Relational Model- Object-Oriented Model- Entity-Relationship Model- Modeling using E-R
Diagrams- Notation used in E-R Model- Relationships and Relationship Types- Cardinalities- Subclasses,
Superclasses and Inheritance – Specialization and Generalization – Characteristics of Specialization and
Generalization – Modeling of UNION types with categories- An example University EER Schema.
Structure of relational databases - Properties of relational databases and Tables - Database Schema - Armstrong's
Axioms - Functional Dependency - Anomalies in a Database - Properties of Normalized Relations - First Normal
Form - Second Normal Form - Third Normal Form - Boyce-Codd Normal Form (BCNF).
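Functional dependencies and Armstrong's axioms are easier to follow with a small computation. The sketch below computes the closure of a set of attributes, which is the usual way to test whether those attributes form a key. The relation, its attribute names and the dependencies are hypothetical examples.

```python
def attribute_closure(attributes, dependencies):
    """Return every attribute functionally determined by `attributes`.

    `dependencies` is a list of (lhs, rhs) pairs of sets, read as lhs -> rhs.
    """
    closure = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in dependencies:
            # If the closure already contains all of lhs, it also determines rhs.
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


# Hypothetical relation: Enrollment(student_id, course_id, student_name, grade)
fds = [
    ({"student_id"}, {"student_name"}),
    ({"student_id", "course_id"}, {"grade"}),
]
all_attributes = {"student_id", "course_id", "student_name", "grade"}

closure_of_student = attribute_closure({"student_id"}, fds)
closure_of_key = attribute_closure({"student_id", "course_id"}, fds)
is_key = closure_of_key == all_attributes
print(closure_of_student, is_key)
```

In this example (student_id, course_id) determines every attribute, while student_name depends on student_id alone. That partial dependency on part of the key is what Second Normal Form removes.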
Categories of SQL Commands; Data Definition; Data Manipulation Statements, SELECT - The Basic Form,
Subqueries, Functions, GROUP BY Feature, Updating the Database, Data Definition Facilities. MongoDB
Overview- MongoDB Data modeling.
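The SQL command categories above can be tried without installing a database server by using Python's built-in sqlite3 module. The table, column names and rows below are hypothetical. SQLite's dialect is assumed here, and other database systems may differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary in-memory database
cur = conn.cursor()

# Data definition: create a (hypothetical) student table.
cur.execute(
    """
    CREATE TABLE student (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        dept  TEXT NOT NULL,
        marks INTEGER
    )
    """
)

# Data manipulation: insert rows, then update one of them.
cur.executemany(
    "INSERT INTO student (id, name, dept, marks) VALUES (?, ?, ?, ?)",
    [(1, "Asha", "CS", 82), (2, "Ravi", "CS", 74), (3, "Meena", "Stats", 91)],
)
cur.execute("UPDATE student SET marks = marks + 1 WHERE name = ?", ("Ravi",))
conn.commit()

# SELECT with an aggregate function and GROUP BY.
dept_avg = cur.execute(
    "SELECT dept, AVG(marks) FROM student GROUP BY dept ORDER BY dept"
).fetchall()

# SELECT with a subquery: students scoring above the overall average.
above_avg = cur.execute(
    "SELECT name FROM student "
    "WHERE marks > (SELECT AVG(marks) FROM student) ORDER BY name"
).fetchall()

conn.close()
print(dept_avg, above_avg)
```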
REFERENCE BOOKS
1. Elmasri Ramez and Navathe Shamkant B, Fundamentals of Database Systems, Addison-Wesley, 6th
Edition, 2010.
2. Silberschatz, Korth, Sudarshan, Database System Concepts, 5th Edition, McGraw Hill, 2006.
3. O'Neil Patrick and O'Neil Elizabeth, Database Principles, Programming and Performance, 2nd Edition,
Morgan Kaufmann Publishers Inc, 2008.
BLUE PRINT
Semester FIRST
Number of credits 3
COURSE OBJECTIVES:
This course is designed to teach students how to analyse different types of data using Python. Students will learn how
to prepare data for analysis, perform simple statistical analysis, create meaningful data visualizations and predict future
trends from data.
COURSE OUTCOMES:
NumPy ndarray - Vectorization Operation - Array Indexing and Slicing - Transposing Array
and Swapping Axes - Saving and Loading Array - Universal Functions - Mathematical and
Statistical Functions in NumPy.
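A short sketch of the NumPy topics above, using a small array invented for the example. It saves to an in-memory buffer rather than a file so that it can run anywhere.

```python
import io

import numpy as np

# A 2-D ndarray with 3 rows and 4 columns holding the values 0..11.
a = np.arange(12).reshape(3, 4)

# Vectorized operation: no explicit Python loop is needed.
doubled = a * 2

# Indexing and slicing.
second_row = a[1]       # row at position 1
block = a[:2, 1:3]      # rows 0-1, columns 1-2

# Transposing and swapping axes (equivalent for a 2-D array).
transposed = a.T
swapped = a.swapaxes(0, 1)

# Saving and loading, here through an in-memory buffer instead of a file.
buffer = io.BytesIO()
np.save(buffer, a)
buffer.seek(0)
restored = np.load(buffer)

# Universal function applied element by element.
roots = np.sqrt(a)

# Mathematical and statistical functions.
column_means = a.mean(axis=0)
total = a.sum()
print(column_means, total)
```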
Series and DataFrame data structures in pandas - Creation of DataFrames – Accessing the
columns in a DataFrame - Accessing the rows in a DataFrame - pandas Index Objects - Reindexing Series and
DataFrames - Dropping entries from Series and DataFrames - Indexing, Selection and Filtering in Series and
DataFrames - Arithmetic Operations between DataFrames and Series - Function Application and Mapping.
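The pandas operations listed above can be illustrated on a tiny, made-up data set. The names, labels and marks are hypothetical.

```python
import pandas as pd

# A Series is a labelled one-dimensional array.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# A DataFrame is a table of labelled columns (hypothetical data).
df = pd.DataFrame(
    {"name": ["Asha", "Ravi", "Meena"], "marks": [82, 74, 91]},
    index=["s1", "s2", "s3"],
)

# Accessing columns and rows.
marks = df["marks"]            # one column as a Series
row_by_label = df.loc["s2"]    # row selected by index label
row_by_position = df.iloc[0]   # row selected by integer position

# Reindexing introduces NaN for labels that did not exist before.
reindexed = s.reindex(["a", "b", "c", "d"])

# Dropping entries returns a new object without the given label.
dropped = df.drop(index="s2")

# Filtering with a boolean condition.
high = df[df["marks"] > 80]

# Arithmetic between a DataFrame and a Series: the Series of column
# means is aligned on column labels and subtracted from every row.
scores = pd.DataFrame({"test1": [10, 20], "test2": [30, 40]})
centered = scores - scores.mean()

# Function application and mapping, element by element.
grades = df["marks"].map(lambda m: "A" if m >= 80 else "B")
print(high, grades, sep="\n")
```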
Combining and Merging Data Sets – Reshaping and Pivoting – Data Transformation – String
manipulations – Regular Expressions.
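As a rough illustration of combining, reshaping and string handling in pandas, the sketch below joins two hypothetical tables, pivots the result from long to wide form, and extracts text with a regular expression. All column names and values are invented for the example.

```python
import pandas as pd

# Two hypothetical tables sharing the key column "id".
students = pd.DataFrame({"id": [1, 2, 3], "name": ["Asha", "Ravi", "Meena"]})
marks = pd.DataFrame(
    {
        "id": [1, 1, 2, 2],
        "subject": ["stats", "python", "stats", "python"],
        "marks": [82, 90, 74, 68],
    }
)

# Combining and merging: inner join keeps only ids present in both tables.
merged = students.merge(marks, on="id", how="inner")

# Reshaping and pivoting: one row per student, one column per subject.
wide = merged.pivot(index="name", columns="subject", values="marks")

# Data transformation: derive a new column from an existing one.
merged["passed"] = merged["marks"] >= 70

# String manipulation and regular expressions through the .str accessor.
emails = pd.Series(["asha@example.com", "ravi@example.org"])
domains = emails.str.extract(r"@([\w.]+)$")[0]
upper_names = students["name"].str.upper()
print(wide)
```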
Group By Mechanics – Data Aggregation – Group-wise Operations – Transformations – Pivot Tables – Cross
Tabulations – Date and Time data types.
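The grouping topics above follow a split-apply-combine pattern, sketched here on a small hypothetical sales table. The column names and figures are assumptions made for the example.

```python
import pandas as pd

# Hypothetical sales records.
sales = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-18"]
        ),
        "region": ["north", "south", "north", "south"],
        "product": ["A", "A", "B", "A"],
        "amount": [100, 150, 200, 50],
    }
)

# Group by mechanics and aggregation: split by region, then summarize.
by_region = sales.groupby("region")["amount"].agg(["sum", "mean"])

# Group-wise transformation: each sale as a share of its region's total.
region_total = sales.groupby("region")["amount"].transform("sum")
sales["share"] = sales["amount"] / region_total

# Pivot table: regions as rows, products as columns, summed amounts as cells.
pivot = sales.pivot_table(
    index="region", columns="product", values="amount", aggfunc="sum", fill_value=0
)

# Cross tabulation: how many sales fall in each region/product combination.
counts = pd.crosstab(sales["region"], sales["product"])

# Date and time data: group by the month extracted from the datetime column.
monthly = sales.groupby(sales["date"].dt.month)["amount"].sum()
print(by_region, pivot, counts, monthly, sep="\n\n")
```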
Matplotlib and Seaborn Packages – Plotting Graph - Controlling Graphs – Adding Text –
More Graph Types – Getting and Setting Values – Patches.
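A minimal Matplotlib sketch of the plotting topics above: drawing a graph, controlling axes, adding text, getting and setting values, and adding a patch. The data points are invented. The non-interactive "Agg" backend and the in-memory PNG are assumptions made so the example runs without a display. Seaborn is not used here.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

# Hypothetical data.
months = [1, 2, 3, 4]
units = [10, 20, 15, 30]

fig, ax = plt.subplots(figsize=(6, 4))

# Plotting a graph.
ax.plot(months, units, marker="o", label="units sold")

# Controlling the graph: title, axis labels and axis limits.
ax.set_title("Monthly sales")
ax.set_xlabel("Month")
ax.set_ylabel("Units")
ax.set_xlim(0, 5)

# Adding text at data coordinates (x=2, y=21).
ax.text(2, 21, "first peak")

# Adding a patch: an unfilled rectangle highlighting month 3.
ax.add_patch(Rectangle((2.5, 12), 1.0, 6, fill=False))

ax.legend()

# Getting a value back from the axes.
current_xlim = ax.get_xlim()

# Render the figure to PNG bytes in memory instead of writing a file.
png_buffer = io.BytesIO()
fig.savefig(png_buffer, format="png")
plt.close(fig)
```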
SELF STUDY 5 Hrs.
REFERENCE BOOKS:
BLUE PRINT
Number of credits 3
COURSE OBJECTIVES:
The course aims to explain the basic concepts of statistical methods and develop analytical ability to solve real-world
problems using these methodologies.
COURSE OUTCOMES:
Concepts of measurement, scales of measurement, design of data collection formats with illustration, data quality and
issues with data collection systems with examples from business, cleaning and treatment of missing data, sampling
techniques.
Principles of data visualization and different methods of presenting data in business analytics
Frequency table, histogram, measures of location, measures of spread, skewness, kurtosis, percentiles, box plot, relative
frequency distribution as a statistical model
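The descriptive measures listed above can be computed directly with NumPy. The eight data values are made up for the illustration. Skewness and kurtosis are computed here as the moment-based population versions, which is one of several conventions in use.

```python
import numpy as np

# Hypothetical sample of eight observations.
data = np.array([2, 4, 4, 4, 5, 5, 7, 9], dtype=float)

# Measures of location.
mean = data.mean()
median = np.median(data)

# Measures of spread (population standard deviation, ddof=0).
std = data.std()
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1  # the box in a box plot spans q1 to q3

# Moment-based skewness and kurtosis from standardized values.
z = (data - mean) / std
skewness = np.mean(z**3)
kurtosis = np.mean(z**4)  # not excess kurtosis; a normal curve gives 3

# Frequency table and relative frequency distribution.
values, counts = np.unique(data, return_counts=True)
relative_frequency = counts / counts.sum()
print(mean, median, std, iqr, skewness, kurtosis)
```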
Covariance, Correlation coefficient, properties of Correlation coefficient, Rank correlation, linear regression (two
variables), Multiple correlation and partial correlation.
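Covariance, the correlation coefficient, rank correlation and two-variable linear regression are closely related, as the sketch below tries to show. The data values are hypothetical. The rank correlation uses the classical formula for data without ties, which is an assumption that holds for this example only. Multiple and partial correlation are not covered here.

```python
import numpy as np

# Hypothetical paired observations.
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

# Sample covariance and Pearson correlation coefficient.
cov_xy = np.cov(x, y, ddof=1)[0, 1]
r = np.corrcoef(x, y)[0, 1]

# Two-variable linear regression y = intercept + slope * x:
# the slope is cov(x, y) / var(x) and the line passes through the means.
slope = cov_xy / np.var(x, ddof=1)
intercept = y.mean() - slope * x.mean()


def ranks(values):
    """Rank values from 1 (smallest) upward; valid only when there are no ties."""
    return values.argsort().argsort() + 1


# Spearman rank correlation via 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
a = np.array([10, 20, 30, 40, 50])
b = np.array([1, 3, 2, 5, 4])
d = ranks(a) - ranks(b)
n = len(a)
rho = 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))
print(cov_xy, r, slope, intercept, rho)
```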
BLUEPRINT
Code number: PGDDS1422
Title of the paper: BASIC STATISTICAL METHODS
Semester FIRST
Number of credits 1
Semester FIRST
Number of credits 1
List of programs -
1. Introduction to Python interpreter
2. Control statements
3. Functions, I/O, File handling, Packages/Libraries
4. Exception Handling, OO Programming.
5. Use of different packages for Data analytics and visualization
Semester FIRST
1. DDL
2. EER diagram
3. DML
4. Different types of JOIN operations
5. Manipulating database using Python
Semester SECOND
Number of credits 3
COURSE OBJECTIVES:
This course helps students understand the concepts of Machine Learning, supervised learning and its
applications, the concepts and algorithms of unsupervised learning, and the concepts and algorithms of advanced learning.
COURSE OUTCOMES:
Machine Learning – Types of Machine Learning – Machine Learning process – Preliminaries, testing Machine Learning
algorithms, turning data into probabilities, and statistics for Machine Learning – Probability theory – Probability
Distributions – Decision Theory.
UNIT 2: SUPERVISED LEARNING 10 Hrs.
Linear Models for Regression, Linear Models for Classification, Discriminant Functions, Probabilistic Generative
Models, Probabilistic Discriminative Models, Decision Tree Learning, Bayesian Learning, Naïve Bayes, Ensemble
Methods – Bagging and Boosting, Mixture of experts, Support Vector Machines.
Dimensionality Reduction, Linear Discriminant Analysis, Factor Analysis, Principal Components Analysis,
Independent Components Analysis, t-SNE.
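Of the dimensionality reduction methods above, Principal Components Analysis is the easiest to sketch from first principles. The example below computes it through a singular value decomposition of centered data. The synthetic two-dimensional data set is an assumption for the illustration, and library implementations may differ in scaling and sign conventions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumed): points spread mainly along the direction (1, 1).
t = rng.normal(0, 3.0, size=200)
noise = rng.normal(0, 0.3, size=(200, 2))
X = np.column_stack([t, t]) + noise

# Step 1: center each feature at zero.
X_centered = X - X.mean(axis=0)

# Step 2: SVD of the centered data; the rows of Vt are the principal directions.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Step 3: variance explained by each component.
explained_variance = S**2 / (len(X) - 1)
explained_ratio = explained_variance / explained_variance.sum()

# Step 4: project onto the first component to reduce 2 dimensions to 1.
first_component = Vt[0]
X_reduced = X_centered @ Vt[:1].T
print(first_component, explained_ratio)
```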
REFERENCE BOOKS:
BLUEPRINT
Code number: PGDDS2122
Title of the paper: Machine Learning
Semester SECOND
Number of credits 3
COURSE OBJECTIVES:
To help students understand the 'intuition' behind the concepts of Linear Algebra, which in turn will help them to
see its applications in later courses.
COURSE OUTCOMES:
CO1: Understand the most fundamental concept, the 'vector', on which Linear Algebra is built.
CO2: Gain knowledge of two fundamental topics of Linear Algebra and Vector Space
CO3: Understand two fundamental topics of Linear Algebra and Linear Transformation
CO4: Building the Basics of Linear Programming
Introduction to Linear Algebra, Difference Between Linear Algebra & Matrix Analysis, Revision of Basic Geometry,
Definition of Vectors - Examples, Two Fundamental Vectors – Geometric Vectors and R^n Vectors, Properties of
Vectors, Linear Combination of Vectors, Decomposition of Vectors, Linearly Independent & Linearly Dependent
Vectors and Span of Vectors.
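Linear combination, decomposition and linear dependence can be checked numerically. In the sketch below the vectors are chosen arbitrarily for illustration. The rank test and the least-squares decomposition are one convenient way to do this, not the only one.

```python
import numpy as np

v1 = np.array([1.0, 0.0, 2.0])
v2 = np.array([0.0, 1.0, 1.0])

# A linear combination of v1 and v2, so v3 lies in their span.
v3 = 2 * v1 - 3 * v2

# Stack the vectors as columns; the rank counts the independent ones.
vectors = np.column_stack([v1, v2, v3])
rank = np.linalg.matrix_rank(vectors)
independent = bool(rank == vectors.shape[1])  # False: v3 depends on v1 and v2

# Decomposition: find coefficients c1, c2 with c1*v1 + c2*v2 = target.
target = np.array([3.0, 4.0, 10.0])
basis = np.column_stack([v1, v2])
coefficients, _, _, _ = np.linalg.lstsq(basis, target, rcond=None)
print(rank, independent, coefficients)
```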
Definition of Vector Space – Examples, Definition of Subspaces – Examples, Union & Intersection of Subspaces,
Definition of Basis Vectors – Standard Basis and Dimension of Vector Space
Definition of Linear Transformation – Examples, Introduction to Matrix, Matrix as Linear Transformation, Matrix
Multiplication (Composition of Linear Transformations) – Three Perspectives: 1. Column, 2. Row & 3. Dot Product,
Concept of Determinant – Area, Volume, Hyper-plane, etc., System of Linear Equations – Column & Null Space,
Gaussian Elimination, Row Reduced Echelon Form, Eigenvalues & Eigenvectors, Inverse Matrix and Positive Definite
& Semi-Definite Matrix.
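Several of the matrix topics above can be explored with a single 2 x 2 example. The matrices are chosen for easy arithmetic and are not taken from the reference books. np.linalg.solve is used as a stand-in for hand-worked Gaussian elimination, since it relies on an LU factorization, which is elimination in matrix form.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])
b = np.array([3.0, 3.0])

# System of linear equations A x = b.
x = np.linalg.solve(A, b)

# Determinant: the factor by which A scales areas.
det = np.linalg.det(A)

# Inverse matrix: undoes the transformation A.
A_inv = np.linalg.inv(A)

# Matrix multiplication as composition of linear transformations:
# applying B then A to v equals applying the single matrix A @ B.
B = np.array([[0.0, -1.0], [1.0, 0.0]])  # rotation by 90 degrees
v = np.array([1.0, 0.0])
step_by_step = A @ (B @ v)
composed = (A @ B) @ v

# Eigenvalues and eigenvectors (eigh is for symmetric matrices such as A).
eigenvalues, eigenvectors = np.linalg.eigh(A)

# A symmetric matrix is positive definite when all eigenvalues are positive.
is_positive_definite = bool(np.all(eigenvalues > 0))
print(x, det, eigenvalues, is_positive_definite)
```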
Introduction to Linear Programming – Examples, Problems in LP, Convex Sets, Corner Points, Feasibility, Basic
Feasible Solutions and Simplex Method
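The role of corner points in linear programming can be seen by brute force on a small problem. The sketch below enumerates every intersection of two constraint boundaries, keeps the feasible ones, and picks the best. The problem itself is hypothetical. This enumeration is not the Simplex Method, which moves between neighbouring corner points instead of visiting all of them.

```python
from itertools import combinations

import numpy as np

# Hypothetical problem: maximize 3x + 2y
# subject to x + y <= 4, x + 3y <= 6, x >= 0, y >= 0.
c = np.array([3.0, 2.0])
A = np.array([[1.0, 1.0], [1.0, 3.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([4.0, 6.0, 0.0, 0.0])  # every constraint written as A z <= b

corner_points = []
for i, j in combinations(range(len(A)), 2):
    boundary = A[[i, j]]
    if abs(np.linalg.det(boundary)) < 1e-12:
        continue  # parallel boundaries never intersect
    point = np.linalg.solve(boundary, b[[i, j]])
    # Keep the intersection only if it satisfies every constraint.
    if np.all(A @ point <= b + 1e-9):
        corner_points.append(point)

# For a bounded feasible region the optimum is attained at a corner point.
best_point = max(corner_points, key=lambda p: float(c @ p))
best_value = float(c @ best_point)
print(best_point, best_value)
```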
REFERENCE BOOKS :
1. Introduction to Linear Algebra, Gilbert Strang, 5th Edition.
2. Linear Programming, G. Hadley.
BLUEPRINT
CODE NUMBER: PGDDS2222
TITLE OF THE PAPER: LINEAR ALGEBRA