PGP-Data Science - Course Module With Internship Module

The document outlines the modules and topics covered in a data science course. The modules include foundations of programming using Python and R, databases, statistics, machine learning techniques like regression, classification, and featurization. Specific topics within these modules include data types, control flow, data visualization, linear and logistic regression, decision trees, Naive Bayes, clustering, and model deployment. Case studies are also included to apply techniques to datasets.

Uploaded by

Mehulkumar Hire
Copyright
© All Rights Reserved

PGP-DATA SCIENCE
Course Module
Foundations
Introduction to Programming using Python

Introduction
Ø Introduction to Python
Ø Basic Programming syntax
Ø Variables
Ø Basic Arithmetic & logical operators
Ø Data Types (int, float)

Data Structures
Ø List
Ø Tuple
Ø Dictionary
Ø Array
Ø List Comprehension

Conditional Statement in Python
Ø If
Ø If-else
Ø elif

Iteration (loops)
Ø While Loop
Ø For Loop

Advanced Python
Ø Functions
Ø Methods
Ø Map Function
Ø Reduce
Ø Filter
Ø Lambda
Ø Generators
Ø Iterators

Python as OOP Language
Ø OOPS Concept - Class, objects, Detailed Introduction
Ø Inheritance - Multi level Inheritance, Single level Inheritance
Ø Encapsulation
Ø Polymorphism
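
The Advanced Python topics above (map, filter, reduce, lambda and generators) can be illustrated with a short sketch. The function name squares and the sample values are assumptions chosen for illustration and are not taken from the course material.

```python
from functools import reduce


def squares(limit):
    """Generator: lazily yields the squares of 0 .. limit-1."""
    for n in range(limit):
        yield n * n


# Consume the generator (an iterator) into a list.
values = list(squares(6))

# filter keeps the items for which the lambda returns True.
evens = list(filter(lambda v: v % 2 == 0, values))

# map applies the lambda to every item.
doubled = list(map(lambda v: v * 2, evens))

# reduce folds the list into a single value, starting from 0.
total = reduce(lambda acc, v: acc + v, doubled, 0)

print(values, evens, doubled, total)
```

Each step returns a lazy iterator in Python 3, which is why the results are wrapped in list() before printing.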

Introduction to Programming using R

Ø Introduction to R Language
Ø How to install R
Ø Documentation in R
Ø Hello world
Ø Package in R
Ø Data Types in R
Ø Data structures
Ø Conditional statement in R
Ø Loops in R
Ø Subsetting
Ø Reading Data from csv, excel files
Ø Creating a vector and vector operation
Ø Initializing data frame
Ø Control structure
Ø Data Visualization in R
Ø Creating Bar Chart
Ø Creating Histogram and box plot
Ø Plotting with base graphics
Ø Plotting and coloring in R
Ø Machine Learning Algorithms Using R


Database Management System using MySQL Workbench

Introduction
Introduction to DBMS
An Introduction to Relational Database Concepts and SQL Accessing
Working on MySQL Workbench

Data Servers MYSQL/RDBMS Concepts
Extraction, Transformation and Loading (“ETL”) Processes
Retrieve data from Single Tables (use of SELECT Statement) and the power of WHERE and ORDER BY Clause
Retrieve and Transform data from multiple Tables using JOINS and Unions
Introduction to Views
Working with Aggregate functions, grouping and summarizing Records
Writing Sub queries

Statistical Methods for Decision Making


Ø Brief Introduction To Statistics

Ø Probability distribution

Ø Normal distribution

Ø Poisson distribution

Ø Bayes' theorem

Ø Central limit theorem

Ø Hypothesis testing

Ø One Sample T-Test

Ø Two Sample T-Test

Ø Anova and Chi-Square

Ø Pearson Correlation

Ø Covariance

Ø Chebyshev's Inequality Formula

Exploratory Data Analysis
Ø Reading the Data

Ø Cleaning the Data

Ø Data Visualization in Python

Ø Summary statistics (mean, median, mode, variance, standard deviation)

Ø Seaborn

Ø Matplotlib

Ø Population VS sample

Ø Univariate and Multivariate statistics

Ø Types of variables – Categorical and Continuous

Ø Coefficient of correlations, Skewness and kurtosis


Machine Learning Techniques

Supervised Learning - Regression
Looking at regression through the perspective of machine learning
Accuracy scores as a metric of model performance
Measuring the importance of individual variables in a regression model
Review - testing for individual significance vs joint significance
Using the adjusted R^2 to compare model with different number of independent variables
Approaches to feature selection
Forward and backward selection
Parameter tuning and Model evaluation
Extending linear regression
Data transformations and normalization
Log transformation of dependent and independent variables
Case study:
Dealing with categorical independent variables
One hot encoding vs dummy variable regression
Case study on linear regression
Modelling probabilistic dependent variables
The sigmoid function and odds ratio
The concept of logit
The failure of OLS in estimating parameters for a logistic regression
Introduction to the concept of Maximum likelihood estimation
Advantages of the maximum likelihood approach
Modelling a logistic regression problem with a case study
Making predictions and evaluating parameters

Ensemble Techniques
Bagging
Boosting
Bagging & Boosting Examples

Machine Learning Model Deployment using Flask
Introduction to Model Deployment
Introduction to Flask in Python
How to deploy Applications in Flask?
Types of Model deployment
Machine Learning Techniques

Regression

Introduction
Introduction to Regression
Looking at regression through the perspective of machine learning
Brief Introduction to Regression Techniques
Brief Introduction to Best Fit line in Regression

Linear Regression
Introduction to Linear Regression
Accuracy scores as a metric of model performance
Measuring the importance of individual variables in a regression model
Review - testing for individual significance vs joint significance
Using the adjusted R^2 to compare model with different number of independent variables
Approaches to feature selection
Forward and backward selection
Parameter tuning and Model evaluation
Extending linear regression
Data transformations and normalization
L1 & L2 (LASSO AND RIDGE)

Logistic Regression
Introduction to Logistic Regression
Log transformation of dependent and independent variables
Dealing with categorical independent variables
One hot encoding vs dummy variable
Modelling probabilistic dependent variables
The sigmoid function and odds ratio
The concept of logit
The failure of OLS in estimating parameters for a logistic regression
Introduction to the concept of Maximum likelihood estimation
Advantages of the maximum likelihood approach, sigmoid function
Modelling a logistic regression problem with a case study
Making predictions and evaluating parameters

Case Study
Case study on Linear Regression
Case study on Logistic Regression
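
The sigmoid function, odds ratio and logit listed under Logistic Regression are related to one another: the logit is the log of the odds, and it is the inverse of the sigmoid. A minimal sketch in plain Python follows; the function names are illustrative assumptions and do not come from the course material.

```python
import math


def sigmoid(z):
    """Map any real number z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1.0 - p)


def logit(p):
    """Log-odds of p; this is the inverse of the sigmoid."""
    return math.log(odds(p))


z = 1.5                 # an arbitrary linear-model score
p = sigmoid(z)          # probability implied by that score
recovered = logit(p)    # applying logit recovers the score (up to rounding)
print(p, odds(p), recovered)
```

Because the logit is linear in the model inputs while the probability is not, logistic regression fits a linear model on the log-odds scale.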
Featurization

Featurization, Model Selection & Tuning

Feature engineering

Model selection and tuning

Model performance measures

Regularising Linear models

ML pipeline

Bootstrap sampling

Grid search CV
Randomized search CV
K fold cross-validation
Classification

Introduction
Introduction to Classification
Looking at Classification through the perspective of machine learning
Brief Introduction to Classification Techniques
Balancing Data set
Binary classification vs Multi class classification

Decision Trees
Entropy and Gini
Information Gain
Decision trees – Simple decision trees. Visualizing decision trees and nodes and splits.
Working of the Decision tree algorithm.
Importance and usage of Entropy and Gini index.
Manually calculating entropy using gini formula and working out how to split decision nodes
Evaluating decision tree models.
Accuracy metrics – precision, recall and confusion matrix
Interpretation for accuracy metric.
Building a robust decision tree model. k-fold cross validation - Advantages against simple train test split.

Classification Techniques
CART - Extending decision trees to regression problems.
Advantages of using CART.
The Bayes theorem. Prior probability.
KNN Classifier
The Gaussian Naive Bayes Classifier.
Assumptions of the Naive Bayes Classifier.
Functioning of the Naive Bayes algorithm.
Evaluating the model - Precision, Recall, Accuracy metrics and k-fold cross validation
Random Forest
Voting Classifier
ROC Curve and AUC for binary classification for Naive Bayes.
Extending Bayesian Classification for Multiclass Classification
Support Vector Machine
KNN

Case Study
Case study on Classification Data Set
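
The Decision Trees topics above include calculating entropy and the Gini index by hand and using them to choose a split. The sketch below shows one way to do that in plain Python. The helper names and the toy labels are assumptions for illustration only.

```python
import math
from collections import Counter


def class_probabilities(labels):
    """Relative frequency of each class in a list of labels."""
    counts = Counter(labels)
    return [count / len(labels) for count in counts.values()]


def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in class_probabilities(labels))


def gini(labels):
    """Gini impurity: 1 - sum(p^2)."""
    return 1.0 - sum(p * p for p in class_probabilities(labels))


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


parent = ["yes", "yes", "no", "no"]
left, right = ["yes", "yes"], ["no", "no"]   # a split that separates the classes
print(entropy(parent), gini(parent), information_gain(parent, left, right))
```

A split with higher information gain (or lower weighted Gini impurity) leaves the child nodes purer, which is the criterion a decision tree uses when choosing where to split.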
Unsupervised Learning

Introduction
What is Unsupervised Learning?
The two major Unsupervised Learning problems - Dimensionality reduction and clustering.

Clustering Algorithms
The different approaches to clustering – Hierarchical and K means clustering.
Hierarchical clustering - The concept of agglomerative and divisive clustering.
Agglomerative Clustering – Working of the basic algorithms.
Distance matrix - Interpreting dendrograms.
Choosing the threshold to determine the optimum number of clusters.

K-Means Algorithm
The K-means algorithm.
Measures of distance – Euclidean, Manhattan and Minkowski distance.
The concept of within cluster sums of squares.
Using the elbow plot to select optimum number of clusters.
Case study on k-means clustering.
Comparison of k means and agglomerative approaches to clustering.

PCA (Principal Component Analysis)
Noise in the data and dimensional reduction.
Capturing Variance - The concept of principal components.
Assumptions in using PCA.
The working of the PCA algorithm.
Eigen vectors and orthogonality of principal components.
What is complexity curve?
Advantages of using PCA.
Build a model using Principal components and comparing with normal model. What is the difference?
Putting it all together.

Case Study
The relationship between unsupervised and supervised learning.
Case study on Dimensionality reduction followed by a supervised learning model.
Case study on Clustering followed by classification model.
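
The distance measures and the within cluster sum of squares named in the K-Means topics above can be written down in a few lines. This is a minimal sketch in plain Python; the function names and the toy points are assumptions for illustration, and the clusters are given by hand rather than found by running k-means.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def manhattan(a, b):
    """Minkowski distance with p = 1."""
    return minkowski(a, b, 1)


def euclidean(a, b):
    """Minkowski distance with p = 2."""
    return minkowski(a, b, 2)


def wcss(clusters):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    total = 0.0
    for points in clusters:
        dim = len(points[0])
        centroid = [sum(pt[i] for pt in points) / len(points) for i in range(dim)]
        total += sum(euclidean(pt, centroid) ** 2 for pt in points)
    return total


clusters = [[(0, 0), (2, 0)], [(5, 5)]]   # two hand-made clusters
print(manhattan((0, 0), (3, 4)), euclidean((0, 0), (3, 4)), wcss(clusters))
```

An elbow plot is obtained by computing this quantity for several values of k and looking for the point where adding more clusters stops reducing it sharply.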
Data Visualization Using Tableau
Introduction to Visualization, Rules of Visualization
Data Types, Sources, Connections, Loading, Reshaping
Data Aggregation
Working with Continuous and Discrete Data
Using Filters
Using Calculated Fields and parameters
Creating Tables and Charts
Building Dash Boards and story boards
Sharing Your Work and Publishing for wider audience

Data Visualization Using Google Data Studio
Introduction to Visualization
Introduction to Google Data Studio
How Data Studio Works?
Data Types, Sources, Connections, Loading, Reshaping
Data Aggregation
Working with Continuous and Discrete Data
Report Edit Mode in Data Studio.
Using Filters in Data Studio
Using Calculated Fields and parameters
Creating Tables and Charts
Building Dash Boards and story boards
Building Dash Boards and Story Boards in Data Studio

Data Visualization Using Power BI
Introduction to Microsoft Power BI
The key features of Power BI workflow
Desktop application
BI service
File data sources
Sourcing data from the web (OData and Azure)
Building a dashboard
Data visualization
Publishing to the cloud
DAX data computation
Row context
Filter context
Analytics pane
Creating columns and measures
Data drill down and drill up
Creating tables
Binned tables
Data modeling and relationships
Power BI components such as Power View, Map, Query, and Pivot
Internship Module
Module 1: Natural Language Processing and Speech Recognition
Lesson 1 - Introduction to Natural Language Processing
Lesson 2 - Feature Engineering on Text Data
Lesson 3 - Natural Language Understanding Techniques
Lesson 4 - Natural Language Generation
Lesson 5 - Natural Language Processing Libraries
Lesson 6 - Natural Language Processing with Machine Learning and Deep Learning
Lesson 7 - Introduction of Speech Recognition
Lesson 8 - Signal Processing and Speech Recognition Models
Lesson 9 - Speech to Text
Lesson 10 - Text to Speech
Lesson 11 - Voice Assistant Devices
Module 2: Text Mining and Sentiment Analysis
Lesson 1 - Text cleaning, regular expressions, Stemming, Lemmatization
Lesson 2 - Word cloud, Principal Component Analysis, Bigrams & Trigrams
Lesson 3 - Web scraping, Text summarization, LexRank algorithm
Lesson 4 - Latent Dirichlet Allocation (LDA) Technique
Lesson 5 - Word2vec Architecture (Skip Grams vs CBOW)
Lesson 6 - Text classification, Document vectors, Text classification using Doc2vec

Module 3: Reinforcement Learning


Lesson 1 - Introduction to Reinforcement Learning
Lesson 2 - Reinforcement Learning Framework and Elements
Lesson 3 - Multi-Armed Bandit
Lesson 4 - Markov Decision Process
Lesson 5 - Solution Methods
Lesson 6 - Q-value and Advantage Based Algorithms
Module 4: Time Series Forecasting
Lesson 1 - What is Time Series?
Lesson 2 - Regression vs Time Series
Lesson 3 - Examples of Time Series data
Lesson 4 - Trend, Seasonality, Noise and Stationarity
Lesson 5 - Time Series Operations
Lesson 6 - Detrending
Lesson 7 - Successive Differences
Lesson 8 - Moving Average and Smoothing
Lesson 9 - Exponentially weighted forecasting model
Lesson 10 - Lagging
Lesson 11 - Correlation and Auto-correlation
Lesson 12 - Holt-Winters Methods
Lesson 13 - Single Exponential smoothing
Lesson 14 - Holt's linear trend method
Lesson 15 - Holt-Winters seasonal method
Lesson 16 - ARIMA and SARIMA
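
Single exponential smoothing (Lesson 13 above) updates each smoothed value as a weighted average of the new observation and the previous smoothed value. The sketch below is a plain Python illustration; the function name, the choice to initialise with the first observation, and the sample series are assumptions and not part of the course material.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed series s, where s[0] = series[0] and
    s[t] = alpha * series[t] + (1 - alpha) * s[t - 1]."""
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


demand = [10, 20, 30]                                  # toy observations
print(simple_exponential_smoothing(demand, alpha=0.5))
```

A larger alpha makes the smoothed series follow recent observations more closely, while a smaller alpha gives more weight to history.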
Module 5
Lesson 1 - Introduction To AI And Deep Learning
Lesson 2 - Artificial Neural Network
Lesson 3 - Deep Neural Network and Tools
Lesson 4 - Deep Neural Net Optimization, Tuning, and Interpretability
Lesson 5 - Convolutional Neural Net(CNN)
Lesson 6 - Recurrent Neural Networks
Lesson 7 - Autoencoders

Module 6: Advanced Deep Learning and Computer Vision


Lesson 1 - Course Introduction
Lesson 2 - Prerequisites for the course
Lesson 3 - RBM and DBNs
Lesson 4 - Variational AutoEncoder
Lesson 5 - Working with Deep Generative Models
Lesson 6 - Applications: Neural Style Transfer and Object Detection
Lesson 7 - Distributed & Parallel Computing for Deep Learning Models
Lesson 8 - Reinforcement Learning
Lesson 9 - Deploying Deep Learning Models and Beyond
Lesson 10 - Introduction to Image data
Lesson 11 - Introduction to Convolutional Neural Networks
Lesson 12 - Famous CNN architectures
Lesson 13 - Transfer Learning
Lesson 14 - Object detection
Lesson 15 - Semantic segmentation
Lesson 16 - Instance Segmentation
Lesson 17 - Other variants of convolution
Lesson 18 - Metric Learning
Lesson 19 - Siamese Networks
Lesson 20 - Triplet Loss
CONTACT US

(USA)
2-Industrial Park Drive, E-Waldorf, MD, 20602, United States
+1-844-889-4054

(INDIA)
B-44, Sector-59, Noida Uttar Pradesh 201301
+91-92-5000-4000

[email protected]

www.careerera.com