Unit 1 - Week (1 - 4) : Planning and Thinking Skills For Architecting Data Science Solutions

This document outlines the curriculum for a data science training program organized into 6 units over 24 weeks. Unit 1 focuses on data science fundamentals like the types of analysis and modeling approaches. Unit 2 covers foundational analytics skills in Excel, Tableau, and Python/R. Unit 3 is about statistical modeling, probabilities, and exploratory data analysis. Unit 4 addresses data pre-processing techniques. Unit 5 covers data visualization and linear regression. Unit 6 teaches logistic regression, time series analysis, and other discriminative statistical models.

Uploaded by

Avijit Manna
Copyright
© All Rights Reserved

Unit 1 | Week (1 - 4)

Planning and Thinking Skills for Architecting Data Science Solutions
 10 V's of data; Understanding Classification, Segmentation, Regression and Optimization (the
general tasks of a Data Scientist)
 Understanding Statistical (Discriminative and Generative), Non-Parametric (Instance Based and
Iterative) Models Graphically
 The Latest Trends: Sub-Space, Spectral, Kernel and Neural Networks

Unit 2 | Week (5 - 8)

Foundation Courses
 Data Analytics in Excel - foundation to dashboarding
 Visualization using Tableau
 Python / R Programming - coding structures, data handling, control structures, etc.

Unit 3 | Week (9 - 12)


Statistical Modeling & EDA for Predictive Analytics
 Analytics Problem Solving - CRISP-DM Framework for business problem solving
 Probabilities, joint and conditional probabilities, simulations and estimations. Introduction to
Gaussian mixtures and anomaly detection
 Data types, basic probabilities, Probability distributions (Discrete and Continuous): Bernoulli,
Binomial, Multinomial and Poisson distributions
 Describing the relationship between attributes: Covariance; Correlation; Chi-Square
 Special emphasis on Normal distribution; Central Limit Theorem
 Inferential statistics: t, F and chi-square tests
 Inferential statistics: How to learn about the population from a sample and vice versa; Sampling
distributions; Confidence Intervals, Hypothesis Testing.
 Case Study - Uber Supply Gap - summarize and visualize your solutions using Uber supply data.
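The Central Limit Theorem, sampling distribution and confidence interval topics in this unit can be illustrated with a short simulation. This is a minimal sketch using only the Python standard library; the exponential "population", the sample sizes and the function names are assumptions chosen for illustration, and the interval uses the normal approximation (z = 1.96) rather than a t-based interval.

```python
import random
import statistics


def sample_means(population, sample_size, n_samples, seed=0):
    """Draw repeated samples (with replacement) and return each sample's mean."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(n_samples)
    ]


def mean_confidence_interval(sample, z=1.96):
    """Approximate 95% confidence interval for the mean (normal approximation)."""
    mean = statistics.fmean(sample)
    std_error = statistics.stdev(sample) / len(sample) ** 0.5
    return mean - z * std_error, mean + z * std_error


# Hypothetical, strongly skewed population: exponential values with mean 1.0.
rng = random.Random(42)
population = [rng.expovariate(1.0) for _ in range(10_000)]

# Means of many samples of size 50 cluster around the population mean,
# with spread close to sigma / sqrt(50), even though the population is skewed.
means = sample_means(population, sample_size=50, n_samples=2_000)
print("mean of sample means:", statistics.fmean(means))
print("std of sample means: ", statistics.stdev(means))
print("sigma / sqrt(50):    ", statistics.pstdev(population) / 50 ** 0.5)

# Interval estimate of the population mean from a single sample of 200 values.
print("approx. 95% CI:", mean_confidence_interval(population[:200]))
```

A histogram of `means` would look roughly bell-shaped, which is the point of the Central Limit Theorem bullet.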

Unit 4 | Week (13 - 16)

Data Pre-Processing

 Introduction to R/Python, Binning, Standardization, Normalization
 Type Conversion, Merging
 Normal Curves, Central Tendency and Outlier Detection
 Dimensionality Reduction: PCA, SVD approaches
 Handling Missing Values (K-NN, MI, Clustering etc.)
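The binning, standardization and normalization topics above can be sketched in a few lines of plain Python. The function names and the `ages` values are hypothetical, and the sketch assumes the input is not constant (otherwise the divisions would fail). In practice a library such as pandas or scikit-learn would normally be used.

```python
import statistics


def standardize(values):
    """Z-score scaling: subtract the mean, divide by the standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    return [(v - mu) / sigma for v in values]


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


def equal_width_bins(values, n_bins):
    """Assign each value a bin index 0..n_bins-1 using equal-width intervals."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - low) / width), n_bins - 1) for v in values]


ages = [18, 22, 25, 31, 40, 58]  # hypothetical attribute values
print(standardize(ages))
print(min_max_normalize(ages))
print(equal_width_bins(ages, n_bins=4))
```

Standardization is the usual choice before PCA or K-NN, since both are sensitive to the scale of each attribute.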

Unit 5 | Week (17 - 20)

Data Visualization in R / Python

 Data Exploration - Histograms, Bar Chart, Box Plot, Line Graph, Scatter Plot
 Data Storytelling - The Science, ggplot, Bubble Charts with Multiple Dimensions, Gauge Charts,
Treemap, Heat Map and Motion Charts

Linear Regression
 Approach: Model Estimation, MLE & Error Function, Optimization through Gradient Descent for
finding parameters
 Constructing a Linear Regression, Diagnostics
 Interpretation and Applications
 Case Study 1 - Help a digital media company understand why their viewership is falling and
propose recommendations to increase viewership
 Case Study 2 - Create a model to understand the factors that influence car prices in the US.
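The "Approach" topic above (error function and gradient descent for finding parameters) can be sketched for a single feature as follows. The toy data, learning rate and function name are assumptions for illustration. The error function is mean squared error, whose minimizer coincides with the maximum likelihood estimate when the noise is assumed to be Gaussian.

```python
def fit_linear_gd(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Residuals of the current model on every training point.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE with respect to w and b.
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical noise-free data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_linear_gd(xs, ys)
print(f"slope={w:.4f}, intercept={b:.4f}")
```

If the learning rate is too large the updates diverge; too small and convergence is slow, which is a useful thing to try when studying the diagnostics topic.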

Decision Trees
 Rule Based Knowledge: Logic of Rules, Evaluating Rules, Rule Induction and Association
Rules.
 Construction of Decision Trees through Simplified Examples; Choosing the "Best" attribute at
each Non-Leaf node; Entropy; Information Gain, Gini Index, Chi Square, Regression Trees.
 Generalizing Decision Trees; Information Content and Gain Ratio; Dealing with Numerical
Variables; other Measures of Randomness
 Pruning a Decision Tree; Cost as a consideration; Unwrapping Trees as Rules
 Oblique Decision Trees
 Case Study - Predict whether a customer will default on a loan or not
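Choosing the "best" attribute at a node, as listed above, comes down to comparing impurity before and after a split. The sketch below computes entropy, the Gini index and information gain for categorical attributes. The loan-style rows, the attribute names and the function names are hypothetical and are not taken from the case study data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index: probability that two random draws have different labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy of the parent minus the weighted entropy of the child groups."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels):
    """Attribute with the highest information gain."""
    return max(rows[0], key=lambda a: information_gain(rows, labels, a))


# Hypothetical training rows: one dict of categorical attributes per customer.
rows = [
    {"employed": "yes", "owns_home": "yes"},
    {"employed": "yes", "owns_home": "no"},
    {"employed": "no", "owns_home": "yes"},
    {"employed": "no", "owns_home": "no"},
]
labels = ["repay", "repay", "default", "default"]

for name in rows[0]:
    print(name, information_gain(rows, labels, name))
print("split on:", best_attribute(rows, labels))
```

In this toy data `employed` separates the classes perfectly, so it has the full parent entropy as its gain, while `owns_home` has none.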

Instance-Based Learning
 K-NN method, Wilson editing and triangulation
 K-NN in collaborative filtering, digit recognition
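The K-NN method above can be sketched as a majority vote among the k closest training points. This is a minimal version assuming numeric features and Euclidean distance. The points, labels and function name are hypothetical, and ties are broken by whichever label was counted first.

```python
from collections import Counter
from math import dist  # Euclidean distance, available from Python 3.8


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    neighbours = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical two-cluster data set.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.1, 1.0)))
print(knn_predict(points, labels, (5.0, 5.1)))
```

There is no training step: the stored instances are the model, which is why editing techniques that prune the stored set matter for this family of methods.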

Ensembles
 Methods of Ensembling (Stacking, Mixture of Experts)
 Bagging and Random forest (Logic, Practical Applications)
 AdaBoost
 Gradient Boosting Machines

Unit 6 | Week (21 - 24)

Discriminative Statistical Models: Logistic Regression
 Why Linear Regression Fails, and the Logit Function
 Approach: Model Estimation, MLE & Error Function, Optimization through Gradient Descent for
finding parameters
 Constructing Logistic Regression, Diagnostics
 Interpretation and Applications
 Case Study 1 - Predict employee attrition in a large organization.
 Case Study 2 - Predict whether the customers will buy a life insurance policy using a large
insurer's past customer data.
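The logit function and the gradient descent approach listed above can be sketched for one feature as follows. A straight line fitted to 0/1 outcomes can predict values outside [0, 1]; passing the linear score through the sigmoid (the inverse of the logit) keeps predictions in that range. The data, learning rate and function names are assumptions for illustration, and the loss is the average negative log-likelihood.

```python
from math import exp, log


def sigmoid(z):
    """Inverse of the logit: maps any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return log(p / (1.0 - p))


def fit_logistic_gd(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y=1 | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        preds = [sigmoid(w * x + b) for x in xs]
        # Gradient of the average negative log-likelihood.
        grad_w = sum((p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        grad_b = sum(p - y for p, y in zip(preds, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical, deliberately non-separable data: one feature, binary outcome.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic_gd(xs, ys)
print(f"w={w:.3f}, b={b:.3f}")
print("P(y=1 | x=0.5):", sigmoid(w * 0.5 + b))
print("P(y=1 | x=4.0):", sigmoid(w * 4.0 + b))
```

The fitted `w` is a change in log-odds per unit of `x`, which is the basis for the interpretation topic.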

Time Series
 Regression on Time
 Modeling Seasonality as Deviation
 Statistician's Approach: Components of a Time Series and Estimation Methods
 Smoothing: Moving Average, Weighted and Exponential Moving Averages
 Holt-Winters Method
 Box-Jenkins and ARIMA
 Case Study - Forecast gold prices using the past 30 years of data.
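The smoothing topics above can be sketched as two short functions: a simple moving average and simple exponential smoothing. The `prices` values, the window and the smoothing factor are hypothetical and are not taken from the gold price case study. Holt-Winters extends the exponential version with trend and seasonal terms, which are not shown here.

```python
def moving_average(series, window):
    """Simple moving average; the result has len(series) - window + 1 values."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    smoothed = [series[0]]  # initialise with the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


prices = [10.0, 12.0, 14.0, 13.0, 15.0]  # hypothetical series

print(moving_average(prices, window=3))           # [12.0, 13.0, 14.0]
print(exponential_smoothing(prices, alpha=0.5))   # [10.0, 11.0, 12.5, 12.75, 13.875]
```

A larger `alpha` tracks recent observations more closely, while a smaller one gives a smoother curve.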
