
1. Descriptive Statistics and Probability Distributions: Data Science Course Content

The document outlines the course content for a data science course. It covers 10 modules that include topics such as descriptive statistics, probability distributions, inferential statistics, hypothesis testing, linear regression, logistic regression, time series analysis, machine learning techniques like decision trees, neural networks, and support vector machines. It also covers data preparation, model validation, and visualization using Tableau. Real-world case studies are included for each major topic area.

Uploaded by paramreddy2000
Copyright © All Rights Reserved

DATA SCIENCE COURSE CONTENT:

1. Descriptive Statistics and Probability Distributions:

• Introduction to Statistics
• Different Types of Variables
• Measures of Central Tendency with examples
• Measures of Dispersion
• Probability & Distributions
• Probability Basics
• Binomial Distribution and its properties
• Poisson Distribution and its properties
• Normal Distribution and its properties
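A minimal Python sketch of the quantities listed in this module, using only the standard library. The sample data and parameter values are made up for illustration; `statistics.variance` and `statistics.stdev` use the sample (n - 1) divisor.

```python
import math
import statistics

# Hypothetical sample used only for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central tendency
mean = statistics.mean(data)      # arithmetic mean
median = statistics.median(data)  # average of the two middle values for even n
mode = statistics.mode(data)      # most frequent value

# Measures of dispersion (sample variance and standard deviation use n - 1)
variance = statistics.variance(data)
std_dev = statistics.stdev(data)
data_range = max(data) - min(data)


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam): k events when lam are expected on average."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of the normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


print(mean, median, mode, round(variance, 4), data_range)
print(binomial_pmf(2, 4, 0.5))  # 0.375
```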

2. Inferential Statistics and Testing of Hypotheses

• Sampling methods
• Different methods of estimation
• Testing of Hypotheses & Statistical Tests
• Analysis of Variance (ANOVA)

3. Covariance & Correlation

->> Predictive Modeling Steps and Methodology with a Live Example:
• Data Preparation
• Exploratory Data Analysis
• Model Development
• Model Validation
• Model Implementation
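Covariance and correlation can be computed directly from their definitions. A minimal sketch with hypothetical paired data (the `spend` and `sales` values are invented); Python 3.10+ also provides `statistics.covariance` and `statistics.correlation`, which should agree with these hand-written versions.

```python
import statistics


def covariance(xs, ys):
    """Sample covariance: average cross-product of deviations, divisor n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mx, my = statistics.mean(xs), statistics.mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def pearson_r(xs, ys):
    """Pearson correlation: covariance rescaled by both sample standard deviations."""
    return covariance(xs, ys) / (statistics.stdev(xs) * statistics.stdev(ys))


# Hypothetical paired observations (e.g. advertising spend vs. sales)
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.0, 4.1, 5.9, 8.2, 9.8]

print(round(covariance(spend, sales), 3))  # 4.925
print(round(pearson_r(spend, sales), 4))
```

Covariance depends on the units of both variables, while the correlation is unit-free and always lies between -1 and 1, which is why correlation is the usual summary in exploratory data analysis.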

4. Supervised Techniques:
->> Multiple Linear Regression
• Linear Regression - Introduction - Applications
• Assumptions of Linear Regression
• Building a Linear Regression Model
• Understanding standard metrics (Variable significance, R-square/Adjusted R-square, Global hypothesis test, etc.)
• Validation of Linear Regression Models (Re-running vs. Scoring)
• Standard Business Outputs (Decile Analysis, Error distribution (histogram), Model equation, Drivers, etc.)
• Interpretation of Results - Business Validation - Implementation on new data
• Real-time case study from the Manufacturing and Telecom industries: estimating future revenue using the models
->> Logistic Regression - Introduction - Applications
• Linear Regression vs. Logistic Regression vs. Generalized Linear Models
• Building a Logistic Regression Model
• Understanding standard model metrics (Concordance, Variable significance, Hosmer-Lemeshow Test, Gini, KS, Misclassification, etc.)
• Validation of Logistic Regression Models (Re-running vs. Scoring)
• Standard Business Outputs (Decile Analysis, ROC Curve, Probability Cut-offs, Lift charts, Model equation, Drivers, etc.)
• Interpretation of Results - Business Validation - Implementation on new data
• Real-time case study: predicting customer churn in the Banking and Retail industries
->> Partial Least Squares Regression
• Partial Least Squares Regression - Introduction - Applications
• Difference between Linear Regression and Partial Least Squares Regression
• Building a PLS Model
• Understanding standard metrics (Variable significance, R-square/Adjusted R-square, Global hypothesis test, etc.)
• Interpretation of Results - Business Validation - Implementation on new data
• Real-time example: identifying the key factors driving revenue
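The concordance, Gini and KS metrics named above can be sketched from predicted event probabilities and actual 0/1 outcomes. This is a minimal illustration of their common textbook definitions (share of concordant event/non-event pairs, Gini = 2 * AUC - 1, KS = largest gap between the cumulative event and non-event rates), not the output of any particular package; the scores and labels are hypothetical. The pairwise loops are O(n^2), which is fine for a small example.

```python
def concordance(scores, labels):
    """Share of (event, non-event) pairs that are concordant, discordant and tied."""
    events = [s for s, y in zip(scores, labels) if y == 1]
    non_events = [s for s, y in zip(scores, labels) if y == 0]
    conc = disc = tied = 0
    for e in events:
        for ne in non_events:
            if e > ne:
                conc += 1  # event scored higher than non-event: concordant
            elif e < ne:
                disc += 1
            else:
                tied += 1
    pairs = conc + disc + tied
    return conc / pairs, disc / pairs, tied / pairs


def auc_and_gini(scores, labels):
    """AUC (c-statistic) counts ties as half-concordant; Gini = 2 * AUC - 1."""
    conc, _, tied = concordance(scores, labels)
    auc = conc + 0.5 * tied
    return auc, 2 * auc - 1


def ks_statistic(scores, labels):
    """Max over cut-offs of (cumulative share of events) - (cumulative share of non-events)."""
    n_events = sum(labels)
    n_non_events = len(labels) - n_events
    best = 0.0
    for cutoff in sorted(set(scores), reverse=True):
        tpr = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1) / n_events
        fpr = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0) / n_non_events
        best = max(best, tpr - fpr)
    return best


# Hypothetical model scores (predicted churn probabilities) and actual outcomes
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0, 1, 0]

auc, gini = auc_and_gini(scores, labels)
print(auc, gini, ks_statistic(scores, labels))  # 0.75 0.5 0.5
```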

5. Variable Reduction Techniques

->> Factor Analysis
->> Principal Component Analysis (PCA)
• Assumptions of PCA
• Working Mechanism of PCA
• Types of Rotations
• Standardization
• Positives and Negatives of PCA
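For two variables, PCA reduces to the eigen-decomposition of a 2 x 2 covariance matrix (or, after standardization, correlation matrix), which has a closed form. A minimal standard-library sketch under that two-variable assumption; real analyses with p variables would use a linear-algebra library, and the data below are hypothetical.

```python
import math
import statistics


def _cov(u, v):
    """Sample covariance with divisor n - 1."""
    mu, mv = statistics.mean(u), statistics.mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def pca_2d(xs, ys, standardize=True):
    """Eigenvalues, first principal component and explained-variance ratios for two variables."""
    if standardize:
        # z-scores: PCA then works on the correlation matrix, so a variable
        # measured on a larger scale does not dominate the components
        mx, sx = statistics.mean(xs), statistics.stdev(xs)
        my, sy = statistics.mean(ys), statistics.stdev(ys)
        xs = [(x - mx) / sx for x in xs]
        ys = [(y - my) / sy for y in ys]
    a, b, c = _cov(xs, xs), _cov(xs, ys), _cov(ys, ys)
    # Closed-form eigenvalues of the symmetric matrix [[a, b], [b, c]]
    centre = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b**2)
    lam1, lam2 = centre + radius, centre - radius
    # Eigenvector for the largest eigenvalue: (lam1 - c, b) solves (A - lam1 I) v = 0
    if b != 0:
        vx, vy = lam1 - c, b
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    total = lam1 + lam2
    return (lam1, lam2), (vx / norm, vy / norm), (lam1 / total, lam2 / total)


# Hypothetical, strongly correlated measurements
x = [2.0, 3.1, 4.2, 4.8, 6.1]
y = [1.9, 3.0, 4.1, 5.2, 5.9]
eigenvalues, pc1, explained = pca_2d(x, y)
print([round(v, 3) for v in explained])  # share of variance on PC1 and PC2
```

When the two variables are highly correlated, almost all of the variance falls on the first component, which is the situation in which dropping the second component loses little information.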

6. Supervised Techniques: Classification

->> CHAID
->> CART
->> Difference between CHAID and CART
->> Random Forest
• Decision Tree vs. Random Forest
• Data Preparation
• Missing data imputation
• Outlier detection
• Handling imbalanced data
• Random record selection
• Random Forest R parameters
• Random variable selection
• Selecting the optimal number of variables
• Calculating the Out-of-Bag (OOB) error rate
• Calculating Out-of-Bag predictions
->> A couple of real-time use cases from the Telecom and Retail industries: identifying churn
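Random record selection and random variable selection are the two sources of randomness in a random forest, and the rows a tree never sees in its bootstrap sample are that tree's out-of-bag (OOB) rows. Each row's OOB prediction aggregates votes only from the trees for which it was out of bag, and the OOB error rate is the share of rows those votes misclassify. A minimal sketch of the two sampling steps only (not a full forest); the feature names are hypothetical. In R's `randomForest` package the corresponding parameters are `ntree` (number of trees) and `mtry` (variables tried at each split), whose classification default is floor(sqrt(p)).

```python
import math
import random


def bootstrap_sample(n_rows, rng):
    """Random record selection: draw n_rows row indices with replacement."""
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    out_of_bag = sorted(set(range(n_rows)) - set(in_bag))  # rows this tree never sees
    return in_bag, out_of_bag


def random_feature_subset(features, mtry, rng):
    """Random variable selection: the candidate variables for one split."""
    return rng.sample(features, mtry)


rng = random.Random(42)  # fixed seed so the example is reproducible
n_rows = 10_000
in_bag, oob = bootstrap_sample(n_rows, rng)
# Each row is left out with probability (1 - 1/n)^n, which approaches e^-1 (about 0.368)
print(len(oob) / n_rows, (1 - 1 / n_rows) ** n_rows)

# Hypothetical churn-model features
features = ["tenure", "monthly_charges", "contract_type", "support_calls", "data_usage", "region"]
mtry = max(1, math.floor(math.sqrt(len(features))))  # sqrt(p) rule for classification
print(random_feature_subset(features, mtry, rng))
```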

7. Unsupervised Techniques:
->> Segmentation for Marketing Analysis
• Need for segmentation
• Criteria for segmentation
• Types of distances
• Clustering algorithms
• Hierarchical clustering
• K-means clustering
• Deciding the number of clusters
• Case study
->> Business Rules Criteria
->> Real-time use case: identifying the most valuable revenue-generating customers
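K-means alternates two steps until the cluster assignments stop changing: assign each customer to the nearest centroid, then move each centroid to the mean of its customers. A minimal sketch of that loop (Lloyd's algorithm) with Euclidean distance and caller-supplied starting centroids; the two-feature customer data are hypothetical. The within-cluster sum of squares it reports is what the common "elbow" heuristic plots against k when deciding the number of clusters.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm: repeat assignment and update steps until centroids stop moving."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its members (empty clusters stay put)
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else old
            for old, members in zip(centroids, clusters)
        ]
        if new_centroids == centroids:
            break  # converged: unchanged centroids give unchanged assignments
        centroids = new_centroids
    labels = [min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i])) for p in points]
    return centroids, labels


def within_cluster_ss(points, centroids, labels):
    """Total within-cluster sum of squared distances (plotted against k for the elbow method)."""
    return sum(math.dist(p, centroids[label]) ** 2 for p, label in zip(points, labels))


# Hypothetical customers: (annual spend in $1000s, purchases per month)
customers = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(customers, initial_centroids=[customers[0], customers[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```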

8. Time Series Analysis:

->> Time Series Components (Trend, Seasonality, Cyclicity and Level) and Decomposition
->> Basic Techniques
• Averages
• Smoothing, etc.
->> Advanced Techniques
• AR Models
• ARIMA
• UCM (Unobserved Components Models)
->> Hybrid Models
->> Understanding Forecasting Accuracy - MAPE, MAD, MSE, etc.
->> A couple of use cases: forecasting future sales of products
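The basic smoothing techniques and the accuracy measures listed above are short formulas. A minimal sketch with hypothetical monthly sales figures; MAD is taken here as the mean absolute forecast error, which is how it is usually used in forecasting, and MAPE assumes no actual value is zero.

```python
def moving_average(series, window):
    """Trailing simple moving average; the first value needs `window` observations."""
    return [sum(series[i - window + 1 : i + 1]) / window for i in range(window - 1, len(series))]


def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s_t = alpha * y_t + (1 - alpha) * s_(t-1), with s_0 = y_0."""
    smoothed = [series[0]]
    for y in series[1:]:
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed


def forecast_accuracy(actual, forecast):
    """MAD, MSE and MAPE (in %) of forecast errors e_t = actual_t - forecast_t."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    return {
        "MAD": sum(abs(e) for e in errors) / n,
        "MSE": sum(e * e for e in errors) / n,
        "MAPE": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,  # actual must be nonzero
    }


# Hypothetical monthly sales and forecasts
sales = [100, 110, 120, 130]
forecasts = [98, 112, 115, 135]
print(moving_average(sales, 3))  # [110.0, 120.0]
print(forecast_accuracy(sales, forecasts))
```

MSE squares the errors and so penalizes a few large misses heavily, while MAD and MAPE weight all errors linearly; MAPE is scale-free, which makes it convenient for comparing products with very different sales volumes.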

9. Text Analytics
->> Gathering text data from the web and other sources
• Processing raw web data
• Collecting Twitter data with the Twitter API
->> Naïve Bayes Algorithm
• Assumptions of Naïve Bayes
• Processing text data
• Handling standard and text data
• Building a Naïve Bayes Model
• Understanding standard model metrics
• Validation of the Models (Re-running vs. Scoring)
->> Sentiment Analysis
• Goal Setting
• Text Preprocessing
• Parsing the content
• Text refinement
• Analysis and Scoring
->> Use case from the Healthcare industry: identifying patients' sentiment toward a specified hospital using data extracted from Twitter
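A multinomial Naïve Bayes classifier, one common variant for text, can be sketched in a few lines: estimate class priors and per-class word frequencies, then score a new document by summing log-probabilities under the assumption that words are conditionally independent given the class. Add-one (Laplace) smoothing keeps an unseen word/class combination from zeroing out a class. The tokenizer is deliberately crude and the labelled tweets are invented.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Crude preprocessing: lowercase, keep runs of letters, drop everything else."""
    return re.findall(r"[a-z]+", text.lower())


def train_nb(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, doc):
    """Return the class with the highest log posterior (up to a constant)."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(count / n_docs)  # log prior P(class)
        for token in tokenize(doc):
            if token not in vocab:
                continue  # words never seen in training carry no information here
            # Laplace (add-one) smoothed estimate of P(word | class)
            score += math.log((word_counts[label][token] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical labelled patient tweets
tweets = [
    "Great care and friendly staff",
    "Nurses were kind and helpful",
    "Long wait and rude staff",
    "Terrible service, never again",
]
sentiment = ["pos", "pos", "neg", "neg"]

model = train_nb(tweets, sentiment)
print(predict_nb(model, "friendly and helpful nurses"))  # pos
```

Working in log-probabilities avoids numerical underflow when documents are long, since the product of many small probabilities becomes a sum of logs.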

10. Visualization Using Tableau:

->> Live connectivity from R to Tableau
• Generating reports and charts

Data Science Online Training Course Content


Module:1 – Descriptive & Inferential Statistics
1. Turning Data into Information
• Data Visualization
• Measures of Central Tendency
• Measures of Variability
• Measures of Shape
• Covariance, Correlation
• Using Software-Real Time Problems
2. Probability Distributions
• Probability Distributions: Discrete Random Variables
• Mean, Expected Value
• Binomial Random Variable
• Poisson Random Variable
• Continuous Random Variable
• Normal distribution
• Using Software-Real Time Problems
3. Sampling Distributions
• Central Limit Theorem
• Sampling Distribution of the Sample Proportion, p-hat
• Sampling Distribution of the Sample Mean, x-bar
• Using Software-Real Time Problems
4. Confidence Intervals
• Statistical Inference
• Constructing confidence intervals to estimate a population Mean, Variance, Proportion
• Using Software-Real Time Problems
5. Hypothesis Testing
• Hypothesis Testing
• Type I and Type II Errors
• Decision Making in Hypothesis Testing
• Hypothesis Testing for a Mean, Variance, Proportion
• Power in Hypothesis Testing
• Using Software-Real Time Problems
6. Comparing Two Groups
• Comparing Two Groups
• Comparing Two Independent Means, Proportions
• Paired testing for Means
• Two Variances Test (F-Test)
• Using Software-Real Time Problems
7. Analysis of Variance (ANOVA)
• One-Way and Two-Way ANOVA
• ANOVA Assumptions
• Multiple Comparisons (Tukey, Dunnett)
• Using Software-Real Time Problems
8. Association Between Categorical Variables
• Relationship Between Two Categorical Variables
• Statistical Significance of Observed Relationship / Chi-Square Test
• Calculating the Chi-Square Test Statistic
• Contingency Table
• Using Software-Real Time Problems
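The chi-square test statistic compares each observed cell count with the count expected under independence, expected = (row total x column total) / grand total, and has (r - 1)(c - 1) degrees of freedom. A minimal sketch with a hypothetical 2 x 2 table; in practice the p-value would come from the chi-square distribution (for example `scipy.stats.chi2_contingency`, which by default also applies Yates' continuity correction to 2 x 2 tables, so its statistic will differ from this uncorrected one).

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two categorical variables were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical contingency table: rows = customer segment, columns = preferred product (A, B)
observed = [[30, 10], [20, 40]]
stat, dof = chi_square_statistic(observed)
print(round(stat, 3), dof)  # 16.667 1
# 3.841 is the 5% critical value of the chi-square distribution with 1 degree of freedom
print("reject independence" if stat > 3.841 else "no evidence against independence")
```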

Module:2 – Applied Regression Methods


1. Simple Linear Regression (SLR)
• Prerequisite Mathematics
• The Simple Linear Regression Model
• What Is the Common Error Variance?
• The Coefficient of Determination
• Hypothesis Test for the Population Correlation Coefficient
• Using Software-Real Time Problems
2. SLR Model Evaluation
• Inference for the Population Intercept and Slope
• The Analysis of Variance (ANOVA) table and the F-test
• Equivalent linear relationship tests
• Decomposing the Error
• The Lack of Fit F-test
• Using Software-Real Time Problems
3. SLR Estimation & Prediction
• Confidence Interval for the Mean Response
• Prediction Interval for a New Response
• Using Software-Real Time Problems
4. SLR Model Assumptions
• Model Assumptions Diagnostics
• Using Software-Real Time Problems
5. Multiple Linear Regression (MLR)
• The Multiple Linear Regression Model
• Using Software-Real Time Problems
6. MLR Model Evaluation
• The General Linear Test
• Sequential (or Extra) Sums of Squares
• The Hypothesis Tests for the Slopes
• Partial R-squared
• Lack of Fit Testing in the Multiple Regression Setting
• Using Software-Real Time Problems
7. MLR Estimation, Prediction & Model Assumptions
• Confidence Interval for the Mean Response
• Prediction Interval for a New Response
• Model Assumptions Diagnostics
• Using Software-Real Time Problems
8. Categorical Predictors
• Coding Qualitative Variables
• Additive Effects
• Interaction Effects
• Using Software-Real Time Problems
9. Data Transformations
• Using Software-Real Time Problems
10. Model Building
• Forward Selection/Backward Elimination
• Stepwise Regression
• Adjusted R-Sq, Mallows Cp, PRESS, AIC, BIC/SBC, AICc
• Outliers and Influential Data Points
• Cook's Distance/DFBETAS/DFFITS
• Using Software-Real Time Problems
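The core SLR quantities in this module follow from a few sums: slope b1 = Sxy / Sxx, intercept b0 = mean(y) - b1 * mean(x), R-squared = 1 - SSE/SST, and MSE = SSE/(n - 2) as the estimate of the common error variance. A minimal sketch with a small hypothetical data set; it also computes the t statistic for H0: slope = 0, whose p-value would come from a t distribution with n - 2 degrees of freedom.

```python
import math


def simple_linear_regression(xs, ys):
    """Least-squares fit of y = b0 + b1 * x; assumes neither x nor y is constant."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length sequences with at least three points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    fitted = [b0 + b1 * x for x in xs]
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # error (residual) sum of squares
    sst = sum((y - my) ** 2 for y in ys)                  # total sum of squares
    mse = sse / (n - 2)                                   # estimate of the common error variance
    se_b1 = math.sqrt(mse / sxx)                          # standard error of the slope
    return {
        "b0": b0,
        "b1": b1,
        "r_squared": 1 - sse / sst,
        "mse": mse,
        "t_slope": b1 / se_b1 if se_b1 > 0 else math.inf,
    }


# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
fit = simple_linear_regression(x, y)
print({name: round(value, 4) for name, value in fit.items()})
```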

Module:3 – Applied Time Series Analysis


1. Time Series Basics
• Overview
• ACF and AR(1) Model
2. MA Models, PACF
• Moving Average Models (MA models)
• PACF
• Using Software-Real Time Problems
3. ARIMA models
• Non-seasonal ARIMA
• Diagnostics
• Forecasting
• Using Software-Real Time Problems
4. Seasonal Models
• Seasonal ARIMA
• Identifying Seasonal Models
• Using Software-Real Time Problems
5. Smoothing and Decomposition Methods
• Decomposition Models
• Smoothing Time Series
• Using Software-Real Time Problems
6. Periodogram
• Periodogram
• Using Software-Real Time Problems
7. Regression with ARIMA errors; CCF; 2 Time Series
• Linear Regression Models with Autoregressive Errors
• CCF and Lagged Regressions
• Using Software-Real Time Problems
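The sample ACF at lag k is the lag-k autocovariance divided by the lag-0 autocovariance, and for a stationary AR(1) process x_t = phi * x_(t-1) + e_t the theoretical ACF is phi^k. A minimal sketch that simulates a hypothetical AR(1) series with phi = 0.7 and compares the two; the seed, series length and phi are arbitrary illustration choices.

```python
import random


def acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag (lag-k autocovariance / lag-0 autocovariance)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom for k in range(max_lag + 1)]


def simulate_ar1(phi, n, seed=0):
    """Simulate x_t = phi * x_(t-1) + e_t with standard normal noise (stationary if |phi| < 1)."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


phi = 0.7
series = simulate_ar1(phi, 5000, seed=1)
sample = acf(series, 3)
print([round(r, 2) for r in sample])          # sample ACF, lags 0-3
print([round(phi**k, 2) for k in range(4)])  # theoretical AR(1) ACF: [1.0, 0.7, 0.49, 0.34]
```

The geometric decay of the ACF, together with a PACF that cuts off after lag 1, is the usual signature used to identify an AR(1) model.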

Module:4 – Machine Learning


1. Introduction
• Application Examples
• Supervised Learning
• Unsupervised Learning
2. Regression Shrinkage Methods
• Ridge Regression
• Lasso
• Using Software-Real Time Problems
3. Classification
• Logistic Regression
• Discriminant Analysis
• Nearest-Neighbor Methods
• Using Software-Real Time Problems
4. Tree-based Methods
• The Basics of Decision Trees
• Regression Trees
• Classification Trees
• Ensemble Methods
• Bagging, Boosting, Bootstrap, Random Forests
• Using Software-Real Time Problems
5. Neural Networks
• Introduction
• Single Layer Perceptron
• Multi-layer Perceptron
• Feed-forward and Backpropagation
• Using Software-Real Time Problems
6. Support Vector Machines
• Support Vector Classifier
• Support Vector Machine
• SVMs with More than Two Classes
• Using Software-Real Time Problems
7. Dimension Reduction Methods
• Principal Components Regression (PCR)
• Partial Least Squares (PLS)
• Using Software-Real Time Problems
8. Association Rules
• Market Basket Analysis
• Using Software-Real Time Problems
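Ridge and lasso shrink regression coefficients toward zero by adding a penalty to the residual sum of squares. With a single centered predictor both have closed forms, which makes the difference easy to see: ridge scales the least-squares slope down, while lasso subtracts a fixed amount and can set it exactly to zero. A minimal sketch assuming the objectives RSS + lam * beta^2 (ridge) and RSS + lam * |beta| (lasso) with an unpenalized intercept; libraries such as glmnet scale the penalty differently, so lam values are not directly comparable. The data are hypothetical.

```python
import math


def shrinkage_slopes(xs, ys, lam):
    """OLS, ridge and lasso slopes for one predictor (intercept left unpenalized)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ols = sxy / sxx
    ridge = sxy / (sxx + lam)  # minimizer of RSS + lam * beta^2
    # Minimizer of RSS + lam * |beta|: soft-threshold sxy by lam / 2
    lasso = math.copysign(max(abs(sxy) - lam / 2, 0.0), sxy) / sxx
    return ols, ridge, lasso


# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
for lam in (0, 5, 10, 20):
    print(lam, [round(b, 3) for b in shrinkage_slopes(x, y, lam)])
```

As lam grows the ridge slope approaches zero without reaching it, while the lasso slope hits exactly zero once lam / 2 exceeds |Sxy|; that is why lasso also acts as a variable-selection method.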

Module:5 – SAS/R Programming

1. Base SAS
• Working with SAS program syntax
• Examining SAS data sets
• Accessing SAS libraries
• Producing Detail Reports
• Sorting and grouping report data
• Enhancing reports
• Formatting Data Values
• Creating user-defined formats
• Reading SAS Data Sets
• Customizing a SAS data set
• Handling missing data
• Manipulating Data
• Combining SAS Data Sets
• Creating Summary Reports
• Controlling Input and Output
• Summarizing Data
• Reading Raw Data Files
• Data Transformations
• Debugging Techniques
• Using the PUTLOG statement
• Processing Data Iteratively
• Restructuring a Data Set
• Creating and Maintaining Permanent Formats
2. SAS SQL
• Basic Queries
• Sub-Queries
• Joins (SQL)
• Operators
• Creating Tables and Views
• Managing Tables
3. SAS Macros
• Macro Variables
• Macro Definitions
• Data Step and SQL Interfaces
4. R Programming
• Rcmdr (R Commander) Package
• Rattle Package