Data Science Content
2. Business statistics
a. Data types
Continuous variables
Ordinal Variables
Categorical variables
Time Series
Miscellaneous
b. Descriptive statistics
c. Sampling
Need for Sampling?
Different types of Sampling
Simple random sampling
Systematic sampling
Stratified Sampling
d. Data distributions
Normal Distribution – Characteristics of a normal distribution
Binomial Distribution
e. Inferential statistics
f. Hypothesis testing
Type I error
Type II error
Null and alternative hypotheses
Rejection or acceptance criterion
3. Introduction to R
a. A Primer to R programming
b. What is R? Similarities to OOP and SQL
c. Types of objects in R – lists, matrices, arrays, data.frames etc.
d. Creating new variables or updating existing variables
e. If statements and loops – for, while, etc.
f. String manipulations
g. Subsetting data from matrices and data.frames
h. Melting and casting data between long and wide formats
Flat No: 212, 2nd Floor, Annapurna Block, Aditya Enclave, Ameerpet, Hyd
[email protected] www.kellytechno.com Ph: 998 570 6789. Online: 001 973 780 6789.
i. Merging datasets
5. Introduction to Python:
a. Understanding the reasons for Python’s popularity
b. Basics of Python: Operations, loops, functions, dictionaries
c. Advanced operations with text: Finding, Sequencing and basic analytics
d. Groundwork for deep learning
6. Predictive analytics
a. Different types of predictive analytics – prediction, forecasting, optimization, segmentation, etc.
b. Supervised learning
Prediction (Linear)
1. Simple Linear Regression
2. Assumptions
3. Model development and interpretation
4. Least squares estimation
5. Model validation – tests to validate assumptions
6. Multiple linear regression
7. Disadvantages of linear models
Classification
1. Logistic Regression
1. Need for logistic regression
2. Logit link function
3. Maximum likelihood estimation
4. Model development and interpretation
5. Confusion Matrix – error measurement
6. ROC curve
7. Measuring sensitivity and specificity
8. Advantages and disadvantages of logistic regression models
2. Decision trees
1. C5.0
2. Classification and Regression Trees (CART)
a. Process of tree building
b. Entropy and Gini Index
c. Problem of overfitting
d. Pruning a tree back
e. Trees for Prediction (Linear) – example
f. Trees for classification models – example
g. Advantages of tree-based models
c. Advanced methods
1. Support Vector machines
2. Neural networks
3. Introduction to deep learning
4. Introduction to online learning
d. Unsupervised learning
Cluster analysis
1. Hierarchical clustering
2. K-Means clustering
3. Distance measures
4. Applications of cluster analysis – Customer Segmentation