Data Science Topics
Contents
1. Introduction to data science
2. Statistics
3. Probability
4. Python
5. Data and Data Science Thinking
6. Data Analytics Overview
7. Introduction to Artificial Intelligence
8. Machine Learning, Data Science and Artificial Intelligence
Data Science Course Content
2. Statistics
Introduction to Statistics
o Statistical and Non-Statistical Analysis
o Major categories of statistics – Frequentist and Bayesian
o Difference between Statistics and Probability
o Statistical terms
o Difference between Descriptive Statistics and Inferential Statistics
o Understanding of Population and Samples
Descriptive Statistics
Inferential Statistics
Central Limit Theorem
Types of variables
o Nominal/Categorical
o Ordinal
o Interval/Ratio
o Continuous, Time Series
Central Tendency
o Mean
o Weighted mean
o Trimmed mean/Truncated Mean
o Interquartile mean
o Median
o Mode
Measures of statistical dispersion
o Variance and Bessel's correction
o Standard Deviation
o Standard Error
o Margin of Error
o IQR
o Range
o Mean absolute difference
o Median absolute deviation
o Coefficient of variation
o Skewness
o Kurtosis
o Degrees of freedom
Sampling Techniques
Sampling errors
Sample size estimation
Point estimation & margin of error
Multicollinearity
Covariance and correlation
P-value and critical value approach
t-distribution and t-statistic
Hypothesis testing
o What is Hypothesis Testing
o Different types of Errors (Type I and Type II Errors)
o Z-test
o T-test
o Chi-square test
o ANOVA (one way and two way)
o F-test & F-score
o P-Value & Significance Level
3. Probability
Probability
Venn diagram
Counting (permutations & combinations)
Expectation
Conditional probability
Joint Probability
Marginal Probability
Mutually exclusive events and Rules of Independence
Rules of Probabilities
Bayesian Network
Random Variables and Expected Values
Bayes theorem
Maximum likelihood estimation
Probability Distributions
o Continuous distributions (normal, uniform, t, F, chi-square)
o Discrete distributions (Bernoulli, binomial, Poisson)
o Empirical rule with Z-score
4. Python
Why Python for data analysis
How to install Anaconda
Running a few simple programs using Python
Python objects
o Lists
o Strings
o Sets
o File objects
o Tuples
o Dictionaries
o Arrays and DataFrames in Python
Python libraries
o NumPy
o SciPy
o Matplotlib
o pandas
o scikit-learn
o Seaborn
o os
o Regular expressions
Introduction to Series and DataFrames
Visualization of a dataset using Python
Distribution analysis in Python
Box plots in Python
Comments in Python
Functions in Python
Conversion functions
Math functions
User-defined functions
Parameters and arguments of functions
The range function in Python
Recursive functions with examples
Conditionals in Python
o if statement
o elif
o if elif else
Loops in Python
o for loop
o while loop
What is pandas
Benefits of using pandas
Broadcasting in Python
Array shape manipulation
Data structures in pandas
o Series
o DataFrame
o Panel
Various DataFrame operations
o Selection
o Deletion etc.
Grouping, merging, and reshaping of data
o Groupby
o Aggregate
o Transform
o Filtering
o Merging and joining (concat and append)
o Drop
Apply functions in pandas
Accessing objects in Python by index
Creating matrices using NumPy
Continuous variable
Categorical variable
o Bivariate Analysis
Continuous - Continuous
Categorical and Categorical
Categorical and Continuous
o Feature Engineering
o Variable transformation
o Variable/Feature creation
o Project
Supervised Regression Algorithms
o Simple Linear Regression
o Multiple Linear Regression
o Ordinary Least Squares (OLS)
o Decision tree Regression
o Random Forest Regression
o GLM (Poisson regression, spline)
o Support Vector Machines Regression
o Error and Accuracy
o Gradient Descent
o Regularization Techniques
o Maximum Likelihood Estimation (MLE)
o Probabilistic diagnosis of outliers
o L2 and L1 Norms
o Ridge Regression
o Lasso Regression and ElasticNet
o Project
Supervised Classification Algorithms
o Logistic regression classification
o Multiclass Classification using Logistic Regression
o Decision tree Classification
o Random Forest classification
o Support Vector Machines classification
o What is Naïve Bayes and its limitations
o Naïve Bayes Classification
o AdaBoost (Adaptive Boosting) algorithm
o GBM (Gradient Boosting Machine)
o Probability in Classification
o Creating the log loss formula with entropy
o Softmax Function
o MLE in classification
o Understanding the Neural Networks
o SVM
o Gradient Boosting
o XGBoost (Extreme Gradient Boosting)
o Project
Unsupervised Algorithms
o K-means Clustering
o Hierarchical clustering
o Association Rule Mining
o KNN Classifier
o PCA
o Project
Model Evaluation Metrics
o Confusion matrix
o Accuracy
o Recall & Precision
o Specificity & Sensitivity
o Receiver Operating Characteristic (ROC) curve
o Area Under Curve (AUC)
o F1-Score
o AIC & BIC Scores
o R squared & Adjusted R squared
o RMSE, MSE
Model selection Techniques
o Cross validation
o Bootstrap
o Model selection using Statistical tests
o Grid search
o Evaluation metrics
Natural Language Processing (NLP)
o What is NLP
o Cleaning Text
o Tokenization
o Term Frequency (TF)
o Term Frequency – Inverse Document Frequency (TF-IDF)
o Document Term Matrix
Additional Support – Interview Questions, Sample Resumes, Resume Building Assistance and Live Projects