
Data Science Course Content

Contents
1. Introduction to data science
2. Statistics
3. Probability
4. Python
5. Data and Data Science Thinking
6. Data Analytics Overview
7. Introduction to Artificial Intelligence
8. Machine Learning, Data Science and Artificial Intelligence

1. Introduction to data science

1. What is Data Science
2. How it differs from Big Data and Data Analytics
3. Data-driven decision making
4. Purpose and business problems
5. How data scientists work
6. Skills of a data scientist
7. Different sectors using data science
8. Real-world applications
9. Future of AI and how the world is changing

2. Statistics
 Introduction to Statistics
o Statistical and Non-Statistical Analysis
o Major categories of statistics – Frequentist and Bayesian
o Difference between Statistics and Probability
o Statistical terms
o Difference between Descriptive Statistics and Inferential Statistics
o Understanding of Population and Samples
 Descriptive Statistics
 Inferential Statistics
 Central Limit Theorem
 Types of variables
o Nominal/Categorical
o Ordinal
o Interval/Ratio
o Continuous, Time Series
 Central Tendency
o Mean
o Weighted mean
o Trimmed mean/Truncated Mean
o Interquartile mean
o Median
o Mode
 Measures of statistical dispersion
o Variance and Bessel's correction
o Standard Deviation
o Standard Error
o Margin of Error
o IQR
o Range
o Mean absolute difference
o Median absolute deviation
o Coefficient of variation
o Skewness
o Kurtosis
o Degrees of freedom
o Law of Large Numbers
o Confidence Level & Interval
o P value and its interpretation
o Correlation, autocorrelation, and the correlation matrix
o Correlation ratio

 Sampling Techniques
 Sampling errors
 Sample size estimation
 Point estimation & margin of error
 Multicollinearity
 Covariance and correlation
 P-value and critical value approach
 T-distribution and T-statistics
 Hypothesis testing
o What is Hypothesis Testing
o Different types of Errors (Type I and Type II Errors)
o Z-test
o T-test
o Chi-square test
o ANOVA (one way and two way)
o F-test & F-statistic
o P-Value & Significance Level
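
Illustrative example for the dispersion topics above (variance with Bessel's correction, standard deviation, standard error). This is only a minimal sketch using Python's standard `statistics` module; the sample values are hypothetical and chosen so the arithmetic is easy to follow.

```python
import math
import statistics

# Hypothetical sample, chosen only for illustration.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(sample)
mean = statistics.mean(sample)

# Population variance divides the summed squared deviations by n.
population_variance = statistics.pvariance(sample)

# Sample variance applies Bessel's correction and divides by n - 1.
sample_variance = statistics.variance(sample)

# Standard error of the mean: sample standard deviation over sqrt(n).
standard_error = statistics.stdev(sample) / math.sqrt(n)

print(mean, population_variance, sample_variance, standard_error)
```

The corrected estimate (n - 1 in the denominator) is always slightly larger than the uncorrected one, and the gap shrinks as the sample grows.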

3. Probability
 Probability
 Venn diagram
 Counting (permutations & combinations)
 Expectation
 Conditional probability
 Joint Probability
 Marginal Probability
 Mutually exclusive events and Rules of Independence
 Rules of Probabilities
 Bayesian Network
 Random Variables and Expected Values
 Bayes theorem
 Maximum likelihood estimation
 Probability Distributions
o Continuous distributions (normal, uniform, t, F, chi-square)
o Discrete distributions (Bernoulli, binomial, Poisson)
o Empirical rule with Z-score
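
Illustrative example for conditional probability and Bayes' theorem from the list above. This is a sketch only; the function name `posterior` and the numbers (1% prior, 99% sensitivity, 5% false positive rate) are assumptions made up for the example, not figures from the course.

```python
from fractions import Fraction


def posterior(prior, sensitivity, false_positive_rate):
    """Return P(condition | positive test) using Bayes' theorem."""
    # P(positive and condition)
    true_positive = prior * sensitivity
    # P(positive and no condition)
    false_positive = (1 - prior) * false_positive_rate
    # Divide by the marginal probability of a positive test.
    return true_positive / (true_positive + false_positive)


# Hypothetical inputs, kept as exact fractions to avoid rounding noise.
p = posterior(Fraction(1, 100), Fraction(99, 100), Fraction(5, 100))
print(p)  # 1/6
```

Even with a fairly accurate test, the posterior is only 1 in 6 because the condition is rare, which is the usual motivation for teaching the prior alongside the likelihood.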

4. Python
 Why Python for data analysis
 How to install Anaconda
 Running a few simple programs using Python
 Python objects
o Lists
o Strings
o Sets
o File objects
o Tuples
o Dictionaries
o Arrays, DataFrames in Python
 Python libraries
o NumPy
o SciPy
o Matplotlib
o pandas
o scikit-learn
o Seaborn
o os
o Regular expressions (re)
 Introduction to Series and DataFrames
 Visualization on a dataset using Python
 Distribution analysis in Python
 Box plot in Python
 Comments in Python
 Functions in Python
 Conversion functions
 Math functions
 User-defined functions
 Parameters and arguments of functions
 The range function in Python
 Recursive functions and examples
 Conditionals in Python
o if statement
o elif
o if elif else
 Loops in Python
o for loop
o while loop
 What is pandas
 Benefits of using pandas
 Broadcasting in Python
 Array shape manipulations
 Data structures in pandas
o Series
o DataFrame
o Panel (removed in recent pandas versions)
 Various DataFrame operations
o Selection
o Deletion etc.
 Grouping, Merging, and Reshaping of Data
o Groupby
o Aggregate
o Transform
o Filtering
o Merging and joining (concat and append)
o Drop
 Apply functions in pandas
 Accessing objects in Python by index
 Creating matrices using NumPy
 Statistical operators using NumPy

5. Data and Data Science Thinking


 Basics of data categorization and different formats of data
o Structured Data
o Unstructured Data
o Time Series
 Why and how to raise the right question
 Difference between deductive learning and inductive learning
 Primary and Secondary data collection process
 Things you should avoid as a data scientist
 Correlation is not causation, and why that matters
 Limitations as a data scientist
 Transformation of intuition-based decision making to data-driven decision making
 Storytelling

6. Data Analytics Overview


 Data Analytics Process
 Exploratory Data Analysis (EDA)
 Types of Analytics
 How to start a data analytics project
 Intro to Web Scraping and Beautiful Soup

7. Introduction to Artificial Intelligence


 ML fundamentals
 ML use cases and practical use in daily life
 Supervised and unsupervised learning
 Classification and regression problem
 Intro to scikit-learn
 Hyperparameters and Model Validation
 Feature Engineering

8. Machine Learning, Data Science and Artificial Intelligence


 Supervised Learning
 Unsupervised Learning
 Difference between Classification and Regression
 Data pre-processing
o What is a data set
o What is a training set
o What is a test set and why it is needed
o Missing values treatment
o Expectation-Maximization technique for missing values
o using Gradient
o Feature scaling
o Feature transformation
o Binning
o One-hot encoding
o Feature engineering
o Outliers treatment
o Bias-variance trade-off
o Overfitting and underfitting
 Exploratory Data Analysis (EDA)
o Univariate analysis
 Continuous variable
 Categorical variable
o Bivariate Analysis
 Continuous - Continuous
 Categorical and Categorical
 Categorical and Continuous
o Feature Engineering
o Variable transformation
o Variable /Feature Creation
o Project
 Supervised Regression Algorithms
o Simple Linear Regression
o Multiple Linear Regression
o Ordinary Least Squares (OLS)
o Decision tree Regression
o Random Forest Regression
o GLM (Poisson regression, spline)
o Support Vector Machines Regression
o Error and Accuracy
o Gradient Descent
o Regularization Techniques
o Maximum Likelihood Estimation (MLE)
o Probabilistic diagnosis of outliers
o L2 and L1 Norms
o Ridge Regression
o Lasso Regression and ElasticNet
o Project
 Supervised Classification Algorithms
o Logistic regression classification
o Multiclass Classification using Logistic Regression
o Decision tree Classification
o Random Forest classification
o Support Vector Machines classification
o What is Naïve Bayes and its limitations
o Naïve Bayes Classification
o AdaBoost (Adaptive Boosting) algorithm
o GBM (Gradient Boosting Machine)
o Probability in Classification
o Creating the log loss formula with entropy
o Softmax Function
o MLE in classification
o Understanding the Neural Networks
o SVM
o Gradient Boosting
o XGBoost (Extreme Gradient Boosting)
o Project
 Unsupervised Algorithms
o K-means Clustering
o Hierarchical clustering
o Association Rule Mining
o KNN Classifier
o PCA
o Project
 Model Evaluation Metrics
o Confusion matrix
o Accuracy
o Recall & Precision
o Specificity & Sensitivity
o Receiver Operating Characteristic (ROC) curve
o Area Under Curve (AUC)
o F1-Score
o AIC & BIC Scores
o R squared & Adjusted R squared
o RMSE, MSE
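
Illustrative example for the classification metrics listed above (confusion matrix, accuracy, precision, recall, specificity, F1-score). This is a minimal pure-Python sketch; the helper name `confusion_counts` and the label lists are hypothetical, and in practice a library such as scikit-learn would normally be used instead.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels where 1 is the positive class."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        elif actual == 1 and predicted == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Hypothetical ground truth and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)       # of predicted positives, how many are right
recall = tp / (tp + fn)          # sensitivity: of actual positives, how many found
specificity = tn / (tn + fp)     # of actual negatives, how many found
f1 = 2 * precision * recall / (precision + recall)

print(tp, fp, fn, tn, accuracy, precision, recall, specificity, f1)
```

The sketch assumes both classes appear in the data; a production version would need to guard against division by zero when a denominator is empty.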
 Model selection Techniques
o Cross validation
o Bootstrap
o Model selection using Statistical tests
o Grid search
o Evaluation metrics
 Natural Language Processing (NLP)
o What is NLP
o Cleaning Text
o Tokenization
o Term Frequency (TF)
o Term Frequency – Inverse Document Frequency (TF-IDF)
o Document Term Matrix
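
Illustrative example for tokenization, TF, and TF-IDF from the list above. This is a sketch under stated assumptions: it uses whitespace tokenization, TF as the term count divided by document length, and IDF as log(N / document frequency). Other TF-IDF variants exist (library implementations often add smoothing), and the three documents are made up.

```python
import math
from collections import Counter

# Hypothetical mini-corpus.
docs = [
    "data science uses statistics",
    "machine learning uses data",
    "statistics and probability",
]

# Tokenization: a plain whitespace split is assumed here.
tokenized = [doc.split() for doc in docs]


def term_frequency(tokens):
    """TF: count of each term divided by the number of tokens in the document."""
    counts = Counter(tokens)
    return {term: count / len(tokens) for term, count in counts.items()}


def inverse_document_frequency(tokenized_docs):
    """IDF: log(N / df), where df is the number of documents containing the term."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter(term for tokens in tokenized_docs for term in set(tokens))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


idf = inverse_document_frequency(tokenized)

# One dictionary per document: a sparse row of the document-term matrix.
tfidf = [
    {term: tf * idf[term] for term, tf in term_frequency(tokens).items()}
    for tokens in tokenized
]

for row in tfidf:
    print(row)
```

Terms that occur in fewer documents ("science") receive a higher weight than terms shared across documents ("data"), which is the point of the IDF factor.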

Additional Support – Interview Questions, Sample Resumes, Resume Building Assistance and Live Projects
