Data Science

Uploaded by

kollilokesh24
Copyright
© All Rights Reserved

Full Stack

DATA SCIENCE & AI


PYTHON
Introduction to Data Science
• Introduction to Data Science
• Discussion on Course Curriculum
• Introduction to Programming

Python – Basics
• Introduction to Python: Installation and Running (Jupyter Notebook, .py file from terminal, Google Colab)
• Data types and type conversion
• Variables
• Operators
• Flow Control: If, Elif, Else
• Loops
• Python Identifier
• Built-in Functions (print, type, id, sys, len)

Python - Data Types & Utilities
• List, List of Lists and List Comprehension
• List creation
• Create a list with variable
• List mutable concept
• len() || append() || pop()
• insert() || remove() || sort() || reverse()
• Forward indexing
• Backward indexing
• Forward slicing
• Backward slicing
• Step slicing

Set
• Set creation with variable
• len() || add() || remove() || pop()
• union() || intersection() || difference()

Tuple
• Tuple creation
• Create tuple with variable
• Tuple immutable concept
• len() || count() || index()
• Forward indexing
• Backward indexing

Dictionary and Dictionary comprehension
• Create a dictionary using variable
• keys:values concept
• len() || keys() || values() || items()
• get() || pop() || update()
• Comparison of data structures
• Introduction to range()
• Pass range() in the list
• range() arguments
• For loop introduction using range()

Functions
• Inbuilt vs User Defined
• User Defined Function
• Function Argument
• Types of Function Arguments
• Actual Argument
• Global variable vs Local variable
• Anonymous Function | LAMBDA

Packages

Map Reduce

OOPs

Class & Object:
• What is meant by an inbuilt class
• How to create a user class
• Create a class & object
• __init__ method
• Python constructor
• Constructor, self & comparing objects
• Instance variable & class variable

Methods:
• What is an instance method
• What is a class method
• What is a static method
• Accessor & Mutator

Python DECORATOR:
• How to use a decorator
• Inner class, outer class
• Inheritance

Polymorphism:
• Duck typing
• Operator overloading
• Method overloading
• Method overriding
• Magic method
• Abstract class & Abstract method
• Iterator
• Generators in Python

Python - Production Level
• Error / Exception Handling
• File Handling
• Docstrings
• Modularization

Pickling & Unpickling

Pandas
• Introduction, Fundamentals, Importing Pandas, Aliasing, DataFrame
• Series – Intro, Creating Series Object, Empty Series Object, Create series from List/Array/Column from DataFrame, Index in Series, Accessing values in Series
• NaN Value
• Series – Attributes (values, index, dtypes, size)
• Series – Methods – head(), tail(), sum(), count(), nunique() etc.
• Data Frame
• Loading Different Files
• Data Frame Attributes
• Data Frame Methods
• Rename Column & Index
• Inplace Parameter
• Handling missing or NaN values
• iloc and loc
• Data Frame – Filtering
• Data Frame – Sorting
• Data Frame – GroupBy
• Merging or Joining
• Data Frame – Concat
• DataFrame - Adding, dropping columns & rows
• DataFrame - Date and time
• DataFrame - Concatenate Multiple csv files

Numpy
• Introduction, Installation, pip command, import numpy package, ModuleNotFoundError, Famous Alias name to Numpy
• Fundamentals – Create Numpy Array, Array Manipulation, Mathematical Operations, Indexing & Slicing
• Numpy Attributes
• Important Methods - min(), max(), sum(), reshape(), count_nonzero(), sort(), flatten() etc.
• Adding value to array of values
• Diagonal of a Matrix
• Trace of a Matrix
• Parsing, Adding and Subtracting Matrices
• Statistical Functions: numpy.mean(), numpy.median(), numpy.std(), numpy.sum(), numpy.min()
• Filter in Numpy

Matplotlib
• Introduction
• Pyplot
• Figure Class
• Axes Class
• Setting Limits and Tick Labels
• Multiple Plots
• Legend
• Different Types of Plots:
• Line Graph
• Bar Chart
• Histograms
• Scatter Plot
• Pie Chart
• 3D Plots
• Working with Images
• Customizing Plots

Seaborn
• catplot() function
• stripplot() function
• boxplot() function
• violinplot() function
• pointplot() function
• barplot() function
• Visualizing statistical relationship with Seaborn relplot() function
• scatterplot() function
• regplot() function
• lmplot() function
• Seaborn FacetGrid() function
• Multi-plot grids
• Statistical Plots
• Color Palettes
• Faceting
• Regression Plots
• Distribution Plots
• Categorical Plots
• Pair Plots

Scipy
• Signal and Image Processing (scipy.signal, scipy.ndimage)
• Linear Algebra (scipy.linalg)
• Integration (scipy.integrate)
• Statistics (scipy.stats)
• Spatial Distance and Clustering (scipy.spatial)

Statsmodels
• Linear Regression (statsmodels.regression)
• Time Series Analysis (statsmodels.tsa)
• Statistical Tests (statsmodels.stats)
• Anova (statsmodels.stats.anova)
• Datasets (statsmodels.datasets)
Mathematics
Set Theory
• Data Representation & Database Operations

Combinatorics
• Feature Selection
• Permutations and Combinations for Sampling
• Hyperparameter Tuning
• Experiment Design
• Data Partitioning and Cross-Validation

Probability
• Basics
• Theoretical Probability
• Empirical Probability
• Addition Rule
• Multiplication Rule
• Conditional Probability
• Total Probability
• Probability Decision Tree
• Bayes Theorem
• Sensitivity & Specificity in Probability
• Bernoulli Naïve Bayes, Gaussian Naïve Bayes, Multinomial Naïve Bayes

Distributions
• Binomial, Poisson, Normal Distribution, Standard Normal Distribution
• Gaussian Distribution, Uniform Distribution
• Z Score
• Skewness
• Kurtosis
• Geometric Distribution
• Hyper Geometric Distribution
• Markov Chain

Linear Algebra
• Linear Equations
• Matrices (Matrix Algebra: Vector Matrix, Vector matrix multiplication, Matrix matrix multiplication)
• Determinant
• Eigen Value and Eigen Vector

Euclidean Distance & Manhattan Distance

Calculus
• Differentiation
• Partial Differentiation
• Max & Min

Indices & Logarithms


STATISTICS
Introduction
• Population & Sample
• Reference & Sampling technique

Types of Data
• Qualitative or Categorical – Nominal & Ordinal
• Quantitative or Numerical – Discrete & Continuous
• Cross Sectional Data & Time Series Data

Measures of Central Tendency
• Mean, Mode & Median – Their frequency distribution

Descriptive statistics: Measures of symmetry
• Skewness (positive skew, negative skew, zero skew)
• Kurtosis (Leptokurtic, Mesokurtic, Platykurtic)

Measurement of Spread
• Range, Variance, Standard Deviation

Measures of variability
• Interquartile Range (IQR)
• Mean Absolute Deviation (MAD)
• Coefficient of variation
• Covariance

Levels of Data Measurement
• Nominal, Ordinal, Interval, Ratio

Variable
• Types of Variables
• Categorical Variables - Nominal variable & ordinal variables
• Numerical Variables: discrete & continuous
• Dependent Variable
• Independent Variable
• Control, Moderating & Mediating

Frequency Distribution Table
• Relative Frequency, Cumulative Frequency
• Histogram
• Scatter Plots
• Range
• Calculate Class Width
• Create Intervals
• Count Frequencies
• Construct the Table

Correlation, Regression & Collinearity
• Pearson & Spearman Correlation Methods
• Regression Error Metrics

Others
• Percentiles, Quartiles, Interquartile Range
• Different types of Plots for Continuous, Categorical variable
• Box Plot, Outliers
• Confidence Intervals
• Central Limit Theorem
• Degree of freedom

Bias and Variance in ML

Entropy in ML

Information Gain

Surprise in ML

Loss Function & Cost Function
• Mean Squared Error, Mean Absolute Error – Loss Function
• Huber Loss Function
• Cross Entropy Loss Function

Inferential Statistics
• Hypothesis Testing: One tail, two tail and p-value
• Formulation of Null & Alternate Hypothesis
• Type-I error & Type-II error
• Statistical Tests:
  o Sample Test
  o ANOVA Test
  o Chi-square Test
  o Z-Test & T-Test

SQL
Introduction
• DBMS vs RDBMS
• Intro to SQL
• SQL vs NoSQL
• MySQL Installation

Keys
• Primary Key
• Foreign Key

Constraints
• Unique
• Not NULL
• Check
• Default
• Auto Increment

CRUD Operations
• Create
• Retrieve
• Update
• Delete

SQL Languages
• Data Definition Language (DDL)
• Data Query Language (DQL)
• Data Manipulation Language (DML)
• Data Control Language (DCL)
• Transaction Control Language (TCL)

SQL Commands
• Create
• Insert
• Alter, Modify, Rename, Update
• Delete, Truncate, Drop
• Grant, Revoke
• Commit, Rollback
• Select

SQL Clause
• Where
• Distinct
• OrderBy
• GroupBy
• Having
• Limit

Operators
• Comparison Operators
• Logical Operators
• Membership Operators
• Identity Operators

Wild Cards

Aggregate Functions

SQL Joins
• Inner Join & Outer Join
• Left Join & Right Join
• Self & Cross Join
• Natural Join
EDA & ML
EDA
• Univariate Analysis
• Bivariate Analysis
• Multivariate Analysis

Data Visualisation
• Various Plots on different datatypes
• Plots for Continuous Variables
• Plots for Discrete Variables
• Plots for Time Series Variables

ML Introduction
• What is Machine Learning?
• Types of Machine Learning Methods
  o Supervised Learning
  o Unsupervised Learning
  o Reinforcement Learning
• Classification problem in general
• Validation Techniques: CV, OOB
• Different types of metrics for Classification
• Curse of dimensionality
• Feature Transformations
• Feature Selection
• Imbalanced Dataset and its effect on Classification
• Bias Variance Tradeoff

Important Element of Machine Learning
Multiclass Classification
• One-vs-All
• Overfitting and Underfitting
• Error Measures
• PCA learning
• Statistical learning approaches
• Introduction to SKLEARN FRAMEWORK

Data Processing
• Creating training and test sets, Data scaling and Normalisation
• Feature Engineering – Adding new features as per requirement, Modifying the data
• Data Cleaning – Treating the missing values, Outliers
• Data Wrangling – Encoding, Feature Transformations, Feature Scaling
• Feature Selection – Filter Methods, Wrapper Methods, Embedded Methods
• Dimension Reduction – Principal Component Analysis (Sparse PCA & Kernel PCA), Singular Value Decomposition
• Non Negative Matrix Factorization

Regression
• Introduction to Regression
• Mathematics involved in Regression
• Regression Algorithms:
  o Simple Linear Regression
  o Multiple Linear Regression
  o Polynomial Regression
  o Lasso Regression
  o Ridge Regression
  o Elastic Net Regression

Evaluation Metrics for Regression:
• Mean Absolute Error (MAE)
• Mean Squared Error (MSE)
• Root Mean Squared Error (RMSE)
• R²
• Adjusted R²

Classification
• Introduction
• K-Nearest Neighbors
• Logistic Regression:
  o Implementation and Optimizations
  o Stochastic gradient descent algorithms
  o Finding the optimal HyperParameters through Grid Search
• Support Vector Machines (Linear SVM):
  o Linear support vector machines
  o Scikit-learn implementation
  o Linear Classification
  o Kernel-based classification
    o Radial Basis Function
    o Polynomial Kernel
    o Sigmoid Kernel
    o Custom Kernels
  o Non-linear examples
  o 2 features form a straight line & 3 features form a plane
  o Hyperplane and Support vectors
  o Controlled support vector machines
  o Support vector Regression
  o Kernel SVM (Non-Linear SVM)
• Naive Bayes:
  o Bayes theorem
  o Naive Bayes Classifiers
  o Naive Bayes in scikit-learn (Bernoulli Naive Bayes, Multinomial Naive Bayes, Gaussian Naive Bayes)
• Decision Trees:
  o Binary Decision Trees
  o Binary decisions
  o CART Algorithm
  o Impurity measures (Gini impurity index, Cross-entropy impurity index, Misclassification impurity index)
  o Feature importance
  o Decision tree classification with scikit-learn
• Random Forest / Bagging:
  o Random Forests and Features importance in Random Forest
• AdaBoost
• Gradient tree boosting
• Voting classifier
• Ensemble: Bagging
• Ensemble: Boosting
• Ada Boost
• Gradient Boost
• XG Boost
• Evaluation Metrics for Classification:
  o Confusion Matrix
  o Accuracy & F1 Score
  o Precision & Recall
  o Sensitivity & Specificity
  o True Positive Rate, False Positive Rate
  o ROC & ROC_AUC

Clustering
Introduction

K-Means Clustering:
• Finding the optimal number of clusters
• Optimizing the inertia
• Cluster instability
• Elbow method

Hierarchical Clustering

Agglomerative clustering

DBSCAN Clustering

Association Rules
• Market Basket Analysis
• Apriori Algorithm

Recommendation Engines
• Collaborative Filtering:
  o User based collaborative filtering
  o Item based collaborative filtering
• Recommendation Engines

Time Series & Forecasting
• What is Time series data
• Different components of time series data
• Stationarity of time series data
• ACF, PACF
• Time Series Models:
  o AR
  o ARMA
  o ARIMA
  o SARIMAX

Model Selection & Evaluation
Over Fitting & Under Fitting
• Bias-Variance Tradeoff
  o Cross Validation:
  o Stratified Cross validation
  o K-Fold Cross validation
• Hyper Parameter Tuning
• Joblib And Pickling

Others
• Dummy Variable, Onehotencoding
• gridsearchcv vs randomizedsearchcv

ML Pipeline

ML Model Deployment in Flask
PowerBI
Introduction
• Power BI for Data scientist
• Types of reports
• Data source types
• Installation

Basic Report Design
• Data sources and Visual types
• Canvas and fields
• Table and Tree map
• Format button and Data Labels
• Legend, Category and Grid
• CSV and PDF Exports

Visual Sync, Grouping
• Slicer visual
• Orientation, selection process
• Slicer: Number, Text, slicer list
• Bin count, Binning

Hierarchies, Filters
• Creating Hierarchies
• Drill Down options
• Expand and show
• Visual filter, Page filter, Report filter
• Drill Thru Reports

Power Query
• Power Query transformation
• Table and Column Transformations
• Text and time transformations
• Power query functions
• Merge and append transformations

DAX Functions
• DAX Architecture, Entity Sets
• DAX Data types, Syntax Rules
• DAX measures and calculations
• Creating measures
• Creating Columns
Deep Learning
Deep learning at Glance
• Introduction to Neural Network
• Biological and Artificial Neuron
• Introduction to perceptron
• Perceptron and its learning rule and drawbacks
• Multilayer Perceptron, loss function
• Neural Network Activation function

Training MLP: Backpropagation

Cost Function

Gradient Descent Backpropagation - Vanishing and Exploding Gradient Problem

Introduction to Py-torch

Regularization

Optimizers

Hyperparameters and tuning of the same

TENSORFLOW FRAMEWORK
• Introduction to TensorFlow
• TensorFlow Basic Syntax
• TensorFlow Graphs
• Variables and Placeholders
• TensorFlow Playground

ANN (Artificial Neural Network)
• ANN Architecture
• Forward & Backward Propagation, Epoch
• Introduction to TensorFlow, Keras
• Vanishing Gradient Descent
• Fine-tuning neural network hyperparameter
• Number of hidden layers, Number of neurons per hidden layer
• Activation function
• INSTALLATION OF YOLO V8, KERAS, THEANO

PY-TORCH Library

RNN (Recurrent Neural Network)
• Introduction to RNN
• Back Propagation through time
• Input and output sequences
• RNN vs ANN
• LSTM (Long Short-Term Memory)
• Different types of RNN: LSTM, GRU
• Bidirectional RNN
• Sequential-to-sequential architecture (Encoder Decoder)
• BERT Transformers
• Text generation and classification using Deep Learning
• Generative-AI (Chat-GPT)

Basics of Image Processing
• Histogram of images
• Basic filters applied on the images

Convolutional Neural Networks (CNN)
• ImageNet Dataset
• Project: Image Classification
• Different types of CNN architectures
• Recurrent Neural Network (RNN)
• Using pre-trained model: Transfer Learning
Natural Language Processing (NLP)
Natural Language Processing (NLP)
• Text Cleaning
• Texts, Tokens
• Basic text classification based on Bag of Words

Document Vectorization
• Bag of Words
• TF-IDF Vectorizer
• n-gram: Unigram, Bigram
• Word vectorizer basics, One Hot Encoding
• Count Vectorizer
• Word cloud and gensim
• Word2Vec and Glove
• Text classification using Word2Vec and Glove
• Parts of Speech Tagging (PoS Tagging or POST)
• Topic Modelling using LDA
• Sentiment Analysis

Twitter Sentiment Analysis Using Textblob
• TextBlob
• Installing textblob library
• Simple TextBlob Sentiment Analysis Example
• Using NLTK’s Twitter Corpus

Spacy Library
• Introduction, What is a Token, Tokenization
• Stop words in spacy library
• Stemming
• Lemmatization
• Lemmatization through NLTK
• Lemmatization using spacy
• Word Frequency Analysis
• Counter
• Part of Speech, Part of Speech Tagging
• PoS by using spacy and nltk
• Dependency Parsing
• Named Entity Recognition (NER)
• NER with NLTK
• NER with spacy
Computer Vision
Human vision vs Computer vision
• CNN Architecture
• CONVOLUTION – MAX POOLING – FLATTEN LAYER – FULLY CONNECTED LAYER
• CNN Architecture
• Striding and padding
• Max pooling
• Data Augmentation
• Introduction to OpenCV & YoloV3 Algorithm

Image Processing with OpenCV
• Image basics with OpenCV
• Opening Image Files with OpenCV
• Drawing on Images, Image files with OpenCV
• Face Detection with OpenCV

Video Processing with OpenCV
• Introduction to Video Basics, Object Detection
• Object Detection with OpenCV

Reinforcement Learning
• Introduction to Reinforcement Learning
• Architecture of Reinforcement Learning
• Reinforcement Learning with Open AI
• Policy Gradient Theory

OPEN AI
• Introduction to Open AI
• Generative AI
• Chat Gpt (3.5)
• LLM (Large Language Model)
• Classification Tasks with Generative AI
• Content Generation and Summarization with Generative AI
• Information Retrieval and Synthesis workflow with Gen AI

Time Series and Forecasting
• Time Series Forecasting using Deep Learning
• Seasonal-Trend decomposition using LOESS (STL) models
• Bayesian time series analysis

MakerSuite Google
• PaLM API
• MUM models

Azure ML
