Data Science
Combined Class Methods — Use SMOTE together with edited nearest-
neighbours (ENN). Here, ENN is used as the cleaning method after
SMOTE over-sampling to obtain a cleaner space.
Developed by Wilson (1972), the ENN method works by first finding the K nearest neighbours of each observation, then checking whether the majority class among those neighbours is the same as the observation's class.
If the majority class of the observation's K nearest neighbours differs from the observation's class, then the observation and its K nearest neighbours are deleted from the dataset. By default, the number of nearest neighbours used in ENN is K = 3.
Because ENN removes the observation and its K nearest neighbours, rather than just the observation and its single nearest neighbour when their classes differ, ENN can be expected to give more in-depth data cleaning.
Test model performance for each of the above techniques and choose the best performing model.
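As a sketch (assuming the imbalanced-learn package is installed, and using a synthetic dataset for illustration), SMOTE over-sampling followed by ENN cleaning can be applied with imblearn's SMOTEENN:

```python
# Sketch: combined over-sampling (SMOTE) + cleaning (ENN) with imbalanced-learn.
from collections import Counter

from imblearn.combine import SMOTEENN
from sklearn.datasets import make_classification

# Synthetic imbalanced data for illustration (roughly 90% / 10% classes).
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
print("Before:", Counter(y))

# SMOTEENN first over-samples the minority class with SMOTE,
# then removes noisy samples with Edited Nearest Neighbours (ENN).
sampler = SMOTEENN(random_state=42)
X_resampled, y_resampled = sampler.fit_resample(X, y)
print("After:", Counter(y_resampled))
```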
No or little multicollinearity (the predictor variables should not be highly correlated with each other); observations should be independent of each other
The assumption of additivity means that the effect of a change in one feature on the response variable does not depend on the values of the other features.
Homoscedasticity: the variance of the residuals should be constant, i.e. there should not be unequal variance in the data
Overfitting models have low error on the training set but high error on the test set. This behaviour is called 'high variance'.
Consider the example of variance (overfitting) below, where a complicated function creates lots of unnecessary curves and angles that are not related to the underlying data.
- Using handcrafted rules and feature engineering, ML algorithms can work well with small data, but their performance plateaus once data increases. Deep learning algorithms need large amounts of data to perform well, and their performance keeps increasing as data increases.
- ML models take less time to train; DL models take more time to train.
- ML models are easy to interpret compared to DL models; DL models are black boxes and it is very difficult to interpret their results.
[Figure: standard normal distribution curve, with the horizontal axis marked in standard deviations from −3 to +3; about 68% of values fall within ±1 standard deviation of the mean.]
Explain confusion matrix
The confusion matrix is one of the most powerful tools for predictive analysis
in machine learning.
A confusion matrix gives you information about how your machine classifier
has performed, pitting properly classified examples against misclassified
examples.
Confusion matrices are used to visualize important predictive analytics like
recall, specificity, accuracy, and precision.
Confusion matrices are useful because they give direct comparisons of values like True Positives, False Positives, True Negatives and False Negatives. In contrast, other machine learning classification metrics like "Accuracy" give less detailed information, as accuracy is simply the number of correct predictions divided by the total number of predictions.
All estimation parameters of the confusion matrix are based on 4 basic
inputs namely True Positive, False Positive, True Negative and False Negative.
Confusion matrices have two types of errors: Type I (False Positive) and Type
II (False Negative). False Positive contains one negative word (False) so it’s a
Type I error. False Negative has two negative words (False + Negative) so it’s
a Type II error.
From our confusion matrix, we can calculate five different metrics measuring
the validity of our model.
ACCURACY
Accuracy answers the question: Out of all the patients, how many did the test classify correctly?
PRECISION
Precision answers the question: How many patients who tested +ve are actually +ve?
SENSITIVITY (RECALL)
Sensitivity answers the question: Of all the patients that are +ve, how
many did the test correctly predict?
SPECIFICITY
Specificity answers the question: Of all the patients that are -ve, how
many did the test correctly predict?
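A minimal sketch of how these quantities can be computed from a binary confusion matrix with scikit-learn (the label vectors here are made up for illustration):

```python
# Sketch: deriving accuracy, precision, sensitivity and specificity
# from a binary confusion matrix.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]  # hypothetical actual labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]  # hypothetical predicted labels

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)      # of predicted +ve, how many are actually +ve
sensitivity = tp / (tp + fn)    # recall: of actual +ve, how many were found
specificity = tn / (tn + fp)    # of actual -ve, how many were found

print(accuracy, precision, sensitivity, specificity)
```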
What is null hypothesis and alternate
hypothesis?
The null hypothesis states that a population parameter (such as the mean,
the standard deviation, and so on) is equal to a hypothesized value. The null
hypothesis is often an initial claim that is based on previous analyses or
specialized knowledge.
The alternative hypothesis states that a population parameter is smaller than, greater than, or different from the hypothesized value in the null hypothesis. The
alternative hypothesis is what you might believe to be true or hope to prove
true.
So when running a hypothesis test/experiment, the null hypothesis says that
there is no difference or no change between the two tests. The alternate
hypothesis is the opposite of the null hypothesis and states that there is a
difference between the two tests.
In most fields, a p-value under 0.05 is considered acceptable, while in some fields a p-value under 0.01 is required.
So when a result has a p-value below the chosen threshold (e.g. 0.05), we can reject the null hypothesis and accept the alternate hypothesis.
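For example, a hedged sketch of a two-sample t-test with SciPy (the samples are synthetic), where the p-value is compared against the 0.05 threshold:

```python
# Sketch: two-sample t-test and the p-value decision rule.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(loc=50, scale=5, size=100)  # e.g. control group
group_b = rng.normal(loc=52, scale=5, size=100)  # e.g. treatment group

t_stat, p_value = stats.ttest_ind(group_a, group_b)

alpha = 0.05
if p_value < alpha:
    print(f"p = {p_value:.4f} < {alpha}: reject the null hypothesis")
else:
    print(f"p = {p_value:.4f} >= {alpha}: fail to reject the null hypothesis")
```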
In bagging we build independent estimators on different samples of the
original data set and average or vote across all the predictions.
Bagging is a short form of Bootstrap Aggregating. It is an ensemble
learning approach used to improve the stability and accuracy of machine
learning algorithms.
Since multiple model predictions are averaged together to form the final
predictions, Bagging reduces variance and helps to avoid overfitting.
Although it is usually applied to decision tree methods, it can be used with
any type of method.
Bagging is a special case of the model averaging approach: in the case of a regression problem we take the mean of the outputs, and in the case of classification we take the majority vote.
Bagging is more helpful if we have overfitting (high variance) base models.
We can also build independent estimators of the same type on each subset. Because these estimators are independent, they can be trained in parallel, which increases speed.
The most popular bagging estimator is bagged trees, better known as 'Random Forest'.
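A minimal sketch with scikit-learn's BaggingClassifier (which bags decision trees by default; the dataset here is synthetic):

```python
# Sketch: bagging decision trees with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 50 decision trees, each trained on a bootstrap sample of the training data;
# predictions are combined by majority vote.
bagging = BaggingClassifier(n_estimators=50, random_state=0)
bagging.fit(X_train, y_train)
print("Test accuracy:", bagging.score(X_test, y_test))
```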
Bootstrapping
It is a resampling technique, where large numbers of smaller samples of the
same size are repeatedly drawn, with replacement, from a single original
sample.
So this technique enables us to produce as many subsamples as we require from the original training data.
The definition is simple to understand, but the word "replacement" can sometimes be confusing. Here "replacement" signifies that the same observation may appear more than once in a given sample, which is why this technique is also known as sampling with replacement.
As you can see in the image above, we have training data with observations X1 to X10. In the first bootstrap training sample X6, X10 and X2 are repeated, whereas in the second training sample X3, X4, X7 and X9 are repeated.
Bootstrap sampling helps us generate a random sample from the given training data for each model, in order to generalise the final estimation.
So in the case of bagging we create multiple bootstrap samples from the given data to train our base models. Each sample will contain training and test sets that differ from each other, and remember that a training sample may contain duplicate observations.
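A small sketch of drawing bootstrap samples (with replacement) from the X1-X10 observations described above, using NumPy:

```python
# Sketch: sampling with replacement (bootstrapping) from 10 observations.
import numpy as np

rng = np.random.default_rng(42)
observations = [f"X{i}" for i in range(1, 11)]  # X1 ... X10

# Each bootstrap sample has the same size as the original data,
# so some observations repeat and some are left out.
for i in range(2):
    sample = rng.choice(observations, size=len(observations), replace=True)
    print(f"Bootstrap sample {i + 1}:", list(sample))
```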
If we set the hyperparameter 'oob_score = True', then the Out-of-Bag score will be calculated for every decision tree.
Finally, we aggregate the errors from all the decision trees to determine the overall OOB error rate for the classification.
For more details, refer to: https://towardsdatascience.com/what-is-out-of-bag-oob-score-in-random-forest-a7fa23d710
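A hedged sketch of getting the OOB score from scikit-learn's random forest (synthetic data):

```python
# Sketch: Out-of-Bag score with a random forest in scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# oob_score=True evaluates each tree on the observations it did NOT see
# in its bootstrap sample, giving a built-in validation estimate.
clf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
clf.fit(X, y)
print("OOB score:", clf.oob_score_)
```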
Delete low-quality records which have too much missing data
Impute the values by educated guess, by taking the average, or by regression
Use domain knowledge to impute values
Ref. https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51
Outliers can drastically change the results of the data analysis and statistical
modeling.
Use capping methods: any value outside the range of the 5th and 95th percentiles can be considered an outlier
Data points three or more standard deviations away from the mean are considered outliers
Apart from visualization, we can also use the Z-score or Extreme Value Analysis (parametric) to detect outliers.
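A sketch of both detection rules on a synthetic one-dimensional sample (Z-score threshold of 3, and capping at the 5th/95th percentiles):

```python
# Sketch: detecting and capping outliers with Z-scores and percentiles.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(50, 5, 1000), [120, -30]])  # two injected outliers

# Rule 1: |z| >= 3 -> outlier
z_scores = np.abs(stats.zscore(data))
print("Z-score outliers:", data[z_scores >= 3])

# Rule 2: cap values outside the 5th-95th percentile range
lower, upper = np.percentile(data, [5, 95])
capped = np.clip(data, lower, upper)
```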
Deleting observations
We delete outlier values if they are due to data entry or data processing errors, or if the outlier observations are very few in number. We can also trim at both ends to remove outliers.
Imputing
We can use mean, median, mode imputation methods.
Treat separately
If there is a significant number of outliers, we should treat them separately in the statistical model. One approach is to treat the outliers and non-outliers as two different groups, build an individual model for each group, and then combine the outputs.
Reference: https://www.linkedin.com/pulse/techniques-outlier-detection-
treatment-suhas-jk/
Evaluation metrics used for linear regression are MSE, MAE, R-squared, Adjusted R-squared, and RMSE.
MSE and RMSE penalize large errors, MAE does not penalize large errors, and R-squared (the coefficient of determination) represents the strength of the relationship between your model and the dependent variable.
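A short sketch computing these metrics with scikit-learn (the true/predicted values are made up; adjusted R-squared is computed from its standard formula):

```python
# Sketch: common regression evaluation metrics.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 7.5, 10.0, 12.0])   # hypothetical actuals
y_pred = np.array([2.8, 5.4, 7.0, 10.5, 11.5])   # hypothetical predictions

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_true, y_pred)

# Adjusted R-squared penalizes adding predictors: n = samples, p = predictors.
n, p = len(y_true), 1
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(mae, mse, rmse, r2, adj_r2)
```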
Reference: https://www.youtube.com/watch?
v=lRAgottY8XU&list=PLjW9PIyfCennBOprV3CPoqMX8SW-qNlUa
f'(x) = -8x + 4
Setting the derivative to zero: -8x + 4 = 0, so x = 0.5
Since f''(x) = -8 < 0, the function is concave (a downward-opening parabola), so the maximum point is (0.5, 14).
Reference: https://www.youtube.com/watch?
v=lRAgottY8XU&list=PLjW9PIyfCennBOprV3CPoqMX8SW-qNlUa
We can see from the correlation matrix above that there is a high correlation (0.98) between X1 and X2, a high correlation (0.88) between X1 and X3, and similarly a high correlation (0.75) between X2 and X3.
All the variables are correlated with each other. In regression this would result in multicollinearity. We can try methods such as dimension reduction, feature selection, or stepwise regression to choose the correct input variables for predicting Y.
The second part of the question is: should we use all the variables for modelling?
Using multicollinear features in modelling doesn't help. We should remove the redundant multicollinear features and keep only the unique ones, so that explaining the model's predictions also becomes easier.
It will also make the model less complex, and we won't have to store as many features.
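A sketch of how such a correlation matrix can be computed and inspected with pandas (the column names and data here are hypothetical):

```python
# Sketch: computing a correlation matrix and flagging highly correlated pairs.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
df = pd.DataFrame({
    "X1": x1,
    "X2": x1 + rng.normal(scale=0.1, size=200),   # nearly a copy of X1
    "X3": x1 + rng.normal(scale=0.5, size=200),
    "Y":  2 * x1 + rng.normal(size=200),
})

corr = df.corr()
print(corr.round(2))

# Flag feature pairs with |correlation| above a threshold (e.g. 0.75).
threshold = 0.75
features = ["X1", "X2", "X3"]
for i, a in enumerate(features):
    for b in features[i + 1:]:
        if abs(corr.loc[a, b]) > threshold:
            print(f"High correlation between {a} and {b}: {corr.loc[a, b]:.2f}")
```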
Prediction vs Inference
Inference and prediction are two often confused terms, perhaps in part because
they are not mutually exclusive.
Reference: https://www.datasciencecentral.com/profiles/blogs/inference-vs-
prediction-in-one-picture
Backward selection starts with a full model, then step by step we reduce
the regressor variables and find the model with the least RSS, largest R², or
the least MSE. The variables to drop would be the ones with high p-values.
Forward selection starts with a null model, then step by step we increase
the regressor variables until we can no longer improve the error performance
of the model. We usually pick the model with the highest adjusted R².
Reference: https://youtu.be/5JZsSNLXXuE
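A hedged sketch of stepwise-style selection using scikit-learn's SequentialFeatureSelector (available in scikit-learn 0.24+), which wraps forward or backward selection around an estimator:

```python
# Sketch: forward and backward feature selection with scikit-learn.
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = load_diabetes(return_X_y=True)
model = LinearRegression()

# Forward selection: start from no features, add one at a time.
forward = SequentialFeatureSelector(model, n_features_to_select=4, direction="forward")
forward.fit(X, y)
print("Forward-selected features:", forward.get_support(indices=True))

# Backward selection: start from all features, drop one at a time.
backward = SequentialFeatureSelector(model, n_features_to_select=4, direction="backward")
backward.fit(X, y)
print("Backward-selected features:", backward.get_support(indices=True))
```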
Select a few random subsamples from the given dataset
Construct a decision tree for every subsample and predict the result
Perform voting on the predictions from each tree
At the end, select the most-voted result as the final prediction
Reference: https://satishgunjal.com/random_forest/
Random forest will create three subsamples of 9 training examples each
The random forest algorithm will create a different decision tree for each subsample
Notice that each tree uses different criteria to split the data
Now it is a straightforward analysis for the algorithm to predict the shape of a given figure from the available attributes. Let's check the predictions of each tree for the blue triangle (here the shape input is missing, so the trees must predict from the colour alone)
Tree 1 will predict: triangle
Tree 2 will predict: square
Tree 3 will predict: triangle
Since the majority of votes are for triangle, the final prediction is 'triangle'.
Now, let's check the predictions for the circle with no colour defined (the colour attribute is missing here).
Please note this is an oversimplified example, but it gives you an idea of how multiple trees with different split criteria help to handle missing features.
Reference: https://satishgunjal.com/random_forest/
Since an overfitting algorithm captures the noise in the data, reducing the number of features will help. We can manually select only the important features or use a model selection algorithm for the same.
We can also use the 'Regularization' technique. It works well when we have lots of slightly useful features. Sklearn linear models (Ridge and LASSO) use the regularization parameter 'alpha' to control the size of the coefficients by imposing a penalty.
K-fold cross-validation: in this technique we divide the training data into multiple folds and use each fold in turn for testing while training the model on the remaining folds.
Increasing the training data also helps to avoid overfitting.
Reference: https://satishgunjal.com/underfitting_overfitting/
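A small sketch combining the two ideas above: a regularized (Ridge) model whose coefficient sizes are controlled by alpha, evaluated with k-fold cross-validation (synthetic data):

```python
# Sketch: regularization (alpha) plus k-fold cross-validation to curb overfitting.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=20, noise=10, random_state=0)

for alpha in [0.1, 1.0, 10.0]:
    model = Ridge(alpha=alpha)  # larger alpha -> stronger penalty, smaller coefficients
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"alpha={alpha}: mean CV R^2 = {scores.mean():.3f}")
```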
Step 1: Out of 9 balls, place three balls on each side of the scale (you will have three remaining balls)
Reference: https://www.youtube.com/watch?
v=5JZsSNLXXuE&list=PLwWVLyefnzgpWxe2WEPrmHqHzwHlyZw1U&index=2&t=
Bivariate Analysis
Multivariate Analysis
Filter Methods
Filter feature selection methods use statistical techniques to evaluate the
relationship between each input variable and the target variable, and these
scores are used as the basis to choose (filter) those input variables that will
be used in the model.
These methods are faster and less computationally expensive than wrapper
methods.
Information Gain
Information gain calculates the reduction in entropy from the transformation of a
dataset. It can be used for feature selection by evaluating the Information gain of
each variable in the context of the target variable.
Chi-square Test
The Chi-square test is used for categorical features in a dataset. We calculate
Chi-square between each feature and the target and select the desired number
of features with the best Chi-square scores.
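A sketch of Chi-square feature selection with scikit-learn's SelectKBest (the Chi-square test requires non-negative feature values; the iris dataset is used here purely for illustration):

```python
# Sketch: selecting the top-k features by Chi-square score.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)

selector = SelectKBest(score_func=chi2, k=2)  # keep the 2 best-scoring features
X_selected = selector.fit_transform(X, y)

print("Chi-square scores:", selector.scores_)
print("Selected feature indices:", selector.get_support(indices=True))
```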
Correlation Coefficient
Correlation is a measure of the linear relationship of 2 or more variables.
Through correlation, we can predict one variable from the other. The logic behind
using correlation for feature selection is that the good variables are highly
correlated with the target. Furthermore, variables should be correlated with the
target but should be uncorrelated among themselves.
Wrapper Methods
Wrapper feature selection methods create many models with different
subsets of input features and select those features that result in the best
performing model according to a performance metric.
These methods are unconcerned with the variable types, although they can
be computationally expensive.
The wrapper methods usually result in better predictive accuracy than filter
methods.
Reference: https://www.analyticsvidhya.com/blog/2020/10/feature-selection-
techniques-in-machine-learning/
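The listing below appears to be the output of the classic FizzBuzz exercise for the numbers 1 to 50 (multiples of 3 print "Fizz", multiples of 5 print "Buzz", multiples of both print "FizzBuzz"). The program itself is not shown in the text, so here is a minimal Python sketch that reproduces that output:

```python
# Sketch: FizzBuzz for 1..50, matching the output listed below.
for i in range(1, 51):
    if i % 15 == 0:
        print("FizzBuzz")
    elif i % 3 == 0:
        print("Fizz")
    elif i % 5 == 0:
        print("Buzz")
    else:
        print(i)
```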
1
2
Fizz
4
Buzz
Fizz
7
8
Fizz
Buzz
11
Fizz
13
14
FizzBuzz
16
17
Fizz
19
Buzz
Fizz
22
23
Fizz
Buzz
26
Fizz
28
29
FizzBuzz
31
32
Fizz
34
Buzz
Fizz
37
38
Fizz
Buzz
41
Fizz
43
44
FizzBuzz
46
47
Fizz
49
Buzz
Reference:
https://predictivehacks.com/tip-how-to-define-your-distance-function-for-
hierarchical-clustering/
4.242640687119285
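The value above (4.242640687119285) equals √18, i.e. the Euclidean distance between two points that differ by 3 in each of two coordinates. The original code is not shown, so the points below are hypothetical, chosen only to reproduce that value with a custom distance function of the kind the linked article describes:

```python
# Sketch: a custom Euclidean distance function (the points are hypothetical).
import numpy as np

def euclidean_distance(a, b):
    return np.sqrt(np.sum((np.asarray(a) - np.asarray(b)) ** 2))

print(euclidean_distance([1, 2], [4, 5]))  # sqrt(9 + 9) = 4.242640687119285
```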
What is the angle between the hour
and minute hands of a clock when the
time is half past six?
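One way to reason about it: at half past six the minute hand points at 180° (the 6). The hour hand moves 30° per hour, so at 6:30 it sits halfway between 6 and 7, at 6.5 × 30° = 195°. The angle between the hands is therefore 195° − 180° = 15°.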
Reference: https://youtu.be/5JZsSNLXXuE
Evaluate
The evaluation metric of the current model is calculated to determine if a new algorithm is needed.
Compare
The new models are compared against each other to determine which model
performs the best.
Rebuild
The best performing model is re-built on the current set of data.
Reference: https://youtu.be/5JZsSNLXXuE
Collaborative Filtering
It is based on the past interactions recorded between users and items in
order to produce new recommendations.
e.g. a music service recommends tracks that are often played by other users with similar interests.
Reference: https://youtu.be/5JZsSNLXXuE
Reference: https://youtu.be/5JZsSNLXXuE
Visualization
Finding the number of clusters manually by data visualization is one of the most common methods.
Domain knowledge and a proper understanding of the given data also help to make more informed decisions.
Since it is a manual exercise, there is always scope for ambiguity; in such cases we can also use the 'Elbow Method'.
Elbow Method
In the Elbow method we run the K-Means algorithm multiple times in a loop, with an increasing number of clusters (say from 1 to 10), and then plot a clustering score as a function of the number of clusters.
The clustering score is the sum of squared distances of samples to their closest cluster centre.
The elbow is the point on the plot where the decrease in clustering score (distortion) slows down, and the number of clusters at that point gives us the optimum number of clusters to use.
But sometimes we don't get a clear elbow point on the plot, and in such cases it is very hard to finalize the number of clusters.
Reference: https://satishgunjal.com/kmeans/#5
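A sketch of the elbow method with scikit-learn's KMeans (synthetic blob data; the clustering score is the model's inertia_, i.e. the sum of squared distances to the closest cluster centre):

```python
# Sketch: elbow method - plot inertia against the number of clusters.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

ks = range(1, 11)
inertias = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)  # sum of squared distances to closest centre

plt.plot(ks, inertias, marker="o")
plt.xlabel("Number of clusters k")
plt.ylabel("Inertia (clustering score)")
plt.show()  # the 'elbow' in this curve suggests the optimum k
```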
Reference: https://satishgunjal.com/time_series/
Using KNN we can impute a missing value by using the values of its nearest neighbours.
Reference: https://youtu.be/5JZsSNLXXuE
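A sketch of KNN-based imputation with scikit-learn's KNNImputer (the small matrix with missing values is made up):

```python
# Sketch: filling missing values from the nearest neighbours.
import numpy as np
from sklearn.impute import KNNImputer

X = np.array([
    [1.0, 2.0, np.nan],
    [3.0, 4.0, 3.0],
    [np.nan, 6.0, 5.0],
    [8.0, 8.0, 7.0],
])

imputer = KNNImputer(n_neighbors=2)  # average the 2 nearest rows' values
print(imputer.fit_transform(X))
```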
- Association rules
- Student Test
Reference: https://youtu.be/5JZsSNLXXuE
Reference: https://builtin.com/data-science/step-step-explanation-principal-
component-analysis, https://towardsdatascience.com/a-one-stop-shop-for-
principal-component-analysis-5582fb7e0a9c
Scaling
This means that you're transforming your data so that it fits within a specific
scale, like 0-100 or 0-1. By scaling your variables, you can help compare
different variables on equal footing.
Scaling is required for distance-based algorithms like support vector machines (SVM) or k-nearest neighbors (KNN).
For example, you might be looking at the prices of some products in both Yen
and US Dollars. One US Dollar is worth about 100 Yen, but if you don't scale
your prices, methods like SVM or KNN will consider a difference in price of 1
Yen as important as a difference of 1 US Dollar! This clearly doesn't fit with
our intuitions of the world. With currency, you can convert between
currencies. But what about if you're looking at something like height and
weight? It's not entirely clear how many pounds should equal one inch (or
how many kilograms should equal one meter).
Notice that the shape of the data doesn't change, but that instead of ranging
from 0 to 8ish, it now ranges from 0 to 1. Here we have used min-max
scaling
Normalization
Normalization is a more radical transformation; it changes the data distribution towards a 'normal distribution'.
Notice that the shape of our data has changed. Before normalizing it was almost L-shaped, but after normalizing it looks more like the outline of a bell (hence "bell curve"). Here we have used the Box-Cox transformation.
Standardization
Standardization typically means rescaling data to have a mean of 0 and a standard deviation of 1 (unit variance). For most applications standardization is recommended.
Reference: https://www.kaggle.com/code/alexisbcook/scaling-and-
normalization/tutorial
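A sketch of the three transformations with scikit-learn (synthetic skewed data; Box-Cox via PowerTransformer requires strictly positive values):

```python
# Sketch: min-max scaling, Box-Cox normalization and standardization.
import numpy as np
from sklearn.preprocessing import MinMaxScaler, PowerTransformer, StandardScaler

rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=(1000, 1))  # skewed, strictly positive

scaled = MinMaxScaler().fit_transform(data)                           # range [0, 1]
normalized = PowerTransformer(method="box-cox").fit_transform(data)   # ~ bell-shaped
standardized = StandardScaler().fit_transform(data)                   # mean 0, std 1

print(scaled.min(), scaled.max())
print(normalized.mean().round(3), normalized.std().round(3))
print(standardized.mean().round(3), standardized.std().round(3))
```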
The above two cases are not very likely to occur because they can easily be spotted while doing the modelling. Below are a few data leakage examples that are harder to troubleshoot.
In general, if we see that the model we build is too good to be true (i.e. it gives predicted and actual outputs that are nearly the same), then we should get suspicious, and data leakage cannot be ruled out.
At that point, the model might be memorizing the relations between the features and the target instead of learning and generalizing to unseen data.
So, it is advised that before testing, any prior documented results are weighed against the expected results.
Using EDA
While doing the Exploratory Data Analysis (EDA), we may detect features
that are very highly correlated with the target variable. Of course, some
features are more correlated than others but a surprisingly high
correlation needs to be checked and handled carefully.
We should pay close attention to those features. So, with the help of
EDA, we can examine the raw data through statistical and visualization
tools.
High weight features
After model training is complete, if some features have very high weights, then we should pay close attention: those features might be leaky.
Reference: https://www.analyticsvidhya.com/blog/2021/07/data-leakage-and-
its-effect-on-the-performance-of-an-ml-
model/#:~:text=How%20does%20it%20exactly%20happen,“leakage”%20instea
Select the features in such a way that they do not contain information about the target variable that is not naturally available at the time of prediction.
Create a Separate Validation Set
It can be all jumbled up
But there are many cases where the data tends to be around a central value
with no bias left or right, and it gets close to a "Normal Distribution" like this:
[Figure: the normal distribution, also known as the "bell curve".]
Reference: https://www.mathsisfun.com/data/standard-normal-distribution.html
Explain covariance and correlation
Covariance and Correlation are two mathematical concepts which are
commonly used in the field of probability and statistics. Both concepts
describe the relationship between two variables.
In the case of high correlation, two sets of data are strongly linked together.
Reference: https://www.mathsisfun.com
What is TF-IDF?
TF-IDF is a statistical measure that evaluates how relevant a word is to a
document in a collection of documents. This is done by multiplying two
metrics: how many times a word appears in a document, and the inverse
document frequency of the word across a set of documents
It is used in information retrieval and text mining
TF-IDF (term frequency-inverse document frequency) was invented for
document search and information retrieval. It works by increasing
proportionally to the number of times a word appears in a document, but is
offset by the number of documents that contain the word. So, words that are
common in every document, such as this, what, and if, rank low even though
they may appear many times, since they don’t mean much to that document
in particular.
However, if the word Bug appears many times in a document, while not
appearing many times in others, it probably means that it’s very relevant.
For example, if what we’re doing is trying to find out which topics some NPS
responses belong to, the word Bug would probably end up being tied to the
topic Reliability, since most responses containing that word would be about
that topic.
Reference: https://monkeylearn.com/blog/what-is-tf-idf/
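A short sketch with scikit-learn's TfidfVectorizer on a few made-up documents, showing that a word like "bug" concentrated in one document gets a higher weight there than words that appear everywhere:

```python
# Sketch: computing TF-IDF weights for a tiny corpus.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the app is great and easy to use",
    "the app crashes with a bug, another bug appears on login",
    "the support team is great",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)

# Rows = documents, columns = terms; 'bug' scores highest in document 2.
table = pd.DataFrame(tfidf.toarray(), columns=vectorizer.get_feature_names_out())
print(table.round(2))
```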
We can use the pandas library, which has easy-to-use data structures and high-performance data analysis tools.
R is more suitable for ML than text analytics
Python is faster for all types of text analytics.
The presence of feature value X in the formula will affect the step size of the
gradient descent.
The difference in ranges of features will cause different step sizes for each
feature.
To ensure that the gradient descent moves smoothly towards the minima
and that the steps for gradient descent are updated at the same rate for all
the features, we scale the data before feeding it to the model.
Having features on a similar scale can help the gradient descent converge
more quickly towards the minima.
Reference: https://www.analyticsvidhya.com/blog/2020/04/feature-scaling-
machine-learning-normalization-standardization/
For example, let's say we have data containing the high school CGPA scores of students (ranging from 0 to 5) and their future incomes (in thousands of rupees).
Since the features have different scales, there is a chance that higher weightage is given to the feature with the higher magnitude. This will impact the performance of the machine learning algorithm, and obviously we do not want our algorithm to be biased towards one feature.
Scaling has brought both the features into the picture and the distances are
now more comparable than they were before we applied scaling.
Reference: https://www.analyticsvidhya.com/blog/2020/04/feature-scaling-
machine-learning-normalization-standardization/
Why feature scaling not required in
tree based algorithms
Tree-based algorithms, on the other hand, are fairly insensitive to the scale
of the features. Think about it, a decision tree is only splitting a node based
on a single feature. The decision tree splits a node on a feature that
increases the homogeneity of the node. This split on a feature is not
influenced by other features.
So, there is virtually no effect of the remaining features on the split. This is
what makes them invariant to the scale of the features!
Reference: https://www.analyticsvidhya.com/blog/2020/04/feature-scaling-
machine-learning-normalization-standardization/
Reference: https://360digitmg.com/mlops-interview-questions-answers
TODO
A PM tells you that a weekly active
user metric is up by 5% but email
notification open rate is down by 2%.
What would you investigate to diagnose
this problem?
Email open rate is calculated by dividing the number of emails opened by the number of emails sent minus any bounces. A good open rate is between 17-28%. Email notification open rate is a type of email open rate that measures how many users open an email that notifies them about something.
Weekly active user metric (WAU) is a measure of how many users are active on a
website or app in a given week. It can be influenced by many factors, such as
user acquisition, retention, engagement and churn.
To diagnose the problem of WAU being up but email notification open rate being
down, you might want to investigate:
- How are you defining active users? Are they performing meaningful actions on your website or app that indicate engagement and loyalty?
- How are you segmenting your users based on their behavior, preferences and needs? Are you sending relevant and personalized email notifications to each segment?
- How are you optimizing your email subject lines, preheaders, sender names and content to capture attention and interest? Are you using clear and compelling calls to action?
- How are you testing and measuring your email performance? Are you using tools like A/B testing, analytics and feedback surveys to improve your email strategy?
References
https://www.youtube.com/watch?v=k6QWYwOvJs0&t=1149s
https://towardsdatascience.com/taking-the-confusion-out-of-confusion-matrices-c1ce054b3d3e
https://kambria.io/blog/confused-about-the-confusion-matrix-learn-all-about-it/#:~:text=Confusion%20matrices%20are%20used%20to,True%20Negatives%2
https://projects.uplevel.work/insights/confusion-matrix-accuracy-sensitivity-specificity-precision-f1-score-how-to-interpret
https://www.analyticsvidhya.com/blog/2020/04/confusion-matrix-machine-learning/
https://towardsdatascience.com/imbalanced-classification-in-python-smote-enn-method-db5db06b8d50#:~:text=The%20Concept%3A%20Edited%20Nearest%20Neighb
https://www.youtube.com/watch?v=Aarb0_Cw_48&ab_channel=JayFeng