
Impact of Outliers On Machine Learning Models

This document discusses the impact of outliers on various machine learning models. It finds that KNN performance is affected by outliers when K is low, but not when K is high. Naive Bayes is impacted by outliers depending on the implementation. Linear regression is worst affected by outliers, which shift the regression line. Logistic regression and decision trees are more robust to outliers, though outliers can still impact them in some cases. SVM performance decreases when outliers shrink the margin. The conclusion recommends detecting and removing outliers before training models.

Uploaded by

Atif Saeed

Before you start reading this blog, you should read my previous blog, which explains what an outlier is and how we can detect and remove outliers from our dataset. That blog is a prerequisite for this article. Here you can find that article:

In this blog we are going to discuss the impact of outliers on different machine learning models. Many machine learning models are sensitive to the range and distribution of the data, so the presence of outliers can spoil the whole training process: the model may take much longer to train, or it may give low accuracy or otherwise poor results on the test data.

Impact of Outliers on the KNN Algorithm:


KNN is sensitive to outliers, because a single mislabeled or erroneous data point can dramatically change the class boundaries and cause misclassification. How strong the effect is depends on the value of K. If K is low, the model is susceptible to outliers, since each prediction rests on only a few neighbours. If K is high, the vote is spread over more neighbours, so the model is more robust to an isolated outlier.
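
The effect of K can be sketched in a few lines of plain Python. This is only an illustration: the 1-D data, the mislabeled point and the helper name `knn_predict` are all made up for this example and are not taken from any library.

```python
from collections import Counter

def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest 1-D training points."""
    neighbours = sorted(train, key=lambda point: abs(point[0] - query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical data: class "A" sits around 1-2, class "B" around 8-9.
train = [(1.0, "A"), (1.2, "A"), (1.5, "A"), (1.8, "A"), (2.0, "A"),
         (8.0, "B"), (8.3, "B"), (8.6, "B"), (9.0, "B")]

# One mislabeled point placed deep inside the "A" region (the outlier).
train_with_outlier = train + [(1.6, "B")]

query = 1.62
low_k = knn_predict(train_with_outlier, query, k=1)   # decided by the outlier alone
high_k = knn_predict(train_with_outlier, query, k=5)  # outlier is outvoted 4 to 1

print("K=1:", low_k)
print("K=5:", high_k)
```

With K=1 the query takes the label of the outlier, while with K=5 the four genuine "A" neighbours outvote it. A very large K has its own cost, because it smooths over real local structure, so K is usually tuned rather than simply maximised.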

Impact of Outliers on Naïve Bayes:


If you want a one-line answer to whether outliers affect a Naïve Bayes model, the answer is yes. But there are several Naïve Bayes implementations, and we have to look at each in detail before answering.

If we are dealing with Bernoulli Naïve Bayes applied to word features, the model assigns a probability of zero to any word that is not present in the training dataset, and that zero wipes out the whole product of probabilities. Such a word acts like an outlier and can be a problem.

If you are working with Gaussian Naïve Bayes and outliers are present in your dataset, they shift the estimated mean and change the shape of the fitted Gaussian distribution.

For the zero-probability problem there is a well-known solution, Laplace smoothing. In this method we add a small artificial count for every word, so that no probability is exactly zero.
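
Both effects can be shown with a small sketch. The function name `bernoulli_likelihood`, the document counts and the feature values below are hypothetical, and the smoothing formula is the usual additive form for a binary feature, not the code of any particular library.

```python
import statistics

def bernoulli_likelihood(docs_with_word, docs_in_class, alpha=0.0):
    """Estimate P(word present | class) with optional additive (Laplace) smoothing.

    The factor 2 is the number of outcomes of a binary feature (present / absent).
    """
    return (docs_with_word + alpha) / (docs_in_class + 2 * alpha)

# A word that never appeared in the 8 training documents of this class.
unsmoothed = bernoulli_likelihood(0, 8)           # 0 / 8
smoothed = bernoulli_likelihood(0, 8, alpha=1.0)  # (0 + 1) / (8 + 2)
print("unsmoothed:", unsmoothed, "smoothed:", smoothed)

# Gaussian Naive Bayes estimates a mean and a spread per feature and class.
clean = [48.0, 50.0, 51.0, 52.0, 49.0]
with_outlier = clean + [500.0]  # one extreme value

clean_mean = statistics.mean(clean)
outlier_mean = statistics.mean(with_outlier)
clean_std = statistics.stdev(clean)
outlier_std = statistics.stdev(with_outlier)
print("mean:", clean_mean, "->", outlier_mean)
print("std: ", clean_std, "->", outlier_std)
```

The unseen word goes from a probability of exactly zero to a small positive value once smoothing is applied, and a single extreme value is enough to drag the Gaussian mean far from the bulk of the data and to widen the fitted curve.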

Impact of Outliers on Linear Regression:

In the case of linear regression, outliers have the worst impact of all the models discussed here. Least squares penalises squared errors, so a single extreme point pulls the regression line towards itself and changes the fitted equation, which results in bad predictions or estimates.
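
A minimal sketch of this shift, using ordinary least squares written out by hand. The data points and the helper name `fit_line` are invented for the illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical clean data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]
clean_slope, clean_intercept = fit_line(xs, ys)

# Add one outlier: at x = 6 the trend suggests y = 12, but we record y = 40.
out_slope, out_intercept = fit_line(xs + [6.0], ys + [40.0])

print("clean fit:       ", clean_slope, clean_intercept)
print("fit with outlier:", out_slope, out_intercept)
```

One bad point out of six is enough to triple the slope in this toy example, so predictions for every other x value become worse as well.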

Impact of Outliers on Logistic Regression:

Outliers have less impact on logistic regression, because the sigmoid function squashes its output into the range 0 to 1 and so dampens the effect of extreme inputs. However, very extreme outliers can still affect the model and its results.
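
One way to see both halves of this statement is to look at how much a single sample contributes to the log-loss gradient. The sketch below assumes a one-feature model p = sigmoid(w * x) with no bias term; the weight and the sample values are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def log_loss_gradient(w, x, y):
    """Gradient of the log loss with respect to w for one sample (x, y)."""
    return (sigmoid(w * x) - y) * x

w = 1.0  # hypothetical current weight

typical = log_loss_gradient(w, x=2.0, y=1)       # ordinary sample
far_correct = log_loss_gradient(w, x=50.0, y=1)  # extreme, on the correct side
far_wrong = log_loss_gradient(w, x=50.0, y=0)    # extreme, on the wrong side

print("typical sample:      ", typical)
print("far, correct side:   ", far_correct)
print("far, wrong side:     ", far_wrong)
```

An extreme point on the correct side of the boundary contributes almost nothing, because the sigmoid has already saturated. An extreme point on the wrong side contributes a large gradient, which is the case where outliers can still hurt the model.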
Impact of Outliers on SVM
As we know SVM (Support Vector Machine) is one of the popular machine learning
classification model, but SVM has a major drawback is sensitive to outliers within in the training
samples. When you are working with SVM, the outlier will shrink the margin and decision
boundry will be sub-optimal and the end result will be poor classification.
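
The shrinking margin is easiest to see in one dimension, where a hard-margin linear SVM on separable data places the boundary midway between the closest points of the two classes. The sketch below is that special case only, not a general SVM solver; the data and the helper name `hard_margin_1d` are made up for the illustration.

```python
def hard_margin_1d(negatives, positives):
    """Boundary and margin of a hard-margin linear SVM on separable 1-D data.

    Assumes every negative value lies below every positive value.
    """
    closest_neg = max(negatives)
    closest_pos = min(positives)
    if closest_neg >= closest_pos:
        raise ValueError("classes are not linearly separable")
    boundary = (closest_neg + closest_pos) / 2
    margin = (closest_pos - closest_neg) / 2
    return boundary, margin

negatives = [1.0, 1.5, 2.0]
positives = [8.0, 8.5, 9.0]

clean_boundary, clean_margin = hard_margin_1d(negatives, positives)

# One positive outlier sitting close to the negative class.
out_boundary, out_margin = hard_margin_1d(negatives, positives + [3.0])

print("clean:       boundary", clean_boundary, "margin", clean_margin)
print("with outlier: boundary", out_boundary, "margin", out_margin)
```

The single outlier becomes a support vector, so the boundary moves right next to the negative class and the margin collapses. Soft-margin SVMs reduce this effect by allowing some points to violate the margin at a cost.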

Impact of Outliers on Decision Tree or Random Forest:

When we are working with tree-based models, outliers do not have much impact. The data is split into two parts at a threshold and then divided further, and these splits depend on the ordering of the values rather than on their magnitude, much like the median. The median is robust to outliers, so tree-based models are robust to outliers as well. There is one issue, though: an outlier can increase the depth of the tree, which can make the model overfit.
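
A one-level tree (a decision stump) is enough to illustrate the robustness. The sketch below picks the threshold that minimises the weighted Gini impurity, which is one common splitting criterion and an assumption here, since the article does not name one. The data and the helper names `gini` and `best_split` are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def best_split(xs, ys):
    """Threshold (midpoint between sorted neighbours) with the lowest weighted Gini."""
    pairs = sorted(zip(xs, ys))
    best_threshold, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_threshold

xs = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
ys = [0, 0, 0, 1, 1, 1]

clean_threshold = best_split(xs, ys)
# Add an extreme feature value that still carries the correct label.
outlier_threshold = best_split(xs + [1000.0], ys + [1])

print("clean threshold:       ", clean_threshold)
print("threshold with outlier:", outlier_threshold)
```

The extreme value 1000.0 does not move the chosen threshold, because only the order of the points matters. A mislabeled point inside the wrong region is a different story: the tree needs extra splits to isolate it, which is how outliers add depth.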

Concluding Remarks:

In the last article we discussed outliers and how to detect and remove them, and above we discussed their impact on different models. So before we start working on any type of machine learning model, we should detect the outliers in our dataset and remove them, so that the model performs better on the test dataset.
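
As a reminder of what detection and removal can look like, here is a minimal sketch of the common 1.5 × IQR rule. The rule, the factor 1.5, the helper name `remove_outliers_iqr` and the sample values are assumptions chosen for this illustration; the detection method is not specified in this article.

```python
import statistics

def remove_outliers_iqr(values, factor=1.5):
    """Keep only values inside [Q1 - factor * IQR, Q3 + factor * IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if lower <= v <= upper]

data = [10, 12, 11, 13, 12, 11, 14, 95]  # hypothetical feature values
kept = remove_outliers_iqr(data)

print("before:", data)
print("after: ", kept)
```

Only the value 95 falls outside the fences here. In practice it is worth checking whether a flagged value is a genuine error before dropping it, since some extreme values are real observations.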
