UNIT IV Naïve Bayes Classifier Algorithm
Bayes' Theorem:
o Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is used to determine
the probability of a hypothesis with prior knowledge. It depends on the conditional
probability.
o The formula for Bayes' theorem is given as:
P(A|B) = P(B|A) * P(A) / P(B)
Where,
P(A|B) is Posterior probability: Probability of hypothesis A on the observed event B.
P(B|A) is Likelihood probability: Probability of the evidence given that the hypothesis is true.
P(A) is Prior Probability: Probability of hypothesis before observing the evidence.
P(B) is Marginal Probability: Probability of Evidence.
Problem: If the weather is sunny, should the player play or not?
Solution: To solve this, first consider the below dataset:
Outlook Play
0 Rainy Yes
1 Sunny Yes
2 Overcast Yes
3 Overcast Yes
4 Sunny No
5 Rainy Yes
6 Sunny Yes
7 Overcast Yes
8 Rainy No
9 Sunny No
10 Sunny Yes
11 Rainy No
12 Overcast Yes
13 Overcast Yes
Frequency table of the weather conditions:
Weather Yes No
Overcast 5 0
Rainy 2 2
Sunny 3 2
Total 10 4
Likelihood table of the weather conditions:
Weather No Yes
Overcast 0 5 5/14 = 0.36
Rainy 2 2 4/14 = 0.29
Sunny 2 3 5/14 = 0.35
All 4/14 = 0.29 10/14 = 0.71
Applying Bayes' theorem:
P(Yes|Sunny)= P(Sunny|Yes)*P(Yes)/P(Sunny)
P(Sunny|Yes)= 3/10= 0.3
P(Sunny)= 0.35
P(Yes)=0.71
So P(Yes|Sunny) = 0.3*0.71/0.35= 0.60
P(No|Sunny)= P(Sunny|No)*P(No)/P(Sunny)
P(Sunny|No)= 2/4= 0.5
P(No)= 0.29
P(Sunny)= 0.35
So P(No|Sunny)= 0.5*0.29/0.35 = 0.41
As we can see from the above calculation, P(Yes|Sunny) > P(No|Sunny).
Hence, on a sunny day, the player can play the game.
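The same calculation can be reproduced in a few lines of Python. This is a minimal sketch that hard-codes the counts from the tables above; the variable names are illustrative only:
# counts taken from the frequency table above
p_sunny_given_yes = 3 / 10    # P(Sunny|Yes)
p_sunny_given_no = 2 / 4      # P(Sunny|No)
p_yes = 10 / 14               # P(Yes)
p_no = 4 / 14                 # P(No)
p_sunny = 5 / 14              # P(Sunny)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
p_no_given_sunny = p_sunny_given_no * p_no / p_sunny

print(p_yes_given_sunny)      # ~0.60
print(p_no_given_sunny)       # ~0.40 (0.41 with the rounded values used above)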
Advantages of Naïve Bayes Classifier:
o Naïve Bayes is one of the fastest and simplest ML algorithms for predicting the class of a dataset.
o It can be used for Binary as well as Multi-class Classifications.
o It performs well in Multi-class predictions as compared to the other Algorithms.
o It is the most popular choice for text classification problems.
Cost function:
o Different values of the weights or coefficients of the line (a0, a1) give different regression lines, and the cost function is used to estimate the values of the coefficients for the best-fit line.
o The cost function optimizes the regression coefficients or weights and measures how well a linear regression model is performing.
o We can use the cost function to find the accuracy of the mapping function, which maps the input variable to the output variable. This mapping function is also known as the Hypothesis function.
For Linear Regression, we use the Mean Squared Error (MSE) cost function, which is the average of the squared errors between the predicted values and the actual values. For the linear equation y = a0 + a1x, the MSE can be calculated as:
MSE = (1/N) * Σ (Yi - (a1xi + a0))²
Where,
N = Total number of observations
Yi = Actual value
(a1xi + a0) = Predicted value
Residuals: The distance between the actual values and the predicted values is called the residual. If the observed points are far from the regression line, the residuals will be large, and so the cost function will be high. If the scatter points are close to the regression line, the residuals will be small, and hence so will the cost function.
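As a quick illustration, the MSE and residuals above can be computed directly with NumPy. This is a minimal sketch with made-up data points and assumed coefficients a0 and a1:
import numpy as np

x = np.array([1, 2, 3, 4, 5])      # input variable
y = np.array([3, 5, 7, 9, 11])     # actual values
a0, a1 = 1.0, 2.0                  # assumed coefficients of the line

y_pred = a1 * x + a0               # predicted values
residuals = y - y_pred             # distance between actual and predicted values
mse = np.mean(residuals ** 2)      # Mean Squared Error
print(mse)                         # 0.0, since this line fits the points exactly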
Gradient Descent:
o Gradient descent is used to minimize the MSE by calculating the gradient of the cost
function.
o A regression model uses gradient descent to update the coefficients of the line by
reducing the cost function.
o It is done by randomly selecting initial values for the coefficients and then iteratively updating them until the minimum of the cost function is reached.
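A minimal sketch of gradient descent for the two coefficients a0 and a1, using the same illustrative x and y arrays as above and an assumed learning rate:
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([3, 5, 7, 9, 11])

a0, a1 = 0.0, 0.0        # arbitrary initial values for the coefficients
lr = 0.01                # assumed learning rate
n = len(x)

for _ in range(5000):
    y_pred = a1 * x + a0
    # gradients of the MSE cost function with respect to a0 and a1
    grad_a0 = (-2 / n) * np.sum(y - y_pred)
    grad_a1 = (-2 / n) * np.sum((y - y_pred) * x)
    # update the coefficients in the direction that reduces the cost
    a0 = a0 - lr * grad_a0
    a1 = a1 - lr * grad_a1

print(a0, a1)            # approaches a0 = 1, a1 = 2, the minimum-cost line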
Model Performance:
The Goodness of fit determines how the line of regression fits the set of observations.
The process of finding the best model out of various models is called optimization. It
can be achieved by below method:
1. R-squared method:
o R-squared is a statistical method that determines the goodness of fit.
o It measures the strength of the relationship between the dependent and independent
variables on a scale of 0-100%.
o A high value of R-squared means that there is less difference between the predicted and the actual values, and hence represents a good model.
o It is also called a coefficient of determination, or coefficient of multiple
determination for multiple regression.
o It can be calculated from the below formula:
R-squared = Explained variation / Total variation
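A minimal sketch of computing R-squared with NumPy, using the same illustrative arrays as before and the predictions of the fitted line:
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([3, 5, 7, 9, 11])
y_pred = 2.0 * x + 1.0                             # predictions of the fitted line

ss_explained = np.sum((y_pred - y.mean()) ** 2)    # explained variation
ss_total = np.sum((y - y.mean()) ** 2)             # total variation
r_squared = ss_explained / ss_total
print(r_squared)                                   # 1.0 for a perfect fit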
In this section, we will create a Simple Linear Regression model to find the best-fitting line representing the relationship between two variables: years of experience and salary.
To implement the Simple Linear regression model in machine learning using Python, we
need to follow the below steps:
Step-1: Data Pre-processing
The first step for creating the Simple Linear Regression model is data pre-processing.
We have already done it earlier in this tutorial. But there will be some changes, which
are given in the below steps:
o First, we will import the three important libraries, which will help us for loading the
dataset, plotting the graphs, and creating the Simple Linear Regression model.
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
o Next, we will load the dataset into our code:
data_set= pd.read_csv('Salary_Data.csv')
By executing the above line of code (Ctrl+Enter), we can view the dataset on the Spyder IDE screen by clicking on the Variable Explorer option.
The above output shows the dataset, which has two variables: Salary and Experience.
Note: In Spyder IDE, the folder containing the code file must be saved as a
working directory, and the dataset or csv file should be in the same folder.
o After that, we need to extract the dependent and independent variables from the given
dataset. The independent variable is years of experience, and the dependent variable is
salary. Below is code for it:
x= data_set.iloc[:, :-1].values
y= data_set.iloc[:, 1].values
In the above lines of code, for the x variable we have used -1, since we want to remove the last column from the dataset. For the y variable we have used 1 as the parameter, since we want to extract the second column and indexing starts from zero.
By executing the above lines of code, we will get the output for the x and y variables as:
In the above output image, we can see that the x (independent) and y (dependent) variables have been extracted from the given dataset.
o Next, we will split both variables into the test set and training set. We have 30
observations, so we will take 20 observations for the training set and 10 observations
for the test set. We are splitting our dataset so that we can train our model using a
training dataset and then test the model using a test dataset. The code for this is given
below:
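A sketch of this split using scikit-learn's train_test_split (a test_size of 1/3 gives the 20/10 split described above; random_state=0 is an assumed value used so the split is reproducible):
# Splitting the dataset into the training set and test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=1/3, random_state=0)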
Training Dataset:
o For Simple Linear Regression, we will not use Feature Scaling, because the Python libraries take care of it in this case, so we don't need to perform it here. Now our dataset is well prepared, and we are going to start building the Simple Linear Regression model for the given problem, as sketched below.
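The fitting and training-set visualization code is not reproduced above; a minimal sketch of what it could look like, assuming the variable names (regressor, x_pred) that the test-set code below relies on:
# Fitting the Simple Linear Regression model to the training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(x_train, y_train)

# Predicting the training-set results (used to draw the regression line)
x_pred = regressor.predict(x_train)

#visualizing the Training set results
mtp.scatter(x_train, y_train, color="green")
mtp.plot(x_train, x_pred, color="red")
mtp.title("Salary vs Experience (Training Dataset)")
mtp.xlabel("Years of Experience")
mtp.ylabel("Salary(In Rupees)")
mtp.show()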
In the above plot, we can see the real observations as green dots and the predicted values covered by the red regression line. The regression line shows a correlation between the dependent and independent variables.
The goodness of fit of the line can be judged by calculating the difference between the actual and predicted values. As we can see in the above plot, most of the observations are close to the regression line, hence our model is good for the training set.
Step-5: Visualizing the Test set results:
In the previous step, we have visualized the performance of our model on the training
set. Now, we will do the same for the Test set. The complete code will remain the same
as the above code, except in this, we will use x_test, and y_test instead of x_train and
y_train.
Here we are also changing the color of observations and regression line to differentiate
between the two plots, but it is optional.
#visualizing the Test set results
mtp.scatter(x_test, y_test, color="blue")
mtp.plot(x_train, x_pred, color="red")   # same regression line, fitted on the training set
mtp.title("Salary vs Experience (Test Dataset)")
mtp.xlabel("Years of Experience")
mtp.ylabel("Salary(In Rupees)")
mtp.show()
Output:
By executing the above line of code, we will get the output as:
In the above plot, the observations are shown in blue and the predictions are given by the red regression line. As we can see, most of the observations are close to the regression line, hence we can say our Simple Linear Regression model is a good model and is able to make good predictions.
MLR equation:
In Multiple Linear Regression, the target variable (Y) is a linear combination of multiple predictor variables x1, x2, x3, ..., xn. Since it is an enhancement of Simple Linear Regression, the same idea applies, and the multiple linear regression equation becomes:
Y = b0 + b1x1 + b2x2 + b3x3 + ...... + bnxn ............ (a)
Where,
Y= Output/Response variable
b0, b1, b2, b3, ..., bn = Coefficients of the model.
x1, x2, x3, x4, ... = Various independent/feature variables.
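As a quick numeric illustration of equation (a), with made-up coefficients and feature values:
import numpy as np

b0 = 5.0                          # intercept (assumed value)
b = np.array([2.0, -1.0, 0.5])    # coefficients b1, b2, b3 (assumed values)
x = np.array([3.0, 4.0, 10.0])    # feature values x1, x2, x3

Y = b0 + np.dot(b, x)             # Y = b0 + b1*x1 + b2*x2 + b3*x3
print(Y)                          # 5 + 6 - 4 + 5 = 12.0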
o Importing libraries: First, we will import the libraries needed to build the model. Below is the code for it:
# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
o Importing dataset: Now we will import the dataset(50_CompList), which contains all
the variables. Below is the code for it:
#importing datasets
data_set= pd.read_csv('50_CompList.csv')
Output: We will get the dataset as:
In the above output, we can clearly see that there are five variables, of which four are continuous and one is categorical.
o Extracting dependent and independent Variables:
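The extraction code is not reproduced here; a minimal sketch, assuming Profit is the last column (the dependent variable) and that the categorical State column is encoded as dummy variables before splitting (test_size=0.2 and random_state=0 are assumed values):
# Extracting the independent variables (all columns except Profit) and the dependent variable (Profit)
x = data_set.iloc[:, :-1]
y = data_set.iloc[:, -1].values

# Encoding the categorical State column as dummy variables,
# dropping one dummy to avoid the dummy variable trap
x = pd.get_dummies(x, drop_first=True).values

# Splitting the dataset into the training set and test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)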
Note: In MLR, we will not do feature scaling as it is taken care by the library, so we
don't need to do it manually.
Step: 2- Fitting our MLR model to the Training set:
Now we have prepared our dataset for training, which means we will fit our regression model to the training set. This is similar to what we did for the Simple Linear Regression model. The code for this will be:
#Fitting the MLR model to the training set:
from sklearn.linear_model import LinearRegression
regressor= LinearRegression()
regressor.fit(x_train, y_train)
Output:
Out[9]: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
normalize=False)
Now, we have successfully trained our model using the training dataset. In the next
step, we will test the performance of the model using the test dataset.
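The prediction code is not shown above; a minimal sketch of predicting the test-set results with the trained regressor:
# Predicting the test set results
y_pred = regressor.predict(x_test)
print(y_pred)
print(y_test)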
In the above output, we have the predicted result set and the test set. We can check model performance by comparing these two sets of values index by index. For example, the first index has a predicted profit of $103015 and a test/real profit of $103282. The difference is only about $267, which is a good prediction, so our model is complete here.
o We can also check the score for training dataset and test dataset. Below is the code for
it:
print('Train Score: ', regressor.score(x_train, y_train))
print('Test Score: ', regressor.score(x_test, y_test))
Output: The score is:
Train Score: 0.9501847627493607
Test Score: 0.9347068473282446
The above scores tell us that our model explains about 95% of the variance in the training dataset and about 93% of the variance in the test dataset.
Note: In the next topic, we will see how we can improve the performance of the
model using the Backward Elimination process.
Applications of Multiple Linear Regression:
There are mainly two applications of Multiple Linear Regression:
o Effectiveness of Independent variable on prediction:
o Predicting the impact of changes
Support Vector Machine (SVM):
Consider a set of red and blue data points in two dimensions. There are multiple lines (our hyperplane here is a line because we are considering only two input features, x1 and x2) that segregate the data points, i.e., classify the red and blue circles. So how do we choose the best line, or in general the best hyperplane, that segregates our data points?
One reasonable choice as the best hyperplane is the one that represents
the largest separation or margin between the two classes.
Figure: Multiple hyperplanes separating the data of two classes.
Here we have one blue ball within the boundary of the red balls. So how does SVM classify the data? It's simple! The blue ball within the boundary of the red ones is an outlier of the blue balls. The SVM algorithm has the characteristic of ignoring outliers and finding the hyperplane that maximizes the margin. SVM is robust to outliers.
So for this type of data point, what SVM does is find the maximum margin as done with the previous data sets, and along with that it adds a penalty each time a point crosses the margin. The margins in such cases are called soft margins. When there is a soft margin, the SVM tries to minimize (1/margin + λ(∑penalty)). Hinge loss is a commonly used penalty: if there are no violations there is no hinge loss; if there are violations, the hinge loss is proportional to the distance of the violation.
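A minimal sketch of the hinge loss for a single point, assuming a hyperplane described by a weight vector w and offset b, and a label t in {-1, +1}:
import numpy as np

def hinge_loss(x, t, w, b):
    # zero loss if the point is on the correct side of the margin,
    # otherwise the loss grows with the distance of the violation
    return max(0.0, 1.0 - t * (np.dot(w, x) + b))

w = np.array([1.0, 1.0])   # assumed weight vector
b = -3.0                   # assumed offset
print(hinge_loss(np.array([3.0, 2.0]), +1, w, b))   # 0.0: correctly classified, outside the margin
print(hinge_loss(np.array([1.0, 1.0]), +1, w, b))   # 2.0: the point violates the margin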
Till now, we were talking about linearly separable data (the group of blue balls and red balls are separable by a straight line). What do we do if the data are not linearly separable?
In this case, a new variable y is created as a function of the distance from the origin. A non-linear function that creates such a new variable is referred to as a kernel.
The hyperplane can be written as wᵀx + b = 0. The vector w represents the normal vector to the hyperplane, i.e. the direction perpendicular to the hyperplane. The parameter b in the equation represents the offset or distance of the hyperplane from the origin along the normal vector w.
The distance between a data point x_i and the decision boundary can be calculated as:
d_i = (wᵀx_i + b) / ||w||
where ||w|| represents the Euclidean norm of the weight vector w.
For a Linear SVM classifier, the prediction for a point x is:
ŷ = 1 if wᵀx + b ≥ 0, and ŷ = 0 if wᵀx + b < 0
Optimization:
● For the hard margin linear SVM classifier, we minimize (1/2)||w||² subject to the constraint t_i(wᵀx_i + b) ≥ 1 for every training instance i.
The target variable or label for the ith training instance is denoted by the symbol t_i in this statement, with t_i = -1 for negative instances (when y_i = 0) and t_i = 1 for positive instances (when y_i = 1), because we require every instance to lie on the correct side of the margin, i.e. t_i(wᵀx_i + b) ≥ 1.
Solving this constrained problem leads to the dual formulation, in which we maximize
∑ α_i − (1/2) ∑∑ α_i α_j t_i t_j K(x_i, x_j)
where,
● αi is the Lagrange multiplier associated with the ith training
sample.
● K(xi, xj) is the kernel function that computes the similarity between
two samples xi and xj. It allows SVM to handle nonlinear
classification problems by implicitly mapping the samples into a
higher-dimensional feature space.
● The term ∑αi represents the sum of all Lagrange multipliers.
The SVM decision boundary can be described in terms of these optimal Lagrange multipliers and the support vectors once the dual problem has been solved and the optimal Lagrange multipliers have been found. The training samples with α_i > 0 are the support vectors, and the decision boundary is given by:
∑ α_i t_i K(x_i, x) + b = 0
The SVM kernel is a function that takes a low-dimensional input space and transforms it into a higher-dimensional space, i.e. it converts non-separable problems into separable problems. It is mostly useful in non-linear separation problems. Simply put, the kernel does some extremely complex data transformations and then finds out the process to separate the data based on the labels or outputs defined.
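A minimal sketch of one commonly used kernel, the RBF (Gaussian) kernel, which measures the similarity of two samples; gamma=0.5 is an assumed value:
import numpy as np

def rbf_kernel(x_i, x_j, gamma=0.5):
    # similarity decays with the squared Euclidean distance between the samples
    return np.exp(-gamma * np.sum((x_i - x_j) ** 2))

print(rbf_kernel(np.array([1.0, 2.0]), np.array([1.0, 2.0])))   # 1.0 for identical samples
print(rbf_kernel(np.array([1.0, 2.0]), np.array([3.0, 4.0])))   # close to 0 for distant samples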
Advantages of SVM
● Effective in high-dimensional cases.
● It is memory efficient, as it uses a subset of training points (called support vectors) in the decision function.
● Different kernel functions can be specified for the decision function, and it is possible to specify custom kernels.
SVM implementation in Python
Predict whether a cancer is benign or malignant. Using historical data about patients diagnosed with cancer enables doctors to differentiate malignant cases from benign ones, given the independent attributes.
Steps
● Load the breast cancer dataset from sklearn.datasets
● Separate input features and target variables.
● Build and train the SVM classifier using the RBF kernel.
● Plot the scatter plot of the input features.
● Plot the decision boundary.
# Load the important packages
from sklearn.datasets import load_breast_cancer
import matplotlib.pyplot as plt
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.svm import SVC

# Load the breast cancer dataset and keep the first two input features
cancer = load_breast_cancer()
X = cancer.data[:, :2]
y = cancer.target

# Build and train the SVM classifier using the RBF kernel
# (C and gamma here are example hyperparameter values)
svm = SVC(kernel="rbf", gamma=0.5, C=1.0).fit(X, y)

# Plot the decision boundary
DecisionBoundaryDisplay.from_estimator(svm, X, response_method="predict", alpha=0.5)
# Scatter plot of the input features
plt.scatter(X[:, 0], X[:, 1], c=y, s=20, edgecolors="k")
plt.show()
Output: