
Naïve Bayes Classifier Algorithm

o Naïve Bayes is a supervised learning algorithm, based on Bayes' theorem, that is used for solving classification problems.
o It is mainly used in text classification, which typically involves a high-dimensional training dataset.
o The Naïve Bayes Classifier is one of the simplest and most effective classification algorithms; it helps build fast machine learning models that can make quick predictions.
o It is a probabilistic classifier, which means it predicts on the basis of the probability of an object belonging to a class.
o Some popular applications of the Naïve Bayes algorithm are spam filtering, sentiment analysis, and classifying articles.

Why is it called Naïve Bayes?


The name Naïve Bayes comprises two words, Naïve and Bayes, which can be described as:
o Naïve: It is called naïve because it assumes that the occurrence of a certain feature is independent of the occurrence of the other features. For example, if a fruit is identified on the basis of color, shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple. Each feature individually contributes to identifying it as an apple, without depending on the others.
o Bayes: It is called Bayes because it relies on the principle of Bayes' Theorem.

Bayes' Theorem:
o Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is used to determine
the probability of a hypothesis with prior knowledge. It depends on the conditional
probability.
o The formula for Bayes' theorem is given as:

P(A|B) = P(B|A) * P(A) / P(B)

Where,
P(A|B) is Posterior probability: Probability of hypothesis A on the observed event B.
P(B|A) is Likelihood probability: Probability of the evidence B given that hypothesis A is true.
P(A) is Prior Probability: Probability of hypothesis before observing the evidence.
P(B) is Marginal Probability: Probability of Evidence.

Working of Naïve Bayes' Classifier:


Working of Naïve Bayes' Classifier can be understood with the help of the below
example:
Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this dataset, we need to decide whether we should play on a particular day according to the weather conditions. To solve this problem, we follow the below steps:
1. Convert the given dataset into frequency tables.
2. Generate Likelihood table by finding the probabilities of given features.
3. Now, use Bayes theorem to calculate the posterior probability.

Problem: If the weather is sunny, should the player play or not?
Solution: To solve this, first consider the below dataset:

Outlook Play

0 Rainy Yes

1 Sunny Yes

2 Overcast Yes

3 Overcast Yes

4 Sunny No

5 Rainy Yes

6 Sunny Yes

7 Overcast Yes

8 Rainy No

9 Sunny No

10 Sunny Yes

11 Rainy No

12 Overcast Yes
13 Overcast Yes

Frequency table for the Weather Conditions:

Weather Yes No

Overcast 5 0

Rainy 2 2

Sunny 3 2

Total 10 4

Likelihood table of weather conditions:

Weather     No             Yes            P(Weather)

Overcast    0              5              5/14 = 0.35

Rainy       2              2              4/14 = 0.29

Sunny       2              3              5/14 = 0.35

All         4/14 = 0.29    10/14 = 0.71

Applying Bayes' theorem:

P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
P(Sunny|Yes) = 3/10 = 0.3
P(Sunny) = 5/14 = 0.35
P(Yes) = 10/14 = 0.71
So P(Yes|Sunny) = 0.3 * 0.71 / 0.35 = 0.60

P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)
P(Sunny|No) = 2/4 = 0.5
P(No) = 4/14 = 0.29
P(Sunny) = 5/14 = 0.35
So P(No|Sunny) = 0.5 * 0.29 / 0.35 = 0.41

As we can see from the above calculation, P(Yes|Sunny) > P(No|Sunny).
Hence, on a sunny day, the player can play the game.
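
The same calculation can be reproduced in a few lines of Python. This is a minimal sketch (not part of the original example) that recomputes the priors, likelihoods, and posteriors directly from the 14-row weather dataset above; note that the exact posterior for "No" is 0.40, and the 0.41 above comes from rounding the intermediate values.

# Sketch: Naive Bayes posterior for Play given Outlook = Sunny.
outlook = ["Rainy", "Sunny", "Overcast", "Overcast", "Sunny", "Rainy", "Sunny",
           "Overcast", "Rainy", "Sunny", "Sunny", "Rainy", "Overcast", "Overcast"]
play = ["Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes",
        "Yes", "No", "No", "Yes", "No", "Yes", "Yes"]

n = len(play)
p_yes = play.count("Yes") / n                 # prior P(Yes) = 10/14
p_no = play.count("No") / n                   # prior P(No) = 4/14
p_sunny = outlook.count("Sunny") / n          # evidence P(Sunny) = 5/14

# Likelihoods P(Sunny|Yes) and P(Sunny|No)
p_sunny_yes = sum(o == "Sunny" and p == "Yes" for o, p in zip(outlook, play)) / play.count("Yes")
p_sunny_no = sum(o == "Sunny" and p == "No" for o, p in zip(outlook, play)) / play.count("No")

# Posteriors via Bayes' theorem
print(p_sunny_yes * p_yes / p_sunny)   # P(Yes|Sunny) = 0.6
print(p_sunny_no * p_no / p_sunny)     # P(No|Sunny)  = 0.4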
Advantages of Naïve Bayes Classifier:
o Naïve Bayes is one of the fastest and easiest ML algorithms for predicting the class of a dataset.
o It can be used for binary as well as multi-class classification.
o It performs well in multi-class prediction compared to many other algorithms.
o It is a popular choice for text classification problems.

Disadvantages of Naïve Bayes Classifier:


o Naive Bayes assumes that all features are independent or unrelated, so it cannot learn
the relationship between features.

Applications of Naïve Bayes Classifier:


o It is used for Credit Scoring.
o It is used in medical data classification.
o It can be used in real-time predictions because Naïve Bayes Classifier is an eager
learner.
o It is used in Text classification such as Spam filtering and Sentiment analysis.

Types of Naïve Bayes Model:


There are three types of Naive Bayes Model, which are given below:
o Gaussian: The Gaussian model assumes that features follow a normal distribution. This
means if predictors take continuous values instead of discrete, then the model assumes
that these values are sampled from the Gaussian distribution.
o Multinomial: The Multinomial Naïve Bayes classifier is used when the data is multinomially distributed. It is primarily used for document classification problems, i.e., deciding which category a particular document belongs to, such as Sports, Politics, Education, etc. The classifier uses the frequency of words as the predictors.
o Bernoulli: The Bernoulli classifier works similarly to the Multinomial classifier, but the predictor variables are independent Boolean variables, such as whether a particular word is present in a document or not. This model is also well known for document classification tasks. (A short scikit-learn sketch follows this list.)
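
As a minimal illustration (not part of the original text), scikit-learn provides GaussianNB, MultinomialNB, and BernoulliNB classes for these three variants. The sketch below uses GaussianNB on a tiny made-up continuous dataset; the numbers are purely illustrative.

# Hedged sketch: Gaussian Naive Bayes on made-up continuous features.
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[1.0, 2.1], [1.2, 1.9], [3.8, 4.0], [4.1, 3.7]])  # two features per sample
y = np.array([0, 0, 1, 1])                                       # class labels

model = GaussianNB()
model.fit(X, y)
print(model.predict([[1.1, 2.0]]))        # predicted class for a new sample
print(model.predict_proba([[1.1, 2.0]]))  # class probabilities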

Linear Regression in Machine Learning


Linear regression is one of the easiest and most popular Machine Learning algorithms. It
is a statistical method that is used for predictive analysis. Linear regression makes
predictions for continuous/real or numeric variables such as sales, salary, age, product
price, etc.
The linear regression algorithm models a linear relationship between a dependent (y) variable and one or more independent (x) variables, hence the name linear regression. Because the relationship is linear, the algorithm finds how the value of the dependent variable changes according to the value of the independent variable(s).
The linear regression model provides a sloped straight line representing the relationship between the variables.

Mathematically, we can represent a linear regression as:


y= a0+a1x+ ε
Here,
Y= Dependent Variable (Target Variable)
X= Independent Variable (predictor Variable)
a0= intercept of the line (Gives an additional degree of freedom)
a1 = Linear regression coefficient (scale factor to each input value).
ε = random error
The values of the x and y variables are the training dataset used to build the Linear Regression model.

Types of Linear Regression


Linear regression can be further divided into two types of the algorithm:
o Simple Linear Regression:
If a single independent variable is used to predict the value of a numerical dependent
variable, then such a Linear Regression algorithm is called Simple Linear Regression.
o Multiple Linear regression:
If more than one independent variable is used to predict the value of a numerical
dependent variable, then such a Linear Regression algorithm is called Multiple Linear
Regression.

Linear Regression Line


A straight line showing the relationship between the dependent and independent variables is called a regression line. A regression line can show two types of relationship:
o Positive Linear Relationship:
If the dependent variable increases on the Y-axis and independent variable increases
on X-axis, then such a relationship is termed as a Positive linear relationship.

o Negative Linear Relationship:


If the dependent variable decreases on the Y-axis and independent variable increases
on the X-axis, then such a relationship is called a negative linear relationship.

Finding the best fit line:


When working with linear regression, our main goal is to find the best fit line, which means the error between the predicted values and the actual values should be minimized. The best fit line has the least error.
Different values for the weights or coefficients of the line (a0, a1) give different regression lines, so we need to calculate the best values for a0 and a1 to find the best fit line. To do this, we use a cost function.

Cost function-
o Different values for the weights or coefficients of the line (a0, a1) give different regression lines, and the cost function is used to estimate the values of the coefficients for the best fit line.
o The cost function is used to optimize the regression coefficients or weights; it measures how well a linear regression model is performing.
o We can use the cost function to find the accuracy of the mapping function, which maps the input variable to the output variable. This mapping function is also known as the Hypothesis function.

For Linear Regression, we use the Mean Squared Error (MSE) cost function, which is the average of the squared errors between the predicted values and the actual values. For the above linear equation, MSE can be calculated as:

MSE = (1/N) Σi (Yi − (a1xi + a0))²

Where,
N=Total number of observation
Yi = Actual value
(a1xi+a0)= Predicted value.
Residuals: The distance between the actual values and the predicted values is called the residual. If the observed points are far from the regression line, the residuals will be high and so will the cost function. If the scatter points are close to the regression line, the residuals will be small and hence so will the cost function.

Gradient Descent:
o Gradient descent is used to minimize the MSE by calculating the gradient of the cost
function.
o A regression model uses gradient descent to update the coefficients of the line by
reducing the cost function.
o It starts with randomly selected coefficient values and then iteratively updates them to reach the minimum of the cost function, as sketched below.
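
A minimal NumPy sketch (not from the original text) of this idea: fitting y = a0 + a1x to small made-up data by gradient descent on the MSE cost.

# Sketch: gradient descent on the MSE cost for simple linear regression.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])   # roughly y = 2x

a0, a1 = 0.0, 0.0        # initial coefficients
lr = 0.01                # learning rate
for _ in range(5000):
    error = (a0 + a1 * x) - y           # predicted minus actual
    a0 -= lr * 2 * error.mean()         # gradient of MSE with respect to a0
    a1 -= lr * 2 * (error * x).mean()   # gradient of MSE with respect to a1

print(a0, a1)            # approaches the best-fit intercept and slope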

Model Performance:
The goodness of fit determines how well the regression line fits the set of observations. The process of finding the best model out of various models is called optimization. It can be achieved by the below method:
1. R-squared method:
o R-squared is a statistical method that determines the goodness of fit.
o It measures the strength of the relationship between the dependent and independent
variables on a scale of 0-100%.
o A high value of R-squared indicates a small difference between the predicted and actual values and hence represents a good model.
o It is also called the coefficient of determination, or the coefficient of multiple determination for multiple regression.
o It can be calculated from the below formula:

R-squared = Explained variation / Total variation
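
A minimal sketch of this calculation with NumPy, using the equivalent form R² = 1 − (residual sum of squares / total sum of squares) and made-up values:

# Sketch: R-squared from actual and predicted values.
import numpy as np

y = np.array([3.0, 5.0, 7.0, 9.0])          # actual values
y_pred = np.array([2.8, 5.1, 7.2, 8.9])     # model predictions

ss_res = np.sum((y - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)        # total sum of squares
print(1 - ss_res / ss_tot)                  # close to 1 for a good fit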

Assumptions of Linear Regression


Below are some important assumptions of Linear Regression. These are formal checks to perform while building a Linear Regression model, which ensure that we get the best possible result from the given dataset.
o Linear relationship between the features and target:
Linear regression assumes the linear relationship between the dependent and
independent variables.
o Small or no multicollinearity between the features:
Multicollinearity means high correlation between the independent variables. Due to multicollinearity, it may be difficult to find the true relationship between the predictors and the target variable; in other words, it is difficult to determine which predictor variable is affecting the target variable and which is not. So, the model assumes either little or no multicollinearity between the features or independent variables.
o Homoscedasticity Assumption:
Homoscedasticity is a situation when the error term is the same for all the values of
independent variables. With homoscedasticity, there should be no clear pattern
distribution of data in the scatter plot.
o Normal distribution of error terms:
Linear regression assumes that the error terms follow a normal distribution. If the error terms are not normally distributed, confidence intervals will become either too wide or too narrow, which may cause difficulties in estimating the coefficients.
This can be checked using a Q-Q plot (a small sketch follows this list). If the plot shows an approximately straight line without large deviations, the errors are normally distributed.
o No autocorrelations:
The linear regression model assumes no autocorrelation in the error terms. If there is any correlation in the error terms, it will drastically reduce the accuracy of the model. Autocorrelation usually occurs when there is a dependency between residual errors.
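
A minimal sketch (with made-up residuals) of the Q-Q plot check mentioned in the normality assumption above, assuming SciPy and Matplotlib are available:

# Sketch: Q-Q plot of residuals against a normal distribution.
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

residuals = np.array([0.2, -0.1, 0.05, -0.3, 0.15, 0.0, -0.05])   # example residuals
stats.probplot(residuals, dist="norm", plot=plt)   # points near the line => roughly normal errors
plt.show()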
Simple Linear Regression in Machine Learning

Simple Linear Regression is a type of Regression algorithm that models the relationship between a dependent variable and a single independent variable. The relationship shown by a Simple Linear Regression model is linear (a sloped straight line), hence it is called Simple Linear Regression.
The key point in Simple Linear Regression is that the dependent variable must be a
continuous/real value. However, the independent variable can be measured on
continuous or categorical values.
Simple Linear regression algorithm has mainly two objectives:
o Model the relationship between the two variables. Such as the relationship between
Income and expenditure, experience and Salary, etc.
o Forecasting new observations. Such as Weather forecasting according to temperature,
Revenue of a company according to the investments in a year, etc.

Simple Linear Regression Model:


The Simple Linear Regression model can be represented using the below equation:
y= a0+a1x+ ε
Where,
a0= It is the intercept of the Regression line (can be obtained by putting x = 0)
a1= It is the slope of the regression line, which tells whether the line is increasing or
decreasing.
ε = The error term. (For a good model it will be negligible)

Implementation of Simple Linear Regression Algorithm using Python

Problem Statement example for Simple Linear Regression:
Here we are taking a dataset that has two variables: salary (dependent variable) and experience (independent variable). The goals of this problem are:
o We want to find out if there is any correlation between these two variables
o We will find the best fit line for the dataset.
o We will see how the dependent variable changes as the independent variable changes.

In this section, we will create a Simple Linear Regression model to find out the best
fitting line for representing the relationship between these two variables.
To implement the Simple Linear regression model in machine learning using Python, we
need to follow the below steps:
Step-1: Data Pre-processing
The first step for creating the Simple Linear Regression model is data pre-processing.
We have already done it earlier in this tutorial. But there will be some changes, which
are given in the below steps:
o First, we will import the three important libraries, which will help us for loading the
dataset, plotting the graphs, and creating the Simple Linear Regression model.

import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
o Next, we will load the dataset into our code:

data_set= pd.read_csv('Salary_Data.csv')
By executing the above line of code (ctrl+ENTER), we can read the dataset on our
Spyder IDE screen by clicking on the variable explorer option.

The above output shows the dataset, which has two variables: Salary and Experience.
Note: In Spyder IDE, the folder containing the code file must be saved as a
working directory, and the dataset or csv file should be in the same folder.
o After that, we need to extract the dependent and independent variables from the given
dataset. The independent variable is years of experience, and the dependent variable is
salary. Below is code for it:

x= data_set.iloc[:, :-1].values
y= data_set.iloc[:, 1].values
In the above lines of code, for the x variable we have used :-1 because we want to take all columns except the last one. For the y variable we have used index 1, since we want to extract the second column and indexing starts from zero.
By executing the above line of code, we will get the output for X and Y variable as:

In the above output image, we can see the X (independent) variable and Y (dependent)
variable has been extracted from the given dataset.
o Next, we will split both variables into the test set and training set. We have 30
observations, so we will take 20 observations for the training set and 10 observations
for the test set. We are splitting our dataset so that we can train our model using a
training dataset and then test the model using a test dataset. The code for this is given
below:

# Splitting the dataset into training and test set.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test= train_test_split(x, y, test_size= 1/3, random_state= 0)
By executing the above code, we will get x-test, x-train and y-test, y-train dataset.
Consider the below images:
Test-dataset:

Training Dataset:

o For Simple Linear Regression, we will not use feature scaling, because the library takes care of it in this case, so we don't need to perform it here. Now our dataset is well prepared, and we can start building the Simple Linear Regression model for the given problem.

Step-2: Fitting the Simple Linear Regression to the Training Set:


Now the second step is to fit our model to the training dataset. To do so, we will import the LinearRegression class from scikit-learn's linear_model module. After importing the class, we will create an object of the class named regressor.
The code for this is given below:
#Fitting the Simple Linear Regression model to the training dataset
from sklearn.linear_model import LinearRegression
regressor= LinearRegression()
regressor.fit(x_train, y_train)
In the above code, we have used the fit() method to fit our Simple Linear Regression object to the training set. In the fit() function, we have passed x_train and y_train, which are our training data for the independent and dependent variables. We have fitted our regressor object to the training set so that the model can learn the correlation between the predictor and target variables. After executing the above lines of code, we will get the below output.
Output:
Out[7]: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
normalize=False)
Step: 3. Prediction of test set result:
Our model is now trained on the relationship between the dependent variable (Salary) and the independent variable (Experience). So now the model is ready to predict the output for new observations. In this step, we will provide the test dataset (new observations) to the model to check whether it can predict the correct output or not.
We will create two prediction vectors, y_pred and x_pred, which will contain the predictions for the test dataset and the training dataset, respectively.
#Prediction of Test and Training set result
y_pred= regressor.predict(x_test)
x_pred= regressor.predict(x_train)
On executing the above lines of code, two variables named y_pred and x_pred will be generated in the variable explorer, containing the salary predictions for the test set and the training set, respectively.
Output:
You can check the variables by clicking on the variable explorer option in the IDE, and compare the results by comparing the values of y_pred and y_test. By comparing these values, we can check how well our model is performing; a small numeric check is sketched below.
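
As an optional numeric check (not part of the original steps, and assuming the y_test and y_pred variables from above), scikit-learn's metrics can quantify the comparison instead of eyeballing it:

# Sketch: quantify test-set prediction quality.
from sklearn.metrics import mean_absolute_error, r2_score

print('Test MAE:', mean_absolute_error(y_test, y_pred))
print('Test R^2:', r2_score(y_test, y_pred))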
Step: 4. Visualizing the Training set results:
Now in this step, we will visualize the training set results. To do so, we will use the scatter() function of the pyplot library, which we have already imported in the pre-processing step. The scatter() function will create a scatter plot of the observations.
On the x-axis, we will plot the years of experience of the employees, and on the y-axis, their salary. In the function, we pass the real values of the training set, i.e., the years of experience x_train, the training-set salaries y_train, and the color of the observations. Here we use green for the observations, but it can be any color of your choice.
Now we need to plot the regression line. For this, we will use the plot() function of the pyplot library, passing the years of experience for the training set, the predicted salaries for the training set (x_pred), and the color of the line.
Next, we will give the plot a title using the title() function of the pyplot library, passing the name "Salary vs Experience (Training Dataset)".
After that, we will assign labels to the x-axis and y-axis using the xlabel() and ylabel() functions.
Finally, we will represent all above things in a graph using show(). The code is given
below:
mtp.scatter(x_train, y_train, color="green")
mtp.plot(x_train, x_pred, color="red")
mtp.title("Salary vs Experience (Training Dataset)")
mtp.xlabel("Years of Experience")
mtp.ylabel("Salary(In Rupees)")
mtp.show()
Output:
By executing the above lines of code, we will get the below graph plot as an output.

In the above plot, the real observations appear as green dots and the predicted values are represented by the red regression line. The regression line shows a correlation between the dependent and independent variables.
The goodness of the fit can be judged by the difference between the actual and predicted values. As we can see in the above plot, most of the observations are close to the regression line, so our model fits the training set well.
Step: 5. Visualizing the Test set results:
In the previous step, we visualized the performance of our model on the training set. Now we will do the same for the test set. The code remains the same as above, except that we use x_test and y_test instead of x_train and y_train.
We also change the color of the observations to differentiate between the two plots, but this is optional.
#visualizing the Test set results
mtp.scatter(x_test, y_test, color="blue")
mtp.plot(x_train, x_pred, color="red")
mtp.title("Salary vs Experience (Test Dataset)")
mtp.xlabel("Years of Experience")
mtp.ylabel("Salary(In Rupees)")
mtp.show()
Output:
By executing the above line of code, we will get the output as:

In the above plot, the observations are shown in blue and the predictions are given by the red regression line. As we can see, most of the observations are close to the regression line, so we can say that our Simple Linear Regression model fits well and is able to make good predictions.

Multiple Linear Regression


In the previous topic, we learned about Simple Linear Regression, where a single independent/predictor (X) variable is used to model the response variable (Y). But there can be various cases in which the response variable is affected by more than one predictor variable; for such cases, the Multiple Linear Regression algorithm is used.
Multiple Linear Regression is an extension of Simple Linear Regression, as it uses more than one predictor variable to predict the response variable. We can define it as:
Multiple Linear Regression is one of the important regression algorithms which models the linear relationship between a single dependent continuous variable and more than one independent variable.
Example:
Prediction of CO2 emission based on engine size and number of cylinders in a car.
Some key points about MLR:
o For MLR, the dependent or target variable (Y) must be continuous/real, but the predictor or independent variables may be continuous or categorical.
o Each feature variable must have a linear relationship with the dependent variable.
o MLR tries to fit a regression line (hyperplane) through a multidimensional space of data points.

MLR equation:
In Multiple Linear Regression, the target variable (Y) is a linear combination of multiple predictor variables x1, x2, x3, ..., xn. Since it is an extension of Simple Linear Regression, the same form applies, and the equation becomes:

Y = b0 + b1x1 + b2x2 + b3x3 + ...... + bnxn ............... (a)
Where,
Y= Output/Response variable
b0, b1, b2, b3, bn....= Coefficients of the model.
x1, x2, x3, x4, ...= Various Independent/feature variable

Assumptions for Multiple Linear Regression:


o A linear relationship should exist between the Target and predictor variables.
o The regression residuals must be normally distributed.
o MLR assumes little or no multicollinearity (correlation between the independent variables) in the data.

Implementation of Multiple Linear Regression model using Python:

To implement MLR using Python, we have the below problem:
Problem Description:
We have a dataset of 50 start-up companies. This dataset contains five main fields: R&D Spend, Administration Spend, Marketing Spend, State, and Profit for a financial year. Our goal is to create a model that can easily determine which company has the maximum profit, and which factor affects a company's profit the most.
Since we need to find the Profit, it is the dependent variable, and the other four variables are independent variables. Below are the main steps of deploying the MLR model:
1. Data Pre-processing Steps
2. Fitting the MLR model to the training set
3. Predicting the result of the test set

Step-1: Data Pre-processing Step:


The very first step is data pre-processing, which we have already discussed in this
tutorial. This process contains the below steps:
o Importing libraries: First we will import the libraries which will help in building the model. Below is the code for it:

# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
o Importing dataset: Now we will import the dataset(50_CompList), which contains all
the variables. Below is the code for it:

#importing datasets
data_set= pd.read_csv('50_CompList.csv')
Output: We will get the dataset as:
In the above output, we can clearly see that there are five variables, of which four are continuous and one is a categorical variable.
o Extracting dependent and independent Variables:

#Extracting Independent and dependent Variable
x= data_set.iloc[:, :-1].values
y= data_set.iloc[:, 4].values
Output:
Out[5]:
array([[165349.2, 136897.8, 471784.1, 'New York'],
[162597.7, 151377.59, 443898.53, 'California'],
[153441.51, 101145.55, 407934.54, 'Florida'],
[144372.41, 118671.85, 383199.62, 'New York'],
[142107.34, 91391.77, 366168.42, 'Florida'],
[131876.9, 99814.71, 362861.36, 'New York'],
[134615.46, 147198.87, 127716.82, 'California'],
[130298.13, 145530.06, 323876.68, 'Florida'],
[120542.52, 148718.95, 311613.29, 'New York'],
[123334.88, 108679.17, 304981.62, 'California'],
[101913.08, 110594.11, 229160.95, 'Florida'],
[100671.96, 91790.61, 249744.55, 'California'],
[93863.75, 127320.38, 249839.44, 'Florida'],
[91992.39, 135495.07, 252664.93, 'California'],
[119943.24, 156547.42, 256512.92, 'Florida'],
[114523.61, 122616.84, 261776.23, 'New York'],
[78013.11, 121597.55, 264346.06, 'California'],
[94657.16, 145077.58, 282574.31, 'New York'],
[91749.16, 114175.79, 294919.57, 'Florida'],
[86419.7, 153514.11, 0.0, 'New York'],
[76253.86, 113867.3, 298664.47, 'California'],
[78389.47, 153773.43, 299737.29, 'New York'],
[73994.56, 122782.75, 303319.26, 'Florida'],
[67532.53, 105751.03, 304768.73, 'Florida'],
[77044.01, 99281.34, 140574.81, 'New York'],
[64664.71, 139553.16, 137962.62, 'California'],
[75328.87, 144135.98, 134050.07, 'Florida'],
[72107.6, 127864.55, 353183.81, 'New York'],
[66051.52, 182645.56, 118148.2, 'Florida'],
[65605.48, 153032.06, 107138.38, 'New York'],
[61994.48, 115641.28, 91131.24, 'Florida'],
[61136.38, 152701.92, 88218.23, 'New York'],
[63408.86, 129219.61, 46085.25, 'California'],
[55493.95, 103057.49, 214634.81, 'Florida'],
[46426.07, 157693.92, 210797.67, 'California'],
[46014.02, 85047.44, 205517.64, 'New York'],
[28663.76, 127056.21, 201126.82, 'Florida'],
[44069.95, 51283.14, 197029.42, 'California'],
[20229.59, 65947.93, 185265.1, 'New York'],
[38558.51, 82982.09, 174999.3, 'California'],
[28754.33, 118546.05, 172795.67, 'California'],
[27892.92, 84710.77, 164470.71, 'Florida'],
[23640.93, 96189.63, 148001.11, 'California'],
[15505.73, 127382.3, 35534.17, 'New York'],
[22177.74, 154806.14, 28334.72, 'California'],
[1000.23, 124153.04, 1903.93, 'New York'],
[1315.46, 115816.21, 297114.46, 'Florida'],
[0.0, 135426.92, 0.0, 'California'],
[542.05, 51743.15, 0.0, 'New York'],
[0.0, 116983.8, 45173.06, 'California']], dtype=object)
As we can see in the above output, the last column contains a categorical variable, which is not suitable to feed directly into the model, so we need to encode it.
Encoding Dummy Variables:
We have one categorical variable (State), which cannot be used directly by the model, so we will encode it. Simply label-encoding the categories into numbers is not sufficient, because the numbers would impose an artificial order, which may lead to a wrong model. To avoid this problem, we use one-hot encoding, which creates dummy variables. Below is the code for it (this sketch uses OneHotEncoder with ColumnTransformer; older scikit-learn versions instead combined LabelEncoder with OneHotEncoder's now-removed categorical_features argument):
# Encoding categorical data (the State column) into dummy variables
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer([('state', OneHotEncoder(), [3])], remainder='passthrough', sparse_threshold=0)
x = ct.fit_transform(x)
Here we are only encoding one independent variable, State, as the other variables are continuous.
Output:
As we can see in the above output, the State column has been converted into dummy variables (0 and 1). Each dummy variable column corresponds to one State, which we can check by comparing with the original dataset: the first column corresponds to California, the second to Florida, and the third to New York.
Note: We should not use all the dummy variables at the same time; we must keep one less than the total number of dummy variables, otherwise we create the dummy variable trap.
o Now, we are writing a single line of code just to avoid the dummy variable trap:

#avoiding the dummy variable trap:
x = x[:, 1:]
If we do not remove the first dummy variable, then it may introduce multicollinearity in
the model.
As we can see in the above output image, the first column has been removed.
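
As an aside (not part of the original steps), the same dummy-variable encoding and trap avoidance can be done in one step with pandas; this sketch assumes the CSV uses the column names State and Profit:

# Sketch: one-hot encode State with pandas, dropping the first dummy column
# automatically to avoid the dummy variable trap.
import pandas as pd

data = pd.read_csv('50_CompList.csv')
encoded = pd.get_dummies(data, columns=['State'], drop_first=True)
x_alt = encoded.drop('Profit', axis=1).values   # independent variables
y_alt = encoded['Profit'].values                # dependent variable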
o Now we will split the dataset into training and test set. The code for this is given below:

# Splitting the dataset into training and test set.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test= train_test_split(x, y, test_size= 0.2, random_state= 0)
The above code will split our dataset into a training set and a test set.
Output: You can check the output by clicking on the variable explorer option in the Spyder IDE. The test set and training set will look like the below image:
Test set:
Training set:

Note: In MLR, we will not do feature scaling, as it is taken care of by the library, so we don't need to do it manually.
Step: 2- Fitting our MLR model to the Training set:
Now our dataset is well prepared for training, so we will fit our regression model to the training set, similar to what we did for the Simple Linear Regression model. The code for this is:
#Fitting the MLR model to the training set:
from sklearn.linear_model import LinearRegression
regressor= LinearRegression()
regressor.fit(x_train, y_train)
Output:
Out[9]: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
normalize=False)
Now, we have successfully trained our model using the training dataset. In the next
step, we will test the performance of the model using the test dataset.

Step: 3- Prediction of Test set results:


The last step is to check the performance of the model. We will do this by predicting the test set results. For prediction, we will create a y_pred vector. Below is the code for it:
#Predicting the Test set result
y_pred= regressor.predict(x_test)
By executing the above lines of code, a new vector will be generated under the variable
explorer option. We can test our model by comparing the predicted values and test set
values.
Output:

In the above output, we have the predicted results and the test set values. We can check model performance by comparing the two, index by index. For example, the first index has a predicted profit of $103,015 and a real (test) profit of $103,282. The difference is only $267, which is a good prediction, so our model is complete.
o We can also check the score for training dataset and test dataset. Below is the code for
it:
print('Train Score: ', regressor.score(x_train, y_train))
print('Test Score: ', regressor.score(x_test, y_test))
Output: The score is:
Train Score: 0.9501847627493607
Test Score: 0.9347068473282446
The above scores show that our model explains about 95% of the variance on the training dataset and about 93% on the test dataset (the score here is the R-squared value).
Note: In the next topic, we will see how we can improve the performance of the
model using the Backward Elimination process.
Applications of Multiple Linear Regression:
There are mainly two applications of Multiple Linear Regression:
o Estimating the effectiveness of the independent variables on the prediction.
o Predicting the impact of changes.

Support Vector Machine (SVM) Algorithm



Support Vector Machine (SVM) is a powerful machine learning algorithm
used for linear or nonlinear classification, regression, and even outlier
detection tasks. SVMs can be used for a variety of tasks, such as text
classification, image classification, spam detection, handwriting
identification, gene expression analysis, face detection, and anomaly
detection. SVMs are adaptable and efficient in a variety of applications
because they can manage high-dimensional data and nonlinear
relationships.
SVM algorithms are very effective because they try to find the maximum-margin separating hyperplane between the different classes available in the target feature.

Support Vector Machine


Support Vector Machine (SVM) is a supervised machine learning algorithm used for both classification and regression. Although it can handle regression problems as well, it is best suited for classification. The main objective of the SVM algorithm is to find the optimal hyperplane in an N-dimensional feature space that separates the data points of different classes. The hyperplane is chosen so that the margin between the closest points of the different classes is as large as possible. The dimension of the hyperplane depends on the number of features: if the number of input features is two, the hyperplane is just a line; if the number of input features is three, the hyperplane becomes a 2-D plane. It becomes difficult to visualize when the number of features exceeds three.
Let’s consider two independent variables x1, x2, and one dependent variable
which is either a blue circle or a red circle.

Linearly Separable Data points

From the figure above it’s very clear that there are multiple lines (our
hyperplane here is a line because we are considering only two input
features x1, x2) that segregate our data points or do a classification between
red and blue circles. So how do we choose the best line or in general the
best hyperplane that segregates our data points?

How does SVM work?

One reasonable choice as the best hyperplane is the one that represents
the largest separation or margin between the two classes.
Multiple hyperplanes separate the data from two classes

So we choose the hyperplane whose distance to the nearest data point on each side is maximized. If such a hyperplane exists, it is known as the maximum-margin hyperplane (hard margin). So from the above figure, we choose L2. Now let's consider a scenario like the one shown below.

Selecting hyperplane for data with outlier

Here we have one blue ball in the region of the red balls. So how does SVM classify the data? It's simple: the blue ball lying among the red ones is an outlier of the blue class. The SVM algorithm has the characteristic of ignoring the outlier and finding the best hyperplane that maximizes the margin, so SVM is robust to outliers.

Hyperplane which is the most optimized one

For this type of data, SVM finds the maximum margin as before, but in addition it adds a penalty each time a point crosses the margin. The margins in such cases are called soft margins. With a soft margin, the SVM tries to minimize (1/margin + λ(∑penalty)). Hinge loss is a commonly used penalty: if there are no violations there is no hinge loss, and if there are violations the hinge loss is proportional to the distance of the violation.
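
A minimal sketch (not from the original text) of the hinge loss for a single point, assuming labels t in {-1, +1} and a linear score w·x + b:

# Sketch: hinge loss for one sample.
def hinge_loss(t, score):
    # zero when the point is on the correct side with margin >= 1,
    # otherwise it grows linearly with the size of the violation
    return max(0.0, 1.0 - t * score)

print(hinge_loss(+1, 2.5))   # 0.0 -> no violation
print(hinge_loss(+1, 0.3))   # 0.7 -> inside the margin
print(hinge_loss(-1, 0.3))   # 1.3 -> misclassified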
So far we have been talking about linearly separable data (the groups of blue and red balls are separable by a straight line). What do we do if the data are not linearly separable?

Original 1D dataset for classification


Say our data looks like the figure above. SVM solves this by creating a new variable using a kernel. For a point xi on the line, we create a new variable yi as a function of its distance from the origin o. If we plot this, we get something like what is shown below.

Mapping 1D data to 2D to become able to separate the two classes

In this case, the new variable y is created as a function of distance from the
origin. A non-linear function that creates a new variable is referred to as a
kernel.
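
A minimal sketch (not from the original text) of this 1D-to-2D idea: create a new feature as a function of distance from the origin (here simply x squared) so that the two classes become separable by a horizontal threshold in the new space.

# Sketch: mapping 1D points to 2D with a squared-distance feature.
import numpy as np

x = np.array([-4.0, -3.0, -0.5, 0.0, 0.5, 3.0, 4.0])   # original 1D points
labels = np.array([1, 1, 0, 0, 0, 1, 1])                # outer points vs inner points

y_new = x ** 2                           # new variable: squared distance from the origin
points_2d = np.column_stack([x, y_new])  # the data seen in the new 2D space
# the classes are now separable by a threshold on y_new:
print(y_new[labels == 1].min(), y_new[labels == 0].max())   # 9.0 vs 0.25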

Support Vector Machine Terminology

1. Hyperplane: Hyperplane is the decision boundary that is used to


separate the data points of different classes in a feature space. In
the case of linear classifications, it will be a linear equation i.e.
wx+b = 0.
2. Support Vectors: Support vectors are the data points closest to the hyperplane, which play a critical role in deciding the hyperplane and margin.
3. Margin: The margin is the distance between the support vectors and the hyperplane. The main objective of the support vector machine algorithm is to maximize the margin. A wider margin indicates better classification performance.
4. Kernel: A kernel is a mathematical function used in SVM to map the original input data points into a high-dimensional feature space, so that the hyperplane can be found easily even if the data points are not linearly separable in the original input space. Some common kernel functions are linear, polynomial, radial basis function (RBF), and sigmoid.
5. Hard Margin: The maximum-margin hyperplane or the hard
margin hyperplane is a hyperplane that properly separates the
data points of different categories without any misclassifications.
6. Soft Margin: When the data is not perfectly separable or contains
outliers, SVM permits a soft margin technique. Each data point
has a slack variable introduced by the soft-margin SVM
formulation, which softens the strict margin requirement and
permits certain misclassifications or violations. It discovers a
compromise between increasing the margin and reducing
violations.
7. C: The regularisation parameter C in SVM balances margin maximisation against misclassification penalties. It decides the penalty for crossing the margin or misclassifying data items. A greater value of C imposes a stricter penalty, which results in a smaller margin and perhaps fewer misclassifications.
8. Hinge Loss: A typical loss function in SVMs is hinge loss. It
punishes incorrect classifications or margin violations. The
objective function in SVM is frequently formed by combining it with
the regularisation term.
9. Dual Problem: SVM can also be solved through the dual of the optimisation problem, which requires finding the Lagrange multipliers associated with the support vectors. The dual formulation enables the use of the kernel trick and more efficient computation.

Mathematical intuition of Support Vector Machine

Consider a binary classification problem with two classes, labeled as +1


and -1. We have a training dataset consisting of input feature vectors X and
their corresponding class labels Y.
The equation of the linear hyperplane can be written as:

w^T x + b = 0

The vector w represents the normal vector to the hyperplane, i.e., the direction perpendicular to the hyperplane. The parameter b represents the offset or distance of the hyperplane from the origin along the normal vector w.
The distance between a data point x_i and the decision boundary can be calculated as:

d_i = (w^T x_i + b) / ||w||

where ||w|| represents the Euclidean norm of the normal vector w.
For a Linear SVM classifier, the prediction is:

y_hat = 1 if w^T x + b >= 0, and y_hat = 0 if w^T x + b < 0

Optimization:
● For a hard margin linear SVM classifier, we solve:

minimize over w, b:   (1/2)||w||^2   subject to   ti (w^T xi + b) >= 1   for all i

The target variable or label for the ith training instance is denoted by ti, where ti = -1 for negative instances (when yi = 0) and ti = +1 for positive instances (when yi = 1), because we require the decision boundary to satisfy the constraint ti (w^T xi + b) >= 1.
● For a soft margin linear SVM classifier, slack variables ζi are introduced and we solve:

minimize over w, b, ζ:   (1/2)||w||^2 + C ∑ ζi   subject to   ti (w^T xi + b) >= 1 - ζi  and  ζi >= 0

● Dual Problem: SVM can also be solved through the dual of the optimisation problem, which requires finding the Lagrange multipliers associated with the support vectors. The optimal Lagrange multipliers αi maximize the following dual objective function:

maximize over α:   ∑ αi - (1/2) ∑ ∑ αi αj ti tj K(xi, xj)   subject to   0 <= αi <= C  and  ∑ αi ti = 0

where,
● αi is the Lagrange multiplier associated with the ith training
sample.
● K(xi, xj) is the kernel function that computes the similarity between
two samples xi and xj. It allows SVM to handle nonlinear
classification problems by implicitly mapping the samples into a
higher-dimensional feature space.
● The term ∑αi represents the sum of all Lagrange multipliers.
Once the dual problem has been solved and the optimal Lagrange multipliers have been found, the SVM decision boundary can be described in terms of these multipliers and the support vectors. The training samples with αi > 0 are the support vectors, and the decision function is given by:

f(x) = ∑ (over support vectors) αi ti K(xi, x) + b,   with the predicted class given by the sign of f(x).

Types of Support Vector Machine

Based on the nature of the decision boundary, Support Vector Machines (SVM) can be divided into two main types:
● Linear SVM: Linear SVMs use a linear decision boundary to
separate the data points of different classes. When the data can
be precisely linearly separated, linear SVMs are very suitable.
This means that a single straight line (in 2D) or a hyperplane (in
higher dimensions) can entirely divide the data points into their
respective classes. A hyperplane that maximizes the margin
between the classes is the decision boundary.
● Non-Linear SVM: Non-Linear SVM can be used to classify data
when it cannot be separated into two classes by a straight line (in
the case of 2D). By using kernel functions, nonlinear SVMs can
handle nonlinearly separable data. The original input data is
transformed by these kernel functions into a higher-dimensional
feature space, where the data points can be linearly separated. A
linear SVM is used to locate a nonlinear decision boundary in this
modified space.

Popular kernel functions in SVM

The SVM kernel is a function that takes a low-dimensional input space and transforms it into a higher-dimensional space, i.e., it converts non-separable problems into separable problems. It is mostly useful for non-linear separation problems. Simply put, the kernel performs a (possibly very complex) data transformation and then finds a way to separate the data based on the labels or outputs defined.
Advantages of SVM
● Effective in high-dimensional cases.
● It is memory efficient, as it uses only a subset of the training points (the support vectors) in the decision function.
● Different kernel functions can be specified for the decision function, and it is possible to specify custom kernels.
SVM implementation in Python
Predict whether a cancer is benign or malignant. Using historical data about patients diagnosed with cancer, doctors can differentiate malignant cases from benign ones given the independent attributes.
Steps
● Load the breast cancer dataset from sklearn.datasets.
● Separate the input features and target variable.
● Build and train the SVM classifier using the RBF kernel.
● Plot the scatter plot of the input features.
● Plot the decision boundary.
# Load the important packages
from sklearn.datasets import load_breast_cancer
import matplotlib.pyplot as plt
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.svm import SVC

# Load the datasets


cancer = load_breast_cancer()
X = cancer.data[:, :2]
y = cancer.target

#Build the model


svm = SVC(kernel="rbf", gamma=0.5, C=1.0)
# Train the model
svm.fit(X, y)

# Plot Decision Boundary


DecisionBoundaryDisplay.from_estimator(
svm,
X,
response_method="predict",
cmap=plt.cm.Spectral,
alpha=0.8,
xlabel=cancer.feature_names[0],
ylabel=cancer.feature_names[1],
)

# Scatter plot
plt.scatter(X[:, 0], X[:, 1],
c=y,
s=20, edgecolors="k")
plt.show()

Output:

Breast Cancer Classifications with SVM RBF kernel
