AI 4 Unit Notes

Artificial intelligence notes with solutions

Supervised Machine Learning

Supervised learning is the type of machine learning in which machines are trained
using well "labelled" training data, and on the basis of that data, machines predict the
output. Labelled data means that some input data is already tagged with the correct
output.

In supervised learning, the training data provided to the machines works as the
supervisor that teaches the machines to predict the output correctly. It applies the
same concept as a student learning under the supervision of a teacher.

Supervised learning is a process of providing input data as well as correct output data
to the machine learning model. The aim of a supervised learning algorithm is to find
a mapping function that maps the input variable (x) to the output variable (y).

In the real world, supervised learning can be used for risk assessment, image
classification, fraud detection, spam filtering, etc.

How does Supervised Learning Work?


In supervised learning, models are trained using a labelled dataset, where the model
learns about each type of data. Once the training process is completed, the model is
tested on test data (a portion of the data held out from training), and then it predicts the
output.

The working of Supervised learning can be easily understood by the below example
and diagram:
Suppose we have a dataset of different types of shapes, which includes squares,
rectangles, triangles, and polygons. The first step is to train the model
on each shape:

o If the given shape has four sides, and all the sides are equal, then it will be labelled as
a Square.
o If the given shape has three sides, then it will be labelled as a Triangle.
o If the given shape has six equal sides, then it will be labelled as a Hexagon.

Now, after training, we test our model using the test set, and the task of the model is
to identify the shape.

The machine is already trained on all types of shapes, and when it encounters a new shape,
it classifies the shape on the basis of its number of sides and predicts the output.

Steps Involved in Supervised Learning:


o First, determine the type of training dataset.
o Collect/gather the labelled training data.
o Split the dataset into a training set, a test set, and a validation set.
o Determine the input features of the training dataset, which should carry enough
information for the model to accurately predict the output.
o Determine a suitable algorithm for the model, such as a support vector machine,
decision tree, etc.
o Execute the algorithm on the training dataset. Sometimes we also need a validation set
to tune the control parameters; it is a subset of the training data.
o Evaluate the accuracy of the model by providing the test set. If the model predicts the
correct output, our model is accurate. A minimal end-to-end sketch of these steps is given below.
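
These steps can be illustrated with a minimal, hypothetical sketch using scikit-learn; the built-in iris dataset and the choice of a decision tree are assumptions made purely for illustration:

# Minimal sketch of the supervised learning workflow (illustrative only).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Gather a labelled dataset (the built-in iris data stands in for it here).
X, y = load_iris(return_X_y=True)

# Split the data into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Choose a suitable algorithm and execute it on the training data.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# Evaluate the accuracy of the model by providing the test set.
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))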

Types of supervised Machine learning Algorithms:


Supervised learning can be further divided into two types of problems:
1. Regression

Regression algorithms are used when there is a relationship between the input variable
and the output variable and the output is a continuous quantity, such as in weather
forecasting, market-trend prediction, etc. Below are some popular regression
algorithms which come under supervised learning:

o Linear Regression
o Regression Trees
o Non-Linear Regression
o Bayesian Linear Regression
o Polynomial Regression

2. Classification

Classification algorithms are used when the output variable is categorical, which means
there are two or more classes, such as Yes-No, Male-Female, True-False, etc.
Spam filtering is a common example. Below are some popular classification algorithms
which come under supervised learning:

o Random Forest
o Decision Trees
o Logistic Regression
o Support vector Machines

Advantages of Supervised learning:


o With the help of supervised learning, the model can predict the output on the basis of
prior experience.
o In supervised learning, we can have an exact idea about the classes of objects.
o Supervised learning models help us to solve various real-world problems such as fraud
detection, spam filtering, etc.

Disadvantages of supervised learning:


o Supervised learning models are not suitable for handling complex tasks.
o Supervised learning cannot predict the correct output if the test data is different from
the training dataset.
o Training requires a lot of computation time.
o In supervised learning, we need enough knowledge about the classes of objects.

Regression Analysis in Machine Learning


o Regression analysis is a statistical method to model the relationship between a
dependent (target) variable and one or more independent (predictor) variables.
More specifically, regression analysis helps us to understand how the value of the
dependent variable changes with respect to one independent variable when the
other independent variables are held fixed. It predicts continuous/real values
such as temperature, age, salary, price, etc.
o We can understand the concept of regression analysis using the below example:
o Example: Suppose there is a marketing company A, which runs various
advertisements every year and gets sales from them. The company keeps a record of the
advertisement spend and the corresponding sales for the last 5 years.
o Now, the company wants to spend $200 on advertisements in the year 2019 and
wants to know the prediction of its sales for this year. To solve such
prediction problems in machine learning, we need regression analysis.
o In regression, we plot a graph between the variables that best fits the given
datapoints; using this plot, the machine learning model can make predictions
about the data. In simple words, "Regression shows a line or curve that
passes through all the datapoints on the target-predictor graph in such a way
that the vertical distance between the datapoints and the regression line is
minimum." The distance between the datapoints and the line tells whether a model
has captured a strong relationship or not.

Why do we use Regression Analysis?


As mentioned above, regression analysis helps in the prediction of a continuous variable.
There are various scenarios in the real world where we need future predictions, such as
weather conditions, sales, marketing trends, etc., and for such cases we need a
technique that can make predictions accurately. Regression analysis is such a statistical
method, and it is widely used in machine learning and data
science. Below are some other reasons for using regression analysis:

o Regression estimates the relationship between the target and the independent
variable.
o It is used to find the trends in data.
o It helps to predict real/continuous values.
o By performing the regression, we can confidently determine the most important
factor, the least important factor, and how each factor is affecting the other
factors.

Types of Regression
There are various types of regression which are used in data science and machine learning.
Each type has its own importance in different scenarios, but at the core, all the regression
methods analyze the effect of the independent variables on the dependent variable. Here we
discuss some important types of regression, which are given below:

o Linear Regression
o Logistic Regression
o Polynomial Regression
o Support Vector Regression
o Decision Tree Regression
o Random Forest Regression
o Ridge Regression
o Lasso Regression:

Linear Regression:
o Linear regression is a statistical regression method which is used for predictive analysis.
o It is one of the simplest algorithms; it works on regression and shows
the relationship between continuous variables.
o It is used for solving regression problems in machine learning.
o Linear regression shows the linear relationship between the independent variable (X-
axis) and the dependent variable (Y-axis), hence the name linear regression.
o If there is only one input variable (x), then such linear regression is called simple linear
regression. And if there is more than one input variable, then such linear regression is
called multiple linear regression.
o The relationship between variables in the linear regression model can be visualised as a
straight line on a scatter plot; for example, predicting the salary of an employee on the
basis of years of experience.
o Below is the mathematical equation for Linear regression:

Y = aX + b

Here, Y = dependent variable (target variable),
X = independent variable (predictor variable),
a and b are the linear coefficients.
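
As a rough illustration, the same Y = aX + b relationship can be fitted with scikit-learn; the years-of-experience and salary numbers below are made up and not taken from the notes:

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: years of experience (X) and salary (Y).
X = np.array([[1], [2], [3], [4], [5]])
Y = np.array([30000, 35000, 41000, 46000, 52000])

model = LinearRegression().fit(X, Y)
print("a (slope):", model.coef_[0])
print("b (intercept):", model.intercept_)
print("Predicted salary for 6 years of experience:", model.predict([[6]])[0])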

Some popular applications of linear regression are:

o Analyzing trends and sales estimates


o Salary forecasting
o Real estate prediction
o Arriving at ETAs in traffic.

Logistic Regression:
o Logistic regression is another supervised learning algorithm which is used to solve
classification problems. In classification problems, we have dependent variables in a
binary or discrete format, such as 0 or 1.
o The logistic regression algorithm works with categorical variables such as 0 or 1, Yes or
No, True or False, Spam or Not Spam, etc.
o It is a predictive analysis algorithm which works on the concept of probability.
o Logistic regression is a type of regression, but it differs from the linear regression
algorithm in terms of how it is used.
o Logistic regression uses the sigmoid (logistic) function to map predictions to
probabilities and model the data. The function can be represented as:

f(x) = 1 / (1 + e^(-x))

o f(x) = output, a value between 0 and 1
o x = input to the function
o e = base of the natural logarithm

When we provide the input values (data) to the function, it gives the S-curve as follows:


o It uses the concept of threshold values: values above the threshold are rounded up
to 1, and values below the threshold are rounded down to 0, as sketched below.
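
A minimal sketch of the sigmoid function and the threshold idea; the 0.5 threshold used here is an assumed, commonly used default rather than something fixed by the notes:

import numpy as np

def sigmoid(x):
    # Logistic (sigmoid) function: squashes any real input into the range (0, 1).
    return 1 / (1 + np.exp(-x))

inputs = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
probs = sigmoid(inputs)               # points on the S-curve, between 0 and 1
labels = (probs >= 0.5).astype(int)   # values above the threshold become 1, below become 0
print(probs)
print(labels)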

There are three types of logistic regression:

o Binary (0/1, pass/fail)
o Multinomial (cats, dogs, lions)
o Ordinal (low, medium, high)

Polynomial Regression:
o Polynomial Regression is a type of regression which models a non-linear
dataset using a linear model.
o It is similar to multiple linear regression, but it fits a non-linear curve between the values
of x and the corresponding conditional values of y.
o Suppose there is a dataset whose datapoints are arranged in a non-linear
fashion; in such a case, linear regression will not fit those datapoints well.
To cover such datapoints, we need Polynomial regression.
o In Polynomial regression, the original features are transformed into polynomial
features of a given degree and then modelled using a linear model, which means the
datapoints are best fitted using a polynomial curve.

o The equation for polynomial regression is also derived from the linear regression
equation: the linear equation Y = b0 + b1x is extended to the polynomial
equation Y = b0 + b1x + b2x² + b3x³ + ... + bnxⁿ.
o Here Y is the predicted/target output, b0, b1, ..., bn are the regression coefficients, and x
is the independent/input variable.
o The model is still linear because the coefficients enter the equation linearly; only the
input feature is raised to higher powers. A small sketch is given below.
Note: This is different from multiple linear regression in that, in
Polynomial regression, a single feature appears with different degrees instead of multiple
variables with the same degree.
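
A minimal sketch of polynomial regression with scikit-learn; the small non-linear dataset and the choice of degree 2 are assumptions for illustration:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Hypothetical non-linear data: y roughly follows a quadratic curve in x.
x = np.array([[1], [2], [3], [4], [5]])
y = np.array([1.2, 4.1, 9.3, 15.8, 25.1])

# Transform the original feature into polynomial features of degree 2 ...
poly = PolynomialFeatures(degree=2)
x_poly = poly.fit_transform(x)

# ... and fit an ordinary linear model on the transformed features.
model = LinearRegression().fit(x_poly, y)
print(model.predict(poly.transform([[6]])))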
Support Vector Regression:
Support Vector Machine is a supervised learning algorithm which can be used for
regression as well as classification problems. So if we use it for regression problems,
then it is termed as Support Vector Regression.

Support Vector Regression is a regression algorithm which works for continuous
variables. Below are some keywords which are used in Support Vector Regression:

o Kernel: A function used to map lower-dimensional data into higher-dimensional
data.
o Hyperplane: In a general SVM, it is a separation line between two classes, but in SVR, it
is a line which helps to predict the continuous variable and covers most of the
datapoints.
o Boundary lines: The two lines drawn on either side of the hyperplane, which create
a margin for the datapoints.
o Support vectors: The datapoints which are nearest to the hyperplane and of the
opposite class.

In SVR, we always try to determine a hyperplane with a maximum margin, so that the
maximum number of datapoints is covered within that margin. The main goal of SVR is
to consider the maximum number of datapoints within the boundary lines, and the
hyperplane (best-fit line) must contain a maximum number of datapoints.
Consider the below image: here, the blue line is called the hyperplane, and the other two
lines are known as the boundary lines.
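
A minimal sketch of Support Vector Regression with scikit-learn; the RBF kernel, the epsilon value, and the tiny dataset are assumptions for illustration:

import numpy as np
from sklearn.svm import SVR

# Hypothetical continuous-valued data.
X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([1.5, 2.1, 2.9, 4.2, 5.1, 5.8])

# epsilon controls the width of the margin (boundary lines) around the best-fit line.
model = SVR(kernel='rbf', epsilon=0.2)
model.fit(X, y)
print(model.predict([[7]]))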

Decision Tree Regression:


o Decision Tree is a supervised learning algorithm which can be used for solving both
classification and regression problems.
o It can solve problems for both categorical and numerical data.
o Decision Tree regression builds a tree-like structure in which each internal node
represents a "test" on an attribute, each branch represents the result of the test, and
each leaf node represents the final decision or result.
o A decision tree is constructed starting from the root node/parent node (the whole dataset),
which splits into left and right child nodes (subsets of the dataset). These child nodes are
further divided into their own child nodes, thereby becoming the parent nodes of those
nodes. Consider the below image:

The above image shows an example of Decision Tree regression; here, the model is
trying to predict the choice of a person between a sports car and a luxury car.
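
A minimal sketch of decision tree regression with scikit-learn; the data and the tree depth are made up for illustration:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data: one feature and a continuous target.
X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([2.0, 2.5, 6.0, 6.5, 11.0, 11.5])

# The tree repeatedly splits the dataset into child nodes and predicts
# the mean target value of the leaf node that a sample falls into.
model = DecisionTreeRegressor(max_depth=2, random_state=0)
model.fit(X, y)
print(model.predict([[3.5]]))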

Random Forest Regression:
o Random forest is one of the most powerful supervised learning algorithms, capable
of performing regression as well as classification tasks.
o Random Forest regression is an ensemble learning method which combines
multiple decision trees and predicts the final output based on the average of each tree's
output. The combined decision trees are called base models, and it can be
represented more formally as:

g(x) = f0(x) + f1(x) + f2(x) + ...

o Random forest uses the Bagging or Bootstrap Aggregation technique of ensemble
learning, in which the aggregated decision trees run in parallel and do not interact with
each other. A small sketch is given below.
o With the help of Random Forest regression, we can prevent overfitting in the model
by creating random subsets of the dataset.
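
A minimal sketch of random forest regression with scikit-learn on the same kind of made-up data; the number of trees is an arbitrary illustrative choice:

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical data: one feature and a continuous target.
X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([2.0, 2.5, 6.0, 6.5, 11.0, 11.5])

# n_estimators decision trees are trained on bootstrap samples (bagging),
# and their outputs are averaged to give the final prediction g(x).
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)
print(model.predict([[3.5]]))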

Ridge Regression:
o Ridge regression is one of the most robust versions of linear regression, in which a small
amount of bias is introduced so that we can get better long-term predictions.
o The amount of bias added to the model is known as the Ridge Regression penalty. We
can compute this penalty term by multiplying lambda by the squared weight
of each individual feature.
o The equation (cost function) for ridge regression will be:

Cost function = Σ(yi − ŷi)² + λ Σ(bj)²

where λ is the penalty strength and the bj are the model coefficients.
o A general linear or polynomial regression will fail if there is high collinearity between
the independent variables, so to solve such problems, Ridge regression can be used.
o Ridge regression is a regularization technique, which is used to reduce the complexity
of the model. It is also called L2 regularization.
o It helps to solve problems where we have more parameters than samples.

Lasso Regression:
o Lasso regression is another regularization technique used to reduce the complexity of the
model.
o It is similar to Ridge Regression except that the penalty term contains the absolute
values of the weights instead of their squares.
o Since it takes absolute values, it can shrink a coefficient exactly to 0, whereas Ridge
Regression can only shrink it close to 0.
o It is also called L1 regularization. The equation (cost function) for Lasso regression will be:

Cost function = Σ(yi − ŷi)² + λ Σ|bj|

A short sketch comparing Ridge and Lasso is given below.
Classification Algorithm in Machine Learning


As we know, Supervised Machine Learning algorithms can be broadly classified into
Regression and Classification algorithms. Regression algorithms predict
continuous values, but to predict categorical values, we need Classification
algorithms.

What is the Classification Algorithm?


The Classification algorithm is a Supervised Learning technique that is used to identify the
category of new observations on the basis of training data. In Classification, a program learns
from the given dataset or observations and then classifies new observations into a number of
classes or groups, such as Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc.
Classes can be called targets, labels, or categories.
Unlike regression, the output variable of Classification is a category, not a numeric value, such
as "Green or Blue", "fruit or animal", etc. Since the Classification algorithm is a Supervised
learning technique, it takes labelled input data, which means it contains inputs with the
corresponding outputs.

In a classification algorithm, a discrete output function (y) is mapped to the input variable (x).

The best example of an ML classification algorithm is Email Spam Detector.

The main goal of the Classification algorithm is to identify the category of a given
dataset, and these algorithms are mainly used to predict the output for the categorical
data.

Classification algorithms can be better understood using the below diagram. In the
diagram, there are two classes, Class A and Class B. The observations within each class have
features that are similar to each other and dissimilar to those of the other class.

K-Nearest Neighbor(KNN) Algorithm for


Machine Learning
o K-Nearest Neighbour is one of the simplest Machine Learning algorithms based on
Supervised Learning technique.
o The K-NN algorithm assumes similarity between the new case/data and the available cases
and puts the new case into the category that is most similar to the available categories.
o The K-NN algorithm stores all the available data and classifies a new data point based on
its similarity to the stored data. This means that when new data appears, it can easily be
classified into a well-suited category by using the K-NN algorithm.
o K-NN algorithm can be used for Regression as well as for Classification but mostly it is
used for the Classification problems.
o K-NN is a non-parametric algorithm, which means it does not make any assumption
on underlying data.
o It is also called a lazy learner algorithm because it does not learn from the training
set immediately instead it stores the dataset and at the time of classification, it
performs an action on the dataset.
o At the training phase, the KNN algorithm just stores the dataset, and when it gets new data,
it classifies that data into the category that is most similar to the new data.
o Example: Suppose we have an image of a creature that looks similar to both a cat and a dog,
and we want to know whether it is a cat or a dog. For this identification, we can use the
KNN algorithm, as it works on a similarity measure. Our KNN model will compare the
features of the new image with the cat and dog images and, based on the most similar
features, put it in either the cat or the dog category.

Why do we need a K-NN Algorithm?


Suppose there are two categories, i.e., Category A and Category B, and we have a new
data point x1; in which of these categories will this data point lie? To solve this type
of problem, we need a K-NN algorithm. With the help of K-NN, we can easily identify
the category or class of a particular data point. Consider the below diagram:
How does K-NN work?
The K-NN working can be explained on the basis of the below algorithm:

o Step-1: Select the number K of the neighbors


o Step-2: Calculate the Euclidean distance of K number of neighbors
o Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
o Step-4: Among these k neighbors, count the number of the data points in each
category.
o Step-5: Assign the new data points to that category for which the number of the
neighbor is maximum.
o Step-6: Our model is ready.

Suppose we have a new data point and we need to put it in the required category.
Consider the below image:
o Firstly, we will choose the number of neighbors; here we choose k = 5.
o Next, we will calculate the Euclidean distance between the data points. The Euclidean
distance is the distance between two points, which we have already studied in
geometry. For two points (x1, y1) and (x2, y2) it can be calculated as:

d = √((x2 − x1)² + (y2 − y1)²)

o By calculating the Euclidean distances we get the nearest neighbors: three nearest
neighbors in Category A and two nearest neighbors in Category B. Consider the below
image:
o There is no particular way to determine the best value for "K", so we need to try some
values to find the best one. The most commonly preferred value for K is 5.
o A very low value for K, such as K = 1 or K = 2, can be noisy and make the model sensitive
to outliers.
o Larger values for K are generally more robust, but very large values can blur the class
boundaries and increase the computation.

Advantages of KNN Algorithm:


o It is simple to implement.
o It is robust to the noisy training data
o It can be more effective if the training data is large.

Disadvantages of KNN Algorithm:


o We always need to determine the value of K, which can sometimes be complex.
o The computation cost is high because the distance between the new data point and all
the training samples must be calculated.

Python implementation of the KNN algorithm


To implement the K-NN algorithm in Python, we will use the same
problem and dataset which we used in Logistic Regression, but here we will
improve the performance of the model. Below is the problem description:

Problem for the K-NN Algorithm: A car manufacturer has
manufactured a new SUV. The company wants to show ads to the users who are
interested in buying that SUV. For this problem, we have a dataset that contains
multiple users' information gathered through a social network. The dataset contains lots of
information, but we will consider Estimated Salary and Age as the independent
variables and Purchased as the dependent variable. Below is the
dataset:
Steps to implement the K-NN algorithm:

o Data Pre-processing step


o Fitting the K-NN algorithm to the Training set
o Predicting the test result
o Test accuracy of the result(Creation of Confusion matrix)
o Visualizing the test set result.

Data Pre-Processing Step:

The Data Pre-processing step will remain exactly the same as Logistic Regression.
Below is the code for it:

# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

#importing datasets
data_set= pd.read_csv('user_data.csv')

#Extracting Independent and dependent Variable
x= data_set.iloc[:, [2,3]].values
y= data_set.iloc[:, 4].values

# Splitting the dataset into training and test set.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test= train_test_split(x, y, test_size= 0.25, random_state=0)

#feature Scaling
from sklearn.preprocessing import StandardScaler
st_x= StandardScaler()
x_train= st_x.fit_transform(x_train)
x_test= st_x.transform(x_test)
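
The listing above stops after feature scaling, while the steps list also mentions fitting the classifier, predicting the test result, and building the confusion matrix. A minimal continuation under the same variable names could look like the sketch below; the choice of n_neighbors=5 and the minkowski metric are assumptions, not something fixed by the notes:

#Fitting the K-NN classifier to the training set
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)
classifier.fit(x_train, y_train)

#Predicting the test set result
y_pred = classifier.predict(x_test)

#Creating the Confusion matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)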

Support Vector Machine Algorithm


Support Vector Machine, or SVM, is one of the most popular Supervised Learning
algorithms, used for Classification as well as Regression problems. However, it is
primarily used for Classification problems in Machine Learning.

The goal of the SVM algorithm is to create the best line or decision boundary that can
segregate n-dimensional space into classes so that we can easily put a new data
point in the correct category in the future. This best decision boundary is called a
hyperplane.

SVM chooses the extreme points/vectors that help in creating the hyperplane. These
extreme cases are called support vectors, and hence the algorithm is termed a Support
Vector Machine. Consider the below diagram, in which two different categories are
classified using a decision boundary or hyperplane:

Example: SVM can be understood with the example that we used in the KNN
classifier. Suppose we see a strange cat that also has some features of dogs; if we
want a model that can accurately identify whether it is a cat or a dog, such a model
can be created using the SVM algorithm. We first train our model with lots of
images of cats and dogs so that it can learn their different features, and then we
test it with this strange creature. Since the support vectors create a decision
boundary between these two classes (cat and dog) and pick out the extreme cases, the
model will look at the extreme cases of cat and dog and, on the basis of the support
vectors, classify it as a cat. Consider the below diagram:
Types of SVM
SVM can be of two types:

o Linear SVM: Linear SVM is used for linearly separable data, which means that if a dataset
can be classified into two classes by using a single straight line, then such data is
termed linearly separable data, and the classifier used is called a Linear SVM classifier.
o Non-linear SVM: Non-Linear SVM is used for non-linearly separable data, which
means that if a dataset cannot be classified by using a straight line, then such data is
termed non-linear data, and the classifier used is called a Non-linear SVM classifier.

Hyperplane and Support Vectors in the SVM algorithm:


Hyperplane: There can be multiple lines/decision boundaries to segregate the classes
in n-dimensional space, but we need to find out the best decision boundary that helps
to classify the data points. This best boundary is known as the hyperplane of SVM.

The dimensions of the hyperplane depend on the number of features present in the dataset,
which means that if there are 2 features (as shown in the image), then the hyperplane will be
a straight line, and if there are 3 features, then the hyperplane will be a 2-dimensional plane.

We always create the hyperplane that has the maximum margin, which means the
maximum distance between the hyperplane and the nearest data points of either class.

Support Vectors:

The data points or vectors that are closest to the hyperplane and which affect the
position of the hyperplane are termed support vectors. Since these vectors support
the hyperplane, they are called support vectors.

How does SVM works?


Linear SVM:

The working of the SVM algorithm can be understood by using an example. Suppose
we have a dataset that has two tags (green and blue), and the dataset has two features,
x1 and x2. We want a classifier that can classify the pair (x1, x2) of coordinates as either
green or blue. Consider the below image:

Since this is a 2-D space, we can easily separate these two classes just by using a straight
line. But there can be multiple lines that can separate these classes. Consider the
below image:

Hence, the SVM algorithm helps to find the best line or decision boundary; this best
boundary or region is called a hyperplane. The SVM algorithm finds the closest points
of the lines from both classes. These points are called support vectors. The distance
between the support vectors and the hyperplane is called the margin, and the goal of SVM is
to maximize this margin. The hyperplane with the maximum margin is called the optimal
hyperplane.

Non-Linear SVM:

If data is linearly arranged, then we can separate it by using a straight line, but for non-
linear data, we cannot draw a single straight line. Consider the below image:

To separate these data points, we need to add one more dimension. For linear data,
we have used the two dimensions x and y, so for non-linear data, we will add a third
dimension z. It can be calculated as:

z = x² + y²

By adding the third dimension, the sample space will look like the image below,
and SVM will divide the datasets into classes as shown in the next image.

Since we are in 3-D space, the decision boundary looks like a plane parallel to the x-axis. If we
convert it back to 2-D space with z = 1, it becomes a circle:

Hence, we get a circumference of radius 1 in the case of non-linear data.
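
A minimal numeric sketch of this idea: points that cannot be separated by a straight line in the (x, y) plane become separable once the extra dimension z = x² + y² is added. The sample points below are made up for illustration:

import numpy as np

# Hypothetical 2-D points: an inner cluster (class 0) and an outer ring (class 1).
points = np.array([[0.2, 0.1], [-0.3, 0.2], [0.1, -0.2],   # class 0 (near the origin)
                   [1.5, 0.1], [-1.4, 0.5], [0.2, 1.6]])   # class 1 (far from the origin)

# Add the third dimension z = x^2 + y^2.
z = points[:, 0]**2 + points[:, 1]**2
print(z)  # class 0 points get small z, class 1 points get large z,
          # so a plane such as z = 1 now separates the two classes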


Python Implementation of Support Vector Machine

Now we will implement the SVM algorithm using Python. Here we will use the same
dataset user_data, which we have used in Logistic regression and KNN classification.

o Data Pre-processing step

Till the Data pre-processing step, the code will remain the same. Below is the code:

#Data Pre-processing Step
# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

#importing datasets
data_set= pd.read_csv('user_data.csv')

#Extracting Independent and dependent Variable
x= data_set.iloc[:, [2,3]].values
y= data_set.iloc[:, 4].values

# Splitting the dataset into training and test set.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test= train_test_split(x, y, test_size= 0.25, random_state=0)

#feature Scaling
from sklearn.preprocessing import StandardScaler
st_x= StandardScaler()
x_train= st_x.fit_transform(x_train)
x_test= st_x.transform(x_test)

After executing the above code, we will pre-process the data. The code will give the
dataset as:
The scaled output for the test set will be:
Fitting the SVM classifier to the training set:

Now the training set will be fitted to the SVM classifier. To create the SVM classifier,
we will import the SVC class from the sklearn.svm library. Below is the code for it:

from sklearn.svm import SVC # "Support vector classifier"
classifier = SVC(kernel='linear', random_state=0)
classifier.fit(x_train, y_train)

In the above code, we have used kernel='linear', as here we are creating an SVM for
linearly separable data; however, we can change the kernel for non-linear data. We then
fitted the classifier to the training data (x_train, y_train).

Output:

Out[8]:
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
kernel='linear', max_iter=-1, probability=False, random_state=0,
shrinking=True, tol=0.001, verbose=False)

The model performance can be altered by changing the value of C (the regularization
factor), gamma, and the kernel.

o Predicting the test set result:


Now, we will predict the output for the test set. For this, we will create a new vector y_pred.
Below is the code for it:

#Predicting the test set result
y_pred= classifier.predict(x_test)

After getting the y_pred vector, we can compare the result of y_pred and y_test to
check the difference between the actual value and predicted value.

Output: Below is the output for the prediction of the test set:
o Creating the confusion matrix:
Now we will see the performance of the SVM classifier: how many incorrect
predictions it makes compared to the Logistic regression classifier. To create the
confusion matrix, we need to import the confusion_matrix function of the sklearn
library. After importing the function, we will store its result in a new variable cm. The
function takes two main parameters, y_true (the actual values) and y_pred (the predicted
values returned by the classifier). Below is the code for it:

#Creating the Confusion matrix
from sklearn.metrics import confusion_matrix
cm= confusion_matrix(y_test, y_pred)

Output:

Naïve Bayes Classifier Algorithm


o The Naïve Bayes algorithm is a supervised learning algorithm, which is based on Bayes'
theorem and used for solving classification problems.
o It is mainly used in text classification with high-dimensional training datasets.
o The Naïve Bayes Classifier is one of the simplest and most effective classification
algorithms; it helps in building fast machine learning models that can make quick
predictions.
o It is a probabilistic classifier, which means it predicts on the basis of the
probability of an object.
o Some popular examples of the Naïve Bayes algorithm are spam filtering, sentiment
analysis, and classifying articles.

Why is it called Naïve Bayes?


The Naïve Bayes algorithm comprises the two words Naïve and Bayes, which can be
described as:

o Naïve: It is called Naïve because it assumes that the occurrence of a certain feature is
independent of the occurrence of the other features. For example, if a fruit is identified on
the basis of colour, shape, and taste, then a red, spherical, and sweet fruit is recognized as an
apple. Hence each feature individually contributes to identifying it as an apple, without
depending on the others.
o Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.

Bayes' Theorem:
o Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is used to determine
the probability of a hypothesis with prior knowledge. It depends on the conditional
probability.
o The formula for Bayes' theorem is given as:

P(A|B) = P(B|A) * P(A) / P(B)

Where,

P(A|B) is the Posterior probability: the probability of hypothesis A given the observed event B.

P(B|A) is the Likelihood: the probability of the evidence given that hypothesis A is true.

P(A) is the Prior probability: the probability of the hypothesis before observing the evidence.

P(B) is the Marginal probability: the probability of the evidence.

Working of Naïve Bayes' Classifier:


Working of Naïve Bayes' Classifier can be understood with the help of the below
example:

Suppose we have a dataset of weather conditions and a corresponding target variable
"Play". Using this dataset, we need to decide whether we should play or not on
a particular day according to the weather conditions. To solve this problem, we need
to follow the steps below:

1. Convert the given dataset into frequency tables.
2. Generate a likelihood table by finding the probabilities of the given features.
3. Use Bayes' theorem to calculate the posterior probability.

Problem: If the weather is sunny, should the Player play or not?

Solution: To solve this, first consider the below dataset:

Outlook Play

0 Rainy Yes

1 Sunny Yes

2 Overcast Yes

3 Overcast Yes

4 Sunny No

5 Rainy Yes

6 Sunny Yes

7 Overcast Yes
8 Rainy No

9 Sunny No

10 Sunny Yes

11 Rainy No

12 Overcast Yes

13 Overcast Yes

Frequency table for the Weather Conditions:

Weather    Yes   No
Overcast    5     0
Rainy       2     2
Sunny       3     2
Total      10     4

Likelihood table for the Weather Conditions:

Weather    No            Yes           P(Weather)
Overcast   0             5             5/14 ≈ 0.36
Rainy      2             2             4/14 ≈ 0.29
Sunny      2             3             5/14 ≈ 0.36
All        4/14 ≈ 0.29   10/14 ≈ 0.71


Applying Bayes' theorem:

P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)

P(Sunny|Yes) = 3/10 = 0.30

P(Sunny) = 5/14 ≈ 0.36

P(Yes) = 10/14 ≈ 0.71

So P(Yes|Sunny) = (3/10 × 10/14) / (5/14) = 0.60

P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)

P(Sunny|No) = 2/4 = 0.50

P(No) = 4/14 ≈ 0.29

P(Sunny) = 5/14 ≈ 0.36

So P(No|Sunny) = (2/4 × 4/14) / (5/14) = 0.40

As we can see from the above calculation, P(Yes|Sunny) > P(No|Sunny).

Hence, on a Sunny day, the Player can play the game.
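
The same calculation can be written as a short, self-contained Python sketch; all the numbers come directly from the tables above:

# Counts taken from the weather dataset above.
p_yes, p_no = 10/14, 4/14      # prior probabilities P(Yes), P(No)
p_sunny_yes = 3/10             # P(Sunny|Yes)
p_sunny_no = 2/4               # P(Sunny|No)
p_sunny = 5/14                 # P(Sunny)

# Bayes' theorem for each class.
p_yes_sunny = p_sunny_yes * p_yes / p_sunny
p_no_sunny = p_sunny_no * p_no / p_sunny
print(round(p_yes_sunny, 2), round(p_no_sunny, 2))  # 0.6 0.4 -> the Player should play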

Advantages of Naïve Bayes Classifier:


o Naïve Bayes is one of the fastest and easiest ML algorithms for predicting the class of a dataset.
o It can be used for binary as well as multi-class classification.
o It performs well in multi-class predictions compared to many other algorithms.
o It is the most popular choice for text classification problems.

Disadvantages of Naïve Bayes Classifier:


o Naive Bayes assumes that all features are independent or unrelated, so it cannot learn
the relationship between features.

Applications of Naïve Bayes Classifier:


o It is used for Credit Scoring.
o It is used in medical data classification.
o It can be used in real-time predictions because Naïve Bayes Classifier is an eager
learner.
o It is used in Text classification such as Spam filtering and Sentiment analysis.

Types of Naïve Bayes Model:


There are three types of Naive Bayes Model, which are given below:

o Gaussian: The Gaussian model assumes that features follow a normal distribution. This
means if predictors take continuous values instead of discrete, then the model assumes
that these values are sampled from the Gaussian distribution.
o Multinomial: The Multinomial Naïve Bayes classifier is used when the data is
multinomially distributed. It is primarily used for document classification problems,
i.e., deciding which category a particular document belongs to, such as Sports, Politics,
Education, etc.
The classifier uses the frequency of words as the predictors.
o Bernoulli: The Bernoulli classifier works similarly to the Multinomial classifier, but the
predictor variables are independent Boolean variables, such as whether a particular word
is present or not in a document. This model is also well known for document classification
tasks.

Fuzzy Logic | Introduction



The term fuzzy refers to things that are not clear or are vague. In the real
world we often encounter situations in which we cannot determine whether
a state is true or false; in such cases, fuzzy logic provides very valuable flexibility for
reasoning. In this way, we can account for the inaccuracies and uncertainties of
any situation.
Fuzzy Logic is a form of many-valued logic in which the truth values of
variables may be any real number between 0 and 1, instead of just the
traditional values of true or false. It is used to deal with imprecise or uncertain
information and is a mathematical method for representing vagueness and
uncertainty in decision-making.
Fuzzy Logic is based on the idea that in many cases, the concept of true or
false is too restrictive, and that there are many shades of gray in between. It
allows for partial truths, where a statement can be partially true or false,
rather than fully true or false.
Fuzzy Logic is used in a wide range of applications, such as control systems,
image processing, natural language processing, medical diagnosis, and
artificial intelligence.
The fundamental concept of Fuzzy Logic is the membership function, which
defines the degree of membership of an input value to a certain set or
category. The membership function is a mapping from an input value to a
membership degree between 0 and 1, where 0 represents non-membership
and 1 represents full membership.
Fuzzy Logic is implemented using Fuzzy Rules, which are if-then statements
that express the relationship between input variables and output variables in
a fuzzy way. The output of a Fuzzy Logic system is a fuzzy set, which is a set of
membership degrees for each possible output value.
In summary, Fuzzy Logic is a mathematical method for representing
vagueness and uncertainty in decision-making, it allows for partial truths, and
it is used in a wide range of applications. It is based on the concept of
membership function and the implementation is done using Fuzzy rules.
In the Boolean system, the truth value 1.0 represents absolute truth and
0.0 represents absolute falsehood. In the fuzzy system, besides absolute truth and
absolute falsehood, there are also intermediate values, which are partially true and
partially false.

ARCHITECTURE
Its Architecture contains four parts :
• RULE BASE: It contains the set of rules and the IF-THEN conditions
provided by the experts to govern the decision-making system, on
the basis of linguistic information. Recent developments in fuzzy
theory offer several effective methods for the design and tuning of
fuzzy controllers. Most of these developments reduce the number of
fuzzy rules.
• FUZZIFICATION: It is used to convert inputs i.e. crisp numbers into
fuzzy sets. Crisp inputs are basically the exact inputs measured by
sensors and passed into the control system for processing, such as
temperature, pressure, rpm’s, etc.
• INFERENCE ENGINE: It determines the matching degree of the
current fuzzy input with respect to each rule and decides which rules
are to be fired according to the input field. Next, the fired rules are
combined to form the control actions.
• DEFUZZIFICATION: It is used to convert the fuzzy sets obtained by
the inference engine into a crisp value. There are several
defuzzification methods available and the best-suited one is used
with a specific expert system to reduce the error.

Membership function
Definition: A graph that defines how each point in the input space is mapped
to a membership value between 0 and 1. The input space is often referred to as the
universe of discourse or universal set (U), which contains all the possible
elements of concern in each particular application. A small numeric sketch follows the
list of fuzzifiers below.
There are largely three types of fuzzifiers:
• Singleton fuzzifier
• Gaussian fuzzifier
• Trapezoidal or triangular fuzzifier
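
A minimal sketch of a triangular membership function in plain Python/NumPy; the temperature values and the fuzzy set "warm" are assumptions for illustration:

import numpy as np

def triangular(x, a, b, c):
    # Triangular membership: 0 outside [a, c], rising linearly to 1 at the peak b.
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0)

temps = np.array([10.0, 18.0, 25.0, 32.0, 40.0])   # crisp inputs (e.g. temperature in °C)
warm = triangular(temps, 15, 25, 35)               # degree of membership in the fuzzy set "warm"
print(warm)                                        # values between 0 (not warm) and 1 (fully warm)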
What is Fuzzy Control?
• It is a technique to embody human-like thinking in a control
system.
• It may not be designed to give accurate reasoning, but it is designed
to give acceptable reasoning.
• It can emulate human deductive thinking, that is, the process people
use to infer conclusions from what they know.
• Any uncertainties can be easily handled with the help of fuzzy logic.
Advantages of Fuzzy Logic System
• This system can work with any type of inputs whether it is
imprecise, distorted or noisy input information.
• The construction of Fuzzy Logic Systems is easy and understandable.
• Fuzzy logic comes with mathematical concepts of set theory and the
reasoning of that is quite simple.
• It provides a very efficient solution to complex problems in all fields
of life as it resembles human reasoning and decision-making.
• The algorithms can be described with little data, so little memory is
required.
Disadvantages of Fuzzy Logic Systems
• Many researchers have proposed different ways to solve a given problem
through fuzzy logic, which leads to ambiguity: there is no systematic
approach to solving a given problem through fuzzy logic.
• Proof of its characteristics is difficult or impossible in most cases
because we do not always have a mathematical description of our
approach.
• As fuzzy logic works on precise as well as imprecise data, accuracy is
often compromised.
Application
• It is used in the aerospace field for altitude control of spacecraft and
satellites.
• It has been used in the automotive system for speed control, traffic
control.
• It is used for decision-making support systems and personal
evaluation in the large company business.
• It has application in the chemical industry for controlling the pH,
drying, chemical distillation process.
• Fuzzy logic is used in Natural language processing and various
intensive applications in Artificial Intelligence.
• Fuzzy logic is extensively used in modern control systems such as
expert systems.
• Fuzzy Logic is used with Neural Networks as it mimics how a person
would make decisions, only much faster. It is done by Aggregation of
data and changing it into more meaningful data by forming partial
truths as Fuzzy sets.

Natural Language Processing –


Natural language processing (NLP) is a subfield of Artificial Intelligence (AI).
It is a widely used technology for personal assistants in various
business fields/areas. This technology works on the speech provided by the
user, breaks it down for proper understanding, and processes it accordingly.
It is a relatively recent and effective approach, due to which it is in really high
demand in today's market. Natural Language Processing is an upcoming field
where many transitions, such as compatibility with smart devices and
interactive conversations with humans, have already been made possible. Knowledge
representation, logical reasoning, and constraint satisfaction were the
emphasis of AI applications in NLP; at first they were applied to semantics and
later to grammar. In the last decade, a significant change in NLP research has
resulted in the widespread use of statistical approaches such as machine
learning and data mining on a massive scale. The need for automation is never-
ending given the amount of work required to be done these days, and NLP is
a very favourable aspect when it comes to automated applications. The
applications of NLP have made it one of the most sought-after methods of
implementing machine learning. Natural Language Processing (NLP) is a field
that combines computer science, linguistics, and machine learning to study how
computers and humans communicate in natural language. The goal of NLP is
for computers to be able to interpret and generate human language. This not
only improves the efficiency of work done by humans but also helps in
interacting with machines. NLP bridges the gap in interaction between
humans and electronic devices.
Natural Language Processing (NLP) is a subfield of artificial intelligence that
deals with the interaction between computers and humans in natural language.
It involves the use of computational techniques to process and analyze natural
language data, such as text and speech, with the goal of understanding the
meaning behind the language.
NLP is used in a wide range of applications, including machine translation,
sentiment analysis, speech recognition, chatbots, and text classification. Some
common techniques used in NLP include the following (a small NLTK sketch follows this list):
1. Tokenization: the process of breaking text into individual words or
phrases.
2. Part-of-speech tagging: the process of labeling each word in a
sentence with its grammatical part of speech.
3. Named entity recognition: the process of identifying and
categorizing named entities, such as people, places, and
organizations, in text.
4. Sentiment analysis: the process of determining the sentiment of a
piece of text, such as whether it is positive, negative, or neutral.
5. Machine translation: the process of automatically translating text
from one language to another.
6. Text classification: the process of categorizing text into predefined
categories or topics.
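
A minimal sketch of the first two techniques using the NLTK library; this assumes NLTK is installed and that its tokenizer and tagger resources have been downloaded:

import nltk

# One-time downloads of the tokenizer and part-of-speech tagger models (an assumption
# about the environment; they may already be present).
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

text = "NLP bridges the gap between humans and machines."
tokens = nltk.word_tokenize(text)   # tokenization: split the text into individual words
tags = nltk.pos_tag(tokens)         # part-of-speech tagging: label each token
print(tokens)
print(tags)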
Recent advances in deep learning, particularly in the area of neural networks,
have led to significant improvements in the performance of NLP systems. Deep
learning techniques such as Convolutional Neural Networks (CNNs) and
Recurrent Neural Networks (RNNs) have been applied to tasks such as
sentiment analysis and machine translation, achieving state-of-the-art results.
Overall, NLP is a rapidly evolving field that has the potential to revolutionize
the way we interact with computers and the world around us.
What is Natural Language Processing?
Natural language processing (NLP) is a field of computer science and artificial
intelligence that aims to make computers understand human language. NLP
uses computational linguistics, which is the study of how language works, and
various models based on statistics, machine learning, and deep learning. These
technologies allow computers to analyze and process text or voice data, and to
grasp their full meaning, including the speaker’s or writer’s intentions and
emotions.
NLP powers many applications that use language, such as text translation, voice
recognition, text summarization, and chatbots. You may have used some of
these applications yourself, such as voice-operated GPS systems, digital
assistants, speech-to-text software, and customer service bots. NLP also helps
businesses improve their efficiency, productivity, and performance by
simplifying complex tasks that involve language.
Human language is filled with ambiguities that make it incredibly difficult to
write software that accurately determines the intended meaning of text or
voice data. Homonyms, homophones, sarcasm, idioms, metaphors, grammar
and usage exceptions, and variations in sentence structure are just a few of the
irregularities of human language that take humans years to learn, but that
programmers must teach natural-language-driven applications to recognize
and understand accurately from the start if those applications are going to be
useful.
NLP Tasks
Several NLP tasks break down human text and voice data in ways that help the
computer make sense of what it’s ingesting. Some of these tasks include the
following:
• Speech recognition, also known as speech-to-text, is a challenging
task that involves converting voice data into text data. This
technology is essential for any application that requires voice
commands or spoken responses. However, people’s speaking habits,
such as speaking quickly, slurring words, using different accents, and
incorrect grammar, make speech recognition even more challenging.
• Part of speech tagging, also known as grammatical tagging, is a
crucial process that determines the part of speech of a specific word
or piece of text based on its context and usage. For example, it can
identify ‘make’ as a verb in ‘I can make a paper plane’ and as a noun
in ‘What make of car do you own?’
• Word sense disambiguation is a semantic analysis process that
selects the most appropriate meaning of a word with multiple
meanings based on the given context. This process is helpful in
distinguishing the meaning of the verb ‘make’ in ‘make the grade’
(achieve) versus ‘make a bet’ (place).
• Named entity recognition (NER) identifies useful entities or
phrases, such as 'Kentucky' as a location or 'Fred' as a person's name.
Co-reference resolution is the task of identifying when two words
refer to the same entity, such as determining that 'she' refers to
'Mary.' Sentiment analysis is a process that attempts to extract
subjective qualities, including attitudes, emotions, sarcasm,
confusion, and suspicion, from text.
• Natural language generation is the opposite of speech recognition,
as it involves putting structured information into human language.
Overall, understanding these processes is essential in building
effective natural language processing systems.
Natural Language Processing
Natural Language Processing (NLP) is a field of Artificial Intelligence (AI) and
Computer Science that is concerned with the interactions between computers
and humans in natural language. The goal of NLP is to develop algorithms and
models that enable computers to understand, interpret, generate, and
manipulate human languages.
Common Natural Language Processing (NLP) Task:
• Text and speech processing: This includes speech recognition, text-
and-speech processing, encoding (i.e., converting speech or text into a
machine-readable form), etc.
• Text classification: This includes Sentiment Analysis in which the
machine can analyze the qualities, emotions, and sarcasm from text
and also classify it accordingly.
• Language generation: This includes tasks such as machine
translation, summary writing, essay writing, etc. which aim to
produce coherent and fluent text.
• Language interaction: This includes tasks such as dialogue systems,
voice assistants, and chatbots, which aim to enable natural
communication between humans and computers.
NLP techniques are widely used in a variety of applications such as search
engines, machine translation, sentiment analysis, text summarization, question
answering, and many more. NLP research is an active field and recent
advancements in deep learning have led to significant improvements in NLP
performance. However, NLP is still a challenging field as it requires an
understanding of both computational and linguistic principles.
Working of Natural Language Processing (NLP)
Working in natural language processing (NLP) typically involves using
computational techniques to analyze and understand human language. This can
include tasks such as language understanding, language generation, and
language interaction.
The field is divided into three different parts:
1. Speech Recognition — The translation of spoken language into text.
2. Natural Language Understanding (NLU) — The computer’s ability to
understand what we say.
3. Natural Language Generation (NLG) — The generation of natural
language by a computer.
NLU and NLG are the key aspects depicting the working of NLP devices. These
2 aspects are very different from each other and are achieved using different
methods.
Individuals working in NLP may have a background in computer science,
linguistics, or a related field. They may also have experience with programming
languages such as Python, and C++ and be familiar with various NLP libraries
and frameworks such as NLTK, spaCy, and OpenNLP.
Speech Recognition:
• First, the computer must take natural language and convert it into
machine-readable language. This is what speech recognition or
speech-to-text does. This is the first step of NLU.
• Hidden Markov Models (HMMs) are used in the majority of voice
recognition systems nowadays. These are statistical models that use
mathematical calculations to determine what you said in order to
convert your speech to text.
• HMMs do this by listening to you talk, breaking it down into small
units (typically 10-20 milliseconds), and comparing it to pre-
recorded speech to figure out which phoneme you uttered in each
unit (a phoneme is the smallest unit of speech). The program then
examines the sequence of phonemes and uses statistical analysis to
determine the most likely words and sentences you were speaking.
Natural Language Understanding (NLU):
The next and hardest step of NLP is the understanding part.
• First, the computer must comprehend the meaning of each word. It
tries to figure out whether the word is a noun or a verb, whether it’s
in the past or present tense, and so on. This is called Part-of-Speech
tagging (POS).
• A lexicon (a vocabulary) and a set of grammatical rules are also built
into NLP systems. The most difficult part of NLP is understanding.
• The machine should be able to grasp what you said by the conclusion
of the process. There are several challenges in accomplishing this
when considering problems such as words having several meanings
(polysemy) or different words having similar meanings (synonymy),
but developers encode rules into their NLU systems and train them
to learn to apply the rules correctly.
Natural Language Generation (NLG):
NLG is much simpler to accomplish. NLG converts a computer’s machine-
readable language into text and can also convert that text into audible speech
using text-to-speech technology.
• First, the NLP system identifies what data should be converted to
text. If you asked the computer a question about the weather, it most
likely did an online search to find your answer, and from there it
decides that the temperature, wind, and humidity are the factors that
should be read aloud to you.
• Then, it organizes the structure of how it’s going to say it. This is
similar to NLU except backward. NLG system can construct full
sentences using a lexicon and a set of grammar rules.
• Finally, text-to-speech takes over. The text-to-speech engine uses a
prosody model to evaluate the text and identify breaks, duration, and
pitch. The engine then combines all the recorded phonemes into one
cohesive string of speech using a speech database.
Some common roles in Natural Language Processing (NLP) include:
• NLP engineer: designing and implementing NLP systems and models
• NLP researcher: conducting research on NLP techniques and
algorithms
• ML engineer: Designing and deployment of various machine learning
models including NLP.
• NLP data scientist: analyzing and interpreting NLP data
• NLP consultant: providing expertise in NLP to organizations and
businesses.
Working in NLP can be both challenging and rewarding as it requires a good
understanding of both computational and linguistic principles. NLP is a fast-
paced and rapidly changing field, so it is important for individuals working in
NLP to stay up-to-date with the latest developments and advancements.
Technologies related to Natural Language Processing
There are a variety of technologies related to natural language processing
(NLP) that are used to analyze and understand human language. Some of the
most common include:
1. Machine learning: NLP relies heavily on machine
learning techniques such as supervised and unsupervised learning,
deep learning, and reinforcement learning to train models to
understand and generate human language.
2. Natural Language Toolkits (NLTK) and other libraries: NLTK is a
popular open-source library in Python that provides tools for NLP
tasks such as tokenization, stemming, and part-of-speech tagging.
Other popular libraries include spaCy, OpenNLP, and CoreNLP.
3. Parsers: Parsers are used to analyze the syntactic structure of
sentences, such as dependency parsing and constituency parsing.
4. Text-to-Speech (TTS) and Speech-to-Text (STT) systems: TTS
systems convert written text into spoken words, while STT systems
convert spoken words into written text.
5. Named Entity Recognition (NER) systems: NER systems identify
and extract named entities such as people, places, and organizations
from the text.
6. Sentiment Analysis: A technique to understand the emotions or
opinions expressed in a piece of text, by using various techniques
like Lexicon-Based, Machine Learning-Based, and Deep Learning-
based methods
7. Machine Translation: NLP is used for language translation from one
language to another through a computer.
8. Chatbots: NLP is used for chatbots that communicate with other
chatbots or humans through auditory or textual methods.
9. AI Software: NLP is used in question-answering software for
knowledge representation, analytical reasoning as well as
information retrieval.
Applications of Natural Language Processing (NLP):
• Spam Filters: One of the most irritating things about email is spam.
Gmail uses natural language processing (NLP) to discern which
emails are legitimate and which are spam. These spam filters look at
the text in all the emails you receive and try to figure out what it
means to see if it’s spam or not.
• Algorithmic Trading: Algorithmic trading is used for predicting
stock market conditions. Using NLP, this technology examines news
headlines about companies and stocks and attempts to comprehend
their meaning in order to determine if you should buy, sell, or hold
certain stocks.
• Questions Answering: NLP can be seen in action by using Google
Search or Siri Services. A major use of NLP is to make search engines
understand the meaning of what we are asking and generate natural
language in return to give us the answers.
• Summarizing Information: On the internet, there is a lot of
information, and a lot of it comes in the form of long documents or
articles. NLP is used to decipher the meaning of the data and then
provide shorter summaries of the data so that humans can
comprehend it more quickly.
