ML Assignment3 Solution

Linear, polynomial and logistic regression and gradient descent are explained with examples: 1) Linear regression models a linear relationship between variables and uses it to make predictions. Polynomial regression extends linear regression to non-linear data by adding polynomial terms. 2) Logistic regression predicts categorical outcomes by fitting an S-shaped logistic function whose output lies between 0 and 1; it is used for classification problems such as predicting disease risk. 3) Gradient descent is an optimization algorithm that minimizes a cost function by repeatedly stepping in the direction of steepest descent; it is used to find the optimal parameters of machine learning models.

Uploaded by

Neha Gowda
Copyright
© All Rights Reserved

RAJARAJESWARI COLLEGE OF ENGINEERING

Kumbalgodu, Bangalore-74
DEPARTMENT OF ARTIFICIAL INTELLIGENCE & MACHINE LEARNING
Assignment 2

Sl.No Questions CO

1. Explain linear regression with an example. (CO 3)

Linear regression is one of the easiest and most popular Machine Learning algorithms. It is a
statistical method that is used for predictive analysis. Linear regression makes predictions for
continuous/real or numeric variables such as sales, salary, age, product price, etc.
The linear regression algorithm models a linear relationship between a dependent variable (y) and one or more
independent variables (x), hence the name linear regression. Because the relationship is linear, the model
describes how the value of the dependent variable changes as the value of the independent variable changes.
The linear regression model provides a sloped straight line representing the relationship between
the variables. Consider the below image:

Mathematically, linear regression can be represented as:
$y = a_0 + a_1 x + \varepsilon$
Here,
y = dependent variable (target variable)
x = independent variable (predictor variable)
a0 = intercept of the line (gives an additional degree of freedom)
a1 = linear regression coefficient (scale factor applied to each input value)
ε = random error
The observed values of x and y form the training dataset from which the model's coefficients a0 and a1 are learned.
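As a minimal sketch of how a0 and a1 can be learned from training data, the Python snippet below fits a simple linear regression by ordinary least squares (the slope is the covariance of x and y divided by the variance of x, and the line passes through the mean point). The experience/salary numbers and the function names `fit_simple_linear_regression` and `predict` are hypothetical, chosen only for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (a0, a1) minimising the sum of squared errors of y = a0 + a1*x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope a1 = covariance(x, y) / variance(x)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    a1 = sxy / sxx
    # Intercept a0: the least-squares line passes through (x_mean, y_mean)
    a0 = y_mean - a1 * x_mean
    return a0, a1


def predict(a0, a1, x):
    """Predicted value of the dependent variable for input x."""
    return a0 + a1 * x


# Hypothetical training data: years of experience -> salary (in thousands)
experience = [1, 2, 3, 4, 5]
salary = [30, 35, 41, 44, 50]

a0, a1 = fit_simple_linear_regression(experience, salary)
print(f"a0 = {a0:.2f}, a1 = {a1:.2f}")
print(f"predicted salary for 6 years: {predict(a0, a1, 6):.2f}")
```

With this data the fit gives a1 = 4.9 and a0 = 25.3 (up to floating-point rounding), so the predicted value for 6 years of experience is about 54.7.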
Types of Linear Regression
Linear regression can be further divided into two types:
o Simple Linear Regression:
If a single independent variable is used to predict the value of a numerical dependent
variable, then such a Linear Regression algorithm is called Simple Linear Regression.
o Multiple Linear Regression:
If more than one independent variable is used to predict the value of a numerical dependent
variable, then such a Linear Regression algorithm is called Multiple Linear Regression.

2. Explain polynomial regression with an example. (CO 3)

o Polynomial Regression is a regression algorithm that models the relationship between a
dependent variable (y) and an independent variable (x) as an nth-degree polynomial. The Polynomial
Regression equation is given below:
$y = b_0 + b_1 x_1 + b_2 x_1^2 + b_3 x_1^3 + \dots + b_n x_1^n$
o It is also called a special case of Multiple Linear Regression, because we add
polynomial terms to the Multiple Linear Regression equation to convert it into
Polynomial Regression.
o It is still a linear model (linear in its coefficients b0 ... bn), modified to fit non-linear data more accurately.
o The dataset used in Polynomial regression for training is of non-linear nature.
o It makes use of a linear regression model to fit the complicated and non-linear functions
and datasets.
o Hence, "In Polynomial regression, the original features are converted into Polynomial
features of required degree (2,3,..,n) and then modeled using a linear model."
The need for Polynomial Regression in ML can be understood from the points below:
o If we apply a linear model to a linear dataset, it gives good results, as we saw in
Simple Linear Regression. But if we apply the same model without any modification to a
non-linear dataset, the output will be poor: the loss function will increase, the error
rate will be high, and accuracy will decrease.
o So for such cases, where data points are arranged in a non-linear fashion, we need the
Polynomial Regression model. We can understand it in a better way using the below
comparison diagram of the linear dataset and non-linear dataset.

o In the above image, the dataset is arranged non-linearly. If we try to fit it with a
linear model, we can clearly see that the line hardly passes near any data point.
A curve, on the other hand, which is what the Polynomial model produces, fits most of
the data points.
o Hence, if the datasets are arranged in a non-linear fashion, then we should use the
Polynomial Regression model instead of Simple Linear Regression.
Equation of the Polynomial Regression Model:
Simple Linear Regression equation: $y = b_0 + b_1 x$ ...(a)
Multiple Linear Regression equation: $y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_n x_n$ ...(b)
Polynomial Regression equation: $y = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + \dots + b_n x^n$ ...(c)
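The idea that polynomial regression is "polynomial features + a linear model" can be sketched in plain Python: each x is expanded into [1, x, x^2, ..., x^n], and the coefficients b0 ... bn of equation (c) are then found by ordinary least squares (here via the normal equations, solved by Gaussian elimination). The data, generated without noise from y = 1 + 2x - 0.5x^2, and all function names are hypothetical; in practice a library such as NumPy or scikit-learn would normally do the fitting.

```python
def polynomial_features(x, degree):
    """Map a scalar x to the feature vector [1, x, x^2, ..., x^degree]."""
    return [x ** p for p in range(degree + 1)]


def solve_linear_system(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z


def fit_polynomial_regression(xs, ys, degree):
    """Return [b0, b1, ..., bn] minimising squared error on polynomial features."""
    X = [polynomial_features(x, degree) for x in xs]
    k = degree + 1
    # Normal equations of the linear model: (X^T X) b = X^T y
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coeffs, x):
    """Evaluate b0 + b1*x + ... + bn*x^n."""
    return sum(b * f for b, f in zip(coeffs, polynomial_features(x, len(coeffs) - 1)))


# Hypothetical non-linear data from y = 1 + 2x - 0.5x^2 (no noise)
xs = [0, 1, 2, 3, 4, 5]
ys = [1 + 2 * x - 0.5 * x ** 2 for x in xs]

coeffs = fit_polynomial_regression(xs, ys, degree=2)
print("degree 2 coefficients:", [round(b, 3) for b in coeffs])
```

Because the data is an exact quadratic, the recovered coefficients should be very close to [1, 2, -0.5]. Fitting the same data with degree=1 (a straight line) leaves a clearly non-zero squared error, which is the underfitting described above.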
3. Explain logistic regression with an example. (CO 3)

o Logistic regression is one of the most popular Machine Learning algorithms, which comes
under the Supervised Learning technique. It is used for predicting the categorical
dependent variable using a given set of independent variables.
o Logistic regression predicts the output of a categorical dependent variable. Therefore the
outcome must be a categorical or discrete value, such as Yes or No, 0 or 1, True or
False, etc. However, instead of giving exactly 0 or 1, it gives a probability that
lies between 0 and 1.
o Logistic Regression is very similar to Linear Regression, except in how they are
used. Linear Regression is used for solving regression problems, whereas Logistic
Regression is used for solving classification problems.
o In Logistic Regression, instead of fitting a regression line, we fit an "S"-shaped logistic
function, whose output is bounded by two limiting values (0 and 1).
o The curve from the logistic function indicates the likelihood of something such as whether
the cells are cancerous or not, a mouse is obese or not based on its weight, etc.
o Logistic Regression is a significant machine learning algorithm because it has the ability to
provide probabilities and classify new data using continuous and discrete datasets.
o Logistic Regression can be used to classify the observations using different types of data
and can easily determine the most effective variables used for the classification. The below
image is showing the logistic function:

Logistic regression is used to solve classification problems, and the most common use case
is binary logistic regression, where the outcome is binary (yes or no). In the real world, you can
see logistic regression applied across multiple areas and fields.
• In health care, logistic regression can be used to predict if a tumor is likely to be benign or
malignant.
• In the financial industry, logistic regression can be used to predict if a transaction is
fraudulent or not.
• In marketing, logistic regression can be used to predict if a targeted audience will respond
or not.
The three types of logistic regression
1. Binary logistic regression - When we have two possible outcomes, like our original
example of whether a person is likely to be infected with COVID-19 or not.
2. Multinomial logistic regression - When we have multiple outcomes, say if we build out
our original example to predict whether someone may have the flu, an allergy, a cold, or
COVID-19.
3. Ordinal logistic regression - When the outcome is ordered, like if we build out our
original example to also help determine the severity of a COVID-19 infection, sorting it
into mild, moderate, and severe cases.
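A minimal sketch of binary logistic regression in plain Python: the model is P(y = 1 | x) = sigmoid(w*x + b), and w and b are fitted by batch gradient descent on the average log-loss (cross-entropy). The one-feature tumour-size data, the learning rate, the number of epochs and the function names are hypothetical choices for illustration, not a prescribed setup.

```python
import math


def sigmoid(z):
    """Logistic (S-shaped) function: maps any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(xs, ys, lr=0.3, epochs=20000):
    """Fit P(y = 1 | x) = sigmoid(w * x + b) by batch gradient descent on log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction errors: predicted probability minus true label
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the average cross-entropy loss with respect to w and b
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical data: tumour size in cm -> 1 if malignant, 0 if benign
sizes = [1.0, 1.5, 2.0, 2.5, 3.5, 4.0, 4.5, 5.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = train_logistic_regression(sizes, labels)

for size in (1.2, 4.8):
    p = sigmoid(w * size + b)
    print(f"size {size} cm -> P(malignant) = {p:.3f}, predicted class = {int(p >= 0.5)}")
```

The model's output is a probability between 0 and 1; thresholding it at 0.5 turns it into a class label, which is how the S-shaped curve is used for classification.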

4. Explain gradient descent with an example. (CO 3)


Gradient Descent is an optimization algorithm for finding a local minimum of a differentiable
function. Gradient descent in machine learning is simply used to find the values of a
function's parameters (coefficients) that minimize a cost function as far as possible.
Imagine a blindfolded man who wants to climb to the top of a hill in as few steps as
possible. He might start climbing the hill by taking really big steps in the steepest
direction, which he can do as long as he is not close to the top. As he comes closer to the top,
however, his steps will get smaller and smaller to avoid overshooting it. This process can be
described mathematically using the gradient.
Imagine the image below illustrates our hill from a top-down view and the red arrows are the steps
of our climber. Think of a gradient in this context as a vector that contains the direction of the
steepest step the blindfolded man can take and also how long that step should be.

Instead of climbing up a hill, think of gradient descent as hiking down to the bottom of a valley.
This is a better analogy because it is a minimization algorithm that minimizes a given function.
The equation below describes what the gradient descent algorithm does: b is the next position of
our climber, while a represents his current position. The minus sign refers to the minimization part
of the gradient descent algorithm. The gamma (γ) in the middle is a weighting factor (the step size,
often called the learning rate), and the gradient term ∇f(a) points in the direction of steepest
ascent, so subtracting it moves the climber in the direction of steepest descent:
$b = a - \gamma \nabla f(a)$
So this formula tells us the next position we need to go to, which is in the direction of
steepest descent.
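The update rule b = a - γ∇f(a) can be sketched in a few lines of Python. The cost function f(x) = (x - 3)^2 (minimum at x = 3, gradient 2(x - 3)), the starting point, γ = 0.1 and the stopping tolerance are hypothetical choices for illustration.

```python
def gradient_descent(grad, start, gamma=0.1, max_steps=1000, tol=1e-8):
    """Apply b = a - gamma * grad(a) until the step is smaller than tol."""
    a = start
    for step in range(1, max_steps + 1):
        b = a - gamma * grad(a)  # move against the gradient (steepest descent)
        if abs(b - a) < tol:
            return b, step
        a = b
    return a, max_steps


def f(x):
    """Hypothetical cost function with its minimum at x = 3."""
    return (x - 3) ** 2


def df(x):
    """Derivative (gradient) of f."""
    return 2 * (x - 3)


x_min, steps = gradient_descent(df, start=0.0, gamma=0.1)
print(f"x_min = {x_min:.6f}, f(x_min) = {f(x_min):.2e}, steps = {steps}")
```

Here each step shrinks the distance to the minimum by a factor of 0.8, so the steps get smaller as the climber approaches the bottom of the valley, as in the analogy. With a much larger γ (for example 1.1) the same loop overshoots and diverges, which is why the choice of step size matters.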
5. Explain Support Vector Machine with an example. (CO 3)
Support Vector Machine or SVM is one of the most popular Supervised Learning algorithms,
which is used for Classification as well as Regression problems. However, primarily, it is used for
Classification problems in Machine Learning.
The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-
dimensional space into classes so that we can easily put the new data point in the correct category
in the future. This best decision boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases
are called support vectors, and hence the algorithm is termed Support Vector Machine. Consider
the below diagram in which there are two different categories that are classified using a decision
boundary or hyperplane:

Example: SVM can be understood with the example that we have used in the KNN classifier.
Suppose we see a strange cat that also has some features of dogs. If we want a model that can
accurately identify whether it is a cat or a dog, such a model can be created using the SVM
algorithm. We first train the model with many images of cats and dogs so that it can learn
their different features, and then we test it with this strange creature. Since the SVM
creates a decision boundary between the two classes (cat and dog) using the extreme cases
(support vectors), it will look at the extreme cases of cat and dog and, on the basis of the
support vectors, classify the creature as a cat. Consider the below diagram:

SVM algorithm can be used for Face detection, image classification, text categorization, etc.
SVM can be of two types:
o Linear SVM: Linear SVM is used for linearly separable data, i.e. data that can be separated
into two classes by a single straight line; the classifier used for such data is called a
Linear SVM classifier.
o Non-linear SVM: Non-linear SVM is used for non-linearly separable data, i.e. data that
cannot be separated by a straight line; the classifier used for such data is called a
Non-linear SVM classifier.
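To make the idea of a margin-maximizing hyperplane concrete, the sketch below trains a linear SVM in plain Python by sub-gradient descent on the regularized hinge loss, λ/2·||w||² + mean(max(0, 1 - y(w·x + b))). This is one common way to train a linear SVM, not necessarily the solver SVM libraries use (many solve the dual problem). The 2-D points (+1 = cat, -1 = dog) and the hyperparameters are hypothetical, and a non-linear SVM would additionally need a kernel, which is not shown.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.01, epochs=1000):
    """Minimise lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b))) by sub-gradient descent."""
    dim = len(points[0])
    n = len(points)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the L2 regulariser
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # inside the margin or misclassified: hinge loss is active
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def decision_value(w, b, x):
    """Signed score w.x + b; its sign gives the side of the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    return 1 if decision_value(w, b, x) >= 0 else -1


# Hypothetical 2-D feature vectors: +1 = cat, -1 = dog
points = [(2.0, 2.0), (3.0, 3.0), (3.0, 2.0), (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(points, labels)
print("w =", [round(wi, 3) for wi in w], "b =", round(b, 3))
for x, y in zip(points, labels):
    # Margins close to 1 mark the points acting as support vectors
    print(x, "label", y, "margin", round(y * decision_value(w, b, x), 3))
print("new point (2.5, 3.0) ->", predict(w, b, (2.5, 3.0)))
```

Only the points nearest the boundary, here (2, 2) and (-2, -2), keep pulling on w once training settles; the others already lie beyond the margin, which is why the hyperplane is determined by the support vectors.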
6. Write a note on regularized linear models. (CO 5)

In machine learning, we often face the problem of a model that performs well on training data but
very poorly on test data. This happens when the model follows the training data too closely,
i.e. it overfits the data.

Regularization is a technique to reduce overfitting. The term regularization means the act of
bringing to uniformity.
Complex models can detect subtle patterns in the data, but if the data is noisy (contains irrelevant
information) or the dataset is too small, the model will end up fitting the noise itself. When we
use such a model to make predictions, the results will not be accurate and the error on new data
will be higher than expected.
In linear regression, the final output is the weighted sum of the feature variables which is
represented by the below equation.
$y = w_1 x_1 + w_2 x_2 + w_3 x_3 + \dots + w_n x_n + w_0$
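Here weights w1, w2, …, wn represent the importance of the features (x1, x2, …, xn). A feature is
of high importance if it has a large weight associated with it.
The error in linear regression is measured by the mean squared error, given below, where ŷ_i is
the model's prediction for the i-th of n training examples:
$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$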
For a linear model, regularization is achieved by constraining the weights of the model. To
constrain the weights, we first need to understand how they are calculated. The weights are chosen
to minimize a cost function; for linear regression, the cost function is the mean squared error.
The weights are adjusted repeatedly, the MSE is recomputed each time, and the set of weights with
the minimum MSE is taken as the final model.
To regularize the model, the regularization term will be added to the cost function.
Regularized Cost Function = MSE+ Regularization term
Here we will see three different regularization terms that constrain the weights of the model,
giving three different regularized linear regression algorithms:
1. Ridge Regression
2. Lasso Regression
3. Elastic Net
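A minimal Python sketch of "Regularized Cost Function = MSE + Regularization term" for the three penalties listed above: Ridge uses λ·Σw², Lasso uses λ·Σ|w|, and Elastic Net mixes the two with a ratio. Scaling conventions differ between libraries (for example, extra factors of 1/2), so treat these as one common form rather than a definitive one. For a single feature without intercept, minimizing MSE + λw² has the closed form w = Σxy / (Σx² + nλ), which shows the penalty shrinking the weight as λ grows. The data and function names are hypothetical.

```python
def mse(weights, w0, X, ys):
    """Mean squared error of y_hat = w1*x1 + ... + wn*xn + w0 over the rows of X."""
    errors = [y - (sum(w * x for w, x in zip(weights, row)) + w0) for row, y in zip(X, ys)]
    return sum(e * e for e in errors) / len(ys)


def ridge_term(weights, lam):
    """L2 penalty used by Ridge Regression."""
    return lam * sum(w * w for w in weights)


def lasso_term(weights, lam):
    """L1 penalty used by Lasso Regression."""
    return lam * sum(abs(w) for w in weights)


def elastic_net_term(weights, lam, l1_ratio):
    """Elastic Net: a weighted mix of the L1 and L2 penalties (one common form)."""
    return l1_ratio * lasso_term(weights, lam) + (1 - l1_ratio) * ridge_term(weights, lam)


def ridge_weight_1d(xs, ys, lam):
    """Closed-form minimiser of MSE + lam * w^2 for the model y = w * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + len(xs) * lam)


xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]  # hypothetical data with true slope 2
X = [[x] for x in xs]

for lam in (0.0, 1.0, 10.0):
    w = ridge_weight_1d(xs, ys, lam)
    cost = mse([w], 0.0, X, ys) + ridge_term([w], lam)
    print(f"lambda = {lam:4.1f}: w = {w:.3f}, regularized cost = {cost:.3f}")

# The three penalties for an example weight vector
print(ridge_term([1, -2], 0.5), lasso_term([1, -2], 0.5), elastic_net_term([1, -2], 0.5, 0.5))
```

With λ = 0 the weight is exactly 2 (plain least squares); larger λ values pull it towards 0, trading a little training error for smaller weights.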
7. Explain Bayes' theorem with an example. (CO 5)

Bayes' theorem is a mathematical formula for calculating conditional probability in probability
and statistics. In other words, it is used to figure out how likely an event is, given that another,
related event has occurred. It is also known as Bayes' rule, Bayes' law, or Bayesian reasoning, and
it determines the probability of an event under uncertain knowledge.
In probability theory, it relates the conditional probability and marginal probabilities of two
random events.
Bayes' theorem was named after the British mathematician Thomas Bayes. The Bayesian
inference is an application of Bayes' theorem, which is fundamental to Bayesian statistics.
It is a way to calculate the value of P(B|A) with the knowledge of P(A|B).
The formula for the Bayes theorem can be written in a variety of ways. The following is the most
common version:
$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$
P(A ∣ B) is the conditional probability of event A occurring, given that B is true.
P(B ∣ A) is the conditional probability of event B occurring, given that A is true.
P(A) and P(B) are the marginal (prior) probabilities of A and B, each considered on its own
without conditioning on the other.
Bayes' theorem allows updating the probability prediction of an event by observing new
information of the real world.
Bayes' rule allows us to compute the single term P(B|A) in terms of P(A|B), P(B), and P(A). This
is very useful when we have good estimates of these three terms and want to determine the
fourth one. If we observe the effect of some unknown cause and want to compute the probability of
that cause, Bayes' rule becomes:
$P(\text{cause} \mid \text{effect}) = \frac{P(\text{effect} \mid \text{cause})\,P(\text{cause})}{P(\text{effect})}$
Example 1: What is the probability that a patient has meningitis, given that the patient has a stiff neck?
Given Data:
A doctor knows that meningitis causes a patient to have a stiff neck 80% of the time. He is also
aware of the following facts:
o The known probability that a patient has meningitis is 1/30,000.
o The known probability that a patient has a stiff neck is 2%.
Let a be the proposition that the patient has a stiff neck and b be the proposition that the patient
has meningitis. Then:
P(a|b) = 0.8
P(b) = 1/30000
P(a) = 0.02
$P(b \mid a) = \frac{P(a \mid b)\,P(b)}{P(a)} = \frac{0.8 \times (1/30000)}{0.02} = 0.001333 = \frac{1}{750}$
Hence, we can expect that about 1 in 750 patients with a stiff neck has meningitis.
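The arithmetic of this example can be checked with exact fractions in Python; `bayes` is a hypothetical helper that simply applies P(b|a) = P(a|b)·P(b) / P(a) to the three given values.

```python
from fractions import Fraction


def bayes(p_a_given_b, p_b, p_a):
    """Bayes' rule: P(b | a) = P(a | b) * P(b) / P(a)."""
    return p_a_given_b * p_b / p_a


p_stiff_given_men = Fraction(8, 10)   # P(a|b): stiff neck given meningitis
p_men = Fraction(1, 30000)            # P(b): prior probability of meningitis
p_stiff = Fraction(2, 100)            # P(a): probability of a stiff neck

p_men_given_stiff = bayes(p_stiff_given_men, p_men, p_stiff)
print(p_men_given_stiff)  # 1/750
```

Using `Fraction` avoids floating-point rounding, so the result matches the hand calculation exactly.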
8. Write a note on Bayes Optimal Classifier. (CO 5)
The Bayes Optimal Classifier is a probabilistic model that makes the most probable prediction for a
new data instance, using the training data and the space of hypotheses.
It is described using Bayes' theorem, which provides a principled way of calculating a
conditional probability. It is also closely related to Maximum a Posteriori (MAP), a probabilistic
framework that finds the most probable hypothesis for a training dataset.
In practice, the Bayes Optimal Classifier is computationally expensive, if not intractable, to
calculate; instead, simplifications such as the Gibbs algorithm and Naive Bayes can be used to
approximate the outcome.
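A minimal sketch of the Bayes optimal decision rule, which picks the class v that maximizes Σ_h P(v | h)·P(h | D) over all hypotheses h. The three hypotheses, their posteriors (0.4, 0.3, 0.3) and their predictions are illustrative numbers only; they show that the Bayes optimal answer can differ from the prediction of the single most probable (MAP) hypothesis.

```python
# Illustrative posteriors P(h | D) for three hypotheses
posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}
# P(v | h): here each hypothesis predicts one class with certainty
class_given_h = {
    "h1": {"+": 1.0, "-": 0.0},
    "h2": {"+": 0.0, "-": 1.0},
    "h3": {"+": 0.0, "-": 1.0},
}


def bayes_optimal(posteriors, class_given_h):
    """Return the class v maximising sum_h P(v | h) * P(h | D), plus every class score."""
    classes = next(iter(class_given_h.values())).keys()
    scores = {v: sum(class_given_h[h][v] * p_h for h, p_h in posteriors.items())
              for v in classes}
    return max(scores, key=scores.get), scores


map_h = max(posteriors, key=posteriors.get)
map_vote = max(class_given_h[map_h], key=class_given_h[map_h].get)
best, scores = bayes_optimal(posteriors, class_given_h)
print("MAP hypothesis:", map_h, "predicts", map_vote)
print("Bayes optimal class:", best, scores)
```

Here h1 is the MAP hypothesis and predicts '+', but '-' receives a total weight of 0.6 against 0.4, so the Bayes optimal classifier outputs '-'.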
Let there be 5 hypotheses h1 through h5.

Thus, the Bayes optimal procedure recommends the robot turn left.
9. Elaborate on Naïve Bayes Classifier. (CO 5)
o Naïve Bayes algorithm is a supervised learning algorithm, which is based on Bayes
theorem and used for solving classification problems.
o It is mainly used in text classification that includes a high-dimensional training dataset.
o Naïve Bayes Classifier is one of the simplest and most effective classification algorithms,
and it helps in building fast machine learning models that can make quick predictions.
o It is a probabilistic classifier, which means it predicts on the basis of the probability
of an object.
o Some popular examples of the Naïve Bayes algorithm are spam filtering, sentiment
analysis, and classifying articles.
The name Naïve Bayes is made up of two words, Naïve and Bayes, which can be described
as:
o Naïve: It is called Naïve because it assumes that the occurrence of a certain feature is
independent of the occurrence of other features (given the class). For example, if a fruit is
identified on the basis of color, shape, and taste, then a red, spherical, and sweet fruit is
recognized as an apple. Each feature individually contributes to identifying it as an apple,
without depending on the others.
o Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.
Bayes' Theorem:
o Bayes' theorem, also known as Bayes' Rule or Bayes' law, is used to determine the
probability of a hypothesis with prior knowledge. It depends on conditional
probability.
o The formula for Bayes' theorem is given as:
$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$
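Where,
P(A|B) is the posterior probability: the probability of hypothesis A given the observed event B.
P(B|A) is the likelihood: the probability of the evidence B given that hypothesis A is true.
P(A) is the prior probability: the probability of the hypothesis before observing the evidence.
P(B) is the marginal probability: the probability of the evidence.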
Working of Naïve Bayes' Classifier can be understood with the help of the below example:
Suppose we have a dataset of weather conditions and a corresponding target variable "Play".
Using this dataset, we need to decide whether we should play or not on a particular day
according to the weather conditions. To solve this problem, we follow the steps below:
1. Convert the given dataset into frequency tables.
2. Generate Likelihood table by finding the probabilities of given features.
3. Now, use Bayes theorem to calculate the posterior probability.
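The three steps above can be sketched in Python with exact fractions. The ten-row weather dataset (features Outlook and Windy, target Play) is hypothetical, since the original table is not given, and no Laplace smoothing is applied, so a feature value never seen with a class would give that class a probability of zero.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# Hypothetical weather data: ((Outlook, Windy), Play)
data = [
    (("Sunny", False), "No"),
    (("Sunny", True), "No"),
    (("Overcast", False), "Yes"),
    (("Rainy", False), "Yes"),
    (("Rainy", True), "No"),
    (("Overcast", True), "Yes"),
    (("Sunny", False), "Yes"),
    (("Rainy", False), "Yes"),
    (("Sunny", True), "Yes"),
    (("Overcast", False), "Yes"),
]


def frequency_tables(rows):
    """Step 1: count classes, and feature values per (feature index, class)."""
    class_counts = Counter(label for _, label in rows)
    value_counts = defaultdict(Counter)
    for features, label in rows:
        for i, value in enumerate(features):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts


def posterior(features, class_counts, value_counts):
    """Steps 2 and 3: likelihoods from the tables, then Bayes' theorem."""
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = Fraction(count, total)  # prior P(class)
        for i, value in enumerate(features):
            # Likelihood P(value | class); features are naively treated as independent
            score *= Fraction(value_counts[(i, label)][value], count)
        scores[label] = score
    evidence = sum(scores.values())  # P(features), the normalising constant
    return {label: s / evidence for label, s in scores.items()}


class_counts, value_counts = frequency_tables(data)
post = posterior(("Sunny", True), class_counts, value_counts)
print({label: str(p) for label, p in post.items()})
print("Prediction:", max(post, key=post.get))
```

For a sunny, windy day the posteriors come out as P(No | x) = 7/10 and P(Yes | x) = 3/10, so the classifier predicts No.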
10. Write a note on Bayesian Belief Network. (CO 5)
A Bayesian belief network is a key computer technology for dealing with probabilistic events and
for solving problems that involve uncertainty. We can define a Bayesian network as:
"A Bayesian network is a probabilistic graphical model which represents a set of variables and
their conditional dependencies using a directed acyclic graph."
It is also called a Bayes network, belief network, decision network, or Bayesian model.
Bayesian networks are probabilistic, because these networks are built from a probability
distribution, and also use probability theory for prediction and anomaly detection.
Real world applications are probabilistic in nature, and to represent the relationship between
multiple events, we need a Bayesian network. It can also be used in various tasks
including prediction, anomaly detection, diagnostics, automated insight, reasoning, time
series prediction, and decision making under uncertainty.
Bayesian Network can be used for building models from data and expert opinions, and it consists
of two parts:
o Directed Acyclic Graph
o Table of conditional probabilities.
The generalized form of a Bayesian network that represents and solves decision problems under
uncertain knowledge is known as an Influence diagram.
A Bayesian network graph is made up of nodes and arcs (directed links), where:
o Each node corresponds to a random variable, and a variable can be continuous or discrete.
o Arcs (directed arrows) represent causal relationships or conditional probabilities
between random variables. These directed links connect pairs of nodes in the
graph.
These links represent that one node directly influences the other; if there is no
directed link between two nodes, neither directly influences the other (although they
may still be dependent through other nodes).
o In the above diagram, A, B, C, and D are random variables represented by the
nodes of the network graph.
o If we are considering node B, which is connected with node A by a directed
arrow, then node A is called the parent of Node B.
o Node C is independent of node A.
The Bayesian network has two main components:
o Causal Component
o Actual numbers
Each node in the Bayesian network has a conditional probability distribution P(Xi | Parent(Xi)),
which determines the effect of the parents on that node.
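A minimal sketch of how these conditional probability tables combine: the joint probability of all variables is the product of P(Xi | Parent(Xi)) over the nodes, and any other probability can be obtained by summing joint entries. The three-node network Cloudy → Rain → WetGrass and its numbers are hypothetical, and inference here is brute-force enumeration, which only scales to small networks.

```python
from itertools import product

# Hypothetical network Cloudy -> Rain -> WetGrass with made-up CPTs
P_CLOUDY = 0.5                                  # P(Cloudy = True)
P_RAIN_GIVEN_CLOUDY = {True: 0.8, False: 0.2}   # P(Rain = True | Cloudy)
P_WET_GIVEN_RAIN = {True: 0.9, False: 0.1}      # P(WetGrass = True | Rain)


def prob(p_true, value):
    """P(X = value) for a Boolean variable with P(X = True) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(cloudy, rain, wet):
    """Joint probability as the product of P(Xi | Parent(Xi)) over all nodes."""
    return (prob(P_CLOUDY, cloudy)
            * prob(P_RAIN_GIVEN_CLOUDY[cloudy], rain)
            * prob(P_WET_GIVEN_RAIN[rain], wet))


# Marginal P(WetGrass = True): sum the joint over the unobserved variables
p_wet = sum(joint(c, r, True) for c, r in product([True, False], repeat=2))
# Posterior P(Cloudy = True | WetGrass = True) by enumeration
p_cloudy_given_wet = sum(joint(True, r, True) for r in (True, False)) / p_wet

print(f"P(WetGrass=True) = {p_wet:.2f}")
print(f"P(Cloudy=True | WetGrass=True) = {p_cloudy_given_wet:.2f}")
```

With these tables P(WetGrass = True) = 0.5, and observing wet grass raises the probability of Cloudy from 0.5 to 0.74.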
11. Discuss concept learning with an example. (CO 4)

“The task of acquiring a potential hypothesis (solution) that best fits the given training examples.”

Concept learning is the task of inferring a Boolean-valued function from a set of training
examples. The purpose of inferring this function is to use it as a general rule for classifying unseen
data.
Concept learning is based on a type of learning called inductive learning. In inductive learning, the
learner learns by example. In other words, the learner discovers the rules of a particular concept by
learning the examples of that concept. For example, if a student teaches himself algebra, the more
he practices different types of examples and solutions, the more he will understand the general
rules of algebra. The idea is the same for concept learning: a machine is taught the different
examples of a concept, and by learning these examples, it will “discover” the general rule(s) that
apply to that concept.
Concept learning thus involves learning a function (which is a rule) from a set of training
examples.
Concept learning aims to find a function or rule that truly represents the particular concept being
learned. The function must be a true representation of the concept so that it can make
accurate classifications of unseen data. By “true representation”, we mean that the function must be
able to approximate the true value of the target concept. The target concept refers to what we’re
trying to classify. It is a Boolean-valued function, denoted c(x), which takes one of two values
(e.g. true/false or 1/0) indicating whether an object x belongs to the concept. The aim is generally
to determine whether a given object belongs to the target concept.
According to the Inductive Learning Hypothesis, if a function can approximate the target concept
well enough over training examples, then it will be able to approximate the target concept well for
unseen examples.
For example, suppose an algebra learner has gained an understanding of the general rules of
algebra based on the examples they’ve practiced. In that case, they’ll be able to apply those rules
to solve any new problems that they encounter. Similarly, in concept learning, an inferred function
will be able to approximate and classify new data based on how well it has learned in the past.
Concept learning works in two ways. It works by:
o Inferring a function from a set of training examples.
o Searching to find the function that best fits the training examples.
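One classic way to carry out the search in the second point is the Find-S algorithm: it starts from the most specific hypothesis consistent with the first positive example and generalizes an attribute to '?' (any value) whenever another positive example disagrees with it, ignoring negative examples. The sketch below is a minimal Python version; the weather-style attributes and the four training examples are illustrative.

```python
def find_s(examples, positive="Yes"):
    """Return the most specific hypothesis consistent with the positive examples."""
    hypothesis = None
    for attributes, label in examples:
        if label != positive:
            continue  # Find-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(attributes)  # start from the first positive example
        else:
            # Generalise every attribute that disagrees to "?" (any value acceptable)
            hypothesis = [h if h == a else "?" for h, a in zip(hypothesis, attributes)]
    return tuple(hypothesis) if hypothesis is not None else None


def matches(hypothesis, attributes):
    """True if the instance satisfies the hypothesis, i.e. h(x) = 1."""
    return all(h == "?" or h == a for h, a in zip(hypothesis, attributes))


# Illustrative examples: (Sky, AirTemp, Humidity, Wind, Water, Forecast) -> EnjoySport
examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), "Yes"),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), "Yes"),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), "Yes"),
]

h = find_s(examples)
print("Learned hypothesis:", h)
print("Unseen day:", matches(h, ("Sunny", "Warm", "Low", "Strong", "Cool", "Same")))
```

The learned hypothesis is ('Sunny', 'Warm', '?', 'Strong', '?', '?'), which can then be used as a general rule to classify unseen days, in line with the Inductive Learning Hypothesis described above.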
