Classification and Regression
Simple linear regression fits the line Y = β0 + β1X, where:
•Y is the dependent variable
•X is the independent variable
•β0 is the intercept
•β1 is the slope
LEAST SQUARE METHOD
https://www.geeksforgeeks.org/least-square-method/
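The least-squares estimates for the slope and intercept can be computed directly from the data. A minimal sketch with numpy, using hypothetical (x, y) pairs:

```python
import numpy as np

# Hypothetical sample data (x, y pairs)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

# Least-squares formulas:
# b1 = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2)
# b0 = y_mean - b1 * x_mean
x_mean, y_mean = x.mean(), y.mean()
b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b0 = y_mean - b1 * x_mean

print(f"intercept b0 = {b0:.3f}, slope b1 = {b1:.3f}")
```

The fitted line Y = b0 + b1·X minimizes the sum of squared vertical distances from the data points to the line.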
EXAMPLE: DATA GIVEN
EXAMPLE: LINEAR REGRESSION
Regression line: y = wx + b
ANOTHER EXAMPLE: UNTIL BEST FIT
LINEARITY PROBLEM IN LINEAR REGRESSION
If the relationship between the dependent and independent variables is not linear,
then linear regression will not be an accurate model.
HANDS ON LINEAR REGRESSION
https://github.com/shanmugavel007/Machine_learning-lab/blob/main/notebooks/Exercise%201.ipynb
STOCHASTIC GRADIENT DESCENT REGRESSION
GRADIENT DESCENT
Gradient descent is a method for finding the parameters that minimize a function,
with guarantees of finding the true minimum if the function is convex.
Gradient descent can be thought of as a ball rolling down a hill: at each point on
the hill, the ball rolls in the direction where the hill is steepest. In this analogy,
the hill is the function we want to minimize, and the steepest direction is the
gradient of the function.
In ML, the function we are interested in minimizing is often a loss function for
our model. The loss function can be viewed as the average of the loss at each point
in the training set.
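The idea above can be sketched for simple linear regression with a mean-squared-error loss. A minimal example, assuming hypothetical data that roughly follows y = 2x + 1, a learning rate of 0.05, and 2000 full-batch iterations (plain gradient descent rather than the stochastic variant):

```python
import numpy as np

# Hypothetical data roughly following y = 2x + 1
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.1, 4.9, 7.2, 9.0])

w, b = 0.0, 0.0   # parameters to learn
lr = 0.05         # learning rate: step size down the "hill"

for _ in range(2000):
    y_hat = w * X + b
    # MSE loss: average of (y_hat - y)^2 over the training set
    grad_w = 2 * np.mean((y_hat - y) * X)  # d(loss)/dw
    grad_b = 2 * np.mean(y_hat - y)        # d(loss)/db
    w -= lr * grad_w                       # step in the steepest-descent direction
    b -= lr * grad_b

print(f"w = {w:.2f}, b = {b:.2f}")
```

Because the MSE loss is convex in w and b, the iterates converge to the same solution least squares would give. Stochastic gradient descent differs only in that each step uses the gradient at one (or a few) randomly chosen training points instead of the full average.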
Multiple linear regression fits Y = β0 + β1X1 + β2X2 + … + βpXp, where:
•Y is the dependent variable
•X1, X2, …, Xp are the independent variables
•β0 is the intercept
•β1, β2, …, βp are the slopes
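The coefficients can be estimated by least squares, just as in the one-variable case. A minimal sketch with numpy, using hypothetical data with two independent variables generated from Y = 1 + 2·X1 + 2·X2:

```python
import numpy as np

# Hypothetical data: Y depends on two features X1 and X2
# (generated exactly from Y = 1 + 2*X1 + 2*X2)
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = np.array([7.0, 7.0, 15.0, 15.0, 21.0])

# Prepend a column of ones so the intercept beta0 is estimated too
A = np.column_stack([np.ones(len(X)), X])

# Solve the least-squares problem for [beta0, beta1, beta2]
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(beta)
```

Since the data here lie exactly on the plane, the solver recovers the coefficients [1, 2, 2] exactly; with noisy data it returns the best-fitting plane instead.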
EXAMPLE: MULTI-LINEAR REGRESSION
HOW DOES MULTI-LINEAR REGRESSION WORK?
POLYNOMIAL REGRESSION
Polynomial Regression is a type of regression analysis in which the relationship
between the independent variable x and the dependent variable y is modeled as an
nth-degree polynomial
The model is y = β0 + β1x + β2x² + β3x³ + … + βnxⁿ, where:
• y is the dependent variable
• x is the independent variable
• β0, β1, β2, β3, …, βn are the coefficients of the polynomial equation
• n is the degree of the polynomial equation
The coefficients β0, β1, β2, β3, …, βn are estimated from the data using regression
analysis methods such as least squares or maximum likelihood.
For more information: https://medium.com/@shuv.sdr/polynomial-regression-in-python-58198fb0973f
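A minimal polynomial-regression sketch with numpy, using hypothetical data generated exactly from the degree-2 polynomial y = 1 + 2x + 3x²:

```python
import numpy as np

# Hypothetical data lying exactly on y = 1 + 2x + 3x^2
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1 + 2 * x + 3 * x ** 2

# Fit a degree-2 polynomial by least squares; polyfit returns the
# coefficients from the highest degree down: [beta2, beta1, beta0]
coeffs = np.polyfit(x, y, deg=2)
print(coeffs)
```

Note that polynomial regression is still linear in the coefficients β0, …, βn, which is why the same least-squares machinery applies.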
CLASSIFICATION
2. EAGER LEARNERS
Eager learners construct a classification model based on the given training data before
receiving data for classification. It must be able to commit to a single hypothesis that
covers the entire instance space. Because of this, eager learners take a long time for
training and less time for predicting.
Examples: decision trees, naive Bayes, and support vector machines.
K-NEAREST NEIGHBORS (K-NN)
The k-nearest neighbors algorithm, also known as KNN or k-NN, is a non-parametric,
supervised learning classifier which uses proximity to make classifications or predictions about
the grouping of an individual data point.
•The K-NN algorithm makes no prior assumption about how your data is distributed. This is
why K-NN is called non-parametric.
IMAGINE LIKE THIS
It can also be used for multi-class classification problems.
K-NN ALGORITHM STEPS
•Step-1: Select the number K of the
neighbors
•Step-2: Calculate the Euclidean distance from
the new data point to the training points.
•Step-3: Take the K nearest neighbors as
per the calculated Euclidean distance.
•Step-4: Among these k neighbors, count
the number of the data points in each
category.
•Step-5: Assign the new data point to the
category for which the number of
neighbors is maximum.
•Step-6: Our model is ready.
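The steps above can be sketched from scratch with numpy. A minimal example, assuming a hypothetical 2-D training set with two classes "A" and "B":

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    # Step 2: Euclidean distance from x_new to every training point
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # Step 3: indices of the k nearest neighbors
    nearest = np.argsort(dists)[:k]
    # Steps 4-5: count labels among those neighbors and take the majority
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D training data with two classes
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],   # class "A"
                    [6.0, 6.0], [6.5, 7.0], [7.0, 6.5]])  # class "B"
y_train = np.array(["A", "A", "A", "B", "B", "B"])

print(knn_predict(X_train, y_train, np.array([2.0, 2.0]), k=3))
```

The query point (2.0, 2.0) sits inside the "A" cluster, so all three of its nearest neighbors carry label "A" and the vote is unanimous.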
IF IT HAPPENS IN AN ITERATION
HOW DO WE CALCULATE THE DISTANCE?
The most commonly used metrics are Euclidean and Manhattan distance (for continuous
features) and Hamming distance (for categorical features).
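The three metrics can be computed in a few lines of numpy. A minimal sketch with hypothetical feature vectors:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, 3.0])

# Euclidean distance: square root of the sum of squared differences
euclidean = np.sqrt(np.sum((a - b) ** 2))

# Manhattan distance: sum of absolute differences
manhattan = np.sum(np.abs(a - b))

# Hamming distance: number of positions where two vectors differ
# (suited to categorical features)
p = np.array(["red", "small", "round"])
q = np.array(["red", "large", "round"])
hamming = np.sum(p != q)

print(euclidean, manhattan, hamming)
```

For a and b above, the Euclidean distance is √13 ≈ 3.61, the Manhattan distance is 5, and p and q differ in one position, so their Hamming distance is 1.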
HOW TO CHOOSE THE VALUE OF K?
There is no rule or formula to derive the value of K. One value of K may work
wonders on one data set but fail on another.
But we do have some guidelines:
•To begin with, you may choose K = the square root of the number of observations
in the data set. It is also advisable to choose an odd value of K to avoid
ties between the most frequent neighbor classes.
•Based on this value of K, you can run the K-NN algorithm on the test set and
evaluate the predictions using one of the many available metrics in machine learning.
•You may then try increasing and decreasing the value of K until you can't improve
the prediction accuracy any further.
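The first guideline above can be sketched as a small helper. A minimal example; the function name `initial_k` is illustrative, not from any library:

```python
import math

def initial_k(n_observations):
    """Heuristic starting point: K ≈ sqrt(n), nudged to an odd value."""
    k = max(1, round(math.sqrt(n_observations)))
    if k % 2 == 0:
        k += 1  # an odd K avoids ties between two most frequent classes
    return k

print(initial_k(100))  # sqrt(100) = 10, bumped to 11
print(initial_k(150))  # sqrt(150) ≈ 12.2, rounds to 12, bumped to 13
```

This only sets the starting point; the returned K would then be tuned up or down against test-set accuracy as described above.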
VALUE OF K – NOT VERY SMALL / LARGE
•If you choose a very small value of K, any outliers present in the
neighborhood of the data point in consideration will incorrectly influence
the classification result.
•On the other hand, a very large value of K defeats the whole purpose of the
K-NN algorithm: you may end up exploring data outside the neighborhood of the
point in consideration, which again leads to incorrect classifications.
HANDS ON K-NEAREST NEIGHBORS
https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning