
Data Analytics and Visualization

IMBA III SEM

Prepared By
S Naresh Kumar
Department of Computer Science and Engineering
UNIT-III

 Regression Equations - Linear models, ordinary least squares

 Gaussian Processes regression


Machine Learning
Machine Learning is a concept that allows a machine to learn from
examples and experience without being explicitly programmed.
The main categories of machine learning are:
 Supervised
 Unsupervised
 Semi-supervised
 Reinforcement
(Figure: relationship among AI, ML and DL)
Commonly used machine learning algorithms include:

 Linear Regression
 Logistic Regression
 Decision Tree
 SVM
 Naive Bayes
 kNN
 K-Means
 Random Forest
 Dimensionality Reduction Algorithms
 Gradient Boosting algorithms
 GBM
 XGBoost
 LightGBM
 CatBoost
The machine learning process is mainly divided into two phases:
Training phase
Testing phase
Training Phase
You take a randomly selected sample of apples from the market (training
data) and make a table of the physical characteristics of each apple, such as
color, size, shape, which part of the country it was grown in, which vendor
sold it, etc. (features), along with the sweetness, juiciness and ripeness of
that apple (output variables).
You feed this data to the machine learning algorithm
(classification/regression), and it learns a model of the correlation
between an average apple's physical characteristics and its quality.
Testing Phase:
The next time you go shopping, you measure the characteristics of the
apples you are purchasing (test data) and feed them to the machine
learning algorithm.
It uses the model computed earlier to predict whether the apples are
sweet, ripe and/or juicy.
Internally, the algorithm may use rules similar to ones you could write
manually (e.g., a decision tree).
 Finally, you can now shop for apples with great confidence, without worrying
about the details of how to choose the best apples.
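A minimal sketch of these two phases, assuming purely hypothetical apple data and using a decision tree (as mentioned above); the feature encoding and values are made up for illustration only:

from sklearn.tree import DecisionTreeClassifier

# Training phase: each row describes one apple as [color_score, size_cm, region_code]
# (hypothetical, illustrative values)
X_train = [[7, 8.0, 1], [3, 6.5, 2], [8, 7.5, 1], [2, 6.0, 3]]
y_train = ["sweet", "not sweet", "sweet", "not sweet"]  # observed quality labels

model = DecisionTreeClassifier().fit(X_train, y_train)  # learn features -> quality

# Testing phase: measure the apples you are about to buy and predict their quality
X_test = [[6, 7.8, 1], [2, 6.2, 3]]
print(model.predict(X_test))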
Linear regression

• Linear regression is a linear model.
• Linear regression methods attempt to solve the regression problem
by making the assumption that the dependent variable is (at least to
some approximation) a linear function of the independent variables.
• That is, the model assumes a linear relationship between the input
variables (x) and the single output variable (y); more specifically,
that y can be calculated from a linear combination of the input
variables (x).
A linear regression line has an equation of the form
Y = a + bX
where X is the explanatory (independent) variable and Y is the dependent
variable. The slope of the line is b (the regression coefficient), and a is the
intercept (constant).

The regression's dependent variable may also be called the outcome variable,
criterion variable, endogenous variable, or regressand.
The independent variables can be called exogenous variables, predictor
variables, or regressors.
Regression can be simple linear regression or multiple linear regression.

• When a single input variable (x) is used to predict the value of the
dependent variable, the method is referred to as simple linear regression:
Y = c + bX

• When multiple input variables are used to predict the value of the
dependent variable, the method is referred to as multiple linear
regression, e.g.
Y = c + b1X1 + b2X2 + … + bnXn
(a brief sketch follows this list)

• The difference between the two is the number of independent
variables; in both cases there is only a single dependent variable.
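A brief sketch of multiple linear regression with two hypothetical input variables, using scikit-learn's LinearRegression (the data below is made up purely for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: two input variables (columns) and one output variable
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]])
y = np.array([8, 7, 17, 16, 23])

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)  # b1, b2 and c in Y = c + b1*X1 + b2*X2

# Predict the output for a new observation
print(model.predict([[6, 4]]))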
The core idea is to obtain a line that best fits the data. The best-fit line is the one
for which the total prediction error (over all data points) is as small as possible.
Error is the distance from a data point to the regression line (least-squares error).
Least-Squares Regression
The most common method for fitting a regression line (the best-fit line) is the
method of least squares. This method calculates the best-fitting line for the
observed data by minimizing the sum of the squares of the vertical deviations
from each data point to the line. In scikit-learn:

from sklearn import linear_model
regr = linear_model.LinearRegression()
The least-squares regression line for a set of N data points is given by the equation of a line in slope-intercept
form:

y = mx + b
Steps to find the line of best fit for N points:

Step 1: For each (x, y) point, calculate x² and xy.
Step 2: Sum all x, y, x² and xy, which gives us Σx, Σy, Σx² and Σxy
(Σ means "sum up").
Step 3: Calculate the slope m:
m = (N Σ(xy) − Σx Σy) / (N Σ(x²) − (Σx)²)
(N is the number of points.)
Step 4: Calculate the intercept b:
b = (Σy − m Σx) / N
Step 5: Assemble the equation of the line:
y = mx + b
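A small sketch of these steps in plain Python, applied to a handful of made-up points (the data is illustrative only):

# Hypothetical data points
xs = [2, 3, 5, 7, 9]
ys = [4, 5, 7, 10, 15]
N = len(xs)

# Steps 1-2: the required sums
sum_x  = sum(xs)
sum_y  = sum(ys)
sum_x2 = sum(x * x for x in xs)
sum_xy = sum(x * y for x, y in zip(xs, ys))

# Step 3: slope m
m = (N * sum_xy - sum_x * sum_y) / (N * sum_x2 - sum_x ** 2)

# Step 4: intercept b
b = (sum_y - m * sum_x) / N

# Step 5: the fitted line y = mx + b
print(f"y = {m:.3f}x + {b:.3f}")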
Problem - 1 :
Problem 2
Problem 3
Advantages of Linear Regression

1. Linear Regression performs well when the relationship between the
variables is (approximately) linear. We can use it to find the nature of the
relationship among the variables.

2. Linear Regression is easy to implement and interpret, and is very efficient
to train.

3. Linear Regression is prone to over-fitting, but this can be avoided using
dimensionality reduction techniques, regularization (L1 and L2) and
cross-validation (a brief sketch of L2 regularization follows this list).
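A minimal sketch of L2 regularization using scikit-learn's Ridge regression; the data and the alpha value below are hypothetical and only illustrate how regularization is applied:

import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical data: more features than are really needed, which invites over-fitting
rng = np.random.RandomState(0)
X = rng.rand(20, 5)
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.1, size=20)

# Ridge adds an L2 penalty on the coefficients; alpha controls its strength
model = Ridge(alpha=1.0).fit(X, y)
print(model.coef_)  # coefficients are shrunk toward zero compared with plain OLS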
Disadvantages of Linear Regression

1. Linear regression requires the dependent variable to be a continuous
numerical variable.

2. Linear regression is limited to linear relationships between the dependent
variable and the independent variables. In the real world, data rarely follow
an exactly linear pattern; assuming a straight-line relationship between the
dependent and independent variables is often incorrect.

3. Non-linear data cannot be fitted well, so you need to first determine whether
the relationship between the variables is linear.

4. Prone to outliers: linear regression is very sensitive to outliers (anomalies).
E.g., if most of your data lies in the range (20, 50) on the x-axis but you have one
or two points out at x = 200, this can significantly swing your regression results
(a small sketch after this list illustrates the effect).

5. Prone to multicollinearity: before applying linear regression,
multicollinearity should be removed (e.g., using dimensionality reduction
techniques), because the model assumes there is no relationship among the
independent variables.

6. Prone to noise and overfitting: if the number of observations is smaller than
the number of features, linear regression should not be used; otherwise it may
overfit, because it starts fitting noise while building the model.
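A small sketch, with made-up numbers, of how a single outlier can swing the fitted line (the values mirror the example above: most x values lie between 20 and 50, with one point at x = 200):

import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([20, 25, 30, 35, 40, 45, 50]).reshape(-1, 1)
y = 2 * x.ravel() + 1                      # hypothetical, perfectly linear data

clean = LinearRegression().fit(x, y)

# Add one outlier far out on the x-axis with an unusually low y value
x_out = np.vstack([x, [[200]]])
y_out = np.append(y, 50)

with_outlier = LinearRegression().fit(x_out, y_out)

print(clean.coef_[0], with_outlier.coef_[0])  # the slope changes noticeably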
Gaussian Processes (GP)
Gaussian Processes (GP) are a generic supervised learning method
designed to solve regression and probabilistic classification problems.
•A Gaussian process is a probability distribution over possible functions.

The normal distribution, also known as the Gaussian distribution, is a
probability distribution that is symmetric about the mean, showing that
data near the mean are more frequent in occurrence than data far from
the mean. In graph form, a normal distribution appears as a bell curve.
In probability theory, a normal (or Gaussian or Gauss or Laplace–
Gauss) distribution is a type of continuous probability distribution for a
real-valued random variable. The general form of its
probability density function is

f(x) = (1 / (σ√(2π))) · exp( −(x − μ)² / (2σ²) )

The parameter μ is the mean or expectation of the distribution (and also its
median and mode), while the parameter σ is its standard deviation.[1] The
variance of the distribution is σ².[2] A random variable with a Gaussian
distribution is said to be normally distributed, and is called a normal
deviate.
A normal-distribution plot draws the curve between the x values and the
corresponding y values, whereas the Gaussian-distribution view draws the
curve with the random variable x on one axis and the corresponding PDF
values on the other.
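As a quick illustration, the density formula above can be evaluated directly and cross-checked against SciPy (μ and σ below are arbitrary example values):

import numpy as np
from scipy.stats import norm

mu, sigma = 0.0, 1.0                      # example mean and standard deviation
x = np.linspace(-4, 4, 9)

# Evaluate the PDF straight from the formula
pdf_manual = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

# Cross-check against SciPy's implementation
pdf_scipy = norm.pdf(x, loc=mu, scale=sigma)
print(np.allclose(pdf_manual, pdf_scipy))  # True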

Gaussian processes are computationally expensive


The advantages of Gaussian processes are:
•The prediction interpolates the observations (at least for regular kernels).
•The prediction is probabilistic (Gaussian) so that one can compute
empirical confidence intervals and decide based on those if one should refit
(online fitting, adaptive fitting) the prediction in some region of interest.
•Versatile: different kernels can be specified. Common kernels are provided,
but it is also possible to specify custom kernels.
The disadvantages of Gaussian processes include:
•They are not sparse, i.e., they use the whole samples/features information
to perform the prediction.
•They lose efficiency in high-dimensional spaces, namely when the
number of features exceeds a few dozen.
Gaussian processes are a non-parametric method.
Parametric approaches distill knowledge about the training data
into a set of numbers. For linear regression this is just two numbers, the
slope and the intercept, whereas other approaches like neural networks
may have tens of millions. This means that after they are trained, the cost of
making predictions depends only on the number of parameters.
However, as Gaussian processes are non-parametric (although
kernel hyperparameters blur the picture), they need to take the whole
training data into account each time they make a prediction. This means not
only that the training data has to be kept at inference time, but also that
the computational cost of predictions scales (cubically!) with the
number of training samples.
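A minimal sketch of Gaussian process regression with scikit-learn, assuming a simple RBF kernel and tiny made-up 1-D data; return_std=True gives the probabilistic (mean ± standard deviation) predictions mentioned above:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Hypothetical 1-D training data
X_train = np.array([[1.0], [3.0], [5.0], [6.0], [8.0]])
y_train = np.sin(X_train).ravel()

# Fit a GP with an RBF kernel; alpha adds a small noise term for numerical stability
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-4)
gpr.fit(X_train, y_train)

# Probabilistic predictions: mean and standard deviation at new points
X_new = np.array([[2.0], [4.0], [7.0]])
mean, std = gpr.predict(X_new, return_std=True)
print(mean)                                   # predicted values
print(mean - 1.96 * std, mean + 1.96 * std)   # approximate 95% confidence intervals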
