
Class 8 - Linear Regression

The document provides an introduction to supervised learning and linear regression in machine learning, explaining key concepts such as overfitting, underfitting, and the distinction between supervised and unsupervised learning. It details linear regression, including the relationship between variables, measures of association, and the formulation of multiple regression models. Additionally, it discusses the advantages and disadvantages of linear regression, highlighting its simplicity and interpretability, as well as its limitations regarding linearity and sensitivity to outliers.

Uploaded by aman gupta


Intro to Supervised Learning and Linear Regression

– Topics
Machine Learning:
⮚ Intro to machine learning, learning from data.
⮚ Supervised and unsupervised learning; train-test data.
⮚ Overfitting and underfitting.

Linear Regression:
⮚ Linear relation between two variables, measures of association – correlation and
covariance.
⮚ A simple fit, best fit line – measure of a regression fit.
⮚ Multiple regression
⮚ R squared.
Machine Learning
⮚ The ability of a computer to do some task without being explicitly programmed.
⮚ The ability to do these tasks comes from an underlying model, which is the result of
the learning process.
⮚ The model is generated by learning from large volumes of data, large in both
breadth and depth, reflecting the real world in which the processes are
performed.

What do machine learning algorithms do?

⮚ Search through the data for patterns in the form of trends, cycles,
associations, etc.
⮚ Express these patterns as mathematical structures.
Supervised Machine Learning

⮚ A class of machine learning that works on externally supplied instances in the form of predictor attributes and
associated target values.

⮚ The target values are the ‘correct answers’ for the predictive model, which can be either a regression
model or a classification model (classifying data into classes).

⮚ The model learns from the training data, using these ‘correct answers’ (target values) as reference.

⮚ The model thus generated is used to make predictions on data it has not seen before.
⮚ Ex. 1: a model to predict the resale value of a car based on its mileage, age, color, etc.
⮚ Ex. 2: a model to determine the type of a tumor.

⮚ If the model does very well on the training data but fails on the test data (unseen data), overfitting is said
to have taken place. Conversely, if the model fails to capture the features of the training data itself, we term it
underfitting.
Linear Regression
⮚ The term “regression” generally refers to predicting a target value, usually a real
number, for a data point based on its attributes.

⮚ The term “linear” in linear regression refers to the fact that the method models the data as a linear
combination of the explanatory variables (attributes).

⮚ In the case of linear regression with a single explanatory variable, the linear combination can be
expressed as:
⮚ response = intercept + constant * explanatory variable
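As a minimal sketch with made-up numbers, the constant (slope) and intercept of this line can be estimated from the covariance and variance mentioned in the topics above:

```python
import numpy as np

# Hypothetical data: explanatory variable x and response y (made-up values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Least-squares estimates: slope = cov(x, y) / var(x)
slope = np.cov(x, y, bias=True)[0, 1] / np.var(x)
intercept = y.mean() - slope * x.mean()

# response = intercept + constant * explanatory variable
predictions = intercept + slope * x
```

This is the classic closed-form solution for a single explanatory variable; the same estimates are what a library least-squares fit would return.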
R squared (R²)
⮚ R², the coefficient of determination, measures how well a regression fits the data: it is the
proportion of the variance in the target that is explained by the model.
⮚ R² = 1 − SS_res / SS_tot, where SS_res is the residual sum of squares and SS_tot is the
total sum of squares; values closer to 1 indicate a better fit.
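A minimal sketch of how R² is computed, using made-up observed values and model predictions:

```python
import numpy as np

# Hypothetical observed values and model predictions (illustrative only)
y = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.8, 5.1, 7.2, 8.9])

ss_res = np.sum((y - y_hat) ** 2)     # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
r_squared = 1 - ss_res / ss_tot       # fraction of variance explained
```

A value near 1 means the predictions track the observations closely; a value near 0 means the model does no better than predicting the mean.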
Multiple regression
⮚ So far we have seen simple regression, where there is a single attribute (independent variable).
⮚ However, in the real world a data point has several important attributes, and they all need to be
accounted for when developing a regression model.

⮚ Ex: to predict the price of a house, we need to consider the various attributes of that house. Such a regression
problem is an example of multiple regression.
⮚ This can be represented as:

target = constant1*feature1 + constant2*feature2 + constant3*feature3 + … + intercept

The model aims to find the constants and intercept such that this line is the best fit.
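With made-up house data (the features and prices below are purely illustrative), the constants and intercept can be found with a least-squares solver:

```python
import numpy as np

# Hypothetical houses: feature1 = area, feature2 = number of rooms
X = np.array([[ 50.0, 2.0],
              [ 80.0, 3.0],
              [120.0, 4.0],
              [200.0, 5.0]])
y = np.array([150.0, 230.0, 340.0, 550.0])   # made-up prices

# Append a column of ones so the solver also estimates the intercept
X1 = np.column_stack([X, np.ones(len(X))])

# Solve for [constant1, constant2, intercept] in the least-squares sense
coeffs, *_ = np.linalg.lstsq(X1, y, rcond=None)

# target = constant1*feature1 + constant2*feature2 + intercept
predictions = X1 @ coeffs
```

The column of ones is the standard trick for folding the intercept into the same linear system as the feature constants.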
Pros and Cons of Linear Regression
Advantages
⮚ Simple to implement, and the output coefficients are easy to interpret.

Disadvantages
⮚ Assumes a linear relationship between the dependent and independent
variables.
⮚ Outliers can have a large effect on the regression.
⮚ Linear regression assumes independence between attributes.
