Class 8 - Linear Regression
Topics
Machine Learning:
⮚ Intro to machine learning, learning from data.
⮚ Supervised and unsupervised learning, train-test data.
⮚ Overfitting and underfitting
Linear Regression:
⮚ Linear relation between two variables, measures of association – correlation and
covariance.
⮚ A simple fit, best fit line – measure of a regression fit.
⮚ Multiple regression
⮚ R squared.
Machine Learning
⮚ The ability of a computer to do some task without being explicitly programmed.
⮚ The ability to do the task comes from the underlying model, which is the result of
the learning process.
⮚ The model is generated by learning from a huge volume of data, huge in both
breadth and depth, reflecting the real world in which the processes are
performed.
⮚ Supervised learning is the class of machine learning that works on externally supplied instances in the form of
predictor attributes and associated target values.
⮚ The target values are the ‘correct answers’ for the predictive model, which can be either a regression
model or a classification model (classifying data into classes).
⮚ The model learns from the training data using these ‘correct answers’ (target values) as a reference.
⮚ The model thus generated is used to make predictions about data not seen by the model before.
⮚ Ex1: a model to predict the resale value of a car based on its mileage, age, color, etc.
⮚ Ex2: a model to determine the type of a tumor.
⮚ If the model does very well on the training data but fails on the test data (unseen data), overfitting is said
to have taken place. However, if the model does not capture the features of the training data itself, we term it
underfitting. The sketch below illustrates how a train/test split exposes overfitting.
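A minimal sketch of checking for overfitting with a train/test split, assuming scikit-learn and NumPy are installed (the synthetic data and the polynomial degrees are illustrative assumptions, not part of the class material):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Synthetic data: the true relation is linear with some noise.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = 0.5 * X.ravel() + rng.normal(scale=1.0, size=60)

# Hold out 30% of the data as unseen test data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# A very flexible model (degree 15) can score well on the training data but
# worse on the test data (overfitting); degree 1 behaves similarly on both.
for degree in (1, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(f"degree={degree:2d}  train R^2={model.score(X_train, y_train):.2f}"
          f"  test R^2={model.score(X_test, y_test):.2f}")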
Linear Regression
⮚ The term “regression” refers to predicting a target value, usually a real number, for a data point
based on its attributes.
⮚ The term “linear” in linear regression refers to the fact that the method models the data as a linear
combination of the explanatory variables (attributes).
⮚ In the case of linear regression with a single explanatory variable, the linear combination can be
expressed as:
⮚ response = intercept + constant * explanatory variable
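A minimal sketch of fitting this single-variable form with least squares, assuming NumPy (the synthetic data, and the intercept and constant used to generate it, are illustrative assumptions):

import numpy as np

# Synthetic data: response = 2 + 3 * explanatory variable, plus noise.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 3.0 * x + rng.normal(scale=2.0, size=50)

# A degree-1 polynomial fit is exactly the least-squares line.
constant, intercept = np.polyfit(x, y, deg=1)
print(f"intercept ~ {intercept:.2f}, constant (slope) ~ {constant:.2f}")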
R^2
⮚ R squared (R^2), the coefficient of determination, measures how well the regression fits the data: it is the
fraction of the variance in the response that the model explains.
⮚ R^2 = 1 - (sum of squared residuals / total sum of squares), with values close to 1 indicating a good fit.
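A minimal sketch of computing R squared for a simple fit, assuming NumPy (the small data set here is made up purely for illustration):

import numpy as np

# A small illustrative data set with a roughly linear trend.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

# Fit the line and compute its predictions.
slope, intercept = np.polyfit(x, y, deg=1)
predictions = intercept + slope * x

# R^2 = 1 - (sum of squared residuals / total sum of squares).
ss_res = np.sum((y - predictions) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
print(f"R^2 = {1 - ss_res / ss_tot:.3f}")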
Multiple regression
⮚ Till now we have seen simple regression, where we have one attribute or independent variable.
⮚ However, in the real world a data point has various important attributes, and they need to be
accounted for while developing a regression model.
⮚ Ex: to predict the price of a house, we need to consider its various attributes. Such a regression
problem is an example of multiple regression.
⮚ This can be represented by:
response = intercept + constant1 * attribute1 + constant2 * attribute2 + ... + constantN * attributeN
The model aims to find the constants and the intercept such that this line is the best fit.
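A minimal sketch of a multiple regression along the lines of the house-price example, assuming scikit-learn and NumPy (the attributes, coefficients, and prices below are synthetic assumptions, not real data):

import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic house data: area (sq. m), number of bedrooms, and age (years).
rng = np.random.default_rng(2)
n = 200
area = rng.uniform(50, 250, size=n)
bedrooms = rng.integers(1, 6, size=n)
age = rng.uniform(0, 40, size=n)
X = np.column_stack([area, bedrooms, age])

# Price generated from a known linear combination plus noise.
price = 50_000 + 1_200 * area + 8_000 * bedrooms - 900 * age \
        + rng.normal(scale=10_000, size=n)

# Fit the multiple regression and inspect the recovered intercept and constants.
model = LinearRegression().fit(X, price)
print("intercept:", round(model.intercept_, 1))
print("constants (coefficients):", np.round(model.coef_, 1))
print("R^2 on the training data:", round(model.score(X, price), 3))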
Pros and Cons of Linear Regression
Advantages
⮚ Simple to implement, and the output coefficients are easy to interpret.
Disadvantages
⮚ Assumes a linear relationship between the dependent and independent
variables.
⮚ Outliers can have a huge effect on the regression (see the sketch below).
⮚ Linear regression assumes independence between the attributes.
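A minimal sketch of the outlier sensitivity mentioned above, assuming NumPy (the data is deliberately artificial so the effect is easy to see):

import numpy as np

# Perfectly linear data: y = 2x + 1.
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0
slope_clean, _ = np.polyfit(x, y, deg=1)

# Corrupt a single point and refit: the slope shifts noticeably.
y_outlier = y.copy()
y_outlier[-1] += 50.0
slope_outlier, _ = np.polyfit(x, y_outlier, deg=1)

print(f"slope without outlier: {slope_clean:.2f}")
print(f"slope with one outlier: {slope_outlier:.2f}")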