AI Lab 7
Practical No. 7
To perform Linear Regression using Python
Student’s Roll no: _______________ Points Scored: __________________________
OBJECTIVES: Upon successful completion of this practical, the students will be able to:
Understand linear regression and its basic concepts.
Understand types of linear regression.
Build a machine learning model to solve a linear regression problem for a given dataset.
Regression
What Is Regression?
Regression searches for relationships among variables. For example, you can observe several
employees of some company and try to understand how their salaries depend on their features,
such as experience, education level, role, city of employment, and so on.
This is a regression problem where data related to each employee represents one observation. The
presumption is that the experience, education, role, and city are the independent features, while the
salary depends on them.
Similarly, you can try to establish the mathematical dependence of housing prices on area, number
of bedrooms, distance to the city center, and so on.
Generally, in regression analysis, you consider some phenomenon of interest and have a number of
observations. Each observation has two or more features. Following the assumption that at least
one of the features depends on the others, you try to establish a relation among them.
In other words, you need to find a function that maps some features or variables to others
sufficiently well.
The dependent features are called the dependent variables, outputs, or responses. The independent
features are called the independent variables, inputs, regressors, or predictors.
Regression problems usually have one continuous and unbounded dependent variable. The inputs,
however, can be continuous, discrete, or even categorical data such as gender, nationality, or brand.
It’s a common practice to denote the outputs with 𝑦 and the inputs with 𝑥. If there are two or more
independent variables, then they can be represented as the vector 𝐱 = (𝑥₁, …, 𝑥ᵣ), where 𝑟 is the
number of inputs.
Regression is also useful when you want to forecast a response using a new set of predictors. For
example, you could try to predict electricity consumption of a household for the next hour given the
outdoor temperature, time of day, and number of residents in that household.
Linear Regression
Linear regression is an algorithm that models a linear relationship between an independent
variable and a dependent variable to predict the outcome of future events. It is a statistical method
used in data science and machine learning for predictive analysis.
The figure above shows the relationship between the quantity of apples and the cost price. How
much do you need to pay for 7 kg of apples? It's easy: if 1 kg costs $5, then 7 kg cost
7 * 5 = $35. Alternatively, you can draw a perpendicular line from the point 7 on the x-axis until it
touches the line and read the corresponding value on the y-axis, as shown by the green
dotted line on the graph. But we are going to solve it using the formula of a linear equation, y = mx + b.
Now, if we have to find the price of 9.5 kg of apples, then according to our model mx + b = 5 * 9.5 + 0
= $47.5 is the answer. By now you might have understood that m and b are the main ingredients of
the linear equation; in other words, m and b are called the parameters.
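As a quick sanity check, the same calculation can be written in a few lines of Python. The slope of 5 ($/kg) and zero intercept are simply the values from the example above:

# Apple-price model from the example: y = m*x + b with m = 5 ($/kg), b = 0
m, b = 5, 0

def price(kg):
    return m * kg + b

print(price(7))    # 35.0  -> $35 for 7 kg
print(price(9.5))  # 47.5  -> $47.5 for 9.5 kg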
A company named ABC provides you with data on house sizes and their prices. The company
requires a machine learning model that can predict the house price for any given size. Let's
say, what would be the best-estimated price for an area of 3000 square feet? If you are thinking of fitting a
line somewhere through the dataset, drawing a vertical line from 3000 on the x-axis until it
touches the line, and taking the corresponding value on the y-axis, i.e. 470, as the answer, then
you are on the right track; it is represented by the green dotted line in the figure below.
Let's do it another way: if we can find the equation of the line y = mx + b that fits the data,
represented by the blue inclined line, then we can easily build a model that predicts the housing
price for any given area. In machine learning lingo, the function y = mx + b is also called a hypothesis
function, where m and b can be represented by theta1 and theta0 respectively. theta0 is also
called the bias term, and theta1, theta2, ... are called weights.
See the blue line in the picture above. By taking any two samples that touch or lie very close to the line,
we can find theta1 (slope) = 0.132 and theta0 (intercept) = 80, as shown in the figure. Now we can
use our hypothesis function to predict the housing price for a size of 3000 square feet, i.e. 80 + 3000 * 0.132 =
476. So $476,000 could be the best-estimated price for a house of 3000 square feet, and this could
be a reasonable way to prepare a machine learning model when you have just 50 samples and
only one feature (size).
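The same hand-fitted hypothesis can be expressed directly in Python; the parameter values below (theta0 = 80, theta1 = 0.132, prices in thousands of dollars) are just the ones read off the figure above:

# Hand-fitted hypothesis h(x) = theta0 + theta1 * x (price in $1000s, size in sq. ft)
theta0, theta1 = 80, 0.132

def predict_price(size_sqft):
    return theta0 + theta1 * size_sqft

print(predict_price(3000))  # 476.0 -> about $476,000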
Simple linear regression
Simple linear regression reveals the correlation between a dependent variable (output) and an
independent variable (input). Primarily, this regression type describes the following:
Relationship strength between the given variables.
Example: The relationship between pollution levels and rising temperatures.
The value of the dependent variable based on the value of the independent variable.
Example: The value of the pollution level at a specific temperature.
Multiple linear regression
Multiple linear regression establishes the relationship between independent variables (two or
more) and the corresponding dependent variable. Here, the independent variables can be either
continuous or categorical. This regression type helps foresee trends, determine future values, and
predict the impacts of changes.
Example: Consider the task of calculating blood pressure. In this case, height, weight, and amount
of exercise can be considered independent variables. Here, we can use multiple linear regression to
analyze the relationship between the three independent variables and one dependent variable, as
all the variables considered are quantitative.
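A minimal sketch of multiple linear regression with scikit-learn, using made-up blood-pressure data purely to illustrate the idea (the feature values and units below are hypothetical, not part of the handout):

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: height (cm), weight (kg), exercise (hours/week) -> blood pressure
X = np.array([[170, 65, 3],
              [160, 70, 1],
              [180, 90, 0],
              [175, 75, 5],
              [165, 60, 4]])
y = np.array([118, 125, 140, 120, 115])

model = LinearRegression()
model.fit(X, y)

print(model.coef_)                    # one weight (theta) per independent variable
print(model.intercept_)               # the bias term theta0
print(model.predict([[172, 68, 2]]))  # predicted blood pressure for a new person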
The MSE, MAE, RMSE, and R-Squared metrics are mainly used to evaluate the prediction error rates
and model performance in regression analysis.
MAE (Mean Absolute Error) represents the difference between the original and predicted values,
computed by averaging the absolute differences over the data set.
MSE (Mean Squared Error) represents the difference between the original and predicted values,
computed by averaging the squared differences over the data set.
RMSE (Root Mean Squared Error) is the square root of the MSE.
R-squared (coefficient of determination) represents how well the predicted values fit
the original values. It ranges from 0 to 1 and can be interpreted as a percentage: the higher the
value, the better the model.
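All four metrics are available in (or easily derived from) sklearn.metrics. A small sketch, assuming y_test holds the true values and y_pred the model's predictions (the numbers below are made up):

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_test = np.array([3.0, 2.5, 4.0, 5.1])   # hypothetical true values
y_pred = np.array([2.8, 2.9, 4.2, 4.8])   # hypothetical predictions

mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)                        # RMSE is simply the square root of MSE
r2 = r2_score(y_test, y_pred)

print(mae, mse, rmse, r2)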
Linear Regression using Python.
Steps:
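The exact steps depend on the dataset used in the lab, but a typical scikit-learn workflow looks roughly like the sketch below. The file name housing.csv and the column names size and price are placeholders, not part of the original handout:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# 1. Load the dataset (placeholder file and column names)
df = pd.read_csv("housing.csv")
X = df[["size"]]          # independent variable(s)
y = df["price"]           # dependent variable

# 2. Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 3. Fit the linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# 4. Predict and evaluate
y_pred = model.predict(X_test)
print("Coefficients:", model.coef_, "Intercept:", model.intercept_)
print("MSE:", mean_squared_error(y_test, y_pred))
print("R-squared:", r2_score(y_test, y_pred))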
Lab Tasks
1. Perform linear regression on the Boston house price prediction dataset. Add screenshots of all steps.
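Note: load_boston was removed from scikit-learn in version 1.2, so depending on the installed version you may need to load the raw data yourself. One possible sketch, assuming the dataset is still hosted at its original CMU URL:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Load the Boston housing data from its original source (assumes the URL is still reachable)
url = "http://lib.stat.cmu.edu/datasets/boston"
raw = pd.read_csv(url, sep=r"\s+", skiprows=22, header=None)
X = np.hstack([raw.values[::2, :], raw.values[1::2, :2]])  # 13 feature columns
y = raw.values[1::2, 2]                                    # median house value (target)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression().fit(X_train, y_train)
print("R-squared on the test set:", model.score(X_test, y_test))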
The End