Linear Regression
Linear Regression
Supervised learning
● Supervised learning trains machines using labeled data, where each input has a corresponding
correct output. This labeled data acts like a teacher guiding the machine to learn the
relationship between inputs and outputs, enabling it to predict outputs for new, unseen data.
● The aim of a supervised learning algorithm is to find a mapping function to map the input
variable(x) with the output variable(y).
● In the real-world, supervised learning can be used for Risk Assessment, Image
classification, Fraud Detection, spam filtering, etc.
Types of supervised Machine learning Algorithms:
What is Linear Regression?
● Linear Regression is a key data science tool for predicting continuous outcomes
● Linear regression predicts the relationship between two variables by assuming they
have a straight-line connection.
● It finds the best line that minimizes the differences between predicted and actual
values.
Simple Linear Regression involves one independent variable, while Multiple Linear
Regression involves two or more independent variables.
Simple Linear Regression
● In a simple linear regression, there is one independent variable and one dependent variable.
● The model estimates the slope and intercept of the line of best fit, which represents the
relationship between the variables.
● The slope represents the change in the dependent variable for each unit change in the
independent variable, while the intercept represents the predicted value of the dependent
variable when the independent variable is zero.
Equation of a Line: The relationship in simple linear regression (with one independent variable)
is described by the equation:
Steps to Perform Linear Regression
Dataset
1 50
2 55
3 65
4 70
5 75
https://docs.google.com/spreadsheets/d/1yyMW4g9jkmzHGOMKuGnFoF7BsL1
qsQc0HPR2oBoq9zY/edit?usp=sharing
Mean SLOPE
What is the Best Fit Line?
In simple terms, the best-fit line is a line that best fits the given scatter plot. Mathematically, you
obtain the best-fit line by minimizing the Residual Sum of Squares (RSS).
The MSE is the average of the squared differences between the actual values and the predicted
values. It measures the average of the squares of the errors.
R-squared (R²)
The R-squared value of approximately 0.9855 indicates that approximately 98.55% of the
variance in test scores is explained by the number of hours studied. This suggests a very good
fit of the model to the data.
Gradient Descent for Linear Regression
Gradient descent is an optimization algorithm used to minimize the cost function in linear
regression. It iteratively adjusts the model parameters to find the minimum value of the cost
function.
Cost Function
The cost function (also called the loss function) for linear regression is the Mean Squared Error
(MSE):
https://colab.research.google.com/drive/1hcrChy5_KIjn2JstJ4QPLTvn51gC
VbSx?usp=sharing
Implementation in Python
import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
# Your data
hours_studied = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
# Reshape to 2D array
test_scores = np.array([50, 55, 65, 70, 75])
# Make predictions
predicted_scores = model.predict(hours_studied)
Output:
# Print coefficients Coefficient: [6.5]
print("Coefficient:" , model.coef_)
Intercept: 43.5
print("Intercept:" , model.intercept_)