Dav Exp

The document outlines an experiment focused on implementing Simple Linear Regression using Python and R, with the aim of enabling students to apply various regression techniques for prediction. It includes detailed instructions for data handling, regression analysis, and evaluation metrics such as R squared and RMSE. Additionally, it provides assessment criteria for students' performance and concludes with questions related to real-world applications of regression and statistical concepts.

Uploaded by

justtrial748

Experiment No.

TE (AI&DS) ROLL NO :100

Date of Implementation: 14.02.2024

Aim: Implement Simple Linear Regression in Python / R

Programming Language Used : Python / R, RStudio

Upon completion of this experiment, students will be able to


LO2: Implement various Regression techniques for prediction.

Indicator | Poor | Average | Good
Timeline: Maintains submission deadline (3) | Submission not done (0) | One or more than one week late (1-2) | Maintains deadline (3)
Completion and Organization (3) | N/A | Document is just acceptable (1-2) | Completed whole document and neatly organized (3)
Program Performance (2) | Could not perform at all (0) | Implemented few parts (1) | Full implementation (2)
Knowledge: In-depth knowledge of the Experiment (2) | Unable to answer the questions (0) | Unable to answer few questions (1) | Able to answer all questions (2)

Assessment Marks:
Timeline

Completion and
Organization

Program Performance

Knowledge
Total: (Out of 15)

Teacher’s Sign: Student Sign:

EXPERIMENT 3

Aim: Implement Simple Linear Regression in Python / R

Tools: Python, R

Theory

Simple linear regression is used to estimate the relationship between two quantitative variables. You can use simple linear regression when you want to know:
1. How strong the relationship is between two variables (e.g., the relationship between rainfall and soil erosion).
2. The value of the dependent variable at a certain value of the independent variable (e.g., the amount of soil erosion at a certain level of rainfall).

Regression models describe the relationship between variables by fitting a line to the observed data. Linear regression models use a straight line, while logistic and nonlinear regression models use a curved line. Regression allows you to estimate how a dependent variable changes as the independent variable(s) change.

Simple linear regression is a parametric method, meaning that it makes certain assumptions about the data. These assumptions are:
● Homogeneity of variance: the spread of the prediction errors is roughly constant across values of the independent variable
● Independence of observations
● Normality: the residuals follow a normal distribution
● Linearity: the relationship between the independent and dependent variable is linear
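The least-squares formulas that the implementation in this experiment relies on can be written out as a short worked summary. The notation here (x_i, y_i for the observations, e_i for the error term, n for the number of observations) is an assumption chosen for illustration; b0 and b1 follow the names used in the tasks.

```latex
\begin{align}
y_i &= b_0 + b_1 x_i + e_i \\
b_1 &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad b_0 = \bar{y} - b_1 \bar{x} \\
\hat{y}_i &= b_0 + b_1 x_i,
\qquad \mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \\
R^2 &= 1 - \frac{\mathrm{RSS}}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad \mathrm{RMSE} = \sqrt{\frac{\mathrm{RSS}}{n}}
\end{align}
```

Here RMSE divides by n, not n - 2, which matches taking the square root of the mean squared residual.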
Implementation

1. Perform the given task in Python
Consider the following data:
X: 0 1 2 3 4 5 6 7 8 9
y: 1 3 2 5 7 8 8 9 10 12

● Print first 5 rows of data
● Display scatter plot of data
● Calculate (using formula) and print regression coefficients b0 and b1
● Display regression line equation
● Calculate and print coefficient of determination (R squared), Residual sum of squares (RSS), and RMSE
● Plot regression line
● Predict the value of y given x=10

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error

# Given data
X = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

data = pd.DataFrame({'X': X, 'y': y})

# Print first 5 rows of data
print("First 5 rows of data:")
print(data.head())

# Scatter plot of the data
plt.scatter(X, y, color='blue')
plt.title("Scatter Plot of X vs y")
plt.xlabel("X")
plt.ylabel("y")
plt.show()

# Regression coefficients from the least-squares formulas
mean_X = np.mean(X)
mean_y = np.mean(y)
b1 = np.sum((X - mean_X) * (y - mean_y)) / np.sum((X - mean_X)**2)
b0 = mean_y - b1 * mean_X

print(f"Regression coefficients: b0 = {b0:.2f}, b1 = {b1:.2f}")

# Regression line equation
regression_equation = f"y = {b0:.2f} + {b1:.2f} * X"
print(f"Regression Line Equation: {regression_equation}")

# Fitted values on the observed X
y_pred = b0 + b1 * X

# Evaluation metrics: RSS, R squared, RMSE
RSS = np.sum((y - y_pred)**2)
total_sum_of_squares = np.sum((y - mean_y)**2)
r_squared = 1 - (RSS / total_sum_of_squares)
rmse = np.sqrt(mean_squared_error(y, y_pred))

print(f"R squared: {r_squared:.4f}")
print(f"Residual Sum of Squares (RSS): {RSS:.4f}")
print(f"Root Mean Squared Error (RMSE): {rmse:.4f}")

# Data points with the fitted regression line
plt.scatter(X, y, color='blue', label='Data Points')
plt.plot(X, y_pred, color='red', label='Regression Line')
plt.title("Regression Line")
plt.xlabel("X")
plt.ylabel("y")
plt.legend()
plt.show()

# Predict y for x = 10
x_new = 10
y_new = b0 + b1 * x_new
print(f"Predicted value of y for x = 10: y = {y_new:.2f}")
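As a cross-check on the hand-computed coefficients, the same line can be fitted with NumPy's built-in routines. The sketch below is not part of the assigned task; it assumes only the ten data points given above and uses `np.polyfit` and `np.corrcoef` as an independent reference. The names `slope`, `intercept` and `r` are illustrative.

```python
import numpy as np

# Data from task 1
X = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

# Manual least-squares estimates, same formulas as in the task
mean_X, mean_y = X.mean(), y.mean()
b1 = np.sum((X - mean_X) * (y - mean_y)) / np.sum((X - mean_X) ** 2)
b0 = mean_y - b1 * mean_X

# np.polyfit returns coefficients highest degree first: [slope, intercept]
slope, intercept = np.polyfit(X, y, deg=1)

# Pearson correlation coefficient between X and y
r = np.corrcoef(X, y)[0, 1]

# R squared computed from the residuals of the manual fit
y_pred = b0 + b1 * X
r_squared = 1 - np.sum((y - y_pred) ** 2) / np.sum((y - mean_y) ** 2)

print(f"manual : b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"polyfit: b0 = {intercept:.4f}, b1 = {slope:.4f}")
print(f"r^2 = {r ** 2:.4f}, R squared = {r_squared:.4f}")
```

In simple linear regression the squared correlation coefficient equals the coefficient of determination, so the two values on the last line should agree up to rounding.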
2. Perform the above task in R
# Given data
X <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
y <- c(1, 3, 2, 5, 7, 8, 8, 9, 10, 12)

data <- data.frame(X, y)

# Print first 5 rows of data
cat("First 5 rows of data:\n")
print(head(data, 5))

# Scatter plot of the data
plot(X, y, main = "Scatter Plot of X vs y", xlab = "X", ylab = "y",
     pch = 19, col = "blue")

# Regression coefficients from the least-squares formulas
mean_X <- mean(X)
mean_y <- mean(y)
b1 <- sum((X - mean_X) * (y - mean_y)) / sum((X - mean_X)^2)
b0 <- mean_y - b1 * mean_X

cat("Regression coefficients: b0 =", round(b0, 2), ", b1 =", round(b1, 2), "\n")

# Regression line equation
regression_equation <- paste("y =", round(b0, 2), "+", round(b1, 2), "* X")
cat("Regression Line Equation:", regression_equation, "\n")

# Fitted values on the observed X
y_pred <- b0 + b1 * X

# Evaluation metrics: RSS, R squared, RMSE
RSS <- sum((y - y_pred)^2)
total_sum_of_squares <- sum((y - mean_y)^2)
r_squared <- 1 - (RSS / total_sum_of_squares)
rmse <- sqrt(mean((y - y_pred)^2))

cat("R squared:", round(r_squared, 4), "\n")
cat("Residual Sum of Squares (RSS):", round(RSS, 4), "\n")
cat("Root Mean Squared Error (RMSE):", round(rmse, 4), "\n")

# Add the regression line to the scatter plot
abline(a = b0, b = b1, col = "red")

# Use the regression formula to predict y for x = 10
x_new <- 10
y_new <- b0 + b1 * x_new
cat("Predicted value of y for x = 10: y =", round(y_new, 2), "\n")


3. Perform the following task using Python
● Import the packages numpy and pandas, and the module sklearn.linear_model
● Read the data set (advertising.csv)
● Select the column ‘TV’ as independent variable and ‘sales’ as dependent variable
● Divide the data into training and testing splits
● Create an instance of the class LinearRegression, which will represent the regression model
● Fit the model on the training data
● Get the regression coefficients and the coefficient of determination from the model
● Apply the model for predictions on the testing data
● Show the residual error plot for training data (green dots), testing data (blue dots) and the zero residual error line
4. Perform the above task using R (use the lm, summary, predict and plot functions)
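The steps of task 3 can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not a reference solution: the helper names `fit_tv_sales` and `plot_residuals`, the 75/25 split and the random seed are illustrative choices, and because advertising.csv is not included here, the demo runs on a synthetic placeholder frame with the same two column names. With the real file, the frame would instead come from `pd.read_csv("advertising.csv")`.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def fit_tv_sales(df, x_col="TV", y_col="sales", test_size=0.25, seed=42):
    """Fit sales ~ TV on a training split and collect results (hypothetical helper)."""
    X = df[[x_col]]          # 2-D feature frame, as scikit-learn expects
    y = df[y_col]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed)

    model = LinearRegression()
    model.fit(X_train, y_train)

    pred_train = model.predict(X_train)
    pred_test = model.predict(X_test)
    return {
        "intercept": model.intercept_,
        "slope": model.coef_[0],
        "r2_train": model.score(X_train, y_train),  # coefficient of determination
        "r2_test": model.score(X_test, y_test),
        "pred_train": pred_train,
        "pred_test": pred_test,
        "resid_train": y_train.to_numpy() - pred_train,  # actual minus predicted
        "resid_test": y_test.to_numpy() - pred_test,
    }


def plot_residuals(res):
    """Residual plot: training in green, testing in blue, zero line in black."""
    fig, ax = plt.subplots()
    ax.scatter(res["pred_train"], res["resid_train"], color="green", s=10, label="Train data")
    ax.scatter(res["pred_test"], res["resid_test"], color="blue", s=10, label="Test data")
    ax.axhline(y=0, color="black", linewidth=1)
    ax.set_xlabel("Predicted sales")
    ax.set_ylabel("Residual")
    ax.set_title("Residual errors")
    ax.legend()
    return fig


# ASSUMPTION: synthetic stand-in data, NOT the real advertising.csv values.
rng = np.random.default_rng(0)
tv = rng.uniform(0, 300, 200)
demo = pd.DataFrame({"TV": tv, "sales": 7.0 + 0.05 * tv + rng.normal(0, 1.5, 200)})

res = fit_tv_sales(demo)
print(f"intercept = {res['intercept']:.3f}, slope = {res['slope']:.4f}")
print(f"R squared (train) = {res['r2_train']:.3f}, R squared (test) = {res['r2_test']:.3f}")
```

Calling `plot_residuals(res)` followed by `plt.show()` displays the residual plot. Residuals are defined here as actual minus predicted; some examples plot predicted minus actual instead, which only flips the sign.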

Conclusion

Post Lab Questions:
1. Give real-world applications of regression.
2. Explain the standard error of coefficients.
3. Explain the coefficient of correlation and the coefficient of determination.
