Unit 3 5
Objective: Use Simple Linear Regression to predict house prices based on their size
Database: https://www.kaggle.com/c/boston-housing.
Tasks:
1. Load and explore the dataset.
2. Create a scatter plot to visualize the relationship between house size and price.
3. Implement Simple Linear Regression to predict prices.
4. Evaluate the model's performance using R² and Mean Squared Error (MSE).
import os  # needed for os.path.join below
import pandas as pd
data_path = '/content/drive/MyDrive/nkphd/bostan/'
# Load the datasets
submission_example = pd.read_csv(os.path.join(data_path,
'submission_example.csv'))
train = pd.read_csv(os.path.join(data_path, 'train.csv'))
test = pd.read_csv(os.path.join(data_path, 'test.csv'))
print("\nTrain Dataset:")
print(train.head())
print("\nTest Dataset:")
print(test.head())
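Task 1 also asks for some exploration beyond `head()`. A minimal sketch of what that could look like follows; the helper name `explore` and the tiny `demo` frame are made up for illustration, and the same call would be applied to `train` once it is loaded.

```python
import pandas as pd


def explore(df: pd.DataFrame, feature: str = "rm", target: str = "medv") -> dict:
    """Collect a few basic facts about a DataFrame (hypothetical helper)."""
    return {
        "shape": df.shape,                                  # (rows, columns)
        "missing": int(df.isna().sum().sum()),              # total missing cells
        "correlation": float(df[feature].corr(df[target])), # Pearson r
    }


# Made-up rows standing in for train.csv; replace `demo` with `train` in practice.
demo = pd.DataFrame({"rm": [5.0, 6.0, 7.0, 8.0],
                     "medv": [15.0, 20.0, 30.0, 45.0]})

summary = explore(demo)
print("Shape:", summary["shape"])
print("Missing values:", summary["missing"])
print("Correlation rm vs medv:", round(summary["correlation"], 3))
print(demo.describe())  # per-column count, mean, std, min, quartiles, max
```

A strong positive correlation between `rm` and `medv` is what would justify fitting a straight line in the later steps.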
from google.colab import drive
drive.mount('/content/drive')
# Scatter plot for relationship between house size ('rm') and price ('medv')
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(10, 6))
sns.scatterplot(x=train['rm'], y=train['medv'])
plt.title('Relationship Between House Size (RM) and Price (MEDV)',
fontsize=14)
plt.xlabel('Average Number of Rooms per Dwelling (RM)', fontsize=12)
plt.ylabel('Median Value of Owner-Occupied Homes (MEDV)', fontsize=12)
plt.grid(True)
plt.show()
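The evaluation code that follows relies on `linear_regressor`, `y_test` and `y_pred`. A minimal sketch of how they could be produced is given here. It assumes the labelled training data is split 80/20 with `random_state=42`; both values are assumptions, so the numbers it yields need not match the reported output. The helper name `fit_simple_regression` and the synthetic `demo` data are also illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def fit_simple_regression(df, feature="rm", target="medv",
                          test_size=0.2, random_state=42):
    """Fit y = intercept + slope * x on a train split (hypothetical helper)."""
    X = df[[feature]]          # 2-D input: one column, as scikit-learn expects
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state)
    model = LinearRegression()
    model.fit(X_train, y_train)
    return model, y_test, model.predict(X_test)


# Synthetic stand-in for train.csv: medv is roughly 9 * rm - 34 plus noise.
rng = np.random.default_rng(0)
rooms = rng.uniform(4.0, 8.5, size=200)
demo = pd.DataFrame({"rm": rooms,
                     "medv": 9.0 * rooms - 34.0 + rng.normal(0.0, 1.0, size=200)})

linear_regressor, y_test, y_pred = fit_simple_regression(demo)
print("Slope:", linear_regressor.coef_[0])
print("Intercept:", linear_regressor.intercept_)
```

With the real data, `fit_simple_regression(train)` would replace the call on `demo`. The split is taken from the training file because the evaluation needs known `medv` values to compare against.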
# linear_regressor, y_test and y_pred come from the model-fitting step
from sklearn.metrics import mean_squared_error, r2_score
# Model evaluation
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
# Display results
print("Mean Squared Error (MSE):", mse)
print("R-squared (R^2):", r2)
print("Coefficient (Slope):", linear_regressor.coef_[0])
print("Intercept:", linear_regressor.intercept_)
Mean Squared Error (MSE): 36.361622515889756
R-squared (R^2): 0.5959747117709422
Coefficient (Slope): 8.584424490365215
Intercept: -30.96185860010203
from sklearn.metrics import mean_squared_error, r2_score
# Calculate Mean Squared Error
mse = mean_squared_error(y_test, y_pred)
# Calculate R²
r2 = r2_score(y_test, y_pred)
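For reference, both metrics can be written out by hand: MSE is the mean of the squared residuals, and R² is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses plain Python and made-up numbers; the function name `mse_and_r2` is hypothetical.

```python
def mse_and_r2(y_true, y_pred):
    """Return (MSE, R^2) computed from their definitions."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    # Residual sum of squares: error left after the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return ss_res / n, 1.0 - ss_res / ss_tot


# Made-up values, not taken from the Boston data
mse_manual, r2_manual = mse_and_r2([20.0, 25.0, 30.0, 35.0],
                                   [22.0, 24.0, 29.0, 37.0])
print("MSE:", mse_manual)
print("R^2:", r2_manual)
```

For the same inputs these values should agree with `mean_squared_error` and `r2_score` from scikit-learn.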
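The Simple Linear Regression model for predicting house prices was evaluated on the test split
using MSE and R². The MSE was 36.36; because MEDV is recorded in thousands of dollars, this
corresponds to a root mean squared error of about 6.03, a typical prediction error of roughly
$6,000. The R² score of 0.596 means that the number of rooms per dwelling (RM), used here as a
proxy for house size, explains 59.6% of the variance in house prices. The slope of 8.58 means
the predicted price rises by 8.58 units (about $8,580) for each additional room. The intercept
of -30.96 has no practical meaning on its own, since no dwelling has zero rooms. The model
shows a moderate fit; including more features could improve accuracy.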
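The reported coefficients can be turned into concrete numbers with a little arithmetic. The sketch below copies the values from the output above; the helper name `predict_medv` and the choice of a 6-room dwelling are illustrative only.

```python
import math

# Values copied from the reported model output
MSE = 36.361622515889756
SLOPE = 8.584424490365215
INTERCEPT = -30.96185860010203


def predict_medv(rooms: float) -> float:
    """Predicted median value (in $1000s) for a given average room count."""
    return INTERCEPT + SLOPE * rooms


rmse = math.sqrt(MSE)              # error expressed in the same units as MEDV
six_room_price = predict_medv(6.0)

print("RMSE:", round(rmse, 2))                               # 6.03
print("Predicted MEDV for 6 rooms:", round(six_room_price, 2))  # 20.54
```

A predicted value of 20.54 corresponds to about $20,540, with a typical error of around $6,000 on either side.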