Unit 3 5

The document outlines a project to predict house prices using Simple Linear Regression based on house size. It includes steps for data loading, visualization, model implementation, and performance evaluation using Mean Squared Error (MSE) and R². The results indicate a moderate fit with an R² score of 0.596 and a MSE of 36.36, suggesting potential for improvement by including additional features.

Uploaded by

mcanarender
Copyright
© All Rights Reserved

Predicting House Prices

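Objective: Use Simple Linear Regression to predict house prices based on their size. The code uses the average number of rooms per dwelling ('rm') as the measure of size and the median home value ('medv') as the price.
Dataset: https://www.kaggle.com/c/boston-housing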
Tasks:
1. Load and explore the dataset.
2. Create a scatter plot to visualize the relationship between house size and price.
3. Implement Simple Linear Regression to predict prices.
4. Evaluate the model's performance using R² and Mean Squared Error (MSE).
import os

import pandas as pd

# Folder on Google Drive that holds the Kaggle CSV files
data_path = '/content/drive/MyDrive/nkphd/bostan/'

# Load the datasets
submission_example = pd.read_csv(os.path.join(data_path, 'submission_example.csv'))
train = pd.read_csv(os.path.join(data_path, 'train.csv'))
test = pd.read_csv(os.path.join(data_path, 'test.csv'))

# Display first few rows to confirm
print("Submission Example:")
print(submission_example.head())

print("\nTrain Dataset:")
print(train.head())

print("\nTest Dataset:")
print(test.head())
from google.colab import drive
drive.mount('/content/drive')

import pandas as pd

# Define dataset path


data_path = '/content/drive/MyDrive/nkphd/bostan/'

# Load train dataset


train = pd.read_csv(data_path + 'train.csv')

# Scatter plot for relationship between house size ('rm') and price ('medv')
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(10, 6))
sns.scatterplot(x=train['rm'], y=train['medv'])
plt.title('Relationship Between House Size (RM) and Price (MEDV)',
fontsize=14)
plt.xlabel('Average Number of Rooms per Dwelling (RM)', fontsize=12)
plt.ylabel('Median Value of Owner-Occupied Homes (MEDV)', fontsize=12)
plt.grid(True)
plt.show()
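The strength of the linear trend seen in a scatter plot can also be summarised with the Pearson correlation coefficient. The sketch below is a minimal, dependency-free illustration. The `rooms` and `prices` lists are made-up values chosen for illustration, not rows from train.csv, and `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Illustrative values only (assumed, not taken from the dataset)
rooms = [5.0, 6.0, 7.0, 8.0]
prices = [12.0, 21.0, 29.0, 38.0]

r = pearson_r(rooms, prices)
print("Pearson r:", round(r, 4))
```

On the real data, `train['rm'].corr(train['medv'])` should give the same quantity, since pandas computes the Pearson coefficient by default.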
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

import pandas as pd

# Load the train dataset


data_path = '/content/drive/MyDrive/nkphd/bostan/'
train = pd.read_csv(data_path + 'train.csv')

# Implement Simple Linear Regression


from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Prepare the data


X = train[['rm']] # Average number of rooms per dwelling
y = train['medv'] # Median value of owner-occupied homes

# Split data into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

# Initialize and fit the linear regression model


linear_regressor = LinearRegression()
linear_regressor.fit(X_train, y_train)

# Predict on the test set


y_pred = linear_regressor.predict(X_test)

# Model evaluation
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Display results
print("Mean Squared Error (MSE):", mse)
print("R-squared (R^2):", r2)
print("Coefficient (Slope):", linear_regressor.coef_[0])
print("Intercept:", linear_regressor.intercept_)
Drive already mounted at /content/drive; to attempt to forcibly remount,
call drive.mount("/content/drive", force_remount=True).
Mean Squared Error (MSE): 36.361622515889756
R-squared (R^2): 0.5959747117709422
Coefficient (Slope): 8.584424490365215
Intercept: -30.96185860010203
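For a single feature, ordinary least squares has a closed form: the slope is the sum of cross-deviations of x and y divided by the sum of squared deviations of x, and the intercept is mean(y) minus slope times mean(x). The sketch below applies those formulas to a few made-up points, then reuses the coefficients printed above to make one prediction. `fit_simple_ols` and `predict_medv` are hypothetical helper names, and the toy data are assumptions for illustration, not dataset rows.

```python
def fit_simple_ols(xs, ys):
    """Closed-form least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data (assumed values, not rows from train.csv)
toy_rooms = [5.0, 6.0, 7.0, 8.0]
toy_prices = [12.0, 21.0, 29.0, 38.0]
toy_slope, toy_intercept = fit_simple_ols(toy_rooms, toy_prices)
print("Toy slope:", toy_slope, "Toy intercept:", toy_intercept)

# Coefficients copied from the printed model output
SLOPE = 8.584424490365215
INTERCEPT = -30.96185860010203


def predict_medv(rm):
    """Predicted MEDV for a dwelling with `rm` rooms, using the fitted line."""
    return INTERCEPT + SLOPE * rm


print("Predicted MEDV for 6 rooms:", predict_medv(6.0))
# The difference between predictions one room apart equals the slope
print("Change per extra room:", predict_medv(7.0) - predict_medv(6.0))
```

Because the intercept is negative, the line predicts a price below zero for fewer than about 3.6 rooms, so it is best not extrapolated outside the range of room counts seen in the data.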
from sklearn.metrics import mean_squared_error, r2_score
# Calculate Mean Squared Error
mse = mean_squared_error(y_test, y_pred)

# Calculate R²
r2 = r2_score(y_test, y_pred)

# Print the results


print("Mean Squared Error (MSE):", mse)
print("R-squared (R²):", r2)

Mean Squared Error (MSE): 36.361622515889756
R-squared (R²): 0.5959747117709422

Evaluation of performance using R² and Mean Squared Error (MSE).

The Simple Linear Regression model for predicting house prices was evaluated on the held-out 20% test
split using MSE and R². The MSE was 36.36, which is the mean of the squared prediction errors; its
square root (RMSE) is about 6.03, so a typical prediction is off by roughly 6 units of MEDV. MEDV is
recorded in thousands of dollars, so that is about $6,000. The R² score of 0.596 means that the number
of rooms per dwelling explains 59.6% of the variance in test-set house prices. The slope of 8.58 means
the predicted price rises by 8.58 units (about $8,580) per additional room. The fit is moderate;
including more features could improve accuracy.
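To make the two metrics concrete, the sketch below computes them by hand: MSE is the mean of the squared errors, and R² is 1 minus the ratio of the residual sum of squares to the total sum of squares around the mean of the true values. The `y_true_demo` and `y_pred_demo` lists are made-up numbers for illustration only, and `mse` and `r2` are hypothetical helper names; on real data they should agree with scikit-learn's `mean_squared_error` and `r2_score` up to floating-point error.

```python
import math


def mse(y_true, y_pred):
    """Mean of the squared differences between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only (assumed, not the notebook's y_test / y_pred)
y_true_demo = [24.0, 21.6, 34.7, 33.4]
y_pred_demo = [25.0, 20.6, 32.7, 31.4]

demo_mse = mse(y_true_demo, y_pred_demo)
demo_r2 = r2(y_true_demo, y_pred_demo)
print("Demo MSE:", demo_mse)
print("Demo R^2:", demo_r2)

# Square root of the MSE reported by the notebook, in the same units as MEDV
REPORTED_MSE = 36.361622515889756
reported_rmse = math.sqrt(REPORTED_MSE)
print("RMSE from reported MSE:", reported_rmse)
```

The square root is worth reporting alongside the MSE because it is expressed in the same units as the target, which makes the size of the error easier to judge.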
