Predicting House Prices
Predicting House Prices
Submitted by
SHARMA KUSH PARESHBHAI
ENROLLMENT NO. : 226230316198
&
RANGANI AYUSH NAVEENBHAI
ENROLLMENT NO. : 226230316182
DIPLOMA IN ENGINEERING
In
INFORMATION TECHNOLOGY
2 | Page
INDEX
4. Output 8
1.
2 | Page
1. Introduction to Predicting House Prices Model
2 | Page
2. Code of Predicting House Prices Model
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
2 | Page
predicted_price = model.predict(new_house_features)
print("Predicted price of the new house:", predicted_price[0])
2 | Page
3. Explanation of Code
This Python code demonstrates the process of building and evaluating a linear
regression model to predict house prices. It covers essential steps such as data
loading, preprocessing, model training, evaluation, and making predictions for
new data points. Below is a theoretical explanation of each step involved in the
code.
Importing Libraries
The dataset, stored in a CSV file named 'Housing.csv', is loaded into a pandas
DataFrame. This step involves reading the file and converting its contents into a
structured format that can be easily manipulated and analyzed.
The first few rows of the dataset are printed to the console. This initial
examination helps understand the structure of the data, including the columns
(features) and their types.
2 | Page
The dataset is divided into features (`X`) and the target variable (`y`).
- `X` contains all the columns except for the target column 'price'.
- `y` contains the target variable, which is the 'price' column. This separation is
crucial for training the model, as it allows the model to learn the relationship
between the features and the target.
The data is split into training and testing sets using the `train_test_split` function.
- The training set (80% of the data) is used to train the model.
- The testing set (20% of the data) is used to evaluate the model's performance.
This split ensures that the model is tested on unseen data, providing a measure
of its generalization ability.
Before training the model, categorical variables in the training data are converted
into numeric values using one-hot encoding (`pd.get_dummies`). This step
transforms categorical features into a format suitable for the linear regression
model. The model is then trained on the training data (`X_train` and `y_train`),
learning the relationships between the features and the target variable.
Making Predictions
Similar to the training data, categorical variables in the testing data are also
converted into numeric values using one-hot encoding. The trained model is used
to make predictions on the testing set (`X_test`). This step involves applying the
2 | Page
learned relationships to estimate the target variable (price) for the test set.
The model's performance is evaluated using the Mean Squared Error (MSE),
which measures the average squared difference between the actual and
predicted prices. A lower MSE indicates better model performance. The MSE is
printed to provide a quantitative measure of the model's accuracy.
The code also demonstrates predicting the price of a new house based on its
features. An example list of features is created, following the same format and
preprocessing steps as the training data (including one-hot encoding if
necessary). The trained model is used to predict the price of the new house, and
the predicted price is printed. This step shows how the model can be used in
practical applications to estimate the price of new properties based on their
attributes.
By following these steps, the code builds a linear regression model capable of
predicting house prices, evaluates its performance, and demonstrates how to use
it for new predictions. Each step is crucial for ensuring the model is accurate and
reliable in real-world scenarios.
4. Output
2 | Page
2 | Page