Coding Question

These are some coding questions that I hope will be helpful for someone learning programming.

Uploaded by

zain.bsba90
Copyright
© All Rights Reserved

### Coding Question: Building a Machine Learning Model to Predict Housing Prices

**Problem Statement:**

You are given a dataset containing various features of houses along with their prices. Your task is to
build a machine learning model to predict the prices of houses based on their features. You will use the
popular Boston Housing dataset for this task.

**Dataset:**

The dataset consists of 13 features (items 1-13) and the target variable, MEDV (item 14):

1. CRIM: per capita crime rate by town

2. ZN: proportion of residential land zoned for lots over 25,000 sq. ft.

3. INDUS: proportion of non-retail business acres per town

4. CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)

5. NOX: nitric oxides concentration (parts per 10 million)

6. RM: average number of rooms per dwelling

7. AGE: proportion of owner-occupied units built prior to 1940

8. DIS: weighted distances to five Boston employment centres

9. RAD: index of accessibility to radial highways

10. TAX: full-value property tax rate per $10,000

11. PTRATIO: pupil-teacher ratio by town

12. B: 1000(Bk - 0.63)^2 where Bk is the proportion of Black residents by town

13. LSTAT: % lower status of the population

14. MEDV: median value of owner-occupied homes in $1000s (the target variable to predict)

**Tasks:**

1. Load and explore the dataset.

2. Preprocess the data.

3. Split the data into training and testing sets.

4. Train a machine learning model (e.g., Linear Regression).

5. Evaluate the model.

6. Make predictions using the trained model.

### Step-by-Step Solution

#### 1. Load and Explore the Dataset

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this import requires scikit-learn < 1.2.
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load the dataset into a DataFrame and attach the target column
boston = load_boston()
boston_df = pd.DataFrame(boston.data, columns=boston.feature_names)
boston_df['MEDV'] = boston.target

# Display the first few rows of the dataset
print(boston_df.head())

# Summary statistics
print(boston_df.describe())

# Check for missing values
print(boston_df.isnull().sum())

# Correlation matrix
plt.figure(figsize=(12, 10))
sns.heatmap(boston_df.corr(), annot=True, cmap='coolwarm')
plt.show()
```
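The heatmap shows every pairwise correlation at once, which can be hard to read. As a complementary sketch, the snippet below ranks features by the strength of their correlation with the target. It uses a small synthetic DataFrame (`toy_df`) rather than the Boston data. The column names are borrowed only for illustration, and `NOISE` is a made-up column that is unrelated to the target.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (NOT the Boston dataset); names are illustrative only
rng = np.random.default_rng(0)
n = 200
rm = rng.normal(6.0, 0.7, n)
lstat = rng.normal(12.0, 5.0, n)
noise_col = rng.normal(0.0, 1.0, n)  # hypothetical column with no link to the target
medv = 5.0 * rm - 0.8 * lstat + rng.normal(0.0, 1.0, n)
toy_df = pd.DataFrame({"RM": rm, "LSTAT": lstat, "NOISE": noise_col, "MEDV": medv})

# Correlation of each feature with the target, ordered by absolute value
target_corr = toy_df.corr()["MEDV"].drop("MEDV")
ranked = target_corr.reindex(target_corr.abs().sort_values(ascending=False).index)
print(ranked)
```

On the real data, the same two lines applied to `boston_df` would give a quick shortlist of the features most linearly related to `MEDV`.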

#### 2. Preprocess the Data

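```python
# Features and target variable
X = boston_df.drop('MEDV', axis=1)
y = boston_df['MEDV']

# Split the data into training and testing sets first, so the test rows
# do not influence the scaling statistics
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the data: fit the scaler on the training set only,
# then apply the same transformation to the test set
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```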
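One common way to make sure the scaler only ever sees training rows is to wrap the scaling and the model in a single pipeline. The sketch below is a minimal illustration of that idea, not part of the original solution. It assumes synthetic data from `make_regression` with 13 features in place of the Boston data, and names such as `X_demo` and `pipeline` are chosen here for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data standing in for the housing features (assumption)
X_demo, y_demo = make_regression(n_samples=300, n_features=13, noise=10.0, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# fit() learns the scaling statistics from the training split only;
# score() reuses those statistics on the test split
pipeline = make_pipeline(StandardScaler(), LinearRegression())
pipeline.fit(X_tr, y_tr)
test_r2 = pipeline.score(X_te, y_te)
print("Test R-squared:", test_r2)

# The scaler inside the pipeline holds the training-set means
scaler_means = pipeline.named_steps["standardscaler"].mean_
```

A pipeline also simplifies prediction on new rows, since `pipeline.predict` applies the scaling step automatically.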

#### 3. Train a Machine Learning Model (Linear Regression)

```python
# Train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Model coefficients
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)
```
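A bare array of coefficients is hard to interpret. Because the features are standardized, the coefficients are on a comparable scale, so labelling them with feature names and sorting by magnitude gives a rough view of which features the model leans on. The following is a sketch on synthetic data. `toy_X`, `toy_y`, and the generating formula are assumptions made for illustration, not results from the Boston dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic features (illustrative names); TAX deliberately has no effect on the target
rng = np.random.default_rng(1)
toy_X = pd.DataFrame({
    "RM": rng.normal(6.0, 0.7, 300),
    "LSTAT": rng.normal(12.0, 5.0, 300),
    "TAX": rng.normal(400.0, 150.0, 300),
})
toy_y = 4.0 * toy_X["RM"] - 0.6 * toy_X["LSTAT"] + rng.normal(0.0, 0.5, 300)

# Standardize, fit, and label each coefficient with its feature name
toy_scaled = StandardScaler().fit_transform(toy_X)
toy_model = LinearRegression().fit(toy_scaled, toy_y)
coef_table = pd.Series(toy_model.coef_, index=toy_X.columns)
coef_table = coef_table.sort_values(key=np.abs, ascending=False)
print(coef_table)
```

With the tutorial's variables, the equivalent would be `pd.Series(model.coef_, index=X.columns)`.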

#### 4. Evaluate the Model

```python
# Make predictions on the testing set
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Mean Squared Error:", mse)
print("R-squared:", r2)

# Plot the results; the red diagonal marks perfect predictions
plt.figure(figsize=(10, 6))
plt.scatter(y_test, y_pred, color='blue')
plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], color='red', linewidth=2)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title('Actual vs Predicted')
plt.show()
```
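MSE is expressed in squared units of the target, which makes it awkward to read. Taking its square root (RMSE) or using the mean absolute error (MAE) gives an error in the same units as `MEDV`. The sketch below computes these on four made-up value pairs. The numbers are illustrative only and are not outputs of the model above.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up actual and predicted values in $1000s (illustrative, not model output)
y_true_demo = np.array([24.0, 21.6, 34.7, 33.4])
y_pred_demo = np.array([25.0, 20.6, 32.7, 33.4])

mse_demo = mean_squared_error(y_true_demo, y_pred_demo)   # mean of squared errors
rmse_demo = np.sqrt(mse_demo)                             # back in target units
mae_demo = mean_absolute_error(y_true_demo, y_pred_demo)  # mean of absolute errors
r2_demo = r2_score(y_true_demo, y_pred_demo)

print("MSE:", mse_demo)
print("RMSE:", rmse_demo)
print("MAE:", mae_demo)
print("R-squared:", r2_demo)
```

Here the errors are 1, 1, 2 and 0 in absolute value, so the MSE is (1 + 1 + 4 + 0) / 4 = 1.5 and the MAE is 1.0.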

#### 5. Make Predictions Using the Trained Model

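```python
# Predicting on new data (example); the values follow the feature order CRIM ... LSTAT.
# Using a DataFrame with the training column names avoids a feature-name warning
# from the scaler, which was fitted on a DataFrame.
new_data = pd.DataFrame(
    [[0.1, 18.0, 2.31, 0.0, 0.538, 6.575, 65.2, 4.0900, 1, 296.0, 15.3, 396.90, 4.98]],
    columns=X.columns,
)

# Scale with the scaler fitted earlier, then predict
new_data_scaled = scaler.transform(new_data)
predicted_price = model.predict(new_data_scaled)

# MEDV is expressed in $1000s
print("Predicted price (in $1000s):", predicted_price[0])
```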
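To make the scaling step concrete: `scaler.transform` subtracts the per-feature mean learned during fitting and divides by the learned standard deviation. It does not recompute statistics from the new row. A minimal sketch with a tiny made-up array (the names `train`, `demo_scaler`, and `new_row` are illustrative):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Tiny made-up training matrix with two features
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
demo_scaler = StandardScaler().fit(train)

# A new row is scaled with the statistics stored at fit time
new_row = np.array([[4.0, 20.0]])
manual = (new_row - demo_scaler.mean_) / demo_scaler.scale_
via_transform = demo_scaler.transform(new_row)

print("Manual:   ", manual)
print("Transform:", via_transform)
```

Both lines print the same values, which is why the new house must be passed through the already fitted scaler rather than a freshly fitted one.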

### Explanation of the Code

1. **Loading and Exploring the Dataset**:

- The Boston Housing dataset is loaded using `load_boston()` from `sklearn.datasets` (this function was removed in scikit-learn 1.2, so an earlier version is required).

- The dataset is converted into a DataFrame for easier exploration and manipulation.

- Summary statistics and correlation matrix are generated to understand the data better.

2. **Preprocessing the Data**:

- Features (`X`) and target variable (`y`) are separated.

- The features are standardized using `StandardScaler`.

- The dataset is split into training and testing sets using `train_test_split`.

3. **Training the Model**:

- A Linear Regression model is instantiated and trained on the training data.

- Model coefficients and intercept are printed.


4. **Evaluating the Model**:

- Predictions are made on the testing set.

- Mean Squared Error (MSE) and R-squared (R²) are calculated to evaluate the model's performance.

- A scatter plot is generated to visualize the actual vs predicted values.

5. **Making Predictions**:

- An example of making a prediction on new data is provided. The new data is scaled using the same
scaler used during training, and the model predicts the house price.

This example covers the entire process of building a machine learning model to predict
housing prices: data loading, preprocessing, model training, evaluation, and prediction.
