Continuous Assessment

AI Continuous Assessment

Uploaded by

garyluk6

Continuous Assessment

Deadline: 14th September 2024

You will be given a data set called the Boston Housing Dataset. The Boston Housing Dataset is derived from
information collected by the U.S. Census Service concerning housing in the area of Boston, MA. The following
describes the dataset columns:

• CRIM - per capita crime rate by town
• ZN - proportion of residential land zoned for lots over 25,000 sq.ft.
• INDUS - proportion of non-retail business acres per town
• CHAS - Charles River dummy variable (1 if tract bounds river; 0 otherwise)
• NOX - nitric oxides concentration (parts per 10 million)
• RM - average number of rooms per dwelling
• AGE - proportion of owner-occupied units built prior to 1940
• DIS - weighted distances to five Boston employment centres
• RAD - index of accessibility to radial highways
• TAX - full-value property-tax rate per $10,000
• PTRATIO - pupil-teacher ratio by town
• B - 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
• LSTAT - % lower status of the population
• MEDV - median value of owner-occupied homes in $1000's

The goal of this project is to predict the housing prices of a town or a suburb based on the features of the locality
provided to us. In the process, we need to identify the most important features affecting the price of the house.
We need to employ techniques of data preprocessing and build a linear regression model that predicts the prices
for the unseen data.

Initialization

Import the necessary libraries and get an overview of the data set

# Import libraries for data manipulation
import pandas as pd
import numpy as np

# Import libraries for data visualization
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.graphics.gofplots import ProbPlot

# Import libraries for building linear regression model
from statsmodels.formula.api import ols
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression

# Import library for preparing data
from sklearn.model_selection import train_test_split

# Import library for data preprocessing
from sklearn.preprocessing import MinMaxScaler

# Suppress warnings to keep the output readable
import warnings
warnings.filterwarnings("ignore")

Load the data

df = pd.read_csv("boston.csv")
df.head()

1. Describe the data set. How many rows and columns are there? What are the data types? What are the
mean, min and max of each column? [5 marks]
Hint: info and describe methods

df.info()
df.describe()
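
A minimal sketch of how the facts asked for in this question could be collected in one place, assuming `df` is the DataFrame loaded above. The helper name `overview` is hypothetical, and the small frame at the bottom is made-up demonstration data, not the Boston data.

```python
import pandas as pd


def overview(df: pd.DataFrame) -> dict:
    """Collect shape, data types, missing-value counts and mean/min/max per column."""
    stats = df.describe().loc[["mean", "min", "max"]]
    return {
        "n_rows": df.shape[0],
        "n_cols": df.shape[1],
        "dtypes": df.dtypes.astype(str).to_dict(),
        "missing": df.isna().sum().to_dict(),
        "stats": stats,
    }


# Made-up demonstration frame (not the Boston data)
demo = pd.DataFrame({"RM": [6.0, 7.0, 5.0], "CHAS": [0, 1, 0]})
info = overview(demo)
print(info["n_rows"], "rows,", info["n_cols"], "columns")
print(info["stats"])
```

With the real data, `overview(df)` could be called right after `pd.read_csv`. The missing-value count is an extra check that the question does not ask for but that is usually worth a look before modelling.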

2. Plot a histogram of each column. How do you interpret the results? [5 marks]
Hint: sns.histplot

# let's plot all the columns to look at their distributions
for i in df.columns:
    plt.figure(figsize=(7, 4))
    sns.histplot(data=df, x=i, kde=True)
    plt.show()

3. MEDV is our dependent variable. Apply a log transformation to it. Why do you think this is
needed?
Hint: Examine the distributions of MEDV and log MEDV

df['MEDV_log'] = np.log(df['MEDV'])

sns.histplot(data=df,x='MEDV_log',kde=True)
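
One way to back up the visual comparison with a number is to compute the skewness before and after the transformation. The sketch below assumes the values are strictly positive. The helper name `skew_before_after` is hypothetical, and the sample is synthetic right-skewed data, not MEDV.

```python
import numpy as np
import pandas as pd


def skew_before_after(values: pd.Series) -> tuple:
    """Return (skewness of values, skewness of log(values)); values must be positive."""
    return float(values.skew()), float(np.log(values).skew())


# Synthetic right-skewed sample (not the Boston data)
demo = pd.Series(np.exp(np.linspace(0.0, 3.0, 50)))
before, after = skew_before_after(demo)
print(f"skew before log: {before:.3f}, after log: {after:.3f}")
```

On the real data this could be called as `skew_before_after(df['MEDV'])`. A skewness closer to zero after the transformation suggests a more symmetric target, which tends to suit a linear regression with roughly normal residuals.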

4. Check the correlations using a heatmap. How do you interpret the results? [5 marks]
Hint: sns.heatmap

plt.figure(figsize = (12,8))
cmap = sns.diverging_palette(230,20,as_cmap=True)
sns.heatmap(df.corr(),annot=True,fmt='.2f',cmap=cmap)
plt.show()

5. Visualize the relationship between the AGE and DIS columns using a scatter plot. How do you
interpret the results? [5 marks]
Hint: sns.scatterplot

# scatterplot to visualize the relationship between AGE and DIS

plt.figure(figsize=(6, 6))
sns.scatterplot(x = 'AGE', y = 'DIS', data = df)
plt.show()

6. Do the same with RAD and TAX [5 marks]

# scatterplot to visualize the relationship between RAD and TAX
plt.figure(figsize=(6, 6))
sns.scatterplot(x = 'RAD', y = 'TAX', data = df)
plt.show()

7. Do the same with INDUS and TAX [5 marks]

8. Do the same with RM and MEDV [5 marks]
9. Do the same with LSTAT and MEDV [5 marks]
10. Do the same with DIS and NOX [5 marks]
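
As a numeric complement to the scatter plots in questions 7 to 10, the Pearson correlation of each pair can be computed directly. This is only a sketch: the helper name `pair_correlations` is hypothetical, the demonstration frame is made-up data, and Pearson correlation captures only the linear part of a relationship, so the plots are still needed to see its shape.

```python
import pandas as pd

# Column pairs named in questions 7 to 10
PAIRS = [("INDUS", "TAX"), ("RM", "MEDV"), ("LSTAT", "MEDV"), ("DIS", "NOX")]


def pair_correlations(df: pd.DataFrame, pairs) -> dict:
    """Pearson correlation for each (x, y) pair whose columns are both present in df."""
    return {
        (x, y): float(df[x].corr(df[y]))
        for x, y in pairs
        if x in df.columns and y in df.columns
    }


# Made-up demonstration frame (not the Boston data)
demo = pd.DataFrame({
    "RM": [4.0, 5.0, 6.0, 7.0],
    "MEDV": [10.0, 20.0, 30.0, 40.0],
    "LSTAT": [30.0, 20.0, 10.0, 0.0],
})
result = pair_correlations(demo, PAIRS)
for pair, r in result.items():
    print(pair, round(r, 3))
```

Pairs with a missing column are skipped, so the same call works on the full DataFrame as `pair_correlations(df, PAIRS)`.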

11. Split the data into dependent and independent variables, then split these into train and test sets in a
70:30 ratio. [5 marks]
Hint: add_constant, train_test_split

# separate the dependent and independent variables
Y = df['MEDV_log']
X = df.drop(columns=['MEDV', 'MEDV_log'])

# add the intercept term
X = sm.add_constant(X)

# split the data in a 70:30 ratio of train to test data
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.30, random_state=1)

12. Check for multicollinearity [5 marks]

Hint: variance_inflation_factor

from statsmodels.stats.outliers_influence import variance_inflation_factor

# function to check VIF
def checking_vif(train):
    vif = pd.DataFrame()
    vif["feature"] = train.columns

    # calculating VIF for each feature
    vif["VIF"] = [
        variance_inflation_factor(train.values, i)
        for i in range(len(train.columns))
    ]
    return vif

print(checking_vif(X_train))
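
To make the number reported above less of a black box: the VIF of feature j is 1 / (1 - R_j^2), where R_j^2 is obtained by regressing feature j on the remaining features. The sketch below computes this with plain NumPy on synthetic data. The helper name `vif_manual` is hypothetical, and unlike the statsmodels function it adds the intercept itself, so it should be given the feature matrix without the constant column.

```python
import numpy as np


def vif_manual(X: np.ndarray, j: int) -> float:
    """VIF of column j: 1 / (1 - R^2) from regressing column j on the other columns."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    # Design matrix: intercept plus the remaining features
    A = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 / (1.0 - r2)


# Two uncorrelated columns: neither helps predict the other
X_orth = np.array([[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]])
vif_orth = vif_manual(X_orth, 0)

# Two nearly identical columns (synthetic): each almost determines the other
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = a + 0.01 * rng.normal(size=200)
vif_collinear = vif_manual(np.column_stack([a, b]), 0)

print("uncorrelated:", vif_orth)
print("nearly collinear:", vif_collinear)
```

A value near 1 means the feature is not explained by the others, while a large value signals that it is close to a linear combination of them. A commonly quoted rule of thumb treats values above about 5 as worth investigating, though the exact cut-off is a convention rather than a fixed rule.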

13. Drop the TAX column and check if multicollinearity is resolved [5 marks]

# drop the TAX column from the training features
X_train = X_train.drop(columns=['TAX'])

# check the VIF again
print(checking_vif(X_train))

14. Build a linear regression model which uses all features except for the TAX feature to predict log
MEDV [5 marks]
Hint: sm.OLS

# create the model

model1 = sm.OLS(y_train, X_train).fit()

# get the model summary

model1.summary()

15. Interpret the results [5 marks]
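
Because the model predicts log MEDV rather than MEDV, a coefficient is easier to read after converting it to a percentage effect: a one-unit increase in a feature with coefficient beta multiplies the predicted MEDV by exp(beta), holding the other features fixed. A small sketch of that conversion follows. The helper name `pct_change_per_unit` is hypothetical and the coefficient values are invented for illustration, not taken from any fitted model.

```python
import math


def pct_change_per_unit(beta: float) -> float:
    """Percent change in MEDV per one-unit increase in a feature, for a log(MEDV) model."""
    return (math.exp(beta) - 1.0) * 100.0


# Hypothetical coefficient values, for illustration only
example_coefs = {"feature_a": 0.05, "feature_b": -0.03}
for name, beta in example_coefs.items():
    print(f"{name}: {pct_change_per_unit(beta):+.2f}% per unit")
```

With the fitted model, the same conversion could be applied to each entry of `model1.params`. For small coefficients the result is close to 100 * beta, which is why log-scale coefficients are often read directly as approximate percentages.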

16. Create the model after dropping columns 'MEDV', 'MEDV_log', 'TAX', 'ZN', 'AGE', 'INDUS' from df
DataFrame [5 marks]

# create the model after dropping columns 'MEDV', 'MEDV_log',
# 'TAX', 'ZN', 'AGE', 'B', 'INDUS' from the df DataFrame
Y = df['MEDV_log']
X = df.drop(columns=['MEDV', 'MEDV_log', 'TAX', 'ZN', 'AGE', 'B', 'INDUS'])
X = sm.add_constant(X)

# split the data in a 70:30 ratio of train to test data
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.30, random_state=1)

# create the model
model2 = sm.OLS(y_train, X_train).fit()  # write your code here

# get the model summary
model2.summary()

17. Is this model better? [5 marks]
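
One possible way to make "better" concrete is to compare the two models on adjusted R-squared, which penalizes extra features, and on prediction error for the held-out test set. The sketch below shows both calculations with NumPy. The function names are hypothetical and the demonstration numbers are made up; with statsmodels, the fitted results also expose `rsquared_adj` and `aic` directly.

```python
import numpy as np


def adjusted_r2(r2: float, n_obs: int, n_features: int) -> float:
    """Adjusted R^2; n_features counts the predictors, excluding the intercept."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_features - 1)


def rmse(y_true, y_pred) -> float:
    """Root mean squared error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def rmse_original_scale(y_true_log, y_pred_log) -> float:
    """RMSE in MEDV units after undoing the log transform with exp."""
    return rmse(np.exp(y_true_log), np.exp(y_pred_log))


# Made-up numbers, for illustration only
print(adjusted_r2(0.90, n_obs=101, n_features=10))
print(rmse([0.0, 0.0], [3.0, 4.0]))
# With the fitted model, something like:
#   rmse_original_scale(y_test, model2.predict(X_test))
# assuming X_test has the same columns as the X_train used for fitting.
```

A smaller model that keeps a similar adjusted R-squared and test error is usually preferable, since it is easier to interpret and less exposed to multicollinearity.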

18. Let’s assume that you were just given the above data set. Write a short paragraph summarizing your
statistical findings. This type of exercise mimics the approach of a typical quantitative research
exercise whereby you are given some data and you need to extract some insight from it. The above
instructions are typical steps one could follow to tackle these types of problems. [15 marks]
